The secret missing Unicode letters?

by Michael S. Kaplan, published on 2011/08/23 07:01 -04:00, original URI:

Longtime reader ReallyEvilCanine asked:


I'm splashing around in the character and glyph separation cesspool, and I have a question pertaining to initial, medial, final, and isolated forms.

While different code points were encoded as presentation forms for Arabic and Hebrew back in 3.0 for backward compatibility and simplicity (respectively), vowels in a language such as Devanagari were not supposed to be encoded. Instead the character choice and display was to be left to the renderer.

However, it appears that each form of any particular Devanagari vowel now has an assigned codepoint. Was a decision made between Unicode 3 and 5 to encode all possible presentation forms? Have I stumbled across another "exception to the rule" (which now appears to be the rule rather than the exception)? Is there any language which I can use to demonstrate that a single codepoint may have multiple glyphs which the renderer has to pull from a font?


I'll be honest; I'm at a loss here on what on earth he could be referring to.

Unicode didn't change its encoding model for the various Indic scripts in general, or Devanagari in particular....
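For what it's worth, the difference between an encoded presentation form and an ordinary character can be checked against the Unicode Character Database. Below is a minimal sketch in Python, assuming the standard `unicodedata` module; the helper name `positional_base` is made up for illustration. Arabic positional presentation forms carry a compatibility decomposition tagged `<isolated>`, `<final>`, `<initial>`, or `<medial>`, while a Devanagari independent vowel and its dependent vowel sign are simply two separate characters with no such tag.

```python
import unicodedata

# Decomposition tags that mark positional presentation forms in the UCD.
POSITIONAL_TAGS = {"<isolated>", "<final>", "<initial>", "<medial>"}

def positional_base(ch):
    """Return the base text if ch is a positional presentation form, else None.

    Hypothetical helper for illustration only.
    """
    parts = unicodedata.decomposition(ch).split()
    if parts and parts[0] in POSITIONAL_TAGS:
        # The remaining fields are hex code points of the underlying letter(s).
        return "".join(chr(int(p, 16)) for p in parts[1:])
    return None

samples = [
    "\uFE91",  # ARABIC LETTER BEH INITIAL FORM (a compatibility presentation form)
    "\u0906",  # DEVANAGARI LETTER AA (independent vowel)
    "\u093E",  # DEVANAGARI VOWEL SIGN AA (dependent vowel sign)
]

for ch in samples:
    base = positional_base(ch)
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        "presentation form of U+%04X" % ord(base[0]) if base else "not a presentation form",
    )
```

The independent vowel and the vowel sign get different general categories (a letter versus a combining mark), which fits their being distinct characters in the encoding model rather than alternate shapes of one character. Any shaping beyond that is still the job of the renderer and the font.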

Anyone know what he's talking about?


Steven R. Loomis on 23 Aug 2011 9:11 AM:

Unicode doesn't encode presentation forms.  If they are looking at the glyphs in a particular font, they will see the presentation forms.  But I certainly don't see all possible presentation forms in - there are a few pieces of marks that can be combined, but not presentation forms, which would need to include half forms, ligatures, and much more.

So, I'm at a loss here too.

Josh on 24 Aug 2011 8:09 AM:

I was thinking he's possibly looking at a font which encodes alternate glyph forms in PUA (there are a few of those lying around!), mistaking PUA codes for actual assigned Unicodes...maybe?
