by Michael S. Kaplan, published on 2010/07/16 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/07/16/10038421.aspx
One of the interesting things about being in India is all of the conversations I end up having in hotels with other visitors to India (typically during the nearly ubiquitous "complementary breakfast buffet" that was at every hotel I have ever stayed at here).
More than a few times these people knew who I was, or at least knew of my blog. Which is kind of cool.
One of those conversations was based on the points I raised in Why don't all the half forms sort right? and Why my IUC31 talks were presented on Vista (even though running on a MacBook Pro), about the different forms of Indic characters. This screenshot (from the latter blog) summarizes the point best:
There are two interesting points I really never talked about, and they are both related to the same underlying issue.
But first I'll quote from the Unicode FAQ entry about the use of these Unicode control characters, from the Unicode FAQ on Indic Scripts and Languages and its question #17 (I cannot find on Unicode charts the "half forms" of Devanagari letters (or any other Indic script). These characters are needed to form words such as "patni".):
These three characters will be normally displayed using the single glyph tna ligature. But it is also possible that they are displayed using a half ta glyph followed by a full na glyph, or even with a full ta glyph combined with a virama glyph and followed by a full na glyph.
Which form will be actually displayed is the decision of an underlying software module called a "display engine", which bases this decision on the availability of glyphs in the font.
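For reference, the three renderings the FAQ describes are usually requested with three different code point sequences. Here is a minimal Python sketch of them, assuming the standard Devanagari code points for ta, virama, and na plus ZWJ and ZWNJ. The labels and the `describe` helper are just illustrative names, and how each string actually looks still depends on the font and the display engine:

```python
import unicodedata

# Standard Unicode code points (the variable names are illustrative only).
TA, VIRAMA, NA = "\u0924", "\u094D", "\u0928"
ZWJ, ZWNJ = "\u200D", "\u200C"

forms = {
    "conjunct (default)": TA + VIRAMA + NA,
    "half form requested": TA + VIRAMA + ZWJ + NA,
    "explicit virama requested": TA + VIRAMA + ZWNJ + NA,
}

def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for label, text in forms.items():
    print(label)
    for line in describe(text):
        print("   ", line)
```

Note that the joiners only express a request; a font without the relevant glyphs can still show something else.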
Now the two interesting points I never discussed:
Now the usage of the three forms is something that everyone who understands how to read the languages in question understands and knows, even though it really isn't discussed or described for everyone else. The easiest way to summarize the preferences is as follows:
The second point, about why things seem to have changed on Microsoft platforms, is that the second and third forms still work as described but the first form no longer does -- now the second form is shown in both cases instead.
Perhaps the first point explains the second point indirectly; Mangal is clearly not the preferred font to use for रामायण (Ramayana) or the उपनिषद् (Upanishads) or any other classic Sanskrit work. Thus this classical form is perhaps not entirely appropriate for a Windows UI-based font like Mangal. It appears that the glyphs were simply removed from the font?
It is equally possible that the glyphs are still there but are now in an optional OpenType feature related to this alternate, classical usage. Although everyone I talked to could read all three, the middle one is the one people would most commonly use themselves, so it makes sense that it is the one people expect to see and the one that Mangal now seems to prefer, too.
One thing surprises me about this change -- obviously the size of the ligature versus the half form/full form means that document pagination can rather hugely differ as one updates the font choice. Though perhaps the screaming of the form being unexpected was simply larger!
It is that last sentence I quoted from the Unicode FAQ ("Which form will be actually displayed is the decision of an underlying software module called a "display engine", which bases this decision on the availability of glyphs in the font.") that explains the good amount of wiggle room one has in what products may do: the font gets to decide what is preferred here by simply not defining (or perhaps differently defining) the forms one does not want to see.
Though it does make the FAQ entry a little unsatisfying for the non-speaker trying to implement fonts, since it seems like a fairly important piece of trivia to have handy when one is trying to build a font.
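To make that wiggle room concrete, here is a toy Python model of the fallback. It is only a sketch of the idea in the FAQ, not how any real display engine or font is implemented: the glyph names, the `PREFERENCE` table, and `choose_form` are all hypothetical.

```python
# Hypothetical glyph names; real fonts name their glyphs however they like.
# Each entry pairs a form with the glyphs this toy engine needs to draw it.
PREFERENCE = [
    ("conjunct", {"tna_ligature"}),
    ("half form", {"ta_half", "na"}),
    ("explicit virama", {"ta", "virama", "na"}),
]

def choose_form(font_glyphs):
    """Pick the first form whose required glyphs are all in the font."""
    for form, needed in PREFERENCE:
        if needed <= font_glyphs:  # subset test
            return form
    return "missing glyph"

classical_font = {"ta", "na", "virama", "ta_half", "tna_ligature"}
ui_font = {"ta", "na", "virama", "ta_half"}  # ligature simply not defined

print(choose_form(classical_font))  # conjunct
print(choose_form(ui_font))         # half form
```

The same plain ta + virama + na text ends up in a different form in each font, purely because of which glyphs the font chose to define.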
Now Sanskrit's position here is interesting, given that it has its own locale in Windows:
but changing text in Word to mark it with this language does not magically change the form either. But then we know that Word is not yet respecting optional OpenType features so I suppose that is to be expected.
Everyone I talk to about this locale wonders whether there is a locale for Latin too. I mean just what is the meaning for the Sanskrit date format on a computer? :-)
For now the easiest way to do this is to use a font that only has the conjunct form for such classical usage as रामायण and call it a day. Recognition of the fact that modern usage really expects something different from the Unicode FAQ entry just suggests a need to update the FAQ....