by Michael S. Kaplan, published on 2008/04/18 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/04/18/8403631.aspx
Content of Michael Kaplan's personal blog not approved by Microsoft (see disclaimer)!
Regular readers should keep in mind that all I said in The End? still applies; the allusion to the X-Files continues for people who understand such references....
This is not a post discussing some kind of geopolitical issue involving Myanmar (Burma) and the Philippines.
You see, the other day regular reader Andrew West, in a comment to my Who forgot the culture?, asked:
Completely off-topic, but I notice that you embed the sponsoring character (U+1831, MONGOLIAN LETTER SHA) in an html font tag specifying "Mongolian Baiti" as the font face. It drives me crazy that IE7 (like IE6 before it) lists Mongolian as a "language script" that you can configure the font for, but it will not populate the font lists with any fonts regardless of how many fonts you have on the system that support Mongolian (including Mongolian Baiti), so it is impossible to actually configure what Mongolian font to use! The good news is that it does display Mongolian using Mongolian Baiti without explicitly specifying the font in the html, but the bad news is that I can't get it to use a Mongolian font other than Mongolian Baiti without messing with the html. I just wish someone would fix IE ... or is this one of those Kafkaesque examples like Uyghur where Microsoft can't fix something, however broken, in case it breaks user expectations?
I suspected that I knew what was going on here, but it was really worthy of its own blog and I wasn't sure how quickly I'd get to it so I recommended he put a note in the Suggestion Box just in case it wasn't going to be quick....
Which he did:
I know I'm not going to like the answer, but can you explain how the font configuration dialog in Internet Explorer works, in particular the behaviour for Mongolian (font list is never populated) and Myanmar (font list is populated with fonts that cover Tagalog, but none that cover Myanmar)?
Then I had to cancel the blog that was happening for this slot and I ended up deciding to do it right away instead....
Andrew is right that he probably won't like the answer, but it is something that is fixable, and even technically work-aroundable if a font author is willing to do something that he or she would ordinarily consider to be very stupid. :-)
Perhaps I should explain.
We'll start in the Tools|Internet Options... Fonts... dialog:
(I guess I have no Tagalog fonts!)
and for good measure we'll include one that has some fonts in it:
Now in the end the information on actual selections is stored in the registry, under
which is clearly an Internet Explorer settings key with SCRIPT ID values 36 (Myanmar) and 39 (Mongolian):
But for the list of potential fonts, that is not IE at all; that is MLang.
Now I blogged this a bit over two years ago in Where are the IE plain text fonts?, and in that blog I mentioned:
Now the actual population of the two lists is happening via MLang, and as Paul points out you could think of the list on the left as being for proportional fonts and the list on the right as being for fixed pitch (monowidth) fonts.
MLang goes through a two step process that I will get into in another post, coming soon. :-)
And since I never did get back to it, I guess Andrew has proof that things often get lost if they aren't put in one of those lists like the Suggestion Box! I am actually happy to have the proof because otherwise I look kind of petty or something with my request....
Anyway, I'll explain it now -- it all works via a Trust; But Verify! mechanism.
The Trust part is where it trusts the font to describe its own Unicode ranges in its own internal FONTSIGNATURE.fsUsb bits, the Unicode Subset bits. That is step one.
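To make the Trust step a little more concrete, here is a minimal Python sketch of testing one Unicode Subset bit in the four 32-bit values of FONTSIGNATURE.fsUsb. The helper name and the sample signature are made up for illustration, and the bit numbers (74 for Myanmar, 81 for Mongolian, 84 for Tagalog) are assumptions based on how I understand the OpenType OS/2 ulUnicodeRange fields to be laid out. It is not what MLang actually does, just the general idea:

```python
# Hypothetical helper, not MLang's code: test one Unicode Subset bit.
# fs_usb models FONTSIGNATURE.fsUsb, an array of four 32-bit values (128 bits).

def has_usb_bit(fs_usb, bit):
    """Return True if the given Unicode Subset bit (0-127) is set."""
    if not 0 <= bit < 128:
        raise ValueError("Unicode Subset bits run from 0 to 127")
    word, offset = divmod(bit, 32)
    return bool(fs_usb[word] & (1 << offset))

# Assumed bit numbers, following the OpenType OS/2 ulUnicodeRange layout.
USB_MYANMAR = 74
USB_MONGOLIAN = 81
USB_TAGALOG = 84

# A made-up font signature that claims Mongolian coverage and nothing else.
claims_mongolian = [0, 0, 1 << (USB_MONGOLIAN - 64), 0]

print(has_usb_bit(claims_mongolian, USB_MONGOLIAN))  # True
print(has_usb_bit(claims_mongolian, USB_MYANMAR))    # False
```

Of course a set bit is only the font's own claim, which is why the second step exists.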
The Verify part is where it does a spot check on a specific Unicode code point in the script range, to make sure that the FONTSIGNATURE is not lying. Because FONTSIGNATUREs, like men, lie. Like that bit from the movie Up the Creek and its fictional typographical version Up the Foundry between Tim Matheson (as the font) and Jennifer Runyon (as the user):
Font: I will tell you about my coverage.
User: You wouldn't lie to me?
Font: Of course I'd lie to you, I'm a font. But I'm not lying now....
In fact, it really relies on that Verify step and perhaps even skips the Trust step a bit, sometimes?
And it spot checks the font CMAP to make sure a specific candidate character is in it.
I mentioned there was a problem here, didn't I?
Here is where the problem sits.
Deep in the heart of MLang, in its mlflink.cpp source file, it has:
And this is why Mongolian never shows up (since it has no explicit character to check for) and Myanmar shows up when your font has Tagalog (since that is the character it looks for).
Which is the essential workaround for Myanmar -- add that one specific Tagalog character to your Burmese font? Totally obnoxious, but until/unless someone fixes MLang....
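The behavior described above can be modeled with a toy Python sketch. Everything in it is hypothetical (the names, the stand-in "fonts" that are just sets of code points) and it is emphatically not the mlflink.cpp source. It only covers the Verify step, with Myanmar probing U+1700 and Mongolian having no probe at all:

```python
# Toy model of the Verify step described above.
# All names are hypothetical; this is not the MLang source.

SCRIPT_MYANMAR = 36    # script ID values as seen in the IE settings
SCRIPT_MONGOLIAN = 39

# Probe code point per script ID. Myanmar's probe is U+1700, and
# Mongolian has no entry because it has no character to check for.
PROBE = {
    SCRIPT_MYANMAR: 0x1700,  # TAGALOG LETTER A
}

def font_is_listed(script_id, cmap_code_points):
    """Return True if the script's probe code point is in the font's CMAP."""
    probe = PROBE.get(script_id)
    if probe is None:
        return False  # no probe character, so no font can ever qualify
    return probe in cmap_code_points

# Stand-in "fonts": plain sets of code points, not real font files.
burmese_font = set(range(0x1000, 0x10A0))       # Myanmar block only
tagalog_font = set(range(0x1700, 0x1715))       # Tagalog letters only
patched_burmese_font = burmese_font | {0x1700}  # the obnoxious workaround
mongolian_font = set(range(0x1820, 0x1878))     # Mongolian letters only

print(font_is_listed(SCRIPT_MYANMAR, burmese_font))          # False
print(font_is_listed(SCRIPT_MYANMAR, tagalog_font))          # True
print(font_is_listed(SCRIPT_MYANMAR, patched_burmese_font))  # True
print(font_is_listed(SCRIPT_MONGOLIAN, mongolian_font))      # False
```

In this model a perfectly good Burmese font is rejected, a Tagalog font is accepted as "Myanmar", and nothing a Mongolian font contains can get it onto the list.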
Let's put all the values in a table so you can see them:
|Script||Script Id||Code point||Character||Character Name|
|Greek||5||U+03ac||ά||GREEK SMALL LETTER ALPHA WITH TONOS|
|Cyrillic||6||U+0401||Ё||CYRILLIC CAPITAL LETTER IO|
|Armenian||7||U+0531||Ա||ARMENIAN CAPITAL LETTER AYB|
|Hebrew||8||U+05d4||ה||HEBREW LETTER HE|
|Arabic||9||U+0627||ا||ARABIC LETTER ALEF|
|Devanagari||10||U+0905||अ||DEVANAGARI LETTER A|
|Bengali||11||U+0985||অ||BENGALI LETTER A|
|Gurmukhi||12||U+0a05||ਅ||GURMUKHI LETTER A|
|Gujarati||13||U+0a85||અ||GUJARATI LETTER A|
|Oriya||14||U+0b05||ଅ||ORIYA LETTER A|
|Tamil||15||U+0b85||அ||TAMIL LETTER A|
|Telugu||16||U+0c05||అ||TELUGU LETTER A|
|Kannada||17||U+0c85||ಅ||KANNADA LETTER A|
|Malayalam||18||U+0d05||അ||MALAYALAM LETTER A|
|Thai||19||U+0e01||ก||THAI CHARACTER KO KAI|
|Lao||20||U+0e81||ກ||LAO LETTER KO|
|Tibetan||21||U+0f40||ཀ||TIBETAN LETTER KA|
|Georgian||22||U+10d0||ა||GEORGIAN LETTER AN|
|Ethiopic||27||U+1300||ጀ||ETHIOPIC SYLLABLE JA|
|Canadian Syllabics||28||U+1401||ᐁ||CANADIAN SYLLABICS E|
|Cherokee||29||U+13a0||Ꭰ||CHEROKEE LETTER A|
|Yi||30||U+a000||ꀀ||YI SYLLABLE IT|
|Braille||31||U+2800||⠀||BRAILLE PATTERN BLANK|
|Runic||32||U+16a0||ᚠ||RUNIC LETTER FEHU FEOH FE F|
|Ogham||33||U+1680|| ||OGHAM SPACE MARK|
|Sinhala||34||U+0d85||අ||SINHALA LETTER AYANNA|
|Syriac||35||U+0710||ܐ||SYRIAC LETTER ALAPH|
|Myanmar||36||U+1700||ᜀ||TAGALOG LETTER A|
|Khmer||37||U+1780||ក||KHMER LETTER KA|
|Thaana||38||U+0780||ހ||THAANA LETTER HAA|
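A quick way to sanity check a table like this one is to ask the Unicode character database what each code point is actually called. Here is a small Python sketch using the standard unicodedata module on a few of the rows. The helper name and the "character name should start with the script name" rule are my own simplification for illustration, not anything MLang does:

```python
import unicodedata

# A few (script name, probe code point) rows copied from the table above.
ROWS = [
    ("Sinhala", 0x0D85),
    ("Syriac", 0x0710),
    ("Myanmar", 0x1700),
    ("Khmer", 0x1780),
]

def mismatched_rows(rows):
    """Return rows whose probe character name does not start with the script name."""
    bad = []
    for script, code_point in rows:
        name = unicodedata.name(chr(code_point))
        if not name.startswith(script.upper()):
            bad.append((script, f"U+{code_point:04X}", name))
    return bad

for row in mismatched_rows(ROWS):
    print(row)  # ('Myanmar', 'U+1700', 'TAGALOG LETTER A')
```

Only the Myanmar row gets flagged, which is the first problem in a nutshell.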
You can probably see the other problem here -- all of the scripts that are missing; perhaps the fix needs to be a bit more than just the two broken ones, in the long run....
Speaking of which -- any NLS testers stirring about who'd like to enter a bug on this small bundle of MLang issues that will also affect IE8 on the next version of Windows if it isn't fixed? :-)
This blog brought to you by ᜀ (U+1700, aka TAGALOG LETTER A)
# Andrew West on 18 Apr 2008 5:17 AM:
Many thanks for the explanation -- it now all makes sense, though as you say it really would be nice to see everything thoroughly updated to cover all the scripts that have since been encoded (at the very least, please add Phags-pa).
Perhaps you will be able to explain the Myanmar font hack to Martin Hosken if you are at the WG2 meeting next week -- he has a wonderful Myanmar font that supports Unicode 5.1 (http://scripts.sil.org/Padauk), and I would love to be able to set it as my preferred Myanmar font rather than "Tagalog Doctrina 1593" (you can download all the Tagalog fonts you need from http://www.mts.net/~pmorrow/fonts.htm).
No hack for Mongolian though :-(
# Andrew West on 18 Apr 2008 5:39 AM:
And why U+1700 TAGALOG LETTER A ?
If you read <http://std.dkuug.dk/JTC1/SC2/wg2/docs/n1883R2-1998.pdf> page 4 "Encoding Position for Myanmar Script" you'll see that Myanmar was originally located at 1700..177F in the Amd.26 PDAM text, but was moved to its current location at the next stage in the ballot process. Looks like someone at Microsoft jumped the gun and coded U+1700 as the character to check before Myanmar had been finalised -- tut tut.
# Michael S. Kaplan on 18 Apr 2008 5:43 AM:
This might even give you a hint as to who actually did in fact jump the gun (the person's email alias is in a comment so I already knew, though I might have guessed had I known about this doc!).
# Andrew West on 18 Apr 2008 6:12 AM:
It doesn't, but there is only one person I can think of who might be responsible, and their name is mentioned in this post.
# Michael S. Kaplan on 18 Apr 2008 7:41 AM:
No, the responsible party is not mentioned in the post....
# John Cowan on 18 Apr 2008 1:17 PM:
Would a Mongolian font be detected if it contained a glyph for U+0000?
# Michael S. Kaplan on 18 Apr 2008 1:26 PM:
Of course the code might treat -0- as uninitialized. Don't all fonts have a glyph for 0x0000 in the CMAP?
# Andrew West on 18 Apr 2008 4:28 PM:
Yes, good idea. Fonts don't normally have a glyph for U+0000, so as a test I added a glyph mapped to U+0000 to one of my fonts that has partial coverage of Mongolian ... and I'm afraid it still didn't show up in the list, even after rebooting :-(
# Michael S. Kaplan on 18 Apr 2008 4:31 PM:
Well, at least there is the workaround for Myanmar, one that might help foster relationships between the Philippines and Burma! :-)
# Andrew West on 18 Apr 2008 4:40 PM:
Of course the flip side is that I cannot see the sponsoring character of this post in IE, even though I have five Tagalog fonts installed (yet another frowny omitted).
# Okisan on 1 Sep 2008 1:50 AM:
Thanks for the post.
For those who want to read a Burmese translation of this post, please see http://myanmaritpros.com/profiles/blog/show?id=1445004%3ABlogPost%3A44910
But note, we use the de facto standard UTF-8 font (Zawgyi). :)
Myanmar IT Professionals
# Auto on 29 Sep 2009 5:55 PM:
If you install the right font, Firefox 3.5.3 will be able to use it without you doing anything else. By contrast, if you use IE8 you still won't see anything. I must confess that Firefox is ahead of IE8 in this respect.
Please, Michael Kaplan, find a fix for this!!!
I read your book about Internationalization of VB6: great book!!!
2010/03/24 The fonts are there, but IE can't see 'em?