The importance of Tagalog to Burmese, aka "Of course I'd lie to you, I'm a font!"

by Michael S. Kaplan, published on 2008/04/18 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/04/18/8403631.aspx


Content of Michael Kaplan's personal blog not approved by Microsoft (see disclaimer)!
Regular readers should keep in mind that all I said in The End? still applies; the allusion to the X-Files continues for people who understand such references....

This is not a post discussing some kind of geopolitical issue involving Myanmar (Burma) and the Philippines.

You see, the other day regular reader Andrew West asked the following in a comment to my Who forgot the culture? post:

Completely off-topic, but I notice that you embed the sponsoring character (U+1831, MONGOLIAN LETTER SHA) in an html font tag specifying "Mongolian Baiti" as the font face. It drives me crazy that IE7 (like IE6 before it) lists Mongolian as a "language script" that you can configure the font for, but it will not populate the font lists with any fonts regardless of how many fonts you have on the system that support Mongolian (including Mongolian Baiti), so it is impossible to actually configure what Mongolian font to use! The good news is that it does display Mongolian using Mongolian Baiti without explicitly specifying the font in the html, but the bad news is that I can't get it to use a Mongolian font other than Mongolian Baiti without messing with the html. I just wish someone would fix IE ... or is this one of those Kafkaesque examples like Uyghur where Microsoft can't fix something, however broken, in case it breaks user expectations?

I suspected that I knew what was going on here, but it really deserved a blog post of its own, and I wasn't sure how quickly I'd get to it, so I recommended he put a note in the Suggestion Box just in case it wasn't going to be quick....

Which he did:

I know I'm not going to like the answer, but can you explain how the font configuration dialog in Internet Explorer works, in particular the behaviour for Mongolian (font list is never populated) and Myanmar (font list is populated with fonts that cover Tagalog, but none that cover Myanmar)?

Then I had to cancel the blog that was scheduled for this slot, and I ended up deciding to do this one right away instead....

Andrew is right that he probably won't like the answer, but it is something that is fixable, and even technically work-around-able if a font author is willing to do something that he or she would ordinarily consider to be very stupid. :-)

Perhaps I should explain.

We'll start in the Tools|Internet Options... Fonts... dialog:

(I guess I have no Tagalog fonts!)

and for good measure we'll include one that has some fonts in it:

Now in the end the information on actual selections is stored in the registry, under

    HKCU\Software\Microsoft\Internet Explorer\International\Scripts

which is clearly an Internet Explorer settings key with SCRIPT ID values 36 (Myanmar) and 39 (Mongolian):
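To make the storage location concrete, here is a minimal Python sketch that reads those settings back. It assumes each SCRIPT ID is a subkey of the key named above, and it deliberately enumerates whatever values live under that subkey rather than guessing at their names; `script_key_path` and `read_script_settings` are hypothetical helpers, not part of IE or MLang.

```python
import sys

# Registry key named in the post (under HKEY_CURRENT_USER). This sketch
# assumes each SCRIPT ID (36 = Myanmar, 39 = Mongolian) is a subkey of it.
IE_SCRIPTS_KEY = r"Software\Microsoft\Internet Explorer\International\Scripts"


def script_key_path(script_id: int) -> str:
    """Return the HKCU-relative path of the (assumed) subkey for one script ID."""
    return IE_SCRIPTS_KEY + "\\" + str(script_id)


def read_script_settings(script_id: int) -> dict:
    """Return every value stored under the script's subkey.

    Returns {} when not running on Windows or when the subkey does not exist.
    """
    if sys.platform != "win32":
        return {}
    import winreg  # Windows-only standard-library module

    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, script_key_path(script_id)) as key:
            settings = {}
            index = 0
            while True:
                try:
                    name, data, _value_type = winreg.EnumValue(key, index)
                except OSError:  # raised once there are no more values
                    return settings
                settings[name] = data
                index += 1
    except OSError:  # includes FileNotFoundError for a missing subkey
        return {}
```

On a machine where a Myanmar font was picked in the dialog, `read_script_settings(36)` should show whatever IE stored for it. Note that this only reads the user's selection; as the next paragraph explains, the list of candidate fonts comes from somewhere else entirely.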

But for the list of potential fonts, that is not IE at all; that is MLang.

Now I blogged about this a bit over two years ago in Where are the IE plain text fonts?, and in that post I mentioned:

Now the actual population of the two lists is happening via MLang, and as Paul points out you could think of the list on the left as being for proportional fonts and the list on the right as being for fixed pitch (monowidth) fonts.

MLang goes through a two step process that I will get into in another post, coming soon. :-)

And since I never did get back to it, I guess Andrew has proof that things often get lost if they aren't put in one of those lists like the Suggestion Box! I am actually happy to have the proof because otherwise I look kind of petty or something with my request....

Anyway, I'll explain it now -- it all works via a Trust; But Verify! mechanism.

The Trust part is where it trusts the font to describe its own Unicode ranges in its own internal FONTSIGNATURE.fsUsb bits, the Unicode Subset bits. That is step one.

The Verify part is where it does a spot check on a specific Unicode code point in the script range, to make sure that the FONTSIGNATURE is not lying. Because FONTSIGNATUREs, like men, lie. Like that bit from the movie Up the Creek and its fictional typographical version Up the Foundry, between Tim Matheson (as the font) and Jennifer Runyon (as the user):

Font: I will tell you about my coverage.
User: You wouldn't lie to me?
Font: Of course I'd lie to you, I'm a font. But I'm not lying now....

In fact, it really relies on that Verify step and perhaps even skips the Trust step a bit, sometimes?

And it spot checks the font CMAP to make sure a specific candidate character is in it.
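The two steps can be modeled in a few lines of Python. This is a sketch of the idea, not MLang's code: `CandidateFont` and `passes_trust_but_verify` are hypothetical, and the `require_trust` switch models the hunch above that the Trust step is sometimes skipped. The bit numbers are the OpenType OS/2 `ulUnicodeRange` assignments (74 = Myanmar, 84 = Tagalog), which use the same 128-bit layout as `FONTSIGNATURE.fsUsb`.

```python
from dataclasses import dataclass

# OpenType OS/2 ulUnicodeRange bit numbers (same layout as FONTSIGNATURE.fsUsb).
MYANMAR_USB_BIT = 74
TAGALOG_USB_BIT = 84


@dataclass(frozen=True)
class CandidateFont:
    name: str
    usb_bits: frozenset  # bit indexes the font sets in its Unicode subset bits
    cmap: frozenset      # code points the font's cmap maps to glyphs


def passes_trust_but_verify(font: CandidateFont, usb_bit: int,
                            check_char: int, require_trust: bool = True) -> bool:
    """Return True if the font would be offered for the script.

    Trust:  the font claims the script via its Unicode subset bit.
    Verify: the font's cmap really contains the script's check character.
    With require_trust=False only the cmap spot check decides.
    """
    if require_trust and usb_bit not in font.usb_bits:
        return False
    return check_char in font.cmap
```

With hypothetical fonts, a real Burmese font (Myanmar bit set, cmap covering U+1000..U+109F) fails whenever the check character is U+1700, while a Tagalog font only passes for Myanmar once the Trust step is skipped. That second case is exactly what Andrew is seeing, which supports the guess that the Verify step carries most of the weight.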

I mentioned there was a problem here, didn't I?

Here is where the problem sits.

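Deep in the heart of MLang, in its mlflink.cpp source file, it has a hard-coded list that pairs each script ID with a single code point to look for in the font's CMAP.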

And this is why Mongolian never shows up (since it has no explicit character to check for) and Myanmar shows up when your font has Tagalog (since that is the character it looks for).

Which is the essential workaround for Myanmar -- add that one specific Tagalog character to your Burmese font? Totally obnoxious, but until/unless someone fixes MLang....
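If you are a font author weighing that workaround, a sketch along these lines reads both halves of the check out of a font file: the Unicode subset bits the font claims, and whether its cmap maps U+1700. It assumes the third-party fontTools package (imported only inside `inspect_font`); `usb_bits` and `inspect_font` are hypothetical names, not part of any Microsoft API.

```python
# Hypothetical helpers for a font author weighing the Myanmar workaround.
# inspect_font() needs the third-party fontTools package; usb_bits() does not.

TAGALOG_LETTER_A = 0x1700  # the character MLang spot-checks for Myanmar (script ID 36)


def usb_bits(range1: int, range2: int, range3: int, range4: int) -> set:
    """Expand the four 32-bit OS/2 ulUnicodeRange words into the set of bit
    indexes (0..127) they contain; FONTSIGNATURE.fsUsb uses the same layout."""
    bits = set()
    for word_index, word in enumerate((range1, range2, range3, range4)):
        for bit in range(32):
            if word & (1 << bit):
                bits.add(word_index * 32 + bit)
    return bits


def inspect_font(path: str) -> dict:
    """Report what the font claims (Trust) and whether its cmap maps U+1700 (Verify)."""
    from fontTools.ttLib import TTFont  # lazy import: third-party dependency

    font = TTFont(path)
    os2 = font["OS/2"]
    claimed = usb_bits(os2.ulUnicodeRange1, os2.ulUnicodeRange2,
                       os2.ulUnicodeRange3, os2.ulUnicodeRange4)
    cmap = font.getBestCmap() or {}
    return {
        "unicode_range_bits": sorted(claimed),
        "maps_tagalog_letter_a": TAGALOG_LETTER_A in cmap,
    }
```

Calling `inspect_font("MyBurmeseFont.ttf")` (a hypothetical file name) on a Burmese font that reports `False` for `maps_tagalog_letter_a` identifies the case that ends up missing from the Myanmar list.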

Let's put all the values in a table so you can see them:

| Script | Script ID | Code point | Character | Character name |
|---|---|---|---|---|
| Greek | 5 | U+03AC | ά | GREEK SMALL LETTER ALPHA WITH TONOS |
| Cyrillic | 6 | U+0401 | Ё | CYRILLIC CAPITAL LETTER IO |
| Armenian | 7 | U+0531 | Ա | ARMENIAN CAPITAL LETTER AYB |
| Hebrew | 8 | U+05D4 | ה | HEBREW LETTER HE |
| Arabic | 9 | U+0627 | ا | ARABIC LETTER ALEF |
| Devanagari | 10 | U+0905 | अ | DEVANAGARI LETTER A |
| Bengali | 11 | U+0985 | অ | BENGALI LETTER A |
| Gurmukhi | 12 | U+0A05 | ਅ | GURMUKHI LETTER A |
| Gujarati | 13 | U+0A85 | અ | GUJARATI LETTER A |
| Oriya | 14 | U+0B05 | ଅ | ORIYA LETTER A |
| Tamil | 15 | U+0B85 | அ | TAMIL LETTER A |
| Telugu | 16 | U+0C05 | అ | TELUGU LETTER A |
| Kannada | 17 | U+0C85 | ಅ | KANNADA LETTER A |
| Malayalam | 18 | U+0D05 | അ | MALAYALAM LETTER A |
| Thai | 19 | U+0E01 | ก | THAI CHARACTER KO KAI |
| Lao | 20 | U+0E81 | ກ | LAO LETTER KO |
| Tibetan | 21 | U+0F40 | ཀ | TIBETAN LETTER KA |
| Georgian | 22 | U+10D0 | ა | GEORGIAN LETTER AN |
| Ethiopic | 27 | U+1300 | ጀ | ETHIOPIC SYLLABLE JA |
| Canadian Syllabics | 28 | U+1401 | ᐁ | CANADIAN SYLLABICS E |
| Cherokee | 29 | U+13A0 | Ꭰ | CHEROKEE LETTER A |
| Yi | 30 | U+A000 | ꀀ | YI SYLLABLE IT |
| Braille | 31 | U+2800 | (blank) | BRAILLE PATTERN BLANK |
| Runic | 32 | U+16A0 | ᚠ | RUNIC LETTER FEHU FEOH FE F |
| Ogham | 33 | U+1680 | (space mark) | OGHAM SPACE MARK |
| Sinhala | 34 | U+0D85 | අ | SINHALA LETTER AYANNA |
| Syriac | 35 | U+0710 | ܐ | SYRIAC LETTER ALAPH |
| Myanmar | 36 | U+1700 | ᜀ | TAGALOG LETTER A |
| Khmer | 37 | U+1780 | ក | KHMER LETTER KA |
| Thaana | 38 | U+0780 | ހ | THAANA LETTER HAA |
| Mongolian | 39 | -0- | | |
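To see both failures at once, here is the table transcribed into a small Python sketch. It models only the CMAP spot check described above. Treating the Mongolian `0` entry as "nothing to check" is an assumption, though it matches Andrew's experiment in the comments, where adding a glyph at U+0000 did not make his font show up. `CHECK_CHARS`, `is_listed`, and `check_char_name` are hypothetical names.

```python
import unicodedata

# Script ID -> code point spot-checked in the font's cmap, transcribed from
# the table above. 0 mirrors the "-0-" entry for Mongolian.
CHECK_CHARS = {
    5: 0x03AC, 6: 0x0401, 7: 0x0531, 8: 0x05D4, 9: 0x0627,
    10: 0x0905, 11: 0x0985, 12: 0x0A05, 13: 0x0A85, 14: 0x0B05,
    15: 0x0B85, 16: 0x0C05, 17: 0x0C85, 18: 0x0D05, 19: 0x0E01,
    20: 0x0E81, 21: 0x0F40, 22: 0x10D0, 27: 0x1300, 28: 0x1401,
    29: 0x13A0, 30: 0xA000, 31: 0x2800, 32: 0x16A0, 33: 0x1680,
    34: 0x0D85, 35: 0x0710, 36: 0x1700, 37: 0x1780, 38: 0x0780,
    39: 0,
}

# Where Myanmar was actually encoded in Unicode.
MYANMAR_BLOCK = range(0x1000, 0x10A0)


def is_listed(script_id: int, cmap: set) -> bool:
    """Would a font with this cmap appear in the font list for script_id?

    Assumption: a 0 entry means "nothing to check", so no font qualifies.
    Raises KeyError for script IDs the table does not cover.
    """
    check = CHECK_CHARS[script_id]
    return check != 0 and check in cmap


def check_char_name(script_id: int):
    """Unicode name of the character probed for script_id, or None for a 0 entry."""
    check = CHECK_CHARS[script_id]
    return unicodedata.name(chr(check)) if check else None
```

`check_char_name(36)` returns `'TAGALOG LETTER A'`, and that code point sits outside `MYANMAR_BLOCK` altogether, so a font covering all of Myanmar fails while any font that maps U+1700 passes; `check_char_name(39)` returns `None`, so no Mongolian font can ever pass.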

You can probably see the other problem here -- all of the scripts that are missing; perhaps the fix needs to be a bit more than just the two broken ones, in the long run....

Speaking of which -- any NLS testers stirring about who'd like to enter a bug on this small bundle of MLang issues that will also affect IE8 on the next version of Windows if it isn't fixed? :-)

 

This blog brought to you by ᜀ (U+1700, aka TAGALOG LETTER A)


# Andrew West on 18 Apr 2008 5:17 AM:

Many thanks for the explanation -- it now all makes sense, though as you say it really would be nice to see everything thoroughly updated to cover all the scripts that have since been encoded (at the very least, please add Phags-pa).

Perhaps you will be able to explain the Myanmar font hack to Martin Hosken if you are at the WG2 meeting next week -- he has a wonderful Myanmar font that supports Unicode 5.1 (http://scripts.sil.org/Padauk), and I would love to be able to set it as my preferred Myanmar font rather than "Tagalog Doctrina 1593" (you can download all the Tagalog fonts you need from http://www.mts.net/~pmorrow/fonts.htm).

No hack for Mongolian though :-(

# Andrew West on 18 Apr 2008 5:39 AM:

And why U+1700 TAGALOG LETTER A ?

If you read <http://std.dkuug.dk/JTC1/SC2/wg2/docs/n1883R2-1998.pdf> page 4 "Encoding Position for Myanmar Script" you'll see that Myanmar was originally located at 1700..177F in the Amd.26 PDAM text, but was moved to its current location at the next stage in the ballot process. Looks like someone at Microsoft jumped the gun and coded U+1700 as the character to check before Myanmar had been finalised -- tut tut.

# Michael S. Kaplan on 18 Apr 2008 5:43 AM:

This might even give you a hint as to who actually did in fact jump the gun (the person's email alias is in a comment so I already knew, though I might have guessed had I known about this doc!).

# Andrew West on 18 Apr 2008 6:12 AM:

It doesn't, but there is only one person I can think of who might be responsible, and their name is mentioned in this post.

# Michael S. Kaplan on 18 Apr 2008 7:41 AM:

:-)

No, the responsible party is not mentioned in the post....

# John Cowan on 18 Apr 2008 1:17 PM:

Would a Mongolian font be detected if it contained a glyph for U+0000?

# Michael S. Kaplan on 18 Apr 2008 1:26 PM:

Good question!

Of course the code might treat -0- as uninitialized. Don't all fonts have a glyph for 0x0000 in the CMAP?

# Andrew West on 18 Apr 2008 4:28 PM:

Yes, good idea. Fonts don't normally have a glyph for U+0000, so as a test I added a glyph mapped to U+0000 to one of my fonts that has partial coverage of Mongolian ... and I'm afraid it still didn't show up in the list, even after rebooting :-(

# Michael S. Kaplan on 18 Apr 2008 4:31 PM:

Well, at least there is the workaround for Myanmar, one that might help foster relationships between the Philippines and Burma! :-)

# Andrew West on 18 Apr 2008 4:40 PM:

Of course the flip side is that I cannot see the sponsoring character of this post in IE, even though I have five Tagalog fonts installed (yet another frowny omitted).

# Okisan on 1 Sep 2008 1:50 AM:

Thanks for the post.

For those who want to read this post in Burmese translation, please see http://myanmaritpros.com/profiles/blog/show?id=1445004%3ABlogPost%3A44910

But note, we use the de facto standard UTF-8 font (Zawgyi). :)

Best,

Okisan

Myanmar IT Professionals

# Auto on 29 Sep 2009 5:55 PM:

If you install the right font, Firefox 3.5.3 will be able to use it without doing anything else. By contrast, if you use IE8 you won't see anything yet. I must confess that Firefox is ahead of IE8 in this respect.

Please Michael Kaplan, find a fix for this !!!

I read your book about Internationalization of VB6: great book !!!


referenced by

2010/03/24 The fonts are there, but IE can't see 'em?

2010/01/20 It isn't always Internet Explorer's fault, dammit!

2008/12/22 Internet Explorer and Windows might not be optimized for this Blog
