The evolving Story of Locale Support, part 18: Two scripts that share ten digits can be trouble

by Michael S. Kaplan, published on 2012/02/15 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/02/15/10268235.aspx


Previous blogs from this series:

Today, I'll highlight one of the ways in which some of the work to extend locales shows our reach exceeding our grasp.

Now I have talked about digit substitution so much in the past that most regular readers are probably tired of hearing about it....

Frankly, I don't blame them!

So I'm not gonna be all about digit substitution this time.

Since that feature is locale-based, and the languages I am going to talk about here have no locales, it isn't relevant anyway....

But beyond that, Tai Le (languages using it include Dehong Dai) has an additional problem -- it uses the Myanmar digits!

We worked around the question of how to make sure the digits show up in some cases by adding the Myanmar digits (U+1040 to U+1049) to both the Microsoft Tai Le font that shipped previously and the new Myanmar Text font:

Character Map, showing MYANMAR DIGIT NINE in the Microsoft Tai Le font; note the nine characters preceding it.     Character Map, showing MYANMAR DIGIT NINE in the Myanmar Text font; note the nine characters preceding it.

You'll notice that they look slightly different, reflecting two entirely different traditions, as mentioned in the old proposal N2372:

 

Two slightly different traditions for Myanmar digits, displayed graphically

In addition to these differences, there are also slight size differences between glyphs in the two fonts.

Which were designed by two different typographers.

In two different styles.

To support two different scripts.

Plus it appears they didn't even completely capture the distinctions mentioned above -- I need to find out if that's a bug or not (perhaps the alternate glyphs are in the font but only available using advanced OpenType features, which as I mentioned previously few technologies support).

Just ten little digits:

၀၁၂၃၄၅၆၇၈၉

၀၁၂၃၄၅၆၇၈၉
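For what it's worth, those two lines are the very same ten code points; any difference you see comes from whichever font ends up rendering them. A minimal Python sketch, using only the standard `unicodedata` module (the variable names are mine and purely illustrative), lists the shared digits and checks that the Tai Le block (U+1950 to U+197F) has no decimal digits of its own:

```python
import unicodedata

# The ten Myanmar digits shared by both scripts (U+1040 to U+1049).
myanmar_digits = [chr(cp) for cp in range(0x1040, 0x104A)]

for ch in myanmar_digits:
    # Print the code point, its Unicode name, and its numeric digit value.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} = {unicodedata.digit(ch)}")

# Scan the Tai Le block (U+1950 to U+197F) for decimal digits (category Nd).
tai_le_digits = [
    chr(cp)
    for cp in range(0x1950, 0x1980)
    if unicodedata.category(chr(cp)) == "Nd"
]
print("Decimal digits in the Tai Le block:", tai_le_digits)
```

The second list coming back empty is the whole story in miniature: plain text has no way to say "Tai Le style digit" here, so the choice falls to the font.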

And of course Myanmar Text also has the Myanmar Shan Digits in it, which are not in the Microsoft Tai Le font update:

Character Map, showing MYANMAR SHAN DIGIT NINE in the Myanmar Text font; note the nine characters preceding it.

Kind of funny how Unicode decided to capture those differences but not the others.

႐႑႒႓႔႕႖႗႘႙

Not funny "ha ha", if you know what I mean.

I guess the Tai Le differences weren't different enough....
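To make the contrast concrete: the Shan digits have their own code points (U+1090 to U+1099), so plain text can tell them apart from the standard Myanmar digits, even though both sets carry the same numeric values. A small Python sketch with the standard `unicodedata` module (the names `standard` and `shan` are just illustrative):

```python
import unicodedata

# Standard Myanmar digits and Myanmar Shan digits, as strings of ten characters.
standard = "".join(chr(cp) for cp in range(0x1040, 0x104A))
shan = "".join(chr(cp) for cp in range(0x1090, 0x109A))

# Different code points...
print("Same characters?", standard == shan)

# ...but the same digit values, zero through nine.
values_standard = [unicodedata.digit(ch) for ch in standard]
values_shan = [unicodedata.digit(ch) for ch in shan]
print("Same values?", values_standard == values_shan)

# The character names make the distinction explicit.
print(unicodedata.name(standard[9]))
print(unicodedata.name(shan[9]))

# Python's int() accepts either set, since both are Unicode decimal digits.
print(int(standard[1] + standard[2]), int(shan[1] + shan[2]))
```

No such test exists for the Tai Le style of the standard digits; there is nothing in the text itself to compare.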

But leaving the Shan digits aside, let's consider the two ways of looking at the standard Myanmar digits (U+1040 to U+1049) in these two fonts.

Uniscribe and the like have several options for how to display these digits:

When the addition of Tai Le was being discussed in Unicode (pre-Unicode 4.0), this very issue and the different forms both came up, and they almost led to two sets of digits being encoded, one in each block. But the first problem was theoretical at the time (Microsoft wouldn't add support until years later, in Windows 7), and the second was widely treated as a minor typographic issue, so one set ended up being used by both scripts.

And to our current troubles....

Now I don't want to imply that either Tai Le or Myanmar are not good sharers or that they both need a time out.

Well, not exactly.... :-)

There are many reasonable language experts and font developers who will consider the last two bullet points above to be genuine bugs that break support that used to work in prior versions.

As bad as the third point is, imagine incompletely re-applying the font in the fourth case -- a problem that many people might recognize from longstanding problems with Japanese text partially rendered with Chinese fonts!

Those people aren't wrong; there are just disadvantages to working on the edges of languages, of locales, that we start to support....

And of course in the case of issues in Word and RichEdit, there are disadvantages to not carefully dealing with the well-intended (though in my opinion somewhat flawed) designs of some programs and controls.

It is in its own way somewhat ironic that the default behavior doesn't point in the direction of the script and language used for Chinese minority language support. Oops!

But we can just keep that our little secret, right? :-)


Simon Buchan on 15 Feb 2012 5:01 PM:

This seems fairly in-line with Unicode precedent though, notably Han unification. It always seems to cause controversy though! Perhaps Unicode should define a set of language/glyph control codepoints?


referenced by

2012/10/26 The evolving Story of Locale Support, part 28: We finally fixed that 'Install New Languages' thing!

2012/10/02 The evolving Story of Locale Support, part 27: No, the T and the H aren't silent...

2012/08/20 The evolving Story of Locale Support, part 26: Hey Windows 8, there's someone on the phone for you.

2012/07/11 The evolving Story of Locale Support, part 25: Something old, something new, something repurposed, and something...

2012/06/07 The evolving Story of Locale Support, part 24: I Adar you! Hell, I Double Adar you! (Windows 8 ed.)

2012/06/05 The evolving Story of Locale Support, part 23: Tamazight? Outta sight!

2012/04/12 The evolving Story of Locale Support, part 22: Digit Substitution 2.0

2012/03/08 The evolving Story of Locale Support, part 21: The Windows 8 Hijripalooza extraordinaire!

2012/03/02 The evolving Story of Locale Support, part 20: Yes, it's Bangla. Not Bengali!

2012/02/21 The evolving Story of Locale Support, part 19: In honor of International Mother Language Day...
