The two most important components of letting go and moving on are (1) letting go and (2) moving on

by Michael S. Kaplan, published on 2009/11/07 14:31 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2009/11/07/9919141.aspx


It was Friday afternoon when Santhosh (Santhosh Pillai, aka THE Santhosh, the guy who helped us with the collation story for Malayalam way back when) was asking a question. The question was:

Hi:

Is there an updated version of this page http://www.microsoft.com/globaldev/keyboards/kbdinmal.htm available now that Malayalam has Atomic Chillus in Unicode 5.1?

Thanks
Santhosh

Interesting.... though of course the real underlying question should be more about the keyboard layout(s) in Windows -- the website is just a description of the set of layouts that are installed in the operating system.

Updating the web site (the actual website, I mean - not the old one we're talking about here!) is a separate matter, but that can't ever happen until/unless the thing the site is meant to describe is updated.

Now back in the old days, there was a time that I was one of the people Microsoft sent to Unicode Technical Committee meetings, one of the people who came back from those meetings working on how and when to make updates to Windows, sometimes the person who made the actual updates to the keyboard layouts, and always the person who checked in the final layouts to the product.

In those days, answering this question would have simply been an act of recollection -- remembering the salient details of

but now things are different.

The Malayalam Chillu debate was going on strong while I was still involved with Unicode, though no final decisions had been made. And whether or not a need existed to include atomic characters for these entities was a fairly central question that would have to be solved before anyone discussed what product changes would be needed and when.

But other people were minding those stores; I was doing other things.

So to answer Santhosh's question, I did it the old-fashioned way - I looked at the product to see what was there.

First, I started in Character Map. I knew the fonts would be updated (Peter Constable was the one who explained to me how the Chillus worked way back when this all first started in Unicode years ago), so I wanted to look and see if there were any characters that were in the font but not in the Charmap list of names, like this one:

Indeed, there are 17 of them if you include the atomic Chillus, the ones added for Sanskrit, the symbols, the signs, and so on:

Then I handed each one to LCMapString one at a time. None of them had weight in Vista but all of them have an assigned weight in Windows 7 (some as numbers, some as symbols, some as letters -- kind of what you might expect by looking at the list).

Okay, good so far -- just no updated character list in Character Map. Unfortunate, but hardly tragic, as the sadness over not seeing the name in the lower left hand corner of that dialog is quickly mitigated by the character's presence in the font itself! :-)
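For anyone who wants the missing names without Character Map, here is a minimal sketch in Python. It is not what I used above; it assumes the standard `unicodedata` module (any Python 3 ships a character database newer than Unicode 5.1), and it only covers the six atomic Chillus rather than all 17 characters. The names `ATOMIC_CHILLUS` and `chillu_names` are mine, purely for illustration.

```python
import unicodedata

# The six atomic Chillu code points added in Unicode 5.1 (U+0D7A..U+0D7F).
ATOMIC_CHILLUS = range(0x0D7A, 0x0D80)


def chillu_names():
    """Return (code point, character name) pairs for the atomic Chillus."""
    return [
        (cp, unicodedata.name(chr(cp), "<no name in this character database>"))
        for cp in ATOMIC_CHILLUS
    ]


if __name__ == "__main__":
    for cp, name in chillu_names():
        print(f"U+{cp:04X} {name}")
```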

The keyboard story was less fortunate.

I loaded up the one and only keyboard layout in MSKLC:

took a quick look at the keyboard, and then saved it out as a KLC file so I could look at the code points in case I missed anything.
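The "look at the code points" step can be roughly sketched in Python. This is not the check I actually ran, and it rests on assumptions: that code points show up in the saved KLC text as four-hex-digit tokens (optionally followed by `@` for dead keys), that `//` starts a comment, and that the file is read with a UTF-16 decoder. The function name and the sample rows are made up for illustration and are not taken from the shipped layout.

```python
import re

# The six atomic Chillu code points (U+0D7A..U+0D7F).
CHILLU_RANGE = range(0x0D7A, 0x0D80)

# Assumption: a code point cell is exactly four hex digits, optionally
# followed by '@' to mark a dead key.
HEX4 = re.compile(r"^[0-9a-fA-F]{4}@?$")


def chillus_in_klc_text(text):
    """Return the set of atomic Chillu code points mentioned in KLC text."""
    found = set()
    for line in text.splitlines():
        line = line.split("//", 1)[0]  # drop trailing comments
        for token in line.split():
            if HEX4.match(token):
                cp = int(token.rstrip("@"), 16)
                if cp in CHILLU_RANGE:
                    found.add(cp)
    return found


if __name__ == "__main__":
    # Hypothetical rows, only to show the shape of the check.
    sample = "10\tQ\t0\t0d4c\t0d14\t-1\t// made-up row\n"
    print(sorted(hex(cp) for cp in chillus_in_klc_text(sample)))
    # For a real file, something like:
    #   text = open("layout.klc", encoding="utf-16").read()
```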

They aren't there.

Oops.

My first reaction was that somebody must have messed up, been asleep at the switch, etc.

But then I realized that was how everyone felt whenever they came to me because of something they perceived as an omission or bug. Knowing more of the underlying infrastructure does not make me any more psychic than the people who used to come to me -- I cannot read the minds or intents of the owners.

Maybe the update was not so easy to do. The Character Map thing is an obvious omission, but that is just a small bug to open on someone to get it updated.

The keyboard layout is the complicated one, of course. The layout is based on the INSCRIPT standards coming out of India, and although adding the letters would not have been unreasonable, there are two sides to that story and there may well have been reasons not to add them, too.

Collation beyond the "some weight" question is an interesting one; ideally it would be handled with equivalences the way we did Romanian with the comma below/cedilla below.

Grabbing the table from the Unicode 5.1 update:

Table 1. Atomic Encoding of Chillus

     Visual          Representation in 5.0 and Prior        Preferred 5.1 Representation
1    CHILLU_NN.png   NNA, VIRAMA, ZWJ (0D23, 0D4D, 200D)    0D7A MALAYALAM LETTER CHILLU NN
2    CHILLU_N.png    NA, VIRAMA, ZWJ (0D28, 0D4D, 200D)     0D7B MALAYALAM LETTER CHILLU N
3    CHILLU_RR.png   RA, VIRAMA, ZWJ (0D30, 0D4D, 200D)     0D7C MALAYALAM LETTER CHILLU RR
4    CHILLU_L.png    LA, VIRAMA, ZWJ (0D32, 0D4D, 200D)     0D7D MALAYALAM LETTER CHILLU L
5    CHILLU_LL.png   LLA, VIRAMA, ZWJ (0D33, 0D4D, 200D)    0D7E MALAYALAM LETTER CHILLU LL
6    k               undefined                              0D7F MALAYALAM LETTER CHILLU K
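The first five rows of that table amount to a simple substitution. Here is a minimal sketch in Python that rewrites the older sequences to the atomic code points. The dictionary and function names are mine, for illustration only, and this is not how Windows handles (or fails to handle) the equivalence. Row 6 is left out because the table gives no prior representation for CHILLU K.

```python
VIRAMA = "\u0d4d"
ZWJ = "\u200d"

# Rows 1-5 of Table 1: base letter + VIRAMA + ZWJ -> atomic Chillu.
LEGACY_TO_ATOMIC = {
    "\u0d23" + VIRAMA + ZWJ: "\u0d7a",  # CHILLU NN
    "\u0d28" + VIRAMA + ZWJ: "\u0d7b",  # CHILLU N
    "\u0d30" + VIRAMA + ZWJ: "\u0d7c",  # CHILLU RR
    "\u0d32" + VIRAMA + ZWJ: "\u0d7d",  # CHILLU L
    "\u0d33" + VIRAMA + ZWJ: "\u0d7e",  # CHILLU LL
}


def to_atomic_chillus(text):
    """Replace each 5.0-style Chillu sequence with its 5.1 atomic code point."""
    for legacy, atomic in LEGACY_TO_ATOMIC.items():
        text = text.replace(legacy, atomic)
    return text


if __name__ == "__main__":
    old_style = "\u0d28\u0d4d\u200d"
    print([f"U+{ord(ch):04X}" for ch in to_atomic_chillus(old_style)])
```

A letter followed by VIRAMA without the ZWJ is deliberately left alone, since only the full three-code-point sequence appears in the table.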

Ok, looking at the weights of the first five entries in that table:

old way                 weight                  new way   weight
U+0d23 U+0d4d U+200d    3a 77 01 01 01 01 00    U+0d7a    3a 72 01 01 01 01 00
U+0d28 U+0d4d U+200d    3a 8b 01 01 01 01 00    U+0d7b    3a 86 01 01 01 01 00
U+0d30 U+0d4d U+200d    3a a7 01 01 01 01 00    U+0d7c    3a a6 01 01 01 01 00
U+0d32 U+0d4d U+200d    3a af 01 01 01 01 00    U+0d7d    3a aa 01 01 01 01 00
U+0d33 U+0d4d U+200d    3a b3 01 01 01 01 00    U+0d7e    3a ae 01 01 01 01 00

They don't match. I'd have to see what else is in the Malayalam table to know if it is only the equivalence that wasn't done (there might be actual ordering issues also) but I can't tell for sure (I have my hands full trying to learn Tamil and Bengali!). Offhand the weights never look too far from each other, so perhaps it was just a conscious decision to not support the equivalence....
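To put a number on "not too far from each other", here is a small Python sketch that takes the sort key bytes exactly as listed in the weights table and compares the part before the first 01 byte. Treating those leading bytes as the weight that matters is my simplifying assumption for this comparison, and the names used are hypothetical.

```python
# (old-sequence sort key, atomic sort key) copied from the weights table.
SORT_KEYS = {
    "U+0D7A": ("3a 77 01 01 01 01 00", "3a 72 01 01 01 01 00"),
    "U+0D7B": ("3a 8b 01 01 01 01 00", "3a 86 01 01 01 01 00"),
    "U+0D7C": ("3a a7 01 01 01 01 00", "3a a6 01 01 01 01 00"),
    "U+0D7D": ("3a af 01 01 01 01 00", "3a aa 01 01 01 01 00"),
    "U+0D7E": ("3a b3 01 01 01 01 00", "3a ae 01 01 01 01 00"),
}


def leading_weight(sort_key_hex):
    """Return the bytes before the first 01 byte as a big-endian integer."""
    key = bytes.fromhex(sort_key_hex)
    return int.from_bytes(key[: key.index(0x01)], "big")


def weight_gaps():
    """Map each atomic Chillu to (old weight - new weight)."""
    return {
        cp: leading_weight(old) - leading_weight(new)
        for cp, (old, new) in SORT_KEYS.items()
    }


if __name__ == "__main__":
    for cp, gap in weight_gaps().items():
        print(cp, gap)
```

On the data above the gaps come out as 5, 5, 1, 5 and 5, so none of the pairs is equal but all of them are close.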

I honestly don't know the answer to any of the questions I posed above, but I can probably ask a question or two of some people next week (post re-org I'm not 100% sure who owns all this stuff now, so it could take me some time to track down who to ask!).

But either way there are at a minimum a few bugs that I found in all this; I'll talk to some testers I know down the hall about those ones even sooner.

As I said in the title, the two most important components of letting go and moving on are (1) letting go and (2) moving on. But I'm likely to get curious now and again about how things are going....


# PavanajaU B on 24 Nov 2009 4:05 AM:

Which version of Unicode is used/supported by Windows 7?

Rgds,

Pavanaja

