The T's are crossed, but not all of the I's are dotted...

by Michael S. Kaplan, published on 2007/03/06 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/03/06/1812744.aspx


Localizer Michel from down the hall and one floor down had been asked about a potential bug, so he sent me some screenshots and asked:

Hi Michael,

Not sure if this is a bug or by design (legacy keyboard).

Is there an alias to report and investigate this bug?

This is obviously not linked to Turkish Vista SW localization as I first thought it was.

Thanks,
Michel

You can get to the same place as the screenshot would take you if you have Vista. Just open up the Text Services and Input Languages dialog (ignore the Russian keyboard under Ukrainian and the Ukrainian keyboard under Russian, that is for a different bug!):

It is a little-known but undeniably cool fact that either double-clicking on a keyboard name or clicking the Properties... button while one is selected will launch a preview window, like the two shown below for the Turkish Q and Turkish F keyboards:

You may notice the problem now, or maybe not if you don't know the Turkish keyboards all that well. I'll give you the MSKLC versions in the base, shift, and CAPS LOCK states to see if it helps.

First the Turkish Q keyboard:

  

 

Then the Turkish F keyboard:

Did you see it? There is something really wrong going on with the I (dotless and/or otherwise!). Just on the previews, though....

The problem has a lot to do with the way the preview dialog works. Instead of getting the characters from the actual data in the keyboard layout like MSKLC does, it uses the same info that feeds the GetKeyNameText function, which itself depends on the data that feeds the MapVirtualKeyEx function. In fact, GetKeyNameText itself calls MapVirtualKeyEx twice.

The resulting string is then uppercased, though unfortunately the uppercasing is done without using the LCID attached to the KLID and the LCMAP_UPPERCASE | LCMAP_LINGUISTIC_CASING flags.

This means that Turkic casing is not done.

In the generic casing table, U+0131 (LATIN SMALL LETTER DOTLESS I) uppercases to itself, and U+0069 (LATIN SMALL LETTER I) uppercases to U+0049 (LATIN CAPITAL LETTER I)....

So we are left with previews of mostly capitalized keyboards that contain the wrong cased variants of the I: a dotless I left in lowercase, and a dotted I capitalized without its dot....
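To make the mismatch concrete, here is a minimal Python sketch of the two behaviors. It is only a toy model: the function names `generic_upper` and `turkic_upper` are hypothetical, the generic mapping is written to match the description above (it is not the actual Windows casing data), and nothing here calls GetKeyNameText or LCMapString.

```python
# Toy model of the two casing behaviors described in the post.
# These helpers are illustrative assumptions, not the Windows tables.

DOTLESS_I = "\u0131"        # LATIN SMALL LETTER DOTLESS I
CAPITAL_DOTTED_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def generic_upper(text):
    """Uppercase the way the post describes the generic table."""
    out = []
    for ch in text:
        if ch == DOTLESS_I:
            out.append(ch)          # dotless i uppercases to itself
        else:
            out.append(ch.upper())  # i -> I, and so on
    return "".join(out)


def turkic_upper(text):
    """Uppercase with the Turkic pairing of dotted and dotless I."""
    out = []
    for ch in text:
        if ch == "i":
            out.append(CAPITAL_DOTTED_I)  # i -> capital I with dot
        elif ch == DOTLESS_I:
            out.append("I")               # dotless i -> plain capital I
        else:
            out.append(ch.upper())
    return "".join(out)
```

Feeding the two I keys of a Turkish layout through `generic_upper` gives a plain capital I for the dotted key and a lowercase dotless i for the other, which is the kind of mix the previews show. `turkic_upper` gives the pair a Turkish user would expect.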

Okay, now picking this apart and complaining about it is easy. But deciding on the easiest way to address it is a bit harder.

As you can see, the code can't just pick any one shift state from the actual keyboard data, since the results it wants do not exactly match any of them. Even the CAPS LOCK state won't always work -- try out the Hebrew keyboard layout with the CAPS LOCK in MSKLC if you don't believe me:



So perhaps just making sure to do linguistic casing with the appropriate LCID is the easiest way to go and always return the results one would expect. And until then having a bit of a problem with the T's being crossed but the I's not being dotted is a small price to pay for doing so well in every other case.
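As a rough illustration of that idea, the Python sketch below picks the casing rule from the language part of the KLID. Everything in it is an assumption made for illustration: the helper names are hypothetical, the set of Turkic languages is limited to the Turkish (0x1F) and Azeri (0x2C) primary language IDs, and the real fix would be done inside USER with the LCID and the LCMAP_LINGUISTIC_CASING flag rather than with hand-written tables.

```python
# Sketch: choose linguistic casing from the KLID (hypothetical helpers).

DOTLESS_I = "\u0131"
CAPITAL_DOTTED_I = "\u0130"

# Primary language IDs assumed to need Turkic casing: Turkish, Azeri.
TURKIC_PRIMARY_LANGS = {0x1F, 0x2C}


def langid_from_klid(klid):
    """The low word of the 8-hex-digit KLID string is the LANGID."""
    return int(klid, 16) & 0xFFFF


def uses_turkic_casing(klid):
    """True if the primary language (low 10 bits) is Turkish or Azeri."""
    return (langid_from_klid(klid) & 0x3FF) in TURKIC_PRIMARY_LANGS


def preview_label(ch, klid):
    """Uppercase one key label, using Turkic rules where they apply."""
    if uses_turkic_casing(klid):
        if ch == "i":
            return CAPITAL_DOTTED_I
        if ch == DOTLESS_I:
            return "I"
    elif ch == DOTLESS_I:
        return ch  # generic behavior: dotless i stays as it is
    return ch.upper()
```

With the Turkish Q KLID `"0000041F"` or the Turkish F KLID `"0001041F"`, `preview_label("i", klid)` returns the capital I with a dot, while a US English KLID such as `"00000409"` keeps the generic behavior.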

(Now coming up with the best way to create such a preview would probably not make an interesting interview question, even with someone who claimed to have good knowledge of these functions in USER, given how even the experts have to really work this one all the way through and either test it out or look at source!)

 

This post brought to you by ı (U+0131, a.k.a. LATIN SMALL LETTER DOTLESS I)


