Fight the Future? (#1 of ??), aka The inappropriate nature of getting the Feh out of Uighur

by Michael S. Kaplan, published on 2008/04/04 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/04/04/8355413.aspx


Content of Michael Kaplan's personal blog not approved by Microsoft (see disclaimer)!
Regular readers should keep in mind that all I said in The End? still applies; the allusion to the X-Files continues for people who understand such references....

So it was just within the last couple of weeks that comments ended up on the Vista Team Blog, like one from Shanghai software engineer Abdusalam, aka VistaUyghur, in this post:

Greetings,

Well, this seems to be the place to send feedbacks on Vista SP1 RTM for now, right?

Then let's start.

I am an unofficial (non-Microsoft Connect user) tester for Windows Vista SP1 RTM now. After the first relaese of Vista RTM, we did a test on it.  As a result, we found a very serious problem, that was/is the IME issue related to the KEYBOARD LAYOUT of Uyghur (aka Uighur in Vista) language.  Afterwards, we sent a feedback to Microsoft China, and they DID AGREE that they will fix this through the coming service pack, SP1.  However, the problem still exists in the SP1 RTM for Vista.  I'm not sure if this issue will be fixed through later hot fixes.  I think this blog site is apparently not the place to provide too much details.  Here I would like to know how I can send this important (for us, maybe also for you) feedback info to you or Microsoft, DIRECTLY.

Thanks.

Or the one posted several days prior and then repeated one day prior by someone with the handle Uyghur in this post, which has the advantage of spelling out what the reported problem is:

Dear Mr. Nick White,

We are Uyghur (Uighur) and Microsoft Windows users. We were very excited when Microsoft started to support our language and script in its Windows Vista operating system. When the Beta version was released, we tested it extensively, found many bugs, and reported them to Microsoft. Later we were told that these bugs would be fixed in its official release. We found the same bugs again in its official release and reported them again, and were informed that they would be fixed in SP1. Now the bugs are still there in SP1 and we are very frustrated.

This serious bug is about the Vista's support of our language and script - Uyghur (Uighur). (see
http://en.wikipedia.org/wiki/Uyghur_language for more details about the language and script.)

Our script - Uyghur (Uighur) is an alphabetic script with 32 letters, based on Arabic and written from right to left. In Vista, Microsoft's support of our script comes with a font named MS Uighur, an input method and a keyboard layout.

In Unicode standard, national and local standard, one of our letter, F's unicode number is 0641, but Microsoft have used 06A7 instead, resulting in serious incompatibility issues.

Prior to Vista and until now, we have been processing our script on Windows 98, 2000, and XP with third party fonts, input methods, and keyboard layouts, using unicode character 0641 for our letter F.

With this serious incompatibility problem, we have been in great difficulty in migrating from previous versions to Windows Vista. We hope Microsoft and the Vista Development Team take this issue seriously and help us using Microsoft products easily and comfortably by fixing these bugs in time.

Sincerely,

On behalf of Uyghur (Uighur) people

A Uyghur (Uighur)

First let's look at these two letters:

U+0641    ف    ARABIC LETTER FEH

U+06a7    ڧ    ARABIC LETTER QAF WITH DOT ABOVE

Okay, we know there are some similarities but they are two different letters, clearly.
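
For anyone who wants to confirm that without a character map handy, here is a minimal sketch in Python using only the standard unicodedata module. It is just an illustration and not part of the original investigation; the helper name describe is made up for this example.

```python
import unicodedata

FEH = "\u0641"            # the letter the Uyghur users expect for F
QAF_DOT_ABOVE = "\u06a7"  # the letter the Vista layout produces instead

def describe(ch):
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in (FEH, QAF_DOT_ABOVE):
    print(describe(ch))

# Unicode normalization does not unify the two letters, so text typed with
# one will not compare equal to text typed with the other.
print(unicodedata.normalize("NFKC", QAF_DOT_ABOVE) == FEH)
```

The last line is the practical problem in miniature: to software, these are simply two unrelated code points, however similar the glyphs look in a given font.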

Let's take a quick gander at that keyboard, particularly the VK_F key, in both the base state and the shifted state:

Yep, there it is.

Now you might be able to see, given the similarities, how someone might have made a mistake (whether in the subsidiary, in Xinjiang, or in China -- I am not sure where the .KLC file was produced myself, all I can see at the moment is that the file was given to me and I checked it into the Vista project on April 19, 2005 a bit after midnight Redmond time).

And that the keyboard I checked in then had the right character in it.

THEN, on July 21, 2006 at about 2:30 AM, in direct response to a bug report, the change was made to put in what is now being called the wrong letter....

Note that this was very late in the cycle for Vista and required some extra information on the justification.

The comments provided by the people who looked into the bug report at the time explained:

The bug is a small problem of the letter assignments in the keyboard, which leads to big usability problems for users trying type of Uighur.

We need to change a couple of code point mappings on the Uighur keyboard layout, since currently some keystrokes produce unexpected results for the user. The keyboard doesn't work.

Doubts were raised at the time but those doubts were overridden by the strong feedback about the usability issues if the bug was not fixed....

Anyway, to answer the questions raised by the people who reported these problems late last month and despaired that they were not addressed in SP1, it seems like there are different forces at work here.

To make it more fun, let's look at these two characters in Tahoma and Microsoft Sans Serif and Microsoft Uighur, blown up to 48pt:

I am starting to understand why a former colleague of mine used to refer to Tahoma as a "crap cartoon font" (mentioned before here and here) for the Arabic script, and I'll go out on a limb and suspect that this kind of thing might have played a part in the [possibly incorrect] feedback?

In any case, the FEH is much more likely to be the right letter here, all things being equal. And the earlier last-minute bug report was probably in error.

I am not on the NLS team any more and would not pretend to speak for them here, but if someone in Microsoft China made such a claim about timing as Abdusalam mentioned then that person spoke out of turn as this is a complicated issue to manage and solve and service and maintain -- since of course Microsoft can't change a keyboard layout even if it is provably wrong (ref: here and here), even if in the future the right layout were produced and added to Windows (and such things in service packs are pretty much unheard of).

In the meantime, MSKLC is a great workaround to get the keyboard layout one wants, and perhaps if U+06a7 is not used in Uighur (as it appears not to be) then some future version of Windows could fold these two characters together for collation purposes like we did for Romanian and its comma below/cedilla characters! :-)
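
To make the folding idea concrete, here is a rough sketch in Python. It is not how Windows collation is implemented (real collation gives the characters equal sort weights rather than rewriting the text), and the names FOLD, fold and same_text are hypothetical, chosen only for this example. The choice to fold toward FEH and toward the comma-below forms is likewise an assumption of the sketch.

```python
# Hypothetical fold table: each "lookalike" code point maps to the one this
# sketch treats as canonical.
FOLD = str.maketrans({
    "\u06a7": "\u0641",  # ARABIC LETTER QAF WITH DOT ABOVE -> ARABIC LETTER FEH
    "\u015e": "\u0218",  # S WITH CEDILLA -> S WITH COMMA BELOW (capital)
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021a",  # T WITH CEDILLA -> T WITH COMMA BELOW (capital)
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
})

def fold(text):
    """Replace each lookalike character with its canonical counterpart."""
    return text.translate(FOLD)

def same_text(a, b):
    """Compare two strings while ignoring the folded distinctions."""
    return fold(a) == fold(b)

# Text typed on the Vista layout vs. text typed with a third-party layout:
print(same_text("\u06a7", "\u0641"))
# Romanian text typed with cedilla code points vs. comma-below code points:
print(same_text("\u015ftiin\u0163\u0103", "\u0219tiin\u021b\u0103"))
```

A comparison done this way finds a match regardless of which keyboard produced the text, which is the whole point: the data from both groups of users stays valid without anyone having to retype it.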

 

This blog brought to you by ف and ڧ (U+0641 and U+06a7, aka ARABIC LETTER FEH and ARABIC LETTER QAF WITH DOT ABOVE)


# Ted on 4 Apr 2008 10:41 AM:

He's baaaaaaaaaaaaack!

a bit of google trivia - the first "he's baaa...aaack" (using progressively more a's) that doesn't show up in google is  

He's baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaack

(feel free to ignore this spam)

# Michael S. Kaplan on 4 Apr 2008 12:01 PM:

Haven't you ruined that notion since Google will get around to indexing your comment eventually? :-)

# Mihai on 4 Apr 2008 12:41 PM:

"Romanian and its comma below/cedilla characters"

Nitpicking: Romanian has no cedilla-below characters :-)

And I am really glad I have what to nitpick!

(meaning "I am really glad you are back" :-)

# Michael S. Kaplan on 4 Apr 2008 1:32 PM:

Romanian has what can almost be called a legitimate orthography based on 2-3 decades worth of (mostly non-Unicode) computer programs that prefer the cedilla characters -- even if it is not the preferred orthography and never has been. :-)

I am not really back just yet, and may not be. I am fighting the future right now, though....

# Mihai on 4 Apr 2008 1:57 PM:

Its "legitimate orthography" is far from old, or from being legitimate.

The reality on the ground: in the non-GUI times the resolution of a typical character was so bad that there was no difference between a cedilla and a comma. Just two pixels.

In GUI environments most of the people who cared (and especially publishing houses) used hacked fonts with glyphs using the comma. Even worse, since there was no Romanian Windows until XP, the English one was the most used version. And Win 9x English had so little support for 1250, or Unicode, that the fonts were western fonts hacked with Romanian glyphs. After that, Win was Unicode, but most applications were not, so this continued until very recently.

So cedilla was never considered "legitimate orthography" and its history is probably not longer than 5 years (a few years after Win 2000).

You should never assume that if you get some text from Romania using the cedilla codepoints the creator also sees a cedilla :-)

# Michael S. Kaplan on 4 Apr 2008 2:02 PM:

Which gets back to my point that a computer program that does not assume they are equal and deal with it that way might be missing valid data from a user....

This is why Vista treats them as if they are the same!

And this is why a potential long-term solution for the Uighur issue could be to do the same for this "wrong letter" not used in the language (except there were clearly some people using it and clearly will be others using it because of the keyboard).

# Mihai on 4 Apr 2008 7:20 PM:

Oh, no argument there!

I agree that the Vista solution for Romanian is a step in the right direction. It might be a pain for some non-Unicode applications out there.

And remember that my initial comment stated clearly: nitpicking :-D


referenced by

2010/05/28 The report of the need for a Uyghur hotfix may be an overstatement

2010/05/18 All right, guys, uh, listen. This is a locale in Windows, watch me for the changes, and try and keep up, okay?

2010/02/25 The inappropriate nature of getting the Feh out of Uighur, Windows 7 edition
