On not looking at Uyghur through a Chinese prism

by Michael S. Kaplan, published on 2009/08/17 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2009/08/17/9871988.aspx

I just remembered that I have a Suggestion Box!

In it, Abdusalem asked:

Hi Michael,

I would like to ask/or suggest as a topic something about the Uyghur (PRC) locale in Windows 7.

This is a new locale that has been supported by Windows since Windows Vista as you might know. The current default code page of this locale is set to Arabic (Windows-1256) but this code page cannot fully support the script of Uyghur since it lacks some essential Uyghur-specific characters (ې ,ۈ ,ۇ ,ۆ ,ە). These characters are five out of eight vowel letters in Uyghur, not providing support for these five letters means it does not make any sense for Windows-1256 to be the default code page. Does it have to be the default for the Uyghur (PRC) locale anyway?

And there's another issue with this locale. This locale doesn't provide support for GBK (ANSI/OEM 936). The major reason we need support for GBK is that most of the applications in PRC are/were written in GBK/GB2312-80. If the system default locale is set to Uyghur (PRC) then the default code page will be Arabic (Windows-1256) and if we try to run GBK/GB2312-80 applications, there will be messy characters on the UI of the applications since the locale doesn't provide any support for these code pages.

So I'm wondering:

1. why Arabic (Windows-1256) must be set to the default code page for the Uyghur (PRC) locale while this kind of applications are rarely used in the People's Republic of China;

2. if and how the Uyghur (PRC) locale can provide support for GBK/GB2312-80 applications in order to correctly run applications based on these code pages.

I've been playing around with Microsoft Locale Builder to see if the 2nd item is feasible but I didn't make it =P

Perhaps you can't believe that I am starting the week with Uyghur stuff, or with China. If so, I can only say that you don't know me very well....

I have talked about all of the relevant points here before -- makes my job much easier!

In blogs like this one, which typically remind everyone that code pages are not enough, for anyone.

In Arabic itself, there is no room to fully support the last few characters needed for Farsi (Persian), and there are many, many characters needed for Urdu that are not there.

Yet they both still use code page 1256.

That is the way of things; if you think a code page is enough then you are ignoring the rules of governments behind these languages that clearly put in requirements for more characters.

I won't even get started on China here, which takes my argument and proves it times 20,000 when it comes to GB-18030, which supports all of Unicode!

Now clearly the Microsoft model for code pages is not to say "if you can't do everything, then do nothing!" which is what a cp936 selection would have meant for Uyghur.

There are two very good reasons that the "if you can't do everything, then do nothing!" model is not chosen:

1) It disrespects the language completely -- one cannot say that Uyghur, a language written with Arabic letters, will support 0% of the letters in the language;

2) It disrespects Microsoft's customers by giving them intentionally wrong data that their applications can try to use.

Since Microsoft tries to value both languages and customers, the right (or as right as possible) answer is usually best....

If the code page had not been 1256, then Uyghur would have been a Unicode-only locale, just like Tibetan, Mongolian, and Yi were.

The answer was never going to be cp936.

In that case, is the cp1256 choice, the "as good as one can get" choice, truly so bad?
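The trade-off between the candidate code pages can be checked with a small Python sketch. It assumes Python's built-in `cp1256`, `cp936`, and `gb18030` codecs are close enough to the Windows tables for this purpose. The `encodable` helper and the sample of shared letters are illustrative choices, not part of Windows or of the original question.

```python
# The five vowel letters named in the question, written as escapes
# to avoid any ambiguity about which code points are meant.
UYGHUR_VOWELS = "\u06d0\u06c8\u06c7\u06c6\u06d5"

# A small, illustrative sample of letters Uyghur shares with Arabic
# (beh, teh, seen, lam, meem). This is not the full alphabet.
SHARED_LETTERS = "\u0628\u062a\u0633\u0644\u0645"


def encodable(text, codec):
    """Return only those characters of text that the codec can represent."""
    kept = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            continue  # this code page has no slot for the character
        kept.append(ch)
    return "".join(kept)


for codec in ("cp1256", "cp936", "gb18030"):
    shared = len(encodable(SHARED_LETTERS, codec))
    vowels = len(encodable(UYGHUR_VOWELS, codec))
    print(f"{codec}: {shared}/5 shared letters, {vowels}/5 Uyghur vowels")
```

Under those assumptions, cp1256 should cover the shared letters but none of the five vowels, cp936 should cover none of either group, and gb18030 should cover all of them, since it maps every Unicode code point.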

Now Abdusalem's troubles with Locale Builder are explained in Where the hell did Replacement Locales come from?, which explains why the default system code page cannot be updated in a replacement locale.

Even if all of this were not true, then there are the issues mentioned in Can a codepage be changed? How about which codepage a locale points to? -- code pages cannot be changed, and locales c

But the people who really want the behavior Abdusalem describes -- anyone who feels that their Uyghur experience is better served with a Chinese code page -- have an easy answer:

Uyghur default user locale
Simplified Chinese default system locale

With that, the problem is solved.

Remember that the default system locale is only about the language and some minor details related to font behavior which, to be brutally honest, anyone who prefers the PRC's code page for Uyghur would probably want anyway!
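Why the system locale matters for GBK/GB2312-80 applications can be sketched in Python. The sketch assumes that a non-Unicode application stores its text as code page 936 bytes and that Windows interprets those bytes using the system (ANSI) code page. The `ansi_decode` helper is hypothetical and only stands in for that interpretation step. It is not a Windows API.

```python
# Text as a GBK-era application would store it: bytes in code page 936.
original = "\u4e2d\u6587"  # two Chinese characters, used only as sample text
raw_bytes = original.encode("cp936")


def ansi_decode(data, system_code_page):
    """Hypothetical stand-in for how a non-Unicode app's bytes get
    interpreted under a given system (ANSI) code page."""
    return data.decode(system_code_page, errors="replace")


# System locale Uyghur (PRC): the bytes are read as Windows-1256 and
# come out as unrelated Arabic-script characters.
print(ansi_decode(raw_bytes, "cp1256"))

# System locale Simplified Chinese: the same bytes round-trip correctly.
print(ansi_decode(raw_bytes, "cp936"))
```

The user locale plays no part in this decoding step, which is why it can stay set to Uyghur while the system locale is Simplified Chinese.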

# Abdusalam on 14 Sep 2009 11:47 PM:

Great thanks for the clarifications!

However, if I were the decision maker, I would have chosen the "if you can't do everything, then do nothing!" model that could have made things more clear and straightforward. In this case, that means:

"Uyghur is simply an Unicode-only language/script, Microsoft Windows doesn't provide any code page support for it at all. So if you guys are going to develop applications for it, you had better (to be exactly must) turn to Unicode and don't bother with LCID's and GetLocaleInfo() stuff."

# Michael S. Kaplan on 15 Sep 2009 2:33 AM:

It is mostly what we have done for languages with code pages that did not come close.

But why would we tell people not to bother with LCIDs or GetLocaleInfo? Those are just as relevant for Unicode-only locales!

# Abdusalam on 15 Sep 2009 10:21 PM:

Oops, my mistake. I actually wanted to mean the code pages stuff (not locales). I sometimes confuse the two =)

