Understanding (and explaining) why English is everywhere

by Michael S. Kaplan, published on 2006/10/14 12:20 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/10/14/825404.aspx


I have mentioned Mike Williams once or twice in previous posts. He actually used to work for Microsoft -- first in Australia, where he helped bring out Pocket Access for CE 2.0, and then in Redmond, where he worked on the Access PM team. The latter is where I met him, so you could say we are former colleagues.

Anyway, he is on the Vista beta and is a palpable force in the beta's private globalization newsgroup (responsible for about 20% of the posts there in the past year). This number may seem impressive (it impressed the crap out of me!) though I must temper that with one qualification -- many of the posts are about one single issue!

The issue is the way that English seems embedded in all of the locales, particularly in the area of keyboards. This is definitely true prior to Vista, where version after version it seemed like no matter what one picked as the locale at setup time, one would have this extra English - US keyboard, even if one was in some other English locale that has its own keyboard.

Since XP, the exciting feature where changing the user or system locale adds a keyboard made the problem a bit worse, since it seemed that from time to time that English - US keyboard would get added back.

The source for this data in XP was a section of intl.inf that looks something like this:

;
; List of locales.
; <LCID> = <Description>,<OEMCP>,<Language Group>,<langID:HKL pair>,<langID:HKL pair>,.....
;
[Locales]
00000436 = %Afrikaans% ,850 ,1,,0436:00000409,0409:00000409
0000041c = %Albanian% ,852 ,2,,041c:0000041c,0409:00000409
00000401 = %Arabic_Saudi_Arabia% ,720 ,13,,0409:00000409,0401:00000401
00000801 = %Arabic_Iraq% ,720 ,13,,0409:00000409,0801:00000401
00000c01 = %Arabic_Egypt% ,720 ,13,,0409:00000409,0c01:00000401
00001001 = %Arabic_Libya% ,720 ,13,,040c:0000040c,1001:00020401
00001401 = %Arabic_Algeria% ,720 ,13,,040c:0000040c,1401:00020401
00001801 = %Arabic_Morocco% ,720 ,13,,040c:0000040c,1801:00020401
00001c01 = %Arabic_Tunisia% ,720 ,13,,040c:0000040c,1c01:00020401
00002001 = %Arabic_Oman% ,720 ,13,,0409:00000409,2001:00000401
00002401 = %Arabic_Yemen% ,720 ,13,,0409:00000409,2401:00000401
00002801 = %Arabic_Syria% ,720 ,13,,0409:00000409,2801:00000401
00002c01 = %Arabic_Jordan% ,720 ,13,,0409:00000409,2c01:00000401
00003001 = %Arabic_Lebanon% ,720 ,13,,0409:00000409,3001:00000401
00003401 = %Arabic_Kuwait% ,720 ,13,,0409:00000409,3401:00000401
00003801 = %Arabic_UAE% ,720 ,13,,0409:00000409,3801:00000401
00003c01 = %Arabic_Bahrain% ,720 ,13,,0409:00000409,3c01:00000401
00004001 = %Arabic_Qatar% ,720 ,13,,0409:00000409,4001:00000401
0000042b = %Armenian% ,437 ,17,5,042b:0000042b,0409:00000409,0419:00000419
0000042c = %Azeri_Latin% ,857 ,6,5,042c:0000042c,082c:0000082c,0419:00000419
0000082c = %Azeri_Cyrillic% ,866 ,5,6,082c:0000082c,042c:0000042c,0419:00000419
0000042d = %Basque% ,850 ,1,,042d:0000040a,0409:00000409
00000423 = %Belarusian% ,866 ,5,,0423:00000423,0409:00000409,0419:00000419
...

and so on....
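
To make the format of those lines concrete, here is a minimal sketch that pulls the <langID:HKL> pairs out of a single [Locales] line. The class and method names are made up for illustration and are not part of any Windows tool. It also assumes that the unlabeled fourth field (empty for Afrikaans, 5 for Armenian), which the header comment does not document, can simply be skipped.

```csharp
using System;
using System.Collections.Generic;

public static class LocaleLineDemo {
    // One <langID:HKL> pair from the list, e.g. "0436:00000409".
    public struct KeyboardPair {
        public string LangId;   // language the keyboard is tagged with
        public string Layout;   // layout identifier that supplies the keys
    }

    // Hypothetical helper: returns the keyboard pairs found on one [Locales] line.
    public static List<KeyboardPair> ParsePairs(string line) {
        List<KeyboardPair> pairs = new List<KeyboardPair>();
        int eq = line.IndexOf('=');
        if (eq < 0) return pairs;

        string[] fields = line.Substring(eq + 1).Split(',');
        // fields[0..2] are the description, OEM code page and language group.
        // Anything after that which lacks a colon (the undocumented fourth
        // field) is skipped; this is an assumption, not documented behavior.
        for (int i = 3; i < fields.Length; i++) {
            string field = fields[i].Trim();
            int colon = field.IndexOf(':');
            if (colon <= 0) continue;
            KeyboardPair p = new KeyboardPair();
            p.LangId = field.Substring(0, colon);
            p.Layout = field.Substring(colon + 1);
            pairs.Add(p);
        }
        return pairs;
    }

    public static void Main() {
        string line = "00000436 = %Afrikaans% ,850 ,1,,0436:00000409,0409:00000409";
        foreach (KeyboardPair p in ParsePairs(line)) {
            Console.WriteLine("language " + p.LangId + " uses layout " + p.Layout);
        }
    }
}
```

For the Afrikaans line this yields two pairs, both pointing at layout 00000409 but tagged with different languages.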

The value labeled HKL in each pair is actually the KLID, the Keyboard Layout Identifier that I discussed in the post Why are the HKL and KLID of the keyboard different?. You might notice how many of them include 0409:00000409 or in some cases <some lang id>:00000409 [1]. The former is the source of the active English - US keyboard being added: it was added any time the locale was a CJK one, or any time it was the first item in the list, due to a bug that kept the whole list from being added when you changed to a locale other than a CJK one.

The big changes in Vista in this area are:

  1. Intl.inf and its entries were removed so the keyboard list was moved to a new GetLocaleInfo LCTYPE, LOCALE_SKEYBOARDSTOINSTALL [2];
  2. Text Services Framework TIPs were added to the syntax so the GUIDs used to identify them could be used rather than KLID values;
  3. The original bug that kept the whole list of layouts from being added when one changed the locale was fixed.
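
As a rough sketch of what working with the Vista-style list might look like, the snippet below splits a semicolon-separated string into entries and tells KLID entries apart from TIP entries. Several things here are assumptions rather than facts from this post: that a TIP entry is written as brace-delimited GUIDs after the colon, the placeholder GUIDs themselves (hypothetical, not real TIPs), and the helper names.

```csharp
using System;

public static class KeyboardListDemo {
    // Assumption: a TIP entry has '{' right after the colon, while a KLID
    // entry has eight hex digits there.
    public static bool IsTip(string entry) {
        int colon = entry.IndexOf(':');
        return colon >= 0 && colon + 1 < entry.Length && entry[colon + 1] == '{';
    }

    // Counts the KLID entries whose layout part is exactly the given KLID,
    // whatever language ID sits in front of it.
    public static int CountLayout(string keyboards, string klid) {
        int count = 0;
        foreach (string raw in keyboards.Split(';')) {
            string entry = raw.Trim();
            int colon = entry.IndexOf(':');
            if (colon < 0 || IsTip(entry)) continue;
            string layout = entry.Substring(colon + 1);
            if (string.Equals(layout, klid, StringComparison.OrdinalIgnoreCase)) count++;
        }
        return count;
    }

    public static void Main() {
        // The first two entries follow the Afrikaans example in footnote 1;
        // the third is a made-up TIP entry with placeholder GUIDs.
        string sample = "0436:00000409;0409:00000409;"
                      + "0404:{11111111-1111-1111-1111-111111111111}{22222222-2222-2222-2222-222222222222}";
        Console.WriteLine("entries using layout 00000409: " + CountLayout(sample, "00000409"));
        foreach (string entry in sample.Split(';')) {
            Console.WriteLine(entry + " --> " + (IsTip(entry) ? "TIP" : "KLID"));
        }
    }
}
```

Matching per entry rather than searching the whole string keeps the language ID and the layout separate, which is the distinction the rest of this post turns on.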

Now it is that third point that is the primary cause of Mike's concern, and it has led to the huge number of posts about a bug that, in his opinion, is getting worse all the time. And, of course, it is a bug that Microsoft does not seem to fix, build after build, despite his entering the issue many times and despite being told that it is fixed.

Most of that comes down to muddled communication, plus the fact that this list of keyboards is actually one that is worked out with and approved by the various subsidiary contacts around the world. So the cases where the US - English keyboard (0409:00000409) is added are actually intentional and requested.

I wrote a little code very quickly to enumerate all of that keyboard info in Vista: 

using System;
using System.Text;
using System.Runtime.InteropServices;

public class Test {
    [DllImport("kernel32.dll", CharSet=CharSet.Unicode, ExactSpelling=true)]
    internal static extern bool EnumSystemLocalesEx(EnumLocalesProcEx lpfn,
                                                    uint dwFlags,
                                                    IntPtr lParam,
                                                    IntPtr lpReserved);

    [DllImport("kernel32.dll", CharSet=CharSet.Unicode, ExactSpelling=true)]
    internal static extern int GetLocaleInfoEx( string lpLocaleName, 
                                                uint LCType, 
                                                StringBuilder lpLCData, 
                                                int cchData);

    public delegate bool EnumLocalesProcEx(IntPtr lpLocaleString, uint dwFlags, IntPtr lParam);

    public static uint LOCALE_WINDOWS = 0x00000001;
    public static uint LOCALE_SUPPLEMENTAL = 0x00000002;

    public static uint LOCALE_SKEYBOARDSTOINSTALL = 0x0000005e;

    public static uint CountOfEnglish = 0;
    public static uint CountOfEnglishAny = 0;
    public static uint CountOfAll = 0;

    public static bool MyEnumLocalesProcEx(IntPtr lpLocaleString, uint dwFlags, IntPtr lParam) {
        string stLocaleString = Marshal.PtrToStringUni(lpLocaleString);
        int cch = GetLocaleInfoEx(stLocaleString, LOCALE_SKEYBOARDSTOINSTALL, null, 0);
        if(cch > 0) {
            StringBuilder sb = new StringBuilder(cch);
            cch = GetLocaleInfoEx(stLocaleString, LOCALE_SKEYBOARDSTOINSTALL, sb, sb.Capacity);
            if(cch > 0) {
                string stKeyboards = sb.ToString();
                if(stKeyboards.IndexOf("0409:00000409", StringComparison.Ordinal) > -1) {
                    CountOfEnglish++;
                }
                if(stKeyboards.IndexOf("00000409", StringComparison.Ordinal) > -1) {
                    CountOfEnglishAny++;
                }
                CountOfAll++;
                Console.WriteLine(stLocaleString + " --> " + sb.ToString());
            }
        }
        return true;
    } 

    public static void Main() {
        if(EnumSystemLocalesEx( new EnumLocalesProcEx(MyEnumLocalesProcEx),
                                LOCALE_WINDOWS | LOCALE_SUPPLEMENTAL,
                                IntPtr.Zero,
                                IntPtr.Zero)) {

            Console.WriteLine(); 
            Console.WriteLine("# with 0409:00000409: " + CountOfEnglish);
            Console.WriteLine("# with 00000409: " + CountOfEnglishAny);
            Console.WriteLine("# of locales, total : " + CountOfAll);
        }
    }
}

You can compile it and run it on a Vista machine quite easily. I'll skip the huge list and just show what the summary at the end pointed out:

# with 0409:00000409: 124
# with 00000409: 149
# of locales, total : 205

So, out of 205 different locale entries, 124 (60.5%) include the US - English keyboard in the list and will see it added. That is because the people charged with determining what is best for each locale, both in Redmond and locally in the subsidiary, have determined that offering this keyboard as an option is a good idea. It is possible that some of these cases could be mistakes, but by and large it is incredibly unlikely that they all could be -- which means that in most cases the "bug" is actually the intended and expected design.

Anyway, I hope this resolves some of the confusion about the US - English keyboard being so many places, or if nothing else will at least help the bug reports be more properly aimed at issues that can be addressed rather than at underlying issues that are intentionally the way that they are!

 

This post brought to you by ඞ (U+0d9e, a.k.a. SINHALA LETTER KANTAJA NAASIKYAYA)

 

1 - Some may wonder why there are cases (such as in Afrikaans) that have entries like 0436:00000409;0409:00000409 in Vista, and 0436:00000409,0409:00000409 in prior versions. This is so that spell checkers and language tagging in programs like Microsoft Word can be influenced by the keyboard layout choices, which is the most common reason that one might want to have multiple keyboard layouts that use the same underlying layout....

2 - As an interesting bit of trivia, there was a typo in the name of the original checkin and the constant was defined as LOCALE_SKEYBOARDSTOINSTALLL. I believe it was changed before it ever hit an external beta, but it was pretty strange to think of a constant with a speech impediment!


# ReallyEvilCanine on 16 Oct 2006 10:09 AM:

I can't tell you how annoying it is that the EN-US keyboard keeps getting added back. I remove it from a German- or French-only machine and within a few weeks, it's back. This happens in XP and Server 2K3. On my main XP box I can not log in anymore with a German keyboard layout -- it must be in US. I have a few symbols in my password and now can't use any umlauts or "ß"es.

Is there some reg key I can change so that the layout I want is available on boot? Ideally it would be my modified layout which maps almost every Western European character (e.g., æ=AltGr+a, Æ=AltGr+A, ç=AltGr+c, etc.).

Grrr...!!

# Michael S. Kaplan on 16 Oct 2006 10:16 AM:

Did you look at this post for the answer to the login dialog keyboard selection?

# ReallyEvilCanine on 16 Oct 2006 1:04 PM:

Yep. I like the doc. It didn't work.

# Michael S. Kaplan on 16 Oct 2006 1:41 PM:

What precisely didn't work?

# Mike Williams on 21 Oct 2006 6:55 PM:

You fail to mention the idiotic comments made by triagers in these bugs, who close it out NOT REPRO, or BY DESIGN because they think there is a separate version of Windows for other English language countries.

Your post fails to state why it would be useful for an end-user to have two active keyboard languages (not layouts), especially when the wrong one becomes default from time to time. Of course Australia uses the US layout, but what is the point of adding a second US layout keyboard with US language?

And why isn't any of this reasoning EVER supplied in any of the bug reports? It seems rather disingenuous to post publicly about me here, instead of addressing the issue in the bug reports submitted. In fact I would go so far as to say it is rather unprofessional to do as you have done, considering that I theoretically cannot post about private beta content publicly?

# Michael S. Kaplan on 21 Oct 2006 7:32 PM:

The issue of the US keyboard layout being included actually spans multiple releases of Windows, and it is both requested by SPMs and by design when it is requested. It is an issue that is easy to get confused by and pretty much demands an explanation in both existing, shipping versions of Windows and in the upcoming Vista, where the problem becomes "worse" for the reasons I state in the post.

The post is "written to" the person who has complained about the issue the most, but is really intended for the wider audience of people who know you have been bringing it up both in public and private forums for the last few years. It is intended to explain why this issue exists, why it is more acute in Vista, and why large parts of it are by design, so that continually reporting it can actually distract from real issues outside of it that are brought up from time to time....

As to why the layout has been requested in so many different locales? The individual contacts in the various locales would have to answer for sure, but my offhand guess for their reasoning would be the times that people actually need spell checking (which is where this is most often used) to follow particular conventions for specific situations. But that is just a guess, I can't put myself in the mind of every Windows SPM of every region. They have the opportunity to review this data and are invited/encouraged to do so throughout the product cycle.

# Mike Williams on 22 Oct 2006 2:34 AM:

Why do you keep referring to the US "layout" rather than the US language keyboard? I don't ask why the US layout is used in many locales. I ask again "Of course Australia uses the US layout, but what is the point of adding a second US layout keyboard with US language?". This did not happen before XP SP2.

There is no need to add an additional keyboard to do spell-checking, since (for example) programs in the Office Suite allow you to mark any text-range as any language.

# Michael S. Kaplan on 22 Oct 2006 2:45 AM:

124/205 have it this way, and that seems like a fairly significant figure, doesn't it? I mean, you could imagine someone making a mistake, but that many people across that many locations?

It is valuable to have the keyboard as part of the change, as by default people can use the layout to put the preference in rather than tagging the text explicitly. And this is much easier for many people....

# Mike Williams on 22 Oct 2006 5:57 AM:

I'm not sure what you mean by "as part of the change". This happens at machine/account setup time, and didn't happen before XP-SP2. If I remove the extra keyboard, it always comes back within 2 reboots. Always.

124/205 locales are NOT English locales. I don't know what's going on in the non-English locales, but none of what you have written explains why a locale that already defaults to a US layout has to get a second US layout keyboard installed.

Having a second keyboard installed that is marked as US language  causes a number of problems:

a) apart from the imposition of the Language Bar, which is even uglier in Vista because it flies up across the screen every time you get a UAC dialog;

b) the US language keyboard often sets itself as default; which means that

c) users are typing into documents with text marked incorrectly so spell-checking doesn't work properly. It's not obvious when this happens because both EN-US and EN-AU look like EN on the taskbar.

If users want to type multi-lingually then they can manually add the extra keyboard languages as an alternate to using the app-specific features.

It looks like this could be having an effect on the way that Word's auto-language detection feature works as well. Say you type in EN-AU, then do some French and maybe some Spanish (this is the scenario where I discovered this little problem). When you type in English again, LAD switches you to EN-US rather than EN-AU even though you started in EN-AU and your doc defaults may be EN-AU. It seems to be seeing the extra EN-US keyboard and using that as a flag to switch to the wrong language. The keyboard language always overrides the document settings.

So, put yourself into the mind of someone from an English locale other than the US. Why would they want an additional keyboard foisted onto them, particularly if that addition is not something that they type in daily, and which may override their national default language? Why is it necessary to do it automatically when the small subset of such users who need to write or proof text in multiple languages have always been able to install a second keyboard manually? "An SPM says so" is not a reason. It's passing the buck.

# Michael S. Kaplan on 22 Oct 2006 6:39 AM:

124 locales out of the 205 entries in Vista include 0409:00000409 -- you can run the code and get the full list if you don't believe me. Now in addition there are several that have 00000409 without the en-US atop it, but that is a much smaller number.

Respecting the opinions of people in the subsidiary is not (in my opinion) passing the buck, and I have never considered that to be the case -- their work and their opinions are way too important to Microsoft to consider them to be scapegoats for anything. They are literally on the front lines here.

As often as I will put myself in the mind of customers throughout the world, I won't make my opinion more important than theirs on a matter like this, even if it means that I appear in your eyes to be passing the buck. :-(

# Mike Williams on 22 Oct 2006 10:59 AM:

You haven't properly addressed any of the issues I raised, but keep going back to what I think is a strawman argument about 124 mostly non-English locales.

Aside from that, your argument still comes down to "SPM opinion" which in and of itself would count for very little in product design unless a specific set of pros and cons were marshalled to argue for it. Prior to working in Redmond, I spent 5 years in subsidiary front-line roles (often getting steam-rollered by folks in Redmond who literally confused Australia and Austria on a regular basis), but I don't ever remember seeing an unsubstantiated opinion driving product design.

# Michael S. Kaplan on 22 Oct 2006 2:10 PM:

Ok, I guess for now we are at an impasse. So I will simply point out that I am giving the actual technical reason that you are seeing a particular behavior, why it has been around since Windows XP, and why it is in Vista.

Since it is something that you have strong feelings about, knowing why it is happening from a technical standpoint has some value all on its own, independent of the next question, which is why it is happening from a justification standpoint.

On a more personal note, I myself have never "steam-rollered" anyone from the subsidiary and I am glad that I not only never have, but that I have been involved with a great deal of enabling of features and addressing of issues important to the subsidiaries, and to the customers they serve.

The questions you raise here will be reported back to the people who have actually made the decisions (for philosophical, practical, or whatever reasons they had; they were not all the same) and one day I can maybe report back on those reasons, and in the meantime you have been armed with knowledge of the specific means by which it is happening....

