It's 2012! NT 3.1 just called, they want their [circa 1993] methodology back!

by Michael S. Kaplan, published on 2012/08/03 07:01 -04:00, original URI:

Some questions confuse me even when I am fluent in the language in which they are asked.

In my experience, the most common cause is when the person asking the question is trying to connect two things together that lack a direct, useful connection.

In such cases, understanding the true question involves kicking the living crap out of (er, disproving) the "wrong" question and helping the right one emerge.

Take for example the following question sent the other day:

Is there a table that shows what the delta is between [the list of supported input languages] and [the list of system locales]?  Hindi is one of the examples, and I am interested in a comprehensive list.


Okay, we'll start by defining the two terms:

[the list of supported input languages] - Mostly you can go to HKLM\SYSTEM\CurrentControlSet\Control\Keyboard Layouts\ and enumerate the subkeys under it. This will give you a list of the keyboards.

IMEs are not included, but all of the existing IMEs fall into two categories:

Thus, all of them are either Unicode-only or Unicode-only enough for current purposes that they may as well all be treated that way.

For the keyboards under that registry key, every subkey name is a KLID. If you convert it from its hexadecimal string form to a number, then LOWORD(<KLID>) is usually, but not always, a LANGID.

You can pass those LANGID values to GetLocaleInfo(LOWORD(<KLID>), LOCALE_IDEFAULTANSICODEPAGE, ...), and if you get back a 1, then you have a Unicode-only locale.

Note that if LOWORD(<KLID>) is 0x0c00, then it's one of two things. It may be one of the locale-less keyboards we added in Windows 8 (most, but not all, of which are Unicode-only). Or it may be a custom keyboard created by MSKLC based on a custom locale (which does not have a known code page).

Of the remaining keyboards that do have ACP values, many contain characters outside the corresponding default system code page.

Now, while a LOCALE_IHASLETTERSOUTSIDETHECP would potentially be quite useful, we don't have one. And it might not match the keyboard anyway.

Not to mention that there is no intrinsic, queryable property or attribute of a KLID that can be used to easily identify what a keyboard supports.

SUMMARY: for almost every keyboard, you cannot find out whether the keyboard corresponds to a code page.

Any code page.

Popping the stack from this disaster for a moment, let's go back to the original question.

Now [the list of system locales] is a bit less messy to get.

EnumSystemLocalesEx alone will get you that list. If needed, you can then call GetLocaleInfoEx(<enumerated name>, LOCALE_IDEFAULTANSICODEPAGE, ...). Any result other than 1 means the locale is a valid potential system locale.

Easy, and even supports custom locales!

Of course, the two lists have some overlap.

In the end, they only needed the second part anyway for what they had in mind. The real question was about working with characters outside the default system code page. Keyboards were simply the easiest way to repro the problem, and so the repro overcomplicated things.

But once you wade through either list, and even try to match them up where they overlap, a reasonable person can come to just one conclusion.

It isn't 1993 anymore.

If you depend on code pages then...

Joshua on 20 Aug 2012 9:09 AM:

Fix the console or I don't care.

Michael S. Kaplan on 20 Aug 2012 6:46 PM:

We did -- it's called the PowerShell ISE! :-)

Joshua on 21 Aug 2012 3:44 PM:

That's just swapping one set of bugs for another (Unicode display works, but keyboard-interactive console programs don't).

Michael S. Kaplan on 21 Aug 2012 9:54 PM:

Old style interactive keyboards are not enabled PowerShell scripts that know how to interact....

Joshua on 22 Aug 2012 9:43 AM:

I am not going to retool hundreds of programs to have two completely independent input paths. I might as well link against cygwin's LIBC instead and use their UTF-8 terminal.
