Some keyboarding terms

by Michael S. Kaplan, published on 2004/11/27 03:08 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2004/11/27/270931.aspx

This posting will try to clear up some of the problems in documentation and info regading keyboards, since there is plenty left in those things to be confusing and there is no need to throw bewildering terms into the mix. Future posts will build on this one, so if you already know about keyboards you might be able to skip it (though I would not advise it!). It is not really a glossary since it is not alphabetized; the order is arbitrary based on either when I thought of a term to add or when I thought dramatic effect could be increased.

LCID -- a locale identifier. Traditionally pronounced like "El-Sid". It is a value that has no real meaning at all to keyboards but some people who ought to know better seem to think it does. Those people are actually thinking of LANGID, another entry in the glossary. LCIDs are only 32 bits, a fact that sucks for reasons I'll talk about another day, but it's twice as much as keyboards use, anyway.

LANGID - a languge identifier. Traditionally not pronounced like "Lan-Gid", instead "Lang-Eye-Dee" is preferred. This is the bottom WORD of the DWORD that makes up an LCID and essentially represents a number that signifies a unique language/region/script combination. This combination is what is needed for keyboards, which are intended to be typical input methods for such combinations. Obviously, looking ahead to the world of custom cultures ans custom locales, this may not be the best design architecture. But it's a little too late to change now....

Layout ID -- a layout identifier. This has no special pronunciation, probably because calling something a "Lid" would sound dumb. These numbers are used to help the USER SUBSYSTEM manage the situation when multiple keyboards use the same LANGID (something that happens for many keyboards that ship with Windows). Each keyboard layout using the same LANGID after the first must (1) have one and (2) it must not duplicate one that is already assigned on the system. Failure on either of these points will cause a layout to not be properly selected.

KLID -- a keyboard layout identifier. Traditionally pronounced "Kay-El-Eye-Dee" because some people in the USA get very uptight about certain homonyms (you can catch me slipping on this point from time to time). It's also sometimes called the input locale identifier since the name for HKL has been updated (see the HKL definiteion for info on why that is incorrect since the HKL is for something different). The KLID can be retrieved for the currently selected keyboard layout in a thread through the GetKeyboardLayoutName API (note the pswzKLID parameter), though that is not true of any other selected or installed keyboard layout. Every keyboard layout on the system has one of these. Each KLID is 32 bits (thus 8 hex digits), and they can all be found in the registry as the subkeys under HKLM\SYSTEM\CurrentControlSet\Control\Keyboard Layouts\. The bottom half of the KLID is a LANGID, and the top half is something device-specific. By convention, the first hex digit is usually as follows:

0 -- Most keyboard layouts
A -- Keyboard layouts defined by MSKLC
D -- Some non-CJK input methods that have been defined by the Text Services Framework (note: reported to me; I have never seen one of these!)
E -- CJK input methods, also known as IMEs

HKL -- a handle to a keyboard layout, traditionally pronounced "Āch-Kay-El", the terminiology folks have pretty aggressively tried to call this an "input locale identifier" despite the obvious problem that it has nothing to do with locales and that it is not the same value as the actual identifier (the KLID). The HKL in actuality is the handle to an input method. Althought defined as a handle, only the lower 32 bits are currently used. Of those 32 bits, the bottom 16 bits represent a LANGID, and the top 16 bits represent a value defined by the USER SUBSYSTEM which helps to uniquely identify an installed keyboard layout. This is crucial since any keyboard layout can be installed more than once (by installing it under different languages, which helps user operations like spell checking).

MKLC -- see MSKLC. The only people who call it MKLC are the ones who object to MSKLC since it's not a true acronym (Microsoft being a single word). None of the user interface or documentation calls it this, so it's unbelievable that it is getting such a large entry in this glossary, but people outside of Microsoft use it all the time.

MSKLC -- the Microsoft Keyboard Layout Creator, traditionally prounounced "Em-Es-Kay-El-See" by people not taken in by the MKLC arguments. It is a tool released by Microsoft which allows someone to build a custom keyboard layout and build a setup package to install it on Windows NT 4.0, 2000, XP, or Server 2003. The help file contains a ton of information about the best way to design keyboard layouts that work well on Windows. I am the developer on it and love to hear feedback any time people have it, since there is always room for improvement.

IME -- Input method editor, traditionally pronounced "Eye-Em-Eee". An IME is an engine that converts keystrokes into phonetic or ideographic characters. It is a commonly used abstraction that allows a keyboard with only 100 or so keystrokes to be able to support character sets that contain up to 20,000 or more ideographs. There are old samples in the Platform SDK using the Input Method Manager (IMM) APIs, but today most IMEs written by Microsoft use the much more approachable Text Services Framework.

Supported keyboard layout -- this is an odd terminology that actually means a keyboard layout is defined on the system. It may not be currently selectable by a user (e.g., if it's a Thai keyboard layout and Thai/complex script support is not enabled). It can also be an IME or a speed-to-text converter, so DEFINED INPUT METHOD might have been a better term. This terminology is slowly being removed from documentation and it's not entirely clear what is replacing it.

Installed keyboard layout -- another odd bit of terminology, it means a keyboard layout that a user has selected. It can also be an IME or a speed-to-text converter, so SELECTED INPUT METHOD might have been a better term. This terminology is slowly being removed from documentation and it's not entirely clear what is replacing it.

Scan code -- The numeric value given to each physical key on a keyboard; the scan code is a hardware-dependent number that identifies the key. Scan codes have a fixed position on the physical keyboard, irregardless of the keyboard layout chosen by the user.

Virtual Key -- Also called the VK, the code that is given by the Windows USER subsystem to represent a keystroke. It is mapped from a scan code by using the keyboard layout definition and is thus entirely dependent on the user's chosen layout. The reason for this is the [unfortunate, IMHO] choice to have e.g. VK_A to be used for the 'A' key, which meant that on keyboards that put 'A' in a different position the VK would have to be moved.

That's all I can think of at the moment, but I am sure I will be updating this topic any time I think of something else.

# Norman Diamond on 21 Dec 2004 6:21 PM:

If Layout ID and KLID are two separate identifiers, how come Windows still suffers from the forcing a single value to misrepresent both the keyboard layout and IME together? This is the root cause of some of the problems I've mentioned in feedback to some of your later postings. The physical keyboard has a physical layout of what is shown on its keys, and the IME is a separate layer on top of that. Why can't users (of either a real IME or "Global" IME) independently say what their real keyboard layout is and say what IME they wish to use on top of it?

# Michael Kaplan on 21 Dec 2004 10:18 PM:

Your issue has nothing to do these two things being separate. I'll work at trying to help with the other issues you bring up in a different post though as this one is unrelated.

# Norman Diamond on 21 Dec 2004 11:13 PM:

Well, I used to think that Microsoft had one single identifier into which they forced the combination of real keyboard layout and IME, resulting in the described problems. I used to think that those problems could be avoided by using two separate identifiers. Then in your posting I learned that there ARE two separate identifiers. So why didn't one identifier get one of these meanings and the other identifier get the other meaning? Why did both of them still get the combination of the same two meanings squashed together?

I understand that you will bring this up in another post, I thank you and I look forward to it. But meanwhile it still looks to me like my question was on-topic.

# Michael Kaplan on 21 Dec 2004 11:40 PM:

They are different meanings, and the meaning relate to important tracking issues within the USER subsystem and the architecture of keyboards.

Your interest in adding yet a third meaning is a separate issue. So the reason that that meaning was not captured here is that they were busy with the other stuff? :-)

Please consider a donation to keep this archive running, maintained and free of advertising.
Donate €20 or more to receive an offline copy of the whole archive including all images.

referenced by

go to newer or older post, or back to index or month or day