Getting all you can out of a keyboard layout, Part #10a

by Michael S. Kaplan, published on 2006/04/22 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/04/22/581107.aspx

We have come a long way in this series, haven't we? :-)

Look at all that we have covered:

Part 0: An enumeration with all of the Virtual Key values defined in winuser.h;
Part 1: Getting Scan codes, Virtual Keys, and a single character in one shift state;
Part 2: Making sure to not unload the keyboard if the user already had it loaded;
Part 3: Finding dead keys and ligatures;
Part 4: Moving SC/VK code determination that is static per layout out of the inner loop;
Part 5: Character detection for the easy shift states;
Part 6: Getting numeric keypad assignments;
Part 7: Getting the base and combining characters for all of the (previously detected) dead keys;
Part 8: Detecting usage of the CAPS LOCK key (both SGCAPS and persistent shifting);
Parts 9a and 9b: Character detection of the harder shift states.

What a long, strange trip it has been!

There is really only one item left on that list of things to do that I mentioned originally, and that is dealing with chained dead keys, another feature that MSKLC does not support.

It is a slightly more interesting feature to discuss given that it (unlike the harder shift states) does not really seem to exist in any keyboard layout that ships with Windows. So unless someone has been working with the DDK to build such a keyboard, there is no way to readily test what is being done.

I'll try not to let the somewhat theoretical nature of this post dissuade me too much. In order not to dissuade you, the reader, I will try to point out more immediately relevant items as they come up. :-)

Anyone who has both used MSKLC and also been following along with this series will notice that I have completely ignored the connection between the dead key assignments and the tables that exist, one per dead key. This connection (which is a definite part of the MSKLC UI) is completely ignored in the code that this series has put together.

This omission is intentional; no direct connection actually exists. The code here in this series has it right!

Think about the consequences of that notion: you explicitly tell a key that it is a dead key, and it will then look up its own individual dead key table on the next keystroke. Such an architecture goes a long way to explaining the reason why you must have a valid character at each stage of a chained dead key -- because once you jump to a new "dead key table" there is no state information about the old "dead key table". And since the dead key tables only allow a single UTF-16 code unit for the base character and one more for the combined character, there is simply nowhere to store the knowledge of (or the need for) additional characters.
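To make that lookup model concrete, here is a minimal sketch in Python (not the C# used in this series, and not calling any Windows API). Everything in it is hypothetical: the table contents, the `type_keys` name, and the `(character, is_dead)` keystroke tuples are assumptions made for illustration. Emitting both characters when a table has no entry is also a simplification.

```python
# Hypothetical model: one table per dead key character, keyed by base character.
# Each entry holds a single combined character (one UTF-16 code unit) plus a
# flag saying whether that combined character is itself a dead key.
DEAD_KEY_TABLES = {
    "\u00b4": {                       # ACUTE ACCENT
        "a": ("\u00e1", False),       # a with acute
        "\u00a8": ("\u0385", True),   # + DIAERESIS gives DIALYTIKA TONOS, still dead
    },
    "\u0385": {                       # GREEK DIALYTIKA TONOS
        "\u03b9": ("\u0390", False),  # iota with dialytika and tonos
    },
}


def type_keys(keystrokes, tables=DEAD_KEY_TABLES):
    """Feed (character, is_dead) keystrokes one at a time; return the text."""
    output = []
    pending = None  # the ONLY state carried between keystrokes
    for ch, is_dead in keystrokes:
        if pending is None:
            if is_dead:
                pending = ch
            else:
                output.append(ch)
            continue
        entry = tables.get(pending, {}).get(ch)
        if entry is None:
            # Assumed fallback: no combination, so emit both characters.
            output.append(pending + ch)
            pending = None
            continue
        combined, combined_is_dead = entry
        if combined_is_dead:
            # Chained dead key: jump to the new table. Nothing about the old
            # table survives, which is why `combined` must be a real character.
            pending = combined
        else:
            output.append(combined)
            pending = None
    return "".join(output)
```

Note that `pending` is a single character, so there is nowhere to remember how the chain started; the intermediate character U+0385 carries all of the accumulated meaning by itself.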

This may cause people to ask why MSKLC was designed the way it was -- it was rather intensely discussed at the time, and there was just no intuitive way we could find to show this "disconnected" model where dead key tables were not more directly tied to their dead keys.

Given how uncommon the scenario of putting the same dead key into multiple keys is, it would be hard to really notice the problem (for example, to date no one has actually ever reported the issue!).

Clever people who are following this discussion can probably come up with a bug or two in MSKLC if they put some thought to it. These would basically be known limitations, but I won't give any more hints about that in case there are people who wanted to try to spot a bug. :-)

Could this all be changed? Well, obviously any architecture that blocks a particular feature admits to a single last-resort workaround: re-architecting how the code works. But that would cause all kinds of other problems like breaking backcompat with any keyboard already created, not to mention taking code that is very stable and putting it in play again. And that does not even get into needing to create multiple versions of every keyboard layout so that you could install on the old and the rearchitected code. Code that does not even belong to our team, so we'd have to convince another team to do this work.

The price is just a bit too high, sorry. :-(

Anyway, getting back to the chained dead keys.

All you would need to do is change the code in a few places:

In the DeadKey class, add the notion of saying whether the combined character is itself a dead key;
In the ProcessDeadKey procedure, allow it to have knowledge of multiple dead keys that would need to be applied when it scrolls through every other character;
In the ProcessDeadKey procedure, when the rc of the call to ToUnicodeEx is -1, check to make sure the dead key is not identical to either the one in process or any of the ones already processed; if it is neither of these, then use that new DeadKey class feature and call ProcessDeadKey again, recursively.
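The three changes above can be sketched roughly as follows, again in Python rather than the series' C# and against a simulated layout instead of a real one. The `DeadKey` dataclass, `process_dead_key`, `fake_probe`, and `FAKE_LAYOUT` are all hypothetical names; `fake_probe` merely stands in for the ToUnicodeEx calls, returning a negative rc when the result is itself a dead key. The loop back to U+00B4 is contrived, purely to exercise the duplicate check.

```python
from dataclasses import dataclass, field


@dataclass
class DeadKey:
    char: str
    # base character -> (combined character, combined is itself a dead key)
    combos: dict = field(default_factory=dict)


# Simulated layout (an assumption, not a shipping keyboard).
FAKE_LAYOUT = {
    "\u00b4": {"a": ("\u00e1", False), "\u00a8": ("\u0385", True)},
    "\u0385": {"\u03b9": ("\u0390", False), "\u00b4": ("\u00b4", True)},
}


def fake_probe(dead_char, base):
    """Stand-in for the real probe: rc < 0 dead key, 0 nothing, 1 one char."""
    entry = FAKE_LAYOUT.get(dead_char, {}).get(base)
    if entry is None:
        return 0, ""
    combined, is_dead = entry
    return (-1 if is_dead else 1), combined


def process_dead_key(dead_char, probe, bases, processed):
    """Collect combinations for dead_char, recursing into chained dead keys."""
    dead_key = DeadKey(dead_char)
    # Register before recursing, so the key in process counts as "seen".
    processed[dead_char] = dead_key
    for base in bases:
        rc, combined = probe(dead_char, base)
        if rc == 0:
            continue
        is_dead = rc < 0
        dead_key.combos[base] = (combined, is_dead)
        # Recurse only for dead keys not in process and not already processed.
        if is_dead and combined not in processed:
            process_dead_key(combined, probe, bases, processed)
    return dead_key


processed = {}
process_dead_key("\u00b4", fake_probe, ["a", "\u00a8", "\u03b9", "\u00b4"], processed)
```

After the call, `processed` holds one `DeadKey` per dead key reached through the chain, and the recursion stops at the contrived loop because U+00B4 is already registered.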

One more problem that I did not really take care of initially (I admit I was waiting to see if someone would ask about it -- no one is getting any jobs this time around!) is the poor use of the ArrayList class to store:

The collection of DeadKey objects in the keyboard layout;
The collection of Base characters in each DeadKey object.

I mean, the characteristics of both of these collections are:

a somewhat unbounded (or at least unknown) size;
the items within them that would act as keys must be unique;
keys are the size of one UTF-16 code unit, basically a ushort;
must be able to easily look up the members of the collection by the key;
must be able to dump out all of the members of the collection.

Currently, the code in both cases scrolls through the entire collection to look for duplicates, since the ArrayList class that is so well suited for the first and fifth of these items is so piss poor at the second, third, and fourth. Certainly there are data structures that are better suited here, right? :-)

The Hashtable class is obviously a better choice, I think -- using (ushort)char for the key values.
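As a small illustration of why a keyed collection fits those five characteristics, here is a sketch using a Python dict as a stand-in for the .NET Hashtable (an assumption made for illustration; the class and method names are hypothetical and this is not the series' code). Keys are the numeric value of one UTF-16 code unit.

```python
class CodeUnitTable:
    """Collection keyed by a single UTF-16 code unit (0x0000 to 0xFFFF)."""

    def __init__(self):
        self._items = {}  # grows as needed, so the size need not be known

    def add(self, char, value):
        """Add value under char; return False if the key is already present."""
        key = ord(char)
        if key > 0xFFFF:
            raise ValueError("key must fit in one UTF-16 code unit")
        if key in self._items:  # uniqueness check without scanning everything
            return False
        self._items[key] = value
        return True

    def lookup(self, char):
        """Return the value stored for char, or None if absent."""
        return self._items.get(ord(char))

    def dump(self):
        """Return every (key, value) pair, ordered by code unit."""
        return sorted(self._items.items())


table = CodeUnitTable()
table.add("\u00b4", "acute")
table.add("\u0060", "grave")
duplicate_added = table.add("\u00b4", "acute again")  # rejected as a duplicate
```

The duplicate check and the lookup each touch one bucket instead of walking the whole collection, while `dump` still gives the full listing that the ArrayList was good at.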

This last change is not required, obviously. But it would save our performance a bit. Not to mention it would keep us from needing to shudder as I have for the last few revs of our code. :-)

Of course without a keyboard layout to test this new code on, the primary goal will be to make sure the existing scenarios do not regress. I'll post up the new code soon in Part 10b of the series.

and perhaps how to create these sorts of keyboards another day....

This post brought to you by "A" (U+0041, LATIN CAPITAL LETTER A)
A Unicode character that is in the very small family of those whose VK value is the same as its code point, also used for the hexadecimal version of the number 10!

# dk273 on 4 Jun 2008 7:13 PM:

I am totally ignorant on C# (I'm the guy who likes IBM mainframe assembler and can just about do VB6 and C++ with MFC - not even .Net yet!) However, I am looking to see if anyone has been able successfully to convert Michael's good code here (well in the part 9b) into good old C / MFC code?

If so, would they be willing to post it?

Many thanks,

David

# Tom on 28 Oct 2008 3:56 PM:

Michael,

This is a great series -- it's really helped us. We're trying to make some onscreen keyboard keys that display the correct labels given the user's current language/keyboard settings. It works on our development machines. We're nervous about it working with the many keyboard setups out in the real world. My question has to do with scan codes. Our current design has a scan code in each key we're trying to draw and uses a table like the one you built in this series to map those scan codes on to vk_ codes and ultimately characters so we can display appropriate labels. The problem is that scan codes are described in various authoritative places as "hardware dependent." Does that mean that I cannot assume that the second key on the fourth row of the user's keyboard will generate a scan code of 0x1E? If I can't assume this, I don't see a way around maintaining our own table of the physical locations of the different virtual keys on all different keyboard/language combinations and referring to those whenever the settings change. I hope I'm missing something!

Thanks very much.

# Michael S. Kaplan on 29 Oct 2008 2:45 PM:

They are hardware dependent, and different bits of hardware in different parts of the world will often have slightly different locations for a few keys (e.g. no 102 key, differently shaped backspace, differently shaped enter key, moved around other OEM keys, the extra key on the Japanese keyboard, etc.). And that is before I get into things like laptop keyboard layouts.

This will always be true, because all of these keyboards are slightly different. But there are some major groupings that will cover all of them....

# Tom on 29 Oct 2008 5:45 PM:

Thanks -- that's what we were afraid of. I suppose we could load some known HKL and test the scan code <-> VK_ mapping against what we expect it to be and then create a translation table if necessary that maps the user's scan codes to "our" scan codes and vice versa. Probably not the ugliest thing anybody has ever had to do...

# Michael S. Kaplan on 29 Oct 2008 8:49 PM:

I would actually recommend getting a few of the different layouts so you will see the differences -- in most cases, the scan codes are there but in different places. If you have the target keyboards you will have a better sense of what you want to "detect" here...

# Tom on 31 Oct 2008 7:26 AM:

Good call. I'm particularly interested in scan code changes that are just the result of an OEM's decision to map scan codes differently from the standard way and compensate in their driver software. I'm less concerned about key placement differences than I am about different scan code -> VK_ mappings for a given HKL. We can handle key placement differences as exceptions in our layout code.

# Michel K on 25 Oct 2011 6:23 AM:

Hi, nice blog you have here!

But where is part 10b ??

