by Michael S. Kaplan, published on 2005/01/19 02:07 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/01/19/355870.aspx
Dead keys? As my friend Cathy likes to say -- they're not dead, and they aren't keys.
A little over a month ago, I pointed out that dead keys are not intuitive. And nothing has changed since then -- they are still not very intuitive. And they're not dead, and they're not keys.
Since then, a man named Naushad asked me a very good question about dead keys, and I thought I would try to answer it. I'll do it here so that if anyone else has the same question, he or she will get help, too. :-)
Naushad asked:
I am working on a Unicode keyboard using MSKLC. On top of that, I'm using a hook application that modifies the behavior of only a few keys for post-processing.
What I am experiencing is that when I call ToUnicode() in the hook proc (WH_GETMESSAGE), the dead keys don't work anymore. ToUnicode() works perfectly as long as I am not pressing any dead key combinations.
My questions are:
1. How do I extract Unicode chars from a WndProc? I am doing it in WM_KEYDOWN using GetKeyboardState(), MapVirtualKeyW() and ToUnicode(). It breaks the dead keys.
2. Is there a special way ToUnicode() is to be used with Dead keys?
This is actually the same problem I faced when I was doing the development work on MSKLC!
You see, MSKLC has a feature that has the job of loading an existing keyboard layout on the system. It requires extensive use of the ToUnicodeEx API (which works identically to the ToUnicode API except it lets you pass in the HKL to use), and I found that ToUnicodeEx did not seem to be handling dead keys very well.
The problem is that these APIs both read and modify the state of the keyboard buffer. So when the user types a dead key and you call ToUnicode on that keystroke, your call leaves the dead key stored in the buffer. When the same keystroke is translated again, the two are combined, and the next keystroke the user types no longer combines with the dead key at all. It is processed on its own. In MSKLC this meant thousands of calls to ToUnicodeEx that would step all over each other any time there was a dead key.
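To make the interaction concrete, here is a toy model in Python. It is not the Win32 API. The `KeyboardState` class, its `to_unicode` method, the `type_keys` helper, and the single dead key (an apostrophe acting as a dead acute accent) are all hypothetical. The model borrows ToUnicode's documented return convention: -1 for a dead key, otherwise the number of characters written, where 2 means a buffered dead character could not be combined with the key. It keeps one pending dead character, standing in for the keyboard buffer. The sketch shows how a hook that translates a keystroke before the application does leaves the application seeing the wrong state.

```python
# Toy model of dead-key state; NOT the real Win32 ToUnicode API.
# Hypothetical layout: the apostrophe key is a dead acute accent.
DEAD_KEYS = {"'": "\u00b4"}                 # key -> spacing dead character
COMPOSE = {("\u00b4", "a"): "\u00e1", ("\u00b4", "e"): "\u00e9"}


class KeyboardState:
    """Stands in for the keyboard buffer that ToUnicode reads and modifies."""

    def __init__(self):
        self.pending = None  # dead character waiting for the next keystroke

    def to_unicode(self, key):
        """Return (count, text) like ToUnicode: count is -1 for a dead key."""
        if self.pending is None:
            if key in DEAD_KEYS:
                self.pending = DEAD_KEYS[key]     # store the dead key
                return -1, self.pending
            return 1, key
        dead, self.pending = self.pending, None   # any keystroke consumes it
        if (dead, key) in COMPOSE:
            return 1, COMPOSE[(dead, key)]
        # No composition: emit the dead character, then this key's own output.
        return 2, dead + DEAD_KEYS.get(key, key)


def type_keys(keys, hook_peeks=False):
    """Simulate an application translating keys; optionally a hook goes first."""
    kb, out = KeyboardState(), ""
    for key in keys:
        if hook_peeks:
            kb.to_unicode(key)              # the hook's call changes shared state
        count, text = kb.to_unicode(key)    # the application's own translation
        if count > 0:
            out += text
    return out
```

Without the hook, `type_keys(["'", "a"])` yields "á". With `hook_peeks=True`, the hook's call has already buffered the dead key. The application's translation of the same keystroke then combines the dead key with itself and emits "´´", and the following "a" comes out unaccented.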
There are two ways to work around this:
1) You can keep calling ToUnicode with the same info until the dead key state is cleared out, and then call it one more time to put the state back to where it would be if you had never typed anything, or
2) You can load all of the keyboard info ahead of time, and then when the user types, you can look up what the keystrokes mean in your own cache, without having to call the APIs at that point.
(Note that solution #2 requires a whole bunch of solution #1 to be done ahead of time -- in the extreme case that MSKLC was supporting, every dead key had to be tested in combination with every other possible keystroke to load all possible combinations. But since I was trying to load the whole keyboard layout in order to display it, this was needed anyway -- YMMV.)
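Both workarounds can be sketched against a toy model like the following. Again, this is hypothetical and not the Win32 API: the `KeyboardState`, `peek`, and `build_cache` names and the two-dead-key layout are illustrative assumptions.

`peek` follows solution #1. It translates a key, then repeats that key while the model still reports a dead key so the buffer is flushed. If a dead key was already pending, it presses that dead key once more to restore the state. Because the real buffer cannot be read directly, the caller is assumed to track which dead key is pending.

`build_cache` follows solution #2. It runs solution #1 exhaustively, every dead key combined with every key, so later lookups need no calls at all.

```python
# Toy model of dead-key state; NOT the real Win32 ToUnicode API.
# Hypothetical layout: apostrophe = dead acute, backquote = dead grave.
DEAD_KEYS = {"'": "\u00b4", "`": "`"}
COMPOSE = {
    ("\u00b4", "a"): "\u00e1", ("\u00b4", "e"): "\u00e9",
    ("`", "a"): "\u00e0", ("`", "e"): "\u00e8",
}


class KeyboardState:
    """Stands in for the keyboard buffer that ToUnicode reads and modifies."""

    def __init__(self):
        self.pending = None  # dead character waiting for the next keystroke

    def to_unicode(self, key):
        """Return (count, text) like ToUnicode: count is -1 for a dead key."""
        if self.pending is None:
            if key in DEAD_KEYS:
                self.pending = DEAD_KEYS[key]
                return -1, self.pending
            return 1, key
        dead, self.pending = self.pending, None
        if (dead, key) in COMPOSE:
            return 1, COMPOSE[(dead, key)]
        return 2, dead + DEAD_KEYS.get(key, key)


def peek(kb, key, pending_key=None):
    """Solution #1: translate key, then put kb back the way it was.

    pending_key is the dead key the caller believes is already buffered
    (assumed bookkeeping, since the real buffer is not readable).
    """
    result = kb.to_unicode(key)
    count = result[0]
    while count < 0:              # our call left a dead key pending: flush it
        count, _ = kb.to_unicode(key)
    if pending_key is not None:   # a dead key was pending before us: re-press it
        kb.to_unicode(pending_key)
    return result


def build_cache(keys):
    """Solution #2: precompute plain keys and every dead key + key pair."""
    kb, cache = KeyboardState(), {}
    for key in keys:
        cache[(None, key)] = peek(kb, key)
    for dead in DEAD_KEYS:
        for key in keys:
            kb.to_unicode(dead)                  # make the dead key pending
            cache[(dead, key)] = peek(kb, key)   # translate the pair; state ends clean
    return cache
```

A real implementation would pass virtual-key codes, scan codes, and a 256-byte key-state array to ToUnicode or ToUnicodeEx rather than single characters. The loop structure is what the model is meant to show. The nested loop in `build_cache` is also where the call count explodes on a real layout with several dead keys, many keys, and several shift states.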
One interesting thing about this vis-a-vis MSKLC is that on a keyboard with a half dozen dead keys I was easily making thousands of calls to ToUnicodeEx. But even on slower machines, this code runs pretty fast!
Not a great (well, not an easy) answer, but in the end there is no version of ToUnicode or ToUnicodeEx that does not have this kind of "context sensitive" issue. You have to consider every call to it in the context of the other calls (and keystrokes) that (a) happened before it and (b) will happen after it.
Think of it as yet another reason that dead keys are troublesome -- they are so non-intuitive that they pollute the surrounding technology and make various keyboarding APIs behave non-intuitively!
(Naushad, if you are reading this, then the answer to your second question is that you can't, and that the fact that you think you can is a temporary situation that you should not try to pursue since you are creating a legacy of barely searchable text that can have a bad impact on collation and rendering. There is no "magic character" to use here to get what you are looking for. You will really want, therefore, to abandon the scheme and move into a healthier direction that does not involve dead keys. Truly!)
This post brought to you by "Ą" (U+0104, a.k.a. LATIN CAPITAL LETTER A WITH OGONEK)
daniel on 15 Aug 2008 6:05 PM:
Hi,
Could you post a sample of [pseudo]code that implements solution #1 (keep calling ToUnicode)? I don't quite understand what changes when you call ToUnicode on a dead key and why it gets screwed up.
Thanks for the informative post! :)
Michael S. Kaplan on 15 Aug 2008 8:04 PM:
Actually, I already have, in blogs like this one and other ones like the series it points to...
daniel on 20 Aug 2008 1:22 AM:
Thanks for the linky! I'll take a look :)