What's wrong with this code?

by Michael S. Kaplan, published on 2007/02/12 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/02/12/1658006.aspx

Ok, another one of those 'what's wrong with this code?' posts, for your coding pleasure....

I won't even say what the code is supposed to do, but it should be fairly obvious what is being attempted (unsuccessfully) here, on several levels:

The one hint I will give is that the actual code one would need to accomplish what this code is trying to do is pretty complex, if you want to cover all supported scenarios. So I am not asking anyone to try to write the full, correct code here. I am mainly looking for people to point out the bad practices and flaws, as if one were reviewing this code either on any version of Windows or on any non-Win9x based version (your choice!).

I might do the full correct version of the code in a future post, if there is interest and if I can get over my fear of supporting a budding industry of keyloggers. :-)

This post brought to you by é (U+00e9, a.k.a. LATIN SMALL LETTER E WITH ACUTE)

There are so many problems, I don't even know which one you want to single out :-)

Here is round two:

1. The keyboard state is a "per task" thing.

So the code calls GetKeyboardState to retrieve the keyboard state of the current thread, then combines it with the GetKeyboardLayout of another thread.

2. ToUnicodeEx can return more than 2 characters, so the buffer is too small.

3. The buffer filled by ToUnicodeEx might not be null-terminated. Doc: "However, this buffer may be returned without being null-terminated even though the variable name suggests that it is null-terminated."

4. The -1 return value from ToUnicodeEx is not handled. From the doc: "If possible, even with Unicode keyboard layouts, the function has written a spacing version of the dead-key character to the buffer specified by pwszBuff."
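
To make points 2 to 4 concrete, here is a minimal sketch of how the result of a ToUnicodeEx-style call could be interpreted defensively. It is not the original code and not a full solution. The Win32 call itself is left out so the snippet compiles anywhere, and the names key_text and key_text_from_result, as well as the 16-character capacity, are hypothetical choices made for illustration. It also assumes the caller zero-filled the buffer before the call, so that a dead key for which nothing was written shows up as an empty result.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical result type for illustration; not part of the Win32 API. */
#define KEY_TEXT_MAX 16

typedef struct {
    int     is_dead_key;             /* nonzero when the call reported a dead key */
    size_t  length;                  /* number of characters copied into text */
    wchar_t text[KEY_TEXT_MAX + 1];  /* always null-terminated by the helper below */
} key_text;

/*
 * Interpret the outcome of a ToUnicodeEx-style call.
 *   rc  - the value the call returned
 *   buf - the buffer the call wrote into (may NOT be null-terminated)
 *   cap - the capacity of buf, in characters
 * Assumption: the caller zero-filled buf before making the call.
 */
key_text key_text_from_result(int rc, const wchar_t *buf, size_t cap)
{
    key_text out = { 0, 0, { 0 } };
    size_t n, i;

    assert(buf != NULL);

    if (rc < 0) {
        /* Dead key: a spacing version may have been written, "if possible". */
        out.is_dead_key = 1;
        n = (cap > 0 && buf[0] != 0) ? 1 : 0;
    } else {
        /* 0 means no translation; a positive value is a character count. */
        n = (size_t)rc;
    }

    /* Never trust the count beyond either buffer's capacity. */
    if (n > cap)          n = cap;
    if (n > KEY_TEXT_MAX) n = KEY_TEXT_MAX;

    for (i = 0; i < n; i++)
        out.text[i] = buf[i];
    out.text[n] = 0;  /* terminate explicitly; the source buffer may not be */
    out.length = n;
    return out;
}
```

The design choice is to rely on the returned count rather than on a terminator, and to carry the dead-key case as its own flag instead of folding it into "no characters".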

The worst problem: if you call this in a subclass procedure when processing WM_KEYDOWN (which is only reasonable), with an international key which is a dead char and is obtained with AltGr (Ctrl+Alt), such as the Swedish '~' key, this will screw up the dead char state.

For this not to happen, you need to call ToUnicode() twice (!!).

Yes, I know, ToUnicode() is not supposed to know about the deadchar state - well, not only does it know about it, it changes it. I've just been debugging my keyboard handling code (I sell a vi/vim emulator for Visual Studio, SQL Server, Word and Outlook) because a poor fellow Swedish programmer couldn't write any more destructors, and tracked it down to this, on XP SP2.

Ah, how I hate the windows input model :) I wrote a blog post about this a while ago, titled "What's broken in the WM_KEYDOWN/WM_CHAR input model", which is #2 on Google both for WM_CHAR and WM_KEYDOWN (don't ask me why):

http://blog.ngedit.com/2005/06/13/whats-broken-in-the-wm_keydownwm_char-input-model/

Exercise for the reader: write code to determine, when receiving a WM_KEYDOWN or WM_SYSKEYDOWN, whether this will be accompanied by a WM_CHAR afterwards or not. It seems simple, doesn't it? This is necessary if you want to provide reliable user-configurable remapping of all keypresses that doesn't depend on timers.

Believe me, this is the start of the road to hell. On US keyboards, it's only "very complex". If you add intl keyboards into the mix, you will suffer without limits, and end up discovering the dirty secrets of ToUnicode() and much more...

I'm preparing another article called "The Ultimate WM_CHAR/WM_KEYDOWN Chart from Hell", I'm sure it will be very popular. And maybe I can get that spot #1 before MSDN on both keywords.

Michael, congratulations on a great blog and a lot of useful info. Great that someone at Microsoft cares about the stuff, but I must say it certainly seems you are one of a very few.