Will the real Unicode character message please stand up?

by Michael S. Kaplan, published on 2007/03/25 03:05 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/03/25/1945659.aspx

Over in the microsoft.public.win32.programmer.international newsgroup, Norman Diamond asks:

*  The WM_UNICHAR message is equivalent to WM_CHAR, but it uses Unicode
*  Transformation Format (UTF)-32, whereas WM_CHAR uses UTF-16. It is
*  designed to send or post Unicode characters to ANSI windows

So both WM_UNICHAR and WM_CHAR use Unicode (though different varieties), but only one of these posts Unicode characters to ANSI windows?  The other one posts a non-Unicode Unicode, or what?

*  If wParam is not UNICODE_NOCHAR, return FALSE. The Unicode DefWindowProc
*  posts a WM_CHAR message with the same parameters and the ANSI
*  DefWindowProc function posts either one or two WM_CHAR messages with the
*  corresponding ANSI character(s).

So maybe the non-Unicode Unicode is one or two WM_CHAR messages with ANSI character(s) instead of UTF-16?

*  The WM_CHAR message uses Unicode Transformation Format (UTF)-16.

So the WM_CHAR message doesn't use ANSI.  Or applications aren't supposed to expect ANSI from WM_CHAR, they're only supposed to get surprised if WM_UNICHAR was handled by DefWindowProc and resulted in ANSI?

I'll admit the documentation could be clearer here, but the behavior Norman was most confused about is not too hard to unravel:

However, the text here around the wParam has some real problems too, though in defense of the doc writers for this case, the BEHAVIOR is quite confusing here. The full text is:


Specifies the character code of the key.

If wParam is UNICODE_NOCHAR and the application processes this message, then return TRUE. The DefWindowProc function will return FALSE (the default).

If wParam is not UNICODE_NOCHAR, return FALSE. The Unicode DefWindowProc posts a WM_CHAR message with the same parameters and the ANSI DefWindowProc function posts either one or two WM_CHAR messages with the corresponding ANSI character(s).

And the return value info is:

An application should return zero if it processes this message.

Now all of this resembles English but it is a bit more complicated. :-)

First of all, the wParam info talks about returning FALSE or TRUE, while the return value info keeps you focused on whether the return is going to be 0 or not.
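To make the two vocabularies line up, here is a minimal sketch of what the quoted text seems to ask of a window procedure, read literally: TRUE (nonzero) for the UNICODE_NOCHAR probe, and 0 (FALSE) for an actual character that was processed. The function name on_unichar and the variable last_char are hypothetical, and the two constants are copied from the Windows SDK headers only so that the snippet compiles without <windows.h>. A real handler would sit inside the WM_UNICHAR case of a WNDPROC.

```c
#include <stdint.h>

/* Values as defined in the Windows SDK headers, repeated here so the
   sketch builds on its own. */
#define WM_UNICHAR     0x0109u
#define UNICODE_NOCHAR 0xFFFFu

/* Hypothetical storage for the most recent character received. */
uint32_t last_char = 0;

/* Hypothetical handler for the WM_UNICHAR case of a window procedure.
   The return value is what the window procedure would return. */
long on_unichar(uint32_t wParam)
{
    if (wParam == UNICODE_NOCHAR) {
        /* Not a character, just the question "do you handle
           WM_UNICHAR?". Answer TRUE (nonzero). */
        return 1;
    }

    /* An actual UTF-32 character: consume it and return 0 (FALSE),
       which matches "return zero if it processes this message". */
    last_char = wParam;
    return 0;
}
```

A window that does not handle the message at all would instead pass it along to DefWindowProc, which per the quoted text returns FALSE for the probe.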

Second of all, if it is a supplementary character, a Unicode WNDPROC will get two WM_CHAR messages, not one.
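The reason there are two is just the UTF-16 encoding form: anything above U+FFFF needs a surrogate pair. The sketch below shows the standard split of one UTF-32 value into UTF-16 code units. The function name utf32_to_utf16 is made up for illustration, and this is not a claim about how DefWindowProc is implemented internally, only about the values that would have to come out the other end.

```c
#include <stdint.h>

/* Splits one UTF-32 code point into UTF-16 code units.
   Writes 1 or 2 units into out[] and returns how many were written,
   or 0 if cp is not a valid Unicode scalar value. */
int utf32_to_utf16(uint32_t cp, uint16_t out[2])
{
    /* Reject values past U+10FFFF and lone surrogate code points. */
    if (cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return 0;

    /* BMP characters fit in a single code unit. */
    if (cp < 0x10000u) {
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary characters: 20 bits split into two 10-bit halves. */
    cp -= 0x10000u;
    out[0] = (uint16_t)(0xD800u + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00u + (cp & 0x3FFu)); /* low surrogate  */
    return 2;
}
```

For U+10801 this gives the code units 0xD802 and 0xDC01, which would be the wParam values of the two WM_CHAR messages.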

But what is the point of the rest of the text?

What it is really trying to say is that the only time it makes sense to return TRUE (nonzero) is when what was passed was not a character (wParam is UNICODE_NOCHAR) and you do handle WM_UNICHAR. Otherwise, you should always return 0 (FALSE) for this message.

But how easy is it to glean that from the text given? Not very....

1 - To be perfectly honest, it is unclear to me how often this will be true for non-CJK supplementary characters, since the text that goes through user/userk keyboard layout dlls is using two UTF-16 code units in the form of a "keyboard ligature" and whether it is smart enough to make those two UTF-16 code units into one UTF-32 code point is an unknown....
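For reference, the combining step itself is simple arithmetic; whether anything on the keyboard input path actually performs it is the unknown part. A minimal sketch, with the function name utf16_pair_to_utf32 and the INVALID_PAIR sentinel both invented for this example:

```c
#include <stdint.h>

/* Hypothetical sentinel for "these two units are not a surrogate pair". */
#define INVALID_PAIR 0xFFFFFFFFu

/* Combines a high and a low surrogate into one UTF-32 code point.
   Returns INVALID_PAIR if the inputs are not a well-formed pair. */
uint32_t utf16_pair_to_utf32(uint16_t hi, uint16_t lo)
{
    if (hi < 0xD800u || hi > 0xDBFFu)   /* not a high surrogate */
        return INVALID_PAIR;
    if (lo < 0xDC00u || lo > 0xDFFFu)   /* not a low surrogate */
        return INVALID_PAIR;

    /* Each surrogate carries 10 bits; the result is offset by 0x10000. */
    return 0x10000u
         + ((uint32_t)(hi - 0xD800u) << 10)
         + (uint32_t)(lo - 0xDC00u);
}
```

Fed the pair 0xD802, 0xDC01 it returns 0x10801.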



This post brought to you by 𐠁 (U+10801, a.k.a. CYPRIOT SYLLABLE E)


