It's not just returning 1; it's returning -1

by Michael S. Kaplan, published on 2007/03/05 11:25 -05:00, original URI:

A not-so-uncommon question people ask is why ToUnicode and ToUnicodeEx return -1 when the string that they return is a dead key.

I mean, especially since the return value in all other cases gives you an actual size of the buffer that is returned. Doesn't it seem kind of weird or unsafe to not do that correctly?

Well, not so much. It is okay.

The problem is the same one that causes WM_CHAR and WM_DEADCHAR to be two different messages -- the fact that they mean two very different things.

When a dead key has been pressed, no actual input is really ready for entry anywhere -- on Windows you see nothing at all when you press the dead key, so you really want whatever might tell you about the input to differentiate it from the times that there is something that should be showing up.
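To make the distinction concrete, here is a minimal sketch in C of how a caller might sort the return value into its three cases. The enum and function names are made up for illustration and are not part of the Windows API. The sketch takes the return value as a plain int so that it stands on its own, and it assumes the caller wants to treat any negative value as a dead key rather than testing for -1 exactly.

```c
#include <stddef.h>

/* Hypothetical names for illustration; none of these exist in the Windows API. */
typedef enum {
    KEY_NO_TRANSLATION, /* rc == 0: no translation for this key in the current state */
    KEY_DEAD,           /* rc <  0: dead key; nothing should be displayed yet */
    KEY_TEXT            /* rc >  0: rc UTF-16 code units are ready in the buffer */
} key_result;

/* Classify the int returned by ToUnicode/ToUnicodeEx. */
key_result classify_tounicode_result(int rc)
{
    if (rc < 0)  return KEY_DEAD;
    if (rc == 0) return KEY_NO_TRANSLATION;
    return KEY_TEXT;
}

/* Number of code units that should actually be shown to the user.
   A dead key contributes none, even though the buffer holds a character. */
size_t visible_units(int rc)
{
    return rc > 0 ? (size_t)rc : 0;
}
```

A caller would pass the function's return value straight in and only insert text when the result is KEY_TEXT, which mirrors the WM_CHAR versus WM_DEADCHAR split.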

I always thought of the negative value as meaning that there is ONE character, but it is NOT visible. And in the current implementation there is only one WCHAR returned. Though the text is a bit optimistic in the documentation:

The specified virtual key is a dead-key character (accent or diacritic). This value is returned regardless of the keyboard layout, even if several characters have been typed and are stored in the keyboard state. If possible, even with Unicode keyboard layouts, the function has written a spacing version of the dead-key character to the buffer specified by pwszBuff. For example, the function writes the character SPACING ACUTE (0x00B4), rather than the character NON_SPACING ACUTE (0x0301).

Notice that claim in the last sentence? While completely true of most Microsoft-provided keyboards, it is not something that is enforced; it is not even currently a validation warning in MSKLC (though now that I think about it maybe it should be in a future version, if for no other reason than the documentation is making such a bold claim about the matter).
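Since nothing enforces the spacing form, a cautious caller might check what actually came back for a dead key. The following is only a sketch under some loud assumptions: the helper names are hypothetical, the range check covers just the Combining Diacritical Marks block (U+0300 through U+036F) rather than every combining character in Unicode, and the mapping table is a tiny hand-picked sample, not anything that Windows or MSKLC provides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the UTF-16 code unit lies in the
   Combining Diacritical Marks block (U+0300..U+036F).
   This is a partial check; other combining ranges are not covered. */
bool is_basic_combining_mark(uint16_t unit)
{
    return unit >= 0x0300 && unit <= 0x036F;
}

/* Hypothetical helper: map a few common non-spacing marks to their
   spacing counterparts. Anything not in this small sample table is
   returned unchanged. */
uint16_t spacing_form(uint16_t unit)
{
    switch (unit) {
    case 0x0300: return 0x0060; /* COMBINING GRAVE ACCENT      -> GRAVE ACCENT */
    case 0x0301: return 0x00B4; /* COMBINING ACUTE ACCENT      -> ACUTE ACCENT */
    case 0x0302: return 0x005E; /* COMBINING CIRCUMFLEX ACCENT -> CIRCUMFLEX ACCENT */
    case 0x0303: return 0x007E; /* COMBINING TILDE             -> TILDE */
    case 0x0308: return 0x00A8; /* COMBINING DIAERESIS         -> DIAERESIS */
    default:     return unit;   /* no known spacing form in this sample table */
    }
}
```

With a layout that hands back U+0301 for its acute dead key, spacing_form would give U+00B4 for display purposes, while something like U+031B simply passes through untouched.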

Now imagine one day there was some kind of attempt to address that whole limitation on chained dead keys I talked about previously. Maybe you would want ToUnicode and ToUnicodeEx to return everything in the "state buffer" that was pending, in which case it could sometimes need to return -2 or -3 or whatever.

On the other hand that would probably break everyone using the function, right?

Well, unless some special wFlags value was passed to indicate that the behavior change was wanted, of course....
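Purely as a thought experiment, here is a sketch in C of how a defensive caller could read such a return value. To be clear, this is speculation: the function does not behave this way today, the helper name is hypothetical, and the idea that the magnitude of a negative result counts the pending code units is an assumption made only for this example.

```c
#include <stddef.h>

/* Hypothetical helper: under the imagined extension, a negative return
   value -n would mean "n pending dead-key code units in the buffer".
   Returns that count, clamped to the capacity of the caller's buffer.
   Non-negative return values mean nothing is pending. */
size_t pending_dead_units(int rc, size_t buffer_capacity)
{
    if (rc >= 0)
        return 0;

    /* Negate in a wider type so that even INT_MIN cannot overflow. */
    unsigned long long n = (unsigned long long)(-(long long)rc);

    return n < buffer_capacity ? (size_t)n : buffer_capacity;
}
```

Code written this way keeps working with today's -1 (one pending code unit), and the clamp means it would never read past the buffer even if a larger magnitude showed up one day.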

For now, we'll assume that no one in a position to accomplish the impossible is actually going to attempt to do so, and just live with that -1. And not try to come up with a changed model definition of why it works the way it does. We'll just shake our heads at the weird wFlags description that is worded as if there was a plan to extend the function some day, and move along.

Now if we could just get a "don't muck with the state" flag so that calling the function could be optionally used to probe the contents without actually affecting the buffer. :-)


This post brought to you by  ̛ (U+031b, a.k.a. COMBINING HORN)


