by Michael S. Kaplan, published on 2011/04/28 16:26 +02:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/04/28/10159111.aspx
The Sally Kellerman Addition To The Dead Keys Conundrum: An Encyclopedia Brown Mystery
After I wrote up The Dead Keys Conundrum: An Encyclopedia Brown Mystery and Solution: The Dead Keys Conundrum: An Encyclopedia Brown Mystery (in response to issues first raised in Chain Chain Chain, Chain of Dead Keys), the matter might have seemed closed. In most cases in the world of Encyclopedia Brown, the mystery would be solved and it would be on to the next mystery, in the next chapter (they usually seemed to come bundled in tens, if memory serves).
However, Sally Kellerman would occasionally outsmart Encyclopedia Brown, and be able to shed a bit of additional light when he was stumped by something.
He never minded this, since they were friends. She had already beaten up the bully who tried to beat Encyclopedia Brown up (Bugs Meaney), so if she occasionally proved herself to also be smart then he didn't mind.
In this particular mystery, The Dead Keys Conundrum, Van Anderson has (perhaps unintentionally) filled the Sally Kellerman role in the mystery, in a comment to the "solution" blog:
You say the only option you don't have is to throw away the keystroke itself, but would it not work to define all of your garbage sequences to NULL? I may be wrong - it wouldn't be the first time - but defining all your composites as U+0000 should leave no mark in the text stream, right?
I was initially skeptical -- not because it was a bad idea (since after all it wasn't), but due to fear of the undocumented and the fact that perhaps it would be a bad idea to depend on the behavior if it were to change (as undocumented behavior is occasionally wont to do).
But that answer feels a little unsatisfying. So I decided to dig a little.
I was doing a build of a depot and I was only 6% through refreshing my publics, so as any developer from Microsoft could tell you, I had a few moments.
First step was to create a keyboard with some NULL (U+0000) characters defined -- both as regular keys and as dead key results.
When I did this and tested out the keyboard, I found that in both cases nothing was inserted and the system beeped.
Sounds familiar, doesn't it? :-)
I'll give you a hint -- take a look at Short-sighted text processing #1: Uniscribe filters nothing.
This behavior of inserting nothing and beeping is how the behavior incorrectly attributed to Uniscribe is accomplished -- the EDIT control code replaces the text in the stream with a U+0000, which the underlying system refuses to insert, and beeps.
To be strictly accurate, the valid text is always null terminated, and thus in this case what is seen is a string of zero length which is ordinarily not expected. But the behavior is the same so that detail is not strictly necessary here. :-)
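The insert-nothing-and-beep behavior can be modeled in a few lines. This is only a toy Python sketch, not Windows or EDIT control code: the class `ToyInputStream` and its `insert` method are hypothetical names, and the rule that text is cut at the first U+0000 is an assumption based on the null-termination remark above.

```python
NUL = "\u0000"


class ToyInputStream:
    """Toy model of a text field. NOT the real Windows input stack."""

    def __init__(self):
        self.text = ""   # what ends up in the document
        self.beeps = 0   # how many times the "system" beeped

    def insert(self, chars):
        # Assumption: the incoming text is treated as null terminated,
        # so everything from the first U+0000 onward is ignored.
        effective = chars.split(NUL, 1)[0]
        if not effective:
            # Zero-length string: nothing to insert, so beep instead.
            self.beeps += 1
            return
        self.text += effective


stream = ToyInputStream()
stream.insert("a")   # an ordinary keystroke
stream.insert(NUL)   # a key (or dead key result) defined as U+0000
```

In this model the second call leaves `stream.text` untouched and only bumps `stream.beeps`, which is the "no mark in the text stream" outcome Van was asking about.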
Changing this behavior would be a significant potential backcompat problem, and although the literal cause of the behavior (inserting a NULL into the input text stack) is not specifically documented, there is at a minimum some behavior that is using this undocumented underlying implementation detail to support a documented feature.
Supporting the documented behavior while changing the details of the undocumented behavior is problematic and likely not feasible. Plus you could unintentionally break someone else's assumptions.
All of my prior, other complaints about the behavior I described in Short-sighted text processing #1: Uniscribe filters nothing (e.g. that text you display but didn't type that does show up) would not apply here -- we are talking only of keyboards.
It is true that MSKLC has problems with loading the keyboard I created in MSKLC to test the behavior above:
but this makes sense -- I had no idea about the behavior and so the code was never expecting this thing that I had no reason to expect. Had I known it then I probably would have not only recalled it when the mystery blogs unfolded but I would likely have written the behavior up years ago.
I never knew about it.
Now in the case of invalid dead keys, it is (admittedly) slightly unwieldy (as I mentioned in Solution: The Dead Keys Conundrum: An Encyclopedia Brown Mystery, you would have to define every type-able character on the keyboard in each dead key table), but the behavior of beeping and typing nothing is infinitely preferable to inserting the wrong character (or even worse two wrong characters) into text, when the user has no way to connect what they typed with what was inserted.
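The "define every type-able character in each dead key table" chore is mechanical enough to script. The sketch below is a hypothetical helper, not anything MSKLC provides: `pad_dead_key_table`, the sample grave-accent mappings, and the tiny `typeable` set are all made up for illustration. It keeps the valid composites and maps every other type-able character to U+0000.

```python
NUL = "\u0000"


def pad_dead_key_table(valid, typeable):
    """Return a dead key table covering every type-able character.

    valid    -- dict of base character -> composed character
    typeable -- iterable of every character the layout can type
    Characters with no valid composite are mapped to U+0000.
    """
    table = {base: NUL for base in typeable}
    table.update(valid)  # real composites override the NUL filler
    return table


# Hypothetical sample data: a grave-accent dead key on a four-character layout.
grave = {"a": "\u00e0", "e": "\u00e8"}
table = pad_dead_key_table(grave, "aeq1")

# List the padded table as hex pairs (an ad hoc format, not the .klc syntax).
for base, result in sorted(table.items()):
    print(f"{ord(base):04x} -> {ord(result):04x}")
```

On a real layout `typeable` would be every character reachable in any shift state, which is where the unwieldiness comes from.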
If anyone ever did their own input stack and they didn't handle the NULL the same way, then it is possible that they will break many keyboards beyond the fancy chained dead key one -- like the four Thai keyboards, for example. I'd be curious what WPF controls hosted in WinForms using Uniscribe do, for example. I assume they aren't screwing up the text or inserting random NULLs or someone would have reported that bug by now. :-)
Perhaps some future version of MSKLC could fix all of these problems/limitations/bugs:
Mentally I have halfway worked through how I might approach all of the above; if I thought there was a chance any of it would happen I'd write it up for either me to do or someone else to do. But that seems pretty unlikely (chained dead keys are just pretty esoteric, so even if there were MSKLC plans I'd imagine these to not be seriously considered).
Perhaps they could put MSKLC on CodePlex (something I had several people suggest to me last week) -- I'd likely contribute, in that case. I doubt they'd mind. :-)
In any case, I hope Van is not too offended by my analogy, since I really did appreciate his "Sally Kellerman" role here that pushed me to give the better answer.
Which I think this often can be!