The Sally Kimball Addition To The Dead Keys Conundrum: An Encyclopedia Brown Mystery

by Michael S. Kaplan, published on 2011/04/28 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/04/28/10159111.aspx


After I wrote up The Dead Keys Conundrum: An Encyclopedia Brown Mystery and Solution: The Dead Keys Conundrum: An Encyclopedia Brown Mystery (in response to issues first raised in Chain Chain Chain, Chain of Dead Keys), that would ordinarily have been the end of it. In most cases in the world of Encyclopedia Brown, the mystery would be solved and it would be on to the next mystery, in the next chapter (they usually seemed to come bundled in tens, if memory serves).

However, on at least one occasion, Sally Kimball would actually outsmart Encyclopedia Brown and shed a bit of additional light when he was stumped by something.

He never minded this, since they were friends. She had already beaten up Bugs Meany, the bully who tried to beat Encyclopedia Brown up, so if she occasionally proved herself to also be smart then he didn't mind.

In this particular mystery, The Dead Keys Conundrum, Van Anderson has (perhaps unintentionally) filled the Sally Kimball role, in a comment to the "solution" blog:

You say the only option you don't have is to throw away the keystroke itself, but would it not work to define all of your garbage sequences to NULL? I may be wrong - it wouldn't be the first time - but defining all your composites as U+0000 should leave no mark in the text stream, right?

I was initially skeptical -- not because it was a bad idea (since after all it wasn't), but due to fear of the undocumented and the fact that perhaps it would be a bad idea to depend on the behavior if it were to change (as undocumented behavior is occasionally wont to do).

But that answer feels a little unsatisfying. So I decided to dig a little.

I was doing a build of a depot and I was only 6% through refreshing my publics, so, as any developer from Microsoft could tell you, I had a few moments.

First step was to create a keyboard with some NULL (U+0000) characters defined -- both as regular keys and as dead key results.

When I did this and tested out the keyboard, I found that in both cases nothing was inserted and the system beeped.

Sounds familiar, doesn't it? :-)

I'll give you a hint -- take a look at Short-sighted text processing #1: Uniscribe filters nothing.

This behavior of inserting nothing and beeping is how the behavior incorrectly attributed to Uniscribe is accomplished -- the EDIT control code replaces the text in the stream with a U+0000, which the underlying system refuses to insert, and beeps.

To be strictly accurate, the valid text is always null terminated, and thus in this case what is seen is a string of zero length which is ordinarily not expected. But the behavior is the same so that detail is not strictly necessary here. :-)
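As a small illustration of that null-termination detail, here is a sketch using Python's ctypes to build a wchar_t buffer and read it back the way a null-terminated reader would. The helper name visible_text is hypothetical, and this is only a model of the idea, not the EDIT control's actual code.

```python
import ctypes


def visible_text(chars: str) -> str:
    """Return what a null-terminated reader would see for `chars`."""
    # create_unicode_buffer stores `chars` followed by a terminating NUL.
    buf = ctypes.create_unicode_buffer(chars)
    # .value stops at the first NUL, much as a C string function would.
    return buf.value


if __name__ == "__main__":
    print(repr(visible_text("a")))       # an ordinary character
    print(repr(visible_text("\u0000")))  # a lone U+0000
```

Under this model, a "string" consisting of just U+0000 is indistinguishable from a string of zero length.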

Changing this behavior would be a significant potential backcompat problem, and although the literal cause of the behavior (inserting a NULL into the input text stack) is not specifically documented, there is at a minimum some behavior that is using this undocumented underlying implementation detail to support a documented feature.

Supporting the documented behavior while changing the details of the undocumented behavior is problematic and likely not feasible. Plus you could unintentionally break someone else's assumptions.

All of my prior, other complaints about the behavior I described in Short-sighted text processing #1: Uniscribe filters nothing (e.g. that text you display but didn't type that does show up) would not apply here -- we are talking only of keyboards.

It is true that MSKLC has problems with loading the keyboard I created in MSKLC to test the behavior above:

but this makes sense -- I had no idea about the behavior and so the code was never expecting this thing that I had no reason to expect. Had I known it then I probably would have not only recalled it when the mystery blogs unfolded but I would likely have written the behavior up years ago.

I never knew about it.

Now in the case of invalid dead keys, it is (admittedly) slightly unwieldy (as I mentioned in Solution: The Dead Keys Conundrum: An Encyclopedia Brown Mystery, you would have to define every type-able character on the keyboard in each dead key table), but the behavior of beeping and typing nothing is infinitely preferable to inserting the wrong character (or even worse two wrong characters) into text, when the user has no way to connect what they typed with what was inserted.
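To give a feel for the "define every type-able character in each dead key table" chore, here is a minimal sketch that fills in one dead key's table. Everything in it is an assumption made for illustration: the function names, the sample grave accent composites, and the row layout (pairs of hex code points), which is not claimed to match any MSKLC file format. It also assumes each base and result is a single UTF-16 code unit.

```python
NUL = "\u0000"


def fill_dead_key_table(valid: dict, typeable: str) -> dict:
    """Map every typeable character to a result for one dead key.

    Characters with a real composite keep it; every other character
    maps to U+0000 so the invalid sequence inserts nothing.
    """
    return {base: valid.get(base, NUL) for base in typeable}


def format_rows(table: dict) -> list:
    """Render each (base, result) pair as tab-separated hex code points."""
    return [f"{ord(base):04x}\t{ord(result):04x}"
            for base, result in table.items()]


if __name__ == "__main__":
    # Hypothetical data: a grave accent dead key on a tiny keyboard.
    grave = {"a": "\u00e0", "e": "\u00e8"}
    for row in format_rows(fill_dead_key_table(grave, "aebz")):
        print(row)
```

Generating the table this way keeps the unwieldy part mechanical: only the valid composites are maintained by hand, and the U+0000 entries are derived from the list of typeable characters.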

If anyone ever did their own input stack and they didn't handle the NULL the same way, then it is possible that they will break many keyboards beyond the fancy chained dead key one -- like the four Thai keyboards, for example. I'd be curious what WPF controls hosted in WinForms using Uniscribe do, for example. I assume they aren't screwing up the text or inserting random NULLs or someone would have reported that bug by now. :-)
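For anyone writing their own input handling, a toy sketch of treating U+0000 the way described above (insert nothing, signal a beep) might look like the following. The TextBuffer class and its on_char method are hypothetical stand-ins, not a real control's API, and the beep is only counted rather than sounded.

```python
class TextBuffer:
    """Toy stand-in for a custom edit control's text storage."""

    def __init__(self):
        self.text = ""
        self.beeps = 0

    def on_char(self, ch: str) -> bool:
        """Handle one incoming character; return True if it was inserted."""
        if ch == "\u0000":
            # Refuse the NUL: insert nothing and note that a beep is due.
            # A real control might ask the system to beep at this point.
            self.beeps += 1
            return False
        self.text += ch
        return True


if __name__ == "__main__":
    buf = TextBuffer()
    for ch in "a\u0000b":
        buf.on_char(ch)
    print(repr(buf.text), buf.beeps)
```

The point of the check is simply that the NUL never reaches the stored text, so a keyboard that emits U+0000 for a bad sequence leaves no mark.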

Perhaps some future version of MSKLC could fix all of these problems/limitations/bugs:

Mentally I have halfway worked through how I might approach all of the above; if I thought there was a chance any of it would happen I'd write it up for either me to do or someone else to do. But that seems pretty unlikely (chained dead keys are just pretty esoteric, so even if there were MSKLC plans I'd imagine these to not be seriously considered).

Perhaps they could put MSKLC on CodePlex (something I had several people suggest to me last week) -- I'd likely contribute, in that case. I doubt they'd mind. :-)

In any case, I hope Van is not too offended by my analogy, since I really did appreciate his "Sally Kimball" role here that pushed me to give the better answer.

Which I think this often can be!


Van on 28 Apr 2011 2:03 PM:

Oooh! Uniscribe even BEEPS at you! Never in my wildest dreams would I have dared to imagine...

As for the Sally Kellerman reference, I am not offended; I have never read an Encyclopedia Brown mystery, so I really have no idea whether it is a reference to be offended by.

Having checked on Wikipedia, it appears that Sally Kellerman was the actress who played "Hot Lips" Houlihan in the movie M*A*S*H. Sally /Kimball/ is the compatriot of Encyclopedia Brown.

Michael S. Kaplan on 28 Apr 2011 3:19 PM:

Ah yes, Sally Kimball! I guess I should not do my pop culture refs from memory....

Marc Durdin on 28 Apr 2011 7:21 PM:

Personally, I would expect this to go badly wrong in many applications which implement their own edit controls which may not do the same level of validity checking on WM_CHAR messages as the EDIT control does.

Michael S. Kaplan on 28 Apr 2011 11:32 PM:

Ah, but they would be bad now, because such messages exist today!


