by Michael S. Kaplan, published on 2011/04/23 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/04/23/10157359.aspx
Yesterday I wrote The Dead Keys Conundrum: An Encyclopedia Brown Mystery about how I figured out a way to solve several longstanding problems with keyboard layouts that had been considered by-design limitations years before I was working on keyboards.
In the blog, in the style of Encyclopedia Brown the teenage detective, I provided all of the clues that led me to the solution in the hopes that one of the 1000 or so page views would represent a reader who (upon knowing there was a solution and knowing the clues to the solution were there) would provide the solution.
Just like how we all knew that Encyclopedia Brown's cases always had a solution and thus if we could figure out the clues then we too could solve the mystery.
No one rose to the bait and provided even a guess, so perhaps my goals were unrealistic. But in any case, I will now explain the solution....
From the third problem listed in The keyboard does not do what I tell it to!:
One more -- similar to the last one but with a happier ending
This has been bugging me for months. I am not sure when it started, but any time I try to put an apostrophe into a document, nothing happens. Then if I hit the key again I get two of them.
I have to hit the backspace key to get what I wanted. So it takes three keystrokes to get me what should have taken one. Is this some sort of virus? Help!
Ah, no virus this time. However, it turns out that this person had installed the "United States - International" keyboard layout. This layout has the apostrophe as a dead key for an acute accent. And as I have said before, dead keys are not intuitive. In his case either the apostrophe and a space or uninstalling the layout were both okay options. He chose the latter since he did not need the international layout....
The dead key table of the APOSTROPHE on the US International keyboard is:
so when you hit APOSTROPHE nothing happens but it waits to see if you type in one of the characters in that BASE column; if you do then the character in the second column appears. If you do not, then you get the two characters that didn't go together at once -- next to each other. Thus for the United States - International keyboard, you get two apostrophes.
If you wanted to fix this problem of the two characters appearing, all you have to do is remember the principle: this only happens if the keystrokes are undefined. Thus all you have to do is add an entry here to convert it (in this case, perhaps by adding U+0027 as both the Base and Composite characters -- so typing two apostrophes gives you one apostrophe).
Now perhaps in this case it is just a workaround, but in other common cases where the user might expect a combo to work, you can make it work right -- it's a fix.
Another example might be in order.
Let's take a keyboard that provides the GRAVE ACCENT as a dead key for A/E/I/O/U.
The beginning of the dead key table is obvious, but then perhaps you don't want GRAVE ACCENT + LETTER Q to show up as `Q, and so on.
You can then set up the table like the following:
to have the last character you typed be the only character that shows up (as if you were filtering out the illegal combination by removing the bogus diacritic).
Or you could go in a different direction, such as converting the bogus combinations into a space:
Or you could be really outrageous, and make it a backspace:
The only option you don't have is to throw away the keystroke itself (the backspace is a very aggressive approach since it removes the previous letter -- a nice user hostile interface. :-)
Anyway, you get the point -- if you don't like two characters popping up on bogus combinations, then all you have to do is define the behavior for the bogus combinations.
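The three flavors above differ only in what goes into the Composite column for the bogus rows. A rough sketch in Python, under the assumption that the layout only has the basic Latin letters to worry about (the names build_table and resolve are made up for this example, not part of any tool):

```python
import string

GRAVE = "\u0060"  # GRAVE ACCENT, used here as the dead character

# The "obvious" beginning of the table: the real combinations.
VALID = {
    "a": "\u00e0", "e": "\u00e8", "i": "\u00ec", "o": "\u00f2", "u": "\u00f9",
    "A": "\u00c0", "E": "\u00c8", "I": "\u00cc", "O": "\u00d2", "U": "\u00d9",
}


def build_table(fallback):
    """Define every letter as a Base; bogus ones get fallback(base)."""
    table = {(GRAVE, base): comp for base, comp in VALID.items()}
    for letter in string.ascii_letters:
        if letter not in VALID:
            table[(GRAVE, letter)] = fallback(letter)
    return table


def resolve(dead, base, table):
    """Defined rows give the composite; undefined rows give both characters."""
    return table.get((dead, base), dead + base)


undefined = {(GRAVE, base): comp for base, comp in VALID.items()}
keep_base = build_table(lambda base: base)    # drop the bogus diacritic
to_space = build_table(lambda base: " ")      # bogus combination -> space
to_backspace = build_table(lambda base: "\b")  # bogus combination -> backspace

for name, table in [("undefined", undefined), ("keep base", keep_base),
                    ("space", to_space), ("backspace", to_backspace)]:
    print(name, repr(resolve(GRAVE, "Q", table)))
```

In every variant the valid rows are untouched; only the rows that used to be undefined change what appears.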
Kind of makes sense when you think about it -- and an entirely natural thought progression (the way to guard against "undefined behavior" is to define it).
Now in the end this piece of it is a parlor trick. I mean, even the ugly behavior isn't entirely strange if you ignore the case where you don't know it was a dead key. I mean, the two characters you typed are right there -- so maybe the old behavior isn't so bad!
So let's move into an area that is slightly more interesting, shall we?
Let's say you live in Finland and you are a huge advocate of the Finnish standard keyboard they created. The one that not only lets you type names from any EU language but also lets you type in other languages like Vietnamese (reportedly due to the immigrant population in country).
Now if you wanted to type a letter like
ậ
aka U+1ead (LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW)
Perhaps you would want to be able to type it like
CIRCUMFLEX + DOT BELOW + LATIN SMALL LETTER A
because you live in Finland and have been using dead keys for as long as you have been typing.
Well now that you have read Chain Chain Chain, Chain of Dead Keys, you know how to chain together the two dead keys to get to the one character.
But then we hit the problem -- you need there to be a CIRCUMFLEX + DOT BELOW character to stick in the table. And there isn't one in Unicode.
Perhaps you jumped into the idea of just using a PUA character. I mean, you convince yourself that you'll be adding the 150+ valid combinations and since you are defining all of those valid sequences no one will ever see the PUA characters standing in as pseudo characters for HORN + GRAVE and CIRCUMFLEX + TILDE and so on.
After defining the many dead keys, and even solving the Getting intermediate forms problem of canonically reordered sequences (above and below diacritics entered in the wrong order) by automatically mapping both orders to the right character, you then remember that any time you type an undefined sequence, the UTF-16 code points that you defined in the table get inserted.
And these pseudo characters you added as PRIVATE USE AREA characters might get inserted too.
I am definitely not a fan of putting random PUA into the world -- especially to define things that the user did not define themselves.
But didn't we just solve the problem of dealing with how combinations not defined in the dead key table are represented? Yes, we did!
For the cost of adding every single character in every shift state of the keyboard to the dead key table of every single dead key, you can create a humongous keyboard layout that guards against PUA leakage completely!
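Here is a minimal sketch in Python of both halves of that idea: the PUA leak and the brute-force guard. Everything specific in it is an assumption for illustration: U+E000 as the placeholder, U+005E and U+0323 as the dead characters, the tiny LAYOUT_CHARS list standing in for "every character in every shift state", and the helper names. It is a toy model, not the real keyboard stack.

```python
CIRC = "\u005e"  # assumed dead character for CIRCUMFLEX
DOT = "\u0323"   # assumed dead character for DOT BELOW
PUA = "\ue000"   # assumed placeholder for CIRCUMFLEX + DOT BELOW

# (pending dead char, typed char) -> (result, result is itself a dead char)
chained = {
    (CIRC, "a"): ("\u00e2", False),  # a with circumflex
    (DOT, "a"): ("\u1ea1", False),   # a with dot below
    (CIRC, DOT): (PUA, True),        # chain: still waiting for a base
    (DOT, CIRC): (PUA, True),        # wrong order maps to the same state
    (PUA, "a"): ("\u1ead", False),   # a with circumflex and dot below
}
DEAD_KEYS = {CIRC, DOT}


def type_keys(keys, table):
    out, pending = [], None
    for ch in keys:
        if pending is None:
            if ch in DEAD_KEYS:
                pending = ch
            else:
                out.append(ch)
            continue
        entry = table.get((pending, ch))
        if entry is None:
            out.append(pending + ch)  # undefined: pending char leaks out
            pending = None
        elif entry[1]:
            pending = entry[0]        # chained dead key: keep waiting
        else:
            out.append(entry[0])
            pending = None
    return "".join(out)


# Stand-in for every character in every shift state of the layout.
LAYOUT_CHARS = list("abcq ") + [CIRC, DOT]


def guard(table):
    """Define every otherwise undefined pair so only the last key shows up."""
    guarded = dict(table)
    for dead in (CIRC, DOT, PUA):
        for ch in LAYOUT_CHARS:
            guarded.setdefault((dead, ch), (ch, False))
    return guarded


leaky = type_keys([CIRC, DOT, "q"], chained)
safe = type_keys([CIRC, DOT, "q"], guard(chained))
print(PUA in leaky, PUA in safe)
```

The guarded table is much bigger than the original (that is the "humongous" part), but the valid chains still work and the placeholder never reaches the text.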
In fact the only time the user has a hint something is going on is if they are watching WM_DEADCHAR messages. But since you define the name of the key (remember how I told you to always define it!), you can make sure that a really inquisitive mind trying to understand the WM_DEADCHAR results will get their explanation from GetKeyNameText.
Now of course this still doesn't resolve the Vietnamese/Finnish problem completely, given the Harder intermediate forms of characters that are still going to be out there, that require more than one code point since no entirely precomposed character exists. But thankfully these cases are very rare (and not supported by the bulk of the various Vietnamese code pages, either).
In any case, in the less extreme cases you can now use chained dead keys when you need to in order to get the result you want....
Thanks Encyclopedia Brown, for solving another mystery!
Van on 23 Apr 2011 4:37 PM:
You say the only option you don't have is to throw away the keystroke itself, but would it not work to define all of your garbage sequences to NULL? I may be wrong - it wouldn't be the first time - but defining all your composites as U+0000 should leave no mark in the text stream, right?
Michael S. Kaplan on 23 Apr 2011 9:06 PM:
We don't really document whether or not that might ever insert a NULL in the stream, which would be very bad. So I'm not sure whether it would work or not.
Though if it does it does add an even better answer for many cases....
Michael S. Kaplan on 24 Apr 2011 7:16 PM:
Hey Van --
I checked with some people and it looks like it might be okay, perhaps even VERY okay. I will write up my findings in a day or three.....
Van on 25 Apr 2011 2:16 PM:
Thanks for the update, Michael. I had absolutely no doubt whatsoever that "we don't really document the behaviour" was going to be the end point for you; You've given us too much experience with delving into the bowels of an issue to believe you would - or could - do any less.