Can you tell me how Microsoft keyboard layouts define a ligature and by the way can it be changed?

by Michael S. Kaplan, published on 2015/08/07 13:57 +00:00, original URI:

Friend and colleague from my Unicode days Doug Ewell wrote to me via email:


Below is the latest salvo in an ongoing battle on the Unicode public list.

Does any version of Windows, from 3.1 to 10, support keyboard drivers with ligatures of more than four UTF-16 code points?

If so, where is this documented?


-- Doug Ewell | | Thornton, CO 🇺🇸

Now to be perfectly honest, I would have told almost anyone else to just get bent, assuming I bothered to respond at all, since this is the kind of question the lawyers would object to as "asked and answered". But I have known Doug for a long time, and I know he is one of the good guys, fighting the good fight.

So I decided to answer it in a blog post, the very next day. Here goes...

The Happy Hacking Keyboard makes for a great metaphor...

Short answer: NO. Rude answer: HELL NO. Explanations here, which references here. More insight into why follows...

A long time ago, a developer colleague I know and respect, who incidentally is still at Microsoft, wrote a header file that defined what a ligature was. He defined it as a sequence of 2-4 arbitrary UTF-16 code points. However, he realized that this was not, strictly speaking, how a linguist would define a ligature. So he added a // CONSIDER to the code, suggesting the NLS team might need a function that could provide a more appropriate definition.
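To make that 2-4 limit concrete, here is a minimal Python sketch that checks whether a string would fit in one such ligature. It is not the header file in question: the names MAX_LIGATURE_UNITS, utf16_units, and fits_in_ligature are hypothetical, and it assumes the limit is counted in 16-bit UTF-16 code units, so a character outside the BMP costs two.

```python
# Hypothetical name; the ceiling of 4 comes from the definition described above.
MAX_LIGATURE_UNITS = 4


def utf16_units(text: str) -> int:
    """Count 16-bit UTF-16 code units (not Unicode code points) in text."""
    # Little-endian encoding without a BOM: exactly 2 bytes per code unit.
    return len(text.encode("utf-16-le")) // 2


def fits_in_ligature(text: str) -> bool:
    """True if text is 2-4 UTF-16 code units long (assumed ligature range)."""
    return 2 <= utf16_units(text) <= MAX_LIGATURE_UNITS


if __name__ == "__main__":
    samples = {
        "two BMP letters": "ab",
        # Two regional indicators, each a surrogate pair.
        "US flag": "\U0001F1FA\U0001F1F8",
        # Three emoji joined by two ZERO WIDTH JOINERs.
        "family emoji": "\U0001F468\u200D\U0001F469\u200D\U0001F467",
    }
    for label, text in samples.items():
        print(label, utf16_units(text), fits_in_ligature(text))
```

Under these assumptions a flag like the one in Doug's signature just fits, while a ZWJ emoji sequence does not.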

As it turns out, there was no such function, nor any plans to add one, because it would have been a terrible idea: neither languages nor keyboards really work that way. Thus, we are spared an even more arbitrary limitation than the one that Doug is talking about here.

But this suggests that perhaps the ceiling could be raised in a future version of Windows.

In response, I have a few reasons why this will never ever happen in basic keyboard layouts:

• there are no more future versions of Windows;

• even if there were, such a keyboard layout would not be compatible with any prior version of Windows;

• allowing such a new thing would require significant changes to user32.dll and the console, and would likely lead to many regressions throughout Windows, Office, and products from other companies that depend on the current behavior;

• Microsoft may have a tendency to be kinda dumb sometimes, but it isn't stupid.

I am therefore reasonably certain that the limitation is not ever going to change, and people should just give up on it happening in regular keyboard layouts.

People who want input methods capable of handling more than four UTF-16 code points really need to look into IMEs (Input Method Editors), which are all now run through TSF (the Text Services Framework), a completely different system of input that allows such things, admittedly at the price of a lot of complexity.

I will now wait a few hours and then send Doug email in case he isn't following the SiaO2 Blog these days.... ;-)
