When complex scripts are not too complex

by Michael S. Kaplan, published on 2005/05/31 03:46 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/05/31/423343.aspx


Raymond Chen did a post yesterday entitled "You can't simulate keyboard input with PostMessage".

He did touch on the complicated language issues, saying:

First of all, keyboard input is a more complicated matter than those who imprinted on the English keyboard realize. Languages with accent marks have dead keys, Far East languages have a variety of Input Method Editors, and I have no idea how complex script languages handle input. There's more to typing a character than just pressing a key.

This is a subject I have covered a little bit before. The fact is that these APIs in the USER subsystem (from ToUnicode to SendInput and so on) all keep a certain amount of state: not just the state of the shift keys that Raymond mentioned, but also state recording that you have typed a dead key (so that when you type the next key, the system knows whether the dead key table contains the combination you have just typed). There is actually even an MSDN topic, entitled "About Keyboard Input", that helps describe some of this complex process.
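To make the dead-key state described above a little more concrete, here is a minimal sketch in Python. It is not how the USER subsystem is implemented and it calls no Windows API; the `KeyboardState` class, the `type_keys` helper, and the tiny `DEAD_KEY_TABLE` are all hypothetical names invented for illustration. The only point is that the result of a keystroke depends on what was typed before it.

```python
# Toy model of dead-key handling. Everything here is an illustrative
# assumption, not the actual Windows keyboard layout format.

# (dead key, base key) -> composed character
DEAD_KEY_TABLE = {
    ("'", "e"): "\u00e9",  # e with acute
    ("'", "a"): "\u00e1",  # a with acute
    ("`", "e"): "\u00e8",  # e with grave
}
DEAD_KEYS = {dead for dead, _ in DEAD_KEY_TABLE}


class KeyboardState:
    """Remembers a pending dead key between keystrokes."""

    def __init__(self):
        self.pending_dead_key = None

    def translate(self, key):
        """Return the text produced by one keystroke (may be empty)."""
        if self.pending_dead_key is None:
            if key in DEAD_KEYS:
                # Nothing is emitted yet; the dead key is only remembered.
                self.pending_dead_key = key
                return ""
            return key
        # A dead key is pending: consume it and look up the combination.
        dead, self.pending_dead_key = self.pending_dead_key, None
        composed = DEAD_KEY_TABLE.get((dead, key))
        if composed is not None:
            return composed
        # No entry in the table: emit the dead key's own character, then the key.
        return dead + key


def type_keys(keys):
    """Feed a sequence of keystrokes through one fresh KeyboardState."""
    state = KeyboardState()
    return "".join(state.translate(k) for k in keys)
```

In this sketch, `type_keys(["'", "e"])` yields a single composed character while `type_keys(["'", "x"])` falls back to the two separate characters. A faked keystroke that bypasses this kind of state, or arrives while a dead key is pending, would get a different result than the sender expected.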

And once you get into IMEs, the complex rules related to state that the IME must keep really boggle the imagination. It is probably easier to use the Input Method Manager (IMM) APIs to try to get input through an IME than to try and fake keystrokes....

Luckily, the one issue that is not really all that complex is complex scripts, because the actual issues that make them complex (bidirectional text, contextual shaping, line breaking, and illegal sequence checking) are all related to what happens to the text after you have typed the keystrokes -- the font linking and the rendering.
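A small sketch of why the input side stays simple: the keystrokes only produce a sequence of code points in the order typed, and the properties that shaping and bidirectional layout consult travel with each character. The Python below uses only the standard `unicodedata` module; the `describe` helper and the two sample strings are my own choices for illustration, not anything from Windows.

```python
import unicodedata


def describe(text):
    """List each stored code point with properties a renderer consults."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            unicodedata.category(ch),       # e.g. Lo = letter, Mc = spacing mark
            unicodedata.bidirectional(ch),  # e.g. L, R, AL
        )
        for ch in text
    ]


# Tamil "ko": stored as the consonant KA followed by VOWEL SIGN O,
# even though the vowel sign is drawn on both sides of the consonant.
TAMIL_KO = "\u0b95\u0bca"

# Hebrew ALEF then BET: stored in typing order, displayed right to left.
HEBREW_AB = "\u05d0\u05d1"

for label, text in (("Tamil", TAMIL_KO), ("Hebrew", HEBREW_AB)):
    print(label)
    for row in describe(text):
        print("  ", *row)
```

Nothing in the stored strings says where a glyph is drawn or in which direction a run flows; that is all decided later, when the text is rendered.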

Well, that and knowing what to type, of course! Being able to have words even look like they belong together in languages like Thai and Hindi and Tamil really requires either knowing the language or memorizing keystrokes. Which is the same as when dealing with IMEs (in my book I had a chapter that talked about keystroke combinations you could use to test IMEs; it was a lot of fun and I still get positive feedback, enough that I may start posting examples of stuff like that soon).

(Let me know if that sort of thing seems like it might be interesting -- I never know what people will find engaging here!)

 

This post brought to you by "ஂ" (U+0B82, a.k.a. TAMIL SIGN ANUSVARA)
(which is annotated in Unicode as "not used in Tamil" though several people who deal with Sanskrit in Tamil would beg to differ!)


# CN on 31 May 2005 11:47 AM:

This was interesting, it made me get around and test what happens if you press a dead-key + Alt-Tab + normal key in another window, or go back to your original window. Of course, this makes perfect sense if the state is kept in the same way as Shift. (BTW, that state, or worse, the Control state, getting out of sync isn't fun.)

# Ambarish Sridharanarayanan on 31 May 2005 5:02 PM:

Michkap wrote:
> (Let me know if that sort of thing seems like it might be interesting -- I never know what people will find engaging here!)

Oh, most certainly! Since I use Windows regularly to generate Devanagari and Tamil text, I'd love any discussion on this front. Topics I'd love to hear about include the Indic-language IMEs shipping in Windows. The Tamil/Devanagari keyboard layouts Windows ships with are close to useless for someone using US keyboards, and http://www.bhashaindia.com/Downloads/downloads.aspx doesn't quite have enough polish, IMHO.

# Suzanne McCarthy on 4 Jun 2005 7:04 PM:

I too find that I can't generate Tamil text from a Microsoft keyboard. This may not be surprising since I don't speak Tamil but I work with children who do.

I prefer to use Kuralsoft which offers visual order input or even better the online syllabic editor. Here is my school website and Tamil links for both.

http://livingstone.vsb.bc.ca/multiliteracy/languagelinks/tamillinks.htm

# Suzanne McCarthy on 6 Jun 2005 2:38 AM:

Hi, Just wanted to let you know that I have posted a couple of entries at

http://abecedaria.blogspot.com/

in response to yours, since keyboard input for complex scripts and CJK IMEs are something I have been trying out with children for a couple of years.
