Type my name, type my name\ You actin' kinda geeky, that IME is freaky\ Would your input change?

by Michael S. Kaplan, published on 2008/09/05 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/09/05/8925966.aspx

I admit I was tempted to really do up the Destiny's Child lyrics completely just like the title, but I decided if I ain't going to write the IME below that it would be kind of obnoxious to put that much time into a Yankovikian tribute like that....

Less than half a day ago, conradoplg asked in the Suggestion Box:

In one of your blogs you explain a method to type random Unicode points with a Unicode IME.

But this is driving me nuts for some time now. Is there a way to enter random Unicode points by their *names*? I can't possibly know the hex code of all the characters!

What I'm talking about is some kind of IME where you would type a character name and it would show a list of candidates where you could choose.

Is there such a thing? Or more importantly, could one (who doesn't work for Microsoft :) ) implement it? Is it possible to write an IME?


That just inspired me, so I pushed a bit of the rotation forward for it. :-)

Now there is no built-in way to do this, no.

But the Unicode Character Database, and more specifically the ~1.1 MB UnicodeData.txt, is available.
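
To give a feel for what is in that file, here is a minimal sketch in Python of pulling the names out of it. Each record in UnicodeData.txt is one line of semicolon-separated fields, with the hex code point first and the character name second. The `SAMPLE` text and the `load_names` helper are made up for illustration; a real run would read the downloaded file instead of a four-line string.

```python
# A few records in the UnicodeData.txt layout; only the first two fields are used here.
SAMPLE = """\
0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9
FBF9;ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM;Lo;0;AL;<isolated> 0626 0649;;;;N;;;;;
"""


def load_names(lines):
    """Map character name -> character for every record that has a real name."""
    names = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code, name = fields[0], fields[1]
        # Entries such as <control> or <CJK Ideograph, First> are placeholders,
        # not names anyone could type, so they are skipped.
        if name.startswith("<"):
            continue
        names[name] = chr(int(code, 16))
    return names


names = load_names(SAMPLE.splitlines())
for name, char in names.items():
    print(f"U+{ord(char):04X}  {name}")
```

Swapping `SAMPLE.splitlines()` for an open file handle on UnicodeData.txt should produce the full name table, which is the raw material any name-based input method would need.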

Writing an IME can be really difficult. But there is that simple Table Driven Text Service mechanism I document at length in this series.

I don't know if its underlying code has ever been tested with strings as long as some of the official character names in the UCD, and who knows what the performance would be like, but perhaps it would be worth generating the input method based on the names to see if it would work.

Worth a try at least, maybe. :-)
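
Setting the IME plumbing aside, the candidate-list part of the idea can be sketched in a few lines of Python. This is not the Table Driven Text Service and says nothing about how it would behave; it only shows the lookup such an input method would have to perform. It leans on Python's built-in `unicodedata` module, whose names come from whatever UCD version ships with the interpreter, and the `build_index` and `candidates` helpers are hypothetical names chosen for this example.

```python
import unicodedata


def build_index(first=0x0000, last=0xFFFF):
    """Collect (name, character) pairs for every named code point in a range."""
    index = []
    for cp in range(first, last + 1):
        # name() returns the default (None) for unnamed code points
        # such as controls, surrogates, and unassigned values.
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            index.append((name, chr(cp)))
    return index


def candidates(index, typed, limit=10):
    """Return up to `limit` characters whose name contains every typed word."""
    wanted = set(typed.upper().split())
    hits = [(name, ch) for name, ch in index if wanted <= set(name.split())]
    # Shorter names first, on the guess that they are the more common characters.
    hits.sort(key=lambda pair: (len(pair[0]), pair[0]))
    return hits[:limit]


latin1 = build_index(0x0000, 0x00FF)
for name, ch in candidates(latin1, "small e acute"):
    print(ch, name)
```

Matching on whole words in any order means nobody has to type the full official name, which matters given how long some of them are. Whether a real text service could do this kind of matching, rather than exact key lookup, is an open question.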

Now I don't know how useful it would be in practice given what so many of the names actually look like, but as long as we are keeping this all theoretical....


This blog brought to you by (U+FBF9, aka ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM, what I think off the top of my head is the longest name in the UCD)

Michael Madsen on 5 Sep 2008 10:51 AM:

Having just downloaded the file and checked, you're right. :)

Top 5:

John Cowan on 5 Sep 2008 12:59 PM:

An alternative possibility is to use the W3C's consolidated list of XML entity names (drawn from the HTML, MathML, and various ISO sets).  This is still in working-draft state, but the sources are pretty stable, and most of the names are short and fairly mnemonic.  It won't let you type text in arbitrary scripts, but it does handle a good many symbols.

See http://www.w3.org/TR/xml-entity-names for the writeup; the current actual list is at http://www.w3.org/2003/entities/2007/w3centities-f.ent .

conradoplg on 5 Sep 2008 4:21 PM:

Thanks for the answer!

I'll try to make something with the Table Driven Text Service, probably with just a few characters that I use often.
