Typing in random Unicode code points redux

by Michael S. Kaplan, published on 2005/07/18 17:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/07/18/439946.aspx


It was about two months ago when I pointed out a method for typing in random Unicode code points using the Unicode IME.

Well, Andrew over at http://www.fileformat.info (the cool provider of my Unicode character links!) has been getting feedback on this issue for a long time, such that his How to enter Unicode characters into Microsoft Windows page is, by his own report, the page that receives the most feedback.

Anyway, he wrote a small program that will make the entry easier -- the UnicodeInput Utility.

Handier in a lot of ways than that IME and with an easy mechanism for launch, it is a good generic answer to the question of how to enter potentially random code points. Check it out! :-)


# ivan on 19 Jul 2005 9:44 AM:

> (the cool provider of my Unicode character links!)
and it runs such an uncool os ;-)

# Michael S. Kaplan on 19 Jul 2005 10:04 AM:

I am an open-minded kind of guy. It is nice to know that people find the things that I say to be relevant whether they are running Windows or not....

Plus the fact that what the ISP uses to host your site has little to do with what someone runs on their own machines. :-)

# Ben Bryant on 19 Jul 2005 10:25 AM:

Just press Alt-GreyPlus, type your hex code point and it sends your character to whatever was active, very cool.
Michael, I'd like to point you to my free editor which allows you to type in any code point and convert it to the relevant character ("Convert Char Ref"), it also lets you right click on a character to see the code point and Unicode range ("Char Unicode"), and if you select an ANSI encoding via Tools Encoding menu (it sets an XML Declaration, but your document doesn't have to be XML) it will show you the corresponding ANSI hex byte encoding too. Unicode can be copied and pasted into and out of this editor all the way back to Win 95. It is not full featured, but anyway it is at firstobject.com/dn_editor.htm any comments welcome.

# Andrew on 19 Jul 2005 10:54 PM:

Of course, the first feedback I got on UnicodeInput was how to enter characters greater than U+FFFF.

All I'm doing is calling SendInput with wScan set to the codepoint. How should I do it for codepoints greater than 0xFFFF?

# Andrew on 19 Jul 2005 11:02 PM:

> and it runs such an uncool os ;-)
Heh. Like most of the world, I run multiple OS's. While the main server is (correctly) identified as Linux, it is a pretty safe bet that the dotNet-specific information for each Unicode character isn't coming from there.

# Michael S. Kaplan on 19 Jul 2005 11:32 PM:

Hi Andrew!

Two options:

1) You can take the high surrogate followed by the low surrogate -- this probably works now.

2) You can take a UTF-32 code point and convert it to UTF-16 and then do two records in a SendInput call.

Obviously #2 is cooler than #1. :-)
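
The conversion in option #2 is the standard UTF-16 encoding arithmetic. The following is a minimal sketch in Python, not the code UnicodeInput actually uses; the function name `to_utf16_units` is hypothetical. Each returned code unit would then presumably go into the `wScan` field of its own keyboard record (with the `KEYEVENTF_UNICODE` flag) in the array passed to SendInput.

```python
def to_utf16_units(code_point):
    """Return the UTF-16 code units (one or two ints) for a Unicode code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded on their own")
    if code_point < 0x10000:
        # BMP character: a single code unit, no surrogates needed.
        return [code_point]
    # Supplementary character: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


if __name__ == "__main__":
    # U+10400 DESERET CAPITAL LETTER LONG I
    print([hex(u) for u in to_utf16_units(0x10400)])  # ['0xd801', '0xdc00']
```

A code point at or below U+FFFF comes back as a single unit, so the same path can serve both cases and the caller only has to emit one record per returned unit.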

# Simon on 20 Jul 2005 6:03 AM:

I was the guy who asked Andrew about characters above U+FFFF. Will the application receive two keyboard events in both your options? I've found while experimenting with surrogate pairs in MSKLC that having one character split between two events can be a problem in some situations.

That might be a good interview question, no? What problems might that cause and how would you solve them?

# Michael S. Kaplan on 20 Jul 2005 10:05 AM:

Well, it is not like a keyboard can redo the way that Windows thinks about events, right? :-)

I have seen keyboards created for Old Italic, Deseret, and many other scripts where surrogate pairs were done in MSKLC, and never had problems reported with them, though. What kind of problems are you referring to, Simon?

# Simon on 20 Jul 2005 10:35 AM:

I was thinking of situations like incremental search where the first event could result in trying to process a buffer ending with a high surrogate, or forms with auto-tabbing where you could end up with the surrogate pair split between two entry fields.
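
Both scenarios come down to an application checking for surrogates before it acts on a buffer of UTF-16 code units. The sketch below, in Python, shows one way such checks might look; the function names are hypothetical and the buffer is assumed to be a plain list of integer code units.

```python
def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def ends_with_lone_high_surrogate(units):
    """True if the buffer ends mid-character, e.g. after the first of two events."""
    return bool(units) and is_high_surrogate(units[-1])


def safe_split_index(units, index):
    """Move a proposed split point back by one if it would cut a surrogate pair."""
    if (0 < index < len(units)
            and is_high_surrogate(units[index - 1])
            and is_low_surrogate(units[index])):
        return index - 1
    return index


if __name__ == "__main__":
    buffer = [0x41, 0xD801, 0xDC00]  # "A" followed by U+10400
    print(ends_with_lone_high_surrogate(buffer[:2]))  # True
    print(safe_split_index(buffer, 2))                # 1
```

An incremental search could defer processing while `ends_with_lone_high_surrogate` is true, and an auto-tabbing form could use `safe_split_index` when deciding where one field ends and the next begins.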

# Michael S. Kaplan on 20 Jul 2005 10:51 AM:

Well, that would be an application bug, one that would also fail for composite characters in languages. But in neither the case you bring up nor the composite one would it be a keyboard bug....

# Simon on 20 Jul 2005 10:58 AM:

Did I say it would be a keyboard bug? Application bugs are where I am coming from here, and finding them is what I intend to use Andrew's excellent utility for.

# Michael S. Kaplan on 20 Jul 2005 12:16 PM:

Fair enough, Simon. :-)

referenced by

2008/06/26 Using one Unicode input method at a time (Using the Unicode IME on XP)

2007/11/29 The UK Extended keyboard -- over-extended? Or weirdly extended?

2005/12/15 Handling [Unicode] input in the console
