Um…I've noticed you around…I find you very attractive…um…would you conjoin with me?

by Michael S. Kaplan, published on 2010/07/11 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/07/11/10033336.aspx


No, this blog with the title alluding to a hilarious and fun Touch & Go song is not a blog about this last Saturday (when I was enjoying the night life in Hyderabad). Or the Saturday before that (when I was enjoying the night life in Bangalore). No conjoining happened either of those nights by the way, I just drank and danced and had fun. But not that kind of fun. This blog is about something else entirely!

Not too long ago, Diane asked via the Contact link:

Good Day Sir, Sorry for the disturbance, but I have tried a lot of forums, but nobody seems to know the answer. I'm using WinXP and Korean Input System (IME 2002), my problem is that it gives me Hangul Compatibility Jamo (3130-318F) instead of regular Hangul Jamo (1100-11FF). Now I need regular Jamo because compatibility Jamo won't normalize correctly (programmatically using NormalizeString and NormalizeKC). I really need this for a project. What IME should I use or is there a way to set my current IME to output regular Jamo? btw, I'm not Korean, and I don't really understand the settings, but I tried (clicking various options) in vain. Thank you for reading.

No disturbance, but this is actually kind of by design.

You see, the IME team specifically designed the IME to work this way: even though a Korean user may not personally feel any disdain for the conjoining Jamo, the Korean IME provided by Microsoft should never emit them.

The IME team did not make the decision to behave that way on their own; it was based on the specific request from the Microsoft Korean subsidiary.

And the Microsoft Korean subsidiary did not make this request lightly, or with no input; it was based on the specific requirement of the Korean government.

You see, they don't care so much for the conjoining Jamo to be associated with the modern Hangul syllables.

And the Korean IMEs that have been shipping with all versions of Windows do not handle Old Hangul, which is the one case where they are okay with the use of conjoining Jamo.

For a somewhat irreverent take on the standards side of this, see You think herding cats is hard? Try herding CATS. Hell, try herding KATS!.

But anyway, all hope is not lost, Diane!

First there are some blogs about these other ways of doing Hangul, like One more thing about Korean...., for example.

Now in theory, if you normalize the individual compatibility Jamo (like U+3131) to either normalization form KC or KD, they will be converted to the conjoining Jamo (in the case of U+3131, to U+1100). The reason you would want to do the conversion on individual characters is that if you normalize full strings with Form KC, there are obviously cases where it would construct Hangul syllables....
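Here is a minimal sketch of that per-character conversion, using Python's standard `unicodedata` module rather than the Win32 `NormalizeString` call Diane mentions. The helper name `to_conjoining` is made up for illustration.

```python
import unicodedata

def to_conjoining(ch: str) -> str:
    """Normalize a single compatibility Jamo on its own (Form KD)."""
    return unicodedata.normalize("NFKD", ch)

kiyeok = "\u3131"  # HANGUL LETTER KIYEOK (compatibility Jamo)
a = "\u314F"       # HANGUL LETTER A (compatibility Jamo)

# One character at a time: each compatibility Jamo becomes a conjoining Jamo.
single = to_conjoining(kiyeok)
per_char = "".join(to_conjoining(c) for c in kiyeok + a)

# The whole string at once with Form KC: the two Jamo are composed
# into a precomposed Hangul syllable instead.
whole_kc = unicodedata.normalize("NFKC", kiyeok + a)

print([f"U+{ord(c):04X}" for c in single])    # U+1100
print([f"U+{ord(c):04X}" for c in per_char])  # U+1100, U+1161
print([f"U+{ord(c):04X}" for c in whole_kc])  # U+AC00
```

Form KD on the full string would also keep the Jamo separate; it is Form KC on full strings that starts building syllables.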

However, this is not the best solution, since you are going to find that the IME's compatibility Jamo forms don't distinguish between lead consonants and trailing consonants. Thus in the above example, both when you want U+1100 HANGUL CHOSEONG KIYEOK and when you want U+11a8 HANGUL JONGSEONG KIYEOK, you will only have that same U+3131 HANGUL LETTER KIYEOK to work with, which means you would have to know yourself whether you were dealing with the leading or the trailing consonant.
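A small sketch of that lead/trail problem, again assuming Python's `unicodedata` as a stand-in for whatever normalization API you are using. The input is the three compatibility Jamo someone would type when aiming for the syllable U+AC01.

```python
import unicodedata

# KIYEOK, A, KIYEOK as compatibility Jamo, intended as one syllable (U+AC01).
typed = "\u3131\u314F\u3131"

# Normalization always maps U+3131 to the leading form U+1100, so the
# final consonant never becomes the trailing form U+11A8.
result = unicodedata.normalize("NFKC", typed)

# With the correct trailing Jamo supplied by hand, composition works.
intended = unicodedata.normalize("NFC", "\u1100\u1161\u11A8")

print([f"U+{ord(c):04X}" for c in result])    # U+AC00, U+1100
print([f"U+{ord(c):04X}" for c in intended])  # U+AC01
```

The normalizer has no way to know the last U+3131 was meant as a trailing consonant, so that knowledge has to come from your own code.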

As I point out in We're off on the road to Korea! We certainly do get around..., the underlying keyboard does not distinguish between leading and trailing consonants -- all of that is done in the IME. So if you choose to not use its tricks you have to come up with tricks of your own!

The easiest answer is to let the IME do its composition work so you get full Hangul syllables, then decompose them. If you grab the character at any earlier stage, you will just be getting the wrong thing a lot of the time.
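A minimal sketch of the "compose first, decompose afterwards" approach, assuming the IME has already handed you a finished syllable and using Python's `unicodedata` for the decomposition. The function name `decompose_syllables` is hypothetical.

```python
import unicodedata

def decompose_syllables(text: str) -> str:
    """Decompose precomposed Hangul syllables into conjoining Jamo (Form D)."""
    return unicodedata.normalize("NFD", text)

syllable = "\uAC01"  # a finished syllable, as the IME would emit it
jamo = decompose_syllables(syllable)

# Because the IME already decided which consonant leads and which trails,
# the decomposition yields distinct choseong and jongseong code points.
for c in jamo:
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")
# U+1100 HANGUL CHOSEONG KIYEOK
# U+1161 HANGUL JUNGSEONG A
# U+11A8 HANGUL JONGSEONG KIYEOK
```

Keep in mind that Form D also decomposes any other precomposed characters in the string (accented Latin letters, for example), so you may want to apply it only to the Hangul runs.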

Note that this only applies to IMEs that work this way, like Microsoft's, where the IME's own "smarts" (and its design to always be constructing something when it is asked) get in the way of the scenario, and you have to replace them with your own smarts. An IME that emitted the conjoining Jamo in these interim phases would give what Diane is looking for, though I do not know of any that do that.


Marc Durdin on 11 Jul 2010 1:17 PM:

We have a Korean IME for Keyman and KeymanWeb that outputs the individual Jamo and then combines them if desired - an online version is at www.keymanweb.com/.../korean_rr

Michael S. Kaplan on 11 Jul 2010 4:23 PM:

That must thrill the government folks in Korea... :-)

Marc Durdin on 11 Jul 2010 11:05 PM:

It's just a tool...  Can be used for good or for standards.  It certainly has thrilled a few of our users.  :)

