by Michael S. Kaplan, published on 2007/02/20 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/02/20/1722221.aspx
I spent way too much time vacillating over which song title to use for this post's title -- whether the Morrissey cred (and my more than moderately transparent desire to appeal to people who would recognize it) really did outweigh the more obviously wide recognizability of the song from the Coke commercials. I finally decided while talking to a colleague that I could go with two titles, for reasons I'll explain in a moment. Other runners-up for the title were rejected, such as one from The Beatles (too sacred) and one from The Carpenters (probably even fewer people would recognize it!).
Ssang is a Korean word, the Hangul of which is 쌍 (U+c30d, a.k.a. HANGUL SYLLABLE SSANGSIOS A IEUNG). And don't think I didn't notice the fact that SSANG is inside the name for SSANG (HANGUL SYLLABLE SSANGSIOS A IEUNG), either -- it's like a weird form of onomatopoeia or maybe like the old joke "in order to understand recursion, you must first understand recursion." :-)
The meaning of the word is "pair," as in two. So if we take that Hangul syllable and split it into its constituent Jamo:
ᄊ U+110a, a.k.a. HANGUL CHOSEONG SSANGSIOS
ᅡ U+1161, a.k.a. HANGUL JUNGSEONG A
ᆼ U+11bc, a.k.a. HANGUL JONGSEONG IEUNG
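The split above can be reproduced with the arithmetic decomposition that Unicode defines for precomposed Hangul syllables. This is only a minimal Python sketch; the function name `split_syllable` is made up for illustration, and it handles modern precomposed syllables only, not Old Hangul sequences.

```python
import unicodedata

# Constants from the Unicode Hangul syllable decomposition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28

def split_syllable(syllable):
    """Split one precomposed modern Hangul syllable into conjoining Jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead = chr(L_BASE + index // (V_COUNT * T_COUNT))
    vowel = chr(V_BASE + (index % (V_COUNT * T_COUNT)) // T_COUNT)
    jamo = [lead, vowel]
    t_index = index % T_COUNT
    if t_index:  # index 0 means "no trailing consonant"
        jamo.append(chr(T_BASE + t_index))
    return jamo

jamo = split_syllable("\uc30d")
for ch in jamo:
    print(f"U+{ord(ch):04x} {unicodedata.name(ch)}")

# The arithmetic agrees with canonical decomposition (NFD).
assert "".join(jamo) == unicodedata.normalize("NFD", "\uc30d")
```

The NFD check at the end is the shortcut one would normally use; the arithmetic is spelled out only to show where the three code points come from.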
The word SSANG is used in the Jamo names to handle those doubled letters -- like to say that it isn't just ᄉ (SIOS); it's two of them or ᄊ (SSANGSIOS).
SSANG is used in the names of the doubled Jamo currently encoded in Unicode, including:
ᄁ U+1101, a.k.a. HANGUL CHOSEONG SSANGKIYEOK
ᄄ U+1104, a.k.a. HANGUL CHOSEONG SSANGTIKEUT
ᄈ U+1108, a.k.a. HANGUL CHOSEONG SSANGPIEUP
ᄊ U+110a, a.k.a. HANGUL CHOSEONG SSANGSIOS
ᄍ U+110d, a.k.a. HANGUL CHOSEONG SSANGCIEUC
ᄔ U+1114, a.k.a. HANGUL CHOSEONG SSANGNIEUN
ᄙ U+1119, a.k.a. HANGUL CHOSEONG SSANGRIEUL
ᅇ U+1147, a.k.a. HANGUL CHOSEONG SSANGIEUNG
ᅘ U+1158, a.k.a. HANGUL CHOSEONG SSANGHIEUH
ᆻ U+11bb, a.k.a. HANGUL JONGSEONG SSANGSIOS
ᇐ U+11d0, a.k.a. HANGUL JONGSEONG SSANGRIEUL
ᇮ U+11ee, a.k.a. HANGUL JONGSEONG SSANGIEUNG
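A list like this can be regenerated by scanning the Hangul Jamo block for character names whose Jamo part begins with SSANG. The sketch below is just one way to do it; the helper name `ssang_jamo` is hypothetical, and the result depends on the Unicode version bundled with Python's `unicodedata` module, so it may turn up doubled Jamo beyond the ones listed above.

```python
import unicodedata

def ssang_jamo(start=0x1100, end=0x11FF):
    """Return (code point, name) pairs for Jamo named as a single SSANG letter."""
    found = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if not name.startswith("HANGUL "):
            continue
        jamo_part = name.split()[-1]
        # Skip hyphenated clusters such as PIEUP-SSANGSIOS; keep plain doubles.
        if jamo_part.startswith("SSANG") and "-" not in jamo_part:
            found.append((cp, name))
    return found

for cp, name in ssang_jamo():
    print(f"{chr(cp)} U+{cp:04x}, a.k.a. {name}")
```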
(You may recall that I talked in Traditional vs. Modern Sorts about how North Korea and South Korea have two entirely different linguistic philosophies about how Hangul should collate, philosophies that center on what one would do with Jamo like these.)
I was running into an interesting issue with old Hangul the other day. I used the following Old Hangul Syllable that I contrived from valid Jamo sequences. If you have a font that shapes them it looks something like this:
The name of this syllable would be HANGUL SYLLABLE RIEUL-SSANGKIYEOK A-EU KIYEOK-NIEUN.
Now, in trying to construct this syllable, one sees that there are no single Jamo encoded for the leading (RIEUL-SSANGKIYEOK), vowel (A-EU), or trailing (KIYEOK-NIEUN) parts, but there is one for part of the leading Jamo -- that SSANGKIYEOK part. So couldn't one use the sequence:
It turns out, unfortunately, that this will render with that first Jamo left out:
In order to get it to render correctly, you have to not use the SSANG version of the Jamo; instead you have to specify the two Jamo separately:
I was unable to find a source for the reason behind this specific issue; perhaps it is something specific to Microsoft's implementation. And since Unicode does not specify any sort of equivalence between a SSANG Jamo and the same Jamo just put in twice, in its own way I guess it is a good thing that they don't look the same. I am curious whether there was specific logic behind the decision or not.
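The lack of equivalence is easy to check against the normalization forms. Here is a small Python sketch, using SSANGKIYEOK versus two KIYEOK as the test case; the helper name `equivalent_under` is made up for illustration.

```python
import unicodedata

SSANG_KIYEOK = "\u1101"      # HANGUL CHOSEONG SSANGKIYEOK
TWO_KIYEOK = "\u1100\u1100"  # HANGUL CHOSEONG KIYEOK, twice

def equivalent_under(form, a, b):
    """True if a and b normalize to the same string under the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, equivalent_under(form, SSANG_KIYEOK, TWO_KIYEOK))

# The doubled Jamo carries no decomposition mapping at all.
print(repr(unicodedata.decomposition(SSANG_KIYEOK)))
```

All four forms report `False`, and the decomposition is empty, so nothing in normalization ties the two spellings together.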
Though of course with names like HANGUL SYLLABLE RIEUL-SSANGKIYEOK A-EU KIYEOK-NIEUN to consider, perhaps deciding it the other way might have at least made name construction easier.
On the other hand I don't know of all that many utilities that do Old Hangul name construction; perhaps the algorithm behind that would make an interesting interview question?
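As one possible starting point, here is a rough Python sketch of such a name builder. Everything about it is an assumption rather than any standard algorithm: the function name `build_name` is hypothetical, the input sequence in the usage line is my guess at a spelling of the syllable described above, and the rule that folds two identical adjacent Jamo into a SSANG name is simply a choice made so that the output matches the name used in this post. It does not validate ordering or handle fillers.

```python
import unicodedata

# Name prefixes for leading, vowel, and trailing conjoining Jamo, in order.
PREFIXES = ("HANGUL CHOSEONG ", "HANGUL JUNGSEONG ", "HANGUL JONGSEONG ")

def build_name(jamo_sequence):
    """Build a descriptive syllable name from a conjoining Jamo sequence."""
    groups = [[] for _ in PREFIXES]
    for ch in jamo_sequence:
        name = unicodedata.name(ch, "")
        for slot, prefix in enumerate(PREFIXES):
            if name.startswith(prefix):
                short = name[len(prefix):]
                group = groups[slot]
                if group and group[-1] == short:
                    # Assumption: a repeated Jamo is named as its SSANG form.
                    group[-1] = "SSANG" + short
                else:
                    group.append(short)
                break
        else:
            raise ValueError(f"not a conjoining Jamo: U+{ord(ch):04X}")
    # Hyphens join Jamo within a position; spaces separate the positions.
    parts = ["-".join(group) for group in groups if group]
    return "HANGUL SYLLABLE " + " ".join(parts)

print(build_name("\u1105\u1100\u1100\u1161\u1173\u11a8\u11ab"))
```

Whether the doubled Jamo should be folded into SSANG at all is exactly the kind of design decision the question is getting at; dropping that branch would give RIEUL-KIYEOK-KIYEOK instead.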
At the moment there are not too many fonts out there that take advantage of the Microsoft OpenType implementation's features here, and not a whole lot of the other implementations seem to be out there, either. The former might be why there haven't been too many people complaining about the song Microsoft is ssanging, and the latter might be why there is little in the way of suggested implementation standards out there right now. None of the fonts listed here, for example, which do support the range, do any shaping with the Jamo in the range.
The Microsoft implementation is based on the idea of specific known Old Hangul sequences (defined in the appendix here), so presumably if other sequences were determined to be valid as more Old Hangul syllables were identified (and there is a new proposal from South Korea to WG2 that even ignoring all its flaws does seem to suggest there are at least 8-10 such sequences that are not in this appendix), then they can always be added....
I'll talk about collation and the impact on it another day.
So, does anyone want to take a stab at the name building algorithm? :-)
This post brought to you by 쌍 (U+c30d, a.k.a. HANGUL SYLLABLE SSANGSIOS A IEUNG)
Brian on 30 Oct 2007 10:42 PM:
What do ᄼᄽᄾᄿᅎᅏᅐᅑᅔᅕ mean??????????????????????????????????????????????????????
쌍이응 on 30 Oct 2007 10:46 PM:
Michael S. Kaplan on 30 Oct 2007 11:01 PM:
Brian, what do you mean? Those are conjoining Jamo....
韓國 on 3 Nov 2007 4:18 PM:
There are also a few obsolete jamo(ᅀ,ᅌ,ᅙ,ᄝ,ᄫ,ᅗ,ᄔ,ᅇ,ᅘ,ᄬ,ᄕ,ᇇ,ᇈ,ᇎ,ᇗ,ᇙ,ᇜ,ᇝ,ᇟ,ᄞ,ᄠ,ᄧ,ᄩ,ᇧ,ᄮ,ᄯ,ᄲ,ᄶ,ᅅ,ᅆ,ᇌ,ᇓ,ᄢ,ᄣ,ᆞ,ᆡ,ᆄ,ᆅ,ᆈ,ᆑ,ᆒ,ᆔ).
Michael S. Kaplan on 3 Nov 2007 4:31 PM:
Obsolete is a relative term here, with the government of South Korea wanting them used for Old Hangul...
쌤똍 on 3 Nov 2007 4:43 PM:
ᆝ=HANGUL JUNGSEONG I-ARAEA
ㆇ=HANGUL LETTER YO-YA
Michael S. Kaplan on 3 Nov 2007 4:47 PM:
I hear what you are saying, but the government has asked for something different, and 10646 has given it to them.
Michael S. Kaplan on 3 Nov 2007 4:55 PM:
Note that ㆇ (U+3187, HANGUL LETTER YO-YA) has a compatibility decomposition to ᆄ (U+1184, a.k.a. HANGUL JUNGSEONG YO-YA), and Old Hangul support works with the latter, not the former.
쌤뗼 on 3 Nov 2007 5:07 PM:
Both ᆝ and ㆇ are now obsolete
HANGUL JAMO SSANGHIEUH on 3 Nov 2007 9:10 PM:
ᇰ=HANGUL JONGSEONG YESIEUNG
ᄼ=HANGUL CHOSEONG CHITUEUMSIOS
ᅑ=HANGUL CHOSEONG CEONGCHIEUMSSANGCIEUC
Michael S. Kaplan on 3 Nov 2007 9:32 PM:
So, your question in particular was???
uhgj on 3 Nov 2007 9:47 PM:
initial medial initial initial 2nd
final medial 1stmed med
Michael S. Kaplan on 3 Nov 2007 9:55 PM:
I am going to turn off comments to this post if people don't stop with the cryptic stuff....
2008/09/19 Sorting the DPRK all Out
2007/05/19 Ssang. Ssang. Come Back!