by Michael S. Kaplan, published on 2008/09/19 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/09/19/8958026.aspx
This blog post is not about trying to "sort out" the political issues in the DPRK -- just the sort, ma'am!
In true SiaO fashion, this blog is more information about something Microsoft (and most companies) are not really able to support!
Probably the first time I blogged about North Korean was back in March of 2006, in Traditional versus modern sorts. I contrasted the way that Hangul is collated when you compare the DPRK (Democratic People's Republic of Korea, aka North Korea) and ROK (Republic of Korea, aka South Korea).
I principally talked about how the biggest difference was that the "SSANG'ed" (doubled) Jamo were placed at the end rather than after the single Jamo that they were the double of.
And yes, the preposition at the end of the previous sentence is intentional, and something I have decided to be proud of! :-)
I did always intend to come back to the topic, but I have been busy.
I was only recently reminded of it after Richard Ishida's tutorial at the recent IUC, when I had the practically once-in-a-lifetime opportunity to know something about language/script that he did not -- the North Korean/South Korean collation difference! :-)
Many sources talk about the issue, though perhaps the clearest is in Chapter 9 (Information Processing Techniques) of Ken Lunde's CJKV Information Processing (the red emphasis added by me):
An example that illustrates different sorting requirements for the same writing system is Korean hangul. North and South Korea (DPRK and ROK, respectively), although they use the same set of jamo for constructing hangul, sort them differently. Table 9-17 illustrates the sequence in which jamo are sorted in the two Korean locales, subcategorized by the position in which they appear in hangul: initial (consonants), medial (vowels), and final (consonants).
Table 9-17 Korean Jamo Sorting Sequences
Initial
DPRK ᄀᄂᄃᄅᄆᄇᄉᄌᄎᄏᄐᄑᄒᄁᄄᄈᄊᄍᄋ
ROK ᄀᄁᄂᄃᄄᄅᄆᄇᄈᄉᄊᄋᄌᄍᄎᄏᄐᄑᄒ
Medial
DPRK ᅡᅣᅥᅧᅩᅭᅮᅲᅳᅵᅢᅤᅦᅨᅬᅱᅴᅪᅯᅫᅰ
ROK ᅡᅢᅣᅤᅥᅦᅧᅨᅩᅪᅫᅬᅭᅮᅯᅰᅱᅲᅳᅴᅵ
Final
DPRK ᆨᆪᆫᆬᆭᆯᆰᆱᆲᆳᆴᆵᆶᆷᆸᆹᆺᆼᆽᆾᆿᇀᇁᇂᆩᆻ
ROK ᆨᆩᆪᆫᆬᆭᆯᆰᆱᆲᆳᆴᆵᆶᆷᆸᆹᆺᆻᆼᆽᆾᆿᇀᇁᇂ
In general, North Korean sorts double consonants after all other consonants. The vowels, in medial positions, are also sorted differently.
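To make the Initial row of that table concrete, here is a minimal Python sketch (mine, not Ken Lunde's and not anything Microsoft ships) that sorts a handful of precomposed syllables both ways. It assumes the usual Unicode layout in which each initial owns 21 * 28 = 588 consecutive syllables starting at U+AC00, and it treats plain code point order as the ROK order. The names `initial_of` and `dprk_initial_key` are made up for the illustration, and only the initial consonant is reordered.

```python
S_BASE = 0xAC00          # first precomposed Hangul syllable
S_COUNT = 11172          # 19 initials * 21 medials * 28 finals (incl. "no final")
PER_INITIAL = 21 * 28    # 588 consecutive syllables share each initial

# ROK order is the code point order of the 19 modern Choseong, U+1100..U+1112.
ROK_INITIALS = [chr(0x1100 + i) for i in range(19)]

# DPRK order of the same 19 Choseong, copied from the Initial row of the
# quoted table and written as escapes so nothing depends on font rendering.
DPRK_INITIALS = list(
    "\u1100\u1102\u1103\u1105\u1106\u1107\u1109\u110c\u110e\u110f"
    "\u1110\u1111\u1112\u1101\u1104\u1108\u110a\u110d\u110b"
)
DPRK_RANK = {jamo: rank for rank, jamo in enumerate(DPRK_INITIALS)}


def initial_of(syllable: str) -> str:
    """Return the Choseong of one precomposed Hangul syllable."""
    offset = ord(syllable) - S_BASE
    if not 0 <= offset < S_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    return ROK_INITIALS[offset // PER_INITIAL]


def dprk_initial_key(syllable: str):
    """Sort by DPRK rank of the initial; ties fall back to code point order."""
    return (DPRK_RANK[initial_of(syllable)], ord(syllable))


# U+D558, U+C544, U+B098, U+AE4C, U+AC00 (initials HIEUH, IEUNG, NIEUN,
# SSANGKIYEOK, KIYEOK, each followed by the vowel A)
sample = ["\ud558", "\uc544", "\ub098", "\uae4c", "\uac00"]

rok_sorted = sorted(sample)                         # code point order
dprk_sorted = sorted(sample, key=dprk_initial_key)  # doubled initials and IEUNG last

print("ROK: ", " ".join(rok_sorted))
print("DPRK:", " ".join(dprk_sorted))
```

In the second list the SSANGKIYEOK syllable and the IEUNG syllable both drop behind the HIEUH one, which is the "doubled consonants after all other consonants" rule at work. A real tailoring would of course have to handle whole strings, medials, and finals too.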
Let's ignore the vowels for a moment; I'll talk about those another time (I have different linguistic theories to draw in for them!).
Should we call them Chosŏn'gŭl instead of Hangul since we're talking about North Korean? We can't change the character names to use the more neutral term urigeul, though that would have probably been a good idea, in retrospect. :-)
One could wonder whether the repositioning of a small number of Jamo could really make such a difference.
But remember, this relatively small number of Jamo are the component pieces of 11172 Hangul syllables.
If you take the first 28 syllables of the block (you'll see why I chose 28 in a second):
Hangul | USV | Choseong | USV | Jungseong | USV | Jongseong | USV | Name
가 | 0xac00 | ᄀ | 1100 | ᅡ | 1161 | | | Hangul syllable Kiyeok A
각 | 0xac01 | ᄀ | 1100 | ᅡ | 1161 | ᆨ | 11a8 | Hangul syllable Kiyeok A Kiyeok
갂 | 0xac02 | ᄀ | 1100 | ᅡ | 1161 | ᆩ | 11a9 | Hangul syllable Kiyeok A Ssangkiyeok
갃 | 0xac03 | ᄀ | 1100 | ᅡ | 1161 | ᆪ | 11aa | Hangul syllable Kiyeok A Kiyeoksios
간 | 0xac04 | ᄀ | 1100 | ᅡ | 1161 | ᆫ | 11ab | Hangul syllable Kiyeok A Nieun
갅 | 0xac05 | ᄀ | 1100 | ᅡ | 1161 | ᆬ | 11ac | Hangul syllable Kiyeok A Nieuncieuc
갆 | 0xac06 | ᄀ | 1100 | ᅡ | 1161 | ᆭ | 11ad | Hangul syllable Kiyeok A Nieunhieuh
갇 | 0xac07 | ᄀ | 1100 | ᅡ | 1161 | ᆮ | 11ae | Hangul syllable Kiyeok A Tikeut
갈 | 0xac08 | ᄀ | 1100 | ᅡ | 1161 | ᆯ | 11af | Hangul syllable Kiyeok A Rieul
갉 | 0xac09 | ᄀ | 1100 | ᅡ | 1161 | ᆰ | 11b0 | Hangul syllable Kiyeok A Rieulkiyeok
갊 | 0xac0a | ᄀ | 1100 | ᅡ | 1161 | ᆱ | 11b1 | Hangul syllable Kiyeok A Rieulmieum
갋 | 0xac0b | ᄀ | 1100 | ᅡ | 1161 | ᆲ | 11b2 | Hangul syllable Kiyeok A Rieulpieup
갌 | 0xac0c | ᄀ | 1100 | ᅡ | 1161 | ᆳ | 11b3 | Hangul syllable Kiyeok A Rieulsios
갍 | 0xac0d | ᄀ | 1100 | ᅡ | 1161 | ᆴ | 11b4 | Hangul syllable Kiyeok A Rieulthieuth
갎 | 0xac0e | ᄀ | 1100 | ᅡ | 1161 | ᆵ | 11b5 | Hangul syllable Kiyeok A Rieulphieuph
갏 | 0xac0f | ᄀ | 1100 | ᅡ | 1161 | ᆶ | 11b6 | Hangul syllable Kiyeok A Rieulhieuh
감 | 0xac10 | ᄀ | 1100 | ᅡ | 1161 | ᆷ | 11b7 | Hangul syllable Kiyeok A Mieum
갑 | 0xac11 | ᄀ | 1100 | ᅡ | 1161 | ᆸ | 11b8 | Hangul syllable Kiyeok A Pieup
값 | 0xac12 | ᄀ | 1100 | ᅡ | 1161 | ᆹ | 11b9 | Hangul syllable Kiyeok A Pieupsios
갓 | 0xac13 | ᄀ | 1100 | ᅡ | 1161 | ᆺ | 11ba | Hangul syllable Kiyeok A Sios
갔 | 0xac14 | ᄀ | 1100 | ᅡ | 1161 | ᆻ | 11bb | Hangul syllable Kiyeok A Ssangsios
강 | 0xac15 | ᄀ | 1100 | ᅡ | 1161 | ᆼ | 11bc | Hangul syllable Kiyeok A Ieung
갖 | 0xac16 | ᄀ | 1100 | ᅡ | 1161 | ᆽ | 11bd | Hangul syllable Kiyeok A Cieuc
갗 | 0xac17 | ᄀ | 1100 | ᅡ | 1161 | ᆾ | 11be | Hangul syllable Kiyeok A Chieuch
갘 | 0xac18 | ᄀ | 1100 | ᅡ | 1161 | ᆿ | 11bf | Hangul syllable Kiyeok A Khieukh
같 | 0xac19 | ᄀ | 1100 | ᅡ | 1161 | ᇀ | 11c0 | Hangul syllable Kiyeok A Thieuth
갚 | 0xac1a | ᄀ | 1100 | ᅡ | 1161 | ᇁ | 11c1 | Hangul syllable Kiyeok A Phieuph
갛 | 0xac1b | ᄀ | 1100 | ᅡ | 1161 | ᇂ | 11c2 | Hangul syllable Kiyeok A Hieuh
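For anyone who would rather compute those rows than read them, here is a small Python sketch of the arithmetic behind them. It assumes the standard Unicode Hangul syllable layout (base U+AC00, 21 medials and 28 final slots per initial, slot 0 meaning "no final"); `decompose` is a name I made up, and the cross-check against `unicodedata.normalize` is only there to show the hand-rolled math agrees with the library.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def decompose(syllable: str) -> list:
    """Return the jamo code points of one precomposed Hangul syllable."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    l_index, rest = divmod(s_index, V_COUNT * T_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    parts = [L_BASE + l_index, V_BASE + v_index]
    if t_index:                      # slot 0 is the syllable with no final
        parts.append(T_BASE + t_index)
    return parts


# The first 28 syllables: same initial, same medial, every possible final.
for cp in range(S_BASE, S_BASE + T_COUNT):
    parts = decompose(chr(cp))
    # Cross-check against the library's canonical decomposition.
    assert "".join(map(chr, parts)) == unicodedata.normalize("NFD", chr(cp))
    print(chr(cp), f"{cp:04x}", " ".join(f"{p:04x}" for p in parts))
```

The final consonant is the fastest-varying part of the index, which is why 28 is the natural block size to look at.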
Aha. So in every block of 28, two syllables will be in a different order (corresponding to the two SSANG Jongseong). On top of that, for each of the five SSANG Choseong (six if you count the initial IEUNG at U+110b) there are entire additional blocks of 28 that have to be reordered, and before you know it a huge chunk of the 11172 will end up somewhere different. Then when you add the vowels you will again have large blocks that would be repositioned (like I said, I'll get more into the vowels another time). In the end, large chunks will be moved, and clearly the ROK sort will look quite wrong to someone expecting the DPRK sort....
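A rough way to see how much moves is to sort all 11172 syllables with a DPRK-flavored key and count the ones that no longer sit where code point order put them. The Python sketch below does only that, under loud assumptions: the helper names are hypothetical, the medial (vowel) order is deliberately left in code point order, and the consonant orders are derived by pushing the SSANG jamo (plus the initial IEUNG) to the end. The Final row of the quoted table does not list U+11AE, so it simply stays in its code point position here.

```python
S_BASE = 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def move_to_end(count: int, late: list) -> dict:
    """Map each index to its rank when the `late` indices are sorted last."""
    order = [i for i in range(count) if i not in late] + list(late)
    return {index: rank for rank, index in enumerate(order)}


# Initials: the five SSANG Choseong (U+1101, U+1104, U+1108, U+110A, U+110D)
# go last, followed by IEUNG (U+110B), as in the table's DPRK Initial row.
INITIAL_RANK = move_to_end(L_COUNT, [1, 4, 8, 10, 13, 11])

# Finals: index 0 means "no final"; SSANGKIYEOK (U+11A9) is index 2 and
# SSANGSIOS (U+11BB) is index 20. Both go last.
FINAL_RANK = move_to_end(T_COUNT, [2, 20])


def dprk_key(syllable: str):
    """Sort key: reordered initial, untouched medial, reordered final."""
    l_index, rest = divmod(ord(syllable) - S_BASE, V_COUNT * T_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return (INITIAL_RANK[l_index], v_index, FINAL_RANK[t_index])


syllables = [chr(S_BASE + i) for i in range(L_COUNT * V_COUNT * T_COUNT)]
reordered = sorted(syllables, key=dprk_key)

# Count syllables whose absolute position differs between the two orders.
moved = sum(1 for a, b in zip(syllables, reordered) if a != b)
print(f"{moved} of {len(syllables)} syllables change position")
```

Keep in mind that this counts absolute positions, so a syllable is "moved" even when it merely shifted because something ahead of it was pulled out. It is a measure of how different the two lists look, not of how many pairs of neighbors actually swap.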
As I said way back in Traditional versus modern sorts, both orderings have a kind of a linguistic basis.
Though in my opinion the fact that the 11 non-SSANG doubled Jongseong (and the Jongseong IEUNG unlike the Choseong one) are not ordered differently might hurt the argument a little bit -- there is a clear interest in keeping one type of doubled Jamo interspersed and not the other.
Another regular reader told me that the CLDR did not include either a locale or a UCA tailoring for North Korean, which might be due to the same reasons that Microsoft doesn't have one (DPRK is not a Wassenaar Arrangement member).
I do wonder what happens in North Korea (which presumably has pirated copies of lots of software) or for expatriate North Koreans -- is their preferred collation being shaken out of them due to all of these other matters?
I guess we're in politics again. :-)
This blog brought to you by 까 (U+ae4c, aka HANGUL SYLLABLE SSANGKIYEOK A)
Gwyn on 19 Sep 2008 4:59 AM:
I'm viewing this page in chrome and all I see is a bunch of square boxes. Google has a ways to go huh
Dean Harding on 19 Sep 2008 9:55 AM:
I don't want to get into a political discussion here, but are you sure it has to do with DPRK not being a member of the Wassenaar Arrangement? According to [1], there's only 41 participating states... Surely Microsoft exports to more than just those 41 states, right? For example, China, Mexico, Indonesia, in fact most of South America, Africa and Asia are missing from that list.
[1] http://www.wassenaar.org/participants/index.html
I should have said more on this -- as a belligerent state that threatens to use such weapons and that is not a permanent member of the UN Security Council, it gets special status here for how nervous it makes everyone else!
(PS Sorry if there's multiple posts, IE8 is giving me trouble... delete my previous ones if they show up)
Done. I'm still not ready to dip a toe in the sea of IE again, just yet. :-)
Jeffrey L. Whitledge on 19 Sep 2008 12:01 PM:
Did you mean ᄍ or ᄊ in the Initial DPRK list?
Whoops! Fixed now. :-)
Mihai on 19 Sep 2008 12:12 PM:
@Gwyn: All well in my Chrome (default install, not tweaking).
@Michael: "An example that illustrates different sorting requirements for the same writing system"
Like finding examples is a problem :-) Just take French/Spanish (both Latin languages), or German/Swedish (both Germanic languages)
John Cowan on 19 Sep 2008 12:58 PM:
I think you need to insert some fillers in the chart before your medials and finals. Firefox (or is it Linux?) displays them as a big hairy blob of overlapped ink.
Michael S. Kaplan on 19 Sep 2008 12:58 PM:
Hey Mihai! For a book on CJKV? This is actually the most interesting example! :-)
Michael S. Kaplan on 19 Sep 2008 1:02 PM:
Hey John -- must be Linux, 'cause Firefox looks good here! :-)
referenced by
2015/04/16 Context > collation, when coming out of North Korea....
2008/09/21 Sorting the Vowels all Out