Sorting the DPRK all Out

by Michael S. Kaplan, published on 2008/09/19 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/09/19/8958026.aspx


This blog post is not about trying to "sort out" the political issues in the DPRK -- just the sort, ma'am!

In true SiaO fashion, this blog is more information about something Microsoft (and most companies) are not really able to support!

Probably the first time I blogged about North Korean was back in March of 2006, in Traditional versus modern sorts. I contrasted the way that Hangul is collated when you compare the DPRK (Democratic People's Republic of Korea, aka North Korea) and ROK (Republic of Korea, aka South Korea).

I principally talked about how the biggest difference was that the "SSANG'ed" (doubled) Jamo were placed at the end rather than after the single Jamo that they were the double of.

And yes, the preposition at the end of the previous sentence is intentional, and something I have decided to be proud of! :-)

I did always intend to come back to the topic, but I have been busy.

I was only recently reminded of it again after Richard Ishida's tutorial at the recent IUC, when I had the practically once-in-a-lifetime opportunity to know something about language/script that he did not -- the North Korean/South Korean collation difference! :-)

Many sources talk about the issue, though perhaps the clearest is in Chapter 9 (Information Processing Techniques) of Ken Lunde's CJKV Information Processing (the red emphasis added by me):

An example that illustrates different sorting requirements for the same writing system is Korean hangul. North and South Korea (DPRK and ROK, respectively), although they use the same set of jamo for constructing hangul, sort them differently. Table 9-17 illustrates the sequence in which jamo are sorted in the two Korean locales, subcategorized by the position in which they appear in hangul: initial (consonants), medial (vowels), and final (consonants).

Table 9-17 Korean Jamo Sorting Sequences

Position  Locale  Sorting sequence
Initial   DPRK    ᄀᄂᄃᄅᄆᄇᄉᄌᄎᄏᄐᄑᄒᄁᄄᄈᄊᄍᄋ
Initial   ROK     ᄀᄁᄂᄃᄄᄅᄆᄇᄈᄉᄊᄋᄌᄍᄎᄏᄐᄑᄒ
Medial    DPRK    ᅡᅣᅥᅧᅩᅭᅮᅲᅳᅵᅢᅤᅦᅨᅬᅱᅴᅪᅯᅫᅰ
Medial    ROK     ᅡᅢᅣᅤᅥᅦᅧᅨᅩᅪᅫᅬᅭᅮᅯᅰᅱᅲᅳᅴᅵ
Final     DPRK    ᆨᆪᆫᆬᆭᆮᆯᆰᆱᆲᆳᆴᆵᆶᆷᆸᆹᆺᆼᆽᆾᆿᇀᇁᇂᆩᆻ
Final     ROK     ᆨᆩᆪᆫᆬᆭᆮᆯᆰᆱᆲᆳᆴᆵᆶᆷᆸᆹᆺᆻᆼᆽᆾᆿᇀᇁᇂ

In general, North Korean sorts double consonants after all other consonants. The vowels, in medial positions, are also sorted differently.
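Because a precomposed syllable is just an (initial, medial, final) triple, one way to see what Table 9-17 means in practice is to rank every jamo within its row for each locale and compare syllables rank by rank. The Python below is a minimal sketch under that assumption, not how Windows, ICU or any CLDR tailoring actually implements Korean collation; `ROK_ORDER`, `DPRK_ORDER`, `decompose` and `make_key` are hypothetical names. It also assumes the full DPRK initial row runs ᄀᄂᄃᄅᄆᄇᄉᄌᄎᄏᄐᄑᄒᄁᄄᄈᄊᄍᄋ and that both final rows include ᆮ between ᆭ and ᆯ.

```python
# Minimal sketch: model ROK vs DPRK syllable order as rank tuples over jamo.
# Not how Windows, ICU or CLDR implement Korean collation.
S_BASE, V_COUNT, T_COUNT = 0xAC00, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables share each initial consonant
S_COUNT = 19 * N_COUNT       # 11172 precomposed syllables

# Jamo indices: initial = cp - 0x1100, medial = cp - 0x1161,
# final = cp - 0x11A7 (0 means "no final"). ROK order is code point order.
ROK_ORDER = (list(range(19)), list(range(21)), list(range(28)))

# DPRK initials (assumed full row): ᄀᄂᄃᄅᄆᄇᄉᄌᄎᄏᄐᄑᄒ, then ᄁᄄᄈᄊᄍ, then silent ᄋ.
DPRK_INITIALS = [0, 2, 3, 5, 6, 7, 9, 12, 14, 15, 16, 17, 18, 1, 4, 8, 10, 13, 11]
# DPRK medials: ᅡᅣᅥᅧᅩᅭᅮᅲᅳᅵ, then ᅢᅤᅦᅨᅬᅱᅴᅪᅯᅫᅰ.
DPRK_MEDIALS = [0, 2, 4, 6, 8, 12, 13, 17, 18, 20, 1, 3, 5, 7, 11, 16, 19, 9, 14, 10, 15]
# DPRK finals: ROK order with ᆩ (index 2) and ᆻ (index 20) moved to the end.
DPRK_FINALS = [t for t in range(28) if t not in (2, 20)] + [2, 20]
DPRK_ORDER = (DPRK_INITIALS, DPRK_MEDIALS, DPRK_FINALS)


def decompose(syllable):
    """Split a precomposed syllable into (initial, medial, final) indices."""
    s = ord(syllable) - S_BASE
    if not 0 <= s < S_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    return s // N_COUNT, (s % N_COUNT) // T_COUNT, s % T_COUNT


def make_key(order):
    """Build a sort-key function for strings of precomposed syllables."""
    # Invert each row once: jamo index -> position in this locale's row.
    ranks = [{j: pos for pos, j in enumerate(row)} for row in order]

    def key(word):
        # One (initial, medial, final) rank triple per syllable; tuples compare
        # lexicographically, so a shorter prefix sorts first.
        return tuple(
            tuple(rank[idx] for rank, idx in zip(ranks, decompose(ch)))
            for ch in word
        )

    return key


rok_key, dprk_key = make_key(ROK_ORDER), make_key(DPRK_ORDER)
words = ["까", "가", "하", "아", "각", "갔"]
print(sorted(words, key=rok_key))   # ['가', '각', '갔', '까', '아', '하']
print(sorted(words, key=dprk_key))  # ['가', '각', '갔', '하', '까', '아']
```

With this handful of words the consonant rules show up in one place: 하 now sorts ahead of 까 (doubled KIYEOK) and 아 (silent initial IEUNG), both of which the sketched DPRK order pushes past HIEUH.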

Let's ignore the vowels for a moment; I'll talk about those another time (I have different linguistic theories to draw on for them!).

Should we call them Chosŏn'gŭl instead of Hangul since we're talking about North Korean? We can't change the character names to use the more neutral term urigeul, though in retrospect that probably would have been a good idea. :-)

One might wonder whether repositioning a small number of Jamo could really make such a difference.

But remember, this relatively small number of Jamo are the component pieces of all 11172 precomposed Hangul syllables.

If you take the first 28 syllables in the block (you'll see why I chose 28 in a second):

Hangul  USV     Choseong USV  Jungseong USV  Jongseong USV  Name
가      0xac00  1100          1161           (none)         Hangul syllable Kiyeok A
각      0xac01  1100          1161           11a8           Hangul syllable Kiyeok A Kiyeok
갂      0xac02  1100          1161           11a9           Hangul syllable Kiyeok A Ssangkiyeok
갃      0xac03  1100          1161           11aa           Hangul syllable Kiyeok A Kiyeoksios
간      0xac04  1100          1161           11ab           Hangul syllable Kiyeok A Nieun
갅      0xac05  1100          1161           11ac           Hangul syllable Kiyeok A Nieuncieuc
갆      0xac06  1100          1161           11ad           Hangul syllable Kiyeok A Nieunhieuh
갇      0xac07  1100          1161           11ae           Hangul syllable Kiyeok A Tikeut
갈      0xac08  1100          1161           11af           Hangul syllable Kiyeok A Rieul
갉      0xac09  1100          1161           11b0           Hangul syllable Kiyeok A Rieulkiyeok
갊      0xac0a  1100          1161           11b1           Hangul syllable Kiyeok A Rieulmieum
갋      0xac0b  1100          1161           11b2           Hangul syllable Kiyeok A Rieulpieup
갌      0xac0c  1100          1161           11b3           Hangul syllable Kiyeok A Rieulsios
갍      0xac0d  1100          1161           11b4           Hangul syllable Kiyeok A Rieulthieuth
갎      0xac0e  1100          1161           11b5           Hangul syllable Kiyeok A Rieulphieuph
갏      0xac0f  1100          1161           11b6           Hangul syllable Kiyeok A Rieulhieuh
감      0xac10  1100          1161           11b7           Hangul syllable Kiyeok A Mieum
갑      0xac11  1100          1161           11b8           Hangul syllable Kiyeok A Pieup
값      0xac12  1100          1161           11b9           Hangul syllable Kiyeok A Pieupsios
갓      0xac13  1100          1161           11ba           Hangul syllable Kiyeok A Sios
갔      0xac14  1100          1161           11bb           Hangul syllable Kiyeok A Ssangsios
강      0xac15  1100          1161           11bc           Hangul syllable Kiyeok A Ieung
갖      0xac16  1100          1161           11bd           Hangul syllable Kiyeok A Cieuc
갗      0xac17  1100          1161           11be           Hangul syllable Kiyeok A Chieuch
갘      0xac18  1100          1161           11bf           Hangul syllable Kiyeok A Khieukh
같      0xac19  1100          1161           11c0           Hangul syllable Kiyeok A Thieuth
갚      0xac1a  1100          1161           11c1           Hangul syllable Kiyeok A Phieuph
갛      0xac1b  1100          1161           11c2           Hangul syllable Kiyeok A Hieuh
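The rows above follow directly from the Unicode Hangul syllable composition arithmetic: with 19 initials, 21 medials and 28 final slots (27 finals plus "no final"), a syllable's code point is 0xAC00 + (L × 21 + V) × 28 + T. That is why each run of 28 shares one initial and one medial, and why 19 × 21 × 28 = 11172. The Python below is a small sketch that regenerates the table from that formula and cross-checks each row against the standard library's canonical decomposition (`unicodedata.normalize` with `"NFD"`); `compose` and `decompose_to_jamo` are hypothetical helper names.

```python
import unicodedata

# Constants from the Unicode Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T slot 0 means "no final"
N_COUNT = V_COUNT * T_COUNT              # 588 syllables per initial consonant
S_COUNT = L_COUNT * N_COUNT              # 11172 precomposed syllables


def compose(li, vi, ti=0):
    """Build a precomposed syllable from initial/medial/final indices."""
    return chr(S_BASE + (li * V_COUNT + vi) * T_COUNT + ti)


def decompose_to_jamo(syllable):
    """Return the conjoining jamo code points of a precomposed syllable."""
    s = ord(syllable) - S_BASE
    li, vi, ti = s // N_COUNT, (s % N_COUNT) // T_COUNT, s % T_COUNT
    jamo = [L_BASE + li, V_BASE + vi]
    if ti:  # ti == 0 means there is no final consonant
        jamo.append(T_BASE + ti)
    return jamo


# Regenerate the first block of 28: initial KIYEOK, medial A, each final slot.
for ti in range(T_COUNT):
    syl = compose(0, 0, ti)
    jamo = decompose_to_jamo(syl)
    # Cross-check against the standard library's canonical decomposition.
    assert [ord(c) for c in unicodedata.normalize("NFD", syl)] == jamo
    print(syl, f"{ord(syl):#06x}", " ".join(f"{cp:04x}" for cp in jamo))
```

The first line printed is `가 0xac00 1100 1161` and the last is `갛 0xac1b 1100 1161 11c2`, matching the first and last rows of the table.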

Aha. So in every block of 28, two of the syllables will land in a different position (the ones ending in the two SSANG Jongseong, U+11a9 and U+11bb). On top of that, each of the five doubled Choseong (six if you count the initial IEUNG at U+110b) heads a run of 21 blocks of 28 (588 syllables) that has to be moved wholesale, and before you know it a huge chunk of the 11172 will end up somewhere different. Then when you add the vowels you will again have large blocks that get repositioned (like I said, I'll get more into the vowels another time). In the end, large chunks will be moved, and the ROK sort will clearly look quite wrong to someone expecting the DPRK sort....
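To put a rough number on "a huge chunk", one crude proxy is to count the syllables that contain at least one jamo whose position the DPRK order changes. The Python sketch below does that by brute force over all 11172 (initial, medial, final) combinations. The "moved" sets (the six initials ᄁᄄᄈᄊᄍᄋ, the two finals ᆩᆻ and, optionally, the eleven medials DPRK sorts after ᅵ) are an assumption read off Table 9-17, and `count_affected` is a hypothetical helper.

```python
# Rough proxy for "how much moves": count syllables containing at least one
# jamo that the DPRK order repositions (assumed "moved" sets, from Table 9-17).
# Jamo indices: initial = cp - 0x1100, medial = cp - 0x1161,
#               final = cp - 0x11A7 (0 means "no final").
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28

MOVED_INITIALS = {1, 4, 8, 10, 13, 11}  # ᄁ ᄄ ᄈ ᄊ ᄍ, plus silent initial ᄋ (U+110B)
MOVED_FINALS = {2, 20}                  # ᆩ (U+11A9) and ᆻ (U+11BB)
# The eleven medials DPRK sorts after ᅵ: ᅢᅤᅦᅨᅬᅱᅴᅪᅯᅫᅰ.
MOVED_MEDIALS = {1, 3, 5, 7, 9, 10, 11, 14, 15, 16, 19}


def count_affected(include_vowels):
    """Brute-force count over all 19 * 21 * 28 = 11172 syllables."""
    count = 0
    for li in range(L_COUNT):
        for vi in range(V_COUNT):
            for ti in range(T_COUNT):
                moved = li in MOVED_INITIALS or ti in MOVED_FINALS
                if include_vowels and vi in MOVED_MEDIALS:
                    moved = True
                count += moved  # bool counts as 0 or 1
    return count


print(count_affected(include_vowels=False))  # 4074
print(count_affected(include_vowels=True))   # 7792
```

The consonant changes alone touch 4074 syllables (6 × 588 with a doubled or silent initial, plus 13 × 21 × 2 with a SSANG final); counting the vowels too raises that to 7792, leaving only 13 × 10 × 26 = 3380 syllables built entirely from unmoved jamo. Being "touched" is only a proxy, since even those untouched syllables shift as whole blocks move around them.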

As I said way back in Traditional versus modern sorts, both orderings have a kind of a linguistic basis.

In my opinion, though, the fact that the 11 non-SSANG doubled Jongseong (the two-consonant clusters such as U+11aa) and the Jongseong IEUNG (unlike the Choseong one) are not reordered might hurt the argument a little bit -- there is a clear interest in keeping one type of doubled Jamo interspersed and not the other.
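The point about the other doubled finals can be checked mechanically. The Python below is a small sketch: it transcribes the DPRK final row of Table 9-17 as code points (assuming ᆮ sits between ᆭ and ᆯ, as in the ROK row), confirms that dropping the two SSANG finals leaves the two orders identical, and uses the standard library's `unicodedata.name` to pick out the cluster finals, relying on the assumption that their Unicode names are hyphenated (e.g. KIYEOK-SIOS).

```python
import unicodedata

# ROK final order is simply code point order, U+11A8..U+11C2.
ROK_FINALS = [chr(cp) for cp in range(0x11A8, 0x11C3)]
# DPRK final order transcribed from Table 9-17 (U+11AE TIKEUT assumed after U+11AD).
DPRK_FINALS = [chr(cp) for cp in (
    0x11A8, 0x11AA, 0x11AB, 0x11AC, 0x11AD, 0x11AE, 0x11AF, 0x11B0, 0x11B1,
    0x11B2, 0x11B3, 0x11B4, 0x11B5, 0x11B6, 0x11B7, 0x11B8, 0x11B9, 0x11BA,
    0x11BC, 0x11BD, 0x11BE, 0x11BF, 0x11C0, 0x11C1, 0x11C2, 0x11A9, 0x11BB,
)]
SSANG_FINALS = {chr(0x11A9), chr(0x11BB)}  # SSANGKIYEOK, SSANGSIOS

# Everything except the two SSANG finals keeps its ROK relative order,
# including the cluster finals and the final IEUNG (U+11BC).
rest_rok = [j for j in ROK_FINALS if j not in SSANG_FINALS]
rest_dprk = [j for j in DPRK_FINALS if j not in SSANG_FINALS]
print(rest_rok == rest_dprk)  # True

# Cluster finals (two different consonants) have hyphenated Unicode names.
clusters = [unicodedata.name(j) for j in ROK_FINALS if "-" in unicodedata.name(j)]
print(len(clusters))  # 11
```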

Another regular reader told me that the CLDR did not include either a locale or a UCA tailoring for North Korean, which might be due to the same reasons that Microsoft doesn't have one (DPRK is not a Wassenaar Arrangement member).

I do wonder what happens in North Korea (which presumably has pirated copies of lots of software) or for expatriate North Koreans -- is their preferred collation being shaken out of them due to all of these other matters?

I guess we're in politics again. :-)

 

This blog brought to you by 까 (U+ae4c, aka HANGUL SYLLABLE SSANGKIYEOK A)


Gwyn on 19 Sep 2008 4:59 AM:

I'm viewing this page in chrome and all I see is a bunch of square boxes. Google has a ways to go huh

Dean Harding on 19 Sep 2008 9:55 AM:

I don't want to get into a political discussion here, but are you sure it has to do with DPRK not being a member of the Wassenaar Arrangement? According to [1], there's only 41 participating states... Surely Microsoft exports to more than just those 41 states, right? For example, China, Mexico, Indonesia, in fact most of South America, Africa and Asia are missing from that list.

[1] http://www.wassenaar.org/participants/index.html

I should have said more on this -- as a belligerent state that threatens to use such weapons and is not a permanent member of the UN Security Council, it gets special status here for how nervous it makes everyone else!

(PS Sorry if there's multiple posts, IE8 is giving me trouble... delete my previous ones if they show up)

Done. I'm still not ready to dip a toe in the sea of IE again, just yet. :-)

Jeffrey L. Whitledge on 19 Sep 2008 12:01 PM:

Did you mean ᄍ or ᄊ in the Initial DPRK list?

Whoops! Fixed now. :-)

Mihai on 19 Sep 2008 12:12 PM:

@Gwyn: All well in my Chrome (default install, not tweaking).

@Michael: "An example that illustrates different sorting requirements for the same writing system"

Like finding examples is a problem :-) Just take French/Spanish (both Latin languages), or German/Swedish (both Germanic languages)

John Cowan on 19 Sep 2008 12:58 PM:

I think you need to insert some fillers in the chart before your medials and finals.  Firefox (or is it Linux?) displays them as a big hairy blob of overlapped ink.

Michael S. Kaplan on 19 Sep 2008 12:58 PM:

Hey Mihai! For a book on CJKV? This is actually the most interesting example! :-)

Michael S. Kaplan on 19 Sep 2008 1:02 PM:

Hey John -- must be Linux, 'cause Firefox looks good here! :-)


referenced by

2015/04/16 Context > collation, when coming out of North Korea....

2008/09/21 Sorting the Vowels all Out
