Using a character proposal for a 'repertoire fence' extension

by Michael S. Kaplan, published on 2007/02/24 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/02/24/1751301.aspx


If you tune in here to read from time to time, you may recall some previous posts where I talked about repertoire fences, basically situations where one standard or one proposal (in those cases for code pages) was being used to get a better understanding of a different process.

This time I am going to extend that idea a little bit into some different areas.

First we will take ISO/IEC JTC1/SC2/WG2 N3168 (A Proposal to add new Hangul Jamo extended characters to BMP of UCS) -- before you click there, keep in mind this is an 83 page, 10.6 MB document!

Now this proposal is for 117 new Hangul Jamo to be used for a set of over 5000 Old Hangul syllables found in an extensive review of many sources (the proposal specifically mentions that "We guess that there won't be many Old Hangul complex letters to be found in the future.", presumably because the review was reportedly so extensive).

There are some technical issues in the current proposal, which have been communicated to the Korean National Body to address. That would be for some future WG2 meeting, and I am not really going to talk about the proposal itself (that can wait until an updated proposal is submitted, if one ever is).

In the meantime, looking at the proposal's 34 Choseong, 28 Jungseong, and 55 Jongseong new Jamo, it is clear that whether you want to encode these new complex Jamo or represent them as sequences of the separate, existing ones, the presumably valid claim is that they all exist in historical data.

Now looking at Appendix B to the OpenType Hangul specification, of these 117 Jamo there are nine that are not included in the Appendix as sequences:

| N3168 ID value | Korean Name | English Name | Conjoining Jamo |
|---|---|---|---|
| HX205 | 한글 중성 오-얘 | HANGUL JUNGSEONG O-YAE | U+1169 U+1164 (ᅩᅤ) |
| HX209 | 한글 중성 요-애 | HANGUL JUNGSEONG YO-AE | U+116d U+1162 (ᅭᅢ) |
| HX213 | 한글 중성 유-애 | HANGUL JUNGSEONG YU-AE | U+1172 U+1162 (ᅲᅢ) |
| HX214 | 한글 중성 유-오 | HANGUL JUNGSEONG YU-O | U+1172 U+1169 (ᅲᅩ) |
| HX217 | 한글 중성 으-에 | HANGUL JUNGSEONG EU-E | U+1173 U+1166 (ᅳᅦ) |
| HX220 | 한글 중성 이-얘 | HANGUL JUNGSEONG I-YAE | U+1175 U+1164 (ᅵᅤ) |
| HX222 | 한글 중성 이-예 | HANGUL JUNGSEONG I-YE | U+1175 U+1168 (ᅵᅨ) |
| HX228 | 한글 중성 아래아-에 | HANGUL JUNGSEONG ARAEA-E | U+119e U+1166 (ᆞᅦ) |
| HX349 (UX361) | 한글 종성 옛이응-미음 | HANGUL JONGSEONG YESIEUNG-MIEUM | U+11f0 U+11b7 (ᇰᆷ) |

For what it is worth, these same sequences are missing from the Old Hangul collation tables, as well. And none of them currently exist as Jamo containing multiple components, either.
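As a quick sanity check on the table above, here is a minimal Python sketch that rebuilds each English name from the Unicode character names of the conjoining Jamo in the sequence and compares it with the name listed. The `ROWS` list and the `name_from_sequence` helper are my own illustrative names, not anything from N3168 or the OpenType specification, and the check only shows that the table is internally consistent.

```python
import unicodedata

# (N3168 ID, English name from the table, conjoining Jamo sequence)
ROWS = [
    ("HX205", "HANGUL JUNGSEONG O-YAE", "\u1169\u1164"),
    ("HX209", "HANGUL JUNGSEONG YO-AE", "\u116d\u1162"),
    ("HX213", "HANGUL JUNGSEONG YU-AE", "\u1172\u1162"),
    ("HX214", "HANGUL JUNGSEONG YU-O", "\u1172\u1169"),
    ("HX217", "HANGUL JUNGSEONG EU-E", "\u1173\u1166"),
    ("HX220", "HANGUL JUNGSEONG I-YAE", "\u1175\u1164"),
    ("HX222", "HANGUL JUNGSEONG I-YE", "\u1175\u1168"),
    ("HX228", "HANGUL JUNGSEONG ARAEA-E", "\u119e\u1166"),
    ("HX349", "HANGUL JONGSEONG YESIEUNG-MIEUM", "\u11f0\u11b7"),
]


def name_from_sequence(seq):
    """Join the short names of the Jamo in seq under their shared class prefix."""
    names = [unicodedata.name(ch) for ch in seq]
    # "HANGUL JUNGSEONG O" splits into prefix "HANGUL JUNGSEONG" and short name "O".
    prefixes = {name.rsplit(" ", 1)[0] for name in names}
    if len(prefixes) != 1:
        raise ValueError("sequence mixes Jamo classes")
    shorts = [name.rsplit(" ", 1)[1] for name in names]
    return prefixes.pop() + " " + "-".join(shorts)


# Rows whose listed name disagrees with the name derived from the sequence.
mismatches = [(row_id, name) for row_id, name, seq in ROWS
              if name_from_sequence(seq) != name]
print(mismatches)
```

An empty list means every listed name agrees with its code point sequence.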

What this would mean (assuming the evidence is sufficient for these nine "complex" Jungseong and Jongseong Jamo in the 72 pages of examples in the proposal) is that both the tables in Appendix B to the OpenType Hangul specification and the Old Hangul collation data could stand to be augmented by these nine sequences, in order to allow for the proper collation and rendering of whatever percentage of the over 5000 Old Hangul syllables depend on them.
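The fence idea itself boils down to a set difference: take the sequences one document says exist, and report the ones an implementation's table does not cover. A minimal sketch follows, assuming both sides are available as plain lists of conjoining Jamo strings. The `fence` and `code_points` functions are hypothetical names of mine, and the `supported` set is placeholder data for illustration, not a transcription of Appendix B or of the collation tables.

```python
def fence(proposed, supported):
    """Return the proposed sequences that the supported table lacks, in input order."""
    supported = set(supported)
    return [seq for seq in proposed if seq not in supported]


def code_points(seq):
    """Format a sequence as space-separated U+XXXX values for reporting."""
    return " ".join(f"U+{ord(ch):04X}" for ch in seq)


# Placeholder inputs: three candidate sequences and a one-entry stand-in table.
proposed = ["\u1169\u1161", "\u1169\u1164", "\u116d\u1162"]
supported = {"\u1169\u1161"}

gaps = fence(proposed, supported)
for seq in gaps:
    print(code_points(seq))
```

Fed with the real tables, the same difference would yield the list of sequences that an implementation might want to add.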

In the meantime, the storage of data would work properly, though collation and rendering would not work as well for these syllables as they do for the rest of the Old Hangul syllables.

For many this is a minor point given the current lack of a widely available font that makes use of Uniscribe's Old Hangul shaping engine.

But some people do have the font support, and other people may not want to wait when it comes to cataloging and storing data that makes use of such syllables. This means that storing valid data does not need to wait for the perfect font or for the perfect collation data (the current collation data will provide deterministic results, and obviously the fonts will have some behavior, even if it is less than ideal, like in this previous post).

Yet another situation where repertoire fences can actually be used to allow the data in a proposal to assist in an implementation! :-)

 

This post brought to you by ᇰ (U+11f0, a.k.a. HANGUL JONGSEONG YESIEUNG)


J. C1 on 27 Jan 2011 9:16 AM:

1. AE (ㅐ) = A (ㅏ) + I (ㅣ), YAE (ㅒ) = YA (ㅑ) + I (ㅣ), E (ㅔ) = EO (ㅓ) + I (ㅣ), and YE (ㅖ) = YEO (ㅕ) + I (ㅣ).

Even though they are named O-YAE, YO-AE, etc, they are equivalent to O-YA-I, YO-A-I, etc.

So HANGUL JUNGSEONG O-YAE would be equivalent to U+1169 (O) U+1163 (YA) U+1175 (I) in Appendix B, and HANGUL JUNGSEONG YO-AE would be equivalent to U+116D (YO) U+1161 (A) U+1175 (I) in Appendix B.

This is because OpenType Hangul support did not use any complex jamo.

2. YU-O is in Appendix B.

3. About the jongseong yesieung/ieung issue:

The ieung that is in any jongseong jamo has been replaced with yesieung. (This only applies to jongseong; it doesn't apply to choseong.)

This is because in old Hangul, jongseong ieung was not pronounced (like choseong ieung) and jongseong yesieung was pronounced "ng."

This is why in Unicode 5.1, U+11EC (jongseong ieung-kiyeok) through U+11EF (jongseong ieung-khieukh) have been given annotations such as yesieung-kiyeok, yesieung-khieukh, etc.

According to Korean standard KS X 1026-1, jongseong U+11EC through U+11EF are no longer ieung- jamo, but yesieung- jamo; it also says that they should not be sorted under ieung, but sorted under yesieung.

So instead of adding jongseong ieung-mieum, Korea decided to add jongseong yesieung-mieum.

ieung-mieum in Appendix B is replaced with yesieung-mieum and added in Unicode 5.2

ieung-sios in Appendix B is merged into existing yesieung-sios

ieung-hieuh and yesieung-hieuh in Appendix B are merged into yesieung-hieuh and added in Unicode 5.2

yesieung-kiyeok in Appendix B is merged into already existing ieung-kiyeok, which is given an annotation "yesieung-kiyeok" in Unicode 5.1

yesieung-khieukh in Appendix B is merged into already existing ieung-khieukh, which is given an annotation "yesieung-khieukh" in Unicode 5.1
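
The expansion described in point 1 above (AE = A + I, YAE = YA + I, E = EO + I, YE = YEO + I) can be sketched in a few lines of Python. This is only an illustration of the comment's reasoning: the `EXPANSIONS` table holds just the four equivalences stated there, and `expand` and `code_points` are hypothetical helper names, not part of any specification.

```python
# The four equivalences from point 1, as conjoining Jungseong code points.
EXPANSIONS = {
    "\u1162": "\u1161\u1175",  # AE  = A   + I
    "\u1164": "\u1163\u1175",  # YAE = YA  + I
    "\u1166": "\u1165\u1175",  # E   = EO  + I
    "\u1168": "\u1167\u1175",  # YE  = YEO + I
}


def expand(seq):
    """Replace each complex vowel in seq with its simple-jamo expansion."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in seq)


def code_points(seq):
    """Format a sequence as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in seq)


o_yae = expand("\u1169\u1164")   # O-YAE from the post's table
yo_ae = expand("\u116d\u1162")   # YO-AE from the post's table
print(code_points(o_yae))
print(code_points(yo_ae))
```

Expanding a sequence this way before looking it up is what makes O-YAE and YO-AE line up with the three-jamo forms quoted in point 1.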

J. C1 on 27 Jan 2011 9:26 AM:

For jongseong yesieung/ieung issue, see page 19 on this PDF file:

std.dkuug.dk/.../n3422.pdf

(This document is a translation of Korean Standard KS X 1026-1:2007. However, this English version is not a Korean Standard.)

Though there is one typo:

their order should be after ᅌ (IEUNG) – IEUNG should have been YESIEUNG

(They actually used the yesieung character (ᅌ) and I don't know why they wrote IEUNG. Korean version has it as yesieung.)


referenced by

2010/04/21 If no one supported the OLD Old proposal, jumping in to support the NEW Old proposal may not make sense…

2010/04/20 You can't get this particular bit of proverbial toothpaste back into the tube

2008/09/21 Sorting the Vowels all Out

2008/08/21 A&P of Sort Keys, part 14: The Hangul is really getting OLD
