Using a character proposal for a 'repertoire fence' extension

by Michael S. Kaplan, published on 2007/02/24 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/02/24/1751301.aspx

If you tune in here to read from time to time, you may recall some previous posts where I talked about repertoire fences, basically situations where one standard or one proposal (in those cases for code pages) was being used to get a better understanding of a different process.

This time I am going to extend that idea a little bit into some different areas.

Now this proposal is for 117 new Hangul Jamo to be used for a set of over 5000 Old Hangul syllables found in an extensive review of many sources (the proposal specifically mentions that "We guess that there won't be many Old Hangul complex letters to be found in the future", presumably because the review was reportedly so extensive).

There are some technical issues in the current proposal which have been communicated to the Korean National Body to address. That would be for some future WG2 meeting, and I am not really going to talk about the proposal itself (which can wait until an updated proposal is submitted, if one ever is).

In the meantime, looking at the proposal's 34 Choseong, 28 Jungseong, and 55 Jongseong new Jamo, it is clear that whether you want to encode these new complex Jamo or represent them as sequences of the separate, existing ones, the presumably valid claim is that they all exist in historical data.

For what it is worth, these same sequences are missing from the Old Hangul collation tables, as well. And none of them currently exist as Jamo containing multiple components, either.

This would mean (assuming the evidence is sufficient for these nine "complex" Jungseong and Jongseong Jamo in the 72 pages of examples in the proposal) that both the tables in Appendix B to the OpenType Hangul specification and the Old Hangul collation data could stand to be augmented by these nine sequences, in order to allow for the proper collation and rendering of whatever percentage of the over 5000 Old Hangul syllables depend on them.

In the meantime, the storage of data would work properly, though the collation and rendering would not work as well as all the rest of the Old Hangul syllables.
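To make the "storage works today" point a bit more concrete, here is a rough Python sketch that checks whether a sequence is built only from already-encoded conjoining Jamo of a single class. The function names are hypothetical, the class boundaries are the Hangul Jamo block ranges (U+1100 to U+11FF), and nothing here says anything about whether a font or collation table knows the sequence.

```python
import unicodedata

def jamo_class(cp):
    """Classify a code point by its position in the Hangul Jamo block."""
    if 0x1100 <= cp <= 0x115F:
        return "choseong"
    if 0x1160 <= cp <= 0x11A7:
        return "jungseong"
    if 0x11A8 <= cp <= 0x11FF:
        return "jongseong"
    return None

def storable_as_existing_jamo(seq):
    """True if every code point is a named conjoining Jamo of one class.

    This only covers storage; rendering and collation of the sequence
    depend on tables that may not list it yet.
    """
    classes = {jamo_class(cp) for cp in seq}
    if len(classes) != 1 or None in classes:
        return False
    # unicodedata.name() returns the default ("") for unassigned code points.
    return all(unicodedata.name(chr(cp), "") for cp in seq)
```

For example, `storable_as_existing_jamo([0x1169, 0x1164])` checks the O + YAE sequence, which is made of two existing Jungseong.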

For many this is a minor point given the current lack of a widely available font that makes use of Uniscribe's Old Hangul shaping engine.

But some people do have the font support, and other people may not want to wait when it comes to cataloging and storing data that makes use of such syllables. This means that storing valid data does not need to wait for the perfect font or for the perfect collation data: the current collation data will provide deterministic results, and obviously the fonts will have some behavior, even if it is less than ideal (like in this previous post).

Yet another situation where repertoire fences can actually be used to allow the data in a proposal to assist in an implementation! :-)

1. AE (ㅐ) = A (ㅏ) + I (ㅣ), YAE (ㅒ) = YA (ㅑ) + I (ㅣ), E (ㅔ) = EO (ㅓ) + I (ㅣ), and YE (ㅖ) = YEO (ㅕ) + I (ㅣ).

Even though they are named O-YAE, YO-AE, etc, they are equivalent to O-YA-I, YO-A-I, etc.

So HANGUL JUNGSEONG O-YAE would be equivalent to U+1169 (O) U+1163 (YA) U+1175 (I) in Appendix B, and HANGUL JUNGSEONG YO-AE would be equivalent to U+116D (YO) U+1161 (A) U+1175 (I) in Appendix B.

This is because OpenType Hangul support did not use any complex jamo.

2. YU-O is in Appendix B.

3. About the jongseong yesieung/ieung issue:

The ieung that is in any jongseong jamo has been replaced with yesieung. (This only applies to jongseong; it doesn't apply to choseong.)

This is because in old Hangul, jongseong ieung was not pronounced (like choseong ieung) and jongseong yesieung was pronounced "ng."

This is why in Unicode 5.1, U+11EC (jongseong ieung-kiyeok) through U+11EF (jongseong ieung-khieukh) have been given annotations such as yesieung-kiyeok, yesieung-khieukh, etc.

According to Korean standard KS X 1026-1, jongseong U+11EC through U+11EF are no longer ieung- jamo, but yesieung- jamo; it also says that they should not be sorted under ieung, but sorted under yesieung.

So instead of adding jongseong ieung-mieum, Korea decided to add jongseong yesieung-mieum.

ieung-mieum in Appendix B is replaced with yesieung-mieum and added in Unicode 5.2

ieung-sios in Appendix B is merged into existing yesieung-sios

ieung-hieuh and yesieung-hieuh in Appendix B are merged into yesieung-hieuh and added in Unicode 5.2

yesieung-kiyeok in Appendix B is merged into already existing ieung-kiyeok, which is given an annotation "yesieung-kiyeok" in Unicode 5.1

yesieung-khieukh in Appendix B is merged into already existing ieung-khieukh, which is given an annotation "yesieung-khieukh" in Unicode 5.1

For the jongseong yesieung/ieung issue, see page 19 of this PDF file:

std.dkuug.dk/.../n3422.pdf

(This document is a translation of Korean Standard KS X 1026-1:2007. However, this English version is not a Korean Standard.)

Though there is one typo:

their order should be after ᅌ (IEUNG) – IEUNG should have been YESIEUNG

(They actually used the yesieung character (ᅌ) and I don't know why they wrote IEUNG. The Korean version has it as yesieung.)
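The ieung-to-yesieung replacement described in point 3 can be sketched in Python as follows. This is only an illustration under one assumption that should be stated loudly: the replacement is applied only when ieung appears inside a jongseong cluster of two or more jamo, and a lone jongseong ieung is left alone. The function and constant names are made up for this sketch.

```python
JONGSEONG_IEUNG = 0x11BC     # HANGUL JONGSEONG IEUNG
JONGSEONG_YESIEUNG = 0x11F0  # HANGUL JONGSEONG YESIEUNG

def is_jongseong(cp):
    """Trailing consonants occupy U+11A8..U+11FF in the Hangul Jamo block."""
    return 0x11A8 <= cp <= 0x11FF

def ieung_to_yesieung(cluster):
    """Replace ieung with yesieung inside a multi-jamo jongseong cluster.

    Assumption: single-jamo clusters and anything that is not purely
    jongseong are returned unchanged.
    """
    if len(cluster) < 2 or not all(is_jongseong(cp) for cp in cluster):
        return list(cluster)
    return [JONGSEONG_YESIEUNG if cp == JONGSEONG_IEUNG else cp
            for cp in cluster]
```

With this, ieung-mieum (U+11BC U+11B7) becomes yesieung-mieum (U+11F0 U+11B7), while choseong ieung (U+110B) is never touched because it is outside the jongseong range.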

N3168 ID value	Korean Name	English Name	Conjoining Jamo
HX205	한글 중성 오-얘	HANGUL JUNGSEONG O-YAE	U+1169 U+1164 (ᅩᅤ)
HX209	한글 중성 요-애	HANGUL JUNGSEONG YO-AE	U+116d U+1162 (ᅭᅢ)
HX213	한글 중성 유-애	HANGUL JUNGSEONG YU-AE	U+1172 U+1162 (ᅲᅢ)
HX214	한글 중성 유-오	HANGUL JUNGSEONG YU-O	U+1172 U+1169 (ᅲᅩ)
HX217	한글 중성 으-에	HANGUL JUNGSEONG EU-E	U+1173 U+1166 (ᅳᅦ)
HX220	한글 중성 이-얘	HANGUL JUNGSEONG I-YAE	U+1175 U+1164 (ᅵᅤ)
HX222	한글 중성 이-예	HANGUL JUNGSEONG I-YE	U+1175 U+1168 (ᅵᅨ)
HX228	한글 중성 아래아-에	HANGUL JUNGSEONG ARAEA-E	U+119e U+1166 (ᆞᅦ)
HX349 (UX361)	한글 종성 옛이응-미음	HANGUL JONGSEONG YESIEUNG-MIEUM	U+11f0 U+11b7 (ᇰᆷ)
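The Conjoining Jamo column above uses the complex vowels (AE, YAE, E, YE), whereas point 1 says the Appendix B form spells each of them out as a simple vowel plus I. A minimal Python sketch of that expansion follows; the names are hypothetical, and only the four decompositions listed in point 1 are covered.

```python
# Decompositions from point 1: complex vowel -> simple vowel + I (U+1175).
COMPLEX_JUNGSEONG = {
    0x1162: (0x1161, 0x1175),  # AE  = A   + I
    0x1164: (0x1163, 0x1175),  # YAE = YA  + I
    0x1166: (0x1165, 0x1175),  # E   = EO  + I
    0x1168: (0x1167, 0x1175),  # YE  = YEO + I
}

def expand_jungseong(code_points):
    """Expand the four complex vowels; pass every other code point through."""
    expanded = []
    for cp in code_points:
        expanded.extend(COMPLEX_JUNGSEONG.get(cp, (cp,)))
    return expanded

def format_sequence(code_points):
    """Render code points in the usual U+XXXX notation."""
    return " ".join(f"U+{cp:04X}" for cp in code_points)
```

For instance, `format_sequence(expand_jungseong([0x1169, 0x1164]))` turns the O-YAE row into the three-jamo O, YA, I sequence given in point 1.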