Yi Syllables are totally Radical, dude!

by Michael S. Kaplan, published on 2006/06/07 13:14 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/06/07/620955.aspx


(This is also not the font post; just hang in there, it will be here soon!)

Yi is one of the minority languages of China. The Liangshan Yi script was devised in the mid-1970s, and the standard was pushed out to the world in 1980 (a fuller description of the script in Unicode can be seen at this BabelStone article).

It is one of the scripts that is supported in Vista with a locale, an input method, and a font named Microsoft Yi Baiti. Which is very cool. :-)

So anyway, in Unicode the script has these two blocks:

Yi Syllables (U+a000 -- U+a48f)

Yi Radicals (U+a490 -- U+a4cf)

The first block is the one that is actually used for the language; the second block really has no specific defined use outside of dictionary-type headers or index entries.
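As a rough illustration of the split between the two blocks, here is a minimal Python sketch that reports which block a character falls in. The name `yi_block` is made up for this example, and the check is purely by code point range, so it will also accept any code points inside those ranges that have no character assigned.

```python
from typing import Optional

# Block boundaries as listed above (inclusive ranges).
YI_SYLLABLES = range(0xA000, 0xA48F + 1)
YI_RADICALS = range(0xA490, 0xA4CF + 1)


def yi_block(ch: str) -> Optional[str]:
    """Return the name of the Yi block containing ch, or None.

    Hypothetical helper for illustration; it only looks at the
    code point range, not at whether a character is assigned.
    """
    cp = ord(ch)
    if cp in YI_SYLLABLES:
        return "Yi Syllables"
    if cp in YI_RADICALS:
        return "Yi Radicals"
    return None


if __name__ == "__main__":
    for ch in ("\ua015", "\ua49c", "A"):
        print(f"U+{ord(ch):04X}", yi_block(ch))
```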

Because of this, the two are generally not collated together (with radicals interleaved with syllables). This is similar to the way Latin is not interleaved with Han in Simplified Chinese sorts based on Pinyin, and Bopomofo is not interleaved with Han in the Traditional Chinese pronunciation sorts used in Taiwan.

Of course there is still room for confusion: if you look across both ranges, a few characters look the same (on the left is the Yi Radical, on the right is the Yi Syllable):

U+a49c     U+a0c0 

U+a4a8     U+a132 

U+a49a     U+a1d9 

U+a4bf     U+a259 

U+a494     U+a2cd 

U+a4c2     U+a3b5 

U+a4b0     U+a3c2 

U+a4a7     U+a458 

It is easy to imagine someone grabbing the wrong one (i.e. the radical rather than the syllable) if the tools make that easy enough to do.

Now this makes no difference for simply looking at text, but when trying to search within it or sort it, you could run across a real problem, since in collation (e.g. in the Unicode Collation Algorithm) all of the radicals are put together in a separate weight space from the syllables.
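To make the problem concrete, here is a small Python sketch using the first lookalike pair from the list (U+a49c and U+a0c0). It is only an approximation: it sorts by plain code point order as a stand-in for real collation weights, which is an assumption made for this example and not what the Unicode Collation Algorithm actually does. The effect is similar in kind, though: the radical ends up away from the syllable it resembles, and a search for one does not find the other.

```python
# First lookalike pair from the list: radical on the left, syllable on the right.
RADICAL = "\ua49c"
SYLLABLE = "\ua0c0"

# Two neighbouring syllables plus both lookalikes, in scrambled order.
items = ["\ua0c1", RADICAL, "\ua0bf", SYLLABLE]

# Plain code point sort (a stand-in for collation, NOT the UCA itself).
ordered = sorted(items)

# Text that was typed correctly, with the syllable.
text = "\ua0bf\ua0c0\ua0c1"

# Searching with the mistakenly chosen radical misses the syllable.
found_with_radical = RADICAL in text
found_with_syllable = SYLLABLE in text

if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in ordered])
    # prints ['U+A0BF', 'U+A0C0', 'U+A0C1', 'U+A49C']
    print(found_with_radical, found_with_syllable)
    # prints False True
```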

Of course one could:

but each of these solutions would come at the price of making some other behavior seem incorrect.

In the end, the key would be to just not use the Yi Radicals when one should be using the Yi Syllables (a solution probably best handled within the input method rather than within the font or the collation).
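As a sketch of what that kind of cleanup might look like, the following Python snippet swaps the eight lookalike radicals listed above for their matching syllables. The table is taken straight from that list; the function name `replace_lookalike_radicals` is invented for this example, and nothing here reflects how any actual Yi input method works. It also assumes the text really is running Yi text, since in a dictionary header or index entry the radical may be exactly what was intended.

```python
# Lookalike pairs from the list above: Yi Radical -> Yi Syllable.
LOOKALIKE_RADICAL_TO_SYLLABLE = {
    0xA49C: 0xA0C0,
    0xA4A8: 0xA132,
    0xA49A: 0xA1D9,
    0xA4BF: 0xA259,
    0xA494: 0xA2CD,
    0xA4C2: 0xA3B5,
    0xA4B0: 0xA3C2,
    0xA4A7: 0xA458,
}


def replace_lookalike_radicals(text: str) -> str:
    """Replace the eight lookalike radicals with their syllables.

    Hypothetical helper for illustration. Radicals that have no
    lookalike syllable in the table are left untouched.
    """
    return text.translate(LOOKALIKE_RADICAL_TO_SYLLABLE)


if __name__ == "__main__":
    typed = "\ua0bf\ua49c\ua0c1"  # the middle character is the radical
    fixed = replace_lookalike_radicals(typed)
    print([f"U+{ord(c):04X}" for c in fixed])
    # prints ['U+A0BF', 'U+A0C0', 'U+A0C1']
```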

 

This post brought to you by ꀕ (U+a015, a.k.a. YI SYLLABLE WU)
(star in an upcoming Unicode character story!)


# Gabe on 7 Jun 2006 2:00 PM:

I'm glad they put a "W" at the top of the Wu syllable to make it easy to remember. Very thoughtful of them...

# Marc Brooks on 7 Jun 2006 5:48 PM:

Not to whine, but where's the font?  I can't see all those clever characters you compare :)

Any chance these Yi are in a downloadable font (I'm on XP)

# Michael S. Kaplan on 7 Jun 2006 6:04 PM:

Hi Marc,

It ships with Vista, and I believe it may also ship with Office 12 (not 100% sure on that last part though). You can always go to the character links to see what they look like, or the code charts. :-)

# Andrew West on 8 Jun 2006 10:35 AM:

Yi is already supported in a number of freely available fonts, including SimSun-18030 (http://www.microsoft.com/china/windows2000/downloads/18030.asp), SIL Yi (http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=SILYi_home) and Code2000 (http://home.att.net/~jameskass/code2000_page.htm), so you don't have to wait for Vista to be able to view the Yi script.

I guess the situation with regard to Yi syllables and radicals is no different to that for CJK ideographs and radicals; for example U+2FBA ⾺ [KANGXI RADICAL HORSE] is identical to U+99AC 馬. In fact the situation for CJK is even worse, as every character in the Kangxi Radicals block is duplicated in the CJK Unified Ideographs block, as are most of the characters in the CJK Radicals Supplement block; whereas for Yi only a few of the radicals are identical to Yi syllables. One area where there is real potential for confusion between CJK ideographs and their corresponding radicals is in Ideographic Description Sequences, where both ideographs and radicals may be used in the same context (personally I would avoid using radicals in IDS sequences, but it seems to be perfectly OK to use them if you want). I don't think that there is any analogous situation for Yi, where you would mix syllables and radicals in the same context, so there is far less scope for confusion. And as Michael says, any Yi input method should ensure that you don't mix syllables and radicals.

referenced by

2008/02/22 What's missing from the model
