by Michael S. Kaplan, published on 2005/06/01 02:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/06/01/423711.aspx
You Kana wonder how we order Japanese strings? :-)
Some time yesterday, one of the testers over on the Shell team was curious about how collation works for the Japanese Kana. The discussion was an interesting one, so I thought I would post a summary of the information we talked about here, with some examples for each interesting distinction.
Note that this behavior applies to Windows (as well as SQL Server, Office, Windows CE, Active Directory, and every other Microsoft product that either calls our APIs or uses our data). Your mileage on other platforms may certainly vary!
Ok, on Windows, the Japanese Kana all sort in an implementation of the GoJuOn order, with the following principles:
When you combine all these rules together, the order you get for the vowels would be:
And then the other important things to note (updated 1 June 2005, 7:50am):
In other words, everything on the same line below can be made to compare as equal (given the right flags); everything on a different line cannot.
HALFWIDTH KATAKANA LETTER SMALL A; KATAKANA LETTER SMALL A; HALFWIDTH KATAKANA LETTER A; KATAKANA LETTER A; HIRAGANA LETTER SMALL A; HIRAGANA LETTER A; CIRCLED KATAKANA A
HALFWIDTH KATAKANA LETTER SMALL I; KATAKANA LETTER SMALL I; HALFWIDTH KATAKANA LETTER I; KATAKANA LETTER I; HIRAGANA LETTER SMALL I; HIRAGANA LETTER I; CIRCLED KATAKANA I
HALFWIDTH KATAKANA LETTER SMALL U; KATAKANA LETTER SMALL U; HALFWIDTH KATAKANA LETTER U; KATAKANA LETTER U; HIRAGANA LETTER SMALL U; HIRAGANA LETTER U; KATAKANA LETTER VU; HIRAGANA LETTER VU; CIRCLED KATAKANA U
HALFWIDTH KATAKANA LETTER SMALL E; KATAKANA LETTER SMALL E; HALFWIDTH KATAKANA LETTER E; KATAKANA LETTER E; HIRAGANA LETTER SMALL E; HIRAGANA LETTER E; CIRCLED KATAKANA E
HALFWIDTH KATAKANA LETTER SMALL O; KATAKANA LETTER SMALL O; HALFWIDTH KATAKANA LETTER O; KATAKANA LETTER O; HIRAGANA LETTER SMALL O; HIRAGANA LETTER O; CIRCLED KATAKANA O
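Outside of Windows, you can get a rough feel for these equivalence lines with Unicode normalization. The sketch below is not Windows' sort data and does not call CompareString. The function fold_kana and the small-Kana table are hypothetical, and the sketch collapses every distinction at once rather than modeling the individual flags. It checks that each of the five lines above folds to a single key while the lines stay distinct from one another.

```python
import unicodedata

# Hypothetical table: small Katakana vowels -> regular-size Katakana vowels.
SMALL_TO_REGULAR = str.maketrans({
    "\u30a1": "\u30a2",  # SMALL A -> A
    "\u30a3": "\u30a4",  # SMALL I -> I
    "\u30a5": "\u30a6",  # SMALL U -> U
    "\u30a7": "\u30a8",  # SMALL E -> E
    "\u30a9": "\u30aa",  # SMALL O -> O
})

def fold_kana(text: str) -> str:
    """Approximate 'ignore everything' folding; not Windows' real sort data."""
    # NFKC maps halfwidth and circled Katakana to ordinary Katakana.
    text = unicodedata.normalize("NFKC", text)
    # Hiragana U+3041..U+3096 sit exactly 0x60 below the matching Katakana.
    text = "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch for ch in text
    )
    # Small Kana compare equal to their regular-size forms.
    text = text.translate(SMALL_TO_REGULAR)
    # NFD splits VU (U+30F4) into U (U+30A6) + voiced mark U+3099; drop the marks.
    text = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in text if ch not in "\u3099\u309a")
    return unicodedata.normalize("NFC", text)

# The five lines above: halfwidth small, small, halfwidth, regular,
# Hiragana small, Hiragana, (VU variants for U), circled.
GROUPS = {
    "A": "\uff67\u30a1\uff71\u30a2\u3041\u3042\u32d0",
    "I": "\uff68\u30a3\uff72\u30a4\u3043\u3044\u32d1",
    "U": "\uff69\u30a5\uff73\u30a6\u3045\u3046\u30f4\u3094\u32d2",
    "E": "\uff6a\u30a7\uff74\u30a8\u3047\u3048\u32d3",
    "O": "\uff6b\u30a9\uff75\u30aa\u3049\u304a\u32d4",
}

for vowel, letters in GROUPS.items():
    keys = {fold_kana(ch) for ch in letters}
    print(vowel, keys)  # each line collapses to one key
```

Because the folding is all-or-nothing, it only mirrors the case where every flag is passed; comparisons with fewer flags would need each step to be switchable.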
And how do the flags affect all this? Well....
Now obviously Windows file names are "case insensitive", but we do not consider the "small" Kana and the "regular" Kana to be case pairs (usually no one does, native speakers included). So you can have both of them in file names in the same directory, but you cannot use both in otherwise identical names in, for example, an Active Directory installation. In fact, since all four flags are passed for AD, you cannot use any of the letters within one of the groups above together in the same AD namespace.
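To make the file name versus Active Directory contrast concrete, here is a sketch that buckets names under two hypothetical keys. Python's str.casefold stands in for a case-insensitive file system, and ad_like_key (a rough all-flags folding, repeated here so the block stands alone) stands in for the AD comparison. Neither is the real NTFS or AD comparison, and which Windows flag covers which distinction is not modeled.

```python
import unicodedata
from collections import defaultdict

# Hypothetical: small Katakana vowels -> regular Katakana vowels.
SMALL_TO_REGULAR = str.maketrans(
    {"\u30a1": "\u30a2", "\u30a3": "\u30a4", "\u30a5": "\u30a6",
     "\u30a7": "\u30a8", "\u30a9": "\u30aa"}
)

def filesystem_key(name: str) -> str:
    # Case-insensitive only. Kana have no case, so casefold leaves them unchanged.
    return name.casefold()

def ad_like_key(name: str) -> str:
    # Rough stand-in for an all-flags comparison: case, width, Kana type,
    # size and voicing are all folded away.
    text = unicodedata.normalize("NFKC", name.casefold())
    text = "".join(chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c for c in text)
    text = unicodedata.normalize("NFD", text.translate(SMALL_TO_REGULAR))
    return "".join(c for c in text if c not in "\u3099\u309a")

def find_collisions(names, key):
    """Group names whose keys are equal; return only groups with 2+ members."""
    buckets = defaultdict(list)
    for name in names:
        buckets[key(name)].append(name)
    return [group for group in buckets.values() if len(group) > 1]

# "AI" written three ways: Katakana, small-A Katakana, Hiragana.
NAMES = ["\u30a2\u30a4", "\u30a1\u30a4", "\u3042\u3044"]

print(find_collisions(NAMES, filesystem_key))  # [] : all three can coexist
print(find_collisions(NAMES, ad_like_key))     # one group holding all three
```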
Ignoring something with these flags in this context means "treat them all as equal", which means you will get a non-deterministic ordering whenever a big list contains many of these variants that compare as equal. In my opinion, a deterministic order is always better, and not just because I try to be an orderly guy. :-)
But your mileage may vary, of course!
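A small illustration of the determinism point, under stated assumptions: fold_kana is the same hypothetical folding as before, and Python's sort is stable, so the "non-determinism" shows up as the result depending on input order. Breaking ties on raw code points is an arbitrary but deterministic choice; it is not the secondary ordering Windows actually uses.

```python
import unicodedata

SMALL_TO_REGULAR = str.maketrans(
    {"\u30a1": "\u30a2", "\u30a3": "\u30a4", "\u30a5": "\u30a6",
     "\u30a7": "\u30a8", "\u30a9": "\u30aa"}
)

def fold_kana(text: str) -> str:
    # Hypothetical loose key: width, Kana type, small size and voicing ignored.
    text = unicodedata.normalize("NFKC", text)
    text = "".join(chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c for c in text)
    text = unicodedata.normalize("NFD", text.translate(SMALL_TO_REGULAR))
    return "".join(c for c in text if c not in "\u3099\u309a")

def loose_sort(words):
    # Ties keep whatever order they arrived in (Python's sort is stable).
    return sorted(words, key=fold_kana)

def deterministic_sort(words):
    # Same primary key, but ties are broken by the raw code points.
    return sorted(words, key=lambda w: (fold_kana(w), w))

# Katakana A, Hiragana A, Katakana small A, halfwidth Katakana A.
WORDS = ["\u30a2", "\u3042", "\u30a1", "\uff71"]

print(loose_sort(WORDS) == loose_sort(WORDS[::-1]))                  # False
print(deterministic_sort(WORDS) == deterministic_sort(WORDS[::-1]))  # True
```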
Now the Kanji are not sorted in pronunciation order because, as I mentioned back in December of last year, there is no pronunciation-based sort for Japanese on Windows. But if you have entered the pronunciation information and are sorting by it (the way an address book, for example, might choose to do), then this order will be respected. Note that name readings (nanori'yomi) are sometimes (perhaps often) entirely individual and do not match any of the kun'yomi or on'yomi commonly associated with a given ideograph. So such a feature makes a lot of sense if you know how all the names are pronounced; if not (for example, in a large company address book), you may want an alternate way to search for names that you know only by their characters and not by their pronunciation.
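One way an application might combine the two ideas is sketched below. The Contact type and both functions are hypothetical. Readings are assumed to be stored in Hiragana, since they cannot be derived from the characters, and plain code point comparison stands in for real Kana collation. That works roughly because the Hiragana block is laid out in GoJuOn order, with small and voiced forms interleaved. Contacts without a reading sort last but stay findable by their characters.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Contact:
    name: str                      # the name as written, e.g. in Kanji
    reading: Optional[str] = None  # Hiragana reading entered by the user, if known

def sort_contacts(contacts):
    # Contacts with a reading come first, ordered by it; the rest follow by name.
    return sorted(contacts, key=lambda c: (c.reading is None, c.reading or "", c.name))

def find_by_characters(contacts, fragment):
    # Fallback search for people whose names you know only by their characters.
    return [c for c in contacts if fragment in c.name]

CONTACTS = [
    Contact("田中", "たなか"),
    Contact("高橋"),  # reading unknown
    Contact("佐藤", "さとう"),
    Contact("鈴木", "すずき"),
]

print([c.name for c in sort_contacts(CONTACTS)])
print([c.name for c in find_by_characters(CONTACTS, "田")])
```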
This post brought to you by "ヰ" (U+30f0, a.k.a. KATAKANA LETTER WI)