by Michael S. Kaplan, published on 2005/12/07 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/12/07/500827.aspx
It has been nearly a year since I asked the question "Why is there no pronunciation-based sort for Japanese?" somewhat rhetorically (since I answered the question myself!).
But since then I have gotten the question a few more times, as people ask how they can do it.
I kind of answered that in the earlier post:
Given all of that, there is no way to even guess what the most common pronunciation is, even if the data were available, without giving users results that seem wrong or confusing to them. Because even though one could craft an algorithm that could make intelligent guesses at which type of reading is meant, there is no way to make something at least as likely to be correct as the other East Asian languages, especially given that what is probably the most common need for such a sort (lists of names) would require a separate field for the pronunciation.
And this is indeed the best solution for such a situation -- a separate field containing the pronunciation. It works quite well, and I would encourage any application that wants to do a pronunciation-based sort to try this method.
In theory, this is something an application can do when a name is typed when the IME mode is based on pronunciation; this is the one time that the pronunciation information is present without it being queried separately -- during the composition phase. As far as I know, this is not something that is done right now (if I am mistaken feel free to let me know!). It would be exceedingly difficult to do with the IME APIs and Windows messages as they are (and it is nearly impossible in the .NET Framework since the appropriate events are not even exposed).
The best solution is right in there -- have a separate field with the pronunciation in it. That pronunciation may be
And then you can do the sorting based on this alternate field rather than the display string.
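To make the idea concrete, here is a minimal sketch in Python of sorting on a separate pronunciation field. The `Contact` type, the `reading` field name, and the sample names are all made up for illustration. The comparison is a plain code point comparison of the kana, which only approximates a real Japanese collation (it does not handle mixed hiragana/katakana or voiced marks the way a proper collation would), so treat it as a stand-in for whatever collation the platform provides.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    display: str       # the string shown to the user (kanji, kana, ...)
    reading: str = ""  # hypothetical separate field holding the pronunciation in kana


def sort_key(contact: Contact) -> str:
    # Sort on the pronunciation field; fall back to the display string
    # when no separate reading was stored (e.g. the name is already kana).
    return contact.reading or contact.display


contacts = [
    Contact("山田", "やまだ"),
    Contact("鈴木", "すずき"),
    Contact("佐藤", "さとう"),
    Contact("田中", "たなか"),
    Contact("さくら"),  # typed directly in kana, so no separate reading
]

# Assumption: code point order of hiragana stands in for a real collation here.
by_reading = sorted(contacts, key=sort_key)
by_display = sorted(contacts, key=lambda c: c.display)

names_by_reading = [c.display for c in by_reading]
names_by_display = [c.display for c in by_display]
```

The fallback in `sort_key` is one way to avoid forcing the user to type the same kana twice when the display string already is the pronunciation.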
If one is using a pronunciation-based IME then this will seem inconvenient at times (after all, you may have just typed the same string to find the candidate you wanted!) and would definitely be inconvenient when you are actually typing Kana or Hangul, but there are also times when the pronunciation string may be very different, and then no duplication would be happening.
The real question that comes into play now is how visible the pronunciation string should be in a user interface.
Clearly, for the situation where a single Kanji/Han/Hanja ideograph has multiple well-known pronunciations, an ordered list probably does not need to include the pronunciation, since the context is likely sufficient without it.
In the case of Japanese names, where a nanori reading may be completely unrelated to any of the generally known readings, having the pronunciations available and perhaps even visible is likely a lot more crucial.
As I think about address book type user interfaces, the issue of how best to intuitively place that information (and how to not have it around when it would not be useful) becomes interesting. Perhaps a furigana type solution would be the most intuitive for Japanese users?
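As a rough illustration of the furigana idea, the sketch below emits HTML ruby markup for a name, but only when a separate reading exists and differs from the display string. The function name `render_name` and the choice of HTML as the output format are assumptions made for this example, not a recommendation for any particular UI framework.

```python
from html import escape


def render_name(display: str, reading: str = "") -> str:
    # Hypothetical helper: show the reading as furigana (HTML <ruby>)
    # only when it adds information beyond the display string.
    if not reading or reading == display:
        return escape(display)
    return f"<ruby>{escape(display)}<rt>{escape(reading)}</rt></ruby>"


with_furigana = render_name("山田", "やまだ")
kana_only = render_name("やまだ", "やまだ")
```

Hiding the reading when it matches the display string is one way to keep it out of the way when it would not be useful.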
Which had me wondering whether there were any languages outside of East Asia where a pronunciation sort would be used. Anyone know of any? :-)
This post brought to you by "ㄎ" (U+310e, a.k.a. BOPOMOFO LETTER K)