by Michael S. Kaplan, published on 2005/01/12 09:39 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/01/12/351428.aspx
Today we'll talk about U+0138, LATIN SMALL LETTER KRA. It looks like this: "ĸ".
It has many interesting characteristics. For example, Latin letters have case and usually have both an upper- and a lower-case form, whereas LATIN SMALL LETTER KRA does not have a LATIN CAPITAL LETTER KRA to go with it. This is especially weird since, to the unenlightened eye such as mine, it looks a bit like a LATIN CAPITAL LETTER K.
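A quick way to see the missing capital is to ask Python's standard `unicodedata` module and the built-in case mapping. This is a minimal sketch assuming Python 3; the results reflect whatever version of the Unicode Character Database ships with the interpreter.

```python
import unicodedata

KRA = "\u0138"  # LATIN SMALL LETTER KRA

# The character database reports it as a lowercase letter (category Ll)...
name = unicodedata.name(KRA)
category = unicodedata.category(KRA)

# ...but it has no uppercase mapping, so upper() leaves it unchanged,
# unlike an ordinary k, which maps to K.
kra_upper = KRA.upper()
k_upper = "k".upper()

print(name, category)    # LATIN SMALL LETTER KRA Ll
print(kra_upper == KRA)  # True
print(k_upper)           # K
```

So KRA is a cased, lowercase letter whose "uppercase" is simply itself.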
Something else that is interesting is the similarity with U+043a (к, a.k.a. CYRILLIC SMALL LETTER KA).
Both of these points get us into the confusing IDN issues I talked about yesterday. One item I did not mention is that a company could legitimately want to use variant forms of letters to give its URL a distinctive look, almost like getting rich text in the address line of the browser. I am not saying it is a good idea, but it seems like an idea that would occur to somebody who does marketing (and I seldom think the things they do are good ideas, so I could be wrong here, too).
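The look-alike problem can be poked at the same way. The `scripts_in` and `looks_mixed` helpers below are hypothetical and deliberately crude: they guess a character's script from the first word of its Unicode name, which is not how real confusable checks work (those use the Script property and dedicated confusables data). Still, they show two things: a Cyrillic к dropped into a Latin label stands out as mixed-script, while a KRA stays entirely Latin, and NFKC normalization does not fold KRA into k.

```python
import unicodedata

LOOKALIKES = ["k", "\u0138", "\u043a", "\u03ba"]  # k, ĸ, к, κ

for ch in LOOKALIKES:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")

def scripts_in(label):
    """Hypothetical, crude script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

def looks_mixed(label):
    """Flag labels whose letters appear to come from more than one script."""
    return len(scripts_in(label)) > 1

cyrillic_ka_label = "ban\u043a"  # 'bank' look-alike using CYRILLIC SMALL LETTER KA
kra_label = "ban\u0138"          # 'bank' look-alike using LATIN SMALL LETTER KRA

print(looks_mixed(cyrillic_ka_label))  # True
print(looks_mixed(kra_label))          # False: KRA is a Latin letter
# KRA has no compatibility decomposition, so NFKC leaves it alone.
print(unicodedata.normalize("NFKC", "\u0138") == "\u0138")  # True
```

A check that only looks for script mixing would wave the KRA variant through, which is exactly the kind of "distinctive look" a marketing department might reach for.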
But let us get back to LATIN SMALL LETTER KRA. The next odd thing about it is where it is placed in collation. Ken Whistler of Sybase pointed out the reason for this back in 2002 on the Unicode List in response to a question from Peter Constable (formerly at SIL, now at Microsoft) about why it seemed to be sorted near the letter Q rather than the letter K:
The KRA is a Greenlandic convention for the /q/ (uvular stop). It is thus a type of q, rather than a type of k, for the purposes of default sorting. There are some other phonetic usages of the early 20th century (most notably J.P. Harrington) that picked up this use of KRA from Kleinschmidt's old West Greenlandic grammar and ran with it for [q] in transcription. Those usages later shifted over to using 'q' instead, so you find forms expressed in KRA and forms expressed in 'q' for the same words. Should sort near each other.
P.S. Rack up another one for "Every Character Has a Story".
I think I'll do that, Ken. Thanks. :-)
Anyway, this runs us into an interesting philosophical quandary in collation where two goals conflict:
These two rules can often conflict, due to both the variance of rules in different languages and the general ignorance of our (collective) intuition. :-)
Obviously with LATIN SMALL LETTER KRA:
Now in this case (for Unicode) rule #1 won, because it is a character that is really not used by most languages. As a special benefit, Greenlandic text will be sorted a bit more appropriately in other locations.
Interestingly (and lest people think I always say Rah Rah Microsoft no matter what!) in Microsoft products rule #2 seems to have won, which is probably pretty disappointing to the folks in Greenland since there is not currently a locale for Greenlandic there. Damn, we may have to look into this a bit more, one way or another, in future versions....
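To make the conflict concrete, here is a toy sketch in Python. It is not the Unicode Collation Algorithm and not any Microsoft API: it assigns one hypothetical primary weight per letter and varies only where KRA goes. Slotting it immediately after q is a simplification of Ken's "a type of q" explanation of the Unicode default; slotting it after k is a hypothetical shape-based placement. The word list is made up for illustration.

```python
import string

KRA = "\u0138"  # LATIN SMALL LETTER KRA

def make_key(kra_after):
    """Return a toy sort-key function that slots KRA right after `kra_after`."""
    order = []
    for ch in string.ascii_lowercase:
        order.append(ch)
        if ch == kra_after:
            order.append(KRA)
    weights = {ch: i for i, ch in enumerate(order)}

    def key(word):
        # Characters outside the toy alphabet sort after all known letters, by code point.
        return [weights.get(ch, len(order) + ord(ch)) for ch in word.lower()]

    return key

words = ["ra", KRA + "a", "ka", "qa", "la"]

# Usage-based placement: KRA is treated as a kind of q (Greenlandic /q/).
by_usage = sorted(words, key=make_key("q"))
# Shape-based placement: KRA is treated as a kind of k because of how it looks.
by_shape = sorted(words, key=make_key("k"))

print(by_usage)  # ['ka', 'la', 'qa', 'ĸa', 'ra']
print(by_shape)  # ['ka', 'ĸa', 'la', 'qa', 'ra']
```

The only difference between the two orderings is one decision about a single letter, which is why neither choice can satisfy both the Greenlandic reader and the reader going by appearance.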
This post brought to you by "ĸ", "к", and "κ" (U+0138, U+043a, and U+03ba, a.k.a. LATIN SMALL LETTER KRA, CYRILLIC SMALL LETTER KA, and GREEK SMALL LETTER KAPPA)
LATIN CAPITAL LETTER K did not feel the need to sponsor this post, since it already has a visible enough position in Microsoft products relative to LATIN SMALL LETTER KRA, and therefore there is no marketing advantage to pursue via this blog...