Every character has a story #3: U+0138 (LATIN SMALL LETTER KRA)

by Michael S. Kaplan, published on 2005/01/12 09:39 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/01/12/351428.aspx


Today we'll talk about U+0138, LATIN SMALL LETTER KRA. It looks like this: "ĸ".

It has many interesting characteristics. For example, Latin letters have case, and usually both an upper- and a lower-case form, whereas LATIN SMALL LETTER KRA does not have a LATIN CAPITAL LETTER KRA to go with it. This is especially weird since, to an unenlightened eye such as mine, it looks a bit like a LATIN CAPITAL LETTER K.
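For anyone who wants to see the missing capital for themselves, here is a minimal sketch using Python's standard unicodedata module (the variable names are just for illustration). It looks up the character's name and general category, and shows that asking for the uppercase form hands back the same character.

```python
import unicodedata

kra = "\u0138"  # LATIN SMALL LETTER KRA

# Name and general category as recorded in the Unicode Character Database.
name = unicodedata.name(kra)
category = unicodedata.category(kra)  # "Ll" means lowercase letter

# There is no uppercase mapping for this character, so upper() leaves it alone.
upper = kra.upper()

print(name, category, upper == kra)
```

So it is a lowercase letter as far as its properties go, just one with no capital partner to map to.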

Something else that is interesting is the similarity with U+043a (к, a.k.a. CYRILLIC SMALL LETTER KA).

Both of these points get us into the confusing IDN issues I talked about yesterday. One item I did not mention is that a company could legitimately want to use variant forms of letters to give their URL a distinctive look, almost like getting rich text in the address line of the browser. I am not saying it is a good idea, but it seems like an idea that would occur to somebody who does marketing (and I seldom think the things they do are good ideas, so I could be wrong here, too).
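To make the look-alike problem a bit more concrete, here is a rough Python sketch. It assumes that the first word of a character's Unicode name is a good enough stand-in for its script, which is only an approximation and not how any real IDN check is defined. The helper names script_guess and scripts_in, and the sample label "books", are made up for this example.

```python
import unicodedata

def script_guess(ch):
    # Assumption: the first word of the Unicode name approximates the script.
    return unicodedata.name(ch).split()[0]

def scripts_in(label):
    # Collect the guessed scripts of every letter in the label.
    return {script_guess(ch) for ch in label if ch.isalpha()}

latin_kra, cyrillic_ka, greek_kappa = "\u0138", "\u043a", "\u03ba"

# Three characters that look alike but have three different names.
lookalikes = {ch: unicodedata.name(ch) for ch in (latin_kra, cyrillic_ka, greek_kappa)}

# A hypothetical label with one Cyrillic letter swapped in for the Latin k.
mixed = scripts_in("boo" + cyrillic_ka + "s")
plain = scripts_in("books")

print(lookalikes)
print(mixed, plain)
```

A label that reports more than one script is at least worth a second look. Note that this would not catch LATIN SMALL LETTER KRA itself, since it is a Latin letter.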

But let us get back to LATIN SMALL LETTER KRA. The next odd thing about it is where it is placed in collation. Ken Whistler of Sybase pointed out the reason for this back in 2002 on the Unicode List in response to a question from Peter Constable (formerly at SIL, now at Microsoft) about why it seemed to be sorted near the letter Q rather than the letter K:

The KRA is a Greenlandic convention for the /q/ (uvular stop). It is thus a type of q, rather than a type of k, for the purposes of default sorting. There are some other phonetic usages of the early 20th century (most notably J.P. Harrington) that picked up this use of KRA from Kleinschmidt's old West Greenlandic grammar and ran with it for [q] in transcription. Those usages later shifted over to using 'q' instead, so you find forms expressed in KRA and forms expressed in 'q' for the same words. Should sort near each other.

P.S. Rack up another one for "Every Character Has a Story".

I think I'll do that Ken. Thanks. :-)

Anyway, this runs us into an interesting philosophical quandary in collation where two goals conflict:

These two rules can often conflict, due to both the variance of rules in different languages and the general ignorance of our (collective) intuition. :-)

Obviously with LATIN SMALL LETTER KRA:

Now in this case (for Unicode) rule #1 won, because it is a character that is really not used by most languages. As a special benefit, Greenlandic text will be sorted a bit more appropriately in other locations.
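As a toy illustration of what sorting KRA near q means in practice, here is a small Python sketch. The key function toy_key and the sample words are invented for this example, and the rule it applies (treat KRA as a variant of q that sorts right after it) is a simplification based on Ken's explanation above, not the actual Unicode collation tables.

```python
KRA = "\u0138"  # LATIN SMALL LETTER KRA

def toy_key(word):
    # Hypothetical rule: KRA sorts as a variant of q, immediately after q.
    return [("q", 1) if ch == KRA else (ch, 0) for ch in word]

# Made-up sample words, chosen only to show the ordering.
words = ["rat", KRA + "at", "kat", "qat"]

# Plain code point order pushes KRA past every ASCII letter.
by_code_point = sorted(words)

# The toy rule keeps the KRA form next to the q form.
by_toy_rule = sorted(words, key=toy_key)

print(by_code_point)
print(by_toy_rule)
```

With the toy rule the KRA form lands between "qat" and "rat", which is the "should sort near each other" behavior from the quote.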

Interestingly (and lest people think I always say Rah Rah Microsoft no matter what!) in Microsoft products rule #2 seems to have won, which is probably pretty disappointing to the folks in Greenland since there is not currently a locale for Greenlandic there. Damn, we may have to look into this a bit more, one way or another, in future versions....

 

This post brought to you by "ĸ", "к", and "κ"  (U+0138, U+043a, and U+03ba, a.k.a. LATIN SMALL LETTER KRA, CYRILLIC SMALL LETTER KA, and GREEK SMALL LETTER KAPPA)
LATIN CAPITAL LETTER K did not feel the need to sponsor this post since it has a visible enough position in Microsoft products in relationship to LATIN SMALL LETTER KRA, and therefore there is no marketing advantage to support via this blog...


# Marcel on 12 Jan 2005 8:15 AM:

"One item I did not mention is that a company could want to legitimately use variant forms of letters to give their URL a distinctive look, almost like getting rich text in the address line of the browser."

And getting no visitors as hardly anybody would know how to type that character into the URL line ;-)

# Jerry Pisk on 12 Jan 2005 8:23 AM:

I never thought I would see a post brought to me by the KKK.

# Michael Kaplan on 12 Jan 2005 8:35 AM:

LOL Jerry! Of course how many of them would recognize variant forms of the letter K across different languages? :-)

# Michael Kaplan on 12 Jan 2005 8:36 AM:

Good point, Marcel. No visitors -- there is that! But that's the marketing folks for you, at least when it comes to international features....

# Dean Harding on 12 Jan 2005 2:28 PM:

Well, seemingly they'd probably register both a standard ASCII-only URL and a "fancy" one, so that you use the fancy one when linking to the site, but users can just type the ASCII one when copying it from a newspaper ad (or whatever they do).
