Who is the Hacek Girl?

by Michael S. Kaplan, published on 2006/10/12 03:38 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/10/12/818566.aspx

I do manage to get a lot of random email (between my two main accounts, several hundred a day, not including spam). Like just the other day, I got a question from someone:

Our customer hit an issue in Slovak collation where 'c' and 'č' compared differently with ignore nonspace. We found that 'č' is an exception for the Slovak locale; that’s why they are not compared the same. I’m just checking that this is by design.

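Indeed, this is by design, for the reasons given in You can't ignore diacritics when a language does not give them diacritic weight. In Slovak, the C Caron (or C Hacek, depending on which side of Every character has a story #22: U+02c7 (CARON) you are on!) is treated as a letter in its own right, so asking the comparison to ignore diacritics does not make it equal to a plain C.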

But what makes this letter more than just a C with a bit of shmutz on top like you'd get from a dirty monitor? That is the real question here -- what makes it a separate letter?

Well, č accomplishes what is known as palatalization (you can read the Wikipedia article for more info). In fact, the word Hacek (pronounced like 'Hat Check' without the t) and 'Hat Check' both have that ch, which in English shows the distinct phonemic change known as palatalization (thus my silly pun, based on the fact that I went down the hall and asked our linguist, who explained it to me, thus becoming the unofficial Hacek Girl, for the day at least).

The issue in collation discussed at the beginning of the post is an interesting one, as it is for the most part something that a native speaker of a language would never ask about (since they know the answer), but that anyone who is not a native speaker can easily see as an anomaly. Which I can understand (although it has been a while since I last thought of a diacritic as always implying diacritic weight). It is probably one of the more common questions that gets asked as people start getting more involved with internationalizing their applications and testing them with various settings.
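For anyone who wants to see the shape of the behavior, here is a minimal Python sketch. It is a toy model and not how Windows collation is actually implemented: the names primary_key, equal_ignoring_nonspace and SLOVAK_DISTINCT_LETTERS are made up for illustration, and the set only contains č because that is the only letter this post talks about. The idea is simply that "ignore nonspace" strips combining marks, except where the language counts the marked letter as a separate letter.

```python
import unicodedata

# Hypothetical tailoring for illustration only: letters that the language
# treats as distinct letters rather than "base letter + diacritic".
# Only the lowercase č from this post is listed here.
SLOVAK_DISTINCT_LETTERS = frozenset({"\u010d"})  # č


def primary_key(text, distinct_letters=frozenset()):
    """Build a toy comparison key that ignores nonspacing marks.

    Letters listed in distinct_letters are kept intact, because the
    language gives them their own weight instead of a diacritic weight.
    """
    pieces = []
    # Compose first so that "c" + combining caron is seen as a single č.
    for ch in unicodedata.normalize("NFC", text):
        if ch in distinct_letters:
            pieces.append(ch)
            continue
        # Decompose and drop the combining (nonspacing) marks.
        decomposed = unicodedata.normalize("NFD", ch)
        pieces.append(
            "".join(c for c in decomposed if not unicodedata.combining(c))
        )
    return "".join(pieces)


def equal_ignoring_nonspace(a, b, distinct_letters=frozenset()):
    """Compare two strings using the toy key above."""
    return primary_key(a, distinct_letters) == primary_key(b, distinct_letters)


# No tailoring: the caron is just a mark, so it is ignored.
print(equal_ignoring_nonspace("c", "č"))                           # True
# Toy Slovak tailoring: č is its own letter, so it still differs.
print(equal_ignoring_nonspace("c", "č", SLOVAK_DISTINCT_LETTERS))  # False
```

In this sketch, letters that are not in the set (á, for instance) still compare equal to their base letter, which is why the anomaly only shows up for particular letters in particular locales.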

I do include the emails that come from that contact link in the count as long as they aren't spam, by the way -- many cool questions that later end up on the blog come from there (also a non-trivial number of people asking me if I know them, to which the answer is usually NO).


This post brought to you by č (U+010d, a.k.a. LATIN SMALL LETTER C WITH CARON)

Igor on 12 Oct 2006 7:53 PM:

Hey, we use letter č too!

And ć and š and đ and ž.

It is too bad that we have to give up on ;'[]\ and :"{}| to type them though. Not that funny for a programmer.
