Even if it makes no sense it has to go somewhere

by Michael S. Kaplan, published on 2007/01/18 06:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/01/18/1488052.aspx


My first exposure to the Cyrillic script was back in the early 80s when I read Heinlein's Expanded Universe.

He was talking about the trip that he and Ginny took to Russia, and about how half of the battle to learn Russian was in learning the script.

He pointed out that there are words like Ресторан (restaurant) that look almost like what they are, and I remember going out and learning a bit about Cyrillic just because the idea fascinated me. :-)

Anyway, I was going to talk about software stuff....

Sometimes internationalization features need to define behavior even if it does not make sense for the behavior to be needed.

For example, there has to be a way to sort a word like aaaxxxxjoiae in a list, even though there really is no such word. If it ever did show up on a list, people would intuitively know where it belongs alphabetically.
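
To make that concrete, here is a minimal sketch. It uses Python's default string comparison, which goes by code point and is not a real linguistic collation, but it is enough to show that a comparison function has to give an answer for every string, including ones that are not words. The word list is made up for illustration.

```python
# A made-up list: two real words and one nonsense string.
words = ["apple", "aaaxxxxjoiae", "aardvark"]

# The default comparison is defined for every string, so the
# nonsense entry still gets a definite position in the result.
ordered = sorted(words)
print(ordered)  # ['aaaxxxxjoiae', 'aardvark', 'apple']
```

The nonsense string lands first because its third letter, a, sorts before the r in aardvark, which is just where a person scanning the list would look for it.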

Another example is a bug that someone reported several weeks ago related to a new Vista feature (which I will post about another day!). Right now I am just going to tie together the hodgepodge of things I just talked about.

The issue came up in relation to the Ukrainian locale, and specifically the sorting of the Ukrainian alphabet, which I have talked about before along with other Cyrillic script languages here.

Now every child learning Ukrainian knows from a young age (and I know from listening to Boomchyk teaching me!) that Ь and ь, the uppercase and lowercase forms of the CYRILLIC SOFT SIGN (U+042c and U+044c), come at the end of the alphabet.

But as that Wikipedia article I pointed to above said, "The soft sign ь has no phonetic value, but indicates softening (palatalization) of a preceding consonant."

Therefore, even though one would not create Ukrainian words that begin with Ь or ь, you have to define some kind of sorting behavior in case you are asked to give a sort key for a made-up word that begins with one of them. The fact that such a word is meaningless is irrelevant to the need to define the behavior.

But the question then comes up -- where should it sort?

Is it better to make it affect the preceding letter, acting as if it were some sort of diacritic (these characters also have no real meaning in the front of a word)? Probably not, since people who learn the alphabet expect it to be at the end of the alphabetical order, but it has to be considered since it has no meaning in some contexts.

So in the end the decision is made to put it at the end of the Ukrainian alphabet, with the hopeful understanding that while a tester may need to test the behavior at some point, a typical user would never run into it. :-)
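
Here is a rough sketch of what that decision means for a sort key. This is not how the Windows sorting tables are built; it is a toy lookup table, and the names UKRAINIAN_ORDER and sort_key are hypothetical. The alphabet string is an assumption that follows the ordering described in this post, with the soft sign placed last.

```python
# Hypothetical alphabet order, following the post's description:
# the soft sign (ь) is placed after every other letter.
UKRAINIAN_ORDER = "абвгґдеєжзиіїйклмнопрстуфхцчшщюяь"

# Map each letter to its rank in the alphabet.
_RANK = {ch: i for i, ch in enumerate(UKRAINIAN_ORDER)}


def sort_key(word):
    """Return a tuple of letter ranks, compared case-insensitively.

    Characters outside the table sort after all known letters,
    ordered among themselves by code point.
    """
    return tuple(
        _RANK.get(ch, len(_RANK) + ord(ch)) for ch in word.lower()
    )


# "ьабв" is a made-up string; no real word starts with the soft sign.
words = ["ьабв", "яблуко", "абетка", "юнак"]
print(sorted(words, key=sort_key))
# ['абетка', 'юнак', 'яблуко', 'ьабв']
```

The made-up word still gets a perfectly well-defined key, and it lands after everything else, which is what someone who learned the alphabet in that order would expect.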

Anyway, coming up tomorrow, the actual feature (and bug!) that brings the whole issue up, and why it really isn't such a huge problem (I'll have some other examples too).

 

This post brought to you by Ь (U+042c, a.k.a. CYRILLIC CAPITAL LETTER SOFT SIGN)


# roxfan on 19 Jan 2007 5:05 PM:

That's pretty weird... I would expect Ukrainian to use the order of Russian and Belarusian and put Ь before ЭЮЯ.

# Michael S. Kaplan on 20 Jan 2007 1:20 AM:

Every language can have its subtle differences in alphabet...

