Olive, the other reindeer, gets to Sort it all Out too....

by Michael S. Kaplan, published on 2010/09/13 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/09/13/10060986.aspx


So anyway, the other day in "Latvian. Genitive. Oops.", down in the comments, a bit of Sami conversation started up between me and John Cowan (he was actually commenting about "Any Sami speakers reading this blog? :-)", technically):

John:  Sami: There are actually six functioning Sami languages, five of which are listed in your post; Kildin Sami is presumably omitted because it's written in Cyrillic.  Speaking of "the Sami language" is a misnomer, and people who do it usually refer to Northern Sami only, which is far and away the biggest and the most thriving of the group (20,000 Northern Sami speakers, 2000 Lule Sami speakers, and a few hundred for the other four; three more Sami languages have 10-20 older adult speakers only).

Me: For Sami, I have talked about why we don't support the Cyrillic Samis (and why there are nine Sami locales in Windows) in the past....

John: You have?  I couldn't find any references to Cyrillic Sami locales with either the blog search or site-limited Google search.

Me: Oh, I hint at it here [in Why do we call w 'double u' -- doesn't it look more like a 'double v' ?], but I guess I never talked about it. Ok, look for an upcoming blog on Sami!

Now the bit where I hinted was:

Back at the Unicode Conference, after the "Design Principles for A Regional, Multilingual Keyboard" birds-of-a-feather, I had a chance to talk with Klaas Ruppel, who has been helping with the Finnish government standards.

(Among other things, he gave us some data about how the Cyrillic script versions of Sami work to help with our collation efforts. I'll talk more about this another day....)

It indirectly has to do with a particular issue covered in that A&P of Sort Keys series, in particular part 4 (It isn't a race but let's make an EXCEPTION and cross the Finnish line) and part 6 (Relax, be calm, and deCOMPRESS if you are feeling out of sorts) -- the fact that the exception table and compression table pieces of Microsoft's collation implementation are locale dependent.

It started with an interesting conversation that Cathy Wissink and I were having about locales like Serbian and Bosnian, which had two different scripts (Cyrillic and Latin) that were split between different locales.

The "thought question" that sparked the conversation was that there was no good reason to really split the tables -- splitting them meant that if I had Cyrillic Serbian lists, they would not sort correctly with the Latin Serbian locale. Realistically, it just made sense to support the alternate sort of the language completely.

So we did that.

The precedent of putting more in a table than the label atop the table might more explicitly imply had now been set. And collation of Windows would never be the same.

For example, when the issue of Sami came up, we found that someone from Microsoft had promised someone from some government that we would support a bunch of different Sami locales. Those kinds of promises are often made lightly by important people who don't always know the consequences of the promise in regard to the resources required, but in this case it wasn't too much work to do.  So as I mentioned in "Lions and tigers and bearsELKs, Oh my!", we added nine different locales for Sami:

At conferences, this list had a built-in joke that always got a laugh (by having nine Sami locales, we were able to have one locale each for Dasher, Dancer, Prancer, Vixen, Comet, Cupid, Donner, Blitzen, and Rudolph! :-)

Now John is correct that in addition to Northern, Lule, Southern, Skolt, and Inari there is another "major", sixth Sami -- Kildin. And the interested subsidiary contacts in Norway, Sweden, and Finland really didn't have Kildin as a priority.

But with Klaas Ruppel right there and talking about various linguistic issues, I was able to get from him the information on the differences in collation that would be needed to take a Cyrillic script language like Russian (in our default table) and make it look right for Kildin Sami. And in short order he had for us the information....

And then, with the blessing of program management and test, we slid the 16 letters (well, eight letters, upper and lower case for each) into the sorts for the locales covering the other five Sami languages.
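To make the idea concrete, here is a minimal sketch of a per-locale exception table layered over a default table, where the exception entries cover a script other than the one the locale's label implies. Everything in it is an assumption for illustration: the weights, the table layout, the function names, and the choice of ӆ and ӈ as sample letters are all hypothetical and are not the actual Windows data or the actual eight letters.

```python
# Hypothetical default weights for a few Cyrillic letters.
# These numbers are invented for illustration; they are not Windows data.
DEFAULT_WEIGHTS = {"к": 100, "л": 110, "м": 120, "н": 130, "о": 140}

# Hypothetical per-locale exception tables. The entries under "smn-FI"
# stand in for extra Cyrillic letters slotted into a Latin-script locale.
EXCEPTIONS = {
    "smn-FI": {
        "\u04c6": 115,  # ӆ placed just after л (assumed position)
        "\u04c8": 135,  # ӈ placed just after н (assumed position)
    },
    "ru-RU": {},  # no exceptions in this sketch
}

# Letters with no weight anywhere sort after everything else, by code point.
FALLBACK_BASE = 10_000


def weight(ch, locale):
    """Look up a letter's weight: locale exceptions first, then defaults."""
    table = EXCEPTIONS.get(locale, {})
    if ch in table:
        return table[ch]
    if ch in DEFAULT_WEIGHTS:
        return DEFAULT_WEIGHTS[ch]
    return FALLBACK_BASE + ord(ch)


def sort_key(text, locale):
    """Build a simple one-level sort key; case is ignored in this sketch."""
    return tuple(weight(ch, locale) for ch in text.lower())


def sort_words(words, locale):
    return sorted(words, key=lambda w: sort_key(w, locale))


# Made-up letter pairs, not real words.
SAMPLE = ["но", "ӈо", "мо", "ло", "ӆо"]
```

With these made-up tables, `sort_words(SAMPLE, "smn-FI")` interleaves the extra letters next to their base letters, while `sort_words(SAMPLE, "ru-RU")` pushes them to the end, which is the kind of difference the student described below might have noticed.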

This has been a pretty hidden feature. Since it was first added (in Vista) and ignoring that previous oblique reference of mine I quoted earlier, it is mentioned nowhere.

I have only ever had one customer mention it -- a university student who was doing something related to Kildin who was forwarded to me by someone else. She wanted to know if we were doing anything special for sorting in the Kildin Sami language when the Inari Sami locale was specified, since it seemed like we were but she couldn't find it in the documentation.

That was pretty awesome. :-)

I really had always meant to say something about it in a blog, but had never gotten around to it. With Kildin Sami only being spoken by about 600 people, a locale obviously isn't likely, but this small addition just seemed like something nice.

On the other hand, looking at some of the community sizes of the ones released (e.g. ~600 for Southern Sami, ~2000 for Lule Sami, ~300 for Inari Sami, ~400 for Skolt Sami), this is clearly a case where it isn't about the size, per se. So one never knows, I guess!

And now we have coverage for that "tenth reindeer" Olive (from a careful mis-hearing of the Rudolph song and that TV special Olive, the Other Reindeer).

Even if young Olive doesn't get her own locale, she doesn't have to feel too out of sorts about it anymore. We've got your back, Olive!


John Cowan on 13 Sep 2010 10:44 AM:

Well, Olive is a dog that doesn't fly, so that might not be the most tactful comparison.  But I'm glad it works!

Michael S. Kaplan on 13 Sep 2010 10:50 AM:

Just in that special. In the joke we were telling for years, Olive was the 10th reindeer. Though she was kind of bitchy, laughing and calling Rudolph names....

