by Michael S. Kaplan, published on 2010/12/21 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/12/21/10107496.aspx
The title quote refers to a small bit from Martin Cruz Smith's Gorky Park that has very little to do with today's blog....
As with most of the items that eventually become blogs, today's blog started as a couple of emails to me.
Email #1:
Hello. I have code something like this (System.Globalization.CultureInfo.CurrentCulture says fi-FI, and x.Name is a string):
foreach (var i in items.OrderBy(x=>x.Name))
foreach (var i in items.OrderBy(x=>x.Name,StringComparer.CurrentCulture))
Both sort items beginning with W before V! The other StringComparer settings sort OK. The Finnish alphabet goes V, W, not W, V, in case it needs to be said, so this seems odd.
Paths in Explorer seem to be sorting correctly.
System: Windows 7 64-bit (English UI)
Region format: Finnish (Finland)
Location: Finland
Default input language: Finnish (Finland) - Finnish
Installed services:
Finnish (Finland) - Keyboard: Finnish
English (United States) - Keyboard: US
I have .NET 4 SP1 Beta installed and tried targeting both .NET 3.5 and .NET 4.0 (x86).
Test code:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

class Program {
    static void Main(string[] args) {
        List<string> ls = new List<string>();
        ls.Add("Wtt"); ls.Add("Utt"); ls.Add("Vtt"); ls.Add("Stt");
        ls.Add("Wtt2"); ls.Add("Utt2"); ls.Add("Vtt2"); ls.Add("Stt2");
        Console.WriteLine("\r\nInvariantCultureIgnoreCase ");
        foreach (var ss in ls.OrderBy(x => x, StringComparer.InvariantCultureIgnoreCase)) Console.Write(ss + " ");
        //Stt Stt2 Utt Utt2 Vtt Vtt2 Wtt Wtt2
        Console.WriteLine("\r\nCurrentCulture ");
        foreach (var ss in ls.OrderBy(x => x, StringComparer.CurrentCulture)) Console.Write(ss + " ");
        //Stt Stt2 Utt Utt2 Vtt Wtt Vtt2 Wtt2 --- EXPECTED SAME AS ABOVE
        Console.WriteLine("\r\n" + CultureInfo.CurrentCulture);
        //fi-FI
        Console.WriteLine(CultureInfo.InstalledUICulture);
        // en-US
        Console.WriteLine(CultureInfo.InvariantCulture);
        // "" (the invariant culture's name is empty)
    }
}
What's up, any ideas?
Happy holidays, AV
Email #2:
I wasn't able to find the standard online; however, I found an article explaining this a bit.
The article states that if a list's contents are entirely in Finnish, V and W are "equal" in terms of sorting.
Roughly translated, the article says "most lists can be understood to be multilingual, which can be formally argued to make the case for not mixing V and W".
Now I argue that since my Windows UI (InstalledUICulture) is English, I expect the UI to work the same everywhere; e.g., Explorer sorts things correctly (no mixing of V and W). So unless explicitly specified otherwise, the default sort should* assume multilingual lists, because the standard says that in multilingual lists V and W can be kept separate (not mixed). http://web.archive.org/web/20040618085838/www.sfs.fi/standard/20000619.html
*(because Finland is a multilingual country: everyone has to learn Finnish, English, and Swedish, so lists should not be assumed to be only in Finnish in Finland)
So what is the solution when I want every program to consistently sort "English/invariant style" by default and show its UI in English, but I need date, time, and keyboard settings to be Finnish? It's still a common occurrence that third-party programs do not respect UICulture, resulting in practically random program languages all over the place.
I suggest the solution is to somehow hack the APIs so that if UICulture is English, then only the date/time/keyboard APIs use the Finnish settings. It's probably hard to do, but there doesn't seem to be any other good solution.
I will start simply and point out that this is entirely and completely by design, and unlikely to change any time soon.
But no one should be satisfied with that as an answer; it's important for there to be an explanation that has facts on its side.
So, let me explain what is going on here....
The most important place to start is that quote. Reading it was a small challenge for me given the relatively poor quality of my Finnish reading skills, but I believe I can verify that the meaning of
Monikielisessä aineistossa v ja w voidaan nyt aakkostaa erikseen
is indeed that in the multilingual context, V and W can be sorted separately.
This is a very big problem for us, since it contends that there is a difference between "entirely Finnish" and "multilingual Finnish" lists, and the latter does not exist as an option.
There are other letters in Finnish that will not sort properly if one does not use the Finnish sort.
The point is even more interesting when one considers the changes that the neighboring Swedes may eventually be contemplating, which I described in Why do we call w 'double u' -- doesn't it look more like a 'double v' ?. Given that nearly 14 years prior the Finns were considering the need to distinguish the two letters in some contexts, it is interesting that they had no interest in supporting an eventual change to distinguish them, to the point where they would suggest that bifurcation was the only way the Swedes would be able to change their sort....
Now at the moment (and absent the Swedish government pushing the suggested changes, for the foreseeable future) the Swedish and Finnish sorts are the same.
Now in addition to the changes to give "w" a secondary distinction from "v", at a minimum all of the following letters sort differently than they sort in English:
ü Ü å Å ä Ä ö Ö ø Ø
Now, being neither Finn nor Swede, I cannot calculate the exact impact of changing the W/V relationship at the expense of breaking the way all of those other letters are sorted. But I don't think it would be a positive one for either of them....
So, absent a new sort (I could not find the standard either, so I couldn't say whether the Finns ever defined it -- just as with the Swedes), there is no way to change one letter's behavior without changing the others.
You may be wondering what the sort keys of these various strings in the above test look like. I know I was interested, at least.
So even if you weren't, you'll have to sit through it, since you are just along for the ride!
First we'll look at the strings in English (en-US), which will be the same as Invariant (the default table), in weight order:
Stt 0e 91 0e 99 0e 99 01 01 12 01 01 00
Stt2 0e 91 0e 99 0e 99 0d 1a 01 01 12 01 01 00
Utt 0e 9f 0e 99 0e 99 01 01 12 01 01 00
Utt2 0e 9f 0e 99 0e 99 0d 1a 01 01 12 01 01 00
Vtt 0e a2 0e 99 0e 99 01 01 12 01 01 00
Vtt2 0e a2 0e 99 0e 99 0d 1a 01 01 12 01 01 00
Wtt 0e a4 0e 99 0e 99 01 01 12 01 01 00
Wtt2 0e a4 0e 99 0e 99 0d 1a 01 01 12 01 01 00
And then we'll look at Finnish (fi-FI), in weight order:
Stt 0e 91 0e 99 0e 99 01 01 12 01 01 00
Stt2 0e 91 0e 99 0e 99 0d 1a 01 01 12 01 01 00
Utt 0e 9f 0e 99 0e 99 01 01 12 01 01 00
Utt2 0e 9f 0e 99 0e 99 0d 1a 01 01 12 01 01 00
Vtt 0e a2 0e 99 0e 99 01 01 12 01 01 00
Wtt 0e a2 0e 99 0e 99 01 03 01 12 01 01 00
Vtt2 0e a2 0e 99 0e 99 0d 1a 01 01 12 01 01 00
Wtt2 0e a2 0e 99 0e 99 0d 1a 01 03 01 12 01 01 00
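And now you can see why "W" can sometimes come before "V": the trailing "2" carries a primary weight (0d 1a), while in Finnish "W" differs from "V" only by a secondary (diacritic) weight, the 03 right after the first 01 separator. Since the primary weights of the whole string are compared before any secondary weight is looked at, "Wtt" sorts before "Vtt2".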
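To check that argument against the two tables above, here is a small C# sketch (the same language as the test code in the first mail). It does not call Windows at all: it copies the sort keys listed above into byte arrays and orders them with a plain left-to-right unsigned byte comparison, which is how sort keys are designed to be compared. The SortKeyDemo and Program names are made up for this example, and real keys from CompareInfo.GetSortKey may differ between Windows and .NET versions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortKeyDemo
{
    // en-US keys from the first table: V (0e a2) and W (0e a4) differ at the primary level.
    public static readonly Dictionary<string, string> English = new Dictionary<string, string>
    {
        { "Stt",  "0e 91 0e 99 0e 99 01 01 12 01 01 00" },
        { "Stt2", "0e 91 0e 99 0e 99 0d 1a 01 01 12 01 01 00" },
        { "Utt",  "0e 9f 0e 99 0e 99 01 01 12 01 01 00" },
        { "Utt2", "0e 9f 0e 99 0e 99 0d 1a 01 01 12 01 01 00" },
        { "Vtt",  "0e a2 0e 99 0e 99 01 01 12 01 01 00" },
        { "Vtt2", "0e a2 0e 99 0e 99 0d 1a 01 01 12 01 01 00" },
        { "Wtt",  "0e a4 0e 99 0e 99 01 01 12 01 01 00" },
        { "Wtt2", "0e a4 0e 99 0e 99 0d 1a 01 01 12 01 01 00" },
    };

    // fi-FI keys from the second table: W shares V's primary weight (0e a2)
    // and differs only by the extra 03 in the secondary (diacritic) section.
    public static readonly Dictionary<string, string> Finnish = new Dictionary<string, string>
    {
        { "Stt",  "0e 91 0e 99 0e 99 01 01 12 01 01 00" },
        { "Stt2", "0e 91 0e 99 0e 99 0d 1a 01 01 12 01 01 00" },
        { "Utt",  "0e 9f 0e 99 0e 99 01 01 12 01 01 00" },
        { "Utt2", "0e 9f 0e 99 0e 99 0d 1a 01 01 12 01 01 00" },
        { "Vtt",  "0e a2 0e 99 0e 99 01 01 12 01 01 00" },
        { "Vtt2", "0e a2 0e 99 0e 99 0d 1a 01 01 12 01 01 00" },
        { "Wtt",  "0e a2 0e 99 0e 99 01 03 01 12 01 01 00" },
        { "Wtt2", "0e a2 0e 99 0e 99 0d 1a 01 03 01 12 01 01 00" },
    };

    // Turns "0e a2 ..." into a byte array.
    private static byte[] Parse(string hex)
    {
        return hex.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                  .Select(b => Convert.ToByte(b, 16))
                  .ToArray();
    }

    // Compares two keys as unsigned bytes from left to right (like memcmp);
    // if one key is a prefix of the other, the shorter one sorts first.
    private static int CompareKeys(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
        {
            if (a[i] != b[i]) return a[i].CompareTo(b[i]);
        }
        return a.Length.CompareTo(b.Length);
    }

    // Returns the strings ordered by their sort keys.
    public static string[] OrderByKey(IEnumerable<KeyValuePair<string, string>> keys)
    {
        return keys.OrderBy(kv => Parse(kv.Value), Comparer<byte[]>.Create(CompareKeys))
                   .Select(kv => kv.Key)
                   .ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine("en-US: " + string.Join(" ", SortKeyDemo.OrderByKey(SortKeyDemo.English)));
        Console.WriteLine("fi-FI: " + string.Join(" ", SortKeyDemo.OrderByKey(SortKeyDemo.Finnish)));
    }
}
```

Running it prints Stt Stt2 Utt Utt2 Vtt Vtt2 Wtt Wtt2 for en-US and Stt Stt2 Utt Utt2 Vtt Wtt Vtt2 Wtt2 for fi-FI, the same two orders reported in the first mail. The deciding comparison is Wtt against Vtt2: the first six bytes match, and then the 01 separator in Wtt is smaller than the 0d that starts the primary weight of "2".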
Now, there are some random points about the installed UI culture (which isn't relevant to this conversation) and about the current culture (which is). But none of those points change the behavior here.
Of course there is one last mystery here, which is explaining why paths in Explorer "seem to be sorting correctly", which (given the above) I would consider incorrect.
But that is something I am unable to reproduce here:
In these screenshots from Windows 7, the Format: language for the first is English (United States) and for the second is Finnish (Finland).
Perhaps the belief (from email #2) that the UI language or installed UI language has some impact on sorting led to some confusion about settings -- neither one ever controls this behavior in Explorer.
So, as I said earlier, the described behavior is pretty much expected. But in the larger context of the Finnish suggestion of a different, potentially desirable "multilingual" sorting behavior and the potential eventual changes in Swedish, where all of this will end up in 5-10 years is completely uncertain.
To me, at least.
As is the exact meaning of the quote in the title, though I can probably divine that a bit more effectively (and no, I do not need to fry anyone in butter first to get the answer!)....
av on 22 Dec 2010 10:56 PM:
Thanks for the answer; I'd have put some more effort into the mails had I known you'd go and publish them in their entirety :-)
The reason I thought that Explorer sorts things correctly was that I just tried one string and it happened to work correctly - whoops!
My preferred sorting would be exactly the alphabet as given on Wikipedia:
A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, Å, Ä, Ö.
Like I said, I don't have the full standard. Wikipedia says only V and W are a special case, and that's also what I remember from school. Depending on the implementation, it could be that Å, Ä, Ö do not sort correctly if the en-US sort is used.
(testing)
OK, you are right: Å, Ä, Ö sort unexpectedly with the en-US setting. Since I don't have Finnish stuff pretty much anywhere on my computer, it's unlikely I would've noticed this problem.
The seemingly obvious solution, in my mind, is to add a new Finnish language setting that reads something like "Finnish (multilingual lists)" and sorts according to the above alphabet minus the W/V special-case rule. Again, depending on technical details, this could be either as trivial as copying some definition file and changing a byte or two, or it could require a lot of changes.
I suppose if this was annoying enough I could just dig in with WinDbg to see where the OS is pulling the rules from and modify them, but TBH this isn't nearly as annoying as the "random point" about UICulture. I know that isn't really directly related, but I used it as a messenger to say "my system is in English, no Finnish stuff beyond what I want, please". That could also be fixed by identifying whether a setup program is running (Vista and 7 already do this to some degree to offer UAC elevation!) and using an appcompat layer to return the English setting to the APIs that setups use to decide the UI should be Finnish on a system that has an English UI. Off topic, I know, but having to switch a ton of apps from Finnish to English just because I have Finnish keyboard/date settings really gets annoying.
So how would I fix BOTH problems? If they are the result of the same source, maybe just "inherit"/copy the OS's English definition and add Å, å, Ä, ä, Ö, ö after Z, z at the end, and apps would hopefully be none the wiser that I now had English settings with a few extra letters for sorting scenarios.
Michael S. Kaplan on 22 Dec 2010 11:14 PM:
No, there is no way to get a 'Finnish multilingual' sort on the current platform. Since (like I said) the Finnish standards folks specifically considered that change to not be needed when their Swedish neighbors suggested it be done, it doesn't sound like they consider it a high priority at the moment....
Which isn't to say it wouldn't be reasonable. But the bar for changes like that is much higher.
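The "Finnish (multilingual lists)" order av describes in the comments (A-Z, then Å, Ä, Ö, with V and W kept apart) cannot be selected on the platform, as the reply above explains, but an application that controls its own sorting could approximate it with a custom comparer. The sketch below is purely hypothetical: the class name MultilingualFinnishComparer is made up, it only knows those 29 letters, and it ignores whatever the actual Finnish standard may say about accents, ü, ø, and the rest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// HYPOTHETICAL: not a Windows or .NET sort. Orders the 29 letters A-Z, Å, Ä, Ö
// with V and W as separate letters; everything else is a simplification.
public sealed class MultilingualFinnishComparer : IComparer<string>
{
    private const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ";

    // Alphabet letters (either case) rank after every other character.
    // Any other character (digits, punctuation, ü, ø, ...) is ranked by its
    // UTF-16 code unit, which is NOT how a real collation would treat it.
    private static int Rank(char c)
    {
        int index = Alphabet.IndexOf(char.ToUpperInvariant(c));
        return index >= 0 ? 0x10000 + index : c;
    }

    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        // First pass: compare letter by letter, ignoring case.
        int n = Math.Min(x.Length, y.Length);
        for (int i = 0; i < n; i++)
        {
            int diff = Rank(x[i]).CompareTo(Rank(y[i]));
            if (diff != 0) return diff;
        }
        if (x.Length != y.Length) return x.Length.CompareTo(y.Length);

        // Strings differ only in case: fall back to ordinal so the order is total
        // (this puts uppercase before lowercase).
        return string.CompareOrdinal(x, y);
    }
}

public static class Program
{
    public static void Main()
    {
        var words = new[] { "Öljy", "Wtt", "Vtt2", "Åbo", "Vtt", "Zeta", "Äiti", "Wtt2" };
        Console.WriteLine(string.Join(" ", words.OrderBy(w => w, new MultilingualFinnishComparer())));
    }
}
```

With this comparer the sample prints Vtt Vtt2 Wtt Wtt2 Zeta Åbo Äiti Öljy: V and W no longer interleave, and Å, Ä, Ö land after Z. A real implementation would also need proper secondary and tertiary levels, which is exactly the kind of work that makes a new platform sort a high bar.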