by Michael S. Kaplan, published on 2008/12/18 10:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/12/18/9234559.aspx
WARNING: This blog has nothing whatsoever¹ to do with Nordic sex.
Regular reader Santhosh Pillai had a question not too long ago that I found to be rather kick ass and cool, professionally speaking.
It was:
Hi,
I am trying to figure out how these characters sort in Danish: A, Å, Æ, B, Ø, Z
- If I follow the Danish and Norwegian Sort Order documentation in MSDN, it should be: A, B, Z, Å, Æ, Ø.
- If I follow Unicode sort order, it will be: A, Å, Æ, B, Ø, Z
- If I follow the order by which they are pronounced, it will be: A, Å, Æ, B, Ø, Z (because Å is Aa, Æ is Ae, and Ø is Oe)
- And if I look at the alphabet list and how they appear in the list, the order will be: A, B, Z, Æ, Ø, Å
Which one is “more correct”?
In the product I work on, here is what happens:
- In "task list", the letters are sorted as: A, B, Z, Æ, Ø, Å
- In "event list", the letters are sorted as: Å, A, B, Z, Æ, Ø
- In "contact list", the letters are sorted as: B, A, Z, Æ, Ø, Å (don’t know where this comes from!)
Thanks
Fascinating!
The MSDN "topic" he was talking about in #1 above is from the appendix from v1 of Developing International Software. That table doesn't appear to be as complete as I once thought it was, and I'll have to remember to consider giving Cathy a hard time about that at some point in the not too distant future! :-)
We know that #2 (the "Unicode" order) is wrong because it is from the default table and we know Danish has some different rules.
And we know that #3 is wrong since it is not how someone in Denmark would pronounce things....
And the #4 list from Wikipedia is tempting, though the format doesn't appeal to me quite as much. I am much more of a glutton for punishment. :-)
Looking at what Windows does, we can go to the protocol documents and excerpt the relevant data from the Active Directory Sort Table they point to, which gives us both the entries in the default table and the ones that control exceptions for Danish:
0x0041 14 2 2 18 ; A Latin Capital Letter A
0x0042 14 9 2 18 ; B Latin Capital Letter B
0x005a 14 169 2 18 ; Z Latin Capital Letter Z
0x00c6 14 172 2 18 ; Æ Latin Capital Letter AE
0x00d8 14 174 2 18 ; Ø Latin Capital Letter O With Stroke
0x00c5 14 177 2 18 ; Å Latin Capital Letter A With Ring Above
And there we have it -- the Wikipedia article is right.
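To make that reading of the table concrete, here is a minimal Python sketch. It only assumes that the four numeric columns in the excerpt above are compared left to right; the `WEIGHTS` dictionary and the `danish_key` helper are illustrative names, not anything Windows exposes.

```python
# Sort weights copied from the excerpt above:
# code point -> the four numeric columns, in the order they appear.
WEIGHTS = {
    0x0041: (14, 2, 2, 18),    # A
    0x0042: (14, 9, 2, 18),    # B
    0x005A: (14, 169, 2, 18),  # Z
    0x00C6: (14, 172, 2, 18),  # Æ
    0x00D8: (14, 174, 2, 18),  # Ø
    0x00C5: (14, 177, 2, 18),  # Å
}


def danish_key(ch: str) -> tuple:
    """Return the weight tuple for one letter (hypothetical helper)."""
    return WEIGHTS[ord(ch)]


# The letters in the order given in the original question.
letters = ["A", "Å", "Æ", "B", "Ø", "Z"]

# Tuples compare left to right; for these six letters only the
# second column differs, so it alone decides the order.
sorted_letters = sorted(letters, key=danish_key)
print(", ".join(sorted_letters))  # A, B, Z, Æ, Ø, Å
```

The sketch covers only the six letters from the question; a real comparison would go through the platform's collation support rather than a hand-copied table.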
And 2/3 of the pieces of the nameless product are doing their own thing, off the reservation. And they definitely aren't doing it like Danes do it!
1 - Sorry if you jumped in, based on the title, thinking I was going to go all totally inappropriate about my last stay in Copenhagen².
2 - It was actually for a trip to Malmo³ with several evenings in Copenhagen⁴.
3 - For the Festival, you see....
4 - This was done primarily⁵ for the cooler hotel on the Denmark side.
5 - Oh, and also the nightlife⁶ in Copenhagen on the non-Malmo nights.
6 - Ah, never mind; I said this blog wouldn't be about Nordic sex! :-)
This blog brought to you by Ø (U+00d8, aka LATIN CAPITAL LETTER O WITH STROKE)
Michael Madsen on 18 Dec 2008 4:41 PM:
...Additionally, Aa is sorted as Å, for historic reasons (we Danes didn't always have Å - it didn't get officially introduced until 1948, and many cities and surnames are still spelled with the double-A). Aardvarks tend to suffer from this. :)
Michael S. Kaplan on 18 Dec 2008 5:06 PM:
Yes, I limited myself to the letters in the original question. The more thorny issue of Aa (which also leads into the disunification with Norwegian) I discuss a bit here and here....
And we should all feel bad for the poor aardvarks! :-)
Jesper Larsen-Ledet on 23 Oct 2013 7:01 AM:
I have a list of names (some Danish) that I would like to group by first letter and I would like "Aage" (Åge) to be placed under Å.
If I sort the list it's easy enough to get him to show up next to Åmund and Åse but how can I get just 'Å' from "Aage"?
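One possible way to approach the grouping question above, as a minimal Python sketch: treat a leading "Aa" as Å when picking the heading letter. The function name `danish_group_letter` and the sample names "Anders" and "Birgit" are made up for illustration, and the rule is deliberately blunt, so it would also file a non-Danish name such as "Aaron" under Å.

```python
def danish_group_letter(name: str) -> str:
    """Return the heading letter for a name, mapping a leading 'Aa' to 'Å'.

    Hypothetical helper: it applies the rule to every name, Danish or not.
    """
    if name[:2].upper() == "AA":
        return "Å"
    return name[:1].upper()


# Sample data; only Aage, Åmund and Åse come from the comment above.
names = ["Aage", "Åmund", "Åse", "Anders", "Birgit"]

# Build heading letter -> list of names, keeping the input order.
groups = {}
for name in names:
    groups.setdefault(danish_group_letter(name), []).append(name)

print(groups)
# {'Å': ['Aage', 'Åmund', 'Åse'], 'A': ['Anders'], 'B': ['Birgit']}
```

This only decides the heading; the order of the headings themselves still has to come from a Danish-aware sort.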