How are the Danes doing it?

by Michael S. Kaplan, published on 2008/12/18 10:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/12/18/9234559.aspx


WARNING: This blog has nothing whatsoever¹ to do with Nordic sex.

Regular reader Santhosh Pillai had a question not too long ago that I found to be rather kick ass and cool, professionally speaking.

It was:

Hi,

I am trying to figure out how these characters sort in Danish: A, Å, Æ, B, Ø, Z

  1. If I follow the Danish and Norwegian Sort Order documentation in MSDN, it should be: A, B, Z, Å, Æ, Ø.
  2. If I follow the Unicode sort order, it will be: A, Å, Æ, B, Ø, Z
  3. If I follow the order by which they are pronounced, it will be: A, Å, Æ, B, Ø, Z (because Å is Aa, Æ is Ae, and Ø is Oe)
  4. And if I look at the alphabet list and how they appear in the list, the order will be: A, B, Z, Æ, Ø, Å


Which one is “more correct”?

In the product I work on, here is what happens:

Thanks

Fascinating!

The MSDN "topic" he was talking about in #1 above is from the appendix from v1 of Developing International Software. That table doesn't appear to be as complete as I once thought it was, and I'll have to remember to consider giving Cathy a hard time about that at some point in the not too distant future! :-)

We know that #2 (the "Unicode" order) is wrong because it is from the default table and we know Danish has some different rules.

And we know that #3 is wrong since it is not how someone in Denmark would pronounce things....

And the #4 list, from Wikipedia, is tempting, though the format doesn't appeal to me quite as much. I am much more of a glutton for punishment. :-)

Looking at what Windows does, we can go to the protocol documents and excerpt the relevant data from the Active Directory Sort Table they point to, which gives us both the entries in the default table and the ones that control exceptions for Danish:

0x0041   14    2    2   18 ; A   Latin Capital Letter A
0x0042   14    9    2   18 ; B   Latin Capital Letter B
0x005a   14  169    2   18 ; Z   Latin Capital Letter Z
0x00c6   14  172    2   18 ; Æ   Latin Capital Letter AE
0x00d8   14  174    2   18 ; Ø   Latin Capital Letter O With Stroke
0x00c5   14  177    2   18 ; Å   Latin Capital Letter A With Ring Above

And there we have it -- the Wikipedia article is right.
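As a rough sketch of what that excerpt is saying (in Python, which is not what Windows uses internally), the four numeric columns for each letter can be treated as a tuple and the letters sorted by it. The names SORT_WEIGHTS and danish_key are made up for illustration, and comparing whole tuples per letter is a simplification that only holds for single-letter strings like the ones in the question; it is not meant to describe the actual Windows comparison algorithm.

```python
# Weights copied from the sort table excerpt in this post.
# Assumption: the four numeric columns can be compared left to right as a tuple.
# That is good enough for single letters, but it is only an illustration.
SORT_WEIGHTS = {
    "A": (14, 2, 2, 18),    # U+0041
    "B": (14, 9, 2, 18),    # U+0042
    "Z": (14, 169, 2, 18),  # U+005A
    "Æ": (14, 172, 2, 18),  # U+00C6
    "Ø": (14, 174, 2, 18),  # U+00D8
    "Å": (14, 177, 2, 18),  # U+00C5
}


def danish_key(letter):
    """Hypothetical helper: look up the weight tuple for one letter."""
    return SORT_WEIGHTS[letter]


letters = ["A", "Å", "Æ", "B", "Ø", "Z"]
danish_order = sorted(letters, key=danish_key)
print(", ".join(danish_order))  # A, B, Z, Æ, Ø, Å
```

Only the second column differs between these six letters, so it alone decides the order here.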

And 2/3 of the pieces of the nameless product are doing their own thing, off the reservation. And they definitely aren't doing it like Danes do it!

 

1 - Sorry if you jumped in, based on the title, thinking I was going to go all totally inappropriate about my last stay in Copenhagen².
2 - It was actually for a trip to Malmo³ with several evenings in Copenhagen.
3 - For the Festival, you see....
4 - This was done primarily⁵ for the cooler hotel on the Denmark side.
5 - Oh, and also the nightlife⁶ in Copenhagen on the non-Malmo nights.

6 - Ah, never mind; I said this blog wouldn't be about Nordic sex! :-)


This blog brought to you by Ø (U+00d8, aka LATIN CAPITAL LETTER O WITH STROKE)


Michael Madsen on 18 Dec 2008 4:41 PM:

...Additionally, Aa is sorted as Å, for historic reasons (we Danes didn't always have Å - it didn't get officially introduced until 1948, and many cities and surnames are still spelled with the double-A). Aardvarks tend to suffer from this. :)

Michael S. Kaplan on 18 Dec 2008 5:06 PM:

Yes, I limited myself to the letters in the original question. The more thorny issue of Aa (which also leads into the disunification with Norwegian) I discuss a bit here and here....

And we should all feel bad for the poor aardvarks! :-)

Jesper Larsen-Ledet on 23 Oct 2013 7:01 AM:

I have a list of names (some Danish) that I would like to group by first letter and I would like "Aage" (Åge) to be placed under Å.

If I sort the list it's easy enough to get him to show up next to Åmund and Åse but how can I get just 'Å' from "Aage"?
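One possible way to approach the grouping question above, as a minimal Python sketch: pick the heading letter yourself, treating a leading "Aa" as Å before falling back to the first character. The function danish_group_letter is a hypothetical helper, not a Windows or .NET API, and it assumes every leading "Aa" in the list should be read as Å, which may not be what you want for names that are not Danish.

```python
def danish_group_letter(name):
    """Hypothetical helper: return the heading letter for a name.

    Assumption: any leading 'Aa' (in any casing) stands for 'Å'.
    """
    stripped = name.strip()
    if not stripped:
        return ""
    if stripped[:2].lower() == "aa":
        return "Å"
    return stripped[0].upper()


names = ["Aage", "Åmund", "Åse", "Anders", "Bent"]

# Bucket the names under their heading letter, keeping input order.
groups = {}
for name in names:
    groups.setdefault(danish_group_letter(name), []).append(name)

for letter, members in groups.items():
    print(letter, members)
```

This only chooses the heading; sorting the names within and across groups would still need a Danish-aware comparison.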

