The disunification of Norwegian and Danish sorting

by Michael S. Kaplan, published on 2006/04/27 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/04/27/584439.aspx


A few days ago, I wrote a post entitled Why do we call w 'double u' -- doesn't it look more like a 'double v' ?

The post itself had nothing to do with the title; it was actually about the Swedish Academy changes to consider 'w' and 'v' to be separate letters, as well as some of the potential consequences for Microsoft products if such a change is reflected in actual customer usage that needs to be captured in the future.

(The title was actually a little experiment to see whether a catchy title would get more attention in comments than the substance of the post; you can see for yourself what that experiment proved.)

Anyway, in the process of my speculation on products and what they might do in the future, I forgot that Vista already has such a change, in a country that is quite near to Sweden and Finland!

It has to do with the use of "aa" in Danish and Norwegian.

Both languages have the same basic alphabet -- the 26 Latin letters used in English and most places, plus æ, ø, and å. In earlier days, though, the letters aa were used instead of å (in fact, in Danish this letter was not added until a spelling reform in the late 1940s, by which time it was already widely used in both Norwegian and Swedish).

Until now, the Danish and Norwegian sort in Microsoft products has sorted aa specially, as a unique letter after z that is basically equal to å.

But while å was a (relatively) recent addition for Danish, it has been in Norwegian for much longer. And feedback had come in from customers such as the following:

If you take a Norwegian dictionary you would in fact find the German town Aachen as one of the first entries under the letter A, but Explorer will put it at the end after the letter Z.

As I pointed out earlier Aa is never interpreted as Å for anything but family names dating pre-1917, and even then it is not uncommon in Norway to sort those names as double A. A person with the name Aalberg might frown to see his name listed at the top, but would be far from surprised. However a person searching for the town Aachen would not understand why Explorer put it at the end.

Feedback such as this led to some more investigation, and the final assessment was that the time had come to remove the aa entries from the compression tables for Norwegian.

"But Michael," you might be asking, "What about Danish?"

Well, the answer there is that it is still too common for people to expect aa to be treated as a letter -- and there are way too many textbooks and websites that put an extra (aa) at the end of the alphabet. Danish could not make the same change that Norwegian needed.

So, in Vista, the Norwegian tables have had the three variants of the aa compression removed, while the Danish tables (and also the Greenlandic, which uses the same sort as Denmark) have not. Therefore, these formerly unified sorts will now be disunified.
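To make the difference concrete, here is a rough sketch in Python of what "aa compression" does to a sort. It is a toy model only, not the actual Windows tables: the alphabet string, the function names, and the guess that the three variants are aa, Aa, and AA are all assumptions made for illustration.

```python
# Toy model only: NOT the real Windows sorting tables.
# Assumed alphabet: the 26 Latin letters plus æ, ø, å at the end.
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

# Assumption: the "three variants" are these three casings.
AA_VARIANTS = ("aa", "Aa", "AA")


def sort_key(word, compress_aa):
    """Build a list of letter ranks; optionally weigh 'aa' the same as 'å'."""
    key = []
    i = 0
    while i < len(word):
        if compress_aa and word[i:i + 2] in AA_VARIANTS:
            key.append(RANK["å"])  # two characters, one sort element
            i += 2
        else:
            # Unknown characters sort after the whole alphabet in this toy.
            key.append(RANK.get(word[i].lower(), len(ALPHABET)))
            i += 1
    return key


def danish_sort(words):
    """Hypothetical stand-in for the Danish sort (aa compression kept)."""
    return sorted(words, key=lambda w: sort_key(w, compress_aa=True))


def norwegian_vista_sort(words):
    """Hypothetical stand-in for the Vista Norwegian sort (compression removed)."""
    return sorted(words, key=lambda w: sort_key(w, compress_aa=False))


words = ["Zebra", "Aachen", "Berlin", "Ålesund", "Oslo"]
print(danish_sort(words))
# ['Berlin', 'Oslo', 'Zebra', 'Aachen', 'Ålesund']
print(norwegian_vista_sort(words))
# ['Aachen', 'Berlin', 'Oslo', 'Zebra', 'Ålesund']
```

With the compression in place, Aachen lands after Zebra, which is exactly the customer complaint quoted above; without it, Aachen goes back to the top under A while Ålesund stays at the end.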

That theoretical question I posed the other day has become decidedly non-theoretical! :-)

You may be filled with one or more of the following questions:

  1. What does this mean for prior versions of Windows and the .NET Framework?
  2. What does this mean for Microsoft Access (7.0 - 11.0) and their Norwegian/Danish sorts?
  3. What does this mean for DAO 3.5/3.6 and their dbLangNorwDan/dbSortNorwdan enumeration members?
  4. What does it mean for the Norwegian/Danish sort in SQL Server 7.0?
  5. What does it mean for the DANISH_NORWEGIAN_* collations in SQL Server 2000 and 2005?
  6. What does this mean for Windows Vista?
  7. What does it mean for future versions of Access, Jet, and SQL Server?
  8. What does it mean for WinFS?
  9. What does it mean for future versions of the .NET Framework?

The answer for questions 1-5 is simple -- not a bleeding thing. We can't change those prior version results.

The answer for question 6 is also simple -- it is going to be changed. The two sorts with the two different LCIDs (0x0406 versus 0x0414/0x0814) and different names (da-DK versus nb-NO/nn-NO) will return two different results.

The answer for questions 7-9 is also simple (though that may change at some point!) -- I do not have a freaking clue. But you can bet your lunch money that I will be asking people some questions about this issue for WinFS and for the next version of SQL Server and for the upcoming version of Access that ships with Office 12.

 

This post brought to you by "ø" (U+00f8, a.k.a. LATIN SMALL LETTER O WITH STROKE)


# charless on 27 Apr 2006 10:36 AM:

So, I have been reading your blog off and on for a while. This post did prompt one additional question for me though. Why was it not sponsored by the letter å? :^)

# Michael S. Kaplan on 27 Apr 2006 11:00 AM:

Well, å has already been getting a lot of ad time in other posts, so I figured ø deserved a little time, too. :-)

referenced by

2010/03/15 Thus the problems resist solution, and the workarounds are often inadequate

2010/03/03 Nordic duck duck goose -- Bokmål, Bokmål, Bokmål, Nynorsk!

2009/02/18 In search of the Swedish Tipping Point....

2008/03/27 The disunification of Norwegian and Danish sorting ( SQL Server 2008 Edition!)

2007/10/25 Not all in sync quite yet (aka SQL and the CLR and Windows and .NET)

2007/09/16 A&P of Sort Keys, part 6 (aka Relax, be calm, and deCOMPRESS if you are feeling out of sorts)

2007/07/31 See that version there? It is going down, man! #2 (aka Everybody WYNNs)

2007/06/18 If you don't always preserve case, you don't always preserve meaning

2006/08/26 The myth of cross-product compatibility
