In search of the Swedish Tipping Point....

by Michael S. Kaplan, published on 2009/02/18 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2009/02/18/9430335.aspx


By necessity, my blogs are often about something on the micro scale -- one customer report, one phenomenon that interests me, one event, one bug, one concert, one wheelchair, one function. Even the occasional groupings of these things are quite small.

And then there are trends. Thus Why do we call w 'double u' -- doesn't it look more like a 'double v'?, where I talked about the Swedish Academy's change to the way the letters W and V were to be handled in collation, and the impact on Microsoft software when this change eventually makes it to the point where it needs to be integrated and that one day this "theoretical" issue that is a punch line in a blog post from Raymond or I would have far reaching design consequences led to The disunification of Norwegian and Danish sorting a few days later where I noted a "Nordic" scenario where it was happening already not so far from Sweden. The follow-up on this theoretical scenario turning real and being fixed in Vista then saw its culmination to the fix for SQL Server in The disunification of Norwegian and Danish sorting ( SQL Server 2008 Edition!).

Meanwhile, back in Malmo (a place in Sweden that I have visited several times over the years, for the festival)....

Several years prior to the fix in SQL Server, in Unicode and SQL Collations have nothing to do with each other, I pointed out to a customer who was confused about how the SQL_SwedishStd_Pref_CP1_CI_AS collation returned different results for Unicode and non-Unicode columns because Unicode columns go through the Windows collations, always.

Note how the assumption was that the Unicode column and thus the Windows collation behavior was correct.

Now, less than three years later, another customer reports:

Thanks for all the research on this issue, we really appreciate it! As I see it ‘v’ and ‘w’ should not be treated the same in the Swedish language and as was pointed out in the article http://blogs.msdn.com/michkap/archive/2006/04/25/583307.aspx referred below and subsidiary in http://www.saol.se/saol13_pres.html (written in Swedish from the Swedish Academy) the two letters ‘v’ and ‘w’ should be distinct letters and not treated as equally in e.g. sorting. That the Finnish government does not seem to have changed there meaning must mean that Sweden and Finland can not share the same collation in SQL Server. I must say that it is really strange that this has not been corrected for so long.

As we see it this behaviour should be changed to the right one as soon as possible through the use of e.g. a specific Swedish collation that is different from the Finnish one. What is the process to get this to work? Does that go through Microsoft Sweden office or Connect? I guess that this is not changed over a night, so that we will have to live with this for a while, but I should definitely recommend a change. Can you help with this?

I have forwarded the information on to the appropriate owners, so this first customer report of an assumption that the suggested change has been duly noted by the people who need to know about it.

But the zeroeth customer report (to use the zero-based counting system that I recall seeing in elevators (lifts) in Sweden!) is of course not the tipping point for determining when the change is most appropriate to make -- so there will obviousl need to be some research to determine when would be the best release in which to make the change. And when it has been long enough.

This will require more than the "micro " report that made its way to this blog.

Though of course one the change is made, the fact that there is a mitigation for those not read for the change -- the fact that the Finns look like they are not changing the same way -- should ease the pain a bit! :-)

In the meantime, the customer asked if there was a workaround, a wa to get the newer behavior sooner.

I found one, reall the only one I could think of.

A way to make a letter that is not a V and that has a unique alphabetic weight that could masquerade like a "new Swedish collation style W".

It starts with , aka U+2123 (VERSICLE).

It has a unique alphabetic weight just after V but before W.

In fact, it has always had such a weight, since as far back as NT 3.1!

Now there is no lowercase version (only an uppercase one), but if one built a calculated column that replaced all instance of both W and w with then indexing on that calculated column will allow every case-insensitive Swedish Windows-style collation in SQL Server to return the expected results, and every case-sensitive Swedish Windows-style collation in SQL Server to return almost the expected results.

For completeness, replacing Ŵ and ŵ (U+0174 and U+0175, aka CAPITAL and SMALL LATIN LETTER W WITH CIRCUMFLEX) with plus some diacritic (like U+0302 -- COMBINING CIRCUMFLEX ACCENT) would handle the other "W-style" letter moved by Swedish/Finnish today....

So here's to the lookout for that tipping point!

 

This blog brought to you by(U+2123, aka VERSICLE)


Rene on 17 Mar 2009 10:56 PM:

Hello Michael,

A little off topic here! I spet practically the whole day trying to find the Windows default casing/collation tables that you talk about frequently on you posts.

Where are these tables? Are they located in some file in the OS?  Are they in some place in the registry?

Please heeeeeeeeeeeeeelp me.

Thanks

Rune on 6 Sep 2013 1:09 PM:

So Michael, whatever happened to this issue?

We have an existing system used by public libraries in Sweden. Currently we use a DBMS from a different vendor, but we are evaluating MS SQL Server. However, the outdated Swedish collation turns out to be a royal pain in the posterior.

You see, some of our customers run libraries for schools. They need to separate grades. Each grade can be divided so that there are no more than twentyish students in the classroom. I.e. "Class 1A", "1B", "1C" and so on...all the way to "1V"..."1W"... oh wait... I can't put an index on this field, because the string "Class 1V" is identical to "Class 1W". Ooops.

So we tried to specify Latin1 as the collation for this particular column (and a handful other columns that need to be unique). Which leaves me dealing with lots of horrible error messages from MSSQL saying "Cannot resolve collation conflict between "Latin1_General_CI_AS" and "Finnish_Swedish_100_CI_AI" in add operator occurring in SELECT statement". It gets even worse once I blend Entity Framework into the mix.

Basically, from our POV, Windows' Swedish collation is broken and unusable. (No worries mate, PostgreSQL and MySQL has virtually non-existing collation support in comparision, so we long abandoned them as potential candidates).


referenced by

2010/03/15 Thus the problems resist solution, and the workarounds are often inadequate

go to newer or older post, or back to index or month or day