In search of the Swedish Tipping Point....

by Michael S. Kaplan, published on 2009/02/18 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2009/02/18/9430335.aspx

By necessity, my blogs are often about something on the micro scale -- one customer report, one phenomenon that interests me, one event, one bug, one concert, one wheelchair, one function. Even the occasional groupings of these things are quite small.

And then there are trends. In Why do we call w 'double u' -- doesn't it look more like a 'double v'?, I talked about the Swedish Academy's change to the way the letters W and V were to be handled in collation, and about the impact on Microsoft software once this change reaches the point where it needs to be integrated: one day this "theoretical" issue, a punch line in a blog post from Raymond or me, would have far-reaching design consequences. That led a few days later to The disunification of Norwegian and Danish sorting, where I noted a "Nordic" scenario in which this was already happening, not so far from Sweden. The follow-up on that theoretical scenario turning real and being fixed in Vista then culminated in the fix for SQL Server, covered in The disunification of Norwegian and Danish sorting ( SQL Server 2008 Edition!).

Meanwhile, back in Malmo (a place in Sweden that I have visited several times over the years, for the festival)....

Several years prior to the fix in SQL Server, in Unicode and SQL Collations have nothing to do with each other, I explained to a customer who was confused about why the SQL_SwedishStd_Pref_CP1_CI_AS collation returned different results for Unicode and non-Unicode columns that Unicode columns go through the Windows collations, always.

Note how the assumption was that the Unicode column and thus the Windows collation behavior was correct.

I have forwarded the information on to the appropriate owners, so this first customer report, one that assumes the suggested change, has been duly noted by the people who need to know about it.

But the zeroth customer report (to use the zero-based counting system that I recall seeing in elevators (lifts) in Sweden!) is of course not the tipping point for determining when the change is most appropriate to make -- so there will obviously need to be some research to determine which release would be the best one in which to make the change. And when it has been long enough.

Though of course once the change is made, the fact that there is a mitigation for those not ready for the change -- the fact that the Finns look like they are not changing the same way -- should ease the pain a bit! :-)

In the meantime, the customer asked if there was a workaround, a way to get the newer behavior sooner.

A way to make a letter that is not a V and that has a unique alphabetic weight, one that could masquerade as a "new Swedish collation style W".

Now there is no lowercase version (only an uppercase one), but if one built a calculated column that replaced all instances of both W and w with ℣, then indexing on that calculated column will allow every case-insensitive Swedish Windows-style collation in SQL Server to return the expected results, and every case-sensitive Swedish Windows-style collation in SQL Server to return almost the expected results.

For completeness, replacing Ŵ and ŵ (U+0174 and U+0175, aka CAPITAL and SMALL LATIN LETTER W WITH CIRCUMFLEX) with ℣ plus some diacritic (like U+0302 -- COMBINING CIRCUMFLEX ACCENT) would handle the other "W-style" letter moved by Swedish/Finnish today....
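The string mapping behind that calculated column can be sketched in a few lines. This is only an illustration of the substitution itself, written in Python rather than as a SQL Server column definition; the function name `swedish_w_sort_key` and the constant names are hypothetical, and the sketch assumes ℣ is U+2123 (VERSICLE).

```python
# Sketch of the W -> ℣ substitution described above (names are hypothetical).
VERSICLE = "\u2123"              # ℣, assumed stand-in for a "new style" W
COMBINING_CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT

# Map each "W-style" letter to its replacement. There is no lowercase ℣,
# so both cases collapse to the same character.
_W_MAP = str.maketrans({
    "W": VERSICLE,
    "w": VERSICLE,
    "\u0174": VERSICLE + COMBINING_CIRCUMFLEX,  # Ŵ
    "\u0175": VERSICLE + COMBINING_CIRCUMFLEX,  # ŵ
})


def swedish_w_sort_key(text: str) -> str:
    """Return the value one would store in the calculated column."""
    return text.translate(_W_MAP)


if __name__ == "__main__":
    for s in ["Class 1V", "Class 1W", "Class 1w"]:
        print(s, "->", swedish_w_sort_key(s))
```

Because case is lost in the replacement, this matches the caveat above: a case-sensitive collation can only get "almost" the expected results from such a column. A decomposed ŵ (w followed by U+0302) ends up as the same two code points as the precomposed letter, since only the base letter is swapped.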

Hello Michael,

A little off topic here! I spent practically the whole day trying to find the Windows default casing/collation tables that you talk about frequently in your posts.

Where are these tables? Are they located in some file in the OS? Are they in some place in the registry?

Please heeeeeeeeeeeeeelp me.

Thanks

So Michael, whatever happened to this issue?

We have an existing system used by public libraries in Sweden. Currently we use a DBMS from a different vendor, but we are evaluating MS SQL Server. However, the outdated Swedish collation turns out to be a royal pain in the posterior.

You see, some of our customers run libraries for schools. They need to separate grades. Each grade can be divided so that there are no more than twentyish students in the classroom. I.e. "Class 1A", "1B", "1C" and so on...all the way to "1V"..."1W"... oh wait... I can't put an index on this field, because the string "Class 1V" is identical to "Class 1W". Ooops.

So we tried to specify Latin1 as the collation for this particular column (and a handful of other columns that need to be unique). Which leaves me dealing with lots of horrible error messages from MSSQL saying "Cannot resolve collation conflict between "Latin1_General_CI_AS" and "Finnish_Swedish_100_CI_AI" in add operator occurring in SELECT statement". It gets even worse once I blend Entity Framework into the mix.

Basically, from our POV, Windows' Swedish collation is broken and unusable. (No worries mate, PostgreSQL and MySQL have virtually non-existent collation support in comparison, so we long ago abandoned them as potential candidates).