Unicode and SQL Collations have nothing to do with each other

by Michael S. Kaplan, published on 2006/06/06 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/06/06/618798.aspx


It was just a few days ago when I was channeling comedies of my youth in the post Je, for sure, from Sweden. And in that post, I responded to Alexey Sadomov's query about unusual behavior in the SQL_SwedishStd_Pref_CP1_CI_AS SQL Server compatibility collation where Unicode columns and non-Unicode columns returned different results, and I did have two theories about the behavior:

Uh oh, what happened? Why does sorting by the non-Unicode column break the Swedish/Finnish sort behavior for the SQL compatibility collation?

Or perhaps it never worked properly in prior versions and the bug is that the newer support of Unicode is where the incompatibility is coming from?

Between the two theories, the second seems a bit more likely to me....

Well, I suppose we could file this one under the heading of truth being stranger than my crackpot theories, because the answer is simple:

For Unicode, we always use NLS code, including SQL collations. So we do get inconsistent sorting results for varchar vs. nvarchar in SQL collations. That’s why we recommend users to use Windows collation for consistency. Though the performance difference prevents us from always recommending it.

The sort table was inherited long time ago from Sybase. I have no idea why particular characters are sorted in a particular way. (Read: nobody here knows)

If nothing else, I now have a good question to ask Ken Whistler of Sybase about. :-)

In the end, it is simple -- if you are storing your data as Unicode, then there really is no good reason to be using one of these old collations. Not even for reasons of  compatibility since the sort will not be compatible with the old sort anyway.

So take those way too retro SQL compatibility collations and dump 'em, today. They really are losing any semblance of useful purpose....

 

This post brought to you by (U+0c8c, a.k.a. KANNADA LETTER VOCALIC L)


no comments

Please consider a donation to keep this archive running, maintained and free of advertising.
Donate €20 or more to receive an offline copy of the whole archive including all images.

referenced by

2009/02/18 In search of the Swedish Tipping Point....

2008/11/11 Trying to ignore the small stuff is harder, if you're Arabic

2008/05/23 SQL vs. Windows collations: the tricycle versus the Porsche

2007/10/25 Not all in sync quite yet (aka SQL and the CLR and Windows and .NET)

2007/10/10 Collations (SQL Compat. vs. Windows) -- not so subtle in the difference department

2006/12/07 SQL and the CLR: Part 1 (the things we can make work well)

2006/10/29 SQL Server: compatibility collations vs. Window collations

2006/06/07 Performance issues with language specific sorts?

go to newer or older post, or back to index or month or day