Not all SQL Server collations are created equal

by Michael S. Kaplan, published on 2005/05/15 12:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/05/15/417601.aspx


Not all of the collations in SQL Server 2005 (a.k.a. Yukon) and earlier versions have coverage over the same full sets of Unicode code points.

(By the way, this topic will be covered during my TechEd sessions, particularly DAT290: Databases for the World: Designing Multilingual Databases Using SQL Server 2005, along with several of the other posts I have done lately like this one, this one, and this other one....)

There are basically four different "families" of collations in SQL Server:

This can become an issue if you have data that is using a collation in one of these families and then try to run a query on the data that attempts to use a collation in one of the other families. You may find data being ignored in one that is recognized in the other.

Because of this fact, it is crucial to make sure that you do not arbitrarily move between collations in these different families without working through the potential consequences of the move.

Now note that this is not a bug -- if one of the older collations were to have code points added to it between versions and nothing is re-inexed, then any upgrade would return different results than the indexes, which can lead to index corruption and data inonsistencies. Now that would be a bug!

And it is certainly better than the 'safe' alternatives -- (a) forcing everyone to reindex on upgrade or (b) never adding support for new languages. Both of those alternatives would bw pretty bad in my opinion, too.

But it does mean you have to plan out what you want to do when it comes to changing collations....

Another time, I will give some examples of these "gotcha" cases (including talking about the non-Unicode scenarios).

 

This post brought to you by "ޓ" (U+0793, a.k.a. THAANA LETTER TAVIYANI)


no comments

referenced by

2006/05/25 SQL Server's cross-version collation support

2005/05/18 No data loss (as long as it is Unicode data)

go to newer or older post, or back to index or month or day