by Michael S. Kaplan, published on 2006/10/29 15:47 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/10/29/897317.aspx
The other day when I talked about When collations collide, John Ingres commented:
We've been looking at the implications of moving our database from
SQL_Latin1_General_CP1_xxxx
to
Latin1_General_xxx
since it is the recommended practice to use Windows collations instead of the legacy SQL collation but this is a large change affecting more than 30 applications and over a hundred production systems and I have been investigating differences between those two collations and information is extremely scarce. Is there a source of information with detailed information? Of course, we will test exhaustively but subtle differences in sort order for example are not always easily apparent.
thanks
John
There is really no single source of info John, but as the question becomes more and more common for people to ask, it will become more and more important for the folks on the SQL Server team to provide some of those answers (and to provide more consistent messaging around the right collation to use).
(That the defaults will also need to change for those times that the SQL Server product itself is using for its server default collation goes without saying!)
In the meantime, I have posted here about some of the real problems with the legacy SQL_* collations in posts like these:
and others as well. I mean, there is room for improvement in the Windows collations too (and there are posts that point out some of the issues there) but they are all much easier to deal with than the ones embedded in the SQL compatibility collations, which even at their best are just a mess that will never be changed or addressed or fixed.
Some of these posts focus on the international support of these legacy items, which ranges from lame to meager. But other posts focus more on a lot of the consistency problems with these collations, which simply don't match user expectations. They are definitely something to really think about getting rid of, if you can....
But I don't want to let the SQL Server team off the hook here -- they really need to provide more information on the differences here if they want people to migrate. Their solution is more compatible with prior versions than the one used in the Unicode changes from Jet Red 3.5 to 4.0 (they simply moved from all strings being non-Unicode to all strings being Unicode, period, and I can recall a conversation I had with Ken Whistler a few year back where I pointed this out and he was even more horrifed than I was!).
But some effort to underscore the benefits and especially the migration issues and differences to expect is really the least they can do at this point (in fact I'd say this a long overdue work item).
This post brought to you by ር (U+122d, a.k.a. ETHIOPIC SYLLABLE RE)
# John Ingres on 29 Oct 2006 6:41 PM:
Thanks for answering!
# John Ingres on 12 Nov 2006 6:27 PM:
Thanks for your comments and links!
BTW, while testing some databases converted to windows collation I found an undocumented scary thing; obvious if you think about it for more than 1 second but much too easy to forget:
Changing the db collation affect aggregates used with TOP clauses. TOP is always affected by the ORDER BY, which in turn is affected by the collation. Add a MIN(str_column) or MAX(str_column) to the mix and you have a nasty hard to detect bug.
referenced by
2008/11/11 Trying to ignore the small stuff is harder, if you're Arabic
2008/05/23 SQL vs. Windows collations: the tricycle versus the Porsche
2007/10/25 Not all in sync quite yet (aka SQL and the CLR and Windows and .NET)
2007/10/10 Collations (SQL Compat. vs. Windows) -- not so subtle in the difference department
2006/12/07 SQL and the CLR: Part 1 (the things we can make work well)