SQL Compatibility collations are a bit too retro for me

by Michael S. Kaplan, published on 2005/11/08 07:59 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/11/08/490305.aspx


Now when I say that 'SQL Compatibility collations are a bit too retro for me' I mean retro in the sense of retrograde....

Last month, Richard asked me:

Still working on our implementation of this and having problems with the sort of collation SQL_Latin1_General_CP1_CI_AS which I would have thought had pretty perfect implementation being the default etc.

When there are apostrophes in the data for example...

O'Connor
O'Connor
O'Halloran Hill
O'Sullivans Beach
Oak Flats
Oak Park
Oakbank
Oakdale
Oakden
Oakford
Oakhurst
Oaklands Junction
Oaklands Park
Oakleigh
Oakleigh South
Oakville
Oatlands
Oatlands
Oatley
Ocean Grove
Ocean Reef
Officer

My CompareOptions are 0x00030001 from the SQL Server which I convert to 0x19 for the

CompareInfo.Compare(a,b,m_PrevCollation.CompareOptions)

So what is the best way for me to compare SQL_ collations in .NET ? I seem to have stumbled across another inconsistency in the .NET string ordering in that case :)
 
a
a'
A
 
using Latin1_General_CS_AS collation gets things in a different order to SQL Server.

Right the problem we have is that people want to compare their databases that still use the SQL_ collation so we have to have a function that will sort them in as close to the SQL_ order as possible.

I understand about the punctuation issue it's just it seems to be sorted incorrectly in .NET whereas it's sorted like that in SQL Server.

Am I missing something or will I have to develop a work around for this case?

It is funny how there is no specific rule in English about how the sort is supposed to happen, and further that any time an organization or government tries to make a certain order "correct" the order chosen is as arbitrary as any other, yet we can latch onto one specific ordering as being the "correct" one. :-)

In this case, the SQL compatibility collations in SQL Server are not the default because they are any more correct or proper; they are the default for one simple reason: if you cover less territory you can be faster. Back before SQL Server supported Unicode, when the collation setting was at the server level only could not be changed without a reinstall, it was easier to be faster.

But that does not make the order it returned any better or worse than anything else ever created. And it is certainly not going to be 100% re-creatable anywhere other than SQL Server, although you can come close when you play with the various ComoareOptions flags. However, the meaning of the 0x00030001 as indicated in the information in String.Compare is for sissies (not for people who want SQLCLR consistency) would indicate that the meaning was:

NORM_IGNORECASE | NORMIGNOREKANATYPE | NORM_IGNOREWIDTH

This does equate to 0x19 in .NET, or

CompareOptions.IgnoreCase | CompareOptions.IgnoreKanaType | CompareOptions.IgnoreWidth

when it is clear that there is also a bit of SORT_STRINGSORT/CompareOptions.StringSort in there as well.

One could consider this a bug in the ComparisonStyle returned by the SQL compatibility collations. Though I am not sure if all of them show this behavior, to tell you the truth.

The best answer is to stop using those SQL compatibility collations! Because not even using tricks like the ComparisonStyle can give you 100% answers there....

 

This post brought to you by "a" (U+0061, a.k.a. LATIN SMALL LETTER A)


# Michael S. Kaplan on 8 Nov 2005 8:12 AM:

For more info on SORT_STRINGSORT, see http://blogs.msdn.com/356280.aspx for the explanation....

referenced by

2008/11/11 Trying to ignore the small stuff is harder, if you're Arabic

2008/05/23 SQL vs. Windows collations: the tricycle versus the Porsche

2007/10/25 Not all in sync quite yet (aka SQL and the CLR and Windows and .NET)

2007/10/10 Collations (SQL Compat. vs. Windows) -- not so subtle in the difference department

2006/12/07 SQL and the CLR: Part 1 (the things we can make work well)

2006/10/29 SQL Server: compatibility collations vs. Window collations

2006/06/06 Unicode and SQL Collations have nothing to do with each other

2006/06/02 Je, for sure, from Sweden.

go to newer or older post, or back to index or month or day