by Michael S. Kaplan, published on 2005/06/09 11:25 -07:00, original URI: http://blogs.msdn.com/michkap/archive/2005/06/09/427293.aspx
Yesterday at the TechEd booth in which I was sitting, someone was asking me how to get the comparisons in the .NET Framework to be consistent with the ones in SQL Server 2005.
This is a lot harder than it has to be, unfortunately!
First we will start with the constants used by Windows from winnls.h in the Platform SDK:
#define NORM_IGNORECASE 0x00000001 // ignore case
#define NORM_IGNORENONSPACE 0x00000002 // ignore nonspacing chars
#define NORM_IGNORESYMBOLS 0x00000004 // ignore symbols
#define NORM_IGNOREKANATYPE 0x00010000 // ignore kanatype
#define NORM_IGNOREWIDTH 0x00020000 // ignore width
#define SORT_STRINGSORT 0x00001000 // use string sort method
These definitions almost match the ones used by the .NET Framework, in the CompareOptions enumeration:
CompareOptions.IgnoreCase 1 (0x00000001)
CompareOptions.IgnoreKanaType 8 (0x00000008)
CompareOptions.IgnoreNonSpace 2 (0x00000002)
CompareOptions.IgnoreSymbols 4 (0x00000004)
CompareOptions.IgnoreWidth 16 (0x00000010)
CompareOptions.None 0 (0x00000000)
CompareOptions.Ordinal 1073741824 (0x40000000)
CompareOptions.StringSort 536870912 (0x20000000)
You can see where the differences are: NORM_IGNOREWIDTH/CompareOptions.IgnoreWidth, NORM_IGNOREKANATYPE/CompareOptions.IgnoreKanaType, and SORT_STRINGSORT/CompareOptions.StringSort. Let us keep these two differences in mind, they will become important soon.
In both Shiloh (SQL Server 2000) and Yukon (SQL Server 2005), the very useful COLLATIONPROPERTY has two very cool attributes you can grab:
Thus if I run the following query:
SELECT
name,
COLLATIONPROPERTY(name, 'CodePage') as CodePage,
COLLATIONPROPERTY(name, 'LCID') as LCID,
COLLATIONPROPERTY(name, 'ComparisonStyle') as ComparisonStyle,
description
FROM ::fn_helpcollations()
You get back a nice big table that has the exact information for the code page used in WideCharToMultiByte/MultiByteToWideChar calls, and the LCID/dwCmpFlags values to use in a CompareString call (excerpt here, I do not want top duplicate the whole 1029-row table Yukon returns!):
Collation name CP LCID Flags Description
Albanian_BIN 1250 1052 0 Albanian, binary sort
Albanian_BIN2 1250 1052 0 Albanian, binary code point comparison sort
Albanian_CI_AI 1250 1052 196611 Albanian, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Albanian_CI_AI_WS 1250 1052 65539 Albanian, case-insensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Albanian_CI_AI_KS 1250 1052 131075 Albanian, case-insensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Albanian_CI_AI_KS_WS 1250 1052 3 Albanian, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Albanian_CI_AS 1250 1052 196609 Albanian, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Albanian_CI_AS_WS 1250 1052 65537 Albanian, case-insensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Albanian_CI_AS_KS 1250 1052 131073 Albanian, case-insensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Albanian_CI_AS_KS_WS 1250 1052 1 Albanian, case-insensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Albanian_CS_AI 1250 1052 196610 Albanian, case-sensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Albanian_CS_AI_WS 1250 1052 65538 Albanian, case-sensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Albanian_CS_AI_KS 1250 1052 131074 Albanian, case-sensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Albanian_CS_AI_KS_WS 1250 1052 2 Albanian, case-sensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Albanian_CS_AS 1250 1052 196608 Albanian, case-sensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Albanian_CS_AS_WS 1250 1052 65536 Albanian, case-sensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Albanian_CS_AS_KS 1250 1052 131072 Albanian, case-sensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Albanian_CS_AS_KS_WS 1250 1052 0 Albanian, case-sensitive, accent-sensitive, kanatype-sensitive, width-sensitive
I could not find an easy T-SQL way to make the flag values show up as hexidecimal. But the mappings would be:
Collation name Flags
Albanian_BIN 0x00000000
Albanian_BIN2 0x00000000
Albanian_CI_AI 0x00030003
Albanian_CI_AI_WS 0x00010003
Albanian_CI_AI_KS 0x00020003
Albanian_CI_AI_KS_WS 0x00000003
Albanian_CI_AS 0x00030001
Albanian_CI_AS_WS 0x00010001
Albanian_CI_AS_KS 0x00020001
Albanian_CI_AS_KS_WS 0x00000001
Albanian_CS_AI 0x00030002
Albanian_CS_AI_WS 0x00010002
Albanian_CS_AI_KS 0x00020002
Albanian_CS_AI_KS_WS 0x00000002
Albanian_CS_AS 0x00030000
Albanian_CS_AS_WS 0x00010000
Albanian_CS_AS_KS 0x00020000
Albanian_CS_AS_KS_WS 0x00000000
Updated query to give the hex values for LCID and ComparisonStyle directly (thank you, James Todd!):
SELECT
name,
COLLATIONPROPERTY(name, 'CodePage') as CodePage,
CONVERT(binary(4), COLLATIONPROPERTY(name, 'LCID')) as LCID,
CONVERT(binary(4), COLLATIONPROPERTY(name, 'ComparisonStyle')) as ComparisonStyle,
description
FROM ::fn_helpcollations()
So, if you take that flag value, convert all of the 0x00020000 to 0x00000010 (and 0x00010000 to 0x00000008!), you then have a value you can plug into the CompareInfo.Compare method.
But none of this will work with String.Compare!
If you want behavior between SQL Server and the CLR to have any chance of parity, you must use the methods off of CompareInfo.
Parity will seldom be 100%, because all of the SQL Server 2000 collations are based on the tables that shipped with Windows 2000 (just after Beta 1, before Beta 2), and Whidbey actually uses the same tables as Windows Server 2003. You will get much closer with the *_90 collations that were added in Yukon, though.
A lot of dancing around and converting and munging to get parity -- too much, I think. There is definitely room for people to improve the experience here in future releases!
This post brought to you by "Ǿ" (U+01fe, a.k.a. LATIN CAPITAL LETTER O WITH STROKE AND ACUTE)
# James Todd on Thursday, June 09, 2005 1:21 PM:
# Maurits on Thursday, June 09, 2005 3:37 PM:
# Michael S. Kaplan on Thursday, June 09, 2005 4:12 PM:
# James Todd on Thursday, June 09, 2005 8:04 PM:
# Maurits on Friday, June 10, 2005 12:47 PM:
# Michael S. Kaplan on Friday, June 10, 2005 4:43 PM:
# Richard Mitchell on Wednesday, June 29, 2005 5:39 AM:
# Michael S. Kaplan on Wednesday, June 29, 2005 7:41 AM:
# Anonymous on Tuesday, July 05, 2005 12:51 PM:
referenced by
2007/10/25 Not all in sync quite yet (aka SQL and the CLR and Windows and .NET)
2006/08/26 The myth of cross-product compatibility
2006/03/18 Just in case you thought I was blathering. :-)
2005/11/29 SQL Server's cultural sensitivities
2005/11/08 SQL Compatibility collations are a bit too retro for me
2005/11/07 SQL Server and the CLR have different ways of getting the money
2005/10/09 Extending collation support in SQL Server and Jet, Part 3 (THAT CLASS)
2005/09/18 Extending collation support in SQL Server and Jet, Part 2 (generating sort keys)
2005/07/08 Real developers use CompareInfo's Compare (Part 1)
2005/07/06 Coming soon
2005/07/05 'Need more input, Stephanie!'
2005/06/16 More on locales in SQL Server
2005/06/10 TechEd Orlando: Day 5
2005/06/10 TechEd Orlando: Day 4