SQL vs. Windows collations: the tricycle versus the Porsche

by Michael S. Kaplan, published on 2008/05/23 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/05/23/8536649.aspx

Now with gas prices being what they are, there may be some real advantages to considering riding the tricycle instead of the Porsche, especially if one can fit in the smaller space and one does not have to take it as far. And if one does not need the extra features.

But in the end one still has a tricycle. And one might not fit in it very well, and has to decide if it is the best plan....

The mail from the other day via the contact link was:

Someone posted this:
----------
CREATE TABLE #TEMP (name varchar(80))
INSERT INTO #TEMP values ('Malteser Schloßschule')
INSERT INTO #TEMP values ('Malteser Schlossschule')

CREATE TABLE #TEMP1 (name nvarchar(160))
INSERT INTO #TEMP1 values ('Malteser Schloßschule')
INSERT INTO #TEMP1 values ('Malteser Schlossschule')

Query 1:
SELECT * FROM #TEMP WHERE name ='Malteser Schlossschule'
-- Results :- 'Malteser Schlossschule'

Query 2:
SELECT * FROM #TEMP1 WHERE name ='Malteser Schlossschule'
-- Results :- Malteser Schloßschule
-- Malteser Schlossschule

The collation of tempdb is SQL_Latin1_General_CP1_CI_AS. I'm using SQL Server 2005.
---------------------------

Why does this happen? I'd expect that it would work in the reverse way. 2 rows with varchar and 1 with nvarchar.

This is actually quite expected, believe it or not. There are several contributing issues here:

First, as described in SQL Compatibility collations are a bit too retro for me, the order in the SQL compatibility collations is not a very good one. Among other things is the fact that they do not support the German Sharp S in any real sense, which makes them kind of bad for German. There are many other examples, but this is the basic reason why the first query returns one row.

Second, as described in Unicode and SQL Collations have nothing to do with each other, there is no Unicode support in the SQL compatibility collations. The server is forced to go with the closest Windows collation in this case, and all Windows collations have the Sharp S support in the default table, which is why the second query returns two rows.

Thirdly, as described in SQL Server: compatibility collations vs. Window collations, there are a whole lot of differences between these two when they are compared head to head -- it isn't just like comparing apples and oranges; it is like comparing bicycles and earmuffs.

There really is no way to put these two different kinds of collations in an attempt to compare them. They are way too different....

In the end, even non-Unicode data is better with a Windows collation -- just like even the toddler who fits well in the tricycle is better off getting a ride to preschool from a parent in the Porsche. Sure there is a bit of extra work with car seats and such, but it is much better for getting the job done than trying to force this less adequate mechanism to do everything.

This post brought to you by ር (U+122d, a.k.a. ETHIOPIC SYLLABLE RE)

no comments

Please consider a donation to keep this archive running, maintained and free of advertising.
Donate €20 or more to receive an offline copy of the whole archive including all images.

go to newer or older post, or back to index or month or day