Wanna take a look at the collation bug Kim pointed out? You're in for a hell of a Tripp!

by Michael S. Kaplan, published on 2007/10/08 10:16 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/10/08/5347670.aspx

Over on the side is that list of blogs I read, in various categories, one of them is Kimberly L. Tripp's, who I once had like an hour-long conversation with, incredibly impressed by her for what she was saying before being even more impressed after realizing who she is and connecting her to all the stuff she has said in the past.

I think the last time I linked to her was back in May 2006 (Avoiding SQL Server collation woes in the TempDB), but it was a great tip since I was doing it the hard[er] way when I really didn't need to....

The fact is, she is a hell of a Tripp. :-)

Late last week when she posted The perils of case-insensitive data (and our life in tangent-land) I was captivated once again by the words.

Not for the two problems the post was nominally posing solutions for (both of which had good solutions suggested) but one of the tangents, quoted in part here:

-- First, I'll create a test database. Without a collation specified,
-- it will use the server's default collation.

CREATE DATABASE TestAdventureWorks
go

-- Verify the database collation
SELECT DATABASEPROPERTYEX('TestAdventureWorks', 'Collation')
go

-- database is set to SQL_Latin1_General_CP1_CI_AS as expected
-- this is a case-insensitive database

USE TestAdventureWorks
go

SELECT LastName collate database_default AS LastName
, FirstName collate database_default AS FirstName
, MiddleName collate database_default AS MiddleName
INTO MyTestContacts
FROM Adventureworks.Person.Contact
go

SELECT *
FROM MyTestContacts
WHERE Lastname = N'Adams'
go -- (86 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE Lastname = N'adams'
go -- (86 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE Lastname COLLATE Latin1_General_CS_AS_KS_WS = N'Adams'
go -- (86 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE Lastname COLLATE Latin1_General_CS_AS_KS_WS = N'adams'
go -- (0 row(s) affected)

-- Next, create a view:
CREATE VIEW ContactLastNameCaseSensitive
AS
SELECT LastName COLLATE Latin1_General_CS_AS_KS_WS AS CSName
FROM MyTestContacts
go

SELECT *
FROM ContactLastNameCaseSensitive
WHERE CSName = N'Adams'
go -- (86 row(s) affected)

SELECT *
FROM ContactLastNameCaseSensitive
WHERE CSName = N'adams'
go -- (0 row(s) affected)

And, everything works... in TestAdventureworks. In the *real* AdventureWorks, I get an error when I try to create the view:
Msg 2791, Level 16, State 5, Procedure ContactLastNameCaseSensitive, Line 3
Could not resolve expression for schemabound object or constraint.

So, this is the first issue. It seems as though you can't create the view if your database has a different collation than the server collation. Well, (again), I haven't spent all that much time on this one but I did repro what the chain on the forum seemed to have found.

This does indeed seem like a bug, but one that is easy to imagine in a world where collations that will return identical results (e.g. Unicode data in LATIN1_GENERAL_CI_AS and GREEK_CI_AS) yet they will be considered incompatible for the purposes of trying to compare columns using each of them.

Those errors all tend to fall under the "Columns in this expression have incompatible collations" or "The collation properties of columns <Columnname> and <Columnname> do not match" type of categories, which if trapped during the view creation process I can easily imagine being re-thrown under the more generic "Could not resolve expression" category.

Which does not mean it wouldn't still be a bug, it is just easy to see how (given the generic (documented) limitation where an indexed view definition cannot contain the collation clause, it is easy to imagine running into this kind of problem in view creation any time the collations do not match, even when they are complete subsets of on another and should be able to do better, as I pointed out in A&P of Sort Keys, part 8 (aka You can often think of ignoring weights as a form of ignorance) only worse since it leads to not just a performance problem but a bug.

It is hard to say which looks worse:

Collations that are literally identical (Unicode data, LATIN1_GENERAL_CI_AS vs. GREEK_CI_AS) being incompatible under any circumstances, or
Collations that are literal subsets (LATIN1_GENERAL_CI_AS vs. LATIN1_GENERAL_CS_AS) being incompatible under any circumstances.

The former is clearly stupider from the standpoint of a Windows collation purist who views the arbitrary separation of these identical collations as a flaw in the SQL Server collation design, sure.

(That would be me, Mondays, Wednesdays, and Fridays!)

But the latter is much more likely to come up in real world customer situations, as it did here for Kim.

(That would be me Tuesdays, Thursdays, and Saturdays!)

I'm going to ask some people over on the SQL Server side about this issue, in any case -- it is certainly not a documented limitation if there is no way to address it, but whether it can be fixed o not I'd at least like to make sure it is tracked somehow....

This post brought to you by ⓚ (U+24da, a.k.a. CIRCLED LATIN SMALL LETTER K)

Vindalou on 9 Oct 2007 8:46 PM:

I have taken a couple of classes from Ms. Tripp - one of the finest instructors I have ever had. At the end of the week my brain was stuffy and I was worn out, but happy. She has an incredible knowledge of the database world.

Please consider a donation to keep this archive running, maintained and free of advertising.
Donate €20 or more to receive an offline copy of the whole archive including all images.

go to newer or older post, or back to index or month or day