Achieving case insensitivity

by Michael S. Kaplan, published on 2005/05/11 02:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/05/11/416290.aspx


Julia Lerman said just a few hours ago that she is a SQL Server luddite.

I think that is not really true. I've read her blog, and I know people who have heard her speak in public. I plan to see her speak soon, in Orlando, because of that!

But when I read one part of her post:

Today I was trying to do a query that was case sensitive. I guess in all the years I have used SQL, I have never needed a case sensitive search. So I didn't know that I had to have my field as a varchar so that I could cast it to varbinary in order to do the comparison. 

I had to say something about the topic. :-)

(This post contains several topics I will be covering in my TechEd 2005 talks in Orlando and Amsterdam -- lets hope TechEdBloggers picks it up, they missed this other one!)

Now the easiest way to do a case sensitive sort on a column is to use a case-sensitive collation, like with this query:

SELECT * FROM Customers
ORDER BY TeritoryCode COLLATE Latin1_General_CS_AS

or something similar. You are now instantly case sensitive for that ordering.

You can also specify a collation to use in comparisons rather than just ordering, like with this query:

SELECT * FROM Customers
WHERE Code1 = Code2 COLLATE Latin1_General_CS_AI

Now you will want to read up at When good SQL queries have trouble.... and Making SQL Server index usage a bit more deterministic to avoid any performance problems with doing these types of things, but otherwise it is the best way to get case sensitivity.

Note that you can even use a binary collation there to make it even more sensitive to differences between characters, if you need to do that. But note that binary will often make things that really are identical (as far as G-d and Unicode would see it) look like they are different. So Binary may not be the way to go. YMMV....

 

This post brought to you by "₭" (U+20ad, a.k.a. KIP SIGN)


# Dean Harding on 11 May 2005 12:54 AM:

I would also imagine that a case-insentive collation would be much faster that casting your column to a VARBINARY! Especially if you use your indexed computed column trick...

By the way, I was going to ask you whether you could have used an indexed view for your 'alternate collation' trick, but apparently you can't create indexs on views that use the COLLATE clause. At least, when I tried it, SQL Server gave me the warning "The optimizer cannot use the index because the select list of the view contains a non-aggregate expression." I'm not sure if I'm just doing something wrong, if it's actually not supported. Here's the T-SQL I was trying:

CREATE TABLE tbl
(c NVARCHAR COLLATE Latin1_General_CI_AS)

CREATE VIEW view_jap
WITH SCHEMABINDING
AS
SELECT
[c] COLLATE Japanese_CI_AS AS [c_jap]
FROM
dbo.tbl

CREATE UNIQUE CLUSTERED INDEX
IX_view_jap
ON view_jap (c_jap)

Do you know what's going on there? :)

# Michael S. Kaplan on 11 May 2005 1:12 AM:

Ah yes, it will not work in a view -- that whole trouble with the engine not being able to do the whole thing deterministically. But you can do it in the table like at http://blogs.msdn.com/michkap/archive/2005/05/06/415199.aspx (the place I meant to link to here, I will change it now) and then use THAT column in a view....

# Dean Harding on 11 May 2005 1:38 AM:

Yeah, I was just wondering if you could it all in a view, which would stop you having lots of columns with the same data but different sorting in the one table. I guess if you do that, you can also just create a view for each and only select the column with the collation you care about.

Do you know if it's fixed in SQL Server 2005? I mean, collation isn't exactly non-deterministic, so it *should* work as far as I can see.

# Michael S. Kaplan on 11 May 2005 1:48 AM:

It is something I would have to test out, but offhand I would guess that the problem is still there (though they fixed the problem that required you to select the specific column with the index on it, which is much of the battle!).

It is on my list of things to try, fwiw....

# Julie on 11 May 2005 7:27 AM:

See , now you have just proved my point! :-) I looked and looked and the CAST example was all I could find. I will modify my sproc and also my post to point to this better way of achieving this! Thanks.

# Michael S. Kaplan on 11 May 2005 8:19 AM:

My pleasure -- and glad I could help!

One of the things I like best about what I do is that it is often complementary to other skills. So I have the chance to work with people whose work impresses me at the same time that I can do something helpful, even when I am not an expert in their area of expertise.

If nothing else it helps me look much more impressive than I actually am....

# Julie on 11 May 2005 10:36 AM:

oh and I just realized that you said something about seeing me speak [I guess that would be hearing me speak, or seeing me present :-)] in Orlando? Not if you mean TechEd. I'm flattered that you would even make that presumption. I have some BOFs but I am definitely not a TechEd presenter!

# Michael S. Kaplan on 11 May 2005 11:28 AM:

I could just go to a BoF if there is one that I would actually be an appropriate "bird" for. :-)

Or I'll have to try to make it to Vermont some day, then. Though it has been many years since I have been to the NE, and even more since anywhere past Connecticut...

Please consider a donation to keep this archive running, maintained and free of advertising.
Donate €20 or more to receive an offline copy of the whole archive including all images.

referenced by

2005/05/15 Not all SQL Server collations are created equal

2005/05/11 Case/kana/accent/width sensitive SQL Server, for testing

go to newer or older post, or back to index or month or day