In SQL Server, A-Z, A-z, a-Z, and a-z may not mean the same thing!

by Michael S. Kaplan, published on 2007/12/06 10:16 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/12/06/6661053.aspx


Yesterday, I was blathering about how In SQL Server, the distance between A and Z is wider than you might think.

So now let's add case sensitivity to the mix, which is a great way to find problems, as I pointed out in If you are more sensitive, you'll pick up on more problems and Case/kana/accent/width sensitive SQL Server, for testing.

In particular, it will help find the problems I am about to talk about. :-)

Now, we already know that you should avoid

LIKE '[ABCDEFGHIJKLMNOPQRSTUVWXYZ]'

like it is poison.

But let's say, on a case sensitive server, what do you think might be the difference between all of the following expressions?

LIKE '[A-Z]'

LIKE '[a-z]'

LIKE '[A-z]'

LIKE '[a-Z]'

Now wait before you answer.

First you might want to read Which comes first, 'a' or 'A' ? and Technically it *is* a hungarian sort.

So, how confident are you feeling now about ranges?

And now would you create an expression that would not break when the collation underneath it changed? :-)

Damn. You know, this series was certainly heading toward the areas where good expressions were hard to find and/or unfindable, but I really was not trying to get close so fast. And we still have some distance to go!

 

This post brought to you by Z (U+005a, LATIN CAPITAL LETTER Z, who sued since he feels like he did not get his money's worth out of yesterday's post so as part of the out-of-court settlement gets to sponsor today's post free!)


# Aaron on 6 Dec 2007 1:12 PM:

This is why you should never use '[A-Za-z]', but rather '\w'.  Of course, i'm not familiar enough with SQL regular expressions, perhaps it's missing '\w' also known as 'match word character'.

#aaron

# Michael S. Kaplan on 6 Dec 2007 2:00 PM:

Exactly, Aaron -- though I am trying to go through the proofs on why this is the case very methodically, as each individual objection has people fight to figure out a new expression to use....

# Dean Harding on 6 Dec 2007 4:57 PM:

The "LIKE" operator has very simple syntax. Basically it supports "%" for "0 or more characters", "_" for exactly one character, and [] and [^] for in-range, not-in-range respectively.

To be honest, the number of possible uses for the LIKE operator is relatively small, and I tend to avoid it.

# Michael S. Kaplan on 6 Dec 2007 5:00 PM:

Deceptively simple, perhaps? :-)

k on 25 Sep 2011 3:59 AM:

k


referenced by

2008/07/25 Let's save some time and call them all IRregular expression engines

2007/12/11 In SQL Server, there is the rest of Unicode (aka the SiaO Incompleteness Theorem)

2007/12/10 In SQL Server, different collations implies different ranges (aka Not every table has its THORN)

2007/12/09 In SQL Server, the alphabet does not end at Z!

2007/12/08 In SQL Server, your ranges also need to ACCENT-uate the positives!

2007/12/07 In SQL Server, the wrong range can make you seem insensitive to one's width! (aka Do my V's look fat?)

go to newer or older post, or back to index or month or day