by Michael S. Kaplan, published on 2007/12/06 10:16 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/12/06/6661053.aspx
Yesterday, I was blathering about how In SQL Server, the distance between A and Z is wider than you might think.
So now let's add case sensitivity to the mix, which is a great way to find problems, as I pointed out in If you are more sensitive, you'll pick up on more problems and Case/kana/accent/width sensitive SQL Server, for testing.
In particular, it will help find the problems I am about to talk about. :-)
Now, we already know that you should avoid
LIKE '[ABCDEFGHIJKLMNOPQRSTUVWXYZ]'
like it is poison.
But let's say, on a case sensitive server, what do you think might be the difference between all of the following expressions?
LIKE '[A-Z]'
LIKE '[a-z]'
LIKE '[A-z]'
LIKE '[a-Z]'
Now wait before you answer.
First you might want to read Which comes first, 'a' or 'A' ? and Technically it *is* a hungarian sort.
So, how confident are you feeling now about ranges?
And now would you create an expression that would not break when the collation underneath it changed? :-)
Damn. You know, this series was certainly heading toward the areas where good expressions were hard to find and/or unfindable, but I really was not trying to get close so fast. And we still have some distance to go!
This post brought to you by Z (U+005a, LATIN CAPITAL LETTER Z, who sued since he feels like he did not get his money's worth out of yesterday's post so as part of the out-of-court settlement gets to sponsor today's post free!)
# Aaron on 6 Dec 2007 1:12 PM:
This is why you should never use '[A-Za-z]', but rather '\w'. Of course, i'm not familiar enough with SQL regular expressions, perhaps it's missing '\w' also known as 'match word character'.
#aaron
# Michael S. Kaplan on 6 Dec 2007 2:00 PM:
Exactly, Aaron -- though I am trying to go through the proofs on why this is the case very methodically, as each individual objection has people fight to figure out a new expression to use....
# Dean Harding on 6 Dec 2007 4:57 PM:
The "LIKE" operator has very simple syntax. Basically it supports "%" for "0 or more characters", "_" for exactly one character, and [] and [^] for in-range, not-in-range respectively.
To be honest, the number of possible uses for the LIKE operator is relatively small, and I tend to avoid it.
# Michael S. Kaplan on 6 Dec 2007 5:00 PM:
Deceptively simple, perhaps? :-)
k on 25 Sep 2011 3:59 AM:
k
referenced by
2008/07/25 Let's save some time and call them all IRregular expression engines
2007/12/11 In SQL Server, there is the rest of Unicode (aka the SiaO Incompleteness Theorem)
2007/12/10 In SQL Server, different collations implies different ranges (aka Not every table has its THORN)
2007/12/09 In SQL Server, the alphabet does not end at Z!
2007/12/08 In SQL Server, your ranges also need to ACCENT-uate the positives!