by Michael S. Kaplan, published on 2006/01/11 09:51 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/01/11/511557.aspx
Ian Young asked the following question recently:
There seems to be a discrepancy in SQL Server in the way case is treated between the UPPER(), LOWER() functions and case-sensitive comparisons.
I would have thought that the following should produce no rows (where Numbers is a suitably large table of integers):SELECT n, c,
UNICODE(LOWER(c)) AS ln, LOWER(c) AS lc,
UNICODE(UPPER(c)) AS un, UPPER(c) AS uc FROM (
SELECT n, NCHAR(n) COLLATE Latin1_General_CS_AS AS c
FROM Numbers
WHERE n BETWEEN 32 AND 65533
) AS A
WHERE UNICODE(LOWER(c)) <> UNICODE(UPPER(c))
AND LOWER(c) = UPPER(c)That is, we're looking for Unicode characters where conversion to upper- and lowercase yield different characters (according to their Unicode codepoint), but compare as equal with a case-sensitive, accent-sensitive comparison.
The result is actually (SQL Server 2000, 2005) 176 rows consisting of a bunch of characters (U+04D0 -- U+04F9) from the Cyrillic block and many of the precomposed polytonic Greek characters in the Greek Extended block (U+1F00 -- U+1FFB).
This result is the same irrespective of whether the accent-sensitive or -insensitive collation is used. It also (only tested on SQL Server 2000) gets the same results for Greek, Cyrillic_General, Ukrainian, and Macedonian collations.Am I missing something here?
A suitably large table of integers:
CREATE TABLE Numbers (n int PRIMARY KEY)
INSERT INTO Numbers
SELECT D0.i + D1.i*10 + D2.i*100 + D3.i*1000 + D4.i*10000 FROM
(SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS D0,
(SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS D1,
(SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS D2,
(SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS D3,
(SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS D4
Ian is correct, there are a large number of characters that fall into this category.
The reason for this is that there are many differences between the operations involved with converting the case of a function and doing case insensitive comparisons.
For example:
So in summary, the difference between the operations that Ian noted is an intentional one for linguistic reasons, though I think he wins an award for the most extensive investigation of the issue. :-)
This post brought to you by "ς" (U+03c2, GREEK SMALL LETTER FINAL SIGMA)
# Maurits [MSFT] on 11 Jan 2006 12:39 PM:
# Abhinaba Basu [MSFT] on 12 Jan 2006 1:04 AM:
# Michael S. Kaplan on 12 Jan 2006 1:17 AM:
referenced by
2010/09/26 If case conversion were harder, people would do it less
2008/02/24 The idea has to do more than just make sense to me (aka How S-Sharp are *you* feeling today?)
2007/10/25 Jokes that aren't really all that funny in the end (aka At least SQL Server isn't on our case)
2007/08/24 Every character has a story #28: U+1e9e (CAPITAL SHARP S)
2007/07/31 If this post really describes a bug, would I actually put it in the WYNN column?
2007/04/10 When methods use collation to 'disturb the peace' we charge them with being 'out of sorts'
2006/12/07 SQL and the CLR: Part 1 (the things we can make work well)
2006/08/08 Collation != case, still
2006/06/02 What the @!#$% is the TERTIARY_WEIGHTS() function for?
2006/03/15 Casing and IgnoreCase are still not the same thing....