Azeri serious bug for non-Unicode SQL Server columns?

by Michael S. Kaplan, published on 2008/11/25 10:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/11/25/9140152.aspx


So it all started in a conversation with some of the folks from the SQL Server team when I was at PASS.

What did they expect? Send 600 Microsoft employees to an event and you are bound to run into some of them just in the act of walking around!

They were telling me about an interesting bug in a couple collations in SQL Server 2008 -- bugs that only impact non-Unicode columns for those couple of collations.

Apparently they got a couple code page values wrong.

This was "Azeri Serious Bug".

Once I was back in front of my computer, I ran the following query:

SELECT
  name,
  COLLATIONPROPERTY(name, 'CodePage') As CodePage,
  CONVERT(binary(4), COLLATIONPROPERTY(name, 'LCID')) As LCID,
  CONVERT(binary(4), COLLATIONPROPERTY(name, 'ComparisonStyle')) As ComparisonStyle,
  description
FROM
  ::fn_helpcollations()
WHERE
  name LIKE '%Azer%' OR name LIKE '%Turki%'

in SQL Server 2005:

Azeri_Cyrillic_90_BIN    1251    0x0000082C    0x00000000    Azeri-Cyrillic-90, binary sort
Azeri_Cyrillic_90_BIN2    1251    0x0000082C    0x00000000    Azeri-Cyrillic-90, binary code point comparison sort
Azeri_Cyrillic_90_CI_AI    1251    0x0000082C    0x00030003    Azeri-Cyrillic-90, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Azeri_Cyrillic_90_CI_AI_WS    1251    0x0000082C    0x00010003    Azeri-Cyrillic-90, case-insensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Azeri_Cyrillic_90_CI_AI_KS    1251    0x0000082C    0x00020003    Azeri-Cyrillic-90, case-insensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Azeri_Cyrillic_90_CI_AI_KS_WS    1251    0x0000082C    0x00000003    Azeri-Cyrillic-90, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Azeri_Cyrillic_90_CI_AS    1251    0x0000082C    0x00030001    Azeri-Cyrillic-90, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Azeri_Cyrillic_90_CI_AS_WS    1251    0x0000082C    0x00010001    Azeri-Cyrillic-90, case-insensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Azeri_Cyrillic_90_CI_AS_KS    1251    0x0000082C    0x00020001    Azeri-Cyrillic-90, case-insensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Azeri_Cyrillic_90_CI_AS_KS_WS    1251    0x0000082C    0x00000001    Azeri-Cyrillic-90, case-insensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Azeri_Cyrillic_90_CS_AI    1251    0x0000082C    0x00030002    Azeri-Cyrillic-90, case-sensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Azeri_Cyrillic_90_CS_AI_WS    1251    0x0000082C    0x00010002    Azeri-Cyrillic-90, case-sensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Azeri_Cyrillic_90_CS_AI_KS    1251    0x0000082C    0x00020002    Azeri-Cyrillic-90, case-sensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Azeri_Cyrillic_90_CS_AI_KS_WS    1251    0x0000082C    0x00000002    Azeri-Cyrillic-90, case-sensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Azeri_Cyrillic_90_CS_AS    1251    0x0000082C    0x00030000    Azeri-Cyrillic-90, case-sensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Azeri_Cyrillic_90_CS_AS_WS    1251    0x0000082C    0x00010000    Azeri-Cyrillic-90, case-sensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Azeri_Cyrillic_90_CS_AS_KS    1251    0x0000082C    0x00020000    Azeri-Cyrillic-90, case-sensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Azeri_Cyrillic_90_CS_AS_KS_WS    1251    0x0000082C    0x00000000    Azeri-Cyrillic-90, case-sensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Azeri_Latin_90_BIN    1254    0x0000042C    0x00000000    Azeri-Latin-90, binary sort
Azeri_Latin_90_BIN2    1254    0x0000042C    0x00000000    Azeri-Latin-90, binary code point comparison sort
Azeri_Latin_90_CI_AI    1254    0x0000042C    0x00030003    Azeri-Latin-90, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Azeri_Latin_90_CI_AI_WS    1254    0x0000042C    0x00010003    Azeri-Latin-90, case-insensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Azeri_Latin_90_CI_AI_KS    1254    0x0000042C    0x00020003    Azeri-Latin-90, case-insensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Azeri_Latin_90_CI_AI_KS_WS    1254    0x0000042C    0x00000003    Azeri-Latin-90, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Azeri_Latin_90_CI_AS    1254    0x0000042C    0x00030001    Azeri-Latin-90, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Azeri_Latin_90_CI_AS_WS    1254    0x0000042C    0x00010001    Azeri-Latin-90, case-insensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Azeri_Latin_90_CI_AS_KS    1254    0x0000042C    0x00020001    Azeri-Latin-90, case-insensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Azeri_Latin_90_CI_AS_KS_WS    1254    0x0000042C    0x00000001    Azeri-Latin-90, case-insensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Azeri_Latin_90_CS_AI    1254    0x0000042C    0x00030002    Azeri-Latin-90, case-sensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Azeri_Latin_90_CS_AI_WS    1254    0x0000042C    0x00010002    Azeri-Latin-90, case-sensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Azeri_Latin_90_CS_AI_KS    1254    0x0000042C    0x00020002    Azeri-Latin-90, case-sensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Azeri_Latin_90_CS_AI_KS_WS    1254    0x0000042C    0x00000002    Azeri-Latin-90, case-sensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Azeri_Latin_90_CS_AS    1254    0x0000042C    0x00030000    Azeri-Latin-90, case-sensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Azeri_Latin_90_CS_AS_WS    1254    0x0000042C    0x00010000    Azeri-Latin-90, case-sensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Azeri_Latin_90_CS_AS_KS    1254    0x0000042C    0x00020000    Azeri-Latin-90, case-sensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Azeri_Latin_90_CS_AS_KS_WS    1254    0x0000042C    0x00000000    Azeri-Latin-90, case-sensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Turkish_BIN    1254    0x0000041F    0x00000000    Turkish, binary sort
Turkish_BIN2    1254    0x0000041F    0x00000000    Turkish, binary code point comparison sort
Turkish_CI_AI    1254    0x0000041F    0x00030003    Turkish, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Turkish_CI_AI_WS    1254    0x0000041F    0x00010003    Turkish, case-insensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Turkish_CI_AI_KS    1254    0x0000041F    0x00020003    Turkish, case-insensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Turkish_CI_AI_KS_WS    1254    0x0000041F    0x00000003    Turkish, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Turkish_CI_AS    1254    0x0000041F    0x00030001    Turkish, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Turkish_CI_AS_WS    1254    0x0000041F    0x00010001    Turkish, case-insensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Turkish_CI_AS_KS    1254    0x0000041F    0x00020001    Turkish, case-insensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Turkish_CI_AS_KS_WS    1254    0x0000041F    0x00000001    Turkish, case-insensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Turkish_CS_AI    1254    0x0000041F    0x00030002    Turkish, case-sensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Turkish_CS_AI_WS    1254    0x0000041F    0x00010002    Turkish, case-sensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Turkish_CS_AI_KS    1254    0x0000041F    0x00020002    Turkish, case-sensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Turkish_CS_AI_KS_WS    1254    0x0000041F    0x00000002    Turkish, case-sensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Turkish_CS_AS    1254    0x0000041F    0x00030000    Turkish, case-sensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Turkish_CS_AS_WS    1254    0x0000041F    0x00010000    Turkish, case-sensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Turkish_CS_AS_KS    1254    0x0000041F    0x00020000    Turkish, case-sensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Turkish_CS_AS_KS_WS    1254    0x0000041F    0x00000000    Turkish, case-sensitive, accent-sensitive, kanatype-sensitive, width-sensitive

and in SQL Server 2008:

Azeri_Cyrillic_100_BIN    1251    0x0000082C    0x00000000    Azeri-Cyrillic-100, binary sort
Azeri_Cyrillic_100_BIN2    1251    0x0000082C    0x00000000    Azeri-Cyrillic-100, binary code point comparison sort
Azeri_Cyrillic_100_CI_AI    1251    0x0000082C    0x00030003    Azeri-Cyrillic-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Azeri_Cyrillic_100_CI_AI_WS    1251    0x0000082C    0x00010003    Azeri-Cyrillic-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Azeri_Cyrillic_100_CI_AI_KS    1251    0x0000082C    0x00020003    Azeri-Cyrillic-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Azeri_Cyrillic_100_CI_AI_KS_WS    1251    0x0000082C    0x00000003    Azeri-Cyrillic-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Azeri_Cyrillic_100_CI_AS    1251    0x0000082C    0x00030001    Azeri-Cyrillic-100, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Azeri_Cyrillic_100_CI_AS_WS    1251    0x0000082C    0x00010001    Azeri-Cyrillic-100, case-insensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Azeri_Cyrillic_100_CI_AS_KS    1251    0x0000082C    0x00020001    Azeri-Cyrillic-100, case-insensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Azeri_Cyrillic_100_CI_AS_KS_WS    1251    0x0000082C    0x00000001    Azeri-Cyrillic-100, case-insensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Azeri_Cyrillic_100_CS_AI    1251    0x0000082C    0x00030002    Azeri-Cyrillic-100, case-sensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Azeri_Cyrillic_100_CS_AI_WS    1251    0x0000082C    0x00010002    Azeri-Cyrillic-100, case-sensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Azeri_Cyrillic_100_CS_AI_KS    1251    0x0000082C    0x00020002    Azeri-Cyrillic-100, case-sensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Azeri_Cyrillic_100_CS_AI_KS_WS    1251    0x0000082C    0x00000002    Azeri-Cyrillic-100, case-sensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Azeri_Cyrillic_100_CS_AS    1251    0x0000082C    0x00030000    Azeri-Cyrillic-100, case-sensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Azeri_Cyrillic_100_CS_AS_WS    1251    0x0000082C    0x00010000    Azeri-Cyrillic-100, case-sensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Azeri_Cyrillic_100_CS_AS_KS    1251    0x0000082C    0x00020000    Azeri-Cyrillic-100, case-sensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Azeri_Cyrillic_100_CS_AS_KS_WS    1251    0x0000082C    0x00000000    Azeri-Cyrillic-100, case-sensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Azeri_Latin_100_BIN    1254    0x0000042C    0x00000000    Azeri-Latin-100, binary sort
Azeri_Latin_100_BIN2    1254    0x0000042C    0x00000000    Azeri-Latin-100, binary code point comparison sort
Azeri_Latin_100_CI_AI    1254    0x0000042C    0x00030003    Azeri-Latin-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Azeri_Latin_100_CI_AI_WS    1254    0x0000042C    0x00010003    Azeri-Latin-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Azeri_Latin_100_CI_AI_KS    1254    0x0000042C    0x00020003    Azeri-Latin-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Azeri_Latin_100_CI_AI_KS_WS    1254    0x0000042C    0x00000003    Azeri-Latin-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Azeri_Latin_100_CI_AS    1254    0x0000042C    0x00030001    Azeri-Latin-100, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Azeri_Latin_100_CI_AS_WS    1254    0x0000042C    0x00010001    Azeri-Latin-100, case-insensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Azeri_Latin_100_CI_AS_KS    1254    0x0000042C    0x00020001    Azeri-Latin-100, case-insensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Azeri_Latin_100_CI_AS_KS_WS    1254    0x0000042C    0x00000001    Azeri-Latin-100, case-insensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Azeri_Latin_100_CS_AI    1254    0x0000042C    0x00030002    Azeri-Latin-100, case-sensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Azeri_Latin_100_CS_AI_WS    1254    0x0000042C    0x00010002    Azeri-Latin-100, case-sensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Azeri_Latin_100_CS_AI_KS    1254    0x0000042C    0x00020002    Azeri-Latin-100, case-sensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Azeri_Latin_100_CS_AI_KS_WS    1254    0x0000042C    0x00000002    Azeri-Latin-100, case-sensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Azeri_Latin_100_CS_AS    1254    0x0000042C    0x00030000    Azeri-Latin-100, case-sensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Azeri_Latin_100_CS_AS_WS    1254    0x0000042C    0x00010000    Azeri-Latin-100, case-sensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Azeri_Latin_100_CS_AS_KS    1254    0x0000042C    0x00020000    Azeri-Latin-100, case-sensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Azeri_Latin_100_CS_AS_KS_WS    1254    0x0000042C    0x00000000    Azeri-Latin-100, case-sensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Turkish_BIN    1254    0x0000041F    0x00000000    Turkish, binary sort
Turkish_BIN2    1254    0x0000041F    0x00000000    Turkish, binary code point comparison sort
Turkish_CI_AI    1254    0x0000041F    0x00030003    Turkish, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Turkish_CI_AI_WS    1254    0x0000041F    0x00010003    Turkish, case-insensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Turkish_CI_AI_KS    1254    0x0000041F    0x00020003    Turkish, case-insensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Turkish_CI_AI_KS_WS    1254    0x0000041F    0x00000003    Turkish, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Turkish_CI_AS    1254    0x0000041F    0x00030001    Turkish, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Turkish_CI_AS_WS    1254    0x0000041F    0x00010001    Turkish, case-insensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Turkish_CI_AS_KS    1254    0x0000041F    0x00020001    Turkish, case-insensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Turkish_CI_AS_KS_WS    1254    0x0000041F    0x00000001    Turkish, case-insensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Turkish_CS_AI    1254    0x0000041F    0x00030002    Turkish, case-sensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Turkish_CS_AI_WS    1254    0x0000041F    0x00010002    Turkish, case-sensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Turkish_CS_AI_KS    1254    0x0000041F    0x00020002    Turkish, case-sensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Turkish_CS_AI_KS_WS    1254    0x0000041F    0x00000002    Turkish, case-sensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Turkish_CS_AS    1254    0x0000041F    0x00030000    Turkish, case-sensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Turkish_CS_AS_WS    1254    0x0000041F    0x00010000    Turkish, case-sensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Turkish_CS_AS_KS    1254    0x0000041F    0x00020000    Turkish, case-sensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Turkish_CS_AS_KS_WS    1254    0x0000041F    0x00000000    Turkish, case-sensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Turkish_100_BIN    1254    0x0000041F    0x00000000    Turkish-100, binary sort
Turkish_100_BIN2    1254    0x0000041F    0x00000000    Turkish-100, binary code point comparison sort
Turkish_100_CI_AI    1254    0x0000041F    0x00030003    Turkish-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Turkish_100_CI_AI_WS    1254    0x0000041F    0x00010003    Turkish-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Turkish_100_CI_AI_KS    1254    0x0000041F    0x00020003    Turkish-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Turkish_100_CI_AI_KS_WS    1254    0x0000041F    0x00000003    Turkish-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Turkish_100_CI_AS    1254    0x0000041F    0x00030001    Turkish-100, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Turkish_100_CI_AS_WS    1254    0x0000041F    0x00010001    Turkish-100, case-insensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Turkish_100_CI_AS_KS    1254    0x0000041F    0x00020001    Turkish-100, case-insensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Turkish_100_CI_AS_KS_WS    1254    0x0000041F    0x00000001    Turkish-100, case-insensitive, accent-sensitive, kanatype-sensitive, width-sensitive
Turkish_100_CS_AI    1254    0x0000041F    0x00030002    Turkish-100, case-sensitive, accent-insensitive, kanatype-insensitive, width-insensitive
Turkish_100_CS_AI_WS    1254    0x0000041F    0x00010002    Turkish-100, case-sensitive, accent-insensitive, kanatype-insensitive, width-sensitive
Turkish_100_CS_AI_KS    1254    0x0000041F    0x00020002    Turkish-100, case-sensitive, accent-insensitive, kanatype-sensitive, width-insensitive
Turkish_100_CS_AI_KS_WS    1254    0x0000041F    0x00000002    Turkish-100, case-sensitive, accent-insensitive, kanatype-sensitive, width-sensitive
Turkish_100_CS_AS    1254    0x0000041F    0x00030000    Turkish-100, case-sensitive, accent-sensitive, kanatype-insensitive, width-insensitive
Turkish_100_CS_AS_WS    1254    0x0000041F    0x00010000    Turkish-100, case-sensitive, accent-sensitive, kanatype-insensitive, width-sensitive
Turkish_100_CS_AS_KS    1254    0x0000041F    0x00020000    Turkish-100, case-sensitive, accent-sensitive, kanatype-sensitive, width-insensitive
Turkish_100_CS_AS_KS_WS    1254    0x0000041F    0x00000000    Turkish-100, case-sensitive, accent-sensitive, kanatype-sensitive, width-sensitive

but all of the code page values looked right!

Hmmm.

I went back and asked for more information, and then I got the clues I should have seen the first time:

And with this information I stumbled on a fact that I did not realize even though it was right in front of me.

The fn_helpcollations function wasn't returning deprecated collations!

It would have helped for me to look at the SQL Server 2008 version of the topic, which had the following note:

Deprecated collations that are no longer supported or installed in SQL Server 2008 cannot be queried with fn_helpcollations.

Aha!

though the older collation is actually still there; you just can't enumerate it. I was able to use it in T-SQL just fine for several operations.

And then I ran the following query:

SELECT
  Name, 
  COLLATIONPROPERTY(Name, 'CodePage') As CodePage,
  CONVERT(binary(4), COLLATIONPROPERTY(Name, 'LCID')) As LCID
FROM
  (VALUES ('Azeri_Latin_90_BIN'),
          ('Azeri_Latin_100_BIN'),
          ('Azeri_Cyrillic_90_BIN'),
          ('Azeri_Cyrillic_100_BIN'),
          ('Turkish_BIN'),
          ('Turkish_100_BIN')
  ) AS Collations(Name);

which returned the following results (incorrect ones that do not match the SQL Server 2005 values in red):

Azeri_Latin_90_BIN        1251    0x0000042C
Azeri_Latin_100_BIN       1254    0x0000042C
Azeri_Cyrillic_90_BIN     1254    0x0000082C
Azeri_Cyrillic_100_BIN    1251    0x0000082C
Turkish_BIN               1254    0x0000041F
Turkish_100_BIN           1254    0x0000041F

And there's the bug they were talking about.

Well, that and the documentation bug that claims that these deprecated collations aren't installed. I think what they were trying to say was that they were not available in setup to choose as the default collation, and it is just bbeing described in a weird way -- since clearly the old ones are still round (and would have to be or no database using them could be upgraded properly!).

and if you use Unicode columns then this bug won't matter.

But if you use CHAR, VARCHAR, or TEXT columns with this collation, your results will be wrong -- very very wrong....

Let's look at the code pages themselves to see how wrong, and where. First code page 1254:

Windows code page 1254 

and then code page 1251:

Windows Code Page 1251

and you can see that while the Latin Azeri will be missing some characters, the Cyrillic Azeri will be missing pretty much all the ones you need.

Okay, a serious bug -- one that has to be fixed.

The easy fix is to change to the newer Azeri collations, but this can be complicated by anything that tries to map the characters through Unicode first since the data they have on the mappings is wrong here!

Not to mention users who notice the problem and just use the wrong collation that works correctly -- fixing the bug could even break a few people....

Now you may (if you are a regular reader) remember 100% roundtrip ASCII? 100% roundtrip ANSI?, which talks about missing characters in the two code pages.

In this case, code page 1254 is missing seven bytes:

0x81  0x8D  0x8E  0x8F  0x90  0x9D  0x9E

which if they were on the Cyrillic code page would be:

While code page 1251 is just missing one byte:

0x98

which if they were in the Latin code page would be:

Once again, it is indeed the Cyrillic code page that suffers the most, though in the case of improperly stored data it might have been the easiest to detect being on the wrong code page, if these letters were being used in Cyrillic Azeri (which according to pages like this one they don't really seem to be!).

In the end, this bug becomes just another incident where due to the fact that Unicode is what is being emphasized, bugs in non-Unicode code paths slip through.

This might be another good reson to support Unicode, everyone!

I'll probably talk more about deprecation here, another day. This was an interesting and not-wrong (in my opinion) design direction that deserves some real understanding and elucidation....

 

This blog brought to you by the eight above mentioned non-Azeri Unicode characters that have been pulled into some degree of Azeriness...


# Michael Ratanapintha on 25 Nov 2008 7:27 PM:

I guess I am being a bit stupid here, but I was wondering what the "::" operator in your T-SQL code does.  Is it some kind of scope resolution operator, as the C++ analogy indicates?  Or does it do some other function?

# Michael S. Kaplan on 25 Nov 2008 9:40 PM:

I don't think that makes you stupid. :-)

This is something I was in the first UDF sample and have just used ever since, I assume it is an analogous scope resolution operator. I just seem to do it out of habit....


Please consider a donation to keep this archive running, maintained and free of advertising.
Donate €20 or more to receive an offline copy of the whole archive including all images.

go to newer or older post, or back to index or month or day