What the @!#$% is the TERTIARY_WEIGHTS() function for?

by Michael S. Kaplan, published on 2006/06/02 23:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/06/02/615571.aspx


This function TERTIARY_WEIGHTS() has been in Microsoft SQL Server for a while.

Now you have perhaps read what I have had to say about tertiary weights previously in posts like this one or this one or this one (the latter is especially topical to some of what I will point out in this post -- note that lowercase is given less weight than uppercase!).

According to SQL Server's Books Online in SQLS 2005:

Returns a binary string of weights for each character in a non-Unicode string expression defined with an SQL tertiary collation.

Syntax 
TERTIARY_WEIGHTS( non_Unicode_character_string_expression )
 

Arguments
non_Unicode_character_string_expression
Is a string expression of type char, varchar, or varchar(max) defined on a tertiary SQL collation. For a list of these collations, see Remarks.

Return Types
TERTIARY_WEIGHTS returns varbinary when non_Unicode_character_string_expression is char or varchar, and returns varbinary(max) when non_Unicode_character_string_expression is varchar(max).

Remarks
TERTIARY_WEIGHTS returns NULL when non_Unicode_character_string_expression is not defined with an SQL tertiary collation. The following table shows the SQL tertiary collations.

Sort order ID  SQL collation 
33    SQL_Latin1_General_Pref_CP437_CI_AS
34    SQL_Latin1_General_CP437_CI_AI
43    SQL_Latin1_General_Pref_CP850_CI_AS
44    SQL_Latin1_General_CP850_CI_AI
49    SQL_1xCompat_CP850_CI_AS
53    SQL_Latin1_General_Pref_CP1_CI_AS
54    SQL_Latin1_General_CP1_CI_AI
56    SQL_AltDiction_Pref_CP850_CI_AS
57    SQL_AltDiction_CP850_CI_AI
58    SQL_Scandinavian_Pref_CP850_CI_AS
82    SQL_Latin1_General_CP1250_CI_AS
84    SQL_Czech_CP1250_CI_AS
86    SQL_Hungarian_CP1250_CI_AS
88    SQL_Polish_CP1250_CI_AS
90    SQL_Romanian_CP1250_CI_AS
92    SQL_Croatian_CP1250_CI_AS
94    SQL_Slovak_CP1250_CI_AS
96    SQL_Slovenian_CP1250_CI_AS
106   SQL_Latin1_General_CP1251_CI_AS
108   SQL_Ukrainian_CP1251_CI_AS
113   SQL_Latin1_General_CP1253_CS_AS
114   SQL_Latin1_General_CP1253_CI_AS
130   SQL_Latin1_General_CP1254_CI_AS
146   SQL_Latin1_General_CP1256_CI_AS
154   SQL_Latin1_General_CP1257_CI_AS
156   SQL_Estonian_CP1257_CI_AS
158   SQL_Latvian_CP1257_CI_AS
160   SQL_Lithuanian_CP1257_CI_AS
183   SQL_Danish_Pref_CP1_CI_AS
184   SQL_SwedishPhone_Pref_CP1_CI_AS
185   SQL_SwedishStd_Pref_CP1_CI_AS
186   SQL_Icelandic_Pref_CP1_CI_AS

TERTIARY_WEIGHTS is intended for use in the definition of a computed column that is defined on the values of a char, varchar, or varchar(max) column. Defining an index on both the computed column and the char, varchar, or varchar(max) column can improve performance when the char, varchar, or varchar(max) column is specified in the ORDER BY clause of a query.

Hmmm.... this does not sound much like any of the tertiary stuff I was talking about. And did you notice how many CI (case insensitive) collations are listed there?

Let's try it out:

SELECT TERTIARY_WEIGHTS('ABCDEFGHIJKLMNOPQRSTUVWZYZ' COLLATE SQL_Latin1_General_Pref_CP437_CI_AS)

0x0101010101010101010101010101010101010101010101010101

Hmmm... well, it is not using Microsoft's weights (remember our aversion to embeded 0x01 bytes!) but if uppercase with an ignore case flag returns a low number, perhaps there is some common ground.

Let's try the lowercase letters:

SELECT TERTIARY_WEIGHTS('abcdefghijklmnopqrstuvwxyz' COLLATE SQL_Latin1_General_Pref_CP437_CI_AS)

0x0202020202020202020202020202020202020202020202020202

Hmmm... never mind. :-)

Clearly it is only capturing case differences, and since it will not work with either Unicode types or unusual letters like U+028f (ʏ, a.k.a. LATIN LETTER SMALL CAPITAL Y), all of the numbers will be 0x01 or 0x02. Let's make sure:

SELECT TERTIARY_WEIGHTS('AbCdEfGhIjKlMnOpQeStUvWxYz' COLLATE SQL_Latin1_General_Pref_CP437_CI_AS)

0x0102010201020102010201020102010201020102010201020102

Ok, that looks a bit like proof, right? :-)

So this is a function that will return information about case distinctions even in case insensitive sorts, to help with index performance of non-Unicode SQL compatibility collation columns. Though how it helps is quite unclear at this point.

And to wind down on a weird note, you can try the following query:

SELECT  TERTIARY_WEIGHTS('W' COLLATE SQL_Scandinavian_Pref_CP850_CI_AS),
  TERTIARY_WEIGHTS('w' COLLATE SQL_SwedishStd_Pref_CP1_CI_AS),
  TERTIARY_WEIGHTS('W' COLLATE SQL_SwedishStd_Pref_CP1_CI_AS)

0x03   0x02    0x01

I am afraid to ask what it is about the letter W in Scandinavia that would cause TERTIARY_WEIGHTS() to return a 0x03 here.

Trivia -- does anyone know how to get a 0x04? Or another byte value higher than that? :-)

Perhaps I am even more afraid of asking how on earth anything one does with this function can help performance. Does anyone have any thoughts here?

I would probably recommend giving this function a miss, in the meantime (a statement I will happily retract if someone can explain a purpose or use for this function that makes sense!).

 

This post brought to you by ʏ (U+028f, a.k.a. LATIN LETTER SMALL CAPITAL Y)


# Steve Kass on 20 Jul 2006 6:12 PM:

Here's a clue for your trivia question: SELECT TERTIARY_WEIGHTS('AaÀàÁáÂâÃãÄäÅå' COLLATE SQL_Latin1_General_CP850_CI_AI)

Steve Kass

# Michael S. Kaplan on 26 Jul 2006 3:25 PM:

Indeed, and that is a great example -- can you explain why? :-)

# John Cowan on 6 Dec 2007 12:34 PM:

I'm no SQL Server mavin, but clearly what is special about W in Scandinavia is that it's just a glyphic variant of V, mostly used in proper names nowadays.  The old royal dynasty of Sweden (1523-1654) is usually spelled Vasa now, but the rye-crisps are still Wasa.


referenced by

2006/10/29 SQL Server: compatibility collations vs. Window collations

go to newer or older post, or back to index or month or day