In SQL Server, the alphabet does not end at Z!

by Michael S. Kaplan, published on 2007/12/09 04:11 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/12/09/6673424.aspx


Previous posts in this series:

  • In SQL Server, the distance between A and Z is wider than you might think
  • In SQL Server, A-Z, A-z, a-Z, and a-z may not mean the same thing
  • In SQL Server, the wrong range can make you seem insensitive to one's width! (aka Do my V's look fat?)
  • In SQL Server, your ranges also need to ACCENT-uate the positives
  • When I think back to that very first post in the series, I wonder how many people realized that I basically moved on from that post where I admitted that there were 15 or 20 characters that did not fit between anything akin to an A or anything akin to a Z, since the next few posts after that spent so much time dealing with the Z border -- as if those other characters didn't even exist.

    They fall into two categories:

    1) The various forms of YOGH/EZH, which could in theory all go after Z by default anyway (which is where they end up here, if you notice. :-)

    2) A bunch of other miscellaneous stuff that is not needed by any of the languages that rely on the default collation table for their collation behavior.

    Anyway, here they all are (the five that are not in any shipping version of SQL Server marked in bold):

    So then we get to it -- you could start thinking of your A-ʭ/a-ʭ range (well, technically your A-ʘ/a-ʘ range).

    Obviously if new Latin letters are added that don't specifically make sense to be interlaced within the table. So thinking about that for a moment (and not wanting some random future version with an updated collation option like SQL Server 2005 gave suggesting the need to have something different.

    Simmer with that one for a bit. It does get a bit worse, and I will get to that in tomorrow's post!

     

    This post brought to you by ʘ (U+0298, aka LATIN LETTER BILABIAL CLICK)


    # John Cowan on 9 Dec 2007 1:08 PM:

    Thorn, what happened to Thorn?  It's properly the 27th basic letter of the Latin alphabet.  And of course Yogh should sort with G, not with Z and Ezh; the whole point of Yogh was to discriminate between the G-like and Z-like forms.

    # Michael S. Kaplan on 9 Dec 2007 1:38 PM:

    Very good question, one which actually anticipates tomorrow's exciting episode....

    # Mihai on 10 Dec 2007 12:37 PM:

    Just wondering: SQL actually includes the listed characters (and only the listed ones) if I use Z-ʘ? Meaning exclude non-letters? Or exclude Latin letters? (although "MODIFIER LETTER SMALL EZH" is not "Latin letter" and is included).

    # Michael S. Kaplan on 10 Dec 2007 12:40 PM:

    SQLS does not use character categories -- it uses collations. And when characters are sorted somewhere, that is where you'll find them. One of the biggest problems with range checking!


    referenced by

    2007/12/11 In SQL Server, there is the rest of Unicode (aka the SiaO Incompleteness Theorem)

    2007/12/10 In SQL Server, different collations implies different ranges (aka Not every table has its THORN)

    go to newer or older post, or back to index or month or day