WORD SORT...Why'd it have to be...WORD SORT?

by Michael S. Kaplan, published on 2010/06/10 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/06/10/10021417.aspx


So the other day I was in one of those email threads again.

One of those threads that makes me wonder if anyone realizes I have not worked in the NLS code for years.

No worries, the nature of the questions hasn't changed all that much. :-)

The question went something like this:

--create table tab1
--(a nvarchar(100))

--insert into tab1 values ('2-1'), ('21') ,('3-1'), ('31') ,('3-1-'), ('31-')

select a from tab1 order by 1
select a collate latin1_general_bin  from tab1 order by 1

begin tran
update tab1
set a = a + N'-'
select a from tab1 order by 1
select a collate latin1_general_bin  from tab1 order by 1
rollback

--drop table tab1

Output looks like following.

a
----------------------------------------------------------------------------------------------------
21
2-1
31
3-1
3-1-
31-

(6 row(s) affected)


----------------------------------------------------------------------------------------------------
2-1
21
3-1
3-1-
31
31-

(6 row(s) affected)


(6 row(s) affected)
a
----------------------------------------------------------------------------------------------------
2-1-
21-
3-1-
3-1--
31-
31--

(6 row(s) affected)


----------------------------------------------------------------------------------------------------
2-1-
21-
3-1-
3-1--
31-
31--

(6 row(s) affected)

We clearly see the sort order getting modified for which I cannot find a reason. If I replace hyphens with any other special character (eg ^, +, /, “ etc), sort order seems to work fine.

a
----------------------------------------------------------------------------------------------------
2^1
21
3^1
3^1^
31
31^

(6 row(s) affected)


----------------------------------------------------------------------------------------------------
21
2^1
31
31^
3^1
3^1^

(6 row(s) affected)


(6 row(s) affected)
a
----------------------------------------------------------------------------------------------------
2^1^
21^
3^1^
3^1^^
31^
31^^

(6 row(s) affected)


----------------------------------------------------------------------------------------------------
21^
2^1^
31^
31^^
3^1^
3^1^^

(6 row(s) affected)

I have been trying to understand this flip in sort order for a while and nothing in String sort help or anywhere which explains this flip. Now as I understand it, hyphens are treated as pure symbols (no weight to them I suppose). So the first output then should be other way, shouldn’t it? That’s what happens when we add a hyphen to it (second output).

What I am missing here? Why does hyphen get some really special, special character treatment? How do we explain this?

Regular readers will probably know why the titular allusion to Indiana Jones was thusly inspired....

The rest of you can sample some of the following blogs:

and so on.

Now SQL Server does not give one the choice between word sorts and string sorts -- it is always a word sort, except when it is a _BIN or _BIN2 collation, in which case the hyphen and such will be in a string sort-ish kind of situation (along with everything else that changes).

So in particular when one is looking at:

21
2-1

versus:

2-1-
21-

and wondering why the order changes, it becomes fairly obvious in the world of the word sort, where the hyphen in always given quadernary weight.

Or maybe it isn't? :-)

Let's look at the four sort keys (winning lower weights are blue, losing higher larger weights are red):

21   0d 1a 0d 19 01 01 01 01 00
2-1  0d 1a 0d 19 01 01 01 01 ff fe 82 12 00

2-1- 0d 1a 0d 19 01 01 01 01 ff fe 82 12 ff fd 82 12 00
21-  0d 1a 0d 19 01 01 01 01 ff fd 82 12 00

See what's happening here? In terms of primary weights all four strings are actually identical, and it is just the nature of these very doubleplusnonprimary weights that we call hyphens that cause the problems....

But CompareString on Windows 7 is not giving me the same results as the ones they are claiming are coming from SQL Server.

So this looks like a SQL Server-specific problem -- maybe a bug that has been fixed in some vfersion of Windows?


no comments

go to newer or older post, or back to index or month or day