Logical StrCmpLogicalW changes in Vista

by Michael S. Kaplan, published on 2006/10/01 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/10/01/778990.aspx


I mentioned yesterday in "StrCmp for humans? It's in there!" that the behavior of StrCmpLogicalW changed in Vista. I thought I'd explain what the change is....

So let's say you had the following list of files:

My File.txt
My File (4).txt
My File (5).txt
My File (10).txt
My File (200).txt
My File (3000).txt

In the CMD shell, the order is:

My File (10).txt
My File (200).txt
My File (3000).txt
My File (4).txt
My File (5).txt
My File.txt

while it is obvious that the vast majority of non-geeks would prefer the first order -- the original basis of the change that led to what I described in "What is up with number sorting?"
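To make the CMD-style order concrete, here is a minimal sketch in Python. It assumes nothing about how CMD actually enumerates files; it only shows that a plain code-point comparison of the names reproduces the second list, because SPACE (U+0020) sorts before FULL STOP (U+002e) and the digits are compared one character at a time.

```python
# Illustrative only: the file names are the ones from the post.
names = [
    "My File.txt",
    "My File (4).txt",
    "My File (5).txt",
    "My File (10).txt",
    "My File (200).txt",
    "My File (3000).txt",
]

# Python compares str values code point by code point, so "(10)" lands
# before "(4)" because '1' (U+0031) is less than '4' (U+0034).
cmd_like = sorted(names)

for name in cmd_like:
    print(name)
# My File (10).txt
# My File (200).txt
# My File (3000).txt
# My File (4).txt
# My File (5).txt
# My File.txt
```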

Unfortunately, in XP and Server 2003, the order that StrCmpLogicalW was creating was

My File (4).txt
My File (5).txt
My File (10).txt
My File (200).txt
My File (3000).txt
My File.txt

The reason why should be obvious, given that SPACE is a symbol (see "I need my SPACE, symbolically speaking" for more information on that), and the lightest one at that -- so comparing U+0020 to U+002e will always have U+0020 come first, and thus you see this undesired order.
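A rough way to see this is a sketch that compares runs of digits as numbers and everything else as text. This is not the StrCmpLogicalW implementation: the function name digit_chunk_key is made up for illustration, and plain code-point order stands in for the real linguistic weights (the two agree on SPACE coming before FULL STOP).

```python
import re

def digit_chunk_key(name):
    """Hypothetical sort key: digit runs compare numerically, the rest as text."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions are always digit runs. Lowercasing makes the
    # comparison case-insensitive.
    parts = re.split(r"([0-9]+)", name.lower())
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = [
    "My File.txt",
    "My File (4).txt",
    "My File (5).txt",
    "My File (10).txt",
    "My File (200).txt",
    "My File (3000).txt",
]

xp_like = sorted(names, key=digit_chunk_key)
for name in xp_like:
    print(name)
# My File (4).txt
# My File (5).txt
# My File (10).txt
# My File (200).txt
# My File (3000).txt
# My File.txt
```

The first text chunk of "My File.txt" is the whole name, so it is compared against "my file (" and loses at the eighth character, where FULL STOP meets SPACE.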

Now note that the default is to hide file extensions, in which case everything would look correct:

My File
My File (4)
My File (5)
My File (10)
My File (200)
My File (3000)

but punishing people for wanting to see the extension seems like a bad idea, all the way around.

(I know I am opening myself up for an onslaught of people who hate that default, but that really cannot be helped, sorry!)

Changes to remove the space would also not work since U+002e also comes before U+0028 (LEFT PARENTHESIS). I suppose they could have asked us to change that, since the ordering of most symbols is somewhat arbitrary anyway, but I am sure people found the lack of a space there to be aesthetically displeasing....

So the change that was made (very recently) was to treat U+002e as a "break" character between chunks of the string that were being compared, the same way the numbers were being treated currently -- in essence treating the FULL STOP as a stop. That way the "correct" order would show up whether one was showing file extensions or not.
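One way to picture the dot break is the sketch below, which cuts each name at every FULL STOP before chunking the digits. It is only an approximation under stated assumptions: dot_break_key is a made-up name, code-point order stands in for the real linguistic comparison, and the actual function may treat the break differently in cases this list does not exercise.

```python
import re

def dot_break_key(name):
    """Hypothetical sort key that treats '.' as a break between chunks."""
    key = []
    # Each dot-separated segment is compared on its own, so "my file"
    # is finished (and wins as the shorter prefix) before ".txt" is seen.
    for segment in name.lower().split("."):
        parts = re.split(r"([0-9]+)", segment)
        # Odd positions are digit runs and compare as numbers.
        key.append([int(p) if i % 2 else p for i, p in enumerate(parts)])
    return key

names = [
    "My File (3000).txt",
    "My File (10).txt",
    "My File.txt",
    "My File (200).txt",
    "My File (5).txt",
    "My File (4).txt",
]

vista_like = sorted(names, key=dot_break_key)
for name in vista_like:
    print(name)
# My File.txt
# My File (4).txt
# My File (5).txt
# My File (10).txt
# My File (200).txt
# My File (3000).txt
```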

This kind of matches another cool change in Vista where renaming a file in Explorer does not highlight the extension when it highlights the filename -- essentially treating the extension of the file as more of a metadata thing. :-)

And of course there is a new group policy setting in Vista to make the sorting change with FULL STOP not happen, for the people who don't want it. Just go to one of the following keys depending on whether you want it as a user setting or a machine one (if it is not there, then create it!):

HKLM\SOFTWARE\Policies\Microsoft\Windows\Explorer

HKCU\SOFTWARE\Policies\Microsoft\Windows\Explorer

and then add a DWORD value of NoDotBreakInLogicalCompare and make its value 1. This will return you to the older behavior, either because you liked it better (which is unlikely) or, more likely, because you commonly used the FULL STOP in file names before the file extension and the change was creating non-intuitive results!
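If you would rather script the setting than edit the registry by hand, the following is a minimal sketch using Python's standard winreg module for the per-user key. The key path and value name are the ones given above; the function name set_no_dot_break is made up for illustration, and writing under the Policies key may require elevated permissions on your system.

```python
import sys

# Key path and value name come from the post; the rest is illustrative.
POLICY_SUBKEY = r"SOFTWARE\Policies\Microsoft\Windows\Explorer"
VALUE_NAME = "NoDotBreakInLogicalCompare"

def set_no_dot_break(enabled=True):
    """Write the per-user (HKCU) policy value. Windows only."""
    if sys.platform != "win32":
        raise OSError("the registry is only available on Windows")
    import winreg  # standard library, present only on Windows

    # CreateKeyEx opens the key, creating it first if it is not there.
    with winreg.CreateKeyEx(
        winreg.HKEY_CURRENT_USER, POLICY_SUBKEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(
            key, VALUE_NAME, 0, winreg.REG_DWORD, 1 if enabled else 0
        )

if __name__ == "__main__" and sys.platform == "win32":
    set_no_dot_break(True)
```

Swapping winreg.HKEY_CURRENT_USER for winreg.HKEY_LOCAL_MACHINE would target the machine-wide key instead, which needs an administrator account.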

This post brought to you by . (U+002e, a.k.a. FULL STOP)


# orcmid on 1 Oct 2006 12:37 PM:

Very interesting.  Nice resolution to the problem.  

Meanwhile, I discovered something weird.  My browser (IE6) is showing me

HKCU\SOFTWARE\Policies\Microsoft\Windows\Explorer

with the Yen symbols, "¥",  instead of "\".   So is Outlook (where I see the RSS version of the page).  

Now, the browser says the page is encoded as Unicode, and my language preferences are for EN-us (or en-US, I can never remember).  It also happens that my "language" for non-unicode programs is currently set to "Japanese" (that is, default code page 932) so I could demonstrate the problem of displaying Unicode on the Windows Console.

It is a mystery to me why this odd convention (that you and Raymond have written about, I'm sure) infects Unicode applications too!  It's quite annoying when attempting to display multi-lingual examples.  Almost as bad as having emoticon glyphs be substituted for certain sequences in code examples.

[I can't wait to see how IE shows me this comment!]

# Michael S. Kaplan on 1 Oct 2006 3:26 PM:

Actually, I have posted about that issue many times, and it is the default system locale setting that decides it.

# Bill_McC on 17 Jan 2008 11:03 AM:

what's the TSQL equivalent ?

# Michael S. Kaplan on 17 Jan 2008 1:31 PM:

It would be nice if there was one, to be honest! But currently there isn't. I do not know of a plan to add it, though there has been a request for some time from several groups to add the functionality to the lower level functionality within NLS....

# Bill_McC on 17 Jan 2008 8:29 PM:

Thanks Michka :)  

I suppose a CLR function calling out to StrCmpLogicalW, or re-write the equivalent.

# Sue D. Nymme on 2 Apr 2008 1:00 PM:

Michael,

   Thank you for the enlightening post.  Now can you explain the following?

   I created several directories, named "a", "b", "b-c", "b-c-d", "b-c2-d", "b_c_d", "b_c2_d", "c", and "b-c2".  They sorted as follows:

  Why does "b_c2_d" come before "b_c_d", and "b-c2-d" come before "b-c-d", yet "b-c" comes before "b-c2"?

   That's in Explorer.  In Excel, it sorts in a slightly different, yet still inconsistent, order:

   Note that in Excel, "b_c_d" comes before "b_c2_d", yet "b-c2-d" comes before "b-c-d".

What gives?

# Michael S. Kaplan on 4 Apr 2008 6:31 AM:

This is a StrCmpLogicalW issue -- Explorer uses the function and Excel does not, so differences between the two are based on the difference in the way numbers are handled....


referenced by

2008/03/04 Consistency in the Windows Shell is not overrated; it's just underobserved!

2007/12/22 Incomplete Scenarios: They don't know everything that's up with number sorting
