by Michael S. Kaplan, published on 2006/10/01 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/10/01/778990.aspx
I mentioned yesterday in "StrCmp for humans? It's in there!" that the behavior of StrCmpLogicalW changed in Vista. I thought I'd explain what the change is....
So let's say you had the following list of files:
My File.txt
My File (4).txt
My File (5).txt
My File (10).txt
My File (200).txt
My File (3000).txt
In the CMD shell, the order is:
My File (10).txt
My File (200).txt
My File (3000).txt
My File (4).txt
My File (5).txt
My File.txt
while it is obvious that the vast majority of non-geeks would prefer the first order -- the original basis of the change that led to what I described in "What is up with number sorting?".
Unfortunately, in XP and Server 2003, the order that StrCmpLogicalW produced was:
My File (4).txt
My File (5).txt
My File (10).txt
My File (200).txt
My File (3000).txt
My File.txt
The reason is clear once you recall that SPACE is a symbol (see "I need my SPACE, symbolically speaking" for more information on that), and the lightest one, at that. "My File (4).txt" and "My File.txt" first differ right after "My File", where one has U+0020 and the other has U+002e. Comparing U+0020 to U+002e will always have U+0020 come first, and thus you see this undesired order.
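To make the three orderings concrete, here is a minimal Python sketch. It is not how StrCmpLogicalW is implemented: the helper name `natural_key` is hypothetical, and plain code point order stands in for the real linguistic weights (an assumption that happens to agree with the SPACE-before-FULL STOP comparison described above).

```python
import re

FILES = [
    "My File.txt",
    "My File (4).txt",
    "My File (5).txt",
    "My File (10).txt",
    "My File (200).txt",
    "My File (3000).txt",
]

def natural_key(name):
    """Split a name into text runs and digit runs (hypothetical helper).

    Digit runs compare by numeric value; text runs compare by code point,
    which is only a stand-in for the real collation weights.
    """
    key = []
    # With a capturing group, re.split puts the digit runs at odd indexes.
    for index, chunk in enumerate(re.split(r"([0-9]+)", name)):
        if not chunk:
            continue
        if index % 2 == 1:
            key.append((1, int(chunk)))  # number chunk
        else:
            key.append((0, chunk))       # text chunk
    return key

# Plain string comparison: the CMD-style order, (10) before (4).
cmd_order = sorted(FILES)

# Numbers compared by value, but SPACE still sorts before FULL STOP,
# so "My File.txt" lands at the bottom (the XP / Server 2003 order).
xp_order = sorted(FILES, key=natural_key)

if __name__ == "__main__":
    print("\n".join(xp_order))
```

Running it prints the numbered files in the order 4, 5, 10, 200, 3000 with "My File.txt" last, matching the list above.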
Now note that the default is to hide file extensions, in which case everything would look correct:
My File
My File (4)
My File (5)
My File (10)
My File (200)
My File (3000)
but punishing people for wanting to see the extension seems like a bad idea, all the way around.
(I know I am opening myself up for an onslaught of people who hate that default, but that really cannot be helped, sorry!)
Changes to remove the space would also not work since U+002e also comes before U+0028 (LEFT PARENTHESIS). I suppose they could have asked us to change that, since the ordering of most symbols is somewhat arbitrary anyway, but I am sure people found the lack of a space there to be aesthetically displeasing....
So the change that was made (very recently) was to treat U+002e as a "break" character between chunks of the string being compared, the same way the numbers were already being treated -- in essence treating the FULL STOP as a stop. That way the "correct" order would show up whether one was showing file extensions or not.
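The effect of treating FULL STOP as a break can be sketched in Python as follows. This is an illustration under stated assumptions, not the Vista implementation: `natural_key` and `dot_break_key` are hypothetical names, and code point order stands in for the real collation weights.

```python
import re

def natural_key(text):
    """Text runs compare by code point, digit runs by numeric value."""
    key = []
    # With a capturing group, re.split puts the digit runs at odd indexes.
    for index, chunk in enumerate(re.split(r"([0-9]+)", text)):
        if not chunk:
            continue
        key.append((1, int(chunk)) if index % 2 == 1 else (0, chunk))
    return key

def dot_break_key(name):
    """Treat each FULL STOP as a break: compare segment by segment.

    Because "My File" is finished before the extension is looked at,
    it sorts ahead of "My File (4)" no matter what follows the dot.
    """
    return [natural_key(segment) for segment in name.split(".")]

FILES = [
    "My File (3000).txt",
    "My File (10).txt",
    "My File (4).txt",
    "My File.txt",
    "My File (200).txt",
    "My File (5).txt",
]

if __name__ == "__main__":
    print("\n".join(sorted(FILES, key=dot_break_key)))
```

With this key, "My File.txt" comes first and the numbered files follow in numeric order, which is the order most people expect whether or not extensions are shown.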
This kind of matches another cool change in Vista where renaming a file in Explorer does not highlight the extension when it highlights the filename -- essentially treating the extension of the file as more of a metadata thing. :-)
And of course there is a new group policy setting in Vista to make the sorting change with FULL STOP not happen, for the people who don't want it. Just go to one of the following keys depending on whether you want it as a user setting or a machine one (if it is not there, then create it!):
HKLM\SOFTWARE\Policies\Microsoft\Windows\Explorer
HKCU\SOFTWARE\Policies\Microsoft\Windows\Explorer
and then add a DWORD value of NoDotBreakInLogicalCompare and make its value 1. This will return you to the older behavior, either because you liked it better (which is unlikely) or, more likely, because you commonly use the FULL STOP in file names before the file extension and the change was creating non-intuitive results!
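If you would rather not edit the registry by hand, one option is a .reg file. The Python sketch below only builds the text of such a file; the function name `reg_file_text` and its `per_machine` parameter are hypothetical, while the key path and value name are the ones given above. Save the result with a .reg extension and import it to apply the setting.

```python
POLICY_PATH = r"SOFTWARE\Policies\Microsoft\Windows\Explorer"
VALUE_NAME = "NoDotBreakInLogicalCompare"

def reg_file_text(per_machine=False):
    """Return .reg file text that sets the policy value to 1.

    per_machine=False targets HKCU (user setting),
    per_machine=True targets HKLM (machine setting).
    """
    hive = "HKEY_LOCAL_MACHINE" if per_machine else "HKEY_CURRENT_USER"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + hive + "\\" + POLICY_PATH + "]",
        '"' + VALUE_NAME + '"=dword:00000001',
        "",
    ]
    return "\r\n".join(lines)

if __name__ == "__main__":
    print(reg_file_text())
```

Importing a .reg file creates the key if it is not already there, which matches the "if it is not there, then create it" step.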
This post brought to you by . (U+002e, a.k.a. FULL STOP)
# orcmid on 1 Oct 2006 12:37 PM:
# Michael S. Kaplan on 1 Oct 2006 3:26 PM:
# Bill_McC on 17 Jan 2008 11:03 AM:
what's the TSQL equivalent ?
# Michael S. Kaplan on 17 Jan 2008 1:31 PM:
It would be nice if there was one, to be honest! But currently there isn't. I do not know of a plan to add it, though several groups have been asking for some time to have this added to the lower-level functionality within NLS....
# Bill_McC on 17 Jan 2008 8:29 PM:
Thanks Michka :)
I suppose a CLR function calling out to StrCmpLogicalW, or re-write the equivalent.
# Sue D. Nymme on 2 Apr 2008 1:00 PM:
Michael,
Thank you for the enlightening post. Now can you explain the following?
I created several directories, named "a", "b", "b-c", "b-c-d", "b-c2-d", "b_c_d", "b_c2_d", "c", and "b-c2". They sorted as follows:
Why does "b_c2_d" come before "b_c_d", and "b-c2-d" come before "b-c-d", yet "b-c" comes before "b-c2"?
That's in Explorer. In Excel, it sorts in a slightly different, yet still inconsistent, order:
Note that in Excel, "b_c_d" comes before "b_c2_d", yet "b-c2-d" comes before "b-c-d".
What gives?
# Michael S. Kaplan on 4 Apr 2008 6:31 AM:
This is a StrCmpLogicalW issue -- Explorer uses the function and Excel does not, so differences between the two are based on the difference in the way numbers are handled....