by Michael S. Kaplan, published on 2011/12/06 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/12/06/10244626.aspx
The other day, in response to GetHashCode vs. Compare vs. SortKey, Alexander Savin commented:
Michael, with respect to your comment "The fact that more often people test comparisons by comparing "AAA" to "BBB" shows ..." I would say quite the opposite. It is much more important that a comparison function returns as fast as possible if two input strings mismatch not too far from the beginning because most comparisons in the world compare such unequal strings.
I have no alternative to CompareString but I was able to test performance of CompareStringOrdinal (with IgnoreCase = false) vs. wcscmp that I believe behave the same (binary comparison).
For strings having identical prefix not less than 20-30 characters CompareStringOrdinal beats wcscmp. For longer prefix (>100) it is twice as fast (on my i7 machine) if I pass string lengths instead of -1. (Amazingly, the documentation states that CompareString is optimized for the case of cch == -1 and some other restrictions, but no word it is not the case for CompareStringOrdinal.) However, if two strings differ in the very first character or have a short equal prefix CompareStringOrdinal is up to 3 times slower than wcscmp.
I know wcscmp is just a straight loop (even if intrinsic) and CompareStringOrdinal seems to have a "preparation preamble" and manual loop unrolling. I wonder why CompareStringOrdinal is optimized for non-typical case. I understand there still are some cases where CompareStringOrdinal would be beneficial (like sorting path names for the files located in the same deep directory).
Wow, that conflated a lot of issues, didn't it? :-)
I mean, especially since that blog he commented in wasn't talking about CompareStringOrdinal or wcwscmp or any binary type comparison at all -- it was referring to three different ways of doing linguistic comparisons, and which one was best!
Perhaps a bit on the history of CompareStringOrdinal might be indicated. I'll do a bigger version of that some other time, though.
For now I'll focus on the one obvious point -- that CompareString != CompareStringOrdinal.
So every rule, every optimization, every everything about one function is not the same as the other.
The CompareStringOrdinal function was born as a thin wrapper around RtlCompareUnicodeString, and it picking up the name based on CompareString is a retcon of that purpose to make it look like something else.
The name with the "Ordinal" suffix, was trying to take the .Net terminology to explain what it does, even though that terminology is not at all intuitive.
We added trimmings to make the return values work the same, and then in Windows 7/Server 2008 R2, we stopped making it an actual wrapper since that slowed the function down some -- we cloned the function in user mode so the performance hit of a kernel transition wouldn't have to happen.
The reasons for this change are discussef at length in That function is always faster! (well, except for that one case when it can actually be slower...).
Which I guess explains a little why it was "optimized" the way it was -- because it was never really optimized in any particular way anyway, not like CompareString was, at least....
The ordinal function was added so that Win32 and the NLS API would have a way to do the same kinds of comparisons that were built into .Net. Once we made the .Net comparisons more "Windows like", anyway. Since we were wrapping an existing function first added in the very early NT days, there was very little we could do to change the way it was optimized -- without breaking the expectations about the behavior of well over a decade of customers, that is!
Lionel on 6 Dec 2011 9:44 AM:
> we cloned the function in user mode so the performance hit of a kernel transition wouldn't have to happen
Are you sure there is no mistake? I thought that Rtl* functions (e.g. RtlCompareUnicodeString) were helper function for subsystem developers and for kernel mode, residing in ntdll.dll (and thus, running in user mode when called from user mode). Using a syscall for this seems quite wasteful.