Is RtlCompareUnicodeString used correctly?

by Michael S. Kaplan, published on 2006/02/09 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/02/09/528297.aspx


I'm not sure how many of you remember when I posted "Hungarian is even more complicated than I thought" and "More on the fabled EqualString".

Not because I don't have stats or anything, but because there is no way to gauge how many of you are new readers and how many of you really have nothing better to do than read what I am posting here. :-)

Anyway, I am going to talk again about RtlEqualUnicodeString and RtlCompareUnicodeString, the functions in ntdll.dll that do binary comparisons that can be case insensitive.

I found out something really interesting about them the other day.

Now it is obvious how RtlEqualUnicodeString might be used -- I mean, if you have two strings and you need to know in a binary sense whether they are equal (possibly ignoring case) then it can be very handy. Because no matter how unnatural the comparison seems to humans, the fact is that a lot of Windows loves it.

Of course the actual usage of RtlCompareUnicodeString is a bit less clear -- I mean, the order has no meaning to humans. So a function that uses it to order two strings seems like a ripe source for incorrect usage.

Don't worry, it turns out that nobody is using that order inappropriately.

In just about every case, the return value of the function is tested to see whether it was equal to or not equal to zero.

Yes, that is right -- almost everyone who uses it is essentially duplicating the functionality of RtlEqualUnicodeString.

When you get down to it, one has to wonder how much more expensive operation A is than operation B:

A - compare two strings, one WCHAR at a time, return the difference if there is one as soon as you find it, then compare that number to zero to see if there is in fact a difference.

B - compare two strings, one WCHAR at a time, return TRUE or FALSE as soon as you know whether they are in fact equal.
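To make the two operations concrete, here is a minimal C sketch. It is not the ntdll.dll code: `compare_units` and `equal_units` are hypothetical stand-ins that work on counted arrays of 16-bit code units, handle only the case-sensitive comparison, and return just a sign when one string is a prefix of the other.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; these are NOT the
   RtlCompareUnicodeString / RtlEqualUnicodeString implementations. */

/* Operation A: walk both strings one code unit at a time and return the
   difference at the first mismatch. If one string is a prefix of the
   other, the shorter one sorts first. */
static int compare_units(const uint16_t *a, size_t a_len,
                         const uint16_t *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    }
    return (a_len > b_len) - (a_len < b_len);
}

/* Operation B: answer only "are they equal?" and stop as soon as the
   answer is known. */
static bool equal_units(const uint16_t *a, size_t a_len,
                        const uint16_t *b, size_t b_len)
{
    if (a_len != b_len)
        return false; /* different lengths can never be equal */
    for (size_t i = 0; i < a_len; i++) {
        if (a[i] != b[i])
            return false;
    }
    return true;
}
```

A caller that only tests `compare_units(...) == 0` gets the same answer as `equal_units(...)`. In this sketch the one place B can do less work is the length check: it can reject strings of different lengths before looking at a single character, while A still has to find the first mismatch to report an order.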

Remembering for a moment that a difference that makes no difference, makes no difference -- do you think it makes a significant difference?

Hopefully not. Though it worries me that no one seems to be doing anything beyond what RtlEqualUnicodeString would provide. So why take a hit at all?

I resisted the temptation to just go and fix all of the occurrences (it is a 100% safe change but even so, I hate when people do it to code I own).

I also resisted the temptation to send out a bunch of mail to all of the owners to tell them to change their code (I hate when people do that to me, too).

Now that I read this post again, it occurs to me that this will probably not actually be very interesting to people. It just seemed weird to me.

Though if you own one of those calls to RtlCompareUnicodeString, then feel free to change it; at worst it will just be more self-documenting as to the intention, and at best (if the code is called many times in a tight loop) it could even help performance!

 

This post brought to you by "P" (U+0050, a.k.a. LATIN CAPITAL LETTER P)


# Siva on 9 Feb 2006 3:45 AM:

It is very useful

# Nick Lamb on 9 Feb 2006 7:04 AM:

The links suffer from the usual MSDN URL instability, do they work for you Michael?

RtlCompareUnicodeString ought to be useful except that it is poorly described, which tends to make one think that the people who wrote it didn't know what it was for either...

Anyway, an optimist would expect it to be useful where strings (e.g. filenames) need an order, any unique order. This allows you to binary search, build various types of tree structure etc. Obviously doing such things with a locale-sensitive function would be silly (Microsoft has done it before, but it's still silly).

The UTF-16 WCHAR algorithm described in this article would probably work for such a purpose, but if that's what was used it should be clearly indicated because UTF-16 isn't code point order, unlike UTF-8 or UTF-32. So the result is even /more/ unintuitive.
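The point about UTF-16 order can be checked with a small C sketch. The helper names `encode_utf16` and `compare_as_utf16` are hypothetical, and the code assumes its inputs are valid Unicode scalar values (no lone surrogates, nothing above U+10FFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-16 and return the number of
   code units written (1 or 2). Assumes cp is a valid scalar value. */
static size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}

/* Compare two code points the way a WCHAR-by-WCHAR comparison would:
   by their UTF-16 code units, not by their code point values. */
static int compare_as_utf16(uint32_t x, uint32_t y)
{
    uint16_t a[2], b[2];
    size_t a_len = encode_utf16(x, a);
    size_t b_len = encode_utf16(y, b);
    size_t n = a_len < b_len ? a_len : b_len;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    }
    return (a_len > b_len) - (a_len < b_len);
}
```

With these helpers, `compare_as_utf16(0xFF5E, 0x10000)` is positive even though U+FF5E is the smaller code point: U+10000 is encoded with a leading 0xD800, which is less than 0xFF5E. Any BMP character from U+E000 to U+FFFF sorts after every supplementary character in this order.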

# Michael S. Kaplan on 9 Feb 2006 8:48 AM:

They do work for me, maybe something was down?

Well, it is as useful as the *_BIN collations in SQL Server (which actually have the same order!).

# Dean Harding on 9 Feb 2006 5:29 PM:

Yeah, I would have imagined that RtlCompareUnicodeString would be useful for building sorted lists or trees needed for searching, but not necessarily by humans.

It does seem a little silly to use it instead of RtlEqualUnicodeString, but surely the only difference would be the return statement (i.e. just return the difference between the two unequal WCHARs, instead of a constant TRUE/FALSE) so it's probably at most 1 extra assembly instruction...

# Christian Kaiser on 10 Feb 2006 2:32 AM:

Just what I wanted to write: comparing the strings and returning an order is needed for skip lists and other lists that need a way to find a key. This function is well needed.

So my question would be: what's better with RtlEqualUnicodeString()? All us C/C++ programmers will mix TRUE with "==0" if the APIs are mixed... Who has not yet written ("if (strcmp())") at least once and meant that the following code should be executed when the strings are equal? ;-)
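The strcmp slip mentioned above can be sketched in C. Both function names are hypothetical and exist only to show the two spellings side by side.

```c
#include <stdbool.h>
#include <string.h>

/* The slip: strcmp returns 0 when the strings match, so using its result
   directly as a truth value is true when the strings DIFFER. */
static bool buggy_is_equal(const char *a, const char *b)
{
    return strcmp(a, b); /* nonzero (true) means "not equal" */
}

/* The intended test: compare the result against zero explicitly. */
static bool is_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

An "Equal"-style function that returns a boolean has the same shape as `is_equal`, so a call site reads the way it behaves.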

The execution speed should be the same; testing for a zero or non-zero result makes no difference.

My conclusion: both are needed, but adding the "Equal" function helps for clarity of source code, in places where only the "equalness" counts.

Christian

# Michael S. Kaplan on 10 Feb 2006 3:16 AM:

I agree both are potentially needed (each for different situations); it is just bothersome to me that most of the usage of one ought to be the other, that's all. :-)

# Serge Wautier on 11 Feb 2006 8:07 AM:

Ouch... for a while, I figured Rtl stands for Right-to-left!
I hope I don't need some coffee because I hate coffee :-D

# Yuhong Bao on 12 Mar 2009 1:00 AM:

"Ouch... for a while, I figured Rtl stands for Right-to-left!"

Nope, it stands for "Run-Time Library". Do you know about the NT Native APIs in NTDLL.DLL? The Rtl* APIs were utility functions that, in user mode, did not call directly into kernel mode. The Nt* and Zw* APIs were, in user mode, stubs that make a system call into kernel mode, which implements the API. In kernel mode, the Win32 API does not exist, so the NT Native APIs are always used, and in kernel mode there is a difference between Zw* and Nt*. These APIs are mostly undocumented; the third party that created Interix (a replacement for the POSIX subsystem) had to obtain NT source code to do so.


referenced by

2010/03/15 Thus the problems resist solution, and the workarounds are often inadequate

2007/10/15 We're Lost. Maybe the map is wrong. Maybe all maps are wrong?

2007/09/14 How do I feel about lstrcmpi? I think it blows....

2007/07/12 Wait til you see my 'O'[EMCP based technology]

2006/04/29 'Which comes first?' vs. 'Are they equal?'

2006/03/02 CompareString ignores case by lowercasing....
