The answer is dependent on the question (aka Turkish Delight)

by Michael S. Kaplan, published on 2006/02/04 03:05 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/02/04/524716.aspx


The truism in the title of this post seems fairly obvious. Still, in several situations this week I have had to point it out while explaining why the answer someone got turned out not to be the one they wanted....

A lot of it comes down to the use of the lstrcmpi function.

(As an aside, note that this KB article is simply wrong for every 32-bit version of Windows. I will have to talk about why I think the MSKB is sometimes a priceless asset to be treasured and other times a pariah to be shunned!)

Now the fact is that lstrcmpi is a wrapper around CompareString, which regular readers here know is in Windows for the purpose of doing linguistically meaningful comparisons. And you can fry a linguist in butter and they'd still be a linguist[1] -- so no mere wrapper function is going to change the fundamental purpose....

So, if you are a developer thinking about case-insensitive items like FAT/FAT32/NTFS files or the registry or OS objects like events/mutexes/etc., and you think that lstrcmpi looks like the perfect function to use for comparing two such items, what would I say?

Well, I promise I would not call you an idiot[2]. But if pressed I might have to call a design document that had such an implementation plan idiotic. And I would really try to convince you to fix your code. :-)

Now here is where we get back to the title of this post.

Because if you are calling lstrcmpi for appropriate reasons (i.e. you wanted to get linguistically meaningful results, say in the sorting of a list in a user interface) but you wanted to have behavior that did not vary with different locales, then CompareString with LOCALE_INVARIANT is a good answer.

But if you wanted almost anything else, including all of the non-linguistic purposes hinted at earlier, then CompareStringOrdinal or RtlCompareUnicodeString is a much better choice.
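To make the difference concrete, here is a minimal toy sketch in Python. It does not call any Windows API; the helper names (upper_invariant, upper_turkish, equal_ignore_case) are hypothetical, and the Turkish mapping is hand-written from the Unicode casing rules for "i"/"İ" and "ı"/"I". It only models why a case-insensitive comparison that follows one language's rules can disagree with a locale-independent one about whether two file names match.

```python
# Toy model only: these helpers are hypothetical and are NOT the Win32 functions
# named in the post. They just mimic two different casing rules.

def upper_invariant(s: str) -> str:
    """Uppercase with Python's default, locale-independent Unicode mapping."""
    return s.upper()

def upper_turkish(s: str) -> str:
    """Uppercase the way Turkish does: i -> İ (U+0130) and ı (U+0131) -> I."""
    return s.replace("i", "\u0130").replace("\u0131", "I").upper()

def equal_ignore_case(a: str, b: str, upper) -> bool:
    """Case-insensitive equality under whichever casing rule is passed in."""
    return upper(a) == upper(b)

name_on_disk = "FILE.TXT"
requested = "file.txt"

# Locale-independent rule: the two names are the same file.
invariant_equal = equal_ignore_case(name_on_disk, requested, upper_invariant)

# Turkish rule: "i" uppercases to "İ", so the names no longer match.
turkish_equal = equal_ignore_case(name_on_disk, requested, upper_turkish)

print("invariant:", invariant_equal)
print("turkish:  ", turkish_equal)
```

If the item being named (a file, a registry key, an event) is matched by the system without regard to the user's language, then the second result is simply the wrong answer to the question being asked.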

Maybe the fact that these non-linguistic functions are so much FASTER might have some influence on people. Or the fact that if people are making security decisions based on the results they could be crafting their own security bugs? I mean, both performance and security are "sexier" than international to a lot of these people. :-)

So, how someone asks the question (especially keeping in mind the fact that code already calling lstrcmpi implies a specific, requested usage) can have serious impact on the answer I would give.

Though to be honest, for the last few years my cynical side has decided to assume that the original code was probably wrong, and that neither the person who wrote the code nor the person asking me the question has really thought through the scenario.

So perhaps now I would say that the answer is dependent on the cynicism of the person being asked? :-)

One of my presentations at the 29th Internationalization and Unicode Conference (the one entitled Tales of Incorrect String Comparisons) actually talks about this issue and several other collation-type problems. I would highly recommend it to anyone who finds this type of thing to be interesting. Several cool demos, etc. :-)

 

1 - Apologies to Martin Cruz Smith for the slightly munged quote!
2 - Well, not unless you decided lstrcmpi was the function to use even after the problem was explained to you, and even then probably not to your face. :-)

This post brought to you by "İ" (U+0130, a.k.a. LATIN CAPITAL LETTER I WITH DOT ABOVE)


# Dean Harding on 5 Feb 2006 6:43 PM:

To be fair, that KB article says it refers to Windows SDK 3.0/3.1, so it's not talking about 32-bit Windows anyway (plus the fact that it's got such a low number should be a hint that it may be out of date as well). Though I guess it could probably be spelled out a bit more clearly... Plus, on the page it actually says "last review: August 2004" but then, I don't know what actually goes on in the "review" process (do they actually check the validity of the data, or just fix broken links, etc?)

To be honest, it's always been my opinion that application code should never actually make any security decisions itself anyway. You should just try it and if it doesn't work report the error. For example, rather than checking to see if a given file name is "valid" or not, just try to create the file - that's the only way that'll work 100% of the time.

Of course, there are some times when you've got to be more careful. Like if you're a web server mapping request URIs to local file names, you've got to be *very* careful that you never map to a file name outside of the web server's root directory. But I'd say that's the exception, rather than the rule.


referenced by

2006/03/05 Will I C U at the IUC?
