On being consistently consistent, while still managing to be dead wrong

by Michael S. Kaplan, published on 2006/12/09, original URI: http://blogs.msdn.com/b/michkap/archive/2006/12/09/1246712.aspx


(Apologies to Hughes Mearns for the poetic excerpt that is a revision to his work!)

When I thought about the problem several years ago, at the point where I first became involved in the .NET Framework's System.Globalization.CompareInfo class, the poem immediately came to mind.

And then when the problem came up again as Brian's intern Scott ran across some interesting asserts while testing various pieces of the .NET Framework, it came to mind again. It depends on assuming that char rhymes with bear, not bar or care, in its pronunciation:

Last night I saw upon the char
A zero length string that wasn't there
He wasn't there again, per IndexOf
Oh, how I'd like to give him a great big shove...

So much for poetry. On to the actual work. :-)

The problem, simply stated, is that since the beginning of the .NET Framework, the design has treated every string as if it were prefixed by a string of zero length, so that calling "Hello".IndexOf("") returns 0 rather than -1 (the latter would indicate the substring was not found).

This of course causes problems even here, since actually using that index would not return a string of zero length; it would return an uppercase letter H. But that is a minor problem compared to the one that was really baking my noodle (to borrow a Matrixism): this design principle was extended to a particularly obnoxious case. "".IndexOf("") was also usually returning 0, even though ""[0] would obviously throw an exception, so the result could not actually be used!
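To make the corner case concrete, here is a minimal sketch. It is written in Python rather than C#, on the assumption that Python's str.find is a fair stand-in, since it happens to follow the same "empty string is found at 0" convention described above. The function name describe_empty_search is made up for illustration.

```python
from typing import Tuple


def describe_empty_search(haystack: str) -> Tuple[int, bool]:
    """Search for the empty string and report whether the result is usable.

    Returns (index, usable), where `usable` says whether haystack[index]
    can actually be read without raising an error.
    """
    index = haystack.find("")        # reports 0, not -1, even for an empty haystack
    usable = index < len(haystack)   # False when the haystack itself is empty
    return index, usable


# Non-empty haystack: index 0 is readable, but it holds "H", not an empty string.
print(describe_empty_search("Hello"), repr("Hello"[0]))

# Empty haystack: index 0 is still reported, yet indexing there would fail.
print(describe_empty_search(""))
```

In both cases the reported index is 0; only the second element of the pair tells the caller whether that index can be used.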

Now I say usually because in truth the 1.0 and 1.1 versions of the .NET Framework had a few inconsistencies where it would behave one way in some of these corner cases and a different way in some others. One of the goals that Yung-Shin (my manager) and I were working through for 2.0 was some consistency, especially as I was trying to port the IndexOf/LastIndexOf/IsPrefix/IsSuffix functionality to Vista's unmanaged FindNLSString function (first mentioned here), which we wanted to behave consistently with .NET since it was being added for the sake of .NET anyway.

So we divided the work up; while I was porting, Yung-Shin was fixing up all the inconsistencies and adding a ton of test cases to verify that things were finally behaving consistently (I ended up porting his tests, too). His initial intuition led him to the same conclusion that I reached above -- that ("".IndexOf("") == 0) doesn't make sense so it shouldn't be the way things are.

However, it soon became clear in a series of bugs, asserts, emails, and complaints that there was lots of existing code that depended on this weird logic that claimed that every string was assumed to be prepended by a string of zero length, even if the string in which one was searching was also of zero length. So the initial implementation in 2.0 had to be revised to consistently apply this (in my opinion, twisted) logic.

We both agreed that it made no sense, but given the importance of backcompat in the .NET Framework, the consistency vs. correctness battle was in this case finally arbitrated in favor of consistency with prior versions. In other words, breaking apps is worse than defying common sense....

So he and Shawn worked to make things consistent with the prior version logic, more consistently.

Now in my mind the consistency argument made no sense and I still thought it was incorrect, since it required the caller to do extra strange checks to avoid the weird corner case and possible exceptions being thrown. It would have made more sense to me to just fix the callers relying on the weird results. I hated the idea of polluting FindNLSString with such logic, even though I knew that not doing so would really be against the reasons for having the function in the first place.....

But I consoled myself with the fact that in the case of FindNLSString there was a pcchFound parameter that would let the caller know that what was found was also of zero length, so that a sensible and consistent check on that return of 0 would keep one from an AV (access violation). The bonus is that the check makes sense in the non-corner cases, too. And not just for the sake of consistency but for the sake of returning correct results.
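The shape of that check can be sketched as follows. This is a Python model of the idea only, not the FindNLSString signature: find_with_length and first_matched_char are hypothetical names, and the matching is plain ordinal matching rather than anything linguistic.

```python
from typing import Optional, Tuple


def find_with_length(source: str, value: str) -> Tuple[int, int]:
    """Hypothetical stand-in: return (index, found_length), or (-1, 0) if absent.

    The second element plays the role that pcchFound plays in the text:
    it tells the caller how much of `source` was actually matched.
    """
    index = source.find(value)
    if index < 0:
        return -1, 0
    return index, len(value)


def first_matched_char(source: str, value: str) -> Optional[str]:
    """Return the first character of the match, or None if there is nothing to read."""
    index, found = find_with_length(source, value)
    # One check covers both "not found" and "found something of zero length".
    if index < 0 or found == 0:
        return None
    return source[index]


print(first_matched_char("Hello", "l"))   # a real match
print(first_matched_char("Hello", ""))    # zero-length match: nothing to read
print(first_matched_char("", ""))         # the obnoxious case, handled by the same check
```

The same guard is used for every call, so the corner case needs no special treatment at the call site.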

Looking back to managed code, .NET doesn't have this feature, and there is in fact no easy way to emulate it in linguistically appropriate string comparisons; this was the reason that FindNLSString was added to Vista in the first place!

In fact, it was in the initial design planning for FindNLSString, back when we were calling it plain old FindString, that Tarek (another dev on our team who I have mentioned before) pointed out that without the functionality that the pcchFound parameter provided, there was no good way to add replace logic via FindNLSString.

(I find Tarek's cool contribution to FindNLSString to be quite ironic given that he owns a lot of the .NET Framework side of things, and the .NET Framework does not yet have this functionality. It causes lots of weirdness that keeps the System.String.Replace method from working consistently with System.String.Compare in many cases!)
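Here is a rough sketch of why replace logic needs the found length. It is in Python and uses Unicode canonical equivalence (via unicodedata.normalize) as a simplified stand-in for linguistic comparison. That is my assumption for the sake of illustration, not how FindNLSString or the .NET Framework actually work. The names find_equivalent and replace_equivalent are hypothetical, and the search is deliberately naive (quadratic).

```python
import unicodedata
from typing import Tuple


def find_equivalent(source: str, value: str, start: int = 0) -> Tuple[int, int]:
    """Return (index, found_length) of the first span of `source`, at or after
    `start`, that is canonically equivalent to `value`; (-1, 0) if none."""
    target = unicodedata.normalize("NFC", value)
    for i in range(start, len(source)):
        for j in range(i + 1, len(source) + 1):
            # Do not end a match in the middle of a combining sequence.
            if j < len(source) and unicodedata.combining(source[j]):
                continue
            if unicodedata.normalize("NFC", source[i:j]) == target:
                return i, j - i
    return -1, 0


def replace_equivalent(source: str, old: str, new: str) -> str:
    """Replace every span equivalent to `old`, skipping by the found length."""
    if not old:
        return source  # sidestep the zero-length case entirely in this sketch
    pieces = []
    pos = 0
    while True:
        index, found = find_equivalent(source, old, pos)
        if index < 0:
            break
        pieces.append(source[pos:index])
        pieces.append(new)
        pos = index + found  # the found length, not len(old)
    pieces.append(source[pos:])
    return "".join(pieces)


decomposed = "cafe\u0301 au lait"   # "e" followed by COMBINING ACUTE ACCENT
print(find_equivalent(decomposed, "\u00e9"))           # match is 2 chars long, pattern is 1
print(replace_equivalent(decomposed, "\u00e9", "E"))
```

The search string here is one character long, but the span it matches in the source is two. A replace that skipped ahead by the length of the search string would leave the combining accent behind, which is exactly the sort of thing the found length is there to prevent.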

But then again there are always future versions to do a more complete job here.... :-)

Ok, that's a long story about how easy it is to be consistently consistent and wrong at the same time, but there you have it. I'm wordy. :-)

 

This post brought to you by ៘ (U+17d8, a.k.a. KHMER SIGN BEYYAL)


