Wait til you see my 'O'[EMCP based technology]

by Michael S. Kaplan, published on 2007/07/12 07:48 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/07/12/3829450.aspx


(no, this post is not about a rap or hip hop song, or its lyrics, though I admit the title may have been inspired by one, just like last time)

I was looking at Larry Osterman's post from yesterday, entitled "How do I compare two different NetBIOS names?"....

(A nice brief history of a feature with piss-poor international support that even to this day resists all efforts to improve, by the way!)

The actual question that prompted the "simplified" question that Larry covered was interesting on its own and I wanted to talk about it a bit.

There was a need to compare two computer names, but one of them was in UTF-16 and the other (like all NetBIOS names) was in CP_OEMCP. And the question was how to do the comparison....

Obviously there are two possible ways:

  1. Convert the UTF-16 name to CP_OEMCP and compare them;
  2. Convert the CP_OEMCP name to UTF-16 and compare them.

In both cases, one wants to use that whole "uppercase+binary" kind of ignore-case comparison that we all know and love.

Given the complications in doing the case insensitive binary comparison in an arbitrary code page, it is much better to go with choice #2 (where there is just the one case table to deal with and there is a handy CompareStringOrdinal function to do the actual comparison with).
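To make choice #2 a bit more concrete, here is a rough sketch in Python rather than against the Win32 API. On Windows the real work would be a MultiByteToWideChar call with CP_OEMCP followed by CompareStringOrdinal with bIgnoreCase set to TRUE. In the sketch, the function names (`simple_upper`, `names_equal`) are made up for illustration, `cp437` is only an assumed stand-in for whatever the machine's OEM code page actually is, and the one-to-one uppercasing is an approximation of the Windows uppercase table, not a copy of it.

```python
def simple_upper(text: str) -> str:
    """Approximate a one-to-one uppercase table (hypothetical helper).

    Python's str.upper() does full case mapping, so a few characters
    expand (for example U+00DF becomes "SS"). Those are left unchanged
    here so that every character maps to exactly one character.
    """
    out = []
    for ch in text:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def names_equal(oem_name: bytes, unicode_name: str, oem_codec: str = "cp437") -> bool:
    """Compare an OEM code page name with a Unicode name, ignoring case.

    oem_codec is an assumption: it stands in for the system's CP_OEMCP.
    """
    # Step 1: convert the OEM bytes to Unicode (the role played by
    # MultiByteToWideChar with CP_OEMCP on Windows).
    decoded = oem_name.decode(oem_codec)
    # Step 2: uppercase both sides with a single case table, then do a
    # plain ordinal comparison of the results.
    return simple_upper(decoded) == simple_upper(unicode_name)


if __name__ == "__main__":
    print(names_equal(b"SERVER01", "server01"))
    print(names_equal(b"caf\x82", "CAF\u00c9"))  # 0x82 is e-acute in cp437
    print(names_equal(b"HOST-A", "host-b"))
```

The point of the shape is that only one case table is ever involved, and it is the Unicode one; the OEM code page is used for decoding and nothing else.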

In this particular case they were in a position to consider calling RtlEqualUnicodeString directly, rather than CompareStringOrdinal or even RtlCompareUnicodeString, which has the bonus of being even faster. It would be hard to call it enough times to notice the difference, so the performance issue is most likely theoretical, but the function has the benefit of doing exactly what they are looking for (this is the whole issue I talk about in Is RtlCompareUnicodeString used correctly?, where the answer is that by and large, it isn't!).

Of course fixing the NetBIOS/computer name story to do better than an OEMCP world would be even better, but that seems a lot less likely. :-(

 

This post brought to you by ỗ (U+1ed7, a.k.a. LATIN SMALL LETTER O WITH CIRCUMFLEX AND TILDE)


# Robert MacLean on 12 Jul 2007 8:44 AM:

What about converting both the UTF-16 and CP_OEMCP to Unicode and then comparing two Unicode values?

# Michael S. Kaplan on 12 Jul 2007 9:41 AM:

UTF-16 *is* Unicode, for these purposes. :-)

# Yuhong Bao on 4 Dec 2010 4:50 PM:

"(A nice brief history of a feature with piss-poor international support that even to this day resists all efforts to improve, by the way!)"

To be honest, DNS names are no better either. Why do you think Punycode/IDN was created?

# Michael S. Kaplan on 4 Dec 2010 7:02 PM:

With the DNS interfaces in Windows, there was hope for the future; NetBIOS never had a chance....


referenced by

2011/06/08 Wait til you see my Õ (Ō), Latvian edition
