There's no "I" in IDN, part 5: Stephen Colbert's job is not in any jeopardy

by Michael S. Kaplan, published on 2011/06/29 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/06/29/10181212.aspx



I suspect some of my readers are either fans or at least regular watchers of The Colbert Report.

Perhaps just my smarter readers.

Or maybe just the ones with basic cable....

Today's blog ends up being about a combination "tip of the hat/wag of the finger" question.

In honor of Stephen Colbert....

The question goes something like this:

SUBJECT: String.Compare for double byte characters in .Net

I have following two string characters whose comparisons in SQL are equal, however I couldn’t figure out any comparisons in .net (culture/ordinal/case insensitive) that would return me equality. Any ideas?

Goal is to not change SQL settings, but to find insensitive compare in .net.

String.Compare(
    "0336753496aaa@ae2.dion.ne.jp",
    "0336753496aaa@ae2.dion.ne.jp",
    CultureInfo.InvariantCulture,
    CompareOptions.IgnoreCase)

OR (Tried all combinations)

String.Compare(
    "0336753496aaa@ae2.dion.ne.jp",
    "0336753496aaa@ae2.dion.ne.jp",
    StringComparison.OrdinalIgnoreCase)

Now I'll consolidate the many different tips and wags:

First of all, a wag of the finger since the question referred to "double byte characters" despite every string involved using Unicode, in a language (C#) that uses Unicode.

Perhaps somewhat forgivable since the example was clearly referencing Japan, so perhaps the questioner was thinking about Japanese at the time. And therefore "double byte" was just old school thinking about CJK. Kind of like how they never migrated all those people off the FAREAST domain, even as everything else started referencing east Asia. Even though domain account migrations are so much easier these days after those thousands of migrations in Windows kind of forced ITG to get better at it....

Second of all, a tip of the hat to the genuine attempt to try to do comparisons that fold out distinctions in an attempt to get parity between SQL Server and the .NET Framework.

Third of all, a wag of the finger for ignoring the most important distinction in this case -- the implicit Width Insensitive nature of all _C*_A* collations in SQL Server, which could have been simulated by adding CompareOptions.IgnoreWidth to the first call, had their names not masked the fundamental nature of the "hidden width" that makes me wonder if someone in SQL Server isn't worried about their weight too much....
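
For anyone who wants to see the width distinction in action, here is a minimal sketch. The width flag is a member of CompareOptions (StringComparison has no width option), so it only applies to the culture-sensitive overload. The sample strings are made up for illustration, since which characters in the original question were fullwidth is not recoverable here, and the helper name EqualsIgnoringCaseAndWidth is hypothetical. This approximates the width-insensitive part of the SQL Server behavior; it is not a claim of full collation parity.

```csharp
using System;
using System.Globalization;

static class WidthDemo
{
    // Hypothetical helper: culture-sensitive equality that folds away
    // both case and width differences (halfwidth vs. fullwidth forms).
    public static bool EqualsIgnoringCaseAndWidth(string a, string b)
    {
        return string.Compare(
            a, b,
            CultureInfo.InvariantCulture,
            CompareOptions.IgnoreCase | CompareOptions.IgnoreWidth) == 0;
    }

    static void Main()
    {
        // Made-up sample data: the second string spells the local part
        // with FULLWIDTH LATIN SMALL LETTER A (U+FF41) instead of ASCII 'a'.
        string halfwidth = "aaa@example.com";
        string fullwidth = "\uFF41\uFF41\uFF41@example.com";

        // The first call from the question: case folded, width still significant.
        Console.WriteLine(string.Compare(
            halfwidth, fullwidth,
            CultureInfo.InvariantCulture,
            CompareOptions.IgnoreCase) == 0);

        // The second call from the question: ordinal comparison only folds case,
        // so U+FF41 and U+0061 remain different code units.
        Console.WriteLine(string.Equals(
            halfwidth, fullwidth, StringComparison.OrdinalIgnoreCase));

        // The first call with the width flag added.
        Console.WriteLine(EqualsIgnoringCaseAndWidth(halfwidth, fullwidth));
    }
}
```

Only the last comparison is expected to treat the two strings as equal. The invariant culture is used here simply because the question used it; a culture matched to the SQL Server collation may be a better fit in practice.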

Fourth of all, a wag of the finger for taking a question obviously covering E-mail Address Internationalization (EAI) but doing it without even asking the question in a way or to a distribution list that suggested they were thinking about EAI.

With a bonus fifth of all wag of the finger to SQL Server since it is hiding so much of the problem here that people come out of SQL Server wondering how to make other products act like them, rather than coming out asking the real questions....

Okay, seems like a lot more wags than tips on this one. And that's even ignoring the extra wags I decided to leave for another day.

I've decided I can't do "tip of the hat/wag of the finger" very well. I should leave that sort of thing to the professionals. From now on, I will.

I'll talk more about EAI another day, too....


Jeffrey L. Whitledge on 29 Jun 2011 7:49 AM:

Do you think the severity of the first wag of the finger might also be reduced a bit since the strings are coming from SQL Server, which (AFAIK) continues to use UCS-2 rather than UTF-16, so the encoding might legitimately be called "double byte"?

Michael S. Kaplan on 29 Jun 2011 8:09 AM:

Perhaps a tiny bit, though the fact that SQLS flirts with UTF-16 and the fact that .NET isn't SQL blocks that some....

Mihai on 29 Jun 2011 11:06 AM:

I have encountered many people that use "double byte" for the wide variants of the Latin script.

Similarly, some use "4 byte characters" for characters that would need 4 bytes in GB 18030 even if they don't use surrogates as Unicode.

Michael S. Kaplan on 29 Jun 2011 11:55 AM:

That's why I was finding that one more forgivable, Mihai!



