There's no "I" in IDN, part 5: Stephen Colbert's job is not in any jeopardy

by Michael S. Kaplan, published on 2011/06/29 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/06/29/10181212.aspx

I suspect some of my readers are either fans or at least regular watchers of The Colbert Report.

Today's blog ends up being about a combination "tip of the hat/wag of the finger" question.

I have following two string characters whose comparisons in SQL are equal, however I couldn’t figure out any comparisons in .net (culture/ordinal/case insensitive) that would return me equality. Any ideas?

First of all, a wag of the finger since the question referred to "double byte characters" even though every string involved was Unicode, in a language (C#) whose strings are always Unicode (UTF-16).
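To make the distinction concrete: "double byte" describes legacy code pages such as Shift_JIS (Windows code page 932 is Microsoft's variant of it), where a character takes one or two bytes, while a C# string is a sequence of UTF-16 code units. The small Python sketch below counts the bytes for a few characters. Python is used here only for brevity, and the codec names are Python's; the helper `byte_counts` is hypothetical.

```python
# Compare byte counts in a legacy double-byte code page (Shift_JIS)
# with UTF-16, the encoding behind .NET strings.
samples = ["A", "Ａ", "ｱ", "日"]  # ASCII A, fullwidth A, halfwidth katakana A, a kanji


def byte_counts(ch: str) -> tuple[int, int]:
    """Return (Shift_JIS bytes, UTF-16 bytes) for a single character."""
    return len(ch.encode("shift_jis")), len(ch.encode("utf-16-le"))


for ch in samples:
    sjis, utf16 = byte_counts(ch)
    print(f"U+{ord(ch):04X} {ch}: Shift_JIS={sjis} byte(s), UTF-16={utf16} bytes")
```

In UTF-16 all four characters are the same size; only the legacy encoding makes some of them "double byte" (including, notably, the fullwidth Ａ).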

That is perhaps somewhat forgivable, since the example was clearly referencing Japan, so the questioner was probably thinking about Japanese at the time, and "double byte" was just old-school thinking about CJK. Kind of like how they never migrated all those people off the FAREAST domain, even as everything else started referencing East Asia. Even though domain account migrations are so much easier these days, after those thousands of migrations in Windows kind of forced ITG to get better at it....

Second of all, a tip of the hat for the genuine attempt to do comparisons that fold out distinctions, in order to get parity between SQL Server and the .NET Framework.

Third of all, a wag of the finger for ignoring the most important distinction in this case: the implicit width insensitivity of every _C*_A* collation in SQL Server that lacks a _WS suffix. That could have been simulated by adding CompareOptions.IgnoreWidth to the first call, had the collation names not masked the fundamental nature of the "hidden width" (which makes me wonder if someone in SQL Server isn't worried about their weight too much)....
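To see what that hidden width folding actually does, it can be approximated outside SQL Server. The Python sketch below is a stand-in under stated assumptions, not how SQL Server collations or .NET's CompareOptions.IgnoreWidth are implemented (both work with linguistic sort weights): it treats "width" as exactly those characters whose Unicode compatibility decomposition is tagged `<wide>` or `<narrow>`, maps them to their ordinary-width forms, recomposes with NFC, and then case-folds. The helper names `fold_width` and `equal_ignore_case_and_width` are hypothetical. In .NET itself, the corresponding move is passing CompareOptions.IgnoreWidth (alongside, say, CompareOptions.IgnoreCase) to an overload such as String.Compare(string, string, CultureInfo, CompareOptions).

```python
import unicodedata


def fold_width(s: str) -> str:
    """Replace fullwidth/halfwidth compatibility characters with their
    ordinary-width counterparts, then recompose with NFC.

    Assumption: "width" means exactly the characters whose compatibility
    decomposition is tagged <wide> or <narrow>."""
    out = []
    for ch in s:
        # e.g. U+FF21 FULLWIDTH LATIN CAPITAL LETTER A -> "<wide> 0041"
        tag, _, codes = unicodedata.decomposition(ch).partition(" ")
        if tag in ("<wide>", "<narrow>"):
            out.extend(chr(int(cp, 16)) for cp in codes.split())
        else:
            out.append(ch)
    # NFC joins halfwidth KA + halfwidth voiced mark into a single GA.
    return unicodedata.normalize("NFC", "".join(out))


def equal_ignore_case_and_width(a: str, b: str) -> bool:
    """Hypothetical helper: case- and width-insensitive equality."""
    return fold_width(a).casefold() == fold_width(b).casefold()


print(equal_ignore_case_and_width("ＡＢＣ", "abc"))    # True
print(equal_ignore_case_and_width("ｶﾞｲﾄﾞ", "ガイド"))  # True
print(fold_width("x²") == "x²")                       # True: superscript left alone
```

Restricting the fold to `<wide>`/`<narrow>` is deliberate: full NFKC would also turn x² into x2, which has nothing to do with width. Kana-type and accent insensitivity (CompareOptions.IgnoreKanaType, CompareOptions.IgnoreNonSpace) are separate distinctions that this sketch does not touch.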

Fourth of all, a wag of the finger for asking a question that obviously involves E-mail Address Internationalization (EAI) without phrasing it in a way, or sending it to a distribution list, that suggested they were thinking about EAI.

With a bonus fifth of all: a wag of the finger to SQL Server, since it hides so much of the problem here that people come out of SQL Server wondering how to make other products act like it, rather than coming out asking the real questions....

Okay, that seems like a lot more wags than tips on this one. And that's even ignoring the extra wags I decided to leave for another day.

I've decided I can't do "tip of the hat/wag of the finger" very well. I should leave that sort of thing to the professionals. From now on, I will.

Do you think the severity of the first wag of the finger might also be reduced a bit since the strings are coming from SQL Server, which (AFAIK) continues to use UCS-2 rather than UTF-16, so the encoding might legitimately be called "double byte"?
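The commenter's point can be checked by counting UTF-16 code units. For text entirely within the Basic Multilingual Plane, every character is exactly two bytes in both UCS-2 and UTF-16, so "double byte" is literally accurate; a supplementary character such as U+20BB7 needs a surrogate pair (four bytes), which UCS-2 cannot represent at all. A minimal Python sketch, with hypothetical helper names:

```python
def utf16_code_units(s: str) -> int:
    """Number of UTF-16 code units (what .NET's String.Length counts)."""
    return len(s.encode("utf-16-le")) // 2


def fits_ucs2(s: str) -> bool:
    """True if every code point is in the BMP (U+0000..U+FFFF),
    so no surrogate pairs are needed."""
    return all(ord(ch) <= 0xFFFF for ch in s)


for text in ["日本", "\U00020BB7"]:
    # (text, code points, UTF-16 code units, representable in UCS-2)
    print(text, len(text), utf16_code_units(text), fits_ucs2(text))
```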

I have encountered many people who use "double byte" for the wide variants of the Latin script.

Similarly, some use "4 byte characters" for characters that need 4 bytes in GB 18030, even when those characters don't need surrogate pairs in Unicode (UTF-16).
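The mismatch is easy to see with Python's gb18030 codec: a Hangul syllable such as U+D55C falls outside the GBK two-byte repertoire, so GB 18030 spends four bytes on it, yet in UTF-16 it is a single code unit with no surrogates. Only a supplementary character like U+1F600 is four bytes in both. A small sketch; the helper name is hypothetical:

```python
def gb18030_vs_utf16(ch: str) -> tuple[int, int, bool]:
    """Return (GB 18030 bytes, UTF-16 bytes, needs a surrogate pair in UTF-16)."""
    gb = len(ch.encode("gb18030"))
    u16 = len(ch.encode("utf-16-le"))
    return gb, u16, ord(ch) > 0xFFFF


for ch in ["A", "中", "한", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", gb18030_vs_utf16(ch))
```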


2013/10/17 There's no "I" in IDN, part 19: There's no "I" in IPv6, either!

2013/10/08 There's no "I" in IDN, part 18: There isn't even an "I" in John C. Klensin's name!

2013/09/13 There's no "I" in IDN, part 17: EAI made it to China, and everybody knows it!

2013/04/19 There's no "I" in IDN, part 16: It's a good thing they decided to call it EAI!

2012/10/12 There's no "I" in IDN, part 15: Still no 'I' in EAI.... but we could use an US sometime soon!

2012/08/08 There's no "I" in IDN, part 14: It turns out there's no "I" in IE, either

2012/05/18 There's no "I" in IDN, part 13: Desktop and Managed and Metro; oh my!

2012/02/27 There's no "I" in IDN, part 12: Emoji + IDN == U+1F4A9 (PILE OF POO)

2011/10/25 There's no "I" in IDN, part 11: There's no place like ::1, not even 127.0.0.1!

2011/09/21 There's no "I" in IDN, part 10: Who needs IDN support? How much? When? (Part 2)

2011/09/16 There's no "I" in IDN, part 9: Who needs IDN support? How much? When? (Part 1)

2011/08/12 There's no "I" in IDN part 8: Punycode don't do the PUA

2011/07/28 There's no "I" in IDN, part 7: IDN comes to AdWords

2011/07/14 There's no "I" in IDN, part 6: It isn't like there's an "I" in EAI, either!