Fight the Future? (#11 of ??), aka Microsoft is giving this character nada weight but lotsa importance

by Michael S. Kaplan, published on 2008/04/09 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/04/09/8367652.aspx


Content of Michael Kaplan's personal blog not approved by Microsoft (see disclaimer)!
Regular readers should keep in mind that all I said in The End? still applies; the allusion to the X-Files continues for people who understand such references....

There are times when I feel vaguely uncomfortable with something.

Like when I blogged The torrents of U+fffd (aka When security and conformance trump compatibility and reality) and went on the record about how I thought all the changes to get all U+fffd happy were not the greatest idea, particularly how they were kind of snuck in under the banner of conformance to Unicode.

Though I did not have much beyond my vague feeling that it was a bad idea.

This reaction is an example of the kind of issue that tends to make young program managers assume I am a crotchety old man before they have even met me (so many years before I can honestly wear that mantle, by the way)!

In blogs like Validation of Unicode text is growing up, I again pointed out some problems, kind of low key, like a "Chicken Little on Prozac" or something.

Maybe I should have been a little louder, though. Sure I'd be a crotchety old man, but people seem to think that anyway, right?

Attend me for a moment, oh regular reader of my blog....

It started when I was clearing out daily spam and noticed how the Getting the real (localized) name of the keyboard blog was still attracting a lot of U+fffd spam, which most likely represents Community Server and a recent .NET Framework version doing what they do with bogus text. Neither here nor there (and this was a lighter day than most, with just six such messages instead of the usual higher numbers).
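To make the two strategies concrete, here is a minimal sketch of what "doing what it does with bogus text" can mean. It uses Python's standard `errors="replace"` and `errors="ignore"` decode handlers purely as stand-ins for the "insert U+fffd" and the older "drop the bogus text" behaviors; it is not the .NET Framework's decoder, and the sample bytes are made up for illustration.

```python
# Raw bytes containing one invalid UTF-8 byte (0xFF never occurs in UTF-8).
raw = b"keyboard\xffname"

# "Insert U+FFFD" strategy: each bad sequence becomes REPLACEMENT CHARACTER.
replaced = raw.decode("utf-8", errors="replace")

# Older "drop the bogus text" strategy: the bad byte silently disappears.
dropped = raw.decode("utf-8", errors="ignore")

# ascii() escapes U+FFFD so the difference is visible in any console.
print(ascii(replaced))  # 'keyboard\ufffdname'
print(ascii(dropped))   # 'keyboardname'
```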

But it got me thinking.

You might remember the Law & Order-inspired The jury will give this string no weight blog.

You might know where I am going with this now.

{Wait for it....}

That's right, campers, U+fffd (aka REPLACEMENT CHARACTER) has no weight!

Which means that anyone who is not using OrdinalIgnoreCase-type comparisons to look at filenames or symbolic identifiers, who was cheering on this new and bold "insert U+fffd to protect people from bad Unicode strings" approach, and who thought the old "drop the bogus text" solution was bad, is ignoring the fact that the most common operation people will do on those strings (comparing them) is gonna ignore those characters anyway!
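A toy model shows why the two strategies end up looking identical to a linguistic comparison. This is a minimal sketch, assuming a hypothetical collation whose only zero-weight character is U+FFFD; real collation tables are far larger, and other collation implementations may not treat U+FFFD the way Windows does. `casefold()` is only a rough stand-in for OrdinalIgnoreCase, and all function names here are made up for the example.

```python
REPLACEMENT_CHARACTER = "\ufffd"

# Hypothetical, simplified table of zero-weight code points. Real collation
# data is far larger and does not always agree across implementations.
ZERO_WEIGHT = {REPLACEMENT_CHARACTER}


def linguistic_key(s: str) -> str:
    """Toy sort key: drop zero-weight code points, then fold case."""
    return "".join(ch for ch in s if ch not in ZERO_WEIGHT).casefold()


def linguistic_equal(a: str, b: str) -> bool:
    # Culture-style comparison in this model: ignorable characters vanish.
    return linguistic_key(a) == linguistic_key(b)


def ordinal_ignore_case_equal(a: str, b: str) -> bool:
    # Rough stand-in for OrdinalIgnoreCase: every code point counts.
    return a.casefold() == b.casefold()


replaced = "Keyboard\ufffdName"  # bogus byte turned into U+FFFD
dropped = "keyboardname"         # bogus byte silently removed

print(linguistic_equal(replaced, dropped))           # True
print(ordinal_ignore_case_equal(replaced, dropped))  # False
```

Only the ordinal-style comparison can tell the "protected" string from the one whose bogus text was simply dropped.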

This is not new behavior, mind you -- every version of Windows and every version of the .NET Framework has ignored this character and given it no weight.

Changing it in the .NET Framework would open one up to all of those index corruption issues, just like doing it in Windows service packs would....
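Here is why changing a character's weight corrupts indexes. This is a minimal sketch, assuming a hypothetical persisted index whose entries were sorted under the old rule (U+FFFD has no weight) and are later binary-searched under a made-up new rule that gives U+FFFD a very high weight. The stored order no longer matches the comparison, so an entry that is really there can no longer be found.

```python
import bisect


def old_weight_key(s: str) -> str:
    # Old rule, as the post describes it: U+FFFD carries no weight at all.
    return s.replace("\ufffd", "")


def new_weight_key(s: str) -> str:
    # Hypothetical new rule: U+FFFD sorts after every other character.
    return s.replace("\ufffd", "\U0010ffff")


def build_index(names, key):
    # A persisted index: entries stay in the order the key defined at build time.
    return sorted(names, key=key)


def lookup(index, name, key) -> bool:
    # Binary search assumes the stored order agrees with the key used NOW.
    keys = [key(entry) for entry in index]
    i = bisect.bisect_left(keys, key(name))
    return i < len(keys) and keys[i] == key(name)


names = ["ab", "a\ufffdc", "ad", "ae", "af"]
index = build_index(names, old_weight_key)  # built under the old weights

print(lookup(index, "a\ufffdc", old_weight_key))  # True
print(lookup(index, "a\ufffdc", new_weight_key))  # False: entry exists but is "lost"
```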

It turns out that no one likes this character, really. Look at the text in that U+fffd fileformat.info post:

warning

 Do not use this character in domain names. Browsers are blacklisting it because of the potential for phishing.
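The browser blacklists mentioned in that warning are not reproduced here, but a minimal sketch shows that even the older IDNA 2003 rules already keep U+FFFD out of domain names. It relies on Python's standard `stringprep` module (RFC 3454 tables) and the built-in `idna` codec, which implements IDNA 2003 nameprep; the helper name `idna2003_accepts` and the sample domains are made up for illustration.

```python
import stringprep

REPLACEMENT_CHARACTER = "\ufffd"

# RFC 3454 table C.6 ("inappropriate for plain text") covers U+FFF9..U+FFFD.
print(stringprep.in_table_c6(REPLACEMENT_CHARACTER))  # True


def idna2003_accepts(domain: str) -> bool:
    """Return True if Python's IDNA 2003 codec can encode the domain."""
    try:
        domain.encode("idna")
        return True
    except UnicodeError:
        return False


print(idna2003_accepts("bücher.example"))                              # True
print(idna2003_accepts("b" + REPLACEMENT_CHARACTER + "cher.example"))  # False
```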

Kind of ironic that this character that is so tied to our conformance (an ideal that weighs heavily on Microsoft) is one that is given no "weight".


This blog brought to you by U+fffd, aka REPLACEMENT CHARACTER



referenced by

2010/12/16 You can't ignore crap and hope it won't cause problems...

2010/11/01 The consequences of being unintuitive and nonconformant

2008/05/11 The vector of this spam is [apparently] indeterminate
