We're damned if we do, and damned if we don't

by Michael S. Kaplan, published on 2005/12/10 21:31 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/12/10/502422.aspx


Some time the week before last I was asked via the contact link:

Sorry to post a question in this section..

Well it was pretty urgent from my side.. I guess you will be able to help me out..

In our software we have a validation phase in which we validate the incoming XML file at runtime..

So there is one character 0x19 which is coming in XML file and our validation fails as XMLTextReader is not able to read it..

Now in our case we use UTF-8 Character encoding to read that file..

This file is opening in IE that means that its a valid XML file so what I need to know from your side how can I suppress this character by using appropriate Character Encoding..

I am using C# .NET 2003..

(By the way, please read the info under 'Contacting Me' in regard to instant answering -- MS Product Support is the way to go if you need someone who can answer quickly!)

If you look at the XML 1.0 standard itself as defined by the W3C, the use of most control characters is illegal: of the C0 range, only tab (U+0009), line feed (U+000A), and carriage return (U+000D) are allowed, so U+0019 cannot appear in a well-formed document. The fact that the file opens in Internet Explorer is actually the sort of problem that standards folks use to argue that IE is not a good browser, though I suppose this means that they might like the less forgiving XmlTextReader. Though of course the people who prefer the forgiving behavior can argue that it is not a good class.
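To make the rule concrete, here is a minimal sketch in Python rather than the questioner's C#. It assumes the standard library's expat-based `xml.etree.ElementTree` parser as a stand-in for any strict XML parser; the helper name `is_well_formed` and the sample strings are made up for illustration. It is not the XmlTextReader API, only a demonstration that a conforming parser refuses U+0019 while accepting tab.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message).

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Sample inputs (assumptions, not the questioner's actual file).
good = "<item>plain text</item>"
with_tab = "<item>tab\tis allowed</item>"
bad = "<item>before\x19after</item>"  # contains the control character 0x19

for label, doc in (("good", good), ("with_tab", with_tab), ("bad", bad)):
    ok, message = is_well_formed(doc)
    print(label, ok, message)
```

The exact wording of the error differs from parser to parser, so the sketch only reports whatever message the parser produced.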

Now you can look to the W3C directly for an answer. In their FAQs they directly answer the question How do I handle control codes (ie. the 'C0' U+0000-U+001F and 'C1' U+007F-U+009F ranges) in XML, XHTML and HTML?
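For a consumer who cannot get the producer to fix the file, one possible workaround is to filter the text before handing it to the parser. The sketch below is an assumption-laden illustration in Python, not something taken from the W3C answer or from .NET: the function name `strip_invalid_xml_chars` is hypothetical, and the pattern simply encodes the `Char` production from XML 1.0. Note that it changes the data silently, which may not be acceptable in a validation phase.

```python
import re
import xml.etree.ElementTree as ET

# Matches any code point outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Drop characters that XML 1.0 does not allow (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub("", text)


# Sample input (an assumption, not the questioner's actual file).
raw = "<item>before\x19after\tend</item>"
cleaned = strip_invalid_xml_chars(raw)

# The cleaned text now parses; the tab survives because XML allows it.
element = ET.fromstring(cleaned)
print(repr(element.text))
```

Stripping is only one choice; replacing the character with a visible placeholder, or rejecting the file with a clear error, keeps the problem visible to whoever produced the data.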

The question of how Microsoft (or any company) can win when a standard is less flexible than some users believe they require is a complicated one. It is pretty much a religious battle at this point and to be honest I have found that the more strongly a person feels about the issue, the more likely they are to be full of crap. The whole internet could benefit from locking both sides in a room and letting them beat each other up so those of us who need to get actual work done can avoid the religious fervor. :-)

But in the case of XmlTextReader I am inclined to cut Microsoft some slack -- if you don't like the behavior then you can send complaints to the W3C. And for the IE behavior I am also inclined to cut Microsoft some slack, since the people who don't give a fig about standards and just want to use the internet really outnumber the standards-conformant folks. And if no one creates such a page then the ability to not fail to read it won't ever be seen, anyway -- so the IE behavior prefers the reality of a world where not everyone follows the rules....

So Microsoft, by trying to be forgiving on end user read but unforgiving on developer write has (in my opinion) struck the right balance between these two extreme views.

If you disagree then feel free to comment, but try not to look like one of those people we would lock in that "fervor" room I was talking about. :-)

 

This post brought to you by "ฏ" (U+0e0f, a.k.a. THAI CHARACTER TO PATAK)


# Stuart Ballard on 10 Dec 2005 9:56 PM:

Just a datapoint on the side of "IE shouldn't be forgiving of bad data" - as demonstrated by this poster, a lot of people think that if IE opens something it's valid. If the creators of the file are using the "IE-litmus-test" for validity, then bad files get out into the wild (and people complain when other tools follow the standards). If IE had refused to read the file too, the poster would have placed the blame where it belongs: on the creator of the bogus XML file, rather than on XmlTextReader.

Not trying to be religious about this ;) but in a world where IE is so widely used, it's inevitable that some people won't test in anything else. And then tools that *do* attempt to follow the standards suffer, or are forced to try to emulate the (unspecified, undocumented, and usually inconsistent) IE behavior for these non-conformant files.

Perhaps if IE would display the files but give a big warning or error message that the file is non-conformant and may not work with other tools, it would be a good compromise...

# Michael S. Kaplan on 10 Dec 2005 10:00 PM:

That is actually an incredibly good compromise, and a great way to be all things to all people. :-)

Though I am fairly certain you are incorrect about where people would place the blame -- if it does not open in IE, they will blame IE....

# Andy on 11 Dec 2005 1:08 AM:

Your RSS feed is currently broken presumably because it includes a character that is illegal in xml :-)

# Michael S. Kaplan on 11 Dec 2005 2:54 AM:

I must be missing the joke here -- my RSS feed has no problems?

# Björn Graf on 11 Dec 2005 3:45 AM:

It is not the feed per se that has the problem but RSSBandit fails to display this entry's item with the nice error: "Refresh feed '' failed with error: hexadecimal value 0x19, is an invalid character. Line 10, position 68." The feed validator errors on the character too :]

# Michael S. Kaplan on 11 Dec 2005 4:06 AM:

Ok, I took it out (it was from the question text I was sent)....

# MSDNArchive on 11 Dec 2005 4:08 AM:

If your argument is about whether IE should go beyond the standard in terms of features, for the user's convenience, I agree, since standards really cannot keep up with the rate of development of new technology. In fact, going beyond the standard might be the best way to push the standard forward.

However, it's a different issue to support a file that is clearly incorrectly formatted. A simple "Unexpected character found at line X, blah blah" would go a long way in fixing these problems and keeping religion out of the whole discussion.

# CornedBee on 11 Dec 2005 4:51 AM:

> So Microsoft, by trying to be forgiving on end user read but unforgiving on developer write has (in my opinion) struck the right balance between these two extreme views.

But is it forgiving on the users? It's actually forgiving on the developers in both cases, even if in the former, said developers might be some guy playing with FrontPage and without a clue what he's doing. If IE (and Netscape) hadn't accepted invalid pages ten years ago, web page developers would have learned to write them properly and the issue wouldn't come up.
It's not like the end user asks for invalid code.

Besides, if you try to open a page, and IE says, "I can't open this page because it makes absolutely no sense - whoever wrote it must have been very drunk," do you really think people would blame IE? Disregarding the current situation where browsers are expected to handle tag soup, I mean.

Nowadays, I think MS does the best it can with the given behaviour. But it was a great mistake in the past to make browsers so forgiving.

# Michael S. Kaplan on 11 Dec 2005 4:54 AM:

If we go into the past, a lot of the forgiving behavior in IE was for Netscape compatibility. But no one seems to blame them for their contribution to the problem? :-)

# CornedBee on 11 Dec 2005 9:55 AM:

Oh, I do. That's why I said "browsers", and why I mentioned Netscape.

But Netscape is not really around to blame anymore. Gecko is forgiving for compatibility with IE, and in true XHTML mode not really forgiving at all: anything that's not well-formed doesn't get displayed.

Of course, there's one situation where I do blame IE, but that was because of a real bug in the XML parser - its failure to correctly parse the XHTML 1.1 DTDs.

# Serge Wautier on 11 Dec 2005 10:50 AM:

CornedBee, that's interesting: What about a setting in the already long list of Internet Options/Advanced :

o Reject invalid XML.

Can this behaviour be enforced simply by using a DTD ?

BTW, IE accepting loose HTML/XML makes me think of the superb user experience when you turn on the Debug Script options to debug your own scripts. Of course you don't turn it off right after. And let the show begin: loads of 'A Runtime Error has occurred. Do you want to debug?' message boxes in most sites you usually read, making you wonder if these sites are tested/debugged at all and what features you're missing due to these errors that usually go unnoticed. But this is probably another story.
