We can't win when it comes to those pesky standards, can we?

by Michael S. Kaplan, published on 2005/12/22 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/12/22/506590.aspx

The title of this post is a little tongue-in-cheek; I do not actually think of standards as pesky in a generic sense like that. :-)

But the other day, someone inside Microsoft was having trouble with the XmlSerializer class in the .NET Framework. The problem was something like this:

I’m serializing a string using the XmlSerializer. It’s serializing this input:

"Line\r\nBreak\r\n\tTab"

And deserializing it into this string:

"Line\nBreak\n\tTab\n"

So when I put it back into my Textbox it loses its linebreaks.

This behavior is actually by design for the XmlSerializer though -- and based on the standard!

The behavior is defined in the XML Spec, right here:

2.11 End-of-Line Handling

XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters carriage-return (#xD) and line-feed (#xA).

To simplify the tasks of applications, the characters passed to an application by the XML processor must be as if the XML processor normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.
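The rule quoted above is small enough to sketch directly. This is only an illustration in Python, not the .NET code path: `normalize_line_breaks` is a hypothetical helper name, and Python's bundled parser is used here as a stand-in for any conforming XML processor.

```python
import re
import xml.etree.ElementTree as ET


def normalize_line_breaks(text: str) -> str:
    """Translate each CR LF pair, and each CR not followed by LF, to one LF."""
    return re.sub(r"\r\n?", "\n", text)


original = "Line\r\nBreak\r\n\tTab"

# Literal CR characters inside element content go through the parser's
# own end-of-line handling before the application ever sees them.
parsed = ET.fromstring(f"<string>{original}</string>").text

print(repr(normalize_line_breaks(original)))
print(parsed == normalize_line_breaks(original))
```

The parser hands back the same text as the hand-written normalization, which is why the carriage returns never reach the application.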

Funny how a standard only annoys us when the one behavior it chooses is not the one we are using. :-)

Luckily, very cool MSFTie Elena Kharitidi (whom I met years ago when she was working on the Jet team) pointed out how you can make sure that a serialize/deserialize cycle will round-trip a little better:

Normalizing string values is the default XmlSerializer behavior, but you can override it by configuring your XmlWriter before calling the XmlSerializer.Serialize() method.

You need to use XmlWriter.Create() with XmlWriterSettings.NewLineHandling = NewLineHandling.Entitize.
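Why entitizing helps can be shown without .NET at all: a carriage return written as the character reference &#xD; is not a literal line break, so end-of-line normalization leaves it alone. The sketch below is in Python and only mimics the idea. The `entitize` helper is a hypothetical name, and it makes no claim to match the exact output of NewLineHandling.Entitize.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def entitize(value: str) -> str:
    """Escape &, < and >, then write every CR as the reference &#xD;."""
    return escape(value, {"\r": "&#xD;"})


original = "Line\r\nBreak\r\n\tTab"
document = f"<string>{entitize(original)}</string>"

# The parser expands &#xD; back to CR after line breaks have been
# normalized, so the CR LF pairs come back intact.
round_tripped = ET.fromstring(document).text

print(document)
print(round_tripped == original)
```

The trade-off is a slightly noisier document on disk in exchange for a string that survives the round trip.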

There are other unfortunate side effects of choosing XML as your persistence format: there are ranges of characters (most notably the ones from 0x0 to 0x1F, except TAB, CR, and LF) that are considered illegal in XML 1.0. The default XmlWriter will write them, but the default XmlReader will throw on read.

But if you use XmlTextReader, you can work around this by setting Normalization = false on the reader.
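The character restriction comes from the Char production in XML 1.0, and a strict parser enforces it for literal characters and numeric character references alike. Here is a rough Python sketch of that check. `is_xml10_char` and `parses` are hypothetical helper names, and Python's parser stands in for a default, strict reader (it has no equivalent of the Normalization switch).

```python
import xml.etree.ElementTree as ET


def is_xml10_char(ch: str) -> bool:
    """Return True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def parses(document: str) -> bool:
    """Return True if a strict XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(is_xml10_char("\t"), is_xml10_char("\x01"))
print(parses("<s>&#x9;</s>"), parses("<s>&#x1;</s>"), parses("<s>\x01</s>"))
```

Running a check like `is_xml10_char` over a string before serializing it is one way to find out early that the data will not read back under a strict reader.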

Okay, so there are ways to get the standards-conformant behavior, and ways to get the other behavior when you need it. Of course somebody will still be unhappy with the defaults we choose, which is why we can't seem to win these situations! :-)

This post brought to you by "ඇ" (U+0d87, a.k.a. SINHALA LETTER AEYANNA)

# Peter Millard on 22 Dec 2005 5:19 AM:

Nice. Had this exact same problem.

# Maurits [MSFT] on 22 Dec 2005 12:34 PM:

So... why doesn't the textbox respect the \n's as line breaks?

# Michael S. Kaplan on 22 Dec 2005 12:40 PM:

You know Maurits, that would make a great blog entry!

Oh wait, it already did! :-)

http://blogs.msdn.com/michkap/420890.aspx

# Maurits [MSFT] on 22 Dec 2005 1:17 PM:

All right, I read both your blog post and Raymond's blog post.

Raymond brought up the interesting point that old standards all specified CRLF.

Both of you brought up the point that "it's always been that way, and it would take a lot of work to change it, and nobody really cares anyway."

Well, now there's a new standard that specifies bare LF, and at least two people who care... so why not change the edit control to honor bare LFs as line endings, the way Wordpad does?

# Gabe on 22 Dec 2005 1:26 PM:

Forget changing the edit control because there would be too many backcompat issues. I don't see why Notepad can't just fix the file between reading it from disk and adding it to the edit control.

# Michael S. Kaplan on 22 Dec 2005 1:33 PM:

Maurits -- as Gabe points out, the potential backcompat issues are too great (as was pointed out in a comment to my older post, where someone pointed out a use of the current support).

Gabe -- even on my wishlist there are higher priority features I would like to be added to Notepad, and they are not even getting to mine. So such a change does not seem very likely to me.... :-(

# Jerry Pisk on 22 Dec 2005 2:20 PM:

Adding a new style to the edit control, one that enables \n as a line separator, set to 0 by default, would imo be sufficient not to cause any appcompat issues. It will break apps that assume there will not be any new window styles and use the unused bits for their own purpose but those apps do deserve to be broken.

# Michael S. Kaplan on 22 Dec 2005 2:30 PM:

Hi Jerry,

The trouble here is that the real CUSTOMER complaint is not about the EDIT control, it is about Notepad (and people could be depending on the current behavior of Notepad!).

There are other problems here (like the fact that the EDIT control is code that no one wants to touch at this point), but the backcompat problem is significant....

# Maurits [MSFT] on 22 Dec 2005 2:33 PM:

Alright, the Edit control is frozen. But that leaves the standards breach as unsolved. Standards should be helpful...

I suppose you could add a property to textboxes that would allow them to accept either Windows line-endings or XML-standard line-endings? Something like

class TextBox {
    public LineEnders LineEnder = LineEnders.CRLF;
}

enum LineEnders {
    CRLF, /* show bare CR or bare LF as null glyphs */
    LF, /* show CR as null glyph */
    CR, /* show LF as null glyph */
    CRLF_or_CR, /* show bare LF as null glyph */
    CRLF_or_LF, /* show bare CR as null glyph */
    CR_or_LF, /* CRLF is TWO line breaks */
    CRLF_or_CR_or_LF /* Not quite the same as CR_or_LF */
}

The Edit control behavior would be the default...
WordPad behavior could be simulated with CRLF_or_LF...
Mac OS 9 SimpleText behavior could be simulated with CR...
etc., etc.
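The difference between the last two enum values is easy to miss, so here is a rough sketch of it in Python. The `POLICIES` table and `split_lines` are hypothetical names used only for illustration. They model how text would be broken into lines and say nothing about how a real control would render it.

```python
import re

# Hypothetical policies named after two of the enum values above.
# Alternation order matters: "\r\n" must be tried before a bare "\r".
POLICIES = {
    "CR_or_LF": r"\r|\n",
    "CRLF_or_CR_or_LF": r"\r\n|\r|\n",
}


def split_lines(text: str, policy: str) -> list[str]:
    """Split text into lines under the named line-ender policy."""
    return re.split(POLICIES[policy], text)


sample = "Line\r\nBreak"
print(split_lines(sample, "CR_or_LF"))          # CRLF counts as two breaks
print(split_lines(sample, "CRLF_or_CR_or_LF"))  # CRLF counts as one break
```

Under CR_or_LF the pair produces an empty line between the two words. Under CRLF_or_CR_or_LF it does not.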

# Michael S. Kaplan on 22 Dec 2005 2:40 PM:

Maurits, Maurits, Maurits,

The EDIT control *is* the TextBox....

And of course beyond that there is the fundamental misperception that Notepad is the natural editor to use for XML and other documents that are not plain text! :-)

# TheLostBrain on 24 Apr 2008 4:42 PM:

Dude you rock! ;)

You just saved me a ton of time!

Thanks!

-TheLostBrain
