by Michael S. Kaplan, published on 2005/12/22 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/12/22/506590.aspx
The title of this post is a little tongue-in-cheek; I do not actually think of standards as pesky in a generic sense like that. :-)
But the other day, someone inside Microsoft was having trouble with the XmlSerializer class in the .NET Framework. The problem was something like this:
I’m serializing a string using the XmlSerializer. It’s serializing this input:
"Line\r\nBreak\r\n\tTab"
And deserializing it into this string:
"Line\nBreak\n\tTab\n"
So when I put it back into my TextBox, it loses its line breaks.
This behavior is actually by design for the XmlSerializer though -- and based on the standard!
The behavior is defined in the XML Spec, right here:
2.11 End-of-Line Handling
XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters carriage-return (#xD) and line-feed (#xA).
To simplify the tasks of applications, the characters passed to an application by the XML processor must be as if the XML processor normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.
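The rule quoted above is mechanical enough to sketch in a few lines. This is only an illustration under some assumptions. It uses Python, with the expat-based xml.etree.ElementTree parser standing in for the .NET reader, and normalize_eol is a hypothetical helper that is not part of any library. It applies the two translations from section 2.11 and checks that a conforming parser hands back the same string for the input from the question at the top of this post.

```python
import xml.etree.ElementTree as ET


def normalize_eol(text: str) -> str:
    """Hypothetical helper applying the XML 1.0 section 2.11 end-of-line rule.

    CR LF pairs become a single LF first; any CR still left is a lone CR
    (not followed by LF), which also becomes LF.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")


original = "Line\r\nBreak\r\n\tTab"

# What the spec says the application should be handed.
expected = normalize_eol(original)
print(repr(expected))  # 'Line\nBreak\n\tTab'

# A conforming parser (expat, via ElementTree) applies the same rule to
# literal CR/LF characters that appear in element content.
parsed = ET.fromstring("<s>" + original + "</s>").text
print(parsed == expected)  # True
```

The order of the two replacements matters: collapsing CR LF pairs first keeps each pair from turning into two LFs.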
Funny how a standard only annoys us when the one behavior it chooses is not the one we are using. :-)
Luckily, the very cool MSFTie Elena Kharitidi (whom I met years ago when she was working on the Jet team) pointed out how you can make a serialize/deserialize round trip more faithful:
Normalizing string values is the default XmlSerializer behavior, but you can override it by configuring your XmlWriter before calling the XmlSerializer.Serialize() method.
You need to create the writer with XmlWriter.Create(), passing an XmlWriterSettings whose NewLineHandling property is set to NewLineHandling.Entitize.
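Here is a rough sketch of why entitizing helps. It is again in Python rather than .NET, so treat it as an analogy and not as the XmlSerializer code path. The standard library's xml.sax.saxutils.escape accepts an optional mapping of extra replacements, used here to write each CR as the character reference &#xD;. End-of-line normalization applies to literal line-break characters before parsing, not to character references, so the CR survives. A bare LF is never changed by the rule, so in element text only CR needs protecting. The names to_xml_text and roundtrip are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_xml_text(value: str) -> str:
    """Hypothetical helper: escape a string for element content, entitizing CR.

    escape() always handles &, < and >; the extra mapping turns each CR into
    the character reference &#xD;, which the parser does not normalize.
    """
    return escape(value, {"\r": "&#xD;"})


def roundtrip(value: str, entitize: bool) -> str:
    """Hypothetical helper: write value into <s>...</s>, parse it, return the text."""
    body = to_xml_text(value) if entitize else escape(value)
    return ET.fromstring("<s>" + body + "</s>").text or ""


original = "Line\r\nBreak\r\n\tTab"
print(repr(roundtrip(original, entitize=False)))  # 'Line\nBreak\n\tTab'
print(roundtrip(original, entitize=True) == original)  # True
```

With entitize=False this reproduces the lost CRs from the question; with entitize=True the original string comes back intact.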
There are other unfortunate side effects of choosing XML as your persistence format: some ranges of characters (most notably 0x00 through 0x1F, other than TAB, CR, and LF) are illegal in XML 1.0. The default XmlWriter will write them, but the default XmlReader will throw when it reads them back.
But if you use XmlTextReader, you can work around this by setting Normalization = false on the reader.
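The characters in question are the ones outside the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The Python sketch below checks a string against that production using a hypothetical helper, illegal_xml10_chars. It also shows a conforming parser (expat, via xml.etree.ElementTree) rejecting U+0001 both as a raw character and as the character reference &#x1;, so in XML 1.0 escaping on the writer side is not enough on its own.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def illegal_xml10_chars(text: str) -> list[int]:
    """Hypothetical helper: code points in text that XML 1.0 does not allow."""
    return [ord(ch) for ch in text if not is_xml10_char(ord(ch))]


print([hex(cp) for cp in illegal_xml10_chars("ok\tline\x01\x1f")])  # ['0x1', '0x1f']

# A conforming parser rejects the character raw and as a character reference.
for doc in ("<s>\x01</s>", "<s>&#x1;</s>"):
    try:
        ET.fromstring(doc)
        print("parsed", repr(doc))
    except ET.ParseError as err:
        print("rejected", repr(doc), "->", err)
```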
Okay, so there are ways to get the standards-conformant behavior, and ways to get the other behavior when you need it. Of course, somebody will still be unhappy with whichever defaults we choose, which is why we never seem to win in these situations! :-)
This post brought to you by "ඇ" (U+0d87, a.k.a. SINHALA LETTER AEYANNA)
# Peter Millard on 22 Dec 2005 5:19 AM:
# Maurits [MSFT] on 22 Dec 2005 12:34 PM:
# Michael S. Kaplan on 22 Dec 2005 12:40 PM:
# Maurits [MSFT] on 22 Dec 2005 1:17 PM:
# Gabe on 22 Dec 2005 1:26 PM:
# Michael S. Kaplan on 22 Dec 2005 1:33 PM:
# Jerry Pisk on 22 Dec 2005 2:20 PM:
# Michael S. Kaplan on 22 Dec 2005 2:30 PM:
# Maurits [MSFT] on 22 Dec 2005 2:33 PM:
# Michael S. Kaplan on 22 Dec 2005 2:40 PM:
# TheLostBrain on 24 Apr 2008 4:42 PM:
Dude you rock! ;)
You just saved me a ton of time!
Thanks!
-TheLostBrain