You can just byte me

by Michael S. Kaplan, published on 2006/10/28 16:35 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/10/28/890105.aspx


Evan asked in one of the many programming aliases:

Hi:

Does anyone know why 3 extra characters are added to an XML file saved via XmlDocument?

I viewed the file in a hex editor and found that 3 characters (0xEF 0xBB 0xBF) were added to the saved XML file.

I did a simple test to verify that:

          XmlDocument doc = new XmlDocument();
          doc.Load("test.xml");
          doc.Save("test2.xml");

I created test.xml in Notepad and viewed it with a hex editor to make sure the first char is “<” (0x3C). When I viewed test2.xml, I found the 3 extra characters. These characters are not visible and don’t stop Notepad, IE, or VS.NET from displaying the file at all.

I wonder what these characters are needed for?

Thanks,

Evan

Indeed, these three bytes are the well-known and somewhat controversial UTF-8 incarnation of the Unicode Byte Order Mark. The controversy is, of course, whether it is needed in UTF-8, and it comes up on a somewhat regular (though thankfully infrequent) basis....
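
For anyone who wants to see those bytes appear and disappear, here is a minimal sketch. It is not part of Evan's test: the class name BomDemo and the helper SaveToBytes are made up for illustration, and it saves to a MemoryStream through an XmlWriter rather than to a file. The only moving part is the bool passed to the UTF8Encoding constructor, which controls whether the encoding reports a byte order mark preamble (U+FEFF, encoded in UTF-8 as EF BB BF).

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class BomDemo
{
    // Hypothetical helper: saves the document through an XmlWriter that
    // uses the given encoding, and returns the bytes that were written.
    static byte[] SaveToBytes(XmlDocument doc, Encoding encoding)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = encoding;

        using (MemoryStream stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return stream.ToArray();
        }
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root/>");

        // true  = the encoding supplies the UTF-8 byte order mark
        // false = plain UTF-8, no byte order mark
        byte[] withBom = SaveToBytes(doc, new UTF8Encoding(true));
        byte[] withoutBom = SaveToBytes(doc, new UTF8Encoding(false));

        // Show the first three bytes of each result in hex.
        Console.WriteLine(BitConverter.ToString(withBom, 0, 3));
        Console.WriteLine(BitConverter.ToString(withoutBom, 0, 3));
    }
}
```

The first line printed should be the same three bytes Evan found in his hex editor; the second should be the start of the XML itself. Swapping XmlWriter.Create(stream, settings) for XmlWriter.Create("test2.xml", settings) ought to give the same effect for a file on disk, assuming nothing else in the pipeline adds a byte order mark of its own.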

 

This post brought to you by U+FEFF (a.k.a. ZERO WIDTH NO-BREAK SPACE)

