You can just byte me

by Michael S. Kaplan, published on 2006/10/28 16:35 -04:00, original URI:

Evan asked in one of the many programming aliases:


Anyone knows why there are 3 extra characters added to the XML file saved via XmlDocument?

I viewed the file from a hex editor and found 3 characters (0xEF 0xBB 0xBF) are added to the XML file saved.

I did a simple test to verify that:

          XmlDocument doc = new XmlDocument();

I created test.xml in Notepad and view it with hex editor to make sure the first char is “<” (0x3C). And when I view test2.xml, I found the 3 extra characters. These characters are not viewable and don’t affect Notepad, IE, VS.NET from viewing it at all.

I wonder what are these characters needed for?



Indeed, these three bytes (0xEF 0xBB 0xBF) are the well-known and somewhat controversial UTF-8 incarnation of the Unicode Byte Order Mark, U+FEFF. The controversy is, of course, whether it is needed in UTF-8 at all, and it comes up on a somewhat regular (though thankfully infrequent) basis....
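For anyone who wants to see where those bytes come from, here is a minimal sketch, assuming nothing beyond the standard System.Text classes. It encodes U+FEFF as UTF-8 and compares the result with the preamble that Encoding.UTF8 reports. The class name BomDemo and the helper EncodeBom are made up for illustration; they are not part of the original question.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: returns the UTF-8 encoding of the single character U+FEFF.
    // GetBytes never writes a preamble itself, so these bytes come purely
    // from encoding the character.
    public static byte[] EncodeBom()
    {
        return new UTF8Encoding(false).GetBytes("\uFEFF");
    }

    public static void Main()
    {
        // U+FEFF encoded as UTF-8.
        Console.WriteLine(BitConverter.ToString(EncodeBom()));                 // EF-BB-BF

        // The preamble that Encoding.UTF8 advertises for the start of a stream.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetPreamble())); // EF-BB-BF
    }
}
```

Both lines print the same three bytes Evan saw in the hex editor, which is consistent with the saved file starting with a UTF-8 encoded U+FEFF.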


This post brought to you by U+FEFF (a.k.a. ZERO WIDTH NO-BREAK SPACE)


