7-bit UTF-8?

by Michael S. Kaplan, published on 2006/05/26 20:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/05/26/608289.aspx


A while back, regular reader 'Maurits' noted in the Suggestion Box:

Just submitted my first PSS support case (for an unrelated issue.)

The email confirmation I received had the following amusing snippet in the headers:

Content-Type: text/plain;
    charset="utf-8"
Content-Transfer-Encoding: 7bit

I was a little amused, myself. But in searching around I found an Internet Draft or two which seemed a little bit relevant. It seems like this might even be possible, and may have a valid meaning?

Or maybe not. I suppose there is some meaning -- like UTF-8 that contains only 7-bit stuff (that weird state that causes Notepad to add the UTF-8 BOM that everyone hates so much!).

Call me crazy, but I thought that is what UTF-7 was intended for. :-)

 

This post brought to you by "॥" (U+0965, a.k.a. DEVANAGARI DOUBLE DANDA)


# Thomas on 27 May 2006 12:01 AM:

Since UTF-8 is ASCII compatible, wouldn't that just give you the ASCII subset?

# Michael S. Kaplan on 27 May 2006 12:29 AM:

More or less, yeah. Which kind of limits the usefulness of saying it is UTF-8. :-)
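
To make the "ASCII subset" point concrete, here is a minimal Python sketch. The helper name `is_7bit_clean` and the sample strings are made up for illustration; the only thing it relies on is that UTF-8 encodes U+0000 through U+007F as single bytes below 0x80 and everything else as bytes at or above 0x80.

```python
def is_7bit_clean(text: str) -> bool:
    """Return True if the UTF-8 encoding of text uses only bytes below 0x80."""
    return all(byte < 0x80 for byte in text.encode("utf-8"))


# Hypothetical sample strings, not taken from the original email.
ascii_only = "Thank you for contacting support."
with_danda = "End of verse \u0965"  # U+0965 DEVANAGARI DOUBLE DANDA

# For ASCII-only text, the UTF-8 bytes and the US-ASCII bytes are identical.
same_bytes = ascii_only.encode("utf-8") == ascii_only.encode("ascii")

print(is_7bit_clean(ascii_only), is_7bit_clean(with_danda), same_bytes)
```

So a body labeled charset="utf-8" can only pass as 7bit unmodified when it happens to contain nothing outside ASCII.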

# Dean Harding on 28 May 2006 9:42 PM:

It's probably some logic in the mailer, which just sets "utf-8" by default, then uses some heuristics to work out the best Content-Transfer-Encoding (for example, all ASCII means it uses "7bit", mostly ASCII plus a few Latin extended characters would be "quoted-printable", lots of Han ideographs would be "base64", etc.).

It's not the strangest thing I've seen when dealing with email :-)
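
A heuristic of the kind guessed at above might look roughly like the following Python sketch. This is not the actual mailer's logic, which nobody in this thread has seen; the function name `choose_transfer_encoding` and the 20% threshold are assumptions picked for illustration, and the sketch ignores the line-length limits that a real 7bit body also has to respect.

```python
def choose_transfer_encoding(text: str, qp_threshold: float = 0.2) -> str:
    """Pick a Content-Transfer-Encoding label for text sent as UTF-8.

    qp_threshold is a hypothetical cutoff: the largest fraction of
    non-ASCII bytes for which quoted-printable is still preferred.
    """
    data = text.encode("utf-8")
    if not data:
        return "7bit"
    # Count the bytes that would not survive a 7-bit transport as-is.
    high = sum(1 for byte in data if byte >= 0x80)
    if high == 0:
        return "7bit"
    if high / len(data) <= qp_threshold:
        # Mostly ASCII: quoted-printable keeps the text largely readable.
        return "quoted-printable"
    # Mostly non-ASCII: base64 is more compact than =XX for every byte.
    return "base64"


for sample in ("Hello", "café au lait, s'il vous plaît", "漢字の文章"):
    print(repr(sample), "->", choose_transfer_encoding(sample))
```

With the charset fixed at "utf-8" up front, an all-ASCII confirmation email would fall into the first branch and produce exactly the header pair quoted in the post.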

# Tom Gewecke on 29 May 2006 9:13 PM:

I don't think it matters what the "charset" of a message content is, UTF-8 or some ISO or Big5 or GB; the usual "content transfer encodings" used for mail, 7-bit, QP, and B64, all transform the original stuff into 7-bit ASCII for transmission.
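
As a rough illustration of that point, the Python sketch below runs the same UTF-8 bytes through the standard library's `quopri` and `base64` modules and checks that what comes out is 7-bit. The sample body is made up. One caveat: in MIME, "7bit" is an identity label rather than a transformation, so it only describes data that is already 7-bit and cannot make 8-bit data safe the way QP and B64 do.

```python
import base64
import quopri

# Hypothetical message body containing one non-ASCII character (U+0965).
body = "Double danda: \u0965".encode("utf-8")

qp = quopri.encodestring(body)   # quoted-printable: =XX for high bytes
b64 = base64.b64encode(body)     # base64: A-Z, a-z, 0-9, +, /, =


def all_7bit(data: bytes) -> bool:
    """Return True if every byte in data is below 0x80."""
    return all(byte < 0x80 for byte in data)


print(all_7bit(body), all_7bit(qp), all_7bit(b64))

# Both encodings are reversible, so the receiver recovers the UTF-8 bytes.
assert quopri.decodestring(qp) == body
assert base64.b64decode(b64) == body
```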
