What's the encoding, again?

by Michael S. Kaplan, published on 2006/06/04 11:13 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/06/04/616892.aspx

Regular reader Maurits sent me in email something he noticed the other day:

Notice on this page that the characters in parentheses are being mis-displayed as windows-1252 pairs, rather than being UTF-8 decoded:

For example: "Many locales equate the ae ligature (Ã¦) with the letters ae"
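
To make the "windows-1252 pairs" concrete, here is a minimal Python sketch of what happens to the ae ligature. It is only an illustration (the variable names are mine, not anything from the page in question): the single character U+00E6 is two bytes in UTF-8, and a reader that believes the windows-1252 label turns each byte into its own character.

```python
# U+00E6 (æ) is encoded in UTF-8 as the two bytes C3 A6.
utf8_bytes = "\u00e6".encode("utf-8")

# A reader that trusts a windows-1252 label decodes one character per byte:
# C3 becomes U+00C3 (Ã) and A6 becomes U+00A6 (¦).
misread = utf8_bytes.decode("cp1252")

# Undoing the wrong decode and redoing it as UTF-8 recovers the original,
# which is in effect what picking UTF-8 by hand in the browser does.
repaired = misread.encode("cp1252").decode("utf-8")

print(utf8_bytes.hex())
print([f"U+{ord(c):04X}" for c in misread])
print(f"U+{ord(repaired):04X}")
```

The round trip works here because both bytes happen to be defined in windows-1252; bytes that code page leaves undefined would not survive the trip so cleanly.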

I did a little HTML snooping and found the reason for this.

The main page has this correct meta tag:
<META content="text/html;charset=utf-8" http-equiv="Content-Type">
(utf-8 is, in fact, the encoding being used)
... but the frame containing the actual interesting content has this incorrect meta tag:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252" />
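
A mismatch like this can be spotted mechanically. The following is a rough Python sketch, not how any browser actually sniffs encodings: it pulls the charset out of a meta tag with a simple regular expression and flags content that is labeled as something else but decodes cleanly as UTF-8. The function names and the sample byte strings are hypothetical, and a real checker would also have to look at HTTP headers and byte order marks.

```python
import re

# Loose match for charset=... inside a meta tag (an assumption, not a full HTML parser).
META_CHARSET = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)

def declared_charset(html: bytes):
    """Return the charset named in the markup, lowercased, or None."""
    m = META_CHARSET.search(html)
    return m.group(1).decode("ascii").lower() if m else None

def looks_like_utf8(html: bytes) -> bool:
    """True if the bytes contain non-ASCII data that is valid UTF-8."""
    try:
        html.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Pure ASCII is valid in both encodings, so it proves nothing.
    return any(b >= 0x80 for b in html)

def mistagged(html: bytes) -> bool:
    """Heuristic: labeled as something other than UTF-8, yet valid UTF-8."""
    declared = declared_charset(html)
    return declared not in (None, "utf-8", "utf8") and looks_like_utf8(html)

# Hypothetical stand-ins for the outer page and the content frame.
main_page = (b'<META content="text/html;charset=utf-8" http-equiv="Content-Type">'
             + "ligature (\u00e6)".encode("utf-8"))
frame = (b'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252" />'
         + "ligature (\u00e6)".encode("utf-8"))

print(declared_charset(main_page), mistagged(main_page))
print(declared_charset(frame), mistagged(frame))
```

The check is deliberately one-sided: genuine windows-1252 text with accented letters almost never forms valid UTF-8 sequences, so valid UTF-8 under a windows-1252 label is a strong hint of a bad tag, though not proof.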

In IE there's the workaround of right-clicking the frame and selecting UTF-8 from the Encoding submenu... but the page doesn't make a great deal of sense as-is. :(

Some people may recall the similar problem I talked about previously in the post Linguistic and Unicode considerations (or Language-specific Processing #4).

Incorrect tagging is indeed annoying, all the more since even fixing it may not fix the problem.

The real fix is to put all of MSDN and its content in UTF-8, and mark it as such, once and for all!


This post brought to you by ළ (U+0dc5, a.k.a. SINHALA LETTER MUURDHAJA LAYANNA)

referenced by

2008/04/23 That brings new meaning to having "a ç-section" (Ãç§), doesn't it?

2007/10/17 CSI: Unicode?

2006/12/23 Do not adjust your browser, a.k.a. sometimes two wrongs DO make a right, a.k.a. dumb quotes
