What's the encoding, again?

by Michael S. Kaplan, published on 2006/06/04 11:13 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/06/04/616892.aspx


Regular reader Maurits sent me in email something he noticed the other day:

Notice on this page
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/nls_Collation.asp
that the characters in parentheses are being mis-displayed as windows-1252 pairs, rather than being UTF-8 decoded:

For example: "Many locales equate the ae ligature (Ã¦) with the letters ae"
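
For anyone who wants to see the effect for themselves, here is a minimal Python sketch (mine, not anything MSDN runs) that takes the UTF-8 bytes for the ae ligature and decodes them as windows-1252, which Python calls `cp1252`. The `repair` helper is a hypothetical name, and it only works while the wrongly decoded text is still intact.

```python
# Reproduce the mis-display: UTF-8 bytes read as if they were windows-1252.
original = "\u00e6"  # LATIN SMALL LETTER AE (the ae ligature)

utf8_bytes = original.encode("utf-8")   # two bytes: C3 A6
misread = utf8_bytes.decode("cp1252")   # each byte becomes its own character


def repair(text: str) -> str:
    """Hypothetical helper: undo the wrong decode by reversing it."""
    return text.encode("cp1252").decode("utf-8")


print(utf8_bytes.hex(" "))      # c3 a6
print(ascii(misread))           # '\xc3\xa6', i.e. U+00C3 followed by U+00A6
print(ascii(repair(misread)))   # '\xe6', the ligature again
```

One character goes in and a pair of characters comes out, which is exactly the "windows-1252 pairs" Maurits noticed.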

I did a little HTML snooping and found the reason for this.

The main page has this correct meta tag:
<META content="text/html;charset=utf-8" http-equiv="Content-Type">
(utf-8 is, in fact, the encoding being used)
... but the frame containing the actual interesting content has this incorrect meta tag:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252" />
Oops!!
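
The kind of snooping described above can be roughly automated. The sketch below is only a heuristic under stated assumptions: `declared_charset` and `label_looks_wrong` are hypothetical helpers, the regular expression is a simplification rather than a real HTML parser, and the test relies on the observation that non-ASCII text which happens to be valid UTF-8 is rarely anything else.

```python
import re

# Simplified pattern for the charset in a Content-Type META tag (assumption:
# a real page may need a proper HTML parser instead).
META_CHARSET = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)


def declared_charset(html: bytes):
    """Return the lowercased charset named in the markup, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).decode("ascii").lower() if match else None


def label_looks_wrong(html: bytes) -> bool:
    """True if a non-UTF-8 charset is declared but the bytes look like UTF-8."""
    declared = declared_charset(html)
    if declared in (None, "utf-8", "utf8"):
        return False
    if html.isascii():
        return False  # pure ASCII reads the same under either label
    try:
        html.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8, so the label may well be right
    return True


# Sample frame content modeled on the tag quoted above, with UTF-8 bytes.
frame = (
    b'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252" />'
    + "the ae ligature (\u00e6)".encode("utf-8")
)
print(declared_charset(frame))    # windows-1252
print(label_looks_wrong(frame))   # True
```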

In IE there's the workaround of right-clicking the frame and selecting UTF-8 from the Encoding submenu... but the page doesn't make a great deal of sense as-is. :(

Some people may recall the similar problem I talked about previously in the post Linguistic and Unicode considerations (or Language-specific Processing #4).

Incorrect tagging is indeed annoying, all the more since even fixing it may not fix the problem.

The real fix is to put all of MSDN and its content in UTF-8, and mark it as such, once and for all!

 

This post brought to you by "ළ" (U+0dc5, a.k.a. SINHALA LETTER MUURDHAJA LAYANNA)

