by Michael S. Kaplan, published on 2006/06/04 11:13 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/06/04/616892.aspx
Regular reader Maurits emailed me about something he noticed the other day:
Notice on this page
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/nls_Collation.asp
that the characters in parentheses are being mis-displayed as windows-1252 pairs, rather than being UTF-8 decoded:
For example: "Many locales equate the ae ligature (æ) with the letters ae"
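To see how this kind of garbling comes about, here is a minimal Python sketch (not anything from the MSDN page itself) that encodes the ligature as UTF-8 and then decodes those bytes as windows-1252, which is the mistake described above. The variable names are purely illustrative.

```python
# Encode the ligature as UTF-8, then (wrongly) decode the bytes as windows-1252.
original = "\u00e6"                           # LATIN SMALL LETTER AE
utf8_bytes = original.encode("utf-8")         # two bytes: 0xC3 0xA6
misread = utf8_bytes.decode("windows-1252")   # two characters: U+00C3 and U+00A6

# Undoing the mistake (re-encode as windows-1252, decode as UTF-8)
# recovers the original text, as long as nothing was lost along the way.
repaired = misread.encode("windows-1252").decode("utf-8")

print(repr(utf8_bytes), repr(misread), repr(repaired))
```

Each non-ASCII character turns into a pair of windows-1252 characters because its UTF-8 form is two bytes long and windows-1252 treats every byte as a character of its own.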
I did a little HTML snooping and found the reason for this.
The main page has this correct meta tag:
<META content="text/html;charset=utf-8" http-equiv="Content-Type">
(utf-8 is, in fact, the encoding being used)
... but the frame containing the actual interesting content has this incorrect meta tag:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252" />
Oops!!
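A mismatch like this can be caught mechanically. The following is a rough Python heuristic, assuming you already have the raw bytes of the frame; the function names `declared_charset` and `looks_mistagged` are hypothetical, and the regular expression is a simplification rather than a real HTML parser.

```python
import re

# Simplified pattern: grabs the first charset=... value found in the raw bytes.
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)

def declared_charset(html_bytes):
    """Return the lowercased charset named in the page, or None if absent."""
    match = META_CHARSET.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None

def looks_mistagged(html_bytes):
    """Heuristic: the page claims a non-UTF-8 charset, yet its bytes contain
    non-ASCII data that happens to be valid UTF-8."""
    declared = declared_charset(html_bytes)
    if declared in (None, "utf-8", "utf8"):
        return False
    has_non_ascii = any(b > 0x7F for b in html_bytes)
    try:
        html_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8, so the declared charset may be right
    return has_non_ascii

# Hypothetical stand-in for the frame: the incorrect tag plus UTF-8 content.
frame = (
    b'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252" />'
    + "the ae ligature (\u00e6)".encode("utf-8")
)
print(declared_charset(frame), looks_mistagged(frame))
```

The check leans on the fact that text genuinely encoded in windows-1252 rarely forms valid UTF-8 sequences once it contains non-ASCII characters, so it is a hint rather than a proof.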
In IE there's the workaround of right-clicking the frame and selecting UTF-8 from the Encoding submenu... but the page doesn't make a great deal of sense as-is. :(
Some people may recall the similar problem I talked about previously in the post Linguistic and Unicode considerations (or Language-specific Processing #4).
Incorrect tagging is indeed annoying, all the more since even fixing it may not fix the problem.
The real fix is to put all of MSDN and its content in UTF-8, and mark it as such, once and for all!
This post brought to you by ළ (U+0dc5, a.k.a. SINHALA LETTER MUURDHAJA LAYANNA)