by Michael S. Kaplan, published on 2005/10/28 03:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/10/28/486034.aspx
The other day, my manager's manager's manager Delan was looking at various web sites, and on one of them several pages had an unusual display issue.
Basically, each of the pages in question had text on them that looked like this:
"?/font>"
Very odd, and maybe a little frustrating too.
Of course, when you are the head of Globalization Infrastructure, Fonts, and Tools, what better place to start than with some of the font experts on your team? I mean, clearly something was messing with a font tag...
In actuality it wasn't a font issue. After a few hours the net was widened a bit, and I ended up on the mail thread. It reminded me of my MSLU days, when a string with just a few wrong characters usually meant it had been converted with the wrong code page. So based on that, I responded thusly:
I would suggest looking at the source of the page to see what might be next to those font tags, and checking the encoding IE detected to see if it matches the page's encoding. It may be a CJK font name that is being misunderstood and combined with the bytes of the less-than sign.
The page was sent on to me. IE's AutoDetect had decided (for me) that the page was in Windows-1252, so I manually set the encoding to Chinese Traditional (Big5), and suddenly every news item that had a bullet (0x95 in Windows-1252, i.e. U+2022) wrapped in <font> tags showed the problem: the bullet byte and the less-than sign after it had turned into a single question mark.
Now as it turns out, she was actually having the page auto-detected as Chinese Simplified (GB2312), but the result was the same. The bytes 0x95 0x3C, which for me decoded as U+2022 U+003C (•<), were for her converted to "?", since 0x95 0x3C is undefined on both code pages 936 and 950: in the former it is a lead byte followed by an illegal trail byte, and in the latter an unused lead byte with no assigned trail bytes.
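The byte-level mix-up above can be reproduced with Python's built-in codecs. This is a sketch assuming the page bytes really were Windows-1252; Python's codecs are not the Windows code page converters IE used, and the number of bytes Python replaces with U+FFFD may differ from what IE did. The helper `decode_swallowing_trail` is hypothetical: a rough model of a decoder that always consumes the byte after a lead byte, which is enough to reproduce the "?/font>" text.

```python
# Sketch: why "•</font>" from a Windows-1252 page can show up as "?/font>".
# Assumption: the page bytes are Windows-1252 (the diagnosis in the post).

raw = "\u2022</font>".encode("cp1252")  # b'\x95</font>': 0x95 is the bullet

# Decoded with the right code page, the text survives intact.
as_1252 = raw.decode("cp1252")

# Decoded as code page 936 (GBK) or 950 (Big5), 0x95 is read as a lead byte and
# 0x3C ('<') is not a legal trail byte, so strict decoding would raise; with
# errors="replace" Python inserts U+FFFD instead.
as_936 = raw.decode("cp936", errors="replace")
as_950 = raw.decode("cp950", errors="replace")


def decode_swallowing_trail(data: bytes, codec: str) -> str:
    """Hypothetical model of a DBCS decoder that always consumes the byte after
    a non-ASCII byte and emits '?' when the pair is not a valid character.
    Simplification: every byte >= 0x80 is treated as a lead byte."""
    out = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            out.append(chr(data[i]))  # ASCII bytes mean the same thing in these code pages
            i += 1
            continue
        pair = data[i:i + 2]
        try:
            out.append(pair.decode(codec))  # a valid lead/trail pair is one character
        except UnicodeDecodeError:
            out.append("?")  # the '<' is swallowed along with the lead byte
        i += 2
    return "".join(out)


print(repr(as_1252), repr(as_936), repr(as_950))
print(decode_swallowing_trail(raw, "cp936"))  # prints ?/font>
```

The model is deliberately crude; real decoders differ in how they resynchronize after an illegal pair, which is exactly why the same page can look different from one browser or library to the next.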
The page itself had no charset meta tag, and the server clearly was not sending a charset either. We both had the AutoDetect checkbox set (IE6 for me and IE7 for her), but it clearly was not finding much to distinguish the page, so each of us got a guess biased toward our own individual locale settings.
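What the page lacked can be checked mechanically. A minimal sketch, assuming a hypothetical helper `declared_charset` that looks first at the HTTP Content-Type header and then at a <meta> charset declaration in the raw bytes; real browsers also honor byte order marks and limit how far into the page they scan, which this ignores.

```python
import re
from email.message import Message
from typing import Optional

# Hypothetical pattern for <meta charset="..."> or
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def declared_charset(content_type: Optional[str], body: bytes) -> Optional[str]:
    """Return the charset declared by the server header, else by a <meta> tag,
    else None (the situation described above)."""
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lowercased, or None if absent
        if charset:
            return charset
    match = META_CHARSET.search(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return None


print(declared_charset(None, b"<html><body>\x95</font></body></html>"))  # prints None
```

A None here means any decoding the browser shows is a guess.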
Wouldn't the illegal sequence have been a good indication that the AutoDetect guess was wrong? And isn't the lack of any charset in the page bad too? And what about the server not communicating the charset either?
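One way to act on the illegal sequence is to reject any candidate encoding under which the bytes fail strict decoding. This is a hypothetical heuristic (`first_clean_decoding` is an illustrative name), not how IE's AutoDetect works; candidates are tried in order, so a locale bias can be expressed by putting the user's default code page first.

```python
from typing import Iterable, Optional


def first_clean_decoding(data: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate codec that decodes data without any error,
    or None if every candidate hits an illegal byte sequence."""
    for name in candidates:
        try:
            data.decode(name)  # strict mode: an illegal sequence raises
        except UnicodeDecodeError:
            continue  # e.g. 0x95 0x3C under GB2312 or Big5
        return name
    return None


page = "<li><font>\u2022</font> Headline</li>".encode("cp1252")
# A locale-biased order: the Chinese code pages are tried before Windows-1252.
print(first_clean_decoding(page, ["gb2312", "big5", "cp1252"]))  # prints cp1252
```

Clean decoding is only a necessary condition: Windows-1252 accepts nearly any byte string, and pure ASCII decodes under every candidate, so a real detector still needs more evidence or, better, a declared charset.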
Of course it was a page from MSN, so I figure we found at least three bugs in various Microsoft offerings from the exercise, which was actually a lot of fun, too! :-)
This post brought to you by "•" (U+2022, a.k.a. BULLET)
# CornedBee on 28 Oct 2005 5:39 AM:
# Will on 28 Oct 2005 7:06 AM:
# Ben Bryant on 28 Oct 2005 7:43 AM:
# Maurits [MSFT] on 28 Oct 2005 2:20 PM:
# Jimmy on 31 Oct 2011 8:30 AM:
It could be a bug in IE or simply a coding error. I wonder why people still use the old way to style pages, though. CSS is a lot more efficient and it can eliminate almost all formatting errors.