Hot or not? Well, I say 'Not', at least from an international text perspective!

by Michael S. Kaplan, published on 2007/09/06 03:01 -04:00, original URI:

Somebody told me tonight at dinner that I have a huge fan base.

Funny, I didn't even know I had any huge fans. Well, if you are one of them then welcome. :-)

Anyway, I got home and found that a regular reader of this blog from Turkey had told me, via the contact link, that she thinks the famous Hot or Not web site does not handle international text all that well. She figured I should check it out.

Sparing no expense (well, the site is free, technically), I headed out to investigate the matter.

The job was admittedly made easier since I actually have an account on the site.

It was originally put up by someone else as a joke (no one has yet fessed up to it but I have my suspicions). I was eventually able to get ownership of "my" page, but even after I got ownership I left it up, mainly out of curiosity as to how I would rate with that picture that people around here seem to hate so much....

(For those who are curious, it rates a 6.9 after 92 votes, which according to them means I am hotter than 66% of the men on the site. Though I think this says more about the others than about me!)

Anyway, after arranging a "match" with the reader who reported the problem, I sent her a message with a bunch of dotted uppercase I's and dotless lowercase ones, basically the following text:

I think you may be right -- it looks like this site can't handle the undotted lowercase 'i' very well, can it? That would make Turkish support a problem!

Quick test of some dotted uppercase and dotless lowercase I's to see if they work:


and then I sent it. She looked at it and told me it was corrupt.

When I looked at the message in the archive, I found that she was right! It looked like this:

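But it is easy to diagnose a problem that simply shows corrupt text (rather than notdef glyphs or question marks). Missing-glyph boxes would point to a font problem and question marks to a lossy conversion, but garbled characters like these mean the bytes are being decoded with the wrong encoding: it is an encoding problem.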
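To make the diagnosis concrete, here is a minimal sketch of how such corruption can arise. It assumes (the site's internals are not known) that the message was stored as UTF-8 bytes but displayed on a page declared as windows-1252, so each byte is read as a cp1252 character. `TURKISH_IS` and `misread_as_cp1252` are illustrative names, not anything from the site.

```python
# Sketch: UTF-8 bytes misread as windows-1252 ("mojibake").
# Assumption: the message was stored as UTF-8, but the page declared
# charset=windows-1252, so the browser decoded those bytes as cp1252.

# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, U+0131 LATIN SMALL LETTER DOTLESS I
TURKISH_IS = "\u0130\u0131"


def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as windows-1252.
    Bytes undefined in cp1252 (e.g. 0x81) become U+FFFD instead of raising."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    print(TURKISH_IS.encode("utf-8"))       # b'\xc4\xb0\xc4\xb1'
    print(misread_as_cp1252(TURKISH_IS))    # Ä°Ä±
```

Each Turkish I is two bytes in UTF-8 (0xC4 0xB0 and 0xC4 0xB1), and every one of those bytes is a valid cp1252 character (Ä, °, ±). That is why the result is plausible-looking Latin garbage rather than boxes or question marks. The exact corrupted text from the site is not reproduced here, so treat this as an illustration of the mechanism.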

When I right-clicked on the page and looked at the source, lo and behold, the page was marked as being cp1252 (windows-1252):

  <meta http-equiv="Content-Language" content="en-us">
  <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

If I right-click and change the encoding to UTF-8, everything looks right again:
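Switching the browser to UTF-8 works because the bytes themselves were never damaged; only the label on them was wrong. The sketch below, again assuming the stored bytes are UTF-8, shows both reading the bytes correctly and (for text that has already been mis-decoded) reversing the mistake. `decode_as_utf8` and `repair_mojibake` are hypothetical helpers for illustration.

```python
# Sketch: the stored bytes are intact; only the declared charset is wrong.
# Assumption: the original message was UTF-8, as the browser fix suggests.


def decode_as_utf8(raw: bytes) -> str:
    """Interpret the raw bytes with the encoding they were actually written in."""
    return raw.decode("utf-8")


def repair_mojibake(garbled: str) -> str:
    """Undo a UTF-8-read-as-cp1252 mistake: turn the garbled characters back
    into the original bytes (cp1252), then decode those bytes as UTF-8.
    May raise UnicodeError if the input cannot be mapped back that way."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    raw = b"\xc4\xb0\xc4\xb1"
    # Print code points rather than glyphs so the output is plain ASCII.
    print([f"U+{ord(c):04X}" for c in decode_as_utf8(raw)])  # ['U+0130', 'U+0131']
    fixed = repair_mojibake("\u00c4\u00b0\u00c4\u00b1")
    print([f"U+{ord(c):04X}" for c in fixed])                # ['U+0130', 'U+0131']
```

The repair trick only works when no bytes were lost along the way (for example, replaced by U+FFFD), so correcting the declared charset remains the real fix.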

Clearly they could do a lot better if they did not mark their pages as cp1252, or, failing that, if they converted the text to numeric character references (NCRs) so that it would display correctly regardless of the page's encoding setting.
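A minimal sketch of the NCR approach, assuming text is escaped on output: every non-ASCII character becomes a reference like `&#x130;`, which is plain ASCII and so means the same thing under windows-1252, UTF-8, or any other ASCII-compatible charset. `to_ncr` is a hypothetical helper, not the site's code.

```python
# Sketch: escape text as numeric character references (NCRs) so it renders
# correctly whatever charset the page declares.
# to_ncr is a hypothetical helper for illustration, not the site's code.
import html


def to_ncr(text: str) -> str:
    """HTML-escape text, then replace each non-ASCII character with a
    hexadecimal NCR such as &#x130;. The result is pure ASCII."""
    escaped = html.escape(text)  # handles &, <, >, " and '
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in escaped)


if __name__ == "__main__":
    message = "Dotted \u0130 & dotless \u0131"
    encoded = to_ncr(message)
    print(encoded)  # Dotted &#x130; &amp; dotless &#x131;
    # ASCII-only output means the same thing in windows-1252, UTF-8, etc.
    assert encoded.isascii()
    # A browser (or html.unescape) turns the references back into the text.
    assert html.unescape(encoded) == message
    # Standard-library shortcut, producing decimal NCRs instead of hex ones:
    print("\u0130\u0131".encode("ascii", "xmlcharrefreplace"))  # b'&#304;&#305;'
```

The `xmlcharrefreplace` error handler is the quick route, but it does not escape `&` or `<`, which is why the sketch runs `html.escape` first on user-supplied text.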

Given that they have counted over 12 billion votes from around the world, maybe they can work to better support other languages?

So when the Hot or Not? question is asked about the site itself, I have to say that the answer according to SiaO is NOT, at least until they can better handle other encodings....

I am not putting a link to the profile over there, as this post is not intended to increase the hit count or the votes (or to decrease them either!). Since the picture has more or less reached a steady state in votes, it is probably almost time to take it down anyway.

Just think of this as an example of the tremendous lengths that the investigatory branch of SiaO will go to in order to identify internationalization issues! :-)


This post brought to you by ꉹ∨¬ (U+A279 U+2228 U+00AC, a.k.a. YI SYLLABLE HOT, LOGICAL OR, NOT SIGN)
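For anyone who wants to look up code points and character names like the ones in the sign-off, Python's standard `unicodedata` module can do it. This is a small sketch; `describe` is a hypothetical helper, and the names come from whatever Unicode database version ships with your Python.

```python
# Sketch: list the code point and Unicode name of each character in a string,
# using Python's standard unicodedata module.
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """Return 'U+XXXX NAME' for each character in text."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<no name>')}" for c in text]


if __name__ == "__main__":
    # The three characters from the sign-off.
    for line in describe("\ua279\u2228\u00ac"):
        print(line)
```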

# Pavanaja U B on 7 Sep 2007 12:10 AM:

Yes. You have a huge fan base. And I am one of them. I am your fan because of these facts:

1. You are quite knowledgeable in i18n.

2. You disseminate that knowledge. If you were eligible for the MVP award, you would definitely have been given one. I guess you were an MVP once upon a time.

3. You are not like other MS employees who are known for "beating around the bush". The majority of them don't admit the bugs in MS products, and they don't answer queries with a straightforward answer (digitally: "YES" or "NO").



