People notice when the sorting doesn't work

by Michael S. Kaplan, published on 2005/09/28 12:01 -04:00, original URI:

Bill Poser was talking about Multilingual Google earlier, and I noted an interesting bit toward the bottom:

Using Google in another language is a fun way to try out a language you don't know real well. It's easy to switch to a language you do know well if you get stuck and it isn't all that complicated.

I do have one small complaint (beyond the fact that they don't yet have all of my favorite languages), which is that they are evidently sorting the list of languages the same way no matter what language they are in, in the order of the Unicode codepoints. This yields unexpected results.

For example, on the Catalan list Arabic comes last, after Zulu, because the Catalan word for Arabic is Àrab and the À, whose Unicode codepoint is 0x00C0, follows all of the ASCII letters. Z is 0x005A. If Google really wanted to do things right, they would sort the names using the appropriate collating rules for each language.

This does indeed seem a little unfortunate to me. I know how hard I worked to get the language list on to be sorted according to the chosen UI language (as we all know, browsers only have one language setting, so like everyone else I overloaded it!).

Of course that may be wrong too if one knows what the user locale is, and the setting is different from the UI language. But both are preferable to Unicode code point order!
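To make the Àrab/Zulu example concrete, here is a minimal Python sketch. Only "Àrab" and "Zulu" come from the quoted text; the other names are illustrative, and `rough_key` is a hypothetical helper that merely ignores accents on a first pass. It is not the Catalan collation rules and not what any particular site does. Real language-sensitive sorting would go through a proper collation library.

```python
import unicodedata

# Language names as they might appear on a Catalan-language list.
# Only "Àrab" and "Zulu" are from the quoted example; the rest are illustrative.
names = ["Zulu", "Àrab", "Basc", "Francès"]

# Plain sorted() compares code points, so "À" (U+00C0) lands after "Z" (U+005A).
by_code_point = sorted(names)


def rough_key(s):
    """Approximate key: compare accent-stripped base letters first,
    then fall back to the original string as a tie-breaker.
    This is an assumption-laden stand-in for real collation."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)


by_rough_collation = sorted(names, key=rough_key)

print(by_code_point)       # "Àrab" comes last, after "Zulu"
print(by_rough_collation)  # "Àrab" comes first
```

The point is only that the sort key, not the data, decides where Àrab lands; which key is right depends on the language (and possibly the user locale) chosen for the list.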

I will talk more about this particular user expectations topic soon -- it is an interesting one to me. :-)

# Maurits [MSFT] on 28 Sep 2005 12:36 PM:

You seem to have disdain for the Accept-Language header (see your comments in the post you link to.)

Why? It seems pretty well-thought-out from reading the RFC. And browsers support it:

And changing the settings does correctly modify the Accept-Language header, even to the q= factor:

Firefox:
default -- Accept-Language: en-us,en;q=0.5
modified -- Accept-Language: en-us,en;q=0.8,af;q=0.5,ar-iq;q=0.3

Internet Explorer:
default -- Accept-Language: en-us
modified -- Accept-Language: en-us,af;q=0.7,ar-iq;q=0.3
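For readers who have not looked at these headers closely, a rough Python sketch of how a server might read them follows. The function name `parse_accept_language` is hypothetical, and the handling of malformed weights is an assumption rather than anything the RFC or a specific server mandates; a tag with no q= parameter is treated as q=1.0.

```python
def parse_accept_language(header):
    """Return (language_tag, q) pairs sorted by descending q.
    A tag without an explicit q parameter gets the default weight 1.0."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    # Assumption: treat a malformed weight as "not acceptable".
                    q = 0.0
        entries.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal q keep their header order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


print(parse_accept_language("en-us,en;q=0.8,af;q=0.5,ar-iq;q=0.3"))
```

Note that the result is a single ranked list of languages: nothing in it says whether a given tag is meant for the UI, for formatting, or for content.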

If there was one thing I could wish for, it would be a way to configure the browser to have per-site language preferences. I suppose that would be an IE8 feature request... along with per-site javascript preferences, per-site ActiveX preferences, per-site cookie preferences etc. etc. etc...

See Firefox per-site cookie policy:

# Michael S. Kaplan on 28 Sep 2005 12:42 PM:

My disdain has to do with the fact that we need more than one setting. But there is only one, so there is too much variety of usage.

# Maurits [MSFT] on 28 Sep 2005 2:04 PM:

Hmmm... interesting. What do you propose?

# John Bates on 28 Sep 2005 6:07 PM:

The site is a great example to show bosses how proper i18n support can work.

BTW, a minor bug: the UK English page shows two "English - United Kingdom" entries instead of one plus an "English - US".

# Michael S. Kaplan on 28 Sep 2005 9:53 PM:

Thanks, John. Very kind words....

It is one of the big reasons behind having the site up there (I also wrote up some of the techniques in an MSDN Magazine article).

And I fixed that bug with the name. Good catch. :-)

# Michael S. Kaplan on 28 Sep 2005 9:54 PM:

Maurits -- I am just pointing out the problem, not the solutions. Though obviously some additional settings would be helpful here.

# Maurits [MSFT] on 29 Sep 2005 11:36 AM:

Re: Accept-Language...

I still don't see the problem. Maybe an example would be useful.


FYI, the <nobr> tags in the language selection box throw Firefox 1.5b1 for a loop. Haven't checked Firefox 1.07.

# Michael S. Kaplan on 29 Sep 2005 12:11 PM:

Did you read the original post I had on this? The problem is that there are many possible language settings one might need, and they all get overloaded into this one:

* the user locale
* the user UI language
* the content

Thus something using "fa" for Farsi might be something using Farsi formatting for dates, something in the Farsi language, or something about Farsi written in some other language entirely.

I know what it is supposed to be; the fact that people use it for any of these is indication enough of the need for another setting....

# Michael S. Kaplan on 29 Sep 2005 12:12 PM:

Ok, as for the Firefox bug --

It is a valid tag. I thought Firefox was 100% standards compliant? :-)

# Maurits [MSFT] on 29 Sep 2005 2:14 PM:

<nobr> is not a standard tag - It's a Microsoft extension, like <marquee>
"This object is an extension to HTML"

The W3's validator doesn't catch this at first because you don't include a doctype...

But if you manually specify HTML 4.01 Transitional via the DOCTYPE dropdown, it reports "element nobr undefined"

Still no excuse for Firefox. If Firefox can't handle the <nobr> Microsoft-ism, Firefox should ignore the <nobr> tags altogether.

# Maurits [MSFT] on 29 Sep 2005 2:28 PM:

Yes I read your original post. I just don't fully understand the distinction you're making.

The browser's UI locale should not be betrayed to the server, that's a privacy violation.

If the user wants to see a web page in Farsi, Accept-Language is appropriate, no?

If you make a distinction between the language of the "web page UI", and the language of the "web page content", then there are many ways to do this... but are there any compelling use cases for this? If so, what?

If so, then further development of the HTTP language settings may be appropriate. Or perhaps it would be more appropriate to simply enhance the usability of the browser's Language settings.

For example, instead of digging through Internet Options, how about a drop-down box of preferential languages on the IE status bar? This should show the names of the languages in the user's UI locale, not in the language of the web page.

# Maurits [MSFT] on 29 Sep 2005 2:49 PM:

Hmmm, somebody's already made the "language drop-down" for Firefox, as an extension:

I wonder if IE's Accept-Language header is one of the things that can be customized by extensions?

# Maurits [MSFT] on 29 Sep 2005 3:45 PM:

Untouched since 2003... *sigh*
