IE7 exposes a flaw in Apache's support of basic language filtering described in RFC 4647

by Michael S. Kaplan, published on 2006/10/18 15:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/10/18/839439.aspx


Richard Ishida is concerned, as his post "I think IE7 is going to break language negotiation on many servers" indicates.

I do disagree with his conclusion, however.

Clearly, a bunch of people will blame Internet Explorer 7.0, just as he did. But that is nothing new, since people are much more likely to blame Microsoft if something goes wrong no matter whose fault it really is. :-)

But in this case, the whole point of the tagging mechanism in RFC4646 and RFC4647 is to allow for matching by successively looking at subpieces of tags. It is not a model that Microsoft follows in its own client-side managed and unmanaged resource tagging, but it does not disrespect that model in web situations.

If I want Mexican Spanish content, then es-MX is the right tag to use, and es is what any reasonable web server will fall back to. As RFC4647 clearly indicates:

3.3.1.  Basic Filtering

   Basic filtering compares basic language ranges to language tags.
   Each basic language range in the language priority list is considered
   in turn, according to priority.  A language range matches a
   particular language tag if, in a case-insensitive comparison, it
   exactly equals the tag, or if it exactly equals a prefix of the tag
   such that the first character following the prefix is "-".  For
   example, the language-range "de-de" (German as used in Germany)
   matches the language tag "de-DE-1996" (German as used in Germany,
   orthography of 1996), but not the language tags "de-Deva" (German as
   written in the Devanagari script) or "de-Latn-DE" (German, Latin
   script, as used in Germany). 

This scheme is of course available in earlier versions of the standard as well.
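
As a minimal sketch (not taken from any server's code), the prefix rule quoted above can be written in a few lines of Python. The function name basic_filter_matches is made up for illustration, and the sketch covers only the rule as quoted here, checked against the RFC's own examples:

```python
def basic_filter_matches(language_range: str, tag: str) -> bool:
    """Return True if language_range matches tag under the quoted rule.

    The range matches when, ignoring case, it equals the tag or equals a
    prefix of the tag that is immediately followed by "-".
    """
    rng = language_range.lower()
    candidate = tag.lower()
    return candidate == rng or candidate.startswith(rng + "-")


if __name__ == "__main__":
    # The three examples given in the RFC text quoted above.
    print(basic_filter_matches("de-de", "de-DE-1996"))  # True
    print(basic_filter_matches("de-de", "de-Deva"))     # False
    print(basic_filter_matches("de-de", "de-Latn-DE"))  # False
```

Note that the comparison is directional: the range is tested as a prefix of the tag, not the other way around.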

Now I know Microsoft is new to this whole "following international standards and adding nothing proprietary" game, but shouldn't Richard's actual post claim what my post title does (IE7 exposes a flaw in Apache's support of basic language filtering described in RFC 4647) rather than what he did post?

With that said, I think his suggestions make sense. But I imagine it is always easier to listen to suggestions when the tone is not quite so accusatory. :-)

 

This post brought to you by e (U+0065, a.k.a. LATIN SMALL LETTER E)


# Mihai on 18 Oct 2006 5:47 PM:

<<people are much more likely to blame Microsoft if something goes wrong>>

Just statistics.

If 95% of the market is Microsoft, then if something is wrong there are 95% chances the cause is Microsoft.

# Michael S. Kaplan on 18 Oct 2006 7:10 PM:

Not actually true, Mihai -- since most of the programs and utilities and device drivers that run on Windows are not from Microsoft! :-)

# Richard Ishida on 18 Oct 2006 7:17 PM:

Hello Michael. I'm not accusing or blaming anyone of anything.  I'm just pointing out what I think will potentially become a problem for a lot of people with existing setups on servers such as Apache 1.3. I want to make sure that that is considered. The IE team are doing a lot of good things with IE7. Cheers.

# Michael S. Kaplan on 18 Oct 2006 7:47 PM:

Right, but is it fair to say "I think IE7 is going to break language negotiation on many servers" if it is Apache that has the flaw? IE7 is perhaps facilitating here, but they are hardly breaking anything; the break was always there if these servers were not handling the RFC correctly....

Isn't this the point where someone points out the advantage of open source software being so much more agile since the bug fixes can happen sooner? :-)

# Nick Lamb on 18 Oct 2006 8:57 PM:

Apache is a web server, it implements HTTP/1.1 as per RFC 2616 which specifies its own language matching rule. The rules suggested in RFC 4647 could be included into a standard (such as a hypothetical HTTP/1.2) by reference, and are reasonable choices for any protocol where matching is obviously needed but no rules are specified in the standard. But they obviously don't apply to HTTP/1.1, SPARQL or to anything else that has its own rules about language matching.

The Apache documentation is pretty clear about the rules that are actually used, and it was linked from the original article so there's not much excuse.

# Ben Cooke on 19 Oct 2006 2:53 AM:

Rather than this "everybody hates me! I think I'll go eat worms!" attitude, I think it's more accurate to say that everyone blames whoever breaks the status quo. Even if the status quo is wrong.

For example:

* Any time someone proposes a new protocol that doesn't work through NAT, everyone cries foul at that protocol for being broken, when in reality it is NAT that is broken.

* Any time Microsoft releases a new OS that breaks old apps (which, since Microsoft invests so much time into testing this, happens less often than you'd expect) everyone blames MS rather than the old apps, because it all worked until MS released the new OS, even though the new OS is complying to the letter with the API.

* Everyone (except the geeks who know better, of course!) blames Firefox for not being compatible with websites they visit that were designed for IE, even though it's those sites that are at fault for not complying with the standards. Consequently, Firefox is forced to emulate IE's non-standard behavior to be viable in the market.

There are lots of things out there that work "well enough" until someone comes along and does something "more correctly" and makes a fool out of the person who was too lazy to implement something properly in the first place. People don't like looking like fools, and third parties just see that "it works in IE6 but not in IE7". I'd say the lesson here is to do things right the first time and then this kind of situation won't arise.

# Nick Lamb on 19 Oct 2006 5:44 AM:

Also, reading the IEblog article and RFC 4647 together, an Accept-Language range like "es-MX" as emitted by IE7 is neither an exact match for "es" on the web server, nor a prefix match. The RFC 4647 matching is in this sense equivalent to that in RFC 2616 here.

In plain English, if you ask specifically for Mexican Spanish, then it's not standards compliant to send you some other variant of Spanish. Similarly if you ask for a JPEG, you won't be sent a GIF. If you just want "any kind of image" you can ask for that and if you're happy with any kind of Spanish you can ask for that too, indeed the HTTP specification gently reminds readers that user agents need to put the work in to ensure the user gets what they're expecting here, apparently that work wasn't done in IE7.

Perhaps the terminology isn't familiar to Michael, which explains his mistake, but it's pretty disappointing if the IE team doesn't have anyone that actually knows about HTTP or language tag matching.

# Michael S. Kaplan on 19 Oct 2006 10:18 AM:

This is not the understanding that many people (including the authors) have about the rules of basic language filtering, nor does it match the intent of 4647 or its predecessors, that consider prefixes to tags to be a valid *fallback*.

Looking further into 4647 in the lookup section (3.4) it gives such an example explicitly:

  Example of a Lookup Fallback Pattern

  Range to match: zh-Hant-CN-x-private1-private2

  1. zh-Hant-CN-x-private1-private2

  2. zh-Hant-CN-x-private1

  3. zh-Hant-CN

  4. zh-Hant

  5. zh

  6. (default)

Notice #5 there? And honestly if the claim is that "zh-Hant-CN-x-private1-private2" can be expected to find "zh" in a lookup, how can "es-MX" not be expected to find "es"?
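
A minimal Python sketch of that truncation, assuming the lookup behavior shown in the example above: subtags are dropped from the end one at a time, and a single-character subtag such as "x" is dropped together with the subtag that followed it. The names lookup_fallback_chain and lookup are hypothetical, and the list of available tags is invented for the demonstration:

```python
from typing import Iterable, List, Optional


def lookup_fallback_chain(language_range: str) -> List[str]:
    """List the progressively shorter ranges tried during lookup."""
    subtags = language_range.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # A single-character subtag (such as "x") may not end a range,
        # so it is removed along with the subtag that followed it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def lookup(language_range: str, available: Iterable[str],
           default: Optional[str] = None) -> Optional[str]:
    """Return the first available tag found along the fallback chain."""
    by_lower = {tag.lower(): tag for tag in available}
    for candidate in lookup_fallback_chain(language_range):
        hit = by_lower.get(candidate.lower())
        if hit is not None:
            return hit
    return default


if __name__ == "__main__":
    print(lookup_fallback_chain("zh-Hant-CN-x-private1-private2"))
    print(lookup("es-MX", ["en", "es", "fr"]))  # es
```

The first call reproduces steps 1 through 5 of the pattern above; step 6, the default, is what lookup returns when nothing in the chain is available.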

Earlier in the process of the 3066 update, Microsoft attempted to get its own "language-region-script" tags added as aliases to the "language-script-region" tags in the RFC, and after discussion they rejected this idea, since they preferred the simplicity of this subtag matching for fallback, and not adding the complication of having to do changes to order, which would break simple parsing algorithms. And they were quite clear on the intent and purpose....

# Nick Lamb on 19 Oct 2006 11:20 AM:

Michael, you're looking at section 3.4, which is for lookup, not filtering. Accept-Language is a filter header, so section 3.4 wouldn't be appropriate even if HTTP/1.1 relied on 3066 or 4647 rather than, as I've already said, specifying its own rules.

Lest it be thought that the consequences just weren't clear to the authors of the HTTP/1.1 specification, here's what the document itself says...

Note: When making the choice of linguistic preference available to the user, we remind implementors of the fact that users are not familiar with the details of language matching as described above, and should provide appropriate guidance. As an example, users might assume that on selecting "en-gb", they will be served any kind of English document if British English is not available. A user agent might suggest in such a case to add "en" to get the best matching behavior.
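
The suggestion in that note can be sketched in Python as follows. This is only an illustration of a user agent adding the bare language after each regional range; the function names are hypothetical and the descending q-values are an arbitrary assumption, not anything IE7 or any other browser is known to do:

```python
from typing import List


def with_primary_fallbacks(ranges: List[str]) -> List[str]:
    """Insert the primary language after each regional range, once."""
    result = []
    seen = set()
    for language_range in ranges:
        primary = language_range.split("-")[0]
        for candidate in (language_range, primary):
            if candidate.lower() not in seen:
                seen.add(candidate.lower())
                result.append(candidate)
    return result


def accept_language(ranges: List[str]) -> str:
    """Build an Accept-Language value with descending q-values.

    The step of 0.1 per entry is an arbitrary choice for this sketch.
    """
    parts = []
    for index, language_range in enumerate(ranges):
        if index == 0:
            parts.append(language_range)  # q defaults to 1
        else:
            q = max(1.0 - 0.1 * index, 0.1)
            parts.append(f"{language_range};q={q:.1f}")
    return ", ".join(parts)


if __name__ == "__main__":
    print(accept_language(with_primary_fallbacks(["en-gb"])))  # en-gb, en;q=0.9
    print(accept_language(with_primary_fallbacks(["es-MX"])))  # es-MX, es;q=0.9
```

With a header like the second one, a server that only has "es" content has a range that matches it, whichever matching rules it follows.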

# MickMon on 7 Nov 2006 3:10 PM:

OK - so there's clear agreement here that there is in fact an issue, albeit disagreement as to whose 'fault' it is.

Michael - a couple of questions:

1. Do you have any idea as to when the next IE7 update will be released?

2. Would you be able to address this issue in that update, and somehow amend how languages are specified?

Thanks,

~mm

# Michael S. Kaplan on 7 Nov 2006 3:24 PM:

Well, I'm not entirely clear that the nature/existence of a problem and its solution are agreed upon by all, but even if so and if everyone thought that IE's behavior should change, I am not even on the IE team -- seems like an issue to raise in the IE blog. :-)

In any case, I hear a lot of noise about how open source is more agile and they can get fixes in sooner and all that. So why can't the Apache web servers step up here and prove it? It would be a great case of "putting their money where their mouth is" here...

# Jeff on 31 Oct 2007 7:37 PM:

Who is Michael Kaplan anyway?  He said, "In any case, I hear a lot of noise about how open source is more agile and they can get fixes in sooner and all that."

I say, Microsoft has not been agile since DOS days.  I used to call Microsoft and get someone that actually made a fix to code within two or three days.  The last time I called was a couple years ago.  She barely spoke English so I added an especially southern drawl to everything I said.  Finally she had a supervisor call me the next day.  (That was strange.)  The supervisor hadn't a clue - I can't even recall the topic it's been so long ago.  In other words, calling Microsoft has become a joke - even if you have your credit card handy!

If Michael Kaplan has an interest in an Open Source feature, I'm sure the Open Source world would be happy to consider his updates, or the updates he funds, for inclusion in future Open Source releases.  Please make sure none of that crappy Microsoft code gets into the contributed code.  Those donkeys at Microsoft likely stole most of their code from UNIX anyway.  Then they turn around and claim ownership.

I've bought hundreds of Microsoft packages in the past.  Never again!  I would fire anyone that put Vista on their machine.  XP almost got stability correct after two major updates and many subsequent monthly fixes.  Let's enum: a few DOS versions, Windows 3.1, Windows 95, Windows 98, Windows ME and  Windows XP.  Did I miss any?  Is that 20 years of failure before an almost stable OS was created?  

Incredible!

# Michael S. Kaplan on 31 Oct 2007 8:13 PM:

When you have over 80,000 employees, you are more complicated. Great newsflash!

Did you have a response to the actual issue raised?

