IE7 exposes a flaw in Apache's support of basic language filtering described in RFC 4647

by Michael S. Kaplan, published on 2006/10/18 15:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/10/18/839439.aspx

Clearly, a bunch of people will blame Internet Explorer 7.0, just as he did. But that is nothing new, since people are much more likely to blame Microsoft if something goes wrong, no matter whose fault it really is. :-)

But in this case, the whole point of the tagging mechanism in RFC 4646 and RFC 4647 is to allow for matching by successively looking at subpieces of tags. It is not a model that Microsoft follows in its own client-side managed and unmanaged resource tagging, but it does not disrespect that model in web situations.

If I am wanting Mexican Spanish content, then es-MX is the right tag to use, and es is what any reasonable web server will fall back to. As RFC4647 clearly indicates:

Now I know Microsoft is new to this whole "following international standards and adding nothing proprietary" game, but shouldn't Richard's actual post claim what my post title does (IE7 exposes a flaw in Apache's support of basic language filtering described in RFC 4647) rather than what he did post?

With that said, I think his suggestions make sense. But I imagine it is always easier to listen to suggestions when the tone is not quite so accusatory. :-)

Hello Michael. I'm not accusing or blaming anyone of anything. I'm just pointing out what I think will potentially become a problem for a lot of people with existing setups on servers such as Apache 1.3. I want to make sure that that is considered. The IE team are doing a lot of good things with IE7. Cheers.

Right, but is it fair to say "I think IE7 is going to break language negotiation on many servers" if it is Apache that has the flaw? IE7 is perhaps facilitating here, but they are hardly breaking anything; the break was always there if these servers were not handling the RFC correctly....

Isn't this the point where someone points out the advantage of open source software being so much more agile since the bug fixes can happen sooner? :-)

Apache is a web server; it implements HTTP/1.1 as per RFC 2616, which specifies its own language matching rule. The rules suggested in RFC 4647 could be included into a standard (such as a hypothetical HTTP/1.2) by reference, and are reasonable choices for any protocol where matching is obviously needed but no rules are specified in the standard. But they obviously don't apply to HTTP/1.1, SPARQL or to anything else that has its own rules about language matching.

The Apache documentation is pretty clear about the rules that are actually used, and it was linked from the original article so there's not much excuse.

Rather than this "everybody hates me! I think I'll go eat worms!" attitude, I think it's more accurate to say that everyone blames whoever breaks the status quo. Even if the status quo is wrong.

For example:

* Any time someone proposes a new protocol that doesn't work through NAT, everyone cries foul at that protocol for being broken, when in reality it is NAT that is broken.

* Any time Microsoft releases a new OS that breaks old apps (which, since Microsoft invests so much time into testing this, happens less often than you'd expect) everyone blames MS rather than the old apps, because it all worked until MS released the new OS, even though the new OS is complying to the letter with the API.

* Everyone (except the geeks who know better, of course!) blames Firefox for not being compatible with websites they visit that were designed for IE, even though it's those sites that are at fault for not complying with the standards. Consequently, Firefox is forced to emulate IE's non-standard behavior to be viable in the market.

There are lots of things out there that work "well enough" until someone comes along and does something "more correctly" and makes a fool out of the person who was too lazy to implement something properly in the first place. People don't like looking like fools, and third parties just see that "it works in IE6 but not in IE7". I'd say the lesson here is to do things right the first time and then this kind of situation won't arise.

Also, reading the IEblog article and RFC 4647 together, an Accept-Language range like "es-MX" as emitted by IE7 is neither an exact match for "es" on the web server, nor a prefix match. The RFC 4647 matching is in this sense equivalent to that in RFC 2616 here.

In plain English, if you ask specifically for Mexican Spanish, then it's not standards compliant to send you some other variant of Spanish. Similarly, if you ask for a JPEG, you won't be sent a GIF. If you just want "any kind of image" you can ask for that, and if you're happy with any kind of Spanish you can ask for that too. Indeed, the HTTP specification gently reminds readers that user agents need to put the work in to ensure the user gets what they're expecting here; apparently that work wasn't done in IE7.
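To make the rule being argued about concrete: a language range matches a tag if it equals the tag, or if it equals a prefix of the tag that is immediately followed by "-". A minimal Python sketch of that rule follows. The function names are made up for this illustration and are not taken from Apache, IE7, or any library.

```python
def basic_filter_matches(language_range, language_tag):
    """Return True if the range matches the tag under prefix-style filtering.

    The range matches when it equals the tag, or when it equals a prefix of
    the tag and the next character of the tag is "-". Comparison is
    case-insensitive. "*" is treated here as matching any tag.
    """
    rng = language_range.lower()
    tag = language_tag.lower()
    if rng == "*":
        return True
    return tag == rng or tag.startswith(rng + "-")


def filter_tags(language_range, available_tags):
    """Return the available tags that the given range matches."""
    return [t for t in available_tags if basic_filter_matches(language_range, t)]
```

Under this sketch, a range of "es" matches content tagged "es-MX", but a range of "es-MX" does not match content tagged only "es", which is the asymmetry this comment is describing.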

Perhaps the terminology isn't familiar to Michael, which explains his mistake, but it's pretty disappointing if the IE team doesn't have anyone that actually knows about HTTP or language tag matching.

This is not the understanding that many people (including the authors) have about the rules of basic language filtering, nor does it match the intent of 4647 or its predecessors, which consider prefixes to tags to be a valid *fallback*.

Looking further into 4647 in the lookup section (3.4) it gives such an example explicitly:

Example of a Lookup Fallback Pattern

Range to match: zh-Hant-CN-x-private1-private2

1. zh-Hant-CN-x-private1-private2

2. zh-Hant-CN-x-private1

3. zh-Hant-CN

4. zh-Hant

5. zh

6. (default)

Notice #5 there? And honestly if the claim is that "zh-Hant-CN-x-private1-private2" can be expected to find "zh" in a lookup, how can "es-MX" not be expected to find "es"?
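The fallback pattern quoted above can be reproduced with a short Python sketch of progressive truncation: drop the last subtag each time, and if that leaves a single-character subtag (such as "x") at the end, drop that too. The function names and the `default` parameter are hypothetical, chosen only for this illustration.

```python
def lookup_fallbacks(language_range):
    """Return the successively truncated candidates for a language range."""
    subtags = language_range.split("-")
    candidates = []
    while subtags:
        candidates.append("-".join(subtags))
        subtags.pop()  # remove the final subtag
        # A single-character subtag (for example "x") left at the end
        # is removed as well, so no candidate ends in a dangling "-x".
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return candidates


def lookup(language_range, available_tags, default=None):
    """Return the first available tag found while truncating, else default."""
    available = {t.lower(): t for t in available_tags}
    for candidate in lookup_fallbacks(language_range):
        match = available.get(candidate.lower())
        if match is not None:
            return match
    return default
```

With this sketch, `lookup("es-MX", ["en", "es"])` falls back to "es", the same way the example range falls back to "zh".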

Earlier in the process of the 3066 update, Microsoft attempted to get its own "language-region-script" tags added as aliases to the "language-script-region" tags in the RFC. After discussion the authors rejected this idea, since they preferred the simplicity of this subtag matching for fallback and did not want the complication of reordering subtags, which would break simple parsing algorithms. And they were quite clear on the intent and purpose....

Michael, you're looking at section 3.4, which is for lookup, not filtering. Accept-Language is a filter header, so section 3.4 wouldn't be appropriate even if HTTP/1.1 relied on 3066 or 4647 rather than, as I've already said, specifying its own rules.

Lest it be thought that the consequences just weren't clear to the authors of the HTTP/1.1 specification, here's what the document itself says...

Note: When making the choice of linguistic preference available to the user, we remind implementors of the fact that users are not familiar with the details of language matching as described above, and should provide appropriate guidance. As an example, users might assume that on selecting "en-gb", they will be served any kind of English document if British English is not available. A user agent might suggest in such a case to add "en" to get the best matching behavior.
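The guidance in that note (adding "en" alongside "en-gb") could be automated by a user agent. The Python sketch below is one way to do it, and only an assumption for illustration: it is not how IE7 or any other browser builds its header, the function names are hypothetical, and the descending q-value scheme is an arbitrary choice.

```python
def with_prefix_fallbacks(preferred_tags):
    """Expand each preferred tag with its shorter prefixes, keeping order.

    For example ["en-gb"] becomes ["en-gb", "en"]. Duplicates are skipped
    (compared case-insensitively).
    """
    ranges = []
    seen = set()
    for tag in preferred_tags:
        subtags = tag.split("-")
        for end in range(len(subtags), 0, -1):
            candidate = "-".join(subtags[:end])
            if candidate.lower() not in seen:
                seen.add(candidate.lower())
                ranges.append(candidate)
    return ranges


def accept_language_header(ranges):
    """Join ranges into an Accept-Language value with descending q-values."""
    parts = []
    for i, language_range in enumerate(ranges):
        if i == 0:
            parts.append(language_range)  # implicit q=1
        else:
            q = max(10 - i, 1) / 10  # 0.9, 0.8, ... but never below 0.1
            parts.append(f"{language_range};q={q:.1f}")
    return ",".join(parts)
```

For a user who picked only "es-MX", this produces a value that also lists "es" at a lower preference, so a server applying the stricter matching rule can still serve generic Spanish content.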

OK - so there's clear agreement here that there is in fact an issue, albeit disagreement as to whose 'fault' it is.

Michael - a couple of questions:

1. Do you have any idea as to when the next IE7 update will be released?

2. Would you be able to address this issue in that update, and somehow amend how languages are specified?

Thanks,

~mm

Well, I'm not entirely clear that the nature/existence of a problem and its solution are agreed upon by all, but even if so and if everyone thought that IE's behavior should change, I am not even on the IE team -- seems like an issue to raise in the IE blog. :-)

In any case, I hear a lot of noise about how open source is more agile and they can get fixes in sooner and all that. So why can't the Apache web servers step up here and prove it? It would be a great case of "putting their money where their mouth is" here...

Who is Michael Kaplan anyway? He said, "In any case, I hear a lot of noise about how open source is more agile and they can get fixes in sooner and all that."

I say, Microsoft has not been agile since DOS days. I used to call Microsoft and get someone that actually made a fix to code within two or three days. The last time I called was a couple years ago. She barely spoke English so I added an especially southern drawl to everything I said. Finally she had a supervisor call me the next day. (That was strange.) The supervisor hadn't a clue - I can't even recall the topic it's been so long ago. In other words, calling Microsoft has become a joke - even if you have your credit card handy!

If Michael Kaplan has an interest in an Open Source feature, I'm sure the Open Source world would be happy to consider his updates, or the updates he funds, for inclusion in future Open Source releases. Please make sure none of that crappy Microsoft code gets into the contributed code. Those donkeys at Microsoft likely stole most of their code from UNIX anyway. Then they turn around and claim ownership.

I've bought hundreds of Microsoft packages in the past. Never again! I would fire anyone that put Vista on their machine. XP almost got stability correct after two major updates and many subsequent monthly fixes. Let's enum: a few DOS versions, Windows 3.1, Windows 95, Windows 98, Windows ME and Windows XP. Did I miss any? Is that 20 years of failure before an almost stable OS was created?

Incredible!