Culture names are not region names (and vice versa)

by Michael S. Kaplan, published on 2005/01/02 02:12 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/01/02/345393.aspx


Back in May of 2004, someone asked the following question in the newsgroups:

I'm programming in ASP.NET. How could I convert my date to UK format ?

Thanks in advance.

To which someone else suggested the following code:

DateTime x=DateTime.Now;
string s=x.ToString(System.Globalization.CultureInfo.CreateSpecificCulture("uk").DateTimeFormat);

To his credit, he compiled and ran this code; in the end he was convinced he had the solution.

However, there are many problems with this code:

So why did the code run, you may skeptically be thinking? Though probably not if you thought about the title of this post!

Well, it has to do with the overlap of ISO 639 (the two-letter language codes) and ISO 3166 (the two-letter region codes). Or in his case, using the wrong abbreviation in the wrong method and having it find some random culture that happens to use the abbreviation (in this case the Ukrainian neutral locale was being created).
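A minimal sketch of what the lookup is doing, assuming a runtime with the usual culture data installed. The class and method names here are made up for illustration; only CultureInfo and RegionInfo come from the framework.

```csharp
using System;
using System.Globalization;

static class CodeLookupDemo
{
    // Resolves a culture code the same way the newsgroup answer did and
    // reports what actually came back.
    public static string DescribeCulture(string code)
    {
        CultureInfo culture = CultureInfo.CreateSpecificCulture(code);
        return culture.Name + " (" + culture.EnglishName + ")";
    }

    static void Main()
    {
        // "uk" is read as an ISO 639 language code, so this describes a
        // Ukrainian culture rather than anything to do with the United Kingdom.
        Console.WriteLine(DescribeCulture("uk"));

        // On the region side (ISO 3166), Ukraine is "UA" and the
        // United Kingdom is "GB".
        Console.WriteLine(new RegionInfo("UA").EnglishName);
        Console.WriteLine(new RegionInfo("GB").EnglishName);
    }
}
```

The point of printing the names is simply that nothing throws: every one of these codes is valid in its own namespace.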

This is kinda unavoidable since

  1. there are only so many two-letter combinations
  2. language and country names are not always the same
  3. there is at least an attempt to make the string choice for each somewhat meaningful
  4. the two standards are handled by two different committees that do not work together on assignments

See the following table for a list of the overlaps. The bad overlaps where you will get wrong results are marked in red.

| Code | Language | Region |
|------|----------|--------|
| ar | Arabic | Argentina |
| az | Azeri | Azerbaijan |
| be | Belarusian | Belgium |
| bg | Bulgarian | Bulgaria |
| ca | Catalan | Canada |
| de | German | Germany |
| es | Spanish | Spain |
| fi | Finnish | Finland |
| fo | Faroese | Faroe Islands |
| fr | French | France |
| hr | Croatian | Croatia |
| hu | Hungarian | Hungary |
| id | Indonesian | Indonesia |
| is | Icelandic | Iceland |
| it | Italian | Italy |
| lt | Lithuanian | Lithuania |
| lv | Latvian | Latvia |
| mk | Macedonian | FYROM |
| mn | Mongolian | Mongolia |
| nl | Dutch | The Netherlands |
| no | Norwegian | Norway |
| pa | Punjabi | Panama |
| pl | Polish | Poland |
| pt | Portuguese | Portugal |
| ro | Romanian | Romania |
| ru | Russian | Russia |
| sa | Sanskrit | Saudi Arabia |
| sk | Slovak | Slovakia |
| sv | Swedish | El Salvador |
| th | Thai | Thailand |
| tr | Turkish | Turkey |
| tt | Tatar | Trinidad and Tobago |
| uz | Uzbek | Uzbekistan |
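A list like this can be approximated from the culture data on whatever machine you happen to be on. The following is only a sketch under that assumption (OverlapFinder and FindOverlaps are hypothetical names, and the result will vary with the framework version and operating system, so it will not necessarily match the table above):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class OverlapFinder
{
    // Returns the two-letter codes that name both a neutral culture (a
    // language) and a region, according to the installed culture data.
    public static SortedSet<string> FindOverlaps()
    {
        var regionCodes = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (CultureInfo specific in CultureInfo.GetCultures(CultureTypes.SpecificCultures))
        {
            try
            {
                regionCodes.Add(new RegionInfo(specific.Name).TwoLetterISORegionName);
            }
            catch (ArgumentException)
            {
                // No region data for this culture; skip it.
            }
        }

        var overlaps = new SortedSet<string>(StringComparer.Ordinal);
        foreach (CultureInfo neutral in CultureInfo.GetCultures(CultureTypes.NeutralCultures))
        {
            // Only plain two-letter language names can collide with
            // two-letter region codes.
            string code = neutral.Name;
            if (code.Length == 2 && regionCodes.Contains(code))
            {
                overlaps.Add(code);
            }
        }
        return overlaps;
    }

    static void Main()
    {
        foreach (string code in FindOverlaps())
        {
            Console.WriteLine("{0}: language {1}, region {2}", code,
                CultureInfo.GetCultureInfo(code).EnglishName,
                new RegionInfo(code).EnglishName);
        }
    }
}
```

Reading the output side by side makes the bad overlaps easy to spot: they are the rows where the language is not one you would associate with the region.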

For the full lists of cultures and regions, see the CultureInfo and RegionInfo help topics.

This very non-bug has been reported internally by people wondering why creating a RegionInfo from what looks like a neutral (region-less) culture such as "AR" succeeds. They were not thinking about the fact that when you pass "AR" to a RegionInfo you are not getting "Arabic"; you are getting "Argentina".
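To make the "AR" case concrete, here is a small sketch (the class and helper names are hypothetical) that feeds the same two letters to both classes:

```csharp
using System;
using System.Globalization;

static class SameCodeTwoMeanings
{
    // Interprets the code as a culture name (an ISO 639 language code).
    public static string AsLanguage(string code)
    {
        return CultureInfo.GetCultureInfo(code).EnglishName;
    }

    // Interprets the code as a region name (an ISO 3166 region code).
    public static string AsRegion(string code)
    {
        return new RegionInfo(code).EnglishName;
    }

    static void Main()
    {
        // Same two letters, two unrelated answers: per the post, the first
        // is Arabic and the second is Argentina.
        Console.WriteLine(AsLanguage("ar"));
        Console.WriteLine(AsRegion("AR"));
    }
}
```

Neither call has any way to know which meaning the caller had in mind, which is why neither one fails.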

Of course, this is just a symptom of a larger problem that affects international testing in general, which I will be talking about in a future post. Think of this post as describing the tip of the iceberg. :-)

1 - In this specific case the method would quite literally return the wrong results even if you passed the right language associated with the UK -- en, because the default culture returned from CreateSpecificCulture would be en-US, not en-GB.
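A hedged sketch of what the original poster probably wanted, assuming the goal was simply a short date in United Kingdom format: skip CreateSpecificCulture and name the full culture, en-GB, directly. The class and method names are illustrative only.

```csharp
using System;
using System.Globalization;

static class UkDateDemo
{
    // Formats the date with the short date pattern of the en-GB culture,
    // which is named explicitly so no default region gets picked for us.
    public static string FormatForUk(DateTime value)
    {
        CultureInfo british = CultureInfo.GetCultureInfo("en-GB");
        return value.ToString("d", british);
    }

    static void Main()
    {
        // As the footnote says, asking for the specific culture of "en"
        // does not give en-GB.
        Console.WriteLine(CultureInfo.CreateSpecificCulture("en").Name);

        Console.WriteLine(FormatForUk(new DateTime(2005, 1, 2)));
    }
}
```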

 

This post sponsored by "o", "ο", and "о" (U+006f, U+03bf, and U+043e; a.k.a. LATIN SMALL LETTER O; GREEK SMALL LETTER OMICRON; CYRILLIC SMALL LETTER O).


# Barry Dorrans on 2 Jan 2005 1:27 AM:

The whole GB/UK thing is one great big mess up caused by ISO and whoever requested GB in the first place.

They are actually different things. The UK is, roughly speaking, "The United Kingdom of Great Britain and Northern Ireland" (plus some little islands dotted around the coast).

Domain names use UK, as politically speaking that's correct, otherwise anyone in Northern Ireland would be left without a region specific domain suffix.

However UK is not a valid country code in ISO 3166, it's GB or GBR. Janet (the academic network) were asked to pursue ISO to see if GB could be switched to UK, but ISO politely said no. ISO don't seem to want to issue UK to anyone. For example Ukraine was given UA (although perhaps that makes sense culturally, I don't know what the Ukrainian for Ukraine is).

The argument over .UK has been going on for at least 20 years. It would have been nice to have EN-UK as an "alias" for "EN-GB" in the .net framework, even if it wasn't strictly ISO compliant.

Just in case there are people out there who can't help reading imperialism into geography, I should point out the "Great" long predates the Empire, and is a geographical description to differentiate Great Britain from Little Britain (Brittany).

# Michael Kaplan on 2 Jan 2005 5:33 AM:

Well, it is best not to obsess about the two-letter code's letters *too* much.

Changing a code would affect a ton of processes and computer systems even for a country not often used -- for this one it would obviously be huge.

Here at Microsoft we get a lot of grief when we do not follow standards, and when there are times (like this one) that a standard will work but there may be a slightly "nicer" solution for customers, we have to weigh the cost/benefit of going with the standard. In the end it is just two letters, and they are in many cases arbitrary (UA for the Ukraine? Feh!).

# Christian Mogensen on 2 Jan 2005 11:24 AM:

Aliases are evil in cases like this.
What happens when the alias "en-uk" gets sent to another machine? Most likely the wrong thing. (Pounds get treated as dollars, centigrade as Fahrenheit...)

So bravo for Microsoft for sticking to the list, even if it seems like an odd list.

I wonder if an enum would have been a better solution than the string codes we have ended up with?

# Michael Kaplan on 2 Jan 2005 11:39 AM:

The enum might have been possible, but it would not have been a very extensible solution when one considers the support for custom cultures that is being added to the 'Whidbey' release of the .NET Framework!

# Barry Dorrans on 2 Jan 2005 1:25 PM:

Custom cultures is an interesting one; you just know someone will do EN-UK within 20 seconds. You really can't win this one though, EN-GB means that there is, strictly speaking, no culture code for Northern Ireland.

Still, hopefully that'll make euro migration easier when it finally happens.

# Michael Kaplan on 2 Jan 2005 1:44 PM:

Indeed! And people will be able to do things such as that quite easily.

I'll be putting up some sample code that shows custom cultures off sometime soon, so people can see them in action.

# Michael Giagnocavo on 2 Jan 2005 4:30 PM:

Are custom cultures as good as they sound? Meaning we can finally do es-US?

(I hope MSN picks up on this feature, as I'd like to give them my geographic location (GT) but without being sent Spanish content.)

BTW, why was ㅇ(U+3147 Hangul Letter Ieung) discriminated against for sponsorship?

# Michael Kaplan on 2 Jan 2005 5:50 PM:

They are (in my opinion!). I do not know to what extent MSN is using managed code and cultures, and custom locales are still a ways off....

No discrimination here -- it's up to the characters to sponsor things. :-)

# Norman Diamond on 4 Jan 2005 5:54 PM:

Since the string "UK" is not the string "UA" and the string "en-UK" is not the string "en-US", why did those function calls yield results? Wouldn't straight error returns help programmers figure out that the correct solution is something else?

By the way, in the human readable (longer string) version of "en-gb", do you know what "B.K." stands for? It sort of looks like someone took the second half of "G.B." and the second half of "U.K.".

# Michael Kaplan on 4 Jan 2005 6:03 PM:

Please read what was posted -- they are picking up the UKRAINIAN LANGUAGE (which is UK) not the UKRAINE REGION (which is UA).

If you call the wrong method and you pass the wrong parameter, how surprised should you be if you get the wrong results?

And how can we know that it was a mistake, anyway?

# Norman Diamond on 4 Jan 2005 11:13 PM:

> Please read what was posted -- they are
> picking up the UKRAINIAN LANGUAGE

When I read what was posted, I saw this:

> (in this case the Ukrainian neutral locale
> was being created).

Taking these two together with one of your other postings, I'm gradually getting the idea that "neutral locale" and "language" are synonyms at Microsoft, but this wasn't intuitively obvious when first reading.

> you pass the wrong parameter, how surprised
> should you be if you get the wrong results?

Sure, but I didn't realize that UK was a valid language code.

> And how can we know that it was a mistake,
> anyway?

Of course, if it's a valid language code then you must accept it. I both didn't know that UK was a valid language code and didn't yet know that "neutral locale" meant language, so inferred that a match was being chosen based on the first "U". Indeed I am glad to learn that was not the case.

# Michael Kaplan on 5 Jan 2005 12:11 AM:

UKRAINIAN == a language

UKRAINE == a region

Ukrainian - Ukraine == a full locale

Ukrainian - Neutral == a neutral locale, which is to say a regionless locale.
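The same four terms map onto framework objects roughly as follows. This is a sketch only, with a made-up class name, and it assumes the Ukrainian culture data is present on the machine:

```csharp
using System;
using System.Globalization;

static class LocaleTermsDemo
{
    // True when the named culture is a neutral (regionless) locale.
    public static bool IsNeutral(string cultureName)
    {
        return CultureInfo.GetCultureInfo(cultureName).IsNeutralCulture;
    }

    static void Main()
    {
        // A language on its own: the neutral locale.
        Console.WriteLine("uk neutral? " + IsNeutral("uk"));

        // Language plus region: the full locale, whose parent is the
        // neutral locale for the same language.
        CultureInfo full = CultureInfo.GetCultureInfo("uk-UA");
        Console.WriteLine("uk-UA neutral? " + full.IsNeutralCulture);
        Console.WriteLine("uk-UA parent: " + full.Parent.Name);

        // A region on its own is a different class entirely.
        Console.WriteLine(new RegionInfo("UA").EnglishName);
    }
}
```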

