Where are the other Tamils?

by Michael S. Kaplan, published on 2006/05/18 23:40 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/05/18/601531.aspx


Suren asked, after reviewing Vista:

Hello,
There is a Language called Tamil. it's in India & Sri Lanka as Offcial language. but in vista or in IE7 . It is only in india . what i mean is in english there is english US, UK, india, etc.. but in Tamil, there is no Tamil Sri Lanka only Tamil Iindia. Even in IE7 when you run the IE7 for the 1st time you get a webpage asking your counry & Lang. In that field there is no  "Sri Lanka - Tamil" only "Sri Lanka - Sinhala" is there. Please fix this.

Also, when we change the lang to tamil & type the latters are appearing half size( what i mean is if english is here "-" tamil font size is here _.

To better understand the problem, a person who knows Tamil in MS should read this or you should ask him.

Thank you

Well, I will start with that first issue: the fact that only one locale is listed for Tamil (India, as Suren mentioned), even though the Ethnologue, for example, describes the language as being commonly spoken in Malaysia, Singapore, Sri Lanka, Mauritius, and Réunion (I really only knew of the first three!).

As I pointed out in Does MS pull new locales out of their [insert stronger language here]?, adding a locale is complicated; even though it appears as nothing more than an item on the locale list, there is a lot of data behind it that has to be verified with people, tested, and so forth. And there is only so much time, so looking at the over 6000 languages of the world, it is clear that even the 200 locales of Vista will not make much of a dent in the big language list.

But as I have pointed out, we are all about opening it all up and getting out of the way, so let's see if we can take all that stuff about custom locales and try to do something here....

We'll take some managed code that uses the CultureAndRegionInfoBuilder, RegionInfo, and CultureInfo classes to mix a few of the built-in cultures and regions, make a few other changes to the locales to make them work right, and create some new locales!

The code would be something like this:

using System;
using System.Globalization;

namespace CustomLocales {
    class CustomLocales {
        [STAThread]
        static void Main() {
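            // Start from the built-in Tamil (India) culture; its language data is reused for every new locale below.
            CultureInfo ciTamil = new CultureInfo("ta-IN", false);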

            // Tamil (Malaysia): culture data from ta-IN, region data borrowed from the built-in en-MY culture.
            CultureAndRegionInfoBuilder caribMY = new CultureAndRegionInfoBuilder("ta-MY", CultureAndRegionModifiers.None);
            RegionInfo riMY = new RegionInfo("en-MY");
            caribMY.LoadDataFromCultureInfo(ciTamil);
            caribMY.LoadDataFromRegionInfo(riMY);
            caribMY.CultureEnglishName = "Tamil (Malaysia)";
            caribMY.CultureNativeName = "தமிழ் (மலேசியா)";
            caribMY.CurrencyEnglishName = riMY.CurrencyEnglishName;
            caribMY.CurrencyNativeName = "ரிங்கிட்";
            caribMY.RegionNativeName = "மலேசியா";
            caribMY.NumberFormat.CurrencySymbol = "RM";
            caribMY.ThreeLetterWindowsLanguageName = "TAY";
            caribMY.IetfLanguageTag = caribMY.CultureName;
            caribMY.Save("ta-MY.ldml");
            caribMY.Register();

            CultureAndRegionInfoBuilder caribLK = new CultureAndRegionInfoBuilder("ta-LK", CultureAndRegionModifiers.None);
            RegionInfo riLK = new RegionInfo("si-LK");
            caribLK.LoadDataFromCultureInfo(ciTamil);
            caribLK.LoadDataFromRegionInfo(riLK);
            caribLK.CultureEnglishName = "Tamil (Sri Lanka)";
            caribLK.CultureNativeName = "தமிழ் (இலங்கை)";
            caribLK.CurrencyEnglishName = riLK.CurrencyEnglishName;
            caribLK.CurrencyNativeName = "ரூபாய்";
            caribLK.RegionNativeName = "இலங்கை";
            caribLK.NumberFormat.CurrencySymbol = "உரூபா";
            caribLK.ThreeLetterWindowsLanguageName = "TAL";
            caribLK.IetfLanguageTag = caribLK.CultureName;
            caribLK.Save("ta-LK.ldml");
            caribLK.Register();

            CultureAndRegionInfoBuilder caribSG = new CultureAndRegionInfoBuilder("ta-SG", CultureAndRegionModifiers.None);
            RegionInfo riSG = new RegionInfo("en-SG");
            caribSG.LoadDataFromCultureInfo(ciTamil);
            caribSG.LoadDataFromRegionInfo(riSG);
            caribSG.CultureEnglishName = "Tamil (Singapore)";
            caribSG.CultureNativeName = "தமிழ் (சிங்கப்பூர்)";
            caribSG.CurrencyEnglishName = riSG.CurrencyEnglishName;
            caribSG.CurrencyNativeName = "வெள்ளி";
            caribSG.RegionNativeName = "சிங்கப்பூர்";
            caribSG.NumberFormat.CurrencySymbol = "$";
            caribSG.ThreeLetterWindowsLanguageName = "TAS";
            caribSG.IetfLanguageTag = caribSG.CultureName;
            caribSG.Save("ta-SG.ldml");
            caribSG.Register();
        }
    }
}

(As an aside, the code above shows a downside of fixed-width TrueType fonts: Uniscribe applies font substitution to the Tamil text, and the substituted glyphs screw up the fixed width!)

Note that the code not only registers the custom cultures on the machine; it also saves out an LDML file for each one. Those files can be delivered to other machines and registered there with the code from Sample code for getting ELK cultures on other platforms.
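As a rough illustration of what the receiving machine has to do with one of those LDML files, here is a minimal sketch. It is not the code from the post mentioned above; it only assumes the CreateFromLdml and Register methods of CultureAndRegionInfoBuilder, and the class name and the default file name are placeholders chosen for this example. Like the main sample, it has to be compiled with /r:sysglobl.dll and run with administrative rights.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: registers a custom culture from an LDML file
// saved earlier by CultureAndRegionInfoBuilder.Save.
class RegisterFromLdml {
    static void Main(string[] args) {
        // Assumption: the file name defaults to one of the files saved above.
        string path = args.Length > 0 ? args[0] : "ta-LK.ldml";

        // Rebuild the builder object from the LDML data on disk.
        CultureAndRegionInfoBuilder carib = CultureAndRegionInfoBuilder.CreateFromLdml(path);

        // Register the culture on this machine; this fails if a culture
        // with the same name is already registered.
        carib.Register();

        Console.WriteLine("Registered {0}", carib.CultureName);
    }
}
```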

So, just take the code, save it to a file as tamil.cs, and then compile it from the command line with the following command:

csc /r:sysglobl.dll tamil.cs

And once you do that, the landscape in Regional and Language Options will change a little bit:

Those three new locales are there on the list!

(The asterisk is there to signify that these are custom locales)
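If you would rather confirm the result from code than from the control panel, something like the following sketch should do it. It assumes only that CultureInfo.GetCultures accepts CultureTypes.UserCustomCulture, which limits the list to custom cultures registered on the machine; the class name is made up for this example.

```csharp
using System;
using System.Globalization;

// Hypothetical checker: lists the custom cultures registered on this machine.
class ListCustomCultures {
    static void Main() {
        // UserCustomCulture filters out the cultures that ship with Windows and .NET.
        foreach (CultureInfo ci in CultureInfo.GetCultures(CultureTypes.UserCustomCulture)) {
            Console.WriteLine("{0}: {1}", ci.Name, ci.EnglishName);
        }
    }
}
```

After running the sample above, ta-MY, ta-LK, and ta-SG should be among the names printed. To take one of them off the machine again, the static CultureAndRegionInfoBuilder.Unregister method (which needs the sysglobl.dll reference) accepts the culture name.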

Of course you can also do all of this with Microsoft Locale Builder too, but I thought I'd show how easy it is to do with a little bit of code, too, for those who are so inclined. :-)

Regarding those other questions, I'll take them up in a separate post, since they are more general typography issues....

 

This post brought to you by "இ" (U+0b87, a.k.a. TAMIL LETTER I)


# Phylylp on 19 May 2006 1:59 AM:

OK, slightly OT... but what are the languages you know, Mike?

# Michael S. Kaplan on 19 May 2006 5:46 AM:

Moi? :-)

Well, I know a little bit of Spanish, a small bit of Tamil, a middling amount of conversational Hebrew and a higher degree of biblical Hebrew, and a smattering of Japanese....

# Kris on 19 May 2006 12:35 PM:

Vannakam! How about adding Telugu to your repertoire? ;-)

# Michael S. Kaplan on 6 Feb 2008 4:20 PM:

Not knowing any native speakers makes it harder to contemplate...


referenced by

2010/07/18 Where's the other Urdu?

2008/09/09 Where is "Tamil - Sri Lanka" exactly?
