On getting a quality Tamil translation. And what happens after....

by Michael S. Kaplan, published on 2010/06/23 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/06/23/10028788.aspx


About a week ago, I found out that one of the presentations they want me to do in Coimbatore at the World Classical Tamil Conference is a keynote.

Quite an honor, I accepted....

I realized that I needed a new presentation for that.

Something shorter, with fewer slides.

With a few extra days in Chennai at my disposal, I talked to the hotel duty manager (who was always unfailingly polite and who I almost felt bad for not needing more since she was always so willing to be helpful) about whether there was someone I could talk to if I wanted a few of my slides translated to Tamil.

Really just two slides with a lot on them and a few with just a word or two.

Under 100 words in English, total....

She asked me to send the text and she would see what she could do (although a fluent Tamil speaker, she, like many working at the hotel, is actually from the north; as a group they speak Tamil quite well, but their reading and writing are not as good).

There ended up being a two-part process.

First she got translations from a contact of hers, who admitted to being baffled by some of the content but did their best with it.

Let me leave no doubt here; it was a quality translation.

Then, she introduced me to another duty manager who was a native Tamil speaker. He sat with me in my room for the two hours before checkout time and asked me questions and then the translation ended up being heavily reworked.

We used the Tamil version of the Microsoft Indic Language Input Tool, and although there were a few points where the transliteration he was most used to had to be adjusted (once adjustments were made it was -- with just 1-2 exceptions -- the first candidate on every word), in short order we had a much better quality translation based on an actual understanding of the content.

Let me leave no doubt here; it was a quality localization.

Something I learned: it was also really wonderful to see the Indic Language Input Tool picked up so quickly by someone who had never seen or heard of it. He was also able to quickly make adjustments as he saw the way it was generating candidates (I suspect he would have preferred to be able to modify the tool's scheme rather than modify his intuitive transliteration instincts as a native speaker, though he never complained about it!).

Given that he is someone who, I suspect, spends more time in other languages these days, it was a fascinating field test of the tool, and akin to a more complete and formal version of my own positive experience with Bengali.

Something else I learned:

Tamil localized text takes up more space than German localized text.

We usually recommend that dialogs should not be overpacked and that for any technology that doesn't autosize the dialogs (like Win32) developers should leave room for expansion -- about 30-40%.

Tamil seems to need more in many cases.

Although I had heard the claim made once by a tester a few months ago, seeing it firsthand makes me want to look at some of the LIP content and try to get formal numbers on the percentages.
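For what it's worth, getting rough numbers does not take much. Here is a minimal sketch in Python, assuming that counting code points after NFC normalization is an acceptable stand-in for rendered width. It really is not for Tamil, where combining vowel signs and font metrics matter, so treat the output as a first approximation only. The function names and the sample string pair are mine, chosen for illustration; they do not come from any LIP tooling.

```python
import unicodedata


def _length(text: str) -> int:
    # Assumption: code points after NFC normalization approximate width.
    # Real measurements would need the rendered pixel width in the UI font.
    return len(unicodedata.normalize("NFC", text))


def expansion_percent(source: str, translated: str) -> float:
    """Return how much longer `translated` is than `source`, in percent."""
    if not source:
        raise ValueError("source string must not be empty")
    return (_length(translated) - _length(source)) / _length(source) * 100.0


def fits_with_padding(source: str, translated: str, padding_percent: float) -> bool:
    """Check whether `translated` fits in the source length plus padding."""
    budget = _length(source) * (1 + padding_percent / 100.0)
    return _length(translated) <= budget


if __name__ == "__main__":
    # Illustrative pair only; a real survey would loop over resource files.
    print(f"{expansion_percent('add', 'hinzufügen'):.0f}%")
    print(fits_with_padding("add", "hinzufügen", 40))
```

Run over a whole set of source and localized strings, something like this would at least show whether the 30-40% guidance holds up, and for which string lengths it falls apart.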

I don't know, maybe Tamil would be a great pilot language in some cases!

Now there are deeper lessons here related to the issues that both the original translator and the Le Meridien "localizer" had with trying to move the original text from English to Tamil that I will also talk about.

But that will have to wait for some future blog. For now I have to see what I can do to try to capture some of what I learned in the original presentation I'm doing (and the one for Unicode!), because I feel like I understand the viewpoint a lot better than I did a week ago.

A month ago.

A year ago.

And ten years ago, when I first talked with Dr. Om Vikas at a Unicode Technical Committee meeting about what Tamil was looking for out of Unicode, and not really getting.

How little we all knew then; how much we can know tomorrow....


Jen on 23 Jun 2010 8:13 PM:

On the http://explore.live.com site for Tamil, we had to go through ALL SORTS of gymnastics to get the Tamil content to fit!  Kannada, Malayalam, and Gujarati were bears as well.  They all put German and Swedish to shame.

We're definitely going to be looking at Tamil very early from now on...

Michael S. Kaplan on 23 Jun 2010 10:29 PM:

Jen -- very cool! Now we just have to update all the docs for devs....

Do you have some translations handy I could get size data from? :-)

Mihai on 24 Jun 2010 9:54 AM:

In my opinion this search for size percentages is misguided, don't waste your time on it (unless you do it just for fun).

Designing dialogs, or web sites, to fit everything will never work.

Languages with short strings (e.g. Chinese) will look horrible, while the longest languages will still not always fit (allowing 50% extra for German will still not fit a string that grows to 333% of its original length, as you see in add -> hinzufügen).

Yes, I know there are guidelines that give you different percentages for different string lengths, but they are cumbersome, and they are still statistically based.

The best option is to use auto-layout where available (HTML, WPF, WinForms). For Win32 dialogs, follow some of the good guidelines, but don't expect to eliminate all resizing for languages.

Michael S. Kaplan on 24 Jun 2010 4:57 PM:

I just like to have the figures to impress on people how their dialogs will have to change (if they leave some space it gives localizers better clues on where to make more space happen and bigger numbers can convince more easily than smaller ones!).

