If you have to ask "How do I say ____ in ____ language?" then maybe you shouldn't!

by Michael S. Kaplan, published on 2012/02/07 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/02/07/10264760.aspx


I am not a localizer.

I have a little bit of knowledge in several different languages, but I don't know any well enough to be able to do a proper job taking the words of one language and expressing them well in another.

However, due to what I know of a bunch of different topics, localizers will sometimes contact me to find out what a certain term means -- so they can take that knowledge and try to find out the right term to use in the target language.

My "favorite" question of this type is one I get from time to time, asking about UNICODE CHARACTER NAMES.

We localize some of the strangest things in Windows, and this one is really the weirdest.

So many of the names are actually transliterations of letter names into English with regular English terms used throughout them, so they don't always make a ton of sense in translation....

But let's take a look at this, a look at the Unicode character names in Windows Character Map:

Thus, U+0ddd (ෝ), aka

Sinhala Vowel Sign Kombuva Haa Diga Aela-Pilla

might be

VOYELLE DIACRITIQUE SINGHALESE KOMBOUVA HAA DIGA ÆLA-PILLA

in French, or

Sinhala-Vokal Kombuva Haa Diga Aela-Pilla

in German, or

Sinhálský znak pro samohlásku kombuva há diga aela-pilla

in Czech, or

Κεϋλανικό σημείο φωνήεντος Kombuva Haa Aela-Pilla

in Greek, or

Signo de vocal sinhala kombuva haa diga aela-pilla

in Spanish, or

Sinhala, segno di vocale Kombuva Haa Diga Aela Pilla

in Italian, or

Sinhala klinkerteken Kombuva Haa Diga Aela-Pilla

in Dutch, or

Sinal de vogal sinhala kombuva haa diga aela-pilla

in Brazilian Portuguese, or

Sinhala - Sinal de vogal Kombuva Haa Diga Aela-Pilla

in European Portuguese, and so on.

I even found that Simplified Chinese dared to go where even Japanese and Korean feared to tread:

And Turkish, which decided to stay out of Sinhala, did choose to muck around a bit with the characters it used itself:

Okay, I guess I see a model here -- many of the languages that will localize all character names leave the transliteration piece alone, while still translating the terms that may make more sense.
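
To make that model concrete, here is a minimal Python sketch of it: the generic English terms at the front of the standard name get translated, and the transliterated remainder is carried over untouched. The `TEMPLATES_ES` glossary and the `localize_name` helper are hypothetical names made up for illustration (this is not how Character Map or any Windows API does it); only `unicodedata.name` is a real standard-library call.

```python
import unicodedata

# Hypothetical glossary: English generic prefix -> Spanish template.
# "{rest}" marks where the untranslated transliteration is kept.
TEMPLATES_ES = {
    "SINHALA VOWEL SIGN ": "Signo de vocal sinhala {rest}",
}

def localize_name(ch, templates):
    """Translate the generic prefix of a character name, keep the rest."""
    name = unicodedata.name(ch)
    for prefix, template in templates.items():
        if name.startswith(prefix):
            # The transliterated letter name is only lowercased, not translated.
            rest = name[len(prefix):].lower()
            return template.format(rest=rest)
    return name  # no glossary entry: fall back to the standard English name

print(localize_name("\u0ddd", TEMPLATES_ES))
```

With that one glossary entry, the sketch reproduces the Spanish string quoted above for U+0ddd, and any character without an entry simply keeps its standard name.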

Actually, it's kind of a shame that e.g. localizers for the Sinhala LIP can't follow the Turkish model here and translate its own letter names, the ones that the Unicode standard is transliterating.

It would just be a few characters for each language -- perhaps they could take the exemplar characters in their language?

That would be really cool!

I think I'll go talk to the EOL people -- maybe it's too late for this version, maybe not. The cost of just a few strings for these languages isn't huge, and it would be really cool to go this route.

I'll even offer to provide ranges they can localize!

Why limit it to just the big boys? :-)
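
As a rough idea of what "providing ranges" could look like, the following Python sketch lists the standard names for every named character in a code point range, which is the raw list a localizer would start from. The `names_in_range` helper is a hypothetical name for illustration, and the Sinhala block boundaries (U+0D80 through U+0DFF) are taken from the Unicode block list rather than from anything in Windows.

```python
import unicodedata

def names_in_range(first, last):
    """Yield (code point, standard name) for each named character in the range."""
    for cp in range(first, last + 1):
        # Unassigned code points have no name; skip them.
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            yield cp, name

# Sinhala block, U+0D80..U+0DFF.
for cp, name in names_in_range(0x0D80, 0x0DFF):
    print(f"U+{cp:04X}\t{name}")
```

Which characters show up depends on the Unicode version bundled with the Python build, so the exact count will vary.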


Richard Deeming on 7 Feb 2012 9:59 AM:

But what would it be in English? :o)

John Cowan on 7 Feb 2012 10:08 AM:

Parturient montes nascetur ridiculus mus.  Which localizes to "The mountains labored, and gave birth to a ridiculous mouse."

Really, localizing Unicode character identifiers is about like localizing "snprintf" to "snimprimerf" in French.

Simon Buchan on 7 Feb 2012 2:07 PM:

I find it curious the French translators decided to uppercase it. I wonder if they are using another source for the translation/transliteration of the names?

Daniel Cheng on 7 Feb 2012 5:32 PM:

The Chinese version just means "Sinhala Vowel Sign", it did not try the transliteration part.

@John Cowan on 8 Feb 2012 6:06 AM:

>> Really, localizing Unicode character identifiers is about like localizing "snprintf" to "snimprimerf" in French.

Which incidentally was what Excel 5.0 (IIRC) did in its version of VBA.

Having (in Italian) to type "PER I=0 A 5" or "SE A<0 ALLORA" instead of the common FOR and IF statements was really weird.

Random832 on 8 Feb 2012 11:52 AM:

Of course - why not fix the blatant errors (you've blogged some of these in the past, I see no need to list them here) that some unicode character names have in English? There's no particular reason that the string returned by an undocumented API, only used for interactive display (and search) in one system utility, must be the same as the Unicode Character Name which is constrained by various other concerns to remain the same forever.

Michael S. Kaplan on 8 Feb 2012 5:03 PM:

I would hardly equate sprintf to the user-facing strings in CHARACTER MAP. To Unicode, they may be identifiers, but not how we use them!

Michael S. Kaplan on 8 Feb 2012 5:10 PM:

I suspect the reason French is UPPERCASE is due to them picking up the official French character names from Unicode....

cron22 on 18 Feb 2012 3:44 PM:

Very interesting.  I mean, sometimes transliteration's just not possible.  


