How reasonable it is to translate something is directly proportional to the likelihood someone will see it

by Michael S. Kaplan, published on 2010/04/18 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/04/18/9997395.aspx


We all know that English is not the only language spoken in the world.

Even for software developers.

Okay, not all of us know that last part; I routinely talk to people who make certain assumptions about how much English a particular segment of customers might know.

Even the assumption that everyone in India making more than $2 a day knows English turns out to be untrue. I vividly recall the woman next to me on the plane during my last India trip whose only language other than Tamil was French, not English.

So she did not know English, I did not know French (beyond counting and a couple of phrases that might have gotten me slapped). So I found myself struggling to communicate in the only language we came even close to sharing - Tamil.

If we had to spend a month on the plane, I'd be a fluent speaker of Tamil now. :-)

Anyway, my point? That not everyone speaks English.

When I bat about phrases like Extent of Localization and talk about Language Interface Packs, those are simply compromises: if it didn't cost money or take hard-to-find expertise to accomplish, companies like Microsoft would localize every bleeding word of every product shipping everywhere in the freaking world.

The reason we don't, the reason no one does, is that it is expensive.

So we make those trade-offs.

But even the trade-offs are complicated.

I mean, when someone asks a question like:

Do we have any data that demonstrates the potential value to users v. cost of localizing error messages?

the intent is clear.

Someone is trying to determine how important it is to localize a certain bit of a software product.

Even a simple question like this one just spawns more questions in my mind, ones I need answered in order to give an honest opinion. Questions like:

  1. What are the target languages and markets in question? (we have anecdotal knowledge of places where many developers prefer some other language, often English)
  2. Will the errors be seen mostly by end users or by developers? (kind of like #1, but trying to find out if we are putting the burden on those other developers)
  3. Does the product generally have errors as a very exceptional occurrence, or will they tend to be common? (obviously if they are rare the cost may not be worth the benefit)
  4. Are there tiers of error messages that would have different answers to questions 2 and 3, or even 1? (if certain problems are more common among certain markets or customer segments then the answers may be different for different customers)
  5. What has been done in the past with similar products doing the same thing?

Extend this to all the other segments of the user interface, factor in differences of the types of products and who uses them, and so on.

You would need extensive formal research studies to get real answers.

Of course the difficulty in terms of time, cost, and reliability of doing formal studies to get answers to these questions is in some cases insurmountable: the cost of just doing the studies will easily be more than the market could ever contribute in revenue in decades, let alone the actual localization!

So the problem remains -- how to decide how much to localize if there is no good way to know how important it is to do so?

There are some good, easy rules of thumb that can help, though.

Like, in general, there is a bias in favor of localizing all top-level UI. This plays neatly into answering point #4 above while kind of giving reasonable context for #2 and #3.

Thus if we ignore the "market specific guesses" (they are largely anecdotal, and even our best contacts in those markets cannot help us much, since they speak English and thus often have no real frame of reference for how important an issue is to people who are so completely unlike themselves), we make the problem at least a little easier to frame.

Stated simply, the re-framed principle is easy enough to grok:

How reasonable it is to not localize something is inversely proportional to the likelihood of someone seeing that thing.

Or if you like the version I put in the title better, you can go with that, instead.

Because once again, in an ideal world we would be shipping a Babel fish and a Star Trekian Universal Translator to every single frigging customer in the world.

All these other conversations are about how to make sure that resources are invested sensibly enough that products do not cost more to make than they will later be able to earn back.

Answers.

Sigh.

It is so hard to formulate reasonable answers beyond that.

Well, I mean other than attacking the source a bit!

Attend me for a moment while I do this. :-)

For example, I tend to see some of the weirdest and most obscure and hard to understand error messages one could ever imagine.

They seem to contain English words but use unrecognizable jargon and sentence structures with which I am entirely unfamiliar.

I want to ask them to translate it into ENGLISH so I can understand it; never mind the idea of trying to translate it into some other language, where for most users the best one could hope for is the same experience but in their own language (be it French or Japanese or whatever).

I would say: fix the product to make it more useful in your own native tongue before foisting it on the rest of the world so that any time you put up text the user either:

And that is just for the original English where we still fail, long before we add other languages to the mix.

Which makes the answer to the original question easier: clean your own house first, then come asking about how to make the localization cheaper. :-)

But maybe a little less snarky than that, since usually who is asking the question, and when it is being asked, make it unlikely that the software will be redesigned....

A study comparing my dissatisfaction at an error message I cannot understand even though it is in my language with the dissatisfaction of someone who gets the error in a language they do not know at all might even find that I am the less satisfied of the two -- at least they can blame it on [a possibly otherwise good product] not being localized, whereas I have no one to blame but the original core product's poor usability!

How expensive is it to localize a product?

Well, it depends.

First answer me just one question:

How intuitive and understandable was the product, to start with?


# smors on 18 Apr 2010 1:22 PM:

For error messages that most likely will only be seen by developers (who are often capable of speaking English), such as stack traces, there is another consideration that is important. That is the awesome power of Google.

I will occasionally get an error message translated into horrible Danish. Plug that into Google (or Bing) and get absolute silence. If the software can be coerced into giving the error in English, Google suddenly spits out an infinite number of answers. This can be helped somewhat by including some kind of error code, which Oracle's database is absolutely fantastic about; that is fortunate, because their Danish translation is horrible.

# Michael S. Kaplan on 18 Apr 2010 2:53 PM:

You are also assuming connectivity, I take it? :-)

# Michael S. Kaplan on 18 Apr 2010 5:33 PM:

Beyond that, in reference to translation quality issues, that is separate from the ideal (where I was assuming perfect translation!) and should be fixed by finding better localizers....

# smors on 18 Apr 2010 11:11 PM:

Yes, I am assuming that I will have some kind of connectivity when developing or debugging software :-) And the issue about being able to search for the text of an error message is very much a real concern.

As you said yourself, the number of times a message will be seen by anyone is an important consideration when deciding how much to invest in localizing that message. I develop mostly for the web nowadays, so the error messages I see are often from IIS, Apache etc. When the project is done, those messages should be hidden from the end users. In other words, the localized Danish versions of error messages are mostly seen by developers, and the number of Danish-speaking developers is rather small.

# Michael S. Kaplan on 18 Apr 2010 11:44 PM:

Well one probably has to put on the "whole world" hat and "wide variety of products" scarf, not to mention the "cross section of a number of different customer demographics" gloves to assess it from my point of view when I was writing this....

Though I'm glad to know the situation is well understood for Danish IIS developers! ;-)

# Yuri Khan on 19 Apr 2010 1:05 AM:

I, too, prefer error messages in English, because of their inherent googlability. I also prefer English error messages for users who might ask me for help, for exactly the same reason; and English UI, because that’s what all the various howtos and guides are going to refer to.

# Chuck on 19 Apr 2010 10:50 AM:

有没有一个选择在这里的一些偏见,虽然意见,因为是英文的博客? [Isn't there some selection bias in the comments here, though, since this is an English blog?]

# Michael S. Kaplan on 19 Apr 2010 12:50 PM:

Hey Chuck - there is indeed some bias here!

# Mihai on 22 Apr 2010 9:39 AM:

Might be surprising, but sometimes the translated messages are more understandable than the original.

I assume this is happening when you have a good translator: he will try to understand the meaning, and make it better. Usually because he is a native speaker, a trained linguist, and a "regular, non-geek guy", with his main focus on language. Then you have a native-speaker editor. Then another native speaker doing linguistic testing of the translated application.

Compare that to the typical developer who does "linguistic stuff" on the side (the main thing is the code), sometimes is not a native speaker of English, and is a geek at the core.

# Mihai on 22 Apr 2010 9:43 AM:

One of my recommendations for error messages intended for developers ("Heap corruption detected after normal block" :-) is to replace them with a generic message and an error code ("Critical error 412. Please report, pray, and restart the application" or similar :-)

Saves money on localization, is more readable for the regular user, and the number is easy to look up for a non-native tech-support person or developer.
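
A minimal sketch of the generic-message-plus-code idea, in Python. Everything here is hypothetical and only for illustration: the catalog names, the `report_error` helper, the pairing of code 412 with the heap corruption text, and the French wording are all assumptions, not anything taken from a real product. The point is just that the developer-facing detail stays in untranslated (and searchable) English in a log, while only one short template has to be localized.

```python
import logging

logger = logging.getLogger("app.errors")

# Hypothetical catalog: error code -> developer-facing detail.
# These strings are never localized, so they stay searchable.
DEVELOPER_DETAILS = {
    412: "Heap corruption detected after normal block",
}

# Hypothetical localized templates. This one short sentence is the
# only text a translator has to handle, whatever the number of codes.
GENERIC_TEMPLATES = {
    "en": "Critical error {code}. Please report it and restart the application.",
    "fr": "Erreur critique {code}. Veuillez la signaler et redémarrer l'application.",
}


def user_message(code: int, language: str) -> str:
    """Return the short message shown to the end user.

    Falls back to English when there is no template for the language.
    """
    template = GENERIC_TEMPLATES.get(language, GENERIC_TEMPLATES["en"])
    return template.format(code=code)


def report_error(code: int, language: str) -> str:
    """Log the untranslated detail and return the localized user message."""
    detail = DEVELOPER_DETAILS.get(code, "unknown error")
    # The log line keeps the code and the English text together for support.
    logger.error("E%d: %s", code, detail)
    return user_message(code, language)
```

Calling `report_error(412, "fr")` would hand the user the French sentence with the number in it, while the log keeps the English detail next to the same number, so whoever picks up the report can search for either one.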

