Silly money equivalency games work both ways (aka Making your localizer's life easier, Part 3)

by Michael S. Kaplan, published on 2010/02/28 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/02/28/9970358.aspx


As series go, this one is not happening nearly as fast as I might have liked.

The first one, How many ways can a developer say 'File Not Found?' (aka Making your localizer's life easier, Part 1), happened at the end of December 2007.

The follow-up, We're back and we're embarrassing ourselves? (aka Making your localizer's life easier, Part 2), didn't get made public until the end of February, 2008.

Now we are at part three, and it took two years to get here.

This seems like too slow of a pace for a regular series. I'll see what I can do about that....

Anyway, if memory serves it was Shakespeare who said

Brevity is the Soul of Wit.

Now in modern times we have such busy lives that we often take a specific subset of the meaning here, along the lines of

Don't Waste My Time.

and when a software developer finds themselves doing the same thing over and over again, they feel like they are doing something wrong, something inefficient.

Say they are putting a word, like Music, into several places in the user interface. It offends some sense of developer tidiness to have the word repeated over and over, perhaps in separate binaries, loaded over and over from these different places.

If they stumbled across part 1 of this very series they might ask themselves How many ways can a developer say 'File Not Found?' and feel silly saying a simple word over and over again, the same way. The word is the same every time - Music. What could be simpler than that?

They may even be thinking about all those reminders of the cost per word per language to localize, and think they might be saving Microsoft a few dollars if they just have the resource once and load it into those various places.

Occasionally, a true geek will calculate it. You know, take their salary, work out what the time spent doing this little exercise cost, and compare it with that per word per language cost to figure out if they literally saved Microsoft some money that afternoon.

On a Friday? I could totally see that happening.

Of course there is a small problem here, a problem with this "improvement" the developer has worked into the user interface by saving all the repeats of the word Music.

The problem is that the developer is dead wrong.

Regular reader Mihai actually spoke to this issue a bit in a comment to that very first blog in the series:

It is a good idea to make a difference between "reuse" (or duplication) and "consistency."

Duplication is good, consistency is good, reuse is bad.

Duplication: to have the same string in several places.

Consistency: to have all instances be the same.

Reuse: to merge all identical strings into one and use that one (usually "to save money").

It is good to have the same string repeated. You need to say "Print" 50 times? Then have it 50 times! This gives the freedom to the translator to do what it's right for his language. Costs more? Maybe. But it will cost even more if you merge them (work), then you have a bug filed saying that titles and buttons need different translations.

The challenge is to keep strings separate and consistent in the same time.

Usually translated versions end up being more consistent, because are done by linguists, with access to the full set of strings, with proper tools, then edited with consistency as one of the point to check.

The English version is just bunch of strings put together by various programmers, some of them with English as a second language, usually without a spell-check, and over a long period of time.

and it is a valid point.

A fix for even that problem with the zillion ways to say "File not found" can easily be to just make sure the same words are used for all of them. How is a user well served by needing to look at so many ways of saying the same thing?
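As a rough illustration of keeping strings separate and consistent at the same time, here is a minimal Python sketch. The resource IDs, the grouping table, and the function name are all made up for this example; real resource formats and tooling will look different. The idea is only that every place in the UI keeps its own entry, and a check (rather than a merge) flags the entries whose English wording has drifted.

```python
from collections import defaultdict

# Hypothetical English resource table: one entry per place the text is shown.
EN_RESOURCES = {
    "IDS_OPEN_FILE_NOT_FOUND": "File not found.",
    "IDS_SAVE_FILE_NOT_FOUND": "File not found.",
    "IDS_IMPORT_FILE_NOT_FOUND": "The file could not be found.",
}

# Hypothetical grouping: IDs that are meant to carry the same message.
SAME_MESSAGE = {
    "file_not_found": [
        "IDS_OPEN_FILE_NOT_FOUND",
        "IDS_SAVE_FILE_NOT_FOUND",
        "IDS_IMPORT_FILE_NOT_FOUND",
    ],
}


def find_inconsistencies(resources, groups):
    """Return {group: {wording: [ids]}} for groups with more than one wording."""
    report = {}
    for group, ids in groups.items():
        by_wording = defaultdict(list)
        for string_id in ids:
            by_wording[resources[string_id]].append(string_id)
        if len(by_wording) > 1:
            report[group] = dict(by_wording)
    return report


if __name__ == "__main__":
    for group, wordings in find_inconsistencies(EN_RESOURCES, SAME_MESSAGE).items():
        print(f"Inconsistent wording in group '{group}':")
        for wording, ids in wordings.items():
            print(f"  {wording!r}: {', '.join(ids)}")
```

Note that the fix for a reported group is to edit the English text of the odd entry out, not to collapse the three IDs into one. The translator still gets three separate strings to work with.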

Now, getting back to our Music example for a moment, let's look at the Croatian language pack for Windows 7:

We'll ignore the fact that the word glazbe isn't capitalized; the problem here is that it should be glazba, or actually Glazba in this case.

There are actually other parts of the user interface where Glazbe, or even glazbe, might be appropriate.

Oh, and also that radnu površinu in the dialog above should probably be radna površin in this case, and radna površina in others.

You get the point.

Now would you like to guess how many times each of these identical strings appears in the resources?

Once each.

Now I have talked about the capitalization issue, and the need to let localization be flexible about it, before, so I won't harp on it now.

But the fact that there is the need to express the same word in different ways in some languages even if not in English? That I will harp on here.
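To make that concrete, here is a minimal Python sketch of what separate entries buy the translator. The string IDs, the two UI contexts, and the `load_string` helper are hypothetical, invented for illustration; which context needs which Croatian form is also an assumption, using only the forms mentioned above.

```python
# Hypothetical string tables: one ID per place the word appears in the UI.
# The English text is identical in both places, on purpose.
EN_STRINGS = {
    "IDS_LIBRARY_TITLE_MUSIC": "Music",
    "IDS_SHARE_ITEM_MUSIC": "Music",
}

# Because the IDs are separate, the translator is free to pick a different
# grammatical form (and capitalization) for each context.
HR_STRINGS = {
    "IDS_LIBRARY_TITLE_MUSIC": "Glazba",  # assumed: standalone title
    "IDS_SHARE_ITEM_MUSIC": "glazbe",     # assumed: word used inside a phrase
}

TABLES = {"en": EN_STRINGS, "hr": HR_STRINGS}


def load_string(locale, string_id):
    """Look up a string by ID, falling back to English if it is untranslated."""
    return TABLES.get(locale, {}).get(string_id, EN_STRINGS[string_id])


if __name__ == "__main__":
    for string_id in EN_STRINGS:
        print(string_id, load_string("en", string_id), load_string("hr", string_id))
```

With a single shared ID there would be only one slot in the Croatian table, and one of the two places would be stuck with the wrong form no matter what the translator chose.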

I will however ask you to recall that geek who figured out that he saved the company a few bucks by sharing that string....

Factor in the PR hit when people started complaining about the bad grammar that makes the product look bad (which they did because it did), plus the cost to investigate the cause, fix it, test the fix, and then localize it properly (in other languages too, since those are new strings to be added now). Even if that developer gave back his equivalent salary for a week, the score might still not be even.

Not that he should be charged; it is just that silly money equivalency games work both ways!

Of course the original notion of wanting to avoid extra words to translate has some merit. It is just that the developer does not have the context to understand all of the cases where strings that look the same may not really be the same. And no matter how costly it is to translate the same word repeatedly, it is much more costly to have to fix this kind of problem later when it happens....


Jon on 28 Feb 2010 7:54 AM:

A little knowledge is a dangerous thing. If you have no knowledge of the waste of space that duplicated strings causes then you won't cause a problem. If you have a lot of knowledge about string duplication and language issues then you know not to cause a problem. Having a little bit of knowledge is dangerous. This is where managers and mentors and more experienced developers should have come in and stamped down on the naive developers "brilliant" idea. And this developer won't have been the first, nor the last to have a "brilliant" idea. These bright young things need to be controlled to ensure that they don't wreck havoc.

John Cowan on 28 Feb 2010 10:57 AM:

Same saying, generalized version:

DON'T BE A PIERIANSIPIST.

Josip Medved on 28 Feb 2010 11:07 AM:

"Radna površin" is never a good choice in Croatian. It is root of word that doesn't have meaning. :)

I strongly support everything else you said. :)

Michael S. Kaplan on 28 Feb 2010 4:03 PM:

John, my objection to the word "pieriansipist" is that it is outside of the vocabulary of those to whom it would best apply! :-)

Mihai on 28 Feb 2010 4:49 PM:

It can be worse. There is at least one library (very popular, but I will not name it), where the English string is also the string identifier.

No duplication "by design". Also defective by design.

And it is incredible how much effort it takes to convince someone that it's bad design (I mean, how can such a popular library, used in thousands of applications and tens of languages, be wrong?)

Random User 43792 on 4 Mar 2010 9:05 AM:

The obvious solution is to (somehow) require developers to be fluent in all languages, including historical and yet-to-be-invented dialects. Maybe then they would know what strings are truly the same. :-)

On a more serious note: does that potentially make automated string pooling one of the most ill-advised "optimizations" in a language? (I suppose maybe not as long as it sticks to constant, non-resource strings. Those would be hard/impossible to localize anyway.)

