How many ways can a developer say 'File Not Found?' (aka Making your localizer's life easier, Part 1)

by Michael S. Kaplan, published on 2007/12/23 10:16 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/12/23/6843609.aspx

Conventional wisdom from those enlightened in the ways of writing localizable applications suggests that developers should do their best to avoid reusing a string in different contexts when a localizer might need to translate it differently in each of those contexts.

A great example of this is one I mentioned before in Microsoft Access, where the names of various properties in forms and reports are essentially duplicated in two different contexts (strings used in code versus strings shown in the property sheet). The reason for the architected duplication is that in Japanese, the full-width strings are required in the former (functional) case but are considered pretty ugly in the property sheet, where the half-width form is preferred.
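To see why a tool that blindly merges "identical" strings would get this wrong, note that full-width and half-width forms are different code points that only become equal under Unicode compatibility normalization (NFKC). A minimal Python sketch; the property name used here is a hypothetical stand-in, not one of Access's actual resources:

```python
import unicodedata

# Hypothetical property name ("font") in the two forms discussed above.
full_width = "\u30d5\u30a9\u30f3\u30c8"  # full-width katakana: フォント
half_width = "\uff8c\uff6b\uff9d\uff84"  # half-width katakana: ﾌｫﾝﾄ

# As raw strings they differ, so they need two separate resource entries...
print(full_width == half_width)  # False

# ...even though NFKC folds the half-width form into the full-width one.
print(unicodedata.normalize("NFKC", half_width) == full_width)  # True
```

A deduplication pass that compared NFKC-normalized text would wrongly collapse these two entries into one, which is exactly the architected duplication the Access team needed to keep.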

Now, despite the basic truth of this, there are times when duplication is just plain and simple duplication.

And in some of those cases (and even in the above example), while duplication is okay, non-identical duplication is not.

For example, all of the following strings were found in one of the projects for Windows (I do not recall which version offhand):

Now, in fairness, some of them might turn out, if tracked down, to be for error codes in completely different technologies. But the simple fact that the actual strings vary so widely means that not only does the base English product have to carry around 11 ways to say the same thing, but localizers are also likely to carry the same inconsistency over into their languages, without being able to take advantage of translation memories or, in some cases (if the strings differ enough), even translation glossaries.

Note that this localizability tip is also a good usability tip for building a consistent user interface in a project that is not yet localized, or that never will be. Even if the strings were hard-coded in the source, saying the same thing eleven different ways is not a great way to build the most usable experience!

It is simply in a software project's best interests to be smart enough about reuse that strings that say the same thing (whether or not they are expected to be used in the same contexts) use consistent, identical text...
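One inexpensive way to catch the trivial variants is to normalize every string in a table and flag groups that collapse to the same key but are not byte-for-byte identical. A minimal Python sketch, assuming the string table can be loaded as a dict mapping resource ID to English text; the function names, IDs, and sample texts are hypothetical:

```python
from __future__ import annotations

import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Fold case, drop punctuation, and collapse runs of whitespace."""
    text = text.casefold()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def find_inconsistent(resources: dict[int, str]) -> list[list[int]]:
    """Return groups of resource IDs whose texts normalize to the same
    key but are not exactly identical (i.e. non-identical duplication)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for res_id, text in resources.items():
        groups[normalize(text)].append(res_id)
    result = []
    for ids in groups.values():
        # Identical duplicates (like the two "Print" entries) are fine;
        # only flag groups containing more than one distinct spelling.
        if len({resources[i] for i in ids}) > 1:
            result.append(sorted(ids))
    return result


# Hypothetical string table: made-up IDs with sample variant texts.
table = {
    101: "File not found.",
    102: "File Not Found",
    103: "file not found",
    104: "The system cannot find the file specified.",
    105: "Print",
    106: "Print",
}
print(find_inconsistent(table))  # [[101, 102, 103]]
```

A check like this only catches differences in case, punctuation, and spacing; a reworded variant such as entry 104 still needs a human reviewer or a terminology glossary to spot.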

All of the characters in Unicode have taken off for Grand Cayman for the Christmas holiday weekend
(they are staying at the Marriott Grand Cayman Beach Hotel, in case you are there and are curious about all the characters hanging out by the pool!)

It is a good idea to distinguish between duplication, reuse, and consistency.

Duplication is good, consistency is good, reuse is bad.

Duplication: to have the same string in several places.

Consistency: to have all instances be the same.

Reuse: to merge all identical strings into one and use that one (usually "to save money").

It is good to have the same string repeated. You need to say "Print" 50 times? Then have it 50 times! This gives the translator the freedom to do what is right for his language. Costs more? Maybe. But it will cost even more if you merge them (which is work) and then get a bug filed saying that titles and buttons need different translations.

The challenge is to keep strings separate and consistent at the same time.
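The three definitions above can be made concrete with a string table in which every context gets its own resource ID (duplication without reuse), plus a build-time check that entries tagged as expressing the same concept carry identical English text (consistency). A minimal Python sketch; the `Entry` fields, the `term` tag, and the resource IDs are all hypothetical, not any real resource format:

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    res_id: str   # one ID per context, so translations are free to diverge
    context: str  # note for the localizer, e.g. "menu item" or "dialog title"
    term: str     # hypothetical tag naming the concept the text expresses
    text: str     # English source text


def check_consistency(entries: list[Entry]) -> dict[str, list[str]]:
    """Return each term whose entries do not all share the same English text."""
    by_term: dict[str, set[str]] = defaultdict(set)
    for e in entries:
        by_term[e.term].add(e.text)
    return {term: sorted(texts) for term, texts in by_term.items() if len(texts) > 1}


entries = [
    Entry("IDS_MENU_PRINT", "File menu item", "print", "Print"),
    Entry("IDS_BTN_PRINT", "Dialog button", "print", "Print"),
    Entry("IDS_TITLE_PRINT", "Dialog title", "print", "Print"),
    Entry("IDS_ERR_NOFILE_1", "Open dialog error", "file-not-found", "File not found."),
    Entry("IDS_ERR_NOFILE_2", "Save dialog error", "file-not-found", "Cannot find file."),
]
print(check_consistency(entries))
# {'file-not-found': ['Cannot find file.', 'File not found.']}
```

Because IDS_MENU_PRINT and IDS_TITLE_PRINT stay separate entries, a translator can still render a title and a button differently if the target language needs it, while the check keeps the English side from drifting.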

Usually the translated versions end up being more consistent, because they are done by linguists with access to the full set of strings and proper tools, and are then edited with consistency as one of the points to check.

The English version is just a bunch of strings put together by various programmers over a long period of time, some of them with English as a second language, and usually without a spell-check.