'Where is the dictionary?' you might ask

by Michael S. Kaplan, published on 2007/01/14 06:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/01/14/1464038.aspx


Jonas Beckeman asks, both in the Suggestion Box and in the MSDN Forums (thread here):

--Common literals dictionary--

I've been told that there's no place in Windows where common literals are stored (e.g. localized strings for "OK", "Cancel", "Copy" etc), and that they're all stored separately for each app. I simply cannot believe that's true - and if it is, what would be the reason?

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1103467&SiteID=1

Well, despite in-thread claims of "nobugz" to the contrary, I am not even A MSFT localization expert, let alone THE MSFT localization expert. :-)

But I'll give answering the question a shot....

There is no common runtime dictionary of localized terms in Windows.

There are plenty of glossaries of various Microsoft products that are made available to software development houses through MSDN memberships, and obviously inside of Microsoft those same glossaries and probably more are available to localization vendors to help provide continuity in terminology spanning versions. For people external to Microsoft, it is not completely free, but it costs significantly less than the cost of creating the resources....

But in the end these are just helpful hints to localizers, none of which replace the importance of having localizers on projects, to decide which terms they want to accept from tools such as glossaries and which ones they want to reject, based on all of the factors that affect localization decisions.

Given the fact that good localization is a design time task and not a simple runtime task, what purpose would be served by serving the needs of bad localization efforts by making it easy for applications to do their localization through some sort of runtime auto-translation layer of individual terms in a dictionary?

Language is simply not that easy!

 

This post brought to you by ၌ (U+104c, a.k.a. MYANMAR SYMBOL LOCATIVE)


# Nick Lamb on 14 Jan 2007 7:56 AM:

Of course you can't solve the translation problem with a mindless function call, translate("More",lang) isn't going to achieve what the naive programmer hopes -- but are all these translations really duplicated? Is there a separate translation for each of the identical 'Cancel' buttons and 'File' menus in the entire OS, never mind 3rd party software? i.e. is the French translation of "Close" [as in dialog box] into "Fermer" repeated dozens or hundreds of times in different places?

That would seem like an almost equally big mistake in the opposite direction. Granted Windows doesn't seem to provide very good stock resources anyway, but for what there are it seems like built-in translations would be more than a nice-to-have.

One of the better pieces of thinking coming out of fd.o was their response to the realisation that existing "icon theme" architectures harmed re-use. An API that gets you a stock "Home" icon isn't that useful to developers when it might be a spider's web or a slice of apple pie. They realised that you needed to standardise not just the icon names, but the actual iconography. So "Home" is "a house". In the high contrast theme for visual disabilities it's probably a very simple two color outline, whereas in a kid's theme it might resemble Barbie's dream home, but it's always a house. Suddenly it's worth actually using the stock icons API in your application, not just as a way to drop something in for mockups. Finding ways to maximise re-use to the benefit of the end user experience, that's the name of the game.

If you want an example of a system which really has tried providing a "base dictionary" for each language with translations of common terms, you should be able to find a LiveCD of Zeta, the BeOS derivative developed by the now defunct yellowTAB. One cute thing in there (which doesn't make up for the dictionary) is that users can easily change UI language on the fly. Mostly a party trick, but a good one.

# Michael S. Kaplan on 14 Jan 2007 8:05 AM:

Well, lots of the resources are shared (like all menus across Office, default right-click menus in EDIT/RICHEDIT, and so on). And like I said, there is a design-time sharing that is significant in any large software house (plus between them since Microsoft shares its glossaries).

The total amount of sharing that does happen is non-trivial. There is little actual benefit to dynamic sharing (in fact there is a hit due to getting resources from multiple locations which implies multiple things to load!).

# Michael Friedman on 14 Jan 2007 9:59 AM:

I think you're missing a point here...

Yes, languages are mostly not that simple... but in some cases they are.

I think you would be hard pressed to find a language in which certain standard menu options and button texts could not have standard translations.  I'm thinking of things like "New Window", "Open", "Save", "Save As".  That's because their semantic meaning is the same in every application.

Now, that puts a burden on the developer... you had better not use the common "Open" string if you mean anything other than "Open a file".  Don't use it for "Open a Folder" or "Open a Box".  But if you do have these standardized strings then you are making a big stride forward in standardizing application look and feel across multiple languages.  And that adds a lot of value.

# Michael S. Kaplan on 14 Jan 2007 12:50 PM:

I think you are missing a point of mine. :-)

Those items ARE shared for LOCALIZERS, not for DEVELOPERS. Which is who they should be shared for? :-)

# Mihai on 14 Jan 2007 4:32 PM:

Just to show how tricky it is, all your examples are wrong:

- All the other examples ("Print", "Close", "Open", "Save", etc.) cannot be shared. Cause: in most Latin languages a command (button, menu) is translated differently than a description (title, label). So in French it will be "Imprimer" as command, but "Impression" as description (think "Do Print!" vs "Print")

- Even if you know it is a button, that is not enough. The "New" button can depend on what the new thing created is (new file, new document, new account). The translation might depend on the gender/number/case of the thing created. Sometimes it cannot be translated as "New", but the translator will have to add "New X" to the button.

- The shared "Home" icon as a house might not be the right thing. In some languages "home" might be translated differently (like "base" or "origin" or whatever)

And these kinds of decisions are NOT developer decisions (as Michael pointed out).

And there is no economy, believe me! Translation companies use CAT (Computer-Aided Translation) tools, with TM (Translation Memory). The translator gets suggestions for 100% match (or even for fuzzy matches), and he gets to decide.

If you dig a bit, you will learn how that works, and ask for a discount on repetitions. Don't expect to pay nothing for repetitions, because the translator has to check it and accept it, even if it is 100% match.
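
As a rough illustration of the translation-memory workflow described above, here is a minimal sketch in Python. It is not how any particular CAT tool works: the `TRANSLATION_MEMORY` contents, the `suggest` function, and the 0.75 threshold are all assumptions made up for this example, and the similarity score comes from the standard library's `difflib.SequenceMatcher` rather than from a real TM engine.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: previously approved source -> target pairs.
TRANSLATION_MEMORY = {
    "Save the file": "Enregistrer le fichier",
    "Close": "Fermer",
    "Print": "Imprimer",
}


def suggest(source, memory=TRANSLATION_MEMORY, threshold=0.75):
    """Return (score, stored_source, stored_target) suggestions, best first.

    A score of 1.0 is a 100% match; anything lower is a fuzzy match.
    The result is only a suggestion: a translator still accepts or rejects it.
    """
    suggestions = []
    for stored_source, stored_target in memory.items():
        # Case-insensitive character similarity between the new and stored source.
        score = SequenceMatcher(None, source.lower(), stored_source.lower()).ratio()
        if score >= threshold:
            suggestions.append((score, stored_source, stored_target))
    return sorted(suggestions, reverse=True)
```

Calling `suggest("Close")` yields the stored entry with a score of 1.0, while `suggest("Save the files")` yields "Save the file" with a score below 1.0. In both cases the function only proposes; nothing in it knows whether the string is a command or a title, which is why the review step still costs something.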

# Mihai on 14 Jan 2007 4:38 PM:

Ok, now on the other side: I can see where some of the requests point: standard dialogs/messages.

With Windows so friendly to all the languages, people can run Japanese application on Arabic machines and edit Russian documents.

And most of the time it looks really bad to have a localized application with English buttons in MessageBox or in standard Open/Save/Color selection/Print. Pretty much all the stuff in comctl32.dll and comdlg32.dll

# roxfan on 14 Jan 2007 4:43 PM:

Microsoft used to make full localization glossaries (with ALL strings for ALL programs) for numerous languages available on their FTP, but now they're gone. What's available now is a shorter glossary of terms:

http://www.microsoft.com/globaldev/tools/MILSGlossary.mspx

Apparently the full glossaries are still available to MSDN subscribers.

# Michael S. Kaplan on 14 Jan 2007 5:24 PM:

They are (though perhaps "all" might be a slight overstatement, then and now!). :-)

# GregM on 16 Jan 2007 9:52 AM:

What would be really useful is to be able to find the strings used for things like MessageBox(), GetOpenFileName(), PrintDlg(), etc, so that when we have to replace them to add something that they can't do, they'll have the same text as used by the OS.  This provides a more consistent experience for the user.  We can already get the "standard icons" for MessageBox(), why not the standard strings?

# Michiel on 16 Jan 2007 4:12 PM:

The one case that would be useful is for standard buttons. Non-English speakers are probably all too familiar with the infamous MessageBox'es where the question language and button language do not match.

The alternative is probably documenting MessageBoxEx.

# Jonas Beckeman on 6 Feb 2007 8:54 PM:

Thanks for taking the time to answer my question.

* I think it's a moot point that a dictionary could be incorrectly used by careless developers. Any company who cares about their product would verify and correct any mistakes before going gold, but the dictionary would be a good starting point. And for less well-funded indie programs - well, *some* kind of localization, albeit flaky, would be far better than losing every non-English-speaking user.

* There are lots and lots of literals that are 100% consistent across applications, at least in the languages I've used. And often, having access to system messages/literals would be very useful. What is the Network Connections item on the Start menu called? What's the Start menu called?

* A dictionary can be more intelligent than just a word->word lookup. Context can be provided. Either the context is as simple as "button", "title", or the common menu XPaths "File | Save as..." (would take care of the vast majority of problems), but it doesn't take much work to create a simple, context sensitive, grammatical system that can handle the most common casus, gender, plural etc forms for the simple literals I need.
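
The "context can be provided" idea in the last bullet could look something like the following Python sketch. Everything here is hypothetical: no such system dictionary exists in Windows, and the `ENTRIES` table, the context names, and the `lookup` function are invented for illustration. The French "Print" pair is the one Mihai gives earlier in the thread.

```python
# Hypothetical entries keyed by (source term, UI context, language).
# The contexts "command" and "title" are illustrative labels, not a real scheme.
ENTRIES = {
    ("Print", "command", "fr"): "Imprimer",
    ("Print", "title", "fr"): "Impression",
    ("Close", "command", "fr"): "Fermer",
}


def lookup(term, context, lang, entries=ENTRIES):
    """Return the stored translation, or None when this exact context is not covered."""
    # Deliberately no fallback to another context: a wrong-context string
    # is the failure mode the thread is arguing about.
    return entries.get((term, context, lang))
```

With this table, `lookup("Print", "command", "fr")` and `lookup("Print", "title", "fr")` return different strings, and `lookup("New", "command", "fr")` returns `None`, since the thread points out that "New" depends on what is being created. The sketch handles the button-versus-title case only; gender, number, and case agreement are not modeled.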

Just use localizers, you say. Hm, I thought we were all in the business of automating repetitious tasks? ...computers, you know?

# Michael S. Kaplan on 6 Feb 2007 9:21 PM:

Jonas,

Microsoft already does all this, for ITS products, at DESIGN time, which is faster than runtime loading of an extra outside resource. Your request is to ask every one of those products to reimplement their working code to load strings from a new place, in order to assist everyone else outside of Microsoft.

That is a feature request, and it is hardly a clear-cut case of bad design if Microsoft chooses not to do this (since the bug is fixed from their point of view, already?).

# Jonas Beckeman on 7 Feb 2007 4:11 AM:

I've understood the situation, so I'm not really asking for anything; I just want to point out what's wrong here. From what I've seen of how things work at MS, I can imagine there's been no active decision in this matter to *not* make a system dictionary, things have just rolled along.

I'm an open-source kind of guy, and I don't hesitate to start projects to fix things that the big companies did wrong, or missed, but in this case there's nothing anyone can do except Microsoft.

You've chosen to not make basic translation easy for minor developers (or even larger companies who target all available *or future* languages), and this may be a correct decision business-wise, but I resent the idea of bad design being favored over good design for economic reasons.

(BTW, loading external resources would be a performance problem - you've got to be kidding. This would be negligible if done even half right)

# Michael S. Kaplan on 7 Feb 2007 5:30 PM:

Currently some resources have to be loaded, and this has some minor performance hit. If a second location must also be initialized then that hit is doubled. It is not unreasonable to point out that this slows things down by some unknown amount and there should be discernible benefit to justify changing everyone's code across all of the various technologies.

There is no justification for such a change.

As for being kidding, when I see applications like Outlook load hundreds of DLLs, I wish MORE people were like me, feeling that this is not a good idea, and fewer people were claiming that they should just load another resource.

Strings ARE shared across Microsoft apps, so Microsoft already has the benefits here that you are talking about. We also provide our glossaries to developers who wish to do the same. It is not bad design to do it this way; it is simply a different design than the one you are championing.

I did not claim that your way is bad design; I claimed that providing a dictionary for such things can make it easier for people to design a localized app badly. The current method requires a bit more effort and review, which makes it less likely that developers will take such shortcuts.

Essentially, beyond being an "open source kind of guy", you also want a specific architectural choice as well, and if Microsoft does not choose to do it this way then all that they do share with developers is insufficient. This is not an invalid position to take, but please understand that not everyone agrees that your architecture is the best one for all apps, and thus not everyone would use it (beyond the fact that no one would simply change for the hell of it if the app works today!).

Finally, although some of the situation grew this way organically, other parts were and are conscious architectural decisions. You don't have to believe me when I say this, but I'll just point out that some things are true whether you believe them or not. :-)

# Jonas Beckeman on 7 Feb 2007 6:52 PM:

The problem with developer resources is that you need to have all of the languages accounted for at *develop* time. With a common dictionary, a lot (enough, in many cases) of the literals would come for free at whatever language the end user's OS is running in!

My app would be reasonably usable for a Chinese guy who doesn't speak any English - and I wouldn't even have to think about it! Then, if I see a real market for that language, I would take the time to do a proper translation. With the current situation, those people probably wouldn't be able to use my program at all.

The dll for handling a simple dictionary wouldn't be more than a few kB, so loading that wouldn't be a problem. The entire dictionary would, of course, *not* be loaded all at once, but literals would be fetched when needed.
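
The "fetched when needed" access pattern described here can be sketched in Python as below. This is only an illustration of on-demand lookup with caching: `LazyDictionary`, `BACKING_STORE`, and `fetch_count` are hypothetical names, the backing store is a plain dict standing in for whatever resource file would hold the literals, and the sketch says nothing about the actual cost of loading an extra resource on Windows.

```python
# Hypothetical stand-in for an on-disk dictionary resource.
BACKING_STORE = {"close": "Fermer", "print": "Imprimer"}


class LazyDictionary:
    """Fetches literals one at a time and caches them; nothing is loaded up front."""

    def __init__(self, fetch_one):
        self._fetch_one = fetch_one  # callable: key -> str, or None if absent
        self._cache = {}
        self.fetch_count = 0         # how many times the backing store was hit

    def get(self, key, default=None):
        if key not in self._cache:
            # First request for this key: go to the backing store once.
            self.fetch_count += 1
            self._cache[key] = self._fetch_one(key)
        value = self._cache[key]
        return default if value is None else value


literals = LazyDictionary(BACKING_STORE.get)
```

Repeated calls such as `literals.get("close")` hit the backing store only the first time, and a missing key falls back to the caller's default (typically the untranslated source string).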

And there's no reason to not think about changes only because of what has been done in the past. Why would all existing apps need rewriting just because a dictionary is added?

If companies sell bad apps, then the ordinary market rules will make themselves known (unless the company has a monopoly;)). The app will sell less, customers will complain - either they fix it or someone else will take their place. That's their problem, not yours.

It's never too late to start implementing such a feature. It would've been nice if it was included in Vista, but at least there's the autoupdate functionality. Each language would weigh in at less than 100kB, and that's a lot of literals when compressed.

Just give it a serious thought. Think of it like this: fewer Windows apps (yes, no common dictionary leads to this) for the Chinese market creates an opportunity for other OSs that may implement such a feature. The ratio of available apps for that market will be less charitable in Windows' favor, which leads to greater market shares for other OSs.

# Michael S. Kaplan on 7 Feb 2007 7:00 PM:

I guess we will just have to agree to disagree, since each of us is sure that (a) the other is wrong, (b) the other does not understand what they are being told, and (c) the other is not going to change their mind based on anything they are told.

Each of us is sure they are correct, so perhaps it is time to move on now -- you have your answer to your original question (if you want Microsoft's "dictionary" then get the glossaries as that is all there is), so further conversation is not going to accomplish anything....

# Jonas Beckeman on 7 Feb 2007 7:51 PM:

I partly agree. I still think you haven't presented any valid argument for why it shouldn't be done. The only reason I can genuinely understand is that you don't want to lose some of the revenue for MSDN subscriptions.

I think it's very short-sighted to focus on MS apps. You're in for a great battle in countries like China, and there it doesn't come down to your apps, it's about your OS.

But hey, even though I really like some of MS' stuff (.NET, XNA, WPF is OKish), most of those things are being mono:ed so I'm soon going to be fine with putting my time and effort into another platform.

So long

# Michael S. Kaplan on 7 Feb 2007 10:27 PM:

Sigh.... desire to not encourage bad design, lack of desire to suggest people rearchitect existing solutions, the fact that it is entirely out of the scope of what Microsoft was providing, the fact that almost no apps can use the exact strings unmodified for their app without actually causing more confusion than they create, the fact that it is just plain bad design to do a "fake localization" job by having a few random strings haphazardly translated in this way rather than a planned professional localization job....

(pauses to take a breath)

Never mind, this is still going nowhere. It's done.

