by Michael S. Kaplan, published on 2006/07/03 16:30 -04:00, original URI:

Back at the end of May, Dare posted some thoughts about the language of search in his post Our Multi-Lingual World and Search Indexes. It seems that he and the Windows Desktop Search folks have both come to the conclusion that they "...should consider simply choosing the indexer based on the locale/language of the Operating System."

We'll ignore for a minute that this is a very non-specific setting that will almost never be correct, since the meaningful choices for a user are the user locale and the user UI language, and neither of those is based on an OS-level setting.

Let's focus instead on how badly this ignores scenarios that are becoming more and more common, such as using the Windows Multilingual User Interface on a server to support many individual users, or the ever-growing number of multilingual users (evident even in the small sample of nine comments on Dare's post!).

I can't help wondering whether the approach that Google and Live appear to be using (building huge multilingual interfaces and perhaps throwing away the cases where no meaningful content is found) is really such a bad idea. It is obviously a brute-force approach, but as long as computers keep failing at machine translation (MT), it seems clear that they can also fail at language guessing whenever it is not done by brute force.

And I definitely would like to see fewer people relying on the locale/language of the OS....


Ruben on 3 Jul 2006 8:02 PM:

I think the problem Dare touched upon is quite widespread. There's so much 'multilingual' or 'locale-sensitive' software around that feels very much like software designed by, say, English speakers who think they know how non-English speakers work (i.e., like themselves). Invariably, that's not how it works, though. Which is a pain.

I think an extended visit to mainland Europe should clear that up. (Large parts of Europe are somewhat bilingual as far as computing is concerned. Except France, that is; the French do try to be as monolingual as possible.)

And a visit to South-East Asia should clear up all those preconceptions about how 'alphabets' work.

Michael S. Kaplan on 4 Jul 2006 1:56 AM:

I will go a step further than that, because in my opinion a lot of the so-called "solutions" are inadequate even for English speakers who have multilingual data....

Erik Fortune on 5 Jul 2006 4:44 AM:

I'll second that, heartily!

Nobody should care about the system language except possibly the servicing engine and whoever reads the system event logs.

-- Erik

