SIAO is still underwhelmed by search engines (all of them)

by Michael S. Kaplan, published on 2007/07/16 03:16 -04:00, original URI:

You may have read in Arial Unicode MS effectively [bites|sucks|blows] about how Microsoft MVP Omi Azad likes to point out bugs....

Regular reader Cristian Secară likes to do the same kind of thing. :-)

Just the other day, he sent the following to me:

With Google I'm not sure what to believe: just the other day Google was able to find words written with cedillas when searching them in pure ASCII (and vice versa), but today this no longer appears to work (maybe today they run an emergency server :). But as far as words with commas are involved, both [Live] and Google will only find them when spelling exactly.

Also [Live] fails to recognize the Romanian „ and ” curly quotation marks, but works as expected with the English “ and ” curly quotation marks. At the same time, Google works well with all of them.

I tend to do a lot of comparison/contrast type searches between search engines using various international features, so I know exactly what Cristi is talking about here. Google is constantly tweaking things (Windows Live search does too, but the changes that I notice seem to be more batched up and episodic).

I do find it disappointing that no one is doing a better job with the cedilla/comma below thing, though I have higher hopes for the Live Desktop Search since they have actually asked me about Romanian in the past (I suspect they will do better on Vista than on prior versions, for the obvious reason that this is when the underlying updates were made!).

For what it is worth, I suspect that Live Desktop search will also do better with the Romanian quotes. :-)

Which is not to say that web search shouldn't do better here.

It should. For all of the various search engines.

Though until they start jumping through those Unicode normalization hoops, I suspect the extra language features may have to wait in line....
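For anyone wondering what the matching Cristi is asking for might look like, here is a minimal sketch in Python. It is an illustration only and not how any of these search engines actually work: the search_key function and the QUOTE_FOLD table are hypothetical names made up for this example. It assumes a loose match is wanted, so it decomposes each string, drops the combining marks, and folds the Romanian and English curly quotes to a plain ASCII quote.

```python
import unicodedata

# Hypothetical table for this sketch: fold curly double quotes to ASCII '"'.
QUOTE_FOLD = str.maketrans({
    "\u201E": '"',  # „ DOUBLE LOW-9 QUOTATION MARK (Romanian opening quote)
    "\u201C": '"',  # “ LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',  # ” RIGHT DOUBLE QUOTATION MARK
})


def search_key(text: str) -> str:
    """Return a loose comparison key (hypothetical helper, not a real engine's API)."""
    text = text.translate(QUOTE_FOLD)
    # NFD splits a precomposed letter into base letter + combining mark:
    #   U+015F (s with cedilla)     -> s + U+0327 COMBINING CEDILLA
    #   U+0219 (s with comma below) -> s + U+0326 COMBINING COMMA BELOW
    decomposed = unicodedata.normalize("NFD", text)
    # Dropping the combining marks leaves the same base letter for both forms.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


cedilla = "Timi\u015Foara"      # spelled with s-cedilla
comma_below = "Timi\u0219oara"  # spelled with s-comma-below

print(search_key(cedilla))                             # timisoara
print(search_key(cedilla) == search_key(comma_below))  # True
print(search_key("\u201Ecuv\u00E2nt\u201D"))           # "cuvant"
```

Note that normalization on its own does not make the two spellings equal, since the cedilla and comma below forms are distinct characters with distinct decompositions. It is the mark stripping that makes them match, and that step is lossy, so a real engine would presumably want it only as a fallback behind exact matching.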


This post brought to you by ⺞ (U+2e9e, a.k.a. CJK RADICAL DEATH)

