à ≠ a (unless à = a)

by Michael S. Kaplan, published on 2007/02/25 04:21 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/02/25/1756443.aspx


The mail I got read:

Hi

I'm Laurent Gébeau, French MVP on Windows. I met you at the last MVP summit in Seattle, and learned a lot about localisation. Since then I love to read your blog, very instructive.

I'm actually using Vista on a daily basis, and got one issue: I'm not very good at spelling, so I tend to write words without accents (é, è, à, ç ...).

And I can see that in Vista if I look for ete or ETE, it won't find été.

Another example (and that's the one which made me discover it): in Control Panel, if I type reseau I don't find anything (it should have found réseau).

Is it really expected (by design) or not? From my point of view this is very bad.

I've bugged it in the beta of Vista: # is 215797.

Thanks

Laurent
 

This is for the most part completely by design, since there is a very elemental truth that

à != a

unless the comparison is done while ignoring diacritics, and sometimes not even then.

It is a challenge on Windows to balance the needs of users who do not want these distinctions made against the needs of those who do (some in the latter group can even get quite offended by what ignoring the distinction appears to them to be: an abuse of language!).

One could argue that in search it makes sense to be able to ignore these distinctions as an option, irregardless of whether that is allowed in ordering or in non-search identity questions. And indeed this is a very sensible option for a search algorithm to offer. It is available in some applications such as Microsoft Word, and obviously it can be used programmatically in both managed and unmanaged code if you call the real collation functions from the NLS API or the System.Globalization namespace.
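To make the idea of an optional diacritic-blind search concrete, here is a minimal Python sketch. It is not the NLS API or System.Globalization collation mentioned above, and it is not how Windows search works; it assumes a much cruder technique (canonical decomposition followed by dropping combining marks). The helper names strip_diacritics, fold, and find_ignoring_diacritics are hypothetical, made up for this illustration.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop combining marks (a rough approximation)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(text: str) -> str:
    """Hypothetical search key: ignore diacritics and case."""
    return strip_diacritics(text).casefold()


def find_ignoring_diacritics(query: str, candidates: list[str]) -> list[str]:
    """Return the candidates that contain the query once both are folded."""
    key = fold(query)
    return [c for c in candidates if key in fold(c)]


names = ["été", "réseau", "network", "à"]
print(find_ignoring_diacritics("ETE", names))     # ['été']
print(find_ignoring_diacritics("reseau", names))  # ['réseau']
print("à" == "a", fold("à") == fold("a"))         # False True

# Letters with no canonical decomposition are left untouched by this
# approach, so "ø" does not become "o" here.
print(strip_diacritics("ø") == "ø")               # True
```

Because the folding is applied only when building the search key, the stored strings keep their accents, and an exact comparison still treats à and a as different. A real implementation would more likely lean on a locale-aware collation library than on this kind of character stripping.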

 

This post brought to you by à (U+00e0, a.k.a. LATIN SMALL LETTER A WITH GRAVE)


# Kemp on 25 Feb 2007 9:24 AM:

Irregardless? ;-)

In regard to the post, I would personally say that diacritics matter, being there because they are actually different letters, except in the case of searching, where letting people be lazy can be a very nice thing to do.

Also, my first post on this blog despite reading it for a looong time now :-)

# Zebra on 25 Feb 2007 4:36 PM:

In a way, I think it's better that Vista doesn't find été or marché... I prefer a simple system to something that does too many things automatically that in the end we don't want. It is often very annoying to change a set-up that is automatically doing something that we don't want. Let's keep it simple, user-friendly!

# LC on 26 Feb 2007 12:46 AM:

When I am searching for "F" I am so annoyed that Windows doesn't return matches for "E". After all E is just an F with an extra line ;-)

# Wilhelm Svenselius on 26 Feb 2007 1:52 AM:

I think the current behaviour is rational and sound. I live in Sweden, so we have these odd characters 'å', 'ä', and 'ö'. Yet I don't know a single person who would ever replace them with 'a' and 'o', because those are considered ENTIRELY DIFFERENT LETTERS. This is even reflected in the sort order, where 'åäö' comes at the very end of the alphabet.

Furthermore, some Swedish words actually turn into completely different ones, with a different meaning, if you substitute 'åä' för 'a' and/or 'ö' för 'o'. So it would definitely be a bug if Windows/.NET allowed you to mix these up.

# Mike on 26 Feb 2007 3:08 AM:

While native users of a language might not consider replacing them, other language users may be trying to locate text with diacritics, especially using keyboards that they are not familiar with, or simply being unaware of how to compose with dead keys.

(I was at a hotel in the north of Sweden earlier this month, and watched a number of tourists struggling to use the lobby computer because they had never struck a keyboard requiring an Alt-Gr key before, and couldn't get to the @ symbol).

In addition, these words may get borrowed into English (say) without the diacritics and it's very useful to have the option to be diacritic-blind in the search for complete coverage. While my French is very imperfect I actually use a web-search on the unaccented form of a word to locate how the accents should properly appear. I can't do that with rigid searches.

Consider also that Microsoft software applications may be auto-correcting your text to accented/diacritic form even though you may not have a clue how to apply those manually. Two examples would be 1) Word which has for years done cafe->café; and 2) the Windows Media Player library function which may overwrite your MP3 file names and metadata with (the true) accented forms of song-titles. As it stands you cannot locate these files anymore with the accent-free search.

So "simple" for one language user, is not for others.

# Michael S. Kaplan on 26 Feb 2007 3:14 AM:

Very good points, all -- stay tuned for some further thought on this matter, which should be posted soon. :-)

# Mike on 26 Feb 2007 3:21 AM:

I should also add - from much recent experience booking hotels around Northern and Eastern Europe - that URLs often remove all the diacritics, so if I was searching for all instances of the name of a place, I would not find both the accented form and the "internet form" in a bookmark/favourite or list of web resources.

To pick a Swedish example, the website for the island of Djurgården is www.djurgarden.net

# Laurent Gébeau on 17 Mar 2007 8:32 PM:

hi

Thanks for publishing !!

This has been corrected: in Control Panel, if I type reseau I find results

BUT

the results are not the same as if I type réseau !!! and different from "network" too

That sounds very strange.

I also have to tell you that I really loved your previous presentation in 2005. I was very frustrated not to be able to continue to talk and exchange with your team this year :-(

# Michael S. Kaplan on 18 Mar 2007 2:23 AM:

Ah, I would say next year let me know -- we had a day long presentation that we did with one MVP and all the various teams in the group, and if others were interested I am sure we could accommodate. :-)


referenced by

2007/02/26 The search for someone who does Search correctly
