French collation: When diacritical becomes diabolical

by Michael S. Kaplan, published on 2004/12/31 02:08 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2004/12/31/344739.aspx


A few months ago, I was talking with a customer who simply could not understand the sorting results she was seeing (in this case in a table in MS Word 2003). She distilled the problem down to a small repro using a short list of words:

word      meaning
cote      dimension
côte      coast
coté      with dimensions
côté      side

(at this point I knew both the language and what was causing her to have problems, and you may know too!)

What she noticed was that if she marked the left column as French text (she tried several French locales, including France and Canada), the order was like this:

cote

côte

coté

côté

while if the column was marked as English, the results looked like this:

cote

coté

côte

côté

She could not understand any sorting rules that would explain the way that the words were sorting in the "French" table.

So I talked with her about the Académie française and its specific preference for how letters with diacritics are to be sorted (I also mentioned, incidentally, that I thought they had abolished the circumflex in many words in the early 1990s, but honestly I did not know whether that applied to these two words that use LATIN SMALL LETTER O WITH CIRCUMFLEX!).

The specific rule I was talking about is that diacritics are evaluated from right to left rather than from left to right. Accents only come into play when two words have the same letters (as all four of these words do); the question is which end of the word the comparison of accents starts from. Thus côte comes before coté, rather than after it as it does in languages like English that evaluate them from left to right, because côte has no ACUTE on the final "e" while coté does. In English and most other languages, the evaluation starts on the left, so the CIRCUMFLEX (or lack thereof) on the "o" is the controlling factor in ordering.

You can see it described in the French sort order example in Appendix D of the first edition of Developing International Software for Windows 95 and Windows NT.
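To make the right-to-left rule concrete, here is a minimal Python sketch. It is not how Windows or Word actually implement collation; it assumes a simplified two-level comparison (base letters first, then accents) and, as a further assumption, orders accents simply by the code point of the combining mark, whereas real collation tables assign their own weights. The helper names split_word and sort_key are hypothetical.

```python
import unicodedata

def split_word(word):
    """Return (base_letters, accents), where accents[i] is the tuple of
    combining marks attached to base letter i (empty tuple if none)."""
    bases = []
    accents = []
    # NFD splits precomposed letters such as "ô" into "o" + U+0302.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and accents:
            # Attach the combining mark to the preceding base letter.
            # Assumption: marks are weighted by code point only.
            accents[-1] = accents[-1] + (ord(ch),)
        else:
            bases.append(ch.lower())
            accents.append(())
    return "".join(bases), accents

def sort_key(word, french=False):
    bases, accents = split_word(word)
    # Primary level: letters only, always compared left to right.
    # Secondary level: accents, compared left to right normally, or
    # right to left ("reverse diacritics") for French.
    secondary = list(reversed(accents)) if french else accents
    return (bases, secondary)

words = ["côté", "coté", "côte", "cote"]
print(sorted(words, key=sort_key))
# ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=lambda w: sort_key(w, french=True)))
# ['cote', 'côte', 'coté', 'côté']
```

With these four words the two keys reproduce the two orderings the customer saw. Reversing the list of accents is the whole trick; the letters themselves are still compared left to right, and an unaccented letter (an empty tuple) always sorts before an accented one.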

This particular rule is interesting in that, of all the native French speakers with whom I have spoken, I never found anyone who could explain it to me. In their defense, they were pretty much all aware that dictionaries use special rules, but if you think about it, there would seldom be a time when one could not find the word one wanted in a dictionary that used this rule. After living with the language for a lifetime, I am sure things like this are simply understood subconsciously when they occur. This phenomenon is common to almost all languages: pretty much all of them have rules that native speakers understand even if they cannot articulate them.

Another interesting point about language can be seen here: this use of "reverse diacritics" is found in every French locale supported by Windows. It is fascinating to see the influence that the "mother country" of a language can sometimes have on the language in the other places where it is spoken.

When changes are made, whether by long-established organizations such as the Académie or by direct legislation, other countries will in many cases tend to pick up those changes. To me, the reasons behind such language reforms spreading this way are fascinating to contemplate. It is certainly not any kind of sovereignty or true language "ownership" issue (and in future posts I may discuss specific cases in other languages where changes were at times intentionally not picked up!).

But I am at times amazed at the way people seem to see language as transcending the petty things. It's the kind of behavior that makes me interested in linguistic issues. :-)


This post brought to you by "Ô" and "É" (U+00D4, a.k.a. LATIN CAPITAL LETTER O WITH CIRCUMFLEX, and U+00C9, a.k.a. LATIN CAPITAL LETTER E WITH ACUTE)


# Sébastien Nussbaumer on 31 Dec 2004 1:00 AM:

Welcome to France :)
I didn't even know such ordering rules existed in my language! BTW, the meanings of words 2 and 3 have been mixed up:
- "côte" means "coast"
- "coté" means "with dimensions" (in fact, it's used to say that an object has had its dimensions set precisely. It's mainly used in industrial drawing when you design a part, of an engine for example. The object is "coté" from the moment the designers have decided what its dimensions are)

# Serge Wautier on 31 Dec 2004 1:41 AM:

Michael,

Interesting. I'm a French speaker but I didn't know that rule! Nor did I know about the abolition of the ^ ...

My feeling is that in this case (the 4 words you mention), sorting the French way seems phonetically better than the default sort: ô is phonetically much closer to o than é is to e. How this explanation would lead to the 'reverse diacritics' is beyond me, though.

PS: Small translation bug :
coté = with dimensions
côte = coast

# David Brabant on 31 Dec 2004 2:39 AM:

Hi Michael,

Just a minor correction: the translations you provide for the French words are wrong. See here:
http://www.french-linguistics.co.uk/cgi-bin/frdict.pl?login=&password=&entry=cote
About sorting in French, I can talk about my own experience and what we (in my company) usually do for our own software: we just ignore accents. For most applications (besides dictionaries, maybe), that's perfectly OK :-)

# Michael Kaplan on 31 Dec 2004 7:55 AM:

Thanks for helping with the translation bug -- I looked back at her mail and see it was completely my fault. :-)

I have found multiple mentions about the spelling reform, including http://www.tiscali.co.uk/reference/encyclopaedia/hutchinson/m0005887.html ....

# George on 31 Dec 2004 1:04 PM:

I don't think this is diabolical, though. It is very weird, especially when people who speak French have never heard of the rule.

# Michael S. Kaplan on 14 Feb 2008 10:35 AM:

# Mod Mekkawi on 4 Jul 2008 8:45 AM:

Thanks so much for this information, all new to me, including the elimination of the accent circumflex.

I agree with what David said above: except for dictionaries and the most demanding academic needs, I would ignore accents when sorting words with diacritics.

BTW, does anyone know a way to sort an HTML list alphabetically when the lines are hyperlinked? -- Mod Mekkawi, Howard U, Washington, DC

# Tom on 3 Sep 2011 6:00 AM:

Yes, these changes are awful for linguists and translators, we really have to battle to keep updated... www.alafrench.com for French translation services by the way.


referenced by

2006/11/17 Korean + French == ????

2005/08/10 Double compressions -- Hungarian goulash?

2005/07/20 More on sort elements
