by Michael S. Kaplan, published on 2006/02/12 18:05 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/02/12/530610.aspx
I pointed out in the post Some sort of order to collation that it is easy to dismiss linguistic issues when one is thinking about collation. As Steven Pinker pointed out in The Language Instinct:
...for the same reason that alphabetical order is similar across the Hebrew, Greek, Roman, and Cyrillic alphabets. There is nothing special about alphabetical order; it was just the order that the Canaanites invented, and all Western alphabets came from theirs.
But it is not as simple as that. Looking at the beginnings of the alphabets for Hebrew:
vs. Greek:
vs. Russian:
vs. English:
and so on, we are as struck by the similarities (e.g. in Hebrew there is actually both a בּ (bet) and a ב (vet) that show up after the Alef, just as there is both a Бб (be) and a Вв (ve) after the А in Russian) as by the differences (e.g. there are two 'v' sounds in Hebrew, neither of which is anywhere near the 'v' in English -- or that in Hebrew א (alef) is silent while in most other languages it is not).
Obviously there is a commonality here that is not accidental, but just as obviously the actual letters present and the order of those letters have changed over time in different languages.
There are many possible reasons for change here, and looking at the differences between the original order the Canaanites gave us and the order in any language today, many of the reasons for changes in order have either an orthographic or a phonemic basis.
Which brings us to something of a linguistic basis, doesn't it? :-)
Now this is especially true as we look at languages that pick up the use of a script such as Latin, Cyrillic, or Arabic and find the need to add letters. Because obviously they need a place to put those additional letters within their alphabetic order, and there are obvious reasons to choose a linguistic basis for that ordering.
Now this ordering may conflict with what a user of the script but not of the language may expect for a letter -- thus ڇ (tcheheh) will seem to many Arabic language readers like a ح (hah) with four funny dots in it, similar to how I (as a speaker of English) might look at ṻ (u with macron and diaeresis) as a u with some funny smudges on top of it.
Am I wrong? Certainly.
Is that Arabic reader? Yes.
But in the context of both the English and Arabic languages, we are both 100% correct.
And while my decision of where to place them in an ordered list will likely be after ح and u, on the arbitrary basis that they look a bit like them, it is not really going to be the same for languages that might make use of the characters.
Where they might be placed in the alphabetical order of a language that makes use of ڇ or ṻ is likely to be very different. Since our answer was on the basis of ignorance of what the letter is, it would only make sense that their knowledge of the letter and what it does will guide their notion of where it belongs alphabetically.
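To make the contrast a bit more concrete, here is a rough sketch in Python of three ways to order the same words: by raw code point, by the "it looks like a u" view (strip the marks and compare base letters), and by an explicit alphabet. The alphabet used here is entirely made up for illustration; it is not the order of any real language, and the names looks_like_key, alphabet_key, and HYPOTHETICAL_ALPHABET are my own. Real collation involves a good deal more than this.

```python
import unicodedata

def looks_like_key(word):
    """The 'ignorant' view: compare by base letters with the marks
    stripped, using the original string only as a tie-breaker."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, word)

def alphabet_key(alphabet):
    """The 'knowledgeable' view: build a sort key from an explicit
    alphabet, given as a list of letters in their proper order."""
    rank = {letter: i for i, letter in enumerate(alphabet)}

    def key(word):
        # Letters that are not in the alphabet sort after all known ones.
        return [rank.get(ch, len(alphabet)) for ch in word]

    return key

# HYPOTHETICAL alphabet, invented for this example only: it treats
# U+1E7B (u with macron and diaeresis) as its own letter, just before u.
HYPOTHETICAL_ALPHABET = (
    list("abcdefghijklmnopqrst") + ["\u1e7b"] + list("uvwxyz")
)

words = ["\u1e7bb", "ua", "va", "uc"]

by_code_point = sorted(words)                     # raw code point order
by_looks = sorted(words, key=looks_like_key)      # "a u with smudges"
by_alphabet = sorted(words, key=alphabet_key(HYPOTHETICAL_ALPHABET))

print(by_code_point)  # the U+1E7B word comes last, after "va"
print(by_looks)       # the U+1E7B word lands between "ua" and "uc"
print(by_alphabet)    # the U+1E7B word comes first, before "ua"
```

Note that the mark-stripping trick only works for ṻ because Unicode gives it a canonical decomposition into u plus combining marks. ڇ has no such decomposition, so placing it "after ح because it looks like one" would itself need an explicit table, just as the language-specific order does.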
This is an issue that I will be posting about in the future, with some more specific examples, giving both the "ignorant" and "knowledgeable" viewpoints....
This post brought to you by "ڇ" (U+0687, a.k.a. ARABIC LETTER TCHEHEH)
referenced by
2008/08/28 Collation backstory?
2008/02/23 Despite progression, the bug calls out to me quite LAOdly