by Michael S. Kaplan, published on 2005/11/13 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/11/13/492179.aspx
A wise man (well, I think it was the comedian Emo Philips, does he count?) once told the following little fable:
I had an argument with my father. I argued that Plato was the father of philosophy. My dad of course took the opposite position, that I should wax the kitchen floor.
I said: "Well, the kitchen floor doesn't exist! At least not in the permanent sense that the concept 'floor' does."
He said: "Do you think the concept 'your skull' exists?"
I said: 'Yes'. And then he surprised me by juxtaposing the two concepts.
Someone was trying to tell me about it the other day but I made it clear I had already heard it (my sources of knowledge are numerous but perhaps not impressive).
Later on, I decided I would juxtapose some things in a blog post. :-)
Here goes....
The concept of alphabetic case is interesting. And so is the concept of linguistic collation. So let's juxtapose those two concepts for a moment.
Which comes first -- uppercase or lowercase?
Well, in a binary sort, the answer is simple -- uppercase comes first. Every time. It is how code points are encoded in Unicode. Period.
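To make the binary case concrete, here is a minimal sketch in Python, whose default string comparison is by code point. The word list is made up for illustration, and it covers only the basic Latin letters.

```python
# Python compares str values code point by code point, which is
# effectively a binary (ordinal) sort for these sample words.
words = ["apple", "Banana", "Apple", "banana"]

binary_sorted = sorted(words)

# 'A' is U+0041 and 'a' is U+0061, so every basic Latin uppercase
# letter has a lower code point than every lowercase one.
print(hex(ord("A")), hex(ord("a")))
print(binary_sorted)  # ['Apple', 'Banana', 'apple', 'banana']
```

Note that "Banana" lands before "apple": in a binary sort, case outranks the letter itself.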
In a dictionary, uppercase often comes first as well (or the two are put together as multiple definitions in one entry).
In linguistic collations on Windows, in most locales [1], lowercase by convention comes first.
As I said in the post "Why do the high surrogates have the low numbers?", however, it is simply a conceptual construct.
When you deal with collation in terms of weights, it is easy to take the uppercase letters as being somehow heavier since they are usually (bordering on always) bigger and taller.
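The "heavier uppercase" picture can be sketched as a toy two-level sort key in Python. This is only an illustration of the concept, not how Windows actually builds its sort keys; the function name `toy_sort_key` and the weight values 0 and 1 are assumptions made up for the example.

```python
def toy_sort_key(text):
    """Hypothetical two-level key: letters first, case weights second."""
    # Level 1: compare the letters themselves, ignoring case.
    letters = [ch.lower() for ch in text]
    # Level 2: break ties with case weights, where uppercase is
    # "heavier" (1) than lowercase (0), so lowercase sorts first.
    case_weights = [1 if ch.isupper() else 0 for ch in text]
    return (letters, case_weights)

words = ["Apple", "banana", "apple", "Banana", "APPLE"]
linguistic_sorted = sorted(words, key=toy_sort_key)
print(linguistic_sorted)
# ['apple', 'Apple', 'APPLE', 'banana', 'Banana']
```

Here case only matters once the letters are otherwise equal, so "banana" no longer jumps ahead of or behind the "apple" variants just because of its first letter's case.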
I have had people tell me that they think this is incorrect; they believe that it should always be the other way around. But for the most part that is simply rebelling against the construct we are using, and preferring a different one.
So, those of you out there who think uppercase should be sorted before lowercase, what is the conceptual construct you are using?
Just curious....
1 - Bonus points for anyone who knows which collation(s) under Windows break this rule without testing them first!
This post brought to you by "ṏ" (U+1e4f, a.k.a. LATIN SMALL LETTER O WITH TILDE AND DIAERESIS)
referenced by
2010/03/09 Coloring outside the lines in the a-ness of the Hungarian Technical Sort
2010/03/06 Burn Windows Burn (aka If we want to unsay *this* one, we cannot say "Mu")
2007/12/06 In SQL Server, A-Z, A-z, a-Z, and a-z may not mean the same thing!
2006/11/01 If you add enough characters to a sort, intuitive distinction can suffer
2005/11/30 Expectations around collation
2005/11/26 Technically it *is* a hungarian sort
2005/11/18 Some sort of order to collation