Sorting "Collate" all out

by Michael S. Kaplan, published on 2008/12/06 15:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/12/06/9181413.aspx


It was recently suggested to me that the terms COLLATION and COLLATE are confusing to people.

Not people over here, I'd guess.

And probably not so much the people who work in SQL Server or Access or really any database program -- anyone who places so much stock in keywords like ORDER BY is definitely in a class of their own, conceptually.

But it is a somewhat obvious fact that the set of people who can be considered PEOPLE is a superset of both the set of people who HAVE EVEN HEARD OF THIS BLOG and the set of people who have THE VAGUEST NOTION OF DATABASES.

Thus my supposition of the latter two sets not being confused is hardly evidence against the claim that the terms COLLATION and COLLATE are confusing to people.

They might be right.

I think that COLLATION and COLLATE are the kind of words most people have a vague notion about, a vague notion that may not translate into an understanding when it shows up in blogs or web sites or white papers or documentation.

And to make it worse I think that they are the kind of words that people think they should understand, which will lead them to not ask the obvious question of "WHAT DOES THAT WORD MEAN?" in any place that people can identify them as someone who did not know.

That passive "shame" (for lack of a better word) is a powerful force.

It is behind the fact that my Schrödinger's Cat is Dead T-shirt does so well at starting conversations with girls whether they understand the reference or not but tends to repel guys (ref: here). The shame of not knowing something that you have a vague sense that you ought to know? You could provide power for a small New Jersey town if you could somehow harness the energy of that....

Okay, so COLLATION is confusing.

But if you use it in a bunch of places (the aforementioned blogs or web sites or white papers or documentation) then what are your alternatives?

There is of course SORTING, a term I obviously can't dislike too much since it is the biggest word I use in the title of this Blog.

But to tell you the truth I don't really care much for that term, since although it is one common way to make use of parts of the collation API functions:

and so on, NONE OF THEM ACTUALLY SORT ANYTHING. At best they either provide a means to sort if you have a sorting algorithm (e.g. CompareString) or they do related ancillary activities that use the same data (e.g. FindNLSString).
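To make that split concrete, here is a minimal sketch in Python (deliberately not the Win32 API). The `compare` function is a hypothetical stand-in for something like CompareString: all it does is report how two strings relate, and it uses case folding as a crude substitute for real linguistic rules. The sorting itself has to come from somewhere else, in this case Python's built-in `sorted`.

```python
from functools import cmp_to_key


def compare(a: str, b: str) -> int:
    """Hypothetical comparison: negative, zero, or positive.

    This is an illustration only. It ignores case and nothing else,
    which is far simpler than what a real collation API does.
    """
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


words = ["pear", "Apple", "banana", "apple"]

# compare() sorts nothing by itself; sorted() supplies the algorithm
# and merely asks compare() about pairs of strings.
ordered = sorted(words, key=cmp_to_key(compare))
print(ordered)  # ['Apple', 'apple', 'banana', 'pear']
```

"Apple" and "apple" compare as equal here, so they simply stay in their original relative order because `sorted` is stable. Swap in any other sorting algorithm and `compare` would not need to change at all.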

Since none of them perform the action in question, one almost does want a word that, like the operations themselves, is one step removed from what the functions do and instead frames the concept.

One that matches what Unicode does (e.g. the Unicode Collation Algorithm, which some more people may have heard of).

So this potentially makes COLLATION a great word here, except for the fact that it is the wrong word here.

Now Microsoft as a company is not above just shoving the word down people's throats because they feel like it (in true screw-the-customer fashion), thus we insert things like ORDINAL into the mix because between the two kinds of confusion:

on the whole they prefer the former and not the latter. ORDINAL fits that model and is a true WTF term here, but COLLATION does not, at least not for the average person, the average member of the set of all people who are people.

And SORTING doesn't fit since sorting is a specific act that makes its way into methods and functions, and none of the methods or functions do it or use the word.

The model is SORTING (ALL ASSEMBLY REQUIRED): something that everybody intuitively understands -- since everybody understands both the WHICH COMES FIRST? question and the ARE THEY THE SAME? question -- but no function actually sorts.

And then that last point -- the WHICH COMES FIRST? question and the ARE THEY THE SAME? question -- suggests the other problem: that SORTING, while clearly part of the former question, is not so much part of the latter one.
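A small sketch of those two questions, in Python and with made-up names throughout: `compare`, `comes_first` and `same` are all hypothetical helpers, and case folding stands in for real collation data. The point is only that one underlying operation can answer both questions, and that the second answer involves no ordering whatsoever.

```python
def compare(a: str, b: str) -> int:
    """Hypothetical comparison: negative, zero, or positive."""
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


def comes_first(a: str, b: str) -> str:
    """WHICH COMES FIRST? Returns a when the two compare as equal."""
    return a if compare(a, b) <= 0 else b


def same(a: str, b: str) -> bool:
    """ARE THEY THE SAME? Only the zero result matters here."""
    return compare(a, b) == 0


names = ["Straße", "STRASSE", "strasse", "Strand"]

# An identity check: nothing gets put in order.
matches = [n for n in names if same(n, "strasse")]
print(matches)                           # ['Straße', 'STRASSE', 'strasse']
print(comes_first("strasse", "Strand"))  # Strand
```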

Alright, so neither word is really the right way to go. So what is?

Well, the answer is kind of built in to the excessively long explanation of the non-answers, isn't it?

I mean, if one desires a term that

then does one have to look any further than COMPARE and COMPARISON?

In the words of fictional former president Bartlet, What's next?¹


1 - The line is: "When I ask 'What's Next?' it means I'm ready to move on to other things. So, what's next?"

 

This blog no sponsor, and this sentence no verb


# Simon Buchan on 7 Dec 2008 8:08 PM:

What's wrong with 'ordering'?

# Michael S. Kaplan on 7 Dec 2008 9:40 PM:

ORDERING has the same problem as SORTING -- because when you are doing identity checks, the order is not relevant to what you are trying to do....

