by Michael S. Kaplan, published on 2008/12/05 10:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/12/05/9178155.aspx

The mail I got the other day from Brett was pretty amusing (reposted uncensored with his permission, with no hint as to whether he was Anonymous1 or Anonymous2):

Regards this conversation … anonymized for obvious reasons …

Anonymous 1 [10:09 AM]:
 If you haven't had sex is six months, you're not accountable for who you sleep with.
 (imdb is your friend)
Anonymous 2 [10:10 AM]:
 weird, I couldn't find it there.
Anonymous 1 [10:10 AM]:
 I knew it was 'six months' but I didn't know the phrasing. That probably helped my search.
Anonymous 2 [10:10 AM]:
 i tried '6 months'
 really search should figure that out

Perhaps SORT_NAMES_AS_NUMBERS for next next version?

The combination of SORT_DIGITS_AS_NUMBERS and SORT_NAMES_AS_NUMBERS would be very powerful I think …

Though I think this is what it is called stemming right?  Does it belong in LCMapString() or higher layer, it’s hard to say.  I assume most stemming is done in MSSearch.  But this stemming is so basic…

At any rate, that was my random thought for the day.

Now a SORT_DIGITS_AS_NUMBERS kind of flag is what I'd expect to see if StrCmpLogicalW were folded into the built-in NLS collation functions, something I hypothesized way back in 2005 when I talked about What is up with number sorting? and related blogs.
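
To make the "digits as numbers" idea concrete, here is a minimal sketch in Python. It is not how StrCmpLogicalW or any NLS function is implemented; the function name `digits_as_numbers_key` is made up for illustration, and it only handles ASCII digits.

```python
import re

def digits_as_numbers_key(text):
    """Hypothetical sort key: runs of ASCII digits compare by numeric value."""
    # re.split with a capture group alternates non-digit / digit pieces,
    # so odd indexes are always the digit runs.
    parts = re.split(r"([0-9]+)", text)
    key = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            key.append((0, int(part), ""))          # digit run, compared as a number
        elif part:
            key.append((1, 0, part.casefold()))     # text run, compared as text
    return key

names = ["file10.txt", "file2.txt", "file1.txt"]
plain = sorted(names)                                # code point order
logical = sorted(names, key=digits_as_numbers_key)   # digits treated as numbers
print(plain)
print(logical)
```

In plain code point order "file10.txt" lands before "file2.txt"; with the key above it lands after, which is the behavior people tend to expect from a sorted list of file names.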

In fact if you look at Larry Osterman's comment in that blog (here), his wife had apparently been trying to get Larry to "fix" this kind of problem in Outlook Express for years. :-)

I know how Larry feels; I get calls from my father almost annually asking if I can fix problems in Outlook (usual answer: NO, but sometimes I get more specific depending on my mood!). Of course it is easier for me than for Larry given the two different relationships we're talking about. Plus since I was able to get Larry a somewhat informal but kind of official answer (here), he is still in better shape to answer the question; my father's Outlook questions are almost always about features I don't use!

I did once have a short contact with the Outlook team to write an add-in, but I don't even remember what the add-in was. But I need to stop digressing!

Back to the idea of sorting number words as numbers (I don't think the SORT_NAMES_AS_NUMBERS flag name would be intuitive to anyone other than Brett and anyone he explained the idea to, for as long as they remembered his explanation!).

The technical problems here are hideously complex since even in English there are both cardinal and ordinal numbers to deal with (and the thorny issue of whether mixing them e.g. FOURTH vs. FIVE would see them treated as being the same kind of number or not).
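
The cardinal/ordinal question can be sketched in a few lines of Python. Everything here is hypothetical: the two tiny English-only word tables, the function name `number_word_key`, and the `mix_kinds` parameter are assumptions made up for illustration, and nothing like them exists in the NLS collation functions.

```python
# Hypothetical lookup tables: English only and deliberately tiny.
CARDINALS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5, "sixth": 6}

def number_word_key(word, mix_kinds=True):
    """Sort key that orders known number words by value.

    mix_kinds=True interleaves ordinals with cardinals by value;
    mix_kinds=False sorts every cardinal before every ordinal.
    """
    w = word.casefold()
    if w in CARDINALS:
        return (0, CARDINALS[w], 0, "")
    if w in ORDINALS:
        kind = 0 if mix_kinds else 1
        return (kind, ORDINALS[w], 1, "")
    return (2, 0, 0, w)  # not a number word: sort last, as ordinary text

words = ["FOURTH", "FIVE", "two", "first"]
mixed = sorted(words, key=number_word_key)
separate = sorted(words, key=lambda w: number_word_key(w, mix_kinds=False))
print(mixed)
print(separate)
```

Neither ordering is obviously the right one, and that is with one language and twelve words.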

Then you add languages with even more types of numbers, like Japanese, and the mix of different types of numbers there.

And then when numbers from different languages mix.

Which (getting back to the IM conversation), isn't there a language where sex actually means six which would mean 6 in this new sort? :-)

Though I agree it is a great idea for a feature in search, its use in collation functions that are designed for returning information for sorted lists is much more questionable.

And yes, Brett is right -- we are talking about stemming here (this is a topic I briefly talked about back in 2005 in IStemmer'ed the tide), and having this kind of feature in searching algorithms does end up feeling like a pretty basic requirement no matter how complicated it may end up being across all languages...
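
On the search side, the basic version of what Brett is asking for might look like the Python sketch below. It is not how MSSearch or any real stemmer works; the `NUMBER_WORDS` table and the `normalize_numbers` and `matches` helpers are hypothetical names, and the table is assumed to be English-only.

```python
import re

# Hypothetical English-only table mapping number words to digit strings.
NUMBER_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}

def normalize_numbers(text):
    """Lowercase, split into word tokens, and replace number words with digits."""
    tokens = re.findall(r"\w+", text.casefold())
    return [NUMBER_WORDS.get(t, t) for t in tokens]

def matches(query, document):
    """True if the normalized query is a contiguous run in the normalized document."""
    q = normalize_numbers(query)
    d = normalize_numbers(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

print(matches("6 months", "It has been six months"))
print(matches("7 months", "It has been six months"))
```

A table that is not tied to a single language would have to decide what to do with a token like "sex", which is why the language of the text matters before any of this can be applied.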

Now as for the whole IM and the time of day markers, I specifically offered to remove all of that and he said not to bother, so hopefully no one will wonder about all of this happening in the middle of the morning. :-)


This blog brought to you by ፮ (U+136e, aka ETHIOPIC DIGIT SIX)

Michael Madsen on 6 Dec 2008 12:01 PM:

"Which (getting back to the IM conversation), isn't there a language where sex actually means six which would mean 6 in this new sort? :-)"

Actually, there are several, according to Wiktionary (http://en.wiktionary.org/wiki/six): Swedish, Interlingua, Icelandic, and Latin.

