the importance of symbols and punctuation

by Michael S. Kaplan, published on 2011/02/17 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/02/17/10130786.aspx


It's an old joke, but perhaps a few of you haven't heard it before....

An English professor wrote the words:

A woman without her man is nothing.

on the chalkboard and asked his students to punctuate it correctly.

All of the males in the class wrote:

A woman, without her man, is nothing.

All of the females in the class wrote:

A woman: without her, man is nothing.

One thing is perfectly clear, though:

Punctuation is powerful.

Perhaps it is a bit hasty to pass that NORM_IGNORESYMBOLS flag?

Though on the other hand, if I pass NORM_IGNORESYMBOLS | NORM_IGNORECASE, it has the benefit of allowing me to tell all of the kids in the class that they are wrong, and to flunk them.

Just don't forget that the space is also a symbol.

awomanwithouthermanisnothing

Welcome to Thailand, everyone! We didn't need the spaces anyway....
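
If you want to watch the three answers collapse into one, here is a rough sketch in Python. It is only an approximation and does not call the Windows collation functions: the helper name ignore_symbols_and_case is made up for this illustration, and it assumes that "symbol" means any character in a Unicode punctuation, symbol, or separator category, which is not guaranteed to match what NORM_IGNORESYMBOLS actually ignores.

```python
import unicodedata


def ignore_symbols_and_case(text: str) -> str:
    """Hypothetical stand-in for NORM_IGNORESYMBOLS | NORM_IGNORECASE.

    Assumption: a "symbol" is any character whose Unicode general
    category starts with P (punctuation), S (symbol), or Z (separator,
    which includes the space). Windows has its own definition.
    """
    kept = []
    for ch in text:
        if unicodedata.category(ch)[0] in "PSZ":
            continue  # drop the character entirely
        kept.append(ch)
    # casefold() is a caseless-matching form, a bit stronger than lower()
    return "".join(kept).casefold()


sentences = [
    "A woman without her man is nothing.",
    "A woman, without her man, is nothing.",
    "A woman: without her, man is nothing.",
]

# All three sentences produce the same key, so the set has one member.
keys = {ignore_symbols_and_case(s) for s in sentences}
print(keys)  # {'awomanwithouthermanisnothing'}
```

The professor, the males, and the females all end up with the same key, which is exactly the problem.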

You get my point. Everyone is always so quick to ignore stuff that may be interesting or important. We could all do with a little bit less of that.

It's one of the reasons the support for collation in Windows is so unsuited for search: its only choices are to see all of the distinctions or to ignore them so completely that a user is almost punished for specifically looking for them. What search really needs is both: to be willing to ignore distinctions of all sorts, but to never forget they are there, and to prefer them when you see them....
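
To make "ignore the distinctions but prefer them when you see them" a little more concrete, here is one possible sketch in Python. It is not how Windows collation or any particular search engine works; the names loose_key, rank, and search are invented for this example, and the three-tier ranking is just one assumption about what "prefer" could mean. Everything that matches loosely is returned, but candidates that preserve the punctuation and case the user typed come first.

```python
import unicodedata


def loose_key(text: str) -> str:
    """Hypothetical loose key: drop punctuation, symbols, and separators
    (Unicode categories P*, S*, Z*), then fold case."""
    return "".join(
        ch for ch in text if unicodedata.category(ch)[0] not in "PSZ"
    ).casefold()


def rank(query: str, candidate: str) -> int:
    """Lower is better: 0 = exact, 1 = differs only in case, 2 = looser."""
    if candidate == query:
        return 0
    if candidate.casefold() == query.casefold():
        return 1
    return 2


def search(query: str, candidates: list[str]) -> list[str]:
    """Match while ignoring distinctions, then rank by how many survive."""
    wanted = loose_key(query)
    hits = [c for c in candidates if loose_key(c) == wanted]
    # sorted() is stable, so equally ranked hits keep their original order.
    return sorted(hits, key=lambda c: rank(query, c))


candidates = [
    "A woman without her man is nothing.",
    "A woman, without her man, is nothing.",
    "A woman: without her, man is nothing.",
    "a woman: without her, man is nothing.",
    "Something else entirely.",
]

query = "A woman: without her, man is nothing."
results = search(query, candidates)
for line in results:
    print(line)
```

The user who typed the colon still finds all four variants of the sentence, but the one with the colon where they put it is at the top of the list.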


John Cowan on 17 Feb 2011 7:42 AM:

"A woman without Herman is nothing?"  Who is this Herman?

Brendan Elliott on 20 Feb 2011 1:44 AM:

If you do search based on word breaking, then the word breakers tend to remove the punctuation long before you could even think about using NORM_IGNORESYMBOLS...  Agreed that the distinctions could be useful for ranking results, although there would likely be some impact on query time and/or index size.

