by Michael S. Kaplan, published on 2005/03/13 17:16 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/03/13/394979.aspx
My post earlier today (IStemmer'ed the tide (or, Language-specific processing #2)) struck a nerve with some people. They wondered aloud to me (in email) whether, if one combined computers that accept voice commands with a search engine, children would eventually be able to search the web before they could even fully read and write.
I was staggered at the suggestion, as I had been focusing on "trying to make search as smart as a toddler" rather than "trying to make search smart enough for a toddler." Plus, I think that while speech recognition à la Dragon Dictate is getting better, it is not going to be good enough to handle some of the kids that I have dealt with in the past.
But it raises an interesting question -- attend me for a moment....
Right now if I search for "micrsoft corporation" in quotes in MSN search I get 181 entries that all look like Microsoft hits (if I search for the correctly spelled company name I get 3,390,988 hits). If I do the same thing in Google I get 127 hits and a 'Did you mean "microsoft corporation"' link at the top of the page. That link finds me 9,670,000 hits.
I wonder if those 181/127 entries are typos? :-)
So if Google is smart enough to check for typos, why couldn't it be smart enough to search for incorrect grammatical constructs? Why shouldn't 'I goed to school' return a 'Did you mean "I went to school"' link at the top of Google's page?
Why shouldn't there be either an extension to IStemmer or a whole new interface that would handle the cases where language rules and language exceptions conflict with each other? Wouldn't that be a bit more linguistic?
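To make the idea concrete, here is a minimal sketch in Python of what such a 'Did you mean' step might look like for over-regularized past tenses. Everything in it is an assumption for illustration: the tiny `IRREGULAR_PAST` table and the `correct_overregularized` and `did_you_mean` functions are hypothetical names, they have nothing to do with the real IStemmer interface, and this is certainly not how Google or MSN Search actually do it. It only shows the rule ("add -ed") losing to the exception ("go" becomes "went").

```python
# Hypothetical, deliberately tiny exception table: base verb -> irregular past.
# A real system would need a full lexicon for the language.
IRREGULAR_PAST = {
    "go": "went",
    "eat": "ate",
    "run": "ran",
    "bring": "brought",
    "think": "thought",
}


def correct_overregularized(word):
    """Return the irregular past form if `word` looks like an irregular
    verb with the regular '-ed' rule applied to it; otherwise return None."""
    lower = word.lower()
    if not lower.endswith("ed"):
        return None
    base = lower[:-2]
    candidates = [base]
    # Undo consonant doubling, e.g. "runned" -> "runn" -> "run".
    if len(base) >= 2 and base[-1] == base[-2]:
        candidates.append(base[:-1])
    for candidate in candidates:
        if candidate in IRREGULAR_PAST:
            return IRREGULAR_PAST[candidate]
    return None


def did_you_mean(query):
    """Return a corrected query string, or None if nothing was changed."""
    corrected = []
    changed = False
    for word in query.split():
        fix = correct_overregularized(word)
        if fix is None:
            corrected.append(word)
        else:
            corrected.append(fix)
            changed = True
    return " ".join(corrected) if changed else None


if __name__ == "__main__":
    print(did_you_mean("I goed to school"))
```

Regular verbs such as "walked" pass through untouched because their base form is not in the exception table, which is the whole point: the rule applies unless the exception says otherwise.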
I realize that this is probably of more interest to linguists who are dealing with language acquisition, but as computers become usable by those children, wouldn't having the search engine smart enough to handle these sorts of situations be interesting? And a step closer to being able to understand the language itself?
Brought to you by "C" (U+0043, a.k.a. LATIN CAPITAL LETTER C)
Because C is for cookie, and that's good enough for me!