What about search for kids?

My post earlier today (IStemmer'ed the tide (or, Language-specific processing #2)) struck a nerve with some people. They wondered aloud to me (in email) whether, if one combined a computer that accepts voice commands with a search engine, children might eventually be able to search the web before they could fully read and write.

I was staggered at the suggestion, as I had been focusing on "trying to make search as smart as a toddler" rather than "trying to make search smart enough for a toddler." Plus, I think that while speech recognition à la Dragon Dictate is getting better, it is not going to be good enough to handle some of the kids I have dealt with in the past.

But it raises an interesting question. Attend me for a moment....

Right now if I search for "micrsoft corporation" in quotes in MSN Search I get 181 entries that all look like Microsoft hits (if I search for the correctly spelled company name I get 3,390,988 hits). If I do the same thing in Google I get 127 hits and a 'Did you mean "microsoft corporation"' link at the top of the page. That link finds me 9,670,000 hits.

I wonder if those 181/127 entries are typos? :-)

So if Google is smart enough to check for typos, why couldn't it be smart enough to check for incorrect grammatical constructs? Why shouldn't 'I goed to school' return a 'Did you mean "I went to school"' link at the top of Google's page?

Why shouldn't there be either an extension to IStemmer or a whole new interface that would handle the cases where language rules and language exceptions conflict with each other? Wouldn't that be a bit more linguistic?
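To make the "rule versus exception" conflict concrete, here is a minimal sketch in Python. It does not use IStemmer or any real search API; the table and function names are hypothetical, and the exception list is a tiny illustrative sample. The idea: apply the regular English past-tense rule to each irregular verb, note the (wrong) form the rule produces, such as "goed", and map that form back to the irregular form the exception requires.

```python
from typing import Optional

# Hypothetical, tiny exception table: base verb -> irregular past tense.
# A real system would need a full lexicon; these entries are illustrative only.
IRREGULAR_PAST = {
    "go": "went",
    "eat": "ate",
    "come": "came",
    "bring": "brought",
    "catch": "caught",
    "think": "thought",
}


def regular_past(verb: str) -> str:
    """Simplified regular rule: add -d after a final 'e', otherwise add -ed."""
    return verb + "d" if verb.endswith("e") else verb + "ed"


# Where the rule and the exception conflict: map the form the regular rule
# would produce (e.g. "goed") to the form the exception requires ("went").
OVERREGULARIZED = {regular_past(verb): past for verb, past in IRREGULAR_PAST.items()}


def suggest_past_tense(query: str) -> Optional[str]:
    """Return a corrected query, or None if no overregularized form was found."""
    words = query.split()
    # Look each word up case-insensitively; unknown words pass through unchanged.
    fixed = [OVERREGULARIZED.get(word.lower(), word) for word in words]
    return " ".join(fixed) if fixed != words else None


print(suggest_past_tense("I goed to school"))  # I went to school
```

This only splits on whitespace, so "goed." with trailing punctuation would slip through, and the simplified rule ignores consonant doubling ("runned"); it is a sketch of the lookup idea, not a grammar checker.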

I realize that this is probably of more interest to linguists who study language acquisition, but as computers become usable by those children, wouldn't a search engine smart enough to handle these sorts of situations be interesting? And a step closer to being able to understand the language itself?

Brought to you by "C" (U+0043, a.k.a. LATIN CAPITAL LETTER C)
Because C is for cookie, and that's good enough for me!

# Dean Harding on 13 Mar 2005 4:19 PM:

It's a great idea, but I think Google's "Did you mean 'xxx'" is a bit more simplistic than IStemmers.

I am just guessing here, but I believe the way Google's hint thing works is that it compares your query to other queries that have been run, and does a simple analysis to find similar queries, spelled differently. Kind of like how Word suggests the correct spelling, except instead of a fixed dictionary like Word, it uses the "dynamic" dictionary of other people's queries (assuming most people spell their queries correctly, of course).

Using this dynamic data is better than a fixed dictionary, because it means that Google "knows" how to spell new words and proper nouns without someone having to explicitly add them to the "dictionary".
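The approach guessed at above (compare a query against other people's queries and suggest a similar one that is spelled slightly differently but run far more often) can be sketched in a few lines of Python. This is an illustration of that idea under stated assumptions, not how Google actually does it: the edit-distance threshold, the popularity ratio, and the query counts in the usage example are all made up.

```python
from collections import Counter
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def did_you_mean(query: str, log: Counter, max_dist: int = 2,
                 ratio: int = 10) -> Optional[str]:
    """Suggest a logged query within max_dist edits that was run at least
    `ratio` times as often as the typed query (hypothetical thresholds)."""
    q = query.lower()
    typed_count = max(log[q], 1)  # a Counter returns 0 for unseen queries
    best = None
    for candidate, count in log.items():
        if candidate == q or edit_distance(q, candidate) > max_dist:
            continue
        if count >= ratio * typed_count and (best is None or count > log[best]):
            best = candidate
    return best


# Made-up query log: query text -> number of times it was searched.
log = Counter({"microsoft corporation": 5000,
               "micrsoft corporation": 12,
               "microsoft corp": 300})
print(did_you_mean("micrsoft corporation", log))  # microsoft corporation
```

Because the "dictionary" is just the log, a newly coined product name becomes correctable as soon as enough people search for it spelled the same way, which is the advantage over a fixed word list.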

The other problem is that natural-language searches generally don't give as good results as keyword-based searches. But I guess if natural-language searches become normal, the word-based correction could be extended to sentence-based correction...

# Michael Kaplan on 13 Mar 2005 4:22 PM:

Yes, you are almost certainly right about how Google works here -- but wouldn't it be cooler if the problem could be solved in the way that handled these other scenarios, too? :-)

# Dean Harding on 13 Mar 2005 7:29 PM:

Absolutely! I'm sure a more general solution wouldn't be too difficult, and I don't see why you can't use the same technique that it uses now, i.e., once natural-language queries become more common, you can use other people's queries as a basis for analyzing new queries and suggest better phrasing or grammar and that sort of thing.

So you might ask the search engine "what's the greatest vet in my area?" and it'll say "Did you mean 'what's the /best/ vet in my area?'"

Now that'd be awesome!

# Michael Kaplan on 13 Mar 2005 8:07 PM:

A colleague pointed out to me that a solution here would be helpful not just for young ones acquiring their first language but also for people who are acquiring their second language.

But I don't think it is trivial, and I also don't think the current generation has a better solution than just throwing a lot of resources (like machines) at the problem....

# Mike Williams on 13 Mar 2005 9:42 PM:

I think it is important for the search engine to restate ambiguous search queries as a set of choices of more direct queries. That way you have a clearer understanding of what the engine is going to do, and you can adjust your terms or frame of reference.

You may even find that asking "WHO's the best vet?" is even more appropriate :-)

# Ivan on 14 Mar 2005 4:52 AM:

> That link finds me 9,670,000 hits.
I get 10,100,000 hits.


# Michael Kaplan on 14 Mar 2005 9:08 AM:

I am sure it changes from day to day (and from Google site to Google site!).

