by Michael S. Kaplan, published on 2007/12/06 10:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/12/06/6675217.aspx
So last night at 7:59pm I posted A Strange and Sublime HASANT and a ways down the page I put in a nonexistent Bengali word, made by adding a HASANT (aka Virama, aka hôshonto) to an otherwise valid Bengali word whose final vowel is not there despite lacking a hasant....
You'll have to look at the page to see the word, for reasons that will hopefully be apparent shortly.
On the page I pointed out that Google could find no instances of the word, a word that is "sensible" by a naive understanding of Unicode "Virama model" rules, though not by the way native speakers would use their language.
And I joked that soon you'd be able to find one site -- my blog post.
NINE MINUTES LATER, Google found my page.
FIFTY-THREE MINUTES LATER Google found that awful sucky Noticas Externas splog site copy of my page (but only as a result that was filtered by default, so you had to click the "show all results" link to see it!).
Thus for 44 minutes, my blog had a Googlewhack! Or even longer if SPLOG sites don't count...
Well okay, not technically a Googlewhack since there was only one word involved.
But that is not the point of this post anyway. :-)
Because the word was so far into the post, most splog sites did not pick up the word....
In any case, nearly half a day later, Live Search had not yet found either page with that nonexistent word.
Now I like the new page for Live Search, and I do like the relevance stuff -- a lot. I like the new features and I like the future promise. There's nothing like knowing that Microsoft is spending money on research and technology in an area to make one feel like cool stuff will happen. I really do believe that.
But I think we may still be at the stage where Microsoft could perhaps still throw some serious hardware money at the problem to be spidering more often, finding stuff faster. :-)
Not to find my almost Googlewhack, obviously.
My search is a silly and unintentional symptom of a very real difference in the two search engines.
But to be able to find the latest info, up to the minute, maybe that is a very worthwhile feature for when you need it?
Now of course it is hardly the only feature -- I find myself using both engines these days for most searches, as they often have slightly different takes on results.
And to be honest, while this "up-to-the-minute" crap makes for a cool demo, I am not out to see Britney's latest over-exposure, so the vast majority of searches (that I want to do) don't really benefit from it.
I guess this post was all about the fact that I had an almost Googlewhack for 44 minutes last night if not for a silly two-word rule.
Are there non-Britney scenarios where the faster results really are that important? I honestly could not think of very many.
Sometimes I honestly wonder if Google's search strategy is not run by the same principles that cause middle-aged men to leave their wives, get hair pieces, and buy fast sports cars.
Or maybe they just want to track Britney "exposure" shots as soon as they happen, something that the news services are never as fast to cover as the seedier side of the web?
Just Kidding. Mostly. :-)
And I honestly don't mean to be so denigrating -- how credible is a Microsoft employee who thinks Google is a bit off? Even a Microsoft employee who is not afraid to slam Microsoft when it is deserved will be treated skeptically.
But when I think about what has to be behind those results, hardware-wise, I am staggered, and wonder what could be accomplished by taking a big chunk of that computing power and doing something more significant than that....
This post brought to you by G (U+FF27, aka FULLWIDTH LATIN CAPITAL LETTER G)
John Cowan on 6 Dec 2007 12:09 PM:
Without breaching any confidentialities, I'd say that it's not so much a matter of spidering faster or oftener, but about deciding which sites to spider more or less constantly and which can be safely explored less often.
(Technically technically, the words in Googlewhacks have to be available in Google's spelling dictionary in order to count.)
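To make the "which sites to spider more or less constantly" idea concrete, here is a minimal sketch of one generic textbook heuristic: revisit a page sooner if it changed since the last visit, and back off if it did not. This is purely an illustration and not a description of how Google or Live Search actually schedule crawls; the function name, the halving/doubling rule, and the 5-minute and 24-hour bounds are all assumptions made up for the example.

```python
def next_interval(current_minutes: float, changed: bool,
                  lo: float = 5.0, hi: float = 1440.0) -> float:
    """Return the next revisit interval in minutes (hypothetical heuristic).

    If the page changed since the last visit, halve the interval;
    otherwise double it. The result is clamped to [lo, hi].
    """
    proposed = current_minutes / 2 if changed else current_minutes * 2
    return max(lo, min(hi, proposed))


# A frequently updated blog drifts toward the lower bound,
# while a static page drifts toward the upper bound.
interval = 60.0
for changed in (True, True, True, True):
    interval = next_interval(interval, changed)
print(interval)
```

Under a scheme like this, a blog that posts daily would end up visited far more often than a page that never changes, without the crawler needing any more hardware overall.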
Dean Harding on 6 Dec 2007 4:53 PM:
It could've just been a fluke. I mean, maybe Google and Live Search crawl your site at the same rate, but Google just *happened* to crawl it a few minutes after you posted, and Live Search just *happened* to crawl it just *before* you posted.
One data point isn't enough to draw a conclusion :-)
Michael S. Kaplan on 6 Dec 2007 4:58 PM:
That would be a pretty amazing coincidence -- Google picked three links within a few hours from two different sites (my home page and the specific blog page are the two), and Live Search still doesn't find any of them with that bogus word. I think there is a pattern here.
Note that I did not post at my usual time either -- so any process paying attention to that sort of thing would not have found it so soon either....
Asztal on 7 Dec 2007 7:17 AM:
Does perhaps your blog software send notifications to external entities when you post? I know mine sends a ping to pingomatic.com. Perhaps Google is getting a notification.
Michael S. Kaplan on 7 Dec 2007 10:55 AM:
Aha, it does! There are actually several that the site is set by default to send to automatically, though it can be overridden to send anywhere one chooses....
(It might be a fun experiment to turn that setting off on a post, and then turn it back on later -- to see how long it takes without the ping?).
I guess Live Search is not picking up such notifications....
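For anyone curious what such a ping looks like on the wire, here is a rough sketch. Ping services of this kind conventionally accept an XML-RPC call named `weblogUpdates.ping` carrying the blog's name and URL. The blog name and URL below are placeholders, the helper `build_ping_request` is a hypothetical name, and nothing here is taken from this site's actual blog software. The sketch only builds the request body and does not send anything.

```python
import xmlrpc.client


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Build the XML-RPC body for a weblogUpdates.ping call (not sent)."""
    # The call carries two string parameters: the site name and the site URL.
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


# Placeholder values, for illustration only.
payload = build_ping_request("Example Blog", "http://example.com/blog/")
print(payload)
```

A search engine that listens to a ping service gets told about a new post within moments, which would explain a nine-minute turnaround far better than any crawl schedule would.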
Geoffrey Coram on 10 Dec 2007 1:05 PM:
MSDN doesn't send notifications to Microsoft's own search engine? or LiveSearch isn't picking up notifications from MSDN? Right hand doesn't know what the left is doing.
Michael S. Kaplan on 10 Dec 2007 1:15 PM:
Well, blogs.msdn.com is not exactly MSDN, though most search engines covering blogs these days do seem to be using ping servers as far as I can determine. No idea whether Live Search is not or whether they are using different services (the latter is a genuine disconnect, the former is an architectural decision, so it remains to be seen whether this is genuinely a bug or not)....
Though my original point stands: the "instant" feature makes for a great demo, but it is really not the most crucial one for a blog that is not generally covering the latest news and such.
Maurits [MSFT] on 11 Dec 2007 11:48 AM:
Another possibility is that Google Search is digesting the blogs.msdn.com RSS feed, and Live Search isn't.
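A minimal sketch of what "digesting the RSS feed" could amount to: parse the feed, collect the item links, and keep only the ones not seen before. The sample feed, its URLs, and the helper `new_links` are invented for illustration; this says nothing about what either search engine actually does with feeds.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 feed standing in for a real blog feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example Blog</title>
<item><title>Old post</title><link>http://example.com/old</link></item>
<item><title>New post</title><link>http://example.com/new</link></item>
</channel></rss>"""


def new_links(feed_xml: str, seen: set) -> list:
    """Return item links from the feed that are not in the 'seen' set."""
    root = ET.fromstring(feed_xml)
    links = [item.findtext("link") for item in root.iter("item")]
    return [link for link in links if link and link not in seen]


fresh = new_links(SAMPLE_FEED, {"http://example.com/old"})
print(fresh)
```

Polling a feed like this is cheap compared with recrawling a whole site, so an engine that does it would pick up new posts quickly even without listening for pings.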