27 August 2008

flavors of English on Google

i was just looking through the site statistics for this here blog. one of the most interesting and useful bits of information that statcounter provides me is the list of search terms that people use. i would say that 99% of these searches are done on Google (we really have drunk the pagerank kool-aid). a lot of searches are pretty lengthy and specific (e.g. "kobe bryant interview in italian" or "who is the girl in the benny lava video?"). one recent search stuck out to me, though. somebody searched for just the word "whomever", and wound up at my previous post "The Office on whomever". i thought that was pretty remarkable. i clicked through on the link that statcounter provided me and saw that the search was made on google.co.uk, and that descriptively adequate was on the front page of results, at position number 6.

then, for whatever reason, i decided to re-run the search using google.com. my post was nowhere to be found on the first page. the results were entirely different. descriptively adequate finally showed up at #14 on the list of results. what's going on? surely google hasn't written different versions of pagerank to deal with different localizations of English? as far as cataloguing search results goes, the fact that a bunch of Americans in California wrote the algorithm shouldn't adversely affect Brits and the like.

i couldn't stop there. i ran the search on all of the English Google localizations that i could think of, and the results kept changing. i've also noted the total number of results that Google estimates, which also (oddly) varies by localization.

localization     #    total hits
google.com       14   7,480,000
google.co.uk     6    8,200,000
google.ca        7    8,180,000
google.com.au    10   8,190,000
google.com.nz    7    8,460,000
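
for anyone who wants to poke at these numbers, here's a minimal python sketch that just tabulates them. it reads each row of the table as localization, rank of the DA post, and estimated total hits. the names `observations` and `summarize` are made up for illustration, and nothing here talks to Google; it only rearranges what i typed in by hand.

```python
# observations copied by hand from the table:
# (localization, rank of the DA post, Google's estimated total hits)
observations = [
    ("google.com", 14, 7_480_000),
    ("google.co.uk", 6, 8_200_000),
    ("google.ca", 7, 8_180_000),
    ("google.com.au", 10, 8_190_000),
    ("google.com.nz", 7, 8_460_000),
]


def summarize(rows):
    """Return the best-ranked localization and the spread of the hit estimates."""
    best = min(rows, key=lambda r: r[1])  # lowest rank number = highest placement
    hits = [r[2] for r in rows]
    return {
        "best_site": best[0],
        "best_rank": best[1],
        "hit_spread": max(hits) - min(hits),
    }


summary = summarize(observations)

# print the localizations from best placement to worst
for site, rank, hits in sorted(observations, key=lambda r: r[1]):
    print(f"{site:<15} #{rank:<3} {hits:>10,}")
print(summary)
```

sorted this way, google.co.uk comes out on top and google.com at the bottom, and the hit estimates differ by just under a million between the largest and smallest.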

as i was compiling this table i remembered that Google mucks with your search results if you're signed in (which i of course had to be in order to access blogger, without which i couldn't be writing this post). i signed out, and on google.com the DA link rose to #4. i guess i should just be happy i'm on the front page on all of these searches. but there are still lingering, bizarre questions.

why does Google report different numbers of hits for different localizations?
no clue. (comments are open!)

what is causing the rank fluctuations even when i'm not logged in?
some clue. on all of the non-US localizations there is a feature "search pages from [country name]". perhaps i've got fewer australian sites linking to my blog, so my rank is slightly lower in australia than in the US or great britain.

why the hell is Google biasing my custom algorithm against my own damn blog?!
i mean throw me a bone here, guys.

and the baffler...
why do i get this on google.ca?
i mean, you're kidding, right? i'm sure that the frequency of whatever is much higher than that of whomever, but 8 million hits on a word that's in the dictionary should be enough data for google to not question my intent. and why only canadians, eh? this, of course, isn't the first time that i've seen weird spelling suggestions on Google. so perhaps they really do think they know something about English varieties that i don't?
