27 August 2008

flavors of English on Google

i was just looking through the site statistics for this here blog. one of the most interesting and useful bits of information that statcounter provides me is the list of search terms that people use. i would say that 99% of these searches are done on Google; we really have drunk the pagerank kool-aid. a lot of searches are pretty lengthy and specific (e.g. "kobe bryant interview in italian" or "who is the girl in the benny lava video?"). one recent search stuck out to me, though. somebody searched for just the word "whomever", and wound up at my previous post "The Office on whomever". i thought that was pretty remarkable. i clicked through on the link that statcounter provided me and saw that the search was made on google.co.uk, and that descriptively adequate was on the front page of results, at position number 6.

then, for whatever reason, i decided to re-run the search using google.com. my post was nowhere to be found on the first page. the results were entirely different. descriptively adequate finally showed up at #14 on the list of results. what's going on? certainly google hasn't written different versions of pagerank to deal with different localizations of English? as far as cataloguing search results goes, the fact that a bunch of Americans in California wrote the algorithm shouldn't adversely affect Brits and the like.

i couldn't stop there. i ran the search on all of the English Google localizations that i could think of, and got even more different results. i've also noted the number of total results that Google estimates, which also (oddly) vary by localization.

localization     #    total hits
google.com       14   7,480,000
google.co.uk     6    8,200,000
google.ca        7    8,180,000
google.com.au    10   8,190,000
google.com.nz    7    8,460,000

as i was compiling this table i remembered that Google mucks with your search results if you're signed in (which i of course had to be in order to access blogger, without which i couldn't be writing this post). i signed out, and on google.com the DA link rose to #4. i guess i should just be happy i'm on the front page on all of these searches. but there are still lingering, bizarre questions.

why does Google report different numbers of hits for different localizations?
no clue. (comments are open!)

what is causing the rank fluctuations even when i'm not logged in?
some clue. on all of the non-US localizations there is a feature "search pages from [country name]". perhaps i've got fewer australian sites linking to my blog, so my rank is slightly lower in australia than in the US or great britain.

why the hell is Google biasing my custom algorithm against my own damn blog?!
i mean throw me a bone here, guys.

and the baffler...
why do i get this on google.ca?
i mean, you're kidding, right? i'm sure that the frequency of "whatever" is much higher than that of "whomever", but 8 million hits on a word that's in the dictionary should be enough data for google to not question my intent. and why only canadians, eh? this, of course, isn't the first time that i've seen weird spelling suggestions on Google. so perhaps they really do think they know something about English varieties that i don't?

23 August 2008

Malaysian government fails to ban feature reconstruction

please, don't judge me about the inspiration for this post. the short story is "sometimes you just get bored, and who knows where you could end up on Wikipedia!" tonight it was crappy pop song articles. thence comes this quote from the "Controversy" section of the article for this summer's top hit, I Kissed a Girl.

In Malaysian radio stations, the song has been retitled 'I Kissed...' with the words 'a girl' silenced throughout the chorus in the song.
never mind the odd choice of preposition (as a native English speaker i've never heard a song "in" a radio station; "on" works fine). the fact of the matter is that this censorship is about as effective as bleeping the "-hole" in "asshole". if you take the phrase "i kissed a girl" and eliminate "a girl", then in isolation it becomes completely open-ended. it could be "i kissed a man" or "i kissed my mother" or "i kissed a frog". too bad there are more lyrics in the song's refrain!
i kissed a girl / and i liked it / the taste of her cherry chapstick
oops! there's a gendered pronoun hanging out there, eight words later. and it needs an antecedent. and the only preceding nominals are "i" and "it". "i" can't be the antecedent, because then she would have said "my", and "it" is decidedly neuter. so it can only be...gasp! she didn't! chances are nobody's getting the wool pulled over their eyes either; Wikipedia also says that increasing numbers of Malaysians are identifying English as a first language. they can put the pieces of this not-so-tricky linguistic puzzle back together as quickly as i did. censorship falls flat again.

i think i've gotten more enjoyment out of the song's linguistics than out of listening to it. there's one other bit of the chorus that intrigued me. it's the other pronoun in those lines, namely "it". i'm sure that the intended antecedent is "[the fact that] i kissed a girl", but i can't help but get an ambiguous interpretation where "it" could be topicalized and actually refer to "the taste...". is this a weird judgement? comments are always open here.

18 August 2008

we want...a count noun!

it's great that Language Log has enabled comments on some of their posts, but it's all the more frustrating when i have something pithy to say and they're turned off. this is a would-be comment in response to Arnold Zwicky's post Countification.

in describing the difference between mass and count nouns in English, he says that "shrub" is a count noun while "shrubbery" is a mass noun. while i can certainly use "shrubbery" as a mass noun, i rarely talk about shrubbery at all, and when i do, i'm almost always quoting Monty Python.



throughout the Knights Who Say 'Ni!' sketch, "shrubbery" is used consistently as a count noun, taking determiners such as "a" and "another", and having a plural form "shrubberies". i never thought of these as ungrammatical in any way, though i suppose that would add to the humor; the pure absurdity of the scene is plenty on its own. we of course also have Monty Python to thank for the brilliant backformation "shrubber" (n.) - one who arranges, designs, and sells shrubberies.

aaaaaand back!

a barrage of posts is forthcoming! i'm officially declaring my summer blogging malaise to be over as a new school year is (sadly) just around the corner. as with all my blog revival phases in the past, things will probably slow down in a few weeks, but in the meantime, here we go!