
With a control. Launch a fake search engine with similar spiked results but with no traceable connection to Google. If the results only show up on Bing from Google clicks and not from clicks on the control, then it's a good indicator that Bing have Google-specific code. Then run the test a few more times with more than 100 queries (which in SE land is a minuscule test set).
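
A minimal sketch of that comparison, assuming you can make the fake clicks through each engine and later check whether the planted result shows up on Bing. Both of those steps are stubbed out below; the function names and nonsense terms are purely illustrative, not any real API:

    import random

    def click_spiked_result(query, source):
        """Stand-in for clicking the planted result for `query` while the
        toolbar is running, via `source` ('google' or 'control')."""
        pass  # in the real test this is a human or scripted browser action

    def appears_in_bing(query):
        """Stand-in for checking Bing some days later for the planted result."""
        return random.random() < 0.1  # placeholder; a real check would scrape Bing

    def run_arm(queries, source):
        hits = 0
        for q in queries:
            click_spiked_result(q, source)
            if appears_in_bing(q):
                hits += 1
        return hits / len(queries)

    # more than 100 nonsense honeypot terms, split between the two arms
    terms = [f"xqzzplarg{i:03d}" for i in range(200)]
    random.shuffle(terms)
    google_rate = run_arm(terms[:100], "google")    # clicks routed via google.com
    control_rate = run_arm(terms[100:], "control")  # clicks via the fake engine

    # Pickups only in the Google arm would point at Google-specific handling;
    # pickups in both arms would point at a generic clickstream signal.
    print(f"google arm: {google_rate:.0%}  control arm: {control_rate:.0%}")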

From what they've said, it seems like they only tested against fake clicks on google.com. That tells us Bing are using click data, but nothing more. This is a pretty simple debugging technique, which is why I'm shocked if Google didn't think to do it. I really wish I weren't the one saying this, and that I didn't have these doubts. But I can come to no other conclusion than the ones I've outlined in the post.



If you can't come up with any other conclusion, I'm not sure you're really sorry about having those doubts. Here are a few alternatives off the top of my head:

0. There might or might not have been a control group, but its results didn't matter since:

0a) the whole purpose of revealing this was so that blackhat SEOs would start abusing the system, and MS would thus be forced to stop doing it.

0b) the whole purpose of revealing this was to shame MS into stopping the practice, and for that purpose it didn't matter whether the system was specific or generic.

0c) describing the full experimental setup and the gazillion things that were tested would just have distracted from the core story.

1. There was a control group and it suggested that the mechanism wasn't generic, it just wasn't mentioned because:

1a) it was held back as a gotcha in case Microsoft started lying about what the system actually did.

1b) positive evidence from the control group couldn't actually prove anything, it'd be suggestive at best.

2. There was no need for an explicit control group since:

2a) they actually observed the network traffic of the toolbar, and it only sent the relevant information for Google and not other sites.

2b) they disassembled the toolbar and found out it had Google-specific code related to this.

3. There was a control group and it suggested that the mechanism was generic, but:

3a) they thought that the mechanism being generic didn't matter, and that what MS were doing was still equally dodgy.

3b) they thought that MS would not be keen on trying out an "oh no, it's not just Google whose algorithms we're leeching off when spying on users, it's every other site too" PR strategy.

I have no idea what actually happened. In all likelihood it was something not listed here, since these were just random ideas. But I think many of them are at least far more plausible than the silly "maverick super-senior engineers botch the job, leak a flawed story, PR coverup follows" theory.


That sort of "control" wouldn't necessarily be conclusive; Bing could be using click data weighted by a prior estimate of how trustworthy the clicked-on site is (e.g. an analogue of PageRank score), in which case clicks on Google would count for much more than clicks on some newly-created fake search engine.
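
A toy illustration of that confound, assuming Bing weights click evidence by some prior trust or authority score for the site the click came through. All the numbers, site names, and function names here are made up:

    # hypothetical prior trust scores for the clicked-through site
    site_trust = {
        "google.com": 0.95,                    # long-established, heavily linked
        "fake-control-engine.example": 0.001,  # brand-new, no reputation at all
    }

    def click_signal(site, clicks):
        # effective ranking evidence = raw clicks x prior trust in the site
        return clicks * site_trust.get(site, 0.0)

    for site in site_trust:
        print(site, click_signal(site, clicks=100))
    # With the same 100 fake clicks, the Google-sourced signal dwarfs the
    # control's, so "no pickup from the control" wouldn't prove Google-specific code.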

I guess my point is, unless you have deep knowledge of the secret inner workings of a search engine with hundreds of inputs, simple "tests" to figure out the nature of just one of those inputs are bound to end in multiple plausible explanations for your results, unless you rig a huge fraction of the WWW. And Google has better things to do with their time.



