About 3 weeks ago, I said it was Past Time For MSN To Pony Up To The Real Truth About Referrer Spam. Now, they have.
Since that time, I've had the opportunity to discuss some of these issues with Jeremiah Andrick, the Program Manager of Live Search Developer Tools. He and the team of Live engineers* have addressed these issues and have issued a statement, answered questions, explained what it is all about (hint: they are checking for cloaking), and noted what they've done to solve the problems created by their new spambot. They've also created ways for you to address any additional issues, problems, or concerns you have about this bot's activities.
They discuss:
- the Adsense issues
- high traffic from the bot distorting logs
- inappropriate non-site-specific terms
- and their response to all of the above
I followed up with them on some things that weren't clear to me from their response, and I wanted to pass along the answers I received. I would suggest, however, that you go read their response first and then come back here when you're done (that link will open a new tab/window for you).
1. I asked why they couldn't have just let the new bot extension crawl the cached pages they have stored on their servers. They said:
The reason we could not check for cloaking against the cached version is that only a live page can be tested for cloaking. In the case of a spammer the cached page would only show us what it showed MSNbot. We want to reiterate that not all cloaking is spam, but we don't ever recommend cloaking for any reason. To provide great results, we need to be able to identify the legitimate and illegitimate content.
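To make the "only a live page can be tested" point concrete, here is a rough sketch of what a cloaking check could look like. This is my own illustration, not anything the Live team described: it assumes you already have two copies of the same page, one as served to a crawler and one as served to an ordinary browser, and it simply measures how different they are. The function names and the 0.8 threshold are made up for the example.

```python
from difflib import SequenceMatcher


def similarity(crawler_html: str, browser_html: str) -> float:
    """Return a 0.0-1.0 ratio of how alike the two page bodies are."""
    # autojunk=False keeps difflib from discarding frequent characters
    # on long inputs, which would skew the ratio for real HTML pages.
    matcher = SequenceMatcher(None, crawler_html, browser_html, autojunk=False)
    return matcher.ratio()


def looks_cloaked(crawler_html: str, browser_html: str,
                  threshold: float = 0.8) -> bool:
    """Flag the page when the two versions differ more than the threshold allows.

    The threshold is an arbitrary assumption for illustration only.
    """
    return similarity(crawler_html, browser_html) < threshold


if __name__ == "__main__":
    # Hypothetical page bodies: what the crawler saw vs. what a browser saw.
    seen_by_crawler = "<html><body>Gardening tips for spring</body></html>"
    seen_by_browser = "<html><body>Gardening tips for spring</body></html>"
    print(looks_cloaked(seen_by_crawler, seen_by_browser))
```

A cached copy only ever gives you the first of those two inputs, which is why the comparison has to be made against a fresh, browser-like request.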
2. I then said that I thought they should emphasize that blocking the new bot is effectively blocking msnbot completely, and that this could result in a webmaster's pages being de-indexed. They said:
We understand the issues that webmasters were experiencing that may have caused them to block the bot. However, we would ask that they contact us with any issues through our feedback form, or through our forum, before they block the bot so we have a chance to resolve them. We believe we have addressed all of the existing issues with the bot, but would like to know of any other issues that webmasters may have encountered.
3. Finally, I wanted to know whether this bot follows the directives in robots.txt, or whether it ignores the file or never even looks at it. They said:
Yes, this robot does follow the robots.txt file. The reason you don't see it download the file is that we use a fresh copy from our index. The tool does respect the robots.txt the same way that MSNBot does, with a caveat: the tool behaves like a browser, and some files that a crawler would ignore will be viewed just like a real user would view them.
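If you want to see what your own robots.txt actually tells a crawler, Python's standard `urllib.robotparser` module can evaluate the rules for you. The sketch below is only an illustration: the robots.txt contents, the example.com URLs, and the assumption that the bot matches rules under the `msnbot` user-agent token are mine, not something the Live team confirmed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep msnbot out of /private/, allow everyone else.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/report.html"):
        url = "http://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "msnbot", url))
```

Keep the caveat from their answer in mind: a rule check like this tells you which pages are off limits, but a tool that behaves like a browser may still request images, scripts, and other files on an allowed page that a plain crawler would skip.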
I really appreciate the fact that the Live team members took our concerns seriously, and worked hard to address them.
---
* ("He and the team of Live engineers" ... as opposed to ... "He and the team of zombies"?). Sorry, that phrase, "Live engineers", just brought out the "beast" in me. Oh, go ahead and laugh. You're as geeky as I am. 😉
More Thoughts and History of This Issue:
Yell If Microsoft's Live.com Spammed You Too
I find it interesting that their response on the robots.txt question is basically, “Our bot obeys robots.txt, except when it doesn’t.”
I’m here because they are still doing it. Over 30% of my site traffic is this crappy bot run by the marvels that brought the world that gorgeous piece of crapware known as Vista.
Bitter? Yup.