“Web site policy makers are playing favorites and Google is the big beneficiary, say Penn State researchers. The research team created a search engine called BotSeer used to examine more than 7,500 Web sites, and found a pro-Google bias in terms of which search engine Web crawlers were or were not allowed access.” -Alpha Doggs
Determining Bias to Search Engines from Robots.txt
BotSeer is a search engine for robots.txt files. Its goal is to provide information about, and access to, robots.txt files throughout the web by crawling and indexing them along with related documents. In addition, statistics about favored robots, comments, and robot behavior are analyzed and presented. BotSeer is the first search engine to provide full-text search and analysis of robots.txt files.
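
To make the idea of a "favored" robot concrete, here is a minimal sketch of how one might check which crawlers a single robots.txt file admits. It uses Python's standard `urllib.robotparser` module; the sample file, the helper name `allowed_robots`, and the choice of user-agent strings are assumptions made for illustration, not BotSeer's actual method or data.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that lets one named crawler in and shuts everyone else out.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed_robots(robots_txt, agents, path="/"):
    """Return a dict mapping each user agent to whether it may fetch `path`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


# Illustrative user-agent names for three crawlers.
result = allowed_robots(SAMPLE_ROBOTS_TXT, ["Googlebot", "Slurp", "msnbot"])
print(result)
```

Running the same check over many sites and counting how often each robot is allowed or disallowed would give a rough per-robot tally of the kind the study describes, although the paper's own bias measure may be defined differently.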

“We have presented a comprehensive survey of robot biases on the Web through careful content and statistical analysis of a large sample of robots.txt files. Results show that the robots of popular search engines and information portals, such as Google, Yahoo, and MSN, are generally favored by most of the websites we have sampled. This implies a “rich get richer” bias toward popular search engines.”
Yang Sun, Ziming Zhuang, Isaac G. Councill, and C. Lee Giles
Information Sciences and Technology
The Pennsylvania State University

November 16th, 2007 at 12:00 pm