Robots are small programs that walk systematically through websites and do something with them, e.g. index the pages for search engines such as http://google.de
Before crawling, they look for a file named robots.txt in the root of your site.
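As a small illustration of "the root of your site": whatever page a robot starts from, the robots.txt location is derived from the host alone. The sketch below is only an illustration; the function name robots_url and the example.org address are made up for it.

```python
from urllib.parse import urljoin


def robots_url(page_url):
    """Return the robots.txt URL for the site that serves page_url."""
    # A leading slash makes urljoin drop the page's path and query,
    # keeping only the scheme and host.
    return urljoin(page_url, "/robots.txt")


print(robots_url("http://example.org/wiki/FrontPage?action=view"))
```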
Here's mine for a site with a ZWiki in Plone. I don't know yet whether it works: it takes about a week before the robots (e.g. googlebot) change their behaviour.
    User-agent: *
    Allow: /
    Disallow: /external_edit
    Disallow: /recycle_bin/
    Disallow: /IssueTracker
    Disallow: /FilterIssues
    Disallow: /backlinks
    Disallow: /diff
    Disallow: /sendto_form
    Disallow: /subscribeform
    Disallow: /login_form
    Disallow: /mail_password_form
    Disallow: /search_form
    Disallow: /enabling_cookies
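Instead of waiting a week for a crawler to react, rules like these can be checked locally. The following is a rough sketch using Python's standard urllib.robotparser; the example.org host and the tested paths are made up. One caveat: this parser applies the first rule that matches, so with "Allow: /" listed first it would report every path as allowed. The sketch therefore leaves that line out (anything not disallowed is allowed anyway). Other robots may resolve conflicting rules differently, so treat the result as a hint, not as proof of what googlebot will do.

```python
from urllib.robotparser import RobotFileParser

# The Disallow rules from the file above, without the leading "Allow: /".
RULES = """\
User-agent: *
Disallow: /external_edit
Disallow: /recycle_bin/
Disallow: /IssueTracker
Disallow: /FilterIssues
Disallow: /backlinks
Disallow: /diff
Disallow: /sendto_form
Disallow: /subscribeform
Disallow: /login_form
Disallow: /mail_password_form
Disallow: /search_form
Disallow: /enabling_cookies
"""


def build_parser(rules_text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(RULES)
for path in ["/FrontPage", "/diff", "/login_form", "/recycle_bin/old"]:
    allowed = parser.can_fetch("googlebot", "http://example.org" + path)
    print(path, "allowed" if allowed else "disallowed")
```

The rules are matched as path prefixes counted from the site root, so "Disallow: /diff" covers /diff and anything below it.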
To ban unwanted robots, spiders etc. by user agent or IP address, do the following:
- Go to the root of your site (or to the root of the part of your site from which you want to ban robots) and add a Python Script with the id ban_rule; for the code see below.
- Choose "Set Access Rule" from the Add menu and fill in the form, using the id from above.
Credits to batlogg for the tip.
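    from AccessControl import Unauthorized

    request = context.REQUEST
    # Use .get() so that a missing header does not raise a KeyError.
    agent = request.get('HTTP_USER_AGENT', '')
    ip = request.get('HTTP_X_FORWARDED_FOR', '')

    # Refuse requests whose user agent contains one of these substrings.
    for ban in ['Webupd', 'WebStripper', 'WebReaper', 'WebCopier', 'HTTrack',
                'WebZIP', 'FrontPage', 'Teleport', 'Wget', 'OS-or-CPU']:
        if agent.find(ban) != -1:
            raise Unauthorized('Go away!')

    # Refuse requests coming from these IP addresses.
    ips = ['69.57.136.42']
    if ip in ips:
        raise Unauthorized('Your ip has been banned.')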
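To try the matching logic outside Zope, it can be pulled into a plain function. This is only a sketch: the name ban_reason and the sample values are invented here, and the placeholder entry from the agent list is left out. It also assumes that X-Forwarded-For may carry several comma-separated addresses (which happens when a request passes through more than one proxy) and checks each of them; the script above compares the whole header value instead.

```python
BANNED_AGENTS = ["Webupd", "WebStripper", "WebReaper", "WebCopier", "HTTrack",
                 "WebZIP", "FrontPage", "Teleport", "Wget"]
BANNED_IPS = ["69.57.136.42"]


def ban_reason(agent, forwarded_for):
    """Return a refusal message, or None if the request may pass."""
    # Substring match, like agent.find(ban) != -1 in the access rule.
    for ban in BANNED_AGENTS:
        if ban in agent:
            return "Go away!"
    # Assumption: the header may hold "client, proxy1, proxy2".
    for address in forwarded_for.split(","):
        if address.strip() in BANNED_IPS:
            return "Your ip has been banned."
    return None


print(ban_reason("Wget/1.9", "10.0.0.1"))
print(ban_reason("Mozilla/5.0", "10.0.0.1, 69.57.136.42"))
print(ban_reason("Mozilla/5.0", "10.0.0.1"))
```

Returning a message instead of raising keeps the function easy to test; inside the access rule the message would be passed to the Unauthorized exception.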
Resources:
- http://www.robotstxt.org/wc/faq.html - Frequently asked questions about Web robots