
Robots are small programs that systematically crawl websites and do something with the pages, e.g. index them for search engines such as http://google.de

Before crawling, they check for a file named robots.txt in the root of your site.
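As a small illustration of "the root of your site", here is a sketch that derives the robots.txt location from any page URL. The function name robots_url and the example.org address are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://example.org/wiki/FrontPage?action=edit"))
```

A robots.txt placed in a subfolder is not looked up this way, so it has to live at the top level of the host.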

Here's mine for a site with a ZWiki in Plone. I am not sure yet whether it works: it takes about a week before the robots (e.g. googlebot) change their behaviour.

User-agent: *
Allow: /
Disallow: /external_edit
Disallow: /recycle_bin/
Disallow: /IssueTracker
Disallow: /FilterIssues
Disallow: /backlinks
Disallow: /diff
Disallow: /sendto_form
Disallow: /subscribeform
Disallow: /login_form
Disallow: /mail_password_form
Disallow: /search_form
Disallow: /enabling_cookies
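Rather than waiting a week, the rules can be tried locally. The following is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed and the sample paths are made up for illustration, and only a few of the Disallow lines are repeated. The Allow: / line is left out on purpose, because as far as I can tell this parser applies rules in file order, so a leading Allow: / would match every path first.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules, without the leading "Allow: /".
ROBOTS_TXT = """\
User-agent: *
Disallow: /external_edit
Disallow: /recycle_bin/
Disallow: /login_form
"""


def is_allowed(path, agent="googlebot"):
    """Return True if the rules let the given agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


for path in ("/FrontPage", "/login_form", "/recycle_bin/old_page"):
    print(path, is_allowed(path))
```

This only shows how one parser reads the file. Crawlers that follow the longest-match rule, as Google documents for googlebot, should treat Allow: / as harmless, but real robots may still behave differently.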

To ban all robots, spiders etc., do the following:

  • Go to the root of your site (or to the root of the part of your site from which you want to ban robots) and add a Python Script with the id ban_rule; the code is given below.
  • Choose 'Set Access Rule' from the add menu and fill in the form with the id from above.

Credits to batlogg for the tip.

 

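# Python Script "ban_rule": reject unwanted user agents and banned IPs.
# Assumption: Unauthorized can be imported from AccessControl on your Zope version.
from AccessControl import Unauthorized

# Use .get() so a missing header does not raise a KeyError.
agent = context.REQUEST.get('HTTP_USER_AGENT', '')
ip = context.REQUEST.get('HTTP_X_FORWARDED_FOR', '')

for ban in ['Webupd', 'WebStripper', 'WebReaper', 'WebCopier', 'HTTrack',
            'WebZIP', 'FrontPage', 'Teleport', 'Wget', 'OS-or-CPU']:
    if agent.find(ban) != -1:
        raise Unauthorized('Go away!')

ips = ['69.57.136.42']
if ip in ips:
    raise Unauthorized('Your ip has been banned.')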
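To try the ban logic outside Zope, the same check can be written as a plain function. This is only a sketch: the names is_banned, BANNED_AGENTS and BANNED_IPS are made up here, and it adds one thing the script does not do, namely splitting X-Forwarded-For, which can hold several comma-separated addresses when the request passes through more than one proxy.

```python
BANNED_AGENTS = ("Webupd", "WebStripper", "WebReaper", "WebCopier", "HTTrack",
                 "WebZIP", "FrontPage", "Teleport", "Wget", "OS-or-CPU")
BANNED_IPS = {"69.57.136.42"}


def is_banned(user_agent, forwarded_for):
    """Return True if the agent string or any forwarded address is banned."""
    # Substring match, like agent.find(ban) != -1 in the Zope script.
    if any(name in user_agent for name in BANNED_AGENTS):
        return True
    # Check every address in the header, not just the header as a whole.
    addresses = [part.strip() for part in forwarded_for.split(",")]
    return any(addr in BANNED_IPS for addr in addresses)


print(is_banned("Wget/1.9", ""))
print(is_banned("Mozilla/5.0", "69.57.136.42, 10.0.0.1"))
print(is_banned("Mozilla/5.0", "10.0.0.1"))
```

Both headers are supplied by the client or by proxies in between, so this kind of ban only stops robots that identify themselves honestly.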


Resources: