Some web crawlers (bots) are important to the well-being of your website, such as GoogleBot, msnbot, and Yahoo! Slurp. There are also a lot of bots out there that do nothing to help you and can actually harm you by using up valuable bandwidth.
This brief tutorial will show how you can implement a blocking mechanism for all domains on your server by using modsec rules that target the User-Agents associated with “Bad Bots”. Some of the Bad Bots that I was tired of seeing in my logs include:
- Baidu Spider
There are three easy steps to this process, as follows:
1) Create a badbots.txt file which contains the user-agents that you want to block
2) Create a modsec rule
3) Restart Apache
These instructions are for a CentOS 6.5 (Red Hat-based) system. We assume that you already have Apache and modsec configured and that you have root access.
Let’s get to it!
1) Create a file called /etc/httpd/conf/modsec2/badbots.txt and insert the following, one phrase per line (a multi-word phrase such as "Indy Library" stays on a single line):
AhrefsBot
Anonymizer
Attributor
Baidu
Bork-edition
DataCha0s
Deepnet Explorer
desktopsmiley
DigExt
feedfinder
gamingharbor
heritrix
ia_archiver
Indy Library
Jakarta
Java
juicyaccess
larbin
linkdex
Missigua
MRSPUTNIK
Nutch
panscient
plaNETWORK
Snapbot
Sogou
TinEye
TwengaBot
Twitturly
User-Agent
Viewzi
WebCapture
XX
Yandex
YebolBot
Once you have saved the badbots.txt file, you will want to complete the second step, as follows:
2) Create a rule in /etc/httpd/conf/modsec2/custom.conf
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile badbots.txt" "id:350001,rev:1,severity:2,log,msg:'BAD BOT - Detected and Blocked. '"
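This rule checks the User-Agent header of every request against the phrases in badbots.txt. Because the file name is given as a relative path, ModSecurity looks for it in the same directory as the configuration file that holds the rule. The match is a case-insensitive substring match, so short entries such as Java or XX will also block any client whose User-Agent merely contains those letters.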
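Before going live, it can help to preview which User-Agent strings the list would catch. The sketch below approximates the phrase match with grep (-i ignores case, -F treats each line as a literal phrase, -f reads the phrases from a file). It is not ModSecurity itself; the helper name ua_is_listed, the three-entry sample list, and the sample User-Agent strings are all made up for illustration.

```shell
#!/bin/sh
# Rough stand-in for ModSecurity's phrase match, using grep.
# ua_is_listed is a hypothetical helper name, not part of ModSecurity.
ua_is_listed() {
    # $1 = phrase file (one phrase per line, no blank lines)
    # $2 = User-Agent string to check
    printf '%s\n' "$2" | grep -q -i -F -f "$1"
}

# Demo against a throwaway phrase file so nothing on the server is touched.
list=$(mktemp)
trap 'rm -f "$list"' EXIT
printf '%s\n' 'AhrefsBot' 'Indy Library' 'Java' > "$list"

for ua in 'Mozilla/5.0 (compatible; AhrefsBot/5.0)' \
          'Mozilla/5.0 (compatible; Googlebot/2.1)' \
          'Mozilla/5.0 (X11; Linux) JavaFX/8.0'; do
    if ua_is_listed "$list" "$ua"; then
        echo "BLOCK  $ua"
    else
        echo "ALLOW  $ua"
    fi
done
```

The third sample is flagged only because "JavaFX" contains "Java", which is the kind of over-matching to look for before trusting a short phrase. To check your real list, pass /etc/httpd/conf/modsec2/badbots.txt as the first argument, and make sure the file has no blank lines, since grep treats an empty pattern as matching everything.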
3) The final step is to restart your Apache webserver:
service httpd restart
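Once these three steps are completed, you will see the denials in your server’s error logs. Each Bad Bot that visits your site(s) will be denied with a 406 error [Not Acceptable], provided the SecDefaultAction in your modsec configuration is set to deny with status 406; the rule above names no disruptive action of its own and inherits it from there. If your default action only logs, add deny,status:406 to the rule’s action list. Once you’re happy with the way things are working, you can change “log” to “nolog” in your modsec rule.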
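To confirm the rule is firing without waiting for a real bot, one option is to send a request with a spoofed User-Agent and look at the status code that comes back. This is a minimal sketch using curl; bot_status is a hypothetical helper name, and http://localhost/ in the usage notes is an assumption, so substitute one of your own sites.

```shell
#!/bin/sh
# bot_status is a hypothetical helper, not part of Apache or ModSecurity.
bot_status() {
    # $1 = URL to request, $2 = User-Agent to send.
    # -s silences progress output, -o /dev/null discards the body,
    # -w prints only the HTTP status code, -A sets the User-Agent header.
    curl -s -o /dev/null -w '%{http_code}\n' -A "$2" "$1"
}

# Example usage, run on the server (the URL is an assumption):
#   bot_status http://localhost/ 'AhrefsBot'     # a listed phrase
#   bot_status http://localhost/ 'Mozilla/5.0'   # an ordinary browser string
```

If the first call reports the same status as the second, the rule is probably matching but not denying, which points at the default action rather than at badbots.txt.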