Stopping bad bots using Apache Modsec
Some web crawlers (i.e., bots), such as GoogleBot, msnbot, and Yahoo! Slurp, are important to the well-being of your website. There are also a lot of bots out there that do nothing to help you, and can actually harm you by using up valuable bandwidth.
This brief tutorial shows how to implement a blocking mechanism for all domains on your server using modsec rules that target the User-Agents associated with “bad bots”. Some of the bad bots I was tired of seeing in my logs include:
– Baidu Spider
– AhrefsBot
– linkdex
There are three easy steps to this process, as follows:
1) Create a badbots.txt file which contains the user-agents that you want to block
2) Create a modsec rule
3) Restart Apache
These instructions are for a CentOS 6.5 (RedHat-based) system. We assume that you already have Apache and modsec configured and that you have root access.
Let’s get to it!
1) Create a file called /etc/httpd/conf/modsec2/badbots.txt and insert the following:
AhrefsBot
Anonymizer
Attributor
Baidu
Bork-edition
DataCha0s
Deepnet Explorer
desktopsmiley
DigExt
feedfinder
gamingharbor
heritrix
ia_archiver
Indy Library
Jakarta
Java
juicyaccess
larbin
linkdex
Missigua
MRSPUTNIK
Nutch
panscient
plaNETWORK
Snapbot
Sogou
TinEye
TwengaBot
Twitturly
User-Agent
Viewzi
WebCapture
XX
Yandex
YebolBot
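If you prefer to build the file from the shell, here is a minimal sketch (it writes to the current directory for illustration; on the server the file belongs in /etc/httpd/conf/modsec2/, and the list is abbreviated here). Note that @pmFromFile treats each line as one phrase, so multi-word entries like “Indy Library” must stay on a single line:

```shell
# Sketch: create badbots.txt with one phrase per line, the format @pmFromFile
# expects. Written to the current directory here; move it to
# /etc/httpd/conf/modsec2/ on the server. Abbreviated list for illustration.
cat > badbots.txt <<'EOF'
AhrefsBot
Baidu
Indy Library
linkdex
Yandex
EOF

# Sanity check: every entry is on its own line.
wc -l badbots.txt
```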
Once you have saved the badbots.txt file, you will want to complete the second step, as follows:
2) Create a rule in /etc/httpd/conf/modsec2/custom.conf
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile badbots.txt" "id:350001,rev:1,severity:2,log,msg:'BAD BOT - Detected and Blocked. '"
This rule filters each webserver request and checks the user-agent against the badbots.txt list.
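Note that the rule as written relies on your SecDefaultAction to actually deny the request. If your default action only logs, a variant with an explicit disruptive action might look like this (standard ModSecurity 2.x syntax; the id and status values are just the ones used in this tutorial):

```apache
# Variant with an explicit deny, for setups whose SecDefaultAction only logs.
# phase:1 evaluates the request headers before the request body is read.
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile badbots.txt" \
    "id:350001,rev:2,phase:1,deny,status:406,log,severity:2,msg:'BAD BOT - Detected and Blocked.'"
```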
3) The final step is to restart your Apache webserver:
service httpd restart
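To confirm the block is working, one quick check (assuming the site answers on localhost) is to send a request with a blocked User-Agent and print just the status code:

```shell
# Request the front page while impersonating a blocked bot, printing only the
# HTTP status code. A working setup should answer 406 for the bad bot and 200
# for a normal browser User-Agent. ("|| true" keeps the check from aborting a
# script when the server is unreachable; curl then prints 000.)
curl -s -o /dev/null -w '%{http_code}\n' -A 'AhrefsBot' http://localhost/ || true
curl -s -o /dev/null -w '%{http_code}\n' -A 'Mozilla/5.0 (X11; Linux x86_64)' http://localhost/ || true
```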
Once these three steps are completed, you will see the denials in your server’s error logs. Each bad bot that visits your site(s) will be denied with a 406 (Not Acceptable) error. Once you’re happy with the way things are working, you can change “log” to “nolog” in your modsec rule.
Enjoy!
Hi,
I am trying to follow the stopping-bad-bots-using-apache-modsec procedure, but my server doesn’t have a /etc/httpd/conf/modsec2/ folder; instead I can see files such as modsec2.conf. I guess my server config is different, so I go to WHM / ModSecurity Tools / Rules List / Add Rule
and I add the rule
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile badbots.txt" "id:350001,rev:1,severity:2,log,msg:'BAD BOT - Detected and Blocked. '"
when I save I get this message:
Error:The rule is invalid. Apache returned the following error: Syntax error on line 1 of -c/-C directives: Error creating rule: Could not open phrase file “-c/badbots.txt”: No such file or directory
I guess it’s a path problem, but I have no clue how to fix it. Can you tell me what to modify in your procedure, please?
Kind Regards,
Mat
You can always use the full / absolute path for badbots.txt, so it doesn’t matter where the file lives. It should be under the /etc/httpd or /etc/apache2 folder, though.
I have a modsec2 directory in the same path as modsec2.conf, so maybe try creating the modsec2 directory and placing badbots.txt in there. Other things you can do are review the modsec2 config files to see if the modsec2 directory is referenced within, and try ‘locate modsec2’ to see if there might be another path where modsec2 resides.
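For example, the rule from the tutorial with an absolute path (adjust the directory to wherever badbots.txt actually lives on your system):

```apache
# Same rule, but with a full path so ModSecurity finds the file regardless of
# which directory the configuration is parsed from.
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile /etc/httpd/conf/modsec2/badbots.txt" \
    "id:350001,rev:1,severity:2,log,msg:'BAD BOT - Detected and Blocked.'"
```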
Dear Krnlpanic,
Thank you for this useful rule.
I added it to my ModSecurity and it logs fine in the error_log and the audit log, but in the access_log it still gives a code 200.
So I’m wondering if the bot is actually blocked, since the code 200 suggests it is not.
Other rules are blocked with a 403.
Is something wrong with my configuration?
Using Plesk 12.5
Thank you very much for your time!
Definitely looks like something’s not working right. You should be receiving 406s, not 200s.
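A 200 in the access log usually means the rule matched (hence the log entries) but nothing actually denied the request. One thing worth checking (this is a guess about the Plesk setup, not a confirmed fix) is whether your SecDefaultAction includes a disruptive action, for example:

```apache
# If the default action only logs ("pass"), matching rules never block and the
# access log keeps showing 200. A blocking default for phase 1 might look like:
SecDefaultAction "phase:1,deny,status:406,log"
```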
Here’s what an entry in my audit log looks like:
And the corresponding entry in my access log: