KrnlPanic's Linux Notes and Tips

Working with Linux since kernel version 2.0.30

Stopping bad bots using Apache Modsec

Some web crawlers (i.e., bots), such as Googlebot, msnbot, and Yahoo! Slurp, are important to the well-being of your website. There are also plenty of bots out there that do nothing to help you and can actually harm you by eating up valuable bandwidth.

This brief tutorial will show how you can implement a blocking mechanism for all domains on your server by using modsec rules that target the User-Agents associated with “Bad Bots”. Some of the Bad Bots that I was tired of seeing in my logs include:
– Baidu Spider
– AhrefsBot
– linkdex

There are three easy steps to this process, as follows:

1) Create a badbots.txt file which contains the user-agents that you want to block
2) Create a modsec rule
3) Restart Apache

These instructions are for a CentOS 6.5 (RedHat-based) system and assume that you already have Apache and modsec configured and that you have root access.

Let’s get to it!

1) Create a file called /etc/httpd/conf/modsec2/badbots.txt and insert the following:

AhrefsBot
Anonymizer
Attributor
Baidu
Bork-edition
DataCha0s
Deepnet Explorer
desktopsmiley
DigExt
feedfinder
gamingharbor
heritrix
ia_archiver
Indy Library
Jakarta
Java
juicyaccess
larbin
linkdex
Missigua
MRSPUTNIK
Nutch
panscient
plaNETWORK
Snapbot
Sogou
TinEye
TwengaBot
Twitturly
User-Agent
Viewzi
WebCapture
XX
Yandex
YebolBot
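
The list above can be dropped into place in one step with a here-document. A minimal sketch, written to the current directory here so it is easy to try out; on the server the file belongs at /etc/httpd/conf/modsec2/badbots.txt (list shortened for brevity):

```shell
# Write a (shortened) phrase file: one user-agent fragment per line.
# On the server, target /etc/httpd/conf/modsec2/badbots.txt instead.
cat > badbots.txt <<'EOF'
AhrefsBot
Baidu
linkdex
Sogou
Yandex
EOF
wc -l badbots.txt   # 5 badbots.txt
```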

Once you have saved the badbots.txt file, you will want to complete the second step, as follows:

2) Create a rule in /etc/httpd/conf/modsec2/custom.conf

SecRule REQUEST_HEADERS:User-Agent "@pmFromFile badbots.txt" "id:350001,rev:1,severity:2,deny,status:406,log,msg:'BAD BOT - Detected and Blocked.'"

This rule checks the User-Agent header of every webserver request against the phrase list in badbots.txt and denies any match with a 406 [Not Acceptable] response. The explicit deny,status:406 actions make the block unambiguous even if your SecDefaultAction is set to something other than deny. Note that a relative filename given to @pmFromFile is resolved against the directory containing the rule file, so badbots.txt must live alongside custom.conf (or be specified with an absolute path).
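For the curious, here is a simplified Python sketch (not ModSecurity's actual implementation) of what the @pmFromFile operator is doing: it loads one phrase per line and fires when any phrase appears, case-insensitively, anywhere in the User-Agent header. The real engine compiles the phrases into an Aho-Corasick automaton rather than scanning them one by one.

```python
def load_phrases(text):
    """Parse a phrase file: one phrase per line, blank lines skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def is_bad_bot(user_agent, phrases):
    """True if any phrase occurs (case-insensitively) in the User-Agent."""
    ua = user_agent.lower()
    return any(phrase.lower() in ua for phrase in phrases)

# Shortened stand-in for the contents of badbots.txt
badbots = load_phrases("AhrefsBot\nBaidu\nlinkdex\n")

print(is_bad_bot("Mozilla/5.0 (compatible; linkdexbot/2.0; "
                 "+http://www.linkdex.com/bots/)", badbots))          # True
print(is_bad_bot("Mozilla/5.0 (Windows NT 10.0) Firefox/45.0", badbots))  # False
```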

3) The final step is to restart your Apache webserver:

service httpd restart

Once these three steps are completed, you will see the denials in your server’s error logs. Each Bad Bot that visits your site(s) will be denied with a 406 error [Not Acceptable]. Once you’re happy with the way things are working, you can change “log” to “nolog” in your modsec rule.
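To see how often the rule is firing, you can also tally the 406s in your access log. A minimal sketch, run here against a two-line sample in the combined log format (on CentOS the real log is typically /var/log/httpd/access_log):

```shell
# Two sample access-log lines; only the second was denied with a 406.
cat > sample_access_log <<'EOF'
1.2.3.4 - - [02/Jun/2016:21:30:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
54.174.251.81 - - [02/Jun/2016:21:30:18 +0000] "GET /x HTTP/1.1" 406 - "-" "Mozilla/5.0 (compatible; linkdexbot/2.0)"
EOF
# Field 9 of the combined log format is the HTTP status code.
awk '$9 == 406 { n++ } END { print n+0 }' sample_access_log   # prints 1
```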

Enjoy!

5 Comments on “Stopping bad bots using Apache Modsec”


  • Hi,
    I am trying to follow the stopping-bad-bots-using-apache-modsec procedure, but my server doesn’t have an /etc/httpd/conf/modsec2/ folder; instead I see files such as modsec2.conf. I guess my server config is different, so I go to WHM / ModSecurity Tools / Rules List / Add Rule
    and I add the rule
    SecRule REQUEST_HEADERS:User-Agent "@pmFromFile badbots.txt" "id:350001,rev:1,severity:2,log,msg:'BAD BOT - Detected and Blocked. '"
    when I save I get this message:
    Error:The rule is invalid. Apache returned the following error: Syntax error on line 1 of -c/-C directives: Error creating rule: Could not open phrase file “-c/badbots.txt”: No such file or directory

    I guess it’s a path problem, but I have no clue how to fix it. Can you tell me what to modify in your procedure, please?

    Kind Regards,
    Mat

    • You can always use the full (absolute) path for badbots.txt, so it doesn’t matter where the file lives. It should still be somewhere under the /etc/httpd or /etc/apache2 folder, though.

  • I have a modsec2 directory in the same path as modsec2.conf – so maybe try creating the modsec2 directory and placing badbots.txt in there. Other things you can do are review the modsec2 config files to see if the modsec2 directory is referenced within, and try ‘locate modsec2’ to see if there might be another path where modsec2 resides.

  • Dear Krnlpanic.

    Thank you for this useful rule.
    I added it to my ModSecurity and it logs fine in the error_log and the audit log, but in the access_log it still gives a code 200.
    So I’m wondering if the bot is actually blocked, since the code 200 indicates it is not.
    Other rules are blocked with a 403.

    Is something wrong with my configuration?
    Using Plesk 12.5

    Thank you very much for your time!

    • Definitely looks like something’s not working right. You should be receiving 406’s, not 200’s.

      Here’s what an entry in my audit log looks like:

      Message: Access denied with code 406 (phase 2). Matched phrase "linkdex" at REQUEST_HEADERS:User-Agent. [file "/usr/local/apache/conf/modsec2/custom.conf"] [line "3"] [id "350001"] [rev "1"] [msg "BAD BOT - Detected and Blocked. "] [severity "CRITICAL"]
      Action: Intercepted (phase 2)
      

      And the corresponding entry in my access log:

      54.174.251.81 - - [02/Jun/2016:21:30:18 +0000] "GET /product_info.php?products_id=48043 HTTP/1.1" 406 - "-" "Mozilla/5.0 (compatible; linkdexbot/2.0; +http://www.linkdex.com/bots/)"