I put together a comprehensive list of bad bots in a gist, along with example nginx and Apache configurations that block them by user agent.

Using robots.txt to block bots is a good place to start. But adding a robots.txt might not be enough, because bad bots often ignore it. To go one step further, block them at the web server.

Block Bad Bots by User Agent Using Nginx

Refer to the example nginx.conf attached to the gist, which blocks all these bad bots for any URL. To block by user agent only for a specific path:

# checks for all bad bots only for the /admin URL
# 444 is nginx-specific: it closes the connection without sending a response

location /admin {
    if ($block_ua) { return 444; }
}

The list is long, so you might not want it in your main nginx configuration. In that case, create a separate blacklist file and include it from nginx.conf. The map directive is only valid in the http context, so the include must sit inside the http block.

# file: nginx.conf
include /etc/nginx/blacklist;


# file: blacklist
map $http_user_agent $block_ua {
   default 0;
   ....
   ....
}
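
The entries elided above are not reproduced here. As a rough sketch of how such a file could be generated, the hypothetical shell function make_map below reads bot names, one per line, and prints a map block with one case-insensitive regex entry (~*) per name. The function name and the choice of a plain substring match per entry are assumptions for illustration, not taken from the attached nginx.conf. Names containing regex metacharacters would need escaping first.

```shell
# Hypothetical helper: turn a list of bot names on stdin into an nginx map block.
# Assumes each name is safe to use as a regex (no unescaped metacharacters).
make_map() {
    printf 'map $http_user_agent $block_ua {\n'
    printf '    default 0;\n'
    # Also handle a final line that lacks a trailing newline.
    while IFS= read -r bot || [ -n "$bot" ]; do
        # Skip empty lines.
        [ -n "$bot" ] || continue
        # "~*" makes the match a case-insensitive regular expression.
        printf '    "~*%s" 1;\n' "$bot"
    done
    printf '}\n'
}
```

For example, make_map < bots.txt > /etc/nginx/blacklist (bots.txt being a hypothetical file of names), followed by nginx -t to validate the configuration before reloading.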

Blocking access with an .htaccess file in Apache

Simply copy the content of apache.conf to the end of your .htaccess and enjoy. If you would rather configure it yourself, there are two methods you can use in Apache to block bad bots.

# Method 1: RewriteEngine
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
# the last condition must not carry [OR]
RewriteCond %{HTTP_USER_AGENT} ^SemrushBot
# [F] answers with 403 Forbidden
RewriteRule ^.* - [F,L]

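# Method 2: BrowserMatchNoCase (sets the "bots" environment variable on a match)
BrowserMatchNoCase "SemrushBot" bots
BrowserMatchNoCase "FlipboardProxy" bots

# Apache 2.2-style access control; on Apache 2.4 it requires mod_access_compat
Order Allow,Deny
Allow from ALL
Deny from env=bots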

Testing your configs

# run in your terminal, changing the agent name and site URL
curl --head -A "CheeseBot" http://mysite.com

Nginx Output

# You should get a response like this
curl: (52) Empty reply from server

Apache Output

HTTP/1.1 403 Forbidden
Date: Sun, 05 Jul 2020 21:15:30 GMT
Server: Apache/2.4.38 (Debian)
Content-Type: text/html; charset=iso-8859-1
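
To check several agents without reading curl's output by hand, the two outcomes above can be told apart in a script. This is a minimal sketch, not part of the gist: check_ua and verdict are hypothetical helper names, and it assumes nginx answers with 444 (curl exit code 52, empty reply) and Apache with 403, as in the outputs above.

```shell
# Hypothetical helper: decide from curl's exit code and the HTTP status
# whether the request was blocked.
verdict() {
    rc=$1
    status=$2
    # Exit code 52 is curl's "Empty reply from server" (nginx return 444).
    if [ "$rc" -eq 52 ] || [ "$status" = "403" ]; then
        echo "blocked"
    else
        echo "allowed"
    fi
}

# Hypothetical helper: send a HEAD request as the given user agent
# and print "blocked" or "allowed".
check_ua() {
    ua=$1
    url=$2
    # --write-out prints only the status code; it is 000 when no reply arrives.
    status=$(curl --silent --head --output /dev/null --max-time 10 \
        --user-agent "$ua" --write-out '%{http_code}' "$url")
    rc=$?
    verdict "$rc" "$status"
}
```

Calling check_ua "CheeseBot" http://mysite.com then prints a single word, which makes it easy to loop over a handful of agent names from the list.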

Caution

This example list also includes some well-known bots, because some people find them too aggressive for their servers. Please review the list before applying it to your server. Depending on your site, you might want to remove some bots from the list.