Simple WordPress anti-spam for shared hosting

I hate spam! The e-mail based stuff is bad enough (and much worse for the past few months) but it’s also invading web sites. WordPress, being such a popular package, is a major target for referrer and comment spam. In case you haven’t already come across these, they’re both techniques by which spammers attempt to place links on your site to their grimy porn / pharma / small caps or whatever sites.


Now at the simplest level these attempts are easy to block by ensuring that logs are not published and all comments must be approved before publication. But comment moderation can become a chore if you’re trying to sort one genuine message from among hundreds of fake ones, so there are some nice WordPress plugins available to help – notably Akismet, Spam Karma 2 and Bad Behavior. These three plugins each take a different approach to dealing with the problem, and they all seem to be effective. I’ve only tested Akismet and SK2 so far – both work well.

So I was reasonably comfortable with the situation (complacent?!) until I saw this thread on WHT titled Drummed out by spambots. Here’s someone with a small WordPress blog and similar anti-spam measures in place, and the sheer volume of spam requests has caused the host to suspend the web site. Now I don’t blame the host for this – if a single site is overloading the server you have to shut it down for the sake of all the other customers – but that’s not much consolation for the innocent victim of the spammers. Remember, nothing was fundamentally wrong with the blog – the spam was not published, the site was not exploited, the anti-spam system was working correctly but the server was simply overloaded by the job of handling comment-posts at an average rate of about 1 every second! In effect, it’s an amplified DoS (denial of service) attack.

For me, it’s an early warning. I run several WordPress blogs, both for my own projects and for client web sites and I really don’t want this to happen to any of them. Fortunately the discussions in that thread led to a participant “extras” proposing what looks to me like a viable solution. This post describes the implementation.

What’s wrong with existing solutions?

Generally, not much at all, but in this situation the processing power required was too great. I’m looking for a simple solution that takes minimum processing power – so no PHP.

Principles of the new system

In WordPress, visitor comments are posted to the PHP script wp-comments-post.php. Writers of evil comment-spam-bots know this and will generally go there directly, whereas real visitors who wish to comment on a post will always view the page first and submit the form it contains. So the aim is to change the name of that script (for genuine visitors), detect all bots going to the original script and ban them!

Implementation

So step 1 is to send real visitors to a different script – I’ll call it “my-wp-comments-post.php” but if you try this yourself choose a name of your own.

So I’ll make a copy of wp-comments-post.php called my-wp-comments-post.php in the main WordPress directory, and in my template files (usually comments.php and comments-popup.php) I’ll change the line:

<form action="<?php echo get_settings('siteurl'); ?>/wp-comments-post.php" etc.

to this:

<form action="<?php echo get_settings('siteurl'); ?>/my-wp-comments-post.php" etc.
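If you have shell access, step 1 can be scripted. What follows is only a sketch: rename_comment_handler and its three arguments are hypothetical placeholders (nothing WordPress provides), and it assumes your templates refer to the script as /wp-comments-post.php and that the new name contains no characters special to sed.

```shell
# Sketch only: give real visitors a privately named comment handler.
# rename_comment_handler WP_ROOT THEME_DIR NEW_NAME  (all three are placeholders)
rename_comment_handler() {
    wp_root=$1      # directory containing wp-comments-post.php
    theme_dir=$2    # directory containing comments.php / comments-popup.php
    new_name=$3     # your own script name, e.g. my-wp-comments-post.php

    # Copy the real handler; the original stays where it is for now.
    cp "$wp_root/wp-comments-post.php" "$wp_root/$new_name" || return 1

    # Point each comment form at the copy, keeping a .bak of every template.
    for tpl in "$theme_dir/comments.php" "$theme_dir/comments-popup.php"; do
        if [ -f "$tpl" ]; then
            sed -i.bak "s#/wp-comments-post\.php#/$new_name#g" "$tpl"
        fi
    done
}
```

It would be called along the lines of rename_comment_handler /path/to/wordpress /path/to/theme my-wp-comments-post.php, where both paths are stand-ins for your own.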

Now, I might allow people a day or so to refresh their cache but after that I’m going to assume that anyone using the original script is an evil spammer and block them. To do this, using the idea suggested by extras in the thread linked above, I’ll replace wp-comments-post.php with a new script that records the visitor’s IP address and then sends them an error response (403 Forbidden). I’m going to record the visitor’s IP address by creating an empty file, for reasons that will become apparent later. Here’s the simple code:

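<?php
// Decoy wp-comments-post.php: record the caller's IP as an empty file, then refuse.
$blockdir = 'my-blocked-ips';
// Keep only digits and dots so the IP is safe to use as a file name (IPv4 only).
$ip = preg_replace('/[^\d.]/', '', $_SERVER['REMOTE_ADDR']);
$file = "$blockdir/$ip";
// Mode 'x' fails if the file already exists; @ hides that warning so it
// cannot be output before the header below.
if ( $h = @fopen($file, 'x') ) fclose($h);
header('HTTP/1.0 403 Forbidden');
exit;
?>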

I’ve also created a directory called my-blocked-ips (again choose a different name if you want to do this) and made it writable by PHP – chmod 777 for a mod_php system. So, each hit on the old comment script records the visitor’s IP address. Now to do something with it…
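To try the ban without waiting for a bot to turn up, the same kind of marker file can be created by hand. Here is a minimal shell sketch that mirrors the PHP above; block_ip is a hypothetical helper name of my choosing, and like the PHP it only copes with IPv4 addresses.

```shell
# Sketch: create a block marker by hand, mirroring the PHP decoy script.
# block_ip BLOCKDIR ADDRESS  -> prints the marker file path
block_ip() {
    blockdir=$1
    # Keep only digits and dots, as the PHP preg_replace does.
    ip=$(printf '%s' "$2" | tr -cd '0-9.')
    if [ -z "$ip" ]; then
        return 1
    fi
    # Do not touch an existing marker, so its age (and the ban's) is unchanged.
    if [ ! -e "$blockdir/$ip" ]; then
        touch "$blockdir/$ip" || return 1
    fi
    printf '%s\n' "$blockdir/$ip"
}
```

For example, block_ip /full/path/to/my-blocked-ips 203.0.113.7 would ban a documentation-range address, and deleting the file lifts the ban again.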

.htaccess modifications

What we want to do is find the visitor’s IP address, look for a file of that name in the my-blocked-ips directory and if it’s found, block all access to the site. Here’s the mod_rewrite black magic to do it:

RewriteEngine On
RewriteBase /
# Use a simple HTML error document in place of the standard .shtml one
ErrorDocument 403 /403.html
# Ban blocked IPs (recorded as files in blocked directory)
RewriteCond %{REQUEST_URI} !^/403\.html
RewriteCond /full/path/to/my-blocked-ips/%{REMOTE_ADDR} -f
RewriteRule . - [F,L]
# Prevent direct access to blocked directory
RewriteCond %{REQUEST_URI} ^/my-blocked-ips/
RewriteRule . - [F,L]

First I’ve set up a special error page for 403 errors – one that requires very little processing, so just a very small, simple HTML page. Then comes the block itself: any request from an IP found in the block list gets a 403, with our new error page specifically excluded so that it can still be served. Finally, just as a precaution, I’ve blocked direct access to the blocked-IP directory (as it’s server-writable I don’t want anyone to place a PHP file in there and run it).
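The order of those rules is easier to reason about when written out as plain logic. The shell sketch below is not something Apache runs; would_block is a hypothetical helper that simply predicts the outcome for one request, assuming the directory and error page names used in this post.

```shell
# Sketch: predict what the rewrite rules do for a single request.
# would_block BLOCKDIR CLIENT_IP REQUEST_PATH  -> prints "403" or "pass"
would_block() {
    blockdir=$1
    ip=$2
    path=$3
    # Rule 1: a banned IP gets 403 everywhere except the error page itself.
    case $path in
        /403.html*) ;;
        *)
            if [ -f "$blockdir/$ip" ]; then
                echo 403
                return 0
            fi
            ;;
    esac
    # Rule 2: nobody may fetch anything from the marker directory.
    case $path in
        /my-blocked-ips/*)
            echo 403
            return 0
            ;;
    esac
    echo pass
}
```

So a banned address asking for /index.php is refused, the same address asking for /403.html is still served the error page, and an unbanned visitor is only refused inside /my-blocked-ips/.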

Cleaning up

It’s probably a good idea to periodically remove the files from my-blocked-ips, partly to improve performance checking for a file in that directory but mainly because innocent people on dynamic IP addresses could be unfairly banned – say if the IP address happens to be one previously (ab)used by an exploited computer. So a cron job to periodically un-ban older addresses would seem useful, something like this:

find /full/path/to/my-blocked-ips/ -maxdepth 1 -type f -mtime +2 -exec rm -f {} \;
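For a cron job it may be handier to wrap the cleanup in a small function with the age as a parameter. This is a sketch under assumptions: expire_blocks is a hypothetical name, the default of two days is just the figure used in this post, and -maxdepth requires GNU or BSD find. Note that -mtime +N selects files last modified more than N whole days ago, which is what "older" needs.

```shell
# Sketch: remove block markers older than a given number of days.
# expire_blocks BLOCKDIR [DAYS]   (DAYS defaults to 2)
expire_blocks() {
    blockdir=$1
    days=${2:-2}
    # -mtime +N: last modified more than N whole days ago.
    find "$blockdir" -maxdepth 1 -type f -mtime +"$days" -exec rm -f {} \;
}
```

Saved in a script that ends with expire_blocks /full/path/to/my-blocked-ips 2, it could then be run once a day from crontab.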

Afterthought

Ok, I should have thought of this before but I don’t want this thing blocking the Googlebot (or any other search-engine spiders). If everything goes right it shouldn’t happen anyway, since there are no links leading to that page, but just to be on the safe side I’ve added this to robots.txt:

User-agent: *
Disallow: /wp-comments-post.php

Well-mannered bots will read that and not touch the blocking script. Spam-bots will most likely ignore it.

Does it work?

Too soon to tell. This site (like most WordPress blogs I guess) does receive a fair amount of comment spam – Akismet is showing 81 comment spams from the past 12 days. I’ll keep an eye on the system before and after the change and update this post later. I think success hangs on one important question: do the spam-bots actually look at the comment form in the pages or just default to the WordPress standard script? I’m assuming the latter, but if I’m wrong then the comment spam will continue coming in through the real comment form, in which case something smarter will be needed – perhaps based on the Bad Behavior plugin. We shall see.

Meanwhile, if anyone has relevant comments I’d be interested to see them (if only so I know the site is still working!)

Update (22 Jan 2007)

First, doing this requires some care to avoid either blocking legitimate users or letting the spam through unabated. With the site updates to version 2.0.6 and then 2.0.7 in quick succession it went through a short period in both of those states, so if any genuine visitors tried to post a comment and got themselves blocked for a day or two – sorry, mea culpa. I now have it set up with no modifications to the standard WordPress files or directories (except for my own template) which should allow for painless upgrades.

(If anyone’s interested in the file system setup let me know and I’ll post the details).

All that makes it hard to analyse results but it appears to me that:

  1. During the period when the system was working correctly, very few spammers were caught and blocked – suggesting that in most cases they aren’t simply going to wp-comments-post.php without checking that it’s the target of the comment form.
  2. The rate of comment spams overall has reduced dramatically. Even when the system was effectively disabled the site received 31 spam comments in 15 days, about one third of the rate earlier.
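As a quick check on that "one third", here is the arithmetic as a small shell sketch. It assumes that "the rate earlier" means the Akismet figure quoted before the change (81 spams in 12 days); rate is just a hypothetical helper name.

```shell
# Sketch: compare daily spam rates using the figures quoted in this post.
rate() {
    # rate COUNT DAYS -> spams per day, to two decimal places
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}
before=$(rate 81 12)   # before the change (assumed baseline)
after=$(rate 31 15)    # the later 15-day period
ratio=$(awk -v a="$after" -v b="$before" 'BEGIN { printf "%.2f\n", a / b }')
echo "before: $before/day, after: $after/day, ratio: $ratio"
```

That works out at 6.75 per day before and 2.07 per day after, a ratio of 0.31, so a little under one third.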

The best way I can interpret these rather conflicting results is that comment spammers may look for a standard WordPress install – anything non-standard and they simply move on to the next victim. If so, reducing comment spam may be easier than I thought…

I’ll update with some results after the system has been in action (and working) for a while longer.

One Response to “Simple WordPress anti-spam for shared hosting”

  1. Brian Says:

    Hmmm … some great stuff here, thanks for the resources; hadn’t realized that it was just so easy to install Akismet/Spam Karma etc!

    Re the site getting kicked off the server for 1 spam attempt per second – that’s extreme, and shows that the host didn’t know what it was doing, at least to some extent. I haven’t tested this out, so I’ll pull my head in if required, but I can’t imagine that a simple spam block should consume much resource. Perhaps they were doing an Akismet check (possibly resource-heavy) first and putting something else in front of that might have knocked off much of the attempts. I do remember hearing somewhere else that changing the comment form name was enough to knock off spam and that’s a neat trick; there are some similar things that can be done with MX records to kill off spam (‘nolisting’).

    There are some nice free firewall products available now which should block such attempts automatically (CSF); mind you, I’m not sure if they do, but it could save a server quite some CPU time if it was able to proactively firewall such IPs.
