WordPress Spam Fighting

Until September 2014, I was mostly relying on Akismet to prevent comment spam on my blog. But sometime last summer, I noticed that quite a few spam comments weren’t being identified as such. The reason was that the total number of spam comments had reached about 1000 a day, so having a few of them slip through every day wasn’t actually a bad rate. But it was still a pain to go through them, as I was only checking comments once in a while.

So I decided to find out which methods I could use to prevent comment spam instead of relying only on Akismet. This led me to write a plugin called WP Spam Fighter.

In this plugin, I’ve implemented a few methods designed to address different vectors used to create comment spam.

Spam Bots vs. Normal Users

First, most comment spam is not generated by humans but by bots creating comments on thousands of sites. This makes it easier to trick them, since they are not designed to work around every possible trick used to identify them but to create as many comments as possible on as many sites as possible.

So how do you identify spam bots?

Well, first you have to understand how to identify a normal user:

  1. A normal user sees a rendered web page and not only the HTML code.
  2. A normal user actually reads your posts.
  3. A normal user also understands the fields contained in a comment form.

You should also note that the second characteristic of normal users differentiates them from human spammers as well. Human spammers do see a rendered site and do understand the comment form fields, but do not actually read your post.

OK, so now that we know how to identify a normal user, how do we use this knowledge to stop comment spam?

A normal user sees a rendered web page

This basically means that a normal user will only fill in form fields which are actually visible on the rendered page. Spam bots will fetch the HTML code and will not apply any CSS styles. So if you add a field to the form and make it invisible, normal users will not fill it in, but spam bots might.

This leads us to a spam fighting method usually called a honeypot-based mechanism. Spam bots will usually go through all fields in the form and try to put in some value (since they do not know which fields are mandatory fields and which ones are optional).

The more the additional field looks like a normal form field, the better the chances that even a half-intelligent spam bot will not identify it as a honeypot.
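As an illustration, the server-side half of a honeypot check could look like the following minimal sketch. It is written in Python rather than as WordPress plugin code, and the field name website_url is a hypothetical choice, not something taken from WP Spam Fighter. It assumes the field is rendered in the comment form and hidden from normal users with CSS.

```python
from typing import Mapping

# Hypothetical field name: it should look like an ordinary form field,
# while being hidden from normal users with CSS on the rendered page.
HONEYPOT_FIELD = "website_url"


def is_honeypot_triggered(form: Mapping[str, str], field: str = HONEYPOT_FIELD) -> bool:
    """Return True when the hidden field carries any non-blank value."""
    return form.get(field, "").strip() != ""


# A normal user never sees the field, so it is submitted empty.
human_form = {"author": "Alice", "comment": "Nice post!", "website_url": ""}
# A bot filling in every field it finds in the HTML gives itself away.
bot_form = {"author": "x", "comment": "buy now", "website_url": "http://spam.example"}

print(is_honeypot_triggered(human_form))
print(is_honeypot_triggered(bot_form))
```

A comment for which this check returns True would be marked as spam without looking at its contents.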

A normal user actually reads your posts

Spam bots as well as human spammers do not care about the contents of your post. There is almost no targeted spam which makes sure that the spam comments are added to a post which is actually related to their comment. So even human spammers will just scroll down to the bottom of the post and enter some text in the form fields. Their goal is to produce as much spam as possible in the shortest period of time, so the chances are that they will try to spend as little time as possible on your page.

So a way to fight both human spammers and spam bots is to make sure that a comment can only be posted after the user has spent a certain amount of time on your page. How long this should be depends on the type of content you have on your page. If you just post short jokes, it could be that a legitimate user posts a smiley as a comment after only 20 seconds on your page. If you have long and complex articles on your site, the chances are that nobody will comment on your posts without spending at least a minute on the page.

This spam protection mechanism can be implemented by adding some JavaScript code which returns an error message when you try to submit a comment too early. Of course, anything which resides purely on the client side is out of your control, so it can be circumvented by a spammer. That’s why you also need a check in the backend making sure that the JavaScript check has run (e.g. by adding data to the form before posting it).
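One possible way to make the backend check hard to fake is sketched below in Python. Note that this is a variant, not a description of how the plugin does it: instead of relying on data added by JavaScript, the server embeds a signed timestamp in the form when rendering the page and verifies it on submission. SECRET_KEY, MIN_SECONDS_ON_PAGE and both function names are assumptions made for this example.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"   # hypothetical secret, known only to the server
MIN_SECONDS_ON_PAGE = 30    # assumption: tune this to the kind of content you publish


def issue_form_token(now: Optional[float] = None) -> str:
    """Create a signed timestamp to embed in a hidden field when rendering the form."""
    issued_at = int(time.time() if now is None else now)
    signature = hmac.new(SECRET_KEY, str(issued_at).encode(), hashlib.sha256).hexdigest()
    return f"{issued_at}:{signature}"


def fails_time_check(token: str, now: Optional[float] = None,
                     min_seconds: int = MIN_SECONDS_ON_PAGE) -> bool:
    """Return True if the token is missing or forged, or the comment came too early."""
    current = time.time() if now is None else now
    issued_at_text, _, signature = token.partition(":")
    expected = hmac.new(SECRET_KEY, issued_at_text.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; a token that was not signed by the server is rejected.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return True
    return current - int(issued_at_text) < min_seconds
```

Because the timestamp is signed, a spammer cannot simply edit the hidden field to pretend that more time has passed. The JavaScript error message on the client side then only serves to tell legitimate users to wait a little longer.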

A normal user also understands the fields contained in a comment form

Of course a human spammer also does. So developing a spam protection mechanism based on this will only help against spam bots not specifically targeting your site.

This just involves adding a form field and expecting a given value. Its simplest form is having a checkbox labelled “I am NOT a spammer”. The most complex form is having some kind of captcha. Since I hate captchas, I’ve only implemented the simple checkbox option in my plugin. It’s not much additional work for a legitimate user and will still block stupid spam bots.

In addition to this (or as an alternative), you can implement a second, similar mechanism in which some kind of token is automatically added to the form using JavaScript when the form is submitted. In the backend, you then check for the presence of this token. Spam bots usually do not run JavaScript code on your page, so they will submit the form without this token.
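The backend side of both checks can be sketched in a few lines of Python. The field names not_a_spammer and js_check and the token value are hypothetical. The sketch assumes the checkbox has no explicit value attribute (browsers then submit "on" when it is ticked) and that a small script writes the token into a hidden field when the form is submitted.

```python
from typing import Mapping

CHECKBOX_FIELD = "not_a_spammer"   # hypothetical name of the checkbox
JS_TOKEN_FIELD = "js_check"        # hypothetical hidden field filled in by JavaScript
EXPECTED_JS_TOKEN = "human"        # hypothetical value written by the script on submit


def passes_form_checks(form: Mapping[str, str]) -> bool:
    """Return True only if the checkbox was ticked and the JavaScript token is present."""
    # An unticked checkbox is simply absent from the submitted form data.
    checkbox_ticked = form.get(CHECKBOX_FIELD) == "on"
    token_present = form.get(JS_TOKEN_FIELD) == EXPECTED_JS_TOKEN
    return checkbox_ticked and token_present


print(passes_form_checks({"comment": "hi", "not_a_spammer": "on", "js_check": "human"}))
print(passes_form_checks({"comment": "buy now", "not_a_spammer": "on"}))
```

A fixed token like this one only stops bots that do not run JavaScript and do not target your site specifically, which is exactly the kind of bot this mechanism is meant for. If you use only one of the two checks, drop the other condition.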

Additional ways to identify legitimate users

If you want to make 100% sure that you do not get comment spam, there are a few additional methods you can use to prevent spam:

  1. Check whether a gravatar is associated with the provided email address
  2. Only allow registered users to comment
  3. Completely disable commenting

Of course, for a blog, I wouldn’t recommend disabling commenting, as comments are often valuable input both for you and for your visitors. Many spam fighting mechanisms also make life more difficult for legitimate visitors who want to comment on a page. I personally hate captchas, and every time I get a captcha wrong on the first try or get a text I can hardly read, I just move away.

So forcing users to have a Gravatar or to register in order to comment could reduce the number of visitors who will actually take the time to post a comment.
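For reference, the Gravatar check from the list above can be sketched as follows in Python. It assumes Gravatar’s classic URL scheme, in which an avatar is addressed by the MD5 hash of the trimmed, lower-cased email address and the d=404 parameter asks the service to answer with a 404 instead of a default image. The function names are hypothetical, and the use of a HEAD request is an assumption of this sketch.

```python
import hashlib
import urllib.error
import urllib.request


def gravatar_url(email: str) -> str:
    """Build the avatar URL for an email address, asking for a 404 if none exists."""
    email_hash = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    return f"https://www.gravatar.com/avatar/{email_hash}?d=404"


def has_gravatar(email: str, timeout: float = 5.0) -> bool:
    """Return True if an avatar is registered for this email address (network call)."""
    request = urllib.request.Request(gravatar_url(email), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False   # no avatar is associated with this address
        raise              # any other HTTP error says nothing about the commenter
```

Network failures are deliberately not treated as “no Gravatar” here, so that an outage of the service does not turn every comment into spam.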

Conclusion

Many of the mechanisms presented in this article are fairly simple and not at all bullet-proof. A human spammer would be able to work around them all, and it would also be possible to create a spam bot able to work around them. What you need to keep in mind is:

  • Spam is usually not targeted. The spam comments you get are posted on millions of sites.
  • The goal of spammers is to get through with least effort.

This basically means that even though a spammer might implement a bot to work around these mechanisms, why should he waste his time? There are millions of sites out there and you are only one of them. Even spending an hour implementing a way to work around your spam protection would be a waste of time.

So, of course, a human spammer could wait for 30 seconds before posting a comment on your site. But instead of spending 5 minutes posting 10 spam comments on your site, it’d make more sense to post 30 spam comments on another site which doesn’t have this kind of protection.

On this site, I’ve configured the WP Spam Fighter plugin to enable the following mechanisms:

  • Time based protection
  • Honeypot protection
  • “Not a spammer” checkbox
  • JavaScript human check

And the results are not bad. After switching from Akismet to this plugin, the number of spam comments which got through dropped from 1 or 2 a day to 1 or 2 a month. Since I still get a spam comment every 5 minutes (which is automatically marked as spam by the plugin) and I only check the spam comment folder once in a while, it’s difficult to say whether there are any false positives. But there should not be any, since none of these mechanisms interprets comment data to identify spam. Moreover, the number of comments I get on the site didn’t change after introducing this plugin.

And the great advantage over Akismet is that this plugin doesn’t need to store or transmit any data related to the visitors of this site, which can be a problem in some countries.
