Escalating the war on comment spam
Well, it had to happen sooner or later, I guess. Someone finally managed to get through my spam defenses. They were pretty basic; an experiment, really. I had:
- renamed wp-comments-post.php
- HTML-entity-encoded the comment form’s action attribute (the value being my renamed wp-comments-post.php)
- added a hidden input field to the form that only got written to the page via JavaScript
- updated the renamed wp-comments-post.php to check that the hidden field had been submitted and had a value
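To make the entity-encoding step concrete, here is a minimal sketch of how a form action could be turned into numeric HTML entities. The helper name `toHtmlEntities` and the filename `my-renamed-post.php` are made up for illustration; this is not the code running on this site.

```javascript
// Hypothetical helper: encode every character of a string as a decimal
// HTML entity, e.g. "a" becomes "&#97;".
function toHtmlEntities(text) {
  return Array.from(text)
    .map((ch) => `&#${ch.codePointAt(0)};`)
    .join('');
}

// 'my-renamed-post.php' is a placeholder filename (an assumption).
const encodedAction = toHtmlEntities('my-renamed-post.php');

// A browser decodes the entities when it parses the attribute, so the
// form still posts to the right script.
const formTag = `<form action="${encodedAction}" method="post" id="commentform">`;
console.log(formTag);
```

A browser treats the encoded and plain versions of the attribute identically, so this only slows down scrapers that never decode entities.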
I did all this because I really don’t like captchas (and because I could). I don’t mind requiring JavaScript in order to post a comment. It’s a pretty clever way to check that a human is sitting at a fully featured web browser submitting the comment, because bots don’t usually come with JavaScript engines; the overhead is too great.
The reverse strategy, using a negative captcha (which I haven’t yet employed), involves adding an additional input field directly to the form and then hiding it with CSS. If it gets submitted with any content, you’ve caught yerself a spambot.
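A negative captcha check could look roughly like the sketch below. The field name `website_url`, the CSS class `hp`, and the function `looksLikeSpambot` are all assumptions for illustration, and the real check would live in whatever server-side script handles the comment POST. It is written in JavaScript here only to match the other code in this post.

```javascript
// Hypothetical field name; something a bot would be tempted to fill in.
const HONEYPOT_FIELD = 'website_url';

// The field sits in the markup but is hidden from people with CSS, e.g.:
//   <input type="text" name="website_url" class="hp" autocomplete="off">
//   .hp { display: none; }

// Returns true when the hidden field came back with any content.
// `postFields` is assumed to be a plain object of submitted form values.
function looksLikeSpambot(postFields) {
  const value = postFields[HONEYPOT_FIELD];
  return typeof value === 'string' && value.trim() !== '';
}
```

A human never sees the field and leaves it empty; a bot that fills in every input it finds gives itself away.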
The first batch of 90 spam comments that came through (the largest I’d ever received at once) all had the same IP address, so it was easy to blacklist. The second batch all had different IPs (wow, a coordinated attack!) but all the spam linked to hometown.aol.com subsites. Easy block, but one that made me uneasy about the possibility of blocking legitimate content.
Then yesterday morning, the spam was coming in as I was checking my site (all linking to ca.geocities.com sites). Each time I reloaded there was something new. This provided me with the perfect spam laboratory. First, I renamed my already renamed wp-comments-post.php file, re-encoded the new name in HTML entities, and updated the comment form. It had no effect. I tailed my httpd access logs and the spammers instantly started POSTing to the new script, which means they’re smart enough to regex the page they’re spidering for the form’s action attribute and decode it.
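To show why entity encoding is such a thin defense, here is a sketch of how little it takes to find and decode a form action. I have no idea what the spammers actually run; the function names and the sample markup are assumptions for illustration only.

```javascript
// Decode hexadecimal (&#x70;) and decimal (&#112;) numeric character references.
function decodeNumericEntities(text) {
  return text
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
}

// Pull the action attribute out of the first <form> tag, if there is one.
function findFormAction(html) {
  const match = /<form\b[^>]*\baction\s*=\s*["']([^"']*)["']/i.exec(html);
  return match ? decodeNumericEntities(match[1]) : null;
}

// Sample markup with a made-up, entity-encoded action.
const page = '<form action="&#110;&#101;&#119;&#46;&#112;&#104;&#112;" method="post">';
console.log(findFormAction(page)); // prints "new.php"
```

One regex and one decode pass are enough, so renaming the script and re-encoding its name changes nothing for a bot that re-reads the page before each POST.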
As for how they got past my hidden JavaScript captcha, I can only speculate that their regexes were just liberal enough to gather that they needed to POST something for my very unobfuscated hidden input field, which sat in plain sight inside a document.write statement. If I’d had more time, I would have written the full POST array to a database table to get some more insight into what exactly they were sending my way.
So I upped the ante by creating an external JavaScript file that looks like this:
    // Runs after the comment form has been parsed; adds the hidden field via the DOM.
    var commentform = document.getElementById('commentform');
    if (commentform !== null) {
        var hidden_input = document.createElement('input');
        hidden_input.type = 'hidden';
        hidden_input.name = 'some_name';
        hidden_input.value = 'some_value';
        commentform.appendChild(hidden_input);
    }
The script is loaded and executed after the browser parses my comment form. It creates a new input element in the document’s DOM, sets its type, name, and value attributes, and appends it to the form. I then updated my renamed wp-comments-post.php to check for the hidden field and its expected value.
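The server-side half of that check can be sketched as follows. The real check lives in PHP inside the renamed wp-comments-post.php; this version is in JavaScript only to match the script above, and the function name `passesHiddenFieldCheck` and the shape of `postFields` are assumptions.

```javascript
// Placeholder name and value, mirroring the placeholders in the script above.
const EXPECTED_NAME = 'some_name';
const EXPECTED_VALUE = 'some_value';

// Accept the comment only if the hidden field was submitted AND carries
// the expected value. `postFields` is assumed to be a plain object of
// submitted form values.
function passesHiddenFieldCheck(postFields) {
  return (
    Object.prototype.hasOwnProperty.call(postFields, EXPECTED_NAME) &&
    postFields[EXPECTED_NAME] === EXPECTED_VALUE
  );
}

console.log(passesHiddenFieldCheck({ comment: 'Nice post', some_name: 'some_value' }));
console.log(passesHiddenFieldCheck({ comment: 'Buy pills' }));
```

Checking the value as well as the field's presence matters: a bot that scrapes the field name and posts an arbitrary value still fails.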
Take that, spambots. Have a nice weekend.