Thu Mar 8 20:54:44 CET 2007

More on PHPBB anti-spam measures.

Time for a little update on spam posts on PHPBB. As I explained three months ago, we had a spam problem on our PHPBB forum. I installed eXtreme anti-spam and gave my opinion on CAPTCHA.

So, here we are three months later and the verdict is in: we had two cases of spam in three months, and only because we accidentally left two fora open for anyone to post. In other words, zero cases of spam in three months on the protected fora, as opposed to several a day before. eXtreme anti-spam works, and works very well.

You'll need to read my previous post to understand the rest of this.

Gavin posted a comment on improving CAPTCHA:

"We have been chatting about this at work and one of the guys came up with the idea of a moving set of letters to fool the OCR algos. Would require some sort of animated gif or flash movie."

This may seem like a good idea, but I think it misses an important point.

To me, a good anti-spam system is one that is installed on every PHPBB, shipped with the standard distribution (as the simple CAPTCHA system is at the moment). I think it's ok to require some customisation from the user, as would be the case with a built-in eXtreme: you'd have to put a few pictures and questions in. This customisation is fundamental: otherwise (as pointed out by CAPTCHA fans) spammers will just teach the built-in pictures and questions to their spamming engines and the system will break.

This is why I think any CAPTCHA-based system is doomed. As soon as it's shipped with the standard PHPBB distribution, you can trust the spammers to start to try and break whatever system will be built in.

With eXtreme, no two sites would have the same pictures and questions. Unless the spammers manage impressive research into AI, it just cannot be broken (and if they manage to break it, me thinketh they would get better pay at any research lab :) ). With CAPTCHA, at best you'll end up with a customisable set of blurring algorithms. Gavin's colleague's idea is really just another blur.

And that's the problem: it's trivial to read a GIF frame by frame, and then you fall back to the usual OCR problem, at which we've established software can be made better than humans. A Flash movie might be slightly harder to crack, but at the end of the day if your browser can display it, so can the spammer's software, and if your eye can recognise letters in it, so can a trained neural network (and that's not highly advanced AI).

Hence, I remain of the opinion that CAPTCHA cannot work, and that spam must be solved by some easily customisable challenge system.
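
To make that last point a bit more concrete, here is a minimal Python sketch of what a site-customisable challenge could look like. It is not eXtreme's actual code, nor anything shipped with PHPBB: the names SITE_CHALLENGES, pick_challenge and check_answer, the picture file names and the questions are all made up for illustration. The only idea it tries to show is that the pictures and questions live in a list the forum admin writes, so nothing in the standard distribution gives the answers away.

```python
import random

# Hypothetical per-site challenges. Each forum admin would write their own
# entries, so no two installations share the same pictures or questions.
SITE_CHALLENGES = [
    {
        "picture": "club_mascot.jpg",  # made-up file name
        "question": "What animal is in the picture?",
        "answers": {"cat", "kitten"},
    },
    {
        "picture": "town_square.jpg",  # made-up file name
        "question": "What colour is the door on the left?",
        "answers": {"red"},
    },
]


def pick_challenge(rng=random):
    """Pick one of the site's challenges at random.

    Returns (index, challenge). The index is what the forum would remember
    server-side (e.g. in the session) until the reply comes back.
    """
    index = rng.randrange(len(SITE_CHALLENGES))
    return index, SITE_CHALLENGES[index]


def normalise(text):
    """Lower-case the reply and collapse surrounding/repeated whitespace."""
    return " ".join(text.lower().split())


def check_answer(index, reply):
    """Return True if the reply matches one of the accepted answers."""
    if not isinstance(index, int) or not 0 <= index < len(SITE_CHALLENGES):
        return False
    return normalise(reply) in SITE_CHALLENGES[index]["answers"]
```

A registration or posting form would call pick_challenge, show the picture and question, and only accept the post if check_answer returns True for the stored index. Adding or replacing an entry in the list is all the customisation an admin would need to do.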