One of the reasons I don’t provide a comment facility is that it is very hard to prevent spammers from abusing your weblog. The easiest way to prevent spam is a registration system: you can only post a comment once you have registered. The disadvantage is that anonymous visitors, or people who stumbled upon your site and found something interesting, can’t post comments.
One way to let anonymous users post comments is to ask them to type in the text shown in a so-called CAPTCHA image. But if the captcha is too simple, the spammer’s automated systems will recognize the text in the picture and can still put spam in the comments. Captchas also have the disadvantage that people who rely on, for example, braille readers cannot post anonymously.
Leaving the braille-reader problem aside, I think captchas are best implemented in the following manner:
- Make sure that the letters in the captcha are all of a different color and are overlapping. Overlapping letters make it harder for OCR software to guess the ‘word’.
- Don’t always ask the user to type in the whole word, but ask one of the following questions instead:
  - Type in all letters from the captcha (already mentioned, listed here for completeness).
  - Provide a subset of the letters in the captcha, for example the first and the third letter. The subset should not be fixed but chosen at random.
  - Provide the color of one or more of the letters in the captcha. Since not all weblog readers will be native speakers of your weblog’s language, it is best to offer check boxes for the colors so that language mistakes are avoided.
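The random question and the answer check could be sketched roughly as follows. This is Python purely for illustration (the plugin itself would be Bash), and the names `COLORS`, `make_question` and `check_answer`, the palette, and the case-insensitive comparison are all assumptions made for this example.

```python
import random

# Hypothetical color palette; the post does not name specific colors.
COLORS = ["red", "green", "blue", "orange", "purple"]


def make_question(text, rng=random):
    """Pick one of the three question types at random for a captcha text."""
    kind = rng.choice(["all", "subset", "color"])
    if kind == "all":
        return {"kind": "all", "positions": list(range(len(text)))}
    if kind == "subset":
        # Choose two distinct positions, kept in reading order.
        positions = sorted(rng.sample(range(len(text)), 2))
        return {"kind": "subset", "positions": positions}
    # Ask for the color of a single randomly chosen letter.
    return {"kind": "color", "positions": [rng.randrange(len(text))]}


def check_answer(question, text, colors, answer):
    """Compare the visitor's answer with what the captcha really contains."""
    positions = question["positions"]
    if question["kind"] == "color":
        # Color answers come from check boxes, so they arrive as a list of names.
        expected = [colors[i] for i in positions]
        return list(answer) == expected
    # Letter answers are compared case-insensitively (an assumption of this sketch).
    expected = "".join(text[i] for i in positions)
    return answer.strip().lower() == expected.lower()
```

For a captcha showing `kqzmw`, a question with positions `[0, 2]` asks for the first and third letter, so the accepted answer is `kz`. Positions are zero-based in the code and would need to be turned into "first", "third" and so on when the question is shown.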
This should of course be combined with the possibility to create user accounts, so that if a spammer succeeds in breaching these barriers, anonymous posting can always be shut off.
Another possibility is to also work with a blacklist that contains words and sites that are not allowed to appear in a comment.
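A blacklist check along these lines might look like the sketch below. The entries in `BLACKLIST_WORDS` and `BLACKLIST_SITES` and the function name `is_blacklisted` are hypothetical, and matching whole words and host names only is just one possible policy.

```python
import re

# Hypothetical entries; a real list would be maintained by the weblog owner.
BLACKLIST_WORDS = {"casino", "viagra"}
BLACKLIST_SITES = {"spam.example.com"}


def is_blacklisted(comment, words=BLACKLIST_WORDS, sites=BLACKLIST_SITES):
    """Return True if the comment contains a banned word or links to a banned site."""
    lowered = comment.lower()
    # Match whole words only, so a banned word inside a longer word does not trigger.
    tokens = set(re.findall(r"[a-z0-9]+", lowered))
    if tokens & words:
        return True
    # Collect the host names that follow http:// or https://
    hosts = re.findall(r"https?://([^/\s:]+)", lowered)
    # A host matches a banned site exactly or as one of its subdomains.
    return any(h == s or h.endswith("." + s) for h in hosts for s in sites)
```

Whole-word matching keeps false positives down but is easy to evade with deliberate misspellings, so a list like this is probably best seen as a complement to the captcha rather than a replacement.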
Creating captchas on the fly might not always be possible, especially since I will be relying on Bash. So it might be a nice idea to write a tool that creates a large batch of captchas off-line, together with a list of the text and colors used in each one. These can then be uploaded to the site, and the comments plugin would only have to choose a random captcha and a random question (see above) and check the answer against the information in the list. If these ‘standard’ captchas are refreshed on a regular basis, they should provide some nice protection from spammers.
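The off-line list could be as simple as a tab-separated file with one line per captcha. The sketch below assumes a layout of file name, text, and comma-joined colors. The function names, the file naming scheme, and the palette are invented for this example, and the image rendering itself is left out.

```python
import csv
import random
import string

# Hypothetical palette; it needs at least as many colors as letters per captcha.
COLORS = ["red", "green", "blue", "orange", "purple"]


def generate_entries(count, length=5, rng=random):
    """Create captcha descriptions: file name, text, and one color per letter."""
    entries = []
    for n in range(count):
        text = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        # sample() never repeats a color, so every letter gets a different one.
        colors = rng.sample(COLORS, length)
        entries.append({"file": f"captcha{n:04d}.png", "text": text, "colors": colors})
    return entries


def write_manifest(path, entries):
    """Write one tab-separated line per captcha: file, text, comma-joined colors."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for e in entries:
            writer.writerow([e["file"], e["text"], ",".join(e["colors"])])


def pick_random(path, rng=random):
    """Site side: read the manifest and choose one captcha at random."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    name, text, colors = rng.choice(rows)
    return {"file": name, "text": text, "colors": colors.split(",")}
```

A plain tab-separated layout was chosen here because a Bash plugin can read it line by line without extra tooling. The manifest has to live outside the web root, otherwise the answers are public.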
It seems that the ImageMagick tools provide all the functionality needed, so I will have a look at them this weekend.
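As a first guess at what such an ImageMagick call could look like, the sketch below builds an argument list for `convert` that draws every letter separately in its own color. The helper name `convert_command`, the canvas size, and the spacing are assumptions. Whether the letters really overlap depends on the font, so the `step` value would need tuning, and on ImageMagick 7 the command is `magick` rather than `convert`.

```python
def convert_command(text, colors, out_file, pointsize=48, step=26):
    """Build an ImageMagick `convert` argument list, one colored letter at a time."""
    width = step * len(text) + pointsize
    cmd = ["convert", "-size", f"{width}x{pointsize * 2}", "xc:white",
           "-pointsize", str(pointsize)]
    for i, (letter, color) in enumerate(zip(text, colors)):
        # A step smaller than the glyph width pushes neighbouring letters together.
        x = 10 + i * step
        y = int(pointsize * 1.4)  # baseline position, a rough guess
        cmd += ["-fill", color, "-annotate", f"+{x}+{y}", letter]
    cmd.append(out_file)
    return cmd
```

The list can be handed to `subprocess.run(cmd, check=True)` on a machine that has ImageMagick installed. The function only builds the command, so it can be inspected without rendering anything.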