Voting is a basic element of any site that claims to be social. It is a simple way to gather visitors' opinions on a topic, and it works as viral marketing too. However, sooner or later your visitors will question whether your script is secure enough to keep out fake voting bots. So let's cover a couple of ways to keep bots from voting on your topics.
Unwanted votes fall into two main categories: intentional and unintentional. Unintentional ones come from search engine crawlers and similar tools that visit your pages. Intentional ones are made by people who want to skew the voting statistics. Unintentional voters can be reduced with a robots.txt exclusion, though you should not rely on it, since honoring robots.txt is voluntary. It will not stop intentional voters at all.
First, the vote should be a form element or a JavaScript link. The reason is simple: a regular link is likely to be followed by search engines, which will skew your voting results. A plain link is also easy to add to any command-line browser script. Basic checking of the browser (the user agent) is worth having, but it is not enough on its own: the user agent string can be set freely in many of these scripts, so the check protects against search engines but not against malicious voting. Still, blocking the most popular site grabbers in your .htaccess file might be a good idea.
Limiting votes from the same IP address has problems of its own. Although it makes voting more secure, it is unsuitable for local sites: often many users sit behind a common firewall and share the same IP address, and a limit would make your site unusable for them. It is also easy for a script to change its IP address by going through a proxy.
Checking cookies is the most popular way of handling votes, though writing a script that handles cookies is quite easy as well. The most interesting variation of such a voting script set the voting cookie from an image on the voting page and later checked for its value in the voting script. This stopped the most basic kind of voting bot, the kind that requests a single page only.
It is also a good idea to check the referrer of the vote (the page the user came from). If the referrer is blank, the user is most likely either a bot or someone who went to the vote URL directly, though some browsers and proxies strip the header for privacy reasons.
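The referrer check can be sketched roughly as follows. This is a minimal Python illustration, not a complete defense: the host names in `ALLOWED_HOSTS` and the function name `referrer_looks_valid` are made up for the example, and since the header can be forged by a script, the result is better treated as one signal among several.

```python
from urllib.parse import urlparse

# Hypothetical host names of the site that serves the voting page.
ALLOWED_HOSTS = frozenset({"example.com", "www.example.com"})


def referrer_looks_valid(referrer, allowed_hosts=ALLOWED_HOSTS):
    """Return True if the Referer header points at one of our own pages."""
    if not referrer:
        # Blank or missing referrer: a bot, a direct hit, or a stripped header.
        return False
    host = urlparse(referrer).hostname  # None if no host could be parsed
    return host is not None and host in allowed_hosts
```

A vote that fails this check does not have to be rejected outright; it can simply be logged and flagged for later review.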
A CAPTCHA is quite good at preventing fake votes as well, though it does not protect against manual multi-voting. The problem is that it adds complexity to the voting process, so fewer people are willing to express their opinion.
An alternative to a CAPTCHA is using serials in the voting forms. Upon voting, each serial is checked against the value stored in either a cookie or the db. Sadly, this will not protect against a better voting script, one that loads the form first and submits the serial it finds there.
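One way to picture the serial approach is the sketch below, assuming the serials are kept server-side. The in-memory dictionary stands in for the db table, and the names `issue_serial` and `redeem_serial` are invented for the illustration.

```python
import secrets

# serial -> poll id; a stand-in for a db table of outstanding serials.
_pending = {}


def issue_serial(poll_id):
    """Create a random serial to embed as a hidden field in the voting form."""
    serial = secrets.token_urlsafe(16)
    _pending[serial] = poll_id
    return serial


def redeem_serial(poll_id, serial):
    """Accept a vote only if the serial was issued for this poll.

    The serial is removed on the first attempt, so it can never be
    accepted twice, even if the first attempt named the wrong poll.
    """
    return _pending.pop(serial, None) == poll_id
```

Usage: call `issue_serial` when rendering the form and `redeem_serial` when the vote is submitted. A bot that posts straight to the vote URL has no valid serial, while a bot that fetches the form first still gets through.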
So, how do you choose the right method?
Here are some suggestions:
- Decide how secure your form has to be. What would motivate people to push for a particular result? In some cases you do not need protection against intentional voters at all.
- Do not use CAPTCHAs for simple votes: you will get less input on the topic.
- Upon each vote, log everything into the db: time, referrer, browser, IP, cookie value. Later you can use that data to see whether some values repeat too often.
- It will not hurt to check the server logs. A typical voting script will not bother to load CSS, images, or JS. Look for requests from IPs that do not load any images or scripts and just vote.
- Look for spikes in the data gathered.
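The server log check from the list above could look something like this. It is only a sketch: it assumes the log has already been parsed into `(ip, path)` pairs, and the vote URL `/vote` and the list of asset suffixes are placeholders rather than anything specific to a real site.

```python
# Hypothetical vote URL and the file types a real browser would normally load.
VOTE_PATH = "/vote"
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif")


def suspicious_ips(requests):
    """Return IPs that hit the vote URL but never requested any asset.

    `requests` is an iterable of (ip, path) pairs taken from the server log.
    """
    voters = set()
    asset_loaders = set()
    for ip, path in requests:
        path = path.split("?", 1)[0]  # ignore the query string
        if path == VOTE_PATH:
            voters.add(ip)
        elif path.lower().endswith(ASSET_SUFFIXES):
            asset_loaders.add(ip)
    return voters - asset_loaders
```

Returning visitors may have the assets cached, so an IP on this list deserves a closer look, not an automatic ban.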
2 Comments
Morten Skogly · May 18, 2009 at 7:42 am
Good post! I agree that it is more important to make it easy for legit, real human voters to participate than to block bots!
But I was wondering:
Do you have any tips for writing php / sql that filters the votes in different ways? My script logs IP and time of course, and usually people behave very well, but on a recent project we had at least two automated bots working the vote for a while.
I'm not sure what our filter should be, but perhaps we could allow 1 vote per IP every fourth hour. I don't quite know how to write that as a select or a combination of sql and php.
Would love to get some input if you have the time…
m.
Giedrius · May 18, 2009 at 10:41 am
SELECT COUNT(*) FROM votes WHERE ip = '$_SERVER[REMOTE_ADDR]' AND post_time > NOW() - INTERVAL 1 DAY; would give the count of votes from the same IP during the last 24 hours.
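For the "one vote per IP every fourth hour" filter asked about in this thread, a rough sketch might look like the following. It is written in Python with an in-memory SQLite database instead of PHP and MySQL so that it runs on its own; the `votes` table with `ip` and `post_time` columns follows the query above, while `open_db`, `try_vote`, and storing `post_time` as a Unix timestamp are assumptions made for the example.

```python
import sqlite3
import time

WINDOW_SECONDS = 4 * 60 * 60  # one vote per IP every four hours


def open_db():
    """Create an in-memory database with a minimal votes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE votes (ip TEXT NOT NULL, post_time INTEGER NOT NULL)"
    )
    return conn


def try_vote(conn, ip, now=None):
    """Record a vote unless this IP already voted inside the window."""
    now = int(time.time()) if now is None else now
    # Bound parameters keep the IP value out of the SQL string itself.
    (recent,) = conn.execute(
        "SELECT COUNT(*) FROM votes WHERE ip = ? AND post_time > ?",
        (ip, now - WINDOW_SECONDS),
    ).fetchone()
    if recent > 0:
        return False
    conn.execute("INSERT INTO votes (ip, post_time) VALUES (?, ?)", (ip, now))
    conn.commit()
    return True
```

The same idea should carry over to PHP and MySQL by comparing `post_time` against `NOW() - INTERVAL 4 HOUR` in a prepared statement. As the post notes, users behind a shared firewall will all count as one voter under this rule.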