Visual CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are used on most websites that let users sign up or register. Their purpose is to admit only humans into the system and to deny access to automated bots. In this post, I give an overview of CAPTCHA security and of methods to break it.
Working
When a website or system needs to differentiate between a bot and a human, it presents the user with an image containing some text. The user types the text shown in the image into a text field, and the server lets the user in only if the answer matches. The basic assumption here is that recognising text in an image is difficult for a computer but easy for a human being.
E.g.: CAPTCHA image from the Google “Add your URL” page (http://www.google.com/addurl)
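The challenge-and-check flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular website does it: the names `new_challenge`, `verify` and the in-memory `_challenges` store are hypothetical, and rendering the text into a distorted image is left out entirely.

```python
import hmac
import secrets
import string

# Hypothetical in-memory store: challenge id -> expected text.
# A real site would keep this in the user's server-side session.
_challenges = {}

ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length=6):
    """Create a random challenge text and return (challenge_id, text).

    The text would then be drawn into a distorted image (not shown here).
    """
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    challenge_id = secrets.token_hex(8)
    _challenges[challenge_id] = text
    return challenge_id, text


def verify(challenge_id, answer):
    """Check an answer. Each challenge can be used only once."""
    expected = _challenges.pop(challenge_id, None)
    if expected is None:
        return False  # unknown or already used challenge
    given = answer.strip().upper()
    # Compare as bytes so that non-ASCII input cannot raise an error.
    return hmac.compare_digest(expected.encode(), given.encode())
```

Discarding the challenge after the first attempt, right or wrong, is a design choice in this sketch: it stops a bot from guessing repeatedly against the same image.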
Strength of a CAPTCHA
The strength of a particular CAPTCHA depends on the algorithms and parameters used to generate the CAPTCHA image. Each character in the image is rendered in a different way. Some of the methods used are:
- Translation of Characters (Changing Position)
- Scaling of Characters
- Rotation of Characters
- Adding Background Clutter
- Adding Foreground Clutter
- Local Warp
- Global Warp
- Intersecting Random Arcs
- Non-intersecting Random Arcs etc.
All these methods are meant to make recognition difficult for an automated bot. In general, though, they also make recognition more difficult for humans.
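To make the first three methods in the list concrete, here is a minimal sketch of a random translation, scaling and rotation applied to the outline points of one character. The function names and the default ranges (`max_shift`, `scale_range`, `max_angle_deg`) are my own assumptions for illustration. Real generators choose their own parameters and also add warping and clutter, which this sketch does not attempt.

```python
import math
import random


def random_affine(rng, max_shift=3.0, scale_range=(0.8, 1.2), max_angle_deg=25.0):
    """Pick a random rotation, uniform scale and translation for one character."""
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = rng.uniform(*scale_range)
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    return angle, scale, dx, dy


def transform_points(points, angle, scale, dx, dy):
    """Rotate each (x, y) point about the origin, scale it, then translate it."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for x, y in points:
        new_x = scale * (x * cos_a - y * sin_a) + dx
        new_y = scale * (x * sin_a + y * cos_a) + dy
        out.append((new_x, new_y))
    return out


# Usage: distort the corner points of a hypothetical 10x10 glyph box.
rng = random.Random()
glyph = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
distorted = transform_points(glyph, *random_affine(rng))
```

Wider ranges make the image harder for a bot, but as noted above they make it harder for a human reader as well.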
Breaking a CAPTCHA
A very interesting thing to note is that computers are far better than humans at single character recognition. See the research paper “State of single Character Recognition” [by Kumar Chellapilla, Kevin Larson, Patrice Simard and Mary Czerwinski of Microsoft Research] for details. According to this research, a computer-based system can recognise the individual characters of a CAPTCHA better than humans can. The catch is that this result is only about single character recognition. Humans are still better than computers at segmentation (breaking up an image into smaller segments, each containing a single character). But this too may change as technology advances.
So this means that if we can do segmentation (retrieve the portions of the image that contain single characters), we can say that we have successfully cracked the CAPTCHA. For recognising the characters, we can use conventional neural networks. Contrary to general belief, neural networks are not that difficult to master. They are very simple to implement too. Thus, breaking a CAPTCHA essentially boils down to the problem of segmentation.
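As a rough illustration of what segmentation means, here is a minimal sketch of the simplest approach I know of: scan the columns of a black-and-white image and cut wherever a column contains no ink. It assumes the image is already binarised (1 for ink, 0 for background) and that the characters do not touch each other. Clutter, warping and intersecting arcs are used precisely to break that assumption, so treat this as a starting point only. The function names are my own.

```python
def segment_columns(image):
    """Split a binary image into (start, end) column ranges, end exclusive.

    `image` is a list of rows, each a list of 0/1 pixels of equal length.
    A run of consecutive columns that contain ink is one segment.
    """
    if not image:
        return []
    width = len(image[0])
    # A column is "inked" if any row has a foreground pixel in it.
    inked = [any(row[x] for row in image) for x in range(width)]

    segments = []
    start = None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x                      # a new character begins
        elif not has_ink and start is not None:
            segments.append((start, x))    # the character ended at x - 1
            start = None
    if start is not None:
        segments.append((start, width))    # character touches the right edge
    return segments


def crop(image, start, end):
    """Return the sub-image covering columns start..end-1."""
    return [row[start:end] for row in image]


# Usage: two small blobs separated by empty columns.
rows = [
    "0110001100",
    "0110001010",
    "0110001100",
]
image = [[int(c) for c in row] for row in rows]
print(segment_columns(image))  # [(1, 3), (6, 9)]
```

Each cropped segment could then be handed to a single character recogniser such as a neural network.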
In upcoming posts, I intend to look for methods to break the CAPTCHAs of some popular websites.