By Dr. Eric Glover, Classification Architect, Searchme
We have all seen the CAPTCHAs that pop up before we can do something on a site that we’ve never used before. They are often in the form of “bent text” mixed with some type of distraction (a wavy line, a background image, etc.).
Why have CAPTCHAs? Imagine that a web site has an online blog that allows anyone to comment. It is easy for spammers to automatically generate thousands of comments, possibly outnumbering the legitimate ones. This creates the need for a simple technology that can tell humans apart from computers. CAPTCHA, or “Completely Automated Public Turing test to tell Computers and Humans Apart,” a term currently trademarked by Carnegie Mellon, is designed for exactly this task.
CAPTCHAs and comment spammers are engaged in an ongoing war, with the CAPTCHA creators looking for ways to vary the images before spammers can create a way to defeat them. (See the recent Boing Boing article at http://www.boingboing.net/2008/07/15/apres-captcha-le-del.html.) In general, however, why are these seemingly simple CAPTCHA “tests” so hard for computers and so easy for people?
CAPTCHAs take advantage of the differences in how humans and computers approach particular problems, in this case a symbol identification task. For the challenge of “which letters or numbers do you see in this image,” computers usually employ a few typical approaches to determine what “letters” are in an image. For example, OCR (Optical Character Recognition) is often used to pull the words out of printed documents or faxes and put them into a text document. Typically OCR works by identifying a region of interest that might contain a word, and then selecting the word(s) that are most likely, given the pixels in the selected region.

Roughly speaking, a computer using the current algorithms operates from the bottom up: it examines the individual pixels and then decides which of all of the possibilities is the best match. If you have a big letter “E” (like what you might see on an eye exam at the DMV), the computer sees a bunch of pixels/lines and estimates a confidence for each candidate (say 90% “E”, 40% “F”, 10% “8”) and then decides whether it is confident enough to declare this an “E”. On the other hand, if you put up a picture of a person’s face, it would say that there is no letter there, since all of the confidences are too low.
A human, however, is very good at recognizing patterns and taking advantage of context. So imagine that the “E” is superimposed on a background of squiggly lines. A human sees squiggly lines and the letter “E,” but the computer might not be able to see the big picture and gets confused by the pixels near the “E.”
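To make the bottom-up idea concrete, here is a minimal sketch in Python. It is not how any particular OCR product works; the 5x5 templates, the pixel-agreement score, and the 0.9 threshold are all assumptions chosen for illustration. It compares an image to each template pixel by pixel and declares a letter only if the best score clears the threshold.

```python
# Hypothetical 5x5 templates; '#' is ink, '.' is background.
TEMPLATES = {
    "E": ["#####", "#....", "####.", "#....", "#####"],
    "F": ["#####", "#....", "####.", "#....", "#...."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
}

def match_score(image, template):
    """Fraction of pixels on which the image and the template agree."""
    pairs = [(a, b)
             for image_row, template_row in zip(image, template)
             for a, b in zip(image_row, template_row)]
    return sum(a == b for a, b in pairs) / len(pairs)

def classify(image, threshold=0.9):
    """Return (label or None, scores); None means 'no letter found'."""
    scores = {label: match_score(image, t) for label, t in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores

clean_e = TEMPLATES["E"]
# The same "E" with four stray pixels of "squiggle" drawn over it.
noisy_e = ["#####", "#.##.", "####.", "##.#.", "#####"]

print(classify(clean_e)[0])  # E
print(classify(noisy_e)[0])  # None
```

In this toy setup, four stray pixels are enough to pull the best score down to 0.84, below the threshold, so the program reports no letter at all, even though a person would still read the shape as an “E” at a glance.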
In the case of the popular CAPTCHAs, the generator takes advantage of particular manipulations that do not make the problem hard for humans but do break the ways in which OCR works. Common techniques include adding “noise” (such as extra lines or a background image), rotating the letters (in many dimensions), crowding (putting letters too close together), and using non-letters (e.g., pictures of cats vs. dogs).
Each of these methods interferes with the specific ways that OCR systems operate. When two letters overlap (due to crowding), the OCR software has a very hard time separating the letters and can’t tell them apart. Combining crowding with rotation makes it even harder for a computer to see that a particular segment is part of a different letter because of the angle. Added noise, either in the form of backgrounds or extra lines, is easy for a human to handle, since a person sees a line and subtracts it in his or her head, but a computer sees a region of pixels and cannot see the larger context. Likewise, with a background (say, a big letter “E” inserted on a picture of a person), a computer gets confused and doesn’t know where to look; it can’t tell which pixels are part of the letter and which are the face.
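The crowding problem can be illustrated with a small sketch. One simple way to separate letters (an assumption for illustration, not necessarily what any given OCR system does) is to cut the image wherever a column contains no ink. The function name and the tiny bitmaps below are hypothetical.

```python
def segment_columns(image):
    """Split a bitmap into glyph regions wherever a fully blank column occurs.

    Returns a list of (start, end) column ranges, with end exclusive.
    """
    width = len(image[0])
    has_ink = [any(row[x] == "#" for row in image) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new glyph begins
        elif not ink and start is not None:
            segments.append((start, x))    # a blank column ends the glyph
            start = None
    if start is not None:
        segments.append((start, width))
    return segments

# "E" and "F" with one blank column between them.
spaced = ["###.###",
          "#...#..",
          "###.###",
          "#...#..",
          "###.#.."]
# The same two letters pushed together until they touch.
crowded = ["######",
           "#..#..",
           "######",
           "#..#..",
           "####.."]

print(segment_columns(spaced))   # [(0, 3), (4, 7)]
print(segment_columns(crowded))  # [(0, 6)]
```

With a gap, the sketch finds two regions that could each be matched against letter templates. Once the letters touch, it finds a single six-column blob that matches no single letter well.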
In order for a CAPTCHA to work, however, it is important that there be a sufficient number of variations of the “interference.” For example, if a site only adds two-pixel-high horizontal lines, a human can program the computer to look for lines of exactly two pixels and remove them the same way that a person does naturally. Likewise, if a CAPTCHA only picks dictionary words, it is very easy for the computer to “generate and test” all dictionary words and guess which is most likely, instead of trying to read each letter.
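Both weaknesses can be sketched in a few lines of Python. The code below assumes a deliberately weak, hypothetical CAPTCHA whose only noise is full-width horizontal lines exactly two pixels high and whose answers always come from a known word list; the function names, the bitmap, and the word list are made up for illustration.

```python
def remove_two_pixel_lines(image):
    """Erase full-width horizontal lines that are exactly two rows high.

    A pixel on the line is kept only if there is ink directly above and
    below the line, on the guess that a letter stroke passes through it.
    """
    rows = [list(r) for r in image]
    height = len(rows)
    full = [all(c == "#" for c in r) for r in rows]
    y = 0
    while y < height:
        if not full[y]:
            y += 1
            continue
        end = y
        while end < height and full[end]:
            end += 1                       # extend over consecutive full rows
        if end - y == 2:                   # only the known two-pixel pattern
            for x in range(len(rows[y])):
                above = y > 0 and rows[y - 1][x] == "#"
                below = end < height and rows[end][x] == "#"
                if not (above and below):
                    rows[y][x] = rows[y + 1][x] = "."
        y = end
    return ["".join(r) for r in rows]

def best_dictionary_guess(noisy_reading, dictionary):
    """Generate and test: score every same-length word against the reading."""
    candidates = [w for w in dictionary if len(w) == len(noisy_reading)]
    return max(candidates,
               key=lambda w: sum(a == b for a, b in zip(w, noisy_reading)),
               default=None)

# An "L" with a two-pixel-high line drawn through rows 2 and 3.
noisy_l = ["#....", "#....", "#####", "#####", "#....", "#....", "#####"]
print(remove_two_pixel_lines(noisy_l))

# A per-letter reading with two characters misread.
print(best_dictionary_guess("p1an3t", ["planet", "plaque", "garden", "pocket"]))
# planet
```

The line remover restores the clean “L” and leaves its one-pixel-high base alone, because it only targets the one pattern it was told about. The dictionary guesser never has to read the two damaged characters; it only has to find the word that agrees with the rest.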
Many CAPTCHAs have been broken, some by using humans (promising people free stuff if they answer a CAPTCHA from some other site), and some by using machines (taking advantage of the specific way a particular CAPTCHA is generated, as discussed above). In addition, making a CAPTCHA system fail does not require getting the right answer 100% of the time; getting it one in ten times by guessing can still allow for a lot of spam.
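The arithmetic behind the one-in-ten point is simple enough to check directly. The sketch below assumes each attempt succeeds independently with the same probability; the attempt counts are made-up numbers for illustration.

```python
def expected_successes(attempts, success_rate):
    """Average number of attempts that get past the CAPTCHA."""
    return attempts * success_rate

def chance_of_at_least_one(attempts, success_rate):
    """Probability that at least one of the attempts gets through."""
    return 1 - (1 - success_rate) ** attempts

# A bot that guesses right one time in ten:
print(expected_successes(10_000, 0.1))               # 1000.0
print(round(chance_of_at_least_one(20, 0.1), 3))     # 0.878
```

Under these assumptions, a bot that submits 10,000 comments still lands about 1,000 of them, and it needs only 20 tries to have roughly an 88% chance of getting at least one through.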
In sum, CAPTCHAs are hard for computers because the current algorithms used for OCR operate at the pixel level and try to match pre-defined patterns or shapes. This makes computers poor at handling variations of letters or detecting their presence in specially crafted noise, whereas a human can look at a whole image, apply context, and recognize letters through noise. It remains to be seen whether artificial intelligence will ever catch up.



















