Strong CAPTCHA Guidelines
Jonathan Wilkins - <jwilkins[at]bitland[dot]net>
December 21, 2009
An introduction to developing secure CAPTCHA (Completely Automated Public Turing test to tell
Computers and Humans Apart)1 systems. In addition to describing common weaknesses in CAPTCHA
puzzles, focus is placed on the system as a whole, including replay detection and attack detection.
When abuse is detected on a site, CAPTCHA seems to be the knee-jerk response to limiting it. Developers
have seen the typical warped-character type of challenge currently in common usage and rush to implement
one of their own.
Here’s a sample used by a WiFi Hotspot.
Simply running it through ocropus2 yields the correct text:
jwilkins@silence:~$ hocr=0 ocrocmd hotspot-captcha.jpg
These easily OCR’ed puzzles usually work for a period of time, depending on what asset is being
protected. Most small sites using a commonly attacked message board software could effectively protect
themselves with a single hard-coded question like ’What color is an orange?’. As long as attackers have no
real cause to examine that particular site, the scripts they use to post spam in comments will break and the
forum will remain spam free3.
2See a description in appendix A
3Though this wouldn’t properly be called a CAPTCHA.
It is for everyone else that the following content is written.
There are three major components involved in building a strong CAPTCHA solution. First, the basis
for the puzzle or challenge must be something that is truly difficult for computers to solve. Second, the way
puzzles and responses are processed must not introduce any flaws. Lastly, the system should have adequate
logging so that it is easy to determine when an attack is happening and what the nature of the attack is.
It is also worth realizing that CAPTCHAs are not the solution for every problem. At best, they
increase the cost of a given task to that of paying people4 to solve the puzzles and the overhead to manage
this process. However, for completely unauthenticated transactions, they’re an important tool and can be
The puzzle must be very difficult for computers to solve and relatively easy for humans. Simple character
recognition isn’t one of those problems, despite the fact that developers and users are so used to seeing it on
the sites they frequent. On those sites that are successfully using warped text, the real problem preventing
scripting is segmentation5.
Software is still not as good as humans at determining where one character ends and the next one
begins. The basis for a strong text based CAPTCHA is ensuring that segmentation is hard. In fact, once
segmentation is solved, computers are much better at recognizing individual characters than people are6.
This means that characters should have some overlap and any decoy lines should be the same thickness
and texture as the lines used in the letters. They should also run in the same direction as the strokes that
compose the letters. Many people use techniques other than overlap and decoy lines to obfuscate challenges.
The general approach seems to be to generate random text and apply a grab bag of simple filters or image
processing operations. Many of these alterations are quite simple to reverse and only serve to confuse
Noise that doesn’t resemble the text
Many of these operations are easily reversible or have other issues. One of the more amusing examples produced a
very noisy, colorful image in which the widely spaced, unwarped letters to be recognized were outlined in
black. The outlined characters also happened to be the only black elements in the image. Given this, it is
a trivial task to eliminate all non-black pixels and run the result through off-the-shelf software to have a very high
4whether directly or through access to resources
5 http://research.microsoft.com/~kumarc/pubs/chellapilla_hip05.pdf
6 http://research.microsoft.com/~kumarc/pubs/chellapilla_ceas05.pdf
Ocropus has no trouble with this third version:
jwilkins@silence:~$ hocr=0 ocrocmd technicolor.jpg
Others employ noise lines that are much thinner than the challenge text. By applying two basic image
processing techniques called erode (which thins the edges of all objects) and dilate (which thickens them),
software can automatically eliminate these lines7. The erode eliminates all thin lines
and the dilate then mostly restores the original thickness of the characters.
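The erode-then-dilate step can be illustrated with a minimal pure-Python sketch. The binary image layout (lists of 0/1 rows), the 3x3 square kernel and the helper names are assumptions made for illustration; a real attack would more likely use an image processing library.

```python
# Binary images are lists of rows; 1 = ink, 0 = background.

def parse(rows):
    """Turn strings such as "0110" into rows of integers."""
    return [[int(c) for c in row] for row in rows]

def erode(img, kernel):
    """Keep a pixel only if every kernel offset lands on ink."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy, dx in kernel))
             for x in range(w)] for y in range(h)]

def dilate(img, kernel):
    """Set a pixel if any kernel offset lands on ink."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy, dx in kernel))
             for x in range(w)] for y in range(h)]

# Hypothetical 3x3 square structuring element.
SQUARE_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def remove_thin_lines(img, kernel=SQUARE_3X3):
    """Erode away anything thinner than the kernel, then restore thickness."""
    return dilate(erode(img, kernel), kernel)

# A 5x5 "stroke" crossed by a one-pixel noise line running off to the right.
noisy = parse([
    "000000000",
    "011111000",
    "011111000",
    "011111111",
    "011111000",
    "011111000",
    "000000000",
])
cleaned = remove_thin_lines(noisy)
```

In this toy image the one-pixel line does not survive the erode, while the thick stroke shrinks and is then grown back by the dilate.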
Thresholding is another trivial operation which is very effective at removing common types of noise. By
setting a given value as a dividing line between black and white (all values below become black, all above
become white) most color noise is eliminated.
Ocropus gives us:
jwilkins@silence:~$ hocr=0 ocrocmd threshold-after.png
abcdcigh
which isn’t perfect, but it hasn’t been trained on this font. The more important thing is that it had no
difficulty with the segmentation.
7These are called minimum and maximum in Photoshop
Modifying the Whole Image
Many CAPTCHA puzzles place a random string on a given background and then apply a simple warp
to the whole image. Certain types of warp are easy to reverse. For instance, a spiral type of warp can be
removed by applying its inverse. If this spiral is often used, an attacker can simply automatically try various
values and see which one yields the best result.
jwilkins@silence:~$ hocr=0 ocrocmd swirl-reverse.png
Some CAPTCHAs which employ noise lines to make segmentation harder still allow excess spacing
between characters. This allows an attacker to perform a rough slicing attack in which they look for bounding
boxes of an approximate size (the size of the average individual character plus a small fuzz factor for warping)
at offsets from the edge of the last bounding box and then shrink each box where possible. In puzzles that have
noise lines that are thinner than the actual characters this can be quite effective, especially with erode and
Knowing the above, it is possible to eliminate the reversible techniques commonly used to obfuscate a
challenge (which burden only legitimate users, not automated attacks) and yield a challenge that is
both easier for humans to read and more difficult for software.
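To show why excess spacing is dangerous, here is a simplified stand-in for the slicing attack described earlier. Rather than searching for bounding boxes of an approximate size, it assumes the noise has already been removed and simply splits the image wherever a column contains no ink. The function name and image layout are hypothetical.

```python
def slice_characters(img, min_width=1):
    """Return (start, end) column spans, end exclusive, that contain ink.

    img is a list of rows of 0/1 values, where 1 is ink.
    """
    width = len(img[0])
    has_ink = [any(row[x] for row in img) for x in range(width)]
    spans, start = [], None
    # The trailing False is a sentinel that closes a span touching the edge.
    for x, ink in enumerate(has_ink + [False]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            if x - start >= min_width:
                spans.append((start, x))
            start = None
    return spans

image = [
    [0, 1, 1, 0, 0, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 0, 1],
]
spans = slice_characters(image)
```

Once characters overlap, no blank column separates them and this approach returns a single span, which is the property a strong puzzle relies on.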
Designing strong puzzles
Rotation and warping of individual characters
Rotation and warping of individual characters seemed to provide the best resistance to OCR when
combined with overlap, and this is backed up by prior research8. By applying these to each character
individually, the attacker is forced to perform segmentation before being able to try inverting the function.
Creating each character (and decoy lines) individually on a transparent background, applying warp and then
compositing gives the best results.
The following examples use warped character segments for noise to make feature detection much less
effective. For the more difficult examples, it is believed that OCR has such a low solve rate that requiring
7/8 correct characters is acceptable.
In order to ensure that the difficulty of OCR is as high as possible, it is important to make sure that there
are many possibilities for each character. For instance, if the character set is [0, 1], the recognition task is
much simpler for software. Use of a full alphanumeric character set is best though typically some characters
are eliminated for usability. Some sites eliminate vowels to avoid offensive terms randomly appearing. Others
eliminate characters that are very frequently confused such as the number 0 vs the letter O.
To increase usability it is helpful to provide the user with some guidelines for the puzzles such as saying
that the characters are not case sensitive and there are no numbers. This doesn’t decrease the difficulty for
automated attackers as they will be able to determine these rules through observation.
A recommended character set is included in the appendices.
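As a rough illustration of the trade-off, the sketch below builds a reduced alphabet and draws a challenge from it. The alphabet shown (uppercase letters and digits minus the look-alikes 0/O and 1/I) is a hypothetical example, not the recommended set from the appendices, and the length of 8 is likewise an assumption.

```python
import secrets
import string

# Hypothetical reduced alphabet: drop characters users frequently confuse.
CONFUSABLE = set("0O1I")
ALPHABET = "".join(c for c in string.ascii_uppercase + string.digits
                   if c not in CONFUSABLE)

def new_challenge(length=8):
    """Draw each character independently from a strong random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def keyspace(alphabet_size, length=8):
    """Number of distinct challenges of the given length."""
    return alphabet_size ** length

challenge = new_challenge()
full = keyspace(len(ALPHABET))   # 32 symbols, 8 characters
binary = keyspace(2)             # the [0, 1] character set from the text
```

With 32 symbols an 8 character challenge has 2**40 possible values, against only 256 for the [0, 1] set, so removing a few confusable characters costs very little.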
Font selection can also introduce flaws. Fonts can be broadly divided into serif and sans-serif fonts. Serifs
are the small features at the ends of individual characters. Some serif fonts have unique features for certain
characters that can make recognition much easier. In the image below, the Q, J and also C are particularly
susceptible to this sort of analysis.
For this reason, sans-serif fonts are recommended.
Use of dictionary words makes analysis much easier. When OCR can extract the majority of the characters
it is a simple matter to run the result through a spell checking library and choose the most likely suggestion.
This is especially true as the words get longer. The word ’they’ has many words within a one character edit
distance9 such as ’them’, ’the’, ’then’ and ’whey’. ’Xenophobic’ has fewer. This is a feature of most languages,
Use of a Known or Public Source of Puzzles
Using a publicly available source of puzzles has its hazards. For instance, if the puzzle revolves around
answering common sense questions, such as ’What sounds do dogs make?’, a system like OpenCyc10 may be
integrated. If a custom set of questions or images is used, the difficulty can be reduced to that of rebuilding
the knowledge base. Since this is less difficult11 for a spammer than for the developers building the original
data set, this is not an ideal solution.
It is also critical to remember that an attacker is often perfectly happy with a very low automated
solution rate. Even if they are only able to solve 1 in 100 challenges automatically, they are content to just
throw resources at the puzzle since most of the resources they are using don’t belong to them in the first
place. This is due to the low cost of compromising computers and building or renting botnets. Given this,
the attacker doesn’t have to rebuild a complete set of solutions, just enough to get this minimal success rate.
For instance, with a 10,000 machine botnet (which would be considered relatively small these days),
given broadband connections and multi-threaded attack code, even with only 10 threads per machine and one
attempt per thread per second, a 0.01% success rate would yield 10 successes every second, which would provide
the attacker with 864,000 new accounts per day if they were attacking a registration interface.
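The arithmetic behind these figures can be checked directly. The rate of one attempt per thread per second is an assumption; it is the value that reproduces the numbers quoted above.

```python
from fractions import Fraction

machines = 10_000
threads_per_machine = 10
attempts_per_thread_per_second = 1        # assumed; not stated in the text
success_rate = Fraction(1, 10_000)        # 0.01%

attempts_per_second = (machines * threads_per_machine
                       * attempts_per_thread_per_second)
successes_per_second = attempts_per_second * success_rate
successes_per_day = successes_per_second * 60 * 60 * 24

print(attempts_per_second, successes_per_second, successes_per_day)
```

Exact fractions are used so that the result is not blurred by floating point rounding.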
Audio CAPTCHAs are often used to make puzzles accessible to vision impaired users. This generally
doesn’t satisfy relevant accessibility requirements and legislation as users may have multiple impairments.
Audio CAPTCHAs are also generally more susceptible to automation as speech recognition is a simpler
problem than image segmentation.
Depending on the site’s relationship to users and how much information is available, other methods may
be much more effective while avoiding the weaknesses of an audio challenge. For instance, automatically
phoning the user or sending a text message12 may be reasonable for protecting certain assets if the user is
unable to complete a visual CAPTCHA. Here, authentication essentially replaces CAPTCHA.
9Also known as the Levenshtein distance.
11Since spammers are generally using low cost labor overseas, sometimes employing rooms full of people who are simply
typing in CAPTCHA solutions.
12Here the system would be relying on the expense and difficulty of replacing a phone number and would, naturally, require
a mechanism for disabling accounts and blacklisting the associated phone number if they are later caught abusing the service.
The goal is to make the effort required by attackers higher than the effort needed for solving the visual
CAPTCHA while avoiding exposing the site to a DoS if the spammer were to shift their attention to the
The usual warnings about randomness apply. Use a strong source of random numbers instead of weaker
sources such as the standard C library rand() function. Also important to note is that using modulo on
random numbers is usually unsafe. Imagine using a six-sided die to generate numbers between 1 and 3
(1D6 mod 3). This is safe, as the results are still uniformly distributed. However, using that same die
to generate numbers between 1 and 4 (1D6 mod 4) will yield twice as many 1’s and 2’s as 3’s and 4’s.
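The die example can be reproduced by enumeration, and the usual fix, rejection sampling, is short. In the sketch below the helper names and the 8-bit sample width are assumptions; in Python, secrets.randbelow already provides an unbiased result and would normally be used instead.

```python
import secrets
from collections import Counter

def die_mod(n):
    """Count how often each of 1..n appears when one fair d6 is reduced mod n.

    A remainder of 0 is mapped onto n so the results read as 1..n.
    """
    return Counter((roll % n) or n for roll in range(1, 7))

def unbiased_below(n, bits=8):
    """Return a uniform integer in [0, n) by discarding out-of-range samples."""
    span = 1 << bits
    limit = span - (span % n)      # largest multiple of n that fits in `bits`
    while True:
        sample = secrets.randbits(bits)
        if sample < limit:         # samples at or above limit would skew low values
            return sample % n

safe = die_mod(3)     # every value appears equally often
biased = die_mod(4)   # 1 and 2 appear twice as often as 3 and 4
pick = unbiased_below(6)
```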
Case Study: Breaking reCAPTCHA
reCAPTCHA takes two words scanned from books that their OCR engine is unable to identify and then
adds noise and warping, composites them and then displays the resulting image to the user. They use a basic k
of n system to figure out whether the submitted answers are correct:
Each new word that cannot be read correctly by OCR is given to a user in conjunction with
another word for which the answer is already known. The user is then asked to read both words.
If they solve the one for which the answer is known, the system assumes their answer is correct
for the new one. The system then gives the new image to a number of other people to determine,
with higher confidence, whether the original answer was correct. 13
The reCAPTCHA system breaks many of the above guidelines. Firstly, they’re using English text with
a few exceptions such as where a word is broken across two lines (Ele- phant for example). This means that
we have a pretty easy way to check whether a given OCR operation has been successful.
They’ve further weakened the system by allowing off-by-one errors to be accepted as correct. This saves
us in the case that a line traverses a letter in such a way as to make it problematic to guess, ’lone’ vs ’tone’
for example. Skipped characters are also acceptable, for instance, when the challenge is ’base defined’, ’base
define’ is accepted as is ’bass deined’. An edit distance of 1 for each word is accepted. One can even skip a
word entirely, if the word is short.
For the above challenge ’previous’ is an accepted solution.
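The tolerance described above can be modelled with a standard Levenshtein distance. The accepted() function below is only a sketch of the observed behaviour, not reCAPTCHA code, and it does not model the case where a short word may be skipped entirely.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def accepted(answer, solution, tolerance=1):
    """Accept when each word is within `tolerance` edits of its counterpart."""
    a_words, s_words = answer.split(), solution.split()
    return len(a_words) == len(s_words) and all(
        levenshtein(a, s) <= tolerance for a, s in zip(a_words, s_words))

print(accepted("bass deined", "base defined"))   # one edit in each word
print(accepted("bats deined", "base defined"))   # two edits in the first word
```

The same function shows why short dictionary words are weak: 'them', 'the', 'then' and 'whey' are all a single edit away from 'they'.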
The noise lines are generally close to the same weight as the font but they are generally only horizontal.
Certain erode/dilate matrices are very effective at removing only horizontal lines.
I downloaded and hand solved 200 reCAPTCHA challenges in early 2008. Since the words are widely
spaced, it is easy to split the challenge into its component parts. Word separation was performed using
the blobs.rb code in Appendix E. Then code was developed that attempted various erode/dilate matrices
on each individual word before OCR’ing14 the result. The Levenshtein distance between each guess and the
correct answer was recorded. I then took the best matrices and used them to apply to new challenges.
The solver code iterates through the above described list of effective matrices and stores each answer.
The answer list is then checked against a word list and if the word is known, it is stored with a count of
how many times that result was given by the OCR engine. If it is not a known word, then it is run through
aspell and the most likely guess is chosen and added. After each matrix has been attempted, the resulting
words and their counts are evaluated. The longest word with a non-trivial count is usually correct. If there
is no likely candidate, this challenge can be discarded and the next one attempted. Since the reCAPTCHA
system is using dictionary words as well as the names of people and places (aside from word fragments like
the previously discussed ’phant’), if it doesn’t exist in our word lists, it is probably incorrect and it’s worth
not submitting so as to not draw attention to our submissions.
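The voting step can be sketched as follows. This is not the original solver: difflib.get_close_matches from the Python standard library stands in for aspell, and the function name, the minimum count and the minimum word length are hypothetical parameters.

```python
import difflib
from collections import Counter

def pick_answer(ocr_guesses, word_list, min_count=2, min_length=4):
    """Vote over OCR outputs (one per matrix) and return the best word or None."""
    known = set(word_list)
    votes = Counter()
    for guess in ocr_guesses:
        if guess in known:
            votes[guess] += 1
        else:
            # Stand-in for the spell checker: take the closest known word, if any.
            close = difflib.get_close_matches(guess, word_list, n=1)
            if close:
                votes[close[0]] += 1
    candidates = [w for w, c in votes.items()
                  if c >= min_count and len(w) >= min_length]
    # The longest word with a non-trivial count is preferred.
    return max(candidates, key=len) if candidates else None

words = ["budget", "widget", "premier"]
guesses = ["budget"] * 5 + ["widget", "budgef"]
answer = pick_answer(guesses, words)
```

Returning None when there is no likely candidate corresponds to discarding the challenge rather than submitting a weak guess.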
Here is some sample output from the solver:
budget ( 192)
budget ( 192)
widget ( 2)
Longest w/ count:
budget ( 192)
premier ( 42)
premier ( 42)
premier ( 42)
Longest w/ count:
premier ( 42)
Two typical successes.
14 With Ocropus
85 ( 33)
momma ( 1)
Longest w/ count:
85 ( 33)
Here we have a complete failure. ’25’ is short and while it did have some hits, it wasn’t a solid guess.
’telephoned’ had no hits.
defenders ( 1)
norman ( 25)
defenders ( 1)
Longest w/ count:
norman ( 25)
’con-’ is only a word fragment so we don’t have any likely solution, though we did get defenders.
Interestingly, it wasn’t the matrices that removed horizontal lines entirely that were most effective.
Training a new OCR engine against these segmented and damaged characters will likely result in an even
higher success rate than simply using the Ocropus engine.
Short words were the most problematic. If the solver saw ’it’ as ’if’ due to noise lines, it was very likely
to consistently get it wrong. For this reason, short words were rejected and a new attempt was made instead.
Running against 200 challenges, this method solved 10 correctly - a success rate of 5 percent. It further
got one word correct in 25 other cases. If we presume that in half the cases the failed word would be the
unknown word for reCAPTCHA, this gives us a total success rate of 17.5 percent.
Also worth noting, ocropus alone solved 0 of the 200 challenges. When ocropus was provided with the
challenge split into single word portions it was able to get 5 single words, a success rate of 1.25 percent.
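The rates in this section can be recomputed from the raw counts. The sketch below shows two readings of the combined figure: counting all 25 one-word results gives 17.5 percent, while counting only half of them, as the stated presumption suggests, gives 11.25 percent. Which reading was intended is not clear from the text, so both are shown.

```python
from fractions import Fraction

challenges = 200
both_words_correct = 10
one_word_correct = 25
ocropus_single_words = 5

full_rate = Fraction(both_words_correct, challenges)
all_partials = Fraction(both_words_correct + one_word_correct, challenges)
half_partials = full_rate + Fraction(one_word_correct, 2 * challenges)
# Each challenge contains two words, so 200 challenges hold 400 words.
single_word_rate = Fraction(ocropus_single_words, 2 * challenges)

for label, rate in [("both words", full_rate),
                    ("all partials counted", all_partials),
                    ("half of partials counted", half_partials),
                    ("ocropus single words", single_word_rate)]:
    print(f"{label}: {float(rate) * 100:.2f}%")
```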
Some changes were made to the reCAPTCHA system since this analysis was originally performed.
However, it appears that the changes that have been made weaken rather than strengthen their system. The
major change has been to eliminate the line that ran through the challenge. When ocropus was run against
100 challenges fetched on December 16th, 2009, after they were split into their two halves, it was able to
solve one word out of two in 23 cases (23/100). This is much higher than the 5/200 base line solve rate seen
Some analysis was also done against the live service. 40 puzzles were requested and passed through
tesseract without doing any of the erode or dilate that was needed against the original data set. The output
was passed in as a solution without even the benefit of aspell. Of 40 attempts, two submissions were accepted
as correct. This equates to a 5 percent success rate for simple OCR, which confirms that the new puzzle is
in fact weaker. While it remains possible that reCAPTCHA is doing something on the back end to prevent
large scale automated solutions, the strength of the puzzle does not appear to be part of the equation.
Generation and Processing
The second part of a secure CAPTCHA system revolves around how challenges are generated and
processed. Processing should disallow replay of previously submitted puzzles, prevent multiple guesses of
the same puzzle and limit the lifetime of all puzzles.
There’s a common desire to reduce the cost of a CAPTCHA system by avoiding any database access.
CAPTCHA puzzles which place the solution in a cookie after a simple (reversible) encoding are all too
common. Since this is sent to the user, it can be decoded far more easily than attempting OCR.
Further, if a hash is used instead of a simple encoding, it is susceptible to a dictionary attack. For
example, if the challenge is only 4 characters long and consists of lowercase a-z, then an attacker can trivially
build a list of hashes and their values, enabling the attacker to simply recognize the hash rather than having
to OCR the image. 15
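A sketch of the dictionary attack follows. The text does not name a hash function, so unsalted SHA-256 is assumed here, and the demonstration uses 3 characters to stay quick; the 4 character case from the text needs only 26**4 = 456,976 entries.

```python
import hashlib
import itertools
import string

def build_table(length=4, alphabet=string.ascii_lowercase):
    """Precompute hash -> solution for every possible challenge string."""
    table = {}
    for chars in itertools.product(alphabet, repeat=length):
        text = "".join(chars)
        table[hashlib.sha256(text.encode()).hexdigest()] = text
    return table

# Shortened demonstration: every 3 letter lowercase string.
table = build_table(length=3)

# A value as it might appear in a cookie set by a vulnerable site.
cookie_value = hashlib.sha256(b"cat").hexdigest()
recovered = table[cookie_value]
```

The table is built once and reused for every challenge the site issues, so the image never needs to be examined at all.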
Even if the solution is properly protected (encrypted in a cookie using AES in CBC mode, with a random
IV, an HMAC, and a key known only by the site and used only for CAPTCHAs), as long as nothing
keeps track of which puzzles have already been submitted, reuse of previously solved puzzles is an easy way
to automate the system. In order to avoid replay attacks (where a solved puzzle is used repeatedly) a list of
solved challenges must be maintained16.
When answers are submitted they must be checked against this list and rejected if they have been
attempted already. If there is no list, then an attacker only has to solve a single challenge to continually
bypass any protection the CAPTCHA is supposed to provide17.
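One way to meet these requirements is sketched below, using the alternative of tracking outstanding challenges rather than solved ones. The class name, the 120 second lifetime and the in-memory dictionary are assumptions for illustration; a site with several servers would need the store to be shared between them.

```python
import hmac
import secrets
import time

class ChallengeStore:
    """Outstanding challenges; each may be answered once, within its lifetime."""

    def __init__(self, lifetime_seconds=120, clock=time.monotonic):
        self._lifetime = lifetime_seconds
        self._clock = clock
        self._outstanding = {}   # challenge id -> (solution, expiry time)

    def issue(self, solution):
        """Record a new puzzle and return the opaque id sent to the client."""
        challenge_id = secrets.token_urlsafe(16)
        self._outstanding[challenge_id] = (solution,
                                           self._clock() + self._lifetime)
        return challenge_id

    def verify(self, challenge_id, answer):
        """Check an answer; the challenge is consumed whatever the outcome."""
        # pop() removes the entry on the first attempt, right or wrong, so a
        # puzzle gets one guess and a solved puzzle cannot be replayed.
        entry = self._outstanding.pop(challenge_id, None)
        if entry is None:
            return False
        solution, expires_at = entry
        if self._clock() > expires_at:
            return False
        return hmac.compare_digest(answer.encode(), solution.encode())

store = ChallengeStore()
cid = store.issue("kqxw")
first = store.verify(cid, "kqxw")    # first correct submission
replay = store.verify(cid, "kqxw")   # same challenge submitted again
```

Because only the id leaves the server, nothing about the solution is exposed to the client, and expired entries that are never answered would still need periodic cleanup in a real deployment.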
By not maintaining a centralized list of attempted CAPTCHAs, an attacker gets an immediate multiplier
equal to the number of machines used to process responses. This is because any solution will be accepted
15While salting the hash or using multiple rounds of a slower algorithm can help, it is better not to put yourself in the position
of having to worry about this attack.
16Alternatively, a list of outstanding valid challenges may be used.
17A timestamp alone may be used to limit the lifetime of a given puzzle, but an attacker can resubmit a solved challenge
hundreds of thousands or millions of times in a few minutes.
- The Puzzle
- Common Weaknesses
- Noise that doesn't resemble the text
- Modifying the Whole Image
- Excess Spacing
- Designing strong puzzles
- Rotation and warping of individual characters
- Character Set
- Font Selection
- Other considerations
- Dictionary Words
- Use of a Known or Public Source of Puzzles
- Success Rates
- Accessible CAPTCHA
- Case Study: Breaking reCAPTCHA
- reCAPTCHA Overview
- The Attack
- Success Rates
- Generation and Processing
- Multiple Guesses
- Puzzle Lifetime
- Reuse of Puzzles
- Recommended Architecture
- Logging and Incident Response
- Other Considerations
- Avoiding challenging users
- Character Set
- Demonstration Code
- reCaptcha word separation - blobs.rb