Sketcha: A Captcha Based on Line Drawings of 3D Models
Steven A. Ross
Princeton University
Princeton, NJ, USA

J. Alex Halderman
University of Michigan
Ann Arbor, MI, USA

Adam Finkelstein
Princeton University
Princeton, NJ, USA
ABSTRACT

This paper introduces a captcha based on upright orientation of line drawings rendered from 3D models. The models are selected from a large database, and images are rendered from random viewpoints, affording many different drawings from a single 3D model. The captcha presents the user with a set of images, and the user must choose an upright orientation for each image. This task generally requires understanding of the semantic content of the image, which is believed to be difficult for automatic algorithms. We describe a process called covert filtering whereby the image database can be continually refreshed with drawings that are known to have a high success rate for humans, by inserting randomly into the captcha new images to be evaluated. Our analysis shows that covert filtering can ensure that captchas are likely to be solvable by humans while deterring attackers who wish to learn a portion of the database. We performed several user studies that evaluate how effectively people can solve the captcha. Comparing these results to an attack based on machine learning, we find that humans possess a substantial performance advantage over computers.

Categories and Subject Descriptors

K.6.5 [Management of Computing and Information Systems]: Security and Protection—Authentication; K.4.4 [Computers & Society]: Electronic Commerce—Security

General Terms

Design, Experimentation, Human Factors, Security

Keywords

security, CAPTCHA, 3D models, drawings

Figure 1: Example captcha based on line drawings. The user's goal is to rotate each image until it is upright, choosing among four orientations by clicking on the image. Each line drawing was automatically rendered from a 3D model using a randomized point of view, providing for many possible images from each model.

1. INTRODUCTION

This paper introduces a captcha called "Sketcha" based on line drawings created from 3D models. Sketcha requires the user to rotate each image in a set of drawings until every one is upright, by clicking to turn them 90 degrees at a time (Figure 1). The set is selected randomly from a pool of drawings rendered from 3D models in a large database. Using randomized viewing parameters, many different images can be rendered from a single 3D model. People are better than machines at recognizing and understanding images of 3D shapes (at least until the general problem of computer vision is solved). Furthermore, the use of line drawings preferentially obfuscates the objects, like the distortions employed in text-based captchas, potentially broadening the relative gap between human recognition and that of automatic algorithms. Moreover, one study suggests that people can recognize drawings faster than photographs, and with equal accuracy, at least in the case of pictures of human faces.

Captchas exploit the gap between what humans and machines can accomplish; any simple puzzle that humans can solve well but that is considered to be difficult for computers may form the basis for a captcha. The most prevalent captchas are based on an image containing text that has been obfuscated by a variety of distortions (warping, image noise, overlapping letters, overdrawn lines and other shapes, and so forth). The designer must choose a degree of obfuscation which makes it very unlikely that an adversarial program can deduce the text. At the same time the text should not be too obfuscated – it should be very likely that a human will be able to recognize the text. Some people find current text-based captchas annoyingly difficult. Luis von Ahn, one of the inventors of captchas, offers the rule of thumb that humans will tolerate a test that they can solve about 9 out of 10 times; if the test is more difficult for humans, frustration will deter them from using the service behind the captcha.

Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others.
WWW 2010, April 26–30, 2010, Raleigh, North Carolina, USA.
Long-standing problems in AI offer good resources for captcha designers: we believe an adversary will not be able to solve the problem with greater accuracy than techniques previously investigated by the research community [2, 3]. But there remains an arms race between, on one side, captcha designers, and on the other side both researchers and hackers. Berkeley researchers Mori and Malik were able to defeat the text-based Gimpy captcha in use by Yahoo in 2002. Last year security experts announced they believe that a European hacker has compromised the text-based captcha in use by Google. As attackers' methods surpass the abilities of the least-capable humans, the problem (e.g. reading obfuscated text) can no longer generate a captcha, and designers need to turn to new problems.

Researchers have investigated several forms of captchas based on understanding natural images. Warner proposed a system called "KittenAuth" based on the ability to recognize kittens in photographs. Such schemes are known to be weak because the full database of images can be learned by an adversary. Thus, the Asirra system of Elson et al. uses a huge database of photos of cats and dogs (from Petfinder.com), under the assumption that the full database is too large to be learned. (In fact, such a method is often called a HIP, for "Human Interactive Proof," rather than a captcha, because the latter technically requires that all algorithms and data are publicly known.)

Golle showed that the Asirra captcha is vulnerable to machine learning attacks; simply put, it is possible to design an algorithm that can identify cats and dogs with enough reliability that the captcha can be solved with probability 0.1, which is sufficiently often to render it ineffective as a captcha.

Our approach follows that of Gossweiler et al., who introduced the idea of image orientation forming the basis for a captcha. Their approach, called the "What's Up" captcha, uses images drawn from popular web searches as a potentially huge database. This design enjoys several nice properties, including simplicity, language independence, and the web as an ever-growing resource for database images. Gossweiler et al. mention the possible extension of their method to use 3D models (the basis for our captcha), which offer several potential advantages as a source of imagery. Images selected from the web (the most obvious source for a huge database) are subject to reverse-indexing, for example the service offered by TinEye.com.¹ In contrast, by providing renderings (especially line drawings) we offer little support for an attacker to recover the original 3D model. To recognize a previously seen model, the attacker must match any possible rendering (from any angle) against it. Moreover, 3D models have the potential to support a variety of rendering styles (e.g. Figure 7) to further obfuscate the image, though in this paper we only study simple line drawings.

Previous systems have considered the use of 3D models in captchas. Kaplan offered an early proposal based on manual labeling of models that is unlikely to scale due to manual effort in modeling and labeling parts. The web site www.yuniti.com uses captchas based on Lambertian renderings of 3D models, but it does not appear to have been subjected to a rigorous security analysis and in fact appears to be susceptible to attack using basic computer vision techniques. Fu et al. describe a method for orienting 3D models of man-made objects that might be used for attacking captchas. Fortunately, our captcha displays images rendered from models, rather than the models themselves. Mitra et al. propose highly abstract "emergence images" rendered from 3D models as a potential source of captchas, but offer limited security analysis.
Models can even be generated programmatically, constructing a variety of shapes within a family, like buildings or plants, based on random parameters, combinations and arrangements. The computer vision literature offers many tools to the potential attacker of a photo-orientation captcha, for example face detection, sky detection, landscape scenes, and so forth [15, 19]. As observed by Gossweiler et al., the captcha designer can incorporate such tools into the formation of the database and thereby ameliorate the threat of such attacks, and the same principle applies in the 3D case as well. Nevertheless, we believe the gap is broader between what people and computers can currently achieve with regard to recognizing the contents of line drawings. Finally, serving images of models, rather than the models themselves, offers an advantage aside from security, wherever intellectual property restrictions prevent redistribution of the database.

The other major difference is that the What's Up captcha uses continuous rotation (requiring the user's answer to be within some tolerance of the correct orientation), whereas Sketcha offers the user only four orientation options. Some benefits of the latter interface are that it is relatively simpler to describe and understand, it is easily implemented in major web browsers, and the task can be accomplished by only taps or mouse clicks. Chow et al. have argued for using captchas that can be performed by pointer clicks, citing speed and simplicity on mobile devices.

Section 2 describes how we built a prototype of the Sketcha captcha, populating a database with hundreds of models and thousands of images. This section also introduces a process we call covert filtering whereby the database can be continually refreshed. Section 3 presents the results of several user studies (involving hundreds of subjects on the Amazon Mechanical Turk) to evaluate people's abilities to solve this puzzle, concluding that for a reasonable range of parameters this is a viable approach. Section 4 considers two broad forms of attack on such a system – both learning a portion of the database by repeatedly querying the system and machine learning attacks. We show that our covert filtering process effectively thwarts an attacker who seeks to learn the pool of images. Finally, we evaluate our test database in the context of a machine learning attack and argue that it would be difficult to close the gap between humans' and machines' ability to solve the captcha.

Thus, the contributions of this paper are:

• a captcha based on orienting drawings derived from 3D models,

• a working prototype, available at www.sketcha.net,

• the results of several large usability studies,

• a way to continually update a database of such images using "covert filtering,"

• analysis showing that covert filtering resists attackers who try to learn the database,

• and analysis of a machine learning attack.

¹In informal testing, we found that TinEye locates the correctly-oriented original for at least one of the example images from the Gossweiler et al. study (Figure 6-6 in that paper).
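As a quick arithmetic check of the answer-space figures quoted in this paper, the sketch below (function names are ours, not from the Sketcha implementation) computes the number of orientation combinations for an n-image captcha and the corresponding blind-guess success probability:

```python
# With 4 orientations per image, an n-image captcha has 4**n possible
# answer combinations, so a uniformly random guess succeeds with
# probability 4**-n.

def answer_space(n_images: int, orientations: int = 4) -> int:
    """Number of distinct answer combinations for one captcha."""
    return orientations ** n_images

def guess_probability(n_images: int, orientations: int = 4) -> float:
    """Chance that a uniformly random guess solves the whole captcha."""
    return orientations ** -n_images

print(answer_space(8))       # 65536, the figure quoted for eight images
print(guess_probability(8))  # ~1.5e-05
```

Even a modest series of eight images already makes blind guessing hopeless; the security analysis therefore focuses on smarter attacks.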
2. THE SKETCHA CAPTCHA

This section describes our proposed captcha and presents the details of a prototype implementation. First we address the user interface, followed by a discussion of how we produce and maintain a database of 3D models and resulting images.

2.1 User Interface

Our captcha requires the user to rotate a series of images until each one is upright. In our implementation, each image is shown as a very small (80x80) thumbnail, with one larger (240x240) image that magnifies any image that the mouse hovers over, much like in the Asirra captcha of Elson et al.

Since the images shown are drawings of 3D objects, the viewer must generally recognize the objects in order to understand what their proper orientation is (although sometimes one can make a good guess based on an overall impression of the kind of object). To rotate an image by 90° the user simply clicks on it, so there are four orientations to choose from. If the series contains n images there are 4^n combinations, for example 65,536 from only eight images. It takes an average of 1.5 clicks to orient an image, so 12 clicks are expected for an 8-image captcha.

This interface allows the images to be served by the web server in a single (random) orientation, with rotation handled on the client side. (Our implementation works in several common browsers including Firefox, Safari and Internet Explorer.) Thus, the bandwidth requirements are low per captcha, as only one orientation need be sent per image. In our implementation, ten 240x240 images with an average file size of about 12 kB each are sent from the server per captcha. This is one practical advantage to choosing 4 discrete possible orientations per image, in contrast to the captchas described by Gossweiler et al., which allow for continuous rotation and thus require either more sophisticated software running at the client side (e.g. Flash) to handle rotation or sending many pre-rotated images. Finally, we believe that clicking through only 4 choices is easier to understand and manipulate than a continuous rotation control.

2.2 Image Generation

To generate images for the captcha, we randomly select models from a large database of models, using randomly chosen viewing parameters. The camera angles range from 60° above the horizon to 40° below the horizon, based on our empirical observations that it is often difficult to orient an image when the camera angles are too close to the north and south poles, and that this effect is stronger from below than above. In Section 3 we present data that supports these observations.

After choosing a random camera angle, our process renders an image using the automatic line drawing system of Cole et al., which offers control of line density even where models are very detailed in some areas. Next we crop the images to the bounding box of the lines, and finally scale the image to 240x240.

There are many potential sources of 3D models, including commercial data sets containing many thousands of high-quality models, open source model repositories, and even simply crawling the web for models. We anticipate that as 3D scanning technologies improve, acquiring large model sets will become even easier. The experiments described in this paper have been based on models downloaded from the Google 3D Warehouse. This repository allows people from around the world to upload models for other people to view, share, download, tag, and discuss, much like Flickr.com does for photos. It is currently easier to capture photos than to create or capture 3D models, so for the moment image databases are much larger and growing more quickly than 3D model databases. Nevertheless, the Google Warehouse contains at least hundreds of thousands of models (Google does not currently report the size) and appears to be growing rapidly. This database has a predominance of buildings, in part because of the ability to geo-locate the model in connection with Google Earth. In selecting models, therefore, we only downloaded models that are not geo-located. In addition, we selected only models with high user ratings. With these criteria, we downloaded 4488 models at random.

Not all of these models render well in our line drawing system, for various reasons. For example, objects that are extremely wide and flat, or long and thin, tend not to produce good imagery over the range of views described above, while other models produced mostly-white images for many views. Therefore we eliminated models with extreme aspect ratios, where the ratio of the smallest dimension to the largest dimension was less than 0.1, after which 3851 models survived. Next, our drawing software rendered 20 views of each of these models, after which we eliminated models where the average value v of all pixels in the image was too close to white (v = 1.0), according to the following criteria: we rejected models where either v > 0.99 in 75% of the images, or v > 0.995 in 25% of the images. Of the remaining 2574 models, we selected 400 randomly for the experiments described in Section 3. This entire process was programmatic and therefore in principle could be carried out on a larger scale without human intervention.

We observe that after buildings, the next most prevalent class of models in the 3D Warehouse is cars, or perhaps vehicles. Knowing that a substantial portion of the images in the database come from a particular class of object offers a potential advantage to attackers, who might be able to construct a specialized detector. (By analogy: automatic methods for orienting photos often employ face and sky detectors, as these are common in natural images [15, 19].) While our system only takes the first step, by eliminating geo-located buildings, we could use filtering, for example, to limit the number of cars based on the "car" keyword.

2.3 Database Maintenance

Key to the success of these captchas is that the set of all possible images, together with their proper answers, should be difficult or impossible to learn. In one form of attack sometimes called the "Mechanical Turk attack," an adversary pays people a small amount of money (or other incentives) to collect the proper answers for every image in the database. To thwart such adversaries we propose to use three strategies: (1) a large database of models, (2) a constant feed of new models into the database, and (3) varying parameters for different images of a given model. These strategies are discussed below, and an analysis of the conditions under which they are robust is offered in Section 4.

One challenge posed by both automatic addition of new models to the database and random selection of views is that some tasks presented to the user might become too difficult for many users. It is possible, for example, that the user would be presented with an image that contains just a few unrecognizable lines. Our solution to this problem is to show a new image to a few people and test whether they orient it consistently, before incorporating it into the database. The way we test these "evaluation" images is to mix a few of them in with the already-vetted images presented in a captcha, randomly, such that a person solving the captcha is simultaneously demonstrating his humanity and testing the evaluation images without knowing which is which. We call this process covert filtering. For example, the user may see 10 images total, where 8 of the 10 are used as a captcha and the other 2 images are evaluation images. If a person correctly solves the captcha based on the 8 vetted images, then we take the given answers for the 2 evaluation images as one person's opinion. Once a certain number of people have consistently oriented an evaluation image, it is inserted into the database. On the other hand, if anyone chooses a different orientation it is rejected.

This general approach of using part of a captcha process to do useful work was pioneered by the reCaptcha system of von Ahn et al. However, rather than using the covert filtering process to achieve an external goal (e.g., interpreting digitized text), we use it to improve the strength of our captcha by growing the database. This framework is also related to the "collaborative filtering" approach of Chew and Tygar, who use human input to build captchas based on questions for which there is no "correct" answer. Taking inspiration from their work, we note that it is not really important that people fully recognize the object or even answer correctly, as long as they all agree about the proper orientation of the images. In practice, however, we find that with renderings from 3D models, the orientation that people agree upon is almost always the correct upright orientation. Gossweiler et al. also briefly discuss this general approach. However, they did not implement it in their user study, nor did they analyze its security implications. In Section 3 we perform multiple user studies to evaluate this framework with regard to human performance, and in Section 4.1 we analyze the security impact of this strategy.

3. STUDIES ON HUMAN PERFORMANCE

This section presents the results of three experiments we performed with the prototype implementation described in Section 2, in order to evaluate how effectively people can solve our captcha. We used the Amazon Mechanical Turk as the source of participants in our studies. Mechanical Turk is an internet service that allows "requesters" (such as researchers) to create small, web-based tasks that may be performed by anonymous "workers." Each worker is typically paid between $0.05 and $0.30 to complete a task. The number of workers on the service is such that particularly attractive tasks are usually undertaken within minutes. Workers on the Mechanical Turk generally seem to favor tasks that take somewhere around 10 minutes to complete.

3.1 Experimental Setup

In each of our studies, the task given was to solve a dozen captchas consecutively, and the data we collected included the orientation each user selected for each image. The first ten captchas presented to a subject contained ten images each, and each of these 100 images was selected randomly from the pool of images used in the study, such that two criteria were met: first, in any study a subject would see an image rendered from a particular model no more than once; and second, progress through the overall pool of images used throughout the study was approximately uniform. After the initial ten captchas, two more pages offered an identical interface to the initial ten pages, but repeated 20 of the images shown earlier, selected randomly. This provided a measure of consistency – how many of the repeated images were answered the same way the second time indicates the care with which the user performed the task.

In reporting statistics on selected image orientations, we only considered the initial ten captchas – the final two were used only to measure consistency, so as not to bias the statistics towards the repeated images.

Prior to beginning the task in every study, subjects were given a page containing 10 arrows at random orientations and asked to rotate the arrows so that they all point up, to ensure they understood the basic interface. On each subsequent page the instructions read simply: "Click the images below until they are upright, then click 'Next'." Near the 'Next' button, progress was indicated, for example, by "Page 3 of 12."

After the 12th page, an optional survey asked subjects for gender, age by decade, highest educational degree obtained, and comments. The response rate was high (89%) and these responses indicated:

• gender: 54% female, 46% male;

• age: 8% < 20, 42% 20–29, 27% 30–39, 13% 40–49, 8% 50–59, 2% 60–69, 0.3% ≥ 70;

• degree: 4% none, 32% high school, 39% undergrad, 21% graduate, 5% doctorate.

We did not find any significant correlation between overall performance and these attributes, indicating that this task is equally suited to the different groups. While Mechanical Turk workers are likely to be more experienced web users than the general population, there is little reason to expect that they would be better able to interpret and orient line drawings. In addition we used Google Analytics to collect broad demographic information, finding that our subjects were from 21 countries in total but about 70% were from the US, 20% from India, and the remainder largely from Europe; and that 5 languages were spoken but the vast majority spoke English. We believe these demographics are roughly consistent with the overall pool of workers on the Mechanical Turk. These data are aggregated so we could not compare performance across demographics.

Overall the 558 participants in our studies completed 1,192 tasks (14,304 pages, of which 11,920 were test data and the others were duplicate images for verifying consistency in the results). The median time was 8.5 minutes per task. Some workers completed multiple tasks per study (with the constraint that they would see a model no more than once) and some workers completed tasks in multiple studies. We omitted data from 14 completed tasks where the consistency rate was below 12/20, and also 7 where the accuracy on the pool of 100 images was about that of random guessing (indicative of either a misunderstanding or foul play). These data were replaced by that of later subjects.
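The covert-filtering bookkeeping described in Section 2 can be sketched as follows. This is a minimal illustration, not the authors' code; it makes one simplifying assumption, that the first valid answer serves as the provisional reference orientation against which later answers must agree, and the class and method names are ours:

```python
K = 10  # votes required before promotion (the studies showed images to 10 people)

class EvaluationImage:
    """One un-vetted image being covertly tested inside live captchas."""

    def __init__(self, image_id):
        self.image_id = image_id
        self.consensus = None   # orientation chosen by the first valid voter
        self.votes = 0
        self.status = "pending"  # "pending" | "accepted" | "rejected"

    def record_vote(self, chosen_orientation, vetted_solved):
        """Count one user's answer (0-3, in units of 90 degrees).

        The answer counts only if the user solved the vetted portion of
        the captcha, i.e. demonstrated humanity on the known images.
        """
        if not vetted_solved or self.status != "pending":
            return
        if self.consensus is None:
            self.consensus = chosen_orientation
        if chosen_orientation != self.consensus:
            self.status = "rejected"      # any disagreement ejects the image
        else:
            self.votes += 1
            if self.votes >= K:
                self.status = "accepted"  # K consistent votes: add to database

img = EvaluationImage("chair-042")  # hypothetical image id
for _ in range(K):
    img.record_vote(0, vetted_solved=True)
print(img.status)  # "accepted"
```

Note that, as in the paper's framing, no ground-truth orientation is needed: agreement among users is what promotes an image, and a single disagreement rejects it.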
3.2 Pilot Study

Our first study (Study A) was a pilot experiment designed to collect data about the distribution of difficulties of images in our database. We constructed the database as described in Section 2.2, and then we selected one image from each of 200 models for this study. We tested them with 54 workers performing a total of 69 tasks (690 test captchas), such that every image was seen by at least 30 people.

The fraction of the 30 people who were able to correctly orient a specific image provides a measure of the difficulty or ease with which the image can be oriented. Let us call this measure x_i and take it as an approximate measure of the probability with which an arbitrary new person would be able to orient the image.

Figure 2a shows a histogram of the distribution of x_i over the 200 images in our pilot study. In this plot f(x_i) is the probability density of x_i – the frequency with which we observe x_i – measured with a granularity of 31 bins whose values sum to 1. Notice that the most frequent case is that all 30 people oriented the image correctly, and that this case accounts for 42% of the overall data. This is good news for our proposed captcha, because it means that there are many images that people can consistently orient correctly. The expected value E[x] over this distribution is

E[x] = Σ_i x_i f(x_i) / Σ_i f(x_i)

Unfortunately, the data in Figure 2a have E[x] = 0.78, so the likelihood of solving a captcha containing 8 images, for example, is 0.78^8 = 0.14 – in essence an unusable captcha.

These results also suggest that covert filtering might be an effective approach for selecting images that will make a more usable captcha. Suppose we show each image in the pool to 10 people, and eject every image that is incorrectly oriented by at least one person. Of course some moderately difficult images may survive this filter process, but most will not. We can estimate the effect of this filter on the distribution, to the extent that x_i models the likelihood that a new person will be able to correctly orient image i. In particular, the chance that image i will survive the filter is simply x_i^10, so the resulting distribution would be

f′(x_i) = x_i^10 f(x_i)

Calculating the expected value over this distribution,

E′[x] = Σ_i x_i f′(x_i) / Σ_i f′(x_i)

we find that a random image selected from this distribution has a probability E′[x] = 0.986 of being oriented correctly by a new person. Thus a captcha containing 8 such images is expected to be solved with probability 0.986^8 = 0.89 – a reasonable target rate. This filter forms the basis of our later experiments.

Figure 2: Distribution of image difficulties in user studies: (a) Study A: 200 models × 1 image × 30 people; (b) Study B: 400 models × 20 images × 10 people; (c) Study C: 200 models × 4 images × 20 people. Horizontal axis encodes the number of people (out of a: 30, b: 10, or c: 20) who correctly oriented a particular image – a measure of the difficulty of that image. Vertical axis shows the fraction of the images in the study with a given difficulty. Distributions in (a) and (b) are similar, while the distribution in (c) includes only images drawn from the rightmost bar in (b).

3.3 Filtering Studies

To form a more usable captcha, we will resort to filtering out difficult images as described in Section 3.2. We performed two studies related to this process. In the first (Study B), we began with a larger pool of models and collected distributional statistics as in our pilot study. We used these statistics to select images that were correctly oriented by at least ten participants. We then conducted another study (Study C) using only these filtered images. The results show that covert filtering can significantly improve the ability of humans to solve the captcha.

Study B was based on a (40×) larger pool of 400 models with 20 images each. In this study 504 workers performed a total of 937 tasks (9,370 test captchas). Each image was shown to at least 10 people, the number 10 having been estimated to be sufficient by our analysis of the data from the pilot study. The resulting distribution can be seen in Figure 2b. Observe that its shape is similar to that of Figure 2a, albeit with fewer probability values, because each image was shown to 10 people rather than 30, so the bins are broader and taller on average. The rightmost data point corresponds to the 52% of the 8000 images for which all ten people who saw the image oriented it correctly. The images from this bin form the pool for our next study.
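The reweighting arithmetic in Section 3.2 can be checked numerically. The sketch below uses a made-up toy histogram purely for illustration (the real f comes from the pilot-study data of Figure 2a, which is not reproduced here); only the formulas E[x] and f′(x) = x¹⁰ f(x) come from the text:

```python
# xs[i] is the fraction of viewers who oriented images in bin i correctly;
# fs[i] is the fraction of images falling in that bin. Covert filtering with
# 10 viewers keeps an image with probability x**10, so the surviving pool has
# density f'(x) = x**10 * f(x).

def expected_value(xs, fs):
    """E[x] = sum(x * f) / sum(f) over the histogram bins."""
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)

def filtered(xs, fs, k=10):
    """Reweight each bin by the survival probability x**k."""
    return [x ** k * f for x, f in zip(xs, fs)]

# Toy distribution: mostly easy images with a tail of hard ones.
xs = [0.2, 0.5, 0.8, 0.9, 1.0]
fs = [0.05, 0.10, 0.20, 0.25, 0.40]

e_before = expected_value(xs, fs)
e_after = expected_value(xs, filtered(xs, fs))
print(round(e_before, 3), round(e_after, 3), round(e_after ** 8, 3))
```

As in the paper, the filter sharply raises the expected per-image success rate, and hence the eighth power that governs an 8-image captcha.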
Figure 3: Images removed from the database due to user filtering in Study B: (a-b) recognized but symmetric, (c-d) difficult to recognize, (e-f) typically oriented upside down, (g-h) unfamiliar objects gave rise to incorrect orientations in either (g) bimodal or (h) unimodal distributions. Figure 1 shows example images that survived this filtering process.

Figure 4: Accuracy by angle above the horizon. People tend to be most accurate when the camera angle is about 20° above the horizon. Views below the horizon (negative angles) have lower accuracy than those above (positive). Label d matches the histogram bin containing values in [d, d+5] degrees. Images are labeled with approximate angle:accuracy.

In Study C we randomly selected 4 images from each of 200 models, where each image had been correctly oriented by all 10 participants in Study B. Figure 3 shows a selection of these filtered images. The test images were shown as a series of captchas, just as in the previous experiments, in this case so that each image was seen by at least 20 people. In this study 98 workers performed a total of 186 tasks (1,860 test captchas).
The resulting probability distribution is shown in Figure 2c. The expected value is E[x] = 0.983, which matches well the value of E[x] = 0.986 predicted from the data in our prior study, as described in Section 3.2. Moreover, E[x]^8 = 0.87, which suggests that this distribution of successful rates of image orientation could be used as the basis for a reasonable captcha.

In addition, we can return to the data for the specific pages of images shown to users and ask: suppose 8 of the 10 images had in fact been a captcha – would the person have succeeded? We find, averaged over all people, all pages, and all subsets of 8 images on each page, that the success rate would be 0.88. This number is slightly better than the rate based solely on the image distribution, because in some cases the user simply pressed the "Next" button without orienting any of the images, which depresses the success rate for images at a higher-than-average rate while only incurring the penalty of a single failed page.

We draw two significant conclusions from these studies. First, the fully-automatic process that randomly selects views and models drawn from the Google 3D Warehouse and renders line drawings from them generates imagery that people can often orient correctly, but not often enough to be used in the kind of captcha proposed herein. Second, the filtering process that rejects images that were incorrectly oriented by at least 1 out of 10 people removes enough of the difficult imagery that the resulting pool can be used for such a captcha.

Recall from Section 1 the rule of thumb that people will tolerate a captcha that they can solve 9 out of 10 times they try. Our observed success rate of 0.88 is in the ballpark. Moreover, we have several reasons to believe that in practice the Sketcha captcha could have a significantly higher success rate. First, the incentives in our studies do not quite match the incentives of a true captcha. Our workers tended to proceed through a series of captcha pages with two competing goals: to get most of the images right (which they are paid to do) and to finish quickly (so they can move on to their next job and make more money). With the proposed captcha, the person's goal is to orient all of the images correctly; if they fail they have to try again until they succeed. Therefore, we believe that people would be a little more careful in the real setting. (One could imagine trying to design an incentive structure for the studies on the Mechanical Turk that more closely matched that of a captcha, for example declining to pay people who failed, but we felt this would be unfair.)

In a production captcha system, we would also identify and eject images that survived the initial covert filtering process but turned out later to have a higher-than-average failure rate, thereby further improving average performance over time. Furthermore, we believe the overall quality of the initial database, prior to covert filtering, could be improved in several ways, for example by using more sophisticated heuristics that look for difficult models, such as those with strong symmetries like the wheel shown in Figure 3. Finally, by more narrowly restricting the camera views used in production, we see an opportunity to further improve the initial database. Figure 4 shows the accuracy for the images shown in Study B as a function of the camera angle over the horizon. We see that by restricting the range of angles to [−10°, 50°], we could substantially improve the quality of the images in the initial pool.
Finally, we note that the median times to complete the 12
expected number of times an image must be shown in the
pages in Study C (6.5 mins) was signiﬁcantly lower than that
covert ﬁltering process before it is either rejected (because
of Study B (8.8 mins). This is not surprising, since many
someone failed to orient it properly) or it is added to the
of the diﬃcult images had been removed. These numbers
database. For example, in Study B, images were shown
indicate that a person could typically solve a 10-image
10 times, but the “mean time to failure” for those images
sketcha captcha in about 35 seconds. Moreover, it might
lowered t to 7.3.
actually be faster as the recorded numbers probably include
Next we consider how quickly the attacker can learn new
times in which some workers took breaks. Nevertheless, one
images by guessing. We observe that if the attacker guesses
limitation of this technique is that this time is probably
the answer to a captcha and it is rejected, he learns relatively
longer than the time to solve a typical text-based captcha.
little – only the fact that at least one of the test images was
not correct. On the other hand, if his answer is accepted,
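The distribution-based estimate above (E[x]^8 = 0.87) can be checked directly: if per-image successes are treated as independent, a captcha with 8 test images passes with probability equal to the 8th power of the mean per-image success rate. A minimal sketch, where the per-image rate of 0.983 is our assumption chosen to match the reported aggregate (the text gives only the aggregate):

```python
# Mean per-image orientation success rate: an assumed value chosen to
# match the aggregate E[x]^8 = 0.87 reported in the text.
per_image_rate = 0.983

# With 8 independent test images, the whole captcha succeeds only if
# every image is oriented correctly.
captcha_rate = per_image_rate ** 8
print(round(captcha_rate, 2))  # 0.87
```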
4. SECURITY ANALYSIS

In this section we consider possible classes of attack and how they compromise this form of captcha. First we discuss the attacker who concentrates on learning a fraction of the database of images simply by guessing randomly, without regard for the actual image content. The attacker may either assail the system with many guesses in rapid succession to learn some of the database, or may begin by stealing a fraction of the database. Under this form of attack, the system is compromised as the attacker learns enough of the images in the database to significantly increase the probability of solving future captchas. Next we investigate an alternate approach wherein the attacker does not bother to remember previously seen images, but rather concentrates on using the content of known images to train a machine learning algorithm for selecting the correct answer for new, previously unseen images. Under this second attack, the system is compromised as the attacker's algorithm increases the chance of correctly solving the captcha significantly above that of random guessing. Finally, we consider an attacker that uses both of these attacks in tandem.
4.1 Database attacks

Here we discuss the conditions under which an attacker compromises the captcha by learning part of the image database via guessing. As the attacker learns more of the database, his chances of guessing the answer to a captcha improve, because he is likely to recognize some of the images. However, we will show that in order to maintain knowledge of any fraction of the database over time, the attacker must sustain a substantial portion of the overall traffic to the database. The dilemma for the attacker is that as he learns more of the database, making it easier for him to guess the captcha, it becomes harder for him to learn new images in order to maintain his rate of knowledge.

Suppose the attacker's rate of traffic represents a fraction α of the overall traffic C to the captcha, and that the remaining fraction (1 − α) comes from legitimate users. (Any other non-legitimate traffic, say from other attackers, may be assigned to α for the purposes of this discussion.) The legitimate traffic causes new images to be added to the database at some rate h due to covert filtering. If the attacker knows a fraction d of the database, he must learn new images at a rate dh in order to keep up.

The covert filtering process described in Section 2.2 adds images to the database at the rate:

    h = (1 − α)Cmq/t

where m is the number of images being evaluated in each captcha through covert filtering (2 in our examples), q is the fraction of our pool of evaluation images that survives the covert filtering process (0.52 in Study B), and t is the expected number of times an image must be shown in the covert filtering process before it is either rejected (because someone failed to orient it properly) or added to the database. For example, in Study B, images were shown 10 times, but the "mean time to failure" for those images lowered t to 7.3.

Next we consider how quickly the attacker can learn new images by guessing. We observe that if the attacker guesses the answer to a captcha and it is rejected, he learns relatively little – only the fact that at least one of the test images was not correct. On the other hand, if his answer is accepted, then he knows that every test image was correct. Suppose that out of n images he already knew the answer for k of them, and he correctly guessed the other n − k. In that case he learned n − k new images. (We can ignore the fact that the attacker does not know which n of the m + n images in the captcha are already in the database and which m are being evaluated; he can simply treat them all as "correct.") Since the attacker knows fraction d of the database, the probability of his knowing exactly k of the n images in the captcha is given by:

    p_nk = C(n,k) d^k (1 − d)^(n−k)

Moreover, the probability of his guessing all of the n − k unknown images is g^(n−k), where g is the chance of guessing one (1/4 in our interface), and in that case he learns n − k images. Thus, for each attempted captcha he can expect on average to learn:

    ℓ = Σ_{k=0..n} g^(n−k) (n − k) p_nk        (1)

(For Sketcha, the parameters are n = 8, m = 2, g = 1/4, q = 0.52, and t = 7.3.)

Recall that the attacker is attempting captchas at rate αC, and that this must allow him to learn at least as fast as dh:

    αCℓ ≥ dh        (2)

Equation (1) and inequality (2) place a lower bound on the fraction α of the traffic to the captcha that the attacker must sustain in order to continue knowing a fraction d of the database. Collecting terms, it is easy to show that:

    α ≥ 1 / ( 1 + (t/(dmq)) Σ_{k=0..n} g^(n−k) (n − k) C(n,k) d^k (1 − d)^(n−k) )        (3)

While inequality (3) is messy, it is easy to evaluate in specific cases, as with the parameters for Sketcha summarized after Equation (1). Figure 5 shows a plot of α as a function of d. If the attacker starts from no knowledge of the database and is trying to learn a fraction of it, he has to climb over the hump from the left side by sustaining a tremendous surge of traffic (95% at the peak). If that is not possible, the attacker remains to the left of the peak, and his knowledge of the database offers him only marginal advantage over random guessing.

On the other hand, suppose the attacker was somehow able to steal the entire database. In this case he has to climb the curve on the right side in order to maintain his knowledge of the database (because as new images are added to the database, he sees them only rarely, and thus it is difficult for him to learn at the same rate). So an attacker who obtains the entire database must fall down the curve from the right until his level of traffic can sustain the steady state. If that level is below the minimum of the curve (47% in Figure 5), then he will not be able to learn quickly enough to maintain any fraction of the database, and over time his knowledge will dwindle. In this sense, the process emerging from covert filtering can be thought of as giving the database a "self-healing" property.
Figure 5: Database Attack. This plot places a lower bound on the fraction of the traffic to the captcha that must come from the attacker (α, vertical axis) as a function of how much of the database is already known to the attacker (d, horizontal axis), in the steady state. If the attacker's traffic drops below this bound, the database will grow faster than his learning rate, due to covert filtering. Starting from no knowledge of the database (left side), the attacker must exert 95% of the traffic to the database to climb over the hump. Even if the attacker has managed to learn as much as 80% of the database (right valley), he must sustain 47% of the overall traffic in order to maintain this knowledge.

Figure 6: Examples from machine learning attack. We trained an SVM on half of the images in Study B and then evaluated it on the remainder. Images in the upper row were oriented correctly while those in the lower row failed. The SVM generally classified boxy objects better than organic forms, and tended to do better for near-horizon views like (c) than off-angle views like (g). Many successes and failures are difficult to explain, such as (d) and (h).
We can also analyze the What's Up captcha of Gossweiler et al. using the same machinery. Their paper suggests that using three images (n = 3) provides a reasonable tradeoff between security from attacks and difficulty for humans. Suppose we add a fourth image for evaluation (m = 1), and that it takes on average the same number of trials to determine whether or not to add it to the database as in our examples (t = 7.3). Their paper suggests that roughly half of the images survive this evaluation process (q = 0.5). With these parameters we produce a plot similar in shape to that of Figure 5, but with a lower peak (65%) and a shallower valley. Thus we conclude that the covert filtering process should also resist database attacks for the What's Up captcha. However, it appears that the scenario in Sketcha, where there are more components, each solved more easily, offers better resistance to this form of attack.

Finally, we note that for small values of C, the bound in equation (3) may not be prohibitive for an attacker, which has security implications for web sites wishing to use captchas with covert filtering. If a web site that generates a small amount of traffic maintains its own database of images, an attacker may be able to sustain a high rate of traffic relative to legitimate users. Therefore, covert filtering is effective in contexts where the captcha is implemented centrally, serving many users – in this way sites with low traffic can find "safety in numbers" by sharing a common database.

4.2 Machine learning attacks

In addition to explicitly learning the contents of the image database, an attacker may use a machine learning algorithm to build a general-purpose classifier for images of the type used in the captcha. Automatic image orientation detection is a well-studied topic when the subject is a color photograph (e.g., [15, 19]). Current-generation algorithms report high (>90%) accuracy in selecting the correct orientation for a general photograph among four 90° rotations. Nevertheless, we believe that our images are robust against current machine learning methods, since line drawings contain less information than photographs. In particular, line drawings lack color, texture, background objects and scenery, as well as high-level semantic cues like grass, sky, buildings, faces, and the like.

Luo and Boutell describe an algorithm for orienting photographs that performs well and is fairly representative of current methods. Their algorithm uses a host of classifiers that leverage the aforementioned image properties like color distribution and semantic features. However, only one of their classifiers makes sense to apply to line drawings: the support vector machine (SVM) based on edge detection histograms, for which they used the technique of Wang and Zhang. Therefore, we implemented the same SVM, which creates feature vectors based on spatial edge detection histograms. Such a histogram is calculated for each block that results from dividing the image into a 5×5 grid, classifying all pixels according to their edge angle as computed by the Canny edge detection algorithm. For line drawings, we note that a Canny edge detector will turn each line into two lines, one in a direction that is rotated 180° from the other. In this way all angles can be expressed in the range 0–180° instead of 0–360°, knowing that edge pixels come in pairs. Therefore, our histograms have 19 bins, of which the first 18 are used for edge pixels and the last one is used for pixels that do not correspond to an edge.
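The feature construction just described can be sketched as follows. This is a simplified illustration rather than the implementation used in the study: plain finite-difference gradients with a magnitude threshold stand in for the Canny detector, and `edge_thresh` is a parameter we introduce for the sketch:

```python
from math import atan2, degrees, hypot

def edge_histogram_features(img, grid=5, angle_bins=18, edge_thresh=0.1):
    """Sketch of the edge-direction histogram features described above.
    Simplified: finite-difference gradients stand in for a full Canny
    detector. Returns grid*grid block histograms of 19 bins each: 18
    angle bins over 0-180 degrees plus one bin for non-edge pixels."""
    h, w = len(img), len(img[0])
    # Gradient magnitude and direction at each interior pixel.
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag[y][x] = hypot(gx, gy)
            # Fold into [0, 180) since edge pixels come in opposite pairs.
            ang[y][x] = degrees(atan2(gy, gx)) % 180.0
    peak = max(max(row) for row in mag) or 1.0
    feats = []
    for gi in range(grid):
        for gj in range(grid):
            hist = [0.0] * (angle_bins + 1)
            total = 0
            for y in range(gi * h // grid, (gi + 1) * h // grid):
                for x in range(gj * w // grid, (gj + 1) * w // grid):
                    total += 1
                    if mag[y][x] > edge_thresh * peak:
                        b = min(int(ang[y][x] / 180.0 * angle_bins),
                                angle_bins - 1)
                        hist[b] += 1        # edge pixel: vote by angle
                    else:
                        hist[angle_bins] += 1  # last bin: non-edge pixels
            feats.extend(v / total for v in hist)
    return feats  # length grid * grid * (angle_bins + 1) = 475
```

Each image thus yields a 475-dimensional vector (5 × 5 blocks × 19 bins); four rotated copies of each image, labeled by rotation, would then feed the multi-class SVM.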
To test the strength of our image database against an attack that uses the algorithm described above, we created sets of feature vectors for each of the 800 images used in Study C. Each image produced four sets of feature vectors, labeled for each of the four possible orientations. In this way, the SVM correctly classifies an image if it labels it according to its proper orientation. We split the data into halves: we trained a multi-class SVM on one half and tested it on the other, then swapped the halves and repeated the test. The SVM classified images with 61% accuracy on average. Figure 6 shows examples for which it performed well or failed.

Figure 7: Stylization. In addition to varying viewpoints, a single 3D model can be rendered with a broad range of stylization using methods such as that of Kalnins et al.

These results show that a machine learning algorithm can do significantly better than random guessing. However, an accuracy level of 61% still gives an attacker little hope of breaking the captcha. An attacker using an algorithm with such accuracy would correctly classify eight images in only 1.9% of cases.
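The 1.9% figure follows directly from the 61% per-image accuracy, under the assumption that the classifier's errors are independent across the eight test images in a captcha:

```python
# Probability that a 61%-accurate classifier orients all 8 test images
# correctly, assuming independent errors across images.
p = 0.61 ** 8
print(f"{p:.3f}")  # 0.019, i.e. about 1.9% of captchas
```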
There are several defenses against a machine learning algorithm that has high accuracy. One could resort to filtering out images that can be solved by particular machine learning algorithms, as proposed by Gossweiler et al. Pre-filtering images this way would have little impact on the accuracy of humans in completing the task, for several reasons. First, one would need to remove relatively few images to skew the statistics of the classifier towards randomness, so doing this would have little effect on the high success rate of humans found in Section 3.3. Second, looking at the performance of the SVM on this data set, we see little correlation with human performance. The use of rendered line drawings also affords us the ability to vary the rendering process to create images that are targeted at defeating machine learning attacks. The rendering process leaves room for extensive stylization and obfuscation of the object that could confuse a machine learning algorithm based on edge distribution, but can be made in such a way as to preserve the semantic meaning of the image for a human observer.

Finally, we consider the case where an attacker combines the database and machine learning attacks discussed in Sections 4.1 and 4.2. The analysis leading to equation (3) supposes that an attacker who does not know an image in the database guesses it with probability equal to random guessing (1/4). However, if the attacker uses machine learning to gain advantage in this guess, one might worry that he would be able to learn the database with a much lower traffic rate than emerged from the analysis in Section 4.1. In this situation, the hump on the left of the plot in Figure 5 is attenuated, and therefore the attacker can climb it on the left, but he will need to sustain a level equal to about 10% of the legitimate traffic in order to keep up with the growing database. Obviously this attacker does better than one without the aid of machine learning, but in many contexts this remains a prohibitive barrier for all but the most resourceful attackers.

5. CONCLUSION AND FUTURE WORK

This paper presents the Sketcha captcha, a task which requires users to determine the upright orientation for a selection of 3D objects rendered as line drawings. By leveraging a large database of common objects, we render a collection of images from various angles to use in the task. We apply covert filtering to ensure that the images used in the captcha can be solved by humans with high accuracy. In addition, a production implementation would actively add new images to the database to thwart attackers.

We believe that line drawings of 3D models are a source of images that is stronger against machine learning attacks than previously suggested image-based captchas. Compared to photographs, line drawings lack detail and cues such as color, leaving less information for computers, but our studies show that this does not make them prohibitively difficult for users to orient.

We ran three user studies to test our captcha and filtering approaches. The results show that we can use covert filtering to increase the human rate of success sufficiently for the task to serve as a practical captcha.

We tested the viability of machine learning attacks by implementing a support vector machine. It was able to orient our test images with modest accuracy, but its performance was insufficient to break the captcha. Machine learning techniques may improve in the future, but our system can adapt by pre-filtering the database to remove images that are successfully oriented by such methods, or by changing the image rendering process until the performance of such orientation algorithms degrades.

This project suggests a number of areas for future work, including:

• Obfuscations available for 3D. In this paper we rendered models only with a simple line drawing style. However, there are many styles available, even within the realm of line drawings (Figure 7). We would like to explore a range of techniques available for further obfuscating the images, hopefully thwarting machine learning algorithms without adversely affecting human performance. For example, we could use wiggly lines rather than straight ones, or we could randomly add lines to the image that are uncorrelated with the rest of the drawing.

• Other tasks. In this paper we used rotation as the goal, but there are many other tasks that could be given, based on semantic understanding of the drawing. For example, we could ask people to match images drawn from the same model but with different viewpoints.

• Deployment. Our user studies have been quite extensive, but performed in an artificial setting where users were paid to solve a task. We would like to study this captcha in the context of a working web site where visitors have the actual captcha experience.
ACKNOWLEDGMENTS

We would like to thank Brian Brewington, Rich Feit, Mark Limber, and the Google 3D Warehouse for the models used in this paper as well as helpful guidance in the project. We are grateful for the encouragement and advice of Luis von Ahn. We also thank Forrester Cole for support in adapting his "dpix" automatic line drawing software, and Mark Gray for an early prototype based on photos. This work was sponsored in part by a Google Research Award.

REFERENCES

Monica Chew and J. D. Tygar. Collaborative filtering captchas. In Henry S. Baird and Daniel P. Lopresti, editors, HIP, volume 3517 of Lecture Notes in Computer Science, pages 66–81. Springer, 2005.

Richard Chow, Philippe Golle, Markus Jakobsson, Lusha Wang, and XiaoFeng Wang. Making captchas clickable. In HotMobile '08: Proceedings of the 9th Workshop on Mobile Computing Systems and Applications, pages 91–94, 2008.

Forrester Cole, Doug DeCarlo, Adam Finkelstein, Kenrick Kin, Keith Morley, and Anthony Santella. Directing gaze in 3D models with stylized focus. Eurographics Symposium on Rendering, pages 377–387, June 2006.

J. Elson, J. Douceur, J. Howell, and J. Saul. Asirra: a captcha that exploits interest-aligned manual image categorization. In Proceedings of ACM CCS 2007, pages 366–374, 2007.

Hongbo Fu, Daniel Cohen-Or, Gideon Dror, and Alla Sheffer. Upright orientation of man-made objects. ACM Trans. Graph., 27(3), 2008.

Thomas Funkhouser, Patrick Min, Michael Kazhdan, Joyce Chen, J. Alex Halderman, David Dobkin, and David Jacobs. A search engine for 3D models. ACM Trans. Graph., 22(1):83–105, 2003.

Philippe Golle. Machine learning attacks against the Asirra captcha. Technical Report 2008/126, IACR Cryptology ePrint Archive, 2008.

Bruce Gooch, Erik Reinhard, and Amy Gooch. Human facial illustrations: Creation and psychophysical evaluation. ACM Trans. Graph., 23(1):27–44, 2004.

Rich Gossweiler, Maryam Kamvar, and Shumeet Baluja. What's up captcha? A captcha based on image orientation. In Proceedings of WWW 2009, the 18th International World Wide Web Conference, 2009.

Robert D. Kalnins, Lee Markosian, Barbara J. Meier, Michael A. Kowalski, Joseph C. Lee, Philip L. Davidson, Matthew Webb, John F. Hughes, and Adam Finkelstein. WYSIWYG NPR: drawing strokes directly on 3D models. ACM Transactions on Graphics, 21(3):755–762, July 2002.

Michael Kaplan. The 3-D Captcha.

Jiebo Luo and Matthew Boutell. Automatic image orientation detection via confidence-based integration of low-level and semantic cues. IEEE Trans. Pattern Anal. Mach. Intell., 27(5):715–726, 2005.

Niloy J. Mitra, Hung-Kuo Chu, Tong-Yee Lee, Lior Wolf, Hezy Yeshurun, and Daniel Cohen-Or. Emerging images. ACM Transactions on Graphics, 28(5), 2009.

Greg Mori and Jitendra Malik. Recognizing objects in adversarial clutter: Breaking a visual captcha. In Computer Vision and Pattern Recognition (CVPR 2003), pages 134–141, 2003.

Brad Stone. Breaking Google captchas for some extra cash. New York Times, March 13, 2008.

Aditya Vailaya, Hongjiang Zhang, Changjiang Yang, Feng-I Liu, and Anil K. Jain. Automatic image orientation detection. IEEE Transactions on Image Processing, 11:600–604, 2002.

Luis von Ahn. Personal communication, 2008.

Luis von Ahn, Manuel Blum, and John Langford. Captcha: Using hard AI problems for security. In Proceedings of Eurocrypt, pages 294–311.

Luis von Ahn, Manuel Blum, and John Langford. Telling humans and computers apart automatically. Communications of the ACM, 47(2):56–60, 2004.

Luis von Ahn, Benjamin Maurer, Colin McMillen, David Abraham, and Manuel Blum. reCAPTCHA: Human-based character recognition via web security measures. Science, 321(5895):1465–1468, 2008.

Yongmei Wang and Hongjiang Zhang. Content-based image orientation detection with support vector machines. IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL 2001).

Oli Warner. KittenAuth.