DON'T SHOOT THE DOG
To my mother, Sally Ondeck; my stepmother, Ricky Wylie; and Winifred Sturley, my
teacher and friend.
1--Reinforcement: Better than Rewards
In which we learn of the ferocity of Wall Street lawyers; of how to--and how not to--buy presents and
give compliments; of a grumpy gorilla, a grudging panda, and a truculent teenager (the author); of gambling,
pencil chewing, falling in love with heels, and other bad habits; of how to reform a scolding teacher or a
crabby boss without their knowing what you've done; and more.
2--Shaping: Developing Super Performance Without Strain or Pain
How to conduct an opera; how to putt; how to handle a bad report card. Parlor games for trainers. Notes
on killer whales, Nim Chimpsky, Zen, Gregory Bateson, the Brearley School, why cats get stuck in trees, and
how to train a chicken.
3--Stimulus Control: Cooperation Without Coercion
Orders, commands, requests, signals, cues, and words to the wise; what works and what doesn't. What
discipline isn't. Who gets obeyed and why. How to stop yelling at your kids. Dancing, drill teams, music,
martial arts, and other recreational uses of stimulus control.
4--Untraining: Using Reinforcement to Get Rid of Behavior You Don't Want
Eight methods of getting rid of behavior you don't want, from messy roommates to barking dogs to bad
tennis to harmful addictions, starting with Method 1: Shoot the Animal, which definitely works, and ending
with Method 8: Change the Motivation, which is more humane and definitely works too.
5--Reinforcement in the Real World
What it all means. Reading minds, coaching Olympic teams, how happiness can affect corporate profits,
ways to deal with other governments, and other practical applications of reinforcement theory.
6--Clicker Training: A New Technology
From the dolphin tanks to everyone's backyard: dog owners around the world put away the choke chain
and pick up the clicker. Long-term benefits: accelerated learning, precision, reliability, better
communication, and fun. The Great Internet Canine Hot Dog Challenge; some truly fetching horses; a pilot
pilot program; and clicking and autism. Changing the world one click at a time.
About the Author
This book is about how to train anyone--human or animal, young or old, oneself or others--to do
anything that can and should be done. How to get the cat off the kitchen table or your grandmother to stop
nagging you. How to affect behavior in your pets, your kids, your boss, your friends. How to improve your
tennis stroke, your golf game, your math skills, your memory. All by using the principles of training with
reinforcement.
These principles are laws, like the laws of physics. They underlie all learning-teaching situations as surely
as the law of gravity underlies the falling of an apple. Whenever we attempt to change behavior, in ourselves
or in others, we are using these laws, whether we know it or not.
Usually we are using them inappropriately. We threaten, we argue, we coerce, we deprive. We pounce on
others when things go wrong and pass up the chance to praise them when things go right. We are harsh and
impatient with our children, with each other, with ourselves even; and we feel guilty over that harshness. We
know that with better methods we could accomplish our ends faster, and without causing distress, but we
can't conceive of those methods. We are just not attuned to the ways in which modern trainers take advantage
of the laws of positive reinforcement.
Whatever the training task, whether keeping a four-year-old quiet in public, housebreaking a puppy,
coaching a team, or memorizing a poem, it will go faster, and better, and be more fun, if you know how to
use positive reinforcement.
The laws of reinforcement are simple; you can put the whole business on a blackboard in ten minutes and
learn it in an hour. Applying these laws is more of a challenge; training by reinforcement is like a game, one
dependent upon quick thinking.
Anyone can be a trainer; some people are good at it from the very start. You do not need special qualities
of patience, or a forceful personality, or a way with animals or children, or what circus trainer Frank Buck
used to call the power of the human eye. You just need to know what you're doing.
There have always been people with an intuitive understanding of how to apply the laws of training. We
call them gifted teachers, brilliant commanding officers, winning coaches, genius animal trainers. I've
observed some theater directors and many symphony orchestra conductors who are wonderfully skilled at
using reinforcement. These gifted trainers don't need a book to be able to take advantage of the laws that
affect training. For the rest of us, however, those of us muddling along with an uncontrolled pet or at
cross-purposes with a child or coworker, a knowledge of how reinforcement really works can be a godsend.
Reinforcement training is not a system of reward and punishment--by and large modern trainers don't
even use those words. The concept of reward and punishment carries a great freight of emotional associations
and interpretations, such as desire and dread and guilt and shoulds and ought to's. For example, we give
rewards to others for things we did ourselves--such as ice cream to a child to make up for a scolding. We
also tend to think we know what a reward should be: ice cream, for example, or praise. But some people don't
like ice cream, and praise from the wrong person or for the wrong reason may hurt. In some cases praise
from a teacher may guarantee ridicule from classmates.
We expect people to do the right thing without reward. Our teenage daughter should wash the dishes
because that's her duty to us. We are angry if children or employees break things, steal, arrive late, speak
rudely, and so on, because they should know better. We punish, often long after the behavior occurred--
sending people to prison being a prime example--thus creating an event that may have no effect on future
behavior, and which in fact is merely retribution. Nevertheless we think of such punishment as education, and
people easily refer to it in that way: "I taught him a lesson."
Modern reinforcement training is based not on these folk beliefs but on behavioral science. Scientifically
speaking, reinforcement is an event that (a) occurs during or upon completion of a behavior; and (b) increases
the likelihood of that behavior occurring in the future. The key elements here are two: the two events are
connected in real time--the behavior engenders the reinforcement--and then the behavior occurs more
often.
Reinforcers may be positive, something the learner might like and want more of, such as a smile or a pat,
or they might be negative, something to avoid, such as a yank on a leash or a frown. What's critical is that
there is a temporal relationship between them--the behavior occurs, then the reinforcer occurs, and
subsequently the behavior that brought the good result or averted the bad occurs more often. In fact, the
definition works in both directions, like a feedback loop: If the behavior does not increase, then either the
reinforcer was presented too early or too late, or the payoff you selected was not reinforcing to that
learner.
In addition, I believe there's an important difference between reinforcement theory, the science, and
reinforcement training, a specific application of that science. Research shows that following a behavior with
a pleasant consequence increases the behavior. That's true; but in practice, to get the sensational results we
trainers have now come to expect, the reinforcer has to occur in the very instant the behavior is taking place.
Bingo! Now! In the instant, in real time, you, the learner, need to know that what you're doing right now has
won you a prize.
Modern trainers have developed some great shortcuts for reinforcing instantaneously: primarily the use of
a marker signal to identify the behavior. This revised version of Don't Shoot the Dog! is about the laws of
reinforcement, some practical ways to use those laws in the real world, and the grassroots movement called,
at least at present, clicker training, which is taking the technology into new and unexplored terrain.
I first learned about training with positive reinforcement in Hawaii, where in 1963 I signed on as head
dolphin trainer at an oceanarium, Sea Life Park. I had trained dogs and horses by traditional methods, but
dolphins were a different proposition; you cannot use a leash or a bridle or even your fist on an animal that
just swims away. Positive reinforcers--primarily a bucket of fish--were the only tools we had.
A psychologist outlined for me the principles of training by reinforcement. The art of applying those
principles I learned from working with the dolphins. Schooled as a biologist, and with a lifelong interest in
animal behavior, I found myself fascinated, not so much with the dolphins as with what could be
communicated between us--from me to the animal and from the animal to me--during this kind of training. I
applied what I'd learned from dolphin training to the training of other animals. And I began to notice some
applications of the system creeping into my daily life. For example, I stopped yelling at my kids, because I
was noticing that yelling didn't work. Watching for behavior I liked, and reinforcing it when it occurred,
worked a lot better and kept the peace too.
There is a solid body of scientific theory underlying the lessons I learned from dolphin training. We shall
go considerably beyond theory in this book, since as far as I know, the rules for applying these theories are
largely undescribed by science and in my opinion often misapplied by scientists. But the fundamental laws
are well established and must be taken into account when training.
The study of this body of theory is variously known as behavior modification, reinforcement theory,
operant conditioning, behaviorism, behavioral psychology, and behavior analysis: the branch of psychology
largely credited to Harvard professor B. F. Skinner.
I know of no other modern body of scientific information that has been so vilified, misunderstood,
misinterpreted, overinterpreted, and misused. The very name of Skinner arouses ire in those who champion
"free will" as a characteristic that separates man from beast. To people schooled in the humanistic tradition,
the manipulation of human behavior by some sort of conscious technique seems incorrigibly wicked, in spite
of the obvious fact that we all go around trying to manipulate one another's behavior all the time, by whatever
means come to hand.
While humanists have been attacking behaviorism and Skinner himself with a fervor that used to be
reserved for religious heresies, behaviorism has swelled into a huge branch of psychology, with university
departments, clinical practitioners, professional journals, international congresses, graduate studies programs,
doctrines, schisms, and masses and masses of literature.
And there have been benefits. Some disorders--autism, for example--seem to respond to shaping and
reinforcement as to no other treatment. Many individual therapists have been extremely successful in solving
the emotional problems of patients by using behavioral techniques. The effectiveness, at least in some
circumstances, of simply altering behavior rather than delving into its origins has contributed to the rise of
family therapy, in which every family member's behavior is looked at, not just the behavior of the one who
seems most obviously in distress. This makes eminent good sense.
Teaching machines and programmed textbooks derived from Skinnerian theory were early attempts to
shape learning step by step and to reinforce the student for correct responses. These early mechanisms were
clumsy but led directly to CAI, Computer-Assisted Instruction, which is great fun because of the amusing
nature of the reinforcers (fireworks, dancing robots) and highly effective because of the computer's perfect
timing. Reinforcement programs using tokens or chits that can be accumulated and traded for candy,
cigarettes, or privileges have been established in mental hospitals and other institutions. Self-training
programs for weight control and other habit changes abound. Effective educational systems based on
principles of shaping and reinforcement, such as Precision Teaching and Direct Instruction, are making
inroads in our schools. And biofeedback is an interesting application of reinforcement to training of
Academicians have studied the most minute aspects of conditioning. One finding shows, for example, that
if you make a chart to keep track of your progress in some self-training program, you will be more likely to
maintain new habits if you solidly fill in a little square every day on the chart, rather than just putting a check
mark in the square.
This absorption with detail has valid psychological purposes, but one does not often find much good
training in it. Training is a loop, a two-way communication in which an event at one end of the loop changes
events at the other, exactly like a cybernetic feedback system; yet many psychologists treat their work as
something they do to a subject, not with the subject. To a real trainer, the idiosyncratic and unexpected
responses any subject can give are the most interesting and potentially the most fruitful events in the training
process; yet almost all experimental work is designed to ignore or minimize individualistic responses.
Devising methods for what Skinner named shaping, the progressive changing of behavior, and carrying out
those methods, is a creative process. Yet the psychological literature abounds with shaping programs that are
so unimaginative, not to say ham-handed, that they constitute in my opinion cruel and unusual punishment.
Take, for example, in one recent journal, a treatment for bed-wetting that involved not only putting "wetness"
sensors in the child's bed but having the therapist spend the night with the child! The authors had the grace to
say apologetically that it was rather expensive for the family. How about the expense to the child's psyche?
This kind of "behavioral" solution is like trying to kill flies with a shovel.
Schopenhauer once said that every original idea is first ridiculed, then vigorously attacked, and finally
taken for granted. As far as I can see, reinforcement theory has been no exception. Skinner was widely
ridiculed years ago for demonstrating shaping by developing a pair of Ping-Pong-playing pigeons. The warm,
comfortable, self-cleansing, entertainment-providing crib he built for his infant daughters was derided as an
inhumane "baby box," immoral and heretical. Rumors still go around that his daughters went mad, when in
fact both of them are successful professional women and quite delightful people. Finally, nowadays many
educated people treat reinforcement theory as if it were something not terribly important that they have
known and understood all along. In fact most people don't understand it, or they would not behave so badly to
the people around them.
In the years since my dolphin-training experiences, I have lectured and written about the laws of
reinforcement in academic and professional circles as well as for the general public. I've taught this kind of
training to high school, college, and graduate students, to housewives and zookeepers, to family and friends,
and, in weekend seminars, to several thousand dog owners and trainers. I have watched and studied all kinds
of other trainers, from cowboys to coaches, and I've noticed that the principles of reinforcement training are
gradually seeping into our general awareness. Hollywood animal trainers call the use of positive
reinforcement "affection training" and are using these techniques to accomplish behaviors impossible to
obtain by force such as many of the behaviors of pigs and other animals in the movie Babe. Many Olympic
coaches nowadays use positive reinforcement and shaping, instead of relying on old-fashioned browbeating,
and have achieved notable improvements in performance.
Nowhere, however, have I found the rules of reinforcement theory written down so that they could be of
use in immediate practical situations. So here they are, explained in this book as I understand them and as I
see them used and misused in real life.
Reinforcement training does not solve all problems--it will not fatten your bank account, it cannot save a
bad marriage, and it will not overhaul serious personality disorders. Some situations, such as a crying baby,
are not training problems and require other kinds of solutions. Some behaviors, in animals and people, have
genetic components that may be difficult or impossible to modify by training. Some problems are not worth
the training time. But with many of life's challenges, tasks, and annoyances, correct use of reinforcement can
help.
Using positive reinforcers in one situation may show you how to use them in others. As a dolphin
researcher whom I worked with sourly put it, "Nobody should be allowed to have a baby until they have first
been required to train a chicken," meaning that the experience of getting results with a chicken, an organism
that cannot be trained by force, should make it clear that you don't need to use punishers to get results with a
baby. And the experience should give you some ideas about reinforcing baby behavior you want.
I have noticed that most dolphin trainers, who must develop the skills of using positive reinforcers in their
daily work, have strikingly pleasant and agreeable children. This book will not guarantee you agreeable
children. In fact, it promises no specific results or skills. What it will give you is the fundamental principles
underlying all training, and some guidelines on how to apply these principles creatively in varying situations.
It may enable you to clear up annoyances that have been bothering you for years, or to make advances in
areas where you have been stymied. It will certainly, if you wish, enable you to train a chicken.
There seems to be a natural order to reinforcement training. These chapters come in the sequence in which
training events, from simple to complex, really take place, and this is also the sequence in which people seem
to learn most easily to be real trainers. The organization of this book is progressive in order to develop a
comprehensive understanding of training with positive reinforcers. Its applications, however, are meant to be
practical. Throughout the book's chapters real-life situations are offered as illustrations. Specific methods
should be treated as suggestions or inspirations, rather than as definitive instructions.
1--Reinforcement: Better than Rewards
What Is a Positive Reinforcer?
A reinforcer is anything that, occurring in conjunction with an act, tends to increase the probability that
the act will occur again.
Memorize that statement. It is the secret of good training.
There are two kinds of reinforcers: positive and negative. A positive reinforcer is something the subject
wants, such as food, petting, or praise. A negative reinforcer is something the subject wants to avoid--a
blow, a frown, an unpleasant sound. (The warning buzzer in a car if you don't fasten your seat belt is a
negative reinforcer.)
Behavior that is already occurring, no matter how sporadically, can always be intensified with positive
reinforcement. If you call a puppy and it comes, and you pet it, the pup's coming when called will become
more and more reliable even without any other training. Suppose you want someone to telephone you--your
offspring, your parent, your lover. If he or she doesn't call, there isn't much you can do about it. A major point
in training with reinforcement is that you can't reinforce behavior that is not occurring. If, on the other hand,
you are always delighted when your loved ones do call, so that the behavior is positively reinforced, the
likelihood is that the incidence of their calling will increase. (Of course, if you apply negative
reinforcement--"Why haven't you called, why do I have to call you, you never call me," and so on, remarks
likely to annoy--you are setting up a situation in which the caller avoids such annoyance by not calling you;
in fact, you are training them not to call.)
Simply offering positive reinforcement for a behavior is the most rudimentary part of reinforcement
training. In the scientific literature, you can find psychologists saying, "Behavioral methods were used," or,
"The problem was solved by a behavioral approach." All this means, usually, is that they switched to positive
reinforcement from whatever other method they were using. It doesn't imply that they used the whole bag of
tricks described in this book; they may not even be aware of them.
Yet switching to positive reinforcement is often all that is necessary. It is by far the most effective way to
help the bed-wetter, for example: private praise and a hug for dry sheets in the morning, when they do occur.
Positive reinforcement can even work on yourself. At a Shakespeare study group I once belonged to I met
a Wall Street lawyer in his late forties who was an avid squash player. The man had overheard me chatting
about training, and on his way out the door afterward he remarked that he thought he would try positive
reinforcement on his squash game. Instead of cursing his errors, as was his habit, he would try praising his
good shots.
Two weeks later I ran into him again. "How's the squash game?" I asked. A look of wonder and joy
crossed his face, an expression not frequently seen on Wall Street lawyers.
"At first I felt like a damned fool," he told me, "saying 'Way to go, Pete, attaboy' for every good shot.
Hell, when I was practicing alone, I even patted myself on the back. And then my game started to get better.
I'm four rungs higher on the club ladder than I've ever been. I'm whipping people I could hardly take a point
from before. And I'm having more fun. Since I'm not yelling at myself all the time, I don't finish a game
feeling angry and disappointed. If I made a bad shot, never mind, good ones will come along. And I find I
really enjoy it when the other guy makes a mistake, gets mad, throws his racquet--I know it won't help his
game, and I just smile ... "
What a fiendish opponent. And just from switching to positive reinforcement.
Reinforcers are relative, not absolute. Rain is a positive reinforcer to ducks, a negative reinforcer to cats,
and a matter of indifference, at least in mild weather, to cows. Food is not a positive reinforcer if you're full.
Smiles and praise may be useless as reinforcers if the subject is trying to get you mad. In order to be
reinforcing, the item chosen must be something the subject wants.
It is useful to have a variety of reinforcers for any training situation. At the Sea World oceanariums, killer
whales are given many reinforcers, including fish (their food), stroking and scratching on different parts of
the body, social attention, toys, and so on. Whole shows are run in which the animals never know which
behavior will be reinforced next or what the reinforcer will be; the "surprises" are so interesting for the
animals that the shows can be run almost entirely without the standard fish reinforcers; the animals get their
food at the end of the day. The necessity of switching constantly from one reinforcer to another is
challenging and interesting for the trainers, too.
Positive reinforcement is good for human relationships. It is the basis of the art of giving presents:
guessing at something that will be definitely reinforcing (guessing right is reinforcing for the giver, too). In
our culture, present giving is often left to women. I even know of one family in which the mother buys all the
Christmas presents to and from everyone. It causes amusement on Christmas morning, brothers and sisters
saying, "Let's see, this is from Anne to Billy," when everyone knows Anne had nothing to do with it. But it
does not sharpen the children's skills at selecting ways to reinforce other people.
In our culture a man who has become observant about positive reinforcement has a great advantage over
other men. As a mother, I made sure that my sons learned how to give presents. Once, for example, when
they were quite young, seven and five, I took them to a rather fancy store and had them select two dresses,
one each, for their even younger sister. They enjoyed lolling about in the plush chairs, approving or
disapproving of each dress as she modeled it. Their little sister enjoyed it too; and she had the ultimate veto
power. And so, thanks to this and similar exercises, they all learned how to take a real interest in what other
people want; how to enjoy finding effective positive reinforcers for the people you love.
A reinforcer is something that increases a behavior; but it doesn't have to be something the learner wants.
Avoiding something you dislike can be reinforcing, too. Laboratory research shows that behavior can be
increased by aversive stimuli if a change in behavior will make the aversive stimulus go away. Such stimuli
are called negative reinforcers: things a person or animal will work to avoid.
Negative reinforcers may consist of the mildest of aversive stimuli--a derisive glance from a friend when
you make a poor joke, or a slight draft from an air conditioner that causes you to get up and move to another
chair. However, even very extreme aversives, from public humiliation to electric shock, may function as
negative reinforcers as well as being punishing experiences. We may experience being yelled at as highly
punitive, but we also quickly learn to come in to work the back way when the boss who has often yelled at us
is standing in the front door.
Negative reinforcers are aversives that can be halted or avoided by changing behavior. As soon as the new
behavior starts, the aversive stimulus stops, and thus the new behavior is strengthened. Suppose that while
sitting in my aunt's living room, I happen to put my feet on the coffee table as I would at home. My aunt
raises a disapproving eyebrow. I put my feet on the floor again. Her face relaxes. I feel relieved.
The raised eyebrow was an aversive stimulus acting as a negative reinforcer. Because I was able to halt
the aversive stimulus, the new behavior--keeping my feet on the floor--is more likely to occur again, at
least at my aunt's house, but possibly in other houses, too.
Training can be done almost entirely with negative reinforcers, and much traditional animal training is
done exactly that way. The horse learns to turn left when the left rein is pulled, because the annoying pressure
in its mouth ceases when the turn is made. The lion backs onto a pedestal and stays there, to avoid the
intrusive whip or chair held near its face.
Negative reinforcement, however, is not the same as punishment. So what is the difference? In the first
edition of this book I wrote that punishment is an aversive stimulus that occurs after the behavior it was
meant to modify, and therefore it can have no effect on the behavior. "A boy being spanked for a bad report
card may or may not get better report cards in the future, but he surely can't change the one he has just
brought home." Indeed, when we punish with intent, we frequently do it far too late, but that is not the actual
difference between punishment and negative reinforcement.
Modern behavior analysts identify punishment as any event that stops behavior. A baby starts to put a
hairpin into the electric socket. His mother grabs him and/or slaps his hand away from the socket: this
life-threatening behavior has to be interrupted now. The behavior stops. Lots of other things may start--the baby
cries, the mother feels bad, and so on--but the hairpin-in-electric-outlet behavior ceases, at least for that
moment. That's what punishment does.
B. F. Skinner was more precise. He defined punishment as what happens when a behavior results in the
loss of something desirable--the pleasure of investigating if this object can fit into that hole, a popular
pastime with babies--or when the behavior results in the delivery of something undesirable. However, in
both cases, while the ongoing behavior stops, there is no predictable outcome in the future. We know that
reinforcers strengthen behavior in the future, but a punisher will not result in predictable changes.
For example, will grabbing the baby or smacking his hand, even if his mother's timing is perfect,
guarantee that the baby won't try sticking things into outlets again? I doubt it. Ask any parent. What really
happens is that we pick up small objects, we put covers over the wall outlets, or we move furniture in front of
them, and eventually the baby outgrows this particular urge.
The behavior analysts look at it this way. Reinforcement and punishment are each a process, defined by
results. Negative reinforcers can be used effectively to train behavior, and even though aversive stimuli are
involved, the process can be relatively benign. Here (with thanks to llama expert Jim Logan) is a nice use of
the negative reinforcer with a semidomestic animal, the llama, kept in the United States as pets and
elsewhere as pack animals and for their wool.
Llamas are timid and shy, like horses. Unless handled a lot when young, they can be hard to approach. So,
while operant conditioning with a food reinforcer works splendidly with llamas, if a llama is too skittish to
come close enough to a person to take the food, here's what modern llama trainers do. They use a clicker as a
signal to tell the llama that what it is doing has earned a reinforcer, but the primary or real reinforcer is the
removal of a negative reinforcer, an aversive.
In effect, you say to the llama, "Will you stand still if I approach within thirty feet? Yes? Good. I'll click
my clicker and turn and go farther away.
"Now will you stand still if I approach within twenty-five feet? Yes? Good. I'll click and go away."
Using the click to mark the behavior of standing still, with the scary person turning and going away again
as the reinforcer, one can sometimes get within touching distance in five or ten minutes. The llama, as it
were, is in control. As long as it stands still, it can make you go away! So it stands still, even when the person
is right next to it.
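For readers who think in procedures, the click-and-retreat exchange above can be written out as a short loop. This is only a rough sketch in Python, not the llama trainers' actual protocol. The thirty-foot start and the five-foot step come from the dialogue above. The stands_still function is a hypothetical stand-in for the llama's response, and the rule of backing off one step after a failure is an assumption added for illustration.

```python
def approach_session(stands_still, start_ft=30, step_ft=5, max_trials=50):
    """Simulate the click-and-retreat loop.

    stands_still(distance_ft) -> bool is a hypothetical stand-in for how the
    llama responds when the trainer approaches to that distance.
    Returns a list of (distance_ft, outcome) pairs, one per trial.
    """
    log = []
    distance = start_ft
    for _ in range(max_trials):
        if stands_still(distance):
            # The click marks standing still; the retreat is the reinforcer.
            log.append((distance, "click, retreat"))
            if distance == 0:
                break  # within touching distance
            distance = max(0, distance - step_ft)
        else:
            # Assumption (not from the text): after a failure, ask for less.
            log.append((distance, "no click"))
            distance = min(start_ft, distance + step_ft)
    return log


# A llama that always stands still is reached in seven approaches.
for distance, outcome in approach_session(lambda d: True):
    print(distance, outcome)
```

The point of the sketch is the order of events inside the loop: the behavior comes first, the click marks it, and only then does the trainer retreat.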
When one has touched the llama several times and then retreated, the ice is broken. This person is no
longer as scary. Now it's time for the feed bucket. The communication loop becomes "May I touch you while
you stand still? Yes? Click and here's some delicious food." And the llama is on its way to earning positive
reinforcers, including food and scratching and petting, with its splendid new behavior of standing still instead
of heading for the next county.
This use of retreat, or easing back when the desired behavior occurs, is an important aspect of most of the
so-called "horse whisperer" techniques. In most of these methods the trainer works with a loose horse in a
confined area and proceeds in a relatively short time to transform a horse in flight to a horse calmly accepting
a human. The horse, once perhaps completely wild, becomes so calm, even accepting a saddle and rider, that
the total effect is magical.
Trainers who use these techniques often have superstitious explanations for what is happening; and while
many have formed the habit of making some sound or motion that functions as the marker signal or the
conditioned reinforcer, few are consciously aware of doing so. Nevertheless, it is not magic at work; it is the
laws of operant conditioning.
While negative reinforcement is a useful process, it's important to remember that each instance of negative
reinforcement also contains a punisher. When you pull on the left rein, until the moment that the horse turns,
you are punishing going straight ahead. Overuse of negative reinforcers and other aversives can lead to what
Murray Sidman, Ph.D., calls "fallout," the undesirable side effects of punishment (see Chapter 4).
Timing of Reinforcers
As already stated, a reinforcer must occur in conjunction with the act it is meant to modify. The timing of
the arrival of the reinforcer is information. It tells the learner exactly what it is you like. When one is trying
to learn, the informational content of a reinforcer becomes even more important than the reinforcer itself. In
coaching athletes or training dancers, it is the instructor's shouted "Yes!" or "Good!," marking a movement
as it occurs, that truly gives the needed information--not the debriefing later in the dressing room.
Laggardly reinforcement is the beginning trainer's biggest problem. The dog sits, but by the time the
owner says "Good dog," the dog is standing again. What behavior did "Good dog" reinforce? Standing up.
Whenever you find yourself having difficulties in a training situation, the first question to ask yourself is
whether you are reinforcing too late. If you are working with a person or an animal and are caught up in the
thick of the action, it sometimes helps to have someone else watch for late reinforcers.
We are always reinforcing one another too late. "Gee, honey, you looked great last night" is quite
different from the same comment said at the moment. The delayed reinforcer may even have a deleterious
effect ("What's the matter, don't I look great now?"). We have a touching trust in the powers of words to
cover our lapses in timing.
Reinforcing too early is also ineffective. At the Bronx Zoo the keepers were having trouble with a gorilla.
They needed to get it into its outdoor pen in order to clean the indoor cage, but it had taken to sitting in the
doorway, where with its enormous strength it could prevent the sliding door from being closed. When the
keepers put food outside, or waved bananas enticingly, the gorilla either ignored them or snatched the food
and ran back to its door before it could be shut. A trainer on the zoo staff was asked to look at the problem.
He pointed out that banana waving and the tossing in of food were attempts to reinforce behavior that hadn't
occurred yet. The name for this is bribery. The solution was to ignore the gorilla when it sat in the door, but
to reinforce it with food whenever it did happen to go out by itself. Problem solved.
Sometimes, I think, we reinforce children too soon under the misimpression that we are encouraging them
("Atta girl, that's the way, you almost got it right"). What we may be doing is reinforcing trying. There is a
difference between trying to do something and doing it. Wails of "I can't" may sometimes be a fact, but they
may also be symptoms of being reinforced too often merely for trying. In general, giving gifts, promises,
compliments, or whatever for behavior that hasn't occurred yet does not reinforce that behavior in the
slightest. What it does reinforce is whatever was occurring at the time: soliciting reinforcement, most likely.
Timing is equally important when training with negative reinforcers. The horse learns to turn left when
the left rein is pulled, but only if the pulling stops when it does turn. The cessation is the reinforcer. You get
on a horse, kick it in the sides, and it moves forward; you should then stop kicking (unless you want it to
move faster). Beginning riders often thump away constantly, as if the kicking were some kind of gasoline
necessary to keep the horse moving. The kicking does not stop, so it contains no information for the horse.
Thus are developed the iron-sided horses in riding academies that move at a snail's pace no matter how often
they are kicked.
The same applies to people getting nagged and scolded by parents, bosses, or teachers. If the negative
reinforcer doesn't cease the instant the desired result is achieved, it is neither reinforcing nor information. It
becomes, both literally and in terms of information theory, "noise."
Watching football and baseball on TV, I am often struck by the beautifully timed reinforcers that the
players receive again and again. As a touchdown is made, as the runner crosses home plate, the roar of the
crowd signals unalloyed approval; and the instant a score is made or a game is won, just watch the frenzied
exchange of mutual reinforcers among the players. It is quite different for actors, especially movie actors.
Even on stage the applause comes after the job is done. For movie actors, except for occasional response
from a director or camera operator or grip, there is no timely reinforcement; fan letters and good reviews,
arriving weeks or months later, are pallid compared with all of Yankee Stadium going berserk at the moment
of success. No wonder some stars often exhibit a seemingly neurotic craving for adulation and thrills; the
work can be peculiarly unsatisfying because the reinforcers, however splendid, are always "late."
Size of Reinforcer
Beginning trainers who use food reinforcement with animals are often confused as to how big each
reinforcer should be. The answer is: as small as you can get away with. The smaller the reinforcer, the more
quickly the animal eats it. Not only does this cut down on waiting time, it also allows for more reinforcers
per session, before the animal becomes satiated. In 1979 I was hired as a consultant by the National
Zoological Park in Washington, D.C., to teach positive reinforcement techniques to a group of zoo
employees. One of the keepers in my training class complained that her training of the panda had been
proceeding too slowly. I thought this odd because intuitively I felt that pandas--big, greedy, active animals--
should be easy to train with a reinforcer of food. I watched a session and found that while the keeper was
gradually succeeding in shaping a body movement, she was giving the panda a whole carrot for each
reinforcement. The panda took its own sweet time eating each carrot, so that in fifteen minutes of valuable
keeper time it had earned only three reinforcers (and was incidentally getting tired of carrots). A single slice
of carrot per reinforcement would have been better.
In general, a reinforcer that constitutes one small mouthful for that animal is enough to keep it interested
--a grain or two of corn for a chicken, a quarter-inch cube of meat for a cat, half an apple for an elephant.
With an especially preferred food you can go even smaller--a teaspoon of grain for a horse, for example.
Keepers at the National Zoo have trained their polar bears to do many useful things, such as moving to
another cage on command, with raisins.
A trainer's rule of thumb is that if you are going to have only one training session a day, you can count on
the animal working well for about a quarter of its rations; you then give it the rest for free. If you can get in
three or four sessions a day, you can divide the normal amount of food into about eighty reinforcers and give
twenty or thirty in each session. Eighty reinforcers seems to be about the maximum for any subject's interest
during any one day. (Perhaps that's why slide trays usually hold eighty slides; I know I always groan if a
lecturer asks the projectionist for the second tray of slides.)
The difficulty of the task also has some effect on the size of the reinforcer. At Sea Life Park we found it
necessary to give each of our whales a large mackerel for their Olympic-effort, twenty-two-foot straight-up
jump. They simply refused to do it for the usual reinforcer of two small smelt. For people, sometimes if not
always, harder jobs get bigger rewards. And how we hate it when they don't, if we are the ones doing the
hard work.
One extremely useful technique with food or any other reinforcement, for animals or people, is the
jackpot. The jackpot is a reward that is much bigger, maybe ten times bigger, than the normal reinforcer, and
one that comes as a surprise to the subject. At an ad agency where I once worked we had the usual office
party at Christmas, as well as informal celebrations to signalize the completion of a big job or the signing of a
new client. But the president was also in the habit of throwing one or two totally unexpected parties a year.
Suddenly in midafternoon he would stride through all the offices, yelling for everyone to stop working. The
switchboard was closed down, and in came a procession of caterers, musicians, bartenders, champagne,
smoked salmon, the works: just for us and for no special reason. It was an unexpected jackpot for fifty
people. It contributed vastly, I thought, to the company's high morale.
A jackpot may be used to mark a sudden breakthrough. In the case of one horse trainer I know, when a
young horse executes a difficult maneuver for the first time, the man leaps from its back, snatches off saddle
and bridle, and turns the horse loose in the ring--a jackpot of complete freedom, which often seems to make
the new behavior stick.
Paradoxically, a single jackpot may also be effective in improving the response of a recalcitrant, fearful,
or resistant subject that is offering no desirable behavior at all. At Sea Life Park we were doing some U.S.
Navy-funded research that involved reinforcing a dolphin for new responses, instead of old, previously
trained behaviors. Our subject was a docile animal named Hou that rarely offered new responses. When she
failed to get reinforced for what she did offer, she became inactive, and finally in one session she went twenty
minutes offering no responses at all. The trainer finally tossed her two fish "for nothing." Visibly startled by
this largesse, Hou became active again and soon made a movement that could be reinforced, leading to real
progress in the next few sessions.
I had the same experience as that dolphin myself once. When I was fifteen, my greatest pleasure in life
was riding lessons. The stables where I rode sold tickets, ten lessons on a ticket. From my allowance I could
afford one ticket a month. I was living with my father, Philip Wylie, and my stepmother, Ricky, at the time;
and although they were very good to me, I had entered one of those adolescent periods in which one practices
being as truculent and disagreeable as possible for days on end. One evening the Wylies, being loving and
ingenious parents, told me that they were pretty tired of my behavior, and that what they had decided to do
was reward me.
They then presented me with a brand-new, extra, free riding ticket. One of them had taken the trouble of
going to the stables to buy it. Wow! An undeserved jackpot. As I recall, I shaped up on the spot, and Ricky
Wylie confirmed that memory as I was writing this book many years later.
Why the unearned jackpot should have had such abrupt and long-reaching effects, I do not fully
understand: Perhaps someone will do a Ph.D. dissertation on the matter someday and explain it to us. I do
know that the extra riding ticket instantly relieved in me some strong feelings of oppression and resentment,
and I suspect that's exactly how that dolphin felt, too.
It often happens, especially when training with food reinforcers, that there is absolutely no way you can
get the reinforcer to the subject during the instant it is performing the behavior you wish to encourage. If I
am training a dolphin to jump, I cannot possibly get a fish to it while it is in midair. If each jump is followed
by a thrown fish with an unavoidable delay, eventually the animal will make the connection between jumping
and eating and will jump more often. However, it has no way of knowing which aspect of the jump I liked.
Was it the height? The arch? Perhaps the splashing reentry? Thus it would take many repetitions to identify
to the animal the exact sort of jump I had in mind. To get around this problem, we use conditioned
reinforcers.
A conditioned reinforcer is some initially meaningless signal--a sound, a light, a motion--that is
deliberately presented before or during the delivery of a reinforcer. Dolphin trainers have come to rely on the
police whistle as a conditioned reinforcer; it is easily heard, even underwater, and it leaves one's hands free
for signaling and fish throwing. With other animals I frequently use a cricket, the dime-store party toy that
goes click-click when you press it, or a particular praise word, selected and reserved for the purpose of acting
as a conditioned reinforcer: "Good dog," "Good pony." Schoolteachers often arrive at some such ritualized
and carefully rationed word of commendation--"That's fine" or "Very good"--for which the children
anxiously work and wait.
Conditioned reinforcers abound in our lives. We like to hear the phone ring or see a full mailbox, even if
half our calls are no fun or most of our mail is junk mail, because we have had numerous occasions to learn to
relate the ringing or the envelopes to good things. We like Christmas music and hate the smell of dentists'
offices. We keep things around us--pictures, dishes, trophies--not because they are beautiful or useful but
because they remind us of times when we were happy or of people we love. They are conditioned
reinforcers.
Practical animal training that uses positive reinforcement should almost always begin with the
establishment of a conditioned reinforcer. Before the start of any real training of behavior, while the subject
is doing nothing in particular, you teach it to understand the significance of the conditioned reinforcer by
pairing it with food, petting, or other real reinforcers. You can tell, incidentally, at least with animals, when
the subject has come to recognize your signal for "Good!" It visibly startles on perceiving the conditioned
reinforcer and begins seeking the real reinforcer. With the establishment of a conditioned reinforcer, you
have a real way of communicating exactly what you like in the animal's behavior. So you do not need to be
Dr. Dolittle to talk to the animals; you can "say" an amazing amount with such trained reinforcement.
Conditioned reinforcers become immensely powerful. I have seen marine mammals work long past the
point of satiety for conditioned reinforcers, and horses and dogs work for an hour or more with few primary
reinforcers. People, of course, work endlessly for money, which is after all only a conditioned reinforcer, a
token for the things it can buy--even, or perhaps especially, people who have already earned more money
than they can actually spend, who have accordingly become addicted to the conditioned reinforcer.
One can make a conditioned reinforcer more powerful by pairing it with several primary reinforcers. The
subject at that moment may not want food, say, but if the same reinforcing sound or word has also been