An Animal Trainer's Introduction To Operant and Classical Conditioning
Part Two
Stacy Braslau-Schneck, MA
Positive Reinforcement
This is possibly the easiest, most effective consequence
for a trainer to control (and easy to understand,
too!). Positive reinforcement means starting or adding
Something Good, something the animal likes or enjoys.
Because the animal wants to gain that Good Thing again,
it will repeat the behavior that seems to cause that
consequence.
Examples of positive reinforcement:
The dolphin gets a fish for doing a trick. The worker
gets a paycheck for working. The dog gets a piece
of liver for returning when called. The cat gets comfort
for sleeping on the bed. The wolf gets a meal for
hunting the deer. The child gets dessert for eating
her vegetables. The dog gets attention from his people
when he barks. The elephant seal gets a chance to
mate for fighting off rivals. The child gets ice cream
for begging incessantly. The toddler gets picked up
and comforted for screaming. The dog gets to play
in the park for pulling her owner there. The snacker
gets a candy bar for putting money in the machine.
Secondary positive reinforcers and Bridges
A primary positive reinforcer is something
that the animal does not have to learn to like. It
comes naturally, no experience necessary. Primary
R+s usually include food, often include sex (the chance
to mate), the chance to engage in instinctive behaviors,
and for social animals, the chance to interact with
others.
A secondary positive reinforcer is something
that the animal has to learn to like. The learning
can be accomplished through Classical Conditioning
or through some other method. A paycheck is a secondary
reinforcer - just try writing a check to reward a
young child for potty training!
Animal trainers will often create a special secondary reinforcer
they call a bridge. A bridge is a stimulus that has been associated
with a primary reinforcer through classical conditioning.
This process creates a conditioned positive reinforcer, often called
a conditioned reinforcer or CR for short. Animals that have learned
a bridge react to it almost as they would to the reward that
follows (animals that have learned what clicker training
is all about may sometimes prefer the CR that tells
them they got it right to the actual "reward").
Schedules of Reinforcement, and Extinction
A schedule of reinforcement determines how often a behavior
is going to result in a reward. There are five kinds:
fixed interval, variable interval, fixed ratio, variable
ratio, and random.
A fixed interval schedule means that a reward will occur after a fixed
amount of time, for example, every five minutes. Paychecks
work on this schedule - every two weeks I get one.
A variable interval schedule means that reinforcers will be distributed
after a varying amount of time. Sometimes it will
be five minutes, sometimes three, sometimes seven,
sometimes one. My e-mail account works on this system
- at varying intervals I get new mail (for me this
is a Good Thing!).
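The two interval schedules can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and drawing each variable gap evenly between 0.2 and 1.8 times the average is an assumption chosen so the gaps average out to the stated interval, not a rule of the schedule itself.

```python
import random

def fixed_interval_times(interval, total_time):
    """Times at which a reward occurs when one is given every `interval` units."""
    times = []
    t = interval
    while t <= total_time:
        times.append(t)
        t += interval
    return times

def variable_interval_times(mean_interval, total_time, rng):
    """Reward times whose gaps vary around `mean_interval`.

    Assumption for this sketch: each gap is drawn uniformly between
    0.2x and 1.8x the mean, so the gaps average out to the mean.
    """
    times = []
    t = 0.0
    while True:
        t += rng.uniform(0.2 * mean_interval, 1.8 * mean_interval)
        if t > total_time:
            return times
        times.append(t)

if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the sketch is repeatable
    print(fixed_interval_times(5, 30))  # [5, 10, 15, 20, 25, 30]
    print([round(t, 1) for t in variable_interval_times(5, 30, rng)])
```

On the fixed schedule the reward times are perfectly predictable; on the variable one, only the average spacing is.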
A fixed ratio means that if a behavior is performed X number of
times, there will be one reinforcement on the Xth
performance. For a fixed ratio of 1:3, every third
behavior will be rewarded. This type of ratio tends
to lead to lousy performance with some animals and
people, since they know that the first two performances
will not be rewarded, and the third one will be no
matter what. Some assembly-line production systems
work on this schedule - the worker gets paid for every
10 widgets she makes. A fixed ratio of 1:1 means that
every correct performance of a behavior will be rewarded.
A variable ratio schedule means that reinforcers are distributed
based on the average number of correct behaviors.
A variable ratio of 1:3 means that on average,
one out of every three behaviors will be rewarded.
It might be the first. It might be the third. It might
even be the fourth, as long as it averages out to
one in three. This is often referred to as a variable
schedule of reinforcement or VSR (in other words,
it's often assumed that when someone writes "VSR"
they are referring to a variable ratio schedule
of reinforcement).
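Here is a rough Python sketch of the difference between the fixed and variable ratio schedules. The function names are invented for this example, and picking each variable count evenly from 1 up to twice the ratio minus one (so 1 to 5 for a 1:3 ratio) is just one assumed way to make the counts average out. A real trainer would vary the count by feel.

```python
import random

def fixed_ratio_rewards(n_behaviors, ratio):
    """One True/False per behavior: every `ratio`-th behavior is rewarded."""
    return [i % ratio == 0 for i in range(1, n_behaviors + 1)]

def variable_ratio_rewards(n_behaviors, mean_ratio, rng):
    """Reward after a varying number of behaviors that averages `mean_ratio`.

    Assumption for this sketch: the number of behaviors needed for the
    next reward is drawn uniformly from 1 .. 2 * mean_ratio - 1.
    """
    rewards = []
    countdown = rng.randint(1, 2 * mean_ratio - 1)
    for _ in range(n_behaviors):
        countdown -= 1
        if countdown == 0:
            rewards.append(True)
            countdown = rng.randint(1, 2 * mean_ratio - 1)
        else:
            rewards.append(False)
    return rewards

if __name__ == "__main__":
    print(fixed_ratio_rewards(6, 3))  # [False, False, True, False, False, True]
    rng = random.Random(0)  # seeded only so the sketch is repeatable
    print(variable_ratio_rewards(12, 3, rng))
```

With the fixed ratio the animal can learn exactly which performance pays off. With the variable ratio any performance might be the one.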
With a random schedule,
there is no correlation between the animal's behavior
and the consequence. This is how Fate works.
If reinforcement fails to occur after a behavior that
has been reinforced in the past, the behavior might
extinguish. This process is called extinction. A variable
ratio schedule of reinforcement makes the behavior less
vulnerable to extinction.
If you're not expecting to gain a reward every time
you accomplish a behavior, you are not likely to stop
the first few times your action fails to generate
the desired consequence. This is the principle that
slot machines are based on. "OK, I didn't win
this time, but next time I'm almost sure
to win!"
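One rough way to picture this is to compare the longest unrewarded run a trainee has already lived through. The Python sketch below uses two made-up reward histories and a helper whose name is invented for this example. It says nothing about how a real animal keeps count. It only shows that after a 1:1 history a single miss is something new, while after a variable history a few misses in a row are business as usual.

```python
def longest_dry_spell(rewards):
    """Length of the longest run of unrewarded behaviors in a history."""
    longest = current = 0
    for rewarded in rewards:
        current = 0 if rewarded else current + 1
        longest = max(longest, current)
    return longest

# Hypothetical training histories (True = rewarded), made up for illustration.
continuous = [True] * 12
variable = [False, True, False, False, False, True,
            True, False, False, True, False, True]

if __name__ == "__main__":
    print(longest_dry_spell(continuous))  # 0
    print(longest_dry_spell(variable))    # 3
```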
When a behavior that has been strongly reinforced in the
past no longer gains a reinforcement, you might experience
what's called an extinction burst. This is when the animal
performs the behavior over and over again, in a burst of
activity. Extinction bursts are something for trainers to
watch out for!
Recently Bob Bailey has cautioned against needlessly using
variable schedules. Most useful behaviors, he points
out, will get some sort of reinforcement every time.
You might not always click and treat your
dog for sitting on cue, but you will always reward
it with some recognition and praise ("Good dog!").
If there are circumstances where you will be unable
to deliver any reinforcement (during a long
sequence of behaviors, or when the animal is out of
contact), then you will need to build a buffer against
extinction with a VSR. Otherwise, don't bother.
Cautions in using positive reinforcement
If the animal is acting out of fear, you may be rewarding
the fear response. This can happen when you coddle
a shy dog.
The timing must be good. If the animal did a great "stay"
and you reward after the release, you are rewarding
getting up.
The reward has to be sufficient to motivate a repetition.
Mild praise won't be enough for some animals. Others
require the richest of food rewards, etc.
Reinforcements can become associated with the person giving them.
If the animal realizes that he can't get any rewards
without you present, he will not be motivated to act
when you are away.
Animals can get sated with the reward you're offering when
they've had enough, and it will no longer be motivating.
Reinforcers increase behavior. If you don't want your animal actively
trying out new behaviors ("throwing behaviors
at the trainer"), don't use positive reinforcement.
Use positive reinforcement to train an animal to
do something.
Negative Punishment
Negative punishment is reducing behavior by taking away Something
Good. If the animal was enjoying or depending on Something
Good, she will work to avoid having it taken away.
She is less likely to repeat a behavior that results
in the loss of a Good Thing. This type of consequence
is a little harder to control.
Examples
The child has his crayons taken away for fighting with
his sister. The window looking into the other monkey's
enclosure is shut when the first monkey bites the
trainer. "This car isn't getting any closer to
Disneyland while you kids are fighting!" The
dog is put on leash and taken from the park for coming
to the owner when the owner called. The teenager is
grounded for misbehavior. The dolphin trainer walks
away with the fish bucket when the dolphin acts aggressively.
"I'm not talking to you after what you did!"
Xena cuts off the air of an opponent until he tells
her what she wants.
Secondary Negative Punishers
Trainers seldom go to the trouble of associating a particular
cue with negative punishment. Such a cue is sometimes called
a "delta", from SD or discriminative
stimulus. Some dog owners make the mistake of calling
their dogs in the park and then using the negative
punishment of taking the dog away from the fun. "Fido,
come!" then becomes a conditioned negative punisher.
My mom conditioned a similar CP- as "Time to
go!".
Positive Punishment
Positive punishment is something that is applied to reduce
a behavior. The term "positive" often confuses
people, because in common terms "positive"
means something good, upbeat, happy, pleasant, rewarding.
Remember, this is technical terminology we're using,
though, so here "positive" means "added"
or "started". Also keep in mind that in
these terms, it is not the animal that is
"punished" (treated badly to pay for some
moral wrong), but the behavior that is punished
(reduced). Positive punishment, when applied correctly,
is the most effective way to stop unwanted behaviors.
Its main flaw is that it does not teach specific alternative
behaviors.
Examples
Our society seems to have a great fondness for positive
punishment, in spite of all the problems associated
with it (see below). The peeing on the rug (by a puppy)
is punished with a swat of the newspaper. A dog's
barking is punished with a startling squirt of citronella.
The driver's speeding results in a ticket and a fine.
The baby's hand is burned when she touches the hot
stove. Walking straight through low doorways is punished
with a bonk on the head. In all of these cases, the
consequence (the positive punishment) reduces the
behavior's future occurrences.
Secondary Positive Punishers
Because a positive punisher, like other consequences, must
follow a behavior immediately or be clearly connected
to the behavior to be effective, a secondary positive
punisher is very important. (This is especially true
if the punisher is going to be something highly aversive
or painful). Many dog trainers actively condition
the word "No!" with some punisher, to form
an association between the word and the consequence.
The conditioned punisher (CP+) is an important part
of training with Operant Conditioning.
Cautions in using Positive Punishment
Behaviors are usually motivated by the expectation of some
reward, and even with a punishment, the motivation
of the reward is often still there. For example, a
predator must face some considerable risk and pain
in order to catch food. A wild dog must run over rough
ground and through bushes, and face the hooves, claws,
teeth, and/or horns of its prey animals. It might
be painfully injured in its pursuit. In spite of
this, it continues to pursue prey. In this case,
the motivation and the reward far outweigh the punishments,
even when they are dramatic.
The timing of a positive punishment must be exquisite.
It must correspond exactly with the behavior for it
to have an effect. (If a conditioned punisher is used,
the CP+ must occur precisely with the behavior). If
you catch your dog chewing on the furniture and you
hit him when he comes to you, you are suppressing
coming to you. The dog will not make
the connection between the punishment and the chewing
(no matter how much you point at the furniture).
The aversive must be sufficient to stop the behavior in
its tracks - and must be greater than the reward.
The more experience the animal has with a rewarding
consequence for the behavior, the greater the aversive
has to be to stop or decrease the behavior. If you
start with a small aversive (mild electric shock or
a stern talking-to) and build up to a greater one
(strong shock or full-on yelling), your trainee may
become adjusted to the aversive and it will not have
any greater effect.
Punishments may become associated with the person
supplying them. The dog who was hit after chewing
on the furniture may still chew on the furniture,
but he certainly won't do it when you're around!
Physical punishments can cause physical damage, and mental
punishments can cause mental damage. You should only
apply as much of an aversive as it takes to stop the
behavior. If you find you have to apply a punishment
more than three times for one behavior, without any
decrease in the behavior, you are not "reducing
the behavior", you are harassing (or abusing)
the trainee.
Punishers suppress behaviors. Use positive punishment
to train an animal not to do something.
Negative Reinforcement
Negative reinforcement increases a behavior by ending or taking
away Something Bad or aversive. By making the animal's
circumstances better, you are rewarding it and increasing
the likelihood that it will repeat the behavior that
was occurring when you ended the Bad Thing.
In order to use negative reinforcement, the trainer must
be able to control the Bad Thing that is being taken
away. This often means that the trainer must also
apply the Bad Thing. And applying a Bad Thing might
reduce whatever behavior was going on when the Bad
Thing was applied. And reducing a behavior by applying
a Bad Thing is positive punishment. So when
you start your Bad Thing that you're going to end
as a negative reinforcer, you run the risk of punishing
some other behavior.
One of the major results of taking away Something Bad
is often relief. So another way to think
of negative reinforcement is that you are providing
relief to the animal. But of course, this makes it
an example of positive reinforcement - you
are providing Something Good - relief. Confusing?
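If it helps to keep the four terms straight, the technical definitions boil down to two questions: was something added or removed, and does the behavior increase or decrease? Here is a small Python sketch of that lookup. The function name and the input words are invented for this example. It covers only the labels, not whether the animal experiences the thing as Good or Bad.

```python
def classify_consequence(stimulus_change, behavior_effect):
    """Name the consequence from two facts about it.

    stimulus_change: "added" (something starts) or "removed" (something ends)
    behavior_effect: "increases" or "decreases" (what happens to the behavior)
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behavior_effect]
    return f"{sign} {kind}"

if __name__ == "__main__":
    # Dog gets liver for returning when called, and returns more often.
    print(classify_consequence("added", "increases"))    # positive reinforcement
    # Crayons are taken away for fighting, and fighting drops.
    print(classify_consequence("removed", "decreases"))  # negative punishment
    # Swat with a newspaper, and peeing on the rug drops.
    print(classify_consequence("added", "decreases"))    # positive punishment
    # Ear pinch stops when the dog takes the dumbbell, and taking it increases.
    print(classify_consequence("removed", "increases"))  # negative reinforcement
```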
Examples
The choke collar is loosened when the dog moves closer
to the trainer. The ear pinch stops when the dog takes
the dumbbell. The reins are loosened when the horse
slows down. The car buzzer turns off when you put
on your seatbelt. Dad continues driving towards Disneyland
when the kids are quiet. "I'm not talking to
you until you apologize!" The hostage is released
when the ransom is paid. The torture is stopped when
the victim confesses. "Why do I keep hitting
my head against the wall? 'Cause it feels so good
when I stop!" The baby stops crying when his
mom feeds him.
Secondary Negative Reinforcers
Trainers seldom go to the trouble of associating a particular
cue with negative reinforcement. You can still go
ahead and do it.
Internal Reinforcers and Punishers
Trainers cannot control all reinforcers and punishers, unfortunately.
There are a number of environmental factors that are
going to affect the animal's behavior that you have
no control over, but which will still be a significant
consequence for your trainee.
Some of these come from the animal's internal environment
- its own reactions. Relief from stress, pain, or
boredom is a common reinforcer, and some "self-reinforcing"
behaviors are actually maintained because of this.
Examples are a dog barking because it relieves boredom,
or a person chewing on her fingers or smoking a cigarette
because it relieves stress. Drivers speed because
it is fun. Guilt is an internal punisher
that some people experience.
"No Reward Markers" and "Keep Going Signals"
There's actually a fifth possible consequence to any behavior:
nothing. You push the button and nothing happens.
You raise your hand and the teacher doesn't call on
you. You get no response to your e-mail, your proposal,
or your job application. The question you then have
is, did no one notice your behavior? Or was it just
not worthy of a reinforcement?
To differentiate between these two possibilities, a trainer
can use a no reward marker (NRM). The NRM tells the animal
that its behavior
will not gain it a reinforcer. A lot of dog trainers
use "Nope!" "Wrong!" "Uh-uh!"
or "Try again" as NRMs. For example, if
you're teaching your dog to sit in response to the
cue "sit" (it's not as obvious to the dog
as it is to you; after all, dogs don't have the experience
of verbal words being labels for actions), and the
dog lies down or barks, you can give an NRM. The purpose
of the NRM is to get the animal to try something different.
It is not a conditioned punisher and should not be
used when the dog does something you don't want it
to ever do. It's for when a behavior might
be correct in a different circumstance but not in
this one.
Some trainers also have developed a keep going signal
(KGS). This signal tells the animal that it's on the
right track, that its behavior is leading to something
that will gain it a reinforcer. For example, if you're
teaching a dog to roll over and it will lie on its
side, you can use a KGS to tell it that it's close
to a behavior that will get it a reward, but not there
yet.
Operant Conditioning works on all animals!
---------------------------------------------------------------------------------------------------
Copyright Stacy Braslau-Schneck, 1998. Please feel free to print
out and distribute for non-commercial use with my name
and this webpage address on your print-out: www.wagntrain.com.
Reproduced on www.Southwestk9services.com
with permission of author.
--------------------------------------------------------------------------------------------------