“The consequences of an act affect the probability of its occurring again.” – B. F. Skinner.
Continuous reinforcement is one of two broad categories of reinforcement schedules. A schedule of reinforcement that is not continuous is known as partial reinforcement, or intermittent reinforcement.
Reinforcement schedules are used to achieve the desired results of operant conditioning – typically getting a subject to engage in the desired target behavior. For example, you might use a reinforcement schedule to train a pet, teach a class, or parent a child.
You want your dog to sit when you ask it to, so when it obeys your command, you offer it a treat. If you were to offer the dog a treat every time it obeyed your command, you would be using continuous reinforcement. If you only offered it a treat sometimes but not always, that would be a partial reinforcement schedule.
This article will explore continuous reinforcement, how it works, and when it should be used. First, let’s develop a general understanding of operant conditioning and how it’s encouraged through reinforcement schedules.
What is operant conditioning?
Operant conditioning is a theory of learning popularized by B.F. Skinner. It is a type of learning that involves the subject learning and engaging in a specific desired behavior through the consequences of that behavior. Unlike classical conditioning, in which the subject learns to respond through association, operant conditioning involves a choice.
Operant conditioning strengthens or weakens a subject’s behavior, depending on the type of conditioning used (reinforcement or punishment). It is a type of associative learning that works through a causal relationship between behavior and consequence.
If a behavior leads to a reward, then it’s likely that the subject will continue to engage in that behavior moving forward. Alternatively, if a behavior is met with punishment, the subject is less likely to continue engaging in that behavior.
The Skinner Box
Skinner’s work on operant conditioning involved a device known as a Skinner Box, which was used to modify animal behavior through operant conditioning. In his experiment, Skinner placed a live rat in the box, which featured a food chamber, a bowl, and a lever. Pressing the lever opened the chamber and released a food pellet into the bowl.
When a rat first enters the box, it doesn’t know how the lever and chamber system works. As it moves around and explores the box, it will sooner or later hit the lever by accident. Over time, and after more accidental activations of the lever, the rat begins to notice the connection between the lever and the food in its bowl. It learns that its behavior (hitting the lever) has a consequence (food in the bowl).
The importance of operant conditioning
Before Skinner introduced the term ‘operant conditioning,’ a similar concept was already widely known and understood – instrumental learning, or habit, explain Staddon and Cerutti in Annual Review of Psychology.
So, the concept of learning behavior through repetition and reward was not new. What was important about Skinner’s work was his exploration of reinforcement schedules and his findings about them.
According to Staddon and Cerutti, Skinner and his colleagues’ work led to ‘a completely unsuspected range of powerful and orderly schedule effects that provided new tools for understanding learning processes and new phenomena to challenge theory.’
Categories of operant conditioning
There are two categories of operant conditioning – reinforcement and punishment. Skinner used reinforcement in the Skinner Box experiment – the more the rat hit the lever, the more food it was given.
Reinforcement is used to strengthen or encourage a behavior. It can be either positive or negative.
Positive reinforcement is the use of a positive, pleasant stimulus or reward to encourage the desired behavior (Skinner’s rat). A teacher in a classroom might use positive reinforcement to encourage students to study for a test. For example, after a spelling test at the end of the week, students who achieve 100 percent on the test are rewarded with a treat, such as a lollipop or a bar of chocolate.
Negative reinforcement involves the removal of a negative or unpleasant stimulus to achieve a desired behavior. For example, a parent who wants their child to clean their bedroom might continuously nag the child until they clean up. The child eventually cleans up, so the parent no longer nags them, thus removing the unpleasant stimulus.
Punishment is a type of operant conditioning that aims to reduce the frequency of a behavior or eliminate it. Like reinforcement, punishment can be either positive or negative.
In positive punishment, an unpleasant stimulus or consequence is used to reduce a behavior. For example, you want your cat to stop walking on the dining room table when you’re eating dinner. Each time it does, you spray it with some water.
Over time, the cat learns to avoid the table when you’re eating because it has learned that it will get sprayed otherwise. The spray is the addition of the unpleasant stimulus to reduce the behavior – walking on the dining room table.
Negative punishment is the removal of a positive or pleasant stimulus to reduce a behavior. For example, a student sits next to his best friend in class. He keeps chatting to his friend, despite the teacher’s requests for him to remain quiet during class.
To reduce his behavior, the teacher uses negative punishment. She removes him from his seat and from the company of his friend and relocates him to the front of the class in a chair by himself. She has removed a pleasant stimulus (the company and conversation of his friend) to get him to stop talking (reduction of behavior).
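The four categories above follow from two yes/no questions: is a stimulus added or removed, and is the goal to strengthen or weaken the behavior? The short Python sketch below illustrates that mapping. The function name and parameters are hypothetical and chosen only for this illustration.

```python
def classify_consequence(stimulus_added: bool, behavior_increases: bool) -> str:
    """Name the operant conditioning category for a consequence.

    stimulus_added: True if a stimulus is presented, False if one is removed.
    behavior_increases: True if the aim is to strengthen the behavior,
        False if the aim is to reduce it.
    """
    sign = "positive" if stimulus_added else "negative"
    kind = "reinforcement" if behavior_increases else "punishment"
    return f"{sign} {kind}"


# The four examples from the text, in order
print(classify_consequence(True, True))    # treat for a perfect spelling test
print(classify_consequence(False, True))   # nagging stops once the room is clean
print(classify_consequence(True, False))   # cat is sprayed with water
print(classify_consequence(False, False))  # student is moved away from his friend
```

Note that "positive" and "negative" here describe whether something is added or taken away, not whether the consequence is good or bad.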
Much of our understanding of reinforcement schedules today stems from the work of behavioral scientists C.B. Ferster and B.F. Skinner in their 1957 book Schedules of Reinforcement. In it, Ferster and Skinner explored the advantages and disadvantages of different reinforcement schedules for achieving a desired behavior in a subject.
Understanding reinforcement schedules
Below we’ve outlined a glossary of terms for reinforcement schedules. We’ll further explore each type of reinforcement schedule later, but first, let’s develop a basic understanding. The different types of reinforcement schedules are:
- Continuous schedule
- Fixed interval reinforcement
- Fixed ratio reinforcement
- Variable interval reinforcement
- Variable ratio reinforcement
Which reinforcement schedule is most appropriate?
The type of reinforcement schedule one should use to teach, encourage and maintain a desired behavior in the subject depends on the target behavior.
What Is Continuous Reinforcement?
Continuous reinforcement means reinforcing a behavior every time it occurs. It can be either positive or negative: positive reinforcement adds a stimulus, whereas negative reinforcement removes one. Continuous reinforcement aims to lead the subject to perform a particular act and eventually condition it to act the same way whenever the stimulus is presented.
Simply put, it offers a reward every single time the subject engages in the desired behavior.
Example: A puppy gets a treat every time it obeys its owner’s command to sit.
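As a minimal sketch of this rule in Python, assume each training trial is recorded as True (the puppy sat) or False (it did not). The function name and the data are hypothetical. Under continuous reinforcement, the reward decision simply mirrors the behavior.

```python
def continuous_reinforcement(obeyed):
    """Return one treat decision per trial.

    obeyed: list of booleans, True if the desired behavior occurred on that trial.
    Under a continuous schedule, every occurrence of the behavior is rewarded.
    """
    treats = []
    for did_sit in obeyed:
        # Reward if, and only if, the behavior occurred on this trial
        treats.append(did_sit)
    return treats


trials = [True, False, True, True]       # hypothetical training session
print(continuous_reinforcement(trials))  # → [True, False, True, True]
```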
The quickest and most effective way to teach a person or an animal to engage in a specific behavior is to use continuous reinforcement, either positive or negative, as outlined above.
For example, Skinner’s rat learned to push the lever through continuous positive reinforcement – it received a reinforcer (food pellet) every time it hit the lever.
Continuous reinforcement is most effective when teaching a new behavior. For example, you could use continuous reinforcement to train a puppy to sit, offering it a treat each time it obeys your command.
Timing is crucial in continuous reinforcement. The reinforcer should be offered immediately after the desired behavior has been carried out to encourage a strong association between behavior and reward.
As mentioned, continuous reinforcement is best applied when teaching a new behavior. It is not wise to keep using a continuous schedule in the long term, as doing so would require a significant amount of time and resources. Instead, once the behavior has been learned through a continuous schedule, the schedule can be changed to a partial one.
While a continuous reinforcement schedule reinforces a behavior every time the subject engages in it, partial reinforcement schedules reinforce the behavior only some of the time, through a range of ratio and interval schedules: after a certain amount of time has passed (fixed interval schedule), after an unpredictable amount of time (variable interval schedule), after a set number of responses (fixed ratio schedule), or after an unpredictable number of responses (variable ratio schedule).
Partial or intermittent reinforcement schedules help to maintain and strengthen the desired behavior after it has initially been learned.
Fixed interval schedule
A reward is offered when the subject engages in the desired behavior, but only once a set amount of time has passed.
Example: An employee receives a paycheck every Friday for the work they completed since the previous Friday.
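A fixed interval schedule can be sketched in a few lines of Python. This sketch assumes the common laboratory reading of the schedule: a response is rewarded only if at least the fixed interval has passed since the last reward. The function name and the response times are hypothetical.

```python
def fixed_interval_rewards(response_times, interval):
    """Return the response times that earn a reward.

    response_times: increasing times at which the behavior occurs.
    interval: the fixed time that must pass since the previous reward
        (or since time 0 at the start) before a response is rewarded.
    """
    rewarded = []
    last_reward = 0
    for t in response_times:
        if t - last_reward >= interval:
            rewarded.append(t)
            last_reward = t  # the interval restarts after each reward
    return rewarded


# Hypothetical response times in days, with a 7-day interval
print(fixed_interval_rewards([1, 3, 7, 8, 13, 14, 20, 22], 7))  # → [7, 14, 22]
```

Responses made before the interval has elapsed earn nothing, so only the timing of a response matters, not how many responses came before it.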
Fixed ratio schedule
A reward is offered after the desired behavior, but only after a specific number of responses.
Example: Bob writes articles for his local newspaper. He gets paid a set amount of money for every three articles he writes.
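Bob's arrangement can be sketched in Python as follows. The function name is hypothetical; the ratio of three comes from the example above.

```python
def fixed_ratio_rewards(num_responses, ratio):
    """Return the 1-based response numbers that earn a reward.

    Under a fixed ratio schedule, every `ratio`-th response is rewarded.
    """
    return [n for n in range(1, num_responses + 1) if n % ratio == 0]


# Bob writes 10 articles and is paid after every third one
print(fixed_ratio_rewards(10, 3))  # → [3, 6, 9]
```

Setting the ratio to 1 rewards every response, which is the same as continuous reinforcement.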
Variable interval schedule
The desired behavior is rewarded but after a varying and unpredictable amount of time.
Example: A builder gets paid at varying times for the same amount of work each week. Sometimes, he gets paid after a week’s work. Sometimes he gets paid after two weeks, and sometimes he gets paid twice in the same week. The work is always the same, but the reward is offered at unpredictable times.
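One way to sketch a variable interval schedule in Python is to draw a new, random waiting time after every reward. The uniform distribution, the seed, and the function name below are all assumptions made for this illustration; the schedule itself only requires that the waiting time be unpredictable.

```python
import random


def variable_interval_rewards(response_times, mean_interval, seed=0):
    """Return the response times that earn a reward.

    After each reward, a new required waiting time is drawn at random
    (here uniformly between 0 and 2 * mean_interval, so it averages
    mean_interval). The first response after that wait is rewarded.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    rewarded = []
    last_reward = 0.0
    wait = rng.uniform(0, 2 * mean_interval)
    for t in response_times:
        if t - last_reward >= wait:
            rewarded.append(t)
            last_reward = t
            wait = rng.uniform(0, 2 * mean_interval)  # new unpredictable wait
    return rewarded


# Hypothetical: one response per day for 30 days, rewards about a week apart
print(variable_interval_rewards(list(range(1, 31)), mean_interval=7))
```

Because the subject cannot tell how long the current wait is, the only way to collect every reward promptly is to keep responding steadily.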
Variable ratio schedule
It offers a reward when the subject engages in the desired behavior but after a varying and unpredictable number of responses.
Example: A slot machine in a casino uses a variable ratio reinforcement schedule to encourage users to keep gambling, making the behavior addictive and the schedule more resistant to extinction than others.
Extinction refers to the fading of a reinforced behavior once the reinforcement stops. Some schedules of reinforcement lead to faster extinction than others. Arguably the most effective schedule, the variable ratio schedule, is the most resistant to extinction.
In the context of gambling, the gambler does not know when the reward will come and does not know how much they will win. The anticipation and excitement of not knowing combined with the possibility that the reward could be significant keeps the gambler coming back for more.
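The unpredictability of a variable ratio schedule can be sketched in Python by drawing a new required number of responses after every reward. The uniform draw between 1 and 2 * mean_ratio - 1, the seed, and the function name are assumptions for illustration only. They are not a model of how any real slot machine works.

```python
import random


def variable_ratio_rewards(num_responses, mean_ratio, seed=0):
    """Return the 1-based response numbers that earn a reward.

    After each reward, the number of responses required for the next one
    is drawn at random between 1 and 2 * mean_ratio - 1, so it averages
    mean_ratio but cannot be predicted by the subject.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    rewarded = []
    needed = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for n in range(1, num_responses + 1):
        count += 1
        if count >= needed:
            rewarded.append(n)
            count = 0
            needed = rng.randint(1, 2 * mean_ratio - 1)  # new hidden target
    return rewarded


# Hypothetical: 40 plays, with a reward after every 5 plays on average
print(variable_ratio_rewards(40, mean_ratio=5))
```

Unlike the fixed ratio case, the gaps between rewards differ from one reward to the next, so a run of unrewarded responses never signals that the rewards have stopped.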
Maintaining behavior after continuous reinforcement
A behavior is usually continuously reinforced because it serves a greater function. Rewarding a student for passing an exam encourages that student to keep studying, which creates exciting opportunities for them in the future.
Offering a dog a treat when it uses the litter box instead of the floor helps to keep your house clean. Rewarding a child with extra time on their PlayStation if they keep their room clean teaches them to manage and be responsible for their personal space.
As mentioned, continuous reinforcement is best followed up with partial reinforcement. For example, a variable interval schedule would be best applied in a work context, in which a supervisor checks up on workers at unpredictable times.
Variable interval reinforcement encourages a steady response rate because the workers do not know when the supervisor will appear and must continue to engage in the desired behavior.
Variable ratio schedules also encourage steady response rates. There is no certainty about when the reward will be offered, so the subject will continue to engage in the behavior until they receive it.
Fixed interval schedules keep a subject engaging in the desired behavior, such as when an employee receives a weekly paycheck. However, fixed interval schedules are prone to faster extinction than variable schedules. If the employee no longer receives a paycheck, he or she will stop engaging in the behavior (work).
Fixed ratio schedules are also prone to extinction. A salesman who receives a commission for every ten sales will work harder, and with more purpose and urgency, the closer he gets to ten sales. However, he will be less likely to engage in the desired behavior (making sales) if the commission is no longer offered.