LEARNING - OPERANT / INSTRUMENTAL CONDITIONING

Operant / Instrumental Conditioning:

This type of conditioning was most extensively investigated by B.F. Skinner. Skinner studied the occurrence of voluntary responses when an organism operates on the environment, and called these responses operants. Operants are behaviours or responses that animals and human beings emit voluntarily and that are under their control.

The term operant is used because the organism operates on the environment. Conditioning of operant behaviour is called operant conditioning. Skinner conducted his studies on rats and pigeons in specially built chambers called Skinner boxes. A typical experiment runs as follows:

A hungry rat is placed in the chamber (one rat at a time). The chamber is built so that the rat can move about inside but cannot get out. It contains a lever connected to a food container mounted on top of the chamber.
When the lever is pressed, a food pellet drops onto a plate placed close to the lever. While moving around and pawing the walls (exploratory behaviour), the hungry rat accidentally presses the lever, and a food pellet drops onto the plate.
The hungry rat eats it. On the next trial, after a while, the exploratory behaviour starts again. As the number of trials increases, the rat takes less and less time to press the lever for food.
Conditioning is complete when the rat presses the lever immediately after being placed in the chamber. Here, lever pressing is the operant response and getting food is its consequence.

In the above situation the response is instrumental in getting the food. That is why this type of learning is also called instrumental conditioning. Examples of instrumental conditioning abound in everyday life. Children who want sweets while their mother is away learn to locate the jar in which she hides the sweets for safekeeping, and eat them. Children learn to be polite and say ‘please’ to get favours from their parents and others.

One learns to operate mechanical gadgets such as a radio, camera or TV on the principle of instrumental conditioning. In fact, human beings learn short cuts to attain desired goals or ends through instrumental conditioning.

Determinants of Operant Conditioning

Operant or instrumental conditioning is a form of learning in which behaviour is learned, maintained or changed through its consequences. Such consequences are called reinforcers.

A reinforcer is defined as any stimulus or event that increases the probability of the occurrence of a (desired) response. A reinforcer has several features that affect the course and strength of a response: its type (positive or negative), its number or frequency, its quality (superior or inferior), and its schedule (continuous or intermittent, i.e. partial).

All these features influence the course of operant conditioning. Another factor that influences this type of learning is the nature of the response or behaviour to be conditioned. The interval of time that elapses between the occurrence of a response and its reinforcement also influences operant learning.

Types of Reinforcement

Reinforcement may be positive or negative.

Positive reinforcement involves stimuli that have pleasant consequences. They strengthen and maintain the responses that caused them to occur. Positive reinforcers satisfy needs; examples include food, water, medals, praise, money, status and information.

Negative reinforcers involve unpleasant and painful stimuli. Responses that enable an organism to get rid of, avoid or escape from painful stimuli are negatively reinforced. Thus, negative reinforcement leads to the learning of avoidance and escape responses.

For instance, one learns to put on woollen clothes, burn firewood or use electric heaters to avoid the unpleasant cold weather.

It may be noted that negative reinforcement is not punishment. Punishment reduces or suppresses a response, while a negative reinforcer increases the probability of an avoidance or escape response.

It should be understood that no punishment suppresses a response permanently. Mild and delayed punishment has little or no effect. The stronger the punishment, the more lasting the suppression, but even then it is not permanent.

Sometimes punishment has no effect, whatever its intensity. Worse, the punished person may develop dislike and hatred for the punishing agent, that is, the person who administers the punishment.
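The distinctions drawn above between positive reinforcement, negative reinforcement and punishment can be summarised as a small lookup. The sketch below assumes each consequence can be described by two yes/no facts: whether the stimulus is pleasant, and whether the response presents it or removes it. The function name `classify_consequence` is hypothetical. The fourth combination, removal of a pleasant stimulus, is not discussed in the text and is labelled here, as is common, as another form of punishment.

```python
def classify_consequence(pleasant: bool, presented: bool) -> tuple[str, str]:
    """Return (label, effect on the preceding response).

    pleasant:  True for stimuli such as food or praise, False for cold or pain.
    presented: True if the response brings the stimulus on,
               False if the response removes or avoids it.
    """
    if pleasant and presented:
        return ("positive reinforcement", "increases")
    if not pleasant and not presented:
        # Escape/avoidance: the response gets rid of an unpleasant stimulus.
        return ("negative reinforcement", "increases")
    if not pleasant and presented:
        return ("punishment", "decreases")
    # Not covered in the text: losing a pleasant stimulus is usually
    # treated as another form of punishment (an assumption here).
    return ("punishment (withdrawal of a pleasant stimulus)", "decreases")


# Lever press produces food; switching on a heater removes the cold.
print(classify_consequence(pleasant=True, presented=True))    # → ('positive reinforcement', 'increases')
print(classify_consequence(pleasant=False, presented=False))  # → ('negative reinforcement', 'increases')
```

Both reinforcement cases increase the response; only the punishment cases decrease it, which is exactly why negative reinforcement is not punishment.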

Number of Reinforcements and Other Features

Number of reinforcements refers to the number of trials on which an organism has been reinforced or rewarded. Amount of reinforcement means how much of the reinforcing stimulus (food, water, or the intensity of a pain-causing agent) the organism receives on each trial.

Quality of reinforcement refers to the kind of reinforcer. As reinforcers, chickpeas or pieces of bread are of inferior quality compared with raisins or pieces of cake. Operant conditioning usually proceeds faster, up to a point, as the number, amount and quality of reinforcement increase.

Schedules of Reinforcement

A reinforcement schedule is the arrangement by which reinforcement is delivered during conditioning trials. Each schedule influences the course of conditioning in its own way, so conditioned responses acquire different characteristics depending on the schedule used.

An organism undergoing operant conditioning may be reinforced on every acquisition trial, or reinforcement may be given on some trials and omitted on others. Thus, reinforcement may be continuous or intermittent. When a desired response is reinforced every time it occurs, we call it continuous reinforcement.

In contrast, on intermittent schedules responses are sometimes reinforced and sometimes not. This is known as partial reinforcement, and it has been found to produce greater resistance to extinction than continuous reinforcement does.
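The difference between continuous and intermittent reinforcement comes down to a rule deciding which responses are reinforced. The sketch below assumes one particular intermittent rule, reinforcing every n-th response (often called a fixed-ratio schedule); the text does not specify which intermittent arrangement it has in mind, and the function names are illustrative only.

```python
def continuous(response_number: int) -> bool:
    """Continuous reinforcement: every response is reinforced."""
    return True


def every_nth(n: int):
    """One intermittent (partial) rule, assumed for illustration:
    reinforce only every n-th response (a fixed-ratio schedule)."""
    def schedule(response_number: int) -> bool:
        return response_number % n == 0
    return schedule


def reinforced_responses(schedule, total: int) -> list[int]:
    """Response numbers (counted from 1) that receive reinforcement."""
    return [i for i in range(1, total + 1) if schedule(i)]


print(reinforced_responses(continuous, 6))    # → [1, 2, 3, 4, 5, 6]
print(reinforced_responses(every_nth(3), 6))  # → [3, 6]
```

Under the partial rule the organism meets runs of unreinforced responses during acquisition, which is one way to picture why such learning holds up longer when reinforcement stops altogether.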

Delayed Reinforcement

The effectiveness of reinforcement is markedly reduced by delay in its delivery: delayed reinforcement leads to a poorer level of performance. This is easily seen by asking children which reward they would prefer for doing a chore. They will prefer a smaller reward immediately after the chore to a bigger one after a long gap.
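The text gives no formula for how delay weakens a reward. One common way to model the children's preference, used here purely as an assumption, is hyperbolic discounting, in which a reward of size A delivered after delay D is valued at A / (1 + kD). The constant k, the reward sizes and the delays below are invented for illustration.

```python
def discounted_value(amount: float, delay: float, k: float = 1.0) -> float:
    """Assumed hyperbolic model: value of a reward received after `delay` time units."""
    return amount / (1 + k * delay)


def preferred_reward(small_now: float, large_later: float, delay: float, k: float = 1.0) -> str:
    """Compare an immediate small reward with a larger delayed one."""
    now_value = discounted_value(small_now, 0, k)
    later_value = discounted_value(large_later, delay, k)
    return "small, immediate" if now_value > later_value else "large, delayed"


print(preferred_reward(small_now=1, large_later=5, delay=10))  # → small, immediate
print(preferred_reward(small_now=1, large_later=5, delay=2))   # → large, delayed
```

With a long enough delay, even a reward five times larger loses out, matching the children's choice; with a short delay the larger reward wins.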

Key Learning Processes

When learning takes place, whether through classical or operant conditioning, certain processes are involved. These include reinforcement; extinction, or the non-occurrence of a learned response; generalisation of learning to other stimuli under specifiable conditions; discrimination between reinforcing and non-reinforcing stimuli; and spontaneous recovery.

1. Reinforcement

Reinforcement is the operation of administering a reinforcer by the experimenter. Reinforcers are stimuli that increase the rate or probability of the responses that precede them.

Reinforced responses increase in rate, while non-reinforced responses decrease in rate. A positive reinforcer increases the rate of response that precedes its presentation. Negative reinforcers increase the rate of the response that precedes their removal or termination.

The reinforcers may be primary or secondary.

A primary reinforcer is biologically important because it determines the organism’s survival (e.g., food for a hungry organism).

A secondary reinforcer is one that has acquired reinforcing properties through the organism’s experience with the environment. We frequently use money, praise and grades as reinforcers; these are secondary reinforcers. Systematic use of reinforcers can produce a desired response through shaping, that is, by reinforcing successive approximations to the desired response.
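Shaping by successive approximations can be read as a simple procedure: reinforce any response that meets the current criterion, then tighten the criterion so that only closer approximations are reinforced next. The sketch assumes each response can be scored from 0 to 100 by how closely it resembles the target behaviour; the scoring, the starting criterion and the step size are all hypothetical.

```python
def shape(responses, start=20, step=20, target=100):
    """Reinforce successive approximations to a target behaviour.

    Each response is a hypothetical 0-100 score of how closely it
    resembles the target. A response is reinforced only if it meets the
    current criterion; each reinforcement raises the criterion (capped at
    the target), so later responses must be closer to earn reinforcement.
    Returns (score, criterion in force, reinforced?) for every response.
    """
    criterion = start
    log = []
    for score in responses:
        reinforced = score >= criterion
        log.append((score, criterion, reinforced))
        if reinforced:
            criterion = min(target, criterion + step)
    return log


for score, criterion, reinforced in shape([10, 25, 30, 45, 50, 70, 90, 100]):
    print(f"score={score:3d} criterion={criterion:3d} reinforced={reinforced}")
```

Note how a response scored 30, which would have been reinforced at the start, goes unreinforced once the criterion has risen to 40.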

2. Extinction

Extinction means the disappearance of a learned response when reinforcement is removed from the situation in which the response used to occur. If the CS is no longer followed by the US in classical conditioning, or lever pressing is no longer followed by food pellets in the Skinner box, the learned behaviour gradually weakens and ultimately disappears.

Learning shows resistance to extinction. This means that even when the learned response is no longer reinforced, it continues to occur for some time. However, as the number of non-reinforced trials increases, the response strength gradually diminishes and the response ultimately stops occurring.

How long a learned response resists extinction depends on a number of factors. It has been found that as the number of reinforced trials increases, resistance to extinction increases and the learned response reaches its highest level. At this level performance stabilises, and further trials make no difference to response strength.

Resistance to extinction increases with the number of reinforcements during acquisition trials only up to a point; beyond that, any further increase in the number of reinforcements reduces resistance to extinction. Studies have also indicated that as the amount of reinforcement (number of food pellets) during acquisition trials increases, resistance to extinction decreases.

If reinforcement is delayed during acquisition trials, resistance to extinction increases. Reinforcement on every acquisition trial makes the learned response less resistant to extinction. In contrast, intermittent or partial reinforcement during acquisition trials makes a learned response more resistant to extinction.

3. Generalisation and Discrimination

The processes of generalisation and discrimination occur in all kinds of learning. However, they have been extensively investigated in the context of conditioning.

Suppose an organism has been conditioned to make a CR (salivation or any other reflexive response) when a CS (a light or the sound of a bell) is presented. Once conditioning is established, if another stimulus similar to the CS (e.g., a ringing telephone) is presented, the organism makes the conditioned response to it. This tendency to respond in the same way to similar stimuli is known as generalisation.

When a learned response is elicited by a new stimulus, it is called generalisation. A complementary process is called discrimination. Generalisation is a response based on similarity, while discrimination is a response based on difference.

For example, suppose a child is conditioned to be afraid of a person with a long moustache who wears black clothes. Later, when the child meets another person dressed in black clothes and with a beard, the child shows signs of fear. The child’s fear has generalised. When the child meets a stranger who wears grey clothes and is clean-shaven, the child shows no fear. This is an example of discrimination. The occurrence of generalisation therefore means a failure of discrimination. Discriminative responding depends on the organism’s discrimination capacity or discrimination learning.

4. Spontaneous Recovery

Spontaneous recovery occurs after a learned response has been extinguished. Suppose an organism has learned to make a response to obtain reinforcement; the response is then extinguished, and some time passes.

One may then ask whether the response has been completely extinguished and will no longer occur if the CS is presented. It has been demonstrated that after a considerable lapse of time, the learned response (CR) recovers and occurs to the CS.

The amount of spontaneous recovery depends on how much time has elapsed since the extinction session: the longer the interval, the greater the recovery of the learned response. Such recovery occurs spontaneously. The figure below shows the phenomenon of spontaneous recovery.
