If you faithfully obey the commands I am giving you today . . . then I will send rain on your land in its season, both autumn and spring rains, so that you may gather in your grain, new wine and olive oil. I will provide grass in the fields for your cattle, and you will eat and be satisfied . . . [But if not, I] will shut up the heavens so that it will not rain and the ground will yield no produce, and you will soon perish from the good land.
(Deuteronomy 11:13–17, New International Version, 1984)
Reward and punishment are the cornerstones of human and divine justice. In psychology, one of the best established principles is that an organism’s behavior is controlled by external contingencies. Deliberately changing behavior by managing rewarding and punitive consequences is one of the great technical success stories of psychology. Thorndike’s Law of Effect essentially stated that if an action is rewarded it will tend to be repeated; it will become more probable in the future. Reward “strengthens” a stimulus–response association, Thorndike (1932) argued, which is where the term “reinforcement” comes from. Skinner’s version of what is basically the same law was that behavior is a function of its consequences. This allowed him to avoid the tricky circular problem of knowing what is rewarding without first claiming it is anything that strengthens behavior. In fact, Skinner was not fond of the term reward for the same reasons, and many operant conditioning researchers today continue to prefer the seemingly more neutral term reinforcement, although I will use them interchangeably. Thorndike’s rewards were pieces of salmon; Skinner originally used cut up pieces of dry spaghetti, a far cry from “grain, new wine and olive oil,” but seemingly serving the same function.
Note that I am already using a variety of terms for the same phenomenon (changed behavioral outcomes): “strengthening,” “more probable,” and “repeated.” So let me emphasize that these changes in probability are not absolute; they are not context free. The increase in probability is true only for certain conditions, usually the stimulus conditions present before the behavior was rewarded, or those that signaled the likely availability of the reward. Thus, the strengthening idea does not really mean that the behavior literally becomes stronger, but that the connection or association between the contextual cue and the behavior is strengthened: in the presence of that cue a given behavior becomes more likely. In Thorndike’s words, “the animal had formed a perfect association between the sense-impression of the interior of that box and the impulse leading to the successful movement” (1898, p. 10).
The alternative words and the concepts they signify are not just instances of pedantry. These words carry considerable meaning, sometimes precise and sometimes surplus, especially for clinical contexts. If we say a school child’s studying behavior has increased as a result of a reward contingency being imposed, what exactly do we mean? Does it mean the child will study harder or longer, or in the presence of homework cues (textbook in his backpack) will be more likely to engage in study than in an alternative activity? If these are optional alternative characteristics of behavior, which is the one we want to see more of? That is a question taking us back to an earlier discussion of “be still, be quiet, be docile”: what behavior change is actually desired in clinical practice? We can also see it is not really likely that the behavior of studying is itself an entirely new skill. (In some circumstances a child does not study because he or she does not know how to, a skill deficit we will come to presently.) What we are actually trying to do is to make studying behavior fill a greater percentage of the child’s time when the homework cues (situational demands) are present. We do not want the child to be doing homework during family times when it is not desirable to be studying. Modifying the duration or intensity of study behavior could be thought of as a motivational influence rather than a learning one.
The difference between learning a new skill and performing a previously acquired one is a distinction that has been confusing the learning field for a long time. Thorndike’s cats did not literally learn how to get out of the puzzle box he constructed, nor did they work out the solution to the puzzle; they learned to step on a paddle that opened the door of the cage, which then let them out to get food. Although the response was not exactly new, its probability changed from low to high. Probability was measured by response latency: the time between being put in the puzzle box and depressing the little paddle. This time parameter steadily decreased over trials (“trial” always means “learning opportunity”) until eventually the response occurred almost immediately after the animal was placed back in the box and the latency reached asymptotic levels; the response could not be performed much more quickly. This steady decrease in latency could be plotted against trials, and the resultant function was an operational definition of learning: the learning curve. It is common in everyday vernacular to hear people talking about being exposed to some new experience and saying that they were on a “steep learning curve,” meaning, of course, that they had to learn new habits very fast.
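The shape of such a learning curve can be sketched numerically. What follows is a minimal illustration, not Thorndike's data: the latencies are hypothetical, and the exponential-decay form, along with the parameter names `initial`, `asymptote`, and `rate` and their values, is simply one common idealization assumed here for the sake of the example.

```python
import math


def latency(trial, initial=160.0, asymptote=6.0, rate=0.35):
    """Hypothetical escape latency (seconds) on a given trial (0-based).

    Assumes latency decays exponentially from `initial` toward `asymptote`.
    All three parameter values are invented for illustration.
    """
    return asymptote + (initial - asymptote) * math.exp(-rate * trial)


def learning_curve(n_trials, **params):
    """Return the latencies for trials 0 .. n_trials - 1."""
    return [latency(t, **params) for t in range(n_trials)]


if __name__ == "__main__":
    # Print latency against trial number: the learning curve.
    for trial, seconds in enumerate(learning_curve(20)):
        print(f"trial {trial:2d}: {seconds:6.1f} s")
```

Under these assumptions latency falls quickly over the first few trials and then flattens as it approaches the asymptote, which is the pattern described above: the response is already about as fast as it can be, so further trials change little.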
These niceties aside, it is very clear that many properties of behavior are shaped by their consequences. It is true that when a rat (the usual reluctant participant in instrumental conditioning studies, since cats rarely adhere to any behavioral principles) is taught to press a lever in the Skinner chamber, the original act of depressing the lever is largely accidental, often caused by the rat leaning on the lever with its forepaws while stretching up and exploring its new environment. The lever makes a distinctive click (later this will serve as a secondary reinforcer) and a pellet of food drops into the feeding trough where it might or might not be found by the animal. If it is not found and consumed right away, the lever press response is not strengthened: consequences, or their symbolic representations, need to be immediate in order to provide feedback as to the specific response that is earning the reward.
To encourage lever pressing a researcher is likely to “bait” the lever by smearing a small amount of food over it, thus causing the rat to spend more time nosing around it. Key pecking by pigeons (the experimental subjects Skinner preferred; the “key” here is a little round plastic disc set in the wall of the chamber) is slightly less iffy because pigeons tend to peck at things, anything, in the presence of food. Pecking around is part of the pigeon’s natural foraging behavior, unlike lever pressing by the rat. Watch a pigeon walking about on the ground next time you are at a park and you will see what I mean. As a result of these differences it takes a while for the rat’s depression of the lever to become a simple, minimal-effort, one paw action, but once it does the subsequent changes produced by the reward contingency are largely with respect to the pattern of repetition of the response and not the shape or form of the behavior. Rats end up depressing the lever in their own idiosyncratic ways; the researcher does not care, because the action is defined mechanically as enough force on the lever to activate the electronic mechanism that records a press. The most common dependent variable in operant conditioning research is the frequency of the response plotted against time, that is, its rate.
However, many other properties of the response can be developed by differential contingencies; for example, it is possible to make only harder, more effortful depressions of the lever earn the reward, and in the same way industriousness in humans is easily shaped (Eisenberger, 1992). Generating infrequent responding by ensuring only low rates get rewarded is often useful clinically when a child’s targeted problem is that he or she does something too often. It is also possible to make the animal depress the lever in different ways, such as with both paws rather than one, or with its nose rather than its paw. If the animal is a dolphin being trained for a marine mammal entertainment show, then leaping out of the water higher and higher can be consequated. Pretty much any behavior the animal is physically capable of performing can be differentially rewarded and thus shaped into a distinctive pattern by rewarding successive approximations of the desired end behavior.
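The logic of a differential contingency with a rising criterion can be made concrete in a few lines. This is only a sketch of the reward rule itself, under assumptions of my own: the function name, the parameters `criterion`, `step`, and `target`, and the press forces are all hypothetical, and the code does not model how the animal's behavior changes in response to the reward.

```python
def shape_by_successive_approximation(forces, criterion, step, target):
    """Reward only presses whose force meets the current criterion.

    After each rewarded press the criterion is raised by `step`, capped at
    `target`, so that closer approximations to the end behavior are required.
    Returns (rewarded_flags, final_criterion).
    """
    rewarded = []
    for force in forces:
        earned = force >= criterion
        rewarded.append(earned)
        if earned:
            # Tighten the requirement, but never beyond the end goal.
            criterion = min(criterion + step, target)
    return rewarded, criterion


if __name__ == "__main__":
    # Hypothetical press forces in arbitrary units.
    presses = [10, 12, 11, 15, 14, 18, 20, 19, 22]
    flags, final = shape_by_successive_approximation(
        presses, criterion=10, step=3, target=20
    )
    print(flags)
    print(final)
```

A press that would have earned the reward early in training (force 12, say) no longer does once the criterion has moved past it; that is the sense in which only successively closer approximations are rewarded.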
Species-specific behaviors that occur naturally in the wild or as a result of selective breeding (if your dog is a Golden Retriever it can easily be taught to fetch your morning paper) are more readily shaped than totally novel behaviors, such as getting a circus dog to ride a trick bicycle. With humans, depending on the contingency operating, we can produce effortful, graceful, even flawless responding. Often, however, it is necessary to get only the basic task done. If the contingencies require task completion, a student will whip off a B– assignment the night before it is due; if the contingencies demand excellence, the student will put much more time and effort into writing a good essay. Because in education the contingencies are often ambiguous, individual differences in motivation, such as competitiveness, fear of failure, desire to please the teacher, and perfectionism, may be stronger determinants of quality of performance than are unspecified reward contingencies.