Teaching and the New Behaviorism

Is learning really just reinforcement?

Key points

  • Teaching at an advanced level is more about finding the right response than reinforcing it.
  • The repertoire precedes the consequence.
  • Real learning goes beyond pigeons in Skinner boxes or sophomores remembering word lists.

Operant conditioning is a procedure for controlling behavior, explored and exploited most notably by B. F. Skinner and then by many subsequent researchers. It involves shaping the behavior of an animal or person by presenting a reinforcer (an A+ on a test or a dog treat, for instance) immediately after a desired response. Usually, the reinforced response will recur more frequently. If the response doesn’t occur spontaneously, it can be coaxed out of the organism by reinforcing successive approximations to the desired behavior.

Operant conditioning is still a model used by many teachers, but it is a poor guide to higher education. Teaching at that level is not about selection—reinforcement—so much as it is about variation: getting the right response, the right idea, to occur for the first time. Shaping works for some learning tasks, but it simply doesn’t work for producing entirely new, unpredictable behaviors. How to do that is a problem that has been little addressed.

Here is a story to illustrate a successful result, if not the process used to achieve it. The anecdote comes from British evolutionist Richard Dawkins. It is a moving account[1] of "Sanderson of Oundle." Oundle is a British boarding school[2] famous for its output of talent, and Sanderson was its headmaster early in the 20th century.

Sanderson’s hatred of any locked door which might stand between a boy and some worthwhile enthusiasm symbolised his whole attitude to education. A certain boy was so keen on a project he was working on that he used to steal out of the dormitory at 2 a.m. to read in the (unlocked, of course) library. The Headmaster caught him there, and roared his terrible wrath for this breach of discipline (he had a famous temper and one of his maxims was, “Never punish except in anger”)... [The] boy himself tells the story.

The thunderstorm passed. "And what are you reading, my boy, at this hour?" I told him of the work that had taken possession of me, work for which the daytime was all too full. Yes, yes, he understood that. He looked over the notes I had been taking and they set his mind going. He sat down beside me to read them. They dealt with the development of metallurgical processes, and he began to talk to me of discovery and the values of discovery, the incessant reaching out of men towards knowledge and power, the significance of this desire to know and make, and what we in the school were doing in that process. We talked, he talked for nearly an hour in that still, nocturnal room. It was one of the greatest, most formative hours in my life... "Go back to bed, my boy. We must find some time for you in the day for this."

Dawkins writes that the story brings him close to tears. It also shows a kind of creativity in teaching and a kind of spontaneous flowering in learning that seem to lie quite outside the rhetoric of “successive approximations” and the teaching of well-defined and pre-digested material. Sanderson’s pupil was not “shaped” to show an interest in metallurgy. Undoubtedly, he had felt Sanderson’s ire for past errors, as he felt it now for breaking the school rules. And yet, under Sanderson’s tutelage, in the environment Sanderson had created, he developed a passionate interest in learning of the kind we should love to see in any student.

But are these examples fair? Some radical behaviorists will object that I am merely countering science with anecdote. I don’t think so. To explain why, we need to go back to what the science really is.

Skinner made at least two great discoveries in his analysis of operant behavior. One was hardly original at all; yet it is the one for which he has gotten the greatest credit—and which he himself thought the most important, namely the principle of reinforcement. But humanity knew about carrots and sticks for countless generations before Skinner came along[3].

I think that Skinner’s second contribution is more important than the reinforcement principle but, because it is still not fully understood, it has received much less attention. It is the idea that operant behavior is emitted; that it is essentially spontaneous, at least on the first occurrence. Many have compared operant learning to the Darwinian metaphor, the idea of selection and variation. Variation was Darwin’s term for the then-unknown processes that produced variants (variant phenotypes as we would now call them) from which natural selection would pick the winners. In a similar fashion, the processes that govern the emission of operant behavior produce an initial repertoire from which reinforcement can then select[4]. Reinforcement affects variation also, of course, not just as a selector, but also as a contributor to the labeling or framing of the situation. It puts the organism in an appropriate state. Food reinforcement will induce a food-related repertoire, social reinforcement, socially-related behavior, and so on.

The very best teaching is not really about selection at all, but about variation. Sanderson dispensed rather few obvious reinforcements and not a few punishments. He sought not to eliminate errors but to foster the tenacity and persistence, the enthusiasm, that allow kids to learn from them. He created at Oundle an environment that got his boys thinking about things like metallurgical techniques, and, I daresay for Dawkins, the wonders of biological evolution. No treats were dispensed to achieve these ends. Instead, a culture was somehow created in which the boys, in their private thoughts and in their discussions and debates with others, were passionately attracted to topics with some intellectual weight. Sanderson manipulated not through the dispensing of rewards, but through labeling or framing. He thus set up a culture that favored what Skinner might have called the emission of creative operants.

How this works is still a matter of art rather than science. It is, in fact, the reverse of the standard operant approach, which applies a reinforcement procedure with the expectation that all subjects will react in basically the same way. Sanderson’s approach was the opposite, in that he tried to create an environment that allowed for individual differences. He wanted to tease out the particular abilities and enthusiasms of different pupils and then encourage them. He was interested in repertoires of emitted behavior, rather than achieving a fixed goal.

What is needed at the higher reaches of education is an understanding of how the environment created by a school interacts with the patterns of behavior pupils bring with them, from nature and their personal histories, to produce good or bad performance in learning tasks. Attitudes and expectations are just the names we give to the repertoires generated by that environment. Understanding how this works is a task that takes us well beyond pigeons in Skinner boxes or sophomores remembering word lists.

All we can be sure of is that the causes of effective behavior in challenging situations are complex, involving both nature and nurture in an uncertain mix. But three things seem clear: that there are processes in creative teaching that are understood in an intuitive way by great teachers like Sanderson; that the Darwinian framework for behaviorism shows that processes of variation exist, even though they have been sorely neglected in favor of an almost exclusive focus on reinforcement as selection; and that behaviorists need to take time out from pressing the “reinforcement” lever and look beyond the selectionist simplicities of radical behaviorism to those engines of variation that motivate pupils and yield truly creative learning.

(More on this in The New Behaviorism (Psychology Press, 2021) and Science in an Age of Unreason (Regnery, 2022).)


[1] The Guardian, Saturday, July 6, 2002.

[2] British public schools are now private, of course. But they were all founded before the existence of state-supported public education as ways to educate poor but talented boys (yes, mostly boys!). As the state took over their function, they had to find another one, which turned out, usually, to be as elite private boarding schools. Now many are coed and take day scholars, and are usually called ‘independent’ schools.

[3] Although they did not know the details: the concept of reinforcement contingencies and the mass of data on reinforcement schedules discovered by operant conditioners.

[4] See Catania, A. C., & Harnad, S. (Eds.). (1988). The selection of behavior: The operant behaviorism of B. F. Skinner. New York: Cambridge University Press, for a set of papers on Skinner and the Darwinian metaphor.

