Chapter 7: The Potemkin Spring

The 1960s seemed to be delivering on the Dartmouth promise.

At MIT, programs were emerging that appeared to understand language. At Carnegie Mellon, systems were solving problems and proving theorems. ARPA money flowed into laboratories where researchers spoke confidently of machines that would soon rival human intelligence. Herbert Simon had predicted in 1957 that within a decade, a computer would be world chess champion, would discover an important new mathematical theorem, and would provide the basis for most theories in psychology.

The decade was not over. The predictions had not yet failed. And in the demonstrations that circulated through academic conferences and funding reviews, intelligence seemed genuinely to be emerging from code.

It was a Potemkin spring. The facades were impressive. What lay behind them was something else.


ELIZA was born between 1964 and 1967 in Joseph Weizenbaum's laboratory at MIT.

The program was designed to explore natural language communication between humans and machines. Its most famous incarnation, DOCTOR, simulated a Rogerian psychotherapist, the kind of therapist who reflects a patient's statements back as questions. Type "I feel sad," and ELIZA might respond "Why do you feel sad?" Type "My mother doesn't understand me," and ELIZA might reply "Tell me more about your family."

The mechanism was simple. ELIZA used pattern matching and substitution. It identified keywords, applied transformation rules, and generated responses that gave the illusion of understanding. There was no comprehension inside the machine. There was no model of the user's mental state. There was only a bag of tricks for producing plausible-sounding output.
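The whole technique fits in a few lines. The sketch below is a hedged illustration in Python, not Weizenbaum's original code; the RULES table, the REFLECTIONS dictionary, and the respond function are invented for the example. But match, reflect, and substitute was essentially the entire repertoire.

```python
import random
import re

# Illustrative keyword rules in the spirit of ELIZA's DOCTOR script: match a
# pattern, capture a fragment, and drop it into a canned response template.
RULES = [
    (r"i feel (.*)", ["Why do you feel {0}?", "How long have you felt {0}?"]),
    (r"my (mother|father|family)(.*)", ["Tell me more about your family."]),
    (r"i am (.*)", ["Why do you say you are {0}?"]),
]

# Swap pronouns so a reflected fragment reads from the program's point of view.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I"}


def reflect(fragment):
    # "my job" becomes "your job", "i am" becomes "you are", and so on.
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(user_input):
    # Find the first keyword pattern that matches and fill in its template.
    text = user_input.lower().strip(".!? ")
    for pattern, templates in RULES:
        match = re.match(pattern, text)
        if match:
            fragments = [reflect(g) for g in match.groups()]
            return random.choice(templates).format(*fragments)
    return "Please go on."  # fallback when no keyword matches


print(respond("I feel sad"))                       # e.g. "Why do you feel sad?"
print(respond("My mother doesn't understand me"))  # "Tell me more about your family."
```

Nothing in the sketch models what sadness is or who the mother might be; the output depends entirely on which surface pattern happens to fire.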

Weizenbaum knew this. He had built it.

What disturbed him was what happened next.

His secretary, who had watched him develop the program, asked to use it. After only a few exchanges, she requested that Weizenbaum leave the room so that she and ELIZA could have a real conversation. This was not a naive user; she knew ELIZA was a program, knew Weizenbaum had written it, knew there was no understanding there. And still she wanted privacy for her "conversation."

"I had not realized," Weizenbaum wrote later, "that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people."

The phenomenon became known as the ELIZA effect: the tendency to project human traits (comprehension, empathy, intention) onto programs that merely simulate them. Users knew, at some level, that they were typing at a machine. They also behaved as if something were listening.

Weizenbaum was a creator who became a critic. ELIZA was intended as a demonstration of technique, a step toward real language understanding. Instead, it revealed something troubling about human psychology: we were ready to believe, desperate to believe, that machines could understand us. Our need for connection would fill in what the technology lacked.

In 1976, Weizenbaum published Computer Power and Human Reason, a book-length warning. "Since we do not now have any ways of making computers wise," he wrote, "we ought not now to give computers tasks that demand wisdom." The man who had built one of AI's most famous programs was now arguing that the field had fundamentally misunderstood its relationship to human thought.


SHRDLU was more impressive still.

Terry Winograd built it at MIT between 1968 and 1970, and in demonstrations it seemed to understand language in a way ELIZA never approached. The program inhabited a simulated "blocks world," a virtual table holding colored blocks and pyramids. Users could type commands in English: "Pick up a big red block." "Put it in the box." "What is on top of the blue pyramid?" SHRDLU would execute commands, answer questions, and even explain its reasoning.

The conversations were remarkable. "Can you pick up the pyramid?" "I don't understand which pyramid you mean." Context mattered; pronouns resolved correctly; the system remembered what had been discussed and what it had done. Unlike ELIZA's shallow pattern-matching, SHRDLU appeared to build a genuine model of its world and reason about it.

The catch was the world itself.

"Clearly," Winograd admitted, "the world known by SHRDLU is outrageously rarefied." The vocabulary was tiny: perhaps 50 words. The physics was simple: blocks could stack, pyramids could not. The actions were limited: pick up, put down, move. Within this microworld, SHRDLU excelled. Outside it, the program had no capabilities at all.

And even within the microworld, the demonstrations were carefully managed. Winograd was candid about this in later years. "The famous dialogue with SHRDLU where you could pick up a block, and so on—I very carefully worked through, line by line." If a visitor asked questions not in the script, the system might answer reasonably if the questions were close enough in form and content. But "there was no attempt to get it to the point where you could actually hand it to somebody and they could use it to move blocks around."

The pressure, Winograd said, was for "something you could demo." This led to what he called "Potemkin villages," systems that looked impressive for the things they did in demonstrations but had no structure to work more generally. The gap between showing and doing was vast.

By 1974, SHRDLU had stopped working reliably. A visiting researcher tried to interact with it and found it unresponsive. "That's a pity," one of Winograd's colleagues remarked. "The program worked when Terry demonstrated it to us."


In the summer of 1965, a philosopher named Hubert Dreyfus arrived at the RAND Corporation in Santa Monica.

He had been hired, with the help of his brother Stuart, to provide an impartial evaluation of artificial intelligence research. Paul Armer, who arranged the visit, expected a balanced assessment. What he got was a demolition.

"Alchemy and AI" compared the field to medieval alchemy: a misguided attempt to transmute base metals into gold, founded on theory that was "no more than mythology and wishful thinking." Dreyfus catalogued the unfulfilled predictions (Simon's chess champion, Simon's theorem prover, Simon's psychological revolution) and argued that the failures were not temporary setbacks but symptoms of a fundamental error.

The error, Dreyfus claimed, was the assumption that intelligence consisted of symbol manipulation. Human expertise, he argued, depended on processes that were informal, unconscious, embodied. We recognized faces not by applying rules but by perceiving patterns we could not articulate. We understood language not through grammatical parsing but through immersion in contexts we had absorbed over lifetimes. We navigated the world not by consulting internal maps but by being bodies in spaces, with habits and intuitions that no formal system could capture.

Dreyfus drew on philosophers the AI community had never read: Merleau-Ponty, Heidegger, the phenomenological tradition that took embodiment and lived experience as primary. To the engineers and mathematicians building AI systems, this was incomprehensible. Edward Feigenbaum complained: "What does he offer us? Phenomenology! That ball of fluff. That cotton candy!" Simon called Dreyfus's ideas "garbage."

RAND delayed publishing the paper, uncomfortable with its conclusions. When it finally appeared as a memo, it became a surprise bestseller—and Dreyfus became a pariah. At MIT, AI researchers avoided being seen with him. His critique had touched a nerve precisely because it challenged the foundations, not just the implementations.

Decades later, the vindication arrived quietly. Robotics researchers rediscovered embodiment. Connectionist approaches bypassed symbolic rules. "Sub-symbolic" processing—exactly what Dreyfus had described—became central to the field's revival. Daniel Crevier, historian of AI, wrote that "time has proven the accuracy and perceptiveness of some of Dreyfus's comments."

In 2007, Dreyfus allowed himself a moment of satisfaction. "I figure I won and it's over—they've given up."


Meanwhile, across the Pacific, researchers at the University of Tokyo and Kyoto University were pursuing their own approaches to artificial intelligence.

Japanese engineers in the 1960s made early contributions to neural networks and image recognition, work that would later feed into what we now call deep learning. Their work often went unnoticed in Western histories, which focused almost exclusively on MIT, Stanford, and Carnegie Mellon. The American narrative dominated; the American funding dwarfed what was available elsewhere; the American conferences set the agenda.

But the Japanese research planted seeds. In the 1980s, Japan would announce the Fifth Generation Computer Systems project—an ambitious national effort to build intelligent machines using logic programming. The project would eventually be judged a failure, but it emerged from a tradition of AI research that predated it by decades.

The global story of artificial intelligence is not the American story alone. The overlooked threads (Japanese, Soviet, British, European) surfaced at unexpected moments, sometimes vindicating approaches the American mainstream had dismissed.


Winograd eventually abandoned the symbolic approach entirely.

After encountering Dreyfus's critique and meeting the Chilean philosopher Fernando Flores, he came to see his own early work differently. In 1986, Winograd and Flores published Understanding Computers and Cognition, a book that attacked the foundations of classical AI from inside. The creator of SHRDLU was now arguing that computers could not think in the way traditional AI assumed.

The pattern was striking. Weizenbaum, disturbed by users' credulity, became a lifelong critic of AI's pretensions. Winograd, recognizing the Potemkin nature of his own demonstrations, turned to phenomenology. The people closest to the work were the first to see its limits.


The symbolic spring was not entirely illusion.

Real progress occurred. The Logic Theorist proved theorems. The General Problem Solver tackled puzzles. Natural language processing advanced, even if the advances fell short of understanding. The research programs of the 1960s built tools and techniques that proved useful for decades, just not for the purposes originally intended.

But the gap between demonstration and capability was vast. Systems that impressed in controlled settings failed in the wild. Programs that seemed to understand were revealed, on closer inspection, to be shuffling patterns. The predictions of the founders—chess champions by 1967, psychological revolutions by 1970—came and went unfulfilled.

The critics had seen it early. Dreyfus from outside the field, drawing on philosophy the AI community disdained. Weizenbaum and Winograd from inside, disturbed by what their own creations had revealed about human credulity and technical limitation. Their warnings were dismissed for years.

But winter was coming. The Lighthill Report in Britain, funding cuts in America, the failure of machine translation—the spring was ending.

And the questions the critics had asked would not go away. What is intelligence, really? Can symbols capture it? What happens when we build systems that look intelligent without being intelligent?

The Potemkin villages had to fall before the questions could be answered honestly.