Chapter 11: The Connectionists' Vigil

The idea wouldn't die.

Even as expert systems collapsed and LISP machines gathered dust, even as "AI" became a term to avoid, a small community of researchers kept faith with neural networks. They believed (against the prevailing consensus, against the funding trends, against the papers they couldn't publish) that intelligence was not a matter of symbols manipulated by rules. It was something that emerged from connection.

The vigil would last decades. When it ended, the watchers would be vindicated beyond anything they had imagined.



Backpropagation has a tangled history. The algorithm that would eventually enable deep learning was discovered not once but at least three times, by researchers working independently, often unaware of each other's work.

The first modern form appeared in Seppo Linnainmaa's master's thesis in Finland in 1970. He described the mathematics of propagating derivatives backward through a computation and even included FORTRAN code, but he did not mention neural networks. The technique existed in optimal control theory before anyone thought to apply it to learning.

Around the same time, Paul Werbos was working on his PhD in the United States, trying to mathematize Freud's concept of "the flow of psychic energy." He developed backpropagation as part of this strange project and first applied it, in 1974, to predicting nationalism and social communications. But he struggled for a decade to publish. The AI community was not interested in psychoanalytic mathematics or in neural networks. Werbos had priority, but the world wasn't listening.

Shun'ichi Amari in Japan had the algorithm too, in the late 1960s, but didn't pursue it. The pieces were scattered across continents, waiting to be assembled.

Then, around 1982, David Rumelhart of the University of California, San Diego, reinvented the algorithm independently. He didn't know about Linnainmaa or Werbos. He was working on connectionist models of cognition, trying to understand how networks of simple units might learn. Backpropagation, the method of computing how to adjust connection weights by propagating error signals backward through the network, was the mathematical key he needed.

In 1986, Rumelhart, Geoffrey Hinton, and Ronald Williams published "Learning Representations by Back-Propagating Errors" in Nature. The paper demonstrated that backpropagation could train multilayer networks to learn internal representations, patterns that the network discovered on its own, without being told what to look for. The algorithm was not new. The insight that it could enable neural networks to discover their own features—that was revolutionary.
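
In modern notation, the mechanics fit in a few lines. The sketch below is an illustration, not Rumelhart, Hinton, and Williams's original formulation: a tiny two-layer network learning XOR with NumPy, where activations flow forward, the output error flows backward, and every connection weight is adjusted in proportion to its share of the blame. The layer sizes, learning rate, and iteration count are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

# Weights and biases for a 2 -> 8 -> 1 network (sizes chosen arbitrarily).
W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for step in range(20000):
    # Forward pass: activations flow from input to output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: the output error is propagated back through the
    # network, yielding a gradient for every connection weight.
    err_out = (out - y) * out * (1 - out)
    err_h = (err_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ err_out
    b2 -= lr * err_out.sum(axis=0)
    W1 -= lr * X.T @ err_h
    b1 -= lr * err_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```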

"As usual in science," one historian noted, "it is not the first inventor who gets the credit, but the last re-inventor."


The same year, Rumelhart and James McClelland edited a two-volume work that would become the manifesto of the connectionist movement: Parallel Distributed Processing: Explorations in the Microstructure of Cognition.

The PDP books proposed a radical alternative to the dominant view of mind. For three decades, cognitive science had assumed that thinking was symbol manipulation—that the brain was, in some functional sense, a computer running programs on data structures. This was the framework of Newell and Simon, of Minsky and McCarthy, of Good Old-Fashioned AI.

The connectionists challenged it directly. The mind, they argued, was composed of vast numbers of simple units connected in networks. Mental processes were not sequential symbol manipulation but parallel interactions, units exciting and inhibiting each other simultaneously. Knowledge was not stored in discrete locations but distributed across connection strengths throughout the network.

This was not merely a different implementation of the same ideas. It was a different conception of what intelligence is. If knowledge was distributed, then there were no discrete symbols to manipulate. If processing was parallel, then the step-by-step logic of classical AI was an illusion imposed by introspection. The mind might be doing something utterly different from what it felt like it was doing.

The PDP books galvanized a generation of researchers. They provided a framework, a vocabulary, and, crucially, working algorithms. The backpropagation chapter became one of the most-cited papers in AI history. Intelligence magazine called the work "the most intense, most effective and most mind-stretching view of neurocomputing origins, theories and concerns to yet reach print."

The debate that followed (symbols versus patterns, GOFAI versus connectionism) shaped AI for the next two decades.


The symbols camp had strong arguments. Symbolic systems were transparent: you could inspect their rules, trace their reasoning, explain their conclusions. They handled logical inference naturally. They could represent structured knowledge (that Socrates was a man, that men were mortal, that therefore Socrates was mortal) in ways that distributed representations struggled to match.

The connectionists had different strengths. Neural networks could learn from examples without being explicitly programmed. They could handle noisy, incomplete data. They degraded gracefully rather than failing catastrophically. And they bore at least some resemblance to what was known about actual brains.

The debate was not merely technical. It was philosophical. The symbolists believed that thinking was a species of computation—rule-governed manipulation of structured representations. The connectionists believed thinking was something that emerged from the dynamics of parallel systems, something that might not be reducible to rules at all.

For a while, it seemed the connectionists might win. The late 1980s brought a surge of enthusiasm for neural networks. But the enthusiasm faded as the second winter set in. Support vector machines emerged as powerful classifiers with cleaner mathematical foundations. By the mid-1990s, neural networks had become, as Yann LeCun put it, "taboo."


The three researchers who would eventually be called the "godfathers of deep learning" met this taboo with varying forms of stubbornness.

Geoffrey Hinton, British-born and a descendant of the logician George Boole, had been promoting neural networks since the 1970s. He moved to the University of Toronto in 1987 and began building a research group around connectionist ideas. One profile would later describe him as "a University of Toronto computer scientist who spent decades promoting neural networks despite near-universal skepticism." He attracted students, collaborated widely, and refused to concede that the approach was dead.

Yann LeCun grew up in the suburbs of Paris, the son of an engineer, a teenager who tinkered with electronics and played in bands. He did his PhD in France under Gérard Dreyfus, developing early forms of backpropagation, then spent a year as a postdoc with Hinton in Toronto. In 1988, he joined AT&T Bell Labs.

Bell Labs gave LeCun something invaluable: real problems. The U.S. Postal Service needed to read handwritten zip codes. Banks needed to process checks. LeCun designed convolutional neural networks, architectures inspired by the visual cortex, and trained them on tens of thousands of handwritten examples. By 1989, his system achieved 95% accuracy on zip codes. By the late 1990s, it was reading over 10% of all checks deposited in the United States.

This was neural networks in the wild, solving problems that mattered, generating revenue for corporations. It was also largely invisible to the AI mainstream.

Yoshua Bengio, born in Paris but raised in Canada, completed his PhD at McGill University in 1991 and joined the Université de Montréal. In the 1990s, he combined neural networks with probabilistic models, contributing to the check-recognition systems that AT&T deployed. He and Hinton and LeCun formed a small community, exchanging ideas, collaborating when they could, supporting each other through years of rejection.

"There was a dark period between the mid-90s and early-to-mid-2000s," LeCun recalled, "when it was impossible to publish research on neural nets, because the community had lost interest in it. In fact, it had a bad rep. It was a bit taboo."


What sustained them?

Partly the practical successes. LeCun's check-reading systems proved that neural networks could solve real problems. The technology worked, even if it wasn't fashionable.

Partly theoretical conviction. Hinton believed deeply that the brain was a neural network, and that artificial versions should therefore be capable of general intelligence if properly designed. The approach felt right even when the evidence was incomplete.

And partly institutional shelter. Bell Labs, as a corporate research laboratory, cared about practical applications rather than academic fashion. In Canada, the Canadian Institute for Advanced Research (CIFAR) provided sustained funding. These refuges weren't glamorous, but they were sufficient.

In 2004, Hinton began leading CIFAR's Neural Computation and Adaptive Perception program. The program brought together neuroscientists, computer scientists, physicists, and engineers, researchers from multiple disciplines converging on the question of how learning systems work. Bengio and LeCun participated. The community had a home.


LeCun's 1998 paper, "Gradient-Based Learning Applied to Document Recognition," introduced LeNet-5, a seven-layer convolutional network that set the template for modern deep learning. The paper also established the MNIST database of handwritten digits as a standardized benchmark that allowed researchers worldwide to compare their approaches on common ground.
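
For the curious, the shape of that architecture is easy to state in today's tools. The sketch below is a rough paraphrase of the LeNet-5 pattern written in PyTorch (which, of course, did not exist in 1998): alternating convolution and subsampling layers feeding a small stack of fully connected layers, sized for 28x28 MNIST digits. The filter counts and activation choices are simplified, not a faithful reconstruction of the original network.

```python
import torch
import torch.nn as nn

class LeNetLike(nn.Module):
    """A simplified LeNet-5-style convolutional network for MNIST-sized inputs."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28x28 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                            # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),            # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                            # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, 10),                          # ten digit classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# One grayscale MNIST-sized image: batch of 1, 1 channel, 28x28 pixels.
logits = LeNetLike()(torch.zeros(1, 1, 28, 28))
print(logits.shape)  # torch.Size([1, 10])
```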

These were seeds. The vigil was tending them, waiting for conditions that would allow them to grow.

In 2006, Hinton published a paper on "deep belief networks"—a new technique for training neural networks with many layers. The term "deep learning" began to circulate. The community remained small but was building on decades of accumulated technique.

Then came 2012.

A neural network designed by two of Hinton's students, Alex Krizhevsky and Ilya Sutskever, entered the ImageNet competition. The system, trained on two gaming GPUs, achieved a top-five error rate of 15.3%. The second-place system scored 26.2%. The gap was so large it could not be ignored.

The field transformed overnight. The researchers who had been marginalized for decades found themselves, almost without warning, at the center of the most consequential technological development of the century.


In 2018, the Association for Computing Machinery awarded the Turing Award—the highest honor in computer science—to Geoffrey Hinton, Yann LeCun, and Yoshua Bengio. The citation praised their "conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing."

In 2024, Hinton received the Nobel Prize in Physics for "foundational discoveries and inventions that enable machine learning with artificial neural networks."

The vindication was complete. The vigil had ended.

But the connectionists had never been motivated by prizes. They had seen a pattern (minds as networks, intelligence as something that emerges from connection) and they had refused to let it die. Through winters and taboos, through papers rejected and funding denied, they had kept the faith.

When the thaw finally came, the harvest was beyond anything they had imagined.