Chapter 17: The Alignment Anxiety

As the systems grew more capable, the question returned with new urgency: can we control what we are building?

The field called it "alignment"—the challenge of ensuring that artificial intelligence systems do what their creators intend, pursue goals that humans endorse, avoid outcomes that humans would reject. The term suggested a technical problem with technical solutions. But beneath the surface, alignment meant different things to different people. The disagreements were not just about methods. They were about what mattered, who was at risk, and whose values should be encoded into systems that might reshape the world.

The alignment anxiety fractured the community that had built modern AI. Some warned of existential catastrophe: superintelligent systems escaping human control, optimizing for objectives that would render humanity obsolete. Others pointed to harms already happening: biased algorithms, exploited workers, surveillance deployed against the vulnerable. Both camps claimed the mantle of safety. Neither could convince the other.


In May 2024, OpenAI's Superalignment team collapsed.

The team had been announced the previous year with great ambition: its researchers would work on ensuring that superintelligent AI systems, when they arrived, would remain aligned with human values. OpenAI had publicly committed 20% of the compute it had secured to the effort.

But the commitment proved hollow. Jan Leike, who co-led the team, resigned, saying he had "reached a breaking point." His public statement was devastating: "Over the past years, safety culture and processes have taken a backseat to shiny products." He had been "disagreeing with OpenAI leadership about the company's core priorities for quite some time."

The details that emerged were damning. The Superalignment team was routinely denied the compute it needed for its experiments; those chips went instead to ChatGPT, the revenue-generating product that had made OpenAI famous. Safety researchers were stripped of their veto power over new releases. The 20% commitment had been, in practice, an aspiration rather than a constraint.

Ilya Sutskever, OpenAI's co-founder and chief scientist, also departed. He had played a central role in the brief, chaotic firing and reinstatement of CEO Sam Altman in late 2023. The internal politics were opaque, but the outcome was clear: Sutskever founded a new company, Safe Superintelligence, and left OpenAI behind.

The departures continued throughout 2024. Leopold Aschenbrenner, reportedly fired for leaking information. Daniel Kokotajlo, William Saunders, Gretchen Krueger, each citing concerns about accountability and transparency. John Schulman left for Anthropic. In October, Miles Brundage, senior advisor for AGI Readiness, announced his resignation. "Neither OpenAI nor any other frontier lab is ready," he said, "and the world is also not ready."

By the end of 2024, more than half of the employees focused on AGI safety had left OpenAI. The team that was supposed to ensure the technology remained safe had been gutted, not by external pressure, but by the internal contradictions of a company that had transformed from nonprofit research lab to commercial behemoth.


The fracture had deeper roots.

Anthropic, the company that received several OpenAI refugees, had itself been founded on safety concerns. In 2021, Dario and Daniela Amodei, along with Tom Brown, Chris Olah, Sam McCandlish, Jack Clark, and Jared Kaplan, left OpenAI due to differences over safety and ethics. They built a Public Benefit Corporation with safety as its stated primary focus, developed "Constitutional AI" approaches to training, and positioned themselves as the responsible alternative.

But the fundamental critics were not appeased. For Emily Bender, Timnit Gebru, and Margaret Mitchell—coauthors of the Stochastic Parrots paper, which had cost Gebru and Mitchell their positions at Google—the problem was not which company took safety seriously. The problem was the frame itself.

"AI hurts consumers and workers and isn't intelligent," Bender stated bluntly. The very concept of "alignment" presupposed that large language models were nascent intelligences requiring guidance toward human values. But if they were stochastic parrots, pattern-matching systems without understanding, then the alignment frame was not just wrong but actively harmful. It conceded the hype that justified the industry's expansion.

When the Future of Life Institute published its open letter in March 2023, calling for a six-month pause on training AI systems more powerful than GPT-4, the fundamental critics refused to sign. The letter cited risks: AI-generated propaganda, extreme automation, human obsolescence, society-wide loss of control. It gathered over 30,000 signatures, including Yoshua Bengio, Stuart Russell, Elon Musk, and Steve Wozniak.

Gebru called it "sensationalist" and noted that it "amplified some futuristic, dystopian sci-fi scenario instead of current problems." The harms from AI, she insisted, "are real and present and follow from the acts of people and corporations deploying automated systems." The letter had cited the Stochastic Parrots paper, but Bender pointed out that it "says the opposite of what we say." The paper's central argument was that claiming LLMs have "human-competitive intelligence" was itself a danger. The pause letter treated that claim as established fact.

Mitchell's critique cut deepest: "Ignoring active harms right now is a privilege that some of us don't have."


The divide mapped onto different communities, different backgrounds, different vulnerabilities.

The existential risk camp drew heavily from Effective Altruism—a movement focused on maximizing good through careful reasoning about where resources could have the greatest impact. Organizations like the Future of Life Institute, the Machine Intelligence Research Institute, and the Center for AI Safety approached AI through this lens. The risks they emphasized were speculative but catastrophic: superintelligent systems that might, pursuing poorly specified goals, render humanity extinct.

The near-term harm camp included researchers who had studied algorithmic discrimination, conducted audits of deployed systems, documented the ways AI was already affecting marginalized communities. Their concerns were not speculative. Facial recognition systems that misidentified Black faces. Predictive policing that reinforced existing patterns of discrimination. Content moderation algorithms that suppressed certain voices while amplifying others.

Both camps claimed to be working on safety. But their priorities were incompatible. Resources devoted to preventing hypothetical superintelligence were resources not devoted to addressing present discrimination. And the framing itself (the assumption that these systems were intelligent, that they required "alignment" rather than regulation or abolition) served the interests of the companies building them.

The collapse of FTX, Sam Bankman-Fried's cryptocurrency exchange, damaged the Effective Altruism movement that had funded much existential risk research. But the underlying tensions predated that scandal and would outlast it. The question was not just what risks mattered, but who got to decide.


The question looked different from the Global South.

In China, AI risk meant something other than superintelligence. Regulations required generative AI to avoid "subverting state power" and align with "socialist core values." The focus was social stability, information control, the immediate effects of AI on political order—not hypothetical future catastrophes.

In India, the concerns were economic: displacement of workers, deployment of systems trained on Western data into contexts they did not understand, digital colonialism dressed in the language of progress. The alignment that mattered was alignment with local needs, local values, local economic interests, not the values of Silicon Valley researchers imagining superintelligent futures.

In Africa, AI often arrived as surveillance and extraction. Facial recognition systems deployed by governments, data labor exported to train systems that would generate profits elsewhere, limited voice in the global governance conversations that would shape AI's future. The African Union was developing a continental AI strategy, but the power asymmetries were vast.

In Latin America, algorithmic systems were already shaping lives: predictive policing deployed with limited oversight, social protection systems making consequential decisions about who received support. Chilean advocacy groups won transparency requirements for predictive policing algorithms, a small victory in a larger struggle.

When alignment researchers in California worried about superintelligent AI optimizing for paperclips, workers in Kenya were being paid pennies per task to label training data. When pause letters circulated among the tech elite, communities in the Global South were dealing with AI systems already deployed, already making decisions, already encoding values that were not their own.

"Alignment for whom?" was not a rhetorical question. It was a demand for specificity that the abstract discourse could not provide.


The tension admits no easy resolution.

Perhaps both camps are right: existential risks deserve attention, and present harms demand redress. Perhaps the either/or framing is itself a distraction, serving those who benefit from paralysis. Perhaps the resources exist to address both, if the political will can be found.

But the structural pressures point elsewhere. Commercial interests corrode safety culture even within companies founded on safety concerns. External regulation moves slowly while capabilities accelerate. The gap between what AI systems can do and what oversight can control continues to widen.

The safety teams at OpenAI were supposed to be the guardrails. They were dissolved or driven out. The researchers who warned about present harms were fired. The pause letter gathered signatures and accomplished nothing; the training continued, the systems grew more capable, the questions remained unanswered.

Alignment, in the end, may be the wrong metaphor. It implies a technical problem to be solved, a calibration to be achieved, a state of harmony to be reached. But the disagreements are not technical. They are political, about power, about values, about who bears the risks and who reaps the rewards.

The alignment anxiety persists because the systems are being built by some people and deployed on others. Because the future imagined by researchers in California is not the future experienced by workers in Kenya or citizens in China or communities in Latin America. Because "human values" is a phrase that obscures more than it reveals, as if humanity shared a single set of values, as if alignment with one group's preferences would satisfy all others.

The question is not whether AI can be aligned. The question is whether the humans building it can agree on what alignment means. The evidence, so far, suggests they cannot.