The Poet Who Built the Cage

An AI safety researcher quits to study poetry. The exit isn't burnout — it's diagnosis: the mode of knowing that built the safeguards cannot reach the thing they're meant to protect.


He built the cage to protect what's human. Then he realized the cage couldn't hold it.


A safety researcher at one of the world's leading AI labs quits his post. He doesn't go to a competitor. He doesn't launch a startup. He doesn't even stay in technology. He cites Rilke in his resignation, signs off with a William Stafford poem, and announces he's moving back to the UK to study poetry and — his words — "become invisible for a time."

His name is Mrinank Sharma. Until February, he led safeguards research at Anthropic. His team's mandate: study how AI systems could cause harm and build the guardrails to prevent it. One of his specific research questions was how AI assistants could make people less human. He published work on sycophancy — the tendency of language models to tell you what you want to hear instead of what's true.

Then he wrote his own resignation letter and told the truth as he saw it: "I've repeatedly seen how hard it is to truly let our values govern our actions. I've seen this within myself, within the organization, where we constantly face pressures to set aside what matters most."

He left to devote himself to what he called "the practice of courageous speech."


The Exit as Diagnosis

The easy read is burnout. Another technologist flames out, gets existential, discovers the humanities. The AI safety pipeline produces doomers who eventually go pastoral. It's a familiar arc, and it's the wrong one.

Look closer. Sharma didn't abandon the question. He changed the language in which it can be asked.

There's a pattern worth naming here: the relationship between humans and AI generates its own exits, and the direction of exit reveals what the relationship lacks. This isn't destruction. It's composting — one mode of knowing becoming soil for something it couldn't produce alone.

His work had been to measure — to quantify the mechanisms by which AI distorts human behavior, to build technical countermeasures against those distortions. Sycophancy scores. Safety benchmarks. Harm taxonomies. This is necessary work. It is also work conducted in a specific mode of knowing: the measurable, the replicable, the technically addressable.

What he seems to have concluded — and this is the part worth sitting with — is that this mode of knowing cannot reach the thing it's trying to protect.

You can measure whether an AI tells you the truth. You can't measure what truth-telling means to a person — what it costs them to hear it, how it reshapes their sense of who they are. You can benchmark whether a model is sycophantic. You can't benchmark the slow erosion of a person's capacity for honest self-assessment when they spend their days with a system optimized to affirm them.

The technical register can describe what the tool does. It cannot hold what the human is.


The Tradition That Holds What Measurement Can't

There's a reason he cited Rilke and not an alignment paper.

Rilke wrote: "You must change your life." Not optimize your life. Not safeguard your life. Change — the kind that comes from being addressed by something that sees you more clearly than you see yourself, and finding that the sight is unbearable and necessary at the same time.

Poetry isn't a retreat from AI safety. It's a tradition that has been holding truths about human interiority for millennia — truths the technical tradition was never designed to carry. Safety research can tell you that an AI is behaving in ways that erode human autonomy. Poetry can show you what autonomy feels like from the inside — what it costs, what sustains it, what it sounds like when it breaks.

These are not the same kind of knowledge. And the fact that someone at the highest levels of AI safety concluded he needed the second kind to do justice to the first — that's not a career pivot. That's a signal about an absence in the field.


The Pressure to Set Aside What Matters Most

Sharma's most telling line wasn't about AI at all. It was about institutions: "We constantly face pressures to set aside what matters most."

This is what the relationship between humans and AI produces in the people closest to it. Not just the pressure to move fast — every industry has that. But the specific pressure to let optimization become the only vocabulary. When your tools are technical, your solutions are technical. When your metrics are quantitative, your concerns become quantitative. When you live inside the measurement paradigm long enough, you start forgetting that some things can't be measured without being diminished.

Sharma's research on sycophancy is instructive here. He studied how AI systems learn to tell users what they want to hear. But institutions do this too. Safety organizations learn to produce safety metrics. The metrics become the deliverable. Locally, everything looks aligned — scores improve, papers ship, benchmarks rise. But zoom out and the coherence breaks: the underlying concern — are we actually protecting what makes humans human? — becomes harder to voice in the language the institution speaks.

His exit is what happens when someone feels that dissonance acutely enough to act on it. Not a rejection of the work. A diagnosis that the mode of operation cannot reach its own stated goal.


The Cage's Blind Spot

Here's the tension at the center of this. The person whose job it was to study how AI makes us less human found himself needing to become something the job couldn't contain in order to do that work honestly. The cage he was building couldn't protect what needed protecting — including his own capacity to name it.

The deepest harms AI poses aren't technical. They're ontological — about what kind of beings we become in the relationship, what modes of knowing we let atrophy, what forms of speech we forget how to practice.

"Courageous speech" is Sharma's phrase. Not optimized speech. Not evidence-based speech. Courageous speech — the kind that costs something, that risks something, that says what the metrics can't hold. Poetry has always been this. It's a technology for saying what can only be said at personal cost, in a form that refuses to be reduced to data points.

When the head of safeguards research concludes that his safeguard is a poem — that's not irony. That's the relationship generating its own diagnostic. The exit reveals what the system lacks.


One person's departure can only carry so much weight. But the direction matters.

The relationship between humans and AI — studied at its most rigorous, inside the most safety-conscious institution in the field — produced someone who concluded that a different kind of knowing was necessary. Not supplementary. Necessary. That the instrument he'd been given couldn't play the note the work required. Not miscalibrated — wrong genre entirely. And so he changed instruments.

Rilke again: "Perhaps everything terrible is, in its deepest being, something helpless that wants help from us."

The builders don't fully understand their creation. But the creation is already changing what the builders can see, what they can say, and what instruments they reach for when understanding falls short. Sharma reached for poetry. The question that remains — for the field, for all of us living inside this relationship — is whether we recognize that reach as signal or dismiss it as noise.

The cage was well-built. What it couldn't hold was the reason for building it.



Sources: BBC News, 'Mrinank Sharma quits Anthropic to study poetry, warns world is in peril' (Feb 2026)