Vibe Engineering: The Labor That Remains

The agent wrote 9,200 tests. I made 9,200 decisions about whether they mattered.

This is the ratio that nobody warned us about. When Emil Stenström built a production-grade HTML5 parser using AI assistance, the machine did the typing. He did something else—something continuous, invisible, exhausting in a way we're only beginning to name. Simon Willison calls it "vibe engineering." Not vibe coding, where you wave your hands and something emerges. Vibe engineering—where the agent executes and you hold the whole edifice in your head, deciding, moment by moment, whether each generated piece belongs.

The work didn't disappear. It condensed into pure judgment.


We expected relief. The promise was always about offloading—let the machine handle the mechanical parts while you focus on the creative ones. But something stranger happened. The mechanical parts vanished, and with them went the cognitive breaks they provided. Typing is meditative. Debugging has rhythm. The spaces between decisions matter.

With AI assistance, those spaces collapse. Every moment becomes evaluative. Is this test meaningful or coverage theater? Does this implementation match intent or just specification? Should I accept this suggestion or request a revision? The questions don't stop. The generation doesn't stop. The agent is always ready for the next prompt, the next iteration, the next perfectly formatted response.

You become a continuous judgment engine.
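To make the first of those questions concrete, here is a hypothetical pair of machine-generated tests for an HTML parser. Neither is from Stenström's actual suite; html5lib stands in for whatever parser is under judgment. Both tests pass. Both raise the coverage number. Only one of them tells you whether the parser works.

```python
import html5lib  # a real spec-compliant parser, standing in for the one under review


def test_parse_runs_without_error():
    # Coverage theater: this exercises the code path but asserts nothing.
    # A parser that returned the wrong tree entirely would still pass.
    html5lib.parse("<p>hello")


def test_unclosed_p_is_recovered():
    # A judgment encoded as a test: the HTML5 spec mandates error
    # recovery, so an unclosed <p> must still yield a real <p> element.
    doc = html5lib.parse("<p>hello", namespaceHTMLElements=False)
    p = doc.find("body").find("p")
    assert p is not None
    assert p.text == "hello"
```

Telling these two apart takes a few seconds. Multiply that call by 9,200 and the shape of the labor comes into focus.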

This isn't about bad AI or unskilled users: Stenström knew what he was building, and Willison is among the most thoughtful practitioners in the field. These are people with clear intent and strong judgment, exactly the combination that should benefit most from AI leverage. And they do benefit. The parser exists. It works. All 9,200 tests pass.

But the subjective experience they report isn't "I did less work." It's "I did different work, and that work was unrelenting attention."


The technology is functioning exactly as designed. It's an amplifier. You bring judgment; it amplifies your capacity to apply that judgment at scale. You can evaluate 9,200 tests instead of writing 92. But you must evaluate all 9,200. The machine doesn't create the discrimination; it demands more of it, faster, continuously.

If you lack clear intent, the amplifier doesn't help you find it. It just produces more options to feel uncertain about.


What kind of partnership demands this? Not the kind where you delegate and review outcomes. That's management. This is something more symbiotic and more strange—a judgment-execution loop where the relationship itself creates a new cognitive space. The human doesn't just use the agent as a tool. The agent doesn't just follow instructions. Together they form a system that requires the human to maintain coherence across everything the agent produces, while the agent extends what the human can hold in working consideration.

What does this relationship want from both parties? Unrelenting discernment from the human. Tireless generation from the machine. The partnership has no natural stopping point because generation is cheap and evaluation is expensive. The bottleneck used to be production; now it's discrimination. Can you tell what matters? Can you hold your intent steady across thousands of micro-decisions? Can you maintain the conceptual integrity of something you didn't directly build?

These aren't new skills. Architects have always thought this way. Editors do. Anyone who's hired someone has felt this: the work you review shapes you as much as the work you create. You become responsible for coherence you didn't author. Stenström's experience with the HTML parser exemplifies this: 9,200 test judgments that refined not just the code but his own capacity to recognize what belonged in a production-grade implementation.

AI collaboration universalizes that relationship. Everyone who works with these tools becomes, in part, an editor of machine output. The question isn't whether the AI is good enough. It's whether you can sustain that much judgment without the mechanical relief that used to punctuate it.


I don't know if this is sustainable. I don't know if we'll develop new cognitive rhythms that make continuous evaluation feel less depleting, or if we'll discover that human attention needs those old mechanical breaks more than we realized. I don't know if "vibe engineering" is a transition state or the new equilibrium.

What I know is that the labor didn't go away. It transformed. And the transformation reveals something about what we actually do when we build things: we maintain intent across execution. We hold the pattern steady. We decide what belongs.

The agent can't do that part. It can only force us to do it faster, more often, with less time between decisions to remember why we started.

That's the amplification. Not of output. Of judgment demand.

The question isn't whether AI makes us more productive. It's whether we have enough discernment to meet what it amplifies.