Chapter 13: The Coherence Test
What would it mean for a society to pass a coherence test?
From principles to practice
Six Questions
In 1990, Elinor Ostrom published a set of eight design principles for governing the commons — principles drawn not from theory but from hundreds of case studies of communities that had successfully managed shared resources for generations. Fisheries in Turkey, irrigation systems in Nepal, forests in Japan. She didn't invent the principles. She found them, embedded in the practices of people who had never read a paper on institutional design but who had, through centuries of trial and error, discovered what works.
The principles were deceptively simple. Clear boundaries. Rules that fit local conditions. Collective-choice arrangements so those affected could shape the rules. Monitoring. Graduated sanctions. Conflict resolution. Recognition of the community's right to organize. And for larger systems, nested enterprises — institutions within institutions, each operating at its own scale.
What made Ostrom's work revolutionary was not the principles themselves but the method: she had derived design criteria from evidence rather than ideology. She watched what endured and asked why.
This chapter attempts something similar, but at a different scale. Instead of commons governance, the subject is any system — an institution, a technology, an economy, a policy — that claims to serve human flourishing. And instead of hundreds of case studies drawn from one domain, the evidence base is five chronicles spanning three thousand years, examining philosophy, artificial intelligence, economics, governance, and revolution.
The pattern library assembled in Chapter 2 distills that evidence. But a pattern library is a reference, not a method. What practitioners need is a way to apply the patterns — a diagnostic that can be used by anyone evaluating a proposal, designing an institution, or assessing whether a system does what it claims to do.
Six questions. Not a formula, not an algorithm — a set of orientations, each earned from evidence, each illuminating a different dimension of coherence. And the questions are not independent. Each reaches into the others — feedback requires inclusion to know what to listen for, inclusion requires imagination to see who is missing, imagination requires environment change to escape the constraints of the present, and all of them require the coherence gap to be visible before any correction is possible. The test works as a system, not as a checklist.
Question One: Is the Coherence Gap Real?
Every system tells a story about itself. Markets tell a story about efficiency and freedom. Democracies tell a story about representation and consent. Corporations tell a story about innovation and value. The first question cuts through the narrative: Does the system's self-description match its actual effects?
This is the coherence gap — the distance between a system's stated purpose and its lived reality. The French monarchy claimed divine legitimacy while peasants starved. The Soviet Union claimed worker ownership while workers had no voice. The contemporary financial system claims to allocate capital efficiently while producing inequality at Gilded Age levels.
The coherence gap is not hypocrisy, or not merely hypocrisy. It is a structural condition. Systems develop internal logics that drift from their stated purposes, often without anyone intending the drift. A university founded to educate becomes an organization optimized for rankings. A healthcare system designed to heal becomes a system optimized for billing. A democracy designed for self-governance becomes a system optimized for the reelection of incumbents.
The first diagnostic question, then, is not an accusation but an inquiry: What does this system actually produce, and does that match what it says it's for?
The inquiry requires empirical honesty — the willingness to look at outcomes rather than intentions. It requires what the philosophy chronicle called the examined life: the capacity to turn the lens on one's own institutions, one's own assumptions, one's own comfortable fictions.
Question Two: Does the Design Preserve Feedback?
If there is one master principle across five chronicles, it is this: systems that maintain feedback loops adapt; systems that sever them collapse.
The second question examines the nervous system of any proposed design: Can the system detect when it's failing, and can those affected signal back?
Feedback, in this sense, is not customer satisfaction surveys or quarterly earnings reports. It is the structural capacity for a system to hear from the people inside it — especially the people at the bottom, at the edges, the people most likely to experience the system's failures first. Athenian democracy maintained feedback because citizens directly experienced the consequences of their votes. The Roman Republic severed feedback when senators could wage wars they didn't fight. Modern representative democracy attenuates feedback through layers of abstraction, lobbying, and gerrymandering until the signal from governed to governors is barely audible.
The feedback question has both a detection component and a response component. Detection: Does the system have sensors? Can it know when something is wrong? Response: Once failure is detected, can the system adjust? Or are there structural barriers — political incentives, sunk costs, ideological commitments — that prevent correction even when the problem is visible?
The AI chronicle adds a dimension: alignment is a feedback problem. How does an artificial intelligence system know when it's wrong? How do the humans affected by algorithmic decisions signal back to the system that shapes their lives? The question applies with equal force to a loan algorithm, a criminal sentencing model, a content recommendation engine, and a climate policy.
But feedback quality depends on who is heard — which makes Question Two inseparable from Question Five. A system with perfect feedback channels that only connect to a narrow constituency will self-correct toward that constituency's interests and call it coherence. The feedback principle requires the inclusion engine, and the inclusion engine requires the feedback principle. Neither is sufficient alone. This interdependence is not a complication of the diagnostic — it is the diagnostic.
Preserve feedback. This is the non-negotiable.
Question Three: Does the Design Change the Environment?
In the Revolution chronicle, Jacque Fresco's insight emerged as one of the most penetrating diagnoses of revolutionary failure: changing operators within the same architecture reproduces the same problems. Replace the king with a parliament, but leave the economic structure intact, and the parliament will serve the same interests the king did. Replace the dictator with a democrat, but leave the information environment corrupted, and the democrat will govern in the same fog.
The third question — the Fresco test — asks: Does this proposal change the conditions under which decisions are made, or does it merely change who makes them?
Universal basic income illustrates the distinction. UBI partially changes the economic environment: by providing a floor beneath which no one can fall, it alters the conditions of labor market participation. A person with a guaranteed income can refuse exploitative work, invest in education, start a small business, or care for family members. The decision environment changes. But UBI does not change the architecture of economic coordination itself — it operates within existing market structures, redistributing outcomes without redesigning incentives. This is both its pragmatic strength (it is implementable without systemic overhaul) and its theoretical limitation (it addresses symptoms without touching structures).
Compare Denmark's energy transition. When the Danish parliament required that local citizens be offered at least twenty percent ownership in new wind energy projects, it didn't just change the personnel of the energy system. It changed the incentive architecture. People who own a share of the wind turbine on their horizon have a different relationship to energy policy than people who merely receive electricity from a distant corporation. The environment of decision-making shifted — from consumer to co-owner, from recipient to participant.
The Fresco test is not a binary. Few proposals fully change the environment; few leave it entirely untouched. The question is directional: In which direction does this push?
Question Four: Does It Scale Without Severing?
Athens worked beautifully — for forty thousand citizens in a single city. Try to scale Athenian direct democracy to a modern nation of three hundred million, and the feedback that made it work disappears. This is the scale trap: governance mechanisms that function at one level fail at another, usually because the feedback loops that sustained them cannot survive the expansion.
The fourth question asks: As this system grows, does it maintain its essential feedback architecture, or does scale introduce distances — between decision and consequence, between governance and governed — that sever the loops?
Citizens' assemblies illustrate the challenge. At the level of a single assembly — a hundred to two hundred randomly selected citizens deliberating on a specific policy question — the feedback is immediate and rich. Participants hear evidence, engage in structured dialogue, develop recommendations that reflect genuine deliberation. The OECD has documented over seven hundred such processes worldwide since 1979, involving more than eighty thousand randomly selected citizens. The quality of deliberation is consistently high. The recommendations are consistently more ambitious, more nuanced, and more forward-looking than what elected legislatures produce under electoral pressure.
But can this scale? A single citizens' assembly addresses a single question in a bounded timeframe. Governance requires addressing hundreds of interconnected questions continuously. France's Citizens' Convention for Climate produced detailed recommendations — and the French government implemented barely a third of them, often in diluted form. The gap between deliberative quality and political implementation is the scale trap in action: the assembly works at its own scale, but the system it feeds into operates at a different scale with different incentive structures.
The question is not whether scale is possible — human systems have always found ways to coordinate at larger scales — but whether the essential properties survive the scaling. Ostrom's nested enterprises offer one design pattern: institutions within institutions, each maintaining its own feedback at its own scale, connected by protocols that allow alignment without homogenization. The internet's governance architecture — IETF, ICANN, rough consensus and running code — offers another. Neither is a complete solution. Both are evidence that the scale trap, while real, is not absolute.
Question Five: Does It Include Those Affected?
The inclusion ratchet — once a group gains political voice, it rarely loses it permanently — is one of the most hopeful patterns in the chronicles. But inclusion is not only a moral principle. It is an engineering requirement.
The fifth question asks: Does this design include the perspectives of those it affects — including those who cannot speak for themselves?
Inclusive systems are more resilient because they have more feedback channels. Diverse governance bodies make better decisions — not because diversity is pleasant but because different perspectives bring different information. A room of people who share the same background will have the same blind spots. A room of people with genuinely different experiences will see more of the landscape — including the hazards.
But the inclusion question extends beyond the living and the present. The climate crisis is, among other things, a failure of inclusion: the people most affected by current decisions — future generations, communities in the Global South, nonhuman species — are systematically excluded from the decision-making that shapes their fate. Wales's Future Generations Commissioner, Finland's Committee for the Future, New Zealand's recognition of legal personhood for the Whanganui River — these are experiments in expanding the circle of inclusion beyond the living, the present, and the human.
Platform cooperatives demonstrate inclusion as design. Stocksy United, an artist-owned photography cooperative with over a thousand photographer-members across sixty-seven countries, pays contributors fifty percent of standard license purchases — compared to the single-digit percentages typical of stock photography platforms. The photographers are not just included in the revenue; they are included in the governance. They vote on policy. They shape the platform that shapes their livelihood. The feedback loop is short and direct.
The question is always: Whose experience counts as evidence in this design?
Question Six: Does It Expand or Contract the Imaginable?
The Revolution chronicle found that revolutionary movements are limited by the political imaginary available to them. Spartacus could not imagine abolishing slavery — only escaping it. The French Revolutionaries could not imagine governance without a single sovereign center — they replaced the king with the Committee. Today we struggle to imagine economic coordination beyond growth, or governance beyond the nation-state.
The sixth question asks: Does this proposal expand the range of what people can conceive as possible, or does it narrow it?
Citizens' assemblies score high on this dimension. When ordinary citizens are given access to evidence, expert testimony, and structured deliberation, they consistently produce recommendations that exceed the imaginative horizons of professional politicians. Climate assemblies recommend more ambitious action than governments propose. Why? Because citizens in deliberation are freed from the electoral incentive to play it safe. Their imagination is released from the constraints that bind professional politics.
Conversely, certain designs contract the imaginable. Algorithmic content curation, optimized for engagement, tends to amplify outrage and confirm existing beliefs — narrowing rather than expanding the range of perspectives a person encounters. A technology that collapses the information environment into an echo chamber fails this test regardless of how efficiently it operates.
The imagination question connects to the deepest theme in the chronicles: that you cannot build what you cannot conceive. Any system that expands what people believe is possible — that demonstrates, in working practice, that alternatives exist — is doing something more important than any policy output. It is enlarging the space of the buildable.
The Test Applied
Let the test work. Take three proposals — not to judge them, but to demonstrate the diagnostic in action.
Universal basic income. Coherence gap: moderate — UBI addresses a real gap between the story of meritocracy and the reality of structural inequality, but does not claim to resolve the gap entirely. Feedback: moderate — provides resources that enable economic signaling but does not create new institutional feedback channels. Environment change: partial — alters the floor of economic participation without changing the architecture. Scale: high — UBI is one of the few proposals that scales by definition, since universality is its mechanism. Inclusion: high — removes means-testing gatekeepers that exclude marginalized populations. Imagination: moderate — demonstrates that income need not be tied to employment, but does not by itself expand the economic imaginary further.
Citizens' assemblies as permanent institutions. Coherence gap: addresses it directly — sortition-based assemblies exist precisely to close the gap between representative claims and representative reality. Feedback: high by design — structured channels for citizen input, though implementation rates vary. Environment change: moderate — changes the decision-making environment for those who participate but operates within existing political architectures. Scale: uncertain — works powerfully at the level of individual assemblies, untested as permanent governance institutions at national scale. Inclusion: high — random selection with stratification produces demographic representativeness that elections cannot match. Imagination: high — consistently produces more ambitious, creative recommendations than elected bodies.
Platform cooperatives. Coherence gap: directly addresses the gap between platform rhetoric of "sharing economy" and the reality of extraction. Feedback: high — worker-owners experience consequences of decisions directly. Environment change: high — alters the incentive architecture of platform economics from extraction to cooperation. Scale: uncertain — none has yet achieved the network effects of extractive platforms, and it is genuinely unknown whether cooperative structures can compete in winner-take-all digital markets. Inclusion: high — ownership is distributed by design. Imagination: high — demonstrates that platforms need not be extractive, expanding the digital economic imaginary.
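As a compact illustration only — not part of the chapter's argument — the three assessments above can be recorded as data. The dimension names and the small helper that flags weak dimensions are assumptions introduced here for the sketch; the qualitative scores are the ones given in the text ("addresses it directly" recorded as "high").

```python
# A minimal sketch of the six-question diagnostic as a data structure.
# Dimension keys and the needs_attention() helper are illustrative
# assumptions; the scores transcribe the qualitative judgments above.

DIMENSIONS = [
    "coherence_gap", "feedback", "environment_change",
    "scale", "inclusion", "imagination",
]

assessments = {
    "universal_basic_income": {
        "coherence_gap": "moderate", "feedback": "moderate",
        "environment_change": "partial", "scale": "high",
        "inclusion": "high", "imagination": "moderate",
    },
    "citizens_assemblies": {
        # "addresses it directly" recorded as "high"
        "coherence_gap": "high", "feedback": "high",
        "environment_change": "moderate", "scale": "uncertain",
        "inclusion": "high", "imagination": "high",
    },
    "platform_cooperatives": {
        "coherence_gap": "high", "feedback": "high",
        "environment_change": "high", "scale": "uncertain",
        "inclusion": "high", "imagination": "high",
    },
}

def needs_attention(scores):
    """Return the dimensions the text would flag as vulnerable:
    anything not scored 'high'."""
    return [d for d in DIMENSIONS if scores[d] != "high"]

for name, scores in assessments.items():
    print(name, "->", needs_attention(scores))
```

The point of the sketch matches the paragraph that follows: the structure yields no verdict, only a list of dimensions that need attention.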
The test does not produce verdicts. It produces visibility. It shows where a design is strong and where it is vulnerable. It identifies the dimensions that need attention.
The Test That Tests Itself
Any evaluation framework risks becoming a cage — reducing complex systems to scores, flattening the very complexity it means to address. The coherence test must be honest about its own limits.
The six questions carry implicit values. Feedback is valued over efficiency. Inclusion is valued over speed. Imagination is valued over predictability. These values are not arbitrary — they are earned from three thousand years of evidence about what makes systems endure and what makes them collapse. But they remain values, not neutral measurements. A different set of priorities would produce a different test.
Dave Snowden's Cynefin framework distinguishes between simple, complicated, complex, and chaotic contexts, arguing that different methods of inquiry apply to each. The coherence test operates best in the complex domain — where causes and effects are entangled, where outcomes cannot be predicted, where the path forward must be discovered through experimentation. In simple or complicated domains, more conventional evaluation methods may suffice.
Michael Quinn Patton's developmental evaluation offers a complementary insight: in complex environments, evaluation should be embedded in the development process, not applied from outside. The coherence test works best not as a post-hoc judgment but as a design companion — a set of questions asked continuously, not once.
And the test should evolve. When it produces misleading assessments — when a system that scores well on all six questions nevertheless fails, or when a system that scores poorly nevertheless succeeds — those anomalies are data. They are the compost from which a better test can grow.
The coherence test is itself subject to the coherence test. It must preserve feedback. It must remain open to revision. It must not mistake its own framework for the final word.
This is not a weakness. It is the design. A test that could not be tested would have severed the very feedback loop it demands of others. The six questions are equipment, not answers. They are a way of paying attention — disciplined, evidence-grounded attention — to the systems that shape our lives.
Use them. And when they fail you, tell us how.