The Model That Doesn't Phone Home
Capable AI models now run entirely in your browser. No API keys, no accounts, no vendor dependency. That changes the calculus of what you can build.
This week, Mistral released a 3-billion-parameter model that runs in your browser tab. WebGPU, no server required. Apache 2.0 licensed, so you can do what you want with it. DeepSeek dropped 685-billion-parameter models under MIT. Gold-medal math competition performance, open weights.
The capability threshold just crossed something important.
For years, the practical barrier to using capable AI was access. You needed an API key, which meant you needed an account, which meant you agreed to terms, which meant your prompts traveled to someone else's servers. The model was a service, and services have dependencies.
Local models existed, but they were toys. Useful for learning, not for work. The models that could actually do things lived behind endpoints you didn't control.
That's no longer true. The models running locally now are capable enough to be useful. Not as capable as the frontier—but capable enough for many real workflows. Summarization, code assistance, structured extraction, drafting. The 80% case.
What Changes
When the model runs on your machine, several constraints disappear:
No connectivity requirement. Works offline, on planes, in dead zones. The tool doesn't stop working because your wifi dropped.
No vendor dependency. Mistral can pivot, raise prices, change terms, shut down. The model you downloaded still runs. The capability is permanent.
No data exfiltration. Your prompts never leave your device. Sensitive inputs—health data, financial records, proprietary code—stay local. No trust boundary to negotiate.
No rate limits. Query as fast as your hardware can generate tokens. No throttling, no queuing, no waiting for capacity.
No billing surprises. One up-front cost (your hardware), then no per-query bill, ever. The economics are fundamentally different.
The Alignment Over Force Pattern
There's a coherenceism principle that applies here: Alignment over Force. Position so reality carries the work forward.
When your workflow depends on external services, you're constantly fighting. Fighting for uptime, fighting for access, fighting terms of service, fighting price changes. Every update from the vendor is a potential disruption. You're forcing your work through someone else's infrastructure.
When the model runs locally, you've aligned with a different reality. The capability lives in your control. Vendor decisions become irrelevant. You've positioned yourself so that market forces, corporate pivots, and service changes flow around you instead of through you.
This isn't about ideology. It's about architecture. Local models aren't always better—they're often worse. But for the workflows where they're sufficient, they're dramatically more stable.
The Method
Here's how to evaluate whether local models work for your use case:
1. Identify the capability need. What do you actually need the model to do? Be specific. "Help with coding" is too vague. "Generate unit tests for Python functions given a docstring" is testable.
2. Test a small model on that task. Download Mistral 3B or similar. Run your actual prompts. Not benchmarks—your prompts. Does the output clear the usefulness bar?
3. Measure the gap. If local output is 60% as good as GPT-4, is that enough? For some tasks, yes. For others, no. Know where you are on that spectrum.
4. Identify hybrid opportunities. Maybe local handles 80% of queries, and you escalate the hard 20% to a hosted model. That's still 80% of your traffic with no external dependency.
5. Build the local path first. If local works, make it the default. Add remote as fallback. Don't build in a dependency and then try to remove it later. A sketch of this local-first pattern follows below.
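Here is a minimal sketch of steps 2 through 5 in Python. It assumes a local runtime such as Ollama listening on its default REST endpoint; the model tag, the example prompt, and the usefulness check are placeholders to swap for your own task, and the hosted fallback is deliberately left as a stub.

```python
# Steps 2-5 as code: run your actual prompts against a local model, apply a
# task-specific usefulness check, and escalate only the failures to a hosted
# model. Assumes a local runtime such as Ollama on its default REST endpoint;
# model tag, prompt, and usefulness check are placeholders for your own task.
import requests

LOCAL_ENDPOINT = "http://localhost:11434/api/generate"  # Ollama's default
LOCAL_MODEL = "mistral-3b"  # placeholder tag; use whatever model you pulled


def run_local(prompt: str) -> str:
    """Send one prompt to the local model and return its completion."""
    resp = requests.post(
        LOCAL_ENDPOINT,
        json={"model": LOCAL_MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]


def clears_usefulness_bar(output: str) -> bool:
    """Task-specific check. For the unit-test task from step 1, 'contains a
    test function and an assertion' is a crude but honest starting bar."""
    return "def test_" in output and "assert" in output


def run_hosted(prompt: str) -> str:
    """Escalation path: whatever hosted API you already use. Left as a stub
    so the local path stays the default and the remote dependency optional."""
    raise NotImplementedError("plug in your hosted fallback here")


def generate(prompt: str) -> str:
    """Local-first: only prompts that fail the usefulness bar leave the machine."""
    local_output = run_local(prompt)
    if clears_usefulness_bar(local_output):
        return local_output
    return run_hosted(prompt)


if __name__ == "__main__":
    # Step 2: your prompts, not benchmarks.
    prompt = "Write pytest unit tests for: def slugify(title: str) -> str: ..."
    print(generate(prompt)[:500])
```

Keep a tally of how often the fallback fires. That ratio is the gap measurement from step 3, and it tells you whether your hybrid split is closer to 80/20 or 50/50.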
What This Enables
A developer builds a code review tool. It runs entirely on their laptop. No accounts, no keys, no terms of service. They share the tool with teammates. Each instance is self-contained. The company's proprietary code never leaves their machines.
A writer uses a local model for first-draft generation. Sensitive client materials stay on their device. When the hosted AI service they used to pay for changes their data policy, it doesn't matter. The capability they need is already local.
A researcher processes confidential data. Local model extracts structured information. Nothing crosses a network boundary. IRB approval is simpler because data handling is demonstrably contained.
These aren't hypotheticals. The models are available now.
The Tradeoff
Local models aren't magic. They're smaller, less capable, and limited by whatever compute you own. A laptop GPU isn't a datacenter.
But for many workflows, "capable enough" beats "optimal but dependent." The question isn't whether local models match frontier capability—they don't. The question is whether they clear the usefulness bar for your specific task.
If they do, you've just built something permanent. A capability that works tomorrow regardless of what any vendor decides today.
Build once, use forever. That's what local models offer. Not the best model—but a model that's yours.