Sufficiently Intelligent Agents

How AI labs turn tokens into economic productivity.

May 11, 2026

I wrote this essay as a submission to Dwarkesh Patel’s blog prize.

Labs will become profitable by selling packaged, highly customizable “sufficiently intelligent agents” that do the work a company would otherwise pay humans to do. The ones that win will own enough of the stack to do this cheaper than anyone else, with a runtime that customers cannot trivially swap out.

This “sufficiently intelligent agent” will consist of three parts: sovereign compute, models optimized for inference on that compute, and a harness built to orchestrate those models. We crossed the raw model intelligence threshold a while ago, and post-training models on data from coding harnesses has drastically improved their instruction-following abilities. While we’re still not there, we’ve seen glimpses of this future from tools like OpenClaw and long-running coding agents.

A huge swath of economically valuable work does not require frontier intelligence. The Bureau of Labor Statistics counts 18.5 million office and administrative support workers in the US, the single largest occupational group in the economy at 12.2% of total employment. Most of this work is what I’d call “administrative”: ingest problems, context, and constraints; use judgement and reasoning to find a solution; then implement the solution. Say you have an issue with your insurance and can’t get a prescription filled. A call center worker ingests context (your chart, your insurance policy), considers constraints (must a doctor sign off?), then presents and implements a solution. An accounts payable clerk does almost the same thing for invoices. Most of it is tool use and procedure-following, not PhD-level science. There is so much of this work to do!

Sierra built a benchmark called tau-bench to test agents on multi-turn customer service tasks across retail, airline, and telecom. The current ceiling is around an 85% pass rate. A year ago, it was below 70%. Voice agents jumped from 30% to 67% in the last eight months. These benchmarks are nearing saturation, at which point I’d argue the first version of a sufficiently intelligent agent is ready.

The obvious objection is that some of this work, like self-driving, tolerates no errors. I expect these agents to far exceed average human capability in these roles, just as Waymo already far exceeds the average human driver. Good harnesses and instructions will produce significantly more deterministic results than human equivalents can. And many of these jobs are lower stakes; human intervention is possible if something goes wrong. Adoption will be much faster in these fields.

Once we get to this point with frontier models and harnesses, what remains is compression. Andrej Karpathy described this on Dwarkesh’s podcast as moving toward a “cognitive core”: stripping out broad world knowledge and leaving behind reasoning, action, and common sense in a much smaller model. Labs should keep investing in frontier models, but frontier models won’t be cost-effective for doing most work. There’s a separate economic case for the frontier (possibly outcome-based pricing, where labs charge millions for the tokens that discover a new cancer treatment), but that’s a much smaller market than administrative and knowledge work.

To run the agent business profitably, labs must optimize for cost. Serving sufficient intelligence at the scale of human labor is a compute problem before anything else. A lab that owns most of its stack, from chips and data centers up through model architecture and the serving runtime, can co-design every layer and optimize them jointly rather than one at a time. That integrated stack could drive the cost per unit of useful work below what anyone assembling the same product from three different companies can match.
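To make “cost per unit of useful work” concrete, here is a minimal sketch of one way to define it: the expected spend to get one task completed successfully, counting retries. The function name, the token prices, the task size, and the pass rate are all hypothetical numbers chosen for illustration, and the model assumes attempts are independent, which real workloads may not satisfy.

```python
def cost_per_completed_task(
    usd_per_million_tokens: float,
    tokens_per_task: int,
    pass_rate: float,
) -> float:
    """Expected spend to get one successful task, retrying failures.

    Assumes attempts are independent and each succeeds with probability
    `pass_rate`, so the expected number of attempts is 1 / pass_rate.
    """
    if not 0.0 < pass_rate <= 1.0:
        raise ValueError("pass_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_task / 1_000_000 * usd_per_million_tokens
    return cost_per_attempt / pass_rate


# Hypothetical inputs: same pass rate and task size, different serving cost.
# The prices are made up; they are not quotes from any lab.
integrated = cost_per_completed_task(2.0, 20_000, 0.85)
assembled = cost_per_completed_task(5.0, 20_000, 0.85)

if __name__ == "__main__":
    print(f"integrated stack: ${integrated:.4f} per completed task")
    print(f"assembled stack:  ${assembled:.4f} per completed task")
```

Under this framing a lab can lower the figure in three ways: cheaper tokens, fewer tokens per task, or a higher pass rate. A vertically integrated stack plausibly has levers on all three, while a company assembling parts from several vendors mostly controls the harness.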

The labs will sell these agents to enterprise users, and the only thing left to add will be internal data. Any company will be able to buy the agent product from OpenAI or Anthropic directly, give it everything the company has, and use it to replace human work at scale. The same will be true in industries that require voice agents, like call centers and customer support. Companies will buy an “agent-in-a-box,” onboard it as they would a human employee, and replace entire departments in a matter of weeks.

Switching costs will be high because onboarding these agents requires ingesting and organizing so much data. Just as with new employees, a “training cost” still exists. Humans must verify that agents are doing the job correctly during this onboarding, and behavior may vary slightly between agent providers. No company fires its whole workforce every couple of months, retrains a whole new team, and expects identical performance immediately. You won’t swap agent providers for the same reason.

These products will not be one-size-fits-all. Different work needs different intelligence, and labs can build a price ladder around this. A lab selling agent products will offer cheaper models for call center work, more expensive ones for financial analysis, and maybe even frontier models for novel research. All the models will perform similarly on tau-bench, but not on a quantum physics exam.
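A minimal sketch of how such a price ladder might be applied: give each kind of work a required capability level and pick the cheapest tier that meets it. The tier names, capability scores, and prices below are hypothetical placeholders, not real products or benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capability: int                # hypothetical 1-10 score, not a real benchmark
    usd_per_million_tokens: float  # hypothetical price


# Hypothetical ladder; every number here is made up for illustration.
LADDER = [
    Tier("frontier", capability=10, usd_per_million_tokens=30.0),
    Tier("analyst", capability=7, usd_per_million_tokens=5.0),
    Tier("core", capability=4, usd_per_million_tokens=0.5),
]


def cheapest_sufficient_tier(required_capability: int, ladder=LADDER) -> Tier:
    """Return the lowest-priced tier whose capability meets the requirement."""
    sufficient = [t for t in ladder if t.capability >= required_capability]
    if not sufficient:
        raise ValueError("no tier is capable enough for this task")
    return min(sufficient, key=lambda t: t.usd_per_million_tokens)


if __name__ == "__main__":
    jobs = [("call center", 3), ("financial analysis", 6), ("novel research", 9)]
    for job, needed in jobs:
        print(job, "->", cheapest_sufficient_tier(needed).name)
```

The design choice worth noticing is that the selection rule is “sufficient, then cheapest,” not “best available.” That is the sense in which the work is priced by the intelligence it needs, not by the intelligence the lab can produce.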

A big objection to all of this is distillation. If any model can be approximated via distillation, customers can just swap an API key and capture the same work at lower cost. This won’t work as well for agent-in-a-box products. Distillation works for approximating model behavior, and maybe the open-source community can replicate the harness, but a modular approach assembled from separate parts will be difficult to optimize across the whole stack. The vertical approach of “our compute, our model, our harness” will be easier to cost-optimize.

Among current labs, Google is best positioned structurally. It’s the only lab that owns chips, data centers, models, and a distribution channel into Workspace and Cloud: every layer of the stack. While Google is still catching up on agentic products, it has a bit of time before this product category is mature. I anticipate OpenAI and Anthropic will build the first products.

Sufficient intelligence is already here. Sufficient agency is right around the corner. Once the labs build a sufficiently intelligent agent that any company can buy, onboard, and integrate into its workforce, they will have cracked the code on turning tokens into economic productivity.