The Great Token Arbitrage

One striking feature of AI software today is that we’re seeing what looks like two very different prices for essentially the same resource: model tokens.

If you build AI products on APIs, you pay per token.

If you use products like Claude Code, ChatGPT, Codex, Gemini, or similar subscription offerings, you effectively pay a fixed monthly fee while consuming what appears to be a much larger amount of inference than the equivalent API bill would suggest.

On social platforms, customers share screenshots and estimate that they’ve consumed hundreds, sometimes thousands, of dollars’ worth of API-equivalent tokens while paying only a fraction of that amount through a subscription.

The arithmetic is easy to run: Claude’s Max plans cost US$100–200 a month, while public Claude Code users have posted API-equivalent estimates ranging from hundreds of dollars in a week to several thousand dollars in a month, depending on model mix, caching, and workload.

Whether those calculations are perfectly accurate isn’t the important part.

The important part is that a meaningful pricing gap appears to exist.

Whenever the same resource is sold at multiple effective prices, markets naturally look for arbitrage.

Two Prices for the Same Intelligence

flowchart LR
    A[Foundation Model]
    A -->|API Pricing<br/>Pay per token| B[AI SaaS]
    A -->|Subscription Pricing<br/>Flat monthly fee| C[Claude Code / ChatGPT / Codex]
    B --> D[End Users]
    C --> D
    D -. Compares both .-> B

This isn’t arbitrage in the traditional financial sense.

You can’t buy API tokens and resell them.

(Well — almost: a grey market of proxies reselling subscription access through API-shaped endpoints already exists, typically against provider terms. Its existence may be the strongest evidence the gap is real.)

Instead, it’s pricing arbitrage.

Developers increasingly choose whichever access path delivers the most intelligence for the lowest effective cost.

That raises an obvious question.

Why does this pricing gap exist?

There are a few plausible explanations.

API pricing may include a substantial premium over the marginal cost of serving inference.
Subscription users appear to receive inference at a substantially lower effective price than API-equivalent usage would imply.
Or reality lies somewhere between those two observations.

There’s also a less dramatic explanation worth taking seriously: subscriptions are priced on the average subscriber, not the heaviest one.

The people posting screenshots are the top decile. Most subscribers use far less than they pay for, and rate limits cap the tail.

Like a gym, the plan can be profitable in aggregate even when every screenshot looks like a subsidy.

None of this makes the gap less real for a developer deciding how to buy intelligence — it just means the gap can persist longer than a naive arbitrage argument would predict.

The China Pricing Signal

The gap becomes even more interesting when looking at published pricing from Chinese frontier model providers.

DeepSeek’s published pricing is a useful signal rather than a proof: its most capable model, V4 Pro, currently lists at $0.435 per million input tokens and $0.87 per million output, against $5 and $25 for Claude Opus 4.8.

Even after allowing for capability gaps, architecture, caching, hardware economics, and possible subsidies, the spread suggests that frontier API prices may include more than raw serving cost.

That does not prove Western APIs are overpriced.

But it does support a useful hypothesis: frontier API pricing may also price in reliability, distribution, ecosystem access, compliance, product margin, and market position.

Why Would Anyone Price Inference This Way?

History has seen similar patterns before.

Companies routinely subsidize or discount products to acquire users.

Browsers
Cloud storage
Ride sharing
Streaming platforms
Payment networks

Sometimes the goal isn’t today’s profit.

It’s tomorrow’s dependency.

AI companies appear to be doing something similar.

The objective isn’t simply selling tokens.

The objective is becoming the place where developers spend every day.

The more developers build workflows around one ecosystem, the harder it becomes to leave.

Usage creates familiarity.

Familiarity creates dependence.

Dependence creates pricing power.

Whether this is an intentional strategy or simply aggressive competition is difficult to know from the outside, but the market incentives clearly point in that direction.

The Great AI SaaS Squeeze

If this pricing pattern persists, it creates a structural problem for AI-native software companies.

flowchart TB
    P[Foundation Model Provider]
    P -->|Owns, serves at cost| M[Claude Code<br/>ChatGPT<br/>Codex]
    P -->|Sells inference at API price| S[AI SaaS]
    U[Subscription User] -->|Flat monthly fee| M
    U -->|Compares against| S
    style S fill:#ffdddd

Every AI SaaS company wants to build better workflows: better collaboration, better domain expertise, and better customer experience. But many of them buy inference through APIs.

Meanwhile, users compare them against subscription products whose effective inference cost appears dramatically lower — products owned by the very companies setting the API price.

This creates a difficult economic reality.

The infrastructure providers own the models.

The application companies pay API prices.

The users compare both products.

As models become increasingly capable, more of the software value may shift toward the foundation model providers.

Application companies absorb the cost while competing against the providers themselves.

This is the pressure behind the great AI SaaS squeeze.

This Isn’t Just About Pricing

In a June 2026 essay, Microsoft CEO Satya Nadella warned that “every company across every sector is ceding value to a few models” — arguing that “our priority has to be building a frontier ecosystem, not just a frontier model.”

That observation extends beyond model quality.

If inference keeps moving toward commodity pricing, but only a handful of companies control the dominant access paths, then every application company built on top inherits those economics.

You can build an outstanding product.

You can delight customers.

But if your primary cost scales directly with API usage while your competitor effectively bundles inference into a subscription, your business starts from a disadvantage.

The problem isn’t your product.

The problem is the market structure.

Escaping the Squeeze

There are two developments that could fundamentally change these economics.

1. Open Models

As open models approach frontier capability, inference pricing can move closer to commodity pricing.

Instead of paying prices set primarily by scarce frontier access, software companies gain access to intelligence much closer to its actual operating cost.

Competition shifts away from controlling models toward operating them efficiently.

2. Intelligent Model Routing

Not every request deserves the largest model.

A routing layer can intelligently choose between:

Local models
Open-weight models
Frontier APIs
Specialized reasoning models
Multi-agent orchestration systems

Simple requests might run locally.

Medium-complexity work might use an inexpensive open model.

Only the hardest problems require the most expensive frontier models.

Instead of optimizing for model quality alone, software companies optimize for cost per useful answer.

These two forces race against the squeeze itself: every time frontier models pull further ahead, the closed ecosystems gain gravity; every time open models close the gap, the escape route widens.

The Future Stack

flowchart LR
    Q[User Request]
    Q --> R[Intelligent Router]
    R --> O[Open Models]
    R --> L[Local Models]
    R --> F[Frontier APIs]
    R --> G[Multi-Agent Systems]
    O --> A[Application]
    L --> A
    F --> A
    G --> A
    A --> U[User]

The future probably isn’t one model.

It’s many models — and tiers of agents orchestrating them.

This is already being productized: Sakana’s Fugu packages an entire multi-agent orchestration system as a single model — a learned router that delegates work across a swappable pool of frontier models and charges frontier prices for it.

Value is already accruing to the orchestration layer, not just the models beneath it.

The winning products won’t simply use the smartest AI.

They’ll dynamically select the right intelligence for each task.

Model routing may become just as important as model capability.

Intelligence Becomes the Commodity

Throughout computing history, expensive infrastructure eventually became infrastructure that everyone could access: compute, storage, networking, databases. There is good reason to believe inference follows the same path.

The long-term winners may not be those with the smartest single model.

They may be those who deliver the lowest cost per unit of useful intelligence.

That future depends on open models continuing to improve, and on routing systems becoming sophisticated enough to treat models as interchangeable infrastructure rather than products.

The token arbitrage we see today appears likely to be temporary.

Markets rarely tolerate persistent price gaps for the same commodity forever.

Eventually, one of three things happens.

API prices fall.
Subscription prices rise.
Open models force inference toward commodity pricing.

Regardless of which path wins, one thing seems increasingly clear.

The future competitive advantage won’t come from owning the only model.

It will come from building the most efficient system for turning intelligence into value — from winning on cost per useful answer.