Protocol · Provider arbitrage × embedding-based classification → adaptive cost ceiling

The Multi-Provider Router

Stop paying Opus prices for Haiku problems.

devsAI power usersfounders running AI products

Also in Operator's AI

I run four practices. My business model is that I refuse to specialise.· A lawyer who builds software, a producer who reads Friston, a writer with two bar memberships. The orthodoxy says pick one. The numbers say something else.
The Reel Factory· The protocol that built the reels you're watching.
The Bias Mirror· Use AI as critic, not cheerleader. Friston's price for honesty.

See all in Operator's AI ↗

Artifact

typescript
// Router classifies the task profile, then routes.
async function route(task: Task): Promise<Response> {
  const profile = await classify(task);

  if (profile.simple_extraction || profile.classification) {
    return call("haiku-4.5", task);            // $0.80/M
  }

  if (profile.huge_context && !profile.reasoning) {
    return call("gemini-2.5-flash", task);     // 1M ctx, $0.30/M
  }

  if (profile.multi_step_reasoning) {
    return call("sonnet-4.6", task);           // $3/M
  }

  if (profile.frontier_coordination) {
    return call("opus-4.7", task);             // $15/M
  }

  // Fallback chain: try cheap first, escalate on confidence drop
  return tryChain(["haiku-4.5", "sonnet-4.6", "opus-4.7"], task);
}

The router. Classifier first; price-floor second; fallback chain only when both miss.

What you need to know

embedding-based classification
prompt-token vs completion-token economics
provider SLA differences (rate limits, region pinning)
the latency-cost-quality trilemma
fallback chain design
LiteLLM / OpenRouter proxy patterns
the 80%-of-traffic-on-cheap-tier rule

If a name is unfamiliar, that's the gap. The list is the curriculum.

The recipe

01
Audit your last 1000 requests. Tag each as classification, extraction, reasoning, or coordination.
02
Train (or fine-tune a prompt for) a classifier that returns the profile in ~30 tokens.
03
Build the routing table. Cheapest model capable of each profile wins by default.
04
Add a confidence check. If the cheap model returns low confidence, escalate one tier. Never two.
05
Pin a quality canary. 1% of traffic always goes to the premium tier; compare outputs daily.
06
Track cost-per-task and resolution-rate per route. The trilemma is real; the dashboard makes it negotiable.
07
Re-tune monthly. New model releases invalidate the table.

Receipt

Cost per Proxi conversation dropped from $0.31 to $0.07 with no resolution-rate change.

Three months of production traffic on a WhatsApp operator handling ~2400 conversations/day. Haiku handles 71% of turns; Sonnet handles 24%; Opus 5%. Canary comparisons showed Haiku indistinguishable on those 71% from Sonnet, within margin of noise.

Why it works

Most teams pick one model and pay frontier prices on extraction tasks. Routing flips the default: cheap unless proven otherwise. The classifier costs ~30 tokens; the savings on a single rerouted task usually exceed 100 tokens. At any volume the maths only goes one way. The discipline is the canary, not the routing; without it you can't tell when a tier change degrades quality silently.

Cited

← Previous in Operator's AI

The Bias Mirror

Use AI as critic, not cheerleader. Friston's price for honesty.

Next in Operator's AI →

The Reel Factory

The protocol that built the reels you're watching.