Protocol · Provider arbitrage × embedding-based classification → adaptive cost ceiling

The Multi-Provider Router

Stop paying Opus prices for Haiku problems.

devsAI power usersfounders running AI products
Also in Operator's AI
See all in Operator's AI
Artifact
typescript
// Router classifies the task profile, then routes. async function route(task: Task): Promise<Response> { const profile = await classify(task); if (profile.simple_extraction || profile.classification) { return call("haiku-4.5", task); // $0.80/M } if (profile.huge_context && !profile.reasoning) { return call("gemini-2.5-flash", task); // 1M ctx, $0.30/M } if (profile.multi_step_reasoning) { return call("sonnet-4.6", task); // $3/M } if (profile.frontier_coordination) { return call("opus-4.7", task); // $15/M } // Fallback chain: try cheap first, escalate on confidence drop return tryChain(["haiku-4.5", "sonnet-4.6", "opus-4.7"], task); }
The router. Classifier first; price-floor second; fallback chain only when both miss.
What you need to know
  • embedding-based classification
  • prompt-token vs completion-token economics
  • provider SLA differences (rate limits, region pinning)
  • the latency-cost-quality trilemma
  • fallback chain design
  • LiteLLM / OpenRouter proxy patterns
  • the 80%-of-traffic-on-cheap-tier rule

If a name is unfamiliar, that's the gap. The list is the curriculum.

The recipe
  1. 01

    Audit your last 1000 requests. Tag each as classification, extraction, reasoning, or coordination.

  2. 02

    Train (or fine-tune a prompt for) a classifier that returns the profile in ~30 tokens.

  3. 03

    Build the routing table. Cheapest model capable of each profile wins by default.

  4. 04

    Add a confidence check. If the cheap model returns low confidence, escalate one tier. Never two.

  5. 05

    Pin a quality canary. 1% of traffic always goes to the premium tier; compare outputs daily.

  6. 06

    Track cost-per-task and resolution-rate per route. The trilemma is real; the dashboard makes it negotiable.

  7. 07

    Re-tune monthly. New model releases invalidate the table.

Receipt
Cost per Proxi conversation dropped from $0.31 to $0.07 with no resolution-rate change.
Three months of production traffic on a WhatsApp operator handling ~2400 conversations/day. Haiku handles 71% of turns; Sonnet handles 24%; Opus 5%. Canary comparisons showed Haiku indistinguishable on those 71% from Sonnet, within margin of noise.
Why it works

Most teams pick one model and pay frontier prices on extraction tasks. Routing flips the default: cheap unless proven otherwise. The classifier costs ~30 tokens; the savings on a single rerouted task usually exceed 100 tokens. At any volume the maths only goes one way. The discipline is the canary, not the routing; without it you can't tell when a tier change degrades quality silently.

Cited
← Previous in Operator's AI
The Bias Mirror
Use AI as critic, not cheerleader. Friston's price for honesty.
Next in Operator's AI
The Reel Factory
The protocol that built the reels you're watching.