The Multi-Provider Router
Stop paying Opus prices for Haiku problems.
- I run four practices. My business model is that I refuse to specialise.· A lawyer who builds software, a producer who reads Friston, a writer with two bar memberships. The orthodoxy says pick one. The numbers say something else.
- The Reel Factory· The protocol that built the reels you're watching.
- The Bias Mirror· Use AI as critic, not cheerleader. Friston's price for honesty.
typescript// Router classifies the task profile, then routes. async function route(task: Task): Promise<Response> { const profile = await classify(task); if (profile.simple_extraction || profile.classification) { return call("haiku-4.5", task); // $0.80/M } if (profile.huge_context && !profile.reasoning) { return call("gemini-2.5-flash", task); // 1M ctx, $0.30/M } if (profile.multi_step_reasoning) { return call("sonnet-4.6", task); // $3/M } if (profile.frontier_coordination) { return call("opus-4.7", task); // $15/M } // Fallback chain: try cheap first, escalate on confidence drop return tryChain(["haiku-4.5", "sonnet-4.6", "opus-4.7"], task); }
- embedding-based classification
- prompt-token vs completion-token economics
- provider SLA differences (rate limits, region pinning)
- the latency-cost-quality trilemma
- fallback chain design
- LiteLLM / OpenRouter proxy patterns
- the 80%-of-traffic-on-cheap-tier rule
If a name is unfamiliar, that's the gap. The list is the curriculum.
- 01
Audit your last 1000 requests. Tag each as classification, extraction, reasoning, or coordination.
- 02
Train (or fine-tune a prompt for) a classifier that returns the profile in ~30 tokens.
- 03
Build the routing table. Cheapest model capable of each profile wins by default.
- 04
Add a confidence check. If the cheap model returns low confidence, escalate one tier. Never two.
- 05
Pin a quality canary. 1% of traffic always goes to the premium tier; compare outputs daily.
- 06
Track cost-per-task and resolution-rate per route. The trilemma is real; the dashboard makes it negotiable.
- 07
Re-tune monthly. New model releases invalidate the table.
Most teams pick one model and pay frontier prices on extraction tasks. Routing flips the default: cheap unless proven otherwise. The classifier costs ~30 tokens; the savings on a single rerouted task usually exceed 100 tokens. At any volume the maths only goes one way. The discipline is the canary, not the routing; without it you can't tell when a tier change degrades quality silently.
- Anthropic API pricingAnthropic · primary sourceLive pricing for Opus / Sonnet / Haiku tiers.
- Gemini API pricingGoogle · primary sourceLive pricing for Gemini 2.5 Pro / Flash / Flash-Lite.
- LiteLLM proxyBerriAI · primary sourceProvider-agnostic routing layer. Makes the cheap-by-default discipline easy to ship.