the LLM landscape has 6 major providers, dozens of variants, and pricing from $0.14 to $75 per million tokens. this page helps you stop guessing and start matching.
in a well-routed system, roughly 62% of production traffic can go to cheap models, 27% to mid-tier, and only 11% needs frontier.
most teams overspend because they send everything to the same model.
ranked by what they actually win at, not marketing benchmarks.
the answer to "which model?" is always "for what?"
| task | best pick | runner up | budget pick |
|---|---|---|---|
| code generation | Claude 4.5 Opus | Claude Sonnet 4.5 | DeepSeek-V3.2 |
| math / formal reasoning | GPT-5.2 | Gemini 3 Deep Think | DeepSeek-R1 |
| scientific research | Gemini 3 Deep Think | GPT-5.2 | Gemini 2.5 Flash |
| long document analysis | Gemini 3 Pro (1M) | Llama 4 Scout (10M) | Gemini 2.5 Flash |
| creative writing | Claude 4.5 Opus | GPT-5.2 | Llama 4 |
| classification / extraction | Gemini 2.5 Flash | GPT-5 Mini | DeepSeek-V3.2 |
| agentic workflows | Claude Sonnet 4.5 | GPT-5.2 | Grok 4.1 Fast |
| multimodal (image/video) | Gemini 3 Pro | GPT-5.2 | Gemini 2.5 Flash |
| summarization | Claude Sonnet 4.5 | Gemini 2.5 Flash | DeepSeek-V3.2 |
| structured JSON output | GPT-5.2 | Claude Sonnet 4.5 | Gemini 2.5 Flash |
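the table above, as code — a minimal lookup sketch. the model names are the table's picks; the `pick` helper and tier labels are ours:

```python
# best / runner-up / budget picks from the table above, as a lookup.
PICKS = {
    "code generation":             ("Claude 4.5 Opus", "Claude Sonnet 4.5", "DeepSeek-V3.2"),
    "math / formal reasoning":     ("GPT-5.2", "Gemini 3 Deep Think", "DeepSeek-R1"),
    "scientific research":         ("Gemini 3 Deep Think", "GPT-5.2", "Gemini 2.5 Flash"),
    "long document analysis":      ("Gemini 3 Pro (1M)", "Llama 4 Scout (10M)", "Gemini 2.5 Flash"),
    "creative writing":            ("Claude 4.5 Opus", "GPT-5.2", "Llama 4"),
    "classification / extraction": ("Gemini 2.5 Flash", "GPT-5 Mini", "DeepSeek-V3.2"),
    "agentic workflows":           ("Claude Sonnet 4.5", "GPT-5.2", "Grok 4.1 Fast"),
    "multimodal (image/video)":    ("Gemini 3 Pro", "GPT-5.2", "Gemini 2.5 Flash"),
    "summarization":               ("Claude Sonnet 4.5", "Gemini 2.5 Flash", "DeepSeek-V3.2"),
    "structured JSON output":      ("GPT-5.2", "Claude Sonnet 4.5", "Gemini 2.5 Flash"),
}

TIERS = {"best": 0, "runner_up": 1, "budget": 2}

def pick(task: str, tier: str = "best") -> str:
    """return the table's pick for a task at a given tier."""
    return PICKS[task][TIERS[tier]]
```

drop this dict into your router and the table stops being advice and starts being config.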
an arXiv study found that a nominally cheaper model can end up costing 28x more per task once its token verbosity is counted. list price is not real price.
a model-agnostic architecture with rule-based routing can cut token costs by 40-60%.
use a tiny model or heuristic to score task complexity. simple extraction? cheap tier. multi-step reasoning? escalate. this classifier costs almost nothing.
route to the cheapest model that can handle the complexity. most requests are simpler than you think. only escalate when the cheap model's confidence is low.
check output quality. if the cheap model failed, retry with mid-tier. if mid-tier failed, hit frontier. cascading saves money without sacrificing quality.
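the three steps above, sketched in python. everything here is a placeholder shape: the keyword heuristic is a stand-in for a real classifier, and `call_model` / `looks_ok` are hooks you'd wire to your own client and quality check:

```python
from typing import Callable

# tier ladder, cheapest first — e.g. Gemini 2.5 Flash -> GPT-5 -> Claude 4.5 Opus
TIERS = ["cheap", "mid", "frontier"]

def complexity(prompt: str) -> int:
    """crude heuristic scorer: 0 = cheap tier, 1 = mid, 2 = frontier.
    a tiny classifier model does this better; this is a stand-in."""
    score = 0
    if len(prompt) > 2000:
        score += 1
    if any(k in prompt.lower() for k in ("prove", "step by step", "plan", "debug")):
        score += 1
    return min(score, 2)

def route(prompt: str,
          call_model: Callable[[str, str], str],
          looks_ok: Callable[[str], bool]) -> str:
    """start at the tier the heuristic picks, escalate only on failure."""
    answer = ""
    for tier in TIERS[complexity(prompt):]:
        answer = call_model(tier, prompt)
        if looks_ok(answer):
            return answer
    return answer  # frontier's output, even if the check still fails
```

the cascade only pays for escalation when the cheap answer actually fails the check — that's where the 40-60% comes from.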
coding goes to Claude. math goes to GPT-5. science goes to Gemini. if you don't know, start with GPT-5.2 as the generalist.
under 128K: any model works. 128K-1M: Gemini or Grok. over 1M: Llama 4 Scout or Gemini. context length eliminates options fast.
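those buckets as a filter — a sketch that mirrors the cutoffs in the text; the function name and return strings are ours:

```python
def context_options(tokens: int) -> list[str]:
    """which models stay on the table at a given context size.
    buckets follow the text: <128K anything, 128K-1M Gemini or Grok,
    >1M Llama 4 Scout or Gemini."""
    if tokens < 128_000:
        return ["any model"]
    if tokens <= 1_000_000:
        return ["Gemini 3 Pro", "Grok 4.1"]
    return ["Llama 4 Scout (10M)", "Gemini"]
```

run this filter before anything else — if it returns one option, the rest of the decision tree is moot.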
under $100/mo: DeepSeek or Gemini Flash. $100-1000/mo: Sonnet or GPT-5. unlimited: Opus or GPT-5 Pro. be honest about this upfront.
can your data leave your infrastructure? if no: Llama 4, DeepSeek, or Qwen (self-host). if yes: any API provider. data sovereignty is a hard constraint, not a preference.
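the budget and sovereignty checks compose as one filter — a sketch; the cutoffs and model lists come from the text, the function name is ours:

```python
def candidates(monthly_budget_usd: float, data_can_leave: bool) -> list[str]:
    """apply the hard constraint first (sovereignty), then budget."""
    if not data_can_leave:
        # self-host only, whatever the budget
        return ["Llama 4", "DeepSeek", "Qwen"]
    if monthly_budget_usd < 100:
        return ["DeepSeek", "Gemini 2.5 Flash"]
    if monthly_budget_usd <= 1000:
        return ["Claude Sonnet 4.5", "GPT-5"]
    return ["Claude 4.5 Opus", "GPT-5 Pro"]
```

note the order: sovereignty short-circuits budget entirely, because no price makes a forbidden API allowed.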
single model: pick the best for your task. pipeline: use cheap models for 90% of steps, frontier for the hard 10%. this is where routing pays off.
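the 90/10 split in miniature — the step names and tier assignments below are purely illustrative:

```python
# a hypothetical 4-step pipeline: only the last step pays frontier prices.
PIPELINE = [
    ("extract fields",   "cheap"),     # e.g. Gemini 2.5 Flash / DeepSeek-V3.2
    ("classify intent",  "cheap"),
    ("summarize chunks", "cheap"),
    ("final synthesis",  "frontier"),  # e.g. Claude 4.5 Opus / GPT-5.2
]

def frontier_share(pipeline: list[tuple[str, str]]) -> float:
    """fraction of steps that hit the expensive tier."""
    return sum(tier == "frontier" for _, tier in pipeline) / len(pipeline)
```

if `frontier_share` creeps above a quarter or so, your pipeline is probably under-decomposed: split the hard step until only the genuinely hard part is left.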
the model you pick matters less than knowing when to switch.
bookmark this page. it gets updated when the landscape shifts.