Claude Models in 2026: Opus, Sonnet, and Haiku Compared
Picking the wrong Claude model is expensive. Running Opus on every task costs roughly two-thirds more than Sonnet for comparable results on most work. Running Haiku on a complex reasoning task produces worse output than simply asking Sonnet. And if you are still using models from early 2025, some of them are already deprecated or soon will be.
This guide covers every current Claude model, what each is good at, how much they cost, and a concrete decision framework for choosing the right one.
The Current Model Lineup
As of May 2026, Anthropic has three active model families: Opus (flagship), Sonnet (balanced), and Haiku (fastest). Each family has reached version 4.5 or higher.
| Model | API ID | Context | Max Output | Speed | Input | Output |
|---|---|---|---|---|---|---|
| Claude Opus 4.7 | claude-opus-4-7 | 1M tokens | 128k tokens | Moderate | $5/MTok | $25/MTok |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1M tokens | 64k tokens | Fast | $3/MTok | $15/MTok |
| Claude Haiku 4.5 | claude-haiku-4-5-20251001 | 200k tokens | 64k tokens | Very fast | $1/MTok | $5/MTok |
MTok = per million tokens.
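To see what the table means in dollars, here is a quick cost comparison for a hypothetical workload. The per-token rates come from the pricing table above; the workload numbers (1,000 requests, 20k input / 2k output tokens each) are invented for illustration:

```python
# Per-million-token rates from the pricing table above (USD).
PRICING = {
    "claude-opus-4-7":   {"input": 5.00, "output": 25.00},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4-5":  {"input": 1.00, "output": 5.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at standard (non-batch) rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# Hypothetical workload: 1,000 requests, 20k input and 2k output tokens each.
for model in PRICING:
    total = 1_000 * request_cost(model, 20_000, 2_000)
    print(f"{model}: ${total:,.2f}")
```

On this workload the spread is $150 (Opus) vs $90 (Sonnet) vs $30 (Haiku), which is why routing even a fraction of traffic to a cheaper model adds up.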
What Each Model Is For
Claude Opus 4.7 — Flagship Reasoning
Opus 4.7 is Anthropic’s most capable model. It uses adaptive thinking — the model decides when a question requires deep reasoning and applies it automatically. You cannot manually control the thinking budget on Opus 4.7; it handles that decision internally.
Compared to its predecessor Opus 4.6, Anthropic describes Opus 4.7 as a “step-change improvement” specifically in agentic coding — multi-step tasks where the model has to plan, execute, verify, and loop.
Best for:
- Complex multi-file refactors where understanding the whole system matters
- Architecture decisions involving trade-offs and long-term consequences
- Agentic pipelines where the model runs autonomously over many steps
- Security audits requiring reasoning about business logic, not just pattern matching
- Any task where you have tried Sonnet and found the output quality lacking
Not worth it for:
- Standard implementation tasks (writing a function, adding a test, fixing a syntax error)
- Summarisation or reformatting work
- Simple lookups or one-question queries
Benchmarks:
- SWE-bench Verified: 80.8% (industry-leading code editing)
- OSWorld computer use: ~72.7%
- Knowledge cutoff: January 2026
Claude Sonnet 4.6 — The Everyday Default
Sonnet 4.6 is what most engineers should reach for by default. It supports both extended thinking and adaptive thinking, has a 1M token context window (the same as Opus), and costs 40% less. In developer surveys, 70% of engineers prefer Sonnet 4.6 over its predecessor Sonnet 4.5, and 59% prefer it over Opus 4.5 — meaning it punches well above what the price difference suggests.
The 1M token context window became generally available in March 2026, meaning you can feed an entire large codebase, a day’s worth of logs, or hundreds of documents into a single prompt without any special flags.
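As a rough sanity check before sending a large corpus, you can estimate token counts with the common heuristic of about four characters per token. This is an approximation, not a tokenizer; exact counts require the API's token-counting endpoint:

```python
# Rough heuristic: ~4 characters per token for typical English text and code.
CHARS_PER_TOKEN = 4

def estimated_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def fits_context(text: str, context_window: int = 1_000_000,
                 reserve_for_output: int = 64_000) -> bool:
    """True if the text likely fits alongside the reserved output budget."""
    return estimated_tokens(text) + reserve_for_output <= context_window

# A 2 MB log file is roughly 500k tokens: comfortably inside a 1M window.
log_text = "x" * 2_000_000
print(estimated_tokens(log_text), fits_context(log_text))
```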
Best for:
- Daily coding work: implementing features, fixing bugs, writing tests
- Code review on pull requests
- DevOps automation: reviewing Terraform, generating Kubernetes manifests, debugging CI failures
- Long-context tasks: analysing large log files, reading entire codebases
- Any task where you want a capable model at a reasonable price
Benchmarks:
- SWE-bench Verified: 79.6% (only 1.2 percentage points behind Opus at 40% lower cost)
- OSWorld computer use: 72.5%
- Knowledge cutoff: August 2025
Claude Haiku 4.5 — Speed and Scale
Haiku 4.5 runs at approximately 97 tokens per second (83% faster than Sonnet 4.6 and 116% faster than Opus 4.6). At $1/$5 per million tokens, it is also one-third the price of Sonnet and one-fifth the price of Opus.
Despite being the “budget” option, Haiku 4.5 scores 73.3% on SWE-bench Verified, which is competitive with far more expensive models from other providers.
Best for:
- High-volume, low-latency tasks: processing thousands of files, batch classification, bulk code formatting
- Simple lookups: “what does this function return?”, “rename this variable”
- First-pass triage before escalating to a more capable model
- Cost-sensitive production applications where response speed matters
- The “executor” role in an Advisor Tool setup (more on that below)
Limitations:
- 200k token context only (vs 1M for Opus and Sonnet)
- Knowledge cutoff: February 2025 — over a year out of date
- Will miss nuance in complex reasoning tasks
Choosing a Model: Decision Framework
```mermaid
flowchart TD
    Q1{Is the task complex reasoning?\nArchitecture, multi-step agentic, security audit} -->|Yes| Opus[Use Opus 4.7]
    Q1 -->|No| Q2{Is it standard dev work?\nImplementation, debugging, code review, DevOps}
    Q2 -->|Yes| Sonnet[Use Sonnet 4.6\nDefault choice]
    Q2 -->|No| Q3{Simple, fast, or high-volume?\nFormatting, lookups, summaries, bulk processing}
    Q3 -->|Yes| Haiku[Use Haiku 4.5]
    Q3 -->|No| Sonnet
```
The practical default: start with Sonnet 4.6. Upgrade to Opus only when you find Sonnet’s output quality is not good enough for the specific task. Downgrade to Haiku for tasks where speed or cost matters more than depth.
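The same routing logic can be sketched as a small helper. The model IDs come from the lineup table; the task-category labels are illustrative, not an official taxonomy:

```python
def pick_model(task: str) -> str:
    """Route a task category to a model ID, following the decision framework."""
    complex_reasoning = {"architecture", "agentic-pipeline", "security-audit"}
    standard_dev = {"implementation", "debugging", "code-review", "devops"}
    high_volume = {"formatting", "lookup", "summary", "bulk-processing"}

    if task in complex_reasoning:
        return "claude-opus-4-7"
    if task in standard_dev:
        return "claude-sonnet-4-6"
    if task in high_volume:
        return "claude-haiku-4-5"
    return "claude-sonnet-4-6"  # when in doubt, default to Sonnet

print(pick_model("security-audit"))   # claude-opus-4-7
print(pick_model("bulk-processing"))  # claude-haiku-4-5
print(pick_model("debugging"))        # claude-sonnet-4-6
```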
The Advisor Tool Pattern
One advanced pattern that dramatically reduces cost for long agent runs: pair a fast executor model with a high-intelligence advisor model. The advisor provides strategic guidance mid-run; the executor does the actual work.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-haiku-4-5-20251001",  # fast executor
    max_tokens=4096,
    extra_headers={"anthropic-beta": "advisor-tool-2026-03-01"},
    tools=[
        {
            "type": "advisor",
            "advisor_model": "claude-opus-4-7",  # intelligent advisor
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Refactor our entire test suite to use the new async patterns"
        }
    ]
)
```
The Haiku executor handles the repetitive file editing and test running. When it encounters a decision point — which pattern to use, how to handle an edge case — it calls the Opus advisor. You get near-Opus quality at a fraction of the cost for long pipelines.
Prompt Caching: Making Any Model Cheaper
All three models support prompt caching, which stores a prefix of your prompt server-side and charges a fraction of the normal rate on repeated requests. A cached read costs 90% less than a fresh input.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": open("large-codebase-context.txt").read(),
            "cache_control": {"type": "ephemeral"}  # cache this prefix
        }
    ],
    messages=[{"role": "user", "content": "Why is the auth service slow?"}]
)
```
The cache has a 5-minute TTL by default (1.25x write cost, 0.1x read cost). A 1-hour cache is available at 2x write cost, 0.1x read cost. For a codebase context you reuse many times in a session, this pays for itself quickly.
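A quick back-of-the-envelope calculation shows how fast this pays off. The multipliers come from the paragraph above; the 200k-token context and 20-request session are arbitrary example numbers:

```python
# Cache pricing multipliers relative to the normal input rate (from above).
WRITE_MULT = 1.25   # first request writes the cache at 1.25x
READ_MULT = 0.10    # subsequent requests read it at 0.1x

def uncached_cost(context_tokens: int, requests: int, rate_per_mtok: float) -> float:
    """Input cost if the full context is re-sent at the normal rate every time."""
    return context_tokens * rate_per_mtok / 1_000_000 * requests

def cached_cost(context_tokens: int, requests: int, rate_per_mtok: float) -> float:
    """Input cost with one cache write plus (requests - 1) cached reads."""
    write = context_tokens * WRITE_MULT * rate_per_mtok / 1_000_000
    reads = context_tokens * READ_MULT * rate_per_mtok / 1_000_000 * (requests - 1)
    return write + reads

# 200k-token codebase context reused across 20 requests at Sonnet's $3/MTok:
print(f"uncached: ${uncached_cost(200_000, 20, 3.0):.2f}")  # uncached: $12.00
print(f"cached:   ${cached_cost(200_000, 20, 3.0):.2f}")    # cached:   $1.89
```

Even at the 1.25x write premium, the cache breaks even on the second request and the session ends up roughly 6x cheaper on input tokens.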
As of February 2026, automatic caching is available — add one cache_control field and Anthropic’s infrastructure automatically advances the cache breakpoint as your context grows. No manual management required.
Batch API: 50% Off for Non-Urgent Work
When you do not need an immediate response, the Batch API cuts costs in half across all models. Jobs complete within 24 hours and can return up to 300,000 output tokens per request.
```python
import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"review-{i}",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": f"Review PR #{i}"}]
            }
        }
        for i in range(100)
    ]
)

print(f"Batch created: {batch.id}")
```
Combined prompt caching and batch processing can reduce costs by up to 95% versus standard per-request pricing.
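The 95% figure follows directly from multiplying the two discounts, assuming the batch discount applies on top of cached-read pricing:

```python
batch_multiplier = 0.5        # Batch API: 50% off
cache_read_multiplier = 0.1   # prompt caching: cached reads cost 0.1x

combined = batch_multiplier * cache_read_multiplier
print(f"combined cost multiplier: {combined}")       # 0.05
print(f"savings vs standard: {(1 - combined):.0%}")  # 95%
```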
Deprecation Timeline: What to Migrate Away From
If you are running any of these model IDs in production, migrate them now:
| Model | Status | Deadline |
|---|---|---|
| claude-sonnet-4 | Deprecated | Retires June 15, 2026 |
| claude-opus-4 | Deprecated | Retires June 15, 2026 |
| claude-opus-4-1 | Active but expensive | $15/$75 MTok — replace with Opus 4.7 at $5/$25 |
| claude-sonnet-4-5 | Active, 200k context only | Migrate to 4.6 for 1M context |
| claude-3-7-sonnet | Retired Feb 19, 2026 | Already retired |
| claude-haiku-3 | Retired Apr 20, 2026 | Already retired |
| claude-opus-3 | Retired Jan 5, 2026 | Already retired |
Migrating from the deprecated claude-opus-4 or claude-sonnet-4 to the current generation is straightforward: change the model ID. No other API changes are required.
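Since the model ID is the whole change, a minimal migration shim can live in one dictionary. The replacement mapping below is assembled from the table above; request payloads stay identical:

```python
# Deprecated or expensive IDs mapped to their current replacements (table above).
MODEL_MIGRATIONS = {
    "claude-sonnet-4": "claude-sonnet-4-6",
    "claude-opus-4": "claude-opus-4-7",
    "claude-opus-4-1": "claude-opus-4-7",
    "claude-sonnet-4-5": "claude-sonnet-4-6",
}

def migrate_model_id(model: str) -> str:
    """Return the current model ID, leaving already-current IDs untouched."""
    return MODEL_MIGRATIONS.get(model, model)

print(migrate_model_id("claude-opus-4-1"))   # claude-opus-4-7
print(migrate_model_id("claude-haiku-4-5"))  # unchanged: claude-haiku-4-5
```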
Extended and Adaptive Thinking
All three current models support some form of reasoning:
- Haiku 4.5: Extended thinking (manual budget)
- Sonnet 4.6: Extended thinking (manual budget) + adaptive thinking
- Opus 4.7: Adaptive thinking only (breaking change — does not support manual type: "enabled")
If you are migrating code from Opus 4.6 to Opus 4.7 and using extended thinking, you must switch from:

```python
# Opus 4.6 — manual extended thinking
thinking={"type": "enabled", "budget_tokens": 10000}
```

to:

```python
# Opus 4.7 — adaptive thinking
thinking={"type": "adaptive"},
effort="high"  # low, medium, high (default), max
```
Adaptive thinking means the model decides when to reason deeply. The effort parameter controls how selective it is — max applies reasoning to almost everything, low applies it sparingly.
Quick Reference
| Use case | Recommended model | Why |
|---|---|---|
| Complex architecture review | Opus 4.7 | Reasoning quality |
| Daily coding and debugging | Sonnet 4.6 | Best balance |
| Large log file analysis | Sonnet 4.6 | 1M context |
| Long agentic pipelines | Opus 4.7 or Advisor pattern | Reliability |
| Bulk file processing | Haiku 4.5 | Speed and cost |
| Security audit | Opus 4.7 | Reasoning depth |
| Simple reformatting | Haiku 4.5 | Larger models are overkill |
| Production app (latency-sensitive) | Haiku 4.5 | ~97 tokens/sec |
| Non-urgent batch jobs | Any + Batch API | 50% discount |
The model family is evolving fast. Anthropic releases new versions every few months, and each generation brings meaningful capability improvements at similar or lower prices than the generation before. The practical implication: when a new Sonnet or Haiku version ships, it is worth re-evaluating whether you still need Opus for tasks that previously required it.
