Claude Models in 2026: Opus, Sonnet, and Haiku Compared

Picking the wrong Claude model is expensive. Opus on every task costs roughly two-thirds more per token than Sonnet for comparable results on most work. Haiku on a complex reasoning task produces worse output than just asking Sonnet. And if you are still using models from early 2025, some of them are deprecated — or will be soon.

This guide covers every current Claude model, what each is good at, how much they cost, and a concrete decision framework for choosing the right one.


The Current Model Lineup

As of May 2026, Anthropic has three active model families: Opus (flagship), Sonnet (balanced), and Haiku (fastest). Each family has reached version 4.5 or higher.

| Model | API ID | Context | Max Output | Speed | Input | Output |
|---|---|---|---|---|---|---|
| Claude Opus 4.7 | claude-opus-4-7 | 1M tokens | 128k tokens | Moderate | $5/MTok | $25/MTok |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1M tokens | 64k tokens | Fast | $3/MTok | $15/MTok |
| Claude Haiku 4.5 | claude-haiku-4-5-20251001 | 200k tokens | 64k tokens | Very fast | $1/MTok | $5/MTok |

MTok = per million tokens.


What Each Model Is For

Claude Opus 4.7 — Flagship Reasoning

Opus 4.7 is Anthropic’s most capable model. It uses adaptive thinking — the model decides when a question requires deep reasoning and applies it automatically. You cannot manually control the thinking budget on Opus 4.7; it handles that decision internally.

Compared to its predecessor Opus 4.6, Anthropic describes Opus 4.7 as a “step-change improvement” specifically in agentic coding — multi-step tasks where the model has to plan, execute, verify, and loop.

Best for:

  • Complex multi-file refactors where understanding the whole system matters
  • Architecture decisions involving trade-offs and long-term consequences
  • Agentic pipelines where the model runs autonomously over many steps
  • Security audits requiring reasoning about business logic, not just pattern matching
  • Any task where you have tried Sonnet and found the output quality lacking

Not worth it for:

  • Standard implementation tasks (writing a function, adding a test, fixing a syntax error)
  • Summarisation or reformatting work
  • Simple lookups or one-question queries

Benchmarks:

  • SWE-bench Verified: 80.8% (industry-leading code editing)
  • OSWorld computer use: ~72.7%
  • Knowledge cutoff: January 2026

Claude Sonnet 4.6 — The Everyday Default

Sonnet 4.6 is what most engineers should reach for by default. It supports both extended thinking and adaptive thinking, has a 1M token context window (the same as Opus), and costs 40% less. In developer surveys, 70% of engineers prefer Sonnet 4.6 over its predecessor Sonnet 4.5, and 59% prefer it over Opus 4.5 — meaning it punches well above what the price difference suggests.

The 1M token context window became generally available in March 2026, meaning you can feed an entire large codebase, a day’s worth of logs, or hundreds of documents into a single prompt without any special flags.

Best for:

  • Daily coding work: implementing features, fixing bugs, writing tests
  • Code review on pull requests
  • DevOps automation: reviewing Terraform, generating Kubernetes manifests, debugging CI failures
  • Long-context tasks: analysing large log files, reading entire codebases
  • Any task where you want a capable model at a reasonable price

Benchmarks:

  • SWE-bench Verified: 79.6% (only 1.2 percentage points behind Opus at 40% lower cost)
  • OSWorld computer use: 72.5%
  • Knowledge cutoff: August 2025

Claude Haiku 4.5 — Speed and Scale

Haiku 4.5 runs at approximately 97 tokens per second — 83% faster than Sonnet 4.6 and 116% faster than Opus 4.6. At $1/$5 per million tokens, it is also a third of the price of Sonnet and a fifth of the price of Opus.

Despite being the “budget” option, Haiku 4.5 scores 73.3% on SWE-bench Verified. That is competitive with models that cost far more from other providers.
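At those rates, a bulk job is cheap enough to estimate on the back of an envelope. A sketch, assuming illustrative per-file token counts (500 in, 200 out) that are not from the article:

```python
# Rough cost of a bulk job on Haiku 4.5 at $1/$5 per MTok.
# The per-file token counts are illustrative assumptions.
files = 10_000
input_cost = files * 500 / 1_000_000 * 1.0    # 5 MTok of input at $1/MTok
output_cost = files * 200 / 1_000_000 * 5.0   # 2 MTok of output at $5/MTok
print(f"total: ${input_cost + output_cost:.2f}")  # total: $15.00
```

Ten thousand files for about fifteen dollars is the kind of economics that makes first-pass triage with Haiku viable.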

Best for:

  • High-volume, low-latency tasks: processing thousands of files, batch classification, bulk code formatting
  • Simple lookups: “what does this function return?”, “rename this variable”
  • First-pass triage before escalating to a more capable model
  • Cost-sensitive production applications where response speed matters
  • The “executor” role in an Advisor Tool setup (more on that below)

Limitations:

  • 200k token context only (vs 1M for Opus and Sonnet)
  • Knowledge cutoff: February 2025 — over a year out of date
  • Will miss nuance in complex reasoning tasks

Choosing a Model: Decision Framework

flowchart TD
    Q1{Is the task complex reasoning?\nArchitecture, multi-step agentic, security audit} -->|Yes| Opus[Use Opus 4.7]
    Q1 -->|No| Q2{Is it standard dev work?\nImplementation, debugging, code review, DevOps}
    Q2 -->|Yes| Sonnet[Use Sonnet 4.6\nDefault choice]
    Q2 -->|No| Q3{Simple, fast, or high-volume?\nFormatting, lookups, summaries, bulk processing}
    Q3 -->|Yes| Haiku[Use Haiku 4.5]
    Q3 -->|No| Sonnet

The practical default: start with Sonnet 4.6. Upgrade to Opus only when you find Sonnet’s output quality is not good enough for the specific task. Downgrade to Haiku for tasks where speed or cost matters more than depth.
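The flowchart collapses into a few lines of routing code. A sketch — the boolean task flags are illustrative placeholders, not part of any Anthropic API:

```python
# The decision framework as a routing helper.
# Model IDs match the lineup table; the flags mirror the flowchart branches.

def pick_model(complex_reasoning: bool, standard_dev_work: bool,
               high_volume_or_simple: bool) -> str:
    """Map the decision flowchart onto current model IDs."""
    if complex_reasoning:
        return "claude-opus-4-7"
    if standard_dev_work:
        return "claude-sonnet-4-6"
    if high_volume_or_simple:
        return "claude-haiku-4-5-20251001"
    return "claude-sonnet-4-6"  # the flowchart's fall-through default

print(pick_model(False, True, False))  # daily coding -> claude-sonnet-4-6
```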


The Advisor Tool Pattern

One advanced pattern that dramatically reduces cost for long agent runs: pair a fast executor model with a high-intelligence advisor model. The advisor provides strategic guidance mid-run; the executor does the actual work.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-haiku-4-5-20251001",   # fast executor
    max_tokens=4096,
    extra_headers={"anthropic-beta": "advisor-tool-2026-03-01"},
    tools=[
        {
            "type": "advisor",
            "advisor_model": "claude-opus-4-7",  # intelligent advisor
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Refactor our entire test suite to use the new async patterns"
        }
    ]
)

The Haiku executor handles the repetitive file editing and test running. When it encounters a decision point — which pattern to use, how to handle an edge case — it calls the Opus advisor. You get near-Opus quality at a fraction of the cost for long pipelines.


Prompt Caching: Making Any Model Cheaper

All three models support prompt caching, which stores a prefix of your prompt server-side and charges a fraction of the normal rate on repeated requests. A cached read costs 90% less than a fresh input.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": open("large-codebase-context.txt").read(),
            "cache_control": {"type": "ephemeral"}  # cache this prefix
        }
    ],
    messages=[{"role": "user", "content": "Why is the auth service slow?"}]
)

The cache has a 5-minute TTL by default (1.25x write cost, 0.1x read cost). A 1-hour cache is available at 2x write cost, 0.1x read cost. For a codebase context you reuse many times in a session, this pays for itself quickly.
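To see how quickly it pays for itself, here is the arithmetic for a concrete (assumed) session: a 200k-token codebase context reused 20 times within the 5-minute TTL on Sonnet 4.6:

```python
# Cache economics for the default 5-minute TTL on Sonnet 4.6.
# Assumes a 200k-token context reused 20 times — illustrative numbers.
input_rate = 3.00    # $/MTok, Sonnet 4.6 input
context_mtok = 0.2   # 200k-token codebase context
requests = 20

uncached = requests * context_mtok * input_rate
cached = (context_mtok * input_rate * 1.25                      # one cache write
          + (requests - 1) * context_mtok * input_rate * 0.1)   # cached reads

print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
# uncached: $12.00, cached: $1.89
```

In this scenario the cached session costs under a sixth of the uncached one, and the gap widens the more often the same prefix is reused.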

As of February 2026, automatic caching is available — add one cache_control field and Anthropic’s infrastructure automatically advances the cache breakpoint as your context grows. No manual management required.


Batch API: 50% Off for Non-Urgent Work

When you do not need an immediate response, the Batch API cuts costs in half across all models. Jobs complete within 24 hours and can return up to 300,000 output tokens per request.

import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"review-{i}",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": f"Review PR #{i}"}]
            }
        }
        for i in range(100)
    ]
)
print(f"Batch created: {batch.id}")

Combined prompt caching and batch processing can reduce costs by up to 95% versus standard per-request pricing.
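The "up to 95%" figure is simply the two multipliers stacked on input tokens, under the assumption that both discounts apply to the same tokens:

```python
# How the 95% figure combines: cached reads (0.1x input price)
# stacked with the Batch API discount (0.5x).
cache_read_multiplier = 0.1
batch_multiplier = 0.5
effective = cache_read_multiplier * batch_multiplier  # 0.05x standard price
print(f"{1 - effective:.0%} off standard input pricing")
```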


Deprecation Timeline: What to Migrate Away From

If you are running any of these model IDs in production, migrate them now:

| Model | Status | Deadline |
|---|---|---|
| claude-sonnet-4 | Deprecated | Retires June 15, 2026 |
| claude-opus-4 | Deprecated | Retires June 15, 2026 |
| claude-opus-4-1 | Active but expensive | $15/$75 MTok — replace with Opus 4.7 at $5/$25 |
| claude-sonnet-4-5 | Active, 200k context only | Migrate to 4.6 for 1M context |
| claude-3-7-sonnet | Retired Feb 19, 2026 | Already retired |
| claude-haiku-3 | Retired Apr 20, 2026 | Already retired |
| claude-opus-3 | Retired Jan 5, 2026 | Already retired |

The migration from a deprecated claude-opus-4 or claude-sonnet-4 to the current generation is straightforward — just change the model ID. No API changes required.
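Because the change is ID-only, it can be a lookup table. A sketch following the deprecation table above:

```python
# One-line migration: swap deprecated or overpriced model IDs
# for their current-generation replacements (per the table above).
MIGRATIONS = {
    "claude-sonnet-4": "claude-sonnet-4-6",
    "claude-opus-4": "claude-opus-4-7",
    "claude-opus-4-1": "claude-opus-4-7",
    "claude-sonnet-4-5": "claude-sonnet-4-6",
}

def migrate(model_id: str) -> str:
    """Return the replacement ID, or the original if already current."""
    return MIGRATIONS.get(model_id, model_id)

print(migrate("claude-opus-4"))  # claude-opus-4-7
```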


Extended and Adaptive Thinking

All three current models support some form of reasoning:

  • Haiku 4.5: Extended thinking (manual budget)
  • Sonnet 4.6: Extended thinking (manual budget) + adaptive thinking
  • Opus 4.7: Adaptive thinking only (breaking change — does not support manual type: "enabled")

If you are migrating code from Opus 4.6 to Opus 4.7 and using extended thinking, you must switch from:

# Opus 4.6 — manual extended thinking
thinking={"type": "enabled", "budget_tokens": 10000}

to:

# Opus 4.7 — adaptive thinking
thinking={"type": "adaptive"},
effort="high"  # low, medium, high (default), max

Adaptive thinking means the model decides when to reason deeply. The effort parameter controls how selective it is — max applies reasoning to almost everything, low applies it sparingly.
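Putting the migration together, a full Opus 4.7 request shape looks roughly like this. A sketch — the prompt and max_tokens value are illustrative, and the parameters are shown as a dict you would unpack into client.messages.create(**params):

```python
# Complete request shape for Opus 4.7 with adaptive thinking.
# Pass as client.messages.create(**params); prompt is illustrative.
params = {
    "model": "claude-opus-4-7",
    "max_tokens": 8192,
    "thinking": {"type": "adaptive"},   # no budget_tokens on Opus 4.7
    "effort": "high",                   # low, medium, high (default), max
    "messages": [
        {"role": "user",
         "content": "Plan a migration of this service to async I/O."}
    ],
}
```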


Quick Reference

| Use case | Recommended model | Why |
|---|---|---|
| Complex architecture review | Opus 4.7 | Reasoning quality |
| Daily coding and debugging | Sonnet 4.6 | Best balance |
| Large log file analysis | Sonnet 4.6 | 1M context |
| Long agentic pipelines | Opus 4.7 or Advisor pattern | Reliability |
| Bulk file processing | Haiku 4.5 | Speed and cost |
| Security audit | Opus 4.7 | Reasoning depth |
| Simple reformatting | Haiku 4.5 | Larger models are overkill |
| Production app (latency-sensitive) | Haiku 4.5 | ~97 tokens/sec |
| Non-urgent batch jobs | Any + Batch API | 50% discount |

The model family is evolving fast. Anthropic releases new versions every few months, and each generation brings meaningful capability improvements at similar or lower prices than the generation before. The practical implication: when a new Sonnet or Haiku version ships, it is worth re-evaluating whether you still need Opus for tasks that previously required it.

Abhay Pratap Singh

DevOps Engineer passionate about automation, cloud infrastructure, and self-hosted tools. I write about Kubernetes, Terraform, DNS, and everything in between.