Stop Burning Tokens: A Practical Guide to Claude Code Cost Optimization

Token usage with Claude Code follows a frustrating pattern: costs are not spread evenly — they cluster around a handful of bad habits. Most developers using Claude Code daily are burning 40–60% more tokens than they need to, simply because of how they phrase prompts, what they put in CLAUDE.md, and which model they reach for by default.

This guide covers five concrete changes that make an immediate difference.


Why Tokens Are Worth Caring About

Every message you send in a Claude Code session includes:

  • The full conversation history up to that point
  • Every file Claude has read during the session
  • The content of your CLAUDE.md files
  • Tool outputs from previous commands
  • MCP tool definitions for configured servers

That means a long session is not just expensive on the last message — it gets more expensive on every single message, because history accumulates. A 3-hour debugging session where Claude reads ten files can cost 10x what the same work would cost if you cleared context between tasks.

flowchart LR
    subgraph SessionCost[What Goes Into Every Message]
        H[Conversation history]
        F[Files Claude read]
        C[CLAUDE.md content]
        T[Tool outputs]
        M[MCP definitions]
    end
    SessionCost -->|multiplied by| MSG[Every new message]
    MSG --> COST[Token cost compounds over time]
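
To see how quickly this compounds, here is a rough sketch of the arithmetic. The numbers (5,000 tokens of fixed overhead, 2,000 tokens added per turn) are made up for illustration, and the model ignores prompt caching, which lowers the price of repeated context in practice.

```python
def session_input_tokens(base_tokens, tokens_added_per_turn, turns):
    """Total input tokens sent over a session where every turn resends all prior context."""
    total = 0
    context = base_tokens
    for _ in range(turns):
        total += context                   # the whole context goes out with this message
        context += tokens_added_per_turn   # replies, tool output, files read, and so on
    return total


# Hypothetical workload: 60 turns in one session vs. the same 60 turns
# split into three sessions with the context cleared in between.
one_long_session = session_input_tokens(5_000, 2_000, 60)
three_cleared_sessions = 3 * session_input_tokens(5_000, 2_000, 20)

print(one_long_session)        # 3840000
print(three_cleared_sessions)  # 1440000
```

With these assumed numbers, the single long session sends about 3.8 million input tokens, while the three shorter sessions send about 1.4 million for the same number of turns.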

Habit 1: Trim Your CLAUDE.md

CLAUDE.md loads into every context window, every session, on every message. If your file is 400 lines, you are paying for 400 lines of context on every single message — even when 300 of those lines are things Claude already knows.

Common bloat to cut:

  • Framework documentation (“React is a library for building UIs”) — Claude knows this
  • Obvious best practices (“write readable code”, “use meaningful variable names”)
  • Package version lists Claude can read from package.json
  • Architectural diagrams that are already in your README

What to keep:

  • Commands Claude cannot guess: npm run deploy:staging, ./scripts/seed-db.sh
  • Team conventions that differ from defaults: “we use 4-space indentation, not 2”
  • Project-specific gotchas: “the auth service requires NODE_ENV=production to initialise”
  • Model routing rules (see Habit 3 below)

Target: under 150 lines. Instruction-following tends to degrade as the file grows, so a shorter, sharper file is both cheaper and more effective.
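
A quick way to check where your file stands is to count lines and estimate tokens. This is a minimal sketch: the 4-characters-per-token ratio is a rough rule of thumb rather than Claude's actual tokenizer, and `audit_instructions` is a hypothetical helper, not part of Claude Code.

```python
def audit_instructions(text, max_lines=150, chars_per_token=4):
    """Summarise the size of a CLAUDE.md-style file against a line budget."""
    lines = text.splitlines()
    return {
        "lines": len(lines),
        "non_blank_lines": sum(1 for line in lines if line.strip()),
        "approx_tokens": len(text) // chars_per_token,  # rough heuristic only
        "over_budget": len(lines) > max_lines,
    }


# Stand-in content: a heading followed by 200 one-line rules.
sample = "# Project\n" + "- rule\n" * 200
report = audit_instructions(sample)
print(report)
```

To run it against a real file, read the text first, for example with `pathlib.Path("CLAUDE.md").read_text()`, and pass the result in.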


Habit 2: Ask for Specific Lines, Not Whole Files

When Claude reads a file, the entire file enters the context window. If you ask “what does this function in auth.ts do?” and auth.ts is 800 lines, all 800 lines are loaded — even if the function is at line 45.

Instead, be specific:

Check lines 40 to 65 in src/auth/token-service.ts — why does the refresh logic fail when the user is logged in on two devices?

This is especially important for:

  • Configuration files (often large, usually only partially relevant)
  • Migration files (you usually care about one migration)
  • Log files (you need the last 50 lines, not 10,000)

# Instead of: cat app.log | claude -p "debug this"
tail -n 50 app.log | claude -p "what errors appear here and why?"
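
If you would rather paste the relevant lines yourself than have Claude open the file, a small helper does the job. This sketch assumes nothing about Claude Code itself; `line_range` and `last_lines` are hypothetical names.

```python
def line_range(text, start, end):
    """Return lines start..end (1-indexed, inclusive), each prefixed with its line number."""
    selected = text.splitlines()[start - 1:end]
    return "\n".join(f"{n}: {line}" for n, line in enumerate(selected, start=start))


def last_lines(text, count):
    """Return the final `count` lines of text, like `tail -n count`."""
    if count <= 0:
        return ""
    return "\n".join(text.splitlines()[-count:])


# Stand-in for a real file's contents.
sample = "\n".join(f"line {i}" for i in range(1, 11))
print(line_range(sample, 3, 5))
print(last_lines(sample, 2))
```

Keeping the original line numbers in the excerpt means Claude's answer can still point at the right place in the file.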

Habit 3: Route Tasks to the Right Model

The default behaviour in Claude Code is to use a capable model for everything. That is correct for complex work but overkill for simple tasks.

flowchart TD
    Task[New task] --> Q1{Requires deep reasoning or architecture?}
    Q1 -->|Yes| Opus[Claude Opus\nArchitecture decisions\nComplex refactors\nDeep debugging]
    Q1 -->|No| Q2{Standard implementation or debugging?}
    Q2 -->|Yes| Sonnet[Claude Sonnet\nCode implementation\nBug fixes\nCode review\nTest writing]
    Q2 -->|No| Haiku[Claude Haiku\nRenamings\nFormatting\nSimple lookups\nSummaries]

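Add this routing table to your CLAUDE.md as a standing guideline (instructions in CLAUDE.md do not switch the session model on their own, so you still make the switch yourself):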

## Model Routing

Use the cheapest model that can handle the task:

- **Haiku:** renaming, formatting, simple lookups, one-line changes, summarising output
- **Sonnet:** standard implementation, debugging, code review, writing tests
- **Opus:** architecture decisions, complex multi-file refactors, security analysis

You can also switch mid-session:

/model claude-haiku-4-5    # switch to Haiku for a simple task
/model claude-sonnet-4-6   # back to Sonnet for implementation
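
The decision tree above can also be written down as a small lookup, which is handy if you script your Claude Code invocations. This is a sketch only: the task labels mirror the routing table, the function returns a tier name rather than a real model ID, and defaulting unknown tasks to Sonnet is an assumption.

```python
# Task labels taken from the routing table; adjust them to your own vocabulary.
ROUTING = {
    "haiku": {"renaming", "formatting", "simple lookup", "one-line change", "summarising output"},
    "sonnet": {"standard implementation", "debugging", "code review", "writing tests"},
    "opus": {"architecture decision", "complex multi-file refactor", "security analysis"},
}


def route(task_type, default="sonnet"):
    """Return the cheapest tier whose task list contains task_type."""
    wanted = task_type.strip().lower()
    for tier in ("haiku", "sonnet", "opus"):  # cheapest first
        if wanted in ROUTING[tier]:
            return tier
    return default  # assumption: unknown work goes to the middle tier


print(route("formatting"))         # haiku
print(route("Security analysis"))  # opus
print(route("something else"))     # sonnet
```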

Habit 4: Front-Load Your Prompts

The most expensive pattern in any Claude Code session is back-and-forth clarification. Each exchange resends the full context. A prompt that triggers three rounds of clarification sends that context four times, so it costs roughly 4x as much as a prompt that gets a complete answer on the first try.

Expensive pattern:

You: "Refactor the payment flow"
Claude: "Which part? The card form or the API layer?"
You: "The API layer"
Claude: "What framework? What's the error?"
You: "Express, and it times out on large orders"

That is three context loads before any work begins.

Cheaper pattern:

Before writing any code, confirm your understanding:
- I want to refactor the payment API layer in src/api/payments/
- It uses Express and times out on orders with more than 50 line items
- I need the fix to not change the public API contract
- Tell me if anything is ambiguous before you start

One message establishes full context. If nothing is ambiguous, Claude confirms its understanding and starts work. One context load.

sequenceDiagram
    participant Dev as Developer
    participant Claude as Claude Code

    Note over Dev,Claude: Expensive pattern (3 context loads)
    Dev->>Claude: Refactor the payment flow
    Claude-->>Dev: Which part?
    Dev->>Claude: The API layer
    Claude-->>Dev: What framework?
    Dev->>Claude: Express, it times out on large orders
    Claude-->>Dev: OK, making changes...

    Note over Dev,Claude: Cheap pattern (1 context load)
    Dev->>Claude: Refactor payment API in src/api/payments/ - Express - times out on 50+ line items - keep public API unchanged - confirm understanding first
    Claude-->>Dev: Confirmed. Making changes...
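
The 4x figure falls out of simple arithmetic. A sketch, assuming a 20,000-token context and about 150 tokens added per clarification round (both numbers are illustrative, and prompt caching is ignored):

```python
def input_tokens_for_exchange(context_tokens, clarification_rounds, tokens_per_round=150):
    """Input tokens sent to get one piece of work started.

    Every clarification round means one extra message, and each message
    resends the context plus whatever the earlier rounds added.
    """
    sends = clarification_rounds + 1
    return sum(context_tokens + i * tokens_per_round for i in range(sends))


front_loaded = input_tokens_for_exchange(20_000, 0)
three_rounds = input_tokens_for_exchange(20_000, 3)

print(front_loaded)                 # 20000
print(three_rounds)                 # 80900
print(three_rounds / front_loaded)  # just over 4
```

The clarification messages themselves are tiny. Almost all of the extra cost is the existing context being sent again.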

Habit 5: Scope Your Subagent Briefs

When Claude spawns subagents to investigate your codebase, vague instructions cause expensive exploration. A subagent told to “look into the auth system” may read dozens of files searching for context it does not need.

Vague (expensive):

Use a subagent to investigate why login is slow

Scoped (cheap):

Use a subagent to investigate login latency.
Read only: src/auth/login-handler.ts, src/middleware/session.ts, and src/db/user-queries.ts
Report: which function takes the longest and why
Do not read any other files

Tight scoping reduces subagent context to just the relevant files, cutting costs dramatically on large codebases.
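
If you write scoped briefs often, a template keeps them consistent. A minimal sketch; `subagent_brief` is a hypothetical helper that only builds the prompt text and refuses to produce a brief without a file list.

```python
def subagent_brief(goal, files, report):
    """Build a scoped subagent instruction with an explicit file allowlist."""
    if not files:
        raise ValueError("list at least one file so the subagent has a bounded scope")
    return "\n".join([
        f"Use a subagent to {goal}.",
        f"Read only: {', '.join(files)}",
        f"Report: {report}",
        "Do not read any other files",
    ])


brief = subagent_brief(
    "investigate login latency",
    ["src/auth/login-handler.ts", "src/middleware/session.ts"],
    "which function takes the longest and why",
)
print(brief)
```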


Tracking Your Usage

Make it a habit to check costs before ending a session:

/cost      # estimated session cost so far
/usage     # plan usage limits and rate-limit status
/context   # see what is consuming context space

The /context command is particularly useful — it shows you exactly what is taking up space, so you can target the right optimisation.


Quick Reference

Habit                                    Impact    Effort
Trim CLAUDE.md to under 150 lines        High      One-time
Use line ranges instead of full files    High      Low per-prompt
Route tasks to Haiku/Sonnet/Opus         Medium    One-time setup
Front-load context in prompts            High      Medium habit change
Scope subagent file access               Medium    Low per-task
Run /clear between unrelated tasks       High      Low habit

The biggest gains come from CLAUDE.md trimming and front-loading. Do those two first, then check your usage dashboard after a week; the difference should be clear.

Abhay Pratap Singh

DevOps Engineer passionate about automation, cloud infrastructure, and self-hosted tools. I write about Kubernetes, Terraform, DNS, and everything in between.