
Lesson 37: Cost & Context Optimization

Day 37 of 50 · ~7 min read · Phase 5: Advanced Patterns


The Opening Question

Let's say you're going to use Claude Code heavily. Multiple sessions a day, big codebases, lots of interactions.

You run a session for 10 minutes. You read the usage stats: $3.50 used.

At that rate, a 40-hour work week runs over $800 just for Claude Code. Multiply that across a team of engineers, and suddenly you're looking at real money.

Here's the question: is there a way to get the same intelligence but use fewer tokens?

The answer is yes. And it's not about using a cheaper model (though that's one lever). It's about being strategic about what you send to Claude.

Because every token you don't send is a token you don't pay for.


Discovery

Question 1: What actually costs money?

Let me be clear about the economics:

  • Input tokens: Every token you send to Claude (context, code, conversation)
  • Output tokens: Every token Claude generates (typically priced at several times the input rate, around 5x)

A rough example:

  • Reading a 1,000-line file: ~10,000-15,000 input tokens (code averages roughly 10-15 tokens per line)
  • Claude writing a response: ~500 output tokens
  • Total cost for one interaction: a few cents on a mid-tier model

Now multiply that by 100 files in a large codebase:

  • Context cost alone: over a million input tokens
  • One session: potentially several dollars
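This kind of estimate is easy to script. The per-million-token prices and token counts below are illustrative placeholders, not published rates; plug in the current pricing for the model you actually use.

```python
# Rough cost estimator for a Claude interaction.
# PRICE_* values are assumed placeholders; check current pricing.
PRICE_INPUT_PER_M = 3.00    # assumed $ per 1M input tokens
PRICE_OUTPUT_PER_M = 15.00  # assumed $ per 1M output tokens

def interaction_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the assumed rates."""
    return (input_tokens * PRICE_INPUT_PER_M
            + output_tokens * PRICE_OUTPUT_PER_M) / 1_000_000

# One file read (~12K input tokens) plus a 500-token response:
single = interaction_cost(12_000, 500)

# A session that pulls 100 such files into context:
session = interaction_cost(12_000 * 100, 500 * 100)

print(f"one interaction: ${single:.3f}")
print(f"100-file session: ${session:.2f}")
```

The point isn't the exact dollar figures; it's that input context scales linearly with everything you send, so trimming context trims the bill directly.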

And if you spawn subagents? Each one starts fresh with its own context.

Pause and think: what's the difference between sending a file to Claude and not sending it?

The cost difference. And sometimes, you don't need to send it.

Question 2: How do you reduce context without losing intelligence?

There are several strategies:

Strategy 1: Don't read entire files

Instead of:

claude: I'm reading src/utils.js (all 500 lines)

Do:

claude: I'm reading lines 45-67 of src/utils.js, which contains the function you asked about

You still get the context you need, and you save the tokens for the other 400+ lines.

Strategy 2: Use /compact before big requests

The /compact command summarizes your context and removes verbose information. It preserves the essential bits but reduces token count.

/compact

This might reduce your active context from 50K to 30K tokens while keeping all the important information.

Strategy 3: Isolate work with subagents

Remember Lesson 35? Subagents are also cost-efficient. Instead of bloating your main context with a full codebase review, spawn a subagent to do it. It gets fresh context, completes the task, and returns a summary. Your main session stays lean.

Strategy 4: Start fresh sometimes

If you've been in a session for 30 minutes and your context is full of old conversation, sometimes it's cheaper to start a new session. You lose the conversation history, but you get a clean context window.

Question 3: How does model selection affect costs?

Claude has different models with different costs and speeds:

  • Haiku (cheaper, faster, less capable)
  • Sonnet (balanced, good for most tasks)
  • Opus (most capable, most expensive)

For straightforward tasks (formatting, refactoring, standard bug fixes), Haiku or Sonnet might be sufficient. For complex reasoning, multi-step problem solving, or difficult bugs, Opus is worth the cost.

But here's the key: you don't have to use the same model for every task.

For a security review (complex), you might use Opus. For auto-formatting (simple), you might use Haiku.

You can configure which model Claude Code uses based on the task:

claude --model haiku "format this file"
claude --model opus "refactor this architecture"
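You could codify this routing in a small wrapper. This is a sketch under the assumption that your CLI accepts model aliases like those shown above; the task categories themselves are invented for illustration:

```python
# Route tasks to a model tier by rough complexity.
# Alias names ("haiku", "sonnet", "opus") follow the lesson's examples;
# the task categories are assumptions, not a Claude Code feature.
MODEL_FOR_TASK = {
    "format": "haiku",
    "lint-fix": "haiku",
    "refactor": "sonnet",
    "bug-fix": "sonnet",
    "security-review": "opus",
    "architecture": "opus",
}

def claude_command(task: str, prompt: str) -> list[str]:
    """Build an argv list for `claude --model <model> "<prompt>"`."""
    model = MODEL_FOR_TASK.get(task, "sonnet")  # default to the balanced tier
    return ["claude", "--model", model, prompt]

print(claude_command("format", "format this file"))
```

The design choice here is defaulting to the balanced tier: unknown tasks get a reasonable model, and you only pay for the top tier when a task is explicitly marked as complex.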

Question 4: What's the ROI on optimization?

Some of this might feel complex. Is it worth it?

Let's do the math:

Without optimization:

  • 10 sessions/day
  • Average context: 50K tokens
  • Average cost per session: $0.50-1.00
  • Daily cost: $5-10
  • Monthly cost (22 work days): $110-220

With optimization (reading selectively, using /compact, right-sizing models):

  • Same 10 sessions/day
  • Average context: 30K tokens
  • Average cost per session: $0.25-0.50
  • Daily cost: $2.50-5
  • Monthly cost: $55-110

You cut costs roughly in half, and your work quality doesn't change.

Now multiply that across a team, and optimization becomes economically significant.
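The comparison can be checked with a few lines, using the midpoints of the ranges above:

```python
def monthly_cost(sessions_per_day: float, cost_per_session: float,
                 work_days: int = 22) -> float:
    """Monthly spend at a given session rate and per-session cost."""
    return sessions_per_day * cost_per_session * work_days

# Midpoints of the lesson's ranges:
before = monthly_cost(10, 0.75)    # unoptimized: ~$0.75 per session
after = monthly_cost(10, 0.375)    # optimized: ~$0.375 per session

print(f"before: ${before:.0f}/month, after: ${after:.0f}/month, "
      f"savings: {100 * (1 - after / before):.0f}%")
```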


The Insight

Token costs are real, but they're not fixed. You control how many tokens you send and what you do with them. The smartest teams don't just run Claude Code — they optimize for intelligence per dollar by reading selectively, using /compact, isolating work, and right-sizing models.

The mental model: Optimize like a professional athlete. Every action has a cost-benefit. You want maximum performance (intelligence) per unit of resource (tokens). Sometimes that means being ruthless about what you include in context.


Try It

Let's set up a cost-aware workflow.

  1. Check your current token usage with the /cost command:

    /cost
    

    You'll see something like:

    Tokens used (input): 45,230
    Tokens used (output): 12,450
    Estimated cost: $1.75
    
    
  2. Run /compact to clean up context:

    /compact
    

    Read the output. Claude will summarize and remove verbose parts. This usually saves 30-40% of tokens.

  3. Try reading selectively instead of whole files:

    I need to understand the authentication logic in auth.js.
    Can you read only the login() function and the validateToken() function?
    Ignore the rest of the file.
    
  4. Compare costs with a fresh session: Start a new session and do a small task. Notice the token cost. Then do the same task in your existing session. Often, starting fresh for isolated work is cheaper.

  5. Look at your monthly usage: Review the usage dashboard in the Anthropic Console (or your plan's billing page) to get a baseline. Next month, after applying these optimizations, compare.


Key Concepts Introduced

  • Input tokens: Tokens you send to Claude (context, code, conversation), cheaper per token but abundant
  • Output tokens: Tokens Claude generates (typically priced at several times the input rate)
  • Context reduction: Strategies to minimize tokens: reading selectively, /compact, fresh sessions
  • Model selection: Using cheaper models (Haiku) for simple tasks and expensive models (Opus) for complex ones
  • Token efficiency: Getting maximum intelligence per dollar spent
  • Cost-aware workflow: Designing processes that achieve goals with minimum token usage

Bridge to Lesson 38

You've now learned everything about working with Claude Code: prompting it, using tools, extending it with skills, automating it, testing it, debugging with it, parallelizing it, running it headless, and optimizing costs.

But you haven't learned what to do when Claude Code hits a wall: large, complex codebases.

Tomorrow's question: How do you work with code so massive that even an agent struggles?

We'll explore strategies for scaling Claude Code to 100K-line projects, million-line monorepos, and codebases where context becomes the real bottleneck.

