Lesson 44: Troubleshoot — Common Problems & Fixes
Day 44 of 50 · ~10 min · Phase 5: Advanced Patterns
Your Goal
You'll encounter real problems when using Claude Code at scale. This lesson teaches diagnosis and recovery for three common scenarios.
Scenario 1: Subagent Returns Garbage Summary
The Problem
You spawn a subagent to review security vulnerabilities in your codebase:
Main: "Review all database queries for SQL injection vulnerabilities"
[Subagent runs for 5 minutes]
Subagent returns:
"Found some issues. Database.js uses variables. Check it."
The summary is vague, incomplete, and unhelpful. You can't act on it. You don't know:
- How many issues?
- Which ones are critical?
- How to fix them?
- Are there false positives?
Diagnosis
Ask yourself:
- Was the task autonomous enough? ("Review for SQL injection" can run without human input. ✓)
- Did Claude have the right context? (You didn't give the subagent your codebase map. ✗)
- Was the task too broad? ("All database queries" might be 50 files. Subagent got lost. ✗)
- Did you specify output format? (No. Subagent improvised. ✗)
Fix
Restart the subagent with better instructions:
Main: "Review all database queries in src/db/ for SQL injection.
Critical information:
- We use parameterized queries (prepared statements)
- Look for: raw string concatenation, user input in query strings
- Output format:
1. Files analyzed: [count]
2. Findings:
- [file.js line N]: [issue type] - [brief description]
3. Risk level: LOW / MEDIUM / HIGH
4. Recommended action: [specific fix]
Be thorough and specific. Show code examples."
Better summary:
Files analyzed: 8
Findings:
- auth.js line 42: Direct string concatenation in query
Code: const q = `SELECT * FROM users WHERE id = ${userId}`
Fix: Use parameterized query: db.query('SELECT * FROM users WHERE id = ?', [userId])
Risk level: HIGH (SQL injection in authentication)
Recommended action: Refactor auth.js to use prepared statements
Concept Reinforced
Subagent quality depends on:
- Clear task: Not vague, not too broad
- Specific output format: Tell Claude exactly how to structure results
- Context provided: CLAUDE.md, architecture map, examples of good/bad code
- Autonomy: Can the task be completed without asking you questions?
Mental model: Subagents are like writing requirements for a contractor. Vague requirements = vague results. Specific requirements = usable results.
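A structured task like the one above can be packaged for a one-shot, non-interactive run. This is a minimal sketch assuming the claude CLI's -p (print/non-interactive) flag; the fallback branch simply prints the prompt on machines where the CLI isn't installed:

```shell
# Build the structured review prompt once, then hand it off in one shot.
PROMPT=$(cat <<'EOF'
Review all database queries in src/db/ for SQL injection.
Context: we use parameterized queries (prepared statements).
Look for: raw string concatenation, user input in query strings.
Output format:
1. Files analyzed: [count]
2. Findings: [file:line] [issue type] - [brief description]
3. Risk level: LOW / MEDIUM / HIGH
4. Recommended action: [specific fix]
EOF
)

if command -v claude >/dev/null 2>&1; then
  claude -p "$PROMPT"          # non-interactive run with the structured task
else
  printf '%s\n' "$PROMPT"      # no CLI here: just show what would be sent
fi
```

Keeping the prompt in a heredoc makes the required output format reviewable and reusable, instead of being retyped ad hoc each run.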
Scenario 2: Headless Mode Runs Infinitely
The Problem
You set up a GitHub Actions workflow to auto-format code:
- name: Auto-format code
  run: |
    claude -p "format code with prettier"
The workflow starts. Minutes pass. No completion. 30 minutes later, GitHub cancels the job.
error: Job exceeded maximum execution time
Claude is stuck in a loop.
Diagnosis
Common causes:
- Claude can't complete the task — it keeps finding things to format and never stops
- Circular dependency — Formatting triggers another format step
- No explicit termination condition — Claude never knows when it's done
- Insufficient permissions — Claude is blocked and retrying endlessly
Check the logs:
# GitHub Actions logs show:
Found 50 files to format...
Formatted 10 files...
Formatted 10 files... # Repeated, not progressing
Formatted 10 files...
Claude is retrying the same files.
Fix
Rewrite the task with explicit bounds:
- name: Auto-format code
  run: |
    # Bound the run: a wall-clock limit (GNU timeout, 300s = 5 min)
    # plus a cap on agentic turns (--max-turns)
    if timeout 300 claude -p "format code with prettier" --max-turns 10; then
      git add .
      git commit -m "chore: auto-format"
    else
      echo "Format task failed or timed out"
      exit 1
    fi
Better version — give Claude a specific list:
# Get files that need formatting
FILES=$(git diff --name-only origin/main...HEAD | grep -E '\.(js|ts)$' | head -20)
timeout 300 claude -p "format exactly these files with prettier, no other changes: $FILES"
Concept Reinforced
Preventing infinite loops:
- Explicit scope — Tell Claude exactly which files/lines, not "all code"
- Termination condition — prefer "Format these 5 files" (bounded) over "Format until all files pass linting" (open-ended)
- Timeout protection — Set max execution time (5-10 min for typical tasks)
- Idempotency — Task should produce the same result if run twice
- Logging — Make Claude report progress so you can spot loops early
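The iteration-limit idea can be enforced from plain shell even when a tool offers no cap of its own. A sketch: run_bounded is a hypothetical helper, and in practice you'd combine it with GNU timeout for a wall-clock bound as well:

```shell
# run_bounded: retry a command, but give up after a fixed number of attempts.
# This turns a potentially infinite retry loop into a bounded one.
run_bounded() {
  local max_tries=$1; shift
  local tries=0
  until "$@"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      echo "giving up after $tries attempts" >&2
      return 1
    fi
  done
}

# Usage sketch (claude -p and $FILES assumed from the example above):
# at most 3 attempts, each with a 5-minute wall-clock limit:
#   run_bounded 3 timeout 300 claude -p "format these files: $FILES"
```

A command that succeeds returns immediately; a command that keeps failing exhausts its attempts and surfaces a non-zero exit code your CI can act on.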
Mental model: Headless Claude needs guardrails. An interactive Claude can ask for help. A headless Claude must have clear boundaries, or it spins forever.
Scenario 3: Cost Spirals Out of Control
The Problem
You set up subagents to explore a large codebase daily. One month in, your bill is $270.
"That's $270 a month for one analysis job. Scale that across the team and it gets expensive fast."
Diagnosis
Why does cost spiral?
- Large codebase + fresh context — Each subagent reads the entire codebase (100K tokens each)
- Multiple subagents — You spawned 5 subagents × 100K tokens = 500K tokens/run
- Frequent runs — Running daily means high recurrence
- No context reuse — Each subagent starts from scratch
Cost breakdown:
5 subagents × 100K tokens input = 500K tokens
500K input tokens × $0.003 per 1K = $1.50
5 subagents × 10K tokens output = 50K tokens
50K output tokens × $0.015 per 1K = $0.75
Cost per run: $2.25
Runs per day: 1
Runs per month: 30
Monthly cost: $67.50
But wait... you also run it on demand during development.
3 extra runs per day × 30 days = 90 extra runs
90 × $2.25 = $202.50
Monthly total: $270
That matches your bill!
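The arithmetic above is easy to sanity-check with awk. The token counts and per-1K rates are the assumed figures from this example, not a guaranteed current price list:

```shell
# Recompute per-run and monthly cost from the assumed figures above.
awk 'BEGIN {
  in_rate  = 0.003                            # $ per 1K input tokens (assumed)
  out_rate = 0.015                            # $ per 1K output tokens (assumed)
  per_run  = 500 * in_rate + 50 * out_rate    # 500K input, 50K output

  printf "cost per run: $%.2f\n", per_run                       # → $2.25
  printf "monthly (4 runs/day x 30): $%.2f\n", per_run * 4 * 30 # → $270.00
}'
```

Four runs per day is the scheduled run plus the three on-demand runs; multiplying out confirms the $270/month figure.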
Fix
Option 1: Reuse context (best)
Instead of spawning 5 independent subagents:
Main session reads codebase once (100K tokens)
Main session spawns 5 subagents with: "Here's the codebase. Do your specific analysis."
Saves: 400K input tokens per run
New cost: about $1.05/run instead of $2.25/run (input drops from $1.50 to $0.30; output cost is unchanged)
Option 2: Reduce scope
# Instead of analyzing entire codebase:
Main: "Analyze only src/api/ and src/db/"
Saves: 60% of tokens (only reading relevant code)
Option 3: Use cheaper model
claude -p --model haiku "analyze codebase"
# Haiku's per-token price is a fraction of Sonnet's
Option 4: Reduce frequency
# Instead of daily: run weekly
# Instead of 5 subagents: run 2
# Instead of full codebase: run on PRs only
Combined fix:
Before:
- 5 subagents daily
- Full codebase
- Sonnet model
- Cost: $270/month
After:
- 2 subagents weekly (on PRs only)
- Only changed files
- Haiku model
- Cost: $15/month
Savings: 95%
Concept Reinforced
Cost optimization strategy:
- Measure first — Understand where costs come from
- Reuse context — Main session reads once, subagents reference it
- Reduce scope — Analyze only what changed, not the whole codebase
- Right-size models — Use Haiku for simple tasks, Sonnet for complex ones
- Reduce frequency — Weekly is cheaper than daily; on-demand is cheaper than scheduled
- Monitor continuously — Track costs weekly and set alerts
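"Monitor continuously" can start as a few lines of shell run from cron. A sketch under an invented setup: it assumes each run appends its dollar cost to a costs.log file, and both that file and the $10 threshold are made up for illustration:

```shell
# Invented sample data: one dollar amount per run, one per line.
printf '2.25\n2.25\n2.25\n2.25\n2.25\n' > costs.log

BUDGET=10
spend=$(awk '{ s += $1 } END { printf "%.2f", s }' costs.log)

# awk handles the floating-point comparison that plain [ ] can't.
if awk -v s="$spend" -v b="$BUDGET" 'BEGIN { exit !(s > b) }'; then
  echo "ALERT: spend \$$spend exceeds budget \$$BUDGET"
fi
```

With the sample data the total is $11.25, so the alert fires; in real use you'd replace the printf with whatever actually records per-run cost and wire the echo to your notification channel.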
Mental model: Cost is a dial you can turn down. Start simple, measure, then optimize. Most teams can cut costs 70% by being strategic without sacrificing quality.
Quick Reference: Common Problems
| Problem | Diagnosis | Quick Fix |
|---|---|---|
| Subagent returns vague results | No output format specified | Specify exact format: "Output: [file] [line] [issue]" |
| Headless mode hangs | Circular dependency or no termination condition | Add timeout, explicit task bounds, iteration limits |
| Costs explode | Full codebase reads, too many subagents, too frequent | Reduce scope, reuse context, use Haiku, reduce frequency |
| Tests fail inconsistently | Flaky tests, not Claude's problem | Fix tests; add determinism and isolation |
| Permission denied errors | Permission rules missing or too restrictive | Review allow/deny rules in .claude/settings.json; allow what you need |
| Large codebase confusion | No architecture map | Create CLAUDE.md with directory structure and key files |
| Safety concerns | Don't trust headless Claude | Trial the task interactively first, restrict tools, test in staging |
| Slow performance | Too much context, too many files | Use Glob/Grep to read selectively, split into subagents |
Your Troubleshooting Checklist
When something goes wrong with Claude Code:
1. Is it actually Claude Code, or something else?
- Check error messages carefully
- Run with the --verbose flag for debug output
2. Check permissions (settings files such as .claude/settings.json)
- Can Claude Code read the files it needs?
- Can it write/edit?
- Are there restrictions?
3. Check the task definition
- Is it clear and specific?
- Is it autonomous (doesn't require human feedback)?
- Does it have bounds (time, scope, format)?
4. Check context
- Is Claude reading too much?
- Is it missing important information?
- Should you provide CLAUDE.md or architecture map?
5. Check the model/cost
- Are you using the right model?
- Can you optimize token usage?
- Is frequency/scope reasonable?
6. Isolate the problem
- Run a simpler version of the task
- Try in interactive mode first
- Test with a small subset of code
Final Thought
The most common mistakes are:
- Underspecifying tasks — "Analyze the code" vs. "Check src/api/ for SQL injection and report findings as a list"
- Skipping a supervised trial before headless — Always test the task interactively before running it unsupervised
- No permissions boundary — Always define what Claude can and cannot do (permission rules live in .claude/settings.json; CLAUDE.md carries context and conventions)
- Not measuring costs — You can't optimize what you don't measure
Fix these four, and 80% of problems disappear.