Lesson 38: Working with Large Codebases

Day 38 of 50 · ~7 min read · Phase 5: Advanced Patterns

The Opening Question

You're asked to work on a 500K-line monorepo. It's a real codebase. Real complexity. Real stakes.

You open Claude Code and try to do something simple: "Add a feature to the authentication module."

But Claude can't read the entire codebase into context. Even with subagents and cost optimization, you're still constrained by what Claude can reasonably hold in its head at one time.

Here's the real question: how do you navigate a codebase that's larger than your tool's context window?

The answer isn't to read everything. It's to read strategically.

Discovery

Question 1: What makes a codebase "large"?

It's not just line count. It's complexity. A small codebase with 10 tightly interdependent modules can be "larger" in practice than a 100K-line codebase with clear separation of concerns.

But practically, "large" means:

- File count: More than 50 source files becomes hard to hold mentally
- Context pressure: Reading all relevant files costs 100K+ tokens
- Hidden dependencies: You can't just grep your way to understanding; you need architecture knowledge
- Onboarding cost: It takes time to understand where things live

At this scale, the bottleneck isn't Claude's capability. It's your ability to tell Claude what matters.
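
To make the context-pressure item above concrete, here is a minimal Python sketch that estimates how many tokens each top-level module would cost to read in full. The ratio of 4 characters per token is a rough rule of thumb rather than a real tokenizer, and the function names and default file suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: ~4 characters per token is a rough heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string."""
    return len(text) // CHARS_PER_TOKEN


def tokens_per_module(root: str, suffixes=(".ts", ".tsx", ".js")) -> dict:
    """Sum estimated tokens of source files, grouped by top-level directory under root."""
    root_path = Path(root)
    totals = {}
    for path in root_path.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            parts = path.relative_to(root_path).parts
            # Files directly under root are grouped under "."
            module = parts[0] if len(parts) > 1 else "."
            text = path.read_text(encoding="utf-8", errors="ignore")
            totals[module] = totals.get(module, 0) + estimate_tokens(text)
    return totals
```

Calling `tokens_per_module("src")` and sorting the result by value gives a quick view of which modules are too expensive to read wholesale.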

Pause and think: if Claude can't read the whole codebase at once, what should it read first?

Question 2: What's the "architecture-first" approach?

Before Claude reads code, it needs to read architecture.

This is where CLAUDE.md from Lesson 17 becomes critical. It's your map.

# Project Architecture

## Codebase Overview
- `src/auth/` — Authentication & authorization (40 files, 12K lines)
- `src/api/` — REST API endpoints (60 files, 18K lines)
- `src/db/` — Database models & migrations (25 files, 8K lines)
- `src/ui/` — Frontend React components (100 files, 25K lines)

## Critical Paths
To add a feature to auth:
1. Understand `src/auth/models/User.ts`
2. Check `src/auth/middleware/authenticate.ts`
3. Review `src/auth/routes/login.ts`
Don't read: UI code (not relevant to backend auth changes)

## Dependency Graph
Auth → DB → Cache
API → Auth → DB
UI → API → Auth

If you change Auth, you must test: API, UI

This is your compass. Claude reads this first, then follows the breadcrumbs to the specific files it needs.
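
The "If you change Auth, you must test: API, UI" rule follows mechanically from the dependency graph. As a sketch, the graph from the example map can be written as a Python dictionary and walked in reverse to list every module that depends on a changed one, directly or transitively. The dictionary layout and the `affected_by` helper are hypothetical names for illustration, not part of Claude Code.

```python
from collections import deque

# Direct dependencies, transcribed from the example Dependency Graph
# (an arrow A -> B is read as "A depends on B").
DEPENDS_ON = {
    "UI": ["API"],
    "API": ["Auth"],
    "Auth": ["DB"],
    "DB": ["Cache"],
    "Cache": [],
}


def affected_by(changed: str, depends_on: dict) -> list:
    """Return every module that depends on `changed`, directly or transitively."""
    # Invert the graph: for each module, which modules depend on it?
    dependents = {module: [] for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(module)

    # Breadth-first walk up the inverted graph.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


print(affected_by("Auth", DEPENDS_ON))  # ['API', 'UI']
```

The same call with `"DB"` returns `['API', 'Auth', 'UI']`, which shows why a change lower in the graph has a wider test surface.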

Question 3: How do you navigate with Glob and Grep?

You learned Glob and Grep in earlier lessons. In large codebases, they're not optional — they're essential.

Instead of reading files randomly:

claude: Read all files in src/

You guide Claude strategically:

claude: 
1. Find all files that import 'User' model (use Grep)
2. Of those, find the ones in src/api/ (use Glob)
3. Read those specific files

This is much more efficient than reading randomly.

Pattern: Grep → Glob → Read → Analyze

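# Find where validateToken is called (shell, or Claude's Grep tool)
grep -r "validateToken" src/

# Find all TypeScript files in the auth module.
# "glob" is not a shell command: give this pattern to Claude's Glob tool,
# or run `find src/auth -name "*.ts"` in a shell.
src/auth/**/*.ts

# Then read only the files these two searches surface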

This approach scales because you're reading less but smarter.
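
The Grep → Glob narrowing can also be sketched outside Claude Code. The Python below is an illustrative stand-in, not Claude's actual Grep or Glob tools: `grep_files` and `narrow` are hypothetical helpers, and the matching uses `fnmatch`, where `*` also matches across `/`.

```python
from fnmatch import fnmatch
from pathlib import Path


def grep_files(root: str, needle: str, suffixes=(".ts", ".tsx")) -> list:
    """Step 1 (Grep): paths of source files under root whose text contains needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            if needle in path.read_text(encoding="utf-8", errors="ignore"):
                hits.append(path.as_posix())
    return hits


def narrow(paths: list, pattern: str) -> list:
    """Step 2 (Glob): keep only the paths that match the pattern."""
    return [p for p in paths if fnmatch(p, pattern)]
```

For example, `narrow(grep_files("src", "validateToken"), "src/api/*")` yields the short list of API files worth reading, which is the "Read" step's input.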

Question 4: When should you use subagents for large codebases?

Subagents shine in large projects because they let you parallelize exploration.

Example workflow:

Main session says: "Explore this large codebase architecture"

Spawns 3 subagents:

Subagent 1: "Map the auth system. Read relevant files, report dependencies."
Subagent 2: "Map the API system. Find all endpoints, report dependencies."
Subagent 3: "Map the database layer. Find all models, report relationships."

[All run in parallel]

Main session gets summaries:

Auth system: 5 core files, depends on DB
API system: 12 endpoint files, depends on Auth
DB system: 8 model files, standalone

Now you have a mental model. You can dive deep into one area without context bloat.
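
The shape of this workflow is a fan-out followed by a fan-in: each worker looks at one module and hands back only a short summary. The Python sketch below illustrates that shape with threads. It is an analogy only and does not show how Claude Code spawns subagents; `summarize_module` and `explore_in_parallel` are hypothetical names, and the "summary" here is just a file count.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def summarize_module(module_dir: str) -> str:
    """Stand-in for one subagent: inspect a module, return only a short summary."""
    files = [p for p in Path(module_dir).rglob("*") if p.is_file()]
    return f"{Path(module_dir).name}: {len(files)} files"


def explore_in_parallel(module_dirs: list) -> list:
    """Fan out one worker per module, then collect summaries in input order."""
    with ThreadPoolExecutor(max_workers=len(module_dirs) or 1) as pool:
        return list(pool.map(summarize_module, module_dirs))
```

The point of the design is what comes back: the caller receives one line per module rather than the contents of every file, which is what keeps the main session's context small.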

The Insight

Large codebases aren't harder because of their size — they're harder because you can't hold them all in context at once. The solution isn't to read more, it's to read strategically: start with architecture, navigate with Glob and Grep, parallelize exploration with subagents, and only read the specific code that matters to your task.

The mental model: Working with a large codebase is like navigating a large city. You don't memorize every street — you get a map (CLAUDE.md), use directions (Grep/Glob), and explore one neighborhood at a time (Lesson 35: subagents).

Try It

Let's practice navigating a real (or hypothetical) large codebase.

Create a CLAUDE.md architecture map for a project you know:

# [Project Name] Architecture

## Directory Structure
- src/module1/ — [purpose, files, lines]
- src/module2/ — [purpose, files, lines]

## Critical Paths for Common Tasks
- To add feature X: Read these files in this order
- To debug issue Y: Check these modules

## Dependency Graph
[Show how modules depend on each other]
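
Counting files and lines by hand is tedious, so the Directory Structure section of the template can be drafted with a short script. This is a minimal sketch under the assumption that each module is a direct subdirectory of the source root; `module_stats` and `render_directory_section` are hypothetical helpers, and the `[purpose]` placeholder still has to be filled in by someone who knows the project.

```python
from pathlib import Path


def module_stats(src_root: str) -> list:
    """Return (module name, file count, line count) for each subdirectory of src_root."""
    stats = []
    for module in sorted(p for p in Path(src_root).iterdir() if p.is_dir()):
        files = [f for f in module.rglob("*") if f.is_file()]
        line_count = sum(
            len(f.read_text(encoding="utf-8", errors="ignore").splitlines())
            for f in files
        )
        stats.append((module.name, len(files), line_count))
    return stats


def render_directory_section(src_root: str, stats: list) -> str:
    """Format the stats as a draft Directory Structure section."""
    lines = ["## Directory Structure"]
    for name, n_files, n_lines in stats:
        lines.append(f"- {src_root}/{name}/: [purpose] ({n_files} files, {n_lines} lines)")
    return "\n".join(lines)
```

Running `print(render_directory_section("src", module_stats("src")))` produces a draft you can paste into CLAUDE.md and annotate.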

Give Claude Code a task with constraints:

I need to add a new feature to [module].

Here's my codebase map (CLAUDE.md).
Don't read any other files yet.

Use Glob and Grep to find the specific files I need to modify.
Report your findings first, then read only those files.

Watch Claude navigate strategically:
- It reads CLAUDE.md
- It uses Grep to find dependencies
- It uses Glob to locate specific files
- It reads only what's necessary
- It completes the task without context bloat

Compare to a "read everything" approach:
- Time: usually faster, because far fewer files are read
- Tokens: significantly fewer, since only task-relevant files enter context
- Quality: often better (Claude isn't distracted by irrelevant code)

Key Concepts Introduced

Concept	Definition
Architecture-first	Understanding codebase structure before diving into code
Strategic reading	Using Glob and Grep to find only relevant code, not reading everything
CLAUDE.md as map	Documentation that guides Claude through large codebases
Parallel exploration	Using subagents to explore different modules simultaneously
Dependency awareness	Understanding how modules interact to avoid missed side effects

Bridge to Lesson 39

You've learned to scale your own work (subagents), automate repetitive tasks (headless), manage costs (optimization), and navigate complex codebases (strategic reading).

But we've glossed over something important: trust.

When you give Claude Code shell access, file editing permissions, and git commands, you're saying "I trust you to act on my codebase." But what does that actually mean? What can go wrong? And how do you protect yourself?

Tomorrow's question: How does Claude Code's security model actually work?
