
Lesson 28: Testing & Validation with Claude

Day 28 of 50 · ~7 min read · Phase 4: Integrations


The Opening Question

Let's say Claude Code just wrote 200 lines of code for you in 30 seconds. It looks good. It compiles. There are no syntax errors.

So you're done, right? You can ship it?

Here's the hard question: do you actually trust it?

Not because Claude isn't smart — it is. But because intelligence and correctness aren't the same thing. A brilliant architect can design a building that falls down if they don't check the load-bearing walls.

The only real way to trust code is to test it. And the interesting thing is: Claude is actually fantastic at writing tests. So the question becomes:

Why would you ever let Claude write code without first writing tests?


Discovery

Question 1: What makes testing different when Claude is involved?

Think about how you normally write code. You write some, then you test it. Or maybe you test-drive it (write tests first). Either way, you're the one who knows what "correct" means.

But when Claude writes code, there's an extra layer: Claude doesn't know if its code is right. You do.

So the workflow should be:

  1. You or Claude write the test (the source of truth)
  2. Claude writes the implementation
  3. The test either passes or fails
  4. If it fails, Claude fixes it

Why is this better than just asking Claude to "make sure the code is correct"?

Pause and think. What's the difference between a test and a guarantee?

A test is objective: the code either satisfies it or it doesn't. There's no interpretation, no hoping Claude "got it right."

Question 2: What does test-first actually look like?

You've probably heard of TDD (Test-Driven Development). The flow is:

  1. Write a test for the feature you want
  2. Run it (it fails)
  3. Write code to make it pass
  4. Refactor

With Claude Code, you can ask it to follow this flow:

Me: Write a function that takes a list of prices and returns the sum, 
    but ignores any price below $0.01

Claude: First, I'll write a test:
[Claude writes test_sum_prices.py with test cases]

Now let me run the test to make sure it fails:
[Claude runs pytest, it fails]

Now I'll implement the function:
[Claude writes the function]

Let me run the test again:
[Claude runs pytest, it passes]

Why is this powerful? Because Claude is now forced to be precise. It can't hand-wave. The test is the contract.
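The bracketed steps in the exchange above might look something like this. It's a sketch, not Claude's literal output; the file and function names (`test_sum_prices.py`, `sum_prices`) are illustrative:

```python
# sum_prices.py -- the implementation step: satisfy the contract below
def sum_prices(prices):
    """Sum a list of prices, ignoring any price below $0.01."""
    return sum(p for p in prices if p >= 0.01)


# test_sum_prices.py -- the tests Claude writes first (run with: pytest -v)
def test_ignores_sub_cent_prices():
    # 0.005 is below $0.01, so only 1.50 + 2.50 count
    assert sum_prices([1.50, 0.005, 2.50]) == 4.00

def test_empty_list():
    assert sum_prices([]) == 0.0
```

Notice that the tests pin down the edge cases ("below $0.01", empty input) that a vague prompt would leave open to interpretation.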

Question 3: How do you validate Claude's work before committing?

Here's the typical flow:

  1. Claude makes changes to your codebase
  2. A hook runs your test suite (remember Lesson 27?)
  3. If tests fail, Claude sees the output and fixes the code
  4. Once tests pass, you can confidently commit

This is where hooks and testing work together. You configure a hook that says: "After Claude edits files, run the test suite. If it fails, I want to know."

Now Claude can see its own test results and iterate until everything passes.
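One way to wire this up is a post-edit hook in your Claude Code settings. The sketch below assumes the hook format from Lesson 27 (a `PostToolUse` hook that runs `pytest` after file edits); check the hooks documentation for the exact schema your version expects:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "pytest -q" }
        ]
      }
    ]
  }
}
```

With this in place, every edit Claude makes is immediately followed by a test run, and the output flows back to Claude for the fix-and-retry loop described above.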

What happens if Claude doesn't fix the failing tests? What's your recourse?

You stop. You read the error. You either ask Claude to fix it, or you fix it yourself. The test is the objective standard — not Claude's assurance that it's "probably fine."

Question 4: What's the ROI on testing with Claude?

Some people think: "Testing takes time. If Claude is making the changes anyway, isn't testing a bottleneck?"

Think about it the other way: What's the cost of shipping untested code?

If Claude writes code and you ship it without tests, and it breaks in production, you're not saving time. You're moving the cost forward and multiplying it.

But if Claude writes the tests first, then writes code to pass them, the tests are effectively free (Claude wrote them) — and they give you confidence.


The Insight

Testing with Claude Code inverts the usual risk model. Instead of hoping Claude's code is correct, you define correctness up-front through tests, and Claude's job becomes making those tests pass. Tests become the objective standard — not Claude's judgment.

The mental model: Think of tests as a contract between your expectations and the code. Claude Code's job is to satisfy the contract. Your job is to write the contract correctly.


Try It

Let's set up a real test-driven workflow. We'll write a test first, then ask Claude to implement it.

  1. Create a test file (test_greet.py):

    def test_greet_with_name():
        from main import greet
        assert greet("Alice") == "Hello, Alice!"
    
    def test_greet_without_name():
        from main import greet
        assert greet() == "Hello, World!"
    
  2. Run the test (it will fail):

    pytest test_greet.py -v
    
  3. In Claude Code, say:

    "I've written a test file. Can you implement the greet function in main.py to make these tests pass?"

  4. Claude will write the implementation, then run the tests itself to verify.

  5. Check the test output:

    pytest test_greet.py -v
    

You'll see all tests pass. That's the contract working.

  6. Now expand the tests. Add more test cases and ask Claude to make them pass. Notice how the test suite grows but Claude adapts to satisfy it.
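For reference, a `main.py` that satisfies both tests can be a one-liner with a default argument. This is just one valid implementation; anything that passes the tests honors the contract:

```python
# main.py -- satisfies test_greet_with_name and test_greet_without_name
def greet(name="World"):
    """Return a greeting; with no argument, defaults to "Hello, World!"."""
    return f"Hello, {name}!"
```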

Key Concepts Introduced

  Test-driven development (TDD): Writing tests first, then an implementation that satisfies them
  Objective validation: Using tests as the source of truth for correctness, not judgment or assurance
  Test suite automation: Running tests via hooks after edits to ensure Claude's changes don't break existing functionality
  Contract-based development: Treating tests as contracts; Claude's job is to satisfy them
  Red-green-refactor: The TDD cycle: red (test fails), green (code passes the test), refactor (improve the code)

Bridge to Lesson 29

So you've got tests running, Claude's writing code to pass them, and you're confident the code is correct. Great.

But what if something breaks anyway? What if a test fails and you don't understand why?

Tomorrow's question: How do you actually debug code with an AI agent that can read your entire codebase?

We'll dive into debugging — and you'll see that having an agent in the loop is actually a superpower when something goes wrong.

