
Lesson 46: Building Custom Agents

Day 46 of 50 · ~7 min read · Phase 6: Mastery


The Opening Question

In Lesson 45, you learned what the Claude Agent SDK is. But reading about tools and system prompts is abstract. You haven't seen the agentic loop in action.

Here's the question: What does a real agent look like? When you build one with the SDK, what does the code actually do? How does the agent reason, call tools, handle errors, and loop until the task is complete?

Let's write actual code and see the loop in action.


Discovery

Question 1: What does the agentic loop look like in code?

This is the core pattern. When you build an agent with the SDK, you're implementing a loop:

1. Send message to Claude with tools
2. Claude responds (might include tool calls)
3. If tool calls: execute them, add results to message history
4. Loop back to step 1 with updated history
5. If no tool calls: agent is done

Here's what that looks like in TypeScript:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function agenticLoop(userMessage: string) {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: userMessage }];

  while (true) {
    // Step 1: Send message to Claude
    const response = await client.messages.create({
      model: "claude-3-5-sonnet-20241022",
      max_tokens: 4096,
      system: "You are a helpful coding assistant.",
      tools: myTools,
      messages
    });

    // Step 2: Check if we're done
    if (response.stop_reason === "end_turn") {
      // Claude finished, no tool calls
      console.log("Agent response:", response.content[0]);
      return;
    }

    // Step 3: Process tool calls
    if (response.stop_reason === "tool_use") {
      messages.push({ role: "assistant", content: response.content });

      const toolResults = [];
      for (const block of response.content) {
        if (block.type === "tool_use") {
          const result = await executeTool(block.name, block.input);
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: JSON.stringify(result)
          });
        }
      }

      // Step 4: Add results to history and loop
      messages.push({ role: "user", content: toolResults });
    }
  }
}

Notice the pattern:

  • The messages array acts like conversation memory
  • Claude sees the full history of what happened before
  • When Claude calls a tool, you execute it and add the result to the history
  • The loop continues until Claude says "I'm done"

This is how Claude adapts: it sees what happened in previous steps and adjusts its approach.
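The loop above delegates to an `executeTool` helper that isn't defined there. A minimal sketch of such a dispatcher, with hypothetical handler bodies standing in for real implementations, might look like:

```typescript
// Sketch of the executeTool helper used by the loop: a name-to-handler
// map with a catch-all for unknown tools (handler bodies are stand-ins).
type ToolHandler = (input: any) => Promise<unknown>;

const handlers: Record<string, ToolHandler> = {
  read_file: async (input) => ({ content: `contents of ${input.file}` }),
};

async function executeTool(name: string, input: any): Promise<unknown> {
  const handler = handlers[name];
  if (!handler) {
    // Report unknown tools back to Claude instead of throwing
    return { success: false, error: `Unknown tool: ${name}` };
  }
  try {
    return { success: true, result: await handler(input) };
  } catch (e) {
    return { success: false, error: e instanceof Error ? e.message : String(e) };
  }
}
```

Returning errors as data rather than throwing keeps the loop alive and gives Claude something to react to.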

Question 2: What are common mistakes in agent design?

Building agents looks simple, but there are subtle pitfalls. Here are the dangerous ones:

Mistake 1: Infinite loops

// BAD: Agent keeps calling tools forever
while (true) {
  const response = await client.messages.create({ ... });
  // No check for when to stop
  executeTool(...);
}

Fix: Always check response.stop_reason. If it's "end_turn", break the loop.
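A useful complement to the `stop_reason` check is a hard cap on turns, so a stuck agent fails loudly instead of spinning forever. A sketch, where the `step` callback and the limit of 25 are illustrative choices, not SDK features:

```typescript
// Cap the agentic loop at a fixed number of turns. `step` stands in for
// one round-trip to Claude and reports whether the agent finished.
async function runWithTurnLimit(
  step: () => Promise<{ done: boolean }>,
  maxTurns = 25
): Promise<number> {
  for (let turn = 1; turn <= maxTurns; turn++) {
    const { done } = await step();
    if (done) return turn; // normal exit, e.g. stop_reason === "end_turn"
  }
  throw new Error(`Agent did not finish within ${maxTurns} turns`);
}
```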

Mistake 2: Context overflow

As the loop runs, messages accumulate. Eventually, the context window fills up.

// BAD: Messages keep growing until you hit the token limit
messages.push({ role: "user", content: toolResults });
// ... after 50 tool calls, messages is huge

Fix: Implement message summarization or windowing. Keep only recent messages:

// GOOD: Keep messages bounded (note: `messages` must be declared with `let`)
if (messages.length > 20) {
  // Drop old messages, but slice carefully: cutting between a tool_use
  // and its tool_result produces a history the API will reject
  messages = messages.slice(-20);
}
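One caveat with naive slicing: the window may begin with a `tool_result` whose matching `tool_use` was dropped, which the Messages API rejects. A sketch of a safer windowing helper; the heuristic here is an assumption for illustration, not an official strategy:

```typescript
// Keep the original user request plus a recent window, nudging the
// window forward past any leading tool_result message so no result is
// orphaned from its tool_use (a simple heuristic, not an SDK feature).
type Msg = { role: "user" | "assistant"; content: any };

function windowMessages(messages: Msg[], maxLen = 20): Msg[] {
  if (messages.length <= maxLen) return messages;
  let start = messages.length - (maxLen - 1);
  while (
    start < messages.length &&
    Array.isArray(messages[start].content) &&
    messages[start].content.some((b: any) => b.type === "tool_result")
  ) {
    start++;
  }
  return [messages[0], ...messages.slice(start)];
}
```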

Mistake 3: Hallucinated tool calls

Claude sometimes calls tools with wrong argument types or missing required fields.

// Tool expects { file: string }, but Claude sends { file: 123 }
const result = await readFile(toolInput); // Crashes

Fix: Validate and handle errors:

try {
  const result = await readFile(toolInput.file);
  return { success: true, content: result };
} catch (e) {
  // `e` is `unknown` in modern TypeScript, so narrow before reading .message
  return { success: false, error: e instanceof Error ? e.message : String(e) };
}
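Validation can be as simple as hand-checking the fields the schema requires before executing. A sketch for a tool expecting `{ file: string }`, following the earlier examples:

```typescript
// Hand-rolled input validation for a tool expecting { file: string }.
// Returns the problem as data so the agent loop can relay it to Claude.
function validateReadFileInput(input: any): { ok: true } | { ok: false; error: string } {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "Input must be an object like { file: string }" };
  }
  if (typeof input.file !== "string" || input.file.length === 0) {
    return {
      ok: false,
      error: `Expected "file" to be a non-empty string, got ${JSON.stringify(input.file)}`,
    };
  }
  return { ok: true };
}
```

A schema library such as Zod can replace this boilerplate, but the principle is the same: reject bad input with a message Claude can act on.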

Mistake 4: Not giving Claude context about failures

If a tool fails, you need to tell Claude why. Otherwise it keeps trying the same broken approach.

// BAD: Just return error message
{ type: "tool_result", content: "Error" }

// GOOD: Give Claude actionable information
{
  type: "tool_result",
  content: `Error reading file: ENOENT (file not found). 
    The file /path/to/file.js doesn't exist.
    Available files in /path/to are: [list].`
}

Pause and think: If Claude calls a tool and it fails, what does Claude need to know to fix it?

Question 3: How do you test and debug agents?

Testing agents is tricky because they're non-deterministic: Claude might take different paths on different runs.

Strategy 1: Test individual tools

Don't test the agent's decision-making; test the tools it uses:

describe("tools", () => {
  it("readFile returns file content", async () => {
    const result = await readFile({ file: "test.txt" });
    expect(result.success).toBe(true);
  });

  it("readFile handles missing files", async () => {
    const result = await readFile({ file: "nonexistent.txt" });
    expect(result.success).toBe(false);
  });
});

Strategy 2: Test the loop with mocked Claude

For integration tests, mock the API to return known responses:

// Mock Claude to return specific tool calls
// Mock Claude to return specific tool calls
jest.spyOn(client.messages, 'create').mockResolvedValue({
  stop_reason: "tool_use",
  content: [{
    type: "tool_use",
    id: "toolu_123", // tool_use blocks always carry an id
    name: "read_file",
    input: { file: "test.js" }
  }]
});

// Now test that your loop handles this correctly
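Jest isn't essential here: you can also drive the loop with a hand-rolled fake client that replays queued responses. A self-contained sketch, where the response shapes mirror the API but are hand-written, and the tool call is recorded rather than executed:

```typescript
// Fake client that replays canned responses: first a tool call, then an
// end_turn. Lets you assert the loop's control flow without the real API.
const responses: any[] = [
  { stop_reason: "tool_use", content: [{ type: "tool_use", id: "t1", name: "read_file", input: { file: "test.js" } }] },
  { stop_reason: "end_turn", content: [{ type: "text", text: "done" }] },
];
const fakeClient = { messages: { create: async () => responses.shift() } };

const toolCalls: string[] = [];
async function testLoop(): Promise<string> {
  while (true) {
    const response = await fakeClient.messages.create();
    if (response.stop_reason === "end_turn") return response.content[0].text;
    for (const block of response.content) {
      if (block.type === "tool_use") toolCalls.push(block.name); // record, don't execute
    }
  }
}
```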

Strategy 3: Log and trace

Add logging to see what the agent is doing:

console.log("User request:", userMessage);
console.log("Claude response:", response.content);
console.log("Tool calls:", response.content.filter(b => b.type === "tool_use"));
console.log("Tool results:", toolResults);

Pause and think: Why is it hard to test an agent with traditional unit tests?

Question 4: What makes a good agent design?

Some agents are robust and focused. Others are flaky and do weird things. Here's what separates them:

Clear system prompt

// VAGUE
system: "You are a helpful assistant."

// CLEAR
system: `You are a code reviewer. Your job is to find bugs and suggest improvements.
When reviewing, focus on:
1. Correctness (does it do what it intends?)
2. Safety (could this cause harm or errors?)
3. Performance (could this be optimized?)
Always cite specific lines and explain your reasoning.`

Well-defined tools

// VAGUE
{ name: "execute", description: "Execute something" }

// CLEAR
{
  name: "run_tests",
  description: "Run the Jest test suite",
  input_schema: {
    type: "object",
    properties: {
      filter: {
        type: "string",
        description: "Only run tests matching this pattern (optional)"
      }
    }
  }
}

Good error handling

Return actionable information when things go wrong. Help Claude course-correct.

Bounded scope

A "general purpose" agent that does everything is usually worse than a specialized agent that does one thing well.

// GOOD: Specialized agent
system: "You are a TypeScript linter. Find style issues in TypeScript code."

// BAD: Too broad
system: "You are a helpful coding assistant. Do whatever the user asks."

The Insight

Building a working agent is more than understanding the loop. It's about anticipating failure modes, validating tool calls, giving Claude meaningful error feedback, and testing the components that make up the system. A good agent has a clear purpose, well-designed tools, robust error handling, and a system prompt that guides behavior.

The mental model: An agent is like a person doing a task. A clear job description (system prompt) helps. Good reference materials (tool descriptions) help. Feedback when things go wrong (error handling) is essential. Without these, even a smart person stumbles.


Try It

Build a working agent that reads your codebase and finds TODO comments.

  1. Define your tools:

    const tools = [
      {
        name: "find_todos",
        description: "Search for TODO comments in code",
        input_schema: {
          type: "object",
          properties: {
            directory: {
              type: "string",
              description: "Directory to search in"
            },
            extension: {
              type: "string",
              description: "File extension to search (e.g., js, ts, py)"
            }
          },
          required: ["directory", "extension"]
        }
      },
      {
        name: "read_file",
        description: "Read a file to see its full content",
        input_schema: {
          type: "object",
          properties: {
            file: { type: "string", description: "Path to file" }
          },
          required: ["file"]
        }
      }
    ];
    
  2. Implement tool handlers:

    import { promises as fs } from "fs";
    import path from "path";

    async function executeTool(name: string, input: any) {
      if (name === "find_todos") {
        // Recursively list files with the requested extension (Node 18.17+),
        // then scan each line for TODO comments, recording path and line number
        const entries = await fs.readdir(input.directory, { recursive: true });
        const todos: string[] = [];
        for (const entry of entries.filter(e => e.endsWith("." + input.extension))) {
          const file = path.join(input.directory, entry);
          const lines = (await fs.readFile(file, "utf8")).split("\n");
          lines.forEach((text, i) => {
            if (text.includes("TODO")) todos.push(`${file}:${i + 1}: ${text.trim()}`);
          });
        }
        return { success: true, todos };
      }
      if (name === "read_file") {
        // Read and return file content
        return { success: true, content: await fs.readFile(input.file, "utf8") };
      }
      return { success: false, error: `Unknown tool: ${name}` };
    }
    
  3. Run the agentic loop:

    agenticLoop("Find all TODO comments in my project and summarize them");
    
  4. Observe: Watch how Claude reasons, calls tools, reads files, and compiles a summary.


Key Concepts Introduced

  • Agentic loop: The repeating cycle of send message → handle tool calls → loop back
  • Message history: The accumulated conversation that Claude sees to reason about what's happened
  • Tool use: When Claude calls a function to interact with systems
  • Tool validation: Checking that tool calls are valid before executing
  • Error feedback: Providing Claude with actionable information when tools fail
  • Context windowing: Managing message history size to avoid exceeding context limits
  • Agent specialization: Designing agents for specific roles rather than general purposes
  • System prompt: The instructions that shape agent behavior and guide tool use

Bridge to Lesson 47

You now understand how to build a functional agent with the SDK. But you've been thinking in singular: one agent, doing one task.

Tomorrow's question: What happens when you have multiple agents? How do they coordinate? How do they communicate?

We'll explore agent teams — multiple specialized agents working together on complex problems.

