Week 2: Claude Code, MCP, Context Engineering, Superpowers & Evaluation
University of Washington • Global Innovation Exchange
01 / 48
What You Will Learn
Learning Objectives
1
Name the four components of an agentic coding system
LLM, context window, tool calls, and agentic loop — and how they interact
2
Apply the 40/20/40 principle
Allocate time to planning, coding, and testing in the correct proportion
3
Use Claude Code’s Plan Mode
Review and approve an agent’s step-by-step plan before any code is written
4
Write or generate a CLAUDE.md file
Project-specific context that measurably improves agent output quality
02 / 48
What You Will Learn
Learning Objectives (continued)
5
Read a git diff from an autonomous agent
Explain every changed line to a teammate
6
Explain MCP (Model Context Protocol)
Give one concrete example of how it extends an agent beyond the local file system
7
Apply the AI-generated code evaluation checklist
Verify that agent output matches intent before shipping
8
Explain how Claude Code’s extension ecosystem works
Skills, agents, and rules that enforce professional workflow standards
03 / 48
Systems Thinking
Artifact Bridge: Week 1 → Week 2
Week 1 Artifacts
.cursor/rules — first system prompt
I-P-O diagram — first architecture view
Smoke test — 3 pass/fail checks
Week 2 Evolutions
CLAUDE.md — richer context file
Agentic loop — Plan → Act → Observe
TDD & eval checklist — structured verification
Nothing resets between weeks. Each artifact evolves into a more capable version.
04 / 48
Anatomy of Coding Agents
How agentic coding systems actually work
05 / 48
Agentic System Architecture
Four Components
1
LLM
The language model — generates plans, writes code, reasons about problems
2
Context Window
Everything the agent can “see” right now: your prompt, files, tool outputs, conversation history
3
Tool Calls
Structured actions the LLM can request: read a file, run a command, search the web, write code
4
Agentic Loop
The cycle that keeps running until the task is done: plan → act → observe → repeat
06 / 48
The Core Cycle
The Agentic Loop
The agent never writes your entire app in one shot. It reads, decides, writes, runs, observes — and loops.
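The loop can be sketched in a few lines of Python. This is a toy model, not Claude Code’s actual implementation: `decide_next_action` stands in for the LLM and `run_tool` stands in for real tool calls.

```python
def decide_next_action(history: list[str]) -> str:
    # Stand-in for the LLM: finish after a few observations.
    return "finish" if len(history) >= 4 else "work"

def run_tool(action: str) -> str:
    # Stand-in for a real tool call (read a file, run a command, ...).
    return "done" if action == "finish" else f"observed result of {action}"

def agentic_loop(task: str, max_steps: int = 10) -> list[str]:
    """Plan -> act -> observe, repeated until the task is done."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        action = decide_next_action(history)  # plan: pick the next action
        observation = run_tool(action)        # act: execute it
        history.append(observation)           # observe: feed the result back
        if observation == "done":
            break
    return history
```

The `max_steps` cap is the safety valve every real agent has: without it, a confused model loops forever.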
07 / 48
Agentic Engineering Workflow
The 40 / 20 / 40 Principle
“If you just type ‘build me an app’ and hit enter, the agent skips planning. The code might
run — but you won’t understand it. That’s the vibe coding hangover.”
40%
Planning & Research
20%
Coding
40%
Testing & Verification
The 40% planning phase is your insurance policy against the hangover.
08 / 48
Human-Agent Engineering Workflow
Research → Plan → Implement → Test
Phase
Your Job
Agent’s Job
Research (40%)
Define the problem, gather context
Read docs, search, analyze
Plan
Review and approve the plan
Propose step-by-step approach
Implement (20%)
Watch, redirect, review
Write and run code
Test (40%)
Define acceptance criteria
Run tests, fix failures
You are the supervisor. The agent is the junior engineer. Set direction, review output, catch mistakes.
09 / 48
Claude Code Deep Dive
A terminal-native agentic coding tool
10 / 48
Philosophy
Not a Chatbot
“Claude Code is not a chatbot that happens to write code. Think of it as a junior engineer who lives in
your terminal, has read every file in your project, and never gets tired.”
The key word is junior — it does exactly what you ask. Your
job is to be the supervisor: set direction, review
output, catch mistakes.
When Claude Code starts, it reads the current directory and looks for a CLAUDE.md file — the onboarding document you’d hand a new hire on day
one.
11 / 48
Claude Code Feature
Plan Mode
What It Does
The agent proposes a complete step-by-step plan and stops — it waits for your approval
before writing a single line of code.
# after Claude Code starts:
/plan            # slash command
shift+tab twice  # keyboard shortcut
When to Use It
Task touches more than one file
Involves user data or security
You’re not sure what the right approach is
Plan Mode is the most important feature for avoiding the vibe coding hangover.
12 / 48
Live Demo Prompt
Input Validation Task
Add input validation to all user-facing fields
in this weather app.
Validate that the city name field is non-empty
and contains only letters and spaces.
Show a clear error message inline if validation
fails.
Notice how this prompt is specific: it names the field, defines the
rule, and specifies how errors should appear. Compare this to “add validation” — the agent
would have to guess everything.
13 / 48
Tool Comparison
Claude Code vs. Cursor
CLAUDE CODE
Terminal-native agent (also has a desktop app)
Whole-project awareness
Autonomous multi-step execution
Plan Mode for complex tasks
Best for: large changes, refactoring, multi-file work
CURSOR
IDE with AI built in (also has a CLI version)
Inline editing & autocomplete with whole-project awareness
Fast file-level feedback
Visual diff review
Comes with more model options
Best for: focused edits, learning code, quick fixes
They’re different tools for different jobs. Professionals use both. You’ll graduate from this course
having used both.
14 / 48
Live Demo
Plan Mode → Autonomous Execution
15 / 48
Demo Task
The Prompt
Add a 5-day forecast section to this weather app.
Use the Open-Meteo API's daily forecast endpoint.
Claude Code will enter Plan Mode, propose its approach, and wait. We read the plan before approving anything.
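For orientation while reading the plan, this is roughly the request shape involved. The parameter names follow Open-Meteo’s public docs, but verify them against the agent’s plan before approving:

```python
from urllib.parse import urlencode

def forecast_url(lat: float, lon: float, days: int = 5) -> str:
    """Build an Open-Meteo daily-forecast request URL."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "daily": "temperature_2m_max,temperature_2m_min",
        "forecast_days": days,
    }
    return "https://api.open-meteo.com/v1/forecast?" + urlencode(params)
```

Note there is no API key anywhere — Open-Meteo is free, which matters for the plan review on the next slide.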
16 / 48
Plan Review
Three Things to Check
1
Does it understand what I asked?
Five-day forecast, daily endpoint — correct?
2
Will it break what already works?
Adding below the existing section, not touching the current-weather call?
3
Any red flags?
Files you didn’t expect? API keys that shouldn’t be needed? (Open-Meteo is free.)
Approve by typing yes or pressing the approval key.
17 / 48
After Approval
Autonomous Execution
1
Stand back
Let Claude Code run. Watch the terminal output. Don’t touch the keyboard.
2
Run the app
streamlit run app.py — verify the 5-day forecast renders
3
Review the diff
git diff app.py — your audit trail. Know exactly what changed.
This diff adds input validation — the agent handled an
edge case you might have missed.
19 / 48
Watch Out
Two Rules
1
Don’t approve a plan you didn’t read
Plan Mode exists so you can catch mistakes before they happen. Skimming and hitting approve because it “looked long enough” is not engineering.
2
Don’t run the result before you review the diff
The diff is your audit trail. It’s what saves you at 11pm before a demo, because you’ll know exactly what changed.
20 / 48
Context Engineering
CLAUDE.md, commands, hooks, and the context stack
21 / 48
The Key Insight
Context Quality = Output Quality
“The single most important variable in the quality of your output is not which model you pick, or how
fast your laptop is. It’s context quality.”
Better Context → Better Output; Garbage In → Garbage Out
22 / 48
Context Engineering
CLAUDE.md — Bad vs. Good
Bad CLAUDE.md
# My App
Write good code. Use Python.
Make it work.
Don't break things.
Vague. No project info. Wastes a dedicated memory slot the agent reads every session.
Good CLAUDE.md
# Project: GIX Staff Portal
## Stack
- Python 3.11, Streamlit, SQLite
## Coding Standards
- PEP 8, 4-space indent
- Type annotations on all functions
## Constraints
- No external APIs (offline-first)
- User-friendly error messages
23 / 48
Decision Framework · REFERENCE
The CLAUDE.md Rule
“Would a competent senior engineer need to know this to work on my project
specifically?”
YES
Put it in CLAUDE.md
NO
Don’t bother
24 / 48
Context Window · REFERENCE
Context Compaction
Managing context
You are not stuck with one growing thread—use product features and workflow habits.
Clear past context — reset or clear chat history when old turns add noise or contradict the current task.
Compact past context — let the tool summarize older turns (same idea as the diagram); expect detail loss versus full transcripts.
Spawn subagents — delegate a subtask to a separate agent run so the main thread stays smaller and focused.
New session — start fresh to separate unrelated concerns (different features, research vs. implementation).
Anything that must always apply goes in CLAUDE.md, not in an early
chat message. CLAUDE.md is read fresh every session.
Compaction failure mode: Safety instructions can be dropped
when context is compressed.
Design “sticky” rules that survive compaction — put critical constraints in CLAUDE.md,
not just in chat history.
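A naive sketch of what compaction does to a conversation. Real tools summarize older turns with the model itself; here the summary is just a placeholder string, which is enough to show the failure mode:

```python
def compact(history: list[str], keep_last: int = 4) -> list[str]:
    """Replace older turns with a one-line summary; keep recent turns verbatim."""
    if len(history) <= keep_last:
        return history
    dropped = len(history) - keep_last
    # Detail in the dropped turns is lost — which is why sticky rules
    # belong in CLAUDE.md, not in early chat messages.
    return [f"[summary of {dropped} earlier turns]"] + history[-keep_last:]
```

Any constraint stated only in turn 2 of a 50-turn session lives inside that summary string after compaction — effectively gone.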
Custom Slash Commands
# .claude/commands/spec.md
# /spec — Write a specification before any code
When this command is invoked:
1. Ask the user what feature they want to build
2. Write a plain-English specification covering:
- What the feature does (user-facing behavior)
- What it does NOT do (explicit scope limits)
- Edge cases to handle
- Data inputs and outputs
3. Ask the user to confirm or revise the spec
4. Only begin implementation after explicit approval
The spec command lives in the repo — everyone on the team gets the same workflow.
26 / 48
Automated Quality Gates · REFERENCE
Hooks
Hooks handle automated consequences — things that happen
automatically after the agent edits a file.
Hook: PostToolUse (after any .py file is edited)
└── black app.py → auto-format code style
└── ruff check app.py → lint for common mistakes
└── mypy app.py → run type checking
The agent edits a file → it’s automatically formatted, linted, and type-checked. If there’s an
error, the agent sees it and fixes it in the same turn.
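A sketch of the dispatch this hook implies. The event name matches the slide, but the data structure below is illustrative — it is not Claude Code’s actual hook configuration format:

```python
import shlex

# Hook commands per event; {path} is filled in with the edited file.
HOOKS = {
    "PostToolUse": [
        "black {path}",       # auto-format code style
        "ruff check {path}",  # lint for common mistakes
        "mypy {path}",        # run type checking
    ],
}

def commands_for(event: str, path: str) -> list[list[str]]:
    """Expand the configured hook commands for one edited file."""
    return [shlex.split(t.format(path=path)) for t in HOOKS.get(event, [])]
```

The key property: the agent never chooses whether these run. Any non-zero exit feeds the error back into the same turn.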
27 / 48
Extending Claude Code
Skills, agents, rules & the superpowers ecosystem
28 / 48
From Building Blocks to Systems
What Are Superpowers?
VANILLA CLAUDE CODE
You write your own CLAUDE.md
You create commands one at a time
You configure hooks manually
Your setup stays in one project
WITH SUPERPOWERS / ECC
156 pre-built skills auto-loaded
13 specialized agents ready to invoke
Hierarchical rules across all projects
10+ hooks enforcing quality silently
33+ slash-command workflows
Context Engineering gives you the building blocks. Superpowers is what happens when someone packages hundreds
of them into a curated, opinionated system.
29 / 48
Skills
How Skills Work
A skill is a markdown file with YAML frontmatter. It lives in ~/.claude/skills/
and is injected into the agent’s context when its trigger matches.
---
name: brainstorming
trigger: auto
description: Design-first workflow
---
When the user asks to build a feature:
1. STOP. Do not write any code.
2. Ask clarifying questions.
3. Propose 2–3 design approaches.
4. Wait for explicit approval.
5. Only then begin implementation.
# Hard Gate
If the user says “just build it,”
remind them of the 40/20/40 principle
and refuse to proceed without a design.
KEY CONCEPTS
trigger: auto — the skill activates without being invoked
The “hard gate” — the skill literally refuses to write code until design
is approved
Skills can be domain-specific: python-patterns, scientific-visualization, metabolomics
Or process-oriented: brainstorming, TDD enforcement, debugging workflows
156 skills means the agent has domain expertise loaded before you type a single word. This is the difference
between a junior engineer and a junior engineer who read the company wiki.
Agents & Rules
Example: “Always use type annotations” in python/ rules
Example: “Prefer interfaces over types” in typescript/ rules
~/.claude/rules/
├── common/ # Always active
├── python/
│ └── standards.md # Active in .py
└── typescript/
└── standards.md # Active in .ts
Agents give the system specialized personas. Rules give it persistent standards. Together: every coding session
starts with expertise and discipline baked in.
31 / 48
Live Demo
The Brainstorming Gate
Watch what happens when the Superpowers brainstorming skill is active and we ask Claude Code to build a feature.
1
Prompt: “Add a dark mode toggle to this app”
A normal feature request. Without the skill, Claude would start coding immediately.
2
Skill activates — hard gate triggers
Claude STOPS. Asks clarifying questions. Proposes 2–3 design approaches. Refuses to write code until you approve a design.
3
User reviews and approves design
Only after explicit approval does implementation begin. The 40% planning phase is enforced by the system, not just by willpower.
4
Compare: disable the skill, run the same prompt
Without the skill, Claude immediately writes CSS and JS. No design. No clarifying questions. Classic vibe coding.
This is 40/20/40 enforced by architecture, not discipline.
32 / 48
Professional Development
Why This Matters
1
Vanilla tools are just the starting point
Every professional team customizes their tooling. Superpowers is one example of how senior engineers build leverage — not just features.
2
Architecture enforces process
Instead of relying on memory or discipline, you encode your team’s standards into the system. Design review happens because the tool demands it.
3
Your course workflow uses this
The brainstorming skill, TDD enforcement hooks, and /plan command are active in your lab environments. Now you know what’s running under the hood.
By the end of this course, you will have configured your own skills, hooks, and commands.
33 / 48
Model Context Protocol (MCP)
Connecting agents to the world
34 / 48
Model Context Protocol · REFERENCE
MCP Architecture
One consistent interface. The agent calls them all the same way.
Each MCP server is an independent service. The protocol is the boundary.
Same pattern returns in Week 7 when you build in-app agents with their own tool definitions.
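Under the hood, MCP messages are JSON-RPC. A sketch of the shape of a tool call — the tool name and arguments below are illustrative, not from a real server:

```python
import json

# The same "tools/call" shape works against any MCP server —
# that is the "one consistent interface" on this slide.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "browser_navigate",                # hypothetical tool name
        "arguments": {"url": "https://example.com"},
    },
}
print(json.dumps(request, indent=2))
```

The agent discovers which tool names exist by first asking the server to list its tools, then calls them all through this one method.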
35 / 48
Live Demos
MCP in Action
Playwright MCP
Give the agent a headless browser. It can navigate pages, click buttons, extract content, and test web apps
just like a human.
Figma MCP
Connect your agent to design files. It can read component structures, extract styles, and turn designs into
code directly.
These aren't just APIs—they are standardized tools the agent can discover and use autonomously.
36 / 48
Multi-Agent Systems
Orchestrating specialized workflows
37 / 48
Multi-Agent Systems
Orchestrator + Subagents
Each subagent is independent. The orchestrator carries the shared understanding.
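A minimal orchestrator sketch — the worker function is a stand-in for a real subagent run, but the fan-out/merge shape is the same:

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> str:
    # Stand-in for spawning a real subagent on one independent piece.
    return f"result for {task}"

def orchestrate(tasks: list[str]) -> dict[str, str]:
    """Fan independent subtasks out in parallel, then merge the results."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(run_subagent, tasks)  # preserves task order
    return dict(zip(tasks, results))
```

The orchestrator is the only place where the results meet — exactly the “shared understanding” role named above.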
38 / 48
Multi-Agent Decision
When to Spawn Subagents
1
The task is decomposable
There are genuinely independent pieces that can be worked on separately
2
The pieces are large enough
Doing them sequentially would waste significant time
3
Coordination cost < parallelism gain
Every agent you add is overhead. If the task is very short, spawning agents just slows you down. Same calculation a manager makes when deciding whether to delegate.
39 / 48
Evaluation & Testing
Verifying agent work and ensuring quality
40 / 48
Testing Methodology
Intro to Test-Driven Development (TDD)
What is TDD?
Test-Driven Development is a software practice where you write failing tests before writing the code
that makes them pass.
Red → Green → Refactor
Red: Write a test that fails (no code exists).
Green: Write just enough code to pass.
Refactor: Clean up the code.
Why it works for AI
Clearly defines the success condition.
Prevents AI from testing only what it built.
Forces you to plan before execution.
41 / 48
Test-Driven Development
Write the Test First
# Tell Claude: "make this pass"
def test_forecast_returns_five_days():
    result = get_forecast("Seattle")
    assert len(result) == 5
    assert "high" in result[0]
    assert "low" in result[0]
When you hand Claude this test and say “make this pass,” you’ve defined the contract. You know exactly what you’re checking.
When Claude generates tests after the fact, it tends to test what it built — not what you needed.
42 / 48
Test-Driven Development
Green & Refactor
1. Green (Pass the test)
def get_forecast(city: str):
    # Hardcoded to pass the assertion
    # Just enough code to be "green"
    return [
        {"high": 70, "low": 50},
        {"high": 72, "low": 51},
        {"high": 68, "low": 49},
        {"high": 65, "low": 48},
        {"high": 75, "low": 55},
    ]
2. Refactor (Make it right)
def get_forecast(city: str):
    # Now integrate the real API
    # The test ensures we don't break the shape
    data = fetch_weather_api(city)
    return [
        {"high": day.max_temp, "low": day.min_temp}
        for day in data.days[:5]
    ]
With the test in place, the AI can safely refactor from a mock implementation to real API integration without
losing the required output structure.
43 / 48
Test-Driven Development
Why TDD?
1
Catches bugs before they ship
A failing test written first is a bug prevented, not a bug fixed.
# Without TDD — this edge case slips through
def validate_email(email: str) -> bool:
    return "@" in email  # validate_email("") returns False... but what about " "?

# With TDD — you catch it before writing the function
def test_rejects_blank_email():
    assert validate_email("") == False
    assert validate_email(" ") == False  # this test forces you to handle whitespace
2
Acts as living documentation
Tests describe what the code should do. When requirements change, update the test first. The code
follows.
3
Enables fearless refactoring
With tests in place, you can restructure code knowing immediately if you broke something. Especially important
when AI rewrites your code.
TDD is insurance. The premium is small; the payout when things go wrong is enormous.
44 / 48
Test-Driven Development
When to Use TDD
Use TDD When
Business logic with defined inputs/outputs e.g. calculate_discount(price, tier)
Data transformations Parsing CSV, cleaning API responses
Bug fixes — write a test that reproduces the bug first
AI-generated code — define the contract before the agent writes
Example: Bug Fix with TDD
# Step 1: Write a test that reproduces the bug
def test_handles_negative_price():
    result = calculate_discount(-10, "gold")
    assert result == 0  # Negative price should return 0

# Step 2: Run it — it FAILS (bug confirmed!)

# Step 3: Fix the function
def calculate_discount(price: float, tier: str) -> float:
    if price <= 0:
        return 0
    discount = TIER_RATES.get(tier, 0)  # TIER_RATES maps tier name -> rate, defined elsewhere
    return round(price * (1 - discount), 2)
If you can describe the expected behavior in one sentence, you can write the test first.
45 / 48
Test-Driven Development
When TDD Is Not Ideal
Skip TDD When
Exploratory prototyping — you don’t know what you’re building yet
UI layout and visual design — hard to assert pixel positions
One-off scripts or throwaway code
External API integration where the response shape is unknown
Instead, Do This
Prototype first, then add tests once the design stabilizes
Use manual testing or screenshot testing for UI
For API integration: write integration tests after you understand the response shape
Remember the 40/20/40 rule — testing always happens,
TDD is just one approach
TDD is a tool, not a religion. The goal is confidence in your code — pick the approach that gets you there.
46 / 48
Verification
AI-Generated Code Checklist
Does it do what I actually asked?
Did I read every file it changed?
Are there hardcoded values I need to replace?
Does it handle edge cases — empty input, API errors, null values?
Could I explain this code to a teammate?
“If you cannot explain it simply, you do not understand it. And if you do not understand it, you do not own it.”
Discussion
Turn to your neighbor. Apply this checklist to the TDD demo code from earlier. How many items pass? Which ones
fail? Be ready to share one finding.
47 / 48
Up Next in Lab
Preview: Interview with Jason
In Lab 2 you will start with a staff interview. Our guest is Jason Evans, Academic Student Counselor (ASC).
Jason handles course petition syllabus reviews.
What to capture: decision points and branches in the interviewee’s workflow
(your if-then flowchart starts here); exact phrases the interviewee uses; and emotional journey moments
— frustration peaks, delight, and “it depends” zones — from the lab guide.
Afterward: your own notes, one problem-statement sentence (“When [people] need to [task]…”),
and a color-coded If-Then flowchart.