TECHIN 510 — Spring 2026

Anatomy of Coding Agents

Week 2: Claude Code, MCP, Context Engineering, Superpowers & Evaluation
University of Washington • Global Innovation Exchange
01 / 48
What You Will Learn

Learning Objectives

1
Name the four components of an agentic coding system (LLM, context window, tool calls, and agentic loop) and explain how they interact
2
Apply the 40/20/40 principle Allocate time to planning, coding, and testing in the correct proportion
3
Use Claude Code’s Plan Mode Review and approve an agent’s step-by-step plan before any code is written
4
Write/generate a CLAUDE.md file Project-specific context that measurably improves agent output quality
02 / 48
What You Will Learn

Learning Objectives (continued)

5
Read a git diff from an autonomous agent Explain every changed line to a teammate
6
Explain MCP (Model Context Protocol) Give one concrete example of how it extends an agent beyond the local file system
7
Apply the AI-generated code evaluation checklist Verify that agent output matches intent before shipping
8
Explain how Claude Code’s extension ecosystem works Skills, agents, and rules that enforce professional workflow standards
03 / 48
Systems Thinking

Artifact Bridge: Week 1 → Week 2

Week 1 Artifacts
  • .cursor/rules — first system prompt
  • I-P-O diagram — first architecture view
  • Smoke test — 3 pass/fail checks
Week 2 Evolutions
  • CLAUDE.md — richer context file
  • Agentic loop — Plan → Act → Observe
  • TDD & eval checklist — structured verification

Nothing resets between weeks. Each artifact evolves into a more capable version.

04 / 48

Anatomy of
Coding Agents

How agentic coding systems actually work
05 / 48
Agentic System Architecture

Four Components

1
LLM The language model — generates plans, writes code, reasons about problems
2
Context Window Everything the agent can “see” right now: your prompt, files, tool outputs, conversation history
3
Tool Calls Structured actions the LLM can request: read a file, run a command, search the web, write code
4
Agentic Loop The cycle that keeps running until the task is done: plan → act → observe → repeat
06 / 48
The Core Cycle

The Agentic Loop

AGENTIC LOOP
  • PLAN: “What should I do next?”
  • ACT: tool calls — read(), write(), run(), search()
  • OBSERVE: tool output returned
  • REPEAT: done? Yes → stop  |  No → plan again

The agent never writes your entire app in one shot. It reads, decides, writes, runs, observes — and loops.
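The loop can be sketched in a few lines of Python. Everything here is a stand-in: `plan()` and `act()` are hypothetical stubs replacing a real LLM call and real tools, so the sketch only illustrates the control flow.

```python
# Sketch of the agentic loop: plan → act → observe → repeat.
# plan() and act() are hypothetical stand-ins for an LLM and real tools.

def plan(history):
    # PLAN: decide the next action from everything seen so far
    return "done" if any("5 days" in h for h in history) else "read_file"

def act(action):
    # ACT: execute a tool call and return its output
    return "observed: forecast has 5 days"

def run_agent(task, max_steps=10):
    history = [f"TASK: {task}"]        # the context window
    for _ in range(max_steps):
        action = plan(history)
        if action == "done":           # REPEAT: stop when finished
            break
        history.append(act(action))    # OBSERVE: feed output back in
    return history

log = run_agent("add a 5-day forecast")
```

Note the shape: the model never sees the whole job at once — each pass through the loop adds one observation to the context, and the next plan is made with that new information.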

07 / 48
Agentic Engineering Workflow

The 40 / 20 / 40 Principle

“If you just type ‘build me an app’ and hit enter, the agent skips planning. The code might run — but you won’t understand it. That’s the vibe coding hangover.”

  • 40% — Planning & Research
  • 20% — Coding
  • 40% — Testing & Verification

The 40% planning phase is your insurance policy against the hangover.

08 / 48
Human-Agent Engineering Workflow

Research → Plan → Implement → Test

Phase            | Your Job                            | Agent’s Job
Research (40%)   | Define the problem, gather context  | Read docs, search, analyze
Plan             | Review and approve the plan         | Propose step-by-step approach
Implement (20%)  | Watch, redirect, review             | Write and run code
Test (40%)       | Define acceptance criteria          | Run tests, fix failures

You are the supervisor. The agent is the junior engineer. Set direction, review output, catch mistakes.

09 / 48

Claude Code
Deep Dive

A terminal-native agentic coding tool
10 / 48
Philosophy

Not a Chatbot

“Claude Code is not a chatbot that happens to write code. Think of it as a junior engineer who lives in your terminal, has read every file in your project, and never gets tired.”

The key word is junior — it does exactly what you ask. Your job is to be the supervisor: set direction, review output, catch mistakes.

When Claude Code starts, it reads the current directory and looks for a CLAUDE.md file — the onboarding document you’d hand a new hire on day one.

11 / 48
Claude Code Feature

Plan Mode

What It Does

The agent proposes a complete step-by-step plan and stops — it waits for your approval before writing a single line of code.

# after Claude Code starts, either:
/plan
# or press shift+tab twice
When to Use It
  • Task touches more than one file
  • Involves user data or security
  • You’re not sure what the right approach is

Plan Mode is the most important feature for avoiding the vibe coding hangover.

12 / 48
Live Demo Prompt

Input Validation Task

Add input validation to all user-facing fields
in this weather app.

Validate that the city name field is non-empty
and contains only letters and spaces.

Show a clear error message inline if validation
fails.

Notice how this prompt is specific: it names the field, defines the rule, and specifies how errors should appear. Compare this to “add validation” — the agent would have to guess everything.

13 / 48
Tool Comparison

Claude Code vs. Cursor

CLAUDE CODE
  • Terminal-native agent (also has a desktop app)
  • Whole-project awareness
  • Autonomous multi-step execution
  • Plan Mode for complex tasks
  • Best for: large changes, refactoring, multi-file work
CURSOR
  • IDE with AI built in (also has a CLI version)
  • Inline editing & autocomplete with whole-project awareness
  • Fast file-level feedback
  • Visual diff review
  • Comes with more model options
  • Best for: focused edits, learning code, quick fixes

They’re different tools for different jobs. Professionals use both. You’ll graduate from this course having used both.

14 / 48

Live Demo

Plan Mode → Autonomous Execution
15 / 48
Demo Task

The Prompt

Add a 5-day forecast section to this weather app.
Use the Open-Meteo API's daily forecast endpoint.

Claude Code will enter Plan Mode, propose its approach, and wait. We read the plan before approving anything.

16 / 48
Plan Review

Three Things to Check

1
Does it understand what I asked? Five-day forecast, daily endpoint — correct?
2
Will it break what already works? Adding below existing section, not touching current weather call?
3
Any red flags? Files you didn’t expect? API keys that shouldn’t be needed? (Open-Meteo is free.)

Approve by typing yes or pressing the approval key.

17 / 48
After Approval

Autonomous Execution

1
Stand back Let Claude Code run. Watch the terminal output. Don’t touch the keyboard.
2
Run the app streamlit run app.py — verify the 5-day forecast renders
3
Review the diff git diff app.py — your audit trail. Know exactly what changed.
18 / 48
Version Control

Git Diff: Reading Changes

The Diff

@@ -3,2 +3,4 @@ def greet(name):
 def greet(name):
-    return "Hello " + name
+    if not name:
+        return "Hello, stranger!"
+    return f"Hello, {name}!"

Reading It

− Red lines = removed code

+ Green lines = added code

Gray lines = unchanged context

This diff adds input validation — the agent handled an edge case you might have missed.
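For reference, the function after the diff is applied — exactly the green lines plus the unchanged signature:

```python
def greet(name):
    if not name:
        return "Hello, stranger!"   # added edge case: empty name
    return f"Hello, {name}!"
```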

19 / 48
Watch Out

Two Rules

1
Don’t approve a plan you didn’t read Plan Mode exists so you can catch mistakes before they happen. Skimming and hitting approve because it “looked long enough” is not engineering.
2
Don’t run the result before you review the diff The diff is your audit trail. It’s what saves you at 11pm before a demo, because you’ll know exactly what changed.
20 / 48

Context
Engineering

CLAUDE.md, commands, hooks, and the context stack
21 / 48
The Key Insight

Context Quality = Output Quality

“The single most important variable in the quality of your output is not which model you pick, or how fast your laptop is. It’s context quality.”

Better Context → Better Output; Garbage In → Garbage Out

22 / 48
Context Engineering

CLAUDE.md — Bad vs. Good

Bad CLAUDE.md
# My App

Write good code. Use Python.
Make it work.
Don't break things.

Vague. No project info. Wastes a dedicated memory slot the agent reads every session.

Good CLAUDE.md
# Project: GIX Staff Portal

## Stack
- Python 3.11, Streamlit, SQLite

## Coding Standards
- PEP 8, 4-space indent
- Type annotations on all functions

## Constraints
- No external APIs (offline-first)
- User-friendly error messages
23 / 48
Decision Framework · REFERENCE

The CLAUDE.md Rule

“Would a competent senior engineer need to know this to work on my project specifically?”

YES

Put it in CLAUDE.md

NO

Don’t bother

24 / 48
Context Window · REFERENCE

Context Compaction

EARLY IN SESSION
  • “Use SQLite”, “PEP 8 please”, “Offline-first”
  • [recent messages]
LATE IN SESSION
  • [compressed summary] (may lose detail)
  • [recent messages] (still sharp)
Managing context

You are not stuck with one growing thread: start fresh sessions for new tasks, and keep durable context in files instead of chat history.

Anything that must always apply goes in CLAUDE.md, not in an early chat message. CLAUDE.md is read fresh every session.

Compaction failure mode: Safety instructions can be dropped when context is compressed. Design “sticky” rules that survive compaction — put critical constraints in CLAUDE.md, not just in chat history.

25 / 48
Custom Workflows · REFERENCE

Slash Commands

CLAUDE.md handles persistent context. Slash commands handle repeatable workflows.

# .claude/commands/spec.md

# /spec — Write a specification before any code

When this command is invoked:
1. Ask the user what feature they want to build
2. Write a plain-English specification covering:
   - What the feature does (user-facing behavior)
   - What it does NOT do (explicit scope limits)
   - Edge cases to handle
   - Data inputs and outputs
3. Ask the user to confirm or revise the spec
4. Only begin implementation after explicit approval

The spec command lives in the repo — everyone on the team gets the same workflow.

26 / 48
Automated Quality Gates · REFERENCE

Hooks

Hooks handle automated consequences — things that happen automatically after the agent edits a file.

Hook: PostToolUse (after any .py file is edited)
└── black app.py          → auto-format code style
└── ruff check app.py     → lint for common mistakes
└── mypy app.py           → run type checking

The agent edits a file → it’s automatically formatted, linted, and type-checked. If there’s an error, the agent sees it and fixes it in the same turn.
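In Claude Code, hooks are configured in `.claude/settings.json`. A minimal sketch of a PostToolUse hook that auto-formats after edits — treat the exact field names and matcher syntax as assumptions to verify against the current hooks documentation:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "black ." }
        ]
      }
    ]
  }
}
```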

27 / 48

Extending
Claude Code

Skills, agents, rules & the superpowers ecosystem
28 / 48
From Building Blocks to Systems

What Are Superpowers?

VANILLA CLAUDE CODE
  • You write your own CLAUDE.md
  • You create commands one at a time
  • You configure hooks manually
  • Your setup stays in one project
WITH SUPERPOWERS / ECC
  • 156 pre-built skills auto-loaded
  • 13 specialized agents ready to invoke
  • Hierarchical rules across all projects
  • 10+ hooks enforcing quality silently
  • 33+ slash-command workflows

Context Engineering gives you the building blocks. Superpowers is what happens when someone packages hundreds of them into a curated, opinionated system.

29 / 48
Skills

How Skills Work

A skill is a markdown file with YAML frontmatter. It lives in ~/.claude/skills/ and is injected into the agent’s context when its trigger matches.

---
name: brainstorming
trigger: auto
description: Design-first workflow
---

When the user asks to build a feature:
1. STOP. Do not write any code.
2. Ask clarifying questions.
3. Propose 2–3 design approaches.
4. Wait for explicit approval.
5. Only then begin implementation.

# Hard Gate
If the user says “just build it,”
remind them of the 40/20/40 principle
and refuse to proceed without a design.
KEY CONCEPTS
  • trigger: auto — the skill activates without being invoked
  • The “hard gate” — the skill literally refuses to write code until design is approved
  • Skills can be domain-specific: python-patterns, scientific-visualization, metabolomics
  • Or process-oriented: brainstorming, TDD enforcement, debugging workflows

156 skills means the agent has domain expertise loaded before you type a single word. This is the difference between a junior engineer and a junior engineer who read the company wiki.

30 / 48
Agents & Rules

Specialized Subagents + Persistent Standards

AGENTS (13)
  • Planner — enforces design-first, creates implementation plans
  • Code Reviewer — reviews diffs for correctness, style, security
  • Security Reviewer — focused threat modeling and vulnerability scan
---
name: security-reviewer
model: claude-sonnet-4
tools: [Read, Grep, Bash]
---
You are a security-focused reviewer…
RULES (HIERARCHICAL)
  • Common rules apply to every project, every language
  • Language-specific extensions: typescript/, python/, golang/
  • Example: “Always use type annotations” in python/ rules
  • Example: “Prefer interfaces over types” in typescript/ rules
~/.claude/rules/
├── common/          # Always active
├── python/
│   └── standards.md # Active in .py
└── typescript/
    └── standards.md # Active in .ts

Agents give the system specialized personas. Rules give it persistent standards. Together: every coding session starts with expertise and discipline baked in.

31 / 48
Live Demo

The Brainstorming Gate

Watch what happens when the Superpowers brainstorming skill is active and we ask Claude Code to build a feature.

1
Prompt: “Add a dark mode toggle to this app” A normal feature request. Without the skill, Claude would start coding immediately.
2
Skill activates — hard gate triggers Claude STOPS. Asks clarifying questions. Proposes 2–3 design approaches. Refuses to write code until you approve a design.
3
User reviews and approves design Only after explicit approval does implementation begin. The 40% planning phase is enforced by the system, not just by willpower.
4
Compare: disable the skill, run the same prompt Without the skill, Claude immediately writes CSS and JS. No design. No clarifying questions. Classic vibe coding.

This is 40/20/40 enforced by architecture, not discipline.

32 / 48
Professional Development

Why This Matters

1
Vanilla tools are just the starting point Every professional team customizes their tooling. Superpowers is one example of how senior engineers build leverage — not just features.
2
Architecture enforces process Instead of relying on memory or discipline, you encode your team’s standards into the system. Design review happens because the tool demands it.
3
Your course workflow uses this The brainstorming skill, TDD enforcement hooks, and /plan command are active in your lab environments. Now you know what’s running under the hood.

By the end of this course, you will have configured your own skills, hooks, and commands.

33 / 48

Model Context Protocol
(MCP)

Connecting agents to the world
34 / 48
Model Context Protocol · REFERENCE

MCP Architecture

CLAUDE CODE (agent, orchestrator)
        ↓ MCP (standard protocol)
  • GITHUB MCP server: issues, PRs, code search
  • SUPABASE MCP server: schema, queries, migrations
  • DOCS MCP server: API references, guides
  • FETCH MCP server: web pages, external APIs

One consistent interface. The agent calls them all the same way.

Each MCP server is an independent service. The protocol is the boundary. Same pattern returns in Week 7 when you build in-app agents with their own tool definitions.
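Concretely, an MCP server advertises each of its tools with a name, a human-readable description, and a JSON Schema for the inputs. A sketch of what a weather tool’s advertisement might look like — the tool itself is hypothetical; the three fields are the MCP tool-listing shape:

```python
# Hypothetical tool advertisement in the MCP tool-listing shape.
forecast_tool = {
    "name": "get_forecast",
    "description": "Fetch a 5-day forecast for a city",
    "inputSchema": {                      # JSON Schema for the inputs
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
```

The agent reads this schema to construct a valid call — which is why it can discover and use a brand-new server without any custom glue code.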

35 / 48
Live Demos

MCP in Action

Playwright MCP

Give the agent a headless browser. It can navigate pages, click buttons, extract content, and test web apps just like a human.

Figma MCP

Connect your agent to design files. It can read component structures, extract styles, and turn designs into code directly.

These aren't just APIs—they are standardized tools the agent can discover and use autonomously.

36 / 48

Multi-Agent
Systems

Orchestrating specialized workflows
37 / 48
Multi-Agent Systems

Orchestrator + Subagents

YOUR PROMPT
  → ORCHESTRATOR: plans, routes, holds the big picture
    • AGENT A (Backend): Supabase tables, API routes, data validation
    • AGENT B (Frontend): React components, routing, UI interactions
  → SYNTHESIZE & RESOLVE CONFLICTS

Each subagent is independent. The orchestrator carries the shared understanding.
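The fan-out/fan-in shape can be sketched with plain threads. Real subagents would be separate model sessions, but the orchestration pattern is the same; all names here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def subagent(role: str, task: str) -> str:
    # stand-in for an independent agent session
    return f"{role}: completed '{task}'"

def orchestrate(tasks: dict) -> str:
    # fan out: each subagent works independently, in parallel
    with ThreadPoolExecutor() as pool:
        futures = {r: pool.submit(subagent, r, t) for r, t in tasks.items()}
        results = {r: f.result() for r, f in futures.items()}
    # fan in: the orchestrator synthesizes the results
    return " | ".join(results[r] for r in sorted(results))

summary = orchestrate({
    "backend": "Supabase tables + API routes",
    "frontend": "React components + routing",
})
```

The coordination cost shows up even in this toy: the thread pool, the futures bookkeeping, and the final join are pure overhead — worth it only when the parallel work is large enough.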

38 / 48
Multi-Agent Decision

When to Spawn Subagents

1
The task is decomposable There are genuinely independent pieces that can be worked on separately
2
The pieces are large enough Doing them sequentially would waste significant time
3
Coordination cost < parallelism gain Every agent you add is overhead. If the task is very short, spawning agents just slows you down. Same calculation a manager makes when deciding whether to delegate.
39 / 48

Evaluation & Testing

Verifying agent work and ensuring quality
40 / 48
Testing Methodology

Intro to Test-Driven Development (TDD)

What is TDD?

Test-Driven Development is a software practice where you write failing tests before writing the code that makes them pass.

Red → Green → Refactor
  1. Red: Write a test that fails (no code exists).
  2. Green: Write just enough code to pass.
  3. Refactor: Clean up the code.
Why it works for AI
  • Clearly defines the success condition.
  • Prevents AI from testing only what it built.
  • Forces you to plan before execution.
41 / 48
Test-Driven Development

Write the Test First

# Tell Claude: "make this pass"

def test_forecast_returns_five_days():
    result = get_forecast("Seattle")
    assert len(result) == 5
    assert "high" in result[0]
    assert "low" in result[0]

When you hand Claude this test and say “make this pass,” you’ve defined the contract. You know exactly what you’re checking.

When Claude generates tests after the fact, it tends to test what it built — not what you needed.

42 / 48
Test-Driven Development

Green & Refactor

1. Green (Pass the test)
def get_forecast(city: str):
    # Hardcoded to pass the assertion
    # Just enough code to be "green"
    return [
        {"high": 70, "low": 50},
        {"high": 72, "low": 51},
        {"high": 68, "low": 49},
        {"high": 65, "low": 48},
        {"high": 75, "low": 55}
    ]
2. Refactor (Make it right)
def get_forecast(city: str):
    # Now integrate the real API
    # The test ensures we don't break the shape
    data = fetch_weather_api(city)
    return [
        {"high": day.max_temp, "low": day.min_temp}
        for day in data.days[:5]
    ]

With the test in place, the AI can safely refactor from a mock implementation to real API integration without losing the required output structure.

43 / 48
Test-Driven Development

Why TDD?

1
Catches bugs before they ship

A failing test written first is a bug prevented, not a bug fixed.

# Without TDD — this edge case slips through
def validate_email(email: str) -> bool:
    return "@" in email  # validate_email("") returns False... but what about " "?

# With TDD — you catch it before writing the function
def test_rejects_blank_email():
    assert validate_email("") == False
    assert validate_email("   ") == False  # This test forces you to handle whitespace
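One implementation that satisfies both tests, written only after the tests pinned down the behavior — the `strip()` is exactly what the whitespace test forces:

```python
def validate_email(email: str) -> bool:
    email = email.strip()     # the whitespace test forces this line
    return "@" in email       # "" contains no "@", so blanks fail
```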
2
Acts as living documentation

Tests describe what the code should do. When requirements change, update the test first. The code follows.

3
Enables fearless refactoring

With tests in place, you can restructure code knowing immediately if you broke something. Especially important when AI rewrites your code.

TDD is insurance. The premium is small; the payout when things go wrong is enormous.
44 / 48
Test-Driven Development

When to Use TDD

Use TDD When
  • Business logic with defined inputs/outputs
    e.g. calculate_discount(price, tier)
  • Data transformations
    Parsing CSV, cleaning API responses
  • Bug fixes — write a test that reproduces the bug first
  • AI-generated code — define the contract before the agent writes
Example: Bug Fix with TDD
# Step 1: Write a test that reproduces the bug
def test_handles_negative_price():
    result = calculate_discount(-10, "gold")
    assert result == 0  # Negative price should return 0

# Step 2: Run it — it FAILS (bug confirmed!)
# Step 3: Fix the function
TIER_RATES = {"gold": 0.20, "silver": 0.10}  # example rates

def calculate_discount(price: float, tier: str) -> float:
    if price <= 0:
        return 0
    discount = TIER_RATES.get(tier, 0)
    return round(price * (1 - discount), 2)
If you can describe the expected behavior in one sentence, you can write the test first.
45 / 48
Test-Driven Development

When TDD Is Not Ideal

Skip TDD When
  • Exploratory prototyping — you don’t know what you’re building yet
  • UI layout and visual design — hard to assert pixel positions
  • One-off scripts or throwaway code
  • External API integration where the response shape is unknown
Instead, Do This
  • Prototype first, then add tests once the design stabilizes
  • Use manual testing or screenshot testing for UI
  • For API integration: write integration tests after you understand the response shape
  • Remember the 40/20/40 rule — testing always happens, TDD is just one approach
TDD is a tool, not a religion. The goal is confidence in your code — pick the approach that gets you there.
46 / 48
Verification

AI-Generated Code Checklist

“If you cannot explain it simply, you do not understand it. And if you do not understand it, you do not own it.”

Discussion
Turn to your neighbor. Apply this checklist to the TDD demo code from earlier. How many items pass? Which ones fail? Be ready to share one finding.
47 / 48
Up Next in Lab

Preview: Interview with Jason

In Lab 2 you will start with a staff interview. Our guest is Jason Evans, Academic Student Counselor (ASC). Jason handles syllabus reviews for course petitions.

What to capture: decision points and branches in the interviewee’s workflow (your if-then flowchart starts here); exact phrases the interviewee uses; and emotional journey moments — frustration peaks, delight, and “it depends” zones — from the lab guide.

Afterward: your own notes, one problem-statement sentence (“When [person] needs to [task]…”), and a color-coded If-Then flowchart.

48 / 48