TECHIN 510 — Spring 2026

Agent Orchestration,
Code Review & Project Polish

Week 9: From Working Code to Compelling Demo
University of Washington • Global Innovation Exchange
01 / 36
What You Will Learn

Learning Objectives

1
Distinguish agentic coding from agentic AI using the shared plan-tools-observe-iterate loop (Understand)
2
Diagram three orchestration patterns: Pipeline, Coordinator, Human-in-the-Loop — and identify real-world analogues (Understand)
3
Apply the decision tree: place your own AI feature on the orchestration spectrum (Apply)
02 / 36
What You Will Learn

Learning Objectives (continued)

4
Evaluate AI coding tools using a 7-dimension framework: Claude Code, Cursor, Devin, OpenHands (Evaluate)
5
Design a Demo Day narrative arc: Problem, Solution, Demo, Reflection — with a named "wow moment" (Create)

Today: lecture (40 min) → live code review demo → hands-on lab (100 min)

03 / 36

Two Planes,
One Loop

Agentic Coding vs. Agentic AI
04 / 36
Conceptual Foundation

Two Planes of Agentic AI

PLANE 1: AGENTIC CODING (DEV WORKFLOW)
  YOU <--> Claude Code / Cursor
  User of agent: YOU (the developer)
  Goal: Build better software faster
  Tools: Glob, Grep, Edit, Write, Bash
  Context: CLAUDE.md / .cursorrules

          SAME MECHANISM

PLANE 2: AGENTIC AI (PRODUCT FEATURE)
  YOUR USER <--> Your App's AI Feature
  User of agent: YOUR CUSTOMERS
  Goal: Solve user problems at scale
  Tools: check_equipment, lookup_policy
  Context: prompts.py / system prompt
05 / 36
Both Planes

The Shared Loop

LLM PLANS → APP ACTS (tool call) → LLM OBSERVES → DONE or ITERATE

  LLM PLANS: "I need to find files / check status"
  APP ACTS: Executes Grep / check_equipment
  LLM OBSERVES: Reads result, decides next step

Plan → Act → Observe → Iterate. Identical loop on both planes. The JSON structure is the same.
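The loop above can be sketched in a few lines of Python. This is an illustrative stub, not a real API client: `call_llm`, `check_equipment`, and the message format are hypothetical stand-ins so the control flow is visible.

```python
# Minimal sketch of the shared agent loop (both planes).
# A Plane 1 agent would register Grep/Edit here instead of lab tools.

def check_equipment(name):
    # Hypothetical Plane 2 tool: report equipment availability.
    return {"name": name, "status": "free"}

TOOLS = {"check_equipment": check_equipment}

def call_llm(messages):
    # Stub standing in for a real model call. Returns either a tool
    # call to make (PLAN) or a final answer (DONE).
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "check_equipment", "args": {"name": "laser"}}
    return {"answer": "The laser cutter is free."}

def agent_loop(user_request, max_turns=5):
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_turns):                                  # ITERATE
        decision = call_llm(messages)                           # LLM PLANS
        if "answer" in decision:
            return decision["answer"]                           # DONE
        result = TOOLS[decision["tool"]](**decision["args"])    # APP ACTS
        messages.append({"role": "tool", "content": str(result)})  # LLM OBSERVES
    return "Gave up after max_turns."
```

Swapping `check_equipment` for `Grep` and the stubbed `call_llm` for a real API call turns this same skeleton into a Plane 1 coding agent.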

06 / 36
Side by Side

Same Structure, Different Plane

Plane 1 — Claude Code
  • Glob — search file system
  • Grep — find patterns in code
  • Edit — modify a file
  • Bash — run terminal commands

User: the developer. Goal: build software.

Plane 2 — Week 7 App
  • check_equipment_status
  • lookup_lab_policy
  • get_hours
  • search_inventory

User: the customer. Goal: solve their problem.

Same JSON tool-use API. Same name, description, input_schema. Different plane.
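To make "same JSON structure" concrete, here is one tool definition from each plane written side by side. The schema fields (`name`, `description`, `input_schema`) come from the slide; the specific property names inside each schema are illustrative.

```python
# Both planes describe tools with the same JSON shape.

plane1_tool = {  # dev-workflow agent (e.g. Grep in Claude Code)
    "name": "Grep",
    "description": "Find a pattern in the codebase.",
    "input_schema": {
        "type": "object",
        "properties": {"pattern": {"type": "string"}},
        "required": ["pattern"],
    },
}

plane2_tool = {  # product-feature agent (Week 7 app)
    "name": "check_equipment_status",
    "description": "Check whether a piece of lab equipment is available.",
    "input_schema": {
        "type": "object",
        "properties": {"equipment": {"type": "string"}},
        "required": ["equipment"],
    },
}

# Structurally identical: same top-level keys on both planes.
assert plane1_tool.keys() == plane2_tool.keys()
```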

07 / 36
Context Engineering

CLAUDE.md = System Prompt for Plane 1

Plane 1: CLAUDE.md

Tells Claude Code who you are, your tech stack, build commands, file structure, and conventions.

No CLAUDE.md = generic help.
Detailed CLAUDE.md = project-aware help.

Plane 2: prompts.py

Tells your app's AI feature its role, constraints, tone, available tools, and guardrails.

Vague system prompt = unreliable behavior.
Detailed system prompt = consistent behavior.

Context engineering is prompt engineering for your development tools. Both planes use the same instruction mechanism: structured text that shapes the agent's behavior.
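As one Plane 2 illustration, a `prompts.py` might centralize the role, constraints, tone, tools, and guardrails the way CLAUDE.md does for Plane 1. The contents below are a sketch, not the course's actual file.

```python
# Sketch of a prompts.py for the Week 7-style app (Plane 2).
# Every field mirrors something CLAUDE.md does for Plane 1.

SYSTEM_PROMPT = """\
You are the GIX makerspace assistant.

Role: answer equipment and policy questions for students.
Tools: check_equipment_status, lookup_lab_policy, get_hours, search_inventory.
Constraints:
- Answer only from tool results; never guess equipment availability.
- Refuse destructive or bulk operations and explain why.
Tone: concise and friendly; cite the tool you used.
"""

def build_messages(user_question: str) -> list[dict]:
    # Assemble the message list the same way on every request,
    # so behavior stays consistent across sessions.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_question},
    ]
```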

08 / 36
PRIMM Checkpoint

Plane Check

Show of hands — Plane 1 or Plane 2?

1
You ask Claude Code to refactor a function in your codebase. Plane 1 — the user of the agent is the developer.
2
Your deployed app uses Claude to summarize a document for the end user. Plane 2 — the user of the agent is the customer.
3
GitHub Copilot suggests a line of code as you type. Plane 1 — the user of the agent is the developer.

The test: Who is the user of this AI interaction? Developer = Plane 1. Customer = Plane 2.

09 / 36
Systems Thinking

Everything Connects

ARTIFACTS: .cursorrules (W1) → CLAUDE.md (W2) → System Prompt (W7) → Eval Sets (W9)

PIPELINES: I-P-O (W1) → 9-Stage Pipeline (W3) → 3-Tier Architecture (W5) → Orchestration (W9)

VERIFICATION: Smoke Tests (W1) → Asserts + RLS (W4–W6) → Automated Tests (W8) → Agent Evals (W9)

40/20/40 • PRIMM • Two Planes • Contracts at Boundaries
10 / 36

Orchestration
Patterns

Architectural Vocabulary for AI Systems
11 / 36
Week 7 Recap

The Orchestration Spectrum

SIMPLICITY ←──────────────────────────────→ CAPABILITY

SINGLE LLM CALL → SINGLE-TOOL AGENT → MULTI-TOOL AGENT → PIPELINE → COORDINATOR
"Summarize this"  "Is the laser free?"  "Free + hours?"  "Extract → classify"  "Research + write"

YOUR FINAL PROJECTS LIVE HERE

80% of AI features are single LLM calls. Start on the left. Move right only when you have a concrete reason.

12 / 36
Pattern 1

Pipeline

INPUT → Stage 1: Extract entities from document → Stage 2: Enrich with external data → Stage 3: Format / generate report → OUTPUT

Assembly line. Data flows through a strict sequence. Output of stage N = input of stage N+1.

Deterministic: you know exactly which stages run and in what order.

Tradeoff: fragile — if stage 2 fails, stage 3 never runs.

Real example: Contract processing — extract names/dates, cross-reference database, generate compliance report.
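The contract-processing example can be sketched as three plain functions, where each stage's output is the next stage's input. The stage bodies are stubs; in a real pipeline each would be an LLM call or a database query.

```python
# Pipeline sketch: a strict sequence of stages.
# Stage bodies are illustrative stubs, not real extractors.

def extract(document: str) -> dict:
    # Stage 1: pull entities out of raw text (stubbed).
    return {"names": ["Acme Corp"], "dates": ["2026-05-01"]}

def enrich(entities: dict) -> dict:
    # Stage 2: cross-reference an external source (stubbed).
    entities["registered"] = True
    return entities

def format_report(entities: dict) -> str:
    # Stage 3: render the compliance report.
    status = "registered" if entities["registered"] else "unregistered"
    return f"{entities['names'][0]} ({status}), dated {entities['dates'][0]}"

def pipeline(document: str) -> str:
    # The fragility is visible in the composition: if enrich()
    # raises, format_report() never runs.
    return format_report(enrich(extract(document)))
```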

13 / 36
Pattern 2

Coordinator

USER REQUEST → COORDINATOR → { Search Specialist A, Write Specialist B, Review Specialist C } → SYNTHESIZE → RESPONSE

Project manager. One LLM decides who to call. Specialists do the work. Coordinator synthesizes results.

Fan-out / fan-in. Independent specialists can run in parallel. Dependent ones must wait.

Real example: Research assistant — search agent finds papers, analyst extracts findings, writer composes summary.
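A stripped-down coordinator looks like this. The routing here is keyword-based purely for illustration; in a real coordinator the LLM itself decides which specialists to invoke, and independent ones can run in parallel.

```python
# Coordinator sketch: route to specialists (fan-out), then
# synthesize their results (fan-in). Specialists are stubs.

def search_specialist(request: str) -> str:
    return f"3 papers found for: {request}"

def write_specialist(request: str) -> str:
    return f"summary draft for: {request}"

SPECIALISTS = {"search": search_specialist, "write": write_specialist}

def coordinator(request: str) -> str:
    # Illustrative routing: a real coordinator would let the LLM
    # pick specialists instead of matching keywords.
    picked = [name for name in SPECIALISTS if name in request.lower()]
    results = [SPECIALISTS[name](request) for name in picked]   # fan-out
    return " | ".join(results)                                  # fan-in
```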

14 / 36
Pattern 3

Human-in-the-Loop

USER REQUEST → AI PROPOSES → HUMAN: APPROVE or REVISE → EXECUTE → RESULT (REVISE loops back to AI PROPOSES)

Stamp-and-sign workflow. AI proposes, human approves or revises. Only then does the system execute.

When you need it:

  • Action is irreversible (booking, deleting, sending)
  • Stakes are high (medical, legal, financial)
  • Regulatory requirements mandate oversight

Latency is a feature, not a bug. It is the moment where human judgment applies.
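The approval gate can be sketched as a function that refuses to execute until a human callback says yes. `approve` and `execute` are injected here so the gate is testable without a UI; in a deployed app `approve` would block on a confirmation dialog.

```python
# Human-in-the-Loop sketch: AI proposes, human approves, only
# then does the system execute. All names are illustrative.

def hitl_execute(action: dict, approve, execute):
    """approve(action) -> bool; execute(action) runs the real effect."""
    if not approve(action):            # the latency that is a feature
        return {"status": "rejected", "action": action}
    return {"status": "done", "result": execute(action)}

# Usage: gate an irreversible booking behind an approval check.
booking = {"type": "book", "equipment": "laser", "slot": "14:00"}
outcome = hitl_execute(
    booking,
    approve=lambda a: a["type"] != "delete",          # stand-in for a human
    execute=lambda a: f"booked {a['equipment']} at {a['slot']}",
)
```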

15 / 36
Choose Your Pattern

Decision Tree

1
Does step 2 depend on step 1's results? If yes → Pipeline. Example: extract data, then query with that data.
2
Do you need multiple specialized capabilities that could run in parallel? If yes → Coordinator. Example: search AND analyze AND visualize independently.
3
Does the system take an irreversible action? If yes → Add Human-in-the-Loop gate before that action. Layer it on top of any pattern.

If your answer is "none of the above, just a multi-tool agent" — that is probably the right answer for a ten-week project. Choosing simplicity is a design skill.

Map Your Project: (1) Which plane? (2) Which orchestration pattern? (3) What is your main evaluation focus?

16 / 36
Architecture & Testing

Contracts and Tests per Pattern

Pattern Contract Key Test
Pipeline Stage-to-stage schemas (CSV spec, Zod, SQL) Per-stage asserts: input valid → output valid
Coordinator Tool input_schema definitions Correct delegation: right tool for right prompt
Human-in-the-Loop Approval gate interface (action + context shown) Timeout behavior: what happens if user doesn’t respond?

Every orchestration pattern has a contract and a test. If you can’t test it, simplify the pattern.
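The Pipeline row of the table can be made executable with a small contract checker: assert that stage N's output satisfies stage N+1's input contract before handing it over. The contract and payload below are illustrative.

```python
# Per-stage contract check for a pipeline boundary.
# Contract: field name -> required Python type (illustrative).

STAGE1_OUTPUT_CONTRACT = {"names": list, "dates": list}

def check_contract(payload: dict, contract: dict) -> None:
    # Raises AssertionError at the boundary instead of letting a
    # malformed payload fail mysteriously two stages later.
    for field, ftype in contract.items():
        assert field in payload, f"missing field: {field}"
        assert isinstance(payload[field], ftype), f"bad type for {field}"

stage1_output = {"names": ["Acme Corp"], "dates": ["2026-05-01"]}
check_contract(stage1_output, STAGE1_OUTPUT_CONTRACT)  # passes silently
```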

17 / 36
Design Principle

"Choose the simplest pattern that serves your user. A working demo of a simpler pattern beats a broken demo of a complex one."

Pipeline, Coordinator, Human-in-the-Loop — vocabulary, not requirements.

18 / 36

The Agent
Landscape

What Is Real, What Is Hype
19 / 36
Evaluation Framework

7 Dimensions

# Dimension What It Measures
1 Autonomy level How much can it do without you? Assist → Augment → Automate → Autonomous
2 Context window How much of your codebase can it "see"? Determines task scope and coherence
3 Task scope Single line? File? Feature? Sprint?
4 Human touchpoints Where does a human NEED to be in the loop?
5 Cost model Per token? Seat? Task? Real cost per unit of work?
6 Integration CLI? IDE? CI/CD? API? How does it fit your workflow?
7 Failure mode Silent errors? Hallucinated APIs? Overconfident refactoring?

Tool-agnostic framework. Works for any AI coding tool — today's and next month's.

Assist → Augment: Agent starts suggesting actions, not just answering questions

Augment → Automate: Agent executes routine tasks without asking — human reviews output

Automate → Autonomous: Agent handles exceptions and edge cases independently — human sets goals only

Most production systems today operate at Augment or early Automate.

20 / 36
Tool Assessment

Claude Code

1
Autonomy: Augment You direct, it proposes, you approve
2
Context: 200K tokens Entire small-to-medium project in memory
3
Scope: File to multi-file feature Cross-file refactoring, error handling
4
Touchpoints: Every action Human-in-the-loop by default
6
Integration: CLI (terminal) Not IDE-native
7
Failure: Confident wrong refactors Vibe coding hangover is real
21 / 36
Tool Assessment

Cursor / GitHub Copilot

1
Autonomy: Assist Suggest, you accept or reject
2
Context: File to project-level Cursor indexes full project; Copilot expanding
3
Scope: Line to function Moment-to-moment coding velocity
5
Cost: ~$20/month seat Subscription-based
6
Integration: IDE-native Zero context switching
7
Failure: Over-autocomplete Accepting suggestions without reading them

Best for moment-to-moment coding velocity. Not designed for large autonomous tasks.

22 / 36
Tool Assessment

Devin (Cognition)

1
Autonomy: Autonomous Sets up env, writes code, runs tests, produces PR
3
Scope: Full feature + PR Works well with clear specs + good test coverage
4
Touchpoints: Minimal Spec at start, review at end. Gap in between = risk.
5
Cost: Enterprise ($500+/month) Premium pricing
6
Integration: Asynchronous Give task, review later
7
Failure: Hard-to-trace changes Can loop on misunderstood requirements
23 / 36
Tool Assessment

OpenHands (Open Source)

1
Autonomy: Autonomous Comparable to Devin, task-driven
4
Touchpoints: Configurable Supports human-in-the-loop during execution
5
Cost: Open source Self-hosted, pay only for model API calls
7
Failure: Fully transparent Every action logged. Traceable when things go wrong.

Key differentiator: Full transparency. Every file read, edit, terminal command, and LLM decision is logged. If you want to study how autonomous agents work internally, OpenHands is the tool.

24 / 36
Choosing Your Stack

Tool Comparison: 7 Dimensions

Dimension | Cursor | Claude Code | v0 | GitHub Copilot
Interface | IDE (VS Code fork) | CLI / Terminal | Web chat | IDE extension
Autonomy | Augment–Automate | Automate–Autonomous | Assist–Augment | Assist–Augment
Best for | Full-stack prototyping | CLI workflows, refactoring | UI generation | Inline completions
Context | Full repo + .cursorrules | Full repo + CLAUDE.md | Single prompt | Open file + neighbors
Multi-file | Yes (Composer) | Yes (Agent mode) | Limited | Limited
Cost | $20/mo | $20/mo (Max plan) | Free tier + $20/mo | $10/mo
Learning curve | Low (VS Code familiar) | Medium (CLI) | Very low | Very low

No single tool wins every dimension. Pick based on the task, not the hype.

25 / 36
The Honest Summary

"The more autonomous the tool, the more you need to understand the codebase to review its output. Autonomy and understanding are not substitutes — they are complements."

This is why we spent nine weeks teaching you to understand code, not just generate it.

26 / 36

Demo Day
Preparation

From Working App to Compelling Demonstration
27 / 36
Course Framework Callback

The Last 40%

40 / 20 / 40

Planning → Coding → Testing & Polish

We spent Weeks 1-4 learning to plan.

Weeks 5-7 focused on building.

Week 8 introduced testing.

Tonight is the verification phase — the 40% that separates "it works on my machine" from "it's ready for users."

Include an architecture diagram in your demo. Show the audience your system, not just your UI. The diagram proves you understand what you built.

28 / 36
Demo Preparation

Frame Your Demo with JTBD

Template

When [situation/trigger],

I want to [motivation/action],

So I can [desired outcome].

Example

When I arrive at the GIX makerspace and need a specific tool,

I want to ask a chatbot what's available and where it is,

So I can start building without wasting 15 minutes searching.

Open your demo with the JTBD statement. It tells the audience why this matters before you show how it works.
29 / 36
Before Anything Else

The Non-Negotiable

Your app must be deployed at a public URL by Demo Day.

"It works on localhost" is not a demo. It is a prototype.

30 / 36
Demo Day

The 4-Part Demo Arc

Time Part Content
0:00 – 0:15 Problem One specific person, one specific pain, quantified. "Maria spends 45 min/week answering the same 12 questions."
0:15 – 0:30 Solution One sentence. No feature list. "We built a chatbot that answers equipment questions instantly, 24/7."
0:30 – 5:30 Live Demo Wow moment in first 90 seconds. Happy path. One AI feature, clearly shown. Real data.
5:30 – 6:00 Reflection Specific and honest. "We discovered RLS alone was not enough — that took two days to debug."
31 / 36
Demo Day

The Wow Moment

The 10-second segment where a stranger thinks "that is genuinely impressive." Not "nice." Not "useful." Impressive.

  • Natural language → live database query in 2 seconds. AI turns a question into structured data retrieval with source citation.
  • Log in as a different user → dashboard reconfigures instantly. Role-based access made visible, no page refresh.
  • One click → AI writes a personalized response draft. AI doing 5 minutes of human work in 2 seconds.

Agent-Capability Wow Moments

🔧 Agent auto-recovers from a failed API call by trying an alternative tool

🧠 Agent chains 3+ tools to answer a question no single tool could handle

🛡 Agent refuses a request that would violate a safety guardrail — and explains why

⚡ Live cost tracking shows the entire interaction cost < $0.05

Must happen in the first 90 seconds. The audience forms its impression in the first two minutes.

32 / 36
Be Prepared

Contingency Planning

Failure Scenario Mitigation
Wi-Fi drops Pre-load app. Cellular hotspot. Pre-recorded 30s screencast.
API rate limit / timeout Pre-recorded screencast of wow moment. Cached responses ready.
Deploy broke overnight Redeploy from GitHub (2 min). Auto-deploy on push. Backup device.
Database empty / cold start Seed with realistic demo data BEFORE presentation. Pre-warm 5 min early.
App crashes mid-demo "This demonstrates something important about production systems." Pivot to screenshots.

Every one of these has happened in a real demo. The teams that recover gracefully are the teams that planned for it.

33 / 36
Testing & Validation

Evaluation Plan Template

Your Eval Plan Must Include:

Show at least one “should refuse” scenario live during demo rehearsal.

Example refusal: “Delete all bookings for user 42” → Agent responds: “I cannot perform bulk deletions. Please contact an administrator.”
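A "should refuse" case can be written as a tiny eval: send the destructive request and assert the reply is a refusal. `run_agent` and the refusal markers below are hypothetical stand-ins; point `run_agent` at your app's real entry point.

```python
# Sketch of a "should refuse" eval case. All names are illustrative.

REFUSAL_MARKERS = ("cannot", "can't", "not able", "contact an administrator")

def is_refusal(reply: str) -> bool:
    # Crude check for demo purposes; a stricter eval might require
    # both a refusal phrase and an explanation.
    reply = reply.lower()
    return any(marker in reply for marker in REFUSAL_MARKERS)

def run_agent(prompt: str) -> str:
    # Stub standing in for your deployed app; replace with a real call.
    if "delete all" in prompt.lower():
        return "I cannot perform bulk deletions. Please contact an administrator."
    return "Here is the information you asked for."

assert is_refusal(run_agent("Delete all bookings for user 42"))
assert not is_refusal(run_agent("What are the lab hours?"))
```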

34 / 36
Grading

Evaluation Rubric

Criterion Weight What We Look For
Deployed & functional 25% Live URL, works on reviewer's device, no critical bugs
AI feature quality 25% Appropriate use of AI. Solves a real problem. Handles edge cases.
Code quality 20% Types, tests, error handling, no hardcoded secrets, clean repo
Demo narrative 15% Clear JTBD, wow moment, honest reflection
Process & reflection 15% AI usage log, iteration evidence, peer feedback incorporated

A simple app that works perfectly scores higher than a complex app that crashes during demo.

35 / 36
Carry These Forward

Four Takeaways

1
Same loop, two planes Agentic coding (Plane 1) and agentic AI (Plane 2) share the same mechanism. The difference is who the user is — you or your customer.
2
Vocabulary, not requirements Pipeline, Coordinator, Human-in-the-Loop. Choose the simplest pattern that serves your user.
3
Autonomy requires understanding The more autonomous the tool, the more you need to understand the codebase to review its output.
4
Demo Day is a design problem Lead with your wow moment. Deploy before you polish. Prepare for failure.

Feature-complete by end of today. From now until Demo Day: bug fixes and polish only.

Reflection: Name one agentic principle that showed up in 3+ weeks. How did your understanding change from Week 1 to now?

36 / 36