Today: lecture (40 min) → live code review demo → hands-on lab (100 min)
Plan → Act → Observe → Iterate. Identical loop on both planes. The JSON structure is the same.
Plane 1 — User: the developer. Goal: build software.
Plane 2 — User: the customer. Goal: solve their problem.
Same JSON tool-use API. Same name, description, input_schema. Different plane.
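On either plane, a tool is declared with the same structure. A minimal sketch in the Anthropic tool-use shape; the `lookup_equipment` tool and its fields are illustrative, not from the course:

```python
# Sketch of a tool definition in the Anthropic tool-use shape.
# "lookup_equipment" and its fields are hypothetical examples.
lookup_equipment_tool = {
    "name": "lookup_equipment",
    "description": "Find a makerspace tool by name and return its location.",
    "input_schema": {
        "type": "object",
        "properties": {
            "tool_name": {"type": "string", "description": "e.g. 'laser cutter'"},
        },
        "required": ["tool_name"],
    },
}

# The same structure serves Plane 1 (tools your dev agent calls)
# and Plane 2 (tools your app exposes to its AI feature).
print(lookup_equipment_tool["name"])
```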
CLAUDE.md (Plane 1): tells Claude Code who you are, your tech stack, build commands, file structure, and conventions.
No CLAUDE.md = generic help.
Detailed CLAUDE.md = project-aware help.
System prompt (Plane 2): tells your app's AI feature its role, constraints, tone, available tools, and guardrails.
Vague system prompt = unreliable behavior.
Detailed system prompt = consistent behavior.
Context engineering is prompt engineering for your development tools. Both planes use the same instruction mechanism: structured text that shapes the agent's behavior.
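As a sketch, a minimal CLAUDE.md for a hypothetical project (stack, commands, and paths below are illustrative, not prescribed by the course):

```markdown
# CLAUDE.md — illustrative example
Stack: Next.js + TypeScript + Supabase.
- Build: `npm run build` · Test: `npm test`
- API routes live in `app/api/`; shared types in `lib/types.ts`
- Validate all request bodies with Zod schemas
- Never hardcode secrets; read them from environment variables
```

The same discipline applies to a Plane 2 system prompt: role, constraints, tools, guardrails, stated explicitly.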
Show of hands — Plane 1 or Plane 2?
The test: Who is the user of this AI interaction? Developer = Plane 1. Customer = Plane 2.
Most AI features — by some estimates, 80% — are single LLM calls. Start on the left. Move right only when you have a concrete reason.
Assembly line. Data flows through a strict sequence. Output of stage N = input of stage N+1.
Deterministic: you know exactly which stages run and in what order.
Tradeoff: fragile — if stage 2 fails, stage 3 never runs.
Real example: Contract processing — extract names/dates, cross-reference database, generate compliance report.
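A minimal sketch of the pipeline pattern with per-stage asserts; the stage functions are stubs standing in for LLM calls, and all names are illustrative:

```python
# Three-stage pipeline: stage N's output is stage N+1's input.
# Each stage asserts its input contract before doing work.
def extract(contract_text):
    # Stage 1: pull names/dates (stubbed; a real system calls an LLM here)
    return {"party": "Acme Corp", "date": "2024-06-01"}

def cross_reference(record):
    assert "party" in record and "date" in record  # input contract
    return {**record, "in_database": True}         # stubbed DB lookup

def report(record):
    assert "in_database" in record                 # input contract
    return f"{record['party']} ({record['date']}): compliant"

result = report(cross_reference(extract("...contract text...")))
print(result)  # Acme Corp (2024-06-01): compliant
```

Note the fragility tradeoff in miniature: if `cross_reference` raises, `report` never runs.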
Project manager. One LLM decides who to call. Specialists do the work. Coordinator synthesizes results.
Fan-out / fan-in. Independent specialists can run in parallel. Dependent ones must wait.
Real example: Research assistant — search agent finds papers, analyst extracts findings, writer composes summary.
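The coordinator pattern can be sketched as a router over specialists; here the delegation plan is hardcoded, where a real coordinator would let an LLM choose. All agent names are illustrative:

```python
# Coordinator delegates to specialists, then synthesizes (fan-in).
# Independent specialists could run in parallel; dependent ones wait.
def search_agent(question):  return ["paper_a", "paper_b"]
def analyst_agent(papers):   return {p: "finding" for p in papers}
def writer_agent(findings):  return f"Summary of {len(findings)} findings"

SPECIALISTS = {"search": search_agent, "analyze": analyst_agent, "write": writer_agent}

def coordinator(question):
    papers = SPECIALISTS["search"](question)
    findings = SPECIALISTS["analyze"](papers)  # depends on search output
    return SPECIALISTS["write"](findings)      # fan-in: synthesize

print(coordinator("What is known about X?"))  # Summary of 2 findings
```

The delegation test from the contract table applies here: given a prompt, does the coordinator pick the right specialist?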
Stamp-and-sign workflow. AI proposes, human approves or revises. Only then does the system execute.
When you need it:
Latency is a feature, not a bug. It is the moment where human judgment applies.
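A sketch of the approval gate, including the timeout question from the contract table. `get_approval` is a hypothetical stub; in production it would block on a UI prompt:

```python
# Human-in-the-loop gate: AI proposes, human approves, only then execute.
# Safe default on timeout: do nothing.
def get_approval(action, timeout_s=30):
    # Stub: simulate the user never responding within timeout_s.
    return None

def execute_with_gate(proposed_action):
    decision = get_approval(proposed_action)
    if decision is None:
        return "aborted: approval timed out"  # never execute unapproved actions
    return "executed" if decision else "revised"

print(execute_with_gate("send refund of $120"))  # aborted: approval timed out
```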
If your answer is "none of the above, just a multi-tool agent" — that is probably the right answer for a ten-week project. Choosing simplicity is a design skill.
Map Your Project: (1) Which Plane? (2) Which orchestration pattern? (3) What is your main evaluation focus?
| Pattern | Contract | Key Test |
|---|---|---|
| Pipeline | Stage-to-stage schemas (CSV spec, Zod, SQL) | Per-stage asserts: input valid → output valid |
| Coordinator | Tool input_schema definitions | Correct delegation: right tool for right prompt |
| Human-in-the-Loop | Approval gate interface (action + context shown) | Timeout behavior: what happens if user doesn’t respond? |
Every orchestration pattern has a contract and a test. If you can’t test it, simplify the pattern.
"Choose the simplest pattern that serves your user. A working demo of a simpler pattern beats a broken demo of a complex one."
Pipeline, Coordinator, Human-in-the-Loop — vocabulary, not requirements.
| # | Dimension | What It Measures |
|---|---|---|
| 1 | Autonomy level | How much can it do without you? Assist → Augment → Automate → Autonomous |
| 2 | Context window | How much of your codebase can it "see"? Determines task scope and coherence |
| 3 | Task scope | Single line? File? Feature? Sprint? |
| 4 | Human touchpoints | Where does a human NEED to be in the loop? |
| 5 | Cost model | Per token? Seat? Task? Real cost per unit of work? |
| 6 | Integration | CLI? IDE? CI/CD? API? How does it fit your workflow? |
| 7 | Failure mode | Silent errors? Hallucinated APIs? Overconfident refactoring? |
Tool-agnostic framework. Works for any AI coding tool — today's and next month's.
Assist → Augment: Agent starts suggesting actions, not just answering questions
Augment → Automate: Agent executes routine tasks without asking — human reviews output
Automate → Autonomous: Agent handles exceptions and edge cases independently — human sets goals only
Most production systems today operate at Augment or early Automate.
Best for moment-to-moment coding velocity. Not designed for large autonomous tasks.
Key differentiator: Full transparency. Every file read, edit, terminal command, and LLM decision is logged. If you want to study how autonomous agents work internally, OpenHands is the tool.
| Dimension | Cursor | Claude Code | v0 | GitHub Copilot |
|---|---|---|---|---|
| Interface | IDE (VS Code fork) | CLI / Terminal | Web chat | IDE extension |
| Autonomy | Augment–Automate | Automate–Autonomous | Assist–Augment | Assist–Augment |
| Best for | Full-stack prototyping | CLI workflows, refactoring | UI generation | Inline completions |
| Context | Full repo + .cursorrules | Full repo + CLAUDE.md | Single prompt | Open file + neighbors |
| Multi-file | Yes (Composer) | Yes (Agent mode) | Limited | Limited |
| Cost | $20/mo | $20/mo (Max plan) | Free tier + $20/mo | $10/mo |
| Learning curve | Low (VS Code familiar) | Medium (CLI) | Very low | Very low |
No single tool wins every dimension. Pick based on the task, not the hype.
"The more autonomous the tool, the more you need to understand the codebase to review its output. Autonomy and understanding are not substitutes — they are complements."
This is why we spent nine weeks teaching you to understand code, not just generate it.
40 / 20 / 40
Planning → Coding → Testing & Polish
We spent Weeks 1-4 learning to plan.
Weeks 5-7 focused on building.
Week 8 introduced testing.
Tonight is the verification phase — the 40% that separates "it works on my machine" from "it's ready for users."
Include an architecture diagram in your demo. Show the audience your system, not just your UI. The diagram proves you understand what you built.
When [situation/trigger],
I want to [motivation/action],
So I can [desired outcome].
When I arrive at the GIX makerspace and need a specific tool,
I want to ask a chatbot what's available and where it is,
So I can start building without wasting 15 minutes searching.
Your app must be deployed at a public URL by Demo Day.
"It works on localhost" is not a demo. It is a prototype.
| Time | Part | Content |
|---|---|---|
| 0:00 – 0:15 | Problem | One specific person, one specific pain, quantified. "Maria spends 45 min/week answering the same 12 questions." |
| 0:15 – 0:30 | Solution | One sentence. No feature list. "We built a chatbot that answers equipment questions instantly, 24/7." |
| 0:30 – 5:30 | Live Demo | Wow moment in first 90 seconds. Happy path. One AI feature, clearly shown. Real data. |
| 5:30 – 6:00 | Reflection | Specific and honest. "We discovered RLS alone was not enough — that took two days to debug." |
The 10-second segment where a stranger thinks "that is genuinely impressive." Not "nice." Not "useful." Impressive.
🔧 Agent auto-recovers from a failed API call by trying an alternative tool
🧠 Agent chains 3+ tools to answer a question no single tool could handle
🛡 Agent refuses a request that would violate a safety guardrail — and explains why
⚡ Live cost tracking shows the entire interaction cost < $0.05
Must happen in the first 90 seconds. The audience forms its impression in the first two minutes.
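The live cost-tracking wow moment reduces to simple arithmetic over token counts. A sketch; the per-million-token prices below are placeholders, so substitute your model's actual rates:

```python
# Live cost tracking sketch. Prices are assumed, not quoted from any vendor.
INPUT_PER_M, OUTPUT_PER_M = 3.00, 15.00  # assumed $ per 1M tokens

def interaction_cost(input_tokens, output_tokens):
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

cost = interaction_cost(input_tokens=2000, output_tokens=500)
print(f"${cost:.4f}")  # $0.0135, under the $0.05 budget
```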
| Failure Scenario | Mitigation |
|---|---|
| Wi-Fi drops | Pre-load app. Cellular hotspot. Pre-recorded 30s screencast. |
| API rate limit / timeout | Pre-recorded screencast of wow moment. Cached responses ready. |
| Deploy broke overnight | Redeploy from GitHub (2 min). Auto-deploy on push. Backup device. |
| Database empty / cold start | Seed with realistic demo data BEFORE presentation. Pre-warm 5 min early. |
| App crashes mid-demo | "This demonstrates something important about production systems." Pivot to screenshots. |
Every one of these has happened in a real demo. The teams that recover gracefully are the teams that planned for it.
Show at least one “should refuse” scenario live during demo rehearsal.
Example refusal: “Delete all bookings for user 42” → Agent responds: “I cannot perform bulk deletions. Please contact an administrator.”
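The refusal scenario can be enforced as a pre-execution guardrail that runs before any tool call. A minimal sketch; the pattern match and message are illustrative:

```python
import re

# Guardrail: refuse bulk destructive requests before any tool runs.
def guardrail(request: str):
    if re.search(r"\bdelete all\b", request, re.IGNORECASE):
        return "I cannot perform bulk deletions. Please contact an administrator."
    return None  # no objection; proceed to tool use

print(guardrail("Delete all bookings for user 42"))
```

A benign request ("Show bookings for user 42") passes the gate; the refusal fires only on the pattern it names, which keeps the live rehearsal predictable.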
| Criterion | Weight | What We Look For |
|---|---|---|
| Deployed & functional | 25% | Live URL, works on reviewer's device, no critical bugs |
| AI feature quality | 25% | Appropriate use of AI. Solves a real problem. Handles edge cases. |
| Code quality | 20% | Types, tests, error handling, no hardcoded secrets, clean repo |
| Demo narrative | 15% | Clear JTBD, wow moment, honest reflection |
| Process & reflection | 15% | AI usage log, iteration evidence, peer feedback incorporated |
A simple app that works perfectly scores higher than a complex app that crashes during demo.
Feature-complete by end of today. From now until Demo Day: bug fixes and polish only.
Reflection: Name one agentic principle that showed up in 3+ weeks. How did your understanding change from Week 1 to now?