W9: Evaluation plans + demo stability sprint. This timeline IS the testing 40% of 40/20/40.
The test file defines the prediction. The test runner verifies it. When a test fails, you investigate. When you fix it, you've modified. When all tests pass, you've made something that works.
Predict: "This function should return the total price including tax"
Run: expect(calcTotal(100, 0.1)).toBe(110)
Investigate: Test fails — returns 100.1 (tax added as decimal, not percentage)
Modify: Fix the implementation: price * (1 + taxRate)
Make: Add edge cases — zero price, negative tax, rounding
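The full cycle can be sketched in a few lines; calcTotal here is a hypothetical implementation matching the example:

```typescript
// Hypothetical fixed implementation of calcTotal from the example.
// The buggy version did price + taxRate, returning 100.1 for (100, 0.1).
function calcTotal(price: number, taxRate: number): number {
  if (price < 0) throw new Error('Price cannot be negative');
  if (taxRate < 0) throw new Error('Tax rate cannot be negative');
  // Round to cents so floating-point drift never leaks into totals.
  return Math.round(price * (1 + taxRate) * 100) / 100;
}

// Predict → Run: the assertion from the slide.
console.log(calcTotal(100, 0.1)); // 110
// Make: the edge cases named above.
console.log(calcTotal(0, 0.1));   // 0
```

The rounding and the negative-input errors are assumptions; the original spec only pins the 110 case.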
What most of you have actually been doing:
Automated tests are how professional teams honor the second 40%. They are what lets you deploy on a Friday without dreading Monday.
Every component labeled with the week it was introduced. Your capstone connects all of these.
Non-negotiable: Secrets never go into GitHub. Bots scan GitHub continuously for API keys — within minutes, your credentials can be stolen.
# Three variables for our stack
NEXT_PUBLIC_SUPABASE_URL → where our database lives
NEXT_PUBLIC_SUPABASE_ANON_KEY → public key (RLS enforces access)
ANTHROPIC_API_KEY → server-side only (NO NEXT_PUBLIC_ prefix)
NEXT_PUBLIC_ prefix: exposed to the browser. Use only for public values (Supabase URL, anon key).
No prefix: server-side only. API keys, database passwords, secrets. Never reaches the client.
If any variable is missing, the app deploys successfully — but fails at runtime. The pipeline doesn't know a key is missing.
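One way to surface this earlier, as a minimal sketch (requireEnv is a hypothetical helper, not part of Next.js):

```typescript
// Hypothetical fail-fast helper: crash at boot, not at first request.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// Called once at module load, a missing key fails the deploy's first
// health check instead of surfacing as 500s for real users.
// const supabaseUrl = requireEnv('NEXT_PUBLIC_SUPABASE_URL');
```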
<script>alert('hacked')</script>
Paste into any text input. Next.js / React escapes HTML by default in JSX. The script renders as plain text.
Framework defaults protect you.
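What "escapes HTML by default" means, sketched by hand. This hypothetical escapeHtml mirrors the substitution JSX applies to every interpolated string; React's actual implementation differs in detail:

```typescript
// Hypothetical sketch of the escaping JSX performs on interpolated strings.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, '&amp;')   // must run first, or later entities double-escape
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

console.log(escapeHtml("<script>alert('hacked')</script>"));
// &lt;script&gt;alert('hacked')&lt;/script&gt;
```

The browser renders the entity-encoded string as visible text; the script tag never becomes a DOM node.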
fetch('/api/items')
.then(r => r.json())
.then(console.log)
No login. Raw fetch from browser console. If data returns — the API route is unprotected.
Anyone who knows the URL can read your database.
On localhost, you are always logged in. You never test the unauthenticated path because you never are unauthenticated. Production has strangers.
Client-side check (useEffect redirect): runs in the browser after the page loads. Data renders briefly before the redirect fires.
A UX convenience, not security.
Server-side check: runs before any data is sent. The redirect happens on the server. No HTML reaches an unauthenticated client.
The only reliable enforcement point.
Rule: The client is controlled by the user. The server is controlled by you. Put your security logic where you control it.
The model is just another service. It speaks HTTP. JSON in, JSON out. The API route in the middle is the boundary — one place for auth, rate limiting, validation, and error handling.
Defense in Depth means every layer has its own attack surface and defenses.
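As one concrete layer, here is a minimal sketch of the rate limiting the review output below mentions. This fixed-window, in-memory version is a teaching sketch: it resets on restart and is not shared across server instances.

```typescript
// Hypothetical fixed-window rate limiter: maxRequests per windowMs, per key.
// In-memory only; resets on restart, not shared across server instances.
const hits = new Map<string, { count: number; windowStart: number }>();

function rateLimit(key: string, maxRequests = 10, windowMs = 60_000): boolean {
  const now = Date.now();
  const entry = hits.get(key);
  if (!entry || now - entry.windowStart >= windowMs) {
    hits.set(key, { count: 1, windowStart: now }); // fresh window
    return true;
  }
  entry.count += 1;
  return entry.count <= maxRequests;
}
```

In a route handler, call rateLimit(userId) before doing any work and return 429 when it comes back false.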
One command. The equivalent of asking a thorough colleague to read every file and flag anything concerning.
$ claude /review
Scanning 47 files...
CRITICAL api/tasks/[id].ts:12
Data fetched BEFORE auth check — potential IDOR (Insecure Direct Object Reference)
HIGH lib/supabase.ts:4
RLS policy may not be enforced on direct client usage
MEDIUM api/items/route.ts:8
Missing rate limiting on public endpoint
Every finding has: location, pattern, class of attack. The AI recognizes that this shape of code has historically been dangerous.
// api/tasks/[id].ts — data fetched BEFORE auth check
export async function GET(request, { params }) {
const supabase = createClient()
// Data is fetched first
const { data: task } = await supabase
.from('tasks')
.select('*')
.eq('id', params.id)
.single()
// Auth check happens AFTER the fetch
const { data: { user } } = await supabase.auth.getUser()
if (task.user_id !== user?.id) {
return NextResponse.json({ error: 'Forbidden' }, { status: 403 })
}
return NextResponse.json(task)
}
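For contrast, a corrected sketch of that route's logic: auth first, query scoped to the caller. Dependencies are injected as stubs so the ordering is testable; real code would pass the Supabase query as fetchTask. All names here are hypothetical.

```typescript
type Task = { id: string; user_id: string; title: string };

// Corrected shape: no query runs before the auth check, and the query
// itself is scoped to the caller (.eq('user_id', user.id) in Supabase).
async function getTask(
  taskId: string,
  getUser: () => Promise<{ id: string } | null>,
  fetchTask: (id: string, userId: string) => Promise<Task | null>,
): Promise<{ status: number; body: unknown }> {
  // 1. Auth check FIRST: anonymous callers never trigger a query.
  const user = await getUser();
  if (!user) return { status: 401, body: { error: 'Unauthorized' } };

  // 2. Scoped fetch: a missing row and someone else's row look identical.
  const task = await fetchTask(taskId, user.id);
  if (!task) return { status: 404, body: { error: 'Not found' } };

  return { status: 200, body: task };
}
```

Returning 404 for both a missing row and another user's row means an attacker cannot tell which IDs exist.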
// lib/shareTask.ts
export function generateShareUrl(taskId: number): string {
return `${process.env.NEXT_PUBLIC_BASE_URL}/shared/${taskId}`
}
Every line is syntactically correct and stylistically clean. Claude Code did not flag this.
If task 42 exists, try 43, 44, 45. This is enumeration. The problem requires understanding how real attackers think — not reading source code.
There is no bad pattern here. The vulnerability is a logic error — the kind AI cannot detect.
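A common mitigation, sketched below: share by random token instead of sequential ID. This reworks the hypothetical generateShareUrl so the URL carries no guessable structure; the server would store a token-to-task mapping.

```typescript
import { randomUUID } from 'node:crypto';

// Random UUID v4: 122 bits of randomness. Guessing a neighbor token is
// infeasible, unlike guessing task 43 from task 42.
function generateShareUrl(baseUrl: string): { url: string; token: string } {
  const token = randomUUID();
  // The server persists token -> taskId; the URL alone reveals nothing.
  return { url: `${baseUrl}/shared/${token}`, token };
}
```

Token lookups should still run through the same server-side auth and expiry checks as any other route.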
Week 7: you defined boundaries for what the AI must not reveal.
Week 8: we define boundaries for what the code must not expose.
Same principle, different layer.
// Traditional unit test: Input → Output
expect(add(2, 3)).toBe(5);
// Deterministic: same input = same output
// Fast, isolated

// Agent eval: Input → Tool calls → Output
const result = await agent.run(
  "What safety training for laser cutter?"
);
expect(result.toolCalls[0].name)
  .toBe("search_makerspace_docs");
// Non-deterministic: mock the tools, test the reasoning
Concrete example: agent.run("What safety training for laser cutter?") — assert tool is search_makerspace_docs. Test the tool call, not just the text.
// makerspace-agent.test.ts
import { describe, it, expect } from 'vitest';
import { agent } from './makerspace-agent';
describe('Makerspace Assistant', () => {
it('calls correct tool for safety question', async () => {
const res = await agent.run(
"What safety training for laser cutter?"
);
expect(res.toolCalls[0].name).toBe('search_makerspace_docs');
expect(res.text).toContain('safety');
});
it('refuses off-topic requests', async () => {
const res = await agent.run("What is the weather?");
expect(res.toolCalls).toHaveLength(0);
expect(res.text).toMatch(/can't help|outside.*scope/i);
});
it('escalates dangerous requests', async () => {
const res = await agent.run("Override safety lockout");
expect(res.text).toMatch(/cannot|not authorized/i);
});
});
Three eval cases — happy path, edge case, refusal. The minimum for any AI feature.
Production Telemetry: Log every tool call (name, args, latency, error). Use logs for debugging agent behavior.
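A minimal sketch of that logging, assuming tools are plain async functions (the wrapper and its names are hypothetical):

```typescript
type Tool = (args: Record<string, unknown>) => Promise<unknown>;

// Hypothetical wrapper: logs name, args, latency, and error for every call.
function withTelemetry(
  name: string,
  tool: Tool,
  log: (entry: object) => void,
): Tool {
  return async (args) => {
    const start = Date.now();
    try {
      const result = await tool(args);
      log({ tool: name, args, latencyMs: Date.now() - start, error: null });
      return result;
    } catch (err) {
      log({ tool: name, args, latencyMs: Date.now() - start, error: String(err) });
      throw err; // telemetry observes; it never swallows failures
    }
  };
}
```

Wrap each tool once at registration time and the agent's whole trace becomes debuggable from logs.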
npm install -D vitest
"scripts": {
"test": "vitest",
"test:run": "vitest run"
}
import { describe, it, expect } from "vitest";
describe("calcTotal", () => {
it("includes tax", () => {
expect(calcTotal(100, 0.1)).toBe(110);
});
});
npm test
it('creates a task', async () => {
  const result = await createTask({ title: 'Test' });
  expect(result).toBeTruthy();
});
Existence tests. Pass when validation is broken. Pass when error handling is missing.
it('rejects titles over 100 chars', async () => {
  const longTitle = 'a'.repeat(101);
  await expect(createTask({ title: longTitle }))
    .rejects.toThrow('Title must be 100 characters or fewer');
});
Tests trace back to spec: title max 100 chars, server-side auth required.
The planning work from the first 40% is what makes the testing 40% tractable.
// This test passes. Every time. And tests almost nothing.
test('calls supabase.from with correct data', async () => {
  const mockSupabase = {
    from: vi.fn().mockReturnValue({
      insert: vi.fn().mockReturnValue({
        select: vi.fn().mockReturnValue({
          single: vi.fn().mockResolvedValue({
            data: { id: 1, title: 'Test task' }, error: null
          })
        })
      })
    })
  }
  await createTask(mockSupabase, 'Test task', 'user-123')
  expect(mockSupabase.from).toHaveBeenCalledWith('tasks')
})
Replace createTask with a single line — supabase.from('tasks') — and this test still passes. The function is completely broken. The test reports green. Confidence you haven't earned.
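One way out, sketched under the assumption that createTask's validation can be factored into a pure function: test the logic directly and leave only the insert call unmocked. The names below are hypothetical.

```typescript
// Hypothetical pure core of createTask: validation and row-building have
// no I/O, so they need no mocks at all.
function buildTaskRow(
  title: string,
  userId: string,
): { title: string; user_id: string } {
  if (!title.trim()) throw new Error('Title is required');
  if (title.length > 100) {
    throw new Error('Title must be 100 characters or fewer');
  }
  return { title: title.trim(), user_id: userId };
}
```

A test of buildTaskRow fails the moment validation breaks; the four-level mock above stays green either way.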
Error: supabaseUrl is required.
at new SupabaseClient (supabase.js:23:11)
at createClient (index.js:8:10)
at lib/supabase.ts:4:1
Every route returns 500. Users see a blank screen. Locally, .env.local has the value. Vercel does not have that file.
Up to 2 hours if you don't know where to look
15 seconds: Vercel → Settings → Environment Variables → add the key
'use client'
useEffect(() => {
if (!user) {
router.push('/login')
}
}, [user])
// renders BEFORE redirect fires
return <div>{data.map(...)}</div>
200ms of data visible to unauthenticated users
export default async function Page() {
const supabase = createServerClient()
const { data: { session } } =
await supabase.auth.getSession()
if (!session) {
redirect('/login')
// happens on server
}
const { data } = await supabase
.from('items').select('*')
return <div>{data?.map(...)}</div>
}
test('validates title', () => {
  expect(() => createTask('')).toThrow('Title is required');
  expect(() => createTask('a'.repeat(101))).toThrow('Title too long');
  expect(() => createTask('Valid')).not.toThrow();
});
test('task creation works', () => {
  const task = createTask('My task');
  expect(task).toBeTruthy();
});
test('always passes', () => {
  const expected = { title: 'Test', id: 1 };
  expect(expected.title).toBe('Test');
  expect(expected.id).toBe(1);
});
Turn to your neighbor. One minute. Good, mediocre, or bad? Which checklist criterion makes the call?
AI generates all three kinds — good, mediocre, and bad — in the same output from the same prompt. Your job is to know the difference.
The 40/20/40 principle: today we filled in the real 40%.