Michi v2026.05.20
Save the Tokens

Context Capture

The quality of verification is bounded by the quality of shared context. An agent that understands how users interact with the app will write better scenarios than one that only understands the code structure.

For brownfield projects (existing code and users), context capture is a preparatory activity that produces reference artifacts the agent consumes during planning.

Capture methods, ranked from most to least useful in a verification experiment on a real web application:

  1. Playwright trace (best for your own apps) — structured data: actions, DOM snapshots, screenshots, network requests. Everything the agent needs in one artifact. The agent estimated a 3-5x ramp-up improvement over poking at the app manually.

  2. HAR + annotated screenshots — good supplement, useful when Playwright isn’t practical. HAR files capture every HTTP request/response with payload shapes and timing. Screenshots capture visual state. Together they show what happened and what it looked like.

  3. Video → keyframes + transcription — fallback for third-party apps where you can’t instrument the browser. Extract keyframes with ffmpeg, transcribe narration if present. Provides visual context without the structured data.

  4. Raw video — doesn’t work for agent consumption. In experiments, the agent ignored video files entirely. Even when video appeared to “work” in a prior session, investigation revealed the agent had actually processed extracted keyframes, not watched the video.
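The keyframe fallback (method 3 in the list above) could be sketched as follows. This is a minimal illustration, not a prescribed pipeline: the function names, the output filename pattern, and the choice to keep only I-frames are assumptions. It also assumes an `ffmpeg` build recent enough to accept `-fps_mode` (5.1 or later); older builds use `-vsync vfr` instead.

```python
import subprocess
from pathlib import Path


def keyframe_command(video, out_dir):
    """Build an ffmpeg argv that writes each I-frame (keyframe) as a PNG.

    The names and the output pattern are hypothetical; adjust to taste.
    """
    pattern = Path(out_dir) / "keyframe_%04d.png"
    return [
        "ffmpeg",
        "-i", str(video),
        # Keep only intra-coded frames. The single quotes are ffmpeg's own
        # filtergraph quoting, so the comma is not read as a filter separator.
        "-vf", "select='eq(pict_type,I)'",
        # Emit one image per selected frame instead of padding with duplicates.
        # Assumption: ffmpeg >= 5.1; on older builds use "-vsync", "vfr".
        "-fps_mode", "vfr",
        str(pattern),
    ]


def extract_keyframes(video, out_dir):
    """Run ffmpeg and return the extracted image paths in order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(keyframe_command(video, out_dir), check=True)
    return sorted(Path(out_dir).glob("keyframe_*.png"))
```

Handing the agent the resulting images, rather than the video file, matches what the experiment found the agent actually consumed.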

HAR files need reduction. Raw HAR captures include every asset (CSS, images, fonts). Strip to API calls only. Scrub auth tokens.
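A reduction pass along these lines might look like the sketch below. It relies on the standard HAR layout (`log.entries[]`, each with `request` and `response` objects). Everything else is an assumption to adapt: the rule that "API call" means a response whose MIME type contains `json`, the list of sensitive header names, and the function names.

```python
import copy

# Assumption: these header names carry credentials in the app being captured.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key"}


def is_api_entry(entry):
    """Heuristic: treat entries with a JSON response body as API calls."""
    content = entry.get("response", {}).get("content", {})
    mime = content.get("mimeType") or ""
    return "json" in mime.lower()


def scrub(message):
    """Redact sensitive header values and drop parsed cookies, in place."""
    for header in message.get("headers", []):
        if header.get("name", "").lower() in SENSITIVE_HEADERS:
            header["value"] = "REDACTED"
    if "cookies" in message:
        message["cookies"] = []


def reduce_har(har):
    """Return a copy of the HAR with only API entries, credentials scrubbed."""
    reduced = copy.deepcopy(har)  # leave the caller's capture untouched
    kept = [e for e in reduced["log"]["entries"] if is_api_entry(e)]
    for entry in kept:
        scrub(entry.get("request", {}))
        scrub(entry.get("response", {}))
    reduced["log"]["entries"] = kept
    return reduced
```

Load the capture with `json.load`, pass it through `reduce_har`, and write the result with `json.dump`. This sketch does not scrub tokens embedded in URL query strings or in request and response bodies, so those need a separate check before the file is shared.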

Screenshots at key states. A screenshot of every state transition is more useful than a screen recording. The agent can reference specific screenshots during scenario design.
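One way to get a screenshot per state transition is to drive the walkthrough as a list of named steps and capture after each one. The sketch below is an illustration under assumptions: `capture_states`, the step format, and the numbered filename scheme are hypothetical, and `page` is any object with a `screenshot(path=...)` method, which Playwright's `Page` provides.

```python
from pathlib import Path


def capture_states(page, steps, out_dir):
    """Run each (name, action) step, then save a numbered screenshot.

    `steps` is an iterable of (state_name, callable) pairs; each callable
    receives `page` and performs one state transition.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, (name, action) in enumerate(steps, start=1):
        action(page)  # perform the transition
        path = out / f"{index:02d}-{name}.png"
        page.screenshot(path=str(path))  # capture the resulting state
        saved.append(path)
    return saved
```

With Playwright's Python bindings, a step might be `("login", lambda p: p.click("text=Log in"))`, where the selector is a placeholder. Wrapping the same run in `context.tracing.start(screenshots=True, snapshots=True)` and `context.tracing.stop(path="trace.zip")` would produce a trace alongside the screenshots. The numbered names give the agent a stable way to refer to a specific state during scenario design.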

The capture session teaches the human too. Walking through the app to capture a trace is itself a learning activity — Kaner’s technique #8: “work alongside users to see how they work and what they do.”

For greenfield projects, there is no app to observe yet. Shared context comes from specs, mockups, analogous systems, and prototypes.

But greenfield doesn’t stay greenfield for long. After the first milestone, you have something running. Context capture becomes a between-milestone activity:

M1: Spec → Implement → Verify (lightweight, spec-based)
Context capture: use what M1 built, record interactions
M2: Richer context → Better scenarios → Better verification
Context capture: use what M1+M2 built...

Each iteration produces both working software and richer context for the next iteration’s verification. This is the spiral applied to verification quality.