michi-debrief

The debrief skill runs after an implementation session completes. It is how Michi iterates spirally rather than in circles — each debrief improves the next session by capturing learnings, evolving verification scenarios, and calibrating how much autonomy the agent has earned. It can run in the same session (benefits from retained context) or a fresh session (clean perspective, no compaction bias).

When to use

After completing a /michi-session milestone
After multiple milestones have accumulated without review
When you want to assess what was delivered versus what was planned
When decisions from a session need human review before the next milestone

What it does

Delivery assessment. The agent compares what was planned against what was delivered, checking each acceptance criterion explicitly. It tallies results — test counts for code, exit criteria for non-code — and verifies all milestones are committed.
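As a rough illustration of checking each acceptance criterion explicitly, the tally could look like the sketch below. The `Criterion` shape, the `assess_delivery` helper, and the sample criteria are hypothetical and are not part of the skill; the rule that a criterion claimed as met but lacking evidence counts as unverified is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One acceptance criterion from a milestone plan (hypothetical shape)."""
    description: str
    met: bool
    evidence: str = ""  # e.g. a test name or commit hash; empty means unverified


def assess_delivery(criteria: list[Criterion]) -> dict:
    """Tally criteria one by one instead of assuming the milestone is done."""
    met = [c for c in criteria if c.met and c.evidence]
    # Assumption: "met" without evidence is reported separately as unverified.
    unverified = [c for c in criteria if c.met and not c.evidence]
    missed = [c for c in criteria if not c.met]
    return {
        "met": len(met),
        "unverified": [c.description for c in unverified],
        "missed": [c.description for c in missed],
        "complete": len(met) == len(criteria),
    }


# Illustrative data only.
criteria = [
    Criterion("Export endpoint returns CSV", met=True, evidence="test_export_csv"),
    Criterion("Schema migration applied", met=True),
    Criterion("Docs updated", met=False),
]
report = assess_delivery(criteria)
print(report)
```

Separating "unverified" from "missed" keeps a claimed-but-unproven criterion from silently counting toward completion.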

Decision and discussion review. The agent reads the Decisions and Discussion sections from each milestone’s plan doc. For decisions: were they reasonable? Should any be reversed or codified as patterns? For discussion items: resolve now, defer with rationale, or promote to a project-level question.

Bug and gap analysis. For each bug found during the session, the agent asks what scenario would have caught it and writes that scenario in Given-When-Then format. This is how the verification set evolves — driven by actual failures, not imagined ones. Cross-package gaps get specific attention (missed schema updates, hardcoded assumptions).
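A scenario written during error analysis might be recorded along the lines of the sketch below. The `Scenario` class, its field names, and the example bug are illustrative assumptions; the skill itself only specifies the Given-When-Then format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A verification scenario derived from a bug found in the session."""
    title: str
    given: str
    when: str
    then: str
    source_bug: str  # the actual failure that motivated this scenario

    def render(self) -> str:
        """Format the scenario as Given-When-Then text."""
        return "\n".join([
            f"Scenario: {self.title}",
            f"  Given {self.given}",
            f"  When {self.when}",
            f"  Then {self.then}",
        ])


# Hypothetical example based on a cross-package gap.
scenario = Scenario(
    title="Schema update propagates to the API package",
    given="a new field added to the shared schema",
    when="the API package serializes a record",
    then="the new field appears in the response",
    source_bug="missed schema update across packages",
)
print(scenario.render())
```

Keeping `source_bug` alongside the scenario preserves the link back to the failure that justified it, which helps when deciding later whether the scenario should be retired.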

Process observations. What worked, what didn’t, where the skill guidance helped or hindered. Scenario quality gets its own assessment — were Level A scenarios useful? Too granular or too vague? Did the agent execute them faithfully? The debrief also determines the lifecycle of verification artifacts: promote new scenarios to the catalog, update scenarios broken by intentional changes, retire ones no longer relevant.

Knowledge capture. Domain learnings go to the epic’s journal. Process learnings go to patterns. Human interventions on code quality get captured as applied examples in docs/reference/code-style.md. Memory-worthy content — collaboration patterns, corrections, confirmed approaches — updates docs/memory.md.
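The routing described above can be pictured as a simple lookup from learning type to destination. This is a minimal sketch: the category keys and the `route` helper are hypothetical, while the destination files are the ones named in this document.

```python
# Category keys are assumptions; destinations come from the text above.
DESTINATIONS = {
    "domain": "journal.md (the epic's journal)",
    "process": "docs/reference/patterns.md",
    "code_quality": "docs/reference/code-style.md",
    "memory": "docs/memory.md",
}


def route(learnings: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (kind, note) pairs by the file each kind should be written to."""
    routed: dict[str, list[str]] = {}
    for kind, note in learnings:
        destination = DESTINATIONS[kind]  # raises KeyError for an unknown kind
        routed.setdefault(destination, []).append(note)
    return routed


# Illustrative notes only.
routed = route([
    ("domain", "Invoices can span two fiscal years"),
    ("process", "Run schema checks before cross-package edits"),
    ("code_quality", "Prefer early returns over nested conditionals"),
    ("process", "Escalate ambiguous decisions before coding"),
])
for destination, notes in routed.items():
    print(destination, notes)
```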

Trust calibration. The agent weighs signals that trust increased (all criteria met, decisions well-documented, no post-completion bugs) against signals that it decreased (premature “done” claims, skipped verification, unescalated decisions). It then recommends an autonomy level for the next session.
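One way to picture the calibration is as a tally of positive and negative signals. The sketch below is an assumption throughout: the signal identifiers paraphrase the examples above, the equal weighting is invented for illustration, and the output is a direction ("increase", "hold", "decrease") rather than one of the skill's actual autonomy levels, which this document does not enumerate.

```python
# Signal names paraphrase the text above; equal +1/-1 weighting is an assumption.
POSITIVE = {"all_criteria_met", "decisions_documented", "no_post_completion_bugs"}
NEGATIVE = {"premature_done_claim", "skipped_verification", "unescalated_decision"}


def recommend_autonomy(observed: set[str]) -> str:
    """Return a direction for the next session's autonomy, not a named level."""
    unknown = observed - POSITIVE - NEGATIVE
    if unknown:
        raise ValueError(f"unrecognized signals: {sorted(unknown)}")
    score = len(observed & POSITIVE) - len(observed & NEGATIVE)
    if score > 0:
        return "increase"
    if score < 0:
        return "decrease"
    return "hold"


print(recommend_autonomy({"all_criteria_met", "decisions_documented"}))
print(recommend_autonomy({"all_criteria_met", "skipped_verification"}))
```

In practice a single serious negative signal may reasonably outweigh several positive ones, so a real assessment would likely involve judgment rather than a flat count.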

What it produces

Journal entry in the epic’s journal.md — session summary, metrics, findings, learnings
Pattern updates to docs/reference/patterns.md — new patterns with high confidence
CLAUDE.md updates — durable, broadly applicable rules surfaced by the session
STATUS.md update reflecting current state and what’s next
Scenario catalog updates — new scenarios from error analysis, updated or retired existing ones
Code-style updates to docs/reference/code-style.md — applied examples from human interventions
Memory update to docs/memory.md — collaboration patterns and mental model changes
Trust recommendation for the next session’s autonomy level

Key things to know

Long sessions benefit from a fresh-session debrief — the agent’s compacted context may have lost nuance from early in the session.
The most valuable scenarios come from actual failures. Error analysis during debrief is how the verification set gets stronger over time.
The debrief is where human code-quality interventions get captured. If you corrected the agent’s approach during the session, the debrief ensures that calibration persists.
The natural next step is /michi-planning for the next milestone, or /michi-sustainability if accumulated work needs a broader health check.

For the full agent instructions, see the SKILL.md source.