Tutorial: Your First Milestone

This is the core Michi loop. Every milestone — whether it’s a feature, a refactor, or a research task — follows the same three-phase cycle: plan → execute → debrief. Here’s what each phase looks like in practice.

Before you start

Your project is bootstrapped (CLAUDE.md, PROJECT.md, STATUS.md exist)
You have a small, well-scoped piece of work — a feature, bug fix, or research task
You’re in Paired mode (human present, tight loop). The first epic is always Paired.
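As a quick pre-flight check, the first prerequisite can be verified with a few lines of Python. This is only a sketch: it assumes the three files live at the project root, and the function name `missing_bootstrap_files` is hypothetical rather than part of Michi.

```python
from pathlib import Path

# File names are taken from the prerequisites above.
# Assumption: they sit directly in the project root.
BOOTSTRAP_FILES = ("CLAUDE.md", "PROJECT.md", "STATUS.md")


def missing_bootstrap_files(project_root: str) -> list[str]:
    """Return the bootstrap files that are absent from project_root."""
    root = Path(project_root)
    return [name for name in BOOTSTRAP_FILES if not (root / name).is_file()]
```

An empty list from `missing_bootstrap_files(".")` suggests the project is ready for planning; anything else names the files still to be created.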

Phase 1: Plan (`/michi-planning`)

Invoke the planning skill. The agent will:

Explore the codebase — map existing patterns, trace data flows, understand conventions
Surface assumptions — explicitly state what it’s assuming, then ask you to confirm or correct
Ask clarifying questions — about scope, contracts, integration points
Co-design verification scenarios — stories about users getting benefits, decomposed into Given-When-Then steps
Write the plan doc — steps, acceptance criteria, scenarios, and empty sections for decisions and notes
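To make the Given-When-Then decomposition concrete, here is one way a scenario could be represented and rendered. The `Scenario` class, its field names, and the rendered layout are illustrative assumptions, not a format that Michi prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A user-benefit story broken into Given-When-Then steps (hypothetical shape)."""
    title: str
    given: str
    when: str
    then: str

    def render(self) -> str:
        # One line per step, so each step can be checked on its own.
        return "\n".join([
            f"Scenario: {self.title}",
            f"  Given {self.given}",
            f"  When {self.when}",
            f"  Then {self.then}",
        ])


# Made-up example scenario for a CLI tool.
example = Scenario(
    title="Export report",
    given="a project with two saved reports",
    when="the user runs the export command",
    then="both reports appear in the output file",
)
print(example.render())
```

A scenario written this way is verifiable when the "Then" line describes something you can observe by running the software, not just something a unit test asserts.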

What to watch for

The assumption surfacing is the most valuable step. In one session, the agent said “add minimum needed” when the human had actually meant something broader. The human caught it: “that’s what I meant, though not what I said. Good catch.” Without the explicit assumption step, that gap between intent and expression would have become a gap in the implementation.

Review the plan before moving on. Does the scope match your intent? Are the scenarios verifiable? Is anything missing?

The output

A plan doc at `docs/epics/<epic>/plans/<milestone>.md` with:

`## Steps`: numbered implementation steps
`## Scenarios`: co-designed verification scenarios
`## Decisions`: empty (filled during execution)
`## Notes`: empty (filled during execution)
`## Discussion`: empty (filled during execution)
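The layout above can be sketched as a small generator. The five section names and the path pattern come from the list above; the title line, the bullet style for scenarios, and the function names `plan_path` and `plan_skeleton` are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def plan_path(epic: str, milestone: str) -> PurePosixPath:
    """Build docs/epics/<epic>/plans/<milestone>.md for the given names."""
    return PurePosixPath("docs/epics") / epic / "plans" / f"{milestone}.md"


def plan_skeleton(milestone: str, steps: list[str], scenarios: list[str]) -> str:
    """Render a plan doc with filled Steps/Scenarios and three empty sections."""
    parts = [f"# {milestone}", "", "## Steps", ""]  # title line is an assumption
    parts += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    parts += ["", "## Scenarios", ""]
    parts += [f"- {scenario}" for scenario in scenarios]
    # These sections start empty and are filled during execution.
    for name in ("Decisions", "Notes", "Discussion"):
        parts += ["", f"## {name}"]
    return "\n".join(parts) + "\n"
```

For example, `plan_path("auth", "m1")` yields `docs/epics/auth/plans/m1.md`, where `auth` and `m1` are made-up names.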

Phase 2: Execute (`/michi-session`)

Invoke the session skill. This skill is rigid — the discipline is the point.

The core loop is tight:

Change file → Run tests → See result → Iterate

The agent implements against the plan, running the test suite after every file change. A fast test suite (under 5 seconds) makes this natural. A slow suite breaks the rhythm.
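One iteration of that loop can be sketched in Python as "run the suite, time it, keep the output". This is a minimal sketch, not how the session skill is implemented: the default pytest command is an assumption about your project, and `SuiteRun` and `run_suite` are hypothetical names. Only the 5-second threshold comes from the text above.

```python
import subprocess
import sys
import time
from dataclasses import dataclass

FAST_SUITE_SECONDS = 5.0  # "under 5 seconds" threshold from the text above

# Assumption: the project uses pytest. Substitute your own test command.
DEFAULT_COMMAND = [sys.executable, "-m", "pytest", "-q"]


@dataclass
class SuiteRun:
    passed: bool
    seconds: float
    output: str  # kept so the result can be shown, not just claimed

    @property
    def breaks_rhythm(self) -> bool:
        """True when the suite is too slow to run after every change."""
        return self.seconds >= FAST_SUITE_SECONDS


def run_suite(command: list[str]) -> SuiteRun:
    """Run the test command once and record pass/fail, duration, and output."""
    start = time.perf_counter()
    proc = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return SuiteRun(proc.returncode == 0, elapsed, proc.stdout + proc.stderr)
```

Calling `run_suite(DEFAULT_COMMAND)` after each file change gives both the result and a timing signal; if `breaks_rhythm` is regularly true, speeding up the suite is probably worth doing before the next milestone.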

What to expect

Tests running constantly. The agent should run the full suite after every change rather than batching several changes between runs.
Decisions logged in real time. When the agent makes a choice — a library, an approach, a tradeoff — it logs it in the plan doc’s ## Decisions section with what was decided, what alternatives existed, and why.
Scope discipline. If the work is expanding beyond the plan, the agent should log it and check with you rather than silently expanding.
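A decision log entry, as described above, carries three things: what was decided, what alternatives existed, and why. The sketch below shows one possible shape and a helper that appends an entry to the `## Decisions` section of a plan doc's text. The bullet layout, the `Decision` class, and `append_decision` are assumptions for illustration, and the example values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    decided: str
    alternatives: tuple[str, ...]
    why: str

    def to_markdown(self) -> str:
        alts = ", ".join(self.alternatives) if self.alternatives else "none considered"
        return (
            f"- Decided: {self.decided}\n"
            f"  - Alternatives: {alts}\n"
            f"  - Why: {self.why}"
        )


def append_decision(plan_text: str, decision: Decision) -> str:
    """Return plan_text with the entry added at the end of '## Decisions'."""
    lines = plan_text.splitlines()
    if "## Decisions" not in lines:
        raise ValueError("plan doc has no '## Decisions' section")
    start = lines.index("## Decisions")
    # The section ends at the next '## ' heading, or at the end of the doc.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    # Insert before any blank lines that trail the section.
    insert_at = end
    while insert_at > start + 1 and lines[insert_at - 1].strip() == "":
        insert_at -= 1
    prefix = [""] if insert_at == start + 1 else []  # blank line after heading
    entry = decision.to_markdown().splitlines()
    return "\n".join(lines[:insert_at] + prefix + entry + lines[insert_at:]) + "\n"


plan = "## Decisions\n\n## Notes\n"
plan = append_decision(
    plan,
    Decision("use argparse", ("click", "typer"), "no new dependency needed"),
)
print(plan)
```

Writing the entry at the moment of the choice is the point; a helper like this only keeps the format consistent enough to review during the debrief.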

What to watch for

Premature “done.” This is the most common failure mode. In the first experiment, the agent declared a milestone complete after unit tests passed — and missed five gaps. The fix: explicit acceptance criteria in the plan doc, checked one by one.

“I believe it passes.” If the agent says something works without showing the output, push back. Evidence before claims. Run the command, read the output, then state the result.

Verification at the end. After all steps are complete, the agent runs the full verification checklist — not just unit tests, but the scenarios from the plan, a scope check (did the implementation stay within the plan?), and any cross-package checks.
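The "evidence before claims" rule can be expressed as a simple check: a criterion counts as met only when actual output is recorded against it. The `Criterion` class and the two helper functions below are hypothetical names for a sketch of the idea, not part of Michi, and the example criteria are made up.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str
    evidence: str = ""  # pasted command output; empty means not yet verified


def unverified(criteria: list[Criterion]) -> list[str]:
    """Descriptions of criteria that have no recorded evidence."""
    return [c.description for c in criteria if not c.evidence.strip()]


def milestone_done(criteria: list[Criterion]) -> bool:
    """Done only if there is at least one criterion and all have evidence."""
    return bool(criteria) and not unverified(criteria)


checklist = [
    Criterion("Unit tests pass", evidence="12 passed in 0.8s"),
    Criterion("Scenario 'Export report' works end to end"),
    Criterion("Implementation stayed within the plan's scope"),
]
print(unverified(checklist))
```

With this checklist, passing unit tests alone leaves two items open, which is the premature "done" situation described above.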

A real example

In a project building a CLI tool, the agent followed strict TDD throughout — 43 tests across 4 test files, all passing. A sustainability check mid-session caught duplicated code between formatters. The agent extracted the shared logic and re-ran tests. The discipline produced a clean, working tool in one session.

But tests aren’t everything. In another project, the agent completed four milestones with all unit tests passing. Then the human did Level C testing (actually running the tool with real hardware). Five bugs emerged that unit tests couldn’t have caught: an orphaned microphone process, oversized audio files, an SDK version mismatch, duplicate validators, and a path expansion issue. Every one was invisible to the test suite.

That’s why Michi doesn’t stop at “tests pass.”

Phase 3: Review (`/michi-debrief`)

Invoke the debrief skill after the milestone is committed.

The debrief reviews what happened:

Delivery assessment — what was planned vs. what was delivered? Acceptance criteria met?
Decision review — were the logged decisions reasonable? Any to reverse or codify?
Discussion triage — resolve items the agent flagged, defer what’s not ready, promote project-level questions
Knowledge capture — learnings go to the journal, patterns to patterns.md, rules to CLAUDE.md
Trust calibration — did trust increase or decrease? What autonomy level for next session?
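The knowledge-capture step is essentially a routing rule: learnings go to the journal, patterns to patterns.md, and rules to CLAUDE.md. A minimal sketch of that routing follows. The kind labels and the `route_learnings` function are assumptions, and "journal" is used as a label because the text does not name the journal's file.

```python
# Destinations come from the knowledge-capture item above.
# "journal" is a label only; the journal's actual location is not specified here.
DESTINATIONS = {
    "learning": "journal",
    "pattern": "patterns.md",
    "rule": "CLAUDE.md",
}


def route_learnings(items: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (kind, text) pairs by the doc each kind should be written to."""
    grouped: dict[str, list[str]] = {dest: [] for dest in DESTINATIONS.values()}
    for kind, text in items:
        if kind not in DESTINATIONS:
            raise ValueError(f"unknown kind: {kind!r}")
        grouped[DESTINATIONS[kind]].append(text)
    return grouped
```

For instance, `route_learnings([("learning", "any port should always test old vs new")])` places that note under `"journal"` and leaves the other two destinations empty.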

What makes the debrief valuable

The debrief closes the loop. Without it, learnings stay in the conversation (which gets compacted) instead of flowing into durable docs (which persist across sessions).

In one session, a comparison between old and new implementations — something neither human nor agent planned during the session — caught two real bugs: missing tool call summaries and broken attribution. The human’s debrief note: “any port should always test old vs new.” That learning, captured in the journal, now informs future sessions.

After your first milestone

You now have:

A plan doc with decisions and notes (the record of what happened)
Committed code with passing tests
A journal entry capturing what was learned

The next milestone’s planning will be better — you have context from this one. The agent knows the codebase patterns. The scenarios catalog has its first entries. This is the spiral: each iteration makes the next one richer.

If the work felt like it needed less ceremony, try michi-workshop next time — same discipline, lighter weight. If it felt like it needed more, the process scales up naturally with more scenarios, deeper sustainability checks, and explicit verification levels.
