Run an Epic or Project

In short

When work is too large for one pass — a feature set, a large refactor, a multi-step migration — you’re running an epic. An epic runs the iteration cycle at two scales: the whole effort is one spiral (Explore → Brainstorm → Plan → Execute → Verify → Document), and each milestone inside it runs the same cycle at smaller scope.

The skills that carry each phase:

/michi-explore — orient before the first plan; map the codebase, surface assumptions, agree on scope
/michi-planning — write the milestone contract (acceptance criteria, verification scenarios, numbered steps) before any code
/michi-session — execute with discipline: change → test → iterate, logging decisions in real time
/michi-debrief — review delivery, catch doc drift, promote learnings, calibrate trust for the next milestone

The first epic is always Paired — you stay engaged through each milestone. Entrusted autonomy is earned once the cycle has produced journals, scenarios, and verification infrastructure the agent can rely on.

Use this guide when you’re starting work that needs more than one session to land something complete.

A sample session

The following is an abridged session running a full-text search epic across five milestones — index schema, indexing pipeline, query API, search UI, and a final hardening pass. What follows shows the cycle’s shape: how scope gets settled before planning starts, how a library decision surfaces mid-execution and the user picks a direction, how a small tool-use correction lands without derailing progress, and how a mid-epic debrief catches doc drift before the next milestone plans against stale docs.

You: I want to add full-text search to the notes app.

Agent: Before I scope anything, let me look at how notes are stored and what the existing API layer looks like.

[Agent reads the notes model, the API route handlers, and the existing test structure.]

Notes are stored as rows in a relational database — title, body, created-at, tags as a JSON column. No existing search infrastructure. The API layer is a thin Express service; tests use a test database seeded from fixtures.

A few things to settle before I scope the milestones. First: scope of “full-text search” — body only, or title and tags too? Second: do results need ranking by relevance, or is filtering enough for now? Third: is this a server-side search or does the UI need to work offline?

You: Title and body. Relevance ranking yes, even if it’s basic. Server-side only — no offline requirement.

[A beat of clarification: the agent asks whether tags should be searchable and whether the existing tag-filter UI should merge with the new search, or stay separate.]

Agent: Got it — tags out of scope for now, tag-filter UI stays separate. That keeps M1 clean.

On the search backend: the database already has SQLite full-text search extensions available. That’s the low-friction path. The alternative is a standalone search library like Lunr or MiniSearch running server-side — more control over ranking, but more moving parts. My lean is to use the database’s built-in FTS — it’s already there, no new dependency. Worth confirming before I write the plan.

You: Agreed, start with the database FTS. We can always migrate later if ranking is too limited.

/michi-planning epic full-text-search

Agent: Writing the epic plan and M1 plan now.

[Agent creates docs/epics/full-text-search/plan.md and docs/epics/full-text-search/plans/m1-index-schema.md. M2–M5 plans are not drafted yet — convention is to write each plan at the close of the previous milestone.]

One thing worth flagging before /michi-session: M1 creates the FTS index and migrates existing notes into it. The migration runs once — if it fails midway, the index is in a partial state. I’ll write it as an idempotent migration so it’s safe to re-run, but confirm that running a migration in the test environment is acceptable.

You: /michi-session

Agent: Starting M1. Pre-flight first.

[Agent reads the existing migration files and the database schema to check for anything the plan didn’t account for.]

Pre-flight is surfacing a decision the plan didn’t fully resolve. Flagging before any source changes.

The FTS index needs a tokenizer setting. The database default is unicode61, which handles most cases well, but porter stemming would improve recall for English text (searching “running” finds notes with “run”). Three options: (A) keep unicode61 (default, safe); (B) add the porter stemmer now (better recall, small correctness risk on non-English notes); or (C) leave it configurable via an environment flag. Which direction?

You: Option B — porter stemming. The notes are English-only.

Agent: Logging the decision. Executing the migration and index setup.

[Agent writes the migration, creates the FTS virtual table, populates it from existing notes, adds a trigger to keep the index in sync on insert/update/delete, and runs 34/34 tests.]

You: why are you using sed to read the migration file? you have the Read tool.

Agent: You’re right — I defaulted to sed out of habit. Switching to Read with offset/limit from here.

[Agent re-reads the file with the Read tool and continues.]

M1 complete. All 8 acceptance criteria met. Migration is idempotent and tested against an empty database and a seeded one. Proposed commit.

You: /michi-planning m2

Agent: Planning M2 — the indexing pipeline (keeping the FTS index in sync as notes are created, edited, and deleted). Pre-flight done. Writing the plan.

[One decision surfaces during planning: the existing note-update path has two code sites that both write to the notes table — one in the API route handler and one in a background sync path. Both need to fire the index-update trigger. Decision is pre-recorded in the plan doc and flagged before session start.]

You: /michi-session

Agent: Starting M2. Pre-flight. Executing.

[Agent wires the FTS sync into both write paths, adds integration tests covering the background sync path explicitly, and runs 41/41 tests.]

M2 done. One finding to flag: the background sync path had no test coverage before this milestone — four tests added. Worth noting in the debrief.

[M3 follows the same planning → session rhythm, building the query API endpoint. During M3’s planning, the agent flags that pagination and sorting need to be settled before the endpoint contract is written — both are agreed before planning closes.]

[M3 executes and closes. A cross-cutting finding: the FTS snippet function returns raw offsets by default; wrapping them in highlight markers requires an explicit format string. This gets documented for M4, which builds the search UI.]

You: /michi-debrief

Agent: Before committing to scope — what shape debrief do you want for M1–M3? We’ve shipped three milestones with about 14 logged decisions and a few cross-cutting findings worth promoting. M4 and M5 are still pending.

Option A (recommended): full combined debrief of M1–M3 now — delivery, decisions, doc drift, archive candidates, memory updates. Then a delta debrief after M5.

Option B: short journal summary now, full debrief at epic close.

Option C: skip, go straight to M4.

I lean A — Pass 2’s “what docs lie now” is most valuable before M4 plans against the current docs. The background-sync coverage gap and the FTS snippet format finding both belong in the journal before they get lost.

You: a

Agent: Running three passes.

[Pass 2 finds one stale reference: ARCHITECTURE.md still describes the notes-search path as “planned.” Fixed. Pass 3 refreshes stamps on ARCHITECTURE.md and docs/epics/full-text-search/plan.md. Journal updated with the two cross-cutting findings. Memory updated.]

Debrief complete. Both findings are now in the journal — M4’s plan can reference the snippet format string directly rather than rediscovering it. Recommend same autonomy for M4 and M5 — Entrusted with paired check at close.

You: /michi-planning m4

Agent: Planning M4 — the search UI. Pre-flight confirms the query API from M3 matches what I’m planning to call from the frontend. Writing the plan.

Ready for /michi-session.

You: /michi-session

Agent: Starting M4. Pre-flight.

[Agent builds the search input component, wires it to the query API, renders highlighted snippets using the format string documented in M3’s journal entry, adds debounce, and runs 52/52 tests including UI interaction tests.]

M4 done. The search input, result list, and snippet highlighting are working end-to-end. Zero console errors. Proposed commit.

You: /michi-planning m5

Agent: Planning M5 — hardening. Pre-flight scans for edge cases the earlier milestones left open: empty-query handling, notes with no body, very long notes, special characters in search terms. Found two: empty queries currently return a 500 from the FTS layer, and notes with only a title (no body) are excluded from results due to a NULL coalesce issue. Both are minimal fixes. Plan written.

You: /michi-session

Agent: Starting M5. Pre-flight.

[Agent fixes the empty-query guard and the NULL coalesce, adds regression tests for both, runs 58/58 tests, and does a final end-to-end pass through the search UI with edge-case inputs.]

M5 done. Epic complete. Full-text search is live across index, pipeline, API, and UI, with hardening coverage. Proposed commit.

You: /michi-debrief

Agent: This is the epic close: five milestones, clean delivery. I’ll run the full three-pass debrief and write the artifact.

[Pass 2’s grep for stale references finds one more: README.md still says “search is not yet supported.” Fixed. Pass 3 archives the per-milestone plan docs and leaves the epic plan and journal in place. Patterns promoted: the idempotent migration pattern and the two-write-path sync discipline.]

Debrief complete. Epic closed.
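
The technical core of the session above (an idempotent migration, a porter-stemmed index, sync triggers, and M5’s two hardening fixes) can be sketched in a few lines. This is a minimal illustration under assumptions, not the session’s actual code: it assumes SQLite’s FTS5 extension is compiled in, and the table layout (notes with id, title, body), the index name notes_fts, and the function names migrate and search are all hypothetical.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Create the FTS index, backfill it, and install sync triggers.

    Every statement is guarded, so running the migration again changes nothing.
    """
    conn.executescript(
        """
        -- Assumed layout: one FTS5 row per note, keyed by the note id.
        CREATE VIRTUAL TABLE IF NOT EXISTS notes_fts USING fts5(
            title, body, tokenize = 'porter unicode61'
        );

        -- Backfill only the notes that are not indexed yet.
        INSERT INTO notes_fts (rowid, title, body)
        SELECT id, title, COALESCE(body, '') FROM notes
        WHERE id NOT IN (SELECT rowid FROM notes_fts);

        CREATE TRIGGER IF NOT EXISTS notes_fts_insert AFTER INSERT ON notes BEGIN
            INSERT INTO notes_fts (rowid, title, body)
            VALUES (new.id, new.title, COALESCE(new.body, ''));
        END;

        CREATE TRIGGER IF NOT EXISTS notes_fts_update AFTER UPDATE ON notes BEGIN
            UPDATE notes_fts
            SET title = new.title, body = COALESCE(new.body, '')
            WHERE rowid = new.id;
        END;

        CREATE TRIGGER IF NOT EXISTS notes_fts_delete AFTER DELETE ON notes BEGIN
            DELETE FROM notes_fts WHERE rowid = old.id;
        END;
        """
    )


def search(conn: sqlite3.Connection, query: str) -> list[int]:
    """Return the ids of matching notes, best match first."""
    if not query.strip():
        return []  # empty-query guard: never hand a blank string to MATCH
    # Quote the input as a single FTS5 phrase so punctuation in the
    # search term is not parsed as query syntax.
    phrase = '"' + query.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT rowid FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rank",
        (phrase,),
    ).fetchall()
    return [row[0] for row in rows]
```

Because the triggers live in the database, any code path that writes to the notes table keeps the index in sync. If a write path bypasses the table (a bulk import, say), it would need its own sync step, which is the kind of gap M2’s planning surfaced.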

The full guide

You have work that won’t fit in a single pass. It spans multiple milestones — maybe a new feature set, a large refactor, a migration, or a multi-chapter research effort. You need to plan it, execute it incrementally, and land something working at each stage.

That’s an epic.

The shape of an epic

An epic runs the iteration cycle at two scales simultaneously. The whole epic is one spiral: you Explore the problem space, Brainstorm a milestone structure, and Document what you learned so the next epic is better. Each milestone inside the epic is also a full cycle: its own plan, execution, verification, and debrief.

This is what “fractal” means in practice. The same six phases — Explore → Brainstorm → Plan → Execute → Verify → Document — apply whether you’re looking at the whole effort or a single two-day milestone. What changes is depth and ceremony, not the shape.

The phases

Explore

Before committing to a milestone structure, understand the problem space. For a brownfield epic (existing code, existing conventions), this phase is heavy: you’re reading the codebase, tracing data flows, and finding the load-bearing parts.

Run /michi-explore before writing the first milestone plan. The explore skill runs as a structured conversation: you and the agent orient together before anyone touches implementation. Surface what you know, what you’re assuming, and what you’d need to find out.

For a greenfield epic (new project, no existing code), Explore is lighter. You’re establishing intent, not mapping existing terrain. Even so, do it: a few minutes of shared orientation prevents the wrong-problem problem later.

Brainstorm

Shape the approach and break the work into milestones. Each milestone should deliver a working increment — not a half-built foundation, but something you can run, test, or read.

Two questions to answer in this phase:

What’s the right milestone sequence? (What does each milestone unlock for the next one?)
Where are the risky or uncertain parts? Those belong in early milestones, not late ones.

For a non-code epic — research, a roadmap, documentation — the milestone structure looks different but the logic is the same. Milestone 1 might produce a research synthesis; Milestone 2 the recommendation; Milestone 3 the implementation of that recommendation. Working increments, verifiable against exit criteria.

Plan

Invoke /michi-planning for each milestone. The plan doc is a contract — it defines scope, acceptance criteria, and verification scenarios before any implementation starts.

The skill will:

Explore the relevant code (or docs, or domain — whatever the milestone touches)
Surface assumptions explicitly — name them so you can confirm or correct them
Co-design verification scenarios with you — for code, these are Given-When-Then stories; for non-code, they’re exit criteria
Write the plan doc with numbered steps, scenarios, and empty sections for decisions and notes
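
To make the shape of the resulting plan doc concrete, here is a minimal sketch that renders an empty skeleton. The real output of /michi-planning may differ: the function name render_plan_skeleton, the heading names for criteria, scenarios, and steps, and the checkbox style are assumptions for illustration. Only the Decisions, Notes, and Discussion section names come from this guide.

```python
def render_plan_skeleton(title, criteria, scenarios, steps):
    """Render a milestone plan doc as Markdown text (hypothetical layout)."""
    lines = [f"# {title}", "", "## Acceptance criteria", ""]
    lines += [f"- [ ] {item}" for item in criteria]
    lines += ["", "## Verification scenarios", ""]
    lines += [f"- {item}" for item in scenarios]
    lines += ["", "## Steps", ""]
    # Steps are numbered so the session can refer to them by number.
    lines += [f"{n}. {item}" for n, item in enumerate(steps, start=1)]
    # These sections start empty and fill up during execution.
    for heading in ("Decisions", "Notes", "Discussion"):
        lines += ["", f"## {heading}", ""]
    return "\n".join(lines) + "\n"


plan = render_plan_skeleton(
    title="M1: index schema",
    criteria=["Migration is idempotent"],
    scenarios=[
        "Given a seeded database, when the migration runs twice, "
        "then each note is indexed exactly once"
    ],
    steps=["Write the migration", "Run the test suite"],
)
```

The empty trailing sections are there so that decisions and observations made mid-session have an obvious home in the doc.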

Plan docs survive context compaction. Conversation memory doesn’t. This is why the plan exists: when a long session rolls context off the window, the plan doc is still there. Everything essential goes in it.

Review the plan before moving on. Scope mismatch caught here costs minutes. Scope mismatch caught mid-execution costs hours.

Execute

Invoke /michi-session to run the milestone. The session skill is rigid — intentionally. The discipline is the point.

The core loop:

Change → Test → See result → Iterate

The agent implements against the plan, running the test suite after every change. It logs decisions in real time — when it chooses a library, an approach, or a tradeoff, the decision goes into the plan doc’s ## Decisions section with what was chosen, what alternatives existed, and why.
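
As an illustration of what a logged decision might look like, the sketch below appends an entry to a plan doc’s ## Decisions section. The entry layout (Chosen, Alternatives, Why) mirrors the three things named above, but the exact format and the function name log_decision are assumptions, not the skill’s actual behavior.

```python
def log_decision(plan_text: str, chosen: str, alternatives: list[str], why: str) -> str:
    """Return plan_text with a new entry at the end of '## Decisions'."""
    lines = plan_text.splitlines()
    start = lines.index("## Decisions")  # raises ValueError if the section is missing
    # The section runs until the next level-2 heading, or to the end of the doc.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    existing = [ln for ln in lines[start + 1:end] if ln.strip()]
    entry = [
        f"- Chosen: {chosen}",
        f"  - Alternatives: {'; '.join(alternatives)}",
        f"  - Why: {why}",
    ]
    section = ["## Decisions", ""] + existing + entry + [""]
    return "\n".join(lines[:start] + section + lines[end:]) + "\n"


plan = "# M1: index schema\n\n## Decisions\n\n## Notes\n"
plan = log_decision(
    plan,
    chosen="porter stemming",
    alternatives=["unicode61 default", "configurable via environment flag"],
    why="notes are English-only",
)
```

Writing the entry into the plan doc at the moment of choosing, rather than reconstructing it at debrief time, is what lets the decision survive context compaction.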

If work is expanding beyond the plan, the agent flags it and checks with you — no silent scope drift.

Verify

The milestone isn’t done when the implementation is done. Verify means: check the acceptance criteria, one by one. Run the scenarios. The session skill builds in a verification checklist; don’t skip it.

“Tests pass” is not Verify. A test suite that passes tells you the code does what the tests check. It doesn’t tell you whether the right things were checked. Verification runs the scenarios you co-designed in planning — stories about the system doing what it’s supposed to do.
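
The difference can be shown with a small sketch. Below, a hypothetical search function has a gap that a test written only against notes with bodies would never reach; the second scenario exposes it. The Scenario structure, the run_scenarios runner, and search_notes are all illustrative names, not part of the skills.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    name: str
    given: Callable[[], list]       # builds the starting state
    when: Callable[[list], list]    # performs the action, returns the outcome
    then: Callable[[list], bool]    # judges the outcome


def search_notes(notes, term):
    """Hypothetical system under test. It skips notes that have no body."""
    term = term.lower()
    return [
        n["id"]
        for n in notes
        if n["body"] and term in f"{n['title']} {n['body']}".lower()
    ]


def run_scenarios(scenarios):
    """Run every scenario and report each result by name."""
    return {s.name: bool(s.then(s.when(s.given()))) for s in scenarios}


def seeded_notes():
    return [
        {"id": 1, "title": "Morning", "body": "Went for a run"},
        {"id": 2, "title": "Groceries", "body": None},
    ]


results = run_scenarios([
    Scenario(
        name="a word in the body finds the note",
        given=seeded_notes,
        when=lambda notes: search_notes(notes, "run"),
        then=lambda found: found == [1],
    ),
    Scenario(
        name="a title-only note is findable by its title",
        given=seeded_notes,
        when=lambda notes: search_notes(notes, "groceries"),
        then=lambda found: found == [2],
    ),
])
```

Here the first scenario passes and the second fails, so the honest Verify outcome is “not ready”: the gap gets named and fixed rather than hidden behind a green test run.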

For non-code epics, Verify means the exit criteria from the plan: does this research answer the question? Does this doc cover what it set out to cover? “Not ready” is a valid Verify outcome — naming the gap beats forcing a false completion.

Document

Invoke /michi-debrief after the milestone is committed. This is what makes the spiral work.

The debrief reviews:

Delivery — what was planned versus what was delivered? Were all acceptance criteria met?
Decisions — were the logged decisions sound? Any to reverse, refine, or codify?
Knowledge capture — learnings go to the journal; patterns to docs/reference/patterns.md; rules to CLAUDE.md
Invalidation — did anything in the existing docs turn out to be wrong? Mark it.
Trust calibration — did the milestone go well enough to increase autonomy for the next one?

The debrief’s output becomes the next milestone’s starting context. That’s the spiral: each iteration makes the next one richer and faster. Without the debrief, learnings stay in the conversation — which gets compacted — instead of flowing into durable docs that persist across sessions.

Between milestones, run /michi-sustainability for a health check: test coverage, doc drift, code patterns, archive candidates. It’s not mandatory after every milestone, but don’t go more than a few milestones without it.

Paired and Entrusted

The first epic is always Paired. You’re present, engaged, and reviewing at each step. This isn’t ceremony — it’s how shared context gets established. The agent needs to understand the codebase conventions, your intent, and what “done” looks like for you. That understanding comes from tight loops, not from handing over and checking back.

Entrusted — wider autonomy, review at gates — is earned. Once the cycle has produced journals, scenarios, and verification infrastructure the agent can rely on, the agent can operate with more independence. Moving back to Paired when something is uncertain isn’t regression; it’s knowing when to slow down.

Sub-cases

Greenfield vs. brownfield

A brownfield epic invests more in Explore — there’s an existing codebase with conventions, debt, and a shape that matters. Plan docs should capture any conventions the agent discovers: naming patterns, test structure, existing abstractions. These observations belong in ## Notes; the important ones get promoted to CLAUDE.md during the debrief.

A greenfield epic establishes conventions instead of discovering them. The first milestone often sets patterns the rest of the epic follows. Pay attention to what you’re creating — these early decisions have long reach.

Code vs. non-code

A non-code epic — research, a roadmap, documentation — runs the identical cycle. The difference is in Verify: instead of test scenarios, you verify against exit criteria. Does the research answer the question? Is the decision backed by evidence? Is the doc complete?

Plan docs for non-code epics have the same structure: scope, exit criteria, numbered steps (of the research or writing process), and the same ## Decisions / ## Notes / ## Discussion sections. The agent logs decisions about framing, approach, and what to defer just as it would log technical decisions.

Pitfalls

Skipping the cycle on later milestones. It’s tempting to run multiple milestones in one session and skip the debrief between them. Don’t. Each milestone’s debrief feeds the next milestone’s planning. Skipping it means later milestones plan from stale context.

Silent scope drift. The scope of a milestone can evolve, and that’s legitimate as long as each change is logged and agreed when it comes up. Unlogged scope drift (the agent adds something “while it’s in there”) produces a plan doc that doesn’t match the code and a verification checklist that doesn’t cover what was actually built.

Wrong-approach misreads. If the Brainstorm phase doesn’t nail down the approach explicitly, the agent may read the task as “redesign the adapter” when you meant a “mechanical wrap.” Clarify the approach in Brainstorm and Plan, before execution starts. The cost of misreads rises steeply once code is written.

Flailing under pressure. When the same fix fails three times in a row, the instinct is to try a fourth variant. Stop instead. The problem is usually that the Explore phase isn’t done yet — the root cause hasn’t been found. Step back, reframe, and build a reliable reproduction first.

Verification deflection. “I believe it passes” and “pending your review” are not verification. Run the scenarios. Read the output. Then state the result.

A worked example

A team is modernizing a legacy web app. The work spans five milestones: data model update, API layer, UI refactor, integration with a new service, and a final health sweep.

Before Milestone 1, they run /michi-explore to map the existing codebase — data model shapes, API conventions, test patterns. The explore session produces an orientation doc and surfaces two assumptions: which parts of the data model are still actively used, and what the deployment pipeline expects from the API.

Milestone 1 (/michi-planning) co-designs scenarios against the new data model before any migration runs. Execution (/michi-session) runs migrations with tests after every change; the decision to use a migration library over hand-rolled SQL gets logged immediately. Verify checks all scenarios. Debrief (/michi-debrief) captures a gotcha about the ORM’s handling of nullable fields — it goes in the journal and into a new CLAUDE.md rule.

Milestone 2 plans against what Milestone 1 established. The agent already knows the ORM quirk. The scenarios are sharper because Milestone 1’s debrief refined them.

By Milestone 3, the trust calibration supports moving to Entrusted: the journal has context, the scenarios catalog has coverage, and the agent has demonstrated alignment over two full cycles. Milestone 3 runs with the agent operating more independently, with the human reviewing the debrief rather than each step.

The final milestone’s debrief runs a sustainability check, archives the milestone plans, and captures the patterns that will inform the next epic. The spiral closes.