Michi v2026.05.20
Save the Tokens

Patterns & Anti-Patterns

Reusable lessons from running Michi. Each entry is tagged with a confidence level: high = repeatedly observed, medium = context-dependent, low = prescription, not yet validated.


Confidence: High — the single most effective quality gate

Run the test suite after every file change. A sub-second unit test suite enables this without friction. If your suite takes longer than 5 seconds, it’s too slow — split into fast (unit) and slow (integration) tiers.

The tight loop — change → test → see result → iterate — is where quality comes from. A sub-second vitest suite lets the agent fall into this loop naturally.
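One way to act on the 5-second budget is to look at per-file timings and pick the slowest files as candidates for the slow tier. The sketch below is illustrative only: `findSlowTierCandidates`, the shape of the timing data, and the file names are assumptions, not part of Michi or vitest.

```javascript
// Hypothetical helper: given per-file test durations (ms), pick the files to
// move to a slow tier so the remaining fast tier fits within the budget.
function findSlowTierCandidates(timings, budgetMs = 5000) {
  // Sort slowest first so as few files as possible are moved.
  const sorted = [...timings].sort((a, b) => b.ms - a.ms);
  let total = sorted.reduce((sum, t) => sum + t.ms, 0);
  const slow = [];
  for (const t of sorted) {
    if (total <= budgetMs) break;
    slow.push(t.file);
    total -= t.ms;
  }
  return { slow, fastTotalMs: total };
}

// Example with made-up timings.
console.log(
  findSlowTierCandidates([
    { file: "repo.test.js", ms: 4200 },
    { file: "parse.test.js", ms: 300 },
    { file: "api.test.js", ms: 1800 },
  ]),
);
```

The greedy choice (slowest first) is a simplification; in practice the split usually follows the unit versus integration boundary rather than raw timing.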

Confidence: High — well-patterned codebases are the biggest autonomy accelerator

A codebase with consistent conventions lets the agent produce correct code by following existing examples: mirroring a sibling file (plugins-web.js, plugins-chat.js), matching test patterns, replicating repository patterns.

Before running Michi, invest in making your codebase’s patterns explicit and consistent. The agent amplifies whatever conventions exist — good or bad.

Confidence: High

Use Explore subagents to deeply analyze relevant code before writing anything. Map interfaces, trace data flows, understand existing patterns. This front-loaded investment saves significant implementation time.

P4: Incremental milestones with working increments


Confidence: High

Each milestone should produce something that works end-to-end, even if only partially. Verification is possible at each step, bugs surface early, and you can review tangible progress.

Confidence: High

Write the plan before implementing. Plans serve as scope agreements, self-checks for completeness, progress trackers, and decision records. They survive context compaction better than conversation memory. More on plan docs →

P6: Document judgment calls as they happen


Confidence: Low — prescription, not always practiced

When the agent makes an autonomous decision, document it immediately in the plan doc. Retroactive documentation works while context is hot, but becomes unreliable in longer sessions. The mandatory ## Decisions section in plan docs is the mechanism.
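Recording a decision the moment it is made can be as small as appending one line under the `## Decisions` heading. The following is a minimal sketch that treats the plan doc as a string; `recordDecision` and the `(why: ...)` entry format are assumptions for illustration, not a Michi API.

```javascript
// Hypothetical helper: add a decision entry to the "## Decisions" section of
// a plan doc held as a string. The entry format is an assumption.
function recordDecision(planText, decision, rationale) {
  const entry = `- ${decision} (why: ${rationale})`;
  const lines = planText.split("\n");
  const start = lines.findIndex((l) => l.trim() === "## Decisions");
  if (start === -1) {
    // No section yet: create it at the end of the doc.
    return `${planText.trimEnd()}\n\n## Decisions\n\n${entry}\n`;
  }
  // The section ends at the next level-2 heading, or at the end of the file.
  let end = lines.findIndex((l, i) => i > start && l.startsWith("## "));
  if (end === -1) end = lines.length;
  // Insert after the last non-blank line of the section.
  let insertAt = end;
  while (insertAt > start + 1 && lines[insertAt - 1].trim() === "") insertAt--;
  lines.splice(insertAt, 0, entry);
  return lines.join("\n");
}

const plan = "# Plan\n\n## Decisions\n\n- Use vitest (why: already configured)\n\n## Milestones\n\n- M1\n";
console.log(recordDecision(plan, "Skip retry logic", "out of scope for M1"));
```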

P7: Grep for caller assumptions when adding entry points


Confidence: High

When adding a new caller to an existing pipeline: grep for hardcoded values related to caller identity, check all validation schemas for enum values needing updates, check downstream services for assumptions about who’s calling. This class of bug — missing enum value, hardcoded identity — is a frequent escapee of unit tests.
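The grep step can be sketched as a scan for quoted occurrences of existing caller names. Everything here is hypothetical: `findCallerAssumptions`, the file names, and the caller identifiers `web` and `api` are placeholders, and a plain `grep` over the repo does the same job.

```javascript
// Hypothetical scan: report lines that mention an existing caller identity,
// since each one may hide an assumption that a new caller breaks.
function findCallerAssumptions(files, knownCallers) {
  const hits = [];
  for (const [path, content] of Object.entries(files)) {
    content.split("\n").forEach((text, index) => {
      for (const caller of knownCallers) {
        // Match quoted string literals only, e.g. "web" or 'web'.
        if (text.includes(`"${caller}"`) || text.includes(`'${caller}'`)) {
          hits.push({ path, line: index + 1, caller, text: text.trim() });
        }
      }
    });
  }
  return hits;
}

// Made-up file contents standing in for a real repo.
const exampleHits = findCallerAssumptions(
  {
    "schema.js": 'const source = { enum: ["web", "api"] };',
    "billing.js": "if (req.source === 'web') applyWebDiscount();\nreturn total;",
  },
  ["web", "api"],
);
for (const h of exampleHits) {
  console.log(`${h.path}:${h.line} [${h.caller}] ${h.text}`);
}
```

Each hit is a place to ask whether the new caller needs an enum entry or a branch of its own.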

Confidence: Medium

Post a structured message (Slack or similar) at milestone completion: what was built, test results, deviations from plan, whether the agent is blocked or continuing. Good for async review.
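A sketch of what such a message could look like when assembled in code. The field names and layout are assumptions; `formatMilestoneReport` is a hypothetical helper, and posting the text to Slack or another channel is left out.

```javascript
// Hypothetical formatter for a milestone-completion message. Field names are
// assumptions; adapt them to whatever your channel expects.
function formatMilestoneReport({ milestone, built, tests, deviations, status }) {
  const list = (items) =>
    items.length ? items.map((i) => `  - ${i}`).join("\n") : "  - none";
  return [
    `Milestone complete: ${milestone}`,
    "Built:",
    list(built),
    `Tests: ${tests.passed} passed, ${tests.failed} failed`,
    "Deviations from plan:",
    list(deviations),
    `Status: ${status}`, // "blocked" or "continuing"
  ].join("\n");
}

console.log(
  formatMilestoneReport({
    milestone: "M2",
    built: ["chat plugin skeleton"],
    tests: { passed: 42, failed: 0 },
    deviations: [],
    status: "continuing",
  }),
);
```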

Confidence: High — a class of bug that keeps surfacing

When updating an API version — or any contract-changing dependency migration — audit the adjacent dimensions, not just the one you migrated. Endpoint paths, response shapes, auth headers, rate limits, pagination, error semantics. The bug is almost never in the dimension you deliberately changed; it’s in the neighbor you didn’t think to re-check.

After any migration, write the audit list before declaring done. Each item gets a one-line check: “I confirmed <dimension> still works under the new version.” If you can’t confirm one, that’s the one that will bite.
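The audit list above can be generated rather than remembered. This is a minimal sketch: the dimension names come from the text, while `buildMigrationAudit` and the checkbox layout are assumptions.

```javascript
// Dimensions named in the text; extend the list for your own migration.
const DIMENSIONS = [
  "endpoint paths",
  "response shapes",
  "auth headers",
  "rate limits",
  "pagination",
  "error semantics",
];

// Hypothetical helper: build the audit list and surface unconfirmed items.
function buildMigrationAudit(confirmed) {
  const seen = new Set(confirmed);
  const lines = DIMENSIONS.map(
    (d) => `[${seen.has(d) ? "x" : " "}] I confirmed "${d}" still works under the new version.`,
  );
  const unconfirmed = DIMENSIONS.filter((d) => !seen.has(d));
  return { lines, unconfirmed, done: unconfirmed.length === 0 };
}

const audit = buildMigrationAudit([
  "endpoint paths",
  "response shapes",
  "auth headers",
  "rate limits",
  "error semantics",
]);
console.log(audit.lines.join("\n"));
console.log("Unconfirmed:", audit.unconfirmed);
```

Anything left in `unconfirmed` blocks the "done" claim until it has been checked against the new version.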

P10: Doc-update-after-feature (before claiming done)


Confidence: High

After adding a CLI flag, schema change, environment variable, or public-API change, update the README and relevant examples before declaring done. Docs drift is how users get confused in the very next session; the agent writes the right code and the wrong instructions.

Treat the doc update as part of the feature, not post-completion polish. A feature that behaves correctly but has wrong docs is not done — the user who reads the docs next can’t reproduce the behavior.


Confidence: High — the core risk

The agent writes code, then writes tests for that code. The tests validate the agent’s implementation against the agent’s understanding. This is circular. More on the verification approach →

Confidence: High — simple to prevent, expensive when it happens

Always run git branch before exploring any repo. An agent exploring the wrong branch builds an incomplete picture and wastes cycles.
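The check can be made mechanical by parsing the text that `git branch` prints, where the current branch is the line marked with `*`. The helpers below are hypothetical and take the command output as a string, so running the command itself is left to the caller.

```javascript
// Hypothetical guard: find the current branch in `git branch` output
// (the line that starts with "* ").
function currentBranch(gitBranchOutput) {
  const line = gitBranchOutput.split("\n").find((l) => l.startsWith("* "));
  return line ? line.slice(2).trim() : null;
}

// Throw early if exploration is about to start on the wrong branch.
function assertOnBranch(gitBranchOutput, expected) {
  const actual = currentBranch(gitBranchOutput);
  if (actual !== expected) {
    throw new Error(`Expected branch "${expected}" but found "${actual}"`);
  }
  return actual;
}

const sampleOutput = "  main\n* feature/chat-plugin\n  release\n";
console.log(assertOnBranch(sampleOutput, "feature/chat-plugin"));
```

On recent Git versions, `git branch --show-current` prints just the branch name and removes the need for parsing.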

Confidence: High

A --dry-run flag that skips the real API call tests everything except the most important thing. Smoke tests must use real calls.

AP4: Batching milestones in a single session


Confidence: High

Milestones rushed through a single autonomous stretch accumulate bugs that full plan-implement-verify treatment would have caught.

Each milestone gets full treatment. If you’re batching, you’re scoping wrong.

Confidence: High

Context compaction evicts the Read tool cache, and Edit requires a preceding Read. In late sessions, re-read before every edit, or use Write (full file replacement) instead of Edit.

AP6: Mocked MongoDB tests for operator semantics


Confidence: High

Mocking MongoDB for unit tests catches whether the right method was called, but misses operator conflicts (e.g., a field in both $set and $setOnInsert). Use mongodb-memory-server for repository-level tests.
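To make the failure mode concrete, the sketch below detects the kind of update document that a mock accepts silently: a path that appears under both `$set` and `$setOnInsert`, or a parent and child path split across the two. `findOperatorConflicts` is a hypothetical, simplified check for illustration; it does not replace running the update against mongodb-memory-server.

```javascript
// Simplified illustration: find paths that appear in both $set and
// $setOnInsert, or where one path is a parent of the other.
function findOperatorConflicts(update) {
  const setPaths = Object.keys(update.$set ?? {});
  const insertPaths = Object.keys(update.$setOnInsert ?? {});
  const overlaps = (a, b) =>
    a === b || a.startsWith(`${b}.`) || b.startsWith(`${a}.`);
  const conflicts = [];
  for (const s of setPaths) {
    for (const i of insertPaths) {
      if (overlaps(s, i)) conflicts.push({ set: s, setOnInsert: i });
    }
  }
  return conflicts;
}

// A mocked updateOne would record this call as correct.
const update = {
  $set: { status: "active", "profile.name": "Ada" },
  $setOnInsert: { status: "new", createdAt: 0 },
};
console.log(findOperatorConflicts(update));
```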

AP7: Agent declares “done” before verification


Confidence: High

The agent’s “done” is typically narrower than yours — “code written + unit tests pass” — while yours is broader: all verification scenarios covered, docs updated, manual verification run. The fix: explicit acceptance criteria in the plan doc, checked one by one.
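Checking criteria one by one is easier when the plan doc can be parsed. This sketch assumes the acceptance criteria are written as Markdown task-list items (`- [ ]` and `- [x]`), which is an assumption about the plan doc format; `uncheckedCriteria` is a hypothetical helper.

```javascript
// Hypothetical check: assumes acceptance criteria are Markdown task-list
// items ("- [ ] ..." or "- [x] ...") in the plan doc.
function uncheckedCriteria(planText) {
  return planText
    .split("\n")
    .map((l) => l.match(/^\s*[-*] \[( |x|X)\] (.+)$/))
    .filter((m) => m && m[1] === " ")
    .map((m) => m[2].trim());
}

const criteria = [
  "- [x] Unit tests pass",
  "- [ ] README updated",
  "- [ ] Manual verification run",
].join("\n");

const remaining = uncheckedCriteria(criteria);
console.log(remaining.length === 0 ? "done" : `not done: ${remaining.join(", ")}`);
```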

Confidence: Medium

Building adapters for undocumented formats produces fragile code. Mark these as explicitly fragile, add extensive comments, and don’t refactor without end-to-end testing against real data.

Confidence: High

Git branch operations relayed as verbal step-by-step instructions between human and agent accumulate errors and can end with branches reversed or state corrupted. Use scripts for branch operations.

AP10: Agent doesn’t search outside its workspace


Confidence: Medium

The agent won’t look in sibling repos for tools or utilities unless told. A “tools and resources” section in CLAUDE.md listing relevant utilities eliminates this.


TaskCreate/TaskUpdate is useful for structured milestones with multiple steps but becomes overhead in rapid work. Use task tracking for milestones with 5+ steps. Skip for small work.

Too many questions slow the agent. Too few lead to wrong assumptions. Ask during planning (scope, criteria, ambiguities). During implementation, decide and document. Ask only when genuinely blocked.

Commit after each milestone passes verification. This creates save points and makes review easier. Don’t commit mid-milestone: partial commits are worse than no commits.