A step-by-step walkthrough of the commands,
workflows, and daily patterns that make it work.
⚠ Evolving rapidly — some parts are still experimental. The core workflow — review, plan, build, deliberate — is stable. Some advanced features (memory persistence, lineage tracking, and routing) are still experimental and will change. Pin a version if you need stability; expect the experimental parts to move.
Start Here
It’s a set of commands and rules you layer on top of Claude Code. Instead of chatting with the AI directly, you use structured commands — /review, /plan, /deliberate — that route your request through a team of specialist AI agents, capture the reasoning behind every decision, and run a quality gate before any code is committed. You stay the decision-maker; the framework makes sure nothing important gets skipped or lost.
Every /command in this guide is typed into the Claude Code CLI — not your normal terminal. Claude Code is an Anthropic product and needs a paid account (Claude Pro or Max) to run the agents.
It doesn’t change how you write code — it changes what happens around it. If you don’t invoke the commands, none of the review, capture, or learning value fires.
Every decision and review is saved to discussions/. In three months the why behind a choice is what saves you — not the diff.
Who it’s for: anyone using Claude Code to build real software who wants independent review, a record of decisions, and guardrails — without writing that machinery themselves. Not sure yet? Slide 9 (“What Do I Do When…”) maps common situations straight to a command.
Before You Start
Most of the planning happens before you write a line of code. This is the suggested approach — research first, validate second, build last.
Explore your domain, then turn a raw idea into something concrete. The framework gives you tools for this stage — it isn’t all manual:
grill-me skill: have the agent interview you relentlessly (one question at a time) to pressure-test and build out a fuzzy idea before you commit.
deep-research (a Claude Code built-in): a fan-out, multi-source, fact-checked research report (or use Gemini / manual reading). Save findings as markdown.
/seed · /spawn-project: bootstrap that idea into a framework project (scaffold structures, wire hooks, first discussion).
Ideation, but framework-assisted, not off to the side.
Paste your research findings into /deliberate. The agent panel evaluates from architecture, security, UX, and anti-groupthink perspectives. One deliberation per major theme.
Bridges external research into framework-captured reasoning.
Use /discover-projects to find reference implementations on GitHub, then /analyze-project to score their patterns. If 3+ projects implement a pattern well, that’s strong evidence.
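The "3+ projects" rule of thumb can be sketched as a small check. This is only an illustration: the PatternSighting record and its field names are hypothetical, not the output format of /analyze-project.

```python
from dataclasses import dataclass

# The "3+ projects" threshold comes from the text above.
STRONG_EVIDENCE_MIN_PROJECTS = 3


@dataclass(frozen=True)
class PatternSighting:
    """Hypothetical record: one pattern observed in one analyzed project."""
    project: str
    pattern: str
    implemented_well: bool


def has_strong_evidence(sightings, pattern):
    """True when enough distinct projects implement the pattern well."""
    projects = {
        s.project
        for s in sightings
        if s.pattern == pattern and s.implemented_well
    }
    return len(projects) >= STRONG_EVIDENCE_MIN_PROJECTS
```

Counting distinct projects rather than raw sightings keeps one repository from counting twice.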
Grounds research against actual working code.
Run /deliberate again with all evidence — research, specialist findings, and project analysis scores. Produce ADRs (Architecture Decision Records) documenting your decisions with the alternatives you considered.
Synthesizes everything into traceable decisions.
Run /plan to produce a structured spec. Specialists review it. Once approved, /build_module takes over and builds it with integrated quality gates.
Getting Started
/commands
claude once and sign in.
gh auth login for /discover-projects, /ship
A ready-made team of AI specialists (security, architecture, QA, UX, and more), the commands that coordinate them, and automatic capture of every decision and review. Under the hood: 12 agents, 24 commands, 21 on-demand skills, 9 hooks, and the four-layer capture stack.
Before your first real task, open CLAUDE.md and fill in Project Identity, your stack, and (optionally) Domain Safety Constraints. This is what turns the template into your project. Keep it lean (<~200 lines).
git clone on the left. Good for trying it out.
/spawn-project inside Claude Code to scaffold a fresh project from the template, in its own folder.
Already have a codebase? Don’t clone: run /apply-framework. It assesses value/risk first, surfaces collisions, and (only on your go-ahead) deploys onto a back-out branch. For a deep takeover afterward it offers /onboard (maps code, reverse-engineers ADRs, builds a stabilization plan + debt ledger).
Daily Pattern
Build features, fix bugs, refactor. The framework doesn't change how you write code — it changes what happens after.
/review
The facilitator (the lead agent that runs every multi-agent workflow) assesses risk, assembles specialists, and produces a structured review with a verdict. A full panel runs in parallel and usually takes a couple of minutes.
Blocking findings must be fixed. Advisory findings are recommendations. The review report tells you which is which.
Run python scripts/quality_gate.py or just commit — the pre-commit hook runs it automatically.
If the quality gate passes and the review is clean, commit your code. The entire review is saved in discussions/ — your searchable record of why each decision was made.
For complex changes, run /walkthrough and /quiz to verify you understand the AI-generated code.
Core Command
The facilitator creates a discussion, assesses risk level, selects 2-5 specialists, dispatches them in parallel, collects findings, synthesizes a verdict, writes a report to docs/reviews/, and seals the discussion.
2-3 agents. QA + 1 domain specialist. Ensemble mode (specialists work in parallel).
3-4 agents. QA + Architecture + domain. Structured Dialogue mode (specialists see each other’s findings and respond).
4-5 agents. Full panel including Security + Independent Perspective.
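The three panel sizes above can be read as a lookup from risk tier to panel. The sketch below assumes the tiers are called low, medium, and high; those names are not given in the text. It also assumes the full panel contains the smaller panels' roles. The agent counts and modes are taken from the text.

```python
# Tier names ("low", "medium", "high") are assumptions for illustration.
PANELS = {
    "low": {
        "agents": (2, 3),
        "roster": ["QA", "domain specialist"],
        "mode": "Ensemble",
    },
    "medium": {
        "agents": (3, 4),
        "roster": ["QA", "Architecture", "domain specialist"],
        "mode": "Structured Dialogue",
    },
    "high": {
        "agents": (4, 5),
        # Assumed roster: the text only says the full panel includes
        # Security and Independent Perspective.
        "roster": ["QA", "Architecture", "domain specialist",
                   "Security", "Independent Perspective"],
        "mode": None,  # the text does not state a mode for this tier
    },
}


def panel_for(risk):
    """Return the panel description for a risk tier (case-insensitive)."""
    key = risk.strip().lower()
    if key not in PANELS:
        raise ValueError(f"unknown risk tier: {risk!r}")
    return PANELS[key]
```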
Approve · Approve with changes · Request changes · Reject
Building Features
The spec is reviewed by specialists before any code is written. Catching an architecture mistake at the spec stage is far cheaper than unwinding it from working code.
When a build task creates a new module, touches security code, or changes database schema, 2 specialists automatically review the code mid-build. Max 2 rounds per checkpoint.
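A rough sketch of the checkpoint rule, assuming each build task carries a set of tags. The tag names and the review_round callback are hypothetical; the trigger conditions and the two-round cap come from the text.

```python
# Hypothetical tag names for the three trigger conditions in the text.
CHECKPOINT_TRIGGERS = {"new_module", "security_code", "schema_change"}
CHECKPOINT_REVIEWERS = 2   # specialists per checkpoint
MAX_ROUNDS = 2             # maximum review rounds per checkpoint


def needs_checkpoint(task_tags):
    """True when any tag on the task matches a trigger condition."""
    return bool(set(task_tags) & CHECKPOINT_TRIGGERS)


def run_checkpoint(review_round):
    """Run up to MAX_ROUNDS reviews.

    review_round(round_no) is a caller-supplied function that returns
    True when the reviewers approve. Returns (status, rounds_used).
    """
    for round_no in range(1, MAX_ROUNDS + 1):
        if review_round(round_no):
            return ("approved", round_no)
    # Not approved within the cap: report it as an unresolved concern.
    return ("unresolved", MAX_ROUNDS)
```

A checkpoint that is still unresolved after the second round would surface in the build summary rather than loop forever.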
After building, you get a summary with: tasks completed, checkpoint results, unresolved concerns (if any), and a recommendation for the final /review.
Decision Making
Use /deliberate when you face a decision with multiple valid options and non-obvious trade-offs. The specialists bring different professional perspectives that surface considerations you might miss alone.
Each specialist analyzes the question from their domain (security, performance, architecture, etc.) without seeing others' answers.
Specialists see each other's findings and can refine their positions. Disagreements are surfaced, not hidden.
The facilitator synthesizes all perspectives into a recommendation with explicit trade-offs. You make the final call.
Every deliberation is sealed to discussions/ with a full event stream you can re-read later:
Six months later, you can read exactly why a decision was made.
Quick Reference
Research first (Phase 1), then /deliberate to pressure-test, then /plan to create a spec.
Use the grill-me skill — the agent interviews you one question at a time (with a recommended answer for each) until every branch is resolved, checkpointing decisions to disk as it goes.
Run /deliberate with both options. The specialist panel will surface trade-offs you might miss.
/discover-projects to find reference repos, then /analyze-project to learn from their approach and evaluate which patterns could benefit your project — with full attribution.
/review src/ — the specialist panel reviews from security, architecture, performance, and quality perspectives.
Write a regression test tagged @pytest.mark.regression, add it to the regression ledger (the running list that keeps fixed bugs from coming back), then /review the fix.
/walkthrough for a guided explanation, then /quiz to verify you actually get it before shipping.
/apply-framework assesses value/risk and deploys onto a back-out branch; it then offers /onboard for a deep takeover (maps code, reverse-engineers decisions, debt ledger).
/retro analyzes all discussions, surfaces recurring patterns, and proposes process improvements. Run /batch-evaluate to audit pending adoptions.
Learning & Growth
Generates a guided reading path through code you (or AI) wrote. Explains decisions, trade-offs, and how components interact. Progressive disclosure from mental model to implementation details.
Bloom's taxonomy quiz on code you're about to ship. Tests understanding at 4 levels: recall, application, analysis, and evaluation. Includes debug scenarios and change-impact questions.
Points the specialist team at an external project to evaluate patterns worth adopting. Produces a scored recommendation report. Patterns scoring 20+/25 are recommended.
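The 20+/25 cut-off amounts to a simple threshold on a total score. In this sketch the criterion names are hypothetical and only the threshold and maximum come from the text.

```python
RECOMMEND_THRESHOLD = 20  # "20+/25" from the text
MAX_SCORE = 25


def is_recommended(scores):
    """Sum per-criterion scores and compare against the threshold.

    `scores` maps a criterion name to its points; the names are up to
    the caller and are not defined by the text.
    """
    total = sum(scores.values())
    if not 0 <= total <= MAX_SCORE:
        raise ValueError(f"total {total} is outside 0..{MAX_SCORE}")
    return total >= RECOMMEND_THRESHOLD
```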
Searches GitHub for interesting projects to analyze. Filters by topic, language, or keywords. Checks for AI integration artifacts and ranks candidates for /analyze-project.
Continuous Improvement
Run at the end of each sprint. Queries all discussions from the period, identifies recurring patterns, evaluates adopted patterns (PENDING → CONFIRMED or REVERTED), and proposes process adjustments.
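The adoption lifecycle (PENDING → CONFIRMED or REVERTED) can be modelled as a tiny state machine. This sketch assumes CONFIRMED and REVERTED are final states, which the text does not say explicitly.

```python
from enum import Enum


class AdoptionStatus(Enum):
    PENDING = "PENDING"
    CONFIRMED = "CONFIRMED"
    REVERTED = "REVERTED"


# Assumption: only PENDING can move; the other two states are final.
ALLOWED_TRANSITIONS = {
    AdoptionStatus.PENDING: {AdoptionStatus.CONFIRMED, AdoptionStatus.REVERTED},
}


def transition(current, new):
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move {current.value} -> {new.value}")
    return new
```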
The big one. Quarterly assessment of framework effectiveness: agent scoring, architectural drift, rule updates, decision churn. Drives framework-level evolution.
Quick health check on all 5 pipeline layers. Reports on discussion volume, SQLite index completeness, findings extraction, pattern clustering, and curated memory currency.
/review captures findings. Agent reflections after each discussion note what worked and what didn't.
/retro aggregates findings, evaluates adoptions, surfaces stale advisories, and proposes adjustments.
/meta-review assesses the framework itself. Which agents are valuable? Which rules need updating? Is there architectural drift?
When a pattern proves valuable across multiple reviews, promote it to curated memory — the framework’s durable Layer 3 — with /promote. Requires your explicit approval (Principle #7).
Reviews all PENDING pattern adoptions from /analyze-project runs. Checks whether each pattern was actually implemented, verifies evidence, and presents verdicts for your approval.
Safety Net
Formatting (ruff format) → Linting (ruff check) → Tests (pytest) → Coverage (≥ 80%) → ADR completeness → Review existence → Regression ledger → BUILD_STATUS freshness (advisory)
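To picture the ordered gate, here is a minimal sketch of a runner for the first four steps. It is not the contents of scripts/quality_gate.py. The command flags, the stop-at-first-failure behavior, and the use of pytest-cov for the coverage threshold are all assumptions.

```python
import subprocess

# Illustrative commands only; the real gate's flags are not given in the
# text. The coverage flag assumes the pytest-cov plugin is installed.
GATE_STEPS = [
    ("format", ["ruff", "format", "--check", "."]),
    ("lint", ["ruff", "check", "."]),
    ("tests and coverage", ["pytest", "--cov", "--cov-fail-under=80"]),
]


def run_command(cmd):
    """Run one command and report whether it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def run_gate(steps=GATE_STEPS, runner=run_command):
    """Run steps in order; return the name of the first failure, or None."""
    for name, cmd in steps:
        if not runner(cmd):
            return name
    return None
```

Passing the runner as a parameter makes it easy to dry-run the ordering without ruff or pytest installed.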
Scans for 12 patterns (API keys, JWT, AWS, PATs) before any file write. Blocks the write if secrets are detected.
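A simplified picture of a scan-before-write guard. The three regular expressions below are illustrative examples of well-known token shapes, not the framework's 12 patterns, and guarded_write is a hypothetical helper.

```python
import re

# Illustrative patterns only; the framework's actual list is not shown
# in the text.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def find_secrets(text):
    """Return the names of every pattern that matches the text."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]


def guarded_write(path, text, writer):
    """Call writer(path, text) only when no secret pattern matches."""
    hits = find_secrets(text)
    if hits:
        raise PermissionError(f"write to {path} blocked: {', '.join(hits)}")
    writer(path, text)
```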
Prevents concurrent agent edits to the same file. Locks auto-expire after 120 seconds.
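One way to think about auto-expiring locks is a table of owner and expiry time per path. The in-memory class below is a hypothetical sketch; a real hook would need state shared across processes. Only the 120-second expiry comes from the text.

```python
import time

LOCK_TTL_SECONDS = 120  # expiry from the text


class FileLocks:
    """In-memory lock table: path -> (owner, expires_at)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}

    def acquire(self, path, owner):
        """Take or refresh the lock; False if another owner holds it."""
        now = self._clock()
        held = self._locks.get(path)
        if held and held[0] != owner and held[1] > now:
            return False
        self._locks[path] = (owner, now + LOCK_TTL_SECONDS)
        return True

    def release(self, path, owner):
        """Drop the lock, but only if this owner holds it."""
        held = self._locks.get(path)
        if held and held[0] == owner:
            del self._locks[path]
```

The injectable clock lets the expiry be checked without actually waiting two minutes.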
Every Python file is auto-formatted with ruff after every edit. You never commit unformatted code.
Blocks direct pushes to main. Create a feature branch, open a PR, then merge. Keeps your history clean.
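The text does not say how the block is implemented. One common approach is a Git pre-push hook, which receives one line per ref on standard input in the form "local_ref local_sha remote_ref remote_sha" and aborts the push by exiting non-zero. The function below is a sketch of that check under that assumption.

```python
PROTECTED_REF = "refs/heads/main"


def blocked_pushes(stdin_lines):
    """Return the local refs that would be pushed to the protected branch.

    Each line follows Git's pre-push format:
    <local ref> <local sha> <remote ref> <remote sha>
    """
    blocked = []
    for line in stdin_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # ignore blank or malformed lines
        local_ref, _local_sha, remote_ref, _remote_sha = parts
        if remote_ref == PROTECTED_REF:
            blocked.append(local_ref)
    return blocked
```

In a hook script, the caller would pass sys.stdin and exit with status 1 whenever the returned list is non-empty.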
BUILD_STATUS.md is auto-saved before context compaction and auto-loaded on resume. Work-in-progress survives sessions.
Async Autonomy
The agent keeps building. Gating decisions reach your phone. You tap to answer and it continues — no session needed. (Requires NTFY_TOPIC set in your .env — see setup.)
It needs your input before proceeding — an architectural choice, a risky operation, a confidence gate. Rather than blocking or guessing, it sends the question to your phone.
Up to 3 labeled options appear as push-notification buttons (via ntfy.sh, a free phone-notification service). Tap one — no app, no session, no context switch. Just a phone tap.
The reply is validated against an allow-list, matched to the question, and the agent continues. Free-text replies go to the developer-only channel; the agent never trusts raw reply text directly.
Use collab_loop check to look back at recent messages on the reply channel before starting a poll. This is the resume primitive — avoids missed answers.
The agent publishes to NTFY_TOPIC (MAIN). Your replies go to NTFY_TOPIC-reply (REPLY). On MAIN, the agent treats a message with an empty title as free-text from you; titled messages are the agent's own structured output. The two channels never cross.
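For orientation, here is a sketch of how a tap-to-answer question could be assembled for ntfy.sh using its Actions header, with each button posting its label to the reply topic. This is an assumption about the mechanics, not the collab_loop implementation. The title text and the topic name in the usage note are hypothetical.

```python
NTFY_BASE = "https://ntfy.sh"
MAX_BUTTONS = 3  # limit stated in the text


def build_ask(topic, question, options):
    """Build the URL, body, and headers for a question with buttons.

    Nothing is sent here. Each button is an ntfy "http" action that
    posts its own label to the <topic>-reply channel.
    """
    if not 1 <= len(options) <= MAX_BUTTONS:
        raise ValueError(f"need 1 to {MAX_BUTTONS} options")
    for label in options:
        # Keep labels simple so they cannot break the header syntax.
        if not label.isalnum():
            raise ValueError(f"label must be alphanumeric: {label!r}")
    reply_url = f"{NTFY_BASE}/{topic}-reply"
    actions = "; ".join(
        f"http, {label}, {reply_url}, body={label}" for label in options
    )
    return {
        "url": f"{NTFY_BASE}/{topic}",
        "data": question.encode("utf-8"),
        # Hypothetical title; a non-empty title marks agent output.
        "headers": {"Title": "Decision needed", "Actions": actions},
    }
```

A call such as build_ask("example-topic", "Use SQLite?", ["Yes", "No"]) returns the pieces that an HTTP client would POST to the main topic.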
ask — push a question with tap-to-answer buttons (max 3)
poll — stream live replies under a persistent Monitor
check — one-shot lookback for missed answers (resume primitive)
say — send a status or acknowledgement with no reply expected
Replies are unauthenticated. The framework validates every reply against a fixed allow-list before acting. Raw reply text is never passed to a command, path, or eval sink. The topic name is never printed in conversation output. See the collaborating-async skill and CLAUDE.md always-on invariants.
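The allow-list idea can be sketched in a few lines. The function name and the case-insensitive matching are assumptions; the point is that the caller only ever receives a value from its own list, never the raw reply text.

```python
def validate_reply(raw, allowed):
    """Map an untrusted reply onto one of the allowed options.

    Returns the matching entry from `allowed` (the trusted constant),
    or None when the reply matches nothing.
    """
    candidate = raw.strip().casefold()
    for option in allowed:
        if candidate == option.casefold():
            return option
    return None
```

Anything that returns None is simply ignored, so a crafted reply cannot reach a command, path, or eval sink.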
Release & Lineage
Full release workflow: quality gate verification, testing checklist, version bump, changelog generation, and rollback strategy. Everything you need to ship with confidence.
Shows how your project relates to the upstream template. Detects drift, validates the manifest, and reports divergence distance. Intentional divergences can be pinned as traits.
When you fork this template for a real project, the framework-lineage.yaml manifest tracks:
The Steward is the framework's institutional memory for genealogy. It knows where your project came from, how far it has diverged, and which divergences are intentional vs. accidental drift.
Reference
All /commands below are typed inside Claude Code; the python lines run in your terminal.
Reference
Skills are on-demand playbooks — the agent loads one automatically when the situation matches, so you rarely type them. A few (like grill-me) you can also invoke by name. There are 21 in .claude/skills/, plus Claude Code built-ins like deep-research; here are the ones worth knowing.
grill-me: relentless one-question-at-a-time interview to stress-test an idea or design.
deep-research (built-in): fan-out, multi-source, fact-checked research report.
searching-prior-art: grep for existing solutions / known-broken approaches before you build.
committing-changes: the full commit protocol (quality gate → review → education → commit).
selecting-review-gates: how risk tier picks the specialist panel and thresholds.
handling-micro-fixes: when a tiny change can skip /plan and /build_module.
recovering-from-failures: named recovery steps for the 8 failure classes (blocked hook, blocked commit, lost session state…).
wrapping-up-sessions: checkpoint a long session and write a paste-ready handoff before quality degrades.
notifying-the-developer: push a question to your phone and read the reply safely.
collaborating-async: the two-way loop that lets the agent keep working while you’re out (see Slide 13).
documenting-decisions · adr-writing: what to document where, and how to write an ADR.
syncing-framework-docs: keep specs and these decks in sync when the framework changes.
Not shown: domain playbooks (python-project-patterns, testing-playbook, security-checklist, performance-playbook), dispatch internals (cross-agent-dispatch, multi-instance-dispatch, orchestrating-lean-dispatch), running-build-checkpoints, and feature-status-registry.
Avoid These Traps
The most common mistake is treating this like vanilla Claude Code — typing “build X” and accepting the first answer. Then you pay for a framework you’re not using. The commands are the framework.
If you don’t invoke them, none of the capture, independent review, or education value fires. Use /plan → /build_module → /review.
/review
Never. The agent that wrote the code can’t objectively judge it; independent review is the whole point (Principle #4).
The durable asset is the reasoning in discussions/. In three months, the why is what saves you — not the diff.
Match ceremony to risk. Micro-fixes skip /plan and /build_module. Don’t burn a 5-specialist /deliberate on a label change.
When Claude asks a clarifying question before building, that’s the ~95% confidence gate (Principle #9) saving you wrong-path tokens. Answer it — don’t auto-dismiss it.
The education gate (/walkthrough + /quiz) keeps you the owner. If you can’t explain it back, don’t ship it.
To change how the framework behaves, use the evolution path (Steward gate → your approval → /review) — not ad-hoc edits that drift silently.
You set up CLAUDE.md during install — keep it lean (<~200 lines). Detail belongs in path-scoped rules and on-demand skills, not always-loaded context.
The framework adds structure to AI-assisted development
without slowing you down.
AI-Native Agentic Development Framework v3.5 · Diviner Dojo
diviner-dojo@gmail.com