Practical Guide

How to Use the Agentic Development Framework

A step-by-step walkthrough of the commands, workflows, and daily patterns that make it work.

⚠ Evolving rapidly — some parts are still experimental. The core workflow — review, plan, build, deliberate — is stable. Some advanced features (memory persistence, lineage tracking, and routing) are still experimental and will change. Pin a version if you need stability; expect the experimental parts to move.

What This Actually Is

It’s a set of commands and rules you layer on top of Claude Code. Instead of chatting with the AI directly, you use structured commands — /review, /plan, /deliberate — that route your request through a team of specialist AI agents, capture the reasoning behind every decision, and run a quality gate before any code is committed. You stay the decision-maker; the framework makes sure nothing important gets skipped or lost.

1. It runs inside Claude Code

Every /command in this guide is typed into the Claude Code CLI — not your normal terminal. Claude Code is an Anthropic product and needs a paid account (Claude Pro or Max) to run the agents.

2. The commands are the framework

It doesn’t change how you write code — it changes what happens around it. If you don’t invoke the commands, none of the review, capture, or learning value fires.

3. The reasoning is the asset

Every decision and review is saved to discussions/. In three months the why behind a choice is what saves you — not the diff.

Who it’s for: anyone using Claude Code to build real software who wants independent review, a record of decisions, and guardrails — without writing that machinery themselves. Not sure yet? Slide 9 (“What Do I Do When…”) maps common situations straight to a command.

From Idea to Code: Five Phases

Most of the planning happens before you write a line of code. This is the suggested approach — research first, validate second, build last.

1. Seed → 2. Anchor → 3. Scout → 4. Reconcile → 5. Specify

1. Seed — Research & Bootstrap Your Idea

Explore your domain, then turn a raw idea into something concrete. The framework gives you tools for this stage — it isn’t all manual:

grill-me skill — have the agent interview you relentlessly (one question at a time) to pressure-test and build out a fuzzy idea before you commit.
deep-research (a Claude Code built-in) — a fan-out, multi-source, fact-checked research report (or use Gemini / manual reading). Save findings as markdown.
/seed · /spawn-project — bootstrap that idea into a framework project (scaffold structures, wire hooks, first discussion).

Ideation — but framework-assisted, not off to the side.

2. Anchor — Pressure-Test with Specialists

Paste your research findings into /deliberate. The agent panel evaluates from architecture, security, UX, and anti-groupthink perspectives. One deliberation per major theme.

Bridges external research into framework-captured reasoning.

3. Scout — Find Real-World Validation

Use /discover-projects to find reference implementations on GitHub, then /analyze-project to score their patterns. If 3+ projects implement a pattern well, that’s strong evidence.

Grounds research against actual working code.
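
The "3+ projects" rule can be sketched as a small tally. This is only an illustration: the tuple layout and the function name are hypothetical, and it assumes "implemented well" means a score of at least 20 out of 25, the recommendation bar this guide gives for /analyze-project.

```python
from collections import defaultdict

# Thresholds taken from this guide; everything else is a hypothetical sketch.
RECOMMEND_SCORE = 20          # "Patterns scoring 20+/25 are recommended"
STRONG_EVIDENCE_PROJECTS = 3  # "If 3+ projects implement a pattern well"


def strong_patterns(results):
    """Return patterns that score well in at least three distinct projects.

    `results` is an iterable of (project, pattern, score) tuples, an assumed
    shape rather than the real /analyze-project output format.
    """
    projects_by_pattern = defaultdict(set)
    for project, pattern, score in results:
        if score >= RECOMMEND_SCORE:
            projects_by_pattern[pattern].add(project)
    return sorted(
        pattern
        for pattern, projects in projects_by_pattern.items()
        if len(projects) >= STRONG_EVIDENCE_PROJECTS
    )
```

Counting distinct projects (a set, not a list) keeps one repository analyzed twice from counting as two pieces of evidence.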

4. Reconcile — Make Decisions

Run /deliberate again with all evidence — research, specialist findings, and project analysis scores. Produce ADRs (Architecture Decision Records) documenting your decisions with the alternatives you considered.

Synthesizes everything into traceable decisions.

5. Specify — Create the Executable Plan

Run /plan to produce a structured spec. Specialists review it. Once approved, /build_module takes over and builds it with integrated quality gates.

This is where coding begins.

Setup: From Clone to First Command

# 1. Clone, then create an isolated environment
cd agent_framework_template
python -m venv .venv # then activate it
# macOS/Linux: source .venv/bin/activate
# Windows: .venv\Scripts\activate
# 2. Install + initialize (downloads a ~90MB model on first run)
pip install -r requirements.txt
cp .env.example .env # optional: phone push via ntfy.sh, etc.
python scripts/init_db.py
# 3. Verify (you should see tests pass)
pytest tests/ -v
# 4. Now launch Claude Code IN this folder, sign in, then
# type your first command at its prompt:
claude # starts the Claude Code session
/deliberate "What should I build first here?"

Prerequisites

Required:
Claude Code — the CLI that runs all /commands
A paid Anthropic account (Claude Pro or Max) — required; the agents won’t run without it. After install, run claude once and sign in.
Recommended:
VS Code — editor with Claude Code integration
GitHub CLI (gh) — run gh auth login for /discover-projects, /ship

What You Get

A ready-made team of AI specialists (security, architecture, QA, UX, and more), the commands that coordinate them, and automatic capture of every decision and review. Under the hood: 12 agents, 24 commands, 21 on-demand skills, 9 hooks, and the four-layer capture stack.

Make It Yours

Before your first real task, open CLAUDE.md and fill in Project Identity, your stack, and (optionally) Domain Safety Constraints. This is what turns the template into your project. Keep it lean (<~200 lines).
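
If you want a quick check on the "keep it lean" advice, a few lines of Python are enough. This is a hypothetical helper, not something the framework is known to ship; the 200-line limit is the rough figure from this guide and is treated here as a soft ceiling.

```python
from pathlib import Path

SOFT_LINE_LIMIT = 200  # rough ceiling suggested in this guide


def claude_md_line_count(path="CLAUDE.md"):
    """Count the lines in the given file (defaults to CLAUDE.md in the cwd)."""
    return len(Path(path).read_text(encoding="utf-8").splitlines())


def is_lean(path="CLAUDE.md", limit=SOFT_LINE_LIMIT):
    """True when the file stays at or under the soft line limit."""
    return claude_md_line_count(path) <= limit
```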

Three Ways In

Work in the template — the clone from the setup steps above. Good for trying it out.
Brand-new project — run /spawn-project inside Claude Code to scaffold a fresh project from the template, in its own folder.
Existing codebase — see “For Existing Projects” below.

For Existing Projects

Already have a codebase? Don’t clone — run /apply-framework. It assesses value/risk first, surfaces collisions, and (only on your go-ahead) deploys onto a back-out branch. For a deep takeover afterward it offers /onboard (maps code, reverse-engineers ADRs, builds a stabilization plan + debt ledger).

The Typical Development Flow

Write Code → /review → Fix Findings → Quality Gate → Commit

1. Write Your Code

Build features, fix bugs, refactor. The framework doesn't change how you write code — it changes what happens after.

2. Run /review

The facilitator — the lead agent that runs every multi-agent workflow — assesses risk, assembles specialists, and produces a structured review with a verdict. A full panel runs in parallel and usually takes a couple of minutes.

3. Address Findings

Blocking findings must be fixed. Advisory findings are recommendations. The review report tells you which is which.

4. Quality Gate

Run python scripts/quality_gate.py or just commit — the pre-commit hook runs it automatically.

5. Commit

If the quality gate passes and the review is clean, commit your code. The entire review is saved in discussions/ — your searchable record of why each decision was made.

6. Education Gate

For complex changes, run /walkthrough and /quiz to verify you understand the AI-generated code.

/review — Multi-Agent Code Review

# Review specific files
/review src/routes.py src/models.py
# Review an entire directory
/review src/
# Review with context
/review src/auth.py # "focus on the new JWT validation"

What Happens Behind the Scenes

The facilitator creates a discussion, assesses risk level, selects 2-5 specialists, dispatches them in parallel, collects findings, synthesizes a verdict, writes a report to docs/reviews/, and seals the discussion.

Risk-Based Specialist Assembly

LOW: config, docs, simple fixes
2-3 agents. QA + 1 domain specialist. Ensemble mode (specialists work in parallel).

MED: new features, refactoring
3-4 agents. QA + Architecture + domain. Structured Dialogue mode (specialists see each other’s findings and respond).

HIGH: security, architecture changes
4-5 agents. Full panel including Security + Independent Perspective.
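
The three tiers above can be read as a lookup table. The sketch below encodes only what this guide states; the class and function names are hypothetical, and the HIGH tier's mode is left as None because the guide does not name one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PanelRule:
    min_agents: int
    max_agents: int
    composition: str
    mode: Optional[str]  # None where this guide does not name a mode


# Values copied from the tier descriptions; the structure itself is a sketch.
PANEL_RULES = {
    "LOW": PanelRule(2, 3, "QA + 1 domain specialist", "Ensemble"),
    "MED": PanelRule(3, 4, "QA + Architecture + domain", "Structured Dialogue"),
    "HIGH": PanelRule(
        4, 5, "Full panel including Security + Independent Perspective", None
    ),
}


def panel_rule(risk: str) -> PanelRule:
    """Look up the panel rule for a risk tier (case-insensitive)."""
    return PANEL_RULES[risk.upper()]
```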

Verdicts

Approve · Approve with changes · Request changes · Reject

/plan + /build_module — Spec-Driven Development

Step 1: Plan the feature

/plan "Add user authentication with JWT tokens"
# Produces a structured spec with:
# - Task breakdown
# - File list (new + modified)
# - Architecture decisions
# - Risk assessment
# - Specialist review of the plan
# Waits for your approval before proceeding

Step 2: Build from the spec

/build_module docs/sprints/auth-spec.md
# Executes the spec task-by-task
# Mid-build checkpoint reviews fire automatically
# Tests run after each task
# Education gate activates at the end

Why Spec-First?

The spec is reviewed by specialists before any code is written. Catching an architecture mistake at the spec stage is far cheaper than unwinding it from working code.

Mid-Build Checkpoints

When a build task creates a new module, touches security code, or changes database schema, 2 specialists automatically review the code mid-build. Max 2 rounds per checkpoint.
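
As a rough sketch, the trigger rule is a three-way "or". The task fields below are hypothetical names chosen for illustration; how the framework actually detects these conditions is not described in this guide.

```python
from dataclasses import dataclass

CHECKPOINT_REVIEWERS = 2   # "2 specialists automatically review"
MAX_CHECKPOINT_ROUNDS = 2  # "Max 2 rounds per checkpoint"


@dataclass
class BuildTask:
    # Hypothetical flags; the real build tasks may carry different data.
    creates_new_module: bool = False
    touches_security_code: bool = False
    changes_db_schema: bool = False


def needs_checkpoint(task: BuildTask) -> bool:
    """True when any one of the three documented triggers applies."""
    return (
        task.creates_new_module
        or task.touches_security_code
        or task.changes_db_schema
    )
```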

The Build Summary

After building, you get a summary with: tasks completed, checkpoint results, unresolved concerns (if any), and a recommendation for the final /review.

/deliberate — Structured Multi-Agent Discussion

/deliberate "Should we use SQLAlchemy ORM or raw SQL
  for the new reporting module?"
/deliberate "What's the best approach for handling
  file uploads larger than 100MB?"
/deliberate "Should we split the monolith into
  microservices or keep it modular?"

When to Deliberate

Use /deliberate when you face a decision with multiple valid options and non-obvious trade-offs. The specialists bring different professional perspectives that surface considerations you might miss alone.

What You Get

1. Independent Analysis

Each specialist analyzes the question from their domain (security, performance, architecture, etc.) without seeing others' answers.

2. Cross-Pollination

Specialists see each other's findings and can refine their positions. Disagreements are surfaced, not hidden.

3. Synthesis & Recommendation

The facilitator synthesizes all perspectives into a recommendation with explicit trade-offs. You make the final call.

What the capture looks like

Every deliberation is sealed to discussions/ with a full event stream you can re-read later:

discussions/2026-06-14/
  DISC-…-reporting-orm/
    events.jsonl # every turn
    transcript.md # readable

Six months later, you can read exactly why a decision was made.
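
To re-read a sealed discussion programmatically, something like the following should work, assuming events.jsonl follows the usual JSON Lines convention of one JSON object per line. The field names inside each event are not documented in this guide, so the sketch does not depend on any of them.

```python
import json
from pathlib import Path


def read_events(path):
    """Load every event from a JSON Lines file, skipping blank lines.

    Assumption: one JSON object per line. The event schema is unknown here,
    so events are returned as plain dicts for the caller to inspect.
    """
    events = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            events.append(json.loads(line))
    return events
```

Point it at the events.jsonl inside a discussion folder and inspect the keys of the first event to learn the schema your version actually writes.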

What Do I Do When…

“I have an idea but don’t know where to start”

Research first (Phase 1), then /deliberate to pressure-test, then /plan to create a spec.

“I want to think through or stress-test an idea”

Use the grill-me skill — the agent interviews you one question at a time (with a recommended answer for each) until every branch is resolved, checkpointing decisions to disk as it goes.

“I’m choosing between two approaches”

Run /deliberate with both options. The specialist panel will surface trade-offs you might miss.

“I want to see how others solved this”

/discover-projects to find reference repos, then /analyze-project to learn from their approach and evaluate which patterns could benefit your project — with full attribution.

“I wrote code and need a second opinion”

/review src/ — the specialist panel reviews from security, architecture, performance, and quality perspectives.

“I found a bug and fixed it”

Write a regression test tagged @pytest.mark.regression, add it to the regression ledger (the running list that keeps fixed bugs from coming back), then /review the fix.

“AI wrote code I don’t fully understand”

/walkthrough for a guided explanation, then /quiz to verify you actually get it before shipping.

“I have an existing project I want to improve”

/apply-framework assesses value/risk and deploys onto a back-out branch; it then offers /onboard for a deep takeover (maps code, reverse-engineers decisions, debt ledger).

“The sprint is over — what now?”

/retro analyzes all discussions, surfaces recurring patterns, and proposes process improvements. Run /batch-evaluate to audit pending adoptions.

Commands That Make You Better

/walkthrough education

Generates a guided reading path through code you (or AI) wrote. Explains decisions, trade-offs, and how components interact. Progressive disclosure from mental model to implementation details.

/walkthrough src/routes.py

/quiz assessment

Bloom's taxonomy quiz on code you're about to ship. Tests understanding at 4 levels: recall, application, analysis, and evaluation. Includes debug scenarios and change-impact questions.

/quiz src/routes.py

/analyze-project external

Points the specialist team at an external project to evaluate patterns worth adopting. Produces a scored recommendation report. Patterns scoring 20+/25 are recommended.

/analyze-project tiangolo/fastapi

/discover-projects search

Searches GitHub for interesting projects to analyze. Filters by topic, language, or keywords. Checks for AI integration artifacts and ranks candidates for /analyze-project.

/discover-projects "fastapi multi-agent"

You Improve It — With Structure

/retro sprint

Run at the end of each sprint. Queries all discussions from the period, identifies recurring patterns, evaluates adopted patterns (PENDING → CONFIRMED or REVERTED), and proposes process adjustments.

/retro
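
The adoption lifecycle (PENDING → CONFIRMED or REVERTED) can be pictured as a tiny state machine. This is an illustrative sketch with hypothetical names; it assumes CONFIRMED and REVERTED are terminal, which this guide does not state.

```python
from enum import Enum


class AdoptionStatus(Enum):
    PENDING = "PENDING"
    CONFIRMED = "CONFIRMED"
    REVERTED = "REVERTED"


# Only the transitions this guide mentions; terminal states are an assumption.
ALLOWED_TRANSITIONS = {
    AdoptionStatus.PENDING: {AdoptionStatus.CONFIRMED, AdoptionStatus.REVERTED},
    AdoptionStatus.CONFIRMED: set(),
    AdoptionStatus.REVERTED: set(),
}


def transition(current: AdoptionStatus, new: AdoptionStatus) -> AdoptionStatus:
    """Return the new status, or raise ValueError for a disallowed move."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```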

/meta-review quarterly

The big one. Quarterly assessment of framework effectiveness: agent scoring, architectural drift, rule updates, decision churn. Drives framework-level evolution.

/meta-review

/knowledge-health diagnostic

Quick health check on all 5 pipeline layers. Reports on discussion volume, SQLite index completeness, findings extraction, pattern clustering, and curated memory currency.

/knowledge-health

The Improvement Cadence

Daily

/review captures findings. Agent reflections after each discussion note what worked and what didn't.

Sprint

/retro aggregates findings, evaluates adoptions, surfaces stale advisories, and proposes adjustments.

Quarterly

/meta-review assesses the framework itself. Which agents are valuable? Which rules need updating? Is there architectural drift?

/promote — Layer 3

When a pattern proves valuable across multiple reviews, promote it to curated memory — the framework’s durable Layer 3 — with /promote. Requires your explicit approval (Principle #7).

/batch-evaluate — Audit

Reviews all PENDING pattern adoptions from /analyze-project runs. Checks whether each pattern was actually implemented, verifies evidence, and presents verdicts for your approval.

The Quality Gate — Your Last Line of Defense

# Run manually
python scripts/quality_gate.py
# Auto-fix formatting and lint
python scripts/quality_gate.py --fix
# Skip specific checks
python scripts/quality_gate.py --skip-reviews
# It also runs automatically on every git commit
# Example output:
Quality Gate: 8/8 passed

What It Checks

Formatting (ruff format) → Linting (ruff check) → Tests (pytest) → Coverage (≥ 80%) → ADR completeness → Review existence → Regression ledger → BUILD_STATUS freshness (advisory)
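
One way to picture the gate is as an ordered list of checks where advisory ones report but never block. The sketch below is not the actual scripts/quality_gate.py; the function name, the tuple layout, and the choice to run every check rather than stop at the first failure are all assumptions.

```python
from typing import Callable, List, Tuple

# (name, run, blocking): advisory checks use blocking=False.
Check = Tuple[str, Callable[[], bool], bool]

COVERAGE_THRESHOLD = 80.0  # "Coverage (≥ 80%)"


def coverage_ok(percent: float) -> bool:
    """Coverage passes at or above the documented threshold."""
    return percent >= COVERAGE_THRESHOLD


def run_gate(checks: List[Check]) -> Tuple[bool, str]:
    """Run every check in order and return (gate_ok, summary line)."""
    passed = 0
    blocked = False
    for _name, run, blocking in checks:
        if run():
            passed += 1
        elif blocking:
            blocked = True  # a failed advisory check never sets this
    return (not blocked, f"Quality Gate: {passed}/{len(checks)} passed")
```

With eight passing checks this returns the same summary line shown in the example output; a failing BUILD_STATUS freshness check, passed in as advisory, lowers the count without failing the gate.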

Hooks That Protect You

Secret Detection

Scans for 12 patterns (API keys, JWT, AWS, PATs) before any file write. Blocks the write if secrets are detected.
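
To give a feel for what pattern-based secret detection looks like, here is a minimal sketch with three illustrative regular expressions. These are not the framework's 12 patterns, which this guide does not list; the names and the exact regexes are assumptions based on commonly seen token formats.

```python
import re

# Illustrative patterns only; the framework's real pattern set is not shown
# in this guide.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def find_secrets(text: str):
    """Return the sorted names of every pattern that matches the text."""
    return sorted(
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)
    )


def allow_write(text: str) -> bool:
    """Mirror the hook's behaviour: block the write if anything matches."""
    return not find_secrets(text)
```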

File Locking

Prevents concurrent agent edits to the same file. Locks auto-expire after 120 seconds.
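
The expiry behaviour can be sketched with an in-memory lock table. This is a simplified illustration with hypothetical names; the real hook presumably coordinates across processes, which a plain dict cannot do.

```python
import time

LOCK_TTL_SECONDS = 120  # "Locks auto-expire after 120 seconds"


class FileLocks:
    """In-memory lock table with time-based expiry (illustration only)."""

    def __init__(self, ttl=LOCK_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._locks = {}  # path -> (owner, expires_at)

    def acquire(self, path, owner):
        """Take the lock unless another owner holds an unexpired one."""
        now = self._clock()
        held = self._locks.get(path)
        if held is not None and held[0] != owner and held[1] > now:
            return False
        self._locks[path] = (owner, now + self._ttl)
        return True

    def release(self, path, owner):
        """Drop the lock, but only if this owner currently holds it."""
        held = self._locks.get(path)
        if held is not None and held[0] == owner:
            del self._locks[path]
```

Injecting the clock keeps the expiry logic testable without sleeping for two minutes.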

Auto-Format

Every Python file is auto-formatted with ruff after every edit. You never commit unformatted code.

Branch Protection

Blocks direct pushes to main. Create a feature branch, open a PR, then merge. Keeps your history clean.

Session Continuity

BUILD_STATUS.md is auto-saved before context compaction and auto-loaded on resume. Work-in-progress survives sessions.

Working While You’re Away From Keyboard

The agent keeps building. Gating decisions reach your phone. You tap to answer and it continues — no session needed. (Requires NTFY_TOPIC set in your .env — see setup.)

1. Agent hits a decision point

It needs your input before proceeding — an architectural choice, a risky operation, a confidence gate. Rather than blocking or guessing, it sends the question to your phone.

2. You tap to answer

Up to 3 labeled options appear as push-notification buttons (via ntfy.sh, a free phone-notification service). Tap one — no app, no session, no context switch. Just a phone tap.

3. Agent resumes

The reply is validated against an allow-list, matched to the question, and the agent continues. Free-text replies go to the developer-only channel; the agent never trusts raw reply text directly.

4. Resume after a break? Check first.

Use collab_loop check to look back at recent messages on the reply channel before starting a poll. This is the resume primitive — avoids missed answers.

Two Topics, One Model

The agent publishes to NTFY_TOPIC (MAIN). Your replies go to NTFY_TOPIC-reply (REPLY). On MAIN, a message with an empty title is treated as your free text; titled messages are the agent's own structured output. The two channels never cross.

Four Modes

ask — push a question with tap-to-answer buttons (max 3)
poll — stream live replies under a persistent Monitor
check — one-shot lookback for missed answers (resume primitive)
say — send a status or acknowledgement with no reply expected

Security Invariants

Replies are unauthenticated. The framework validates every reply against a fixed allow-list before acting. Raw reply text is never passed to a command, path, or eval sink. The topic name is never printed in conversation output. See the collaborating-async skill and CLAUDE.md always-on invariants.
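
The topic naming, the three-option cap, and the allow-list rule described above can be sketched in a few lines. The function names are hypothetical and this is not the collaborating-async implementation; exact-match validation after trimming whitespace is an assumption about how strict the check is.

```python
from typing import Optional, Sequence

MAX_OPTIONS = 3  # "ask" supports at most 3 tap-to-answer buttons


def reply_topic(main_topic: str) -> str:
    """Derive the REPLY topic name from the MAIN topic name."""
    return f"{main_topic}-reply"


def make_question(text: str, options: Sequence[str]) -> dict:
    """Bundle a question with 1 to 3 labeled options."""
    if not 1 <= len(options) <= MAX_OPTIONS:
        raise ValueError(f"expected 1 to {MAX_OPTIONS} options, got {len(options)}")
    return {"question": text, "options": list(options)}


def validate_reply(raw_reply: str, allowed: Sequence[str]) -> Optional[str]:
    """Return the matching allow-list entry, or None if nothing matches.

    The caller only ever receives a string taken from `allowed`, so raw
    reply text never reaches a command, a path, or eval.
    """
    candidate = raw_reply.strip()
    for option in allowed:
        if candidate == option:
            return option
    return None
```

Returning the allow-list entry instead of the incoming string is the point of the design: even a reply that matches is never forwarded as received.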

Shipping and Tracking Your Fork

/ship release

Full release workflow: quality gate verification, testing checklist, version bump, changelog generation, and rollback strategy. Everything you need to ship with confidence.

/ship

/lineage drift

Shows how your project relates to the upstream template. Detects drift, validates the manifest, and reports divergence distance. Intentional divergences can be pinned as traits.

/lineage
Drift status: current (distance: 0)
Pinned traits: 0

Template → Project Relationship

When you fork this template for a real project, the framework-lineage.yaml manifest tracks:

Fork point — When you diverged from the template
Drift status — How far you've diverged
Pinned traits — Intentional divergences you want to keep
Custodian — Who approves framework changes

The Steward Agent

The Steward is the framework's institutional memory for genealogy. It knows where your project came from, how far it has diverged, and which divergences are intentional vs. accidental drift.

Command Cheat Sheet

All /commands below are typed inside Claude Code; the python lines run in your terminal.

Every Day

/review src/            # Code review
/walkthrough src/file.py  # Explain code
/quiz src/file.py      # Test understanding

Building Features

/plan "description"      # Create spec
/build_module spec.md   # Build from spec
/deliberate "question"  # Discuss trade-offs

Quality (terminal)

python scripts/quality_gate.py # Run checks
python scripts/quality_gate.py --fix # Auto-fix

Process Improvement

/retro                  # Sprint retro
/meta-review          # Quarterly review
/knowledge-health     # Pipeline check
/promote               # Promote to Layer 3

External Learning

/discover-projects "topic" # Find repos
/analyze-project owner/repo  # Analyze
/batch-evaluate          # Audit adoptions

Release & Lineage

/ship                    # Release workflow
/lineage                # Drift status
/apply-framework     # Adopt on existing repo

Skill Cheat Sheet

Skills are on-demand playbooks — the agent loads one automatically when the situation matches, so you rarely type them. A few (like grill-me) you can also invoke by name. There are 21 in .claude/skills/, plus Claude Code built-ins like deep-research; here are the ones worth knowing.

Think & Research

grill-me — relentless one-question-at-a-time interview to stress-test an idea or design.
deep-research (built-in) — fan-out, multi-source, fact-checked research report.
searching-prior-art — grep for existing solutions / known-broken approaches before you build.

Ship Safely

committing-changes — the full commit protocol (quality gate → review → education → commit).
selecting-review-gates — how risk tier picks the specialist panel and thresholds.
handling-micro-fixes — when a tiny change can skip /plan and /build_module.

When It Breaks / Long Sessions

recovering-from-failures — named recovery steps for the 8 failure classes (blocked hook, blocked commit, lost session state…).
wrapping-up-sessions — checkpoint a long session and write a paste-ready handoff before quality degrades.

Work While You’re Away From Keyboard

notifying-the-developer — push a question to your phone and read the reply safely.
collaborating-async — the two-way loop that lets the agent keep working while you’re out (see Slide 13).

Keep Docs & Decisions Honest

documenting-decisions · adr-writing — what to document where, and how to write an ADR.
syncing-framework-docs — keep specs and these decks in sync when the framework changes.

Not shown: domain playbooks (python-project-patterns, testing-playbook, security-checklist, performance-playbook), dispatch internals (cross-agent-dispatch, multi-instance-dispatch, orchestrating-lean-dispatch), running-build-checkpoints, and feature-status-registry.

Getting the Most Out of It

The most common mistake is treating this like vanilla Claude Code — typing “build X” and accepting the first answer. Then you pay for a framework you’re not using. The commands are the framework.

Skipping the commands

If you don’t invoke them, none of the capture, independent review, or education value fires. Use /plan → /build_module → /review.

Committing code you didn’t /review

Never. The agent that wrote the code can’t objectively judge it — independent review is the whole point (Principle #4).

Treating the code as the deliverable

The durable asset is the reasoning in discussions/. In three months, the why is what saves you — not the diff.

Convening a panel for a typo

Match ceremony to risk. Micro-fixes skip /plan and /build_module. Don’t burn a 5-specialist /deliberate on a label change.

Reflexively overriding “are you sure?”

When Claude asks a clarifying question before building, that’s the ~95% confidence gate (Principle #9) saving you wrong-path tokens. Answer it — don’t auto-dismiss it.

Merging code you can’t explain

The education gate (/walkthrough + /quiz) keeps you the owner. If you can’t explain it back, don’t ship it.

Hand-hacking agents & rules

To change how the framework behaves, use the evolution path (Steward gate → your approval → /review) — not ad-hoc edits that drift silently.

Letting CLAUDE.md bloat

You set up CLAUDE.md during install — keep it lean (<~200 lines). Detail belongs in path-scoped rules and on-demand skills, not always-loaded context.

Start Building

The framework adds structure to AI-assisted development without slowing you down.

# Clone, isolate, install, init
cd agent_framework_template
python -m venv .venv # + activate
pip install -r requirements.txt
python scripts/init_db.py
# Launch Claude Code + sign in, then
claude
# Your first command (works with no code yet):
/deliberate "What should I build
  first, and how should I structure src/?"
# Once you've written code:
/review src/

AI-Native Agentic Development Framework v3.5 · Diviner Dojo
diviner-dojo@gmail.com