v2.1

AI-Native Agentic
Development Framework

Structured Multi-Agent Collaboration for
AI-Assisted Software Engineering

"Reasoning is the primary artifact. Code is output."


The Problem with AI-Assisted Development Today

🕳️

No Decision Trail

AI makes hundreds of choices – architecture, patterns, trade-offs – but the reasoning disappears. No audit trail. No traceability. No way to understand why six months later.

🪞

Confirmation Bias

The same AI that writes the code evaluates it. It grades its own homework. No independent review means no safety net for blind spots, hallucinations, or compounding errors.

🎲

Vibe Coding

Unstructured, ad-hoc AI interactions produce wildly inconsistent quality. No repeatable process. No standards enforcement. Each session is a gamble.

💨

Knowledge Evaporates

Insights from AI sessions aren't captured, curated, or reused. Every session starts from zero. Patterns discovered once are lost and rediscovered again and again.

There has to be a better way.

8 Non-Negotiable Principles


1. Reasoning is the primary artifact.
Code is output. Deliberation, trade-offs, and decision lineage are the durable assets. Every significant decision must be traceable to the discussion that produced it. Why: Six months from now, the reasoning is more valuable than the code.
2. Capture must be automatic.
If logging depends on model compliance, it will fail. Structured commands guarantee event-level recording. The AI cannot opt out. Why: Voluntary compliance doesn't work for audit trails.
3. Collaboration precedes adversarial rigor.
Multi-perspective analysis is the default. Adversarial modes are scoped exclusively to: security review, fault injection, and anti-groupthink checks. Why: Productive tension beats manufactured opposition.
4. Independence prevents confirmation loops.
The agent that generates code must not be the sole evaluator. At minimum, one specialist who did not participate in generation must perform independent review. Why: Self-review finds what it expects to find.
5. ADRs are never deleted.
Architecture Decision Records are only superseded, with references to the replacing decision. This creates an immutable decision history. Why: Understanding past decisions prevents repeating mistakes.
6. Education gates before merge.
Walkthrough → Quiz → Explain-back → Merge. Proportional to complexity and risk. Why: Developers must understand AI-generated code, not just accept it.
7. Layer 3 promotion requires human approval.
No discussion insight is promoted to curated memory automatically. Promotion requires 2+ independent confirmations plus explicit human sign-off. Why: Institutional knowledge must be deliberately curated, not auto-accumulated.
8. Least-complex intervention first.
Prefer prompt changes → command changes → agent changes → architecture changes. Lower-complexity interventions are cheaper, more reversible, and faster to validate. Why: Don't restructure when a prompt tweak will do.

Four-Layer Capture Architecture

How AI reasoning gets captured.

1

Immutable Discussion Capture

Raw event streams sealed after every reasoning session

Every canonical AI reasoning session produces a sealed, immutable record in the discussions/ directory. Each discussion contains events.jsonl (machine-readable event stream) and transcript.md (human-readable rendering). Events track: agent identity, intent type (proposal, critique, question, evidence, synthesis, decision, reflection), confidence scores, and risk flags. After closure, these files are locked – corrections require new discussions that reference the original.
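As a concrete illustration, a capture helper might append events like those described above. The field names here (`agent`, `intent`, `confidence`, `risk_flags`) are assumptions drawn from this description, not the framework's actual schema:

```python
import json
from pathlib import Path


def append_event(path: Path, agent: str, intent: str, body: str,
                 confidence: float, risk_flags: list[str]) -> dict:
    """Append one reasoning event to an events.jsonl stream and return it.

    Illustrative sketch only; the real event schema may carry more fields.
    """
    event = {
        "agent": agent,            # agent identity
        "intent": intent,          # proposal | critique | question | evidence
                                   # | synthesis | decision | reflection
        "body": body,
        "confidence": confidence,  # 0.0 .. 1.0
        "risk_flags": risk_flags,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event
```

Because each event is one self-contained JSON line, the stream is append-only by construction, which is what makes sealing the file after closure meaningful.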
2

Relational Index

SQLite database for querying and metrics across all discussions

All events are ingested into a SQLite database with 5 tables and 10 indexes. This enables cross-discussion analytics: agent contribution scoring, false positive rate tracking, time-to-consensus measurement, reopened decision analysis, decision churn metrics, and drift detection. The relational layer makes raw reasoning data queryable – essential for the meso and macro learning loops.
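A minimal sketch of that ingestion step, using a single illustrative table rather than the real 5-table, 10-index schema (table and column names here are assumptions):

```python
import json
import sqlite3


def ingest_events(db: sqlite3.Connection, jsonl_text: str) -> int:
    """Load JSONL events into one illustrative indexed table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS events (agent TEXT, intent TEXT, confidence REAL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS idx_events_agent ON events(agent)")
    rows = []
    for line in jsonl_text.splitlines():
        if line.strip():
            e = json.loads(line)
            rows.append((e["agent"], e["intent"], e.get("confidence")))
    db.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    db.commit()
    return len(rows)


def contributions(db: sqlite3.Connection) -> dict:
    """Per-agent event counts: the raw input to contribution scoring."""
    return dict(db.execute("SELECT agent, COUNT(*) FROM events GROUP BY agent"))
```

Once events sit in SQLite, metrics like time-to-consensus or reopened-decision counts reduce to ordinary SQL aggregations over the event stream.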
3

Curated Memory

Human-approved patterns, decisions, and rules promoted from Layers 1–2

The memory/ directory holds promoted knowledge: decision summaries, code patterns, agent reflections, lessons learned, and graduated rules. Promotion requires 2+ independent confirmations plus explicit human approval. Every promoted artifact has a 90-day forgetting curve – it must be reconfirmed or it gets archived. This prevents knowledge rot and keeps the curated layer deliberately lean.
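The 90-day forgetting curve reduces to a simple date comparison. A sketch, assuming each artifact records the date of its last confirmation:

```python
from datetime import date, timedelta

RECONFIRM_DAYS = 90  # the framework's stated forgetting-curve window


def due_for_reconfirmation(artifacts: dict, today: date) -> list:
    """Return names of promoted artifacts whose last confirmation
    is more than 90 days old; these are candidates for archival.

    `artifacts` maps artifact name -> date of last confirmation.
    """
    cutoff = today - timedelta(days=RECONFIRM_DAYS)
    return sorted(name for name, confirmed in artifacts.items()
                  if confirmed < cutoff)
```

A periodic job running this check is what keeps the curated layer lean: anything not on the reconfirmed side of the cutoff moves to the archive.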
4

Optional Vector Acceleration

Semantic retrieval when the corpus outgrows keyword search

The vector layer activates only when the discussion corpus grows large enough that keyword and full-text search become insufficient. It never replaces the relational structure – it accelerates retrieval only. This layer is intentionally deferred; most projects won't need it until they reach significant scale.

From raw events → queryable metrics → curated knowledge. Nothing is lost.

11 Specialist Agents, One Orchestrator

● Opus – Complex reasoning ● Sonnet – Analysis & review ● Haiku – Lightweight tasks
👑
Facilitator
opus
Orchestrates all multi-agent workflows
Risk assessment, specialist assembly, cross-pollination rounds, synthesis of findings, capture enforcement. The facilitator never renders specialist verdicts – it coordinates but does not evaluate.
🏛️
Architecture Consultant
opus
Structural integrity & ADR validation
Boundary enforcement, pattern consistency, dependency direction validation. Activated for architecture changes, new modules, and significant refactoring. Guards against unnecessary abstraction.
🛡️
Security Specialist
sonnet
OWASP Top-10 & trust boundaries
Red-team thinking, auth/authz review, input validation, secret management. Activated for auth code, API security, data handling, and external integrations. Avoids security theater.
🧪
QA Specialist
sonnet
Test adequacy & edge cases
Coverage analysis, error handling review, boundary conditions. Activated for every code review. Focuses on meaningful assertions, not just coverage numbers. Prevents tests that assert nothing.
⚡
Performance Analyst
sonnet
Complexity, hot paths & scalability
Algorithmic complexity analysis, database query efficiency, scalability assessment. Activated for data processing, DB operations, and API endpoints. Avoids premature optimization.
📚
Docs / Knowledge
sonnet
Documentation completeness & ADR quality
Self-healing docs: detects when documentation drifts from implementation. Validates ADR completeness, CLAUDE.md currency, and docstring coverage. Active in every review at light intensity.
🔍
Independent Perspective
sonnet
Anti-groupthink & hidden assumptions
Pre-mortem analysis, alternative exploration, challenge of consensus positions. Activated for medium/high-risk changes. Provides the dissenting voice that prevents echo-chamber failures.
🔬
Project Analyst
sonnet
External project scouting & pattern evaluation
Two-phase workflow: a Survey phase scouts the target project, then an Orchestrate phase dispatches domain specialists. The only subagent that can delegate to others – a special orchestrator for external analysis.
🎨
UX Evaluator
sonnet
Interaction flow & accessibility
Evaluates UI code for UX friction, interaction flow, state feedback, platform conventions, and accessibility. Activated for any user-facing changes.
🏛️
Steward
sonnet
Framework lineage & drift tracking
Tracks how derived projects relate to the canonical template. Detects drift, validates manifest integrity, and manages the project's upstream relationship. The framework's institutional memory for genealogy.
🎓
Educator
haiku
Walkthroughs, quizzes & Bloom's assessment
Generates Bloom's taxonomy assessments, guided walkthroughs, and comprehension quizzes. Tracks mastery tiers and adapts difficulty. Scaffolding intensity fades as demonstrated competence grows.

Every agent has a defined lane, explicit triggers, and anti-patterns it must avoid.

5 Modes of Multi-Agent Collaboration

1

Ensemble

Independent contributions, no inter-agent exchange

Low risk
2

Yes, And

Collaborative building – each agent adds to previous

Additive
3

Structured Dialogue

Coopetitive multi-round discussion

Default
4

Dialectic Synthesis

Thesis-antithesis-synthesis with ACH matrix

High stakes
5

Adversarial

Red team – security, fault injection, anti-groupthink only

Scoped

Exploration Intensity (orthogonal axis)

Low – Primary analysis with brief notes on alternatives
Medium – 2–3 alternatives with trade-off analysis
High – Thorough exploration of edge cases & failure modes

Key Concept: Coopetition

Agents share goals but have different professional priorities – a security specialist and a performance analyst will naturally surface different concerns. This creates productive tension without manufactured opposition.

16 Commands That Drive the Workflow

Core Workflow
/review – Multi-agent code review
/deliberate – Structured discussion
/build_module – Spec-driven construction
/plan – Feature planning

Analysis & Learning
/analyze-project – External patterns
/discover-projects – GitHub search
/retro – Sprint retrospective
/meta-review – Quarterly evolution

Knowledge & Education
/promote – Promote to Layer 3
/onboard – Project takeover
/quiz – Bloom's assessment
/walkthrough – Code explanation
/knowledge-health – Pipeline health check
/batch-evaluate – Audit adoptions

Release & Lineage
/ship – Release workflow
/lineage – Drift & manifest

The /review Pipeline

A nine-step automated workflow – from risk assessment to sealed record:

Risk Assessment → Discussion Creation → Specialist Assembly → Independent Analysis → Cross-Pollination → Synthesis → Verdict → Report → Sealed Record
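The stage sequence above can be sketched as an ordered pipeline over review state. The stage names come from the chain shown here; the handlers themselves are hypothetical:

```python
# Stage names taken from the /review pipeline diagram; real handlers are not public.
REVIEW_STAGES = (
    "risk_assessment", "discussion_creation", "specialist_assembly",
    "independent_analysis", "cross_pollination", "synthesis",
    "verdict", "report", "sealed_record",
)


def run_review(target: str) -> dict:
    """Walk the pipeline in order, recording each completed stage."""
    state = {"target": target, "completed": []}
    for stage in REVIEW_STAGES:
        # A real implementation would dispatch to the stage's handler here
        # and could abort early, e.g. if risk assessment rejects the change.
        state["completed"].append(stage)
    state["sealed"] = state["completed"][-1] == "sealed_record"
    return state
```

The fixed ordering is the point: sealing is always the final stage, so a review that completes always ends in an immutable record.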

Every command auto-captures reasoning via the capture pipeline. The model cannot opt out of logging – it's enforced at the tooling layer.

7 Hooks – Automated Guardrails

Safety is enforced at the tooling layer, not by asking the AI to behave.

Before Operations

Pre-Write

File Locking

Atomic locks prevent concurrent agent edits, auto-expire after 120 seconds
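A lock with these semantics can be sketched with an atomic exclusive file create plus a timestamp for expiry. The lock-file format (a timestamp as text) is an assumption; only the 120-second TTL comes from the description above:

```python
import os
import time
from pathlib import Path

LOCK_TTL = 120  # seconds, per the framework's auto-expire rule


def acquire_lock(lock: Path, now: float = None) -> bool:
    """Atomically create a lock file; reclaim it if the holder timed out."""
    now = time.time() if now is None else now
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one agent wins.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(fd, str(now).encode())
        os.close(fd)
        return True
    except FileExistsError:
        if now - float(lock.read_text()) > LOCK_TTL:
            lock.unlink()            # stale lock: previous holder expired
            return acquire_lock(lock, now)
        return False
```

The `O_EXCL` flag is what makes this safe against two agents racing for the same file; the timestamp turns an abandoned lock into a recoverable one instead of a permanent deadlock.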

Pre-Write

Secret Detection

Scans for 12 secret patterns: API keys, AWS keys, JWT, PATs, private keys, and more
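Such a scanner is essentially a set of named regexes run over the text being written. The three patterns below are an illustrative subset only; the hook's actual 12 regexes may differ:

```python
import re

# Illustrative subset of secret patterns; the real hook covers 12.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def scan_for_secrets(text: str) -> list:
    """Return the names of secret patterns found in the text."""
    return sorted(name for name, rx in SECRET_PATTERNS.items()
                  if rx.search(text))
```

A pre-write hook would run this over the pending file content and block the write if the returned list is non-empty.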

Pre-Write

Protected Files

Blocks edits to .env, .git/, evaluation.db, and critical config files

Pre-Commit

Quality Gate

Formatting, linting, tests, and coverage must all pass before any commit

Pre-Push

Main Branch Protection

Blocks direct pushes to main/master with remediation instructions

After Operations

Post-Write

Auto-Format

Runs ruff format + ruff check --fix on every Python file after every edit

Post-Write

Lock Release

Releases file locks after write/edit completes – cleanup is automatic

Session Lifecycle

PreCompact

State Save

Saves in-flight task state to BUILD_STATUS.md before context compaction

SessionStart

Context Restore

Reads BUILD_STATUS.md on session resume to restore working context
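The save/restore pair can be sketched as two small functions over a Markdown file. The BUILD_STATUS.md layout shown here is an assumption for illustration; the real file's structure is not specified above:

```python
from pathlib import Path


def save_state(path: Path, task: str, steps_done: list) -> None:
    """PreCompact sketch: persist in-flight task state as Markdown."""
    lines = ["# Build Status", f"Task: {task}", "", "## Completed"]
    lines += [f"- {step}" for step in steps_done]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def restore_state(path: Path) -> dict:
    """SessionStart sketch: parse the saved state back into a dict."""
    lines = path.read_text(encoding="utf-8").splitlines()
    task = next(l[len("Task: "):] for l in lines if l.startswith("Task: "))
    steps = [l[2:] for l in lines if l.startswith("- ")]
    return {"task": task, "steps_done": steps}
```

Keeping the state file in Markdown fits the framework's "everything is a Markdown file" convention: the same artifact is both machine-restorable and human-readable.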

Three Nested Learning Loops

Macro
Quarterly evolution
Meso
Sprint retros
Micro
Per-discussion

Micro Loop β€” Per-Discussion

After each discussion, agents write structured reflections: what they missed, what they'd improve, confidence calibration. Reflections are stored in SQLite and feed candidate improvement rules.

Meso Loop β€” Sprint Retrospective

The /retro command queries SQLite for: reopened decisions, override frequency, frequent issue tags, time-to-resolution stats, and adoption pattern evaluation (PENDING → CONFIRMED or REVERTED).

Macro Loop β€” Quarterly Evolution

The /meta-review command produces: agent effectiveness scoring, drift analysis, rule update candidates, and decision churn index. Drives framework-level evolution.

Double-Loop Learning

Single-loop: tune thresholds within existing rules. Double-loop: change what counts as "good" based on accumulated evidence. The framework doesn't just follow rules β€” it evolves them.

Education Gate: Developers Must Understand AI Code

1

Walkthrough

AI explains the code step by step – what it does, why decisions were made, how components interact

2

Quiz

Bloom's taxonomy assessment – from recall to analysis to evaluation. Includes debug scenarios and change-impact questions.

3

Explain-Back

Developer explains the code in their own words – proving comprehension, not just recognition

✓

Merge

Only after completing all three steps. Proportional to complexity and risk.

Bloom's Taxonomy Levels

Level 1 Remember – Recall facts and definitions
Level 2 Understand – Explain concepts in own words
Level 3 Apply – Use knowledge in new situations
Level 4 Analyze – Break down and debug components
Level 5 Evaluate – Assess change impact and trade-offs
Level 6 Create – Produce original explanations

70% pass threshold. At least 1 debug scenario + 1 change-impact question per quiz. Scaffolding fades as competence grows.
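These grading rules can be sketched directly: a quiz passes only if it contains the required question kinds and the score clears 70%. The answer-record shape is an assumption for illustration:

```python
PASS_THRESHOLD = 0.70  # the stated 70% pass mark


def grade_quiz(answers: list) -> dict:
    """Grade a quiz under the stated rules.

    Each answer is a dict like {"kind": "debug", "correct": True}; the quiz
    must include at least one "debug" and one "change_impact" question.
    """
    kinds = {a["kind"] for a in answers}
    well_formed = {"debug", "change_impact"} <= kinds
    score = sum(a["correct"] for a in answers) / len(answers)
    return {"score": score, "passed": well_formed and score >= PASS_THRESHOLD}
```

Note that a malformed quiz (missing a required question kind) fails regardless of score, which keeps the gate from being satisfied by recall-only questions.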

AI writes the code, but the human must own it.

Learning From the Ecosystem

/discover-projects → /analyze-project → 5-Dimension Scoring → Adopt / Defer / Reject → Adoption Audit

Scoring Rubric (max 25 points)

Prevalence – How common?

Elegance – How clean?

Evidence – Proven results?

Fit – Compatible?

Maintenance – Sustainable?

Rule of Three

Patterns seen in 3+ independent projects get priority consideration. Validates that a pattern isn't a one-off novelty but a genuinely useful practice.

Threshold: only patterns scoring ≥ 20/25 are recommended for adoption. Every adoption and rejection is documented with reasoning – decision lineage is preserved per Principle #1.
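The rubric and both thresholds combine into a short evaluation function. The per-dimension 1–5 scale is inferred from the 25-point maximum over five dimensions; it is an assumption, not stated explicitly:

```python
DIMENSIONS = ("prevalence", "elegance", "evidence", "fit", "maintenance")
ADOPT_THRESHOLD = 20  # out of 25
RULE_OF_THREE = 3     # seen in 3+ independent projects


def evaluate_pattern(scores: dict, projects_seen: int) -> dict:
    """Score a candidate pattern on the five rubric dimensions
    (assumed 1-5 each) and apply both decision thresholds."""
    total = sum(scores[d] for d in DIMENSIONS)
    return {
        "total": total,
        "recommend": total >= ADOPT_THRESHOLD,
        "rule_of_three": projects_seen >= RULE_OF_THREE,
    }
```

The two signals are deliberately independent: Rule of Three flags a pattern for priority consideration, while the 20/25 total decides whether adoption is actually recommended.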

Framework's Own Evolution

8
Projects Analyzed
77
Patterns Evaluated
42
Adopted
20
Deferred
20
Rejected

4 patterns achieved Rule of Three status – validated across 3+ independent projects. The framework practices what it preaches: its own evolution follows the structured analysis pipeline.

What a Project Looks Like

The framework lives alongside your code.

.claude/ – Framework configuration
  agents/ – 11 specialist definitions
    facilitator.md
    architecture-consultant.md
    security-specialist.md
    qa-specialist.md
    ... (7 more)
  commands/ – 16 slash commands
    review.md
    deliberate.md
    build_module.md
    analyze-project.md
    ... (12 more)
  custodian/ – Lineage tracking
  hooks/ – 7 lifecycle hooks
  rules/ – 7 auto-loaded standards
discussions/ – Layer 1: Sealed reasoning
  DISC-YYYYMMDD-slug/
    events.jsonl
    transcript.md
memory/ – Layer 3: Curated knowledge
  decisions/
  patterns/
  reflections/
  lessons/
  rules/
  archive/
metrics/ – Layer 2: SQLite index
docs/ – Documentation
  adr/ – Architecture Decision Records
  reviews/ – Review reports
  templates/ – Artifact templates
scripts/ – Capture pipeline utilities
src/ – Your application code
tests/ – Test suite
CLAUDE.md – Project constitution

Framework ≠ Separate Tool

The framework doesn't run as an external service. It lives inside your project's directory structure – agent definitions, commands, hooks, and rules are all version-controlled files alongside your source code.

CLAUDE.md β€” The Project Constitution

A single file that codifies all project conventions, principles, boundaries, and ID formats. Every agent reads it. It's the source of truth for how the framework operates in this project.

Everything is a Markdown File

Agent definitions, commands, rules, ADRs, reviews – all Markdown with YAML frontmatter. Human-readable, version-controllable, and diff-friendly. No proprietary formats.

Runs Inside VS Code + Claude Code

The framework is designed for Claude Code inside VS Code. Slash commands integrate directly into the Claude Code interface. Hooks fire automatically via Claude Code's hook system.

Automated Quality Enforcement

Every commit must pass the quality gate. No exceptions. No --no-verify.

✓ Formatting – ruff format
✓ Linting – ruff check
✓ Tests – pytest (full suite, deterministic)
✓ Coverage – ≥ 80% for new/modified code
✓ ADR Completeness – all decisions documented
✓ Review Existence – code changes require /review

Git Pre-Commit Hook

The quality gate runs automatically as a git pre-commit hook. If any check fails, the commit is blocked. This is non-negotiable – the hook cannot be bypassed without explicit developer override and documented reason.

Trend Analysis

Every quality gate run appends a JSONL record to a log file. This data feeds into sprint retrospectives and framework meta-reviews – enabling trend analysis of code quality over time.
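A sketch of that logging and one trend metric built on top of it. The record fields (`ts`, `checks`, `passed`) are assumptions; the real log format is not specified:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_gate_run(log: Path, checks: dict) -> None:
    """Append one quality-gate result as a JSONL record.

    `checks` maps check name (format, lint, tests, coverage, ...) to pass/fail.
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "checks": checks,
        "passed": all(checks.values()),
    }
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def pass_rate(log: Path) -> float:
    """Fraction of logged gate runs that passed every check."""
    runs = [json.loads(line) for line in log.read_text(encoding="utf-8").splitlines()]
    return sum(r["passed"] for r in runs) / len(runs)
```

Per-check failure frequencies fall out of the same log with one more aggregation, which is the kind of signal a /retro or /meta-review would consume.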

Two-Gate Commit Protocol

Gate 1: Quality gate (automated). Gate 2: Multi-agent code review via /review (agent-assisted). Both must pass before code is committed.

Auto-Fix Available

Run the quality gate with --fix to automatically remediate formatting and lint issues. Tests and coverage still require manual attention.

From Vibe Coding to Engineering Discipline

✗ AI decisions disappear → ✓ Every decision captured with full lineage
✗ Same AI writes and reviews → ✓ Independent multi-agent evaluation
✗ Unstructured ad-hoc sessions → ✓ 16 structured commands with auto-capture
✗ Knowledge lost between sessions → ✓ Four-layer capture with curated memory
✗ Developers blindly accept AI code → ✓ Education gates ensure understanding
✗ No quality enforcement → ✓ Automated quality gates on every commit
✗ Static unchanging process → ✓ Three nested loops drive continuous improvement

"Reasoning is the primary artifact. Code is output."

Get Started

# Install dependencies
pip install -r requirements.txt
# Initialize the metrics database
python scripts/init_db.py
# Try the framework commands
/review src/ # Multi-agent code review
/deliberate "topic" # Structured discussion
/walkthrough src/ # Guided code walkthrough
/quiz src/ # Comprehension assessment

AI-Native Agentic Development Framework v2.1 · Diviner Dojo
diviner-dojo@gmail.com