Structured Multi-Agent Collaboration for
AI-Assisted Software Engineering
"Reasoning is the primary artifact. Code is output."
The Challenge
AI makes hundreds of choices (architecture, patterns, trade-offs) but the reasoning disappears. No audit trail. No traceability. No way to understand why six months later.
The same AI that writes the code evaluates it. It grades its own homework. No independent review means no safety net for blind spots, hallucinations, or compounding errors.
Unstructured, ad-hoc AI interactions produce wildly inconsistent quality. No repeatable process. No standards enforcement. Each session is a gamble.
Insights from AI sessions aren't captured, curated, or reused. Every session starts from zero. Patterns discovered once are lost and rediscovered again and again.
There has to be a better way.
Foundation
Core Architecture
How AI reasoning gets captured.
Raw event streams sealed after every reasoning session
discussions/ directory. Each discussion contains events.jsonl (machine-readable event stream) and transcript.md (human-readable rendering). Events track: agent identity, intent type (proposal, critique, question, evidence, synthesis, decision, reflection), confidence scores, and risk flags. After closure, these files are locked; corrections require new discussions that reference the original.
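As a rough illustration, appending one such event to events.jsonl could look like the sketch below. The function name and field names (`agent`, `intent`, `confidence`, `risk_flags`) mirror the description above but are assumptions, not the framework's actual schema.

```python
import json
import datetime

def log_event(path, agent, intent, content, confidence, risk_flags=()):
    """Append one reasoning event as a single JSON line (hypothetical schema)."""
    event = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "agent": agent,            # agent identity
        "intent": intent,          # proposal | critique | question | evidence | ...
        "content": content,
        "confidence": confidence,  # 0.0 - 1.0
        "risk_flags": list(risk_flags),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")  # one JSON object per line

log_event("events.jsonl", "security-reviewer", "critique",
          "Token lifetime is unbounded", 0.8, ["auth"])
```

Append-only JSONL keeps the stream cheap to write and trivial to replay, which is what makes sealing a discussion after closure practical.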
SQLite database for querying and metrics across all discussions
Human-approved patterns, decisions, and rules promoted from Layers 1-2
memory/ directory holds promoted knowledge: decision summaries, code patterns, agent reflections, lessons learned, and graduated rules. Promotion requires 2+ independent confirmations plus explicit human approval. Every promoted artifact has a 90-day forgetting curve: it must be reconfirmed or it gets archived. This prevents knowledge rot and keeps the curated layer deliberately lean.
Semantic retrieval when the corpus outgrows keyword search
From raw events → queryable metrics → curated knowledge. Nothing is lost.
The Team
Every agent has a defined lane, explicit triggers, and anti-patterns it must avoid.
Collaboration
Independent contributions, no inter-agent exchange
Collaborative building: each agent adds to the previous
Coopetitive multi-round discussion
Thesis-antithesis-synthesis with ACH matrix
Red team: security, fault injection, anti-groupthink only
| Intensity | Description |
|---|---|
| Low | Primary analysis with brief notes on alternatives |
| Medium | 2β3 alternatives with trade-off analysis |
| High | Thorough exploration of edge cases & failure modes |
Agents share goals but have different professional priorities: a security specialist and a performance analyst will naturally surface different concerns. This creates productive tension without manufactured opposition.
Workflow
/review Pipeline
A 10-step automated workflow, from risk assessment to sealed report.
Every command auto-captures reasoning via the capture pipeline. The model cannot opt out of logging; it's enforced at the tooling layer.
Safety
Safety is enforced at the tooling layer, not by asking the AI to behave.
Atomic locks prevent concurrent agent edits, auto-expire after 120 seconds
Scans for 12 secret patterns: API keys, AWS keys, JWT, PATs, private keys, and more
Blocks edits to .env, .git/, evaluation.db, and critical config files
Formatting, linting, tests, and coverage must all pass before any commit
Blocks direct pushes to main/master with remediation instructions
Runs ruff format + ruff check --fix on every Python file after every edit
Releases file locks after write/edit completes; cleanup is automatic
Saves in-flight task state to BUILD_STATUS.md before context compaction
Reads BUILD_STATUS.md on session resume to restore working context
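The atomic lock with 120-second auto-expiry can be sketched with nothing more than an exclusive file create. This is an illustrative implementation, not the framework's actual locking code; `O_CREAT | O_EXCL` makes creation atomic at the filesystem level, and a stale lock older than the TTL is reclaimed.

```python
import os
import time

LOCK_TTL = 120  # seconds, matching the auto-expiry described above

def acquire_lock(target: str) -> bool:
    """Try to take an exclusive lock on `target`; reclaim if the holder expired."""
    lock = target + ".lock"
    try:
        # O_EXCL fails atomically if the lock file already exists.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(fd, str(time.time()).encode())
        os.close(fd)
        return True
    except FileExistsError:
        if time.time() - os.path.getmtime(lock) > LOCK_TTL:
            os.remove(lock)              # lock expired: reclaim it
            return acquire_lock(target)
        return False                     # lock held and still fresh

def release_lock(target: str) -> None:
    """Release after write/edit completes; idempotent by design."""
    try:
        os.remove(target + ".lock")
    except FileNotFoundError:
        pass
```

Because the lock lives on disk next to the file it guards, a crashed agent can never hold it forever: the TTL turns every lock into a lease.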
Self-Improvement
After each discussion, agents write structured reflections: what they missed, what they'd improve, confidence calibration. Reflections are stored in SQLite and feed candidate improvement rules.
The /retro command queries SQLite for: reopened decisions, override frequency, frequent issue tags, time-to-resolution stats, and adoption pattern evaluation (PENDING → CONFIRMED or REVERTED).
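One of those /retro metrics, issue-tag frequency, amounts to a GROUP BY over the reflections store. A minimal sketch; the table and column names are illustrative, not the framework's real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the evaluation database
conn.execute("CREATE TABLE reflections (agent TEXT, issue_tag TEXT)")
conn.executemany(
    "INSERT INTO reflections VALUES (?, ?)",
    [("reviewer", "missed-edge-case"),
     ("architect", "missed-edge-case"),
     ("reviewer", "overconfident")],
)

# Most frequent issue tags first: candidate inputs for improvement rules.
rows = conn.execute(
    """SELECT issue_tag, COUNT(*) AS n
       FROM reflections
       GROUP BY issue_tag
       ORDER BY n DESC"""
).fetchall()
print(rows)
```

A tag that keeps topping this list across retros is exactly the kind of signal that graduates into a candidate improvement rule.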
The /meta-review command produces: agent effectiveness scoring, drift analysis, rule update candidates, and decision churn index. Drives framework-level evolution.
Single-loop: tune thresholds within existing rules. Double-loop: change what counts as "good" based on accumulated evidence. The framework doesn't just follow rules; it evolves them.
Human Ownership
AI explains the code step by step β what it does, why decisions were made, how components interact
Bloom's taxonomy assessment, from recall to analysis to evaluation. Includes debug scenarios and change-impact questions.
Developer explains the code in their own words β proving comprehension, not just recognition
Only after completing all three steps. Proportional to complexity and risk.
70% pass threshold. At least 1 debug scenario + 1 change-impact question per quiz. Scaffolding fades as competence grows.
AI writes the code, but the human must own it.
Evolution
How common?
How clean?
Proven results?
Compatible?
Sustainable?
Patterns seen in 3+ independent projects get priority consideration. Validates that a pattern isn't a one-off novelty but a genuinely useful practice.
Threshold: only patterns scoring ≥ 20/25 are recommended for adoption. Every adoption and rejection is documented with reasoning; decision lineage is preserved per Principle #1.
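Reading the five questions above as criteria scored 0-5 each (an assumption: the source gives only the 25-point total and the 20-point threshold), the adoption decision reduces to a sum and a cutoff:

```python
# Criterion names paraphrase the five questions above; scale is assumed 0-5 each.
CRITERIA = ("frequency", "cleanliness", "proven_results",
            "compatibility", "sustainability")
THRESHOLD = 20  # out of a 25-point maximum

def adoption_decision(scores: dict) -> tuple:
    """Return (total, verdict) for a candidate pattern."""
    assert set(scores) == set(CRITERIA), "score every criterion"
    assert all(0 <= s <= 5 for s in scores.values()), "each criterion is 0-5"
    total = sum(scores.values())
    return total, ("adopt" if total >= THRESHOLD else "reject")

print(adoption_decision({"frequency": 5, "cleanliness": 4, "proven_results": 4,
                         "compatibility": 4, "sustainability": 4}))
```

A high bar like 20/25 means a pattern must score well on nearly every axis; one weak criterion is enough to sink a candidate.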
4 patterns achieved Rule of Three status, validated across 3+ independent projects. The framework practices what it preaches: its own evolution follows the structured analysis pipeline.
Structure
The framework lives alongside your code.
The framework doesn't run as an external service. It lives inside your project's directory structure: agent definitions, commands, hooks, and rules are all version-controlled files alongside your source code.
A single file that codifies all project conventions, principles, boundaries, and ID formats. Every agent reads it. It's the source of truth for how the framework operates in this project.
Agent definitions, commands, rules, ADRs, reviews: all Markdown with YAML frontmatter. Human-readable, version-controllable, and diff-friendly. No proprietary formats.
The framework is designed for Claude Code inside VS Code. Slash commands integrate directly into the Claude Code interface. Hooks fire automatically via Claude Code's hook system.
Quality
Every commit must pass the quality gate. No exceptions. No --no-verify.
The quality gate runs automatically as a git pre-commit hook. If any check fails, the commit is blocked. This is non-negotiable; the hook cannot be bypassed without an explicit developer override and a documented reason.
Every quality gate run appends a JSONL record to a log file. This data feeds into sprint retrospectives and framework meta-reviews, enabling trend analysis of code quality over time.
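Given such a log, trend analysis starts with something as simple as a pass rate over recent runs. A sketch with hypothetical record fields (`ts`, `passed`, `failed_check` are assumptions, not the actual log schema):

```python
import json

# Three example quality-gate records, one JSON object per line as in JSONL.
log_lines = [
    json.dumps({"ts": "2025-06-01T10:00:00Z", "passed": False,
                "failed_check": "coverage"}),
    json.dumps({"ts": "2025-06-02T10:00:00Z", "passed": True}),
    json.dumps({"ts": "2025-06-03T10:00:00Z", "passed": True}),
]

records = [json.loads(line) for line in log_lines]
pass_rate = sum(r["passed"] for r in records) / len(records)
print(f"pass rate: {pass_rate:.0%}")  # trend input for retros and meta-reviews
```

A falling pass rate across sprints is the kind of early signal the retrospective loop is meant to catch before it shows up as shipped defects.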
Gate 1: Quality gate (automated). Gate 2: Multi-agent code review via /review (agent-assisted). Both must pass before code is committed.
Run the quality gate with --fix to automatically remediate formatting and lint issues. Tests and coverage still require manual attention.
Summary
"Reasoning is the primary artifact. Code is output."
AI-Native Agentic Development Framework v2.1 · Diviner Dojo
diviner-dojo@gmail.com