Beta calibration
This page is the calibration sheet for the current Beta release. Read it before kicking off serious work on a project. Knowing the honest shape of what the engine can and cannot do is the difference between a smooth run and a frustrated one.
The foundation is the producer-collapse / skills-first model: there are no fixed role identities. An agent is a producer-with-skills, and tasks are skill-routed to whichever producer holds the matching skill. Three structural seats remain: the Leader plans and decides, producers do the work, and Quality Control (QC) verifies. On top of that the engine ships QC-as-fixer (on by default: when a producer can’t clear the bar after exhausting retries, QC patches the artifact from its own findings and the task completes), familial assemblers that join a multi-piece deliverable by the artifact’s kind, deliverable fidelity that verifies the assembled whole against the brief, Job Templates for recurring setup, and provider thinking on/off control. The release ships with a known-limitations list so that you don’t discover the limits mid-run.
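To make the skill-routed model concrete, here is a minimal sketch in Python. It is an illustration only, not the engine’s code: Producer, Task, and route are hypothetical names, and the first-match rule is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Producer:
    """A producer is identified by the skills it holds, not by a fixed role."""
    name: str
    skills: set[str] = field(default_factory=set)


@dataclass
class Task:
    title: str
    required_skill: str


def route(task: Task, producers: list[Producer]) -> Optional[Producer]:
    """Return the first producer holding the task's skill, or None.

    First-match is an assumption for this sketch; a real router might
    rank or load-balance the candidates instead.
    """
    for producer in producers:
        if task.required_skill in producer.skills:
            return producer
    return None


team = [
    Producer("p1", {"python", "testing"}),
    Producer("p2", {"technical-writing"}),
]
assigned = route(Task("Draft the tutorial chapter", "technical-writing"), team)
unrouted = route(Task("Port the module to Rust", "rust"), team)
```

Here the writing task lands on p2 because it holds the matching skill, while the Rust task comes back unrouted because no producer on this hypothetical team holds that skill.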
What the Beta does well
- Single-deliverable artifact production. One cohesive output that ships in one or two producer passes: a contained essay, a small Python module, a focused report section, or a tutorial chapter. This is the engine’s sweet spot; every layer is sized for it.
- Multi-piece deliverables, assembled by the engine. A set of related outputs (chapters, modules, records, clips) is joined by a family of assemblers chosen by the artifact’s kind (document/code/data/media), and the engine verifies the whole product against the brief (the deliverable-fidelity check: declared per-part floor, engine-generated framing, consistent numbering). Wide independent fan-outs run as a parallel wave.
- Python repos, end to end. The repo_map extracts a symbol-aware digest (classes, methods, signatures, module docstrings) for any Python project; the team sees the existing shape and edits it coherently.
- Code that the team can run + verify. Producers can execute run_shell, and QC verifies code by running it (e.g. py_compile / a script smoke), not just reading it, so a code deliverable is checked against execution, not eyeballed.
- Recurring work, codified. A Job Template captures the interview + parameter schema + output contract for a class of job and runs headless on a schedule, and the team offers to template a job you keep repeating. Recurring failures get codified back into the skill library so cheap producers improve over time.
- Plan-mode conversational planning. The Leader interrogates open-ended requests, asks about deliverable shape when the size is ambiguous, and produces a reviewable plan you authorize before any work starts.
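As a rough picture of what "verified by running it" can mean for a Python deliverable, the sketch below compiles a script and then executes it, using only the standard library (py_compile and subprocess). It is not the engine’s QC implementation; smoke_check and its return shape are hypothetical.

```python
import py_compile
import subprocess
import sys
import tempfile
from pathlib import Path


def smoke_check(script: Path, timeout: float = 30.0) -> tuple[bool, str]:
    """Compile the script, then run it; return (passed, detail)."""
    try:
        # doraise=True raises PyCompileError on a syntax error
        py_compile.compile(str(script), doraise=True)
    except py_compile.PyCompileError as exc:
        return False, f"compile failed: {exc.msg}"

    try:
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    if result.returncode != 0:
        return False, f"exit code {result.returncode}: {result.stderr.strip()}"
    return True, result.stdout.strip()


# Exercise the check against one valid and one broken throwaway script.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.py"
    good.write_text("print('ok')\n")
    bad = Path(tmp) / "bad.py"
    bad.write_text("def broken(:\n")
    good_result = smoke_check(good)
    bad_result = smoke_check(bad)
```

The compile step catches syntax errors cheaply before anything executes; the run step catches failures that only show up at execution time.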
What the Beta does NOT do (yet)
Be honest with yourself about whether your project hits any of these.
Fully-autonomous production-scale, multi-phase efforts
A 200-page novel, a full application across many feature releases, or a multi-month research program with dozens of deliverables is still phased work: the engine is happy to produce a reviewable Phase 1 with you gating between phases. Job Templates + multi-file output close part of the old gap (recurring setup persists; a deliverable that exceeds per-artifact ceilings decomposes at logical boundaries), but there is no fully-automatic cross-phase planner that runs the whole multi-month program unattended. If a request smells production-scale, the Leader’s clarifying question surfaces it; answer “production-scale” and you’ll get a Phase 1 you can ship and review. See the Roadmap.
Multi-language repos with full symbol awareness
repo_map’s symbol extraction is richest for Python. For other languages the team leans on a filename listing + team_canvas digests + your standards file rather than automatic symbol/signature extraction. JS / TS / Rust / Go projects work, but the team sees less of the codebase’s shape than it does in a Python repo. modulatio doctor surfaces this calibration.
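Part of why Python gets the richest treatment is that a symbol-aware digest is straightforward to build from the standard library’s ast module. The sketch below shows the general idea; it is not repo_map’s implementation, and symbol_digest and its output shape are hypothetical.

```python
import ast


def symbol_digest(source: str) -> dict:
    """Collect the module docstring, classes with their methods, and functions."""
    tree = ast.parse(source)
    digest = {"docstring": ast.get_docstring(tree), "classes": {}, "functions": []}

    def signature(fn) -> str:
        # Positional parameter names only; enough for a compact digest.
        args = ", ".join(a.arg for a in fn.args.args)
        return f"{fn.name}({args})"

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            digest["functions"].append(signature(node))
        elif isinstance(node, ast.ClassDef):
            digest["classes"][node.name] = [
                signature(item)
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
    return digest


example = '''"""Tiny inventory module."""

class Stock:
    def add(self, sku, qty):
        pass

def total(items):
    return sum(items)
'''
digest = symbol_digest(example)
```

For the sample source, the digest records the docstring, the Stock class with its add method, and the top-level total function. Other languages would each need their own parser to produce the same kind of summary.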
Deep iterate-until-green autonomy
Producers can run code and QC verifies by executing it, and a task that can’t clear the bar routes through the self-heal ladder (a last-resort QC patch → re-decompose on overflow → automatic escalation to a stronger model). What’s still maturing is the tight iterate-on-failure loop (run the failing test, read the trace, patch, re-run, repeat autonomously until green) across many rounds on a large codebase. For deep debugging marathons, expect to stay in the loop between rounds.
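The ladder can be pictured as an ordered list of recovery steps tried until one produces an artifact that clears QC. The sketch below is a simplified illustration under that assumption: self_heal, the rung callables, and the pass check are all hypothetical, and it ignores details such as re-decomposition applying only on overflow.

```python
from __future__ import annotations

from typing import Callable, Optional

# A rung takes the failing artifact and returns a repaired one, or None
# if it cannot help. The callables used below are placeholders.
Rung = Callable[[str], Optional[str]]


def self_heal(
    artifact: str,
    passes_qc: Callable[[str], bool],
    ladder: list[tuple[str, Rung]],
) -> tuple[Optional[str], list[str]]:
    """Try each rung in order; stop at the first result that clears QC."""
    tried: list[str] = []
    for name, rung in ladder:
        tried.append(name)
        candidate = rung(artifact)
        if candidate is not None and passes_qc(candidate):
            return candidate, tried
    return None, tried  # ladder exhausted: hand back to a human


ladder = [
    ("qc_patch", lambda a: None),                # patch not possible here
    ("re_decompose", lambda a: a + " [split]"),  # still fails the check below
    ("escalate_model", lambda a: a + " [fixed]"),
]
result, tried = self_heal("draft", lambda a: a.endswith("[fixed]"), ladder)
```

Returning the list of rungs tried alongside the result keeps the recovery path visible, which matters when a person has to pick up where the ladder left off.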
Persona / identity continuity across long-running deployments
Each kickoff and plan-resumption is treated as a fresh team that reads its seed-skill prompts + the team-state doc; there is no enforced “who are we?” anchor that survives context-window resets, long idle gaps, or Layer 2 compression. In a long-lived deployment with a recurring crew identity (a named editorial voice, a sustained mascot, a domain role), expect some identity drift across sessions unless the convention is re-stated in the kickoff. A future release closes this gap as a sibling to Job Templates; see the Roadmap.
Reporting bugs and feedback
When something doesn’t behave the way this page describes, that’s a bug. We want to know.
- Crash-class bugs: the engine writes a redacted log and points you at the issue template. Open the bug at github.com/ModulatioAI/modulatio/issues using the Bug report template; it asks for the redacted log, your modulatio doctor output, and the plan ID.
- Regression-class issues: use the Regression template (we treat regressions as higher-priority than equivalent-severity new bugs).
- Feature requests: use the Feature request template, or Discussions for open-ended threads.
Quick reference — sizing rules of thumb
| Deliverable shape | Verdict |
|---|---|
| Single deliverable (1-2 passes) | Sweet spot |
| Multi-piece deliverable (assembled + fidelity-checked) | Comfortable |
| Production-scale, Phase 1 only (you gate between phases) | Supported |
| Production-scale, fully autonomous across all phases | Not yet — produce + review phase by phase |
When in doubt, go smaller. Phase plans grow when execution reveals real work; over-decomposed plans lock the team into churn before it discovers anything.