Beta calibration

This page is the calibration sheet for the current Beta release. Read it before kicking off serious work on a project. Knowing the honest shape of what the engine can and cannot do is the difference between a smooth run and a frustrated one.

The foundation is the producer-collapse / skills-first model. There are no fixed role identities: an agent is a producer with skills, and each task is skill-routed to whichever producer holds the matching skill. Three structural seats remain: the Leader plans and decides, producers do the work, and Quality Control (QC) verifies. On top of that, the engine ships QC-as-fixer (on by default: when a producer can’t clear the bar after exhausting its retries, QC patches the artifact from its own findings and the task completes), familial assemblers that join a multi-piece deliverable according to the artifact’s kind, deliverable fidelity that verifies the assembled whole against the brief, Job Templates for recurring setup, and on/off control of provider thinking. It also ships with a list of known limitations, so you learn them up front rather than discovering them mid-run.
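To make skill routing concrete, here is a minimal Python sketch. It assumes each task names exactly one required skill and that the first producer holding it wins. Producer, Task, and route are hypothetical names for illustration, not Modulatio’s internal API, and the engine’s real routing policy may weigh more than a set-membership check.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Producer:
    name: str
    skills: set[str] = field(default_factory=set)


@dataclass
class Task:
    title: str
    skill: str  # the one skill this task needs (simplifying assumption)


def route(task: Task, producers: list[Producer]) -> Producer:
    """Return the first producer that holds the task's skill (hypothetical policy)."""
    for producer in producers:
        if task.skill in producer.skills:
            return producer
    # No producer has the skill: surface that instead of guessing.
    raise LookupError(f"no producer holds skill {task.skill!r}")


team = [Producer("p1", {"python"}), Producer("p2", {"essay", "report"})]
print(route(Task("write the intro", "essay"), team).name)  # p2
```

There is no role table anywhere in the sketch: adding a capability means adding a skill to a producer, which is the point of the skills-first model.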

What the Beta does well

Single-deliverable artifact production. One cohesive output that ships in one or two producer passes — a contained essay, a small Python module, a focused report section, a tutorial chapter. This is the engine’s sweet spot; every layer is sized for it.
Multi-piece deliverables, assembled by the engine. A set of related outputs (chapters, modules, records, clips) is joined by a family of assemblers chosen by the artifact’s kind — document / code / data / media — and the engine verifies the whole product against the brief (the deliverable-fidelity check: declared per-part floor, engine-generated framing, consistent numbering). Wide independent fan-outs run as a parallel wave.
Python repos, end to end. repo_map extracts a symbol-aware digest (classes, methods, signatures, module docstrings) for any Python project, so the team sees the existing shape and edits it coherently.
Code that the team can run + verify. Producers can run commands through run_shell, and QC verifies code by running it (e.g. py_compile or a script smoke test) rather than just reading it, so a code deliverable is checked against execution, not eyeballed.
Recurring work, codified. A Job Template captures the interview + parameter schema + output contract for a class of job, runs headless on a schedule, and the team offers to template a job you keep repeating. Recurring failures get codified back into the skill library so cheap producers improve over time.
Plan-mode conversational planning. The Leader interrogates open-ended requests, asks about deliverable shape when the size is ambiguous, and produces a reviewable plan you authorize before any work starts.
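The deliverable-fidelity idea can be sketched as a check over the assembled parts. This is a minimal Python illustration under stated assumptions: the per-part floor is taken to be a minimum word count, numbering is expected to run 1, 2, 3, … without gaps, and engine-generated framing is not modeled. Part and fidelity_problems are hypothetical names, not Modulatio’s API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Part:
    number: int  # the number the part claims, e.g. "Chapter 3"
    text: str


def fidelity_problems(parts: list[Part], min_words: int) -> list[str]:
    """List numbering gaps and parts below the (assumed) per-part word floor."""
    problems: list[str] = []
    for expected, part in enumerate(parts, start=1):
        if part.number != expected:
            problems.append(f"part {part.number}: expected number {expected}")
        words = len(part.text.split())
        if words < min_words:
            problems.append(f"part {part.number}: {words} words, floor is {min_words}")
    return problems


parts = [Part(1, "one two three"), Part(3, "four five")]
for problem in fidelity_problems(parts, min_words=3):
    print(problem)
```

Checking the whole product, not each piece in isolation, is what catches the numbering gap: part 3 is fine on its own but wrong in context.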
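For Python, both the symbol digest and the run-it check map onto the standard library. The sketch below uses ast to list a module’s docstring, classes, methods, and function signatures, and py_compile to confirm that a file byte-compiles, similar in spirit to the py_compile step QC runs. It illustrates the technique only and is not how repo_map or QC is actually implemented; symbol_digest and compiles are hypothetical helper names, and ast.unparse needs Python 3.9 or later.

```python
from __future__ import annotations

import ast
import py_compile
import tempfile
from pathlib import Path


def symbol_digest(source: str) -> list[str]:
    """Summarize a module: docstring first line, classes, methods, signatures."""
    tree = ast.parse(source)
    digest: list[str] = []
    doc = ast.get_docstring(tree)
    if doc:
        digest.append(f"module: {doc.splitlines()[0]}")
    for node in tree.body:  # top-level definitions only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            digest.append(f"def {node.name}({ast.unparse(node.args)})")
        elif isinstance(node, ast.ClassDef):
            digest.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    digest.append(f"  def {item.name}({ast.unparse(item.args)})")
    return digest


def compiles(source: str) -> bool:
    """Write the source to a temp file and check that py_compile accepts it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(source, encoding="utf-8")
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError:
            return False
    return True


sample = '''"""Greeting helpers."""


class Greeter:
    def greet(self, name):
        return "hi " + name


def add(a, b):
    return a + b
'''

print(symbol_digest(sample))
print(compiles(sample), compiles("def broken(:\n"))
```

A digest like this is why a Python repo is easier for the team than other languages: signatures come straight from the parser instead of being inferred from filenames.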

What the Beta does NOT do (yet)

Be honest with yourself about whether your project hits any of these.

Fully-autonomous production-scale, multi-phase efforts

A 200-page novel, a full application across many feature releases, or a multi-month research program with dozens of deliverables is still phased work: the engine is happy to produce a reviewable Phase 1 with you gating between phases. Job Templates + multi-file output close part of the old gap (recurring setup persists; a deliverable that exceeds per-artifact ceilings decomposes at logical boundaries), but there is no fully-automatic cross-phase planner that runs the whole multi-month program unattended. If a request smells production-scale, the Leader’s clarifying question surfaces it; answer “production-scale” and you’ll get a Phase 1 you can ship and review. See the Roadmap.
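Decomposing at logical boundaries can be pictured as greedy packing: split the text into whole units and fill each artifact up to a ceiling without cutting a unit in half. This Python sketch assumes blank-line-separated paragraphs are the logical boundary and that the ceiling is a character count. split_at_boundaries is a hypothetical name, and the engine’s real boundaries (chapters, modules, records) and limits may differ.

```python
from __future__ import annotations


def split_at_boundaries(text: str, ceiling: int) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most `ceiling` characters.

    A single paragraph longer than the ceiling becomes its own oversized chunk
    rather than being cut mid-paragraph.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= ceiling:
            current = candidate  # still fits (or first paragraph of a chunk)
        else:
            chunks.append(current)  # close the chunk at a paragraph boundary
            current = para
    if current:
        chunks.append(current)
    return chunks


doc = "Part one.\n\nPart two.\n\nPart three is longer."
for chunk in split_at_boundaries(doc, ceiling=25):
    print(repr(chunk))
```

Never splitting inside a unit keeps each piece coherent on its own, at the cost of chunks that are sometimes shorter than the ceiling allows.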

Multi-language repos with full symbol awareness

repo_map’s symbol extraction is richest for Python. For other languages the team leans on a filename listing + team_canvas digests + your standards file rather than automatic symbol/signature extraction — JS / TS / Rust / Go projects work, but the team is blinder than in a Python repo. modulatio doctor surfaces this calibration.

Deep iterate-until-green autonomy

Producers can run code and QC verifies by executing it, and a task that can’t clear the bar routes through the self-heal ladder (a last-resort QC patch → re-decompose on overflow → automatic escalation to a stronger model). What’s still maturing is the tight iterate-on-failure loop — run the failing test, read the trace, patch, re-run, repeat autonomously until green — across many rounds on a large codebase. For deep debugging marathons, expect to stay in the loop between rounds.
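The self-heal ladder is an ordered fallback: try each rung in turn and stop at the first one whose output passes QC. Here is a minimal Python sketch in which stub rungs stand in for the QC patch, re-decomposition, and model escalation. self_heal and the rung functions are hypothetical, not Modulatio’s API. The sketch also shows what the ladder is not: it makes one pass over the rungs, not the repeated run, read, patch, re-run loop described as still maturing.

```python
from __future__ import annotations

from typing import Callable, Optional

# A rung takes the failing artifact and returns a candidate fix,
# or None when the rung does not apply.
Rung = Callable[[str], Optional[str]]


def self_heal(artifact: str, passes_qc: Callable[[str], bool],
              ladder: list[tuple[str, Rung]]) -> tuple[str, str]:
    """Climb the ladder in order; return (rung name, artifact) for the first fix QC accepts."""
    for name, rung in ladder:
        candidate = rung(artifact)
        if candidate is not None and passes_qc(candidate):
            return name, candidate
    raise RuntimeError("ladder exhausted; needs a human in the loop")


# Stub rungs; real ones would involve QC, the planner, or a stronger model.
def qc_patch(a: str) -> Optional[str]:
    return a.replace("TODO", "done") if "TODO" in a else None


def redecompose(a: str) -> Optional[str]:
    return None  # only applies when the task overflowed; it did not here


def escalate(a: str) -> Optional[str]:
    return a.replace("BUG", "fixed")


def passes_qc(a: str) -> bool:
    return "TODO" not in a and "BUG" not in a


LADDER = [("qc_patch", qc_patch), ("redecompose", redecompose), ("escalate", escalate)]
print(self_heal("draft TODO", passes_qc, LADDER))  # ('qc_patch', 'draft done')
print(self_heal("draft BUG", passes_qc, LADDER))   # ('escalate', 'draft fixed')
```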

Persona / identity continuity across long-running deployments

Each kickoff and plan resumption starts a fresh team that reads its seed-skill prompts plus the team-state doc; there is no enforced “who are we?” anchor that survives context-window resets, long idle gaps, or Layer 2 compression. In a long-lived deployment with a recurring crew identity (a named editorial voice, a sustained mascot, a domain role), expect some identity drift across sessions unless you restate the convention in the kickoff. A future release closes this gap as a sibling to Job Templates; see the Roadmap.

Reporting bugs and feedback

When something doesn’t behave the way this page describes, that’s a bug. We want to know.

Crash-class bugs: the engine writes a redacted log and points you at the issue template. Open the bug at github.com/ModulatioAI/modulatio/issues using the Bug report template — it asks for the redacted log, your modulatio doctor output, and the plan ID.
Regression-class issues: use the Regression template (we treat regressions as higher-priority than equivalent-severity new bugs).
Feature requests: use the Feature request template, or Discussions for open-ended threads.

Quick reference — sizing rules of thumb

Deliverable shape	Verdict
Single deliverable (1-2 passes)	Sweet spot
Multi-piece deliverable (assembled + fidelity-checked)	Comfortable
Production-scale, Phase 1 only (you gate between phases)	Supported
Production-scale, fully autonomous across all phases	Not yet — produce + review phase by phase

When in doubt, go smaller. Phase plans grow when execution reveals real work; over-decomposed plans lock the team into churn before it discovers anything.