Roadmap

v0.9.8.5 is the current Beta release, a reliability and leadership-polish release: producers run thinking-off by default, task count follows the work, a Clay Leader can read the team’s deliverables, and the end-of-run sign-off shows the real verdict. This page consolidates what’s shipping next (the 1.0 line) and the long-horizon pillars under design. Sections are intentionally light on dates (Modulatio ships when it’s ready, not on a calendar) and heavy on what each direction is for.

If you want a one-page summary of the engine’s current ceilings (what works, what doesn’t), see the Beta calibration page. If you want to know what’s already merged but not yet released, the CHANGELOG [Unreleased] section is the source of truth.

Current — v0.9.8.5 (reliability + leadership polish)

The current release is v0.9.8.5, reliability + leadership polish (release notes). Producers run thinking-off by default while the judgment seats (Leader, planner, QC) keep reasoning. Task count follows the work: the fixed per-goal cap is gone, and each task is sized to a producer’s context budget. A Clay Leader sees the team’s deliverables (it is granted the run dir read-only, so it can inspect but never mutate them). The end-of-run sign-off shows the real verdict plus a Product Quality Report digest. A round-one cadre BLOCK caught a real hole: a Clay Leader’s visibility grant was mounted read-write. It was fixed to read-only and re-verified before sign-off, and the release was then cleared by a four-reviewer code cadre.

The prior release was v0.9.8, the Feng-Tui interface, finished (release notes): the full layout overhaul across every screen, the two-column CONSOLE command floor, paste-to-attach, and the TUI over an SSH login. It built on v0.9.7, project management (release notes).

Before that came v0.9.6, reliability (release notes): the team always finishes the job. A producer’s attempts are now budgeted per task, across every retry, hand-off, and re-run, and the budget is never reset, so a model can’t loop forever or skirt the quality gate. When that budget is spent, QC finishes the work itself (patching the existing draft, or writing the artifact from the task’s brief), so a run lands a real deliverable instead of wedging. Around that: the Leader is the only required role (add QC and producers as you want them); lifecycle tooling (modulatio uninstall / modulatio repair); clearer signals (each QC review shows in the activity feed against its task, and a plain message appears when the Leader’s model is unavailable); and fail-closed confinement for a Clay subscription seat running as producer/QC. Every change cleared a four-lens cadre review (security, hull/terminal-state, code-quality, coherence), and was remediated and re-verified by a reviewer who ran the code.
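The per-task budget can be pictured with a minimal sketch. The names here (TaskBudget, run_task, the callables passed in) are hypothetical and not Modulatio’s actual API; the sketch only shows a counter that lives on the task, is never reset, and hands the work to QC once it is spent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskBudget:
    """Hypothetical per-task attempt counter; it is never reset."""
    limit: int
    used: int = 0

    def spend(self) -> bool:
        """Consume one attempt; return False once the budget is exhausted."""
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def run_task(
    budget: TaskBudget,
    produce: Callable[[], str],
    passes_qc: Callable[[str], bool],
    qc_finish: Callable[[str], str],
) -> tuple[str, str]:
    """Return (artifact, who_finished).

    Retries, hand-offs, and re-runs all draw on the same budget object,
    so none of them can reset the count.
    """
    draft = ""
    while budget.spend():
        draft = produce()
        if passes_qc(draft):
            return draft, "producer"
    # Budget spent: QC patches the last draft, or writes from the brief
    # when there is no draft to patch.
    return qc_finish(draft), "qc"
```

Because the budget object outlives any single call, a second run_task on an exhausted task goes straight to QC rather than granting the producer fresh attempts.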

Earlier, v0.9.5 brought subscription seats (release notes): bring your own Claude and GPT-5.5 to the team. Clay runs any seat through your Claude Code subscription (claude -p, the official harness), so it spends the subscription you already pay for and never a metered key. It’s treated like any other agent in its role (confined to its own folder, with the same operator approval to widen) and is purely additive to the existing Anthropic API-key path. GPT-5.5 runs the same way through your OpenAI Codex subscription. Each seat can also carry fallback models: if a seat’s model is unavailable (rate limit, auth, 5xx), the engine warns and restarts the whole task on the next backup, so a down provider never stalls the team. Clay cleared full code cadre review (coherence / hull / bypass-surface / contract), with its one real finding remediated and re-verified on a live round-trip before merge.
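A rough sketch of the fallback behaviour follows, assuming a seat is just an ordered list of model names. ModelUnavailable and run_with_fallbacks are illustrative names, not part of the engine; the point is that the whole task restarts on the next backup rather than resuming mid-way.

```python
from typing import Callable, Optional, Sequence


class ModelUnavailable(Exception):
    """Hypothetical error for rate limits, auth failures, and 5xx responses."""


def run_with_fallbacks(
    models: Sequence[str],
    run_task: Callable[[str], str],
    warn: Callable[[str], None] = print,
) -> tuple[str, str]:
    """Try the seat's primary model, then each backup in order.

    The task is restarted from scratch on each model.
    Returns (model_used, result).
    """
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, run_task(model)
        except ModelUnavailable as exc:
            warn(f"{model} unavailable ({exc}); restarting task on next backup")
            last_error = exc
    raise RuntimeError("no model in the seat's chain was available") from last_error
```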

Earlier still, v0.9.4 introduced the two-lane Leader (release notes): the same Leader that orchestrates the team can now also work on its own, as a standalone coding agent you pair with directly, reading, editing, and running files in a folder you point it at with /work. Its hands are confined by default to the Leader’s own per-project workspace (a structural cheat-guard: it physically can’t touch the team’s deliverables). Widening to a real folder is an explicit, scoped approval (once / this session / always / deny); /rp revokes everything; a dotfile secret-floor keeps .env/.ssh refused even inside a granted folder; and anything it runs is sandbox-required and fail-closed. Three autonomy modes turn it loose within bounds: /yolo auto-grants capabilities (network/shell), /goal delegates judgment (decide how without asking), and /yolo-goal does both. One fence holds through all of them: running free outside your own yard always needs permission. No mode opens the folder gate; the capability controls and the filesystem fence compose as independent gates, both must pass, and the fence is checked first regardless of mode. It carries an embedded runbook so it stays rigorous working alone. Every arc cleared full design and code cadre review (coherence, hull, bypass-surface, contract) before merge.
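The two independent gates can be sketched as below. This is an illustration under assumptions, not the engine’s code: the function names, the mode strings without their leading slash, and applying the secret-floor to every path (not only granted folders) are all choices made for this sketch.

```python
from pathlib import PurePosixPath

SECRET_NAMES = {".env", ".ssh"}  # the dotfile secret-floor named in the text


def inside(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if path is root or lies beneath it."""
    return path == root or root in path.parents


def fence_allows(path: str, workspace: str, granted: set[str]) -> bool:
    """Filesystem fence: independent of any autonomy mode.

    A real fence would also resolve symlinks and '..' segments first;
    that step is omitted here.
    """
    p = PurePosixPath(path)
    if SECRET_NAMES & set(p.parts):
        return False  # refused even inside a granted folder
    roots = [PurePosixPath(workspace), *(PurePosixPath(g) for g in granted)]
    return any(inside(p, r) for r in roots)


def capability_allows(capability: str, mode: str, approved: set[str]) -> bool:
    """Capability gate: yolo and yolo-goal auto-grant; otherwise ask first."""
    return mode in {"yolo", "yolo-goal"} or capability in approved


def may_act(path, capability, *, mode, workspace, granted, approved) -> bool:
    # The fence is checked first, and no mode can open it.
    if not fence_allows(path, workspace, granted):
        return False
    return capability_allows(capability, mode, approved)
```

Keeping the two checks as separate functions mirrors the stated design: a mode can only ever change the answer of the capability gate, never the fence.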

Prior — v0.9.3 (Feng-Tui terminal reskin)

v0.9.3 — Feng-Tui, the harmonious terminal interface (release notes). A full phosphor-terminal reskin of the TUI: a pure-black ground, thin frames, and a single monochrome accent in one of three live-cycling variants — amber / green / cyan, switched with F2 and remembered across launches — with state read as glyph + WORD rather than colour alone, a low-res boot splash on launch, a shared full-height-divider master-detail layout across the list tabs, a read-only skills preview, app-wide copy/paste, and uniform delete-confirmation guards. Layout-only — no backend wiring changed — reviewed across coherence, hull, and hooks/regression passes. It is the visual foundation the conversation-first TUI overhaul (below, in the 1.0 line) builds on.

v0.9.1 — agent role refinement (release notes). Producers, the Leader, and QC work to a per-operation standard — every task is classified by the kind of work it is and that selects the definition of “done” the work is judged against, the approach guidance the producer gets, and the bar the Leader and QC verify against, so a fix is judged on the reported problem actually being gone and a research task on its sources being real and synthesized. No behavior change for work that declares no operation — it defaults to a strict general standard, never a loose one.

Prior — v0.9.0 (stability + reporting)

v0.9.0 — stability + reporting (release notes). Two full-codebase debug passes — an exhaustive primary sweep, then an independent re-debug — each adversarially verified and reviewed by a multi-model cadre (an independent hull pass and a coherence pass), plus a producer/product/output agnostic audit: hundreds of edge-case, error-path, concurrency, and cost fixes, with no behavior change for a normal run — the engine is simply harder to wedge. Plus one net-new operator feature — a built-in crash / error / doctor log system (a LOGS tab in the TUI and modulatio logs on the CLI) that captures failures locally and lets you review and send them to the team, capture-always, submit-on-consent, auto-redacted (secrets, tokens, Authorization headers — titles as well as bodies) before anything reaches a public issue.
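To make the auto-redaction idea concrete, here is a small sketch of the kind of pass that could run before a log is submitted. The patterns and function names are illustrative assumptions; the real redactor’s rules are not described on this page and would need to cover many more secret shapes.

```python
import re

# Illustrative patterns only; not the engine's actual rule set.
_PATTERNS = [
    # Authorization headers: keep the header name, drop the credential.
    (re.compile(r"(?i)(authorization\s*[:=]\s*)[^\r\n]+"), r"\1[REDACTED]"),
    # key=value style secrets such as api_key=..., token: ..., password=...
    (re.compile(r"(?i)\b((?:api[_-]?key|token|secret|password)\s*[:=]\s*)\S+"), r"\1[REDACTED]"),
]


def redact(text: str) -> str:
    """Apply every pattern in turn and return the scrubbed text."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def redact_report(title: str, body: str) -> dict[str, str]:
    """Redact titles as well as bodies before anything is submitted."""
    return {"title": redact(title), "body": redact(body)}
```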

Prior — v0.8.9 (security hardening)

v0.8.9 — security hardening (release notes). A full-codebase security audit of the agent engine, then two independent mirror-audits — an adversarial hull pass and a coherence pass, each a different model reviewing the whole tree fresh — closed nine findings, the keystone (a tool-call authorization bypass that let a skill reach a tool outside its declared loadout) found by the independent pass, not the first. No functional change for a normal run — defense-in-depth on the surfaces a prompt-injected model could otherwise reach, every fix an engine-bound invariant. The guiding rule: a permission is a key to a door inside the ship; it never opens the sea valves.

v0.8.8 — deterministic assembly validation + codify-the-win

v0.8.8 — deterministic assembly validation + codify-the-win (release notes). Two arcs on one north star (cheap producers generate; the smart QC reviews cheaply and patches only the errors; the cost curve bends toward the cheap model): QC can now cheap-pass a code or media assembly it can prove correct — the engine proves the composite contains the declared units (code wiring statically checked, SaaS imports expected not failures; a bundle by exact byte equality; lossy media falls back honestly), so the assembled bytes never re-enter the model; and the self-codification loop now learns from QC recoveries, not just repeated fails — a rescue the smart QC keeps having to write is a technique the producer lacked, codified project-local with a non-independent spot-check flag. It builds on v0.8.6 — Leader self-remediation + JT generativity (the lead fixes fixable concerns in place under a typed gate + engine-owned fix window; the engine refuses a saved template a job can’t fill, derives a fitting one, and skips a drifted cron slot — release notes) and v0.8.4 — deliverable fidelity (the engine verifies the whole assembled deliverable against the brief: a DeliverableSpec, an engine-extracted digest + readable twin, a produce-bound per-part floor, generated framing, normalized numbering — release notes). The full release history is in the Releases menu.

v0.6.0 — role-language migration

The headline of v0.6.0 is the role-language migration: three structural fixes the engine needed before its next chapter, each of which cleared a fresh hull + coherence review. See the v0.6.0 release notes for the full delta.

v0.6.0 brings:

Routing reality on every headless path. The keystone’s capability + availability routing was wired only on the interactive path; the daemon, cron, Job-Templates, plan-mode, and TUI paths silently collapsed all producer work onto one model. v0.6.0 wires the per-agent model pools on every path and both producer channels — proven on real models through the daemon. Research routes by capability too, instead of a hardcoded role call.
No more roles — only producers. specialist and researcher are gone; there are only producers that compose skills, and research is a capability a producer holds. Old defaults.json keeps working through a read-fallback chain (producer → specialist → leader) — no migration step.
An operator-presence-aware Leader. The Leader judges when it runs autonomously (headless — it is the only check past QC) and defers when you’re present, replacing a blunt global “bias toward continue” with a principled gate. Its between-task self-correction ships on by default when nobody’s watching; the defer-to-you channel is the engine seam the coming conversational TUI plugs into.
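The read-fallback chain for an old defaults.json can be sketched as follows. The flat role-to-model layout and the function name are assumptions made for illustration; only the lookup order (producer, then specialist, then leader) comes from the text above.

```python
import json

FALLBACK_CHAIN = ("producer", "specialist", "leader")  # order taken from the text


def resolve_producer_model(defaults: dict) -> str:
    """Return the first model found along producer -> specialist -> leader.

    The flat {"role": "model"} layout is an assumption for illustration.
    """
    for key in FALLBACK_CHAIN:
        model = defaults.get(key)
        if model:
            return model
    raise KeyError("no producer, specialist, or leader model configured")


# An old-style file with no "producer" key still resolves, with no migration step.
old_style = json.loads('{"specialist": "model-a", "leader": "model-b"}')
new_style = json.loads('{"producer": "model-c", "leader": "model-b"}')
```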

v0.5.0 — per-job output folders + Job Templates

The headline of v0.5.0 is per-job output folders + Job Templates — the setup-side of the Alfred loop. Where v0.4.0 codified recurring failures into skills, v0.5.0 codifies recurring setup into reusable templates, and gives every job its own output folder so runs stop clobbering each other. See the v0.5.0 release notes for the full delta.

v0.5.0 brings:

Per-job output folders. Each job’s deliverables land in their own named subfolder (<job> <date>, hex tiebreaker only on collision), with the Product Quality Report inside, so each run is a self-contained package. Back-compatible: no name → the old flat path, byte-identical.
Job Templates. The Leader’s own self-authored interview + parameter schema + output contract for a class of job — domain-agnostic (the engine branches only on output cardinality, never on the job’s subject), git-versioned with a name-dedup guard, forked from the skill-library machinery.
An engine-enforced output contract. A “one deliverable per item over N items” template binds N deterministically — overriding the planner’s batching heuristic rather than hoping a prompt sentence holds — and post-validates the plan, reporting any shortfall firmly in the Product Quality Report instead of shipping it quietly.
Cron is a bound template. cron add --jt <name> --jt-params <json> validates the bound params against the schema at add-time (never fails at 3am) and runs headless with no interview. Existing template-less crons are unchanged.
Setup-side self-codification. At end of run the Leader judges recurrence over recent job history (≈3 of a kind, or an operator redo) and decides whether to codify a template, the setup-side mirror of skill self-codification. A self-improvement can never add a hard-required param without a default (which would break a bound cron). The whole loop is one env var away from off (MODULATIO_JT_CODIFICATION=0).
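The per-job folder naming rule from the list above can be sketched as a small pure function. The ISO date format, the four-character hex suffix, and the function name are assumptions; the text only states the "<job> <date>" shape, the collision-only tiebreaker, and the flat-path fallback when no name is given.

```python
import secrets
from datetime import date
from typing import Optional


def job_folder_name(job_name: Optional[str], today: date, taken: set[str]) -> Optional[str]:
    """Pick the subfolder name for a job's deliverables.

    Returns None when the job has no name, meaning the caller keeps the old
    flat path. The hex tiebreaker is added only when the plain name is taken.
    """
    if not job_name:
        return None
    base = f"{job_name} {today.isoformat()}"
    name = base
    while name in taken:
        name = f"{base} {secrets.token_hex(2)}"
    return name
```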

v0.4.0 — autonomous skill self-codification

The headline of v0.4.0 was autonomous skill self-codification: the team learns from its own repeated failures. At the end of a run the Leader reads QC’s recent fail verdicts, judges what recurred enough to be worth remembering, and codifies the correction into durable, git-versioned skill guidance that cheap producers load next time. See the v0.4.0 release notes for the full delta.

v0.4.0 brought:

Self-codification at end of run. The Leader judges recurrence over QC’s fail verdicts (roughly three of the same defect) and either improves an existing skill (a “Learned” section + a version bump) or creates a new single-purpose one. Recurrence is the model’s judgment over the log, not a mechanical counter.
A git-backed skill library. Every codification is versioned and committed, so a lesson earned at a token cost is never lost and is always revertible. The git layer is best-effort and inert when git is absent — it never raises, never touches global config.
No second review of what the team learned. QC already voted via the repeated fails the lesson is built from, and a weaker QC gating the strongest seat’s judgment would invert the capability floor — so the Leader’s call stands, mirroring QC-as-fixer. Runtime QC still reviews the artifacts the skill later influences.
Observable + reversible by design. A silently stalled loop leaves a skill_codification_skipped breadcrumb without ever breaking a run; the whole loop is one env var away from off (MODULATIO_SKILL_CODIFICATION=0).

v0.3.0 — the skill-library keystone

The headline of v0.3.0 was the skill-library keystone: a producer is now a model endpoint, not a holder of a frozen skill list. It checks out whatever a task needs from a shared library at run-time, so any producer can run any task and the capability gap that used to block a task is dissolved. See the v0.3.0 release notes for the full delta.

v0.3.0 brought:

Producers as model endpoints. Setup no longer assigns skills — you give a producer an LLM and tag what it’s good at. Skills compose per task from the shared library; the wizard gets simpler and the team composes capability at run-time.
Capability + availability routing that never blocks. Dispatch picks the least-loaded producer (independent work spreads across idle models), prefers ones that meet a task’s capability floor, but runs the best-available model with a Product Quality Report reservation when none do. The only hard gap is “no producer exists at all” — a setup error.
The skill library’s first working brick — search_skills / load_skill / drop_skill over a shared pool with a cheap resident index (names + one-liners, no bodies). The lazy checkout/drop library + the self-codification flow landed in the bricks that followed.
Self-contained goal decomposition — a fix the live runs surfaced: goals now name their concrete subject (never “the three topics”), and the project objective is threaded into the producer prompt as a north-star.
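The routing rule in the list above (least-loaded first, capability floor preferred, never blocking) might look roughly like this. The Producer shape and the way "best available" is scored when nobody qualifies are assumptions of this sketch, not the engine’s documented behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Producer:
    name: str
    load: int  # tasks currently assigned
    capabilities: set[str] = field(default_factory=set)


def dispatch(producers: list[Producer], required: set[str]) -> tuple[Producer, bool]:
    """Return (producer, reservation).

    reservation is True when nobody met the capability floor and the task
    ran on the best available model anyway; the caller would note that in
    the Product Quality Report.
    """
    if not producers:
        raise RuntimeError("no producer exists at all: a setup error")
    qualified = [p for p in producers if required <= p.capabilities]
    if qualified:
        return min(qualified, key=lambda p: p.load), False
    # Nobody qualifies: pick the closest match, then the least loaded.
    best = max(producers, key=lambda p: (len(required & p.capabilities), -p.load))
    return best, True
```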

v0.2.2 — web search + a provably-terminating redo loop

The headline of v0.2.2 was web search and a redo loop that provably terminates. Producers can now discover current sources by searching the web — instead of only fetching a URL they already know or reciting stale training data — and the loop that redoes a not-good-enough goal can no longer spin forever: it always exits to the Product Quality Report. See the v0.2.2 release notes for the full delta.

v0.2.2 added:

web_search (DuckDuckGo, no API key) — a producer searches, reads ranked hits, then http_gets the URLs worth reading. Shipped as the first brick of the skill library: a separate single-purpose web-search skill composed onto a task via a per-task tool union (no fixed roles, no bundling).
Source-credibility flagging — known content-farm domains are flagged and sunk below credible hits (flag, never drop); extensible via MODULATIO_LOW_CREDIBILITY_DOMAINS. Plus an http_get User-Agent fix (it was 403’ing courteous sites).
A provably-terminating redo loop — the per-run retry budget is now absolute (a midnight roll can’t reset it mid-run); fix-is-final + a deadlock bow-out; budget tightened to 4. The goal always ships to the Product Quality Report.
A skill-library design spec for the brick’s generalization (lazy checkout/drop from a shared pool) — design only, not yet built.
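The "flag, never drop" credibility rule can be illustrated with a stable sort. The placeholder domain, the comma-separated format for MODULATIO_LOW_CREDIBILITY_DOMAINS, and both function names are assumptions of this sketch.

```python
import os
from urllib.parse import urlparse

DEFAULT_LOW_CREDIBILITY = {"content-farm.example"}  # placeholder, not a real list


def low_credibility_domains() -> set[str]:
    """Built-in set plus extras from MODULATIO_LOW_CREDIBILITY_DOMAINS.

    The comma-separated format is an assumption for this sketch.
    """
    extra = os.environ.get("MODULATIO_LOW_CREDIBILITY_DOMAINS", "")
    return DEFAULT_LOW_CREDIBILITY | {d.strip().lower() for d in extra.split(",") if d.strip()}


def rank_hits(urls: list[str], flagged_domains: set[str]) -> list[tuple[str, bool]]:
    """Flag low-credibility hits and sink them below credible ones.

    Nothing is dropped, and the original order is kept within each group.
    """
    def is_flagged(url: str) -> bool:
        host = (urlparse(url).hostname or "").lower()
        return any(host == d or host.endswith("." + d) for d in flagged_domains)

    tagged = [(url, is_flagged(url)) for url in urls]
    return sorted(tagged, key=lambda hit: hit[1])  # stable sort: False before True
```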

v0.2.1 — in-place editing

The headline of v0.2.1 was in-place editing: hand Modulatio an existing file with --attach and it improves it surgically rather than rebuilding from scratch. The producer emits exact SEARCH/REPLACE blocks and the engine applies them, keeping every untouched byte. See the v0.2.1 release notes for that delta.

v0.2.1 added:

kickoff --attach <file> — pin an existing file and switch on in-place edit: the attached file is the starting point, the plan stays in it (no scatter), and changes are surgical.
Surgical patch mode — SEARCH/REPLACE blocks applied by the engine; untouched content is preserved structurally. Artifact-agnostic (code, prose, config, data).
A code read-toolkit (grep / tail / wc / read-only sed), confined to the artifacts dir.
Code ships verbatim — a game.py is delivered as runnable source, with markdown companions beside it, not rendered to a stray .docx; delivery dedups and replaces rather than piling up copies.
The verify-goal wall now catches the run-it-to-check family (test / playtest / play through).
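Engine-side application of SEARCH/REPLACE blocks can be sketched in a few lines. The block wire format is not specified on this page, so this sketch takes already-parsed (search, replace) pairs; the exactly-once matching rule is also an assumption, chosen so that a stale or ambiguous block fails loudly.

```python
def apply_patch(original: str, edits: list[tuple[str, str]]) -> str:
    """Apply (search, replace) pairs in order.

    Each search text must occur exactly once, so an ambiguous or stale block
    raises instead of editing the wrong place. Everything outside the matched
    span is carried over untouched.
    """
    text = original
    for search, replace in edits:
        count = text.count(search)
        if count != 1:
            raise ValueError(f"SEARCH text matched {count} times, expected 1: {search!r}")
        text = text.replace(search, replace, 1)
    return text
```

Because the producer never re-emits the whole file, untouched content cannot drift: the only bytes that change are the ones inside a matched SEARCH span.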

v0.2.0 — the QC-thesis arc

The headline of v0.2.0 was the QC-thesis arc: cheap producers generate the bulk, the smarter QC reviews and patches only the errors, the Leader confirms while QC owns repair, and the lead’s unresolvable reservations ship as an advisory Product Quality Report beside the work. See the v0.2.0 release notes for that delta.

v0.2.0 added:

The Product Quality Report — the lead’s honest covering note, advisory and never a gate.
Bundled default standards (research / code / text / marketing) so cold-start QC has a real bar; plus the rigorous-sourcing producer skill.
The Leader-confirms / QC-repairs model — standalone “verify the deliverable” goals are dropped at planning; verification verdicts no longer open tickets.
Context-budget hardening — capped http_get fetches + tool-result truncation close the overflow/storm class; budgets are model-agnostic.
Finished products delivered as .docx, and withheld when a task or goal is genuinely blocked.

The v0.1.0 foundation

The headline of v0.1.0 is the five-layer working-memory architecture plus calibrated honesty about what the engine is and isn’t sized for. See Beta calibration for the full contract; in one paragraph: the engine is sized for single deliverables (one cohesive output) and multi-piece deliverables (3-7 related outputs in one phase). Production-scale efforts (a 200-page novel, a multi-feature application across many releases) are explicitly Phase-1-only.

The v0.1.0 foundation ships:

Tool-loop summarization (Layer 1) and context-budget gating (Layer 2), wired into production.
The repo_map symbol-aware code digest (Layer 3) for Python repos.
Team-state continuity (Layer 4) and cross-cutting terse-prose templates (Layer 5).
A three-tier over-scope gate: Leader-plan deliverable-shape clarifying question, the planner’s hard cap of 6 tasks per sub-objective, and a 70% soft-warn band below the compression band.
A context-budget exhaustion route that opens a CRITICAL ticket carrying the conversation checkpoint and routes Leader-reflect to revise-major.
A modulatio doctor engine-calibration banner that surfaces the contract on first contact.
The producer-collapse / skills-first model: no fixed role identities — an agent is a producer-with-skills, and tasks are skill-routed to whichever producer holds the matching skill.
QC-as-fixer (ON by default) — when a producer can’t clear the bar after exhausting retries, QC patches the artifact from its own findings and the task completes.
Provider thinking on/off control so you can run producers non-thinking.
Honest documentation: the Beta calibration page names every gap users should expect to hit.

Next — v1.0 (a new way to use it)

1.0 is the interface chapter: the engine internals (orchestrator, plan model, working-memory layers, skill system) stay stable — the change is in how humans engage with it, and how much you hand over.

A new TUI. The conversation-first overhaul — the Leader as a streaming partner you talk to (not an admin dashboard), agents addressed by the names you gave them, producer + QC activity streamed live. The Feng-Tui reskin (v0.9.3) landed the visual foundation — pure-black phosphor look, live-cycling variants, glyph+WORD state, the shared master-detail layout; the conversation-first streaming surface builds on it.
A web-based UI. Drive the same engine from the browser — a different deployment shape (multi-user collaboration, asynchronous work, longer horizon) without bending the local-vault contract.
Remote access via the web UI. Run Modulatio on a box and reach it from anywhere, so a long-horizon deployment isn’t tied to the terminal it launched in.
Operator permission modes — /yolo, /goal, /yolo-goal. The humane padded room: when the sandbox blocks a capability (run a command, reach the internet, use a secret), the team asks you in plain words — just this once / this whole session / always / no — and remembers your answer. Three modes hand over judgment and/or access at your discretion: /yolo auto-grants the access asks (the sandbox stays on), /goal lets the Leader decide how without stopping to ask, /yolo-goal does both. Job Templates record the grants + mode so a headless cron run carries its own authorization. (The security-critical core is already built and independently reviewed.)
Cross-phase memory + persona continuity. Long-running deployments that span weeks or months — templates that carry team state, conventions, and prior decisions across phases, and a stable team identity that survives a resume — lifting today’s Phase-1-only constraint.
Various minor features and improvements. Smaller upgrades that ride along — not enumerated here.
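The four answers to an access ask (just this once / this whole session / always / no) suggest a small memory of grants. The sketch below is hypothetical throughout: the class names are invented, persistence of "always" grants is only noted in a comment, and a "no" is not remembered here, which may differ from the real design.

```python
from enum import Enum


class Answer(Enum):
    ONCE = "just this once"
    SESSION = "this whole session"
    ALWAYS = "always"
    NO = "no"


class GrantMemory:
    """Hypothetical store for the operator's answers to access asks."""

    def __init__(self) -> None:
        self.session: set[str] = set()
        self.always: set[str] = set()  # would be persisted across sessions

    def record(self, capability: str, answer: Answer) -> bool:
        """Remember the answer; return whether this request may proceed."""
        if answer is Answer.SESSION:
            self.session.add(capability)
        elif answer is Answer.ALWAYS:
            self.always.add(capability)
        return answer is not Answer.NO

    def remembered(self, capability: str) -> bool:
        """True if an earlier answer already covers this capability."""
        return capability in self.session or capability in self.always

    def end_session(self) -> None:
        """Session-scoped grants expire; 'always' grants survive."""
        self.session.clear()
```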

Pillars under design

The following are tracked as long-horizon design pillars rather than specific releases. They land when they’re ready.

Build / test feedback loop. Producers can write code; the next slice lets them run the build, read the test failures, and iterate. This closes the gap where iterating on failure needs a human in the loop. It likely lands as a code_iter skill composing run_shell + parse_test_output + redo-with-failure-context.
Multi-language symbol map. Modulatio’s repo_map is Python-only. The plan is to extend it to JavaScript / TypeScript / Rust / Go via tree-sitter or per-language AST modules, at which point today’s filename-only fallback for non-Python repos goes away.
Embedded LLM defaults. The setup wizard would offer a curated embedded-LLM bundle for users who want zero external API dependencies, shipping pinned model presets that work end-to-end with the local Ollama defaults.
Cost / token telemetry surfaces. Already-tracked per-call usage (the <plan>.usage.jsonl ledger) gets a first-class TUI tab + CLI subcommand for “where did the budget go” forensics. Ties into the existing comptroller primitives.
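As a sketch of the "where did the budget go" question, the snippet below totals tokens per model from JSONL ledger lines. The field names ("model", "input_tokens", "output_tokens") are assumed for illustration; the actual schema of the <plan>.usage.jsonl ledger is not described on this page.

```python
import json
from collections import Counter
from typing import Iterable


def tokens_by_model(lines: Iterable[str]) -> Counter:
    """Sum tokens per model from JSONL ledger lines.

    Field names are hypothetical; blank lines are skipped.
    """
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        totals[record["model"]] += record.get("input_tokens", 0) + record.get("output_tokens", 0)
    return totals
```

Passing an open file handle works as well as a list, since the function only iterates over lines; calling most_common() on the result lists the heaviest spenders first.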

What’s intentionally not on the roadmap

A few things users sometimes ask about that we’ve deliberately not queued. Not because they’re bad ideas — because they don’t fit Modulatio’s current shape.

Hosted / SaaS Modulatio. Modulatio is open-source software you run on your machine. We don’t currently plan a hosted offering. The local-vault contract is load-bearing — the user owns their data, their providers, their models. A hosted version would break that.
Agent-to-agent autonomous protocols. Agent-to-agent surfaces are interesting but not on the Modulatio roadmap. Modulatio’s job is to orchestrate one team of agents on one team’s work. Agent-to-agent federation is a different problem and a different product.
Replacing the human in the loop. Modulatio’s gates (plan approval, between-phase reflection, ticket approvals on CRITICAL) are by design. The roadmap doesn’t include a “fully-autonomous” mode that bypasses them. Those gates are what make Modulatio viable for high-stakes work.

How to influence the roadmap

File an issue on GitHub using the Feature request template. Tag it with the affected component label so it shows up in the right slice’s queue.
Open a discussion on Discussions for open-ended threads where the shape isn’t yet a concrete feature request.
Audit Modulatio against your use case and file the gaps.

Project history

Modulatio matured as engine work under the name Starling through several pre-1.0 iterations; v0.1.0 is the first release shipped under the Modulatio name. The prior repository is being retired as Modulatio replaces it; this is a clean break, not a parallel project.