Working memory — the five-layer architecture
Modulatio lands the five-layer working-memory architecture, the core of the engine’s ability to handle work that exceeds a single LLM context window. This page is a deep-dive: what each layer solves, what it sees, what it doesn’t, and how they compose.
If you want the executive summary first: context-bound work doesn’t fail silently. Tool loops summarize their inflow (Layer 1), every call boundary is gated against the model’s window (Layer 2), code repos get a symbol-aware digest instead of filename listings (Layer 3), team continuity survives across sub-objectives (Layer 4), and prompt templates pull their weight without prose bloat (Layer 5). When work still outgrows the engine’s ceiling — which it can for production-scale efforts — the engine refuses gracefully with a checkpoint and a decompose-required ticket, rather than burning iterations on a deterministic failure.
The five layers are independent — you can disable any one of them without breaking the others — but they compose into a coherent story about what the team can carry across turns and across sessions.
Layer 1 — tool-loop summarization
Module: modulatio/tool_summarization.py
What it bounds: the inflow that accumulates inside a single agent call.
A tool-using agent’s conversation grows linearly with every tool result it sees. A long tool loop — research that hits twenty URLs, a QC pass that runs ten probes — can put hundreds of kilobytes of verbatim payload into the conversation. Layer 1 catches that at the tool-result return site: when a tool result exceeds threshold_tokens, it’s persisted verbatim to <run>/tool_calls/<call_id>.txt and replaced in the conversation with a short summary plus a call_id pointer.
The model doesn’t lose access to the original — it can call read_tool_result(call_id) any time to recover the verbatim text. But the common case is that the model doesn’t need the verbatim back; it needs the gist + the option to drill in. Layer 1 makes that the cheap path.
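The return-site behavior described above can be sketched roughly as follows. This is an illustration, not the engine's code: `handle_tool_result`, `estimate_tokens`, and the four-characters-per-token heuristic are hypothetical names and assumptions; only `threshold_tokens`, the `<call_id>.txt` file name, and the `call_id` pointer come from the text.

```python
from pathlib import Path
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Assumption: a crude stand-in estimator, roughly four characters per token.
    return len(text) // 4


def handle_tool_result(
    call_id: str,
    result: str,
    tool_calls_dir: Path,
    summarize: Callable[[str], str],
    threshold_tokens: int = 2000,
) -> str:
    """Return the text that should enter the conversation for one tool result."""
    if estimate_tokens(result) <= threshold_tokens:
        return result  # small enough: stays verbatim in the conversation

    # Persist the verbatim payload so it can be recovered later by call_id.
    tool_calls_dir.mkdir(parents=True, exist_ok=True)
    (tool_calls_dir / f"{call_id}.txt").write_text(result, encoding="utf-8")

    # Replace the payload with a short summary plus a pointer.
    return f"[summarized: call_id={call_id}] {summarize(result)}"
```

The `summarize` callable stands in for whatever model-backed summarizer a project binds; the decision is made once per result, at return time.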
Configuration shape
    from dataclasses import dataclass
    from pathlib import Path

    @dataclass
    class ToolSummarizationConfig:
        enabled: bool = True
        threshold_tokens: int = 2000
        summarizer_model: str | None = None
        keep_recent: int = 3
        prune_at_pct: float = 0.80
        tool_calls_dir: Path | None = None

summarizer_model = None keeps Layer 1’s summarization branch a no-op even when bound. Production binds tool_calls_dir = <run>/tool_calls/ but leaves summarizer_model to per-project config: opt-in, not implicit.

keep_recent is shared with Layer 2’s compression: the most recent N tool messages stay verbatim, older ones get pruned to placeholders.
Sliding-window prune
Layer 1 also exposes prune_messages_sliding_window(...): when the conversation crosses prune_at_pct of the model’s window, older tool-role messages are rewritten to [summarized: call_id=...] placeholders, keeping keep_recent verbatim. Layer 2 calls this ad-hoc on overflow; Layer 1 itself doesn’t auto-invoke prune inside the tool loop (the summarization-on-return path is the only auto-trigger).
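A minimal sketch of the placeholder rewrite, assuming OpenAI-style message dicts with `role`, `content`, and `tool_call_id` keys. The function name `prune_tool_messages_sketch` is hypothetical and its signature is not the real `prune_messages_sliding_window(...)` signature, which the text elides.

```python
from typing import Any


def prune_tool_messages_sketch(
    messages: list[dict[str, Any]], keep_recent: int = 3
) -> list[dict[str, Any]]:
    """Rewrite all but the most recent `keep_recent` tool messages to placeholders."""
    tool_indexes = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    # Guard keep_recent == 0: a [-0:] slice would keep everything.
    keep = set(tool_indexes[-keep_recent:]) if keep_recent > 0 else set()

    pruned: list[dict[str, Any]] = []
    for i, message in enumerate(messages):
        if message.get("role") == "tool" and i not in keep:
            call_id = message.get("tool_call_id", "unknown")
            # Copy the message so the caller's list is left untouched.
            pruned.append({**message, "content": f"[summarized: call_id={call_id}]"})
        else:
            pruned.append(message)
    return pruned
```

Non-tool messages pass through unchanged, so system, user, and assistant turns keep their full text.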
Recovery path: read_tool_result
When the model calls read_tool_result(call_id), the tool reads <tool_calls_dir>/<call_id>.txt and returns the content. The tool validates call_id against path traversal (no slashes, no ..) and refuses anything that would escape the directory.
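One way such a validation could look, as a sketch under assumptions: `read_tool_result_sketch` is a hypothetical name, and the resolved-parent check is an extra defensive step not described in the text.

```python
from pathlib import Path


def read_tool_result_sketch(call_id: str, tool_calls_dir: Path) -> str:
    """Return the persisted verbatim tool result for call_id, refusing traversal."""
    # Reject empty ids, path separators, and parent-directory markers up front.
    if not call_id or "/" in call_id or "\\" in call_id or ".." in call_id:
        raise ValueError(f"invalid call_id: {call_id!r}")

    base = tool_calls_dir.resolve()
    target = (base / f"{call_id}.txt").resolve()

    # Defensive second check (an assumption): the resolved file must sit
    # directly inside the tool_calls directory, which also catches symlinks.
    if target.parent != base:
        raise ValueError(f"call_id escapes tool_calls_dir: {call_id!r}")

    return target.read_text(encoding="utf-8")
```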
What Layer 1 does NOT do
- It doesn’t retroactively summarize tool results that landed before the threshold was crossed. The decision is per-result at return time.
- It doesn’t deduplicate similar tool results. If the model fetches the same URL twice, both results are persisted separately.
- It doesn’t surface a “you’re spending a lot on tool results” signal. That’s cost-telemetry territory and is on the Roadmap.
Layer 2 — context-window budget gate
Module: modulatio/context_budget.py
What it bounds: every call boundary — the prompt + history sent to the model on each turn.
Where Layer 1 catches tool-result inflow inside a single agent call, Layer 2 catches every call boundary uniformly. Planner brief, Leader-reflect input, QC eval context, Producer prompt, Leader decompose: each goes through check_and_compress(...) before dispatch. Four-state gate:
- Under soft_warn_at_pct (default 70%): silent pass-through.
- In [soft_warn_at_pct, prune_at_pct) (default 70-80%): structured WARNING log via Python logging. No compression. The first warn per run_llm_with_tools invocation fires; subsequent iterations sitting in the same band are suppressed so a 20-iter tool loop doesn’t emit 20 identical warnings.
- In [prune_at_pct, 100%) (default 80-100%): invoke Layer 1’s prune_messages_sliding_window ad-hoc. Re-estimate. If the compressed prompt fits, proceed.
- At or above 100% after compression: write a checkpoint to <run>/checkpoints/<call_id>.json and raise RecoverableContextError. The orchestrator catches this and lands the task as BLOCKED with a CRITICAL ticket carrying the checkpoint path and decompose-required framing.
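The four bands can be sketched as a small decision function. This is a hedged illustration: `gate_decision`, `padded_usage`, and the `recount_after_prune` callable are hypothetical, the padding is assumed to multiply the raw estimate by `1 + pad_pct`, and the sketch assumes that estimates already at or above 100% also attempt compression before refusing, which the text does not spell out.

```python
from typing import Callable


def padded_usage(estimated_tokens: int, max_input_tokens: int, pad_pct: float = 0.05) -> float:
    """Fraction of the window used after padding the raw estimate."""
    return estimated_tokens * (1.0 + pad_pct) / max_input_tokens


def gate_decision(
    estimated_tokens: int,
    max_input_tokens: int,
    recount_after_prune: Callable[[], int],
    soft_warn_at_pct: float = 0.70,
    prune_at_pct: float = 0.80,
    pad_pct: float = 0.05,
) -> str:
    """Classify one call boundary as pass, warn, compress, or refuse."""
    usage = padded_usage(estimated_tokens, max_input_tokens, pad_pct)
    if usage < soft_warn_at_pct:
        return "pass"  # silent pass-through
    if usage < prune_at_pct:
        return "warn"  # log only, no compression

    # At or above the prune threshold: prune, then re-estimate.
    after = padded_usage(recount_after_prune(), max_input_tokens, pad_pct)
    return "compress" if after < 1.0 else "refuse"
```

With a 1000-token window, an 800-token estimate pads to 84% and triggers a prune; if the pruned prompt re-estimates at 500 tokens (52.5% padded) the call proceeds, while a re-estimate of 960 tokens pads to 100.8% and lands in the refuse branch.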
Configuration shape
    from dataclasses import dataclass
    from pathlib import Path

    @dataclass
    class ContextBudgetConfig:
        enabled: bool = True
        max_input_tokens: int | None = None
        soft_warn_at_pct: float = 0.70
        prune_at_pct: float = 0.80
        pad_pct: float = 0.05
        keep_recent: int = 3
        checkpoints_dir: Path | None = None
        checkpoint_redact_secrets: bool = True

max_input_tokens = None falls back to litellm.get_max_tokens(model) with a conservative _DEFAULT_FALLBACK_MAX_INPUT_TOKENS = 8192 for unknowns.

pad_pct = 0.05 adds 5% padding to the raw token estimate. Models tokenize slightly differently across providers, and the padding keeps us from clipping the cap by surprise.

checkpoint_redact_secrets = True redacts tool-role bodies AND assistant tool_calls[*].function.arguments before write. See the Checkpoint format section below.
Checkpoint format
When the gate refuses, the checkpoint file at <run>/checkpoints/<call_id>.json carries the conversation snapshot for audit + Leader-side recovery. It is not loaded back as a re-input source: checkpoints are decomposition inputs + audit artifacts, not resume payloads.

    {
      "timestamp": "2026-05-06T20:30:00+00:00",
      "call_id": "iter-3",
      "model": "openrouter/anthropic/claude-haiku-4-5",
      "estimated_tokens": 205000,
      "max_input_tokens": 200000,
      "redaction_policy": [
        "tool.content",
        "assistant.tool_calls.function.arguments"
      ],
      "redacted": true,
      "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "...", "tool_calls": [
          {"id": "call_1", "type": "function",
           "function": {"name": "http_get", "arguments": "[redacted: 142 chars]"}}
        ]},
        {"role": "tool", "tool_call_id": "call_1", "content": "[redacted: 50000 chars]"}
      ]
    }

redaction_policy is the explicit honest field: it lists exactly which channels were redacted. Files are written at 0o600 (owner read/write only), with both os.open(..., O_CREAT, 0o600) for the no-race creation case AND a follow-up chmod(0o600) for the existing-file repair case.
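The redaction and the owner-only write can be sketched together. The helper names `redact_messages` and `write_private_json` are hypothetical; the placeholder format and the `os.open` plus `chmod` pairing follow the description above, while everything else is an assumption.

```python
import copy
import json
import os
from pathlib import Path
from typing import Any


def redact_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Redact tool-role bodies and assistant tool-call arguments on a deep copy."""
    redacted = copy.deepcopy(messages)
    for message in redacted:
        content = message.get("content")
        if message.get("role") == "tool" and isinstance(content, str):
            message["content"] = f"[redacted: {len(content)} chars]"
        for call in message.get("tool_calls") or []:
            function = call.get("function", {})
            arguments = function.get("arguments")
            if isinstance(arguments, str):
                # The call structure stays; only the argument string is replaced.
                function["arguments"] = f"[redacted: {len(arguments)} chars]"
    return redacted


def write_private_json(path: Path, payload: dict[str, Any]) -> None:
    """Write JSON readable and writable by the owner only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Creating with mode 0o600 avoids a window where the file is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    # Repair permissions if the file already existed with a looser mode.
    os.chmod(path, 0o600)
```

Assistant prose, user prompts, and system prompts pass through untouched in this sketch, matching the two channels listed in redaction_policy.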
What Layer 2 does NOT do
- It doesn’t redact assistant prose content, user prompts, or system prompts. A model that echoes a tool response into its assistant content field will still leak the secret. Regex sweeps over assistant + user content are roadmap work.
- It doesn’t load checkpoints back as resume payloads. By design.
- It doesn’t compress assistant tool_calls themselves (only their arguments are redacted, the call structure stays).
Layer 3 — repo_map symbol-aware code digest
Module: modulatio/repo_map.py
What it bounds: what the team can see about an existing code base without reading every file.
When the planner decomposes a sub-objective for a code repo, the producers need to know the existing code shape — what classes exist, what their methods are, what types they accept. Without Layer 3 the team would get a filename listing and have to grep + read files to learn shape. Layer 3 replaces that with a stdlib-ast-extracted symbol digest: classes, methods, signatures, module docstrings, top-level functions.
Coverage and limits
Modulatio’s repo_map is Python-only. JavaScript / TypeScript / Rust / Go projects fall back to a filename listing. modulatio doctor surfaces this calibration on first contact. Multi-language symbol awareness is on the long-horizon Roadmap.
What goes into the digest
- Module docstrings — the top-of-file context for what each module does.
- Top-level functions — name + signature + docstring.
- Classes — name + docstring + every method’s name and signature.
- Module imports — surfaces the dependency graph the team has to be coherent with.
Bodies are deliberately not included. The digest is for “what’s there”, not “how does X do Y” — which is the producer’s job to read when it actually needs to.
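A body-free digest of this kind can be built with the standard library's `ast` module. The sketch below is not the `repo_map` implementation: `digest_source` and the dict layout are hypothetical, and it handles a single source string rather than walking a repository.

```python
import ast
from typing import Any, Union

FunctionNode = Union[ast.FunctionDef, ast.AsyncFunctionDef]


def _signature(node: FunctionNode) -> str:
    # ast.unparse renders the argument list back to source text (Python 3.9+).
    return f"{node.name}({ast.unparse(node.args)})"


def digest_source(source: str) -> dict[str, Any]:
    """Collect docstrings, imports, function signatures, and class outlines."""
    tree = ast.parse(source)
    digest: dict[str, Any] = {
        "docstring": ast.get_docstring(tree),
        "imports": [],
        "functions": [],
        "classes": [],
    }
    for node in tree.body:  # top-level statements only; bodies are never copied
        if isinstance(node, ast.Import):
            digest["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            digest["imports"].append(node.module or ".")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            digest["functions"].append(
                {"signature": _signature(node), "docstring": ast.get_docstring(node)}
            )
        elif isinstance(node, ast.ClassDef):
            methods = [
                _signature(item)
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            digest["classes"].append(
                {"name": node.name, "docstring": ast.get_docstring(node), "methods": methods}
            )
    return digest
```

Because only names, signatures, and docstrings are kept, the digest size tracks the number of symbols rather than the number of lines in the repository.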
Layer 4 — team_state continuity
Module: modulatio/team_state.py
What it bounds: what the team carries across sub-objectives.
Plan execution runs sub-objectives in sequence. Without Layer 4, each sub-objective starts fresh — the team sees the next prompt but has no narrative connection to what just happened. Layer 4 maintains a small, structured state document at <run>/current_state.md that captures: what was just shipped, what the producer claimed in their summary_for_state_doc trailer, what QC verified, and any divergence flags Leader caught between claim and verdict.
Three write paths
- Producer self-claim. Producers tag their final output with a ## summary_for_state_doc trailer; the orchestrator extracts it before artifact-cleanup so it doesn’t leak into the artifact body. The claim says “what I just shipped, in one paragraph.”
- Producer/QC prompt prepend. Both Producer and QC see the prior current_state.md body prepended to their prompt under a ## Team State header.
- Leader-reflect Verify-phase write-back. Between sub-objectives, Leader-reflect reads the prior state + producer claims + QC verdicts, emits the next state doc, and flags divergences (places where producer claim and QC verdict disagree). Divergence notes append to <run>/audit.jsonl.
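The first write path, splitting the trailer off the producer output, might look like the following. This is a sketch: `split_state_claim` is a hypothetical name, and it assumes the trailer heading sits on its own line and runs to the end of the output.

```python
import re
from typing import Optional

# Assumption: the trailer starts at a line reading "## summary_for_state_doc"
# and everything after that heading is the claim.
_TRAILER = re.compile(
    r"^## summary_for_state_doc[ \t]*\n(?P<claim>.*)\Z",
    re.MULTILINE | re.DOTALL,
)


def split_state_claim(output: str) -> tuple[str, Optional[str]]:
    """Return (artifact body without the trailer, claim text or None)."""
    match = _TRAILER.search(output)
    if match is None:
        return output, None
    body = output[: match.start()].rstrip()
    return body, match.group("claim").strip()
```

For example, an output ending in a `## summary_for_state_doc` heading followed by "Shipped the parser." yields the artifact body without the heading and the claim string "Shipped the parser.".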
FIFO + soft-cap
The state document has a soft cap (~2KB rendered) to keep it prompt-appropriate. Older entries FIFO out, but unlike Layer 1’s prune, this is between-sub-objective state, not within-call. The state doc is the team’s running short-term memory; older sub-objective summaries roll off as the plan progresses.
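The roll-off can be pictured as dropping the oldest entries until the rendered document fits. A minimal sketch, assuming entries are plain strings joined by blank lines and the cap is measured in UTF-8 bytes; `apply_soft_cap` is a hypothetical name and the real rendering may differ.

```python
def apply_soft_cap(entries: list[str], cap_bytes: int = 2048) -> list[str]:
    """Drop oldest entries first until the joined document fits the cap."""
    kept = list(entries)  # work on a copy; entries are ordered oldest to newest
    while len(kept) > 1 and len("\n\n".join(kept).encode("utf-8")) > cap_bytes:
        kept.pop(0)  # first in, first out
    # The cap is soft: a single oversized newest entry is kept rather than dropped.
    return kept
```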
What Layer 4 does NOT do (yet)
- It doesn’t carry persona / identity context. If your team has a recurring character or named voice, that context drifts across long runs. See the Roadmap: persona continuity is upcoming work.
- It doesn’t survive plan boundaries. A new plan starts with an empty current_state.md; team_memory carries cross-plan facts but team_state is plan-scoped.
Layer 5 — terse-prose convention across templates
What it bounds: the prompt templates that drive every agent on every turn.
The agent prompts ship as templates with pinned instruction contracts. Earlier versions of those templates carried a lot of prose overhead — long recap sections, redundant axis explanations, pre-canned framing. Layer 5 is a cross-cutting compression pass across the templates that keeps load-bearing contracts verbatim while trimming prose around them.
The honest pattern: no single compression target fit every template. Templates with high contract-content (verbatim JSON shapes, axis lists, severity ladders) compressed less; the discipline was “preserve every load-bearing rule, drop only prose around it.”
Why it matters
Smaller templates mean more headroom for the actual conversation context: the prior turns, the team_state, the team_memory pull. Layer 5 is the multiplier on the other layers: every byte saved in the template is a byte the conversation gets back.
What Layer 5 does NOT do
- It doesn’t auto-tune templates over time. If a template grows back through future edits, the budget is gone. Drift-gate tests pin the post-Layer-5 sizes so edits that grow the templates fail review.
- It doesn’t compress runtime prompt slots (the {team_memory_context}, {team_state}, etc. injections). Those are user-content, not template-content.
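A drift gate of the kind mentioned above could be as small as a size comparison against pinned limits. This is only a sketch: `check_template_sizes`, the byte-based measure, and the template names in the usage note are assumptions, since the text does not say how the real tests measure size.

```python
def check_template_sizes(
    templates: dict[str, str], pinned_max_bytes: dict[str, int]
) -> list[str]:
    """Return one violation message per template that outgrew its pinned size."""
    violations: list[str] = []
    for name, text in templates.items():
        size = len(text.encode("utf-8"))
        limit = pinned_max_bytes.get(name)
        if limit is None:
            # An unpinned template is flagged too, so new templates get a budget.
            violations.append(f"{name}: no pinned size")
        elif size > limit:
            violations.append(f"{name}: {size} bytes exceeds pinned {limit}")
    return violations
```

A test would then assert that `check_template_sizes(...)` returns an empty list, for instance with a hypothetical `{"producer": 1200}` pin; raising a pin becomes a deliberate, reviewable change.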
How the layers compose
A single agent call passes through multiple layers in sequence:
- Layer 5: the template renders with its (terse) instruction contract.
- Runtime slot fills: {team_memory_context}, {team_state} (Layer 4), {repo_map} (Layer 3) are interpolated into the prompt.
- Layer 2 preflight: check_and_compress evaluates the assembled prompt against the model’s window. Decisions: pass-through, soft-warn, compress, or refuse-with-checkpoint.
- Tool loop (when applicable): model dispatches tools.
- Layer 1: long tool results get summarized + persisted.
- Layer 2 again: every iteration of the tool loop hits the gate; compression and refuse are both reachable mid-loop.
- Producer self-claim trailer + Layer 4 write-back: when the call returns, the producer’s summary_for_state_doc gets extracted and Leader-reflect’s next turn updates current_state.md.
A user reading the conversation in <run>/transcripts/ sees only the surface — the actual shape under the hood is this five-layer sandwich.
Disabling layers
Each layer is independently togglable for debugging or backwards-compatibility:
- Layer 1: set ToolSummarizationConfig.enabled = False (or simply don’t bind a config). Tool-result summarization + prune go away; verbatim payloads accumulate.
- Layer 2: set ContextBudgetConfig.enabled = False, or bind no config. Soft-warn / compress / checkpoint all skip; the runner behaves as if no budget gate were present.
- Layer 3: disabled per-task by simply not declaring repo_map in the producer’s skill loadout. The team falls back to the filename listing.
- Layer 4: inactive automatically when no current_state.md exists yet (first sub-objective) or when the team_state path isn’t writable.
- Layer 5: not really “disable-able”; the templates are what they are. Drift-gate tests pin the sizes.
In production you want all five active; the toggles exist for testing the failure modes individually.
Cross-references
- The Layer 2 catch route lives in Orchestrator._block_for_context_budget; see Audit trails for how the BLOCKED transition + ticket land in the run record.
- The Layer 1 + Layer 2 binding sites are in Orchestrator.kickoff, project_execution.start_execution, and (for direct TUI kickoffs) tui.app._build_kickoff_orchestrator. See Skill system for how skills declare their tool loadouts and how the registry sees read_tool_result and the other Layer-1-recovery primitives.