Recording the Parse, Re-deriving the Plan

Kendall Clark · Pentad Labs · 6 June 2026 · PLRN-009

Abstract

In WunderOS the planner is a deterministic HTN engine, off the hot path, that decomposes a goal into a plan and replays byte-for-byte from a seed. We nonetheless want a neural model in the loop, because the natural way to author plans is incremental and intent-driven: the next goal is formed after the previous result is seen, not fixed whole at the start. The model’s role is to parse intent into a goal the engine decomposes, and, where no method yet covers the goal, to author a plan fragment as a fallback. Both are nondeterministic; the planner is not. This appears to conflict with a deterministic-replay executor that pins a plan version for the life of an invocation, for the simple reason that an incrementally authored plan has no fixed shape when the invocation begins. The conflict dissolves when the model’s contribution is modeled as a recorded actuator.

Each parse or authored fragment is a write-ahead-logged completion. Replay reads it from the log and never calls the model again, and the deterministic decomposition is re-derived rather than stored. The plan materializes over time as its own trace. Two further commitments keep the construction safe. First, the authoring discipline is bounded-epoch: each epoch is a closed sub-plan that certifies under the pinning guarantee unchanged, with re-planning only at explicit boundaries. Second, the model’s effect is recognized as an effect on the plan rather than on the world, so that it carries no compensation and is reversed by truncating the recorded frontier rather than by running an inverse action. The construction sits inside Brass Loom, the single side-effect executor of WunderOS, and inherits the deterministic-replay fingerprint described in PLRN-002.

1. The problem

A plan in WunderOS is a structure the executor runs: a set of steps with ordering edges, each step a dispatch to an actuator that changes state, calls a tool, sends a message, or sets a timer. The executor records every step against a write-ahead log before it acts, so that a crashed or migrated plan resumes from the log and a completed plan replays byte-for-byte from a seed. The replay guarantee matters rather a lot, allowing the system to promise audit- and compliance-sensitive tenants that a days- or weeks-long plan can be reproduced exactly from its record. The executor pins a plan when its invocation starts. The pin fixes the planner version and the decision sequence for the life of the invocation, so that a plan in flight for days does not silently upgrade underneath itself. The guarantee is stated over a plan tree that exists at the moment the invocation begins.

A word on what does the planning. WunderOS does not make a neural model the planner. The planner is, rather, a deterministic HTN engine that mines methods from past traces and decomposes a goal into a plan, with no neural model in its inner loop. The premise of the system is that neural models plan badly and parse well, so the model grounds intent into a goal and a planning engine does the planning. This is the LLM-Modulo arrangement: a sound engine reasons, the model translates. Where the method library does not yet cover a goal, the model authors a plan fragment directly, as a bootstrap that recedes as methods are mined. The model is therefore the planner’s front-end and occasional stand-in, not the planner.

The difficulty is that the most important source of intent does not arrive all at once. A neural model working a real task proposes a goal, sees what executing toward it returns, and forms the next goal in light of the result. The sequence of goals is not present at the start. It comes into being one epoch at a time. Read literally, the pinning guarantee assumes a plan that the incremental process cannot supply, because there is no whole tree to pin when the invocation begins.

The question before us, then, is whether incremental authoring by a nondeterministic model can be admitted into a deterministic-replay executor without weakening the guarantee that makes the executor worth having.

It can, and the rest of this note is that construction.

2. The executor and its replay contract

In WunderOS, Brass Loom is the single side-effect executor. Everything that changes the world outside the agent passes through it as a recorded plan-step. The recording discipline has two layers. The native substrate records every activity and chains the records into a signed fingerprint, so that the same seed yields the same fingerprint; this is the composition described in PLRN-002. On top of that, the executor labels named checkpoints and owns the contract by which a plan suspends for human intervention and later resumes.
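What "chaining the records into a signed fingerprint" buys can be shown in a few lines. The sketch below is a minimal illustration under stated assumptions, not the PLRN-002 composition: it uses an HMAC-SHA-256 over canonical JSON as a stand-in for the signature, and the key, the record shape, and the name `chain_fingerprint` are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical key standing in for the real signing scheme of PLRN-002.
SIGNING_KEY = b"hypothetical-tenant-key"


def chain_fingerprint(records, key=SIGNING_KEY):
    """Fold activity records, in log order, into one keyed fingerprint.

    Each link is an HMAC over the previous link plus the canonical JSON of
    the next record, so changing any record, or their order, changes the result.
    """
    link = bytes(32)  # fixed genesis link
    for record in records:
        body = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
        link = hmac.new(key, link + body, hashlib.sha256).digest()
    return link.hex()


run = [
    {"step": 1, "kind": "tool", "reply": "200 OK"},
    {"step": 2, "kind": "timer", "reply": "armed"},
]
# Same records in the same order give the same fingerprint; reordering does not.
assert chain_fingerprint(run) == chain_fingerprint([dict(r) for r in run])
assert chain_fingerprint(run) != chain_fingerprint(list(reversed(run)))
```

Canonical JSON (sorted keys, fixed separators) matters here: without it, two semantically equal records could serialize differently and break same-seed-same-fingerprint for reasons unrelated to the run.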

The contract that matters here is the one the executor already keeps for any external call. A tool call leaves the executor, reaches a service whose reply the executor does not control, and returns a result. The reply is nondeterministic in the sense that the executor cannot compute it. The executor handles this by recording the reply at the boundary. On replay it does not call the service again. It reads the recorded reply. The nondeterminism is captured once, at the dispatch boundary, and never re-enters the system. This is the standard move of deterministic-simulation testing, and Brass Loom already makes it for every tool call.

A neural model of any sort is a service of exactly this kind. Its reply may be nondeterministic, and in any case the executor cannot compute it. The model differs from an ordinary tool in what its reply is for, not in how its reply is obtained. That difference is the subject of section 4. The similarity is the foundation: a parse or an authored fragment is recorded at the boundary like any other completion, and replay reads it.
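The record-at-the-boundary move is small enough to sketch. The following is a hypothetical illustration, not Brass Loom code: `RecordingDispatcher` is an invented name, the log is an in-memory list, and a real write-ahead log would also persist the intent before dispatch and sync each entry to disk. The same wrapper serves a tool call or a model call, which is the point of the section.

```python
import random
from typing import Callable, List, Optional


class RecordingDispatcher:
    """Records each external reply at the dispatch boundary; replays from the record.

    Hypothetical sketch: intent-before-dispatch logging and fsync are omitted.
    """

    def __init__(self, recorded: Optional[List[str]] = None):
        self.replaying = recorded is not None
        self.log: List[str] = list(recorded) if recorded is not None else []
        self.cursor = 0

    def call(self, service: Callable[[str], str], request: str) -> str:
        if self.replaying:
            # Replay: the service is never contacted; read the next recorded reply.
            reply = self.log[self.cursor]
            self.cursor += 1
            return reply
        reply = service(request)  # live call; the executor cannot compute this
        self.log.append(reply)    # nondeterminism captured once, here
        return reply


def flaky_service(request: str) -> str:
    # Stands in for a tool or a model whose reply varies between calls.
    return f"{request}:{random.random():.6f}"


def must_not_run(request: str) -> str:
    raise AssertionError("service called during replay")


live = RecordingDispatcher()
first = [live.call(flaky_service, "quote") for _ in range(3)]

replay = RecordingDispatcher(recorded=live.log)
second = [replay.call(must_not_run, "quote") for _ in range(3)]
assert first == second
```

Passing a service that raises on replay is a cheap way to make "the model is never called on replay" a checked property rather than a convention.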

3. The authoring discipline

Three authoring disciplines are available, and the choice among them is our first design question.

The first is pure step-by-step authoring, in which the model itself writes one step, the step runs, and the model writes the next. This is the model planning directly, and WunderOS uses it only as the gap-path fallback below. It is the most flexible and gives the weakest guarantee. The plan is known only after each step, so neither a human nor a policy engine can certify the whole run before it starts, and the surface that must be audited grows with every step. The recent literature reports it as the costliest in tokens and the lowest in control-flow integrity (Architecting Resilient LLM Agents, 2025).

The second is plan-and-execute, in which the model emits a complete plan before any side-effect runs and a separate executor runs it. The whole plan can be certified before anything happens, and replay is straightforward because the plan is fixed. The planning cost is paid up front, in latency and tokens, and the plan is brittle to a world that does not match the model’s prediction.

The third is bounded-epoch re-planning, in which the model parses a goal, the engine decomposes it into a closed sub-plan, the executor runs it, and the model is consulted again only at an explicit epoch boundary. Neither too hot, nor too cold, but just right. This is the discipline WunderOS adopts as its default. Its key property is the one the other two miss. A bounded epoch is a closed plan, so it has a tree at the moment it starts, and therefore it certifies under the pinning guarantee without any change to that guarantee. The audit surface is the epoch boundary rather than the individual step. Recovery does not require re-consulting the model for every step, because the epoch’s steps are recorded. Token cost falls between the other two. The literature on full-horizon planning with lazy re-planning reports it at parity in accuracy with step-by-step authoring at a fraction of the token cost (Do Agents Need to Plan Step-by-Step?, 2026), and the execution-lineage work reports exact reproduction of a recorded directed acyclic plan where a loop reproduces nothing (From Agent Loops to Deterministic Graphs, 2026).
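The shape of the bounded-epoch discipline can be sketched as a driver loop. This is a minimal sketch under assumptions: `run_bounded_epochs`, its callable parameters, and the toy goals are hypothetical, and recording, admission, and compensation are left out so that only the control flow shows. The model is called once per boundary, and each sub-plan is certified whole before any of its steps runs.

```python
from typing import Callable, List, Optional

Plan = List[str]


def run_bounded_epochs(
    parse: Callable[[Optional[str]], Optional[str]],
    decompose: Callable[[str], Plan],
    certify: Callable[[Plan], bool],
    execute: Callable[[str], str],
    max_epochs: int = 10,
) -> List[Plan]:
    """Hypothetical bounded-epoch driver.

    parse:     the model; maps the last result to the next goal, or None when done.
    decompose: the deterministic engine; goal -> closed sub-plan.
    certify:   the policy engine; judges a whole epoch before any step runs.
    execute:   actuator dispatch for one step.
    """
    epochs: List[Plan] = []
    last_result: Optional[str] = None
    for _ in range(max_epochs):
        goal = parse(last_result)  # the model is consulted only at a boundary
        if goal is None:
            break
        sub_plan = decompose(goal)  # a closed plan: its whole tree exists now
        if not certify(sub_plan):
            raise PermissionError(f"epoch for goal {goal!r} not certified")
        for step in sub_plan:  # no model calls inside the epoch
            last_result = execute(step)
        epochs.append(sub_plan)
    return epochs


model_calls = []
goals = iter(["fetch", "summarize"])


def parse(last_result):
    model_calls.append(last_result)
    return next(goals, None)


epochs = run_bounded_epochs(
    parse,
    decompose=lambda g: [f"{g}.prepare", f"{g}.act"],
    certify=lambda plan: len(plan) <= 5,
    execute=lambda step: f"{step}:done",
)
assert epochs == [["fetch.prepare", "fetch.act"], ["summarize.prepare", "summarize.act"]]
assert model_calls == [None, "fetch.act:done", "summarize.act:done"]
```

Setting the epoch to the whole task collapses this loop to plan-and-execute; making every sub-plan one step long collapses it toward step-by-step authoring. The discipline is the range between.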

Bounded-epoch authoring is, therefore, not a compromise between flexibility and control. It is the only one of the three planning modalities that satisfies controllability, auditability, and deterministic replay together. Pure step-by-step authoring by the model remains available for a sub-task too dynamic for the engine to close, embedded inside an epoch, but it is the degraded gap-path: not the default, and it does not set the certification boundary.

The epoch boundary is also a certification boundary, not a human-approval gate. An epoch is admitted by the policy engine without a human in the loop (HITL). The boundary is the place where a human approval may be inserted when a tenant policy demands it. The discipline is human-optional, and the option is free because the boundary already exists for certification.

4. The model’s contribution is a recorded actuator

In WunderOS the model’s contribution to a plan, a parse or in the fallback a gap fragment, is dispatched as an actuator, on the same uniform shape as every other dispatch, and its reply is recorded at the boundary like every other completion. What distinguishes it is not its mechanics but its effect, and the distinction is what the rest of this section works out.

4.1 Authoring is an effect on the plan, not on the world

The four ordinary actuator classes act on the world. They write state, call a tool, send a message, set a timer. Each can in principle be undone by a compensating action, and the executor carries a compensation discipline for exactly that: a step may register an inverse that the executor runs if a later step fails. Agents love sagas, or they’ll learn to at any rate.

An authoring step acts on the plan, not on the world. Its effect is to append steps to the plan’s open frontier, whether those steps come from the engine’s decomposition of a parsed goal or from a gap fragment. Nothing outside the agent changes when it runs. This has a precise consequence for compensation. There is no inverse action to register, because there was no appreciable world effect, so an authoring step carries no compensation hook. The world-effect steps that the authoring step produced are ordinary steps and compensate in the ordinary way. The authoring step itself is reversed by removing what it appended, which is to say, by truncating the recorded frontier back to the point before the authoring completion. The substrate for that truncation already exists in WunderOS as the log-rollback machinery used elsewhere.

Reversing an authored region is therefore two operations in sequence: first, compensate the world-effect steps in the region in reverse order, reading the trace to know what they were, then, second, truncate the frontier. Truncation is the last act, not a compensation entry, because the trace is needed until the compensations are done.
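The two-operation reversal can be sketched over a toy trace. The sketch assumes a flat in-memory list standing in for the recorded log; `Entry`, `Trace`, and `reverse_authored_region` are hypothetical names, and the real substrate is the log-rollback machinery the note mentions. Note that the authoring entry has no compensation hook, and truncation happens only after every compensation has run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Entry:
    kind: str  # "authoring" (effect on the plan) or "world" (effect on the world)
    name: str
    compensate: Optional[Callable[[], None]] = None  # only world steps register one


@dataclass
class Trace:
    entries: List[Entry] = field(default_factory=list)


def reverse_authored_region(trace: Trace, authoring_index: int) -> None:
    """Undo the region that begins at the authoring completion at authoring_index."""
    region = trace.entries[authoring_index:]
    if not region or region[0].kind != "authoring":
        raise ValueError("region must start at an authoring completion")
    # First: compensate world-effect steps in reverse order, reading the trace.
    for entry in reversed(region):
        if entry.kind == "world" and entry.compensate is not None:
            entry.compensate()
    # Second, and last: truncate. The authoring step itself has no inverse to run.
    del trace.entries[authoring_index:]


undone = []
trace = Trace([
    Entry("world", "earlier-write", lambda: undone.append("undo earlier-write")),
    Entry("authoring", "parse#1"),
    Entry("world", "send-message", lambda: undone.append("retract message")),
    Entry("world", "write-state", lambda: undone.append("restore state")),
])
reverse_authored_region(trace, 1)
assert undone == ["restore state", "retract message"]
assert [e.name for e in trace.entries] == ["earlier-write"]
```

Steps recorded before the authoring completion are left alone: they belong to an earlier region and are neither compensated nor truncated.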

Admission follows the same shape and is recursive without being circular. The policy engine admits the authoring call as it admits any dispatch, against the tenant, the isolation tier, and the contract it may emit within. Each step then appended to the frontier, whether decomposed by the engine or taken from a gap fragment, is admitted on its own, at the moment it is about to run. The admission of the authoring call does not pre-authorize the steps it produces. There is no special threading of the compensation chain, because the authoring step resides in no chain.
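Per-step admission, independent of the authoring call's own admission, can be sketched with one check applied uniformly. Everything here is hypothetical: `Dispatch`, `Policy`, `admit`, and `run_authored` are invented names, the policy is reduced to a tenant, a set of isolation tiers, and a set of permitted actions, and the choice to halt the epoch at the first rejected step is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Dispatch:
    tenant: str
    tier: str
    action: str


@dataclass(frozen=True)
class Policy:
    tenant: str
    tiers: FrozenSet[str]
    actions: FrozenSet[str]


def admit(d: Dispatch, p: Policy) -> bool:
    """Same check for every dispatch: tenant, isolation tier, permitted contract."""
    return d.tenant == p.tenant and d.tier in p.tiers and d.action in p.actions


def run_authored(
    authoring: Dispatch, produced: List[Dispatch], p: Policy
) -> List[Tuple[str, str]]:
    if not admit(authoring, p):
        raise PermissionError("authoring call not admitted")
    outcome = []
    for step in produced:
        # Admitting the authoring call did not pre-authorize this step;
        # each step is checked on its own, just before it would run.
        if not admit(step, p):
            outcome.append(("rejected", step.action))
            break  # in this sketch, a rejected step halts the epoch
        outcome.append(("ran", step.action))
    return outcome


policy = Policy("acme", frozenset({"sandbox"}), frozenset({"author", "send", "write"}))
steps = [
    Dispatch("acme", "sandbox", "write"),
    Dispatch("acme", "sandbox", "wire_funds"),
    Dispatch("acme", "sandbox", "send"),
]
result = run_authored(Dispatch("acme", "sandbox", "author"), steps, policy)
assert result == [("ran", "write"), ("rejected", "wire_funds")]
```

Because `admit` is the same function for the authoring call and for each produced step, there is nothing circular about the recursion: the authoring call is just one more dispatch.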

4.2 Recording the completion, not re-running the model

The durability of the authored plan rests on recording the model’s reply, not on the model’s reproducibility. This distinction is the one place where a tempting alternative is wrong for this system.

The alternative is to make the model itself deterministic: fix the sampling temperature to zero, record the prompt and the model version, and on replay run the same prompt again and check that the output matches. This works only as well as the model’s reproducibility holds, and it does not hold. A reply can vary across nominally identical calls through batching, through nondeterministic kernels, through a provider’s silent version change. Any scheme that reproduces a reply by re-running the model inherits every one of these as a way to diverge on replay.

Recording the completion avoids all of them. The reply is written to the log when it first arrives, and replay reads the written reply. The model is never called on replay, so the model’s reproducibility is irrelevant. The nondeterminism is captured once and converted into a fact in the log. An incrementally authored plan is, by this means, a fixed and replay-deterministic artifact. It materializes over time rather than in space. The plan is its trace, the trace is the durable plan, and authoring and execution are one act recorded in one log.

The division of labor sharpens what is recorded. It is the model’s output, not the engine’s plan. The HTN decomposition is deterministic, so it is re-derived rather than stored: the engine, run over the method-library version pinned for the invocation, produces the same sub-plan from the same recorded parse. Only the model’s own reply, the parse or a gap fragment, carries nondeterminism, so only it need be written down. This is why the pin fixes the method-library version. That version is the deterministic input the re-derivation reads, and pinning it is what makes same-seed-same-plan hold across replay.
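Why the pin must cover the method-library version is easy to see in a toy decomposition. This is a deliberately simplified sketch: a real HTN engine has preconditions, world state, and a choice among several methods per task, whereas here each compound task has exactly one method. `LIBRARIES`, `decompose`, and the task names are hypothetical.

```python
from typing import Dict, List, Tuple

MethodLibrary = Dict[str, Tuple[str, ...]]  # compound task -> ordered subtasks

# Hypothetical pinned versions of a (much simplified) mined method library.
LIBRARIES: Dict[str, MethodLibrary] = {
    "v7": {"ship_report": ("draft", "send"), "draft": ("fetch_data", "write_doc")},
    "v8": {"ship_report": ("draft", "review", "send"), "draft": ("fetch_data", "write_doc")},
}


def decompose(task: str, methods: MethodLibrary, depth: int = 0) -> List[str]:
    """Deterministic expansion: same task + same methods -> same plan."""
    if depth > 32:
        raise RecursionError(f"method cycle or excessive depth at {task!r}")
    if task not in methods:
        return [task]  # primitive task: dispatched as a step
    plan: List[str] = []
    for sub in methods[task]:
        plan.extend(decompose(sub, methods, depth + 1))
    return plan


# What the log holds: the pinned library version and the model's own reply.
log = {"method_library": "v7", "parse": "ship_report"}

original = decompose(log["parse"], LIBRARIES[log["method_library"]])
rederived = decompose(log["parse"], LIBRARIES[log["method_library"]])
assert original == rederived == ["fetch_data", "write_doc", "send"]
# Without the pin, a newer library would re-derive a different plan.
assert decompose(log["parse"], LIBRARIES["v8"]) != original
```

Only two values are written down, the library version and the parse; the plan itself is never stored, and a replay against an unpinned, newer library would silently produce a different plan.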

This is event-sourcing applied to the act of planning, and it has direct precedent in the recent literature on event-sourced agents (ESAA, 2026), which records each model decision as an envelope in an append-only store and replays from the store.

4.3 What replay reads

A replay of an incrementally-authored plan reads, in order:

the recorded admission decisions,
the recorded model replies (each a parse, or in the fallback a gap fragment), and
the recorded world-effect completions.

The frontier is not read back wholesale. It is rebuilt: for each recorded parse, the engine re-derives the same decomposition over the pinned method library; a recorded gap fragment is appended as written. The model is never consulted, and the engine is deterministic, so the same recorded replies plus the same pinned methods give the same frontier. Ordering is the recorded order of the single-writer log, which is replay-stable by the fingerprint composition of PLRN-002. There is no point at which the model’s nondeterminism re-enters, because the model is not on the replay path at all, and the only thing read from the log that the model produced is its already-captured reply.
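The rebuild can be sketched as a fold over the recorded model replies. The record encoding, a `("parse", goal)` or `("gap", [steps])` pair, is an assumption of this sketch, as are the names `expand` and `rebuild_frontier`; the decomposition is the same one-method-per-task simplification of an HTN engine.

```python
from typing import Dict, List, Tuple, Union

MethodLibrary = Dict[str, Tuple[str, ...]]
Record = Tuple[str, Union[str, List[str]]]  # ("parse", goal) or ("gap", [steps])


def expand(task: str, methods: MethodLibrary) -> List[str]:
    """Simplified deterministic decomposition (one method per compound task)."""
    if task not in methods:
        return [task]
    return [step for sub in methods[task] for step in expand(sub, methods)]


def rebuild_frontier(records: List[Record], methods: MethodLibrary) -> List[str]:
    """Rebuild the frontier from recorded model replies, in log order; no model call."""
    frontier: List[str] = []
    for kind, payload in records:
        if kind == "parse":
            frontier.extend(expand(payload, methods))  # re-derived, not read back
        elif kind == "gap":
            frontier.extend(payload)  # gap fragment appended as written
        else:
            raise ValueError(f"unknown record kind {kind!r}")
    return frontier


pinned = {"refund": ("lookup_order", "issue_credit")}
records = [("parse", "refund"), ("gap", ["email_customer"]), ("parse", "refund")]
assert rebuild_frontier(records, pinned) == rebuild_frontier(records, pinned) == [
    "lookup_order", "issue_credit", "email_customer", "lookup_order", "issue_credit",
]
```

The function takes no model handle at all, which is the structural form of "the model is not on the replay path".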

4.4 Failure modes

The literature on recorded execution flags a small set of failure modes, and each maps to a specific commitment in our construction.

Nondeterminism leaking past the boundary is the first. It is fatal to the re-run-and-compare scheme of section 4.2 and harmless to the record-the-completion scheme, because the latter never re-runs the model. Recording the completion is the design’s answer to this mode, and it is the reason we rejected the alternative.

Loss of in-flight authoring state on a crash is the second. A bounded-epoch plan suspends across the model round-trip, sometimes for an arbitrarily long time, and the suspended state must be durable. The transactional-frontier work makes the same point about in-memory frontier state and warns that it must be persisted (Atomix, 2026). In WunderOS the frontier is durable by the same pin that the worker re-reads on resume, and the suspended plan rides the hibernation path rather than an inline driver that would discard the pin.
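The durability requirement, that the pin and the open frontier survive a crash during the model round-trip, can be illustrated with a file-based stand-in. This is not the WunderOS hibernation path: the JSON file, the state layout, and the names `suspend` and `resume` are hypothetical. It shows the familiar write-to-temporary, fsync, atomic-rename pattern; full durability of the rename itself would also need a sync of the containing directory, omitted here.

```python
import json
import os
import tempfile


def suspend(path: str, pin: dict, frontier: list) -> None:
    """Persist the pin and open frontier before waiting on a model round-trip.

    Writes to a temporary file in the same directory, syncs it, then atomically
    renames it over the target, so a reader sees the old state or the new one,
    never a torn file. (A directory fsync, omitted, would make the rename durable.)
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"pin": pin, "frontier": frontier}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def resume(path: str):
    """A fresh worker re-reads the same pin and frontier."""
    with open(path) as f:
        state = json.load(f)
    return state["pin"], state["frontier"]


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "plan-42.json")
    suspend(path, {"planner": "v7", "method_library": "v7"}, ["issue_credit"])
    pin, frontier = resume(path)
    assert pin == {"planner": "v7", "method_library": "v7"}
    assert frontier == ["issue_credit"]
```

Persisting the pin together with the frontier, in one atomic write, is what keeps a resumed worker from re-deriving the frontier against a different method library than the one it was built under.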

Drift between an emitted action and the schema that admits it is the third. The canonical-authorization work reports that a canonical schema must be kept in step across authors or it produces false rejections (Faramesh, 2026). This is a live caution for the per-step admission gate and for the certification envelope discussed below, and it is the reason the envelope’s shape is deferred rather than fixed hastily.

A recorded identity that captures a nondeterministic input is the fourth. Execution-lineage schemes that identify a step by a hash of its inputs diverge silently when an input is nondeterministic (From Agent Loops to Deterministic Graphs, 2026). The boundary discipline answers this, too. The recorded completion is the input on replay, so the identity is stable.
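The fourth failure mode and its answer fit in one example. The identity scheme below (SHA-256 over canonical JSON of a step name, its inputs, and its ordered upstream identities) is an assumption of this sketch, loosely modeled on the cited execution-lineage idea rather than taken from it; `step_identity` and the sample replies are hypothetical.

```python
import hashlib
import json
from typing import Iterable


def step_identity(name: str, inputs: dict, upstream: Iterable[str]) -> str:
    """Identify a step by a hash of its inputs and ordered upstream identities."""
    blob = json.dumps(
        {"name": name, "inputs": inputs, "upstream": list(upstream)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(blob.encode()).hexdigest()


root = step_identity("ingest", {"ticket": 17}, [])

# Failure mode: hashing a freshly sampled model reply. Two calls with the
# same prompt may return different text, so the identity silently diverges.
sampled_a, sampled_b = "refund order 17", "issue refund for order 17"
assert step_identity("parse", {"reply": sampled_a}, [root]) != \
    step_identity("parse", {"reply": sampled_b}, [root])

# Boundary discipline: the input is the recorded completion, read from the log
# on both the original run and the replay, so the identity is stable.
recorded = {"reply": sampled_a}
original_run = step_identity("parse", recorded, [root])
replay_run = step_identity("parse", dict(recorded), [root])
assert original_run == replay_run
```

Because the parse's identity feeds every downstream identity through `upstream`, a divergence at the model boundary would propagate to the whole remainder of the plan; fixing the input at the boundary fixes the chain.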

The last is the cost of the log itself. Recording every authored fragment grows the log, and a large fragment can dominate the log’s write bandwidth. This is a sizing concern rather than a correctness one, and it bounds how large an epoch should be.

4.5 Certification without a pre-materialized plan

The hardest question is certification, and the bounded-epoch discipline is what makes the answer tractable. A pre-materialized plan can be certified whole before it runs. An incrementally authored plan cannot, for the obvious reason that its steps do not yet exist. The discipline reconciles the two. Because each epoch is a closed plan, each epoch is certified whole under the pinning guarantee unchanged. The thing that materializes step-by-step across epochs is governed not by certifying a tree that does not exist but by pinning the planner’s version across epoch boundaries and admitting each step as it runs.

The fully general case, in which authoring is unbounded and no closed sub-plan ever exists, would require reinterpreting the certification guarantee itself, from certifying a plan to certifying the planner’s policy version and the contract envelope it may emit within, with per-step admission as the live gate.

That reinterpretation has a named precedent in the canonical-authorization control plane (Faramesh, 2026), and WunderOS defers rather than adopts it. The bounded-epoch discipline does not need it. The general case is taken up only when a task needs authoring finer than an epoch, and at that point the envelope’s shape is the thing to settle.

The two paths of the division certify differently, and this is where the distinction earns its keep. An epoch the engine decomposed is certifiable against its mined methods: the method library is the audit object, and a decomposition is checkable against it. An epoch authored on the gap-path has no method to check against, so it is exactly the case that leans on per-step admission and, in the unbounded limit, on the deferred envelope. A deployment that needs strong certification therefore wants method coverage, and the gap-path is the bootstrap that earns that coverage over time.
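The difference between the two paths can be sketched as a single certification check. Assumptions: an epoch carries its source, goal, and steps; "checkable against the method library" is reduced to re-deriving the decomposition from the pinned methods and comparing; and `Epoch`, `expand`, and `certification_mode` are hypothetical names. A real check would also cover preconditions and method choice.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MethodLibrary = Dict[str, Tuple[str, ...]]


@dataclass(frozen=True)
class Epoch:
    source: str  # "engine" or "gap"
    goal: str
    steps: Tuple[str, ...]


def expand(task: str, methods: MethodLibrary) -> List[str]:
    """Simplified deterministic decomposition (one method per compound task)."""
    if task not in methods:
        return [task]
    return [s for sub in methods[task] for s in expand(sub, methods)]


def certification_mode(epoch: Epoch, methods: MethodLibrary) -> str:
    if epoch.source == "engine":
        # The method library is the audit object: re-derive and compare.
        if expand(epoch.goal, methods) != list(epoch.steps):
            raise ValueError(f"epoch for {epoch.goal!r} does not match pinned methods")
        return "certified against methods"
    # Gap-path epoch: no method to check against, so only per-step admission applies.
    return "per-step admission only"


pinned = {"refund": ("lookup_order", "issue_credit")}
engine_epoch = Epoch("engine", "refund", ("lookup_order", "issue_credit"))
gap_epoch = Epoch("gap", "escalate", ("page_oncall",))
assert certification_mode(engine_epoch, pinned) == "certified against methods"
assert certification_mode(gap_epoch, pinned) == "per-step admission only"
```

As the method library grows, goals that once took the gap-path become engine epochs, and the second branch is exercised less often.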

5. What this does not give

The model front-end is not built. What is built is the executor, its write-ahead log, the deterministic-replay fingerprint of PLRN-002, and the rollback substrate that section 4.1 reuses; the HTN engine is specified separately and is deterministic by construction. The claims here are architectural commitments about how the model front-end will sit on that substrate, not measurements of a running system. The note is a design, and the design is not implemented yet.

The construction does not make the model deterministic. It makes the recorded plan deterministic. Two production runs of the same task will author different plans, because the model is free to author differently, and both runs are individually reproducible from their own logs. The guarantee is reproducibility of a recorded run, not equality across runs.

The construction does not certify an unbounded authored plan ahead of its execution. It certifies each closed epoch ahead of that epoch. A tenant that requires the entire course of action approved before anything runs is served by an epoch of size equal to the whole task, which is plan-and-execute, at plan-and-execute’s cost. The discipline spans that case and the incremental case under one mechanism, and the choice of epoch size is where the trade between foresight and flexibility is made.

The numbers cited from the 2026 literature are recent and not yet corroborated by a body of replication. They are offered here as directional evidence that independent groups are converging on the same shapes, namely event-sourced recording, transactional frontiers, and canonical authorization, and not as measurements to be relied upon.

The authoring-discipline tradeoff draws on the ReAct lineage of interleaved reasoning and acting and on the plan-and-execute lineage that emits a plan before acting; the contemporary statements used here are Architecting Resilient LLM Agents (Rosario et al., 2025, arxiv:2509.08646) and Do Agents Need to Plan Step-by-Step? (Otani et al., 2026, arxiv:2605.08477).

The decision to keep planning in a deterministic engine and confine the model to parsing rests on the evidence that language models plan poorly on their own. PlanBench documents the gap (Valmeekam et al., 2022, arxiv:2206.10498), and the LLM-Modulo framing draws the consequence: pair the model with a sound external reasoner, and let the model generate and translate while the reasoner verifies and plans (Kambhampati et al., 2024, arxiv:2402.01817). WunderOS’s HTN engine is that reasoner; the model is the translator.

The recorded-execution construction has four points of contemporary precedent. Event-sourced agents record each model decision as an envelope in an append-only store and replay from it (ESAA, Filho, 2026, arxiv:2602.23193). Transactional tool use wraps each call in an epoch over an advancing per-resource frontier and warns that the frontier must be persisted against a crash (Atomix, Mohammadi et al., 2026, arxiv:2602.14849). A canonical-authorization control plane canonicalizes and hashes a proposed action, decides it by a pure function, and records it in a hash-chained ledger that replays without the model (Faramesh, Fatmi, 2026, arxiv:2601.17744). Execution-lineage graphs identify each step by a hash of its inputs and upstream identities and reproduce a recorded plan exactly (From Agent Loops to Deterministic Graphs, Rosen and Rosen, 2026, arxiv:2605.06365).

The durable-execution framing, in which a long-running program is made recoverable by recording the complete history of its external calls and replaying against that history, is the same one named in PLRN-002’s discussion of Temporal and Cadence. The contribution here is to apply it to the model’s grounding rather than to the act of executing, recording only the nondeterministic parse and re-deriving the deterministic plan, and to bound it by epochs so that the recorded plan is also a certifiable plan. The substrate it sits on is the deterministic-replay composition of PLRN-002 and the agent harness of PLRN-000.

A note on method

Written in conversation with Claude Opus 4.8 (Anthropic) as structured interlocutor and prose editor. The research backstop was assembled with Paper Lantern. The ideas, claims, framing, and architectural commitments are mine.

Kendall Clark · k@pentad.ai
—Great Falls, Virginia
June 2026