The Saga Tree: Compensation for Agent Execution
Abstract
An agent that does part of a job and then fails later has to undo the part it did. The standard name for the undo is a compensation, and the standard discipline for running compensations in order is a saga. Two facts about enterprise agents make the standard discipline awkward to place. The first is that the agent is arbitrary customer code, and we cannot require it to carry correct rollback for every action it takes. The second is that the agent does not stay resident: it hibernates when idle and is torn down when it disconnects, while the things it must undo run on deadlines of hours to weeks.
WunderOS resolves both by moving the saga off the agent and into the substrate, and by splitting it into two layers with different lifetimes. A per-agent coordinator, living inside the agent’s paired shadow agent, formulates each forward action together with its compensating inverse, because that formulation needs the agent’s intent and exists only while the agent is live. A per-tenant executor, living on the dispatcher bus and independent of any agent’s liveness, holds the compensations, watches their triggers, and fires them. The compensations are held, not released, until the whole plan commits, because a later step’s failure must still be able to run an earlier step’s undo, and they unwind last-in-first-out on failure. They are durable on emission, watched as standing leases, and recorded as facts that reference the actions they undo, so the ledger of what happened is never edited, only added to.
Bottom line: the customer writes no rollback logic. The interim guarantee is best-effort compensation, decoupling rather than linearizability, with provable rollback deferred to the deterministic clock the substrate is building toward.
1. The problem
A plan changes the world. It writes records, calls tools, sends messages, moves money. When a plan fails partway, the writes already made are still made, and something has to reverse them in the right order. The classical answer is the saga: each forward action is paired with a compensating action that undoes it, and on failure the compensations run in reverse (Garcia-Molina and Salem, Sagas, 1987). The pattern is old and sound. The question for an agent operating system is not whether to use it but where to put it.
Two properties of enterprise agents decide the placement. The first is that the agent is not ours. It is customer code, and it is arbitrary from our perspective. We can offer it a saga discipline, but we cannot mandate that it write a correct inverse for every side effect it has, and a saga whose compensations are written by the party most likely to get them wrong is not a guarantee worth offering. The second is that the agent is transient. Its paired shadow agent hibernates after an idle threshold and is removed from the registry when the agent disconnects or crashes, which is the right lifecycle for a per-agent process. But a compensation timer in this setting runs for hours, days, or weeks, and a stuck plan can wait far longer. An undo that must fire three weeks after the action cannot be held by a process that is torn down three minutes after the agent goes quiet, because if the agent never returns, nothing reactivates it, and the undo never runs.
The question before us is where a saga lives when the party that knows the inverse cannot be trusted to run it and the process that formulates it does not survive long enough to fire it. The answer is two layers, and the rest of this note is that construction.
2. Two layers with two lifetimes
The saga is split where its two jobs have different requirements. Formulating a forward action and its inverse is a per-agent act. It needs the agent’s cognitive context and intent, the answer to what this action is for and what undoing it would mean, and that context exists only while the agent is live. Holding the inverse and firing it on a trigger is a durable act. It must outlive the agent, because the trigger may not arrive for weeks.
So the coordinator is a role the shadow agent plays. It formulates each forward and compensating pair, dispatches the forward plan to the executor of record, and emits a compensation registration. Its lifetime is the agent’s lifetime, and that is correct, because nothing it does needs to outlive the agent. The executor is a per-tenant actor on the dispatcher bus. It holds the live registrations, watches their triggers, and fires the compensations. Its lifetime is the tenant’s, independent of any one agent, and that is correct, because what it holds must survive any one agent’s disconnection.
This is the resolution of the lifecycle problem of section 1 stated as a structure. The thing that knows the inverse hands it, once formulated and made durable, to the thing that will still be alive when the inverse is needed. Whether the coordinator is a separate process or inline logic in the shadow agent is an implementation choice and not an architectural one. The architecture is the split of lifetime, not the count of processes.
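The handoff can be sketched in a few lines. This is a minimal illustration and not the WunderOS interface: the names Registration, TenantExecutor, Coordinator, and formulate, and the fields on the registration, are all assumptions made for the example. It shows only the split of lifetime, with the coordinator going away while the registration it emitted stays held.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Registration:
    """A compensation registration (hypothetical shape, for illustration only)."""
    tenant: str
    plan_id: str
    step: int
    forward_ref: str   # address of the forward action being covered
    inverse: str       # name of the compensating action


class TenantExecutor:
    """Durable layer: one per tenant, independent of any agent's liveness."""

    def __init__(self, tenant: str) -> None:
        self.tenant = tenant
        self.held: List[Registration] = []

    def accept(self, reg: Registration) -> None:
        if reg.tenant != self.tenant:
            raise ValueError("registration belongs to another tenant")
        self.held.append(reg)


class Coordinator:
    """Per-agent layer: a role of the shadow agent, alive only while the agent is."""

    def __init__(self, tenant: str, executor: TenantExecutor) -> None:
        self.tenant = tenant
        self.executor = executor
        self._next_step = 1

    def formulate(self, plan_id: str, forward_ref: str, inverse: str) -> Registration:
        # Formulation needs the agent's intent; holding does not, so hand off at once.
        reg = Registration(self.tenant, plan_id, self._next_step, forward_ref, inverse)
        self._next_step += 1
        self.executor.accept(reg)
        return reg


executor = TenantExecutor("acme")
coordinator = Coordinator("acme", executor)
coordinator.formulate("plan-1", "log:17", "release_hold")
del coordinator  # the agent disconnects and its shadow agent is torn down
surviving = [r.inverse for r in executor.held]
```

After the coordinator is gone, the executor still holds the one registration, which is the property the split exists to provide. Durability of the held list is out of scope for this sketch.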
3. Held until the plan commits
A compensation is held until the whole plan commits. It is not released when its own forward step succeeds. This is the decision that makes the rollback correct, and it is worth stating against the tempting alternative.
The tempting alternative is to discard a step’s compensation as soon as that step succeeds, on the reasoning that a successful step needs no undo. The reasoning is wrong, because a later step can fail, and the recovery from that failure is to undo everything done so far, including the steps that individually succeeded. A saga that released compensations on per-step success would have nothing to run when step five fails and steps one through four must be reversed. So the held compensations form a tree that grows as the plan advances and unwinds last-in-first-out on failure: step five’s failure runs four, then three, then two, then one, in reverse of the order they were done. The German codename for the subsystem, Sagenbaum, the saga tree, names this shape. Only a terminal success, the plan reaching its committed end, retires the whole held set at once. Commit is a property of the plan, not of a step.
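A minimal sketch of the hold-until-commit rule follows, assuming a linear plan so that the held set degenerates to a stack. The names HeldPlan, step, commit, unwind, and run are hypothetical, and the steps are toy closures over an in-memory list rather than real side effects.

```python
from typing import Callable, List, Tuple

Action = Callable[[], None]


class HeldPlan:
    """Holds every step's compensation until the plan as a whole commits."""

    def __init__(self) -> None:
        self._held: List[Tuple[str, Action]] = []
        self.committed = False

    def step(self, name: str, forward: Action, undo: Action) -> None:
        forward()                        # raises if the step fails
        self._held.append((name, undo))  # held even though the step succeeded

    def commit(self) -> None:
        # Only terminal success retires the held set, and it retires all of it.
        self._held.clear()
        self.committed = True

    def unwind(self) -> List[str]:
        order = []
        while self._held:
            name, undo = self._held.pop()  # last in, first out
            undo()
            order.append(name)
        return order


def run(plan: HeldPlan, steps) -> List[str]:
    try:
        for name, forward, undo in steps:
            plan.step(name, forward, undo)
    except Exception:
        return plan.unwind()
    plan.commit()
    return []


world: List[str] = []

def do(name: str) -> Action:
    return lambda: world.append(name)

def undo(name: str) -> Action:
    return lambda: world.remove(name)

def fail() -> None:
    raise RuntimeError("step five failed")

steps = [(n, do(n), undo(n)) for n in ["one", "two", "three", "four"]]
steps.append(("five", fail, lambda: None))
plan = HeldPlan()
undone = run(plan, steps)
```

Here step five fails, so its own compensation is never registered, and the four held compensations run in the order four, three, two, one. Had the sketch released each compensation on per-step success, the unwind would have found nothing to run.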
4. Durable on emission, reconstructable on restart
A compensation registration is durable the moment it is emitted, not when the executor happens to be holding it. The forward action’s completion record and its compensation registration are written under a single barrier, both or neither, which is the transactional-outbox discipline (Netherite, Burckhardt et al., 2022, arxiv:2103.00033). Recovery sees a consistent pair or sees nothing, never a forward action whose undo was lost.
The registrations live in their own log segment so that they replay independently of the rest of the plan record. The per-tenant executor holds no durable state of its own that is not derivable from that segment. On restart, or on first attaching to a tenant, it loads the outstanding registrations from the segment and re-arms their triggers. This is the same posture the rest of the substrate takes toward recovery, a live actor whose state is a fold over a write-ahead log, and it is what places the executor inside the deterministic replay ladder of PLRN-002 when the deterministic clock lands.
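The two properties of this section, both-or-neither emission and state as a fold over the segment, can be illustrated with a small sketch. It uses a SQLite transaction as a stand-in for the single barrier, and an in-memory database that is of course not durable. The table names completions and segment, the event kinds register and retire, and the functions emit, retire, and rearm are assumptions made for the example, not the substrate's schema.

```python
import sqlite3
from typing import Dict


def open_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE completions (action_id TEXT PRIMARY KEY, result TEXT NOT NULL)")
    db.execute(
        "CREATE TABLE segment ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, kind TEXT NOT NULL, "
        "action_id TEXT NOT NULL, inverse TEXT, "
        "CHECK (kind != 'register' OR inverse IS NOT NULL))"
    )
    db.commit()
    return db


def emit(db: sqlite3.Connection, action_id: str, result: str, inverse) -> None:
    # One transaction: the completion record and the registration land together
    # or not at all. The connection context manager commits or rolls back.
    with db:
        db.execute("INSERT INTO completions VALUES (?, ?)", (action_id, result))
        db.execute(
            "INSERT INTO segment (kind, action_id, inverse) VALUES ('register', ?, ?)",
            (action_id, inverse),
        )


def retire(db: sqlite3.Connection, action_id: str) -> None:
    # Retirement is appended as a later event; the registration row is not edited.
    with db:
        db.execute("INSERT INTO segment (kind, action_id) VALUES ('retire', ?)", (action_id,))


def rearm(db: sqlite3.Connection) -> Dict[str, str]:
    """Rebuild the executor's outstanding set as a fold over the segment."""
    outstanding: Dict[str, str] = {}
    rows = db.execute("SELECT kind, action_id, inverse FROM segment ORDER BY seq")
    for kind, action_id, inverse in rows:
        if kind == "register":
            outstanding[action_id] = inverse
        elif kind == "retire":
            outstanding.pop(action_id, None)
    return outstanding


db = open_store()
emit(db, "a1", "ok", "refund")
emit(db, "a2", "ok", "unsend")
retire(db, "a1")
try:
    emit(db, "a3", "ok", None)  # the registration insert fails the CHECK
except sqlite3.IntegrityError:
    pass
outstanding = rearm(db)
completed = [row[0] for row in db.execute("SELECT action_id FROM completions ORDER BY action_id")]
```

The failed emission of a3 leaves no completion record behind, so recovery never sees a forward action without its undo. A restarted executor calling rearm would re-arm exactly the registration for a2.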
5. A trigger is a standing lease
A compensation does not fire when its step fails. It fires when a condition becomes true, and the condition is expressed as a query held open. The executor ships no trigger engine of its own. A held registration’s trigger is a standing lease, the shadow lease of PLRN-010, over the trigger’s temporal expression. The substrate already pushes changes from the write path rather than polling, and a lease is that push. Reusing it means the saga inherits the incremental cost and the replayable epoch clock of the standing-query layer rather than reinventing them.
The trigger reads three-valued: true, unknown, false. The executor fires on the transition to true, waits while unknown, and does not fire on false. A deadline, the hours-to-weeks timer of section 1, is a lease over a clock predicate that becomes true at expiry. There is no separate timer subsystem. A deadline three weeks out and a tool call fifty milliseconds out are the same machinery, a held condition that becomes true and wakes a consumer, differing only in how long the condition stays unknown.
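The three-valued reading and the fire-on-transition rule can be sketched as follows. This is an illustration under stated assumptions, not the shadow lease of PLRN-010: the names Tri, Lease, push, deadline, and unless_confirmed_by are hypothetical, time is a plain float in seconds, and the fact set is a Python set pushed in by the caller.

```python
from enum import Enum
from typing import Callable, List, Set


class Tri(Enum):
    FALSE = "false"
    UNKNOWN = "unknown"
    TRUE = "true"


Predicate = Callable[[float, Set[str]], Tri]


def deadline(expiry: float) -> Predicate:
    """Clock predicate: unknown until expiry, true from expiry onward."""
    def evaluate(now: float, facts: Set[str]) -> Tri:
        return Tri.TRUE if now >= expiry else Tri.UNKNOWN
    return evaluate


def unless_confirmed_by(expiry: float, confirmation: str) -> Predicate:
    """Hypothetical trigger: false once confirmed, true if expiry passes first."""
    def evaluate(now: float, facts: Set[str]) -> Tri:
        if confirmation in facts:
            return Tri.FALSE
        return Tri.TRUE if now >= expiry else Tri.UNKNOWN
    return evaluate


class Lease:
    """A held condition; changes are pushed to it, it does not poll."""

    def __init__(self, predicate: Predicate, on_true: Callable[[], None]) -> None:
        self.predicate = predicate
        self.on_true = on_true
        self.value = Tri.UNKNOWN

    def push(self, now: float, facts: Set[str]) -> Tri:
        previous, self.value = self.value, self.predicate(now, facts)
        if self.value is Tri.TRUE and previous is not Tri.TRUE:
            self.on_true()  # fire once, on the transition to true
        return self.value


WEEK = 7 * 24 * 3600.0
fired: List[str] = []
near = Lease(deadline(0.050), lambda: fired.append("tool-timeout"))
far = Lease(deadline(3 * WEEK), lambda: fired.append("undo-booking"))
guarded = Lease(unless_confirmed_by(3 * WEEK, "payment-confirmed"),
                lambda: fired.append("undo-payment"))

confirmed = {"payment-confirmed"}
events = [(0.010, set()), (0.060, set()), (WEEK, confirmed),
          (3 * WEEK, confirmed), (3 * WEEK + 1, confirmed)]
for now, facts in events:
    for lease in (near, far, guarded):
        lease.push(now, facts)
```

The fifty-millisecond lease and the three-week lease run through the same push method and differ only in how long they read unknown. The guarded lease reads false once the confirmation arrives and never fires.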
A compliance condition is a trigger of the same kind. The supervisory layer that governs an agent’s actions, the taint and policy machinery of PLRN-008, returns a verdict on each action. A verdict that blocks or redirects an action that has already crossed the boundary is a trigger: the executor fires the registered compensation, and the customer’s code handles no policy rollback. Interception before the action crosses the boundary prevents the effect and needs no compensation. The saga handles the cases after the boundary, where prevention is no longer available and only undo remains.
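The distinction between prevention and undo reduces to a small decision, sketched here. The Verdict values and the outcome strings are assumptions for the example; the actual verdict vocabulary of PLRN-008 is not specified in this note.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REDIRECT = "redirect"


def handle_verdict(verdict: Verdict, crossed_boundary: bool) -> str:
    """Decide what a supervisory verdict means for one action (illustrative only)."""
    if verdict is Verdict.ALLOW:
        return "proceed"
    if not crossed_boundary:
        # The effect has not happened yet, so interception prevents it.
        return "intercept"
    # The effect is already external; only the registered undo remains.
    return "fire_compensation"


outcomes = [
    handle_verdict(Verdict.ALLOW, crossed_boundary=True),
    handle_verdict(Verdict.BLOCK, crossed_boundary=False),
    handle_verdict(Verdict.BLOCK, crossed_boundary=True),
    handle_verdict(Verdict.REDIRECT, crossed_boundary=True),
]
```

In every branch the customer's code is absent: the decision is taken between the supervisory layer and the executor.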
6. The ledger is never edited
A compensation does not delete the thing it undoes. The substrate keeps an immutable forward log, append-only, the record of what was committed, and a separate working memory where compensations take effect. A compensation mutates the working memory and references, by address, the forward action in the log that it undoes. The log stays pristine. The working memory is always reconstructable from the log, which is the core invariant of the arrangement.
The consequence is that the saga never destroys history. A forward action and its compensation are both facts, the log is the single source of truth, and the working memory is a projection over it that can be rebuilt. This is the same posture as the append-only retraction of PLRN-007, where a fact is retired by a later fact that carries a back-reference rather than deleted in place, and it is what makes a compensation auditable in the sense of PLRN-006: an auditor can read that this mutation compensated that action, and verify the chain offline.
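One reading of the dual-ledger arrangement is sketched below, under the assumption that a compensation is itself appended as a fact carrying a back-reference, and that working memory is a fold over the log. The names Fact, Ledger, project, and audit are hypothetical, and log addresses are plain list indices.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    kind: str                      # "forward" or "compensation"
    key: str
    value: Optional[str]           # None means the key is removed from working memory
    undoes: Optional[int] = None   # log address of the forward action undone


class Ledger:
    def __init__(self) -> None:
        self._log: List[Fact] = []  # append-only; nothing is edited or deleted

    def __len__(self) -> int:
        return len(self._log)

    def append(self, fact: Fact) -> int:
        if fact.kind == "compensation":
            ok = (fact.undoes is not None
                  and 0 <= fact.undoes < len(self._log)
                  and self._log[fact.undoes].kind == "forward")
            if not ok:
                raise ValueError("compensation must reference an earlier forward action")
        self._log.append(fact)
        return len(self._log) - 1

    def project(self) -> Dict[str, str]:
        """Rebuild working memory from the log alone."""
        working: Dict[str, str] = {}
        for fact in self._log:
            if fact.value is None:
                working.pop(fact.key, None)
            else:
                working[fact.key] = fact.value
        return working

    def audit(self) -> List[Tuple[int, int]]:
        """Pairs of (compensation address, address of the action it undid)."""
        return [(addr, fact.undoes) for addr, fact in enumerate(self._log)
                if fact.kind == "compensation"]


ledger = Ledger()
a = ledger.append(Fact("forward", "seat-12A", "reserved"))
b = ledger.append(Fact("forward", "invoice-7", "issued"))
c = ledger.append(Fact("compensation", "seat-12A", None, undoes=a))
working = ledger.project()
chain = ledger.audit()
```

The reservation no longer appears in working memory, yet the log still holds all three facts, and the audit chain states that the fact at address 2 compensated the action at address 0.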
The reframe extends to memory itself. Evicting a fact under capacity pressure is a forward action, and re-ingesting it from the log is its compensation. Eviction becomes reversible. A tombstone is not garbage-collection metadata; it is the compensation marker for a deletion, and treating it as a saga event gives the memory lifecycle its replay semantics without separate machinery. The policy that chooses what to evict is choosing which forward actions to undo next, and exposing it as a saga queue makes the choice auditable like any other.
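The eviction reframe can be sketched in the same style. The Memory class and its method names are assumptions for illustration, and the tombstone map stands in for the saga queue described above; no eviction policy is modeled.

```python
from typing import Dict, List, Tuple


class Memory:
    """Working memory over an append-only log, with reversible eviction."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str, object]] = []
        self.working: Dict[str, str] = {}
        self._source: Dict[str, int] = {}     # key -> log address of its ingest
        self.tombstones: Dict[str, int] = {}  # key -> address to re-ingest from

    def ingest(self, key: str, value: str) -> int:
        self.log.append(("ingest", key, value))
        addr = len(self.log) - 1
        self.working[key] = value
        self._source[key] = addr
        return addr

    def evict(self, key: str) -> None:
        # Forward action: drop the fact from working memory, keep the log intact.
        addr = self._source[key]
        del self.working[key]
        self.log.append(("evict", key, addr))
        self.tombstones[key] = addr  # the tombstone is the compensation marker

    def reingest(self, key: str) -> None:
        # Compensation: restore the fact from the log entry the tombstone points at.
        addr = self.tombstones.pop(key)
        _, _, value = self.log[addr]
        self.working[key] = str(value)
        self.log.append(("reingest", key, addr))


memory = Memory()
memory.ingest("fact-1", "invoice 7 was issued")
memory.ingest("fact-2", "seat 12A was released")
memory.evict("fact-1")
pending = list(memory.tombstones)
evicted_view = dict(memory.working)
memory.reingest("fact-1")
```

Eviction and re-ingestion both appear in the log as events, so the memory lifecycle replays like any other saga activity.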
7. Only what crossed the boundary can be undone
A compensation can undo only what actually happened. An action whose external effect never crossed the process boundary needs no undo, and an action whose effect is irreversible cannot be given one. The substrate captures effects at the boundary, and that capture is the ground truth for what is compensable. An effect classified irreversible is registered as non-compensable rather than given a no-op inverse, and its failure routes to the stuck-plan path of PLRN-009 and to owner intervention, not to a compensation that cannot work. For a compensable effect, the boundary capture lets the executor confirm the effect was externalized before it fires the undo, so it does not compensate an action that never took effect.
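The routing this paragraph describes can be written as a small function over the boundary capture. The names CapturedEffect, Reversibility, and route are hypothetical, and the order of the checks is an assumption: the sketch treats an effect that never crossed the boundary as needing no undo regardless of its classification.

```python
from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    COMPENSABLE = "compensable"
    IRREVERSIBLE = "irreversible"


@dataclass(frozen=True)
class CapturedEffect:
    """What the boundary capture recorded about one action (illustrative shape)."""
    action_id: str
    reversibility: Reversibility
    externalized: bool  # did the effect actually cross the process boundary?


def route(effect: CapturedEffect) -> str:
    if not effect.externalized:
        return "no_undo_needed"      # nothing happened outside, nothing to reverse
    if effect.reversibility is Reversibility.IRREVERSIBLE:
        return "stuck_plan"          # owner intervention, never a no-op inverse
    return "fire_compensation"       # confirmed external and reversible


routes = [
    route(CapturedEffect("hold-funds", Reversibility.COMPENSABLE, externalized=True)),
    route(CapturedEffect("hold-funds", Reversibility.COMPENSABLE, externalized=False)),
    route(CapturedEffect("send-email", Reversibility.IRREVERSIBLE, externalized=True)),
]
```

The point of returning a stuck-plan route instead of a no-op inverse is that the failure stays visible to an owner rather than being recorded as a compensation that did nothing.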
Some effects have semantics the substrate’s language cannot express: an HTTP idempotency header, a payment nonce protocol, a provider’s own reversal rules. These live in the actuator’s contract, not in the saga. The honest claim is that the saga converges at the control surface, the decision to compensate and the order of compensation, and not at the effect surface, where a downstream service’s own guarantees take over.
8. What this does not give
The design is adopted and the reference implementation is landing in slices: the registration schema and per-capability compensation registry, the dispatcher protocol, the persistence segment, the trigger evaluation over standing leases, the dual-ledger framing, and the eviction reframe. The claims here are architectural commitments about how the pieces compose, with the parts that have landed checked as they land, not a measurement of a complete running saga.
The construction does not give deterministic rollback yet. The interim guarantee is best-effort compensation, eventually consistent in the probabilistic sense, and that is what an agent operating system needs first. What is bought is decoupling: the agent assumes success and keeps its real-time reflexes while the cold work of compensation happens off its path. It is not linearizability. When the deterministic clock and the simulation harness of PLRN-002 land, compensations become replayable and provable, and the reconstructable-from-log posture of section 4 is what makes that upgrade non-breaking rather than a rewrite.
The construction does not prevent compensation cycles. A trigger can fire a compensation whose effect satisfies another trigger, and a cycle of these is caught after the fact by the provenance machinery but not prevented before it. The standing-lease layer of PLRN-010 is the natural place for a runtime limit on how far a cascade may run, and that limit is an open item, named here rather than papered over.
The construction does not define triggers across tenants. A trigger correlates facts within one tenant, because the executor is per-tenant and a trigger that reached across tenants would breach the isolation the per-tenant placement exists to keep. Cross-tenant compensation is undefined and deferred.
9. Related work
The saga is the long-lived-transaction discipline of Garcia-Molina and Salem (Sagas, 1987), where a sequence of transactions each carries a compensating transaction and failure runs the compensations in reverse. The durability of the forward-and-compensation pair under a single barrier is the transactional-outbox line, stated for durable serverless workflows in Netherite (Burckhardt et al., 2022, arxiv:2103.00033) and for lineage-based recovery in write-ahead lineage (Yu et al., 2024, arxiv:2403.08062). The trigger language is metric temporal Datalog with negation under the approximation-fixpoint treatment (Tena Cucala et al., 2026, arxiv:2601.03841), evaluated incrementally as the standing lease of PLRN-010.
The placement of the saga off the agent and into a durable per-tenant holder is the contribution. It rests on the single side-effect executor and stuck-plan discipline of PLRN-009, the standing-query layer of PLRN-010 for triggers and deadlines, the append-only retraction of PLRN-007 and the provenance of PLRN-006 for the ledger posture, and the supervisory verdicts of PLRN-008 for compliance compensation. PLRN-009 remarked that agents love sagas, or will learn to. The point of putting the saga in the substrate is that the agent does not have to learn anything, because the undo is held and fired for it by a part of the system that is still there when the undo is due.
A note on method
Written in conversation with Claude Opus 4.8 (Anthropic) as structured interlocutor and prose editor. The research backstop was assembled with Paper Lantern. The ideas, claims, framing, and architectural commitments are mine.
Kendall Clark · k@pentad.ai
Great Falls, Virginia
June 2026