PSP:	000005
Title:	SRBN Implementation Continuation: Multi-File Coding UX and Repo-Native Verification
Author:	Vikrant Rathore (@vikrantrathore)
Status:	Active
Type:	Enhancement
Created:	2026-03-23
Discussion-To:	https://github.com/eonseed/perspt/discussions/97
Replaces:	000004
Resolution:	Implemented in v0.5.6

Abstract

This PSP replaces PSP 000004 as the implementation specification for Perspt’s SRBN-based coding workflow. It defines the missing runtime semantics needed for Perspt to create complete multi-file projects, verify repository state with language-native tools, and deliver a review-and-approval experience comparable to contemporary coding CLIs while preserving SRBN stability guarantees.

Motivation

PSP 000004 established the conceptual direction for Perspt’s SRBN-based coding workflow, but it left critical runtime semantics underspecified for implementation. The current implementation remains structurally biased toward single-file generation, Python-centric verification, and partial UI/TUI wiring. As a result, users asking for complete projects or refactors often receive a single file, incomplete verification, or an approval experience that does not provide the confidence expected from a coding agent.

Currently, users cannot reliably request a complete multi-file application and expect Perspt to plan, generate, verify, and review it as a coherent project. The runtime frequently collapses into single-file execution, verification is not consistently language-aware, and the review loop does not surface structured diffs, project-wide validation, or trustworthy execution boundaries. This affects users attempting long-horizon software engineering tasks, especially in Rust and polyglot repositories, where file-level checks are insufficient and repository-level correctness is required.

This PSP addresses that gap by replacing PSP 000004 with an implementation contract for the next SRBN phase: project-first planning, graph-derived context supply, multi-artifact node execution, independent verifier semantics, repo-native verification, stronger safety controls, and a coding UX designed around inspectable diffs and stable convergence.

Proposed Changes

Functional Specification

Perspt SHALL treat project-scale coding as the default SRBN workflow and SHALL stop degrading multi-file tasks into single-file execution unless the user explicitly requests a single file.

Behavioral Changes:

Agent runs SHALL default to project mode for any task that implies a repository, application, package, library, service, refactor, or testable feature set.
Solo Mode SHALL only activate for explicit single-file intents, such as requests for a "single file", "snippet", or "standalone script", or through an explicit CLI flag.
The Architect SHALL produce a structured task graph whose nodes can create or modify multiple files when needed.
The Actuator SHALL emit a pure JSON artifact bundle containing one or more file operations instead of a single File: or Diff: block.
The Orchestrator SHALL apply all artifact operations for a node as a unit, then verify repository state before committing the node.
Verification SHALL be repository-native and selected through the language plugin registry. For Rust workspaces this includes Rust LSP plus cargo check and cargo test by default. Successful compilation and test execution SHALL be required in the default interactive mode, while warnings MAY remain unless a stricter verifier preset is selected. Optional stricter checks such as cargo clippy MAY be enabled explicitly.
Language plugins SHALL drive runtime verification and execution behavior, not only project initialization.
The Verifier SHALL become a first-class stage in the loop. It SHALL compute energy from deterministic tool and contract signals rather than relying on the Actuator to self-assess.
Approval SHALL be performed against a structured diff view that can show all affected files within a node.
Stable nodes SHALL be committed to the ledger with per-node state, energy, and artifact metadata, enabling trustworthy status and resume behavior.
Node execution SHALL use bounded, graph-derived context assembled from parent intent, dependency summaries, target file state, and verifier summaries rather than ad hoc repository-wide prompt stuffing.
The SRBN runtime SHALL remain provider-neutral and support tiered model selection across Gemini, GPT, and Claude class models without changing the execution semantics.
Once the user approves the session policy, Perspt SHALL be able to operate autonomously within the current project or working folder. Any action that reads from, writes to, or executes against locations outside that folder SHALL require explicit user approval at the time of the action.

Execution Flow:

Detect repository language(s) and initialize the relevant plugin, verifier commands, and LSP clients.
Plan the task as a project graph instead of a single file unless single-file intent is explicit.
Execute nodes in topological order.
For each node, generate a multi-artifact bundle containing file creations, file edits, and optionally approved commands.
Apply the bundle transactionally inside the workspace.
Run verifier stages to compute node energy from syntax, structure, logic, execution, and sheaf consistency signals.
If energy remains above threshold, generate a correction prompt grounded in verifier outputs and retry within policy limits.
If stable (V(x) ≤ ε), show reviewable diffs, accept or reject the node, then commit the stable node to the ledger with NodeOutcome::Completed.
If retries are exhausted (V(x) > ε after max retries), transition the node to NodeOutcome::Escalated and track the escalation count.
After child nodes converge, run sheaf validation for cross-node consistency before finalizing parent completion.
Derive SessionOutcome from the completed and escalated counts: Failed if no node completed, PartialSuccess if at least one node completed and at least one escalated, and Success if every node completed.
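The outcome rules above can be sketched in Rust as follows. This is a minimal illustration, not Perspt's implementation: `node_outcome` and `session_outcome` are hypothetical helpers, and the retry budget is assumed to mean one initial attempt plus `max_retries` corrections.

```rust
// Hypothetical types mirroring the NodeOutcome / SessionOutcome names used above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeOutcome {
    Completed,
    Escalated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionOutcome {
    Success,
    PartialSuccess,
    Failed,
}

/// Final outcome of one node, given the energy measured after each attempt.
/// The budget is one initial attempt plus `max_retries` corrections; the node is
/// stable as soon as an attempt within budget reaches V(x) <= epsilon.
fn node_outcome(attempt_energies: &[f64], epsilon: f64, max_retries: usize) -> NodeOutcome {
    let budget = 1 + max_retries;
    if attempt_energies.iter().take(budget).any(|&v| v <= epsilon) {
        NodeOutcome::Completed
    } else {
        NodeOutcome::Escalated
    }
}

/// Session outcome derived from node counts. Checking "none completed" first keeps
/// the three cases disjoint (an empty session is treated as Failed).
fn session_outcome(outcomes: &[NodeOutcome]) -> SessionOutcome {
    let completed = outcomes.iter().filter(|o| **o == NodeOutcome::Completed).count();
    let escalated = outcomes.len() - completed;
    if completed == 0 {
        SessionOutcome::Failed
    } else if escalated > 0 {
        SessionOutcome::PartialSuccess
    } else {
        SessionOutcome::Success
    }
}

fn main() {
    let a = node_outcome(&[0.9, 0.4, 0.05], 0.1, 3); // converges on the third attempt
    let b = node_outcome(&[0.9, 0.8, 0.7], 0.1, 2); // retry budget exhausted
    // prints "Completed Escalated -> PartialSuccess"
    println!("{:?} {:?} -> {:?}", a, b, session_outcome(&[a, b]));
}
```

Evaluating "no node completed" first is what makes the three session outcomes mutually exclusive.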

digraph psp5_state_machine {
rankdir=LR;
bgcolor="transparent";
node [shape=box, style="rounded,filled", fontname="Arial", fontsize=10, margin="0.18,0.1"];
edge [fontname="Arial", fontsize=9, color="#6D4C41", fontcolor="#6D4C41"];

queued [label="Queued", fillcolor="#ECEFF1", color="#607D8B"];
planning [label="Planning\ncontract + targets", fillcolor="#EDE7F6", color="#5E35B1"];
generating [label="Generating\nartifact bundle", fillcolor="#E1F5FE", color="#039BE5"];
validating [label="Validating\nparse + policy", fillcolor="#FFF3E0", color="#FB8C00"];
reviewing [label="Reviewing\ndiff + approval", fillcolor="#E8F5E9", color="#43A047"];
verifying [label="Verifying\nLSP + build + tests", fillcolor="#FCE4EC", color="#D81B60"];
stable [label="Stable", fillcolor="#C8E6C9", color="#2E7D32"];
committed [label="Committed\nledger updated", fillcolor="#DCEDC8", color="#558B2F"];
rejected [label="Rejected", fillcolor="#FFEBEE", color="#E53935"];
retrying [label="Retrying\ngrounded correction", fillcolor="#FFF8E1", color="#F9A825"];
escalated [label="Escalated\nmanual intervention", fillcolor="#FBE9E7", color="#D84315"];

queued -> planning;
planning -> generating;
generating -> validating;
validating -> rejected [label="invalid bundle"];
validating -> reviewing [label="valid bundle"];
reviewing -> rejected [label="user reject"];
reviewing -> verifying [label="user approve"];
verifying -> stable [label="energy <= epsilon"];
verifying -> retrying [label="energy > epsilon"];
retrying -> generating [label="retry budget available", style=dashed];
retrying -> escalated [label="budget exhausted"];
stable -> committed;
rejected -> generating [label="revise node", style=dashed];
}

Figure: PSP 5 Node State Machine
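A minimal sketch of this state machine as a transition function. The `Event` names are hypothetical: `Event::Advance` drives the unlabeled edges, the "revise node" edge, and the retry-budget decision. Stability uses V(x) <= epsilon, matching the execution flow.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeState {
    Queued, Planning, Generating, Validating, Reviewing, Verifying,
    Stable, Committed, Rejected, Retrying, Escalated,
}

/// Events that drive the transitions drawn in the diagram (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    Advance,                       // unlabeled edges, "revise node", and retry decisions
    BundleChecked { valid: bool }, // Validating: parse + policy result
    Review { approved: bool },     // Reviewing: user decision
    Verified { energy: f64 },      // Verifying: energy reported by the verifier
}

struct NodeMachine {
    state: NodeState,
    retries_left: u32,
    epsilon: f64,
}

impl NodeMachine {
    fn new(max_retries: u32, epsilon: f64) -> Self {
        NodeMachine { state: NodeState::Queued, retries_left: max_retries, epsilon }
    }

    /// Apply one event; any transition not drawn in the diagram is an error.
    fn step(&mut self, event: Event) -> Result<NodeState, String> {
        use NodeState::*;
        let next = match (self.state, event) {
            (Queued, Event::Advance) => Planning,
            (Planning, Event::Advance) => Generating,
            (Generating, Event::Advance) => Validating,
            (Validating, Event::BundleChecked { valid: true }) => Reviewing,
            (Validating, Event::BundleChecked { valid: false }) => Rejected,
            (Reviewing, Event::Review { approved: true }) => Verifying,
            (Reviewing, Event::Review { approved: false }) => Rejected,
            (Verifying, Event::Verified { energy }) if energy <= self.epsilon => Stable,
            (Verifying, Event::Verified { .. }) => Retrying,
            (Retrying, Event::Advance) if self.retries_left > 0 => {
                self.retries_left -= 1; // consume one unit of the retry budget
                Generating
            }
            (Retrying, Event::Advance) => Escalated,
            (Stable, Event::Advance) => Committed,
            (Rejected, Event::Advance) => Generating,
            (s, e) => return Err(format!("no transition from {:?} on {:?}", s, e)),
        };
        self.state = next;
        Ok(next)
    }
}

fn main() {
    let mut m = NodeMachine::new(1, 0.1);
    let script = [
        Event::Advance, Event::Advance, Event::Advance, // Queued -> ... -> Validating
        Event::BundleChecked { valid: true },          // -> Reviewing
        Event::Review { approved: true },              // -> Verifying
        Event::Verified { energy: 0.7 },               // -> Retrying
        Event::Advance,                                // -> Generating (uses the one retry)
        Event::Advance,                                // -> Validating
        Event::BundleChecked { valid: true },
        Event::Review { approved: true },
        Event::Verified { energy: 0.02 },              // -> Stable
        Event::Advance,                                // -> Committed
    ];
    for e in script {
        m.step(e).expect("legal transition");
    }
    println!("{:?}", m.state); // prints "Committed"
}
```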

digraph psp5_runtime {
rankdir=LR;
bgcolor="transparent";
node [shape=box, style="rounded,filled", fontname="Arial", fontsize=10];
edge [fontname="Arial", fontsize=9, color="#546E7A", fontcolor="#546E7A"];

user [label="User\nTask Request", fillcolor="#E3F2FD", color="#1E88E5"];
cli [label="CLI / TUI\nReview Surface", fillcolor="#E8F5E9", color="#43A047"];
orch [label="SRBN Orchestrator", fillcolor="#FFF3E0", color="#FB8C00"];
plan [label="Architect\nTask Graph", fillcolor="#F3E5F5", color="#8E24AA"];
act [label="Actuator\nArtifact Bundle", fillcolor="#E1F5FE", color="#039BE5"];
verify [label="Verifier\nEnergy Computation", fillcolor="#FCE4EC", color="#D81B60"];
plugins [label="Plugin Registry\nLanguage + Toolchain", fillcolor="#F1F8E9", color="#7CB342"];
repo [label="Workspace / Repo", fillcolor="#ECEFF1", color="#607D8B"];
ledger [label="Merkle Ledger\nSession State", fillcolor="#E8F5E9", color="#2E7D32"];

user -> cli [label="invoke task"];
cli -> orch [label="start / approve / reject"];
orch -> plan [label="sheafify"];
plan -> orch [label="task DAG"];
orch -> act [label="execute node"];
act -> repo [label="apply artifacts"];
orch -> plugins [label="select active plugin(s)"];
plugins -> verify [label="LSP + build + test commands"];
repo -> verify [label="current project state"];
verify -> orch [label="V_syn, V_str, V_log, V_boot, V_sheaf"];
orch -> cli [label="diffs + status + requests"];
orch -> ledger [label="commit stable node"];
ledger -> cli [label="status / resume context"];
}

Figure: PSP 5 Runtime Architecture

Error Conditions:

If plan parsing fails, Perspt SHALL use a deterministic project fallback graph rather than collapsing directly to a single unstructured root node.
If a node emits malformed artifact data, the Orchestrator SHALL reject the bundle, increase energy, and request correction.
If project verification tools are unavailable, the session SHALL surface a sensor degradation warning and SHALL NOT claim verified stability.
If a command violates policy or sandbox constraints, the node SHALL be rejected before execution.
If retries or budget limits are exceeded, the node SHALL escalate with stored state and actionable diagnostics. The node transitions to the Escalated state and is not immediately converted to Failed. The orchestrator accumulates escalation counts and derives the final SessionOutcome at the end of the session.
If multiple language plugins are active in the same workspace, the verifier SHALL select the relevant plugin set for the current node rather than collapsing to a single global language assumption.

UI/UX Design

This PSP introduces a coding workflow optimized for confidence, inspectability, and incremental control rather than opaque autonomous execution.

User Goals:

Request a full project or refactor and receive a coherent multi-file result.
Review exactly what changed before approving execution.
Understand whether a node is blocked by syntax, tests, structure, sheaf mismatch, or policy restrictions.
Resume interrupted sessions with trustworthy project state.

Interaction Flow:

The default perspt agent experience SHALL begin with repository inspection and planning rather than immediate code emission.
Before applying a node, the user SHALL be able to review a grouped diff spanning all files touched by that node.
The review experience SHALL support approve, reject, edit externally, and request correction actions.
The dashboard SHALL show current node, total nodes, completed nodes, energy value, verifier stage, and latest failure class.
The task tree SHALL show verifying, failed, escalated, and completed states, not only pending and running.
The logs view SHALL correlate LLM calls, verifier outputs, and tool executions to the current node.
Headless mode SHALL print concise, structured progress with the current node, verification summary, and next action.

Project Development Workflow:

The intended end-user development workflow for a project SHALL be:

The user runs perspt agent "<project task>" in an existing repository or an empty project directory.
Perspt detects active language plugin(s), inspects the repository, and produces a task graph.
The user reviews the top-level plan, including project files, subgraphs, and expected verifier stages.
Perspt executes the first node and generates a multi-file artifact bundle.
The user reviews a grouped diff for the node before approval.
Perspt applies the bundle, runs language-native verification, and reports energy with component breakdown.
If unstable, Perspt iterates with correction prompts grounded in verifier output.
If stable, Perspt commits the node and advances to the next node.
The user may pause, resume, reject, or edit externally at any approval boundary.
Once all nodes converge and pass sheaf validation, the session completes with a resumable ledger state and a final summary.

This workflow applies equally to greenfield projects, feature additions, and refactors, with the key difference being the initial repository inspection and plugin selection.

digraph psp5_user_flow {
rankdir=TB;
bgcolor="transparent";
node [shape=box, style="rounded,filled", fontname="Arial", fontsize=10, margin="0.18,0.1"];
edge [fontname="Arial", fontsize=9, color="#5C6BC0", fontcolor="#5C6BC0"];

start [label="Start\nperspt agent <task>", fillcolor="#E3F2FD", color="#1E88E5"];
inspect [label="Inspect Repo\nSelect Plugin(s)", fillcolor="#E8F5E9", color="#43A047"];
plan [label="Show Task Graph\nFiles, Contracts, Tests", fillcolor="#F3E5F5", color="#8E24AA"];
gen [label="Generate Node\nArtifact Bundle", fillcolor="#FFF3E0", color="#FB8C00"];
review [label="Review Multi-File Diff\nApprove / Reject / Edit", fillcolor="#E1F5FE", color="#039BE5"];
verify [label="Verify Repo State\nLSP + Build + Tests", fillcolor="#FCE4EC", color="#D81B60"];
stable [label="Stable?", shape=diamond, fillcolor="#FFF9C4", color="#F9A825"];
commit [label="Commit Node\nLedger + Progress", fillcolor="#E8F5E9", color="#2E7D32"];
next [label="Next Node", fillcolor="#ECEFF1", color="#607D8B"];
done [label="Complete Project\nSummary + Resume State", fillcolor="#C8E6C9", color="#2E7D32"];
retry [label="Correction Loop\nGrounded Feedback", fillcolor="#FFEBEE", color="#E53935"];

start -> inspect;
inspect -> plan;
plan -> gen;
gen -> review;
review -> verify [label="approve"];
review -> gen [label="reject / edit request", style=dashed];
verify -> stable;
stable -> commit [label="yes"];
stable -> retry [label="no"];
retry -> gen [style=dashed];
commit -> next;
next -> gen [label="remaining nodes"];
next -> done [label="all nodes complete"];
}

Figure: User Development Workflow for a Project

Visual Design:

Diff review SHALL display multi-file unified diffs by default, with side-by-side mode optional.
Approval prompts SHALL include stability metrics, affected files, and verification summaries.
Energy feedback SHALL distinguish syntax, structure, logic, execution, and sheaf components so the user can understand why convergence failed.
Status surfaces SHALL avoid color-only signaling and include text labels for stability and failure classes.

Accessibility Considerations:

All review actions SHALL remain keyboard-only and discoverable through on-screen hints.
Diff approval SHALL NOT rely solely on color; additions, removals, and warnings SHALL be labeled textually.
Energy and status indicators SHALL include plain-language labels in addition to graphs or sparklines.
Headless output SHALL remain useful for screen readers and remote terminals.

Technical Specification

Architecture:

This PSP refines PSP 000004 into a discrete execution contract for project-scale coding work.

The following invariants govern the runtime semantics described below:

Structural truth SHALL come from machine-verifiable artifacts such as signatures, schemas, symbol inventories, generated interface files, and content hashes.
Node scope SHALL be bounded by ownership closure rather than by prompt convenience alone.
Ledger commit SHALL remain the only hard stability barrier; provisional or speculative work SHALL NOT be treated as committed state.
Non-convergence SHALL trigger local topology repair, degraded-validation stop, or explicit user escalation rather than unstructured retry loops alone.

1. Project-First Runtime Selection

Replace heuristic Solo Mode activation with explicit single-file detection.
Introduce a project-default execution path for Feature and Enhancement tasks.
Add a deterministic fallback planner that produces a minimal project graph when structured planning fails.
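The fallback planner and the topological execution order could be sketched as below. The four-node fallback graph is a hypothetical shape chosen for illustration; this PSP only requires that the fallback be deterministic and project-shaped. Kahn's algorithm with a sorted ready set keeps the order reproducible across runs.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical minimal fallback graph (node id -> ids it depends on), used when
/// structured planning output cannot be parsed.
fn fallback_graph() -> BTreeMap<&'static str, Vec<&'static str>> {
    let mut g = BTreeMap::new();
    g.insert("interface", vec![]);
    g.insert("implementation", vec!["interface"]);
    g.insert("tests", vec!["interface"]);
    g.insert("integration", vec!["implementation", "tests"]);
    g
}

/// Kahn's algorithm with lexicographic tie-breaking, so the same graph always
/// yields the same execution order. Returns None on a cycle or an unknown dependency.
fn topo_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    let mut indegree: BTreeMap<&'a str, usize> = deps.iter().map(|(n, d)| (*n, d.len())).collect();
    let mut dependents: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (node, ds) in deps {
        for d in ds {
            dependents.entry(*d).or_default().push(*node);
        }
    }
    let mut ready: BTreeSet<&'a str> =
        indegree.iter().filter(|(_, k)| **k == 0).map(|(n, _)| *n).collect();
    let mut order = Vec::new();
    while let Some(n) = ready.pop_first() {
        order.push(n);
        for m in dependents.get(&n).into_iter().flatten() {
            let k = indegree.get_mut(m).expect("dependents are graph nodes");
            *k -= 1;
            if *k == 0 {
                ready.insert(*m);
            }
        }
    }
    // Nodes left unprocessed indicate a cycle or a dependency on a missing node.
    (order.len() == deps.len()).then_some(order)
}

fn main() {
    let order = topo_order(&fallback_graph()).expect("fallback graph is acyclic");
    // prints "interface -> implementation -> tests -> integration"
    println!("{}", order.join(" -> "));
}
```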

2. Multi-Artifact Node Protocol

Replace single-file response parsing with a structured pure JSON artifact format.
A node response MAY support multiple file operations, but only when those operations remain inside one ownership-bounded change unit.
The Orchestrator SHALL parse all artifact operations before mutating the workspace.
Artifact application SHALL fail atomically if any operation is invalid.

Ownership-Bounded Node Rules

A node MAY modify multiple files only when those files belong to a single ownership closure.
A node SHALL NOT span multiple ownership closures unless it is an explicit integration node.
The planner SHALL reject or decompose a node whose artifact bundle crosses ownership domains without an explicit integration boundary.
The planner SHALL enforce bounded fanout for each node, including limits on touched files, changed external interfaces, and ownership domains.
Multi-file bundles SHALL remain attributable to one verifier scope so that failures can be localized to the responsible node set.
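A sketch of the ownership and fanout checks, assuming a hypothetical `owner_of` function that maps each path to its ownership closure and illustrative limit values (this PSP requires limits but does not fix them). Only integration nodes may span more than one closure.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq, Eq)]
enum NodeClass {
    Interface,
    Implementation,
    Integration,
}

/// Hypothetical per-node fanout limits.
struct FanoutLimits {
    max_files: usize,
    max_interfaces: usize,
}

#[derive(Debug, PartialEq, Eq)]
enum BundleCheck {
    Ok,
    Decompose(String), // planner should split the node or add an integration node
}

/// `owner_of` maps a path to its ownership closure (for example, a crate or package).
fn check_bundle(
    class: &NodeClass,
    touched: &[&str],
    changed_interfaces: usize,
    owner_of: impl Fn(&str) -> String,
    limits: &FanoutLimits,
) -> BundleCheck {
    if touched.len() > limits.max_files {
        return BundleCheck::Decompose(format!("touches {} files (limit {})", touched.len(), limits.max_files));
    }
    if changed_interfaces > limits.max_interfaces {
        return BundleCheck::Decompose(format!(
            "changes {} external interfaces (limit {})",
            changed_interfaces, limits.max_interfaces
        ));
    }
    let owners: BTreeSet<String> = touched.iter().map(|p| owner_of(*p)).collect();
    if owners.len() > 1 && *class != NodeClass::Integration {
        return BundleCheck::Decompose(format!("spans ownership closures {:?} without an integration boundary", owners));
    }
    BundleCheck::Ok
}

fn main() {
    let limits = FanoutLimits { max_files: 8, max_interfaces: 2 };
    // Hypothetical ownership rule for a workspace laid out as crates/<name>/...
    let owner = |p: &str| p.split('/').nth(1).unwrap_or(p).to_string();
    let files = ["crates/core/src/lib.rs", "crates/cli/src/main.rs"];
    println!("{:?}", check_bundle(&NodeClass::Implementation, &files, 1, owner, &limits));
    println!("{:?}", check_bundle(&NodeClass::Integration, &files, 1, owner, &limits));
}
```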

Node Classes

The runtime SHALL distinguish three node classes:

Interface nodes: define exported signatures, schemas, ownership manifests, verifier scope, and interface seals used by dependent nodes.
Implementation nodes: operate only on node-owned files plus adjacent sealed interfaces.
Integration nodes: reconcile cross-owner or cross-plugin boundaries after child convergence and SHALL be the primary mechanism for cross-domain coordination.

This distinction preserves practical multi-file work while preventing a single node from collapsing into an unbounded mini-monolith.

Required artifact schema:

{
  "artifacts": [
    {
      "path": "src/main.rs",
      "operation": "write",
      "content": "..."
    },
    {
      "path": "src/lib.rs",
      "operation": "diff",
      "patch": "--- ..."
    }
  ],
  "commands": []
}
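Parse-before-mutate and atomic application can be sketched with an in-memory workspace. The sketch assumes the bundle has already been decoded into the hypothetical `Artifact` enum, and it delegates diff application to a caller-supplied `patcher`, since real unified-diff handling is outside its scope.

```rust
use std::collections::BTreeMap;
use std::path::{Component, Path};

/// Mirrors the two operations in the artifact schema above.
enum Artifact {
    Write { path: String, content: String },
    Diff { path: String, patch: String },
}

impl Artifact {
    fn path(&self) -> &str {
        match self {
            Artifact::Write { path, .. } | Artifact::Diff { path, .. } => path,
        }
    }
}

/// In-memory stand-in for the workspace: relative path -> file contents.
type Workspace = BTreeMap<String, String>;

/// Reject empty or absolute paths and any `..` so artifacts stay inside the workspace.
fn validate_path(p: &str) -> Result<(), String> {
    let path = Path::new(p);
    if p.is_empty() || path.is_absolute() {
        return Err(format!("path {p:?} must be relative"));
    }
    if path.components().any(|c| !matches!(c, Component::Normal(_) | Component::CurDir)) {
        return Err(format!("path {p:?} escapes the workspace"));
    }
    Ok(())
}

/// Validate every operation, apply them to a staged copy, and swap the copy in
/// only if all succeed. `patcher` stands in for a real unified-diff engine.
fn apply_bundle(
    ws: &mut Workspace,
    bundle: &[Artifact],
    patcher: impl Fn(&str, &str) -> Result<String, String>,
) -> Result<(), String> {
    // Pass 1: check every operation before anything is mutated.
    for a in bundle {
        validate_path(a.path())?;
    }
    // Pass 2: apply to a staged copy of the workspace.
    let mut staged = ws.clone();
    for a in bundle {
        match a {
            Artifact::Write { path, content } => {
                staged.insert(path.clone(), content.clone());
            }
            Artifact::Diff { path, patch } => {
                let old = staged
                    .get(path)
                    .ok_or_else(|| format!("diff target {path:?} does not exist"))?;
                let new = patcher(old.as_str(), patch.as_str())?;
                staged.insert(path.clone(), new);
            }
        }
    }
    // From the caller's point of view, all operations land together or not at all.
    *ws = staged;
    Ok(())
}

fn main() {
    let mut ws = Workspace::new();
    ws.insert("src/lib.rs".into(), "pub fn f() {}".into());
    // Toy patcher for illustration only: accepts anything that looks like a diff header.
    let toy = |old: &str, patch: &str| -> Result<String, String> {
        if patch.starts_with("---") {
            Ok(format!("{old}\n// patched"))
        } else {
            Err("malformed patch".to_string())
        }
    };
    let bundle = vec![
        Artifact::Write { path: "src/main.rs".into(), content: "fn main() {}".into() },
        Artifact::Diff { path: "src/lib.rs".into(), patch: "not a diff".into() },
    ];
    println!("{:?}", apply_bundle(&mut ws, &bundle, toy)); // Err("malformed patch")
    println!("{:?}", ws.keys().collect::<Vec<_>>()); // ["src/lib.rs"]: nothing was written
}
```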

digraph artifact_lifecycle {
rankdir=LR;
bgcolor="transparent";
node [shape=box, style="rounded,filled", fontname="Arial", fontsize=10];
edge [fontname="Arial", fontsize=9, color="#607D8B", fontcolor="#607D8B"];

node_goal [label="Node Goal + Contract", fillcolor="#EDE7F6", color="#5E35B1"];
bundle [label="Ownership-Bounded\nArtifact Bundle", fillcolor="#E1F5FE", color="#039BE5"];
parse [label="Parse + Validate\nall operations", fillcolor="#FFF3E0", color="#FB8C00"];
apply [label="Transactional Apply\nworkspace-bounded", fillcolor="#E8F5E9", color="#43A047"];
verify [label="Repo Verification\nplugin-selected", fillcolor="#FCE4EC", color="#D81B60"];
decision [label="Energy <= epsilon?", shape=diamond, fillcolor="#FFF9C4", color="#F9A825"];
commit [label="Commit Stable Node", fillcolor="#C8E6C9", color="#2E7D32"];
reject [label="Reject Bundle\nRetry / Escalate", fillcolor="#FFEBEE", color="#E53935"];

node_goal -> bundle;
bundle -> parse;
parse -> apply [label="valid"];
parse -> reject [label="invalid"];
apply -> verify;
verify -> decision;
decision -> commit [label="yes"];
decision -> reject [label="no"];
reject -> bundle [style=dashed, label="correction"];
}

Figure: Multi-Artifact Node Lifecycle

3. Graph-Derived Context Supply and Restriction Maps

The task graph SHALL define not only execution order but also the authoritative context boundary for each node. This is required for SRBN to remain stable on long-running, large-project work.

Each node SHALL execute against a stratified context package composed from the graph, the ledger, and the active plugin capabilities.

Structural Context (Authoritative)

Structural context SHALL provide the machine-verifiable artifacts required for compile-critical and type-critical work. It includes:

exact exported signatures, trait or protocol definitions, API schemas, and generated interface files
ownership manifests and node-to-file bindings
dependency commit hashes, structural digests, and interface seals
symbol inventories, migration shapes, or equivalent plugin-defined structural representations

Semantic Context (Advisory)

Semantic context MAY provide:
- parent intent summary and parent contract rationale
- dependency summaries and verifier summaries
- design notes, policy explanations, and resume-oriented context

Local State

Local state SHALL include:
- current contents of files owned by the node
- adjacent sealed interfaces relied upon by the node
- active repository invariants and policy constraints

The runtime SHALL treat this package as the node’s restriction map from parent scope to child scope. A node SHALL NOT receive the full repository by default, and SHALL NOT rely on semantic summaries alone for structural correctness.

Context Supply Rules
- Context selection SHALL be bounded by explicit byte and file-count budgets.
- When budgets are exceeded, the runtime SHALL prefer structural digests over raw external source bodies and SHALL prefer raw contents for node-owned files.
- Semantic summaries MAY replace explanatory or resume-oriented context, but SHALL NOT replace compile-critical or type-critical structure.
- Context supplied to a node SHALL be reproducible from persisted state. Resume semantics SHALL reconstruct the same context package from the ledger and repository state.
- A node SHALL record the hashes of the summaries, contracts, and dependency commits that were used to derive its prompt context.
- If required structural context cannot be reconstructed, trusted, or matched to the current repository state, the node SHALL NOT proceed silently. The runtime SHALL rehydrate structural artifacts, explicitly downgrade verification confidence, or escalate.
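A sketch of budget-bounded selection under these rules, with hypothetical `Candidate` and `Pick` types and byte counts supplied by the caller. It treats node-owned files and dependency structure as mandatory and semantic summaries as optional, and it returns an error (for the runtime to rehydrate or escalate) instead of silently truncating structural context.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Owned,      // file owned by the node: raw contents preferred
    Dependency, // external interface: raw body or structural digest
    Summary,    // advisory semantic context
}

struct Candidate {
    name: &'static str,
    kind: Kind,
    raw_bytes: usize,
    digest_bytes: usize, // only meaningful for dependencies
}

#[derive(Debug, PartialEq, Eq)]
enum Pick {
    Raw(&'static str),
    Digest(&'static str),
}

fn select_context(cands: &[Candidate], max_bytes: usize, max_files: usize) -> Result<Vec<Pick>, String> {
    let mut picks = Vec::new();
    let mut used = 0;
    // 1. Node-owned files are compile-critical and always supplied raw.
    for c in cands.iter().filter(|c| c.kind == Kind::Owned) {
        if picks.len() >= max_files || used + c.raw_bytes > max_bytes {
            return Err(format!("owned file {} exceeds the context budget", c.name));
        }
        used += c.raw_bytes;
        picks.push(Pick::Raw(c.name));
    }
    // 2. Dependencies: raw bodies only if all of them fit; otherwise structural digests.
    let deps: Vec<&Candidate> = cands.iter().filter(|c| c.kind == Kind::Dependency).collect();
    let raw_total: usize = deps.iter().map(|c| c.raw_bytes).sum();
    let use_raw = used + raw_total <= max_bytes;
    for c in deps {
        let bytes = if use_raw { c.raw_bytes } else { c.digest_bytes };
        if picks.len() >= max_files || used + bytes > max_bytes {
            // Structural context is mandatory: escalate rather than proceed silently.
            return Err(format!("structural context for {} does not fit", c.name));
        }
        used += bytes;
        picks.push(if use_raw { Pick::Raw(c.name) } else { Pick::Digest(c.name) });
    }
    // 3. Semantic summaries fill any remaining budget and are the first thing dropped.
    for c in cands.iter().filter(|c| c.kind == Kind::Summary) {
        if picks.len() < max_files && used + c.raw_bytes <= max_bytes {
            used += c.raw_bytes;
            picks.push(Pick::Raw(c.name));
        }
    }
    Ok(picks)
}

fn main() {
    let cands = [
        Candidate { name: "src/parser.rs", kind: Kind::Owned, raw_bytes: 100, digest_bytes: 0 },
        Candidate { name: "core::Token", kind: Kind::Dependency, raw_bytes: 500, digest_bytes: 50 },
        Candidate { name: "parent intent", kind: Kind::Summary, raw_bytes: 30, digest_bytes: 0 },
    ];
    println!("{:?}", select_context(&cands, 1000, 10)); // everything raw
    println!("{:?}", select_context(&cands, 200, 10)); // dependency reduced to its digest
    println!("{:?}", select_context(&cands, 120, 10)); // digest does not fit: Err
}
```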

Long-Horizon Project Requirements
- The runtime SHALL support project execution where total project size exceeds the per-node prompt budget by relying on graph partitioning, committed structural digests, and bounded local state.
- Parent nodes SHALL consume child outputs through sealed interfaces, structural digests, and verifier results rather than re-reading all child source files.
- Cross-language nodes SHALL receive plugin-selected context relevant to the files and interfaces they own.
- The ledger SHALL act as the durable context substrate for long-running sessions, preventing state drift caused by mutable conversational history.

4. Repo-Native Verifier

Add a verifier pipeline selected by the active language plugin.
Replace the current init-oriented plugin usage with a capability-based runtime plugin contract that governs repository detection, ownership matching, LSP startup, verifier commands, structural validators, and execution defaults.
Each plugin SHALL define:
- detection rules and file ownership matching rules
- interface extraction strategy and structural digest generation
- LSP configuration and fallback ordering
- syntax/type command
- optional build or compile command
- test command
- optional lint command
- optional structural checks
- optional execution or build checks
- required host tools and availability probes
- degraded-sensor behavior when required tools are unavailable
- file patterns or scopes that identify node ownership inside mixed-language repositories
- approval and policy requirements for plugin-recommended commands or scripts
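One way to express part of this contract is as a Rust trait. The trait and method names are hypothetical, not Perspt's actual plugin API; the Rust commands mirror the default verifier stack described in this section.

```rust
/// Hypothetical capability contract; method names are illustrative only.
trait LanguagePlugin {
    fn name(&self) -> &'static str;
    /// Project detection for an existing repository.
    fn detect(&self, repo_files: &[&str]) -> bool;
    /// Ownership matching inside mixed-language repositories.
    fn owns(&self, path: &str) -> bool;
    /// LSP server to start, if any.
    fn lsp_server(&self) -> Option<&'static str>;
    /// Syntax/type verification command (argv form).
    fn syntax_command(&self) -> Vec<String>;
    fn test_command(&self) -> Vec<String>;
    /// Optional stricter lint stage.
    fn lint_command(&self) -> Option<Vec<String>> {
        None
    }
    /// Host tools probed before verification; a missing one means degraded sensors.
    fn required_tools(&self) -> Vec<&'static str>;
}

fn argv(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|s| s.to_string()).collect()
}

struct RustPlugin {
    strict: bool, // enables `cargo clippy -- -D warnings`
}

impl LanguagePlugin for RustPlugin {
    fn name(&self) -> &'static str {
        "rust"
    }
    fn detect(&self, repo_files: &[&str]) -> bool {
        repo_files.iter().any(|f| *f == "Cargo.toml" || f.ends_with("/Cargo.toml"))
    }
    fn owns(&self, path: &str) -> bool {
        path.ends_with(".rs") || path.ends_with("Cargo.toml")
    }
    fn lsp_server(&self) -> Option<&'static str> {
        Some("rust-analyzer")
    }
    fn syntax_command(&self) -> Vec<String> {
        argv(&["cargo", "check"])
    }
    fn test_command(&self) -> Vec<String> {
        argv(&["cargo", "test"])
    }
    fn lint_command(&self) -> Option<Vec<String>> {
        self.strict.then(|| argv(&["cargo", "clippy", "--", "-D", "warnings"]))
    }
    fn required_tools(&self) -> Vec<&'static str> {
        vec!["cargo", "rust-analyzer"]
    }
}

fn main() {
    let plugin = RustPlugin { strict: true };
    if plugin.detect(&["Cargo.toml", "src/main.rs"]) {
        println!("{} via {:?}, needs {:?}", plugin.name(), plugin.lsp_server(), plugin.required_tools());
        println!("{:?} / {:?} / {:?}", plugin.syntax_command(), plugin.test_command(), plugin.lint_command());
    }
}
```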

Plugin Contract Evolution

The language plugin interface SHALL evolve beyond the current init_command, test_command, and run_command model. The runtime SHALL support capabilities equivalent to:

project detection for existing repositories
new-project initialization for greenfield tasks
ownership matching and node scoping
structural interface extraction and digesting
LSP selection and fallback behavior
syntax/type verification command selection
build verification command selection
test verification command selection
execution command selection when runtime checks are appropriate
structural verification hooks for language-specific API checks
host-tool availability checks and degraded-validation handling
ownership or matching rules for files and nodes in multi-language workspaces

The plugin registry SHALL support more than one active plugin in a workspace. A polyglot repository SHALL be allowed to use multiple plugins simultaneously, with the Orchestrator selecting the relevant verifier stack per node.

If a plugin’s preferred verifier tool is unavailable on the host OS, the runtime SHALL enter a degraded-validation state, select a lower-confidence fallback if the plugin defines one, or escalate. It SHALL NOT claim verified stability solely because a sensor failed to start.
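The degraded-validation rule can be made concrete with a small aggregation function. The stage names follow the energy components used elsewhere in this PSP; the `Sensor` and `Verdict` types are hypothetical, and combining components as an unweighted sum is an assumption, since this section does not define the energy formula.

```rust
/// Result of one verifier stage (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
enum Sensor {
    Ok { energy: f64 },
    Unavailable { tool: String },
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Stable { energy: f64 },
    Unstable { energy: f64 },
    Degraded { missing: Vec<String> },
}

/// Combine stage results into a verdict. A missing sensor never counts as passing:
/// any unavailable tool yields Degraded, which the caller must surface, route to a
/// plugin-defined fallback, or escalate.
fn verdict(stages: &[(&str, Sensor)], epsilon: f64) -> Verdict {
    let missing: Vec<String> = stages
        .iter()
        .filter_map(|(stage, s)| match s {
            Sensor::Unavailable { tool } => Some(format!("{stage}: {tool} not found")),
            Sensor::Ok { .. } => None,
        })
        .collect();
    if !missing.is_empty() {
        return Verdict::Degraded { missing };
    }
    // Assumption: total energy is the plain sum of the component energies.
    let energy: f64 = stages
        .iter()
        .map(|(_, s)| match s {
            Sensor::Ok { energy } => *energy,
            Sensor::Unavailable { .. } => unreachable!("handled above"),
        })
        .sum();
    if energy <= epsilon {
        Verdict::Stable { energy }
    } else {
        Verdict::Unstable { energy }
    }
}

fn main() {
    let stages = [
        ("V_syn", Sensor::Ok { energy: 0.0 }),
        ("V_log", Sensor::Ok { energy: 0.4 }),
        ("V_boot", Sensor::Unavailable { tool: "cargo".into() }),
    ];
    // Degraded, never Stable, because the execution sensor could not run.
    println!("{:?}", verdict(&stages, 0.1));
}
```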

For Rust, the default verifier stack SHALL be:
- rust-analyzer diagnostics
- cargo check
- cargo test
- optional cargo clippy -- -D warnings in stricter modes

For Python, the default verifier stack SHALL be:
- ty or pyright diagnostics
- pytest
- optional execution checks when a runnable entry point exists

For JavaScript or TypeScript, the default verifier stack SHALL be:
- TypeScript or JavaScript LSP diagnostics
- the plugin-defined test command
- optional build command when the repository defines one

Multi-Language Repository Semantics

The registry SHALL expose all matching plugins for a repository, not only the first detected plugin.
Each task node SHALL be associated with a primary plugin based on its context files, output targets, or repository path.
Cross-language nodes MAY invoke more than one verifier stack when the change spans language boundaries.
Integration nodes SHOULD be the default mechanism for cross-plugin boundaries so that verification and escalation remain attributable.
Repository-wide status SHALL report active plugins and degraded sensors per plugin.
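A sketch of per-node plugin association by path, assuming hypothetical suffix-based ownership rules; real plugins may also match on repository paths or manifests. A node matched by several plugins is routed as a cross-language node.

```rust
use std::collections::BTreeSet;

/// Hypothetical ownership rules: plugin name -> path suffixes it owns.
const RULES: &[(&str, &[&str])] = &[
    ("rust", &[".rs", "Cargo.toml"]),
    ("python", &[".py", "pyproject.toml"]),
    ("typescript", &[".ts", ".tsx", "package.json"]),
];

/// Every plugin whose rules match at least one of the node's target files.
fn plugins_for_node(targets: &[&str]) -> BTreeSet<&'static str> {
    targets
        .iter()
        .flat_map(|t| {
            RULES
                .iter()
                .filter(move |(_, suffixes)| suffixes.iter().any(|s| t.ends_with(*s)))
                .map(|(name, _)| *name)
        })
        .collect()
}

/// One plugin -> that plugin's verifier stack; several -> a cross-language node,
/// which this PSP prefers to model as an integration node.
fn route(targets: &[&str]) -> String {
    let plugins = plugins_for_node(targets);
    match plugins.len() {
        0 => "no plugin matched: verified stability cannot be claimed".to_string(),
        1 => format!("verify with {}", plugins.iter().next().unwrap()),
        _ => format!("integration node, verify with {:?}", plugins),
    }
}

fn main() {
    println!("{}", route(&["src/lib.rs"]));
    println!("{}", route(&["bindings/ffi.rs", "web/client.ts"]));
}
```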

Tool Surfaces and OS Utility Governance

The runtime SHALL distinguish among four tool classes:

built-in content tools for file reads, writes, diffs, and searches
repository utilities such as checked-in scripts or make targets
language toolchain commands such as cargo, uv, pytest, npm, or equivalent plugin-defined tooling
temporary scripts generated for narrowly scoped automation when built-in tools or repository utilities are insufficient

The runtime SHOULD prefer structured built-in content tools for ordinary file manipulation.

The runtime MAY invoke OS utilities, repository-local scripts, language-native toolchain commands, or temporary generated scripts when the active plugin or node contract requires them. Such invocations SHALL be governed by the following rules:

mutating commands, networked commands, shell composition, and temporary scripts SHALL pass through canonicalization, sanitization, workspace/path policy, and sandbox selection
mutating or risky commands SHALL require explicit user approval before execution
free-form shell pipelines SHALL NOT be the primary artifact-editing model
command provenance SHALL be recorded in ledger-backed state alongside the node that requested the action
once the user approves the session policy, commands operating strictly within the current project or working folder MAY run autonomously when permitted by that policy and by the active plugin capability profile
commands that read from, write to, or execute against paths outside the current project or working folder SHALL always require explicit user approval for that action

This preserves practical access to host capabilities without turning unrestricted shell execution into the default editing semantics.
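The governance rules can be read as a decision function. This sketch is one interpretation, not a normative algorithm: it assumes a hypothetical `SessionPolicy` recording which classes of in-folder action the user pre-approved, and it resolves paths lexically without touching the filesystem (so symlinks are not followed).

```rust
use std::path::{Component, Path, PathBuf};

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Autonomous,
    AskUser(String),
    Reject(String),
}

/// Hypothetical session policy the user approved up front.
struct SessionPolicy {
    approved: bool,
    allow_mutation: bool,
    allow_network: bool,
}

/// A command after canonicalization: the paths it touches and its risk flags.
struct CommandRequest<'a> {
    paths: Vec<&'a str>,
    mutating: bool,
    networked: bool,
    sandboxed: bool,
}

/// Lexically resolve `p` against `root`, applying `.` and `..` without disk access.
fn resolve(root: &Path, p: &str) -> PathBuf {
    let mut out = PathBuf::new();
    for c in root.join(p).components() {
        match c {
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other),
        }
    }
    out
}

fn decide(root: &Path, req: &CommandRequest, policy: &SessionPolicy) -> Decision {
    if !req.sandboxed {
        return Decision::Reject("no sandbox selected".to_string());
    }
    // Anything outside the project folder always needs per-action approval.
    if let Some(p) = req.paths.iter().find(|p| !resolve(root, p).starts_with(root)) {
        return Decision::AskUser(format!("{p} is outside {}", root.display()));
    }
    if !policy.approved {
        return Decision::AskUser("session policy not yet approved".to_string());
    }
    if req.mutating && !policy.allow_mutation {
        return Decision::AskUser("mutating command not covered by session policy".to_string());
    }
    if req.networked && !policy.allow_network {
        return Decision::AskUser("network access not covered by session policy".to_string());
    }
    Decision::Autonomous
}

fn main() {
    let root = Path::new("/work/app");
    let policy = SessionPolicy { approved: true, allow_mutation: true, allow_network: false };
    let build = CommandRequest { paths: vec!["target"], mutating: true, networked: false, sandboxed: true };
    let peek = CommandRequest { paths: vec!["../secrets.env"], mutating: false, networked: false, sandboxed: true };
    println!("{:?}", decide(root, &build, &policy)); // Autonomous
    println!("{:?}", decide(root, &peek, &policy)); // AskUser(...)
}
```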

Model and Provider Compatibility

This PSP SHALL support operation with high-capability models from multiple providers, including Gemini 3.1 Pro class models, GPT 5.4 class models, and Claude Opus 4.6 class models, without requiring provider-specific workflow semantics.

The runtime SHALL define model behavior in terms of capability classes rather than vendor identity:

Architect tier: strong planning, decomposition, and structured-output reliability.
Actuator tier: strong code generation and patch generation reliability.
Verifier tier: strong instruction following, summarization, and deterministic evaluation formatting.
Speculator tier: lower-cost fast lookahead or branch exploration.

Compatibility Requirements

The runtime SHALL accept explicit per-tier model configuration independent of provider.
The PSP SHALL NOT depend on hidden chain-of-thought, provider-specific reasoning tokens, or proprietary tool-call protocols to remain functional.
Structured outputs SHALL be defined by Perspt-owned response contracts and parsers, not by assuming one provider’s native JSON mode is always available.
Planner, actuator, and verifier outputs SHALL tolerate common provider variation such as markdown fences, explanatory preambles, and minor formatting noise when extracting structured content.
Streaming behavior SHALL NOT be required for correctness. Non-streaming responses SHALL remain valid for all stages.
The system SHALL support retry, backoff, and model fallback policies when a configured model returns malformed structured output or transient provider errors.
The execution protocol SHALL remain text-first and portable across providers even when one provider offers stronger native tool or JSON features than another.

Provider-Neutral Output Contracts

Planning output SHALL be parseable from plain text responses containing a JSON object, fenced JSON, or a normalized extracted body.
Artifact bundle output SHALL use a Perspt-defined pure JSON schema. Markdown with embedded blocks MAY be tolerated only as a normalization fallback for backward compatibility and SHALL NOT be the normative format.
Verification summaries SHALL use a stable Perspt schema for status, violations, evidence, and suggested corrections.
The runtime SHALL normalize provider responses before parsing so that equivalent Gemini, GPT, and Claude outputs enter the same execution path.
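Normalization can be sketched as locating a JSON object inside a free-form reply. These are hypothetical helpers that only find the candidate text; a real JSON parser and the Perspt schema check still have to follow. Braces inside string literals are skipped so they do not end the object early.

```rust
/// Three backticks, spelled with escapes so this example can sit inside a fenced block.
const FENCE: &str = "\u{60}\u{60}\u{60}";

/// Return the first balanced JSON object in `text`, ignoring braces inside strings.
fn extract_json_object(text: &str) -> Option<&str> {
    let start = text.find('{')?;
    let (mut depth, mut in_string, mut escaped) = (0usize, false, false);
    for (i, ch) in text[start..].char_indices() {
        if in_string {
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(&text[start..start + i + 1]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced: treat as malformed structured output
}

/// Prefer the body of a fenced json block when present; otherwise scan the whole
/// response, which skips explanatory preambles and trailing prose.
fn normalize_response(response: &str) -> Option<&str> {
    let opener = format!("{}json", FENCE);
    if let Some(open) = response.find(&opener) {
        let body = &response[open + opener.len()..];
        if let Some(close) = body.find(FENCE) {
            if let Some(obj) = extract_json_object(&body[..close]) {
                return Some(obj);
            }
        }
    }
    extract_json_object(response)
}

fn main() {
    let reply = format!(
        "Sure, here is the bundle:\n{}json\n{{\"artifacts\": [], \"commands\": []}}\n{}\nLet me know!",
        FENCE, FENCE
    );
    println!("{:?}", normalize_response(&reply));
}
```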

Model Selection and Fallback Rules

Users SHALL be able to set one model for all tiers or specify different models per tier.
The runtime MAY use the same model for Architect, Actuator, Verifier, and Speculator tiers when the chosen model is sufficiently capable.
The runtime SHOULD allow lower-cost models for the Speculator tier and higher-reliability models for the Architect or Verifier tiers.
If a model repeatedly fails the structured-output contract for a stage, the runtime SHALL escalate, retry with stricter normalization, or fall back to a configured alternate model for that tier.
A provider outage or rate limit SHALL NOT corrupt the ledger or cause the system to claim stable convergence.
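A sketch of per-tier configuration with fallback, assuming hypothetical model identifiers and a caller-supplied `call` closure that stands in for the provider request plus Perspt-side normalization and parsing. Backoff between attempts is omitted for brevity.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Tier {
    Architect,
    Actuator,
    Verifier,
    Speculator,
}

/// Hypothetical per-tier configuration: a primary model plus ordered alternates.
struct TierConfig {
    primary: String,
    fallbacks: Vec<String>,
    attempts_per_model: u32,
}

/// Per-tier override if present, otherwise the single session-wide default.
fn config_for<'a>(tier: Tier, overrides: &'a HashMap<Tier, TierConfig>, default: &'a TierConfig) -> &'a TierConfig {
    overrides.get(&tier).unwrap_or(default)
}

/// Try the primary model, then each alternate. Err from `call` covers malformed
/// structured output and transient provider errors alike.
fn run_stage(cfg: &TierConfig, mut call: impl FnMut(&str) -> Result<String, String>) -> Result<String, String> {
    let mut last_err = String::from("no attempts made");
    for model in std::iter::once(&cfg.primary).chain(cfg.fallbacks.iter()) {
        for _ in 0..cfg.attempts_per_model {
            match call(model.as_str()) {
                Ok(output) => return Ok(output),
                Err(e) => last_err = format!("{model}: {e}"),
            }
        }
    }
    // Nothing is committed on this path, so a failed stage never touches the ledger.
    Err(format!("escalate: all configured models failed (last: {last_err})"))
}

fn main() {
    let default = TierConfig { primary: "model-a".into(), fallbacks: vec!["model-b".into()], attempts_per_model: 2 };
    let mut overrides = HashMap::new();
    overrides.insert(Tier::Speculator, TierConfig { primary: "small-model".into(), fallbacks: vec![], attempts_per_model: 1 });
    let cfg = config_for(Tier::Architect, &overrides, &default);
    // Simulated provider: the primary keeps returning malformed plans, the alternate succeeds.
    let result = run_stage(cfg, |model| if model == "model-b" { Ok("{}".to_string()) } else { Err("malformed plan".to_string()) });
    println!("{:?}", result);
}
```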

Sheaf Validators for Large Projects

The verifier SHALL include explicit sheaf validators for project-scale and polyglot repositories. These validators define how cross-node consistency is checked after child nodes converge and before parent nodes are committed.

Required validator classes are:

Export and import consistency: exported symbols, trait implementations, type definitions, and module imports SHALL match the interfaces promised by dependency nodes.
Dependency graph consistency: repository dependency edges SHALL remain acyclic where required, and node-local changes SHALL NOT introduce invalid module or package references.
Schema and contract compatibility: JSON schemas, API request and response types, configuration formats, migration shapes, and serialization contracts SHALL remain compatible across producer and consumer nodes.
Build graph consistency: plugin-selected build targets SHALL remain satisfiable for the affected subgraph, not only for the single edited file.
Test ownership consistency: failing tests SHALL be attributed to the owning node or interface boundary so the correction loop can requeue the correct node set.
Cross-language boundary consistency: FFI layers, generated clients, protocol bindings, and API contracts crossing plugin boundaries SHALL be validated with the relevant plugin stack.
Policy and invariant consistency: repository-wide invariants and forbidden patterns inherited from parent scopes SHALL still hold after child convergence.

Validator Execution Rules

Sheaf validation SHALL run after child nodes converge and before the parent node is considered stable.
Validators SHALL operate on committed summaries and current repository state, not only on prompt text.
A sheaf validation failure SHALL identify the violated boundary, the affected node or node set, and the evidence used by the validator.
Failures SHALL increase V_sheaf and requeue only the affected node set where possible instead of restarting the whole project graph.
When a validator cannot produce trustworthy evidence because sensors are degraded, the runtime SHALL surface a degraded-validation state and escalate rather than claim stability.

Sheaf Validation Outputs

Each sheaf validation pass SHALL emit:

validated boundary or boundaries
validator class and plugin source
pass or fail status
evidence summary and affected files or interfaces
resulting V_sheaf contribution
recommended requeue targets when validation fails
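One way to carry these outputs and apply the execution rules above is sketched here. All type and function names are hypothetical and are not claimed to match Perspt's `SheafValidationResult`. The sketch assumes a degraded run is reported as its own status. Any degraded result escalates instead of claiming stability. Failures sum into V_sheaf and requeue only the union of affected nodes.

```rust
/// Status of one validator run; `Degraded` marks untrustworthy evidence (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Pass,
    Fail,
    Degraded,
}

/// Hypothetical record holding the outputs every sheaf validation pass must emit.
#[derive(Debug, Clone)]
pub struct SheafValidation {
    pub boundaries: Vec<String>,      // validated boundary or boundaries
    pub validator_class: String,      // e.g. "export_import"
    pub plugin_source: String,        // e.g. "rust"
    pub status: Status,
    pub evidence: String,             // evidence summary, affected files or interfaces
    pub v_sheaf: f64,                 // resulting V_sheaf contribution
    pub requeue_targets: Vec<String>, // node IDs to requeue on failure
}

/// What the runtime does with all validation results for one parent node.
#[derive(Debug, PartialEq)]
pub enum ParentDecision {
    Stable,
    Requeue { nodes: Vec<String>, v_sheaf: f64 },
    Escalate { reason: String },
}

pub fn decide(results: &[SheafValidation]) -> ParentDecision {
    // Degraded evidence must never be reported as stability.
    if let Some(d) = results.iter().find(|r| r.status == Status::Degraded) {
        return ParentDecision::Escalate { reason: format!("degraded validator: {}", d.validator_class) };
    }
    let failed: Vec<&SheafValidation> = results.iter().filter(|r| r.status == Status::Fail).collect();
    if failed.is_empty() {
        return ParentDecision::Stable;
    }
    // Requeue only the affected node set, each node once.
    let mut nodes: Vec<String> = failed.iter().flat_map(|r| r.requeue_targets.iter().cloned()).collect();
    nodes.sort();
    nodes.dedup();
    let v_sheaf: f64 = failed.iter().map(|r| r.v_sheaf).sum();
    ParentDecision::Requeue { nodes, v_sheaf }
}
```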

5. Discrete Adaptive Speculation Pipeline

To hide verifier latency without weakening the stability barrier, the runtime SHALL implement discrete speculation through provisional branches rather than through implicit asynchronous optimism.

Provisional Branches: Speculative child work SHALL be stored separately from committed ledger state.
Interface Seal Prerequisite: Downstream speculation SHALL be allowed only when the parent node’s public interface is sealed and hashed.
Sandboxed Verification: Background verifier stages SHALL execute against provisional branches, cloned workspaces, or plugin-defined sandbox boundaries.
Blocked Dependents: Children that depend on unstable parent implementation details SHALL remain blocked until the relevant interface is sealed.
Flush-on-Failure: If parent verification fails, the runtime SHALL flush or prune dependent speculative branches and replay only the surviving branch state.
No False Commit: Provisional work SHALL NOT be merged into the global ledger until the parent node meets the stability threshold.

A lower-cost Speculator tier MAY assist with branch exploration, but it does not by itself satisfy this pipeline contract; the provisional branch and flush semantics must also be implemented.
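The interface-seal gate and the flush-on-failure rule can be sketched with a small in-memory model. The state names mirror `ProvisionalBranchState` from the Phase 6 appendix. The `Speculation` struct and its methods are hypothetical. A breadth-first walk of the task DAG flushes every live descendant branch. Merged branches are skipped because they are already committed.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BranchState {
    Active,
    Sealed,
    Merged,
    Flushed,
}

/// Hypothetical in-memory view of provisional work, kept apart from committed state.
pub struct Speculation {
    pub children: HashMap<String, Vec<String>>, // parent node ID -> child node IDs
    pub branches: HashMap<String, BranchState>, // node ID -> provisional branch state
    pub sealed: HashSet<String>,                // parents whose interface is sealed and hashed
}

impl Speculation {
    /// Downstream speculation is allowed only under a sealed parent interface.
    pub fn may_speculate(&self, parent: &str) -> bool {
        self.sealed.contains(parent)
    }

    /// On parent verification failure, flush every live descendant branch (BFS over the DAG).
    /// Returns the flushed node IDs in visit order.
    pub fn flush_descendants(&mut self, parent: &str) -> Vec<String> {
        let mut flushed = Vec::new();
        let mut seen = HashSet::new();
        let mut queue: VecDeque<String> = self.children.get(parent).cloned().unwrap_or_default().into();
        while let Some(node) = queue.pop_front() {
            if !seen.insert(node.clone()) {
                continue; // a DAG node can be reachable through several parents
            }
            if let Some(state) = self.branches.get_mut(&node) {
                if matches!(*state, BranchState::Active | BranchState::Sealed) {
                    *state = BranchState::Flushed;
                    flushed.push(node.clone());
                }
            }
            if let Some(kids) = self.children.get(&node) {
                queue.extend(kids.iter().cloned());
            }
        }
        flushed
    }
}
```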

6. SRBN Energy Model Implementation

The runtime SHALL implement the practical energy model:

V(x) = αV_syn + βV_str + γV_log + V_boot + V_sheaf

With the following concrete meanings:

V_syn: LSP and compiler diagnostics
V_str: contract violations including interface mismatches, missing symbols, forbidden patterns, and invariant failures
V_log: targeted test failures weighted by criticality
V_boot: command or runtime execution failures when part of the contract
V_sheaf: cross-node consistency failures after child convergence

The Verifier SHALL own energy computation. The Actuator SHALL NOT be considered the source of truth for correctness.
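The energy model above translates directly into code. This sketch assumes hypothetical `Energy` and `Weights` types. The PSP does not fix α, β, or γ. Unit weights are used in the usage check because they reproduce the 0.55 total shown for node 2 in the example transcript. Treating the threshold as a strict upper bound is also an assumption.

```rust
/// Verifier-computed energy components for one node, named after the terms of V(x).
#[derive(Debug, Clone, Copy)]
pub struct Energy {
    pub v_syn: f64,   // LSP and compiler diagnostics
    pub v_str: f64,   // contract violations
    pub v_log: f64,   // weighted targeted test failures
    pub v_boot: f64,  // command or runtime execution failures
    pub v_sheaf: f64, // cross-node consistency failures
}

/// Weights for the three weighted terms; the PSP leaves their values open.
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub alpha: f64,
    pub beta: f64,
    pub gamma: f64,
}

impl Energy {
    /// V(x) = alpha*V_syn + beta*V_str + gamma*V_log + V_boot + V_sheaf
    pub fn total(&self, w: &Weights) -> f64 {
        w.alpha * self.v_syn + w.beta * self.v_str + w.gamma * self.v_log + self.v_boot + self.v_sheaf
    }

    /// Stability check; a strict comparison against the threshold is an assumption.
    pub fn is_stable(&self, w: &Weights, threshold: f64) -> bool {
        self.total(w) < threshold
    }
}
```

V_boot and V_sheaf stay unweighted, matching the formula as written.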

7. Escalation and Local Graph Rewrite

Retry exhaustion SHALL be treated only as a trigger for escalation, not as the escalation semantics themselves.

The runtime SHALL classify non-convergence into one or more of the following categories:

implementation error
contract mismatch
insufficient model capability
degraded sensors or tool unavailability
topology mismatch

When escalation is required, the runtime SHALL choose one or more ordered repair actions:

correction retry with grounded verifier evidence
contract repair or interface-node refinement
capability-tier promotion
sensor recovery or degraded-validation stop
node split by ownership closure
insertion of an interface node
local subgraph re-planning
explicit user escalation with stored evidence

Escalation outputs SHALL identify the violated boundary, the affected node set, the evidence used to classify the failure, and the recommended rewrite or stop action.
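A classifier and a least-drastic repair choice can be sketched as follows. The priority order follows the one given for `classify_non_convergence` in the Phase 5 appendix. The `Evidence` flags, the `Repair` subset, and the one-to-one category-to-repair mapping are hypothetical simplifications. In particular, treating a repeated identical error as a sign of insufficient model capability is an assumption.

```rust
/// Non-convergence categories from the list above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Category {
    DegradedSensors,
    ContractMismatch,
    TopologyMismatch,
    InsufficientModelCapability,
    ImplementationError,
}

/// A subset of the repair actions above: one least-drastic choice per category.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Repair {
    SensorRecoveryOrStop,
    ContractRepair,
    SplitNodeOrInsertInterface,
    CapabilityPromotion,
    GroundedRetry,
}

/// Hypothetical evidence flags; a real verifier would derive these from diagnostics,
/// ownership manifests, and retry history.
pub struct Evidence {
    pub sensors_degraded: bool,
    pub interface_mismatch: bool,
    pub crosses_ownership: bool,
    pub same_error_repeated: bool,
}

/// Checks categories in priority order so that untrustworthy sensors are ruled out
/// before a failure is blamed on the contract, the topology, the model, or the code.
pub fn classify(e: &Evidence) -> Category {
    if e.sensors_degraded {
        Category::DegradedSensors
    } else if e.interface_mismatch {
        Category::ContractMismatch
    } else if e.crosses_ownership {
        Category::TopologyMismatch
    } else if e.same_error_repeated {
        Category::InsufficientModelCapability
    } else {
        Category::ImplementationError
    }
}

pub fn repair_for(c: Category) -> Repair {
    match c {
        Category::DegradedSensors => Repair::SensorRecoveryOrStop,
        Category::ContractMismatch => Repair::ContractRepair,
        Category::TopologyMismatch => Repair::SplitNodeOrInsertInterface,
        Category::InsufficientModelCapability => Repair::CapabilityPromotion,
        Category::ImplementationError => Repair::GroundedRetry,
    }
}
```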

8. Safety, Policy, and Persistence

All command proposals SHALL pass through canonicalization, sanitization, policy review, and sandbox selection before execution.
Path resolution SHALL be workspace-bounded by default.
Stable node completion SHALL record node state, artifact metadata, energy values, retry counts, and Merkle hash material in the ledger.
Provisional branch execution SHALL record branch lineage, interface seals, and flush decisions separately from committed node state.
Status and resume commands SHALL derive their view from ledger-backed state, not assumptions about in-memory progress.
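Workspace-bounded path resolution can be sketched as a purely lexical check. `resolve_in_workspace` is a hypothetical name, and the sketch assumes the workspace path is already absolute and normalized. It collapses `.` and `..` components and rejects any result outside the workspace. It does not follow symlinks, so a production check would also canonicalize existing paths.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolves `candidate` against `workspace`; returns None if it escapes.
/// Sketch only: symlinks are not resolved.
pub fn resolve_in_workspace(workspace: &Path, candidate: &Path) -> Option<PathBuf> {
    // An absolute candidate replaces the workspace prefix here and is caught below.
    let joined = workspace.join(candidate);
    let mut out = PathBuf::new();
    for comp in joined.components() {
        match comp {
            Component::ParentDir => {
                // Popping past the filesystem root is itself an escape.
                if !out.pop() {
                    return None;
                }
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    // starts_with compares whole components, so "/wsx" does not match "/ws".
    if out.starts_with(workspace) {
        Some(out)
    } else {
        None
    }
}
```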

TUI and Headless Workflow Requirements:

The implementation SHALL support both interactive TUI sessions and headless CLI runs with equivalent semantic stages.

In TUI mode, the user SHALL see:
- repository detection and active plugin list
- task graph and current node
- grouped diff review for node changes
- verifier output with energy breakdown
- explicit approval actions and retry/escalation state

In headless mode, Perspt SHALL print concise stage-oriented output such as:
- PLAN: detected plugins, top-level nodes, expected files
- NODE: current node ID and goal
- DIFF: files changed in the proposed bundle
- VERIFY: summarized results from LSP, build, and tests
- ENERGY: component breakdown and threshold comparison
- COMMIT: node committed and ledger updated
- OUTCOME: session outcome (Success, PartialSuccess, or Failed)
- CONTEXT: summaries, contracts, and dependency hashes used for the current node

The user experience SHALL remain consistent across both surfaces, with the TUI offering richer inspection rather than different semantics.
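A shared stage type is one way to keep headless output consistent with the TUI. The sketch below formats headless lines as shown in the example transcript. `Stage` and `stage_line` are hypothetical names, and the 8-column label field is inferred from the transcript's alignment rather than specified. RETRY and SUMMARY, which appear only in the transcript, are omitted.

```rust
/// Headless stages listed above (hypothetical enum; not taken from the codebase).
#[derive(Debug, Clone, Copy)]
pub enum Stage {
    Plan,
    Node,
    Diff,
    Verify,
    Energy,
    Commit,
    Outcome,
    Context,
}

impl Stage {
    pub fn label(self) -> &'static str {
        match self {
            Stage::Plan => "PLAN",
            Stage::Node => "NODE",
            Stage::Diff => "DIFF",
            Stage::Verify => "VERIFY",
            Stage::Energy => "ENERGY",
            Stage::Commit => "COMMIT",
            Stage::Outcome => "OUTCOME",
            Stage::Context => "CONTEXT",
        }
    }
}

/// Left-aligns the label in an 8-column field, then appends the stage body.
pub fn stage_line(stage: Stage, body: &str) -> String {
    format!("{:<8}{}", stage.label(), body)
}
```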

Configuration:

This PSP permits implementation of the following configuration capabilities:

explicit single-file mode
explicit single-file CLI flag
verifier strictness presets
per-language verifier command overrides
per-language LSP overrides and fallback ordering
workspace path policy mode
optional auto-approval of read-only commands only
per-tier model selection and optional per-tier fallback model
structured-output normalization strictness

Performance:

Multi-file artifact parsing SHALL be linear in the number of operations.
Verification SHALL prefer incremental LSP updates, structural digest reuse, and targeted repo commands rather than full ad hoc scans where possible.
Context assembly SHALL prefer structural digests for external dependencies, raw contents for node-owned files, and bounded semantic summaries for rationale and resume.
Output normalization and parsing SHALL be linear in response size and independent of provider-specific SDK features.
Strict modes MAY reduce speculative concurrency and trade latency for stronger verifier confidence.

Implementation Phases:

All phases described below have been implemented. See the Reference Implementation section for per-phase implementation details.

Phase 1: Project-first execution model, deterministic planner fallback, and capability-based plugin contracts
Phase 2: Ownership manifests, node classes, and bounded multi-artifact validation
Phase 3: Stratified restriction maps, structural digests, and context provenance
Phase 4: Provider-neutral output contracts, repo-native verifier attribution, and degraded-sensor handling
Phase 5: Escalation semantics, local graph rewrite, and sheaf validator targeting
Phase 6: Provisional branch ledger, interface-sealed speculation, and branch flush behavior
Phase 6a: Workspace classification, tool prerequisites, and model fallback
Phase 7: Review UX parity and provenance-rich status surfaces
Phase 8: Ledger-backed node commits, sheaf validation, and resume correctness
Phase 9: Universal verification pipeline and bundle command execution
Phase 10: uv-first Python developer experience

Acceptance Criteria

A request for a new application with modules and tests results in a multi-file plan and a multi-file applied project by default.
Rust repositories use Rust-native verification rather than Python-only checks.
In default interactive Rust mode, successful compilation and test execution are required, while warnings MAY remain unless a stricter verifier preset is selected.
Language plugins are used for runtime verification, not only initialization and tooling sync.
Polyglot repositories can activate more than one plugin, and node verification selects the relevant plugin stack.
A single node can create or modify multiple files in one iteration only when those files remain inside one ownership closure or explicit integration boundary.
Each node executes with a reproducible, bounded context package derived from task-graph restrictions, structural digests, and committed state.
Resume reconstructs node context from ledger-backed structural artifacts, summaries, and repository state rather than relying on mutable conversational history.
A node is rejected or decomposed when it crosses ownership domains without being an integration node.
A Rust node cannot proceed when only prose exists for a required public trait, schema, or equivalent structural dependency.
Plugin-selected verification is chosen per node from declared capabilities and available host tools.
Missing build, LSP, or test tools produce degraded validation or escalation rather than false stable status.
Speculative child branches are flushed when parent verification fails.
A non-convergent node results in local graph rewrite, degraded-validation stop, or explicit user escalation rather than retry exhaustion alone.
The user can review all node changes in a diff surface before approving.
Status and resume reflect actual persisted node state.
Policy and sandbox enforcement are applied to every command execution path.
Shell utilities, repository scripts, and temporary scripts require approval and policy checks before mutating repository state.
Once the user approves the session policy, Perspt can operate autonomously within the current project or working folder.
Any operation outside the current project or working folder requires explicit user approval at the time of that operation.
The same PSP 000005 execution flow works with a Gemini 3.1 Pro class model, a GPT 5.4 class model, or a Claude Opus 4.6 class model when configured for the relevant tiers.
Planner and artifact parsing remain correct when the model returns fenced JSON or plain text with minor wrapper text.
A malformed structured response triggers normalization or retry behavior rather than silent plan corruption.

Rationale

The primary goal of this PSP is to turn SRBN from an architectural promise into an operational coding workflow. PSP 000004 defined the conceptual topology, but its implementation scope remained too broad and too aspirational to guarantee correct execution semantics. This continuation proposal narrows the problem into concrete runtime contracts that can be implemented, tested, and reviewed incrementally.

Design Decision Rationale:

Multi-file project creation must be a first-class runtime behavior because software engineering tasks rarely map to a single file.
Multi-file execution must be bounded by ownership closure and interface locality so that verifier attribution remains practical.
Verification must be repository-native because file-local checks cannot provide trustworthy stability for real codebases.
The plugin layer must own runtime language behavior because initialization-only plugins do not provide enough structure for SRBN verification.
Structured artifact bundles are preferred over free-form code blocks because they allow explicit parsing, transactional apply, and better review UX.
Structural context must be separated from semantic summaries so that compile-critical correctness does not depend on lossy natural-language descriptions.
Real repositories require governed access to OS utilities and language-native tools, but such access must remain policy-checked and approval-gated.
A separate verifier stage is required to align with SRBN’s barrier model and to avoid treating the generator as its own correctness oracle.
Ledger-backed state is required if status, resume, and SRBN commit semantics are to be reliable.

Alternatives Considered:

Alternative 1: Keep PSP 000004 broad and fix issues opportunistically in code. This was not chosen because the current gaps are architectural and need a clearer implementation contract.
Alternative 2: Replace SRBN with a simpler Codex-style imperative loop. This was not chosen because it would abandon the stability, contract, and ledger goals that distinguish Perspt.
Alternative 3: Keep single-file node semantics and rely on more planning granularity. This was not chosen because many real tasks require coordinated edits across multiple files within one logical unit of work.
Alternative 4: Pass large repository slices directly into each node prompt. This was not chosen because it would recreate the mutable-context failure mode that SRBN is meant to avoid.
Alternative 5: Enforce one-file-per-node as a universal rule. This was not chosen because it is too rigid for atomic multi-file changes and does not by itself guarantee verifier locality.
Alternative 6: Make shell utilities and temporary scripts the default editing model. This was not chosen because it weakens policy control, provenance, and structured review for ordinary artifact generation.
Alternative 7: Require manual approval for every in-project command even after session policy approval. This was not chosen because it would prevent practical autonomous project development inside a trusted working folder.

UI/UX Design Rationale:

User confidence in Codex-like and Gemini-like coding CLIs comes from transparency: diffs, verification output, and explicit approvals. Perspt should adopt those strengths while retaining SRBN’s formal stability model.
Grouping file changes by node preserves the SRBN mental model while still making review understandable for users.
Keyboard-first review and labeled status output fit Perspt’s terminal-first design philosophy.

Backwards Compatibility

User Impact:

Existing perspt agent invocations will continue to work, but many requests that currently degrade to single-file execution will instead run as project-oriented SRBN sessions.
Users will gain more structured review surfaces and more accurate status information.
Some previously auto-executed commands may now require explicit approval or policy compliance.

Configuration Impact:

Existing CLI flags remain valid.
Explicit single-file mode and verifier strictness SHALL be exposed as additive CLI capabilities.
No immediate migration of user configuration files is required.

Migration Strategy:

Project-first behavior is the default now that multi-artifact execution is implemented.
Solo Mode is retained behind explicit intent detection or an explicit CLI flag.
Session resume operates under the new ledger-backed semantics introduced in Phase 8.

Reference Implementation

Implementation Notes:

The implementation work primarily affects:

crates/perspt-agent/src/orchestrator.rs
crates/perspt-agent/src/agent.rs
crates/perspt-agent/src/context_retriever.rs
crates/perspt-agent/src/tools.rs
crates/perspt-core/src/llm_provider.rs
crates/perspt-agent/src/lsp.rs
crates/perspt-agent/src/test_runner.rs
crates/perspt-core/src/plugin.rs
crates/perspt-core/src/events.rs
crates/perspt-core/src/types.rs
crates/perspt-tui/src/agent_app.rs
crates/perspt-tui/src/diff_viewer.rs
crates/perspt-tui/src/review_modal.rs
crates/perspt-agent/src/ledger.rs
crates/perspt-store/src/store.rs

Branch:

Initial drafting branch: psp5

Testing Strategy:

parser tests for multi-artifact bundles
orchestration tests for multi-file node execution
plugin-driven verifier tests for Rust and Python repositories
TUI event tests for diff and approval flows
resume and ledger persistence tests for node-level session recovery
cross-provider contract tests covering Gemini, GPT, and Claude style responses for plan parsing, artifact extraction, and verifier summaries

Example Headless CLI Transcript:

$ perspt agent "build a Rust CLI todo app with tests and JSON storage"
PLAN    plugins=rust  nodes=5  repo_mode=project
PLAN    node[1]=scaffold crate + cli entrypoints
PLAN    node[2]=domain model + JSON store
PLAN    node[3]=command handlers + tests
PLAN    node[4]=integration cleanup + docs
PLAN    node[5]=final sheaf validation

NODE    id=1 goal="scaffold crate + cli entrypoints"
DIFF    create Cargo.toml, src/main.rs, src/lib.rs, tests/cli_smoke.rs
VERIFY  rust-analyzer=0 diagnostics  cargo check=pass  cargo test=pass
ENERGY  syn=0 str=0 log=0 boot=0 sheaf=0 total=0.00 threshold=0.10
COMMIT  node=1 merkle=7e31f1e8 ledger=updated

NODE    id=2 goal="domain model + JSON store"
DIFF    modify src/lib.rs, create src/store.rs, create src/model.rs
VERIFY  rust-analyzer=1 diagnostic  cargo check=fail  cargo test=not-run
ENERGY  syn=0.35 str=0.20 log=0 boot=0 sheaf=0 total=0.55 threshold=0.10
RETRY   reason="missing Serialize derive for TodoRecord"

NODE    id=2 retry=1
DIFF    modify src/model.rs, src/store.rs
VERIFY  rust-analyzer=0 diagnostics  cargo check=pass  cargo test=pass
ENERGY  syn=0 str=0 log=0 boot=0 sheaf=0 total=0.00 threshold=0.10
COMMIT  node=2 merkle=44d0ad2b ledger=updated

SUMMARY completed=5/5 escalated=0 outcome=Success active_plugins=rust

Example TUI Review Transcript:

Perspt Agent
Workspace: ./todo-cli      Plugins: rust      Session: 01HV7...

Task Graph
> [2/5] domain model + JSON store
  [3/5] command handlers + tests
  [4/5] integration cleanup + docs

Proposed Node Bundle
Files: src/model.rs, src/store.rs, src/lib.rs
Commands: cargo check, cargo test

Diff Summary
+ struct TodoRecord { id, title, done }
+ impl JsonStore::load / save
~ pub mod store; pub mod model;

Verification
rust-analyzer: 0 diagnostics
cargo check: pass
cargo test: 12 passed
Energy: syn=0 str=0 log=0 boot=0 sheaf=0 total=0.00

Actions: [a]pprove  [r]eject  [e]dit externally  [v]iew full diff  [q]uit
> a

Commit Result
Node 2 committed. Ledger hash: 44d0ad2b. Next node unlocked: command handlers + tests.

File-by-File Implementation Appendix:

Phase 1: Project-first execution model and deterministic planner fallback (implemented)

crates/perspt-agent/src/orchestrator.rs: replace Solo Mode heuristics with explicit single-file intent detection, add deterministic fallback graph generation, and route default execution through project planning.
crates/perspt-agent/src/agent.rs: revise planner and actuator prompts to describe node contracts, expected outputs, and multi-file scope.
crates/perspt-core/src/events.rs: add plan-ready, node-selected, and fallback-planner events needed by CLI and TUI status surfaces.
crates/perspt-cli/src/commands/agent.rs: print structured plan output and expose project-mode defaults in headless sessions.
crates/perspt-agent/src/context_retriever.rs: add graph-aware context assembly primitives, ownership manifests, and structural digest retrieval instead of raw file-list expansion.
crates/perspt-core/src/llm_provider.rs and crates/perspt-cli/src/commands/agent.rs: support explicit per-tier model selection and fallback configuration independent of provider.

Phase 2: Ownership manifests, node classes, and bounded multi-artifact validation (implemented)

crates/perspt-agent/src/agent.rs: replace single-target output instructions with structured bundle instructions covering multiple file operations.
crates/perspt-agent/src/orchestrator.rs: classify node types, validate ownership closure before mutation, and apply bounded bundles atomically.
crates/perspt-agent/src/tools.rs: enforce workspace-bounded path resolution and transactional file operation helpers.
crates/perspt-core/src/types.rs: define artifact bundle, ownership-manifest, interface-seal, and node-class types shared across orchestration, storage, and UI.

Phase 3: Stratified restriction maps, structural digests, and context provenance (implemented)

crates/perspt-agent/src/context_retriever.rs: build reproducible node context packages from structural artifacts, semantic summaries, target files, and verifier state.
crates/perspt-agent/src/orchestrator.rs: persist and replay context package hashes, enforce context budgets, and block execution when required structural artifacts are missing.
crates/perspt-core/src/types.rs: define context-package, structural-digest, summary-digest, and restriction-map types.
crates/perspt-agent/src/ledger.rs and crates/perspt-store/src/store.rs: persist structural hashes, semantic summary hashes, and context provenance needed for resume.

Phase 4: Provider-neutral output contracts and normalization (implemented)

crates/perspt-core/src/normalize.rs (new): provider-neutral JSON extraction from LLM responses with fenced-block, direct-JSON, and embedded-JSON strategies. Provider-family classification (ProviderFamily) and NormalizedOutput / ExtractionMethod types.
crates/perspt-agent/src/orchestrator.rs: planner and artifact parsing route through extract_json / extract_and_deserialize before legacy File:/Diff: fallback. Per-node plugin selection via owner_plugin, lsp_key_for_file for plugin-routed diagnostics, and degraded-validation blocking in step_converge.
crates/perspt-core/src/plugin.rs: VerifierStage, VerifierCapability (with effective_command primary/fallback), LspCapability, LspConfig, and VerifierProfile. RustPlugin, PythonPlugin, JsPlugin implement granular verifier and LSP declarations. PluginRegistry::detect_all probes host binaries at startup.
crates/perspt-agent/src/test_runner.rs: TestRunnerTrait with run_lint / run_stage, PluginVerifierRunner (generic command executor gated by policy sanitisation and workspace-bound checks), and test_runner_for_profile factory.
crates/perspt-agent/src/lsp.rs: LspClient::from_config(LspConfig) constructor, start_with_config, and language_id field for multi-plugin LSP lifecycle.
crates/perspt-core/src/types.rs: SensorStatus (Available / Fallback / Unavailable), StageOutcome, and VerificationResult::has_degraded_stages / degraded_stage_reasons for runtime degradation tracking.
crates/perspt-core/src/events.rs: SensorFallback and DegradedVerification event variants.
crates/perspt-agent/src/tools.rs and crates/perspt-policy/src/sanitize.rs: validate_workspace_bound for path-escape detection; run_command and exec_command enforce sanitisation before execution.
crates/perspt-cli/src/commands/agent.rs: startup detects active plugins via PluginRegistry::detect_all, reports verifier stages and LSP status, and uses detected plugins for start_lsp_for_plugins.
Test coverage: normalization, plugin capabilities, verifier runners, degraded-state propagation, policy rejection, and orchestrator routing.
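The three extraction strategies named for `normalize.rs` can be sketched without any provider SDK features. `locate_json` and `Strategy` are hypothetical names, not the real `extract_json` or `ExtractionMethod`. The sketch only locates a candidate JSON span; deserialization, and any retry on a malformed payload, stays with the caller. Each strategy is a single forward or backward scan, so the work is linear in response size.

```rust
/// Which strategy located the payload (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Strategy {
    FencedBlock,
    DirectJson,
    EmbeddedJson,
}

fn looks_like_json(s: &str) -> bool {
    (s.starts_with('{') && s.ends_with('}')) || (s.starts_with('[') && s.ends_with(']'))
}

/// Tries a fenced block, then the whole response, then the outermost brace pair.
pub fn locate_json(response: &str) -> Option<(&str, Strategy)> {
    // 1. A fenced block, with or without a language tag on the opening fence line.
    if let Some(start) = response.find("```") {
        let after_fence = &response[start + 3..];
        if let Some(nl) = after_fence.find('\n') {
            let body = &after_fence[nl + 1..];
            if let Some(end) = body.find("```") {
                let json = body[..end].trim();
                if looks_like_json(json) {
                    return Some((json, Strategy::FencedBlock));
                }
            }
        }
    }
    // 2. The whole response is already JSON apart from surrounding whitespace.
    let trimmed = response.trim();
    if looks_like_json(trimmed) {
        return Some((trimmed, Strategy::DirectJson));
    }
    // 3. JSON embedded in wrapper text: take the outermost brace pair.
    let open = response.find('{')?;
    let close = response.rfind('}')?;
    (close > open).then(|| (&response[open..=close], Strategy::EmbeddedJson))
}
```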

Phase 5: Escalation semantics, local graph rewrite, and sheaf validator targeting (implemented)

crates/perspt-core/src/types.rs: EscalationCategory (5 variants), RewriteAction (9 variants with structured payloads), SheafValidatorClass (7 validator types), SheafValidationResult (passed/failed constructors with V_sheaf contribution), EscalationReport, RewriteRecord, TargetedRequeue, and StabilityMonitor::reset_for_replan for subgraph replans.
crates/perspt-store/src/schema.rs: escalation_reports, rewrite_records, and sheaf_validations tables with session-scoped indexes.
crates/perspt-store/src/store.rs: EscalationReportRecord, RewriteRecordRow, SheafValidationRow record types; six CRUD methods for persistence round-trips.
crates/perspt-agent/src/ledger.rs: record_escalation_report, record_rewrite, record_sheaf_validation, and get_escalation_reports facade methods.
crates/perspt-agent/src/orchestrator.rs: classify_non_convergence categorises failures (DegradedSensors → ContractMismatch → TopologyMismatch → InsufficientModelCapability → ImplementationError); choose_repair_action selects least-drastic fix; apply_repair_action handles local repairs (GroundedRetry, ContractRepair, CapabilityPromotion, SensorRecovery) and bounded graph rewrites (split_node, insert_interface_node, replan_subgraph); select_validators targets validators by node class, plugin ownership, and verification state; run_sheaf_validator dispatches seven validator classes with V_sheaf scoring and requeue targeting.
crates/perspt-core/src/events.rs: EscalationClassified, SheafValidationComplete, and GraphRewriteApplied event variants.
crates/perspt-tui/src/agent_app.rs: TUI dashboard handling for all three Phase 5 events.
Test coverage: type construction, serialization, graph rewrite correctness, validator targeting, and classification defaults.

Phase 6: Provisional branch ledger and interface-sealed speculation (implemented)

crates/perspt-core/src/types.rs: ProvisionalBranchState (Active/Sealed/Merged/Flushed), ProvisionalBranch, BranchLineage, InterfaceSealRecord, BranchFlushRecord, BlockedDependency; SRBNNode extended with provisional_branch_id and interface_seal_hash fields.
crates/perspt-core/src/events.rs: BranchCreated, InterfaceSealed, BranchFlushed, DependentUnblocked, and BranchMerged event variants.
crates/perspt-store/src/schema.rs: provisional_branches, branch_lineage, interface_seals, and branch_flushes tables with sequences and session-scoped indexes.
crates/perspt-store/src/store.rs: ProvisionalBranchRow, BranchLineageRow, InterfaceSealRow, BranchFlushRow record types; fifteen CRUD methods for persistence round-trips.
crates/perspt-agent/src/ledger.rs: record_provisional_branch, update_branch_state, get_provisional_branches, get_live_branches_for_parent, flush_branches_for_parent, record_branch_lineage, get_child_branches, record_interface_seal, get_interface_seals, has_interface_seals, record_branch_flush, and get_branch_flushes facade methods.
crates/perspt-agent/src/orchestrator.rs: maybe_create_provisional_branch creates session-scoped branches before speculation; merge_provisional_branch and flush_provisional_branch handle commit and failure paths; flush_descendant_branches and collect_descendants cascade flushes through the DAG; emit_interface_seals produces sealed digests after Interface-class node commits; check_seal_prerequisites gates execution on sealed parent interfaces; inject_sealed_interfaces adds sealed digests to restriction maps; unblock_dependents releases children when seals become available.
crates/perspt-agent/src/tools.rs: create_sandbox, cleanup_sandbox, cleanup_session_sandboxes, and copy_to_sandbox helpers for provisional verification in isolated workspaces.
crates/perspt-cli/src/commands/status.rs and crates/perspt-cli/src/commands/resume.rs: provisional branch counts, flush decision display, and branch state in resume output.
crates/perspt-tui/src/agent_app.rs: TUI dashboard handling for all five Phase 6 events.
Test coverage: unit tests covering branch lifecycle, seal prerequisites, descendant collection, flush cascades, unblock logic, and sandbox utilities.

Phase 6a: Workspace Classification, Tool Prerequisites, and Model Fallback (implemented)

crates/perspt-core/src/types.rs: WorkspaceState enum (ExistingProject { plugins }, Greenfield { inferred_lang }, Ambiguous) added to AgentContext. The orchestrator classifies workspace state once before any init or planning work.
crates/perspt-core/src/plugin.rs: required_binaries() trait method on LanguagePlugin — each plugin declares (binary, role, install_hint) tuples. RustPlugin requires cargo, rustc, rust-analyzer; PythonPlugin requires uv, python3, uvx; JsPlugin requires node, npm, typescript-language-server.
crates/perspt-agent/src/orchestrator.rs:
- classify_workspace(task) -> WorkspaceState: inspects PluginRegistry::detect_all() → ExistingProject, detect_language_from_task() → Greenfield, empty dir → Greenfield { None }, else → Ambiguous.
- check_tool_prerequisites(plugin) -> bool: probes common OS tools (grep, sed, awk) and language-specific binaries via host_binary_available(). Critical tools (init/build) block greenfield init; optional tools (LSP, linters) warn but allow degraded mode. Emits structured install instructions.
- step_init_project() rewritten with three WorkspaceState arms: ExistingProject runs tooling sync only, Greenfield creates isolated project dirs (child dir if non-empty working dir) then updates working_dir and tools, Ambiguous always creates a child project dir.
- redetect_plugins_after_init(): re-runs detect_all() after greenfield init and updates active_plugins and workspace_state.
- emit_plugin_readiness(): extracted into reusable helper for pre- and post-init plugin status emission.
- Plugin timing in run() fixed: plugins detected immediately only for ExistingProject; for Greenfield/Ambiguous, detection deferred until after step_init_project() completes.
- Model fallback: call_llm_with_tier_fallback() retries with the same primary model when no explicit fallback is configured, instead of returning a broken response.
- Per-tier fallback fields: verifier_fallback_model and speculator_fallback_model added alongside existing architect/actuator fallbacks.
- Artifact extraction: output.txt generic fallback replaced with language-specific filenames (index.js, index.ts, Cargo.toml, etc.); unrecognized code-block languages log a warning and skip instead of creating output.txt.
crates/perspt-cli/src/commands/agent.rs: CLI plugin detection made provisional (display-only hint); LSP start moved into orchestrator where workspace state is known; --verifier-fallback-model and --speculator-fallback-model CLI args added.
crates/perspt-cli/src/main.rs: new fallback CLI args wired through to new_with_models().
crates/perspt-agent/src/context_retriever.rs: get_project_summary() method added — inspects detected plugins, dependency manifests, entry points, and test locations to produce a structured project summary for architect prompts.
Test infrastructure: MerkleLedger::in_memory() used for all orchestrator tests via #[cfg(test)] conditional, eliminating DuckDB lock contention that caused false test failures.
Test coverage: workspace classification (empty dir, existing Rust/Python/JS, ambiguous, greenfield with language inference), tool prerequisite checking, required_binaries() declarations, and model fallback configuration.
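The workspace classification order and the critical/optional split in `check_tool_prerequisites()` can be sketched as pure functions. This is a sketch under stated assumptions. The `WorkspaceState` variant and field names come from the appendix, and `Vec<String>` for plugins is assumed. `classify`, `needs_child_dir`, `Role`, `Readiness`, and `check_prerequisites` are hypothetical names. The `available` closure stands in for a host-binary probe so the logic can be tested without touching `PATH`.

```rust
#[derive(Debug, PartialEq)]
pub enum WorkspaceState {
    ExistingProject { plugins: Vec<String> },
    Greenfield { inferred_lang: Option<String> },
    Ambiguous,
}

/// Order: detected plugins, then a language inferred from the task, then an empty dir.
pub fn classify(detected: Vec<String>, task_lang: Option<String>, dir_is_empty: bool) -> WorkspaceState {
    if !detected.is_empty() {
        WorkspaceState::ExistingProject { plugins: detected }
    } else if task_lang.is_some() {
        WorkspaceState::Greenfield { inferred_lang: task_lang }
    } else if dir_is_empty {
        WorkspaceState::Greenfield { inferred_lang: None }
    } else {
        WorkspaceState::Ambiguous
    }
}

/// Greenfield init uses a child dir only for a non-empty working dir; Ambiguous always does.
pub fn needs_child_dir(state: &WorkspaceState, dir_is_empty: bool) -> bool {
    match state {
        WorkspaceState::ExistingProject { .. } => false,
        WorkspaceState::Greenfield { .. } => !dir_is_empty,
        WorkspaceState::Ambiguous => true,
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    Init,
    Build,
    Lsp,
    Lint,
}

/// (binary, role, install hint), the tuple shape described for required_binaries().
pub type Requirement = (&'static str, Role, &'static str);

#[derive(Debug, PartialEq)]
pub enum Readiness {
    Ready,
    Degraded(Vec<String>),
    Blocked(Vec<String>),
}

/// Missing init/build tools block; missing LSP/lint tools only degrade validation.
pub fn check_prerequisites<F: Fn(&str) -> bool>(reqs: &[Requirement], available: F) -> Readiness {
    let (mut critical, mut optional) = (Vec::new(), Vec::new());
    for &(bin, role, hint) in reqs {
        if available(bin) {
            continue;
        }
        let msg = format!("missing {bin}: {hint}");
        match role {
            Role::Init | Role::Build => critical.push(msg),
            Role::Lsp | Role::Lint => optional.push(msg),
        }
    }
    if !critical.is_empty() {
        Readiness::Blocked(critical)
    } else if !optional.is_empty() {
        Readiness::Degraded(optional)
    } else {
        Readiness::Ready
    }
}
```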

Phase 7: Review UX parity with structured diffs, node status, and energy output (implemented)

crates/perspt-tui/src/agent_app.rs: wire runtime events into the dashboard, review modal, and node progression state.
crates/perspt-tui/src/diff_viewer.rs: render grouped multi-file node bundles with summary and detailed diff views.
crates/perspt-tui/src/review_modal.rs: support approve, reject, edit externally, and correction-request actions.
crates/perspt-tui/src/task_tree.rs: expose queued, verifying, failed, escalated, provisional, and committed states.
crates/perspt-cli/src/commands/status.rs and crates/perspt-cli/src/commands/logs.rs: mirror node states, verifier output, energy breakdown, branch provenance, and context provenance in headless mode.

Phase 8: Ledger-backed node commits, sheaf validation, and resume correctness (implemented)

crates/perspt-agent/src/ledger.rs: persist stable node metadata, Merkle material, retries, provisional lineage, and sheaf validation inputs.
crates/perspt-store/src/store.rs and crates/perspt-store/src/schema.rs: store node-level artifact bundles, verification results, branch state, and resume state.
crates/perspt-agent/src/orchestrator.rs: replace placeholder commit and sheaf-validation steps with persisted node commits and parent/child convergence checks.
crates/perspt-cli/src/commands/resume.rs: reconstruct sessions from stored node state instead of in-memory assumptions.

Phase 9: Universal verification pipeline and bundle command execution (implemented)

This phase implements the universal verification pipeline, wires bundle command execution into the SRBN runtime, and closes the gap between speculative generation and host-toolchain verification:

Bundle command execution — Artifact bundle commands (e.g. cargo add, pip install) parsed by step_speculate are executed via the approval-gated tool system.
Per-node-class verification stages — verification_stages_for_node() maps each NodeClass to the appropriate verification stages: Interface → SyntaxCheck; Implementation → SyntaxCheck + Build (+ Test when weighted tests are present); Integration → SyntaxCheck + Build + Test + Lint.
Short-circuit verification — run_plugin_verification() accepts an allowed_stages filter and a working_dir parameter. Syntax failure skips build/test/lint; build failure skips test/lint. Each stage maps to energy components: syntax fail → V_syn ≤ 5.0, build fail → V_syn ≤ 8.0, test fail → V_log, lint fail → V_str += 0.3.
Artifact bundle persistence — ledger.record_artifact_bundle() is called in both the commit path and the escalation path, ensuring the ledger records which commands were applied for each node.
Correction-loop feedback — Verification raw output is fed into context.last_test_output so that subsequent correction prompts include actual compiler and test errors. The correction prompt includes raw cargo check/cargo build stderr alongside LSP diagnostics.
Actuator prompt guidance — The actuator prompt includes guidance for intra-crate imports (crate:: in library modules vs. package name in tests/examples/main). Instruction #13 requires the LLM to populate the commands array with dependency install commands (cargo add, pip install, npm install) when code uses external packages not already in the project manifest.
Auto-dependency repair — After run_plugin_verification() detects a syntax/build failure, extract_missing_crates() parses compiler stderr for patterns such as “undeclared crate or module”, “can’t find crate for”, and “unresolved import”. When missing crates are found, auto_install_crate_deps() runs cargo add <crate> automatically and re-runs verification.
Correction command extraction — The correction loop parses Commands: sections from LLM correction responses and executes whitelisted dependency commands (cargo add, pip install, npm install, yarn add, pnpm add) before re-verification. The correction prompt output format includes a Commands: block.
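The stage-selection and short-circuit rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the orchestrator's code. `NodeClass` and `Stage` are reduced enums. The `has_weighted_tests` flag and the `run_stages` driver are hypothetical stand-ins for how the runtime detects weighted tests and invokes `run_plugin_verification()`. The energy mapping is omitted because its exact update rules are not spelled out here.

```rust
/// Node classes named in this PSP (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeClass {
    Interface,
    Implementation,
    Integration,
}

/// Verification stages in execution order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    SyntaxCheck,
    Build,
    Test,
    Lint,
}

/// Mirrors the per-class mapping described above. `has_weighted_tests` is a
/// hypothetical flag for whether an Implementation node carries weighted tests.
pub fn verification_stages_for_node(class: NodeClass, has_weighted_tests: bool) -> Vec<Stage> {
    match class {
        NodeClass::Interface => vec![Stage::SyntaxCheck],
        NodeClass::Implementation if has_weighted_tests => {
            vec![Stage::SyntaxCheck, Stage::Build, Stage::Test]
        }
        NodeClass::Implementation => vec![Stage::SyntaxCheck, Stage::Build],
        NodeClass::Integration => {
            vec![Stage::SyntaxCheck, Stage::Build, Stage::Test, Stage::Lint]
        }
    }
}

/// Which stages passed, failed, or were skipped in one run.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct StageReport {
    pub passed: Vec<Stage>,
    pub failed: Vec<Stage>,
    pub skipped: Vec<Stage>,
}

/// Hypothetical driver: runs the allowed stages in order via `run_stage`
/// (which returns true on success). A syntax or build failure skips every later
/// stage; test and lint failures are recorded without stopping the run.
pub fn run_stages<F: FnMut(Stage) -> bool>(allowed: &[Stage], mut run_stage: F) -> StageReport {
    let mut report = StageReport::default();
    for (i, &stage) in allowed.iter().enumerate() {
        if run_stage(stage) {
            report.passed.push(stage);
            continue;
        }
        report.failed.push(stage);
        if matches!(stage, Stage::SyntaxCheck | Stage::Build) {
            report.skipped.extend_from_slice(&allowed[i + 1..]);
            break;
        }
    }
    report
}
```

The sketch reads the skip rules literally. Only syntax and build failures truncate the run, so a failing test stage still lets lint execute for an Integration node.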

crates/perspt-agent/src/orchestrator.rs: last_applied_bundle field, bundle command execution in step_speculate, verification_stages_for_node(), refactored run_plugin_verification() with stage filter and short-circuit, rewired step_verify to use universal verification, artifact bundle persistence in step_commit and escalation path, extract_missing_crates() and auto_install_crate_deps() for auto-dependency repair, extract_commands_from_correction() for correction command parsing, build output injection into correction prompts.
crates/perspt-agent/src/agent.rs: crate:: import guidance and dependency command instructions added to actuator coding prompt, multi-artifact bundle JSON example updated to show populated commands array.
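Below is a minimal sketch of the stderr scan behind `extract_missing_crates()`, together with an allowlist check of the kind the correction loop applies to `Commands:` entries. It assumes rustc's usual convention of quoting the offending path in backticks after the message. Three details are assumptions of this sketch rather than documented behavior: the non-installable root list, the keep-the-root-segment rule, and the rejection of shell metacharacters. The allowlist includes the uv entries that Phase 10 adds.

```rust
/// Message fragments the PSP lists as signs of a missing Rust dependency.
const MISSING_CRATE_PATTERNS: [&str; 3] = [
    "undeclared crate or module",
    "can't find crate for",
    "unresolved import",
];

/// Path roots that can appear in these errors but are never installable crates.
const NON_INSTALLABLE: [&str; 6] = ["crate", "self", "super", "std", "core", "alloc"];

/// Scans compiler stderr and returns candidate crate names, deduplicated in
/// first-seen order. Assumes the offending path is the first backticked span
/// after the matched fragment.
pub fn extract_missing_crates(stderr: &str) -> Vec<String> {
    let mut crates: Vec<String> = Vec::new();
    for line in stderr.lines() {
        for pattern in MISSING_CRATE_PATTERNS {
            let Some(pos) = line.find(pattern) else { continue };
            let rest = &line[pos + pattern.len()..];
            let Some(start) = rest.find('`') else { continue };
            let quoted = &rest[start + 1..];
            let Some(end) = quoted.find('`') else { continue };
            // Keep only the root segment: `tokio::sync::Mutex` -> `tokio`.
            let root = quoted[..end].split("::").next().unwrap_or("").trim();
            if root.is_empty() || NON_INSTALLABLE.contains(&root) {
                continue;
            }
            if !crates.iter().any(|c| c == root) {
                crates.push(root.to_string());
            }
        }
    }
    crates
}

/// Dependency commands accepted from a correction `Commands:` block: the
/// Phase 9 list plus the uv entries added in Phase 10.
const ALLOWED_PREFIXES: [&str; 7] = [
    "cargo add ",
    "pip install ",
    "npm install ",
    "yarn add ",
    "pnpm add ",
    "uv add ",
    "uv pip install ",
];

/// Allowlist check for one command. Rejecting shell metacharacters is an extra
/// safeguard assumed by this sketch, not behavior stated by the PSP.
pub fn is_allowed_dependency_command(cmd: &str) -> bool {
    let cmd = cmd.trim();
    let has_shell_syntax = cmd
        .chars()
        .any(|c| matches!(c, ';' | '&' | '|' | '`' | '$' | '<' | '>' | '\n'));
    !has_shell_syntax && ALLOWED_PREFIXES.iter().any(|p| cmd.starts_with(*p))
}
```

A prefix-only allowlist would accept `cargo add foo && <anything>`, which is why the sketch also screens for command chaining before a command is handed to the approval-gated tool system.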

Phase 10: uv-first Python developer experience (implemented)

This phase establishes uv as the default Python toolchain integration and ensures Python project generation produces complete, verifiable projects with correct package layout, dependency management, and test alignment:

Python auto-dependency repair — extract_missing_python_modules() parses ModuleNotFoundError and ImportError patterns from verification output, including nested No module named 'foo.bar' variants. python_import_to_package() maps common import-name mismatches (e.g. PIL → pillow, yaml → pyyaml, cv2 → opencv-python, sklearn → scikit-learn). auto_install_python_deps() runs uv add <package> for each missing module and follows with uv sync --dev to refresh the venv. On success, verification is re-run automatically.
Bundle command normalization — Before executing bundle commands for Python nodes, the orchestrator normalizes generic pip/pip3 install commands to their uv equivalents: pip install foo → uv add foo, python -m pip install foo → uv add foo, pip install -r requirements.txt → uv pip install -r requirements.txt. Non-Python commands and already-correct uv commands pass through unchanged.
Post-bundle uv sync — After executing bundle commands for Python nodes, uv sync --dev is run automatically to ensure newly-added dependencies are available in the venv before verification begins.
Python-specific actuator prompt guidance — Instruction #14 in the actuator prompt directs the LLM to: prefer src-layout (src/<package>/), keep all modules inside the declared package, use relative imports within the package, put tests in tests/ using real symbol names from generated code, and emit uv add (not pip install) for dependency commands.
Correction prompt improvements — The Commands: output format includes uv add examples. Fix direction for import errors recommends uv add <pkg> and relative imports. The correction command extractor allowlist includes uv add and uv pip install.
Improved Python environment setup — PythonTestRunner::setup_environment() ensures pytest is available as a dev dependency after uv sync --dev: if pytest is not found, it runs uv add --dev pytest before falling back to uv pip install pytest.
Dynamic run command — PythonPlugin::run_command_for_dir() inspects the project directory to return context-appropriate run commands instead of the hardcoded uv run python -m main. It checks src/<pkg>/ directories first, then [project.scripts] entries in pyproject.toml, defaulting to the generic command only when neither is found. The LanguagePlugin trait includes run_command_for_dir() with a default implementation delegating to run_command().
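The Python repair path can be sketched as three pure string functions: module extraction, import-to-package mapping, and pip-to-uv rewriting. The function names follow the PSP, but the signatures and bodies are assumptions. The stdlib filter is a small hypothetical sample rather than the real list, and only the four import-name mismatches cited above are mapped.

```rust
/// Small illustrative stdlib filter; a real implementation needs the full list
/// (for example, Python 3.10+ exposes it as `sys.stdlib_module_names`).
const STDLIB_SAMPLE: [&str; 8] = [
    "os", "sys", "json", "re", "typing", "pathlib", "collections", "unittest",
];

/// Collects top-level module names from "No module named '<name>'" lines, the
/// text Python 3 uses for ModuleNotFoundError (a subclass of ImportError).
pub fn extract_missing_python_modules(output: &str) -> Vec<String> {
    const MARKER: &str = "No module named ";
    let mut modules: Vec<String> = Vec::new();
    for line in output.lines() {
        let Some(pos) = line.find(MARKER) else { continue };
        let rest = &line[pos + MARKER.len()..];
        // The name is quoted; accept either quote style.
        let Some(quote) = rest.chars().next().filter(|c| *c == '\'' || *c == '"') else {
            continue;
        };
        let inner = &rest[1..];
        let Some(end) = inner.find(quote) else { continue };
        // Nested names such as 'foo.bar' resolve to the installable root 'foo'.
        let top = inner[..end].split('.').next().unwrap_or("");
        if top.is_empty() || STDLIB_SAMPLE.contains(&top) {
            continue;
        }
        if !modules.iter().any(|m| m == top) {
            modules.push(top.to_string());
        }
    }
    modules
}

/// Maps import names to PyPI package names for the mismatches cited in the
/// PSP; any other module is assumed to share its import name.
pub fn python_import_to_package(module: &str) -> String {
    let top = module.split('.').next().unwrap_or(module);
    match top {
        "PIL" => "pillow",
        "yaml" => "pyyaml",
        "cv2" => "opencv-python",
        "sklearn" => "scikit-learn",
        other => other,
    }
    .to_string()
}

/// Rewrites generic pip installs to uv following the rules listed above;
/// anything else (including commands already using uv) is returned unchanged.
pub fn normalize_command_to_uv(cmd: &str) -> String {
    let trimmed = cmd.trim();
    for prefix in ["python -m pip install ", "pip3 install ", "pip install "] {
        if let Some(args) = trimmed.strip_prefix(prefix) {
            let args = args.trim();
            // Requirements files stay on uv's pip interface rather than `uv add`.
            if args.starts_with("-r ") {
                return format!("uv pip install {args}");
            }
            return format!("uv add {args}");
        }
    }
    cmd.to_string()
}
```

In this sketch the extracted modules would be passed through `python_import_to_package()` before each `uv add`, followed by a single `uv sync --dev`.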

crates/perspt-agent/src/orchestrator.rs: extract_missing_python_modules() with stdlib filter, python_import_to_package() mapping table, auto_install_python_deps() with uv add + uv sync, normalize_command_to_uv() command rewriter, Python auto-repair wiring in step_verify, bundle command normalization in step_speculate, post-bundle uv sync --dev, extended command allowlist with uv add and uv pip install.
crates/perspt-agent/src/agent.rs: instruction #13 updated for uv add instead of pip install, instruction #14 added for Python src-layout, relative imports, test alignment, and uv command conventions.
crates/perspt-agent/src/test_runner.rs: PythonTestRunner::setup_environment() ensures pytest is declared as a dev dependency via uv add --dev pytest when not already available.
crates/perspt-core/src/plugin.rs: run_command_for_dir() added to LanguagePlugin trait with default implementation, PythonPlugin::run_command_for_dir() inspects src layout and [project.scripts].
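The `run_command_for_dir()` lookup order and the trait default can be sketched as below. Only the fallback `uv run python -m main` comes from the text. The rest is assumed: the trait is reduced to the two methods discussed here, a src-layout package is launched with `uv run python -m <pkg>`, a script entry is launched with `uv run <script>`, and `pyproject.toml` is scanned line by line instead of with a TOML parser.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical, reduced plugin trait: only the two methods discussed here.
pub trait LanguagePlugin {
    fn run_command(&self) -> String;

    /// Default: ignore the directory and reuse the static command.
    fn run_command_for_dir(&self, _dir: &Path) -> String {
        self.run_command()
    }
}

pub struct PythonPlugin;

impl LanguagePlugin for PythonPlugin {
    fn run_command(&self) -> String {
        "uv run python -m main".to_string()
    }

    fn run_command_for_dir(&self, dir: &Path) -> String {
        // 1. src/<pkg>/ directories, sorted because read_dir order is unspecified.
        let mut packages: Vec<String> = fs::read_dir(dir.join("src"))
            .into_iter()
            .flatten()
            .flatten()
            .filter(|entry| entry.path().is_dir())
            .filter_map(|entry| entry.file_name().into_string().ok())
            .filter(|name| {
                !name.starts_with('.') && !name.starts_with("__") && !name.ends_with(".egg-info")
            })
            .collect();
        packages.sort();
        if let Some(pkg) = packages.first() {
            return format!("uv run python -m {pkg}");
        }
        // 2. A declared [project.scripts] entry.
        let pyproject = fs::read_to_string(dir.join("pyproject.toml")).unwrap_or_default();
        if let Some(script) = first_project_script(&pyproject) {
            return format!("uv run {script}");
        }
        // 3. Generic fallback.
        self.run_command()
    }
}

/// First key under [project.scripts]. A line scan standing in for a TOML
/// parser: it does not handle comments, inline tables, or dotted keys.
fn first_project_script(pyproject: &str) -> Option<String> {
    let mut in_scripts = false;
    for line in pyproject.lines().map(str::trim) {
        if line.starts_with('[') {
            in_scripts = line == "[project.scripts]";
        } else if in_scripts {
            if let Some((key, _)) = line.split_once('=') {
                let key = key.trim().trim_matches('"');
                if !key.is_empty() {
                    return Some(key.to_string());
                }
            }
        }
    }
    None
}
```

Putting the default on the trait means plugins that have no directory-sensitive behavior need no changes, while `PythonPlugin` overrides it.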

Orchestration State Overhaul (fix/orchestration-state-overhaul):

crates/perspt-core/src/types.rs: SessionOutcome enum (Success, PartialSuccess, Failed). NodeState extended with from_display_str() (case-insensitive canonical parser with legacy aliases), is_success(), is_active(), and Display impl.
crates/perspt-agent/src/orchestrator/mod.rs: NodeOutcome enum (Completed, Escalated). execute_node() returns Result<NodeOutcome> instead of Result<()>. run_orchestration() tracks completed/escalated counts and derives SessionOutcome for the Complete event instead of unconditionally reporting success.
crates/perspt-agent/src/orchestrator/convergence.rs: call_llm_with_logging() always records token usage, latency, and estimated cost via record_llm_usage() regardless of --log-llm. The --log-llm flag now only controls verbose prompt/response text persistence.
crates/perspt-agent/src/orchestrator/convergence.rs: A sandbox file-tree listing is included in both the actuator and correction prompts, and ContextRetriever in step_speculate() uses the node’s sandbox directory.
crates/perspt-cli/src/commands/status.rs, agent.rs, resume.rs: All string-based state comparisons replaced with NodeState::from_display_str() and type-safe helper methods.
crates/perspt-store/src/store.rs, crates/perspt-agent/src/ledger.rs, crates/perspt-policy/src/lib.rs, crates/perspt-tui/src/chat_app.rs, crates/perspt-tui/src/agent_app.rs, crates/perspt-tui/src/task_tree.rs: Dead code elimination — 16 unused functions and redundant exports removed.
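The type-safe state handling above might look like the following sketch. Several details are assumptions:

- The `NodeState` variants are borrowed from the task-tree states listed in Phase 7 and may not match the real enum.
- The `completed` alias and the `Option` return type are illustrative.
- `derive_session_outcome` is a hypothetical helper. Its rule (no escalations means success, no completions means failure, anything else is partial) is one plausible reading of how `SessionOutcome` is derived from completed and escalated counts.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeState {
    Queued,
    Verifying,
    Provisional,
    Committed,
    Failed,
    Escalated,
}

impl NodeState {
    /// Case-insensitive parser for canonical display names plus legacy aliases.
    pub fn from_display_str(s: &str) -> Option<Self> {
        match s.trim().to_ascii_lowercase().as_str() {
            "queued" => Some(Self::Queued),
            "verifying" => Some(Self::Verifying),
            "provisional" => Some(Self::Provisional),
            // "completed" stands in for a legacy alias (hypothetical).
            "committed" | "completed" => Some(Self::Committed),
            "failed" => Some(Self::Failed),
            "escalated" => Some(Self::Escalated),
            _ => None,
        }
    }

    pub fn is_success(self) -> bool {
        matches!(self, Self::Committed)
    }

    /// Which states count as in flight is an assumption of this sketch.
    pub fn is_active(self) -> bool {
        matches!(self, Self::Queued | Self::Verifying)
    }
}

impl fmt::Display for NodeState {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            Self::Queued => "Queued",
            Self::Verifying => "Verifying",
            Self::Provisional => "Provisional",
            Self::Committed => "Committed",
            Self::Failed => "Failed",
            Self::Escalated => "Escalated",
        };
        f.write_str(name)
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeOutcome {
    Completed,
    Escalated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionOutcome {
    Success,
    PartialSuccess,
    Failed,
}

/// Hypothetical helper: derive the session result from per-node outcomes
/// instead of reporting success unconditionally.
pub fn derive_session_outcome(outcomes: &[NodeOutcome]) -> SessionOutcome {
    let completed = outcomes.iter().filter(|o| **o == NodeOutcome::Completed).count();
    let escalated = outcomes.len() - completed;
    match (completed, escalated) {
        (_, 0) => SessionOutcome::Success,
        (0, _) => SessionOutcome::Failed,
        _ => SessionOutcome::PartialSuccess,
    }
}
```

Because `Display` emits the canonical names and `from_display_str()` accepts them case-insensitively, the pair round-trips. That property is what lets status, agent, and resume compare parsed states rather than raw strings.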

Documentation Updates Required:

README agent-mode description — updated
Perspt book SRBN architecture page — updated
Developer architecture guide — updated
CLI reference — updated
PSP-5 execution flow and headless output — updated

Open Issues

The recommended default capability profile for each tier of Gemini-, GPT-, and Claude-class models is deferred to a future PSP focused on model-tier policy and defaults.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.