Inside the Engine: Agents, the PSUB Scheduler, and the Gymnasium Wrapper

Post 1 said what the engine is. Post 2 said why it forks real mainnet. This post says where — where to look when you want to add a new agent, wire a new protocol adapter, expose a new RL surface, or add a new validation gate.

The whole engine fits on one page. We'll walk it bottom-up, because the bottom is the part that earns the credibility everything else inherits.

The layer map

┌─────────────────────────────────────────────────────────────────┐
│  cli.py                Typer CLI surface                        │
│  (run / train / report / demo / validate / replay)              │
└──────┬──────────────────────────────┬───────────────────────────┘
       │                              │
       │                              ▼
       │                    ┌─────────────────────┐
       │                    │  rl/                │
       │                    │  Ray RLlib + Modal  │
       │                    └──────────┬──────────┘
       │                               ▼
       │       ┌─────────────────────────┐
       └──────▶│  scenarios/  YAML +     │
               │  builder + registry     │
               └────────────┬────────────┘
                            │
                            ▼
               ┌─────────────────────────┐
               │  gym_env/  Gym wrappers │
               └────────────┬────────────┘
                            │
                            ▼
               ┌─────────────────────────┐
               │  sim/  state +          │
               │  scheduler + runner     │
               │  (PSUB, DuckDB persist) │
               └────────────┬────────────┘
                            │
       ┌──────────────────┬─┴────────────────┐
       ▼                  ▼                  ▼
┌────────────┐   ┌────────────────┐   ┌──────────────────┐
│ agents/    │   │ protocols/     │   │ replay/          │
│ Agent ABC  │   │ Aave V3, Spark │   │ Quoter validator │
│ + refs     │   │ Compound, Curve│   │ + historical-tx  │
└────────────┘   └────────────────┘   └──────────────────┘
                          │
                          ▼
             ┌─────────────────────────┐
             │  evm/  Fork wrapper +   │
             │  SnapshotRegistry +     │
             │  ABI helpers            │
             └────────────┬────────────┘
                          │
                          ▼
                     pyrevm.EVM
                          │
                          ▼
                  Alchemy fork RPC

The arrows are calls, not data. Lower layers do not call upward. That single invariant is what keeps the architecture from rotting: the gym env can be swapped without touching the EVM kernel, and the EVM kernel can grow without touching the agents.
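
One way to keep that invariant honest is a small call-direction check. The sketch below is hypothetical and not part of the engine: `LAYER_RANK`, `is_allowed_call`, and `upward_calls` are names invented here, the ranks are simply read top-to-bottom off the diagram, and treating the three sibling layers (agents, protocols, replay) as one tier is an assumption.

```python
# Hypothetical ranks read off the diagram; a higher rank sits closer to the CLI.
LAYER_RANK = {
    "cli": 6,
    "rl": 5,
    "scenarios": 4,
    "gym_env": 3,
    "sim": 2,
    "agents": 1,      # assumption: the three sibling layers share one tier,
    "protocols": 1,   # so calls between them are not counted as "upward"
    "replay": 1,
    "evm": 0,
}


def is_allowed_call(caller: str, callee: str) -> bool:
    """Return True when `caller` reaches `callee` without calling upward."""
    return LAYER_RANK[caller] >= LAYER_RANK[callee]


def upward_calls(edges):
    """Filter a list of (caller, callee) pairs down to the violations."""
    return [(a, b) for a, b in edges if not is_allowed_call(a, b)]


observed = [("cli", "rl"), ("gym_env", "sim"), ("agents", "protocols"), ("evm", "sim")]
violations = upward_calls(observed)
```

Fed with edges extracted from import statements, a check like this could run in CI and fail the build on any non-empty `violations` list.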

The EVM kernel — `mayavi/evm/`

The bottom of the stack. Fork wraps pyrevm.EVM, adds a SnapshotRegistry for journaled rollback, and exposes the ABI helpers every higher layer uses. The frozen-fork snapshot path (Fork.dump_snapshot(path) / Fork.load_snapshot(path)) is the workhorse for RPC-free RL training: one warm rollout populates pyrevm's journal, we serialize the set of touched accounts and storage slots to a versioned JSON file, and subsequent PPO training loads that snapshot offline. Training then makes zero RPC calls, and v1↔v2 reward parity stays intact.
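
To make the snapshot idea concrete, here is a minimal sketch of a versioned JSON round trip. It is not the engine's `Fork.dump_snapshot` implementation: the module-level functions, the `SNAPSHOT_VERSION` constant, and the account layout (`balance`, `nonce`, `storage`) are all assumptions chosen for illustration.

```python
import json
import os
import tempfile

SNAPSHOT_VERSION = 1  # hypothetical version tag, bumped whenever the layout changes


def dump_snapshot(path, accounts):
    """Write touched accounts and their storage slots to a versioned JSON file.

    `accounts` maps address -> {"balance": int, "nonce": int, "storage": {slot: value}}.
    Large integers are hex-encoded because 256-bit values do not survive as JSON numbers
    in many parsers.
    """
    payload = {
        "version": SNAPSHOT_VERSION,
        "accounts": {
            addr: {
                "balance": hex(acct["balance"]),
                "nonce": acct["nonce"],
                "storage": {hex(s): hex(v) for s, v in acct["storage"].items()},
            }
            for addr, acct in accounts.items()
        },
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, sort_keys=True)


def load_snapshot(path):
    """Read a snapshot back, refusing files written with another layout version."""
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    if payload.get("version") != SNAPSHOT_VERSION:
        raise ValueError(f"unsupported snapshot version: {payload.get('version')!r}")
    return {
        addr: {
            "balance": int(acct["balance"], 16),
            "nonce": acct["nonce"],
            "storage": {int(s, 16): int(v, 16) for s, v in acct["storage"].items()},
        }
        for addr, acct in payload["accounts"].items()
    }


# Round trip through a temporary file; the address is a placeholder.
accounts = {"0x" + "11" * 20: {"balance": 10**18, "nonce": 3, "storage": {0: 2**255, 7: 42}}}
with tempfile.TemporaryDirectory() as tmp:
    snapshot_path = os.path.join(tmp, "fork_snapshot.json")
    dump_snapshot(snapshot_path, accounts)
    restored = load_snapshot(snapshot_path)
```

The explicit version check is the part worth copying: a stale snapshot should fail loudly at load time rather than produce subtly different rewards.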

Touch this layer when you add a low-level operation, tighten fork-state safety, or regenerate the snapshot because the contract surface widened.

The protocol adapters — `mayavi/protocols/`

Thin Python wrappers that take a Fork and expose protocol-level helpers (supply, borrow, repay, liquidate, swap). Four adapters today: Aave V3, SparkLend (an Aave V3 fork; see Post 6 for why the adapter is a 30-line SparkAccount(AaveAccount)), Compound V3 / Comet, and Curve StableSwap. Each adapter encodes calldata, calls Fork.call for static reads and Fork.send for state mutations, and decodes return values. An adapter never pokes _evm directly; that is the protocol-rule contract.
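
The encode, call, decode shape can be sketched with a read-only ERC-20 helper. Everything here is illustrative: `FakeFork` is a stand-in whose `call(to, data)` signature is an assumption (the real `Fork.call` may differ), and `Erc20Reader` is a hypothetical adapter, not one of the four shipped ones. The `balanceOf(address)` selector and the 32-byte argument padding follow the standard Solidity ABI.

```python
BALANCE_OF_SELECTOR = bytes.fromhex("70a08231")  # first 4 bytes of keccak256("balanceOf(address)")


class FakeFork:
    """Stand-in for the engine's Fork: answers every static call with one canned word."""

    def __init__(self, canned_word: int):
        self._word = canned_word.to_bytes(32, "big")
        self.calls = []  # recorded (to, calldata) pairs, handy for assertions

    def call(self, to: str, data: bytes) -> bytes:
        self.calls.append((to, data))
        return self._word


class Erc20Reader:
    """Hypothetical adapter: holds a fork and never touches the underlying EVM object."""

    def __init__(self, fork):
        self._fork = fork

    def balance_of(self, token: str, owner: str) -> int:
        hex_owner = owner[2:] if owner.startswith("0x") else owner
        # ABI encoding: a 20-byte address is left-padded to a 32-byte word.
        owner_word = bytes.fromhex(hex_owner).rjust(32, b"\x00")
        raw = self._fork.call(token, BALANCE_OF_SELECTOR + owner_word)
        # A uint256 return value is a single big-endian 32-byte word.
        return int.from_bytes(raw, "big")


fork = FakeFork(canned_word=1_700_000_000)
reader = Erc20Reader(fork)
balance = reader.balance_of("0x" + "22" * 20, "0x" + "11" * 20)  # placeholder addresses
```

Because the adapter only depends on the `call` method, the same class can be exercised against a fake in unit tests and against a real fork in integration tests.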

Touch this layer when you wire a new on-chain protocol into the engine.

Replay validation — `mayavi/replay/`

This layer holds the Quoter bit-exact validator (tests/replay/test_eigen_incident.py) and the historical-tx replay scanner. The Quoter path was explained in Post 2. The historical scanner walks a block range and replays specific transactions; it's how the ENA vesting cliff replay works (see Post 12).

Touch this layer when you add a new validation gate or named-incident replay.

Agents — `mayavi/agents/`

The Agent ABC is small enough to fit in two screens:

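from abc import ABC, abstractmethod

class Agent(ABC):
    @abstractmethod
    def step(self, state: WorldState, step_idx: int) -> StepResult:
        """Act once for step `step_idx`; WorldState and StepResult are the engine's own types."""
        ...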

Reference subclasses today: BorrowerAgent, LiquidatorAgent, VestingRecipientAgent, OracleShockAgent, CurveBorrowerAgent, plus the per-protocol concrete classes (AaveAccount, SparkAccount, CompoundV3Account).

Adding a new behaviour is one file under agents/, a build_* factory under scenarios/, and (if you want to train it) a gymnasium.Env subclass under gym_env/.
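
As a rough illustration of the "one file under agents/" step, the sketch below defines a toy agent against the same `step` signature. `StepResult` and `WorldState` are minimal stand-ins with assumed fields, and `DripSellerAgent` is a hypothetical behaviour, not one of the reference subclasses.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class StepResult:  # stand-in; the engine's real StepResult may carry different fields
    agent: str
    action: str
    amount: int = 0


@dataclass
class WorldState:  # stand-in holding only what this toy agent needs
    rng: random.Random
    actions: list = field(default_factory=list)


class Agent(ABC):
    @abstractmethod
    def step(self, state: WorldState, step_idx: int) -> StepResult:
        ...


class DripSellerAgent(Agent):
    """Hypothetical agent: sells a fixed tranche on even steps and idles on odd ones."""

    def __init__(self, name: str, tranche: int):
        self.name = name
        self.tranche = tranche

    def step(self, state: WorldState, step_idx: int) -> StepResult:
        if step_idx % 2 == 0:
            return StepResult(self.name, "sell", self.tranche)
        return StepResult(self.name, "noop")


state = WorldState(rng=random.Random(0))
seller = DripSellerAgent("seller-0", tranche=100)
results = [seller.step(state, i) for i in range(3)]
```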

The simulation layer — `mayavi/sim/`

Three components: WorldState, Scheduler, Runner.

WorldState is the per-run carrier object: it holds the Fork, the agent population, the MarketSnapshotHook, the per-step actions log, and the random.Random seeded from scenario.seed. The state object is passed by reference; every layer above the EVM treats it as the canonical handle to "what is happening in this run."

Scheduler is the PSUB-style (Partial Sub-step) loop. Every step iterates over registered agents, calls each agent's step(), persists the resulting Action rows, runs hooks, and advances time. The scheduler has a halt_on_exception mode used by the determinism gate (Post 2) and a default swallow-and-record mode for production runs.
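
The two error-handling modes are easiest to see side by side. This is a toy loop under stated assumptions, not the engine's Scheduler: state is a plain dict, hooks are plain callables, and the agent classes are invented for the demonstration.

```python
class Scheduler:
    """Toy step loop: run every agent, record results, run hooks, advance time."""

    def __init__(self, agents, hooks=(), halt_on_exception=False):
        self.agents = list(agents)
        self.hooks = list(hooks)
        self.halt_on_exception = halt_on_exception

    def tick(self, state):
        step_idx = state["step"]
        for agent in self.agents:
            try:
                state["actions"].append((step_idx, agent.step(state, step_idx)))
            except Exception as exc:
                if self.halt_on_exception:
                    raise  # strict mode: surface the failure immediately
                # default mode: swallow the failure but keep a record of it
                state["errors"].append((step_idx, type(agent).__name__, repr(exc)))
        for hook in self.hooks:
            hook(state)
        state["step"] = step_idx + 1


class SteadyAgent:
    def step(self, state, step_idx):
        return "ok"


class FlakyAgent:
    def step(self, state, step_idx):
        raise RuntimeError("revert")


state = {"step": 0, "actions": [], "errors": []}
Scheduler([SteadyAgent(), FlakyAgent()]).tick(state)
```

In the swallow-and-record mode one failing agent does not stop the others; with `halt_on_exception=True` the same failure propagates out of `tick`, which is the behaviour a determinism check wants.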

Runner is the entry point that gym envs and the CLI both call. It owns the DuckDB persistence layer: every run produces a row in runs, every step a row in actions, and every snapshot a row in market_snapshots. The schema is versioned and forward-migratable: the columns listed in _RUNS_FORWARD_MIGRATABLE_COLUMNS are added to existing local DBs via ALTER TABLE on connect, so a developer pulling a new branch doesn't need to nuke their data/runs/runs.duckdb.
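
The migrate-on-connect pattern looks roughly like the sketch below. To stay dependency-free it uses the standard library's `sqlite3` as a stand-in for DuckDB, so the introspection query (`PRAGMA table_info`) is SQLite's, not DuckDB's. The column names in `FORWARD_MIGRATABLE_COLUMNS` are invented; the engine's real list may look different.

```python
import sqlite3

# Hypothetical additions; each entry maps a column name to its SQL type.
FORWARD_MIGRATABLE_COLUMNS = {"scenario_handle": "TEXT", "fork_block": "INTEGER"}


def migrate_runs(conn):
    """Add any missing forward-migratable columns to `runs`; safe to call repeatedly."""
    conn.execute("CREATE TABLE IF NOT EXISTS runs (run_id TEXT PRIMARY KEY)")
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(runs)")}
    for name, sql_type in FORWARD_MIGRATABLE_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE runs ADD COLUMN {name} {sql_type}")
    conn.commit()
    return conn


# Simulate a database created by an older branch, then connect with the new code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO runs (run_id) VALUES ('run-001')")
migrate_runs(conn)
migrate_runs(conn)  # second call is a no-op
columns = [row[1] for row in conn.execute("PRAGMA table_info(runs)")]
rows = conn.execute("SELECT run_id, fork_block FROM runs").fetchall()
```

Old rows survive with NULL in the new columns, which is why this style only suits additive changes; renames or type changes would need a real migration step.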

MarketSnapshotHook lives here too. After every step, the hook reads on-chain state (TVL, utilization, pool prices) into a typed row. Reports render off these snapshots; KPI plots are aggregations of them.
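
A sketch of what "typed row in, aggregation out" can mean in practice. The `MarketSnapshot` fields and the `utilization_kpis` helper are assumptions for illustration; the numbers are made up and do not come from any run.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class MarketSnapshot:  # hypothetical row shape
    step_idx: int
    tvl_usd: float
    utilization: float  # borrowed / supplied, in [0, 1]


def utilization_kpis(snapshots):
    """Aggregate per-step rows into the figures a KPI tile would display."""
    peak = max(snapshots, key=lambda s: s.utilization)
    return {
        "mean": mean(s.utilization for s in snapshots),
        "peak": peak.utilization,
        "peak_step": peak.step_idx,
    }


rows = [
    MarketSnapshot(0, 1_000_000.0, 0.50),
    MarketSnapshot(1, 1_000_000.0, 0.75),
    MarketSnapshot(2, 1_000_000.0, 0.25),
]
kpis = utilization_kpis(rows)
```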

Touch this layer when you change the scheduler semantics, the persistence schema, or per-step telemetry.

Scenarios — `mayavi/scenarios/`

The scenario layer is the only place YAML lives. A scenario file declares the chain, the fork block, the agent population, the duration, and the hook configuration. A matching build_* Python factory expands the YAML into a populated WorldState. Both register through mayavi.scenarios.registry so the CLI's mayavi run <path> and the API's /runs endpoint can resolve scenarios by handle.

Today's scenario coverage: vesting_cliff, multi_cohort, depeg_cascade (across six chains), aave_borrower (across six chains), spark_borrower (mainnet), compound_v3_borrower (four chains), curve_depeg (mainnet), launch_replay, svm_launch_replay (parked).

Adding a new scenario takes a YAML file, a build_* function, and one line in the registry. That is deliberate, scenario-driven scope control: the protocol library stays narrow while the scenario library grows.
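
The three pieces (declarative config, factory, registry line) can be sketched as follows. This is a guess at the pattern, not the contents of `mayavi.scenarios.registry`: the decorator, the `resolve` helper, and every config key are hypothetical, and a plain dict stands in for the parsed YAML.

```python
SCENARIO_REGISTRY = {}


def register(handle):
    """Decorator that files a build_* factory under a scenario handle."""
    def decorator(build_fn):
        if handle in SCENARIO_REGISTRY:
            raise ValueError(f"duplicate scenario handle: {handle}")
        SCENARIO_REGISTRY[handle] = build_fn
        return build_fn
    return decorator


@register("toy_borrower")
def build_toy_borrower(config):
    """Expand a parsed scenario mapping into a (toy) world-state dict."""
    return {
        "chain": config["chain"],
        "fork_block": config["fork_block"],
        "agents": [
            f"{spec['kind']}-{i}"
            for spec in config["agents"]
            for i in range(spec["count"])
        ],
        "steps": config["duration_steps"],
        "seed": config.get("seed", 0),
    }


def resolve(handle, config):
    """Look a scenario up by handle and build it."""
    try:
        build_fn = SCENARIO_REGISTRY[handle]
    except KeyError:
        raise KeyError(f"unknown scenario: {handle}") from None
    return build_fn(config)


# Stand-in for what a YAML loader would return; keys are illustrative.
config = {
    "chain": "mainnet",
    "fork_block": 19_000_000,
    "duration_steps": 4,
    "seed": 7,
    "agents": [{"kind": "borrower", "count": 2}, {"kind": "liquidator", "count": 1}],
}
world = resolve("toy_borrower", config)
```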

The Gym wrappers — `mayavi/gym_env/`

Four gymnasium.Env subclasses today (VestingEnv, AaveBorrowerEnv, AaveBorrowerShockEnv, CompoundV3BorrowerEnv), plus AaveLiquidatorEnv and a multi-agent variant. Each wraps a WorldState and exposes a Box / Dict observation space and an action space matched to that agent class's decision surface.

The step() body is structurally identical across envs:

def step(self, action):
    self._apply_action(action)
    self._scheduler.tick(self.state)         # advance one PSUB step
    obs = self._observe()
    reward = self._compute_reward()
    terminated, truncated = self._termination_signals()
    return obs, reward, terminated, truncated, self._info()
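
The template can be exercised without any engine code by filling the hooks with toy logic. The class below is a dependency-free stand-in: it mirrors the five-value return contract of Gymnasium's `step()` but deliberately does not subclass `gymnasium.Env`, and `_tick` replaces the scheduler call with simple arithmetic.

```python
class ToyCounterEnv:
    """Counts toward a target; shows the apply, tick, observe, reward, terminate order."""

    def __init__(self, target=3, max_steps=5):
        self.target = target
        self.max_steps = max_steps
        self.value = 0
        self.steps = 0
        self._pending = 0

    def reset(self, seed=None):
        self.value, self.steps, self._pending = 0, 0, 0
        return self._observe(), self._info()

    def _apply_action(self, action):
        self._pending = int(action)  # stage the decision for this step

    def _tick(self):
        self.value += self._pending  # stand-in for advancing one scheduler step
        self.steps += 1

    def _observe(self):
        return self.value

    def _compute_reward(self):
        return -abs(self.target - self.value)

    def _termination_signals(self):
        # terminated: the task itself ended; truncated: an external step limit was hit
        return self.value == self.target, self.steps >= self.max_steps

    def _info(self):
        return {"steps": self.steps}

    def step(self, action):
        self._apply_action(action)
        self._tick()
        obs = self._observe()
        reward = self._compute_reward()
        terminated, truncated = self._termination_signals()
        return obs, reward, terminated, truncated, self._info()


env = ToyCounterEnv(target=2, max_steps=3)
env.reset()
first = env.step(1)
second = env.step(1)
```

Keeping `terminated` and `truncated` as separate signals matters for training code, which typically bootstraps value estimates differently in the two cases.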

The reward functions are deliberately simple — the on-chain protocol provides the signal. A borrower's reward is the change in getUserAccountData().healthFactor plus a penalty for over-collateralization. A liquidator's reward is the post-bonus profit on liquidationCall. We do not invent reward functions that try to encode "good behaviour" — the protocol's economics are the reward.
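
A borrower reward of that shape might look like the sketch below. Aave reports `healthFactor` as an 18-decimal fixed-point integer; beyond that, the details are assumptions: `target_hf`, `penalty_weight`, and the linear form of the penalty are hypothetical choices, not the engine's actual coefficients.

```python
WAD = 10**18  # healthFactor is returned as an 18-decimal fixed-point integer


def borrower_reward(prev_hf_wad, new_hf_wad, target_hf=2.0, penalty_weight=0.1):
    """Change in health factor, minus a penalty for sitting above `target_hf`.

    `target_hf` and `penalty_weight` are illustrative knobs, not engine defaults.
    """
    prev_hf = prev_hf_wad / WAD
    new_hf = new_hf_wad / WAD
    delta = new_hf - prev_hf
    over_collateralization = max(0.0, new_hf - target_hf)
    return delta - penalty_weight * over_collateralization


# Moving from HF 1.5 to 1.75 stays under the target, so only the delta counts.
safe_move = borrower_reward(15 * 10**17, 175 * 10**16)
# Moving from HF 2.0 to 3.0 earns the delta but pays for the idle collateral.
idle_move = borrower_reward(2 * WAD, 3 * WAD)
```

One practical caveat: for an account with no debt, Aave returns the maximum uint256 value as the health factor, so a real reward function would need to clamp or special-case that reading.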

Every env's test file runs gymnasium.utils.env_checker.check_env. Passing that check is part of the release gate.

Touch this layer when you expose a new RL surface.

The RL layer — `mayavi/rl/`

Ray RLlib is the canonical framework since Sprint 3-E1. Both backends — local GTX 1650 and Modal A10G — route through the same Ray code path; the choice is a runtime knob:

| Variable | Default | Meaning |
| --- | --- | --- |
| `MAYAVI_RL_BACKEND` | `local` | `local` → local GPU; `modal` → Modal A10G entrypoint |
| `MAYAVI_LOCAL_GPU_MEM_GB_CEILING` | `4.0` | Local backend aborts cleanly before the run if model + batch don't fit |
| `MAYAVI_MODAL_COST_CEIL_USD` | `10.0` | Pre-spawn cost guard (worst-case `timeout_hours × $0.60/hr` on A10G) |
| `MAYAVI_RL_NUM_ENV_RUNNERS` | `min(cpu_count, 2)` | Parallel rollout subprocesses; capped at 2 to avoid HTTP 429 rate limits on free-tier Alchemy |

The CLI flag --remote modal takes precedence over the env var when the two conflict. PPO training writes a *.json eval file and a Ray RLlib checkpoint dir; both flow through mayavi artifact <run_id> into a channel-agnostic bundle.
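
The knobs above compose into a few lines of resolution logic. The sketch below is an interpretation of the documented defaults, not the engine's code: the three function names are hypothetical, and reading `MAYAVI_RL_NUM_ENV_RUNNERS` as a plain integer override is an assumption.

```python
import os

A10G_USD_PER_HOUR = 0.60  # worst-case hourly rate quoted for the Modal A10G


def resolve_backend(cli_remote=None, env=None):
    """The CLI flag wins; otherwise fall back to the env var, then to 'local'."""
    env = os.environ if env is None else env
    if cli_remote:
        return cli_remote
    return env.get("MAYAVI_RL_BACKEND", "local")


def default_num_env_runners(env=None, cpu_count=None):
    """Use the env override if present, else min(cpu_count, 2)."""
    env = os.environ if env is None else env
    override = env.get("MAYAVI_RL_NUM_ENV_RUNNERS")
    if override is not None:
        return int(override)
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return min(cpus, 2)


def check_modal_cost(timeout_hours, env=None):
    """Raise before spawning if the worst-case bill would exceed the ceiling."""
    env = os.environ if env is None else env
    ceiling = float(env.get("MAYAVI_MODAL_COST_CEIL_USD", "10.0"))
    worst_case = timeout_hours * A10G_USD_PER_HOUR
    if worst_case > ceiling:
        raise RuntimeError(
            f"worst-case cost ${worst_case:.2f} exceeds ceiling ${ceiling:.2f}"
        )
    return worst_case


backend = resolve_backend(cli_remote="modal", env={"MAYAVI_RL_BACKEND": "local"})
runners = default_num_env_runners(env={}, cpu_count=8)
budgeted = check_modal_cost(timeout_hours=10, env={})
```

With the default ceiling, a 10-hour timeout passes (about $6 worst case) while a 20-hour timeout is rejected before anything is spawned.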

Post 8 is the deep dive on the local-vs-Modal trade-off.

The reporting + API layer

mayavi/report/ holds the Jinja2 templates and the HTML builder. The post-Phase-5 template is shape-aware: it inspects the run's recorded action types and renders different KPI tiles for a vesting run, an Aave borrower run, or a depeg cascade. The Plotly charts are inline JSON: the report HTML loads plotly.js from a CDN and renders each chart from the embedded data with no runtime data fetches, so the run data in the bundle is self-contained. Fully offline viewing still depends on the browser having plotly.js cached, since the library itself is not embedded.

mayavi/api/ is the FastAPI service. It uses bearer auth (Authorization: Bearer <MAYAVI_API_KEY>) and is deployed to Modal at https://morellato26--mayavi-api-fastapi-app.modal.run. The Next.js dashboard (web/) proxies same-origin so the bearer token never reaches the client bundle. Routes: /healthz, POST /runs, GET /runs/{id}, GET /runs/{id}/report{,.pdf}, GET /scenarios. The deployment runbook is in docs/deployment.md.
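
For a script that talks to the API directly, a request can be assembled with the standard library alone. The route and the header come from the text above; the JSON body shape (`{"scenario": ...}`) is an assumption, since the request schema is not documented here, and `build_run_request` is a hypothetical helper. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

API_BASE = "https://morellato26--mayavi-api-fastapi-app.modal.run"


def build_run_request(api_key, scenario_handle):
    """Assemble (but do not send) an authenticated POST /runs request."""
    body = json.dumps({"scenario": scenario_handle}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        f"{API_BASE}/runs",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_run_request(api_key="example-key", scenario_handle="aave_borrower")
# Sending would be: urllib.request.urlopen(request, timeout=30)
```

Server-side scripts are the right home for this; in the browser the same-origin proxy exists precisely so the key never ships to the client.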

A concrete worked example

What does a real run look like end-to-end? The Aave V3 mainnet borrower demo:

Bundle: aave-v3-mainnet-borrower-2026-05-14 · 4 steps · mainnet @ block 19,000,000 · ~1.6 s wall-clock

Reading the path bottom-up:

Fork(rpc_url=ALCHEMY_RPC_URL, fork_block=19_000_000) opens a pyrevm EVM, cache-backed.
AaveV3 adapter calls Pool.supply(WETH, 1e18, borrower, 0) via Fork.send — real Solidity executes, real journal entries appear.
BorrowerAgent.step calls the adapter's borrow(USDC, 1700e6, ...); the agent observes the new health factor.
MarketSnapshotHook reads getUserAccountData() and the AaveOracle's USDC price, writes a row.
Runner persists the action + snapshot to DuckDB.
After 4 steps, mayavi artifact <run_id> produces the bundle embedded above.

Total clock time on a workstation: ~1.6 s once the fork cache is warm.
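
The health factor the agent observes in this walkthrough follows Aave's standard definition: collateral value times liquidation threshold, divided by debt value. The arithmetic below uses placeholder inputs to show the shape of the calculation. The WETH price and the liquidation threshold are illustrative assumptions, not values read from the fork at block 19,000,000.

```python
def health_factor(collateral_value_usd, liquidation_threshold, debt_value_usd):
    """Aave-style health factor for a single-collateral, single-debt position."""
    if debt_value_usd == 0:
        return float("inf")  # no debt means the position cannot be liquidated
    return collateral_value_usd * liquidation_threshold / debt_value_usd


# Illustrative inputs only; the real run reads these from on-chain state.
weth_price_usd = 2_500.0        # assumed price, not the oracle value at the fork block
liquidation_threshold = 0.80    # assumed threshold, not the live reserve parameter
supplied_weth = 1.0             # matches the 1e18 wei supplied in the walkthrough
borrowed_usdc = 1_700.0         # matches the 1700e6 borrowed in the walkthrough

hf = health_factor(supplied_weth * weth_price_usd, liquidation_threshold, borrowed_usdc)
```

Under these placeholder inputs the position sits a little above 1.0, the level below which it becomes liquidatable; the value in an actual run depends on the oracle price and reserve configuration at the fork block.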

Where to look for common tasks

| Task | Start at |
| --- | --- |
| Adding a new agent | `mayavi/agents/base.py` (the ABC), then a new file under `agents/` |
| Adding a new scenario | `mayavi/scenarios/`: copy `vesting_cliff.{py,yaml}`, register in `registry.py` |
| Adding a new RL env | `mayavi/gym_env/`: subclass an existing env or use `WorldState` directly |
| Adding a protocol | `mayavi/protocols/`: write an adapter that takes a `Fork`; do not poke `_evm` directly |
| New CLI command | `mayavi/cli.py`: add a `@app.command()`; lazy-import heavy deps |
| New report KPI | `mayavi/report/builder.py` (aggregation) + `templates/report.html.j2` (layout) |
| New fork-cache check | append a `StorageTriple` in `tests/evm/test_cache_integrity.py` |
| New named-incident replay | follow `tests/replay/test_eigen_incident.py`, add a section to `docs/validation.md` |

Reading order for the rest of the series

You've now seen what the engine is (Post 1), why it forks (Post 2), and how it's laid out (this post). The next slice goes per-protocol — Aave V3 across six chains, Compound V3, SparkLend's thin-adapter pattern, and Curve StableSwap depeg modeling — followed by the RL findings, the incident replays, and the deployment.

If you only have time for one more post: Post 4 — Aave V3 Across Six Chains is the highest-density technical deep dive in the series.