Forked Ethereum mainnet at block 38,000,000; 4 simulation steps (0.8 minutes of mainnet block time). Seed 42.
Horizon: 4 steps
Actions recorded: 0
Wall-clock: 0.7s
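The 0.8-minute figure in the run header follows from the step count if each simulation step maps to one mainnet block at the 12-second post-Merge slot time. Both of those are assumptions here, not something the report states. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
# Assumption: one simulation step == one mainnet block, and blocks
# arrive every 12 seconds (post-Merge slot time).
SECONDS_PER_BLOCK = 12


def horizon_minutes(steps: int, seconds_per_block: int = SECONDS_PER_BLOCK) -> float:
    """Simulated horizon expressed in minutes of mainnet block time."""
    return steps * seconds_per_block / 60


print(horizon_minutes(4))  # 0.8
```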
No swap or liquidation actions executed
This run produced no successful swaps, no liquidations, and no failed swap
attempts either. The scenario may have completed without firing any of the
report’s currently-known action types — check the
actions table in data/runs/runs.duckdb for what
this run actually recorded.
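One way to follow the pointer above is to open the database read-only and look at the actions table directly. The sketch below assumes only the file path and table name given in this report. The helper names are hypothetical, and the column layout is not known here, so the code prints whatever DESCRIBE returns instead of guessing column names. The file may also hold rows from other runs, so a row count alone does not isolate this run.

```python
def count_rows(con, table: str = "actions") -> int:
    """Count rows in `table` on any connection that supports
    con.execute(sql).fetchone(), which duckdb connections do."""
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def inspect_run_db(path: str = "data/runs/runs.duckdb") -> None:
    """Print the row count and column layout of the actions table."""
    import duckdb  # third-party dependency: pip install duckdb

    con = duckdb.connect(path, read_only=True)
    try:
        print("rows in actions:", count_rows(con))
        # DESCRIBE lists column names and types, so no schema is assumed.
        for column in con.execute("DESCRIBE actions").fetchall():
            print(column)
    finally:
        con.close()
```

Calling inspect_run_db() from the repository root should show whether the table is empty or holds action types this report does not yet render.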
Methodology
Execution: All token operations execute against actual Solidity bytecode
deployed at the pinned mainnet block via revm (Rust EVM). No Python re-implementation
of pool math — slippage and routing match what would have happened on-chain at that block.
Fork pinning: Mainnet state is fetched once at block
38,000,000. The fork captures token balances, pool reserves, oracle
states, and contract code as they were at that block.
Agent population: 0 heterogeneous recipients with urgency
parameters sampled via seeded RNG (seed=42). Same seed + same fork block produce
byte-identical results.
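The reproducibility claim rests on drawing every random parameter from a generator seeded once per run. The sketch below illustrates that pattern only. The function name and the uniform [0, 1] urgency distribution are hypothetical stand-ins, not Mayavi's actual sampler.

```python
import random


def sample_urgencies(n_agents: int, seed: int) -> list[float]:
    """Draw one urgency value per agent from a generator local to this call.

    Using random.Random(seed) rather than the module-level functions keeps
    the stream independent of any other randomness in the process.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.0, 1.0) for _ in range(n_agents)]


first = sample_urgencies(5, seed=42)
second = sample_urgencies(5, seed=42)
print(first == second)  # True: same seed, same draws
```

With a population of 0, as in this run, the sampler returns an empty list and the seed has no observable effect.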
Pool: WETH/USDC 0.05% pool (Uniswap V3, fee tier 500, i.e. 5 bps).
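Uniswap V3 stores a pool's fee in hundredths of a basis point (parts per million), so the fee value 500 corresponds to 0.05%, or 5 bps. The conversion is easy to get wrong by a factor of 100; the helper names below are illustrative:

```python
def fee_as_percent(fee: int) -> float:
    """Convert a Uniswap V3 fee value (hundredths of a bip) to percent."""
    return fee / 10_000


def fee_as_bps(fee: int) -> float:
    """Convert a Uniswap V3 fee value (hundredths of a bip) to basis points."""
    return fee / 100


for fee in (100, 500, 3000, 10000):
    print(fee, fee_as_percent(fee), fee_as_bps(fee))
# 100 0.01 1.0
# 500 0.05 5.0
# 3000 0.3 30.0
# 10000 1.0 100.0
```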
What is NOT modeled: external arbitrageurs restoring price between
recipient sells; cross-pool routing through aggregators (1inch / CoW); sandwich attacks
from other MEV searchers; off-chain venues (CEX hedging); recipient behavior changing
in response to observed price impact.
Engine validation: Mayavi’s execution kernel has been validated
bit-exact against the on-chain Uniswap V3 Quoter at the pinned block.
Every swap our engine produces matches what the official Uniswap simulation contract
produces — zero delta. Reproduce locally with mayavi validate.
Reinforcement-learning agents
The agents in this run use the hand-coded / scripted baselines (see the
Agent column above). Mayavi’s agents are RL-trainable on the same
forked-mainnet stack — mayavi train --env aave|vesting|liquidator produces
a PPO policy, and VestingRecipient(policy_path=…) loads one into a scenario.
Trained-policy-vs-baseline evaluation results (each on a real forked mainnet, $0 marginal cost):
Aave V3 leveraged borrower: PPO captures most of the analytic optimum at $0 marginal cost (50K timesteps, local GPU). docs/artifacts/aave_ppo_v2_local_2026-05-07.json
Vesting-cliff recipient: saturated regime. At 0.1 WETH inventory in the mainnet WETH/USDC 0.05% pool, dump_all, twap, and PPO all return $256.31656 ± ~4e-6 (agreement to about 6 decimal places across all 3 strategies, std_reward=0). The pipeline is reproducible at $0; the strategies only become distinguishable at larger inventory, in a shallower pool, or in an adversarial multi-agent setting (50K timesteps). docs/artifacts/vesting_ppo_v1_local_2026-05-13.json
Aave V3 liquidator: PPO underperformed the scripted close-factor-max heuristic ($392.84 vs $624.42; PPO captured ~63%) in this single-liquidator-vs-one-borrower setup at 50K timesteps. The scripted heuristic is essentially the analytic upper bound here (the only degree of freedom is close-factor timing), so this is the expected single-agent outcome, and the deliverable is the reproducible pipeline at $0. Competing-liquidator timing (where RL beats the scripted policy) runs on the same pipeline. docs/artifacts/aave_liquidator_ppo_v1_local_2026-05-13.json
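The ~63% capture figure quoted for the liquidator can be checked directly from the two reported rewards. This is a small arithmetic sketch with illustrative variable names; it does not read the artifact JSON.

```python
# Rewards as reported in the liquidator evaluation above.
ppo_reward = 392.84
scripted_reward = 624.42

# Fraction of the scripted heuristic's reward that the PPO policy captured.
capture = ppo_reward / scripted_reward
print(f"{capture:.1%}")  # 62.9%
```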