Forked Ethereum mainnet at block 22,180,000; 24 simulation steps (4.8 minutes of mainnet block time); seed 42.
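The 4.8-minute figure follows from post-Merge Ethereum's fixed 12-second slot. The sketch below assumes one mainnet block per scheduler step and no missed slots. The report does not state either assumption; both are inferred from 24 × 12 s = 288 s.

```python
# Post-Merge Ethereum proposes one block per 12-second slot (assuming no missed slots).
SLOT_SECONDS = 12

def simulated_minutes(steps: int, blocks_per_step: int = 1) -> float:
    """Mainnet time covered by `steps` scheduler steps, in minutes."""
    return steps * blocks_per_step * SLOT_SECONDS / 60

print(simulated_minutes(24))  # 4.8
```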
Recipients: 1
Total swaps: 23
ENA sold: 191,666,666.6667
WETH realized: 1,140.79
Avg effective price: ≈ 0.00000595 WETH/ENA
Peak step: 8,333,333.3333 ENA sold (step 20)
Wall-clock: 145.5 s
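The summary figures can be cross-checked with two divisions. This sketch reads the sold token as ENA, the base token of the tracked ENA/WETH pool. It shows two things. First, the average effective price is about 5.95e-6 WETH per ENA, so it only displays as 0.00 at two decimals. Second, the average swap size equals the peak-step amount, which is consistent with 23 equal-size swaps.

```python
ENA_SOLD = 191_666_666.6667   # total inventory sold (from the summary)
WETH_REALIZED = 1_140.79      # total proceeds (from the summary)
TOTAL_SWAPS = 23

avg_price = WETH_REALIZED / ENA_SOLD   # WETH received per ENA sold
avg_swap = ENA_SOLD / TOTAL_SWAPS      # ENA per swap

print(f"avg effective price: {avg_price:.2e} WETH/ENA")  # 5.95e-06
print(f"avg swap size: {avg_swap:,.4f} ENA")             # 8,333,333.3333
```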
Sell pressure over time
Cumulative inventory liquidated, in ENA, by simulation step.
The slope reflects the aggregate selling intensity across all recipients.
Realized WETH per step
Per-step revenue summed across all recipients. Gaps mark steps where no recipient sold, or where the amount sold was a rounding residue below the dust threshold.
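The two series above can be derived from a single per-step list of amounts sold. The sketch below computes the cumulative curve and the steps that render as gaps. `DUST_THRESHOLD` and its value are hypothetical; the report does not give the actual threshold.

```python
from itertools import accumulate

DUST_THRESHOLD = 1e-6  # hypothetical: smallest per-step amount counted as a sale

def sell_series(sold_per_step: list[float]) -> tuple[list[float], list[int]]:
    """Return the cumulative amount sold and the indices of steps shown as gaps."""
    cumulative = list(accumulate(sold_per_step))
    gaps = [i for i, amount in enumerate(sold_per_step) if amount < DUST_THRESHOLD]
    return cumulative, gaps

# Hypothetical 4-step input: step 1 has no sale, step 3 is a dust residue.
cum, gaps = sell_series([10.0, 0.0, 5.0, 1e-9])
print(gaps)  # [1, 3]
```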
Effective realized price
WETH received per ENA sold, per step.
A declining line indicates accumulating slippage from sustained selling.
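A minimal sketch of how a per-step effective price and its decline can be computed. Slippage is measured here relative to the first step that had a sale. That baseline choice is an assumption; the report does not say which reference it uses. The input numbers are hypothetical.

```python
def effective_prices(sold: list[float], realized: list[float]) -> list:
    """Per-step price (quote received per unit sold); None for steps with no sale."""
    return [r / s if s > 0 else None for s, r in zip(sold, realized)]

def slippage_vs_first(prices: list) -> list:
    """Fractional price decline relative to the first step that had a sale."""
    base = next(p for p in prices if p is not None)
    return [None if p is None else 1 - p / base for p in prices]

# Hypothetical steps: equal-size sells that fetch less each time; step 2 is empty.
prices = effective_prices([100, 100, 0, 100], [1.0, 0.9, 0.0, 0.8])
print(slippage_vs_first(prices))  # roughly [0.0, 0.1, None, 0.2]
```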
Pool tick — ENA/WETH 30bps
Per-step Uniswap V3 tick for the tracked pool, captured by the
MarketSnapshotHook at the start of every scheduler step.
A monotone trend signals sustained price pressure; abrupt jumps
correspond to large fills landing in a single step.
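A Uniswap V3 tick maps to a price as 1.0001^tick, which gives token1 per token0 in raw base units. That makes tick movements easy to read as percentage price moves. The sketch assumes both tokens use 18 decimals. Which of ENA and WETH is token0 depends on their address ordering, which this report does not state.

```python
def tick_to_price(tick: int, decimals0: int = 18, decimals1: int = 18) -> float:
    """Price of token0 in units of token1 at a Uniswap V3 tick, decimal-adjusted."""
    # Raw price (token1 base units per token0 base unit) is 1.0001 ** tick.
    return 1.0001 ** tick * 10 ** (decimals0 - decimals1)

def tick_move_pct(tick_before: int, tick_after: int) -> float:
    """Percent change in token0's price implied by a tick move."""
    return (1.0001 ** (tick_after - tick_before) - 1) * 100

print(tick_to_price(6932))      # about 2.0
print(tick_move_pct(0, -100))   # about -0.995 (a 100-tick drop is roughly -1%)
```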
Per-recipient breakdown
Agent                               | ENA sold         | WETH realized | Effective price (WETH/ENA) | Swaps
recipient-0 (hand-coded heuristic)  | 191,666,666.6667 | 1,140.79      | ≈ 0.00000595               | 23
Methodology
Execution: All token operations execute against the actual contract bytecode deployed at the pinned mainnet block, run via revm (Rust EVM). Pool math is not re-implemented in Python, so slippage and routing match what would have happened on-chain at that block.
Fork pinning: Mainnet state is fetched once at block
22,180,000. The fork captures token balances, pool reserves, oracle
states, and contract code as they were at that block.
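Pinning works because every state read carries the same block tag. The sketch assumes fork state is fetched over the standard Ethereum JSON-RPC interface, which is typical for revm-based forking but not stated here. It only builds the request and does not send it. The address is the mainnet WETH contract, and slot `0x0` is an arbitrary example.

```python
import json

FORK_BLOCK = 22_180_000

def pinned_request(method: str, params: list, req_id: int = 1) -> str:
    """Build a JSON-RPC request whose trailing block parameter is the fork block."""
    block_tag = hex(FORK_BLOCK)  # JSON-RPC encodes block numbers as hex quantities
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "method": method,
                       "params": [*params, block_tag]})

# eth_getStorageAt takes [address, slot, block]: read slot 0 of WETH as of the fork block.
req = pinned_request("eth_getStorageAt",
                     ["0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2", "0x0"])
print(req)
```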
Agent population: 1 recipient, with urgency parameters sampled via a seeded RNG (seed=42). The same seed and the same fork block produce byte-identical results.
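Seeded sampling is what makes the agent population reproducible. This minimal sketch draws one urgency value per recipient from a private generator. The uniform [0, 1) distribution and the function name are hypothetical; the report does not describe how urgency is parameterized.

```python
import random

def sample_urgency(n_recipients: int, seed: int = 42) -> list[float]:
    """Draw one hypothetical urgency parameter in [0, 1) per recipient."""
    rng = random.Random(seed)  # private generator: unaffected by global random state
    return [rng.random() for _ in range(n_recipients)]

# Same seed, same draws; a different seed gives a different population.
assert sample_urgency(3) == sample_urgency(3)
```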
Pool: ENA/WETH Uniswap V3 pool, fee tier 500 (0.05%).
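Uniswap V3's `fee` field is expressed in hundredths of a basis point. A fee tier of 500 is therefore 0.05% (5 bps), not 5.00%, and a "30bps" pool has fee 3000. The helper names below are illustrative.

```python
def fee_tier_to_pct(fee: int) -> float:
    """Uniswap V3 `fee` (hundredths of a bip) as a percentage of the input amount."""
    return fee / 10_000

def fee_tier_to_bps(fee: int) -> float:
    """Uniswap V3 `fee` (hundredths of a bip) in basis points."""
    return fee / 100

for fee in (100, 500, 3000, 10000):
    print(fee, f"{fee_tier_to_pct(fee):.2f}%", f"{fee_tier_to_bps(fee):g} bps")
# 100 0.01% 1 bps
# 500 0.05% 5 bps
# 3000 0.30% 30 bps
# 10000 1.00% 100 bps
```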
What is NOT modeled: external arbitrageurs restoring price between
recipient sells; cross-pool routing through aggregators (1inch / CoW); sandwich attacks
from other MEV searchers; off-chain venues (CEX hedging); recipient behavior changing
in response to observed price impact.
Engine validation: Mayavi’s execution kernel has been validated bit-exact against the on-chain Uniswap V3 Quoter at the pinned block. Every swap output our engine produces matches the output of Uniswap's official Quoter contract with zero delta. Reproduce locally with mayavi validate.
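"Bit-exact" means the engine's and the Quoter's swap outputs, as integer token base units, must be identical. The check below is a hypothetical helper that illustrates that criterion. It is not the implementation behind `mayavi validate`.

```python
def max_abs_delta(engine_out: list[int], quoter_out: list[int]) -> int:
    """Largest absolute difference between paired swap outputs, in token base units."""
    if len(engine_out) != len(quoter_out):
        raise ValueError("each engine swap needs a matching Quoter quote")
    return max((abs(e - q) for e, q in zip(engine_out, quoter_out)), default=0)

# Validation passes only when every pair is identical (delta exactly 0).
assert max_abs_delta([10**18, 5 * 10**17], [10**18, 5 * 10**17]) == 0
```

Integers are used deliberately: on-chain amounts are `uint256` values, so any float comparison could hide a one-wei mismatch.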
Reinforcement-learning agents
The agents in this run use the hand-coded / scripted baselines (see the
Agent column above). Mayavi’s agents are RL-trainable on the same
forked-mainnet stack — mayavi train --env aave|vesting|liquidator produces
a PPO policy, and VestingRecipient(policy_path=…) loads one into a scenario.
Trained-policy-vs-baseline evaluation results (each on a real forked mainnet, $0 marginal cost):
Aave V3 leveraged borrower: PPO captures most of the analytic optimum (50K timesteps, local GPU). Artifact: docs/artifacts/aave_ppo_v2_local_2026-05-07.json
Vesting-cliff recipient: saturated regime. At 0.1 WETH inventory in the mainnet WETH/USDC 0.05% pool, dump_all, twap, and PPO all return $256.31656 ± ~4e-6 (agreement to ~6 decimal places across all 3 strategies; std_reward=0). The pipeline is reproducible at $0, but the strategies only become distinguishable at larger inventory, in a shallower pool, or with adversarial multi-agent competition (50K timesteps). Artifact: docs/artifacts/vesting_ppo_v1_local_2026-05-13.json
Aave V3 liquidator: PPO underperformed the scripted close-factor-max heuristic ($392.84 vs $624.42, so PPO captured ~63%) on this single-liquidator-vs-one-borrower setup at 50K timesteps. Here the scripted heuristic is essentially the analytic upper bound, since the only degree of freedom is close-factor timing. This is therefore the expected single-agent outcome, and the deliverable is the reproducible pipeline at $0. Competing-liquidator timing, where RL beats the scripted policy, runs on the same pipeline. Artifact: docs/artifacts/aave_liquidator_ppo_v1_local_2026-05-13.json
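The "~63%" capture figure is the ratio of the two reported profits. A one-line check:

```python
ppo_profit, scripted_profit = 392.84, 624.42  # USD, from the liquidator result above
capture = ppo_profit / scripted_profit
print(f"PPO captured {capture:.1%} of the scripted heuristic's profit")  # 62.9%
```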