Forked Ethereum mainnet at block 19,000,000. 25 simulation steps (5.0 minutes of mainnet block time). Seed 42.
Recipients: 10
Total swaps: 200
WETH sold: 10.0000
USDC realized: 25,630.49
Avg effective price: 2,563.05 USDC/WETH
Peak step: 0.7628 WETH sold (step 20)
Wall-clock: 21.2 s
Sell pressure over time
Cumulative inventory liquidated, in WETH, by simulation step.
The slope reflects the aggregate selling intensity across all recipients.
Realized USDC per step
Per-step revenue summed across all recipients. Gaps indicate steps where
no recipient sold (small rounding amounts below the dust threshold).
Effective realized price
USDC received per WETH sold, per step.
A declining line indicates accumulating slippage from sustained selling.
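The effective-price series can be sketched as follows. This is a minimal illustration, not the report's own plotting code: the function names and the per-step numbers are made up for the example, and slippage is measured against the first step that had a fill, which is only one possible reference point.

```python
def effective_prices(weth_sold, usdc_realized):
    """USDC received per WETH sold for each step; None where nothing was sold."""
    return [u / w if w > 0 else None for w, u in zip(weth_sold, usdc_realized)]


def slippage_bps(prices):
    """Price change versus the first step with a fill, in basis points."""
    filled = [p for p in prices if p is not None]
    if not filled:
        return [None] * len(prices)
    ref = filled[0]
    return [None if p is None else (p / ref - 1.0) * 1e4 for p in prices]


# Illustrative per-step totals (hypothetical numbers, not taken from this run).
weth = [0.50, 0.40, 0.00, 0.60]
usdc = [1281.60, 1025.20, 0.00, 1537.50]

prices = effective_prices(weth, usdc)   # the third step is a gap (no sells)
bps = slippage_bps(prices)              # increasingly negative under sustained selling
```

Steps with no sells are kept as None rather than zero so that a gap in revenue is not mistaken for a price collapse.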
Pool tick — WETH/USDC 5bps
Per-step Uniswap V3 tick for the tracked pool, captured by the
MarketSnapshotHook at the start of every scheduler step.
A monotone trend signals sustained price pressure; abrupt jumps
correspond to large fills landing in a single step.
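To read the tick series as a price, the standard Uniswap V3 relation is raw price = 1.0001^tick, where the raw price is token1 per token0 in smallest units. The sketch below assumes USDC (6 decimals) is token0 and WETH (18 decimals) is token1 in the tracked pool; the function names are illustrative and not part of the report's tooling.

```python
import math

USDC_DECIMALS = 6    # assumed token0
WETH_DECIMALS = 18   # assumed token1


def tick_to_usdc_per_weth(tick: int) -> float:
    """Convert a Uniswap V3 tick to a human-readable USDC-per-WETH price."""
    raw_weth_per_usdc = 1.0001 ** tick  # token1 per token0, smallest units
    weth_per_usdc = raw_weth_per_usdc * 10 ** (USDC_DECIMALS - WETH_DECIMALS)
    return 1.0 / weth_per_usdc


def usdc_per_weth_to_tick(price: float) -> int:
    """Inverse mapping: the largest tick whose raw price does not exceed the target."""
    raw_weth_per_usdc = (1.0 / price) * 10 ** (WETH_DECIMALS - USDC_DECIMALS)
    return math.floor(math.log(raw_weth_per_usdc) / math.log(1.0001))


tick = usdc_per_weth_to_tick(2563.05)        # near the run's average effective price
price_back = tick_to_usdc_per_weth(tick)     # within one tick (about 1 bp) of 2563.05
```

Under this token ordering, selling WETH makes WETH cheaper in USDC terms, so the tick rises; one tick corresponds to roughly a 1 bp price move.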
Per-recipient breakdown
Agent                               WETH sold  USDC realized  Effective price  Swaps
recipient-0 (Hand-coded heuristic)  1.0000     2,563.07       2,563.07         20
recipient-1 (Hand-coded heuristic)  1.0000     2,563.02       2,563.02         20
recipient-2 (Hand-coded heuristic)  1.0000     2,563.05       2,563.05         20
recipient-3 (Hand-coded heuristic)  1.0000     2,563.04       2,563.04         20
recipient-4 (Hand-coded heuristic)  1.0000     2,563.08       2,563.08         20
recipient-5 (Hand-coded heuristic)  1.0000     2,563.07       2,563.07         20
recipient-6 (Hand-coded heuristic)  1.0000     2,563.08       2,563.08         20
recipient-7 (Hand-coded heuristic)  1.0000     2,563.02       2,563.02         20
recipient-8 (Hand-coded heuristic)  1.0000     2,563.05       2,563.05         20
recipient-9 (Hand-coded heuristic)  1.0000     2,563.01       2,563.01         20
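The per-recipient rows can be cross-checked against the headline totals with a few lines of arithmetic. The sketch below copies the figures from the breakdown above; the variable names are illustrative, and Decimal is used only to avoid float rounding in the cent-level sums.

```python
from decimal import Decimal

# (agent, WETH sold, USDC realized, swaps), copied from the per-recipient breakdown.
rows = [
    ("recipient-0", "1.0000", "2563.07", 20),
    ("recipient-1", "1.0000", "2563.02", 20),
    ("recipient-2", "1.0000", "2563.05", 20),
    ("recipient-3", "1.0000", "2563.04", 20),
    ("recipient-4", "1.0000", "2563.08", 20),
    ("recipient-5", "1.0000", "2563.07", 20),
    ("recipient-6", "1.0000", "2563.08", 20),
    ("recipient-7", "1.0000", "2563.02", 20),
    ("recipient-8", "1.0000", "2563.05", 20),
    ("recipient-9", "1.0000", "2563.01", 20),
]

total_weth = sum(Decimal(w) for _, w, _, _ in rows)
total_usdc = sum(Decimal(u) for _, _, u, _ in rows)
total_swaps = sum(n for _, _, _, n in rows)

# Average effective price, rounded to cents as in the summary.
avg_price = (total_usdc / total_weth).quantize(Decimal("0.01"))

# Gap between the best and worst recipient, in basis points of the average price.
realized = [Decimal(u) for _, _, u, _ in rows]
spread_bps = (max(realized) - min(realized)) / (total_usdc / total_weth) * 10000
```

The sums reproduce the summary figures (10 WETH, 25,630.49 USDC, 200 swaps, 2,563.05 USDC/WETH), and the spread between recipients comes out below one basis point.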
Methodology
Execution: All token operations execute against actual Solidity bytecode
deployed at the pinned mainnet block via revm (Rust EVM). No Python re-implementation
of pool math — slippage and routing match what would have happened on-chain at that block.
Fork pinning: Mainnet state is fetched once at block
19,000,000. The fork captures token balances, pool reserves, oracle
states, and contract code as they were at that block.
Agent population: 10 heterogeneous recipients with urgency
parameters sampled via seeded RNG (seed=42). Same seed + same fork block produce
byte-identical results.
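A minimal sketch of how seeded sampling yields a reproducible population is shown below. The RecipientParams class, the sample_population function, and the uniform urgency range are hypothetical stand-ins for illustration; they are not Mayavi's actual API or distribution.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RecipientParams:
    name: str
    urgency: float  # hypothetical meaning: how aggressively the recipient sells


def sample_population(n: int, seed: int) -> list[RecipientParams]:
    """Draw n recipients from a private, seeded generator (no global RNG state)."""
    rng = random.Random(seed)
    # The uniform range is an assumption made purely for this example.
    return [RecipientParams(f"recipient-{i}", rng.uniform(0.05, 0.5)) for i in range(n)]


population_a = sample_population(10, seed=42)
population_b = sample_population(10, seed=42)   # identical to population_a
population_c = sample_population(10, seed=43)   # a different draw
```

Using a dedicated random.Random instance, rather than the module-level generator, keeps the draw independent of any other code that consumes random numbers during the run.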
Pool: WETH/USDC 0.05% pool (Uniswap V3, fee tier 500, i.e. 5 bps).
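Uniswap V3 expresses fee tiers in hundredths of a basis point, so fee tier 500 corresponds to 5 bps, or 0.05%, which matches the "WETH/USDC 5bps" label on the tick chart. The helper names below are illustrative only.

```python
def fee_tier_to_bps(fee: int) -> float:
    """Uniswap V3 fee units are hundredths of a basis point (1e-6 of the input)."""
    return fee / 100.0


def fee_tier_to_percent(fee: int) -> float:
    """Same fee expressed as a percentage of the input amount."""
    return fee / 10_000.0


bps_500 = fee_tier_to_bps(500)          # 5.0 bps
pct_500 = fee_tier_to_percent(500)      # 0.05 percent
bps_3000 = fee_tier_to_bps(3000)        # 30.0 bps, the 0.3% tier, for comparison
```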
What is NOT modeled: external arbitrageurs restoring price between
recipient sells; cross-pool routing through aggregators (1inch / CoW); sandwich attacks
from other MEV searchers; off-chain venues (CEX hedging); recipient behavior changing
in response to observed price impact.
Engine validation: Mayavi’s execution kernel has been validated
bit-exact against the on-chain Uniswap V3 Quoter at the pinned block.
Every swap our engine produces matches what the official Uniswap simulation contract
produces — zero delta. Reproduce locally with mayavi validate.
Reinforcement-learning agents
The agents in this run use the hand-coded / scripted baselines (see the
Agent column above). Mayavi’s agents are RL-trainable on the same
forked-mainnet stack — mayavi train --env aave|vesting|liquidator produces
a PPO policy, and VestingRecipient(policy_path=…) loads one into a scenario.
Trained-policy-vs-baseline evaluation results (each on a real forked mainnet, $0 marginal cost):
Aave V3 leveraged borrower: PPO captures most of the analytic optimum at $0 marginal cost (50K timesteps, local GPU). Artifact: docs/artifacts/aave_ppo_v2_local_2026-05-07.json
Vesting-cliff recipient: saturated regime. At 0.1 WETH inventory in the mainnet WETH/USDC 0.05% pool, dump_all, twap, and PPO all return $256.31656 ± ~4e-6 (agreement to ~6 decimal places across all three strategies, std_reward=0). The pipeline is reproducible at $0; a difference between policies is only measurable with larger inventory, a shallower pool, or adversarial multi-agent settings (50K timesteps). Artifact: docs/artifacts/vesting_ppo_v1_local_2026-05-13.json
Aave V3 liquidator: PPO underperformed the scripted close-factor-max heuristic ($392.84 vs $624.42; PPO captured ~63%) on this single-liquidator-vs-one-borrower setup at 50K timesteps. The scripted heuristic is essentially the analytic upper bound here (the only degree of freedom is close-factor timing), so this is the expected single-agent outcome, and the deliverable is the reproducible pipeline at $0. Competing-liquidator timing (where RL beats the scripted policy) runs on the same pipeline. Artifact: docs/artifacts/aave_liquidator_ppo_v1_local_2026-05-13.json