Mayavi Report — curve_depeg

Methodology

Execution: All token operations execute against actual Solidity bytecode deployed at the pinned mainnet block via revm (Rust EVM). No Python re-implementation of pool math — slippage and routing match what would have happened on-chain at that block.
Fork pinning: Mainnet state is fetched once at block 19,000,000. The fork captures token balances, pool reserves, oracle states, and contract code as they were at that block.
Agent population: 0 heterogeneous recipients with urgency parameters sampled via seeded RNG (seed=42). Same seed + same fork block produce byte-identical results.
Pool: WETH/USDC 0.05% pool (Uniswap V3, fee tier 500, i.e. 5 bps).
What is NOT modeled: external arbitrageurs restoring price between recipient sells; cross-pool routing through aggregators (1inch / CoW); sandwich attacks from other MEV searchers; off-chain venues (CEX hedging); recipient behavior changing in response to observed price impact.
Engine validation: Mayavi’s execution kernel has been validated bit-exact against the on-chain Uniswap V3 Quoter at the pinned block. Every swap our engine produces matches what the official Uniswap simulation contract produces — zero delta. Reproduce locally with mayavi validate.
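The reproducibility claim under "Agent population" can be illustrated with a minimal Python sketch. It is not Mayavi's implementation: the function name sample_urgencies, the uniform [0, 1) range, and the population size of 5 are assumptions chosen for illustration. Only seed=42 comes from the report.

```python
import random


def sample_urgencies(n_agents: int, seed: int = 42) -> list[float]:
    """Draw one urgency value per agent from a private, seeded RNG.

    Hypothetical helper: the uniform [0, 1) distribution is an assumption,
    not Mayavi's actual sampling scheme.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.random() for _ in range(n_agents)]


first = sample_urgencies(5)
second = sample_urgencies(5)
print(first == second)  # True: same seed, same sequence
```

A private random.Random instance, rather than the module-level functions, keeps the draw independent of any other code that touches the global generator. That is one common way to make a run repeatable when the remaining inputs (here, the pinned fork block) are also fixed.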

Reinforcement-learning agents

The agents in this run use the hand-coded / scripted baselines (see the Agent column above). Mayavi’s agents are RL-trainable on the same forked-mainnet stack — mayavi train --env aave|vesting|liquidator produces a PPO policy, and VestingRecipient(policy_path=…) loads one into a scenario. Trained-policy-vs-baseline evaluation results (each on a real forked mainnet, $0 marginal cost):

Aave V3 leveraged borrower: PPO captures most of the analytic optimum at $0 marginal cost (50K timesteps, local GPU). docs/artifacts/aave_ppo_v2_local_2026-05-07.json
Vesting-cliff recipient: Saturated regime. At 0.1 WETH inventory in the mainnet WETH/USDC 0.05% pool, dump_all, twap, and PPO (50K timesteps) all return $256.31656 +/- ~4e-6 (agreement to ~6 decimal places across all 3 strategies, std_reward=0). The pipeline is reproducible at $0; the strategies only become distinguishable with a larger inventory, a shallower pool, or an adversarial multi-agent setting. docs/artifacts/vesting_ppo_v1_local_2026-05-13.json
Aave V3 liquidator: PPO underperformed the scripted close-factor-max heuristic ($392.84 vs $624.42; PPO captured ~63%) in this single-liquidator-vs-one-borrower setup at 50K timesteps. The scripted heuristic is essentially the analytic upper bound here (the only degree of freedom is close-factor timing), so this is the expected single-agent outcome; the deliverable is the reproducible pipeline at $0. Competing-liquidator timing (where RL beats the scripted policy) runs on the same pipeline. docs/artifacts/aave_liquidator_ppo_v1_local_2026-05-13.json
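The ~63% figure quoted for the liquidator follows directly from the two reported rewards. The short Python sketch below shows the arithmetic; the helper name capture_ratio is hypothetical and is not part of Mayavi.

```python
def capture_ratio(policy_reward: float, baseline_reward: float) -> float:
    """Fraction of the baseline reward that the policy achieved."""
    if baseline_reward <= 0:
        raise ValueError("baseline_reward must be positive")
    return policy_reward / baseline_reward


# Rewards reported above: PPO $392.84, scripted heuristic $624.42.
ratio = capture_ratio(392.84, 624.42)
print(f"{ratio:.1%}")  # 62.9%
```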

curve_depeg_demo

No swap or liquidation actions executed
