Mayavi Report — cliff_collision_team_plus

Cohort breakdown

Cohort	Members	Swaps	WETH sold	USDC realized	Effective price (USDC/WETH)
advisor	8	96	4.0000	10,248.37	2,562.09
seed	20	300	20.0000	51,248.66	2,562.43
team	5	40	25.0000	64,069.81	2,562.79
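
The Effective price column appears to be USDC realized divided by WETH sold, rounded to two decimals. The following is a minimal sketch that recomputes it from the cohort totals above; the function name `effective_price` is illustrative and not part of Mayavi.

```python
from decimal import Decimal, ROUND_HALF_UP


def effective_price(usdc_realized: str, weth_sold: str) -> Decimal:
    """USDC received per WETH sold, rounded to two decimals."""
    price = Decimal(usdc_realized) / Decimal(weth_sold)
    return price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (USDC realized, WETH sold) copied from the cohort table above.
cohorts = {
    "advisor": ("10248.37", "4.0000"),
    "seed": ("51248.66", "20.0000"),
    "team": ("64069.81", "25.0000"),
}

prices = {name: effective_price(usdc, weth) for name, (usdc, weth) in cohorts.items()}
for name, price in prices.items():
    print(f"{name}: {price}")
```

Decimal arithmetic is used here so that the two-decimal rounding is not affected by binary floating-point representation.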

Per-recipient breakdown

Agent	WETH sold	USDC realized	Effective price (USDC/WETH)	Swaps
`advisor-0` Hand-coded heuristic	0.5000	1,281.04	2,562.08	12
`advisor-1` Hand-coded heuristic	0.5000	1,281.05	2,562.10	12
`advisor-2` Hand-coded heuristic	0.5000	1,281.05	2,562.10	12
`advisor-3` Hand-coded heuristic	0.5000	1,281.06	2,562.11	12
`advisor-4` Hand-coded heuristic	0.5000	1,281.05	2,562.10	12
`advisor-5` Hand-coded heuristic	0.5000	1,281.04	2,562.08	12
`advisor-6` Hand-coded heuristic	0.5000	1,281.05	2,562.09	12
`advisor-7` Hand-coded heuristic	0.5000	1,281.04	2,562.07	12
`seed-0` Hand-coded heuristic	1.0000	2,562.39	2,562.39	15
`seed-1` Hand-coded heuristic	1.0000	2,562.37	2,562.37	15
`seed-10` Hand-coded heuristic	1.0000	2,562.47	2,562.47	15
`seed-11` Hand-coded heuristic	1.0000	2,562.49	2,562.49	15
`seed-12` Hand-coded heuristic	1.0000	2,562.37	2,562.37	15
`seed-13` Hand-coded heuristic	1.0000	2,562.45	2,562.45	15
`seed-14` Hand-coded heuristic	1.0000	2,562.50	2,562.50	15
`seed-15` Hand-coded heuristic	1.0000	2,562.39	2,562.39	15
`seed-16` Hand-coded heuristic	1.0000	2,562.40	2,562.40	15
`seed-17` Hand-coded heuristic	1.0000	2,562.44	2,562.44	15
`seed-18` Hand-coded heuristic	1.0000	2,562.44	2,562.44	15
`seed-19` Hand-coded heuristic	1.0000	2,562.34	2,562.34	15
`seed-2` Hand-coded heuristic	1.0000	2,562.46	2,562.46	15
`seed-3` Hand-coded heuristic	1.0000	2,562.52	2,562.52	15
`seed-4` Hand-coded heuristic	1.0000	2,562.45	2,562.45	15
`seed-5` Hand-coded heuristic	1.0000	2,562.46	2,562.46	15
`seed-6` Hand-coded heuristic	1.0000	2,562.37	2,562.37	15
`seed-7` Hand-coded heuristic	1.0000	2,562.49	2,562.49	15
`seed-8` Hand-coded heuristic	1.0000	2,562.42	2,562.42	15
`seed-9` Hand-coded heuristic	1.0000	2,562.44	2,562.44	15
`team-0` Hand-coded heuristic	5.0000	12,814.13	2,562.83	8
`team-1` Hand-coded heuristic	5.0000	12,814.08	2,562.82	8
`team-2` Hand-coded heuristic	5.0000	12,814.10	2,562.82	8
`team-3` Hand-coded heuristic	5.0000	12,813.87	2,562.77	8
`team-4` Hand-coded heuristic	5.0000	12,813.63	2,562.73	8
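
As a consistency check, the per-recipient rows can be rolled up and compared with the cohort totals. This sketch does so for the team cohort, using only figures copied from the tables in this report; the variable names are illustrative.

```python
from decimal import Decimal

# (WETH sold, USDC realized, swaps) for team-0 .. team-4, copied from the table above.
team_rows = [
    ("5.0000", "12814.13", 8),
    ("5.0000", "12814.08", 8),
    ("5.0000", "12814.10", 8),
    ("5.0000", "12813.87", 8),
    ("5.0000", "12813.63", 8),
]

weth_total = sum(Decimal(weth) for weth, _, _ in team_rows)
usdc_total = sum(Decimal(usdc) for _, usdc, _ in team_rows)
swap_total = sum(swaps for _, _, swaps in team_rows)

print(weth_total, usdc_total, swap_total)
```

The team rows reproduce the cohort row (25 WETH, 64,069.81 USDC, 40 swaps). The displayed advisor rows sum to 10,248.38 USDC, one cent above the cohort figure, which is consistent with each row having been rounded independently.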

Methodology

Execution: All token operations execute against the actual Solidity bytecode deployed at the pinned mainnet block, run through revm (a Rust EVM). Pool math is not re-implemented in Python, so slippage and routing match what would have happened on-chain at that block.
Fork pinning: Mainnet state is fetched once at block 19,000,000. The fork captures token balances, pool reserves, oracle states, and contract code as they were at that block.
Agent population: 33 heterogeneous recipients with urgency parameters sampled via seeded RNG (seed=42). The same seed and the same fork block produce byte-identical results.
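
The reproducibility claim rests on seeding a dedicated random generator. Below is a minimal sketch of that pattern. It assumes, purely for illustration, that each urgency parameter is a uniform draw in [0, 1); the report does not state the actual distribution, and `sample_urgencies` is a hypothetical helper, not a Mayavi function.

```python
import random


def sample_urgencies(n_recipients: int, seed: int) -> list[float]:
    """Draw one urgency value per recipient from a private, seeded generator."""
    rng = random.Random(seed)  # private generator, independent of global RNG state
    # Assumption: uniform draws in [0, 1); the real distribution is not documented here.
    return [rng.random() for _ in range(n_recipients)]


run_a = sample_urgencies(33, seed=42)
run_b = sample_urgencies(33, seed=42)
run_c = sample_urgencies(33, seed=43)

print(run_a == run_b)  # same seed, same sequence
print(run_a == run_c)  # different seed, different sequence
```

Using a private `random.Random` instance rather than the module-level functions keeps the sequence unaffected by any other code that draws random numbers in the same process.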
Pool: WETH/USDC 0.05% pool (Uniswap V3, fee tier 500, i.e. 5 bps).
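
Uniswap V3 expresses pool fees in hundredths of a basis point, so a fee tier of 500 corresponds to 5 bps, or 0.05%. The sketch below shows the unit conversion. The helper names are illustrative, and `fee_on_input` is only an approximation that ignores the contract's exact rounding.

```python
FEE_DENOMINATOR = 1_000_000  # Uniswap V3 fee units: 1 unit = 0.0001%


def fee_tier_to_bps(fee_tier: int) -> float:
    """Convert a fee tier (hundredths of a bip) to basis points."""
    return fee_tier / 100


def fee_tier_to_percent(fee_tier: int) -> float:
    """Convert a fee tier (hundredths of a bip) to percent."""
    return fee_tier / 10_000


def fee_on_input(amount_in: int, fee_tier: int) -> int:
    """Approximate fee charged on an input amount, in the token's smallest unit."""
    return amount_in * fee_tier // FEE_DENOMINATOR


one_weth = 10**18  # 1 WETH in wei
print(fee_tier_to_bps(500), fee_tier_to_percent(500), fee_on_input(one_weth, 500))
```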
What is NOT modeled: external arbitrageurs restoring price between recipient sells; cross-pool routing through aggregators (1inch / CoW); sandwich attacks from other MEV searchers; off-chain venues (CEX hedging); recipient behavior changing in response to observed price impact.
Engine validation: Mayavi’s execution kernel has been validated bit-exact against the on-chain Uniswap V3 Quoter at the pinned block. Every swap our engine produces matches the output of the official Uniswap simulation contract, with zero delta. Reproduce locally with `mayavi validate`.

Reinforcement-learning agents

The agents in this run use the hand-coded (scripted) baselines; see the Agent column above. Mayavi’s agents are RL-trainable on the same forked-mainnet stack: `mayavi train --env aave|vesting|liquidator` produces a PPO policy, and `VestingRecipient(policy_path=…)` loads one into a scenario. Trained-policy-vs-baseline evaluation results (each run on a real forked mainnet at $0 marginal cost):

Aave V3 leveraged borrower: PPO captures most of the analytic optimum at $0 marginal cost (50K timesteps, local GPU). docs/artifacts/aave_ppo_v2_local_2026-05-07.json
Vesting-cliff recipient: saturated regime (50K timesteps). At 0.1 WETH inventory in the mainnet WETH/USDC 0.05% pool, dump_all, twap, and PPO all return $256.31656 ± ~4e-6 (agreement to about six decimal places across all three strategies, std_reward=0). The pipeline is reproducible at $0; the policies only become distinguishable with larger inventory, a shallower pool, or an adversarial multi-agent setting. docs/artifacts/vesting_ppo_v1_local_2026-05-13.json
Aave V3 liquidator: PPO underperformed the scripted close-factor-max heuristic ($392.84 vs $624.42; PPO captured ~63%) on this single-liquidator-vs-one-borrower setup at 50K timesteps. The scripted heuristic is essentially the analytic upper bound here, because the only degree of freedom is close-factor timing, so this is the expected single-agent outcome and the deliverable is the reproducible pipeline at $0. Competing-liquidator timing (where RL beats the scripted policy) runs on the same pipeline. docs/artifacts/aave_liquidator_ppo_v1_local_2026-05-13.json
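
The "~63%" figure in the liquidator result is the ratio of the PPO reward to the scripted-heuristic reward. A short sketch of that arithmetic, using only the two dollar amounts quoted above (`capture_ratio` is an illustrative name):

```python
def capture_ratio(policy_reward: float, baseline_reward: float) -> float:
    """Fraction of the baseline reward that the trained policy achieved."""
    return policy_reward / baseline_reward


ppo_reward = 392.84       # PPO liquidator, from the result above
scripted_reward = 624.42  # scripted close-factor-max heuristic, from the result above

ratio = capture_ratio(ppo_reward, scripted_reward)
print(f"PPO captured {ratio:.1%} of the scripted heuristic's reward")
```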

Chart panels not reproduced in this text (cliff_collision_team_plus_seed): Sell pressure over time; Realized USDC per step; Effective realized price; Pool tick (WETH/USDC 5bps).