Forked Ethereum mainnet at block 19,000,000; 8 simulation steps (1.6 minutes of mainnet block time); seed 42.
Borrowers: 5
Rebalance actions: 5 (successful repays/supplies)
Horizon: 8 steps
Wall-clock: 4.5s
Borrower scenario
This run opened 5 borrower positions against a deployed lending Pool (Aave V3 / SparkLend / Compound V3) and observed 5 successful rebalance actions over the 8-step horizon. Under nominal conditions the health factor (HF) drifts only through interest accrual, so most steps are no-ops by design.
Per-borrower HF / debt / collateral trajectories are not currently captured in data/runs/runs.duckdb; the actions above are the available signal until a per-step borrower-state hook lands.
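To make the drift mechanism concrete, here is a minimal sketch of how a health factor decays when only debt interest accrues. It assumes the Aave V3 definition of HF (collateral value times liquidation threshold, divided by debt value). The position size, the 0.825 liquidation threshold, the 5% borrow APR, and the 12-second step length are illustrative assumptions, not values from this run, and the accrual is simple linear interest rather than the protocol's index math.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def health_factor(collateral_usd, liquidation_threshold, debt_usd):
    """Aave-style HF: threshold-weighted collateral over debt."""
    if debt_usd == 0:
        return float("inf")
    return collateral_usd * liquidation_threshold / debt_usd


def accrue_debt(debt_usd, borrow_apr, seconds):
    """Simple linear interest over `seconds` (an approximation)."""
    return debt_usd * (1.0 + borrow_apr * seconds / SECONDS_PER_YEAR)


def hf_trajectory(collateral_usd, liquidation_threshold, debt_usd,
                  borrow_apr, steps, step_seconds=12):
    """HF at step 0 and after each step, with prices held constant."""
    trajectory = [health_factor(collateral_usd, liquidation_threshold, debt_usd)]
    for _ in range(steps):
        debt_usd = accrue_debt(debt_usd, borrow_apr, step_seconds)
        trajectory.append(
            health_factor(collateral_usd, liquidation_threshold, debt_usd)
        )
    return trajectory


# Hypothetical position: $10,000 collateral, $5,000 debt, 8 steps.
traj = hf_trajectory(10_000.0, 0.825, 5_000.0, 0.05, steps=8)
print(f"start HF = {traj[0]:.9f}, end HF = {traj[-1]:.9f}")
```

Under these assumptions the HF moves by well under one part in a million over 8 steps, which is consistent with most steps needing no rebalance.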
Methodology
Execution: All token operations execute against actual Solidity bytecode
deployed at the pinned mainnet block via revm (Rust EVM). No Python re-implementation
of pool math — slippage and routing match what would have happened on-chain at that block.
Fork pinning: Mainnet state is fetched once at block
19,000,000. The fork captures token balances, pool reserves, oracle
states, and contract code as they were at that block.
Agent population: 0 heterogeneous recipients with urgency
parameters sampled via seeded RNG (seed=42). Same seed + same fork block produce
byte-identical results.
Pool: WETH/USDC 0.05% pool (Uniswap V3, fee tier 500, i.e. 5 bps).
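Uniswap V3 encodes a pool's fee as an integer in hundredths of a basis point, which makes unit mix-ups easy. The helpers below are an illustrative sketch (the function names are hypothetical and not part of Mayavi) that convert the raw fee value into basis points and percent.

```python
def fee_to_bps(fee: int) -> float:
    """Raw Uniswap V3 fee (hundredths of a bip) -> basis points."""
    return fee / 100


def fee_to_percent(fee: int) -> float:
    """Raw Uniswap V3 fee (hundredths of a bip) -> percent."""
    return fee / 10_000


# The commonly deployed Uniswap V3 fee tiers.
for fee in (100, 500, 3000, 10000):
    print(f"fee={fee}: {fee_to_bps(fee)} bps = {fee_to_percent(fee)}%")
```

So the fee value 500 corresponds to 5 bps, or 0.05%, for the WETH/USDC pool used here.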
What is NOT modeled: external arbitrageurs restoring price between
recipient sells; cross-pool routing through aggregators (1inch / CoW); sandwich attacks
from other MEV searchers; off-chain venues (CEX hedging); recipient behavior changing
in response to observed price impact.
Engine validation: Mayavi’s execution kernel has been validated
bit-exact against the on-chain Uniswap V3 Quoter at the pinned block.
Every swap our engine produces matches what the official Uniswap simulation contract
produces — zero delta. Reproduce locally with mayavi validate.
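The reproducibility claim in the methodology (same seed plus same fork block gives identical results) rests on every stochastic input flowing from one seeded generator. The sketch below illustrates that pattern in isolation using Python's standard library. The function names and the uniform "urgency" draw are assumptions for illustration; they do not describe Mayavi's actual sampling code.

```python
import hashlib
import json
import random


def sample_urgencies(seed: int, n_agents: int) -> list[float]:
    """Draw one parameter per agent from a private, seeded RNG."""
    rng = random.Random(seed)  # no dependence on global random state
    return [rng.random() for _ in range(n_agents)]


def fingerprint(values: list[float]) -> str:
    """Hash a serialized result so two runs can be compared byte for byte."""
    payload = json.dumps(values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


run_a = fingerprint(sample_urgencies(seed=42, n_agents=5))
run_b = fingerprint(sample_urgencies(seed=42, n_agents=5))
run_c = fingerprint(sample_urgencies(seed=43, n_agents=5))

print("same seed matches:", run_a == run_b)
print("different seed matches:", run_a == run_c)
```

Comparing a hash of the serialized output, rather than eyeballing a few numbers, is one simple way to check a byte-identical claim across machines.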
Reinforcement-learning agents
The agents in this run use the hand-coded / scripted baselines (see the
Agent column above). Mayavi’s agents are RL-trainable on the same
forked-mainnet stack — mayavi train --env aave|vesting|liquidator produces
a PPO policy, and VestingRecipient(policy_path=…) loads one into a scenario.
Trained-policy-vs-baseline evaluation results (each on a real forked mainnet, $0 marginal cost):
Aave V3 leveraged borrower: PPO captures most of the analytic optimum at $0 marginal cost (50K timesteps, local GPU). docs/artifacts/aave_ppo_v2_local_2026-05-07.json
Vesting-cliff recipient: saturated regime. At 0.1 WETH inventory in the mainnet WETH/USDC 0.05% pool, dump_all / twap / PPO all return $256.31656 ± ~4e-6 (agreement to ~6 decimal places across all 3 strategies, std_reward=0). The pipeline is reproducible at $0; the policies only become distinguishable at larger inventory, in a shallower pool, or in an adversarial multi-agent setting (50K timesteps). docs/artifacts/vesting_ppo_v1_local_2026-05-13.json
Aave V3 liquidator: PPO underperformed the scripted close-factor-max heuristic ($392.84 vs $624.42; PPO captured ~63%) on this single-liquidator-vs-one-borrower setup at 50K timesteps. The scripted heuristic is essentially the analytic upper bound here (the only degree of freedom is close-factor timing), so this is the expected single-agent outcome, and the deliverable is the reproducible pipeline at $0. Competing-liquidator timing (where RL beats the scripted policy) runs on the same pipeline. docs/artifacts/aave_liquidator_ppo_v1_local_2026-05-13.json
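The ~63% capture figure in the liquidator result is the ratio of the two reported rewards. A quick check, using only the numbers quoted above (the variable names are illustrative):

```python
ppo_reward = 392.84        # PPO policy, USD
scripted_reward = 624.42   # scripted close-factor-max heuristic, USD

# Fraction of the scripted heuristic's reward that PPO captured.
capture = ppo_reward / scripted_reward
print(f"PPO captured {capture:.1%} of the scripted reward")
```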