ENA Vesting Cliff Replay: Real Launch → Synthetic Stress Test
ENA's real launch happened on 2025-04-02. We pin the cliff block, replay the vesting-cliff scenario, and compare simulator outflow against on-chain truth — within an honest 30% tolerance band, not delta-zero. The number that came back was 1.299e19%. Here is why that's not a regression.
Part of the Incident replays series.
Post 11 — EIGEN Season 1 walked the bit-exact replay path: delta == 0, no tolerance. This post walks the other replay shelf — ledger-style replay, where the test compares simulator-observed cumulative outflow against on-chain ground truth within a tolerance band, and the published number can be 1.299 × 10¹⁹ % out of the band and still not be a regression.
That's a weird claim. The rest of this post is the explanation.
The two replay strategies, side by side
| Strategy | Mechanism | Tolerance | When it fits |
|---|---|---|---|
| Quoter bit-exact (Post 11) | Static-call Quoter, execute SwapRouter, assert outputs identical | 0 (delta == 0) | Whenever the protocol exposes a simulation oracle with matching semantics |
| Ledger-style (this post) | Simulate the scenario end-to-end, compare cumulative on-chain outflow during the same block range against the simulator's recorded outflow | 30% acceptance, 10% target | Multi-tx scenarios where no single contract can be Quoter-replayed (vesting cliffs, depeg cascades, real launches) |
The Quoter strategy is stronger but narrower. The ledger strategy is weaker per assertion but covers the multi-tx scenarios that matter most for tokenomics work. ENA is a ledger-style replay because we're not modeling one swap — we're modeling 23 cohort-driven recipients dumping at the cliff, and there's no single contract that can be Quoter'd against the aggregate.
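The two shelves differ mainly in what gets compared. Below is a minimal sketch of both assertion shapes. It uses hypothetical helpers (`Swap`, `assert_bit_exact`, `assert_ledger_within`) rather than the repo's actual test code, and all amounts are raw integer token units:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Swap:
    """One swap's output as observed by one side of a replay (hypothetical record)."""
    tx_hash: str
    amount_out: int  # raw integer token units, no decimal scaling


def assert_bit_exact(quoted: int, executed: int) -> None:
    """Quoter-style check: one swap, zero tolerance (delta == 0)."""
    assert executed == quoted, f"delta {executed - quoted} != 0"


def relative_delta_pct(sim_total: int, real_total: int) -> float:
    """|sim - real| / real, expressed as a percentage."""
    if real_total == 0:
        raise ValueError("real_total is zero; relative delta is undefined")
    return abs(sim_total - real_total) * 100 / real_total


def assert_ledger_within(sim: list[Swap], real: list[Swap], tol_pct: float = 30.0) -> float:
    """Ledger-style check: aggregate each side over the block range, then compare."""
    sim_total = sum(s.amount_out for s in sim)
    real_total = sum(s.amount_out for s in real)
    delta = relative_delta_pct(sim_total, real_total)
    assert delta <= tol_pct, f"delta {delta:.3g}% exceeds {tol_pct}%"
    return delta


if __name__ == "__main__":
    assert_bit_exact(quoted=1_000, executed=1_000)
    sim = [Swap("0xa", 600), Swap("0xb", 500)]  # many simulated swaps...
    real = [Swap("0xc", 1_000)]                 # ...vs whatever the chain shows
    print(assert_ledger_within(sim, real))      # 10.0
```

The bit-exact shape needs a per-swap oracle. The ledger shape only needs both sides summed over the same block range. That is why it reaches multi-tx scenarios, at the cost of a tolerance.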
The setup
`mayavi/scenarios/launch_replay.py` + `launch_replay_ena.yaml`:
- Pinned cliff block: 22,180,000 (ENA's real 2025-04-02 launch block range).
- 23 vesting recipients, deterministic addresses derived via `_address_for(scenario_name="launch_replay_ena", role="recipient", idx)`.
- Each recipient receives a cohort-weighted ENA allocation at step 0 (cliff).
- A `VestingRecipient` agent per recipient runs a TWAP-style dump strategy in subsequent steps.
- The simulator records every swap, accumulating an `ena_amount_in` and `usdc_amount_out` ledger.
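The setup bullets map onto a few small pieces. The sketch below rests on several assumptions. `address_for` is a hypothetical stand-in for `_address_for`, whose real derivation scheme isn't shown here. The cohort weights and the toy price are made up. "TWAP" is taken to mean equal-size slices per step. `Ledger` mirrors the `ena_amount_in` / `usdc_amount_out` totals described above, but it is not the simulator's actual class:

```python
import hashlib
from dataclasses import dataclass

ENA_DECIMALS = 18
USDC_DECIMALS = 6


def address_for(scenario_name: str, role: str, idx: int) -> str:
    """Hypothetical stand-in for `_address_for`: hash a stable seed, keep 20 bytes.

    SHA-256 is used only because it is in the stdlib; the real helper may differ.
    """
    seed = f"{scenario_name}:{role}:{idx}".encode()
    return "0x" + hashlib.sha256(seed).digest()[-20:].hex()


def cohort_allocations(total: int, weights: list[int]) -> list[int]:
    """Split `total` raw units by integer weights; rounding dust goes to the last recipient."""
    weight_sum = sum(weights)
    shares = [total * w // weight_sum for w in weights]
    shares[-1] += total - sum(shares)
    return shares


def twap_slices(amount: int, steps: int) -> list[int]:
    """Equal sell orders over `steps` steps; the first `amount % steps` slices get one extra unit."""
    base, rem = divmod(amount, steps)
    return [base + (1 if i < rem else 0) for i in range(steps)]


@dataclass
class Ledger:
    """Cumulative sell-side totals, in raw units."""
    ena_amount_in: int = 0
    usdc_amount_out: int = 0
    swaps: int = 0

    def record(self, ena_in: int, usdc_out: int) -> None:
        self.ena_amount_in += ena_in
        self.usdc_amount_out += usdc_out
        self.swaps += 1


if __name__ == "__main__":
    recipients = [address_for("launch_replay_ena", "recipient", i) for i in range(23)]
    weights = [3, 2, 1] * 7 + [1, 1]  # made-up cohort weights, one per recipient (23)
    alloc = cohort_allocations(10**6 * 10**ENA_DECIMALS, weights)
    ledger = Ledger()
    for amount in alloc:
        for sell in twap_slices(amount, steps=4):
            # Toy fixed price of 0.5 USDC per ENA, converting 18-decimal ENA to 6-decimal USDC.
            usdc_out = sell * 5 // (10 * 10 ** (ENA_DECIMALS - USDC_DECIMALS))
            ledger.record(sell, usdc_out)
    print(ledger.swaps, ledger.ena_amount_in == sum(alloc))  # 92 True
```

Deterministic addresses and remainder-preserving integer splits keep a replay bit-stable across runs. That matters when the outflow total is the thing being asserted.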
`tests/replay/test_launch_replay.py` runs the scenario end-to-end against forked mainnet at the cliff block (marked `@pytest.mark.fork` and `@pytest.mark.slow`). It then uses `find_swap_in_range` to scan on-chain SwapRouter events in the same block range. The assertion compares the two cumulative outflows.
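Uniswap V3 `Swap` events are emitted by pools, not routers. One plausible way to scan "SwapRouter events" is therefore to filter pool `Swap` logs by their `sender` field. The sketch below follows that idea under stated assumptions: the logs are already decoded, ENA is token0, and the router addresses are placeholders. `scan_router_sells` is hypothetical and is not the signature of `find_swap_in_range`. Note that a sell routed through a different router never reaches the total:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSwap:
    """Subset of a decoded Uniswap V3 pool `Swap` event.

    The pool reports balance deltas from its own point of view: positive means
    the pool received the token, negative means it paid the token out.
    """
    block_number: int
    sender: str   # msg.sender to the pool, i.e. the router that called swap()
    amount0: int  # assumed token0 = ENA in this sketch
    amount1: int  # assumed token1 = USDC in this sketch


def scan_router_sells(events: list[PoolSwap], start_block: int, end_block: int,
                      router: str) -> list[PoolSwap]:
    """Hypothetical scanner: ENA->USDC sells in [start_block, end_block] sent by `router`."""
    return [
        e for e in events
        if start_block <= e.block_number <= end_block
        and e.sender == router
        and e.amount0 > 0 and e.amount1 < 0  # pool took ENA in, paid USDC out
    ]


def cumulative_outflow(sells: list[PoolSwap]) -> tuple[int, int]:
    """Return (ENA sold, USDC received) in raw integer units."""
    return sum(e.amount0 for e in sells), sum(-e.amount1 for e in sells)


if __name__ == "__main__":
    SWAP_ROUTER, UNIVERSAL_ROUTER = "0xSwapRouter", "0xUniversalRouter"  # placeholders
    events = [
        PoolSwap(22_180_001, SWAP_ROUTER, 10**18, -500_000),
        PoolSwap(22_180_002, UNIVERSAL_ROUTER, 2 * 10**18, -1_000_000),  # missed
        PoolSwap(22_180_003, SWAP_ROUTER, -10**18, 600_000),             # a buy, skipped
    ]
    sells = scan_router_sells(events, 22_180_000, 22_180_100, SWAP_ROUTER)
    print(cumulative_outflow(sells))  # (1000000000000000000, 500000)
```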
The number
When Phase 5 finally ran the replay end-to-end and closed the five-month-old docs/validation.md:297 TODO (Sprint N), the recorded delta was:
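```
sim_outflow  = 191,666,666,666,666,671,504,097,280   (≈ 191.7M ENA, 18 decimals)
real_outflow = 1,476,014,298                          (≈ 1,476 USDC, 6 decimals)
delta_pct    = 1.299e19 %
```

The test asserts `delta_pct ≤ 30%`. 1.299 × 10¹⁹ is spectacularly outside that band. And yet the test is failing correctly, in a way that isn't a regression. Why:

1. Unit mismatch. The simulator side of the comparison is an ENA total (18 decimals); the on-chain side is a USDC total (6 decimals). The two numbers are not in the same unit, so the decimals gap alone inflates the raw ratio by a factor of 10¹², before any engine behaviour enters.
2. Scanner coverage. `find_swap_in_range` only sees swaps routed through SwapRouter. Universal Router and aggregator routes (1inch, CowSwap, 0x) are not decoded, so any selling that went through them is missing from the on-chain total.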
The 30% tolerance band was sized for a world where (1) was fixed and (2) was at least partial. We hit neither, so the delta is dominated by structural infrastructure gaps, not engine correctness.
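The published number falls straight out of the two raw integers. Here is a minimal check. It assumes the test computes the delta as |sim - real| / real × 100; that formula reproduces 1.299e19 %, but the actual test code isn't shown here. The check also shows how much of the raw ratio is pure decimals:

```python
from decimal import Decimal

SIM_OUTFLOW_RAW = 191_666_666_666_666_671_504_097_280  # ENA, 18 decimals
REAL_OUTFLOW_RAW = 1_476_014_298                       # USDC, 6 decimals


def delta_pct(sim: int, real: int) -> float:
    """Relative delta |sim - real| / real as a percentage (assumed formula)."""
    return abs(sim - real) * 100 / real


def to_units(raw: int, decimals: int) -> Decimal:
    """Scale a raw integer token amount to display units without float rounding."""
    return Decimal(raw) / (Decimal(10) ** decimals)


raw_delta = delta_pct(SIM_OUTFLOW_RAW, REAL_OUTFLOW_RAW)
print(f"{raw_delta:.3e}")  # 1.299e+19

# Normalizing decimals removes a factor of 10**12 from the ratio...
ena = to_units(SIM_OUTFLOW_RAW, 18)   # ~191.7M ENA
usdc = to_units(REAL_OUTFLOW_RAW, 6)  # ~1,476 USDC
# ...but ENA and USDC are still different tokens, so this ratio (~1.3e5) is not
# a meaningful delta either. Only a USDC-vs-USDC comparison would be.
display_ratio = ena / usdc
```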
Why the test isn't tightened
This is the crucial discipline:
Tolerance posture (CRITICAL): target delta is ≤ 10%; acceptance is ≤ 30%; above 30% the assertion is NOT tightened, and the gap is documented in `docs/validation.md` as Phase 3 seed work. This is a sim-to-real ledger test, not a "make the numbers match" exercise.
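The posture reduces to a three-way classification with fixed thresholds. Below is a sketch using hypothetical names (`classify`, `assert_posture`, `Verdict`). The point is that the 30% constant never moves in response to a failing run:

```python
from enum import Enum

TARGET_PCT = 10.0  # target band
ACCEPT_PCT = 30.0  # acceptance band; a constant, never widened to make a run pass


class Verdict(Enum):
    ON_TARGET = "within target"
    ACCEPTED = "within acceptance"
    DOCUMENTED_GAP = "outside band: record in docs/validation.md, leave test failing"


def classify(delta_pct: float) -> Verdict:
    """Map a ledger delta onto the tolerance posture."""
    if delta_pct <= TARGET_PCT:
        return Verdict.ON_TARGET
    if delta_pct <= ACCEPT_PCT:
        return Verdict.ACCEPTED
    return Verdict.DOCUMENTED_GAP


def assert_posture(delta_pct: float) -> Verdict:
    """Pass for target/acceptance; fail loudly, with the same threshold, otherwise."""
    verdict = classify(delta_pct)
    assert verdict is not Verdict.DOCUMENTED_GAP, (
        f"delta {delta_pct:.3g}% > {ACCEPT_PCT}%: {verdict.value}"
    )
    return verdict
```

With the ENA readout, `classify(1.299e19)` lands in `DOCUMENTED_GAP`: the test fails, and the fix happens in the ledger and the scanner, not in `ACCEPT_PCT`.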
The pattern the discipline rejects is: a test is failing → loosen the tolerance until it goes green → ship. That's how validation suites become decorations. The honest path is: a test is failing → understand why → document the failure mode in the ledger → schedule the fix as a real work item → leave the test as-is.
validation.md:297's five-month staleness was a different problem: not "test was wrong," but "we hadn't run the test locally yet because every host had been GHA-quota-exhausted." Sprint N's swab finally ran it locally, captured the actual number, and replaced the TBD with the tabulated readout + two-bullet explanation. That entry now reads as a known structural gap, not a TODO.
Replay-v2: what would close the gap
Two improvements would convert this from "structural gap documented" to "delta within band":
- Unit-aware ledger comparison. Compare `sim_amount_out` (USDC, 6 decimals) against `real_amount_out` (USDC, 6 decimals). The simulator already records the output side of each swap; the scanner needs to switch from summing `amount_in` to summing `amount_out`. ~50-line PR, no new infrastructure.
- Universal-Router-aware scanner. Walk the Universal Router's `execute(...)` calldata, decode the inline operation list (V3 swaps appear as opcodes inside that calldata), and accumulate. Plus 1inch / CowSwap / 0x adapter parsing. This is the larger PR: probably a week of work, including a new test suite for each aggregator's calldata shape.
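The Universal Router's `execute(commands, inputs, deadline)` (there is also an overload without `deadline`) packs one command byte per operation, with `commands[i]` describing `inputs[i]`. The sketch below covers only the first step of a Universal-Router-aware scanner: picking out which inputs are swaps. The constants follow Uniswap's `Commands.sol` as we understand it; verify them against the router version deployed in the block range you scan. `swap_ops` is a hypothetical name. ABI-decoding each selected input and the aggregator adapters are out of scope here:

```python
# Subset of command-type constants from Uniswap's Universal Router Commands.sol.
# Assumption: check these against the deployed router version before relying on them.
FLAG_ALLOW_REVERT = 0x80  # high bit: the router may continue if this command reverts
COMMAND_TYPE_MASK = 0x3F  # low six bits select the command

SWAP_COMMANDS = {
    0x00: "V3_SWAP_EXACT_IN",
    0x01: "V3_SWAP_EXACT_OUT",
    0x08: "V2_SWAP_EXACT_IN",
    0x09: "V2_SWAP_EXACT_OUT",
}


def swap_ops(commands: bytes) -> list[tuple[int, str, bool]]:
    """Return (index into `inputs`, command name, allow_revert) for each swap command."""
    ops = []
    for i, byte in enumerate(commands):  # iterating bytes yields ints
        kind = byte & COMMAND_TYPE_MASK
        if kind in SWAP_COMMANDS:
            ops.append((i, SWAP_COMMANDS[kind], bool(byte & FLAG_ALLOW_REVERT)))
    return ops


if __name__ == "__main__":
    # e.g. WRAP_ETH (0x0b), V3 exact-in swap, UNWRAP_WETH (0x0c)
    print(swap_ops(bytes.fromhex("0b000c")))  # [(1, 'V3_SWAP_EXACT_IN', False)]
```

Masking before the lookup matters: a swap sent with the allow-revert flag set arrives as `0x80` rather than `0x00` and would otherwise be silently skipped.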
Both are queued behind the post-Phase-5 outreach surface. The mainnet ENA replay bundle is committed today as evidence the scenario runs end-to-end; the comparative band is replay-v2 work, deliberately separate.
The cliff-collision sweep
A second related bundle ships in Phase 5: a synthetic cliff-collision scenario that doesn't try to replay a real launch but explores the parameter space of what happens when two vesting cliffs land at the same block.
This is the "what if you stack two unlocks?" scenario that tokenomics teams want to understand before deciding their unlock schedule. The simulator doesn't claim to predict the realized price impact: the cliff-collision bundle has no on-chain ground truth, because no real launch had these exact cohort weights at this exact block. It claims only to reproduce the swap mechanics end-to-end given the cohort assumptions. The KPI tiles (peak step, realized USDC, swap count) are the deliverable.
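The three KPI tiles are simple aggregations over the simulator's swap records. The sketch below assumes a hypothetical `SimSwap` record and defines "peak step" as the step with the most USDC realized; the bundle's own definition may differ. `stack_schedules` shows what "two cliffs at the same block" means for the unlock input:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SimSwap:
    step: int
    ena_in: int    # raw units, 18 decimals
    usdc_out: int  # raw units, 6 decimals


def stack_schedules(*schedules: dict[int, int]) -> dict[int, int]:
    """Sum per-step unlock amounts; two cliffs on the same step add together."""
    total: dict[int, int] = defaultdict(int)
    for schedule in schedules:
        for step, amount in schedule.items():
            total[step] += amount
    return dict(sorted(total.items()))


def kpi_tiles(swaps: list[SimSwap]) -> dict[str, Optional[int]]:
    """Peak step (most USDC realized, earliest on ties), total realized USDC, swap count."""
    per_step: dict[int, int] = defaultdict(int)
    for s in swaps:
        per_step[s.step] += s.usdc_out
    peak = max(per_step.values(), default=0)
    peak_step = min((step for step, v in per_step.items() if v == peak), default=None)
    return {
        "peak_step": peak_step,
        "realized_usdc": sum(s.usdc_out for s in swaps),
        "swap_count": len(swaps),
    }


if __name__ == "__main__":
    unlocks = stack_schedules({0: 5, 10: 1}, {0: 3})  # two cohorts, cliffs collide at step 0
    print(unlocks)  # {0: 8, 10: 1}
```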
Where this leaves replays
Two named-incident replay scenarios committed today (EIGEN Quoter-bit-exact + ENA ledger-style), plus the depeg-cascade scenarios from Post 7, plus the cliff-collision synthetic — that's the credibility infrastructure for "this engine simulates real on-chain events." The remaining work is:
- Quoter-bit-exact for sell-side EIGEN (5 minutes).
- Quoter-bit-exact at other pool tiers as liquidity allows (5 minutes per).
- Replay-v2 (unit fix + Universal Router scanner) for ledger-style tests (~1 week).
- A named-tx replay path that walks individual user transactions (Phase 6+).
Next theme: Platform & Pipeline. Post 13 walks the 21-bundle multichain matrix as a pipeline showcase; Post 14 explains why determinism falls out of the engine design; Post 15 walks the Modal + Vercel + DuckDB deployment that ties everything to the dashboard. After that, the series closes.