Experiment 01 · open (no obstacles)

Open floor — method ladder

What it tests: races the candidate architectures against each other on empty grids (16²/4, 24²/6, 32²/10): a homogeneous shared policy, a per-agent identity residual, hardcoded explorer/relay roles, a learned skill-selector (with and without a congestion price), a scale-curriculum arm, and a decentralized-critic ablation. (For the controlled from-scratch vs. warm-started transfer A/B, see the dedicated Warm-start tab.)
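
The run matrix described above can be sketched as a small enumeration. This is only an illustration under stated assumptions: it reads "16²/4" as a 16×16 grid with 4 agents, it assumes every arm is run at every scale, and the arm names are hypothetical labels, not identifiers from the actual codebase.

```python
from itertools import product

# Assumption: "16²/4" means grid side 16 with 4 agents; the source does not spell this out.
SCALES = [(16, 4), (24, 6), (32, 10)]

# Hypothetical labels for the architectures listed in the description.
ARMS = [
    "homogeneous_shared_policy",
    "identity_residual",
    "hardcoded_explorer_relay",
    "skill_selector",
    "skill_selector_congestion_price",
    "scale_curriculum",
    "decentralized_critic_ablation",
]


def experiment_grid():
    """Return one run spec per (arm, scale) pair, assuming a full cross product."""
    return [
        {"arm": arm, "grid_side": side, "n_agents": agents, "cells": side * side}
        for arm, (side, agents) in product(ARMS, SCALES)
    ]


runs = experiment_grid()
```

Under these assumptions the ladder compares seven arms at three scales; if some arms (for example the scale curriculum) only run at a subset of scales, the cross product would need to be filtered accordingly.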

Key finding: Hardcoded explorer/relay roles sit at the top of the ladder at every scale, and adding a congestion price to the learned skill-selector consistently makes it worse, not better. Removing the central critic (the decentralized-critic ablation) collapses coverage to near zero, which indicates that CTDE (centralized training with decentralized execution) with a central critic is doing real work here.