Experiment 05 · evolution-strategies coexistence
ES coexistence ladder
This experiment evolves the learned skill-selector with evolution strategies (ES) alongside the
gradient-trained executor (MERL-style coexistence: disjoint parameters, shared team return,
separated timescales). Each rung of the scale ladder 16²/4 → 24²/6 → 32²/10 is warm-started from the rung below.
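The coexistence pattern described above can be sketched on a toy problem. This is an illustrative sketch, not the experiment's code: `team_return`, `executor_grad`, `es_step`, and `coexist` are hypothetical names, the quadratic objective stands in for a real rollout, and the population size, step sizes, and parameter shapes are arbitrary assumptions. It shows only the three structural ingredients: the selector and executor hold disjoint parameter vectors, both are scored by the same team return, and the executor takes several gradient steps per ES generation.

```python
import numpy as np

def team_return(selector, executor):
    # Hypothetical stand-in for a rollout: a smooth toy objective, higher is better.
    # Both parameter groups are scored by this single shared return.
    return -np.sum((selector - 1.0) ** 2) - np.sum((executor + 0.5) ** 2)

def executor_grad(selector, executor):
    # Analytic gradient of the toy return with respect to the executor only.
    # The selector is accepted for symmetry but unused in this separable toy.
    return -2.0 * (executor + 0.5)

def es_step(selector, executor, rng, pop=16, sigma=0.1, lr=0.05):
    # Antithetic ES gradient estimate for the selector; the executor is held fixed.
    eps = rng.standard_normal((pop, selector.size))
    r_plus = np.array([team_return(selector + sigma * e, executor) for e in eps])
    r_minus = np.array([team_return(selector - sigma * e, executor) for e in eps])
    grad_est = ((r_plus - r_minus)[:, None] * eps).mean(axis=0) / (2.0 * sigma)
    return selector + lr * grad_est

def coexist(generations=200, inner_steps=5, seed=0):
    rng = np.random.default_rng(seed)
    selector = np.zeros(4)   # evolved by ES (slow timescale)
    executor = np.zeros(3)   # trained by gradient ascent (fast timescale)
    history = []
    for _ in range(generations):
        # Fast timescale: several gradient steps on the executor.
        for _ in range(inner_steps):
            executor = executor + 0.1 * executor_grad(selector, executor)
        # Slow timescale: one ES generation on the selector.
        selector = es_step(selector, executor, rng)
        history.append(team_return(selector, executor))
    return selector, executor, history

if __name__ == "__main__":
    selector, executor, history = coexist()
    print("first return:", history[0], "last return:", history[-1])
```

The toy objective is separable, so the two learners cannot interfere with each other here. In the actual setting the selector's choices change what the executor experiences, which is where the separated timescales matter.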
Key finding
The ES-evolved selector and the gradient executor do coexist and learn at 16², but
the approach does not climb the scale ladder: even with warm-starting from the rung below, coverage
falls off steeply at 24² and again at 32². ES coexistence is viable in the small regime but loses
ground as the world grows.