Experiment 05 · evolution-strategies coexistence

ES coexistence ladder

Evolves the learned skill-selector with evolution strategies alongside the gradient-trained executor (MERL-style coexistence: disjoint parameters, shared team return, separated timescales), warm-started up the scale ladder 16²/4 → 24²/6 → 32²/10.
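As a rough illustration of the coexistence pattern described above, here is a minimal sketch rather than the experiment's actual code. It assumes a toy quadratic `team_return` in place of the real environment, an antithetic-sampling ES update (the source does not say which ES variant was used), and a plain gradient-ascent executor. All names (`team_return`, `executor_grad`, `es_step`, `run`) and all hyperparameters are hypothetical. It shows the three ingredients only: disjoint parameter vectors, one shared scalar return, and several executor steps per ES generation.

```python
import numpy as np

def team_return(theta, phi):
    # Hypothetical shared team return (toy stand-in for coverage).
    # Both parameter sets are scored by the same scalar.
    return -np.sum((theta - phi) ** 2) - np.sum((phi - 2.0) ** 2)

def executor_grad(theta, phi):
    # Analytic gradient of team_return with respect to phi only.
    return 2.0 * (theta - phi) - 2.0 * (phi - 2.0)

def es_step(theta, phi, rng, pop=16, sigma=0.1, lr=0.05):
    # Antithetic ES: perturb only the selector parameters theta,
    # holding the executor parameters phi fixed during evaluation.
    eps = rng.standard_normal((pop, theta.size))
    r_plus = np.array([team_return(theta + sigma * e, phi) for e in eps])
    r_minus = np.array([team_return(theta - sigma * e, phi) for e in eps])
    grad_est = ((r_plus - r_minus)[:, None] * eps).mean(axis=0) / (2.0 * sigma)
    return theta + lr * grad_est

def run(generations=200, inner_steps=5, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(4)  # selector parameters, updated by ES
    phi = np.zeros(4)    # executor parameters, updated by gradient (disjoint)
    start = team_return(theta, phi)
    for _ in range(generations):
        # Fast timescale: several gradient-ascent steps on the executor.
        for _ in range(inner_steps):
            phi = phi + 0.05 * executor_grad(theta, phi)
        # Slow timescale: one ES generation on the selector.
        theta = es_step(theta, phi, rng)
    return start, team_return(theta, phi)

if __name__ == "__main__":
    start, end = run()
    print(f"team return: {start:.3f} -> {end:.6f}")
```

In this toy setting both learners improve the same return without sharing any parameters. It says nothing about how the approach behaves as the world grows, which is the question the ladder tests.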

Key finding: The ES-evolved selector and the gradient executor do coexist and learn at 16², but the approach does not climb the scale ladder: even with warm-starting from the rung below, coverage falls off steeply at 24² and again at 32². ES coexistence is viable in the small regime but loses ground as the world grows.