Experiment 04 · dense obstacle worlds
Crowded worlds
Tests generalization to dense clutter, pillar fields, and mixed obstacle layouts.
Three training regimes are contrasted: zero-shot transfer from open-floor training,
policies trained on obstacles (baseline vs. up-weighted reward), and a native
crowded-mix training set. Explorer/relay roles are compared against the homogeneous policy.
Key finding
On the matching map, policies trained on obstacles with the up-weighted coverage reward
generally outperform zero-shot transfer from open-floor training, and the up-weighted reward
beats the baseline reward. The open-floor backbone still transfers non-trivially zero-shot, but
coverage drops sharply as the world scales from 24² to 32² and clutter density rises from light to heavy.
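
To make the scale effect concrete, here is a minimal sketch of how a coverage fraction could be computed on a square grid world. It rests on assumptions not stated in this summary: that the worlds are grids of cells, that coverage means the fraction of free (non-obstacle) cells visited at least once, and that clutter can be approximated by placing obstacles independently per cell. The names `make_world` and `coverage_fraction` and the clutter values are hypothetical and are not taken from the experiment code.

```python
import random


def make_world(side, clutter, seed=0):
    """Return the set of obstacle cells in a side x side grid.

    Assumption: each cell is an obstacle independently with probability
    `clutter`. The real layouts (pillar fields, mixed) are likely structured.
    """
    rng = random.Random(seed)
    return {
        (row, col)
        for row in range(side)
        for col in range(side)
        if rng.random() < clutter
    }


def coverage_fraction(visited, obstacles, side):
    """Fraction of free cells that appear in `visited`.

    Cells outside the grid or on obstacles are ignored.
    """
    free_count = side * side - len(obstacles)
    if free_count == 0:
        return 0.0
    reached = {
        (row, col)
        for (row, col) in visited
        if 0 <= row < side and 0 <= col < side and (row, col) not in obstacles
    }
    return len(reached) / free_count


if __name__ == "__main__":
    # Hypothetical clutter levels standing in for "light" and "heavy".
    for side in (24, 32):
        for clutter in (0.1, 0.3):
            obstacles = make_world(side, clutter)
            free_cells = side * side - len(obstacles)
            print(f"side={side} total={side * side} clutter={clutter} free={free_cells}")
```

Going from 24² to 32² raises the total cell count from 576 to 1024, roughly 1.78 times as many cells. Under a fixed step budget, that alone could lower the coverage fraction, independent of any change in policy quality.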