Experiment 04 · dense obstacle worlds

Crowded worlds

Tests generalization to dense clutter, pillar fields, and mixed obstacle layouts. Three training regimes are contrasted: zero-shot transfer from open-floor training, policies trained on obstacles (baseline vs up-weighted reward), and a native crowded-mix training set. Explorer/relay roles are also compared against the homogeneous policy.

Key finding: Policies trained on obstacles with up-weighted coverage reward generally outperform zero-shot transfer from open floor on the matching map, and up-weighted reward beats the baseline reward. The open-floor backbone still transfers non-trivially zero-shot, but coverage drops sharply as the world scales from 24² to 32² and clutter density rises from light to heavy.
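To make the scaling effect concrete, here is a minimal sketch of how a coverage fraction could be computed on a grid world. It assumes coverage means the share of free (non-obstacle) cells visited at least once; the experiment's actual metric is not given here, and `coverage_fraction`, the obstacle set, and the visited set are all hypothetical illustrations, not measured results.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]


def coverage_fraction(size: int, obstacles: Set[Cell], visited: Set[Cell]) -> float:
    """Share of free (non-obstacle) cells visited at least once.

    Assumed definition for illustration only; all cells are expected to lie
    inside a size x size grid.
    """
    free_cells = size * size - len(obstacles)
    if free_cells == 0:
        return 0.0
    # Visits recorded on obstacle cells do not count toward coverage.
    covered = len(visited - obstacles)
    return covered / free_cells


# Hypothetical data: the same 10 x 10 patch of visited cells on both grids.
obstacles = {(1, 1), (2, 2)}
visited = {(r, c) for r in range(10) for c in range(10)}

small = coverage_fraction(24, obstacles, visited)  # 24 x 24 world
large = coverage_fraction(32, obstacles, visited)  # 32 x 32 world
print(f"24x24: {small:.3f}  32x32: {large:.3f}")
```

Under this assumed definition, a 32² world has about 1.78 times the area of a 24² world (1024 vs 576 cells), so a team that visits the same number of cells scores a lower fraction on the larger map. This is arithmetic context only and does not by itself explain the drop reported above.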