Experimental findings

All evaluation is stochastic-π (ε = 0), never argmax — argmax collapses the swarm's symmetry-breaking (see Emergence).
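A minimal sketch of the difference between the two evaluation modes, assuming a discrete action distribution per agent. The 4-action probabilities and the two-agent setup are hypothetical and only illustrate why identical agents with identical observations cannot separate under argmax.

```python
import random

def sample_action(probs, rng):
    """Draw an action index from the policy distribution (stochastic-pi evaluation)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def argmax_action(probs):
    """Deterministic alternative: always the single most likely action."""
    return max(range(len(probs)), key=lambda a: probs[a])

# Hypothetical policy output shared by two agents that see the same observation.
probs = [0.4, 0.3, 0.2, 0.1]
rng = random.Random(0)

# Under argmax both agents always pick the same action.
argmax_pair = [argmax_action(probs) for _ in range(2)]

# Under sampling the two agents frequently pick different actions.
sampled_pairs = [[sample_action(probs, rng) for _ in range(2)] for _ in range(200)]
distinct = sum(a != b for a, b in sampled_pairs)
```

Here `argmax_pair` holds the same action twice, while a substantial share of the sampled pairs differ, which is the symmetry-breaking that sampling preserves.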

Coverage ladder

Figure: Training curves across architecture variants (16×16).
Figure: Coverage–connectivity Pareto frontier at 32×32.

Connectivity

Graph belief — transfer, reliability, few-shot

| estimator | @16×16/4 (trained) | @32×32/10 (zero-shot) | trivial baseline |
|---|---|---|---|
| old snapshot estimator | 88% | ~trivial (no transfer) | |
| GCRN (single split) | 95.9% | 77–81% | 45.9% |
| GCRN (5-fold CV, weight-decay 0.03) | 95.6 ± 0.4% | 72.5 ± 1.2% | 45.9% |

Few-shot: a single 32×32 fine-tuning map lifts the zero-shot result from 76.7% to 80.2%. The curve plateaus at ~81% with 8 maps, within ~1.3 pts of the 82.1% full-data ceiling. The margin over training from scratch is largest in the 1-shot regime (+4.1 pts): the size-invariant pretraining buys data efficiency.

Figure: The 16×16-trained policy + belief running zero-shot at 32×32 / 10 agents.

Role-switching — transfer & health

| setting | relay ratio | switch rate | P(relay ∣ cut-vertex) | P(relay ∣ non-cut) |
|---|---|---|---|---|
| 16×16/4 (trained) | ~20% | 9.4%/step | 94% | 3% |
| 32×32/10 (zero-shot) | ~19% | 9.9%/step | 78% | 8% |

The structure-awareness transfers: at the new scale the policy still assigns the relay role to cut-vertices roughly 10× more often than to redundant agents (78% vs 8%), switching is balanced (no stuck relay), and the relay ratio holds steady. The role decision is correct at scale; connectivity still fails for a structural reason (see the geometric wall).
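The conditional relay rates above can be computed from logged communication graphs along the following lines. This is a sketch under stated assumptions: the log format `(nodes, edges, relay_set)` per step is hypothetical, and cut-vertices are found by brute force (remove each agent, count components), which is adequate for swarms of this size.

```python
def count_components(nodes, edges):
    """Number of connected components of an undirected graph."""
    nodes = set(nodes)
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), 0
    for start in nodes:
        if start in seen:
            continue
        comps += 1
        seen.add(start)
        stack = [start]
        while stack:
            for nb in adj[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return comps

def cut_vertices(nodes, edges):
    """Agents whose removal increases the number of components."""
    base = count_components(nodes, edges)
    return {n for n in nodes
            if count_components(set(nodes) - {n}, edges) > base}

def relay_rates(steps):
    """steps: iterable of (nodes, edges, relay_set), a hypothetical log format.
    Returns (P(relay | cut-vertex), P(relay | non-cut))."""
    counts = {True: [0, 0], False: [0, 0]}  # is_cut -> [relay count, total]
    for nodes, edges, relays in steps:
        cuts = cut_vertices(nodes, edges)
        for n in nodes:
            bucket = counts[n in cuts]
            bucket[0] += int(n in relays)
            bucket[1] += 1

    def rate(bucket):
        return bucket[0] / bucket[1] if bucket[1] else float("nan")

    return rate(counts[True]), rate(counts[False])

# Toy step: a 4-agent chain 0-1-2-3 where only agent 1 is relaying.
chain_nodes = [0, 1, 2, 3]
chain_edges = [(0, 1), (1, 2), (2, 3)]
toy_cuts = cut_vertices(chain_nodes, chain_edges)
toy_rates = relay_rates([(chain_nodes, chain_edges, {1})])
```

In the toy chain the interior agents 1 and 2 are the cut-vertices, so one of two cut-vertices relays and neither end agent does.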

Belief-wiring at 32×32 (held-out maps)

| policy | coverage | full-connectivity | cohesion | safety-viol |
|---|---|---|---|---|
| compass (no relay) | 70% | 20% | 68% | 6.0% |
| local switcher (zero-shot) | 60% | 36% | 80% | 5.0% |
| belief-wired (decision-only) | 64% | 13% | 64% | 8.7% |
| belief-wired (decision + action) | 64% | 13% | 64% | 8.7% |

The zero-shot local switcher (60 / 36) remained the best connectivity operating point; re-optimizing at 32×32 under a coverage-aware fitness destroyed its relay pattern — the geometric wall again.

Figure: Belief-wired role-switcher at 32×32 / 10 agents (held-out map).

Adversarial & compromise-sweep results (RedWithinBlue)

From the RedWithinBlue compromise sweep and the n=60 adversarial-model validation — the empirical core of the covert-resilience question. Several results qualify the headline framing; they're reported honestly.

Two algorithm-default cautions from the same corpus: entropy regularization monotonically hurts coverage here (ent_coef 0.01 costs ~10% relative coverage — the exploration reward already prevents collapse), and argmax evaluation inflates catastrophic failure 0%→46% by destroying spawn-time symmetry-breaking (always evaluate by sampling π at ε=0). Full document corpus on the Docs page.

Last runs (2026-06-22) — coordination & generalization

Protocol: episode length FIXED at 100 steps; bigger worlds are compensated with more agents, not more time (extra time was the artifact that faked 100% coverage). New coordination metric: REDUNDANCY = (Σ per-agent coverage) / (unique team coverage) — 1.0 = perfect division of labor; higher = flooding/overlap.
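The redundancy metric defined above can be sketched in a few lines. The representation is an assumption: each agent's coverage is taken to be the set of distinct cells it visited, so the numerator is the sum of set sizes and the denominator is the size of their union.

```python
def redundancy(per_agent_cells):
    """per_agent_cells: list of sets of (row, col) cells, one set per agent.
    Returns (sum of per-agent coverage) / (unique team coverage)."""
    total = sum(len(cells) for cells in per_agent_cells)
    unique = len(set().union(*per_agent_cells))
    return total / unique if unique else float("nan")

# Perfect division of labor: two agents cover disjoint halves of a 2x2 grid.
split = redundancy([{(0, 0), (0, 1)}, {(1, 0), (1, 1)}])

# Full overlap: both agents visit every cell.
all_cells = {(0, 0), (0, 1), (1, 0), (1, 1)}
flood = redundancy([all_cells, all_cells])
```

The disjoint split gives 1.0 and the full overlap gives 2.0, matching the reading that 1.0 is perfect division of labor and higher values indicate flooding.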

Coordination diagnostic (fixed 100 steps · 3 seeds)

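| grid / N | heuristic cov | heuristic conn | heuristic redund | learned-A cov | learned-A conn | learned-A redund | learned-B cov | learned-B conn | learned-B redund |
|---|---|---|---|---|---|---|---|---|---|
| 16×16 / N4 | 100% | 55% | 3.13 | 77% | 100% | 2.20 | 100% | 62% | 3.28 |
| 20×20 / N5 | 100% | 54% | 3.76 | 56% | 100% | 2.94 | 100% | 59% | 3.99 |
| 24×24 / N6 | 95% | 42% | 3.67 | 42% | 100% | 3.03 | 94% | 36% | 3.87 |
| 28×28 / N8 | 89% | 17% | 4.28 | 31% | 100% | 3.03 | 91% | 24% | 4.12 |
| 32×32 / N10 | 78% | 28% | 5.29 | 18% | 100% | 3.80 | 88% | 32% | 4.85 |
| 40×40 / N16 | 66% | 15% | 5.83 | 25% | 100% | 4.83 | 79% | 38% | 5.88 |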

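No controller divides labor: redundancy is ≥ 2.2 everywhere and rises with scale, reaching ~5.9 at 40×40 / N16 here and ~7.9 in the N18 agent sweep. learned-A holds connectivity at 100% only by huddling: its coverage craters from 77% at 16×16 to 25% at 40×40, with a low of 18% at 32×32.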

Agent sweep @32×32, learned-B: does adding agents divide or overlap?

| N agents | coverage | connectivity | redundancy |
|---|---|---|---|
| N6 | 60% | 44% | 3.68 |
| N10 | 88% | 32% | 4.85 |
| N14 | 98% | 36% | 6.27 |
| N18 | 100% | 61% | 7.87 |

This is the proof that coverage is brute-force flooding, not coordination: redundancy more than doubles (3.68 → 7.87) as agents are added while coverage climbs to 100%. Adding agents floods rather than divides.

Generalization study (4 modes): train → zero-shot → few-shot

| mode | 16×16 (train), cov / conn | 32×32 (train), cov / conn | 40×40 (zero-shot), cov / conn | relay% | verdict |
|---|---|---|---|---|---|
| baseline (w_conn=1.0) | 91 / 76 | 100 / 32 | 100 / 22 | 25% → 1% | connectivity collapses with scale |
| adaptive (Lagrangian λ→1.5) | 40 / 100 | 10 / 100 | 4 / 100 | 94–100% | degenerate clump: everyone relays, coverage dies |
| fiedler (exact λ₂ oracle features) | 100 / 60 | 100 / 16 | 100 / 14 | 0–2% | null/worse: a perfect signal produced NO relays |
| episodes (2× length) | 100 / 86 | 98 / 71 | 72 / 72 | 67–87% | roles emerge, but only by violating fixed-100 (more time) |

The adaptive and fiedler modes are failures: the Lagrangian chases the trivial all-relay clump, and a perfect global-connectivity oracle (exact eigendecomposition λ₂ + each agent's own Fiedler component) produced essentially no relays and connectivity worse than baseline at scale — information is not the bottleneck. The episodes mode is the only one where roles emerge and connectivity holds, but it "works" only by spending 2× the time — emergence appears just when we cheat the clock.
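For reference, the oracle features named above (λ₂ and each agent's Fiedler component) come from the eigendecomposition of the graph Laplacian. The following is a minimal sketch, assuming an unweighted, symmetric adjacency matrix; how the features were normalized or fed to the policy in the actual runs is not stated here, and the function name is illustrative.

```python
import numpy as np

def fiedler(adjacency):
    """Return (lambda_2, fiedler_vector) of an undirected graph.
    lambda_2 > 0 exactly when the graph is connected."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A           # combinatorial Laplacian D - A
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    return eigvals[1], eigvecs[:, 1]

# Toy example: a 4-agent chain 0-1-2-3.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
lam2_chain, vec_chain = fiedler(chain)

# Toy example: two separate pairs (disconnected graph).
pairs = [[0, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 0]]
lam2_pairs, _ = fiedler(pairs)
```

For the chain, λ₂ equals 2 − √2 and the Fiedler vector has opposite signs at the two ends, so each agent's component indicates which side of the weakest cut it sits on. For the disconnected pairs, λ₂ is zero up to floating-point error.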

Architecture tournament & parameter sweep

Figure: Regime A (hard backstop): roles emerge, but coverage is capped.
Figure: Regime B (soft + latency-discounted): full coverage, connectivity collapses.
Figure: Coverage corner @32×32 / N10: the swarm disperses to cover.
Figure: Connectivity corner @32×32 / N10: the swarm huddles to stay linked.

Takeaway. Across every 2026-06-22 run, the swarm did not coordinate — there is no division of labor. Coverage was achieved either by flooding (redundancy rising toward ~8 as agents are added) or by spending more time (the episodes mode that violates fixed-100). A perfect Fiedler oracle did not help, so the bottleneck is not information — it is the action space: a 1-step move head / relay-bit gate cannot represent "claim a disjoint region." This motivates the proposed "Stack C" redesign: a learned goal/region selector + A* executor + hard connectivity mask + redundancy reward, finally plugging in the GCRN belief — success criterion: invert the agent-sweep curve so redundancy → 1 as agents are added, at fixed 100 steps.

Sibling studies — detection & testbed

Three of the author's own write-ups, transcribed here so they survive after the source documents are retired. Two are sibling papers (a degradation detector and the published-form testbed); the third adds the finer empirical detail behind the compromise-sweep headline above.

Early-warning detector: temporal graph embeddings (GWU advanced-ML report · w/ Tejaaswini Narendran · 4pp)

This sibling report (Mehralizadeh & Narendran, George Washington University) asks whether mission degradation can be caught before it becomes visible as failure. The setup is a grid exploration mission with four homogeneous agents (N=4) on a 32×32 grid, horizon Tmax=500 steps, with connectivity enforced as a hard mission constraint: formal mission failure is the first step at which the time-varying communication graph disconnects (proximity is measured by Chebyshev distance). Nominal behavior is drawn from four policy families (structured lawn-mower sweeping, pure random walk, ε-greedy with structured drift, frontier-biased coverage) so the embedding encodes coordination structure rather than one policy's motion artifacts. Adversarial perturbation overrides a random 25% or 50% subset of agents with uniform-random actions over random time windows.
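The failure criterion described above can be sketched as follows. The communication radius is an assumption (the text gives the distance metric but not the range), and the trajectory format, a list of agent positions per step, is hypothetical.

```python
def chebyshev(p, q):
    """Chebyshev (L-infinity) distance between two grid cells."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def is_connected(positions, radius):
    """True if the proximity graph (edge when distance <= radius) is connected."""
    n = len(positions)
    if n == 0:
        return True
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(n):
            if j not in seen and chebyshev(positions[i], positions[j]) <= radius:
                seen.add(j)
                stack.append(j)
    return len(seen) == n

def first_failure_step(trajectory, radius):
    """trajectory[t] is the list of (row, col) agent positions at step t.
    Returns the first step whose graph is disconnected, or None."""
    for t, positions in enumerate(trajectory):
        if not is_connected(positions, radius):
            return t
    return None

# Toy 4-agent run with a hypothetical communication radius of 2.
trajectory = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],   # tight cluster
    [(0, 0), (0, 2), (2, 0), (2, 2)],   # stretched but still linked
    [(0, 0), (0, 2), (2, 0), (5, 5)],   # one agent has drifted out of range
]
failure = first_failure_step(trajectory, radius=2)
```

In the toy run the graph stays connected for the first two steps and disconnects at step 2, which would be the formal failure time under this criterion.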

Figure 1 (PCA space with kNN probability surface, described in prose since no asset was exported): the 16-dimensional mission embeddings are projected to two principal components. Nominal-mission points (one dot per 150-step prefix) cluster tightly in one region while broken-mission points fan outward; overlaid is a smooth kNN decision surface colored by P(mission broken), shading continuously from low probability over the nominal cluster to high probability across the degraded fan. The continuity of that surface — not a sharp wall — is the figure's whole point: it shows the early-warning score is graded, so a threshold can trade off detection delay against false alarms.
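A graded kNN score of the kind described above can be sketched without any plotting. This is an illustration only: it skips the PCA projection and scores a query directly in the embedding space, and the value of k, the Euclidean metric, and the toy 2-D points are all assumptions not taken from the report.

```python
import math

def knn_probability(query, points, labels, k):
    """Fraction of the k nearest labelled embeddings that are broken (label 1)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    nearest = order[:k]
    return sum(labels[i] for i in nearest) / len(nearest)

# Toy embeddings: a tight nominal cluster (label 0) and a spread-out broken fan (label 1).
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
          (3.0, 3.0), (2.5, 3.0), (3.0, 2.5), (2.0, 2.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

p_nominal = knn_probability((0.1, 0.1), points, labels, k=3)
p_between = knn_probability((1.2, 1.2), points, labels, k=3)
p_broken = knn_probability((3.0, 3.0), points, labels, k=3)

# The same graded score triggers at a low threshold but not at a higher one.
alarm_sensitive = p_between >= 0.3
alarm_conservative = p_between >= 0.5
```

Because the score for the in-between query is a fraction rather than a hard label, lowering the threshold raises the alarm earlier at the cost of more false alarms, which is the trade-off the figure is meant to convey.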

ExoRL: the connectivity-constrained exploration testbed (ICML-2026-formatted · 7pp · published-form sibling of zymera_env)

The published-form testbed paper for the connectivity-constrained decentralized exploration environment. A swarm of N agents (headline runs use N=4) must maximize coverage of a discrete grid (32×32 in the headline) within a fixed horizon, acting from local sensing plus one-hop neighbor messages — centralized training, decentralized execution. The base reward is the incremental coverage gain cov(t+1) − cov(t).
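The base reward above can be sketched as a difference of coverage values between consecutive steps. Two assumptions are made here: coverage is measured as the fraction of grid cells visited by the team, and the function and variable names are illustrative rather than taken from the testbed.

```python
def coverage(visited, grid_cells):
    """Team coverage as a fraction of the grid."""
    return len(visited) / grid_cells

def step_reward(visited_before, newly_sensed, grid_cells):
    """Incremental coverage gain cov(t+1) - cov(t), plus the updated visited set."""
    visited_after = visited_before | newly_sensed
    gain = coverage(visited_after, grid_cells) - coverage(visited_before, grid_cells)
    return gain, visited_after

GRID_CELLS = 32 * 32  # headline grid size

visited = {(0, 0)}
r1, visited = step_reward(visited, {(0, 0), (0, 1), (0, 2)}, GRID_CELLS)  # two new cells
r2, visited = step_reward(visited, {(0, 1)}, GRID_CELLS)                  # nothing new
total_gain = r1 + r2
```

Revisiting a cell earns nothing, and the per-step rewards telescope: their sum equals final coverage minus initial coverage.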

Stealth-attacks compromise sweep: finer detail (16×16 · N=5 · k∈{0,1,2} · 20 seeds)

The headline of this sweep already appears in the adversarial section above; below are the finer numbers and the two figures, not duplicated there.

Figure: Mean coverage trajectory ± 95% CI per setup (20 seeds); grey line marks the 90% mission threshold. Per-setup summary (seeds reaching 90%, median crossing step, mean coverage at horizon): B = 19/20, step 89, 98.5%; C₁ = 11/20, step 159, 89.6%; C₂ = 9/20, step 143, 87.1%.
Figure: Bayesian-belief "cityscape": bar height is per-cell scan count (proxy for posterior log-evidence), common z-axis across panels. B (5b/0r): smooth dome. C₁ (4b/1r): a red tower over the hoarded cells with a ring of depression in blue's surface. C₂ (3b/2r): two red spikes at the extremities with the blue plateau collapsed between them. The damage lives in the posterior, not the per-step coverage rate.
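The per-setup summary quoted with the coverage-trajectory figure (seeds reaching 90%, median crossing step, mean coverage at horizon) could be derived from per-seed trajectories roughly as follows. One assumption is labeled explicitly: the median crossing step is taken over only those seeds that reach the threshold, which the text does not confirm. The toy trajectories are invented for illustration.

```python
from statistics import mean, median

def summarize(trajectories, threshold=0.90):
    """trajectories: one list of coverage fractions per seed, indexed by step."""
    crossings = []
    for traj in trajectories:
        # First step at which this seed reaches the mission threshold, if any.
        step = next((t for t, c in enumerate(traj) if c >= threshold), None)
        if step is not None:
            crossings.append(step)
    return {
        "seeds_reaching": len(crossings),
        "n_seeds": len(trajectories),
        # Assumption: median over the seeds that crossed, not over all seeds.
        "median_crossing_step": median(crossings) if crossings else None,
        "mean_final_coverage": mean(traj[-1] for traj in trajectories),
    }

# Toy data: three seeds, three steps each; the third never reaches 90%.
toy = [
    [0.50, 0.92, 0.95],
    [0.40, 0.60, 0.91],
    [0.30, 0.50, 0.70],
]
summary = summarize(toy)
```

With the toy data two of three seeds cross the threshold, at steps 1 and 2, so the median crossing step is 1.5 while the mean final coverage still includes the seed that never crossed.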