Related work & the gap

Five literature investigations — two PRISMA systematic reviews (connectivity-role-allocation; covert misbehaviour), two deep-research sweeps (connectivity-constrained exploration; resilience), and a code hunt — across four flanks. Honest verdict: each flank is well-trodden alone; the intersection is the open seam.


0 · framing

The connectivity-regime spectrum

The field organizes connectivity-constrained exploration along a strictness spectrum — None → event-based / recurrent → intermittent → continuous. This is the soft↔hard axis our own work moved along: our soft degree-floor sits near the recurrent end; our hard guardrail is the continuous end. Under the strictest (continuous) regimes the field tends to retreat to centralized planning — which is exactly the room a decentralized, learned approach has.

A · connectivity + roles

Connectivity-aware exploration & role allocation

PRISMA review (96 → 24 extracted; 0 met all four criteria) + a connectivity-exploration deep-research sweep + a code hunt.

The literature splits into two camps that no study bridges:

Connectivity camp — learned + decentralized + genuine comm-graph maintenance, but homogeneous, no roles: Li et al. 2021 (λ₂ / constrained-RL), Lin et al. 2020 (hard projection), DRL Multitarget Coverage 2023 (hard action-mask — closest coverage method).
Roles camp — learned + decentralized + role-differentiated, but connectivity-blind: ROMA, RODE, CORD; Area-Search (explore-vs-cover — closest threat); evolved swarm specialization.

Classical relay/explorer + no-consensus allocation (well-established, all heuristic or centralized): de Hoog's role-based exploration, CBAX (centralized, bandwidth-aware Steiner relays), Cesare's self-sacrificing UAVs, DRBECM-ML (RSSI-hysteresis roles), MoRoCo; GVGExp (Voronoi "gate" partition) and Cross-rank (TSP spreading, position-only sharing ~100 B/s); decentralized allocation via market/auction MRTA (Zlot, Sheng), potential fields (Liu & Lyons), and stigmergy (Andries, Kuyucu).

Independent validation of three of our design choices.

our choice	prior art that validates it
hysteresis role-switch (enter hi / release lo)	DRBECM-ML's two-sided RSSI hysteresis (−65 / −75 dBm)
hard connectivity guardrail (mask disconnecting moves)	CBAX's hard feasible-set constraint; Lin 2020 projection; 9738445 hard mask
lightweight intent channel (share little)	Cross-rank shares positions only (~100 B/s), anti-redundancy by spatial spreading
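
The hysteresis row above can be sketched as a two-threshold latch. This is a minimal illustration, assuming the −65 / −75 dBm pair quoted for DRBECM-ML; the class name `HysteresisSwitch`, the helper `run_switch`, and the choice of which role counts as "active" are hypothetical and not taken from either codebase.

```python
from dataclasses import dataclass

ENTER_DBM = -65.0    # upper threshold, from the DRBECM-ML row above
RELEASE_DBM = -75.0  # lower threshold, from the DRBECM-ML row above


@dataclass
class HysteresisSwitch:
    """Two-threshold latch: turns on at/above `enter`, off at/below `release`."""
    enter: float = ENTER_DBM
    release: float = RELEASE_DBM
    active: bool = False

    def update(self, rssi_dbm: float) -> bool:
        if not self.active and rssi_dbm >= self.enter:
            self.active = True
        elif self.active and rssi_dbm <= self.release:
            self.active = False
        # Between the two thresholds the previous state is held.
        return self.active


def run_switch(trace):
    """Return the latch state after each RSSI sample in `trace`."""
    switch = HysteresisSwitch()
    return [switch.update(r) for r in trace]
```

The gap between the two thresholds is what prevents role chatter: a signal hovering around a single cut-off would flip the role every step, whereas here it has to cross the full 10 dB band.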

Released code we can study. MRESim (de Hoog's role-based explorer/relay simulator, Java — a baseline to beat) and DRBECM (Python/NumPy — the geometric base, not the ML version). Most others released no code. Honest note: the strongest explicit negotiated anti-redundancy claims were the ones that got refuted in verification — that mechanism class is weakly evidenced, which makes our intent-signalling negotiation a less-trodden path.

B · covert micro→macro

Covert micro→macro misbehaviour

PRISMA review (degraded by tooling → re-established by direct scan). Four clusters; the finding is which of them never overlap.

cluster	representative work	stealthy	micro→macro	spatial mission	resilience
Covert c-MARL poisoning / backdoor	One4All, BLAST, Spatiotemporal backdoor, CAMA, AMI	yes	yes (1→team)	no (SMAC games)	attack-demo
Stealthy attacks on networked control (CPS)	stealthy FDI / zero-dynamics attacks	yes (formal)	partial	no (linear consensus)	detection
Resilient consensus / r-robustness	W-MSR; secure swarm cooperation	no (needs detectable)	via graph robustness	flocking	yes
Micro→macro / cascade modelling	cascading-failure risk, encounter diffusion	no (stochastic)	yes	abstract	yes

The active subfield. A single covert agent poisoning the whole cooperative team is a real and growing line of work: One4All (2022); BLAST (2025), which behaves normally until a trigger fires (91.6% success / 3.7% clean-variance); and an OpenReview submission titled "Single-Agent Poisoning Attacks Suffice to Ruin Multi-Agent…". But all of it runs on game benchmarks (SMAC), is trigger-based, and stops at attack demonstrations: no spatial missions, no role- or position-awareness, no resilience curves.

C · resilience

Resilience of autonomous teams

Resilience deep-research sweep (automated synthesis failed on tooling → direct-read of the surfaced sources).

Byzantine / adversarial resilience theory: r-robustness & (r,s)-robustness of graphs and the W-MSR algorithm (LeBlanc / Zhang / Sundaram). Under the F-total malicious model, W-MSR reaches resilient consensus iff the graph is (F+1,F+1)-robust. Also resilient leader-follower (Usevitch & Panagou, Automatica 2019).
Resilient motion: Resilient Flocking for Mobile Robot Teams (Saulnier et al., RA-L 2017) — discard the most extreme neighbours (a spatial W-MSR); robust-topology construction (Guerrero-Bonilla & Kumar).
Failure / self-healing: decentralized self-repair to maintain connectivity and coverage; fault-tolerant coverage.
Learned resilience (thin): a few 2023–25 fault-tolerant / Byzantine-robust MARL preprints — for homogeneous teams, not connectivity-aware role-allocating ones.

Key distinction this surfaced: connectivity ≠ resilience. A graph can have high algebraic connectivity (λ₂) yet not be r-robust — one well-placed adversary breaks consensus. So a connectivity objective should target λ₂, but a resilience objective should target graph robustness r (W-MSR). Crucially, all this theory assumes adversaries are detectable outliers (the defense discards extremes) — which is exactly what a stealthy agent evades, so the F-robustness guarantees go silent in the covert regime.
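
The "defense discards extremes" step can be made concrete with a minimal sketch of one W-MSR update, assuming scalar states, uniform averaging weights, and synchronous rounds. The names `wmsr_step` and `demo`, and the 5-node complete graph with one outlier broadcaster, are illustrative choices rather than anything from the cited papers.

```python
import numpy as np


def wmsr_step(values, neighbors, F):
    """One synchronous W-MSR update with uniform weights.

    values:    1-D float array of current states
    neighbors: dict node -> list of in-neighbour indices (only these nodes update)
    F:         number of extreme values trimmed on each side
    """
    new_values = values.copy()
    for i, nbrs in neighbors.items():
        own = values[i]
        received = sorted(values[j] for j in nbrs)
        higher = [v for v in received if v > own]
        lower = [v for v in received if v < own]
        equal = [v for v in received if v == own]
        # Drop the F largest values above the own state (all of them if fewer
        # than F exist) and the F smallest values below it.
        kept_higher = higher[:max(len(higher) - F, 0)]
        kept_lower = lower[F:]
        kept = kept_lower + equal + kept_higher + [own]
        new_values[i] = float(np.mean(kept))
    return new_values


def demo(steps=50):
    """Four normal nodes plus node 4, which always broadcasts an outlier."""
    values = np.array([0.0, 1.0, 2.0, 3.0, 100.0])
    normal = [0, 1, 2, 3]
    # Complete graph; node 4 has no entry, so its value never changes.
    neighbors = {i: [j for j in range(5) if j != i] for i in normal}
    for _ in range(steps):
        values = wmsr_step(values, neighbors, F=1)
    return values[normal]
```

In this demo the outlier is always among the trimmed values, so the normal nodes agree on a value inside their own initial range. The rule only ever removes the F largest and F smallest received values, so a value that sits inside the normal range is not trimmed by it.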

D · borrow

Methods worth borrowing

λ₂ as a constraint, from the belief (Li 2021 / CPO): maximize coverage subject to a connectivity floor — removes the weighted-sum knob that caused our clump/freeze; compute λ̂₂ from the belief instead of an oracle.
Hard action-mask for coverage (9738445, Lin 2020): the connectivity guardrail, validated.
Hierarchical role architecture (Area-Search): a high-level role-selector over role-specific controllers.
GAT / GNN policy backbone (MARVEL, GNN-MARL): the natural substrate for intent-signalled, no-consensus coordination.
Expert-bootstrap (behaviour cloning; our warm-start): pure from-scratch constrained RL oscillates — start from a known-good policy.

verdict

The combined gap verdict

Every flank is NARROWED, not open, so the contribution must be the conjunction, never a single leg. Learned connectivity maintenance exists but only for homogeneous teams, and learned role allocation exists but is connectivity-blind; no study combines the two. Covert single-agent poisoning of cooperative MARL exists (game benchmarks, trigger-based); resilience theory exists (assumes detectable adversaries). The open seam is the intersection: covert, within-bounds misbehaviour's micro→macro propagation in a spatial connectivity mission, governed by role / graph-position, with a learned graph belief as both attack-surface and detector, characterized as a stealth–damage frontier.

Do NOT claim: "first connectivity-aware MARL coverage" (refuted by 9738445), "first learned role allocation for exploration" (refuted by 2312.01747), or "first covert single-agent attack on cooperative teams" (refuted by One4All / BLAST).

validity

Threats to validity

Search coverage. The reviews were arXiv-weighted; IEEE / ACM / Springer robotics venues (ICRA, IROS, RSS, CoRL, T-RO) are under-represented — a targeted robotics-venue sweep is the outstanding check before locking the claim.
Tooling. The connectivity review completed cleanly (96→24). The automated synthesis for the covert and resilience sweeps degraded (structured-output failures + an API overload), and both were re-established by direct reading; treat those flanks as direct-read, not PRISMA-verified.
Weakest leg. The relay-role's novelty vs. Area-Search's spatial roles — defend via the integration (belief-driven + connectivity-objective + stealth-frontier), not "we have roles".

corpus

Sources & document corpus

The full set of documents gathered across the five investigations and the local reference library. Web sources are the substantive/verified ones; quality was mixed and the verified set is what the verdict rests on.

Surveyed literature (web)

Flank A — connectivity & roles: Amigoni/Banfi/Basilico connectivity-exploration survey (8267592), Li 2021 (λ₂/CPO), Lin 2020, DRL Multitarget Coverage, GVGExp, intermittent-connectivity ILP, Cross-rank, de Hoog (5509803 / MRESim), DRBECM-ML (WiOpt 2025 / repo), MoRoCo; role MARL — ROMA, RODE, LDSA, ACORM, CORD, MAVEN; learned exploration — Area-Search, MARVEL, ACE, evolved swarm.

Flank B — covert misbehaviour: One4All, BLAST, Spatiotemporal backdoor, AMI, CAMA, "Single-Agent Poisoning Suffices…", Constrained Black-Box Attacks, Selective Adversarial Fault Induction.

Flank C — resilience: Sundaram et al. robust consensus / r-robustness (W-MSR), Usevitch & Panagou resilient leader-follower (Automatica 2019), Saulnier et al. Resilient Flocking (RA-L 2017), decentralized self-repair connectivity, secure swarm cooperation (Science Robotics), Towards Fault Tolerance in MARL, plus resilient-MARL preprints (2305.12872, 2111.06776, 2204.10063).

Local reference library (gathered in the workspace)

Classic taxonomies (docs/taxonomy-refs/): Dudek 1996 (multi-agent robotics taxonomy), Gerkey & Matarić 2004 (MRTA taxonomy, IJRR), Verma & Ranga 2021 (multi-robot coordination taxonomy, JIRS), Chung et al. 2018 (aerial swarm robotics survey, T-RO), Brambilla et al. 2013 (swarm robotics / swarm engineering), Schranz et al. 2020 (swarm robotic behaviours), Amato (Dec-POMDP survey), Oliehoek 2012 (Dec-POMDPs RL chapter). Plus the generated marl_taxonomy/ field map (70 entries + gap/venue reports).

Project drafts (references/): Stealth Attacks on Swarms, Swarm Resiliency, mission family, advML report, ExoRL report, Formalization.

Honesty on the corpus. The deep-research sweeps returned more candidate URLs than are listed here; many were graded low-quality and dropped, and the resilience/covert syntheses were re-established by direct reading after a tooling failure. This page lists the substantive documents; the only ones the gap verdict leans on are the verified/primary works in flanks A–C.

E · multi-robot systems

What to borrow from Multi-Robot Systems

A method-level technical review for a sparse, networked team of autonomous agents doing area coverage under a communication constraint — built toward studying covert-misbehaviour resilience. Three areas: distributed connectivity estimation · coverage control · connectivity-constrained exploration.

The multi-robot-systems (MRS) field has worked these exact problems for ~20 years. This review pulls the canonical methods, defines the technical terms from them, states what information each node must hold, shows how the classical methods fuse with learning (MARL/GNN), and lays out the three connectivity "guardrail" strategies (maintain · mask · grace). The novelty-conjunction verdict it lands on is the same one already stated above — see the combined gap verdict; this section does not repeat it, it supplies the method-level depth underneath it.

E.1 · Distributed algebraic-connectivity (λ₂) estimation & maintenance

How does a team where each robot sees only its one-hop neighbours keep the whole comm graph connected? The field makes "connected" quantitative through λ₂, the second-smallest eigenvalue of the graph Laplacian (the Fiedler value): it is > 0 iff the graph is connected and grows as the graph gets better-knit. The hard part: λ₂ and its eigenvector are global properties, yet each robot only knows its own Laplacian row. The signature contribution is a family of distributed power-iteration + average-consensus algorithms that let every robot estimate its own component of the Fiedler vector and a local copy of λ₂ from neighbour-to-neighbour messages alone — then follow the gradient of λ₂ to hold connectivity. Almost all of it is control-theoretic; learning enters as a replacement controller (GNN) or as a constraint/guardrail around a learned policy.

Glossary (defined from the canonical methods)

Term	Definition
Graph Laplacian `L = D − A`	Degree matrix minus adjacency. Symmetric PSD, row-sums zero (`L·1=0`); spectrum `0=λ₁≤λ₂≤…` encodes connectivity. Each robot natively knows only its own row of L.
Algebraic connectivity / Fiedler value (λ₂)	2nd-smallest eigenvalue of L. `λ₂>0 ⇔ graph connected` (Fiedler 1973); monotone as edges/weights increase — a smooth scalar surrogate for "how connected".
Fiedler vector (v₂)	Eigenvector for λ₂, orthogonal to `1`. Component `v₂ⁱ` says how robot i sits on the graph's weakest cut; the λ₂ gradient is built from edge differences `(v₂ⁱ−v₂ʲ)²`, so a robot needs only its own component + neighbours'.
Why the second eigenvalue	λ₁ is always 0 (eigenvector `1`), carrying no connectivity info. Connectivity lives in λ₂, which requires deflating the trivial `1` direction so λ₂ becomes the dominant mode.
Distributed power iteration	Repeatedly apply the shifted operator `(I−αL)` to a vector while deflating the `1` direction; the iterate converges to v₂. The product `L·x` is local (each node combines only neighbour values), which turns power iteration into rounds of neighbour message-passing.
Dynamic average consensus	The PI (proportional-integral) estimator (Freeman–Yang–Lynch 2006) that tracks the average of a time-varying signal from neighbour exchanges. It supplies the two global scalars — the deflation mean `Ave(x)` and normalisation `Ave(x²)` — that power iteration needs, without any node holding the whole vector.
Connectivity gradient	`∂λ₂/∂pᵏ = Σⱼ −Aₖⱼ(v₂ᵏ−v₂ʲ)²(pᵏ−pʲ)/σ²`. Local: robot k needs only its own Fiedler component, its neighbours', and relative positions. Following it (or a barrier-potential of λ₂) is the connectivity-maintenance control law.
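
The glossary quantities can be checked numerically with a small centralized sketch. It assumes the Gaussian edge weights `Aᵢⱼ=exp(−‖pⁱ−pʲ‖²/2σ²)` with no comm-range cut-off (so every pair is weakly linked) and a simple, non-repeated λ₂. The function names are illustrative, and the dense eigendecomposition stands in for the distributed estimator.

```python
import numpy as np


def laplacian(P, sigma):
    """Weighted Laplacian L = D - A with A_ij = exp(-||p_i - p_j||^2 / (2 sigma^2))."""
    diff = P[:, None, :] - P[None, :, :]
    A = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma**2))
    np.fill_diagonal(A, 0.0)
    return np.diag(A.sum(axis=1)) - A, A


def fiedler(P, sigma):
    """Return (lambda_2, v_2) from a dense eigendecomposition (centralized)."""
    L, _ = laplacian(P, sigma)
    w, V = np.linalg.eigh(L)   # eigenvalues come back in ascending order
    return w[1], V[:, 1]


def lambda2_gradient(P, sigma, k):
    """d lambda_2 / d p_k = sum_j -A_kj (v_k - v_j)^2 (p_k - p_j) / sigma^2."""
    _, A = laplacian(P, sigma)
    _, v = fiedler(P, sigma)
    coeff = -A[k] * (v[k] - v) ** 2 / sigma**2      # one coefficient per neighbour j
    return (coeff[:, None] * (P[k] - P)).sum(axis=0)
```

Only row `k` of `A`, robot `k`'s own Fiedler component, and its neighbours' components and relative positions enter `lambda2_gradient`, which is the locality property the glossary entry describes.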

Methods & their learning stacks

Method	How it works	Learning stack
Decentralized deflated power-iteration + PI consensus (Yang–Freeman–Lynch 2010)	Each agent integrates one scalar `xⁱ→v₂ⁱ` with three terms (deflate the `1` mode, apply −L, renormalise); reads off λ₂ via Rayleigh quotient or the normalisation gain; then moves up the λ₂ gradient. The reference scheme.	control-theoretic (ODE + consensus + gradient; Lyapunov proof)
Power-iteration + potential-barrier controller (Sabattini–Chopra–Secchi 2013)	Same estimator, wrapped by a potential `V(λ₂)→∞` as `λ₂→ε`. The barrier dominates near the threshold ⇒ provably keeps `λ₂>ε` for all time even with an added bounded task controller.	control-theoretic
Fiedler-supergradient ascent (de Gennaro–Jadbabaie 2006)	Estimate v₂ (distributed spectral analysis), ascend λ₂ via `dλ₂ = v₂ᵀ(dL)v₂`. Establishes λ₂-as-objective.	control-theoretic + distributed eigensolver
Nearest-neighbour connectivity potentials (Zavlanos–Pappas 2007)	Potentials on existing links diverge as a link nears comm range ⇒ no current edge is ever lost. Preserves the edge set rather than tracking λ₂ (more conservative).	control-theoretic
Nonlinear edge-weight coordination (Ji–Egerstedt 2007)	Edge weights blow up near sensing range ⇒ the weighted-Laplacian flow never disconnects, with no λ₂ computation. Simple, decentralized, conservative (freezes initial topology).	control-theoretic
λ₂-maximisation via SDP (Kim–Mesbahi 2006)	Place agents to maximise λ₂ subject to proximity, solved as a sequence of semidefinite programs. The centralized "ground truth" the distributed gradients approximate.	optimization (SDP), centralized
Aggregation-GNN decentralized controller (Tolstaya–Gama–Ribeiro 2019)	An order-K graph-filter GNN (K rounds of message-passing — structurally the same local aggregation as power iteration) imitates a centralized flocking controller; transfers across team sizes. But no hard connectivity guarantee — the authors document a subgroup-escape failure, motivating guardrails.	imitation (behaviour cloning) on a graph-conv GNN; K-hop message passing
RL policy with λ₂ as a constraint (Li et al., ICRA 2022)	Shared decentralized policy from local range-scan + neighbour positions; λ₂(Gₜ) enters as a constraint (not just reward), behaviour-cloning warm-start. The now-standard pattern: learned task policy + control-theoretic λ₂ as the safety layer.	deep MARL (PPO-style, constrained) + BC

What information each node must hold & exchange (the Yang–Freeman–Lynch scheme — directly answers "what info does each node need"):

Own position pⁱ and the relative positions pⁱ−pʲ of its one-hop neighbours; shared design constants (comm range r, kernel σ).
Its Laplacian row, implicitly: edge weights Aᵢⱼ=exp(−‖pⁱ−pʲ‖²/2σ²) to each neighbour ⇒ its degree and row entries. Never sees non-neighbours' rows. The product (Lx)ⁱ=Σⱼ Aᵢⱼ(xⁱ−xʲ) is computable from this row + neighbours' xʲ.
Its Fiedler-component estimate xⁱ — a single scalar integrated toward v₂ⁱ. The full vector is never materialised anywhere.
Two PI dynamic-average-consensus estimators (4 internal states): one tracks Ave(x) (deflation/mean), one tracks Ave(x²) (normalisation).
Its own λ₂ estimate λ₂ⁱ=(k₃/k₂)(1−z^{i,2}) (or the Rayleigh-quotient form).
Local gains k₁,k₂,k₃ and consensus gains (identical across agents — no global knowledge needed; only an upper bound on n is required).
Exchanged per round, neighbour-to-neighbour only: five scalars {xⁱ, z^{i,1}, w^{i,1}, z^{i,2}, w^{i,2}} (+ position for the controller). The message size is O(1), so per-node cost scales only with degree, not n.
Does NOT need: the full Laplacian, full Fiedler vector, global topology, exact n, or any coordinator — every global quantity is reconstructed locally via consensus.
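
The per-node bookkeeping above can be illustrated with a simplified discrete-time version of deflated power iteration. This is not the Yang–Freeman–Lynch continuous-time scheme: the two global averages are computed exactly where the distributed algorithm would use its PI consensus estimators, and the step size `alpha` and the name `estimate_lambda2` are assumptions made for the sketch.

```python
import numpy as np


def estimate_lambda2(A, iters=500):
    """Deflated power iteration on (I - alpha L), written node-by-node."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    alpha = 1.0 / (2.0 * deg.max())        # keeps every eigenvalue of I - alpha L in [0, 1]
    x = np.arange(n, dtype=float)          # arbitrary non-constant start
    for _ in range(iters):
        # Local step: node i combines only its neighbours' values (row i of L).
        Lx = np.array([sum(A[i, j] * (x[i] - x[j]) for j in range(n) if A[i, j] > 0)
                       for i in range(n)])
        x = x - alpha * Lx
        # Global scalars, computed exactly here; the distributed scheme would
        # read them from its two average-consensus estimators.
        x = x - x.mean()                   # deflate the all-ones direction: Ave(x)
        x = x / np.sqrt(np.mean(x**2))     # renormalise: Ave(x^2)
    Lx = deg * x - A @ x
    return float(x @ Lx / (x @ x)), x      # Rayleigh quotient and Fiedler-vector estimate
```

Each node's update touches only its own scalar and its neighbours' scalars; the only non-local ingredients are the two averages, which is why the scheme needs exactly two consensus estimators per node.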

How it fuses with learning. Three layers: (1) replacement — a graph-conv GNN imitates a centralized controller and runs decentralized (but pure imitation lacks a guarantee → subgroup-escape, so it does not subsume the λ₂ machinery); (2) λ₂ as a learning signal — MARL puts algebraic connectivity into the objective as a reward or hard constraint; (3) guardrail/shielding (most robust, now-standard) — a learned policy proposes, a control-theoretic layer (distributed λ₂ estimate, edge-weight barrier, or move-masking) projects onto the connectivity-feasible set. In all hybrids the spectral estimator stays control-theoretic; learning replaces the controller or the objective, not the estimator.

Gaps: time-scale fragility (estimators correct only under strong separation: consensus ≫ power-iteration ≫ motion); λ₂ is global & slow (one weak cut sets it, gradient near-zero almost everywhere, per-node readout singular when xⁱ crosses zero); adversarial / Byzantine robustness essentially open (estimators assume truthful neighbours — one lying agent biases the whole λ₂ estimate; resilient distributed Fiedler estimation is unsolved, and W-MSR/r-robustness protect consensus values, not spectral estimates); connectivity is necessary but not the right objective (λ₂-max over-clusters), and the learning side is thin on guarantees.

→ For us. The big one is the adversarial gap — that's our headline RQ. Two of our levers attack it: the learned GCRN belief is dual-use (attack surface and a within-bounds anomaly detector), and we measure resilience as a stealth-damage frontier rather than a one-shot attack. Practically, treat a distributed/learned connectivity estimate as a policy feature, but remember a key in-house finding: λ₂ is not deployable under partial single-step info — the implementable hard form is a local degree/component check, and the deployable belief is our learned size-invariant estimator, not an online eigensolver.

E.2 · Coverage control — and how it fuses with learning

Coverage control deploys mobile sensors so every point is well-sensed by its nearest robot, with more capacity where it matters. The canonical formulation (Cortés–Martínez–Karatas–Bullo 2004) is locational optimization: minimise H(P)=Σᵢ ∫_{Vᵢ} f(‖q−pᵢ‖)·φ(q) dq. The solution partitions the region into Voronoi cells, shows that the critical points of H are centroidal Voronoi tessellations (each robot at its importance-weighted centroid), and reaches one by Lloyd's algorithm (move toward your cell centroid). It runs decentralized over the Delaunay graph. Its limitations (φ assumed known; controller myopic, using only its own cell and never sharing) are exactly what learning attacks.

Glossary

Term	Definition
Voronoi cell `Vᵢ`	All points closer to `pᵢ` than any other robot. Robot i is responsible for its cell; only Delaunay (boundary-sharing) neighbours matter.
Lloyd's algorithm	Iterate: compute Voronoi cells → move each site to its density-weighted centroid → repeat. As feedback: `uᵢ=−k(pᵢ−Cᵢ)` — gradient descent on the locational cost.
Locational cost `H(P)`	Total importance-weighted sensing error (commonly `f(x)=x²`); the objective coverage minimises.
Centroidal Voronoi tessellation (CVT)	Every site = centroid of its own cell. CVTs are exactly the critical points of H.
Generalized mass / centroid	`Mᵢ=∫_{Vᵢ}φ`, `Cᵢ=(1/Mᵢ)∫_{Vᵢ} q·φ dq`. Gradient: `∂H/∂pᵢ=2Mᵢ(pᵢ−Cᵢ)`.
Density / importance `φ`	Where sensing matters. Known a-priori classically; estimated from data in adaptive/learned versions.
Weighted / power Voronoi	Per-robot weights `wᵢ` make capable robots claim larger cells (heterogeneous teams); weights can be learned online.
CTDE-by-imitation	A centralized clairvoyant CVT expert (full state + true φ) generates optimal Lloyd actions; a decentralized GNN is trained by MSE to reproduce them from local obs + messages, then deployed decentrally.
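
The Lloyd feedback law and the locational cost from the glossary can be sketched on a discretised region. This is a centralized illustration, assuming `f(x)=x²`, a fixed set of sample points `Q` with density values `phi`, and a gain `k` in (0, 1]; `lloyd_step` and `locational_cost` are illustrative names, and the cost omits the constant per-sample area factor.

```python
import numpy as np


def lloyd_step(P, Q, phi, k=0.5):
    """One discretised Lloyd step: u_i = -k (p_i - C_i) on sample points Q with density phi."""
    d2 = ((Q[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)   # (num_points, num_robots)
    owner = d2.argmin(axis=1)                                   # Voronoi assignment
    P_new = P.copy()
    for i in range(len(P)):
        cell = owner == i
        mass = phi[cell].sum()                                  # generalized mass M_i
        if mass > 0:                                            # empty cell: robot stays put
            centroid = (phi[cell, None] * Q[cell]).sum(axis=0) / mass
            P_new[i] = P[i] - k * (P[i] - centroid)
    return P_new


def locational_cost(P, Q, phi):
    """H(P) with f(x) = x^2, approximated as a sum over the sample points."""
    d2 = ((Q[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
    return float((phi * d2.min(axis=1)).sum())
```

Iterating `lloyd_step` never increases `locational_cost`: moving toward the centroid lowers the cost for the current cells, and re-assigning points to their nearest robot can only lower it further. The fixed point it reaches is a local CVT that depends on the starting positions.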

Methods (classical → learned)

Method	How it works	Learning stack
Voronoi/Lloyd coverage (Cortés–Bullo 2004)	Compute cell → weighted centroid → `uᵢ=−k(pᵢ−Cᵢ)`. Exact gradient descent on H; LaSalle convergence to a CVT. Assumes φ known.	control-theoretic
Decentralized adaptive coverage (Schwager–Rus–Slotine 2009)	Learns φ online: `φ=K(q)ᵀa` with unknown weights a; each robot updates â from its own measurements + a consensus term; moves to its â-centroid. Lyapunov proof of coverage + (under persistent excitation) parameter consensus.	online adaptive control + parameter consensus (no deep nets)
Adaptive weighted-Voronoi (Pierson–Schwager 2017)	Heterogeneous teams: multiplicatively-weighted Voronoi; each robot learns its performance weight online from local signal + neighbours.	online adaptive control (weight adaptation)
Spatial-GNN coverage (Tolstaya et al. 2021)	Environment → spatial graph; a size-equivariant GNN maps each robot's neighbourhood to a move, imitating a centralized expert. Generalises to maps/teams far larger than the expert can solve.	spatial GNN + imitation
GNN decentralized controller (Gosrich–Kumar 2022)	Multi-hop message passing lets each robot fuse non-local info before deciding velocity → beats non-communicating Lloyd, scales/transfers to larger teams.	GNN (multi-hop) + imitation
LPAC (Agarwal–Kumar–Ribeiro 2024; SOTA, open-source)	Per robot: CNN perception of a 32×32 local density map → GNN (K=3 hops) communication → MLP velocity. Imitates a clairvoyant CVT expert on 100k state-action pairs. Beats decentralized- and centralized-CVT by ≥20%, transfers zero-shot to larger maps/teams, and is robust to position noise.	CNN + K-hop GNN + MLP, end-to-end imitation (PyTorch-Geometric)
Constrained-learning coverage (Agarwal et al. 2024)	LPAC architecture trained with primal-dual / Lagrangian optimization so the policy satisfies extra constraints (e.g. connectivity) rather than collapsing them into one scalar.	GNN + constrained (primal-dual) learning
MARL coverage + dynamic density	RL (actor-critic) with coverage-cost reward; targets spawn a dynamic density that attracts agents, coupled to CVT — handles moving/unknown φ the static-φ controller can't.	MARL (actor-critic) + CVT prior

How coverage fuses with learning — four increasingly aggressive layers:

Learn the density, keep the controller (Schwager): φ modelled as linear-in-unknown-weights, estimated online + shared by consensus; motion law stays "move to weighted centroid" — a provable controller wrapped around an online estimator (the cleanest fusion).
Learn relative capability (Pierson): per-robot performance weights learned inside a weighted-Voronoi partition; geometry kept, parameters adapted.
Learn the controller itself (Kumar/Ribeiro): the analytic Lloyd rule is myopic (own cell only, no sharing) ⇒ provably suboptimal under limited range. GNNs with multi-hop message passing fuse non-local info and beat it; LPAC is the mature CNN+GNN+MLP pipeline. The recurring recipe is CTDE-by-imitation: classical theory supplies both the cost and the cheap clairvoyant expert, and a decentralized GNN is supervised to reproduce it.
Learn under constraints / by reward: primal-dual constrained learning to respect connectivity; or MARL reward-shaping for time-varying φ.

Across all four, the centroidal-Voronoi objective stays the backbone (the cost, the expert, or the geometric prior); learning supplies what the theory assumes away — the density, the heterogeneity, and the non-local coordination.

What each node holds: own position + Voronoi/Delaunay-neighbour positions (to compute its cell); the density φ over its cell (known classically; a learned estimate â shared by consensus in adaptive coverage); mass Mᵢ and centroid Cᵢ — the move-toward target, computed locally; heterogeneous: a learned performance/trust weight wᵢ, exchanged with neighbours; GNN/LPAC: a local ego-centric density map (e.g. 32×32 + boundary + neighbour-position channels), a fixed-size learned message exchanged at each of K hops, and in-range neighbours' relative positions.

Gaps: only local optimality (Lloyd → a local CVT, init-dependent); known-/static-density assumption (adaptive needs persistent excitation, which conflicts with staying spread on high-φ areas); learned controllers are bottlenecked by their clairvoyant expert (inherit its local-optimality ceiling; need centralized true-state at train time); assume cooperative honest robots — corrupted neighbour features / dropped links unaddressed (the GNN message layer is a single point of failure); connectivity treated only recently, as a soft constraint — coupling coverage with a hard connectivity guarantee remains under-developed.

→ For us. LPAC is the architecture to study closely: CNN-local-map + K-hop GNN + MLP, trained by imitation of a cheap analytic expert, gives size-invariant transfer — exactly the "scale-free local rule → emergent collective" property the swarm claim requires. But its blind spots are our contributions: it has no relay role (coverage over-clusters), treats connectivity as a soft constraint, and assumes honest neighbours. Our additions — emergent relay role, hard-ish connectivity, and the covert-resilience layer — sit precisely in LPAC's gaps.

E.3 · Exploration / coverage under a connectivity constraint — the three guardrails

The core tension is intrinsic: exploration rewards spreading apart; comms have limited range, so dispersing eventually breaks the network and strands discoveries. The field's central design choice is the connectivity guardrail — the rule for which dispersed configurations are allowed and how/when the network must reform. The literature organises cleanly around three behaviours:

Guardrail	How it works	Representative works
(A) Maintain continuous connectivity	The graph stays a single component at all times. Realised as an attractive potential on the Laplacian (links = springs) or a controller keeping `λ₂>0`; any exploration command is blended with a connectivity-restoring term so robots are reeled back before a link breaks. Hard, always-on guarantee — but conservative: robots can never venture beyond the network's reach.	Hsieh–Cowley–Kumar 2008; Zavlanos–Pappas 2007; Zavlanos–Egerstedt–Pappas 2011; Sabattini et al. 2013
(B) Mask / hard-constrain moves	Operates on individual decisions: check each candidate move; if it would disconnect the graph, forbid it and project to the nearest admissible move. In continuous control a CBF-QP renders the connected set forward-invariant; in discrete/distributed settings a local motion-constraint projection plays that role (even under delay). Less conservative than a global potential (free inside the connected set), and the natural form inside a learned policy (mask disconnecting actions).	Schuresko–Cortés 2009; Capelli–Sabattini 2020 (CBF); our RedWithinBlue hard guardrail (beats soft degree-floor by ~20 pts)
(C) Grace budget + incentivise reconnection	The team is deliberately allowed to split to explore far, then brought back by rendezvous. Guarantee is temporal: reconnect every T steps (periodic), whenever info must be reported (recurrent), or in the union-over-a-window sense (intermittent). Reconnection is driven by an explicit schedule (job-shop/MILP, event-triggered) or by an incentive (reconnect when the value of sharing exceeds the value of more solo exploration). Buys the most exploration speed, bounds the age-of-information of discoveries, at the price of latency.	Hollinger–Singh 2010/12 (periodic, seminal); Banfi et al. 2018 (recurrent); Kantaros–Zavlanos 2017 (intermittent + LTL); job-shop rendezvous 2024; Jensen–Gini 2013 (sentry/explorer); learned: IR2, IROS 2024
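
Guardrail (B) can be sketched as a move mask on a grid. This is an oracle-style illustration that assumes global knowledge of all positions, a disk comm model, and that every other agent holds still for the step; `MOVES`, `is_connected`, and `allowed_moves` are hypothetical names, and this is not the RedWithinBlue implementation.

```python
from collections import deque

MOVES = {"stay": (0, 0), "N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def is_connected(positions, comm_range):
    """BFS over the disk graph: link iff Euclidean distance <= comm_range."""
    n = len(positions)
    seen, queue = {0}, deque([0])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if j not in seen:
                dx = positions[i][0] - positions[j][0]
                dy = positions[i][1] - positions[j][1]
                if dx * dx + dy * dy <= comm_range ** 2:
                    seen.add(j)
                    queue.append(j)
    return len(seen) == n


def allowed_moves(positions, agent, comm_range):
    """Names of the moves for `agent` that keep the team in one component,
    assuming every other agent holds its position this step."""
    allowed = []
    for name, (dx, dy) in MOVES.items():
        trial = list(positions)
        x, y = positions[agent]
        trial[agent] = (x + dx, y + dy)
        if is_connected(trial, comm_range):
            allowed.append(name)
    return allowed
```

A learned policy would sample only from `allowed_moves(...)`, for example by setting the logits of masked actions to negative infinity. A deployable variant would have to replace the global BFS with a check each agent can run from local information.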

Glossary

Term	Definition
Continuous connectivity	Graph connected at every instant; no splitting move ever allowed (guarantee pointwise in time).
Recurrent connectivity	May disconnect arbitrarily long, but must periodically re-establish — typically to teammates and a base — each time there's something to report (event-driven).
Intermittent connectivity	Required only in the union sense over a sliding window, infinitely often; the instantaneous graph may be split; info flows through a sequence of pairwise meetings.
Periodic connectivity	Special case: regain full connectivity every fixed interval T (Hollinger–Singh) — a tunable knob between spreading and coupling.
Rendezvous	A coordinated meeting to exchange maps. Explicit/scheduled (planned point+time via MILP/job-shop) or implicit/emergent (converge when sharing-value > solo-exploration-value).
Connectivity budget	Bound on how much / how long / how far the team may be disconnected (the period T, max disconnection duration, droppable hops). Grace spends this budget to buy speed.
Age-of-Information (AoI)	Delay between observing info and it reaching its consumer (teammate/base). Under intermittent connectivity it's non-zero, bounded by the reconnection schedule; minimising it is the implicit objective rendezvous timing trades against exploration gain.
Relay/backbone vs frontier/explorer	Division of labour: relays hold positions stitching the network; explorers push into the unknown. Roles can be static, rule-switched (Jensen–Gini), tied to comm-tree depth (A³), or — the open goal — emergent from learning.
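
The difference between the continuous and the windowed (union-sense) regimes defined above can be checked on a logged edge history. A minimal sketch, assuming undirected links recorded per time step; `connected` and `regime_checks` are illustrative names. It tests connectivity of the union graph only and does not verify that meetings occur in an order that actually lets information propagate.

```python
def connected(n, edges):
    """Union-find connectivity test on n nodes."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(i) for i in range(n)}) == 1


def regime_checks(n, edge_history, window):
    """edge_history: list of per-step edge lists.

    Returns (continuous, windowed) where
      continuous: the graph is connected at every step
      windowed:   the union of edges over every length-`window` slice is connected
    """
    continuous = all(connected(n, step) for step in edge_history)
    windowed = all(
        connected(n, [edge for step in edge_history[t:t + window] for edge in step])
        for t in range(len(edge_history) - window + 1)
    )
    return continuous, windowed
```

For example, three agents that alternate between the links (0, 1) and (1, 2) are never connected at any single instant, yet every two-step window is connected in the union sense.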

What each node holds: own pose/cell + local occupancy belief (free/occupied/unknown) accumulated since the last sync; its in-range neighbour set (local adjacency row) + link qualities — raw material for any connectivity check; a local estimate of a global connectivity quantity (giant-component membership, a degree count, or a distributed λ₂ estimate); frontier/utility info, and (for Grace) a "map-surplus" belief — how much more it knows than neighbours — which sets the value of reconnecting; teammates' last-known positions/headings (to anticipate disconnection and navigate to rendezvous); for scheduled/intermittent, the rendezvous commitment (where/when) + a timer/budget (AoI of its discoveries); for role-based, its current role + the info to decide a switch (comm-tree depth; whether holding position keeps the backbone intact).

How learning enters. (1) Learning replaces the planner, guardrail stays hand-engineered — a MARL/GNN policy chooses moves, connectivity enforced by an external mask/CBF/schedule (safest, most common). (2) Learning absorbs Grace — instead of an explicit schedule, the policy learns the long-horizon disconnect-vs-reconnect trade-off, so rendezvous becomes implicit/emergent (IR2, IROS 2024: SAC + attention + a map-surplus feature + curriculum — the frontier of replacing schedules with learned timing). (3) Learning is meant to grow the role structure — but the few instances (A³ network) still tie roles to a hand-defined comm-tree-depth rule, so differentiation is only semi-emergent. The natural substrate throughout is a GNN over the comm graph (size-invariant transfer) — and a learned per-agent graph-belief is the key under-used enabler.

Gaps (the deep one): learned connectivity-constrained exploration is thin and almost always pairs a learned explorer with a hand-designed guarantee, so a policy that internalises the constraint and still gives a formal guarantee is largely missing; Grace-by-learning is nearly a one-paper field (IR2); emergent explorer/relay roles under the constraint are essentially unrealized (every split is hand-assigned, rule-switched, or keyed to comm-tree depth); per-agent belief/graph substrates are under-used as the carrier of the constraint; realistic comm models (latency, loss, bandwidth, AoI) are mostly absent from learned work; and adversarial/stealth robustness of a learned connectivity guardrail is unaddressed.

→ For us. Three of these gaps are literally our open questions, and we already have results in them: (C/learned) our Regime-B (soft + latency-discounted delivery) is exactly a learned-Grace formulation; (emergent roles) our crisp finding — roles emerge only when the constraint is soft enough that relaying pays; under a hard backstop the relay role is redundant and vanishes; (belief substrate) we built the GCRN size-invariant belief that no surveyed method has. The thing to adopt from here is the Grace vocabulary itself — periodic/recurrent/intermittent + age-of-information — because at scale (where one component is geometrically infeasible) this is the only honest regime, and AoI is precisely our covert-attack channel (a relay that silently drifts inflates AoI without tripping a degree-floor).

E.4 · Where classical stops and learning starts (the seams)

All three areas independently flag the same holes: the four cross-cutting gaps that surfaced in the flank A–C reviews above surface here too, validated against in-house results. Classical MRS theory hands off to learning at five seams:

Global-info / time-scale seam. Distributed λ₂ needs strong time-scale separation and recovers a global quantity over many rounds. It fails for fast motion / sparse links and — for us — under partial single-step info: the deployable hard constraint is a local degree/component check, the deployable belief a learned size-invariant estimator, not an online eigensolver.
Myopia seam. Lloyd/CVT is exact but uses only its own cell and over-clusters → provably suboptimal under limited range. Learned multi-hop GNNs (LPAC) propagate the non-local info the myopic rule can't.
Known-model seam. Classical assumes φ known/static, links symmetric, neighbours truthful. Learning starts where the model is unknown/time-varying/heterogeneous — for us, the global topology and teammate positions after disconnection (estimated by the belief).
Hard-discrete-composition seam (most central). Control theory gives hard guarantees for a single smooth objective (CBF-QP, edge-barriers, move-projection) but stops at (a) emergent multi-role division of labour, (b) the non-myopic Grace trade-off, (c) composing a flexible policy with the guarantee beyond ad-hoc penalties. Our results pin it: the hard mask is the right safety layer but makes the relay role redundant ⇒ control theory supplies the invariant, learning supplies the non-myopic policy + emergent roles + reconnection timing + the belief.
Cooperative-truthful seam (our RQ). Every classical guarantee, W-MSR included, assumes neighbours are either truthful or detectable outliers. Learning and our resilience study become necessary the moment the adversary is covert (within-bounds, corrupting the estimate, exploiting graph-position): there classical robustness goes silent and the dual-use learned belief is the only instrument.
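
The cooperative-truthful seam can be made concrete with a toy W-MSR update. The sketch assumes scalar values, uniform weights, and a single filtering node; the function names and the numbers are illustrative only. An overt outlier is trimmed, while a value reported inside the honest range passes the filter and still shifts the result.

```python
def wmsr_filter(own, neighbours, f):
    """Values kept by one W-MSR step with trimming parameter f.

    Among neighbour values above own, the f largest are dropped (all of them
    if there are fewer than f); the same is done for values below own.
    """
    larger = sorted(v for v in neighbours if v > own)
    smaller = sorted(v for v in neighbours if v < own)
    equal = [v for v in neighbours if v == own]
    kept_larger = larger[: max(0, len(larger) - f)]
    kept_smaller = smaller[min(f, len(smaller)):]
    return [own] + equal + kept_smaller + kept_larger


def wmsr_step(own, neighbours, f):
    """Uniform-weight average over the values that survive the filter."""
    kept = wmsr_filter(own, neighbours, f)
    return sum(kept) / len(kept)


honest = [4.0, 6.0, 5.5]               # truthful neighbour values
overt = wmsr_filter(5.0, honest + [100.0], 1)
covert = wmsr_filter(5.0, honest + [5.9], 1)

baseline = wmsr_step(5.0, honest, 1)           # no adversary
biased = wmsr_step(5.0, honest + [5.9], 1)     # within-bounds adversary
```

Here the overt report of 100.0 is discarded, but the covert report of 5.9 survives (an honest 6.0 is trimmed in its place) and pulls the update above the adversary-free baseline. The result still lies inside the honest range, so the filter's usual safety interval is not violated; the steering simply goes unnoticed.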

The MRS review's own novelty verdict (the full conjunction is novel; each axis alone is not — learned+decentralized free, λ₂-aware control exists but role-less, learned role differentiation exists but connectivity-blind, covert single-agent poisoning is an active subfield, LPAC transfer is unconstrained/role-less, learned implicit rendezvous ≈ one paper) is identical to the combined gap verdict above and is not repeated here. Compiled from a 4-agent verified literature workflow (mrs-litreview); arXiv-weighted — a robotics-venue (ICRA/IROS/RSS/CoRL/T-RO) sweep is still owed before locking paper claims.

References — MRS technical review

Connectivity estimation & maintenance: Yang, Freeman, Gordon, Lynch, Srinivasa, Sukthankar (2010), Automatica 46(2); Sabattini, Chopra, Secchi (2013), IJRR 32(12); de Gennaro, Jadbabaie (2006), IEEE CDC; Zavlanos, Pappas (2007), IEEE T-RO 23(4); Kim, Mesbahi (2006), IEEE TAC 51(1); Ji, Egerstedt (2007), IEEE T-RO 23(4); Kempe, McSherry (2008), JCSS 74(1); Tolstaya, Gama, Paulos, Pappas, Kumar, Ribeiro (2019), CoRL; Li, Jie, Kong, Cheng (2022), IEEE ICRA (arXiv:2109.08536).

Coverage control: Cortés, Martínez, Karatas, Bullo (2004), IEEE T-RA 20(2); Bullo, Cortés, Martínez (2009), Distributed Control of Robotic Networks, Princeton; Schwager, Rus, Slotine (2009), IJRR 28(3); Pierson, Figueiredo, Pimenta, Schwager (2017), IJRR 36(3); Du, Faber, Gunzburger (1999), SIAM Review 41(4) / Lloyd (1982); Tolstaya, Paulos, Kumar, Ribeiro (2021), IROS (arXiv:2011.01119); Gosrich, Mayya, Li, Paulos, Yim, Ribeiro, Kumar (2022), ICRA (arXiv:2109.15278); Agarwal, Muthukrishnan, Gosrich, Kumar, Ribeiro (2024), LPAC (arXiv:2401.04855; open-source CoverageControl); Agarwal, Kumar, Ribeiro et al. (2024), constrained coverage (arXiv:2409.11311).

Connectivity-constrained exploration: Hsieh, Cowley, Kumar, Taylor (2008), JFR 25(1-2); Zavlanos, Egerstedt, Pappas (2011), Proc. IEEE 99(9); Schuresko, Cortés (2009), JINT 56(1-2); Capelli, Sabattini (2020), ICRA (arXiv:2003.10178); Hollinger, Singh (2012), IEEE T-RO 28(4) (ICRA 2010); Banfi, Quattrini Li, Rekleitis, Amigoni, Caro (2018), Auton. Robots 42(4); Kantaros, Zavlanos (2017), IEEE TAC 62(7); Amigoni, Banfi, Basilico (2019), IEEE Intell. Syst.; Jensen, Gini (2013), IJCAI; Ribeiro da Silva, Chaimowicz, Silva, Hsieh (2024) (arXiv:2309.13494); Tan, Ma, Liang, Chng, Cao, Sartoretti (2024), IR2, IROS (arXiv:2409.04730); Zeng et al. (2025), A³ Network (arXiv:2509.18526).

F · foundational references

Foundational references (annotated)

An annotated bibliography of the field's standing taxonomies and formal-model anchors, organised by the four axes a swarm/multi-agent mission can be classified along. Each entry: full citation · thesis · why it matters here.

These were the local reference library used to position the project's mission families against the dominant axes in the literature. Note: the PDFs themselves have been retired from the workspace — each is retrievable from its publisher via the DOI / venue below.

BEHAVIOR axis — swarm behavior taxonomies (the de facto standard to cite)

Brambilla, Ferrante, Birattari, Dorigo (2013). Swarm robotics: a review from the swarm engineering perspective. Swarm Intelligence 7(1):1–41. DOI 10.1007/s11721-012-0075-2. — Thesis: presents two taxonomies, one for design/analysis methods and one for collective behaviors. Why it matters: the most-cited swarm-robotics taxonomy (>1700 citations) — the canonical anchor for any behavior-type classification.
Schranz, Umlauft, Sende, Elmenreich (2020). Swarm Robotic Behaviors and Current Applications. Frontiers in Robotics and AI 7:36. DOI 10.3389/frobt.2020.00036. — Thesis: extends Brambilla into four top-level buckets — Spatial Organization, Navigation, Decision Making, Miscellaneous — adding collective localization, collective perception, synchronization, self-healing, self-reproduction. Why it matters: the finest-grained behavior map; our mission families map onto its sub-behaviors (information-gathering → collective exploration/perception/localization).

SYSTEM / TASK-ALLOCATION axis — different axes (capabilities, allocation, coordination)

Dudek, Jenkin, Milios, Wilkes (1996). A taxonomy for multi-agent robotics. Autonomous Robots 3(4):375–397. — Thesis: classifies multi-agent systems by collective size, communication range, communication topology, bandwidth, reconfigurability, processing ability, composition. Why it matters: the system-design (not mission) taxonomy — the vocabulary for the communication-constraint dimensions our work lives in.
Gerkey, Matarić (2004). A formal analysis and taxonomy of task allocation in multi-robot systems. IJRR 23(9):939–954. DOI 10.1177/0278364904045564. — Thesis: the standard MRTA axes — single-task vs multi-task robots (ST/MT), single-robot vs multi-robot tasks (SR/MR), instantaneous vs time-extended assignment (IA/TA). Why it matters: the reference framing for role/task allocation — the lens our relay-vs-frontier division is positioned against.
Verma, Ranga (2021). Multi-Robot Coordination Analysis, Taxonomy, Challenges and Future Scope. Journal of Intelligent and Robotic Systems 102:10. DOI 10.1007/s10846-021-01378-2. — Thesis: a 6-dimension coordination taxonomy — communication, planning, control architecture, scalability, decision-making, application domain. Why it matters: the most recent coordination taxonomy; situates the decentralized/no-consensus coordination axis our design occupies.

FORMAL-MODEL axis — Dec-POMDP family (decentralized partial observability)

Oliehoek (2012). Decentralized POMDPs. In Wiering & van Otterlo (eds.), Reinforcement Learning: State-of-the-Art, Springer. — Thesis: lays out the Dec-POMDP subclass hierarchy (MMDP, Dec-MDP, Dec-POMDP, ND-POMDP, TI-Dec-MDP, OC-Dec-MDP). Why it matters: the formal model our setting is — decentralized, partially observable — and the chapter that names the tractable subclasses.
Amato, Chowdhary, Geramifard, Ure, Kochenderfer. Decentralized Control of Partially Observable Markov Decision Processes. Survey. — Thesis: covers Dec-POMDP model assumptions, complexity (NEXP-complete), and solution methods, with a subclass complexity table. Why it matters: the complexity anchor — establishes why exact decentralized planning is intractable and learning/approximation is required.

DOMAIN anchor — aerial swarm robotics

Chung, Paranjape, Dames, Shen, Kumar (2018). A Survey on Aerial Swarm Robotics. IEEE Transactions on Robotics 34(4):837–855. — Thesis: domain survey covering the graph-Laplacian / consensus machinery for coordinated aerial motion. Why it matters: the domain-specific anchor for the hybrid (sensing + comms + coordinated motion) category — ties the consensus/Laplacian formalism of §E.1 to a concrete swarm domain.