Five literature investigations — two PRISMA systematic reviews (connectivity-role-allocation; covert misbehaviour), two deep-research sweeps (connectivity-constrained exploration; resilience), and a code hunt — across four flanks. Honest verdict: each flank is well-trodden alone; the intersection is the open seam.
The field organizes connectivity-constrained exploration along a strictness spectrum — None → event-based / recurrent → intermittent → continuous. This is the soft↔hard axis our own work moved along: our soft degree-floor sits near the recurrent end; our hard guardrail is the continuous end. Under the strictest (continuous) regimes the field tends to retreat to centralized planning — which is exactly the room a decentralized, learned approach has.
A · connectivity + rolesThe literature splits into two camps that no study bridges:
Classical relay/explorer + no-consensus allocation (well-established, all heuristic or centralized): de Hoog's role-based exploration, CBAX (centralized, bandwidth-aware Steiner relays), Cesare's self-sacrificing UAVs, DRBECM-ML (RSSI-hysteresis roles), MoRoCo; GVGExp (Voronoi "gate" partition) and Cross-rank (TSP spreading, position-only sharing ~100 B/s); decentralized allocation via market/auction MRTA (Zlot, Sheng), potential fields (Liu & Lyons), and stigmergy (Andries, Kuyucu).
| our choice | prior art that validates it |
|---|---|
| hysteresis role-switch (enter hi / release lo) | DRBECM-ML's two-sided RSSI hysteresis (−65 / −75 dBm) |
| hard connectivity guardrail (mask disconnecting moves) | CBAX's hard feasible-set constraint; Lin 2020 projection; 9738445 hard mask |
| lightweight intent channel (share little) | Cross-rank shares positions only (~100 B/s), anti-redundancy by spatial spreading |
| cluster | representative work | stealthy | micro→macro | spatial mission | resilience |
|---|---|---|---|---|---|
| Covert c-MARL poisoning / backdoor | One4All, BLAST, Spatiotemporal backdoor, CAMA, AMI | yes | yes (1→team) | no (SMAC games) | attack-demo |
| Stealthy attacks on networked control (CPS) | stealthy FDI / zero-dynamics attacks | yes (formal) | partial | no (linear consensus) | detection |
| Resilient consensus / r-robustness | W-MSR; secure swarm cooperation | no (needs detectable) | via graph robustness | flocking | yes |
| Micro→macro / cascade modelling | cascading-failure risk, encounter diffusion | no (stochastic) | yes | abstract | yes |
The active subfield: single covert agent → poison the whole cooperative team is real and growing (One4All 2022; BLAST 2025 — behaves normally until a trigger, 91.6% success / 3.7% clean-variance; an OpenReview titled "Single-Agent Poisoning Attacks Suffice to Ruin Multi-Agent…"). But all of it is on game benchmarks (SMAC), trigger-based, attack-demonstrations — not spatial missions, not role/position-aware, not resilience curves.
C · resilienceDo NOT claim: "first connectivity-aware MARL coverage" (refuted by 9738445), "first learned role allocation for exploration" (refuted by 2312.01747), or "first covert single-agent attack on cooperative teams" (refuted by One4All / BLAST).
validityFlank A — connectivity & roles: Amigoni/Banfi/Basilico connectivity-exploration survey (8267592), Li 2021 (λ₂/CPO), Lin 2020, DRL Multitarget Coverage, GVGExp, intermittent-connectivity ILP, Cross-rank, de Hoog (5509803 / MRESim), DRBECM-ML (WiOpt 2025 / repo), MoRoCo; role MARL — ROMA, RODE, LDSA, ACORM, CORD, MAVEN; learned exploration — Area-Search, MARVEL, ACE, evolved swarm.
Flank B — covert misbehaviour: One4All, BLAST, Spatiotemporal backdoor, AMI, CAMA, "Single-Agent Poisoning Suffices…", Constrained Black-Box Attacks, Selective Adversarial Fault Induction.
Flank C — resilience: Sundaram et al. robust consensus / r-robustness (W-MSR), Usevitch & Panagou resilient leader-follower (Automatica 2019), Saulnier et al. Resilient Flocking (RA-L 2017), decentralized self-repair connectivity, secure swarm cooperation (Science Robotics), Towards Fault Tolerance in MARL, plus resilient-MARL preprints (2305.12872, 2111.06776, 2204.10063).
Classic taxonomies (docs/taxonomy-refs/): Dudek 1996 (multi-agent robotics taxonomy), Gerkey & Matarić 2004 (MRTA taxonomy, IJRR), Verma & Ranga 2021 (multi-robot coordination taxonomy, JIRS), Chung et al. 2018 (aerial swarm robotics survey, T-RO), Brambilla et al. 2013 (swarm robotics / swarm engineering), Schranz et al. 2020 (swarm robotic behaviours), Amato (Dec-POMDP survey), Oliehoek 2012 (Dec-POMDPs RL chapter). Plus the generated marl_taxonomy/ field map (70 entries + gap/venue reports).
Project drafts (references/): Stealth Attacks on Swarms, Swarm Resiliency, mission family, advML report, ExoRL report, Formalization.
A method-level technical review for a sparse, networked team of autonomous agents doing area coverage under a communication constraint — built toward studying covert-misbehaviour resilience. Three areas: distributed connectivity estimation · coverage control · connectivity-constrained exploration.
The multi-robot-systems (MRS) field has worked these exact problems for ~20 years. This review pulls the canonical methods, defines the technical terms from them, states what information each node must hold, shows how the classical methods fuse with learning (MARL/GNN), and lays out the three connectivity "guardrail" strategies (maintain · mask · grace). The novelty-conjunction verdict it lands on is the same one already stated above — see the combined gap verdict; this section does not repeat it, it supplies the method-level depth underneath it.
How does a team where each robot sees only its one-hop neighbours keep the whole comm graph connected? The field makes "connected" quantitative through λ₂, the second-smallest eigenvalue of the graph Laplacian (the Fiedler value): it is > 0 iff the graph is connected and grows as the graph gets better-knit. The hard part: λ₂ and its eigenvector are global properties, yet each robot only knows its own Laplacian row. The signature contribution is a family of distributed power-iteration + average-consensus algorithms that let every robot estimate its own component of the Fiedler vector and a local copy of λ₂ from neighbour-to-neighbour messages alone — then follow the gradient of λ₂ to hold connectivity. Almost all of it is control-theoretic; learning enters as a replacement controller (GNN) or as a constraint/guardrail around a learned policy.
| Term | Definition |
|---|---|
Graph Laplacian L = D − A | Degree matrix minus adjacency. Symmetric PSD, row-sums zero (L·1=0); spectrum 0=λ₁≤λ₂≤… encodes connectivity. Each robot natively knows only its own row of L. |
| Algebraic connectivity / Fiedler value (λ₂) | 2nd-smallest eigenvalue of L. λ₂>0 ⇔ graph connected (Fiedler 1973); monotone as edges/weights increase — a smooth scalar surrogate for "how connected". |
| Fiedler vector (v₂) | Eigenvector for λ₂, orthogonal to 1. Component v₂ⁱ says how robot i sits on the graph's weakest cut; the λ₂ gradient is built from edge differences (v₂ⁱ−v₂ʲ)², so a robot needs only its own component + neighbours'. |
| Why the second eigenvalue | λ₁ is always 0 (eigenvector 1), carrying no connectivity info. Connectivity lives in λ₂ — which requires deflating the trivial 1 direction so λ₂ becomes the dominant mode. |
| Distributed power iteration | Repeatedly apply a shifted/deflated operator (I−αL) to a vector; it converges to v₂. The product L·x is local (each node combines only neighbour values) — turning power iteration into rounds of neighbour message-passing. |
| Dynamic average consensus | The PI (proportional-integral) estimator (Freeman–Yang–Lynch 2006) that tracks the average of a time-varying signal from neighbour exchanges. It supplies the two global scalars — the deflation mean Ave(x) and normalisation Ave(x²) — that power iteration needs, without any node holding the whole vector. |
| Connectivity gradient | ∂λ₂/∂pᵏ = Σⱼ −Aₖⱼ(v₂ᵏ−v₂ʲ)²(pᵏ−pʲ)/σ². Local: robot k needs only its own Fiedler component, its neighbours', and relative positions. Following it (or a barrier-potential of λ₂) is the connectivity-maintenance control law. |
| Method | How it works | Learning stack |
|---|---|---|
| Decentralized deflated power-iteration + PI consensus (Yang–Freeman–Lynch 2010) | Each agent integrates one scalar xⁱ→v₂ⁱ with three terms (deflate the 1 mode, apply −L, renormalise); reads off λ₂ via Rayleigh quotient or the normalisation gain; then moves up the λ₂ gradient. The reference scheme. | control-theoretic (ODE + consensus + gradient; Lyapunov proof) |
| Power-iteration + potential-barrier controller (Sabattini–Chopra–Secchi 2013) | Same estimator, wrapped by a potential V(λ₂)→∞ as λ₂→ε. The barrier dominates near the threshold ⇒ provably keeps λ₂>ε for all time even with an added bounded task controller. | control-theoretic |
| Fiedler-supergradient ascent (de Gennaro–Jadbabaie 2006) | Estimate v₂ (distributed spectral analysis), ascend λ₂ via dλ₂ = v₂ᵀ(dL)v₂. Establishes λ₂-as-objective. | control-theoretic + distributed eigensolver |
| Nearest-neighbour connectivity potentials (Zavlanos–Pappas 2007) | Potentials on existing links diverge as a link nears comm range ⇒ no current edge is ever lost. Preserves the edge set rather than tracking λ₂ (more conservative). | control-theoretic |
| Nonlinear edge-weight coordination (Ji–Egerstedt 2007) | Edge weights blow up near sensing range ⇒ the weighted-Laplacian flow never disconnects, with no λ₂ computation. Simple, decentralized, conservative (freezes initial topology). | control-theoretic |
| λ₂-maximisation via SDP (Kim–Mesbahi 2006) | Place agents to maximise λ₂ subject to proximity, solved as a sequence of semidefinite programs. The centralized "ground truth" the distributed gradients approximate. | optimization (SDP), centralized |
| Aggregation-GNN decentralized controller (Tolstaya–Gama–Ribeiro 2019) | An order-K graph-filter GNN (K rounds of message-passing — structurally the same local aggregation as power iteration) imitates a centralized flocking controller; transfers across team sizes. But no hard connectivity guarantee — the authors document a subgroup-escape failure, motivating guardrails. | imitation (behaviour cloning) on a graph-conv GNN; K-hop message passing |
| RL policy with λ₂ as a constraint (Li et al., ICRA 2022) | Shared decentralized policy from local range-scan + neighbour positions; λ₂(Gₜ) enters as a constraint (not just reward), behaviour-cloning warm-start. The now-standard pattern: learned task policy + control-theoretic λ₂ as the safety layer. | deep MARL (PPO-style, constrained) + BC |
pⁱ and the relative positions pⁱ−pʲ of its one-hop neighbours; shared design constants (comm range r, kernel σ).Aᵢⱼ=exp(−‖pⁱ−pʲ‖²/2σ²) to each neighbour ⇒ its degree and row entries. Never sees non-neighbours' rows. The product (Lx)ⁱ=Σⱼ Aᵢⱼ(xⁱ−xʲ) is computable from this row + neighbours' xʲ.xⁱ — a single scalar integrated toward v₂ⁱ. The full vector is never materialised anywhere.Ave(x) (deflation/mean), one tracks Ave(x²) (normalisation).λ₂ⁱ=(k₃/k₂)(1−z^{i,2}) (or the Rayleigh-quotient form).{xⁱ, z^{i,1}, w^{i,1}, z^{i,2}, w^{i,2}} (+ position for the controller). Cost is O(1) per node, scaling only with degree, not n.How it fuses with learning. Three layers: (1) replacement — a graph-conv GNN imitates a centralized controller and runs decentralized (but pure imitation lacks a guarantee → subgroup-escape, so it does not subsume the λ₂ machinery); (2) λ₂ as a learning signal — MARL puts algebraic connectivity into the objective as a reward or hard constraint; (3) guardrail/shielding (most robust, now-standard) — a learned policy proposes, a control-theoretic layer (distributed λ₂ estimate, edge-weight barrier, or move-masking) projects onto the connectivity-feasible set. In all hybrids the spectral estimator stays control-theoretic; learning replaces the controller or the objective, not the estimator.
Coverage control deploys mobile sensors so every point is well-sensed by its nearest robot, with more capacity where it matters. The canonical formulation (Cortés–Martínez–Karatas–Bullo 2004) is locational optimization: minimise H(P)=Σᵢ ∫_{Vᵢ} f(‖q−pᵢ‖)·φ(q) dq. The solution partitions the region into Voronoi cells, shows the optimum is a centroidal Voronoi tessellation (each robot at its importance-weighted centroid), and reaches it by Lloyd's algorithm (move toward your cell centroid). Decentralized over the Delaunay graph. Its limitation — φ assumed known, controller myopic (uses only its own cell, never shares) — is exactly what learning attacks.
| Term | Definition |
|---|---|
Voronoi cell Vᵢ | All points closer to pᵢ than any other robot. Robot i is responsible for its cell; only Delaunay (boundary-sharing) neighbours matter. |
| Lloyd's algorithm | Iterate: compute Voronoi cells → move each site to its density-weighted centroid → repeat. As feedback: uᵢ=−k(pᵢ−Cᵢ) — gradient descent on the locational cost. |
Locational cost H(P) | Total importance-weighted sensing error (commonly f(x)=x²); the objective coverage minimises. |
| Centroidal Voronoi tessellation (CVT) | Every site = centroid of its own cell. CVTs are exactly the critical points of H. |
| Generalized mass / centroid | Mᵢ=∫_{Vᵢ}φ, Cᵢ=(1/Mᵢ)∫_{Vᵢ} q·φ dq. Gradient: ∂H/∂pᵢ=2Mᵢ(pᵢ−Cᵢ). |
Density / importance φ | Where sensing matters. Known a-priori classically; estimated from data in adaptive/learned versions. |
| Weighted / power Voronoi | Per-robot weights wᵢ make capable robots claim larger cells (heterogeneous teams); weights can be learned online. |
| CTDE-by-imitation | A centralized clairvoyant CVT expert (full state + true φ) generates optimal Lloyd actions; a decentralized GNN is trained by MSE to reproduce them from local obs + messages, then deployed decentrally. |
| Method | How it works | Learning stack |
|---|---|---|
| Voronoi/Lloyd coverage (Cortés–Bullo 2004) | Compute cell → weighted centroid → uᵢ=−k(pᵢ−Cᵢ). Exact gradient descent on H; LaSalle convergence to a CVT. Assumes φ known. | control-theoretic |
| Decentralized adaptive coverage (Schwager–Rus–Slotine 2009) | Learns φ online: φ=K(q)ᵀa with unknown weights a; each robot updates â from its own measurements + a consensus term; moves to its â-centroid. Lyapunov proof of coverage + (under persistent excitation) parameter consensus. | online adaptive control + parameter consensus (no deep nets) |
| Adaptive weighted-Voronoi (Pierson–Schwager 2017) | Heterogeneous teams: multiplicatively-weighted Voronoi; each robot learns its performance weight online from local signal + neighbours. | online adaptive control (weight adaptation) |
| Spatial-GNN coverage (Tolstaya et al. 2021) | Environment → spatial graph; a size-equivariant GNN maps each robot's neighbourhood to a move, imitating a centralized expert. Generalises to maps/teams far larger than the expert can solve. | spatial GNN + imitation |
| GNN decentralized controller (Gosrich–Kumar 2022) | Multi-hop message passing lets each robot fuse non-local info before deciding velocity → beats non-communicating Lloyd, scales/transfers to larger teams. | GNN (multi-hop) + imitation |
| LPAC (Agarwal–Kumar–Ribeiro 2024) SOTA, open-source | Per robot: CNN perception of a 32×32 local density map → GNN (K=3 hops) communication → MLP velocity. Imitates a clairvoyant CVT expert on 100k state-action pairs. Beats decentralized- and centralized-CVT by ≥20%, transfers zero-shot to larger maps/teams, robust to position noise. | CNN + K-hop GNN + MLP, end-to-end imitation (PyTorch-Geometric) |
| Constrained-learning coverage (Agarwal et al. 2024) | LPAC architecture trained with primal-dual / Lagrangian optimization so the policy satisfies extra constraints (e.g. connectivity) rather than collapsing them into one scalar. | GNN + constrained (primal-dual) learning |
| MARL coverage + dynamic density | RL (actor-critic) with coverage-cost reward; targets spawn a dynamic density that attracts agents, coupled to CVT — handles moving/unknown φ the static-φ controller can't. | MARL (actor-critic) + CVT prior |
What each node holds: own position + Voronoi/Delaunay-neighbour positions (to compute its cell); the density φ over its cell (known classically; a learned estimate â shared by consensus in adaptive coverage); mass Mᵢ and centroid Cᵢ — the move-toward target, computed locally; heterogeneous: a learned performance/trust weight wᵢ, exchanged with neighbours; GNN/LPAC: a local ego-centric density map (e.g. 32×32 + boundary + neighbour-position channels), a fixed-size learned message exchanged at each of K hops, and in-range neighbours' relative positions.
The core tension is intrinsic: exploration rewards spreading apart; comms have limited range, so dispersing eventually breaks the network and strands discoveries. The field's central design choice is the connectivity guardrail — the rule for which dispersed configurations are allowed and how/when the network must reform. The literature organises cleanly around three behaviours:
| Guardrail | How it works | Representative works |
|---|---|---|
| (A) Maintain continuous connectivity | The graph stays a single component at all times. Realised as an attractive potential on the Laplacian (links = springs) or a controller keeping λ₂>0; any exploration command is blended with a connectivity-restoring term so robots are reeled back before a link breaks. Hard, always-on guarantee — but conservative: robots can never venture beyond the network's reach. | Hsieh–Cowley–Kumar 2008; Zavlanos–Pappas 2007; Zavlanos–Egerstedt–Pappas 2011; Sabattini et al. 2013 |
| (B) Mask hard-constrain moves | Operates on individual decisions: check each candidate move; if it would disconnect the graph, forbid it and project to the nearest admissible move. In continuous control a CBF-QP renders the connected set forward-invariant; in discrete/distributed settings a local motion-constraint projection (even under delay). Less conservative than a global potential (free inside the connected set), and the natural form inside a learned policy (mask disconnecting actions). | Schuresko–Cortés 2009; Capelli–Sabattini 2020 (CBF); our RedWithinBlue hard guardrail (beats soft degree-floor by ~20 pts) |
| (C) Grace budget + incentivise reconnection | The team is deliberately allowed to split to explore far, then brought back by rendezvous. Guarantee is temporal: reconnect every T steps (periodic), whenever info must be reported (recurrent), or in the union-over-a-window sense (intermittent). Reconnection is driven by an explicit schedule (job-shop/MILP, event-triggered) or by an incentive (reconnect when the value of sharing exceeds the value of more solo exploration). Buys the most exploration speed, bounds the age-of-information of discoveries, at the price of latency. | Hollinger–Singh 2010/12 (periodic, seminal); Banfi et al. 2018 (recurrent); Kantaros–Zavlanos 2017 (intermittent + LTL); job-shop rendezvous 2024; Jensen–Gini 2013 (sentry/explorer); learned: IR2, IROS 2024 |
| Term | Definition |
|---|---|
| Continuous connectivity | Graph connected at every instant; no splitting move ever allowed (guarantee pointwise in time). |
| Recurrent connectivity | May disconnect arbitrarily long, but must periodically re-establish — typically to teammates and a base — each time there's something to report (event-driven). |
| Intermittent connectivity | Required only in the union sense over a sliding window, infinitely often; the instantaneous graph may be split; info flows through a sequence of pairwise meetings. |
| Periodic connectivity | Special case: regain full connectivity every fixed interval T (Hollinger–Singh) — a tunable knob between spreading and coupling. |
| Rendezvous | A coordinated meeting to exchange maps. Explicit/scheduled (planned point+time via MILP/job-shop) or implicit/emergent (converge when sharing-value > solo-exploration-value). |
| Connectivity budget | Bound on how much / how long / how far the team may be disconnected (the period T, max disconnection duration, droppable hops). Grace spends this budget to buy speed. |
| Age-of-Information (AoI) | Delay between observing info and it reaching its consumer (teammate/base). Under intermittent connectivity it's non-zero, bounded by the reconnection schedule; minimising it is the implicit objective rendezvous timing trades against exploration gain. |
| Relay/backbone vs frontier/explorer | Division of labour: relays hold positions stitching the network; explorers push into the unknown. Roles can be static, rule-switched (Jensen–Gini), tied to comm-tree depth (A³), or — the open goal — emergent from learning. |
What each node holds: own pose/cell + local occupancy belief (free/occupied/unknown) accumulated since the last sync; its in-range neighbour set (local adjacency row) + link qualities — raw material for any connectivity check; a local estimate of a global connectivity quantity (giant-component membership, a degree count, or a distributed λ₂ estimate); frontier/utility info, and (for Grace) a "map-surplus" belief — how much more it knows than neighbours — which sets the value of reconnecting; teammates' last-known positions/headings (to anticipate disconnection and navigate to rendezvous); for scheduled/intermittent, the rendezvous commitment (where/when) + a timer/budget (AoI of its discoveries); for role-based, its current role + the info to decide a switch (comm-tree depth; whether holding position keeps the backbone intact).
How learning enters. (1) Learning replaces the planner, guardrail stays hand-engineered — a MARL/GNN policy chooses moves, connectivity enforced by an external mask/CBF/schedule (safest, most common). (2) Learning absorbs Grace — instead of an explicit schedule, the policy learns the long-horizon disconnect-vs-reconnect trade-off, so rendezvous becomes implicit/emergent (IR2, IROS 2024: SAC + attention + a map-surplus feature + curriculum — the frontier of replacing schedules with learned timing). (3) Learning is meant to grow the role structure — but the few instances (A³ network) still tie roles to a hand-defined comm-tree-depth rule, so differentiation is only semi-emergent. The natural substrate throughout is a GNN over the comm graph (size-invariant transfer) — and a learned per-agent graph-belief is the key under-used enabler.
All three areas independently flag the same holes; the same four cross-cutting gaps surfaced here as in the flank-A–C reviews above, validated against in-house results. The five seams where classical MRS theory hands off to learning:
An annotated bibliography of the field's standing taxonomies and formal-model anchors, organised by the four axes a swarm/multi-agent mission can be classified along. Each entry: full citation · thesis · why it matters here.