Related work & the gap

Five literature investigations — two PRISMA systematic reviews (connectivity-role-allocation; covert misbehaviour), two deep-research sweeps (connectivity-constrained exploration; resilience), and a code hunt — across four flanks. Honest verdict: each flank is well-trodden alone; the intersection is the open seam.

0 · framing

The connectivity-regime spectrum

The field organizes connectivity-constrained exploration along a strictness spectrum: none → event-based / recurrent → intermittent → continuous. This is the soft↔hard axis our own work moved along: our soft degree-floor sits near the recurrent end; our hard guardrail is the continuous end. Under the strictest (continuous) regimes the field tends to retreat to centralized planning, which is exactly the opening left for a decentralized, learned approach.

A · connectivity + roles

Connectivity-aware exploration & role allocation

PRISMA review (96 → 24 extracted; 0 met all four criteria) + a connectivity-exploration deep-research sweep + a code hunt.

The literature splits into two camps that no study bridges:

Classical relay/explorer + no-consensus allocation (well-established, all heuristic or centralized): de Hoog's role-based exploration, CBAX (centralized, bandwidth-aware Steiner relays), Cesare's self-sacrificing UAVs, DRBECM-ML (RSSI-hysteresis roles), MoRoCo; GVGExp (Voronoi "gate" partition) and Cross-rank (TSP spreading, position-only sharing ~100 B/s); decentralized allocation via market/auction MRTA (Zlot, Sheng), potential fields (Liu & Lyons), and stigmergy (Andries, Kuyucu).

Independent validation of three of our design choices.
| our choice | prior art that validates it |
| --- | --- |
| hysteresis role-switch (enter hi / release lo) | DRBECM-ML's two-sided RSSI hysteresis (−65 / −75 dBm) |
| hard connectivity guardrail (mask disconnecting moves) | CBAX's hard feasible-set constraint; Lin 2020 projection; 9738445 hard mask |
| lightweight intent channel (share little) | Cross-rank shares positions only (~100 B/s), anti-redundancy by spatial spreading |

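
The hysteresis row can be illustrated with a minimal two-threshold switch. This is a sketch under assumptions, not DRBECM-ML's code: the class name `HysteresisRole`, the meaning of the `active` flag, and the mapping of −65 dBm to "enter" and −75 dBm to "release" are all hypothetical choices made for illustration, and the signal trace is made up.

```python
from dataclasses import dataclass


@dataclass
class HysteresisRole:
    """Two-threshold (Schmitt-trigger style) switch on a scalar signal."""
    enter_hi: float = -65.0    # assumed "enter" threshold (dBm)
    release_lo: float = -75.0  # assumed "release" threshold (dBm)
    active: bool = False       # hypothetical flag: role currently held

    def update(self, rssi_dbm: float) -> bool:
        # Enter only above the high threshold, release only below the low one.
        if not self.active and rssi_dbm >= self.enter_hi:
            self.active = True
        elif self.active and rssi_dbm <= self.release_lo:
            self.active = False
        return self.active


def count_switches(states):
    """Number of role changes along a trace of boolean states."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Made-up RSSI trace that chatters around -70 dBm.
trace = [-80, -64, -69, -71, -69, -71, -76, -71, -69, -64]

role = HysteresisRole()
hysteresis_states = [role.update(x) for x in trace]

# Baseline for comparison: a single threshold at -70 dBm.
single_threshold_states = [x >= -70 for x in trace]

print(count_switches(hysteresis_states), count_switches(single_threshold_states))
```

On this trace the two-threshold switch changes role fewer times than the single threshold, which is the chatter suppression that hysteresis is meant to provide.
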
Released code we can study. MRESim (de Hoog's role-based explorer/relay simulator, Java — a baseline to beat) and DRBECM (Python/NumPy — the geometric base, not the ML version). Most others released no code. Honest note: the strongest explicit negotiated anti-redundancy claims were the ones that got refuted in verification — that mechanism class is weakly evidenced, which makes our intent-signalling negotiation a less-trodden path.

B · covert micro→macro

Covert micro→macro misbehaviour

PRISMA review (degraded by tooling → re-established by direct scan). Four clusters; the finding is which never overlap.

| cluster | representative work | stealthy | micro→macro | spatial mission | resilience |
| --- | --- | --- | --- | --- | --- |
| Covert c-MARL poisoning / backdoor | One4All, BLAST, Spatiotemporal backdoor, CAMA, AMI | yes | yes (1→team) | no (SMAC games) | attack-demo |
| Stealthy attacks on networked control (CPS) | stealthy FDI / zero-dynamics attacks | yes (formal) | partial | no (linear consensus) | detection |
| Resilient consensus / r-robustness | W-MSR; secure swarm cooperation | no (needs detectable) | via graph robustness | flocking | yes |
| Micro→macro / cascade modelling | cascading-failure risk, encounter diffusion | no (stochastic) | yes | abstract | yes |

The active subfield, in which a single covert agent poisons the whole cooperative team, is real and growing (One4All 2022; BLAST 2025, which behaves normally until a trigger, 91.6% success / 3.7% clean-variance; an OpenReview paper titled "Single-Agent Poisoning Attacks Suffice to Ruin Multi-Agent…"). But all of it is on game benchmarks (SMAC), trigger-based, and framed as attack demonstrations: not spatial missions, not role/position-aware, not resilience curves.

C · resilience

Resilience of autonomous teams

Resilience deep-research sweep (automated synthesis failed on tooling → direct-read of the surfaced sources).

Key distinction this surfaced: connectivity ≠ resilience. A graph can have high algebraic connectivity (λ₂) yet not be r-robust, in which case one well-placed adversary breaks consensus. So a connectivity objective should target λ₂, but a resilience objective should target graph robustness r (W-MSR). Crucially, all this theory assumes adversaries are detectable outliers (the defense discards extremes), which is exactly what a stealthy agent evades, so the r-robustness guarantees go silent in the covert regime.
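
To make "the defense discards extremes" concrete, here is a minimal sketch of one W-MSR update for a single node. It follows the standard rule (drop up to F neighbour values above your own and up to F below, then average what remains with your own value). The uniform averaging weights, the function name `wmsr_step`, and the numeric values are assumptions chosen for illustration.

```python
def wmsr_step(own, neighbour_values, F):
    """One W-MSR update for a single node with trimming parameter F.

    Neighbour values strictly above `own` lose their F largest entries
    (all of them if there are F or fewer); symmetrically for values below.
    The node then averages its own value with whatever survives.
    Uniform weights are an assumption of this sketch.
    """
    higher = sorted(v for v in neighbour_values if v > own)
    lower = sorted(v for v in neighbour_values if v < own)
    equal = [v for v in neighbour_values if v == own]

    kept_higher = higher[:max(len(higher) - F, 0)]  # drop the F largest
    kept_lower = lower[F:]                          # drop the F smallest
    kept = [own] + equal + kept_lower + kept_higher
    return sum(kept) / len(kept)


honest = [0.4, 0.5, 0.6]

# A loud adversary reports an obvious outlier: it is trimmed away.
loud = wmsr_step(0.5, honest + [100.0], F=1)

# A covert adversary reports a value inside the honest range: it survives
# the trimming and is averaged in like any honest neighbour.
covert = wmsr_step(0.5, honest + [0.55], F=1)

print(loud, covert)
```

The second call is the covert regime in miniature: the within-bounds value 0.55 is never an extreme, so the trimming has nothing to remove.
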
D · borrow

Methods worth borrowing

verdict

The combined gap verdict

Every flank is NARROWED, not open — so the contribution must be the conjunction, never a single leg. Connectivity-aware role allocation exists (homogeneous OR connectivity-blind, never both); covert single-agent poisoning of cooperative MARL exists (game benchmarks, trigger-based); resilience theory exists (assumes detectable adversaries). The open seam is the intersection: covert, within-bounds misbehaviour's micro→macro propagation in a spatial connectivity mission, governed by role / graph-position, with a learned graph belief as both attack-surface and detector, characterized as a stealth–damage frontier.

Do NOT claim: "first connectivity-aware MARL coverage" (refuted by 9738445), "first learned role allocation for exploration" (refuted by 2312.01747), or "first covert single-agent attack on cooperative teams" (refuted by One4All / BLAST).

validity

Threats to validity

corpus

Sources & document corpus

The full set of documents gathered across the five investigations and the local reference library. Web sources are the substantive/verified ones; quality was mixed and the verified set is what the verdict rests on.

Surveyed literature (web)

Flank A — connectivity & roles: Amigoni/Banfi/Basilico connectivity-exploration survey (8267592), Li 2021 (λ₂/CPO), Lin 2020, DRL Multitarget Coverage, GVGExp, intermittent-connectivity ILP, Cross-rank, de Hoog (5509803 / MRESim), DRBECM-ML (WiOpt 2025 / repo), MoRoCo; role MARL — ROMA, RODE, LDSA, ACORM, CORD, MAVEN; learned exploration — Area-Search, MARVEL, ACE, evolved swarm.

Flank B — covert misbehaviour: One4All, BLAST, Spatiotemporal backdoor, AMI, CAMA, "Single-Agent Poisoning Suffices…", Constrained Black-Box Attacks, Selective Adversarial Fault Induction.

Flank C — resilience: Sundaram et al. robust consensus / r-robustness (W-MSR), Usevitch & Panagou resilient leader-follower (Automatica 2019), Saulnier et al. Resilient Flocking (RA-L 2017), decentralized self-repair connectivity, secure swarm cooperation (Science Robotics), Towards Fault Tolerance in MARL, plus resilient-MARL preprints (2305.12872, 2111.06776, 2204.10063).

Local reference library (gathered in the workspace)

Classic taxonomies (docs/taxonomy-refs/): Dudek 1996 (multi-agent robotics taxonomy), Gerkey & Matarić 2004 (MRTA taxonomy, IJRR), Verma & Ranga 2021 (multi-robot coordination taxonomy, JIRS), Chung et al. 2018 (aerial swarm robotics survey, T-RO), Brambilla et al. 2013 (swarm robotics / swarm engineering), Schranz et al. 2020 (swarm robotic behaviours), Amato (Dec-POMDP survey), Oliehoek 2012 (Dec-POMDPs RL chapter). Plus the generated marl_taxonomy/ field map (70 entries + gap/venue reports).

Project drafts (references/): Stealth Attacks on Swarms, Swarm Resiliency, mission family, advML report, ExoRL report, Formalization.

Honesty on the corpus. The deep-research sweeps returned more candidate URLs than are listed here; many were graded low-quality and dropped, and the resilience/covert syntheses were re-established by direct reading after a tooling failure. This page lists the substantive documents; the only ones the gap verdict leans on are the verified/primary works in flanks A–C.

E · multi-robot systems

What to borrow from Multi-Robot Systems

A method-level technical review for a sparse, networked team of autonomous agents doing area coverage under a communication constraint — built toward studying covert-misbehaviour resilience. Three areas: distributed connectivity estimation · coverage control · connectivity-constrained exploration.

The multi-robot-systems (MRS) field has worked these exact problems for ~20 years. This review pulls the canonical methods, defines the technical terms from them, states what information each node must hold, shows how the classical methods fuse with learning (MARL/GNN), and lays out the three connectivity "guardrail" strategies (maintain · mask · grace). The novelty-conjunction verdict it lands on is the same one already stated above — see the combined gap verdict; this section does not repeat it, it supplies the method-level depth underneath it.

E.1 · Distributed algebraic-connectivity (λ₂) estimation & maintenance

How does a team where each robot sees only its one-hop neighbours keep the whole comm graph connected? The field makes "connected" quantitative through λ₂, the second-smallest eigenvalue of the graph Laplacian (the Fiedler value): it is > 0 iff the graph is connected and grows as the graph gets better-knit. The hard part: λ₂ and its eigenvector are global properties, yet each robot only knows its own Laplacian row. The signature contribution is a family of distributed power-iteration + average-consensus algorithms that let every robot estimate its own component of the Fiedler vector and a local copy of λ₂ from neighbour-to-neighbour messages alone — then follow the gradient of λ₂ to hold connectivity. Almost all of it is control-theoretic; learning enters as a replacement controller (GNN) or as a constraint/guardrail around a learned policy.

Glossary (defined from the canonical methods)

| Term | Definition |
| --- | --- |
| Graph Laplacian L = D − A | Degree matrix minus adjacency. Symmetric PSD, row-sums zero (L·1=0); spectrum 0=λ₁≤λ₂≤… encodes connectivity. Each robot natively knows only its own row of L. |
| Algebraic connectivity / Fiedler value (λ₂) | 2nd-smallest eigenvalue of L. λ₂>0 ⇔ graph connected (Fiedler 1973); monotone as edges/weights increase, making it a smooth scalar surrogate for "how connected". |
| Fiedler vector (v₂) | Eigenvector for λ₂, orthogonal to 1. Component v₂ⁱ says how robot i sits on the graph's weakest cut; the λ₂ gradient is built from edge differences (v₂ⁱ−v₂ʲ)², so a robot needs only its own component + neighbours'. |
| Why the second eigenvalue | λ₁ is always 0 (eigenvector 1), carrying no connectivity info. Connectivity lives in λ₂, which requires deflating the trivial 1 direction so λ₂ becomes the dominant mode. |
| Distributed power iteration | Repeatedly apply a shifted/deflated operator (I−αL) to a vector; it converges to v₂. The product L·x is local (each node combines only neighbour values), which turns power iteration into rounds of neighbour message-passing. |
| Dynamic average consensus | The PI (proportional-integral) estimator (Freeman–Yang–Lynch 2006) that tracks the average of a time-varying signal from neighbour exchanges. It supplies the two global scalars that power iteration needs (the deflation mean Ave(x) and the normalisation Ave(x²)), without any node holding the whole vector. |
| Connectivity gradient | ∂λ₂/∂pᵏ = Σⱼ −Aₖⱼ(v₂ᵏ−v₂ʲ)²(pᵏ−pʲ)/σ². Local: robot k needs only its own Fiedler component, its neighbours', and relative positions. Following it (or a barrier-potential of λ₂) is the connectivity-maintenance control law. |
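
The deflate / apply / renormalise loop described in the glossary can be sketched in a few lines. This is a centralized simulation written for illustration, not the distributed Yang–Freeman–Lynch estimator: the two averages that dynamic average consensus would supply (the mean of x and the mean of x²) are computed directly here, and the function names, the step size α = 1/(2·max degree), and the example graphs are assumptions of the sketch.

```python
import math
import random


def laplacian_apply(adj, x):
    """(L x)_i = sum_j A_ij (x_i - x_j); each entry uses only neighbour values."""
    return [sum(w * (x[i] - x[j]) for j, w in adj[i].items())
            for i in range(len(adj))]


def estimate_fiedler(adj, iters=500, seed=0):
    """Deflated power iteration on (I - alpha L); returns (lambda2, v2)."""
    n = len(adj)
    max_degree = max(sum(nbrs.values()) for nbrs in adj)
    alpha = 1.0 / (2.0 * max_degree)  # keeps every eigenvalue of I - alpha L in [0, 1]
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(x) / n                      # Ave(x): deflates the all-ones mode
        x = [xi - mean for xi in x]
        lx = laplacian_apply(adj, x)
        x = [xi - alpha * lxi for xi, lxi in zip(x, lx)]
        rms = math.sqrt(sum(xi * xi for xi in x) / n)  # uses Ave(x^2)
        x = [xi / rms for xi in x]
    lx = laplacian_apply(adj, x)
    lam2 = sum(xi * lxi for xi, lxi in zip(x, lx)) / sum(xi * xi for xi in x)
    return lam2, x


# Path graph 0-1-2-3 with unit weights (connected).
path = [{1: 1.0}, {0: 1.0, 2: 1.0}, {1: 1.0, 3: 1.0}, {2: 1.0}]
lam_path, v_path = estimate_fiedler(path)

# Two separate pairs 0-1 and 2-3 (disconnected).
split = [{1: 1.0}, {0: 1.0}, {3: 1.0}, {2: 1.0}]
lam_split, _ = estimate_fiedler(split)

print(lam_path, lam_split)
```

For the four-node path the estimate should approach 2 − √2, and for the split graph it should approach 0, matching the λ₂>0 ⇔ connected property. The opposite signs of the two end components of the recovered vector mark the two sides of the weakest cut.
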

Methods & their learning stacks

| Method | How it works | Learning stack |
| --- | --- | --- |
| Decentralized deflated power-iteration + PI consensus (Yang–Freeman–Lynch 2010) | Each agent integrates one scalar xⁱ→v₂ⁱ with three terms (deflate the 1 mode, apply −L, renormalise); reads off λ₂ via Rayleigh quotient or the normalisation gain; then moves up the λ₂ gradient. The reference scheme. | control-theoretic (ODE + consensus + gradient; Lyapunov proof) |
| Power-iteration + potential-barrier controller (Sabattini–Chopra–Secchi 2013) | Same estimator, wrapped by a potential V(λ₂)→∞ as λ₂→ε. The barrier dominates near the threshold ⇒ provably keeps λ₂>ε for all time even with an added bounded task controller. | control-theoretic |
| Fiedler-supergradient ascent (de Gennaro–Jadbabaie 2006) | Estimate v₂ (distributed spectral analysis), ascend λ₂ via dλ₂ = v₂ᵀ(dL)v₂. Establishes λ₂-as-objective. | control-theoretic + distributed eigensolver |
| Nearest-neighbour connectivity potentials (Zavlanos–Pappas 2007) | Potentials on existing links diverge as a link nears comm range ⇒ no current edge is ever lost. Preserves the edge set rather than tracking λ₂ (more conservative). | control-theoretic |
| Nonlinear edge-weight coordination (Ji–Egerstedt 2007) | Edge weights blow up near sensing range ⇒ the weighted-Laplacian flow never disconnects, with no λ₂ computation. Simple, decentralized, conservative (freezes initial topology). | control-theoretic |
| λ₂-maximisation via SDP (Kim–Mesbahi 2006) | Place agents to maximise λ₂ subject to proximity, solved as a sequence of semidefinite programs. The centralized "ground truth" the distributed gradients approximate. | optimization (SDP), centralized |
| Aggregation-GNN decentralized controller (Tolstaya–Gama–Ribeiro 2019) | An order-K graph-filter GNN (K rounds of message-passing, structurally the same local aggregation as power iteration) imitates a centralized flocking controller; transfers across team sizes. But no hard connectivity guarantee: the authors document a subgroup-escape failure, motivating guardrails. | imitation (behaviour cloning) on a graph-conv GNN; K-hop message passing |
| RL policy with λ₂ as a constraint (Li et al., ICRA 2022) | Shared decentralized policy from local range-scan + neighbour positions; λ₂(Gₜ) enters as a constraint (not just reward), behaviour-cloning warm-start. The now-standard pattern: learned task policy + control-theoretic λ₂ as the safety layer. | deep MARL (PPO-style, constrained) + BC |

What information each node must hold & exchange (the Yang–Freeman–Lynch scheme — directly answers "what info does each node need"):

How it fuses with learning. Three layers: (1) replacement — a graph-conv GNN imitates a centralized controller and runs decentralized (but pure imitation lacks a guarantee → subgroup-escape, so it does not subsume the λ₂ machinery); (2) λ₂ as a learning signal — MARL puts algebraic connectivity into the objective as a reward or hard constraint; (3) guardrail/shielding (most robust, now-standard) — a learned policy proposes, a control-theoretic layer (distributed λ₂ estimate, edge-weight barrier, or move-masking) projects onto the connectivity-feasible set. In all hybrids the spectral estimator stays control-theoretic; learning replaces the controller or the objective, not the estimator.

Gaps: time-scale fragility (estimators correct only under strong separation: consensus ≫ power-iteration ≫ motion); λ₂ is global & slow (one weak cut sets it, gradient near-zero almost everywhere, per-node readout singular when xⁱ crosses zero); adversarial / Byzantine robustness essentially open (estimators assume truthful neighbours — one lying agent biases the whole λ₂ estimate; resilient distributed Fiedler estimation is unsolved, and W-MSR/r-robustness protect consensus values, not spectral estimates); connectivity is necessary but not the right objective (λ₂-max over-clusters), and the learning side is thin on guarantees.

→ For us. The big one is the adversarial gap — that's our headline RQ. Two of our levers attack it: the learned GCRN belief is dual-use (attack surface and a within-bounds anomaly detector), and we measure resilience as a stealth-damage frontier rather than a one-shot attack. Practically, treat a distributed/learned connectivity estimate as a policy feature, but remember a key in-house finding: λ₂ is not deployable under partial single-step info — the implementable hard form is a local degree/component check, and the deployable belief is our learned size-invariant estimator, not an online eigensolver.

E.2 · Coverage control — and how it fuses with learning

Coverage control deploys mobile sensors so every point is well-sensed by its nearest robot, with more capacity where it matters. The canonical formulation (Cortés–Martínez–Karatas–Bullo 2004) is locational optimization: minimise H(P)=Σᵢ ∫_{Vᵢ} f(‖q−pᵢ‖)·φ(q) dq. The solution partitions the region into Voronoi cells, shows the optimum is a centroidal Voronoi tessellation (each robot at its importance-weighted centroid), and reaches it by Lloyd's algorithm (move toward your cell centroid). Decentralized over the Delaunay graph. Its limitation — φ assumed known, controller myopic (uses only its own cell, never shares) — is exactly what learning attacks.

Glossary

| Term | Definition |
| --- | --- |
| Voronoi cell Vᵢ | All points closer to pᵢ than any other robot. Robot i is responsible for its cell; only Delaunay (boundary-sharing) neighbours matter. |
| Lloyd's algorithm | Iterate: compute Voronoi cells → move each site to its density-weighted centroid → repeat. As feedback: uᵢ=−k(pᵢ−Cᵢ), i.e. gradient descent on the locational cost. |
| Locational cost H(P) | Total importance-weighted sensing error (commonly f(x)=x²); the objective coverage minimises. |
| Centroidal Voronoi tessellation (CVT) | Every site = centroid of its own cell. CVTs are exactly the critical points of H. |
| Generalized mass / centroid | Mᵢ=∫_{Vᵢ}φ, Cᵢ=(1/Mᵢ)∫_{Vᵢ} q·φ dq. Gradient: ∂H/∂pᵢ=2Mᵢ(pᵢ−Cᵢ). |
| Density / importance φ | Where sensing matters. Known a-priori classically; estimated from data in adaptive/learned versions. |
| Weighted / power Voronoi | Per-robot weights wᵢ make capable robots claim larger cells (heterogeneous teams); weights can be learned online. |
| CTDE-by-imitation | A centralized clairvoyant CVT expert (full state + true φ) generates optimal Lloyd actions; a decentralized GNN is trained by MSE to reproduce them from local obs + messages, then deployed decentrally. |
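
The Lloyd feedback law uᵢ=−k(pᵢ−Cᵢ) from the glossary can be tried on a discretised region. The sketch below is an illustration under stated assumptions: the unit square is sampled on a 20×20 grid, the density is a made-up Gaussian bump, the gain is 0.5, and the cost is a plain sum over grid points (so it equals H only up to the constant cell-area factor). The names `lloyd_step`, `density`, and `GRID` are hypothetical.

```python
import math


def lloyd_step(sites, grid, phi, gain=1.0):
    """One discrete Lloyd update; returns (new_sites, cost_before_move)."""
    n = len(sites)
    mass = [0.0] * n
    moment_x = [0.0] * n
    moment_y = [0.0] * n
    cost = 0.0
    for qx, qy in grid:
        # Voronoi assignment: each grid point belongs to its nearest site.
        d2 = [(qx - px) ** 2 + (qy - py) ** 2 for px, py in sites]
        i = d2.index(min(d2))
        w = phi(qx, qy)
        mass[i] += w
        moment_x[i] += w * qx
        moment_y[i] += w * qy
        cost += w * d2[i]  # f(x) = x^2 locational cost
    new_sites = []
    for i, (px, py) in enumerate(sites):
        if mass[i] == 0.0:  # empty cell: leave the site where it is
            new_sites.append((px, py))
            continue
        cx, cy = moment_x[i] / mass[i], moment_y[i] / mass[i]
        # u_i = -k (p_i - C_i): step toward the density-weighted centroid.
        new_sites.append((px - gain * (px - cx), py - gain * (py - cy)))
    return new_sites, cost


def density(qx, qy):
    """Made-up importance map: a bump centred at (0.75, 0.75)."""
    return math.exp(-((qx - 0.75) ** 2 + (qy - 0.75) ** 2) / 0.05)


GRID = [((i + 0.5) / 20, (j + 0.5) / 20) for i in range(20) for j in range(20)]

sites = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.2, 0.2)]  # start bunched in a corner
costs = []
for _ in range(30):
    sites, cost = lloyd_step(sites, GRID, density, gain=0.5)
    costs.append(cost)

print(costs[0], costs[-1])
```

The recorded cost should never increase from one iteration to the next, which is the descent property behind the LaSalle convergence argument. Where the sites end up depends on where they start, in line with the local-optimality caveat.
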

Methods (classical → learned)

| Method | How it works | Learning stack |
| --- | --- | --- |
| Voronoi/Lloyd coverage (Cortés–Bullo 2004) | Compute cell → weighted centroid → uᵢ=−k(pᵢ−Cᵢ). Exact gradient descent on H; LaSalle convergence to a CVT. Assumes φ known. | control-theoretic |
| Decentralized adaptive coverage (Schwager–Rus–Slotine 2009) | Learns φ online: φ=K(q)ᵀa with unknown weights a; each robot updates â from its own measurements + a consensus term; moves to its â-centroid. Lyapunov proof of coverage + (under persistent excitation) parameter consensus. | online adaptive control + parameter consensus (no deep nets) |
| Adaptive weighted-Voronoi (Pierson–Schwager 2017) | Heterogeneous teams: multiplicatively-weighted Voronoi; each robot learns its performance weight online from local signal + neighbours. | online adaptive control (weight adaptation) |
| Spatial-GNN coverage (Tolstaya et al. 2021) | Environment → spatial graph; a size-equivariant GNN maps each robot's neighbourhood to a move, imitating a centralized expert. Generalises to maps/teams far larger than the expert can solve. | spatial GNN + imitation |
| GNN decentralized controller (Gosrich–Kumar 2022) | Multi-hop message passing lets each robot fuse non-local info before deciding velocity → beats non-communicating Lloyd, scales/transfers to larger teams. | GNN (multi-hop) + imitation |
| LPAC (Agarwal–Kumar–Ribeiro 2024; SOTA, open-source) | Per robot: CNN perception of a 32×32 local density map → GNN (K=3 hops) communication → MLP velocity. Imitates a clairvoyant CVT expert on 100k state-action pairs. Beats decentralized- and centralized-CVT by ≥20%, transfers zero-shot to larger maps/teams, robust to position noise. | CNN + K-hop GNN + MLP, end-to-end imitation (PyTorch-Geometric) |
| Constrained-learning coverage (Agarwal et al. 2024) | LPAC architecture trained with primal-dual / Lagrangian optimization so the policy satisfies extra constraints (e.g. connectivity) rather than collapsing them into one scalar. | GNN + constrained (primal-dual) learning |
| MARL coverage + dynamic density | RL (actor-critic) with coverage-cost reward; targets spawn a dynamic density that attracts agents, coupled to CVT; handles moving/unknown φ the static-φ controller can't. | MARL (actor-critic) + CVT prior |

How coverage fuses with learning — four increasingly aggressive layers:
  1. Learn the density, keep the controller (Schwager): φ modelled as linear-in-unknown-weights, estimated online + shared by consensus; motion law stays "move to weighted centroid" — a provable controller wrapped around an online estimator (the cleanest fusion).
  2. Learn relative capability (Pierson): per-robot performance weights learned inside a weighted-Voronoi partition; geometry kept, parameters adapted.
  3. Learn the controller itself (Kumar/Ribeiro): the analytic Lloyd rule is myopic (own cell only, no sharing) ⇒ provably suboptimal under limited range. GNNs with multi-hop message passing fuse non-local info and beat it; LPAC is the mature CNN+GNN+MLP pipeline. The recurring recipe is CTDE-by-imitation: classical theory supplies both the cost and the cheap clairvoyant expert, and a decentralized GNN is supervised to reproduce it.
  4. Learn under constraints / by reward: primal-dual constrained learning to respect connectivity; or MARL reward-shaping for time-varying φ.
Across all four, the centroidal-Voronoi objective stays the backbone (the cost, the expert, or the geometric prior); learning supplies what the theory assumes away — the density, the heterogeneity, and the non-local coordination.
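
Layer 4's primal-dual idea can be shown on a toy scalar problem. This is not the constrained-coverage training loop from the cited work; it is a minimal sketch of the generic mechanism, with a made-up objective (x−2)², a made-up constraint x ≤ 1, and hypothetical step sizes. The point is that the multiplier is adapted by dual ascent instead of being a hand-tuned penalty weight.

```python
def primal_dual(steps=5000, lr_primal=0.05, lr_dual=0.05):
    """Gradient descent on x, projected gradient ascent on the multiplier.

    Toy problem (an assumption of this sketch):
        minimise (x - 2)^2   subject to   x - 1 <= 0
    Lagrangian: L(x, lam) = (x - 2)^2 + lam * (x - 1), with lam >= 0.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * (x - 2.0) + lam                  # dL/dx
        x -= lr_primal * grad_x                         # primal descent
        lam = max(0.0, lam + lr_dual * (x - 1.0))       # dual ascent, kept >= 0
    return x, lam


x_star, lam_star = primal_dual()
print(x_star, lam_star)
```

The iterate should settle on the constraint boundary x = 1 with multiplier 2, the value at which the objective's pull toward x = 2 is exactly balanced. A fixed penalty weight would have to be guessed in advance to achieve the same balance.
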

What each node holds: own position + Voronoi/Delaunay-neighbour positions (to compute its cell); the density φ over its cell (known classically; a learned estimate â shared by consensus in adaptive coverage); mass Mᵢ and centroid Cᵢ — the move-toward target, computed locally; heterogeneous: a learned performance/trust weight wᵢ, exchanged with neighbours; GNN/LPAC: a local ego-centric density map (e.g. 32×32 + boundary + neighbour-position channels), a fixed-size learned message exchanged at each of K hops, and in-range neighbours' relative positions.

Gaps: only local optimality (Lloyd → a local CVT, init-dependent); known-/static-density assumption (adaptive needs persistent excitation, which conflicts with staying spread on high-φ areas); learned controllers are bottlenecked by their clairvoyant expert (inherit its local-optimality ceiling; need centralized true-state at train time); assume cooperative honest robots — corrupted neighbour features / dropped links unaddressed (the GNN message layer is a single point of failure); connectivity treated only recently, as a soft constraint — coupling coverage with a hard connectivity guarantee remains under-developed.

→ For us. LPAC is the architecture to study closely: CNN-local-map + K-hop GNN + MLP, trained by imitation of a cheap analytic expert, gives size-invariant transfer — exactly the "scale-free local rule → emergent collective" property the swarm claim requires. But its blind spots are our contributions: it has no relay role (coverage over-clusters), treats connectivity as a soft constraint, and assumes honest neighbours. Our additions — emergent relay role, hard-ish connectivity, and the covert-resilience layer — sit precisely in LPAC's gaps.

E.3 · Exploration / coverage under a connectivity constraint — the three guardrails

The core tension is intrinsic: exploration rewards spreading apart; comms have limited range, so dispersing eventually breaks the network and strands discoveries. The field's central design choice is the connectivity guardrail — the rule for which dispersed configurations are allowed and how/when the network must reform. The literature organises cleanly around three behaviours:

| Guardrail | How it works | Representative works |
| --- | --- | --- |
| (A) Maintain: continuous connectivity | The graph stays a single component at all times. Realised as an attractive potential on the Laplacian (links = springs) or a controller keeping λ₂>0; any exploration command is blended with a connectivity-restoring term so robots are reeled back before a link breaks. Hard, always-on guarantee, but conservative: robots can never venture beyond the network's reach. | Hsieh–Cowley–Kumar 2008; Zavlanos–Pappas 2007; Zavlanos–Egerstedt–Pappas 2011; Sabattini et al. 2013 |
| (B) Mask: hard-constrain moves | Operates on individual decisions: check each candidate move; if it would disconnect the graph, forbid it and project to the nearest admissible move. In continuous control a CBF-QP renders the connected set forward-invariant; in discrete/distributed settings a local motion-constraint projection (even under delay). Less conservative than a global potential (free inside the connected set), and the natural form inside a learned policy (mask disconnecting actions). | Schuresko–Cortés 2009; Capelli–Sabattini 2020 (CBF); our RedWithinBlue hard guardrail (beats soft degree-floor by ~20 pts) |
| (C) Grace: budget + incentivise reconnection | The team is deliberately allowed to split to explore far, then brought back by rendezvous. Guarantee is temporal: reconnect every T steps (periodic), whenever info must be reported (recurrent), or in the union-over-a-window sense (intermittent). Reconnection is driven by an explicit schedule (job-shop/MILP, event-triggered) or by an incentive (reconnect when the value of sharing exceeds the value of more solo exploration). Buys the most exploration speed, bounds the age-of-information of discoveries, at the price of latency. | Hollinger–Singh 2010/12 (periodic, seminal); Banfi et al. 2018 (recurrent); Kantaros–Zavlanos 2017 (intermittent + LTL); job-shop rendezvous 2024; Jensen–Gini 2013 (sentry/explorer); learned: IR2, IROS 2024 |
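
The Mask guardrail (B) reduces to a per-move feasibility test. The sketch below is a deliberately simple, centralized version for illustration: it assumes a disc communication model, integer grid positions, one agent moving at a time while the others stay put, and full knowledge of all positions. It is not the in-house guardrail, and the names `is_connected`, `admissible_moves`, and `MOVES` are hypothetical.

```python
from collections import deque

MOVES = {"stay": (0, 0), "north": (0, 1), "south": (0, -1),
         "east": (1, 0), "west": (-1, 0)}


def is_connected(positions, comm_range):
    """Breadth-first search over the disc-model communication graph."""
    n = len(positions)
    if n == 0:
        return True
    r2 = comm_range ** 2
    seen = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        xi, yi = positions[i]
        for j, (xj, yj) in enumerate(positions):
            if j not in seen and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                seen.add(j)
                queue.append(j)
    return len(seen) == n


def admissible_moves(positions, agent, comm_range):
    """Map each candidate move to True if the graph stays one component."""
    mask = {}
    for name, (dx, dy) in MOVES.items():
        trial = list(positions)
        trial[agent] = (positions[agent][0] + dx, positions[agent][1] + dy)
        mask[name] = is_connected(trial, comm_range)
    return mask


# Three robots in a line, each exactly at comm range from its neighbour.
team = [(0, 0), (2, 0), (4, 0)]
mask_end = admissible_moves(team, agent=2, comm_range=2)
mask_mid = admissible_moves(team, agent=1, comm_range=2)
print(mask_end)
print(mask_mid)
```

In this stretched line the end robot may only stay or step back toward the team, and the middle robot may only stay, since any sideways or lengthwise step drops one of its two links. A learned policy would sample its action from the moves marked True.
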

Glossary

| Term | Definition |
| --- | --- |
| Continuous connectivity | Graph connected at every instant; no splitting move ever allowed (guarantee pointwise in time). |
| Recurrent connectivity | May disconnect arbitrarily long, but must periodically re-establish (typically to teammates and a base) each time there's something to report (event-driven). |
| Intermittent connectivity | Required only in the union sense over a sliding window, infinitely often; the instantaneous graph may be split; info flows through a sequence of pairwise meetings. |
| Periodic connectivity | Special case: regain full connectivity every fixed interval T (Hollinger–Singh), a tunable knob between spreading and coupling. |
| Rendezvous | A coordinated meeting to exchange maps. Explicit/scheduled (planned point+time via MILP/job-shop) or implicit/emergent (converge when sharing-value > solo-exploration-value). |
| Connectivity budget | Bound on how much / how long / how far the team may be disconnected (the period T, max disconnection duration, droppable hops). Grace spends this budget to buy speed. |
| Age-of-Information (AoI) | Delay between observing info and it reaching its consumer (teammate/base). Under intermittent connectivity it's non-zero, bounded by the reconnection schedule; minimising it is the implicit objective rendezvous timing trades against exploration gain. |
| Relay/backbone vs frontier/explorer | Division of labour: relays hold positions stitching the network; explorers push into the unknown. Roles can be static, rule-switched (Jensen–Gini), tied to comm-tree depth (A³), or (the open goal) emergent from learning. |
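
The "union sense over a sliding window" in the intermittent-connectivity entry can be checked mechanically on a recorded link trace. The sketch below is a finite-horizon illustration only (the definition itself is stated over an infinite horizon), it ignores the time-ordering of meetings within a window, and the function names and the three-robot trace are made up.

```python
def union_connected(n, edge_sets):
    """True if the union of the given edge sets connects all n nodes."""
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for edges in edge_sets:
        for a, b in edges:
            parent[find(a)] = find(b)
    return len({find(i) for i in range(n)}) == 1


def intermittently_connected(n, edges_over_time, window):
    """Every length-`window` slice of the trace has a connected union graph."""
    last_start = len(edges_over_time) - window
    return all(union_connected(n, edges_over_time[t:t + window])
               for t in range(last_start + 1))


# Robot 1 alternates between meeting robot 0 and meeting robot 2.
trace = [[(0, 1)], [(1, 2)], [(0, 1)], [(1, 2)]]

never_connected_instantly = not any(union_connected(3, [step]) for step in trace)
ok_with_window_2 = intermittently_connected(3, trace, window=2)
ok_with_window_1 = intermittently_connected(3, trace, window=1)
print(never_connected_instantly, ok_with_window_2, ok_with_window_1)
```

Here the instantaneous graph is never connected, yet every two-step window is, so the trace meets the intermittent requirement with window 2 and fails the continuous one. The window length plays the role of the connectivity budget.
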

What each node holds: own pose/cell + local occupancy belief (free/occupied/unknown) accumulated since the last sync; its in-range neighbour set (local adjacency row) + link qualities — raw material for any connectivity check; a local estimate of a global connectivity quantity (giant-component membership, a degree count, or a distributed λ₂ estimate); frontier/utility info, and (for Grace) a "map-surplus" belief — how much more it knows than neighbours — which sets the value of reconnecting; teammates' last-known positions/headings (to anticipate disconnection and navigate to rendezvous); for scheduled/intermittent, the rendezvous commitment (where/when) + a timer/budget (AoI of its discoveries); for role-based, its current role + the info to decide a switch (comm-tree depth; whether holding position keeps the backbone intact).

How learning enters. (1) Learning replaces the planner, guardrail stays hand-engineered — a MARL/GNN policy chooses moves, connectivity enforced by an external mask/CBF/schedule (safest, most common). (2) Learning absorbs Grace — instead of an explicit schedule, the policy learns the long-horizon disconnect-vs-reconnect trade-off, so rendezvous becomes implicit/emergent (IR2, IROS 2024: SAC + attention + a map-surplus feature + curriculum — the frontier of replacing schedules with learned timing). (3) Learning is meant to grow the role structure — but the few instances (A³ network) still tie roles to a hand-defined comm-tree-depth rule, so differentiation is only semi-emergent. The natural substrate throughout is a GNN over the comm graph (size-invariant transfer) — and a learned per-agent graph-belief is the key under-used enabler.

Gaps (the deep one): learned connectivity-constrained exploration is thin, and almost always pairs a learned explorer with a hand-designed guarantee — a policy that internalises the constraint and still gives a formal guarantee is largely missing; Grace-by-learning is nearly a one-paper field (IR2); emergent explorer/relay roles under the constraint are essentially unrealized (every split is hand-assigned, rule-switched, or keyed to comm-tree depth); per-agent belief/graph substrates are under-used as the carrier of the constraint; realistic comm models (latency, loss, bandwidth, AoI) mostly absent from learned work, and adversarial/stealth robustness of a learned connectivity guardrail is unaddressed.

→ For us. Three of these gaps are literally our open questions, and we already have results in them: (C/learned) our Regime-B (soft + latency-discounted delivery) is exactly a learned-Grace formulation; (emergent roles) our crisp finding — roles emerge only when the constraint is soft enough that relaying pays; under a hard backstop the relay role is redundant and vanishes; (belief substrate) we built the GCRN size-invariant belief that no surveyed method has. The thing to adopt from here is the Grace vocabulary itself — periodic/recurrent/intermittent + age-of-information — because at scale (where one component is geometrically infeasible) this is the only honest regime, and AoI is precisely our covert-attack channel (a relay that silently drifts inflates AoI without tripping a degree-floor).

E.4 · Where classical stops and learning starts (the seams)

All three areas independently flag the same holes; the same four cross-cutting gaps surfaced here as in the flank-A–C reviews above, validated against in-house results. The five seams where classical MRS theory hands off to learning:

  1. Global-info / time-scale seam. Distributed λ₂ needs strong time-scale separation and recovers a global quantity over many rounds. It fails for fast motion / sparse links and — for us — under partial single-step info: the deployable hard constraint is a local degree/component check, the deployable belief a learned size-invariant estimator, not an online eigensolver.
  2. Myopia seam. Lloyd/CVT is exact but uses only its own cell and over-clusters → provably suboptimal under limited range. Learned multi-hop GNNs (LPAC) propagate the non-local info the myopic rule can't.
  3. Known-model seam. Classical assumes φ known/static, links symmetric, neighbours truthful. Learning starts where the model is unknown/time-varying/heterogeneous — for us, the global topology and teammate positions after disconnection (estimated by the belief).
  4. Hard-discrete-composition seam (most central). Control theory gives hard guarantees for a single smooth objective (CBF-QP, edge-barriers, move-projection) but stops at (a) emergent multi-role division of labour, (b) the non-myopic Grace trade-off, (c) composing a flexible policy with the guarantee beyond ad-hoc penalties. Our results pin it: the hard mask is the right safety layer but makes the relay role redundant ⇒ control theory supplies the invariant, learning supplies the non-myopic policy + emergent roles + reconnection timing + the belief.
  5. Cooperative-truthful seam (our RQ). Every classical guarantee — including W-MSR — assumes neighbours are truthful or detectable outliers. Learning + our resilience study becomes necessary the moment the adversary is covert (within-bounds, corrupting the estimate, exploiting graph-position) — where classical robustness goes silent and the dual-use learned belief is the only instrument.

The MRS review's own novelty verdict (the full conjunction is novel; each axis alone is not — learned+decentralized free, λ₂-aware control exists but role-less, learned role differentiation exists but connectivity-blind, covert single-agent poisoning is an active subfield, LPAC transfer is unconstrained/role-less, learned implicit rendezvous ≈ one paper) is identical to the combined gap verdict above and is not repeated here. Compiled from a 4-agent verified literature workflow (mrs-litreview); arXiv-weighted — a robotics-venue (ICRA/IROS/RSS/CoRL/T-RO) sweep is still owed before locking paper claims.

References — MRS technical review

Connectivity estimation & maintenance: Yang, Freeman, Gordon, Lynch, Srinivasa, Sukthankar (2010), Automatica 46(2); Sabattini, Chopra, Secchi (2013), IJRR 32(12); de Gennaro, Jadbabaie (2006), IEEE CDC; Zavlanos, Pappas (2007), IEEE T-RO 23(4); Kim, Mesbahi (2006), IEEE TAC 51(1); Ji, Egerstedt (2007), IEEE T-RO 23(4); Kempe, McSherry (2008), JCSS 74(1); Tolstaya, Gama, Paulos, Pappas, Kumar, Ribeiro (2019), CoRL; Li, Jie, Kong, Cheng (2022), IEEE ICRA (arXiv:2109.08536).

Coverage control: Cortés, Martínez, Karatas, Bullo (2004), IEEE T-RA 20(2); Bullo, Cortés, Martínez (2009), Distributed Control of Robotic Networks, Princeton; Schwager, Rus, Slotine (2009), IJRR 28(3); Pierson, Figueiredo, Pimenta, Schwager (2017), IJRR 36(3); Du, Faber, Gunzburger (1999), SIAM Review 41(4) / Lloyd (1982); Tolstaya, Paulos, Kumar, Ribeiro (2021), IROS (arXiv:2011.01119); Gosrich, Mayya, Li, Paulos, Yim, Ribeiro, Kumar (2022), ICRA (arXiv:2109.15278); Agarwal, Muthukrishnan, Gosrich, Kumar, Ribeiro (2024), LPAC (arXiv:2401.04855; open-source CoverageControl); Agarwal, Kumar, Ribeiro et al. (2024), constrained coverage (arXiv:2409.11311).

Connectivity-constrained exploration: Hsieh, Cowley, Kumar, Taylor (2008), JFR 25(1-2); Zavlanos, Egerstedt, Pappas (2011), Proc. IEEE 99(9); Schuresko, Cortés (2009), JINT 56(1-2); Capelli, Sabattini (2020), ICRA (arXiv:2003.10178); Hollinger, Singh (2012), IEEE T-RO 28(4) (ICRA 2010); Banfi, Quattrini Li, Rekleitis, Amigoni, Caro (2018), Auton. Robots 42(4); Kantaros, Zavlanos (2017), IEEE TAC 62(7); Amigoni, Banfi, Basilico (2019), IEEE Intell. Syst.; Jensen, Gini (2013), IJCAI; Ribeiro da Silva, Chaimowicz, Silva, Hsieh (2024) (arXiv:2309.13494); Tan, Ma, Liang, Chng, Cao, Sartoretti (2024), IR2, IROS (arXiv:2409.04730); Zeng et al. (2025), A³ Network (arXiv:2509.18526).

F · foundational references

Foundational references (annotated)

An annotated bibliography of the field's standing taxonomies and formal-model anchors, organised by the four axes a swarm/multi-agent mission can be classified along. Each entry: full citation · thesis · why it matters here.

These were the local reference library used to position the project's mission families against the dominant axes in the literature. Note: the PDFs themselves have been retired from the workspace — each is retrievable from its publisher via the DOI / venue below.

BEHAVIOR axis — swarm behavior taxonomies (the de facto standard to cite)

SYSTEM / TASK-ALLOCATION axis — different axes (capabilities, allocation, coordination)

FORMAL-MODEL axis — Dec-POMDP family (decentralized partial observability)

DOMAIN anchor — aerial swarm robotics