Theory / Formalism

Mission-Centered Resiliency for Teams of Autonomous Agents — a formal foundation for studying how teams of autonomous agents stay (or fail to stay) on-mission under decentralized control, partial observability, and adversarial pressure. The framework is built around the mission as the unit of analysis, not the individual agent.

This page consolidates the project's formal foundation from three increasingly mature versions of the same framework, transcribed faithfully so the site itself carries the math:

formalism.tex — the full Mission-Centered Resiliency document (three layers, the Dec-POMDP backbone, ~20 boxed definitions, the stealth/break budgets, propagation, viability, the resiliency metrics, hypotheses H1–H3, and focused research questions RQ1–RQ6).
Swarm Resiliency draft (v1.3) — the problem statement: the Dec-POMDP tuple, the neighbor-aggregation interface, the effect-based stealthy-misbehavior threat model D = (C, α, ε, Δ), and assumptions A1–A5.
the early Formalization — the original backbone: the four interactional regimes and the nominal / off-nominal + health / viability / resilience / recovery vocabulary.

Download links are at the bottom of the page. See also Architecture (the engineering of the same swarm) and Findings (the empirical evidence).

1 · Premise & interactional regimes
2 · The three-layer architecture
3 · Micro level
4 · Macro level (Dec-POMDP)
5 · The interaction bridge
6 · Nominal vs. off-nominal
7 · Mission objectives (definitions)
8 · Deviation, stealth & threat model
9 · Propagation
10 · Adversarial game & break budget
11 · Amplification hypotheses H1–H3
12 · Resiliency metrics
13 · Worked example: shared exploration
14 · Focused research questions RQ1–RQ6
Source documents

1 · framing

Foundational premise & interactional regimes

Some missions exceed the operational capacity of any single actor: they are high-stakes and time-constrained, and they unfold in dynamic environments under partial information, evolving constraints, and little tolerance for delay, error, or failure. The challenge is sustained mission execution under conditions that progressively erode performance and stability. The mission is the primary unit of analysis: agents, structure, autonomy, and interaction matter only through their effect on mission feasibility, progress, degradation, and recovery.

Once the mission is primary, the next question is the structure of interaction through which execution unfolds. The presence of multiple agents is not, by itself, decisive; what matters is the structure of inter-agent relations in a decentralized setting where no persistent central authority governs execution. Mission-level behavior emerges from those relations — the interactional space. Different regimes produce different mechanics of execution:

Regime	Coupling	Execution character
Independent	Weak	Limited strategic dependence; progress only weakly conditioned by others.
Aligned–separable	Moderate	Positively aligned effort, substantially separable contributions (loosely “collaborative”).
Joint	Strong	Jointly constituted execution; progress, risk, and consequence are shared (loosely “cooperative”).
Competitive	Adversarial	Rivalry, contest, or adversarial interference within the same mission space.

The labels collaboration / cooperation are treated as loose cues; the regimes are defined through their interactional properties (objective alignment, action coupling, separability of effort, shared progress/risk/consequence). This four-regime vocabulary is the original backbone from the early Formalization; it survives into the mature document as the framing of the interaction structure.

Missions are grouped not by hardware or domain but by the dominant coupling mechanism tying agents together. The taxonomy exposes the structure through which local behavior becomes mission-level effect — and through which local disruption becomes mission-level loss:

Distributed information gathering. Uncertainty reduction: exploration, mapping, search, monitoring.

Self-organized ad-hoc connectivity. Maintenance of reachability: relay, rendezvous, formation support. The interaction graph is part of the mission state.

Fusion-coupled. Distributed estimation & fusion under neighborhood constraints: tracking, environmental fields, maps.

Actuation & intervention. Coordinated physical effect: manipulation, logistics, area treatment.

Hybrid. Multiple coupling modes coexist (e.g. NASA Starling: sensing + networking + reconfiguration at once).

2 · architecture

The three-layer architecture

The formalism develops in three layers, with a Dec-POMDP backbone at the center and graph-theoretic, control-theoretic, game-theoretic, and learning-based views entering as complementary extensions. The modeling problem is fundamentally micro–macro: missions succeed or fail at the macro level, while execution unfolds at the micro level through local decisions and constrained interaction. A useful model must keep both and make the relation between them explicit.

Macro level — the latent mission process

Mission state sₜ ∈ S, trajectory τ, health Φ, viability kernel Viabδ, reward R. Mission-level claims are made over how execution unfolds through time.

⇅ joint action aₜ ↑ / observations oₜ ↓

Interaction bridge — where local becomes collective

Time-varying graph Gₜ = (I, Eₜ), messages mₜ^j→i, aggregation Aggᵢ, propagation ℰₜ^i→j. The formal location of coordination, dependence, propagation, and structural fragility.

⇅ local actions compose ↑ / comm. shapes inputs ↓

Micro level — local decision-making

Observation oₜⁱ, auxiliary input qₜⁱ, local input xₜⁱ, history hₜⁱ, policy πᵢ(·|hₜⁱ), action aₜⁱ. The level at which decentralized execution is realized.

Interaction affects the mission twice: first by shaping what enters each local history through communication, second by shaping how local decisions compose into the next global state. A local message, dropped link, delayed summary, or corrupted signal can therefore matter far beyond the agent at which it first appears.

3 · micro

Micro-level formalism

Execution is local: at each step, agent i acts not from the full mission state but from what is locally available. Let the agent set be I = {1, …, n} over horizon t ∈ {0, …, H−1}. Agent i has observation space Ωᵢ and action space Aᵢ; at time t it receives a direct observation oₜⁱ ∈ Ωᵢ and an auxiliary input qₜⁱ ∈ Qᵢ. The auxiliary input is deliberately broad — it may carry neighbor communication, role, trust, belief summaries, uncertainty, or health.

xₜⁱ = (oₜⁱ, qₜⁱ) — full local input

Because execution is decentralized, the agent chooses from its locally available history, not the current input alone:

hₜⁱ = (x₀ⁱ, a₀ⁱ, x₁ⁱ, a₁ⁱ, …, aₜ₋₁ⁱ, xₜⁱ)

aₜⁱ ∼ πᵢ(· | hₜⁱ) — the core micro-level decision rule

Any quantity that later matters for execution must enter through qₜⁱ, become part of xₜⁱ, and influence action through hₜⁱ and πᵢ. The policy may be hand-designed, model-based, learned, recurrent, stochastic, or role-conditioned. History-based local decision-making does not conflict with a first-order Markov assumption at the system level — the global dynamics can be Markovian while no single agent sees the full global state.
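A minimal Python sketch of the micro-level loop, assuming discrete observations and actions and a policy supplied as a function from the full local history to a dictionary of action probabilities. The class name `LocalAgent` and the example policy `follow_neighbor_hint` are hypothetical illustrations, not part of the framework; the sketch only shows that xₜⁱ = (oₜⁱ, qₜⁱ) is appended to hₜⁱ before sampling and that aₜⁱ is appended afterwards.

```python
import random
from typing import Callable, Dict, Hashable, List


class LocalAgent:
    """Agent i: keeps h_t^i = (x_0, a_0, ..., a_{t-1}, x_t) and samples a_t ~ pi_i(. | h_t^i)."""

    def __init__(self, policy: Callable[[tuple], Dict[Hashable, float]], seed: int = 0):
        self.policy = policy              # hypothetical: history -> {action: probability}
        self.rng = random.Random(seed)
        self.history: List = []           # alternating local inputs x_t^i and actions a_t^i

    def act(self, obs: Hashable, aux: Hashable) -> Hashable:
        # The full local input x_t^i = (o_t^i, q_t^i) enters the history first.
        self.history.append((obs, aux))
        dist = self.policy(tuple(self.history))
        actions, probs = zip(*dist.items())
        action = self.rng.choices(actions, weights=probs, k=1)[0]
        # The chosen action becomes part of h_{t+1}^i.
        self.history.append(action)
        return action


def follow_neighbor_hint(history: tuple) -> Dict[Hashable, float]:
    """Hypothetical policy: take the action suggested in the auxiliary input q_t^i."""
    _obs, aux = history[-1]
    return {aux: 1.0}
```

With `agent = LocalAgent(follow_neighbor_hint)`, calling `agent.act("o0", "N")` returns `"N"` and leaves `[("o0", "N"), "N"]` in the history. Any richer policy (recurrent, learned, role-conditioned) only changes the function passed in.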

4 · macro

Macro-level formalism — the Dec-POMDP backbone

The latent mission process that local actions jointly induce is modeled as a finite-horizon cooperative Decentralized Partially Observable Markov Decision Process — the right starting point because the mission is sequential, partially observed, decentralized, and team-level by construction. The mission model is the tuple:

P = ⟨ I, S, {Aⁱ}_i∈I, {Ωⁱ}_i∈I, T, O, R, ρ₀, H ⟩

Object	Meaning
I = {1,…,n}	Set of n autonomous agents; each selects its action from locally available information.
S	Mission-state space — the true mission condition sₜ (variables governing evolution and scoring). Latent in the information sense: no agent directly accesses the full sₜ.
Aⁱ	Action space of agent i; joint action aₜ = (aₜ¹,…,aₜⁿ).
Ωⁱ	Local observation space of agent i.
T	Transition kernel sₜ₊₁ ∼ T(· | sₜ, aₜ) — how the mission state evolves under the joint action.
O	Observation model oₜ ∼ O(· | sₜ) — how the latent state generates local observations.
R	Shared team reward R(sₜ, aₜ) — scores mission progress / utility as a team-level quantity.
ρ₀	Initial-state distribution s₀ ∼ ρ₀ (fixed starts are a point mass).
H	Finite mission horizon (time budget in decision steps).

The joint policy π = (π₁,…,πₙ) induces the mission trajectory — the main macro-level object, since mission-level claims are made over how execution unfolds through time:

τ = (s₀, o₀, a₀, s₁, o₁, a₁, …, s_H)

J(π) = E_π [ Σ_{t=0}^{H−1} R(sₜ, aₜ) ] — the cooperative objective

The return J(π) is a standard mission-level abstraction, but it is insufficient on its own: a mission may accumulate reward while its structural capacity to continue erodes. Degradation, viability, and failure are built on top of this backbone once the interaction and micro–macro mapping are explicit.
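The cooperative objective can be estimated by Monte Carlo rollouts. The sketch below assumes the Dec-POMDP components are plain Python callables (`rho0`, `T`, `O`, `R`, with hypothetical signatures) and that each agent's policy sees only its own observation history, which is what keeps execution decentralized. The toy two-agent coverage model at the bottom is invented for illustration.

```python
import random
from statistics import mean


def rollout(rho0, T, O, R, policies, H, rng):
    """Sample one trajectory of length H and return sum_{t<H} R(s_t, a_t)."""
    s = rho0(rng)
    histories = [[] for _ in policies]    # agent i only ever sees histories[i]
    total = 0.0
    for _t in range(H):
        joint_obs = O(s, rng)             # o_t = (o_t^1, ..., o_t^n)
        joint_action = []
        for i, pi in enumerate(policies):
            histories[i].append(joint_obs[i])
            a_i = pi(tuple(histories[i]), rng)
            histories[i].append(a_i)
            joint_action.append(a_i)
        a = tuple(joint_action)
        total += R(s, a)
        s = T(s, a, rng)                  # s_{t+1} ~ T(. | s_t, a_t)
    return total


def estimate_J(rho0, T, O, R, policies, H, episodes=1000, seed=0):
    rng = random.Random(seed)
    return mean(rollout(rho0, T, O, R, policies, H, rng) for _ in range(episodes))


# Hypothetical toy mission: the state counts covered cells (capped at 10);
# each "work" action covers one new cell and the reward is the coverage gained.
def rho0(rng):
    return 0

def T(s, a, rng):
    return min(s + a.count("work"), 10)

def O(s, rng):
    return (s, s)

def R(s, a):
    return min(s + a.count("work"), 10) - s

def always_work(history, rng):
    return "work"
```

For this deterministic toy, `estimate_J(rho0, T, O, R, [always_work, always_work], H=3)` is 6.0, while H = 6 gives 10.0 because coverage saturates; the return stops growing even though the agents keep acting, a small instance of why J(π) alone says little about remaining capacity.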

5 · bridge

The interaction bridge

The bridge between micro and macro is the interaction layer: communication, message passing, neighborhood structure, role-dependent influence, and propagation. Decentralized coordination is not produced by local policy alone — it is produced by local policy acting through a structured pattern of inter-agent communication and dependence. The structure is a time-varying directed interaction graph:

Gₜ = (I, Eₜ) · Nᵢ(t) = { j ∈ I : (j,i) ∈ Eₜ } (in-neighbors of i)

If agent j sends a signal to i, write it mₜ^j→i. The communication input to i is the aggregation of received neighbor signals:

cₜⁱ = Aggᵢ( { mₜ^j→i : j ∈ Nᵢ(t) } )

The operator Aggᵢ is left abstract on purpose — averaging, consensus, message fusion, belief update, map merging, or trust-weighted filtering. This is the communication component inside the auxiliary input: simplest case qₜⁱ = cₜⁱ; richer settings qₜⁱ = (cₜⁱ, rₜⁱ, uₜⁱ, …) with role/mode rₜⁱ and a local uncertainty/trust/health variable uₜⁱ. The trust-by-default assumption enters exactly here: under nominal operation agents treat neighbor signals as valid decision inputs, discounting them only after clear inconsistency.
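A minimal sketch of the interaction bridge, assuming messages are collected in a dictionary keyed by sender so that sender-dependent operators such as trust weighting remain expressible. The three operators are illustrative instances of the abstract Aggᵢ named above; the function names and the `trust` dictionary are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[int, int]  # (j, i) means j -> i, i.e. j is an in-neighbor of i


def in_neighbors(i: int, edges: Iterable[Edge]) -> Set[int]:
    """N_i(t) = { j : (j, i) in E_t }."""
    return {j for (j, k) in edges if k == i}


def inbox(i: int, edges: Iterable[Edge], outbox: Dict[int, object]) -> Dict[int, object]:
    """Messages m_t^{j->i} received by i from its in-neighbors."""
    return {j: outbox[j] for j in in_neighbors(i, edges)}


def agg_mean(msgs: Dict[int, float]) -> float:
    """Averaging (one consensus-style step); assumes a non-empty inbox."""
    return sum(msgs.values()) / len(msgs)


def agg_union(msgs: Dict[int, Set[Hashable]]) -> Set[Hashable]:
    """Set union, e.g. map merging or coverage sharing."""
    return set().union(*msgs.values())


def agg_trust_weighted(msgs: Dict[int, float], trust: Dict[int, float]) -> float:
    """Trust-weighted average; senders with zero trust are effectively filtered out."""
    weights = {j: trust.get(j, 0.0) for j in msgs}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no trusted sender")
    return sum(weights[j] * msgs[j] for j in msgs) / total
```

Trust-by-default corresponds to giving every sender weight 1, which reduces `agg_trust_weighted` to `agg_mean`; discounting a sender "after clear inconsistency" amounts to lowering its entry in `trust`.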

Working assumptions

The framework is developed under explicit assumptions — realistic starting conditions, not universal claims. The mature document carries eight; the Swarm-Resiliency draft states the load-bearing first five (A1–A5):

A1 — Local agency at execution. Each agent selects actions from its own information state; a central node may log, score, or supervise offline but holds no real-time decision authority.

A2 — Partial observability. No agent has full access to the mission state; observations are local, incomplete, possibly noisy, delayed, or occluded.

A3 — Topology-constrained interaction. Agents interact through limited, possibly time-varying channels; influence depends on who is connected to whom, not on broadcast.

A4 — Mission-level objective. The objective is over the joint trajectory, not a sum of node-level goals; success, degradation, and failure are assessed at the mission level.

A5 — Isolation & agent loss admissible. Agents may become isolated, degraded, or unavailable; isolation changes redundancy, reachability, and feasibility, and its effect depends on role, timing, and structure.

A6 — Trust-by-default. Under nominal execution, neighbor information is treated as usable unless clear evidence suggests otherwise.

A7 — Disturbance & misbehavior possible. Execution may include noise, faults, misbehavior, or adversarial influence; the question is when local deviation stays local vs. propagates.

A8 — Adversarial influence is stealth-constrained. Adversarial effects are assumed locally plausible rather than overt: influence that can persist without immediate rejection.

6 · regimes

Nominal vs. off-nominal execution

Nominal execution is execution within the expected operating regime: the joint trajectory stays inside an admissible region where sensing, communication, and local decision-making behave within design assumptions. It may include noise and variation, but not in a form that changes the basic logic of execution — geometrically, motion within an expected operating region, near a constrained manifold of admissible mission dynamics.

Off-nominal execution begins when the trajectory departs that regime. The definition is intentionally cause-agnostic: the departure may stem from disturbance, fault, isolation, anomaly, misbehavior, or adversarial influence. It marks a regime shift, not yet a verdict on success or failure: information quality shifts, interaction patterns weaken or distort, redundancy erodes, or local decisions no longer compose as expected.

Within that shift, four trajectory-level concepts orient everything downstream — health tracks the broader condition of execution; viability asks whether successful completion is still achievable; degradation marks weakening of health; failure begins when viability is lost. Distinct from these are four response concepts:

Concept	Description
Robustness	Ability to remain effective under disturbance without major reconfiguration.
Resilience	Ability to absorb disruption and preserve or restore acceptable mission behavior.
Recovery	The return of mission health or viability after degradation.
Healing	Repair/restoration/reconfiguration of damaged components, links, policies, or subsystems — may enable recovery but does not guarantee it.

A mission can recover without full healing, and a system can heal locally without full mission recovery. This nominal / off-nominal + health / viability / resilience / recovery vocabulary originates in the early Formalization and is made precise by the definitions and metrics below.

7 · definitions

Mission objectives — the core definitions

Because the cooperative return alone cannot detect eroding structural capacity, the framework adds formal objects capturing the conditions for continued execution: the constraint set, mission health, the viability kernel, and trajectory-level predicates for success, degradation, and failure.

Definition — Mission Constraint Set. The set of mission states from which completion remains physically and informationally feasible, encoding hard constraints (minimum connectivity, surviving agents, time budget, safety) as state-dependent inequalities. The goal set K_goal ⊆ K collects the terminal conditions for completion.
K = { s ∈ S : gⱼ(s) ≤ 0, j = 1,…,J }
For example, for a connectivity mission, g_conn(s) = λ_min − λ₂(L(Gₜ)) with a chosen threshold λ_min > 0 keeps the algebraic connectivity (Fiedler value) of the graph Laplacian bounded away from zero, i.e. keeps the interaction graph connected. Since λ₂ ≥ 0 for every graph Laplacian, a strictly positive threshold is what makes the constraint bind.
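The connectivity constraint can be checked numerically. This sketch assumes NumPy, treats the interaction graph as undirected by symmetrizing Eₜ (the usual setting for algebraic connectivity), and uses a hypothetical argument `lam_min` for the threshold λ_min. For an undirected graph, λ₂ > 0 exactly when the graph is connected.

```python
import numpy as np


def laplacian(n, edges):
    """L = D - A for the undirected graph obtained by symmetrizing the edge set."""
    A = np.zeros((n, n))
    for u, v in edges:
        if u != v:
            A[u, v] = A[v, u] = 1.0
    return np.diag(A.sum(axis=1)) - A


def fiedler_value(n, edges):
    """Algebraic connectivity lambda_2(L): the second-smallest Laplacian eigenvalue."""
    eigenvalues = np.linalg.eigvalsh(laplacian(n, edges))  # ascending order
    return float(eigenvalues[1])


def g_conn(n, edges, lam_min):
    """Constraint function: g_conn <= 0 iff lambda_2 >= lam_min."""
    return lam_min - fiedler_value(n, edges)
```

A three-agent path (edges (0,1), (1,2)) has λ₂ = 1, so it satisfies the constraint for any λ_min ≤ 1; dropping the edge (1,2) isolates agent 2, λ₂ falls to 0, and the state leaves K.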

Definition — Mission Health Function. A weighted combination of normalized health indicators. It is a diagnostic, not a reward: one can have R > 0 (reward earned) while Φ declines (capacity to keep earning it eroding).
Φ(sₜ, t) = Σ_{d=1}^{D} w_d φ_d(sₜ, t), w_d > 0, Σ_d w_d = 1
Canonical indicators: connectivity λ₂(L(Gₜ))/λ₂^max; information quality 1 − H(sₜ | o_{0:t})/H_max (here H is conditional entropy, not the horizon); task progress P(sₜ)/P*; capacity (fraction of agents operational, roles filled, or channels active). Weights are mission-family-dependent.
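A sketch of the health computation, assuming the indicators have already been normalized to [0, 1]. The indicator names and the equal weights in the example are hypothetical placeholders, since the weights are mission-family-dependent.

```python
import math
from typing import Dict


def mission_health(indicators: Dict[str, float], weights: Dict[str, float]) -> float:
    """Phi(s_t, t) = sum_d w_d * phi_d(s_t, t) with w_d > 0 and sum_d w_d = 1."""
    if indicators.keys() != weights.keys():
        raise ValueError("every indicator needs exactly one weight")
    if any(w <= 0 for w in weights.values()) or not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must be positive and sum to 1")
    if any(not 0.0 <= v <= 1.0 for v in indicators.values()):
        raise ValueError("indicators must be normalized to [0, 1]")
    return sum(weights[d] * indicators[d] for d in indicators)
```

With connectivity 0.5, information quality 0.5, progress 0.4, capacity 0.8 and equal weights of 0.25, Φ = 0.55. Because Φ is computed separately from R, it can fall while per-step reward is still positive.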

Definition — Stochastic Viability Kernel. The set of states from which some joint policy keeps the trajectory inside the constraint set with probability at least 1−δ for all remaining time: viability theory adapted to the discrete-time stochastic Dec-POMDP setting (a backward recursion analogous to Hamilton–Jacobi reachability).
Viabδ(K, t) = { sₜ ∈ K : ∃ π s.t. Pr( s_τ ∈ K ∀ τ ∈ {t,…,H} | sₜ, π ) ≥ 1−δ }
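On a small finite model the kernel can be computed by the backward recursion the definition alludes to. This sketch assumes a fully observed, centrally controlled relaxation (a max over joint actions at each state), so it returns a superset of the kernel reachable with decentralized policies. The function and argument names and the three-state example are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, Set


def viability_kernels(
    states: Iterable[Hashable],
    actions: Iterable[Hashable],
    T: Callable[[Hashable, Hashable], Dict[Hashable, float]],
    in_K: Callable[[Hashable], bool],
    H: int,
    delta: float,
) -> Dict[int, Set[Hashable]]:
    """Return {t: Viab_delta(K, t)} for t = 0..H.

    V_H(s) = 1[s in K]
    V_t(s) = 1[s in K] * max_a sum_{s'} T(s' | s, a) V_{t+1}(s')
    Viab_delta(K, t) = { s in K : V_t(s) >= 1 - delta }
    """
    states, actions = list(states), list(actions)
    V = {s: 1.0 if in_K(s) else 0.0 for s in states}
    kernels = {H: {s for s in states if in_K(s) and V[s] >= 1 - delta}}
    for t in range(H - 1, -1, -1):
        # The right-hand side reads the previous V (i.e. V_{t+1}) before reassignment.
        V = {
            s: max(sum(p * V[s2] for s2, p in T(s, a).items()) for a in actions)
            if in_K(s) else 0.0
            for s in states
        }
        kernels[t] = {s for s in states if in_K(s) and V[s] >= 1 - delta}
    return kernels


# Hypothetical model: K = {1, 2}; state 1 is inside K but close to its edge.
TABLE = {
    (0, "hold"): {0: 1.0}, (0, "move"): {0: 1.0},
    (1, "hold"): {1: 0.8, 0: 0.2}, (1, "move"): {2: 0.5, 0: 0.5},
    (2, "hold"): {2: 1.0}, (2, "move"): {1: 1.0},
}
```

With H = 2 and δ = 0.25, state 1 is viable at t = 1 (survival probability 0.8) but not at t = 0 (0.64), so the kernel shrinks as the remaining horizon grows even though state 1 never leaves K.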

Definition — Mission Viability. The mission is viable at time t if and only if sₜ ∈ Viabδ(K, t).

Definition — Mission Success. A trajectory achieves success under sustained constraint satisfaction and goal attainment: “always safe” ∧ “goal reached”.
Success(τ) ⇔ [ ∀ t, sₜ ∈ K ] ∧ [ s_H ∈ K_goal ] (in LTL: □(s∈K) ∧ ◇(s∈K_goal))

Definition — Mission Failure. Failure at t* occurs when the state has left the constraint set and viability is lost, an absorbing condition (once viability is lost, failure is irreversible).
Failure(τ, t*) ⇔ s_{t*} ∉ K ∧ s_{t*} ∉ Viabδ(K, t*)

Definition — Mission Degradation. A trajectory is degraded on [t₁, t₂] if viability is maintained but health has dropped below the nominal baseline: “still viable” ∧ “health loss”.
Degraded(τ, t₁, t₂) ⇔ [ ∀ t ∈ [t₁, t₂], sₜ ∈ Viabδ(K,t) ] ∧ [ ∃ t ∈ [t₁, t₂], Φ(sₜ,t) < Φ^nom(t) − ε ]
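Given per-step labels from a simulated or logged trajectory, the three predicates reduce to simple checks. The sketch assumes membership in K, membership in the viability kernel, and the goal test at s_H have already been evaluated upstream; all names are hypothetical.

```python
from typing import Optional, Sequence


def is_success(in_K: Sequence[bool], final_in_goal: bool) -> bool:
    """Success(tau): s_t in K for every t, and s_H in K_goal."""
    return all(in_K) and final_in_goal


def is_degraded(viable: Sequence[bool], health: Sequence[float],
                nominal: Sequence[float], eps: float, t1: int, t2: int) -> bool:
    """Degraded(tau, t1, t2): viable throughout [t1, t2] and some health dip below nominal - eps."""
    window = range(t1, t2 + 1)
    return all(viable[t] for t in window) and any(health[t] < nominal[t] - eps for t in window)


def failure_time(in_K: Sequence[bool], viable: Sequence[bool]) -> Optional[int]:
    """First t* with s_t* outside K and outside the viability kernel, or None."""
    for t, (k, v) in enumerate(zip(in_K, viable)):
        if not k and not v:
            return t
    return None
```

With health [1.0, 0.9, 0.6, 0.85, 1.0] against a nominal baseline of 1.0 and ε = 0.2, the dip at t = 2 makes [0, 4] degraded, while [3, 4] is not.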

8 · threat

Deviation, stealth & the threat model

The deviation model is cause-agnostic — it captures any bounded departure from nominal behavior. The Swarm-Resiliency draft frames this as an effect-based threat model: rather than model the exploit path, model its mission-relevant outcome, namely that a subset of agents no longer follows the nominal decentralized decision rule. A disturbance instance is the tuple D = (C, α, ε, Δ).

Definition — Compromised Set & Activation. Let C ⊆ I be the compromised agents, k = |C|. Each agent carries a binary activation variable βₜⁱ ∈ {0,1} (with βₜⁱ = 0 ∀ i ∉ C); the activation pattern Δ = {βₜⁱ} over the horizon specifies when each compromised agent deviates.

Definition — Perturbed / Effective Policy. When activated, a compromised agent runs a perturbed policy π̃ᵢ in place of the nominal πᵢ, so its effective policy is
π̂ᵢ(·|hₜⁱ) = (1−βₜⁱ) πᵢ(·|hₜⁱ) + βₜⁱ π̃ᵢ(·|hₜⁱ)

Definition — Amplitude Bound. Deviation amplitude is bounded in total-variation distance whenever the agent is active. α = 0: identical to nominal; α = 1: arbitrary.
d_TV( π̃ᵢ(·|hₜⁱ), πᵢ(·|hₜⁱ) ) ≤ α (when βₜⁱ = 1)
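For discrete action sets, the effective policy and the amplitude check take a few lines. This sketch assumes action distributions are dictionaries from actions to probabilities; the helper names are hypothetical.

```python
from typing import Dict, Hashable

Dist = Dict[Hashable, float]


def effective_policy(nominal: Dist, perturbed: Dist, beta: int) -> Dist:
    """pi_hat = (1 - beta) * pi + beta * pi_tilde, with beta in {0, 1}."""
    support = set(nominal) | set(perturbed)
    return {a: (1 - beta) * nominal.get(a, 0.0) + beta * perturbed.get(a, 0.0) for a in support}


def tv_distance(p: Dist, q: Dist) -> float:
    """d_TV(p, q) = 1/2 * sum_a |p(a) - q(a)|."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


def respects_amplitude(nominal: Dist, perturbed: Dist, beta: int, alpha: float) -> bool:
    """The bound only applies at steps where the agent is active (beta = 1)."""
    return beta == 0 or tv_distance(perturbed, nominal) <= alpha
```

Because βₜⁱ is binary, the mixture simply selects one of the two policies; shifting a 50/50 choice between N and E to 20/80 is a deviation of amplitude 0.3, admissible for α = 0.35 but not for α = 0.25.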

Definition — Deviation Profile. The tuple specifying which agents are compromised, when they activate, what they do instead, and how far they can deviate:
D = ( C, Δ, π̃_C, α )

Definition — Local Stealth Constraint (the formal meaning of “covert”). An attack is stealthy if it is indistinguishable from normal operation given the observations available to local monitors, and in the decentralized setting the monitors are the neighboring agents. Agent i's deviation is ε_s-stealthy to neighbor j if the KL divergence between the nominal and perturbed message distributions is bounded:
D_KL( P^nom(mₜ^i→j) ∥ P^pert(mₜ^i→j) ) ≤ ε_s
ε_s = 0: perfectly undetectable; ε_s → ∞: the constraint becomes vacuous. This is a local, per-edge condition: a deviation can be stealthy to every individual neighbor yet have detectable mission-level consequences. By the Chernoff–Stein lemma (the asymptotic form of Neyman–Pearson testing), for i.i.d. messages ε_s upper-bounds the exponent at which the Type-II (missed-detection) error of a neighbor's likelihood-ratio test can decay as it accumulates messages.

The framework also bounds intervention frequency — the total activated steps across all compromised agents is capped by an intervention budget T_int: Σ_i∈C Σ_t βₜⁱ ≤ T_int. Together these define the feasible strategy set Φ_k(α, ε_s, T_int) of deviation profiles with |C|≤k satisfying amplitude, stealth, and intervention bounds.
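A sketch of the per-edge stealth test and of membership in the feasible strategy set, assuming categorical message distributions and that amplitudes and per-edge KL values have been computed beforehand. The function names are hypothetical, and the natural logarithm is used, so ε_s is measured in nats.

```python
import math
from typing import Dict, Hashable, Iterable, List, Set


def kl_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """D_KL(p || q) in nats; infinite if q assigns zero mass where p does not."""
    total = 0.0
    for m, pm in p.items():
        if pm == 0.0:
            continue
        qm = q.get(m, 0.0)
        if qm == 0.0:
            return math.inf
        total += pm * math.log(pm / qm)
    return total


def in_feasible_set(
    compromised: Set[int],
    activations: Dict[int, List[int]],   # agent -> [beta_0, ..., beta_{H-1}]
    amplitudes: Iterable[float],         # d_TV at active steps
    edge_kls: Iterable[float],           # D_KL(P_nom || P_pert) per (i -> j) edge
    k: int, alpha: float, eps_s: float, t_int: int,
) -> bool:
    """Membership in Phi_k(alpha, eps_s, T_int)."""
    if len(compromised) > k:
        return False
    if any(any(b) for i, b in activations.items() if i not in compromised):
        return False                     # beta_t^i must be 0 for i outside C
    if sum(sum(b) for b in activations.values()) > t_int:
        return False                     # intervention budget T_int
    return all(a <= alpha for a in amplitudes) and all(d <= eps_s for d in edge_kls)
```

Shifting a binary message from 50/50 to 25/75 costs about 0.144 nats per message: stealthy under ε_s = 0.2, detectable in principle under ε_s = 0.1.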

Threat-model taxonomy: natural faults vs. adversarial attacks

Dimension	Natural faults	Adversarial attacks
Agent selection C	Random (i.i.d. failure prob. p_f)	Chosen to maximize damage
Activation Δ	Persistent or random-onset	Strategically timed
Perturbed policy π̃ᵢ	Unstructured (crash, stuck, random)	Optimized against the rest of the team
Amplitude α	Uncontrolled	As large as stealth permits
Stealth intent	None	Essential; attacks must persist
Coordination	Independent	Centralized adversary across C

f-local / f-total models (from Byzantine consensus): f-total caps the whole network at |C| ≤ f; f-local caps any agent's neighbors at |C ∩ Nᵢ(t)| ≤ f. Under f-local faults, (2f+1)-robustness of the graph is a sufficient condition for resilient consensus with trimmed-mean (W-MSR-style) updates, connecting interaction-layer structure directly to the threats the mission can withstand. The framework considers three attack surfaces: message corruption (KL on the message distribution), observation tampering (within the sensor-noise envelope), and action perturbation (TV on the action distribution).
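For intuition about the f-local setting, here is one update of the W-MSR (Weighted Mean-Subsequence-Reduced) rule from the resilient-consensus literature: a normal agent discards up to f neighbor values above its own and up to f below, then averages what remains. This sketch assumes scalar states and equal weights; it illustrates the standard filtering idea and is not a component of this framework.

```python
from typing import Sequence


def wmsr_update(own: float, neighbor_values: Sequence[float], f: int) -> float:
    """One W-MSR step with equal weights on the retained values and the agent's own value."""
    larger = sorted(v for v in neighbor_values if v > own)
    smaller = sorted(v for v in neighbor_values if v < own)
    equal = [v for v in neighbor_values if v == own]
    kept = (
        smaller[f:]                            # drop up to f smallest values below own
        + equal
        + larger[: max(len(larger) - f, 0)]    # drop up to f largest values above own
    )
    retained = [own] + kept
    return sum(retained) / len(retained)
```

With own value 1.0 and neighbor reports [0.0, 2.0, 100.0], f = 1 discards the outlier 100.0 (and the low value 0.0) and returns 1.5, whereas f = 0 (plain averaging, i.e. full trust) is dragged to 25.75.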

9 · propagation

Propagation — how local deviation becomes mission-level effect

A purely local deviation need not stay local. Once it enters the interaction graph it alters neighbors' inputs, shifts downstream decisions, changes the future joint action, and modifies later mission states and observations.

Definition — One-Step Influence. The TV shift on agent j's input at t+1 caused by agent i's deviation at t (with i deviating and all others nominal):
ℰₜ^i→j = d_TV( P(x_{t+1}^j | π), P(x_{t+1}^j | π̂) )

For agents not directly connected, influence propagates over multi-hop paths in the time-expanded interaction graph. The ℓ-step influence is bounded by the sum over directed length-ℓ paths of the product of edge-level influences:

ℰ_{t:t+ℓ}^{i→j} ≤ Σ_{p ∈ Paths(i,j,ℓ)} ∏_{e=(u,v) ∈ p} ℰ_{t_e}^{u→v}, where t_e is the time step at which edge e is traversed along path p

This upper bound follows from the data-processing inequality and the triangle inequality for total variation. It is a strict generalization of consensus error propagation: in linear consensus x_{t+1} = Wₜxₜ + ηₜ, a perturbation injected at node i through ηₜ reaches node j at t+ℓ as [W_{t+ℓ−1}⋯W_{t+1}]_{ji} times the perturbation, which is the linear, deterministic special case of the general influence chain.
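The linear special case can be checked directly: simulate consensus twice with the same weight matrices, once with a unit perturbation injected into η at node i, and compare the difference at t+ℓ with the matrix-product entry. The three-node weight matrix is a hypothetical example, and the code uses plain Python lists.

```python
from typing import List

Matrix = List[List[float]]


def mat_vec(W: Matrix, x: List[float]) -> List[float]:
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[r][m] * B[m][c] for m in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def run_consensus(Ws: List[Matrix], x0: List[float], eta: List[List[float]]) -> List[List[float]]:
    """x_{t+1} = W_t x_t + eta_t; returns [x_0, ..., x_len(Ws)]."""
    xs = [x0]
    for W, e in zip(Ws, eta):
        xs.append([a + b for a, b in zip(mat_vec(W, xs[-1]), e)])
    return xs


def influence_matrix(Ws: List[Matrix], t: int, ell: int) -> Matrix:
    """W_{t+ell-1} ... W_{t+1} (the identity when ell == 1)."""
    n = len(Ws[0])
    P = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for s in range(t + 1, t + ell):
        P = mat_mul(Ws[s], P)
    return P


# Hypothetical 3-agent path 0 - 1 - 2 with row-stochastic weights.
W = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
Ws = [W] * 5
zeros = [[0.0] * 3 for _ in Ws]
bumped = [row[:] for row in zeros]
bumped[1][0] = 1.0                       # unit perturbation at node i = 0 via eta_1
t, ell = 1, 3
nominal = run_consensus(Ws, [1.0, 2.0, 3.0], zeros)
perturbed = run_consensus(Ws, [1.0, 2.0, 3.0], bumped)
effect = [p - q for p, q in zip(perturbed[t + ell], nominal[t + ell])]
```

Each `effect[j]` matches `influence_matrix(Ws, t, ell)[j][0]`. Node 2, two hops from node 0, receives 0.125 of the perturbation at ℓ = 3 but nothing at ℓ = 2, which is the hop structure the path sum encodes.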

Definition — Mission-Level Impact. The shift in the whole trajectory distribution induced by a deviation profile:
ℐ(D) = d_TV( P(τ | π), P(τ | π̂) )

10 · the game

The adversarial game & the break budget

Under an adversarial threat model the contest is a two-player zero-sum partially observable stochastic game (POSG): blue (the swarm) plays a defensive joint policy to maximize mission return; red (a centralized adversary on k of n agents) selects a deviation profile from the feasible set to minimize it.

Blue: π* ∈ argmax_π min_{D∈Φ_k} J(π, D) · Red: D* ∈ argmin_{D∈Φ_k} J(π, D)

The return gap quantifies degradation under worst-case attack at resource level k: ΔJ(k) = J(π; clean) − min_{D∈Φ_k} J(π, D).

Definition — Break Budget. The minimum-resource deviation profile that causes mission failure (subject to the stealth constraints):
D* = argmin_D cost(D) s.t. Pr( Failure(τ) | π̂ ) ≥ 1 − δ_fail
Cost can be cardinality |C|, amplitude α, duration Σβₜⁱ, or a combined product. The minimum compromise size for a target loss θ is k*(θ) = min{ k : ΔJ(k) ≥ θ }. A large break budget ⇒ structurally hard to break; a small one ⇒ a few agents with small deviations cause failure. How k* scales with swarm size n and connectivity is a central empirical question.
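Once the worst-case returns min_{D∈Φ_k} J(π, D) have been estimated for each compromise size (by the red-team optimization, which is the hard part and is not shown), ΔJ(k) and k*(θ) follow mechanically. The function names are hypothetical, and the numbers below are made up to show the computation; they are not findings.

```python
from typing import Dict, Optional


def return_gap(j_clean: float, worst_case: Dict[int, float]) -> Dict[int, float]:
    """Delta J(k) = J(pi; clean) - min_{D in Phi_k} J(pi, D)."""
    return {k: j_clean - j for k, j in worst_case.items()}


def min_compromise_size(gap: Dict[int, float], theta: float) -> Optional[int]:
    """k*(theta) = min{ k : Delta J(k) >= theta }, or None if no tested k reaches theta."""
    reaching = [k for k, g in gap.items() if g >= theta]
    return min(reaching) if reaching else None


# Hypothetical worst-case returns per compromise size k (illustrative only).
gap = return_gap(100.0, {0: 100.0, 1: 92.0, 2: 70.0, 3: 35.0})
```

For these illustrative values, a target loss θ = 25 is first reached at k* = 2, θ = 50 at k* = 3, and θ = 80 is not reached by any tested k.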

The stealth–degradation trade-off (the stealth–damage frontier)

The stealth constraint forces a Pareto frontier between mission degradation and detectability. A maximally damaging attack (ε_s→∞) is trivially detected; a perfectly stealthy attack (ε_s=0) may produce little degradation. Between them lies the achievable frontier — and its shape is mission-family-dependent (information-coupled missions may expose a steep frontier where small stealth budgets yield large damage; connectivity-coupled missions may flatten it).

The gap between local stealth and global consequence is the heart of the problem. A deviation can satisfy the per-edge KL budget against every neighbor while still shifting the whole trajectory distribution — because no single agent sees the global effect. Aggregate monitoring misses the threat; the team's own posterior supplies the resilience signal. This is the formal version of what the empirical work calls the stealth–damage frontier.

11 · hypotheses

Hypotheses on adversarial amplification (H1–H3)

The zero-sum framing plus the propagation formalism motivate three structural hypotheses that distinguish adversarial pressure from random faults.

H1 — Amplified effects through communication. Small compromise sets can have amplified mission-level effects when they target the information flow that drives coordination. Under learned communication, messages act as shared belief features or intent signals; corrupting a few agents propagates through aggregation and downstream decision rules, producing minority-to-majority influence and disproportionate return loss. Amplification is strongest when compromised agents occupy influential positions (bottlenecks, highly relied-upon senders).

H2 — Robustness gap: random vs. adaptive. The same “amount” of corruption is far more damaging placed strategically. Random noise/dropouts spread their effect across agents and time; an adaptive attacker spends its limited budget only when and where the swarm is most sensitive — near decision boundaries, during critical coordination, on high-influence senders — producing substantially larger return loss under the same budgets (k, ε_s, T_int) while staying stealthy by acting sparsely.

H3 — Attack-surface dominance. Not all intervention channels are equally damaging. When red can influence at most k agents, some surfaces — messages vs. observations vs. actions — produce larger worst-case degradation under the same stealth budgets. Which surface dominates depends on the mission family and the attacker's information (knowledge of policies or the communication graph).

12 · metrics

Resiliency metrics

The formal objects above enable a precise, trajectory-based, threat-parameterized vocabulary for resiliency, aligned with the resilience-engineering distinctions of robustness, brittleness, elasticity, and recovery.

Definition — Mission Robustness. Worst-case performance ratio over admissible deviation profiles with |C| ≤ k and amplitude ≤ α. Near 1: robust; near 0: fragile.
R_rob(π, 𝔻_{α,k}) = inf_{D∈𝔻_{α,k}} J(π̂) / J(π)
The robustness margin α* is the largest amplitude for which performance stays above (1 − ε_tol)·J(π), i.e. within a fraction ε_tol of nominal.

Definition — Performance Response Function. Maps each point in threat space (k, α, Δ) to the worst-case mission return under that level of adversarial pressure.
Γ(k, α, Δ) = inf_{D : |C|=k, amp=α, pattern=Δ} J(π̂)

Definition — Brittleness Frontier. The level set in (k, α, Δ)-space where the mission transitions from degradation to failure, i.e. the boundary of the region where the mission stays viable.
ℱ = ∂{ (k, α, Δ) : Pr( sₜ ∈ Viabδ(K,t) ∀ t ) ≥ 1 − δ_crit }

Definition — Brittleness Index. How sharply performance collapses near the frontier: the gradient magnitude of the response function along ℱ. Large ⇒ a cliff (a small resource increase past the frontier causes a large drop); small ⇒ smooth degradation.
B_brit = sup_{(k,α)∈ℱ} ∥ ∇_{(k,α)} Γ(k, α, Δ) ∥

Definition — Elasticity. The average rate of performance loss per unit deviation amplitude, integrated up to the robustness margin; it measures how gracefully performance degrades before the frontier. Near-zero magnitude + high brittleness = cliff-like; steady negative = slope-like.
E_elas(π) = (1/α*) ∫₀^{α*} (∂Γ/∂α)|_{k,Δ} dα
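A sketch of how the robustness ratio, robustness margin, and elasticity might be read off a sampled response curve Γ(α) at fixed k and Δ. It assumes a sorted amplitude grid starting at α = 0 with Γ(0) = J(π) and a positive nominal return. Because the integrand is a derivative, the elasticity integral telescopes to (Γ(α*) − Γ(0))/α*. The grid, curve values, and function names are hypothetical.

```python
from typing import Optional, Sequence


def robustness_ratio(gammas: Sequence[float], j_nominal: float) -> float:
    """R_rob up to the largest grid amplitude: worst-case return / nominal return."""
    return min(gammas) / j_nominal


def robustness_margin(alphas: Sequence[float], gammas: Sequence[float],
                      j_nominal: float, eps_tol: float) -> float:
    """Largest grid amplitude up to which Gamma stays >= (1 - eps_tol) * J(pi)."""
    margin = alphas[0]
    for a, g in zip(alphas, gammas):
        if g < (1 - eps_tol) * j_nominal:
            break
        margin = a
    return margin


def elasticity(alphas: Sequence[float], gammas: Sequence[float],
               alpha_star: float) -> Optional[float]:
    """(1/alpha*) * integral_0^alpha* dGamma/dalpha = (Gamma(alpha*) - Gamma(0)) / alpha*."""
    if alpha_star == 0:
        return None
    return (gammas[list(alphas).index(alpha_star)] - gammas[0]) / alpha_star
```

For a hypothetical curve Γ = [100, 98, 95, 80, 40] on α = [0, 0.1, 0.2, 0.3, 0.4] with ε_tol = 0.1, the margin is α* = 0.2, the elasticity is −25 return units per unit amplitude, and the drop from 80 to 40 beyond the margin is the cliff that the brittleness index measures.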

Definition — Recovery Value, Ratio & Time. After a deviation ceases, leaving the mission in a degraded state s_{t_d}:
V_rec(s_{t_d}, π) = E_π[ Σ_{t=t_d}^{H−1} R(sₜ, aₜ) | s_{t_d} ]
ρ_rec = V_rec(s_{t_d}, π) / V_nom(t_d, π) (1: full recovery · 0: irrecoverable), where V_nom(t_d, π) is the nominal value-to-go from t_d
T_rec(s_{t_d}) = min{ T ≥ 0 : Φ(s_{t_d+T}, t_d+T) ≥ Φ^nom(t_d+T) − ε } (∞ if none)
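From logged health and reward traces, T_rec and a Monte Carlo estimate of ρ_rec can be computed as below. The sketch assumes the deviation has already ceased at index t_d and that V_nom(t_d, π) has been estimated separately from clean runs; the names and numbers are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def recovery_time(health: Sequence[float], nominal: Sequence[float], eps: float, t_d: int) -> float:
    """T_rec = min{ T >= 0 : Phi(t_d + T) >= Phi_nom(t_d + T) - eps }, or inf if never."""
    for T in range(len(health) - t_d):
        if health[t_d + T] >= nominal[t_d + T] - eps:
            return T
    return math.inf


def recovery_value(reward_traces: Sequence[Sequence[float]], t_d: int) -> float:
    """Monte Carlo estimate of V_rec: mean over rollouts of sum_{t >= t_d} R_t."""
    return mean(sum(trace[t_d:]) for trace in reward_traces)


def recovery_ratio(v_rec: float, v_nom: float) -> float:
    """rho_rec = V_rec / V_nom."""
    return v_rec / v_nom
```

A health trace [1.0, 0.5, 0.6, 0.75, 0.95] with t_d = 1, a nominal baseline of 1.0, and ε = 0.1 recovers after T_rec = 3 steps; if the recovered value-to-go averages 6 against a nominal 10, then ρ_rec = 0.6.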

Summary

Metric	Symbol	Domain	What it captures
Robustness	R_rob	[0,1]	Worst-case performance ratio under bounded adversarial pressure
Robustness margin	α*	[0,1]	Largest tolerable deviation amplitude
Brittleness frontier	ℱ	(k,α,Δ)	Boundary separating viability from failure
Brittleness index	B_brit	[0,∞)	Gradient magnitude at frontier — sharpness of collapse
Elasticity	E_elas	(−∞,0]	Average rate of graceful performance loss
Recovery ratio	ρ_rec	[0,1]	Fraction of nominal value recovered after degradation
Recovery time	T_rec	ℕ∪{∞}	Steps to restore health above threshold

13 · instantiation

Worked example — shared exploration & phantom-coverage injection

The full formalism is instantiated on a shared exploration mission: team-level coverage of an unknown environment within a finite horizon. It is information-coupled with a secondary topology coupling, and it shows concretely how a local lie propagates through frontier sharing into mission-level coverage loss.

State. A W×W grid; each agent occupies a cell pₜⁱ; a coverage indicator ξₜ marks sensed cells; the comm graph is proximity-based: (j,i)∈Eₜ ⇔ ∥pₜ^j−pₜⁱ∥_∞ ≤ r_comm. So sₜ = ({pₜⁱ}, ξₜ, Gₜ).
Observation. Cells within a sensing radius r_sense — the local coverage fragment. No agent sees the full map.
Action / reward. {N,S,E,W,stay}; coverage is monotone; team reward = incremental coverage gained per step.
Coupling — frontier sharing. Agent j sends mₜ^j→i = (Fₜ^j, ξ̂ₜ^j) (its frontier + local coverage belief); aggregation is set union cₜⁱ = ∪_{j∈Nᵢ(t)} ξ̂ₜ^j. This is where propagation originates: a slightly wrong but plausible frontier shifts neighbors' moves, they sense the wrong cells, and forward messages that reinforce the mistake.

Phantom-coverage injection. A compromised agent broadcasts a corrupted frontier by injecting phantom covered cells — cells reported as covered that are actually uncovered: m̃ₜ^i→j = (Fₜⁱ, ξ̂ₜⁱ ∪ Pₜⁱ). Neighbors mark those cells covered, never visit them, and coverage loss accumulates and spreads ℓ hops in the time-expanded graph. Stealth: phantom cells outside every neighbor's sensing range are locally unverifiable — the false report satisfies the KL stealth constraint with ε_s=0 for those cells. Break budget: a single well-positioned agent injecting O(W²(1−θ_cov)) phantom cells can cause failure in d (graph-diameter) steps — k can be as small as 1 if the graph is well-connected and stealth is loose. Overlapping neighbor sensing lets phantom cells be verified and rejected, raising the break budget — the central structural trade-off between information redundancy and vulnerability.
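A toy reproduction of the propagation mechanism, assuming a fixed chain-shaped communication graph, beliefs represented as sets of cells believed covered, and set-union aggregation as in frontier sharing. The function names, the `verifiable` argument (cells a receiver can sense and knows to be uncovered), and the example layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


def share_step(
    beliefs: Dict[int, Set[Cell]],
    neighbors: Dict[int, List[int]],        # in-neighbors N_i(t), fixed here
    phantom: Dict[int, Set[Cell]],          # P_t^i injected by compromised senders
    verifiable: Dict[int, Set[Cell]],       # cells receiver i can check are uncovered
) -> Dict[int, Set[Cell]]:
    """One round of frontier sharing with union aggregation and optional local verification."""
    # A compromised sender broadcasts xi_hat | P without changing its own belief.
    outbox = {j: beliefs[j] | phantom.get(j, set()) for j in beliefs}
    updated = {}
    for i in beliefs:
        received = set().union(*(outbox[j] for j in neighbors[i]))
        received -= verifiable.get(i, set())    # reject claims contradicted by own sensing
        updated[i] = beliefs[i] | received
    return updated


def simulate_injection(steps: int, verifiable: Dict[int, Set[Cell]]) -> Dict[int, Set[Cell]]:
    # Hypothetical chain 0 - 1 - 2 - 3; agent 0 is compromised and injects phantom cell (9, 9).
    beliefs = {i: {(i, 0)} for i in range(4)}
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    phantom = {0: {(9, 9)}}
    for _ in range(steps):
        beliefs = share_step(beliefs, neighbors, phantom, verifiable)
    return beliefs
```

The phantom cell reaches the agent ℓ hops away after ℓ rounds. If agent 1, the only onward path, can sense the cell and verify it, agents 2 and 3 never adopt it, a minimal picture of how overlapping sensing raises the break budget.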

Second instantiation: communication relay (topology-coupled). Agents maintain end-to-end connectivity between source–destination pairs; the dominant health indicator is algebraic connectivity λ₂; the attack surface is the relay structure itself (position drift or message corruption reducing path redundancy); the break-budget driver is vertex connectivity κ. Together, the two instantiations show that the formalism generalizes across mission families.

14 · agenda

Focused research questions (RQ1–RQ6)

The broad questions narrow into a focused agenda, each connected to a formal handle developed above.

RQ	Question	Formal handle
RQ1 Propagation	How does a deviation in one agent's policy propagate into a system-level outcome under topology constraints and trust-by-default messaging? Which mechanisms dominate — information flow, role criticality, bottlenecks, feedback loops?	Multi-step influence ℰ_t:t+ℓ^i→j; mission-level impact ℐ(D)
RQ2 Single→swarm	When can a single misbehaving agent cause only local inefficiency, and when can it trigger global collapse? Which mission-level conditions (relay role, unique information source) amplify small deviations into mission-wide failure?	Break budget D* with k=1
RQ3 Brittleness	What is the minimum misbehavior budget that breaks the mission, not just degrades it? The smallest combination of subset size, amplitude, and time pattern crossing the feasibility boundary.	Brittleness frontier ℱ & elasticity surface
RQ4 Homo vs. hetero	How do homogeneous disturbances (same rule across C) differ from heterogeneous ones (diverse rules across agents/time) in failure modes and elasticity surfaces? Which families are more sensitive to heterogeneity?	Deviation profile D; performance response Γ
RQ5 Stealth	What does stealth mean when checks are local, lightweight, and topology-limited? Can a detector-agnostic plausibility budget be operational, and what local evidence suffices to detect misbehavior early enough to prevent breakage?	Stealth budget ε_s (local KL constraint)
RQ6 Recovery	If an agent reboots / loses its policy state, how can the swarm restore competence under the same decentralized constraints? When is it optimal to reboot-and-reintegrate vs. isolate? Can neighbor-assisted reconstruction shift the brittleness frontier?	Recovery ratio ρ_rec; recovery time T_rec

sources

Source documents

This page transcribes the formal content so the site carries it independently. The original source files (with their full TikZ figures, Pareto plots, and bibliography) are available for download:

formalism.pdf

The full Mission-Centered Resiliency document, typeset (figures, frontier plots, notation tables).

formalism.tex

LaTeX source of the formalism (~3,100 lines).

Swarm_resiliency_v1.3.pdf

The problem statement: Dec-POMDP tuple, aggregation interface, effect-based threat model, A1–A5.

Formalization.docx

The earliest backbone: the four interactional regimes & the health / viability / resilience vocabulary.

Related pages: Architecture — the engineering of the same swarm (the substrate these objects are measured on) · Findings — the empirical evidence, including the compromise sweep that instantiates the break budget and the stealth–damage frontier · Literature — where this framework sits in the field.