Estimation on Graphs and Distributed Inference

When the Data Is Spread Across the Network

Classical estimation places all observations at a single processor. Modern wireless systems — cell-free massive MIMO, sensor networks, federated learning deployments — do not. Each node holds a fragment of the data, a fragment of the parameter, or both. The question is how to produce a globally consistent estimate using only local computation and neighbor-to-neighbor messages.

The answer, remarkably, is the same idea in three guises: consensus (iterate a local averaging rule until everyone agrees), gossip (randomize which pair averages, removing the need for synchronization), and distributed Kalman filtering (combine local innovations while consensus-averaging the state estimate). All three reduce to properties of the second-largest eigenvalue of a doubly stochastic matrix, and all three apply convex-combination (averaging) updates, which is why they converge.

Definition:

Distributed Average Consensus

Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be an undirected connected graph with $N = |\mathcal{V}|$ nodes. Node $i$ holds a scalar $x_i^{(0)} \in \mathbb{R}$. Each node updates its state using only its neighbors' values:

$$x_i^{(t+1)} = \sum_{j \in \mathcal{N}(i) \cup \{i\}} W_{ij}\, x_j^{(t)}, \quad i = 1, \ldots, N,$$

where $\mathbf{W} \in \mathbb{R}^{N \times N}$ is a consensus matrix respecting the graph (i.e., $W_{ij} = 0$ if $(i,j) \notin \mathcal{E}$ and $i \neq j$). The task is to design $\mathbf{W}$ so that every node converges to the average $\bar{x} = (1/N)\sum_i x_i^{(0)}$.

Theorem: Consensus Convergence Conditions

The iteration $\mathbf{x}^{(t+1)} = \mathbf{W}\mathbf{x}^{(t)}$ converges to the average $\bar{x}\mathbf{1}$ for every initial condition if and only if:

  1. $\mathbf{W}\mathbf{1} = \mathbf{1}$ (row stochastic)
  2. $\mathbf{1}^T \mathbf{W} = \mathbf{1}^T$ (column stochastic)
  3. $\rho\bigl(\mathbf{W} - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T\bigr) < 1$

Moreover, convergence is geometric:

$$\|\mathbf{x}^{(t)} - \bar{x}\mathbf{1}\|_2 \leq \rho^t\, \|\mathbf{x}^{(0)} - \bar{x}\mathbf{1}\|_2, \quad \rho = \rho\bigl(\mathbf{W} - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T\bigr).$$

Condition 1 makes $\mathbf{1}$ a fixed point of the iteration (eigenvalue 1); condition 2 ensures the average is preserved at every step; condition 3 ensures that every other eigenmode decays. The spectral radius $\rho$ is exactly the second-largest-in-magnitude eigenvalue of $\mathbf{W}$, often written $\lambda_2(\mathbf{W})$.

Choosing the Consensus Matrix: Metropolis Weights

A classical and practical choice is the Metropolis–Hastings weights:

$$W_{ij} = \begin{cases} \dfrac{1}{1 + \max(d_i, d_j)}, & (i,j) \in \mathcal{E}, \\[4pt] 1 - \sum_{k \neq i} W_{ik}, & i = j, \\[2pt] 0, & \text{otherwise}, \end{cases}$$

where $d_i = |\mathcal{N}(i)|$ is the degree of node $i$. These weights are always doubly stochastic and require only local degree information. The optimal choice of $\mathbf{W}$ (minimizing $\lambda_2$) is a semidefinite program: convex, and solvable offline.
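To make this concrete, here is a minimal sketch (Python/NumPy; the 5-node path graph is my own illustrative choice, not from the text) that builds the Metropolis matrix from an adjacency matrix and checks the two properties that matter: double stochasticity and $\lambda_2 < 1$.

```python
import numpy as np

def metropolis_weights(adj):
    """Metropolis-Hastings weights: W[i,j] = 1/(1+max(d_i,d_j)) on edges,
    with the diagonal chosen so each row sums to 1."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# illustrative 5-node path graph: 0-1-2-3-4
adj = np.zeros((5, 5), dtype=int)
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1

W = metropolis_weights(adj)
# doubly stochastic by construction (and symmetric, since the edge weight
# depends symmetrically on the two degrees)
assert np.allclose(W.sum(axis=0), 1.0) and np.allclose(W.sum(axis=1), 1.0)

# second-largest eigenvalue magnitude = the convergence rate lambda_2
lam = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
lam2 = lam[1]
print("lambda_2 =", round(lam2, 4))
```

Because `W` is symmetric here, `eigvalsh` applies; on a directed graph one would fall back to `eigvals` and sort by magnitude.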

Definition:

Randomized Pairwise Gossip

At each round, an edge $(i, j) \in \mathcal{E}$ is activated (uniformly at random, or according to a specified distribution). The two endpoints average their values:

$$x_i^{(t+1)} = x_j^{(t+1)} = \frac{x_i^{(t)} + x_j^{(t)}}{2},$$

while all other nodes hold their current values. The expected update matrix $\bar{\mathbf{W}} = \mathbb{E}[\mathbf{W}_t]$ is doubly stochastic, and the expected squared deviation decays as $\mathbb{E}\|\mathbf{e}^{(t)}\|_2^2 \leq \lambda_2(\bar{\mathbf{W}})^t\, \|\mathbf{e}^{(0)}\|_2^2$.

Gossip trades rate for communication efficiency: in each round only two nodes communicate, so the per-round cost is $\mathcal{O}(1)$ rather than $\mathcal{O}(|\mathcal{E}|)$, at the price of slower convergence on poorly connected graphs.
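A minimal simulation of the pairwise rule (Python/NumPy; the 15-node ring, round count, and seed are illustrative choices, not from the text): each activation averages one random edge, the network-wide average is preserved exactly, and all states drift toward it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 15
x = rng.normal(size=n)
target = x.mean()                       # gossip preserves this quantity

edges = [(i, (i + 1) % n) for i in range(n)]   # ring graph edge list

for _ in range(5000):                   # one random edge activation per round
    i, j = edges[rng.integers(len(edges))]
    x[i] = x[j] = 0.5 * (x[i] + x[j])   # the two endpoints average

print("max deviation from average:", np.abs(x - target).max())
```

Each update is a convex combination of two entries, so the sum of the state vector never changes; only the spread shrinks.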

Distributed Kalman–Consensus Filter

Complexity: $\mathcal{O}(K \cdot |\mathcal{E}| \cdot n_x)$ per time step, where $K$ is the number of consensus rounds and $n_x$ the state dimension.
Input: Local observations $\mathbf{y}_i^{(t)}$ at each node $i$; consensus matrix $\mathbf{W}$.
Output: Local estimate $\hat{\mathbf{x}}_i^{(t)}$ of the global state at each node.
1. Local prediction:
   $\quad \hat{\mathbf{x}}_i^{(t|t-1)} \leftarrow \mathbf{F}\, \hat{\mathbf{x}}_i^{(t-1|t-1)}$
   $\quad \mathbf{P}_i^{(t|t-1)} \leftarrow \mathbf{F}\, \mathbf{P}_i^{(t-1|t-1)} \mathbf{F}^T + \mathbf{Q}$
2. Innovation fusion (consensus on information vectors):
   $\quad \mathbf{u}_i^{(t)} \leftarrow \mathbf{H}_i^T \mathbf{R}_i^{-1} \mathbf{y}_i^{(t)}$, $\quad \mathbf{U}_i^{(t)} \leftarrow \mathbf{H}_i^T \mathbf{R}_i^{-1} \mathbf{H}_i$
   $\quad$ Run $K$ consensus rounds: $\mathbf{u}_i \leftarrow \sum_j W_{ij}\, \mathbf{u}_j$, and similarly for $\mathbf{U}_i$.
3. Local update:
   $\quad \mathbf{P}_i^{(t|t)} \leftarrow \bigl((\mathbf{P}_i^{(t|t-1)})^{-1} + N\, \mathbf{U}_i^{(t)}\bigr)^{-1}$
   $\quad \hat{\mathbf{x}}_i^{(t|t)} \leftarrow \hat{\mathbf{x}}_i^{(t|t-1)} + \mathbf{P}_i^{(t|t)}\, N \bigl(\mathbf{u}_i^{(t)} - \mathbf{U}_i^{(t)}\, \hat{\mathbf{x}}_i^{(t|t-1)}\bigr)$

Both fused terms carry the factor $N$: after consensus, $N\,\mathbf{u}_i$ and $N\,\mathbf{U}_i$ approximate the network-wide sums $\sum_j \mathbf{H}_j^T \mathbf{R}_j^{-1} \mathbf{y}_j$ and $\sum_j \mathbf{H}_j^T \mathbf{R}_j^{-1} \mathbf{H}_j$ that the centralized information filter would use.

This is the Olfati-Saber consensus Kalman filter. With infinite consensus rounds ($K \to \infty$), every node recovers the centralized Kalman estimate. With finite $K$, there is a consensus error that scales as $\lambda_2^K$.
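The filter can be sketched for a scalar state (Python/NumPy; the dynamics $F$, noise levels $Q, R$, node count, and the complete-graph consensus matrix are illustrative assumptions, with $H_i = 1$ at every node):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, T = 4, 10, 30                 # nodes, consensus rounds, time steps
F, Q, R = 0.95, 0.1, 0.5            # x_{t+1} = F x_t + w,  y_i = x_t + v_i
W = np.full((N, N), 1.0 / N)        # complete-graph consensus matrix

x_true = 0.0
xh = np.zeros(N)                    # local state estimates
P = np.ones(N)                      # local error variances
for t in range(T):
    x_true = F * x_true + rng.normal(scale=np.sqrt(Q))
    y = x_true + rng.normal(scale=np.sqrt(R), size=N)

    # 1. local prediction
    xh = F * xh
    P = F * P * F + Q

    # 2. innovation fusion: consensus on information pairs (u_i, U_i)
    u = y / R                       # H_i^T R_i^{-1} y_i with H_i = 1
    U = np.full(N, 1.0 / R)         # H_i^T R_i^{-1} H_i
    for _ in range(K):
        u, U = W @ u, W @ U

    # 3. local update with the N-rescaled fused information
    P = 1.0 / (1.0 / P + N * U)
    xh = xh + P * N * (u - U * xh)

print("local estimates:", xh, " truth:", x_true)
```

With the complete-graph matrix, one consensus round already computes the exact average, so all nodes stay in perfect agreement; on a sparser graph the finite-$K$ lag discussed above appears.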

Example: Distributed Temperature Sensing

Twenty sensors deployed in a building measure local temperatures $y_i = T + n_i$ with $n_i \sim \mathcal{N}(0, \sigma^2)$ independent across sensors. The goal is for every node to compute the empirical average $\bar{y}$ using only neighbor exchanges over a ring graph.
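A sketch of that example (Python/NumPy; the true temperature, noise level, seed, and round count are illustrative): on a ring, every node has degree 2, so the Metropolis weights are simply $1/3$ for each neighbor and $1/3$ for self.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 20
y = 21.0 + rng.normal(scale=0.5, size=N)   # noisy readings of T = 21 C
target = y.mean()                          # what every node should learn

# Metropolis weights on a ring: every degree is 2, so all weights are 1/3
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = W[i, (i - 1) % N] = W[i, (i + 1) % N] = 1.0 / 3.0

x = y.copy()
for _ in range(2000):                      # synchronous consensus rounds
    x = W @ x

print("max deviation from empirical average:", np.abs(x - target).max())
```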

Consensus Convergence on Common Graphs

Simulate consensus with Metropolis weights on a ring, a complete graph, or an Erdős–Rényi random graph. Track the per-node error $|x_i^{(t)} - \bar{x}|$ and compare it to the theoretical rate $\lambda_2^t$.


Gossip vs Synchronous Consensus

Compare deterministic synchronous consensus (all nodes update each round) with randomized pairwise gossip. The horizontal axis is the number of messages, not rounds: gossip looks worse per round but is competitive per message.


Watching Consensus Happen on a Random Graph

A 15-node Erdős–Rényi graph with node states color-coded. Over 100 rounds the heat map equalizes to the global average.
Node values converge to the average $\bar{x}$ at rate $\lambda_2(\mathbf{W})^t$.

Definition:

$\varepsilon$-Differential Privacy

A randomized estimator $\mathcal{M}$ is $\varepsilon$-differentially private if, for any two datasets $D, D'$ differing in a single record and any measurable set $S$,

$$\Pr\{\mathcal{M}(D) \in S\} \leq e^{\varepsilon}\, \Pr\{\mathcal{M}(D') \in S\}.$$

The Laplace mechanism adds noise $\mathrm{Lap}(0, \Delta f / \varepsilon)$ to a query $f$ with sensitivity $\Delta f = \max_{D \sim D'} |f(D) - f(D')|$.

Small $\varepsilon$ means strong privacy but noisy estimates. The tradeoff between privacy and statistical efficiency is a direct analogue of the rate-distortion tradeoff: there is a fundamental lower bound on the MSE as a function of $\varepsilon$.
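The mechanism is two lines of code. A sketch (Python/NumPy; the counting query, sample count, and $\varepsilon$ grid are illustrative) that checks the released values have the predicted spread $\sqrt{2}\,\Delta f/\varepsilon$:

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(value, sensitivity, eps, rng):
    """Release value + Lap(0, sensitivity/eps); std is sqrt(2)*sensitivity/eps."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / eps)

# a counting query has sensitivity 1; sweep privacy levels (illustrative)
true_count = 100.0
stds = {}
for eps in (0.1, 1.0, 10.0):
    releases = [laplace_mechanism(true_count, 1.0, eps, rng) for _ in range(10_000)]
    stds[eps] = float(np.std(releases))
    print(f"eps={eps}: empirical std {stds[eps]:.2f}, theory {np.sqrt(2)/eps:.2f}")
```

The factor-of-ten drop in $\varepsilon$ costs a factor of ten in noise standard deviation, which is the privacy-accuracy tradeoff in its simplest form.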

⚠️ Engineering Note

Privacy-Preserving Consensus

If each node perturbs its initial value with Laplace noise before starting consensus, the network computes a noisy average while preserving $\varepsilon$-differential privacy for each node's local data. The MSE floor is $\mathcal{O}(1/(N\varepsilon^2))$: good privacy (small $\varepsilon$) forces a large MSE. In cell-free massive MIMO, this becomes relevant when access points belong to different operators that cannot share raw pilots.

Practical Constraints
  • Per-node Laplace noise variance is $2\Delta f^2/\varepsilon^2$ (i.e., $2/\varepsilon^2$ for unit sensitivity)

  • Composition across multiple consensus queries degrades $\varepsilon$ via the basic composition theorem

  • The MSE floor implies a minimum $\varepsilon$ needed to achieve a target estimation accuracy

📋 Ref: NIST SP 800-188 draft
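A sketch of the noise-before-consensus protocol (Python/NumPy; the ring topology, unit sensitivity, and $\varepsilon = 0.5$ are illustrative assumptions): each node perturbs its value before the first message leaves it, and the network then agrees exactly on the noisy average.

```python
import numpy as np

rng = np.random.default_rng(3)
N, eps, sens = 20, 0.5, 1.0
x0 = rng.uniform(0, 1, size=N)           # private local values (sensitivity 1)
target = x0.mean()

# perturb BEFORE the first message leaves each node
noise = rng.laplace(scale=sens / eps, size=N)
x = x0 + noise

# ring Metropolis consensus (degree 2 everywhere -> all weights 1/3)
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = W[i, (i - 1) % N] = W[i, (i + 1) % N] = 1.0 / 3.0
for _ in range(2000):
    x = W @ x

# everyone agrees on the noisy average; the privacy cost is its offset
mse_floor = 2 * sens**2 / (N * eps**2)   # variance of the mean of N Laplace draws
print("offset:", abs(x[0] - target), " predicted MSE floor:", mse_floor)
```

The residual offset is the mean of the $N$ Laplace draws, whose variance $2\Delta f^2/(N\varepsilon^2)$ is exactly the MSE floor quoted above.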

Why This Matters: Cell-Free Massive MIMO as Distributed Estimation

In cell-free massive MIMO, tens of access points (APs) serve a set of users cooperatively. Local channel estimates are computed at each AP from uplink pilots, then fused, via consensus over a fronthaul graph, into a coherent global estimate. The fronthaul overhead scales with the number of consensus rounds needed to reach a target accuracy $\epsilon$, which is $\mathcal{O}(\log(1/\epsilon)/\log(1/\lambda_2))$. This connects the combinatorics of AP placement (graph topology) to the spectral efficiency of the resulting air interface.
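The round count can be read off directly. A minimal helper (Python/NumPy; the $\lambda_2$ values below are illustrative, not taken from any particular AP topology):

```python
import numpy as np

def rounds_needed(lam2, tol):
    """Smallest K with lam2**K <= tol, i.e. K = ceil(log(1/tol)/log(1/lam2))."""
    return int(np.ceil(np.log(tol) / np.log(lam2)))

# better-connected fronthaul graph (smaller lambda_2) -> fewer consensus rounds
for lam2 in (0.5, 0.9, 0.99):
    print(f"lambda_2 = {lam2}: K = {rounds_needed(lam2, 1e-2)}")
```

As $\lambda_2 \to 1$ the denominator $\log(1/\lambda_2)$ vanishes and the round count blows up, which is exactly the fronthaul cost of a poorly connected AP topology.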

🎓 CommIT Contribution (2021)

Distributed MMSE Processing for Cell-Free Massive MIMO

Z. Chen, E. Björnson, G. Caire, IEEE Trans. Wireless Commun., vol. 20, no. 4

CommIT-group work developed a scalable distributed processing architecture for cell-free massive MIMO. Each AP executes local MMSE combining over a limited neighborhood and exchanges compressed statistics over the fronthaul. The algorithm is exactly a distributed LMMSE estimator in the sense of Section 16: each node computes a local posterior and the fronthaul implements a consensus-style fusion. The result is near-centralized SINR with fronthaul overhead that scales linearly in the number of APs, not quadratically.


Common Mistake: Finite Consensus Rounds Are Not Centralized

Mistake:

A paper claims a "distributed Kalman filter that matches the centralized performance." On inspection, the algorithm uses a fixed number of consensus rounds per time step.

Correction:

With $K$ consensus rounds, the local estimate lags the centralized estimate by an error proportional to $\lambda_2^K$. The claim "matches centralized" is true only in the limit $K \to \infty$. Papers must state the finite-$K$ error or specify the consensus-rate requirement explicitly.

Common Mistake: Just Adding Noise at the End Is Not Private

Mistake:

A protocol computes the exact consensus average over a graph, then each node adds Laplace noise to the final value. This is claimed to be differentially private.

Correction:

The noise must be added before any value leaves the node — consensus itself leaks information through intermediate messages. Correct differential privacy for distributed estimation requires noise injection at each round, or a centralized trusted aggregator that adds noise at the end.

Historical Note: From Polyak to Olfati-Saber: A Consensus Timeline

1974–present

The consensus idea has deep roots. Morris DeGroot (1974) proposed iterative averaging as a model of consensus belief formation in social networks. Boris Polyak and coauthors studied similar iterations in the 1980s control literature. Jadbabaie–Lin–Morse (2003), Moreau (2004), and Olfati-Saber–Murray (2004) put the modern formulation in place: distributed consensus over directed graphs with convergence tied to graph connectivity. Xiao and Boyd (2004) cast optimal weight design as a convex SDP, the algorithm you would actually run today.

Key Takeaway

Distributed estimation on a graph has a single fundamental quantity: $\lambda_2(\mathbf{W})$, the second-largest eigenvalue magnitude of the consensus matrix. It governs the rate of convergence, the fronthaul overhead of cell-free massive MIMO, and the noise amplification in private consensus. Graph design = eigenvalue design.

Quick Check

You have a ring graph of 100 sensor nodes and want $\|\mathbf{e}^{(t)}\|_2 / \|\mathbf{e}^{(0)}\|_2 < 0.01$. The second-largest eigenvalue of the Metropolis consensus matrix is $\lambda_2 \approx 0.9993$. Roughly how many consensus rounds are required?

About 10

About 100

About 1000

About 6600