Over-the-Air Computation

From Communication to Computation Over the Air

Classical multiple access (FDMA, TDMA, CDMA, OFDMA) is designed to separate users' signals so that each can be decoded individually. But many emerging applications do not need individual messages; they need an aggregated function of all users' data:

  • Federated learning (FL): The server needs the average gradient $\bar{\mathbf{g}} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{g}_k$ from $K$ edge devices, not each individual gradient $\mathbf{g}_k$.

  • Distributed sensing / IoT: A fusion centre needs the mean temperature, maximum pollution level, or sum of sensor readings.

  • Consensus and distributed optimisation: Agents need to compute weighted averages of their local states.

In all these cases, the multiple-access channel (MAC) is not an obstacle to be overcome; it is an asset. The superposition property of the wireless channel naturally computes a sum:

$$y = \sum_{k=1}^{K} h_k x_k + n$$

Over-the-air computation (AirComp) exploits this superposition to compute the desired function in a single channel use, regardless of $K$. This stands in stark contrast to conventional orthogonal access, which requires $K$ channel uses (one per device).

Definition:

Over-the-Air Computation (AirComp)

Consider $K$ single-antenna devices, each holding a local value $s_k \in \mathbb{R}$ (e.g., a gradient component, a sensor reading). The server (equipped with a single antenna) wishes to compute the arithmetic mean:

$$g(s_1, \ldots, s_K) = \frac{1}{K}\sum_{k=1}^{K} s_k$$

Transmission protocol:

  1. Each device $k$ knows its own channel coefficient $h_k$ (via downlink pilots) and transmits the pre-equalised signal $x_k = \eta_k\, s_k$ with $\eta_k = \frac{\sqrt{p_0}}{h_k}$, where $p_0$ is a common power-scaling factor chosen to satisfy the per-device power constraints.

  2. The server receives: $y = \sum_{k=1}^{K} h_k x_k + n = \sqrt{p_0} \sum_{k=1}^{K} s_k + n$

  3. The server estimates: $\hat{g} = \frac{y}{K\sqrt{p_0}} = \frac{1}{K}\sum_{k=1}^{K} s_k + \frac{n}{K\sqrt{p_0}}$

The mean squared error (MSE) of this estimate is:

$$\text{MSE} = \mathbb{E}\!\left[(\hat{g} - g)^2\right] = \frac{\sigma^2}{K^2 p_0}$$

where $\sigma^2$ is the variance of the receiver noise $n$.

Remarkably, the MSE decreases with the number of devices $K$ (noise averaging), whereas in orthogonal access the total communication latency increases linearly with $K$.

The key requirement is channel inversion at the transmitters: each device must pre-equalise its signal so that all signals arrive coherently aligned at the server. This requires accurate CSI at the transmitters and synchronised transmission.
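The three-step protocol can be checked numerically with a short simulation. This is a sketch under simplifying assumptions (real-valued channels and symbols, a fixed $p_0$ with power constraints ignored; all names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def aircomp_mean(s, h, p0, sigma):
    """One AirComp round: channel-inversion transmit, superposition, estimate."""
    K = len(s)
    x = np.sqrt(p0) / h * s                             # step 1: x_k = (sqrt(p0)/h_k) s_k
    y = np.sum(h * x) + sigma * rng.standard_normal()   # step 2: y = sqrt(p0) sum_k s_k + n
    return y / (K * np.sqrt(p0))                        # step 3: g_hat

K, sigma, p0 = 50, 1.0, 2.0
s = rng.standard_normal(K)        # local values held by the devices
h = 0.5 + rng.random(K)           # channels bounded away from zero

# Repeat the round many times to estimate the MSE empirically
err = np.array([aircomp_mean(s, h, p0, sigma) - s.mean() for _ in range(20000)])
print("empirical MSE:  ", np.mean(err**2))
print("sigma^2/(K^2 p0):", sigma**2 / (K**2 * p0))
```

The empirical MSE should match the $\sigma^2/(K^2 p_0)$ prediction, and the estimate is unbiased since every device's signal arrives coherently aligned.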


The Power Control Bottleneck

The channel inversion $\eta_k = \sqrt{p_0}/h_k$ means that a device with a weak channel (small $|h_k|$) must transmit at high power to compensate. If device $k$ has a power constraint $P_k$:

$$|\eta_k|^2 \, |s_k|^2 \leq P_k \implies p_0 \leq P_k |h_k|^2 / |s_k|^2$$

The common scaling factor $p_0$ is limited by the weakest device:

$$p_0 = \min_{k=1,\ldots,K} \frac{P_k |h_k|^2}{|s_k|^2}$$

This "weakest-link" bottleneck can severely degrade performance when channel conditions are heterogeneous. Mitigation strategies include:

  • Truncated channel inversion: Exclude devices with $|h_k|^2 < \gamma_{\text{th}}$ (accept some bias for lower MSE).
  • Multi-antenna receiver (MIMO AirComp): Use beamforming at the server to boost weak channels before aggregation.
  • RIS-assisted AirComp: Use a reconfigurable intelligent surface to reshape channels and reduce heterogeneity.
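A tiny numeric illustration of the weakest-link effect, using made-up power budgets, channels, and values:

```python
import numpy as np

# Hypothetical numbers: unit power budgets, three strong devices, one in a deep fade
P = np.array([1.0, 1.0, 1.0, 1.0])
h = np.array([1.0, 0.9, 1.1, 0.05])     # device 4 has |h| = 0.05
s = np.array([0.5, -0.3, 0.8, 0.2])

# p0 = min_k P_k |h_k|^2 / |s_k|^2  -- limited by the weakest device
ratios = P * np.abs(h)**2 / np.abs(s)**2
p0 = ratios.min()
print(p0)            # dominated by the faded device

# Dropping the faded device raises the common power level by roughly 30x
p0_strong = ratios[:3].min()
print(p0_strong)
```

A single deeply faded device drags $p_0$ down for everyone, which is exactly why the mitigation strategies above matter.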

AirComp with Truncated Channel Inversion

Input: Local values $\{s_k\}_{k=1}^{K}$, channel estimates $\{h_k\}_{k=1}^{K}$, power budgets $\{P_k\}_{k=1}^{K}$, truncation threshold $\gamma_\text{th}$
Output: Estimate $\hat{g}$ of the arithmetic mean $g = \frac{1}{K}\sum_k s_k$

1. Active device selection: $\mathcal{K} \leftarrow \{k : |h_k|^2 \geq \gamma_\text{th}\}$   // exclude devices with very weak channels
2. $K_a \leftarrow |\mathcal{K}|$   // number of active devices
3. Common power level: $p_0 \leftarrow \min_{k \in \mathcal{K}} P_k |h_k|^2 / \max(|s_k|^2, \epsilon)$
4. Transmit (each active device $k \in \mathcal{K}$): $x_k \leftarrow (\sqrt{p_0} / h_k)\, s_k$   // channel-inversion pre-equalisation
5. Receive (server): $y = \sum_{k \in \mathcal{K}} h_k x_k + n = \sqrt{p_0} \sum_{k \in \mathcal{K}} s_k + n$
6. Estimate: $\hat{g} \leftarrow y / (K_a \sqrt{p_0})$
7. Return $\hat{g}$

MSE: $\text{MSE} = \sigma^2/(K_a^2 p_0) + \text{Bias}^2$, where $\text{Bias} = \frac{1}{K_a}\sum_{k \in \mathcal{K}} s_k - \frac{1}{K}\sum_{k=1}^{K} s_k$ is the systematic error introduced by excluding the weak-channel devices.
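The algorithm translates directly into NumPy. The following sketch mirrors the numbered steps; the noise level, threshold, and Rayleigh channel model are illustrative choices:

```python
import numpy as np

def aircomp_truncated(s, h, P, gamma_th, sigma, rng, eps=1e-12):
    """Truncated-channel-inversion AirComp; returns (estimate, active mask)."""
    active = np.abs(h)**2 >= gamma_th                  # 1. active device selection
    Ka = int(active.sum())                             # 2. number of active devices
    if Ka == 0:
        return 0.0, active
    p0 = np.min(P[active] * np.abs(h[active])**2
                / np.maximum(np.abs(s[active])**2, eps))   # 3. common power level
    x = (np.sqrt(p0) / h[active]) * s[active]          # 4. channel inversion
    y = np.sum(h[active] * x) + sigma * rng.standard_normal()  # 5. superposition + noise
    return y / (Ka * np.sqrt(p0)), active              # 6.-7. estimate and return

rng = np.random.default_rng(1)
K = 40
s = rng.standard_normal(K)              # local values
h = rng.rayleigh(scale=1.0, size=K)     # fading magnitudes
P = np.ones(K)                          # unit power budgets

g_hat, active = aircomp_truncated(s, h, P, gamma_th=0.1, sigma=0.1, rng=rng)
print(g_hat, s.mean())
```

With $\sigma = 0$ the estimate equals the mean over the active set exactly, making the bias term visible in isolation.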

OTA Computation MSE vs Number of Devices

Observe how the MSE of over-the-air aggregation varies with the number of devices $K$. With perfect alignment, the MSE decreases as $1/K^2$ (noise averaging). Imperfect phase alignment (nonzero alignment error) and finite SNR create an MSE floor. Compare with orthogonal access, where communication latency grows linearly with $K$.


MIMO AirComp and Broadband Extensions

The single-antenna AirComp framework extends naturally to multi-antenna and OFDM settings:

MIMO AirComp: If the server has $M$ receive antennas and the devices have single antennas, the received signal is:

$$\mathbf{y} = \sum_{k=1}^{K} \mathbf{h}_k x_k + \mathbf{n} \in \mathbb{C}^{M}$$

The server applies a receive beamformer $\mathbf{w}$:

$$\hat{y} = \mathbf{w}^H \mathbf{y} = \sum_{k=1}^{K} (\mathbf{w}^H \mathbf{h}_k) x_k + \mathbf{w}^H\mathbf{n}$$

The joint optimisation of $\{\eta_k\}$ and $\mathbf{w}$ to minimise the MSE subject to per-device power constraints is non-convex but admits efficient alternating optimisation: fix $\mathbf{w}$ and optimise $\{\eta_k\}$ (convex), then fix $\{\eta_k\}$ and optimise $\mathbf{w}$ (closed-form MMSE receiver). The multi-antenna gain alleviates the weakest-link bottleneck by boosting weak channels through spatial combining.
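One way to sketch the alternating procedure, assuming unit-variance i.i.d. symbols and unit power budgets. The $\mathbf{w}$-step below is the exact closed-form MMSE receiver; the $\eta$-step uses a feasible channel inversion of the effective channels $\mathbf{w}^H \mathbf{h}_k$ rather than the exact convex solution:

```python
import numpy as np

rng = np.random.default_rng(2)
K, M, sigma2 = 8, 4, 0.5
# i.i.d. complex Gaussian channels h_k (columns of H), hypothetical setup
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
P = np.ones(K)                      # unit power budgets

def moments(eta):
    """Second-order moments of y for fixed transmit scalings eta."""
    F = H * eta                                # column k is h_k * eta_k
    A = F @ F.conj().T + sigma2 * np.eye(M)    # A = E[y y^H]
    b = F.sum(axis=1) / K                      # b = E[conj(g) y], g = (1/K) sum_k s_k
    return A, b

def mse(w, eta):
    """E|w^H y - g|^2 for unit-variance i.i.d. s_k."""
    A, b = moments(eta)
    return np.real(w.conj() @ A @ w) - 2 * np.real(w.conj() @ b) + 1.0 / K

def eta_inversion(w):
    """Feasible eta-step: invert the effective channels g_k = w^H h_k."""
    g = w.conj() @ H
    p0 = np.min(P * np.abs(g)**2)              # weakest effective link sets the power
    return np.sqrt(p0) / g

def w_mmse(eta):
    """Closed-form MMSE receive beamformer for fixed eta (w-step)."""
    A, b = moments(eta)
    return np.linalg.solve(A, b)

w = np.ones(M, dtype=complex)
eta = eta_inversion(w)
print("initial MSE:", mse(w, eta))
for _ in range(5):                  # alternate the two steps
    w = w_mmse(eta)
    eta = eta_inversion(w)
print("after 5 alternations:", mse(w, eta))
```

The MMSE step never increases the MSE for a fixed $\{\eta_k\}$, which is what drives the alternating scheme.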

Broadband AirComp (OFDM): On subcarrier $m$, the received signal is $y_m = \sum_k h_{k,m} x_{k,m} + n_m$. Per-subcarrier channel inversion enables parallel aggregation across all subcarriers, so a high-dimensional vector (e.g., an entire gradient vector in FL) can be aggregated in a single OFDM symbol.
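A vectorised sketch of per-subcarrier aggregation, with real-valued stand-ins for the OFDM symbols and unit power budgets assumed:

```python
import numpy as np

rng = np.random.default_rng(3)
K, M, sigma = 10, 64, 0.05          # devices, subcarriers, noise std (illustrative)
S = rng.standard_normal((K, M))     # row k: device k's length-M vector (e.g. a gradient chunk)
H = rng.rayleigh(size=(K, M))       # per-device, per-subcarrier channel gains

# Weakest-link power level p0[m] = min_k |h_{k,m}|^2 / |s_{k,m}|^2, per subcarrier
p0 = (np.abs(H)**2 / np.abs(S)**2).min(axis=0)          # shape (M,)
X = np.sqrt(p0) / H * S             # per-subcarrier channel inversion
Y = (H * X).sum(axis=0) + sigma * rng.standard_normal(M)  # one OFDM symbol
g_hat = Y / (K * np.sqrt(p0))       # estimate of the per-subcarrier mean
print(np.max(np.abs(g_hat - S.mean(axis=0))))
```

All $M$ entries of the mean vector are recovered from a single symbol; without noise the recovery is exact.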

AirComp for Federated Learning

The most prominent application of AirComp is wireless federated learning (FL), where $K$ devices collaboratively train a shared model without exchanging raw data. In each FL round:

  1. The server broadcasts the current global model $\mathbf{w}_t$.
  2. Each device $k$ computes a local gradient $\mathbf{g}_k$ on its private data.
  3. The devices transmit $\mathbf{g}_k$ via AirComp; the server receives $\hat{\bar{\mathbf{g}}} \approx \frac{1}{K}\sum_k \mathbf{g}_k$.
  4. The server updates: $\mathbf{w}_{t+1} = \mathbf{w}_t - \alpha \hat{\bar{\mathbf{g}}}$.

The MSE of the AirComp aggregation acts as gradient noise, which is analogous to stochastic gradient noise and can be absorbed into the convergence analysis. Under mild conditions, AirComp-based FL converges at the same rate as ideal (noiseless) FL up to an SNR-dependent constant.

The communication-efficiency gain is dramatic: conventional orthogonal FL requires $K$ time slots per round (or $K/B$ with bandwidth $B$), while AirComp uses a single time slot regardless of $K$. For $K = 100$ devices, this is a $100\times$ latency reduction per FL round.
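A toy end-to-end sketch of the four-step FL round for distributed linear regression, with the AirComp aggregation abstracted as a noisy mean whose residual noise level is a hypothetical parameter standing in for $n/(K\sqrt{p_0})$:

```python
import numpy as np

rng = np.random.default_rng(4)
K, d, n_local = 20, 5, 30
w_true = rng.standard_normal(d)

# Each device holds a private local dataset (X_k, y_k)
data = []
for _ in range(K):
    X = rng.standard_normal((n_local, d))
    y = X @ w_true + 0.01 * rng.standard_normal(n_local)
    data.append((X, y))

def aircomp_aggregate(grads, noise_std=0.01):
    """Mean of the K local gradients plus residual AirComp noise,
    abstracted into one effective std (hypothetical value)."""
    g_bar = np.mean(grads, axis=0)
    return g_bar + (noise_std / len(grads)) * rng.standard_normal(g_bar.shape)

w, alpha = np.zeros(d), 0.1
for t in range(200):
    # steps 1-2: broadcast w, compute local least-squares gradients
    grads = [X.T @ (X @ w - y) / n_local for X, y in data]
    # steps 3-4: over-the-air aggregation, then global update
    w = w - alpha * aircomp_aggregate(grads)
print(np.linalg.norm(w - w_true))
```

Despite the aggregation noise in every round, the iterates converge to a small neighbourhood of the true model, illustrating how the AirComp MSE behaves like benign gradient noise.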

Open Research Directions in AirComp

AirComp is a vibrant research area with several open problems:

  • Beyond arithmetic mean: Computing other functions (max, min, geometric mean, polynomial functions) over the air requires nonlinear pre-processing and is generally harder. Nazer and Gastpar's computation coding framework (2007) provides information-theoretic foundations using nested lattice codes.

  • Asynchronous AirComp: In practice, devices cannot be perfectly synchronised. Timing offsets cause inter-carrier interference in OFDM AirComp. Robust designs using guard intervals and timing-error-aware equalisation are needed.

  • Privacy: While FL avoids sharing raw data, the transmitted signals $x_k = \eta_k s_k$ leak information about $s_k$. Differential-privacy noise injection at each device is compatible with AirComp but degrades the MSE; this privacy-utility trade-off is an active research direction.

  • Heterogeneous computing: Devices with different computing capabilities produce gradients at different rates (stragglers). Combining AirComp with partial-participation FL and coded computing is an emerging topic.

Over-the-Air Computation: Channel Superposition

Visualise how $K$ devices simultaneously transmit pre-equalised signals that coherently combine at the server antenna, producing a noisy sum of the local values in a single channel use.
Five devices transmit their local values; the wireless channel naturally computes the sum. The server divides by $K$ to estimate the arithmetic mean, all in one time slot.

Why This Matters: AirComp and Secure Computation in the SC Book

The SC book (Chapters 8--10) develops AirComp in the context of secure and private computation, including differential privacy guarantees for federated learning, Byzantine-resilient aggregation (ByzSecAgg by Jahani-Nezhad/Maddah-Ali/Caire), and coded computing for straggler mitigation. The ITA book (Chapter 28) provides the information-theoretic foundations of computation over MACs, connecting Nazer-Gastpar's lattice coding framework to practical AirComp system design.

See full treatment in Model-Based vs Data-Driven Design

Over-the-Air Computation (AirComp)

A transmission technique that exploits the superposition property of the wireless MAC to compute aggregate functions (e.g., arithmetic mean) of distributed devices' data in a single channel use, regardless of the number of devices.

Related: Federated Learning (FL)

Federated Learning (FL)

A distributed machine learning paradigm where devices collaboratively train a shared model by exchanging gradient updates (not raw data) with a central server. AirComp enables efficient gradient aggregation over the air.

Related: Over-the-Air Computation (AirComp)