Over-the-Air Computation

From Communication to Computation Over the Air

Classical multiple access (FDMA, TDMA, CDMA, OFDMA) is designed to separate users' signals so that each can be decoded individually. But many emerging applications do not need individual messages; they need an aggregated function of all users' data:

  • Federated learning (FL): The server needs the average gradient $\bar{\mathbf{g}} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{g}_k$ from $K$ edge devices, not each individual gradient $\mathbf{g}_k$.

  • Distributed sensing / IoT: A fusion centre needs the mean temperature, maximum pollution level, or sum of sensor readings.

  • Consensus and distributed optimisation: Agents need to compute weighted averages of their local states.

In all these cases, the multiple-access channel (MAC) is not an obstacle to be overcome; it is an asset. The superposition property of the wireless channel naturally computes a sum:

$$y = \sum_{k=1}^{K} h_k x_k + n$$

Over-the-air computation (AirComp) exploits this superposition to compute the desired function in a single channel use, regardless of $K$. This stands in stark contrast to conventional orthogonal access, which requires $K$ channel uses (one per device).

Definition:

Over-the-Air Computation (AirComp)

Consider $K$ single-antenna devices, each holding a local value $s_k \in \mathbb{R}$ (e.g., a gradient component, a sensor reading). The server (equipped with a single antenna) wishes to compute the arithmetic mean:

$$g(s_1, \ldots, s_K) = \frac{1}{K}\sum_{k=1}^{K} s_k$$

Transmission protocol:

  1. Each device $k$ knows its own channel coefficient $h_k$ (via downlink pilots) and transmits the pre-equalised signal $x_k = \eta_k\, s_k$ with $\eta_k = \frac{\sqrt{p_0}}{h_k}$, where $p_0$ is a common power-scaling factor chosen to satisfy the per-device power constraints.

  2. The server receives: $y = \sum_{k=1}^{K} h_k x_k + n = \sqrt{p_0} \sum_{k=1}^{K} s_k + n$

  3. The server estimates: $\hat{g} = \frac{y}{K\sqrt{p_0}} = \frac{1}{K}\sum_{k=1}^{K} s_k + \frac{n}{K\sqrt{p_0}}$

The mean squared error (MSE) of this estimate is:

$$\text{MSE} = \mathbb{E}\!\left[(\hat{g} - g)^2\right] = \frac{\sigma^2}{K^2 p_0}$$

where $\sigma^2$ is the variance of the receiver noise $n$.

Remarkably, the MSE decreases with the number of devices $K$ (noise averaging), whereas in orthogonal access the total communication latency increases linearly with $K$.

The key requirement is channel inversion at the transmitters: each device must pre-equalise its signal so that all signals arrive coherently aligned at the server. This requires accurate CSI at the transmitters and synchronised transmission.
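The three-step protocol can be checked numerically with a short simulation. This is a sketch under simplifying assumptions (real-valued channels and symbols, a fixed $p_0$ with power constraints ignored; all names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def aircomp_mean(s, h, p0, sigma):
    """One AirComp round: channel-inversion transmit, superposition, estimate."""
    K = len(s)
    x = np.sqrt(p0) / h * s                             # step 1: x_k = (sqrt(p0)/h_k) s_k
    y = np.sum(h * x) + sigma * rng.standard_normal()   # step 2: y = sqrt(p0) sum_k s_k + n
    return y / (K * np.sqrt(p0))                        # step 3: g_hat

K, sigma, p0 = 50, 1.0, 2.0
s = rng.standard_normal(K)        # local values held by the devices
h = 0.5 + rng.random(K)           # channels bounded away from zero

# Repeat the round many times to estimate the MSE empirically
err = np.array([aircomp_mean(s, h, p0, sigma) - s.mean() for _ in range(20000)])
print("empirical MSE:  ", np.mean(err**2))
print("sigma^2/(K^2 p0):", sigma**2 / (K**2 * p0))
```

The empirical MSE should match the $\sigma^2/(K^2 p_0)$ prediction, and the estimate is unbiased since every device's signal arrives coherently aligned.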


The Power Control Bottleneck

The channel inversion $\eta_k = \sqrt{p_0}/h_k$ means that a device with a weak channel (small $|h_k|$) must transmit at high power to compensate. If device $k$ has a power constraint $P_k$:

$$|\eta_k|^2 \, |s_k|^2 \leq P_k \implies p_0 \leq P_k |h_k|^2 / |s_k|^2$$

The common scaling factor $p_0$ is limited by the weakest device:

$$p_0 = \min_{k=1,\ldots,K} \frac{P_k |h_k|^2}{|s_k|^2}$$

This "weakest-link" bottleneck can severely degrade performance when channel conditions are heterogeneous. Mitigation strategies include:

  • Truncated channel inversion: Exclude devices with $|h_k|^2 < \gamma_{\text{th}}$ (accept some bias for lower MSE).
  • Multi-antenna receiver (MIMO AirComp): Use beamforming at the server to boost weak channels before aggregation.
  • RIS-assisted AirComp: Use a reconfigurable intelligent surface to reshape channels and reduce heterogeneity.
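A tiny numeric illustration of the weakest-link effect, using made-up power budgets, channels, and values:

```python
import numpy as np

# Hypothetical numbers: unit power budgets, three strong devices, one in a deep fade
P = np.array([1.0, 1.0, 1.0, 1.0])
h = np.array([1.0, 0.9, 1.1, 0.05])     # device 4 has |h| = 0.05
s = np.array([0.5, -0.3, 0.8, 0.2])

# p0 = min_k P_k |h_k|^2 / |s_k|^2  -- limited by the weakest device
ratios = P * np.abs(h)**2 / np.abs(s)**2
p0 = ratios.min()
print(p0)            # dominated by the faded device

# Dropping the faded device raises the common power level by roughly 30x
p0_strong = ratios[:3].min()
print(p0_strong)
```

A single deeply faded device drags $p_0$ down for everyone, which is exactly why the mitigation strategies above matter.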

AirComp with Truncated Channel Inversion

Input: Local values $\{s_k\}_{k=1}^{K}$, channel estimates $\{h_k\}_{k=1}^{K}$, power budgets $\{P_k\}_{k=1}^{K}$, truncation threshold $\gamma_\text{th}$
Output: Estimate $\hat{g}$ of the arithmetic mean $g = \frac{1}{K}\sum_k s_k$

1. Active device selection: $\mathcal{K} \leftarrow \{k : |h_k|^2 \geq \gamma_\text{th}\}$   // exclude devices with very weak channels
2. $K_a \leftarrow |\mathcal{K}|$   // number of active devices
3. Common power level: $p_0 \leftarrow \min_{k \in \mathcal{K}} P_k |h_k|^2 / \max(|s_k|^2, \epsilon)$
4. Transmit (each active device $k \in \mathcal{K}$): $x_k \leftarrow (\sqrt{p_0} / h_k)\, s_k$   // channel-inversion pre-equalisation
5. Receive (server): $y = \sum_{k \in \mathcal{K}} h_k x_k + n = \sqrt{p_0} \sum_{k \in \mathcal{K}} s_k + n$
6. Estimate: $\hat{g} \leftarrow y / (K_a \sqrt{p_0})$
7. Return $\hat{g}$

MSE: $\text{MSE} = \sigma^2/(K_a^2 p_0) + \text{Bias}^2$, where $\text{Bias} = \frac{1}{K_a}\sum_{k \in \mathcal{K}} s_k - \frac{1}{K}\sum_{k=1}^{K} s_k$ is the systematic error introduced by excluding the weak-channel devices.
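The algorithm translates directly into NumPy. The following sketch mirrors the numbered steps; the noise level, threshold, and Rayleigh channel model are illustrative choices:

```python
import numpy as np

def aircomp_truncated(s, h, P, gamma_th, sigma, rng, eps=1e-12):
    """Truncated-channel-inversion AirComp; returns (estimate, active mask)."""
    active = np.abs(h)**2 >= gamma_th                  # 1. active device selection
    Ka = int(active.sum())                             # 2. number of active devices
    if Ka == 0:
        return 0.0, active
    p0 = np.min(P[active] * np.abs(h[active])**2
                / np.maximum(np.abs(s[active])**2, eps))   # 3. common power level
    x = (np.sqrt(p0) / h[active]) * s[active]          # 4. channel inversion
    y = np.sum(h[active] * x) + sigma * rng.standard_normal()  # 5. superposition + noise
    return y / (Ka * np.sqrt(p0)), active              # 6.-7. estimate and return

rng = np.random.default_rng(1)
K = 40
s = rng.standard_normal(K)              # local values
h = rng.rayleigh(scale=1.0, size=K)     # fading magnitudes
P = np.ones(K)                          # unit power budgets

g_hat, active = aircomp_truncated(s, h, P, gamma_th=0.1, sigma=0.1, rng=rng)
print(g_hat, s.mean())
```

With $\sigma = 0$ the estimate equals the mean over the active set exactly, making the bias term visible in isolation.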

OTA Computation MSE vs Number of Devices

Observe how the MSE of over-the-air aggregation varies with the number of devices $K$. With perfect alignment, the MSE decreases as $1/K^2$ (noise averaging). Imperfect phase alignment (nonzero alignment error) and finite SNR create an MSE floor. Compare with orthogonal access, where communication latency grows linearly with $K$.


MIMO AirComp and Broadband Extensions

The single-antenna AirComp framework extends naturally to multi-antenna and OFDM settings:

MIMO AirComp: If the server has $M$ receive antennas and the devices have single antennas, the received signal is:

$$\mathbf{y} = \sum_{k=1}^{K} \mathbf{h}_k x_k + \mathbf{n} \in \mathbb{C}^{M}$$

The server applies a receive beamformer $\mathbf{w}$:

$$\hat{y} = \mathbf{w}^H \mathbf{y} = \sum_{k=1}^{K} (\mathbf{w}^H \mathbf{h}_k) x_k + \mathbf{w}^H\mathbf{n}$$

The joint optimisation of $\{\eta_k\}$ and $\mathbf{w}$ to minimise the MSE subject to per-device power constraints is non-convex but admits efficient alternating optimisation: fix $\mathbf{w}$ and optimise $\{\eta_k\}$ (convex), then fix $\{\eta_k\}$ and optimise $\mathbf{w}$ (closed-form MMSE receiver). The multi-antenna gain alleviates the weakest-link bottleneck by boosting weak channels through spatial combining.
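One way to sketch the alternating procedure, assuming unit-variance i.i.d. symbols and unit power budgets. The $\mathbf{w}$-step below is the exact closed-form MMSE receiver; the $\eta$-step uses a feasible channel inversion of the effective channels $\mathbf{w}^H \mathbf{h}_k$ rather than the exact convex solution:

```python
import numpy as np

rng = np.random.default_rng(2)
K, M, sigma2 = 8, 4, 0.5
# i.i.d. complex Gaussian channels h_k (columns of H), hypothetical setup
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
P = np.ones(K)                      # unit power budgets

def moments(eta):
    """Second-order moments of y for fixed transmit scalings eta."""
    F = H * eta                                # column k is h_k * eta_k
    A = F @ F.conj().T + sigma2 * np.eye(M)    # A = E[y y^H]
    b = F.sum(axis=1) / K                      # b = E[conj(g) y], g = (1/K) sum_k s_k
    return A, b

def mse(w, eta):
    """E|w^H y - g|^2 for unit-variance i.i.d. s_k."""
    A, b = moments(eta)
    return np.real(w.conj() @ A @ w) - 2 * np.real(w.conj() @ b) + 1.0 / K

def eta_inversion(w):
    """Feasible eta-step: invert the effective channels g_k = w^H h_k."""
    g = w.conj() @ H
    p0 = np.min(P * np.abs(g)**2)              # weakest effective link sets the power
    return np.sqrt(p0) / g

def w_mmse(eta):
    """Closed-form MMSE receive beamformer for fixed eta (w-step)."""
    A, b = moments(eta)
    return np.linalg.solve(A, b)

w = np.ones(M, dtype=complex)
eta = eta_inversion(w)
print("initial MSE:", mse(w, eta))
for _ in range(5):                  # alternate the two steps
    w = w_mmse(eta)
    eta = eta_inversion(w)
print("after 5 alternations:", mse(w, eta))
```

The MMSE step never increases the MSE for a fixed $\{\eta_k\}$, which is what drives the alternating scheme.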

Broadband AirComp (OFDM): On subcarrier $m$, the received signal is $y_m = \sum_k h_{k,m} x_{k,m} + n_m$. Per-subcarrier channel inversion enables parallel aggregation across all subcarriers, so a high-dimensional vector (e.g., an entire gradient vector in FL) can be aggregated in a single OFDM symbol.
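A vectorised sketch of per-subcarrier aggregation, with real-valued stand-ins for the OFDM symbols and unit power budgets assumed:

```python
import numpy as np

rng = np.random.default_rng(3)
K, M, sigma = 10, 64, 0.05          # devices, subcarriers, noise std (illustrative)
S = rng.standard_normal((K, M))     # row k: device k's length-M vector (e.g. a gradient chunk)
H = rng.rayleigh(size=(K, M))       # per-device, per-subcarrier channel gains

# Weakest-link power level p0[m] = min_k |h_{k,m}|^2 / |s_{k,m}|^2, per subcarrier
p0 = (np.abs(H)**2 / np.abs(S)**2).min(axis=0)          # shape (M,)
X = np.sqrt(p0) / H * S             # per-subcarrier channel inversion
Y = (H * X).sum(axis=0) + sigma * rng.standard_normal(M)  # one OFDM symbol
g_hat = Y / (K * np.sqrt(p0))       # estimate of the per-subcarrier mean
print(np.max(np.abs(g_hat - S.mean(axis=0))))
```

All $M$ entries of the mean vector are recovered from a single symbol; without noise the recovery is exact.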

AirComp for Federated Learning

The most prominent application of AirComp is wireless federated learning (FL), where $K$ devices collaboratively train a shared model without exchanging raw data. In each FL round:

  1. The server broadcasts the current global model $\mathbf{w}_t$.
  2. Each device $k$ computes a local gradient $\mathbf{g}_k$ on its private data.
  3. The devices transmit $\mathbf{g}_k$ via AirComp; the server receives $\hat{\bar{\mathbf{g}}} \approx \frac{1}{K}\sum_k \mathbf{g}_k$.
  4. The server updates: $\mathbf{w}_{t+1} = \mathbf{w}_t - \alpha \hat{\bar{\mathbf{g}}}$.

The MSE of the AirComp aggregation acts as gradient noise, which is analogous to stochastic gradient noise and can be absorbed into the convergence analysis. Under mild conditions, AirComp-based FL converges at the same rate as ideal (noiseless) FL up to an SNR-dependent constant.

The communication-efficiency gain is dramatic: conventional orthogonal FL requires $K$ time slots per round (or $K/B$ with bandwidth $B$), while AirComp uses a single time slot regardless of $K$. For $K = 100$ devices, this is a $100\times$ latency reduction per FL round.
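A toy end-to-end sketch of the four-step FL round for distributed linear regression, with the AirComp aggregation abstracted as a noisy mean whose residual noise level is a hypothetical parameter standing in for $n/(K\sqrt{p_0})$:

```python
import numpy as np

rng = np.random.default_rng(4)
K, d, n_local = 20, 5, 30
w_true = rng.standard_normal(d)

# Each device holds a private local dataset (X_k, y_k)
data = []
for _ in range(K):
    X = rng.standard_normal((n_local, d))
    y = X @ w_true + 0.01 * rng.standard_normal(n_local)
    data.append((X, y))

def aircomp_aggregate(grads, noise_std=0.01):
    """Mean of the K local gradients plus residual AirComp noise,
    abstracted into one effective std (hypothetical value)."""
    g_bar = np.mean(grads, axis=0)
    return g_bar + (noise_std / len(grads)) * rng.standard_normal(g_bar.shape)

w, alpha = np.zeros(d), 0.1
for t in range(200):
    # steps 1-2: broadcast w, compute local least-squares gradients
    grads = [X.T @ (X @ w - y) / n_local for X, y in data]
    # steps 3-4: over-the-air aggregation, then global update
    w = w - alpha * aircomp_aggregate(grads)
print(np.linalg.norm(w - w_true))
```

Despite the aggregation noise in every round, the iterates converge to a small neighbourhood of the true model, illustrating how the AirComp MSE behaves like benign gradient noise.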

Open Research Directions in AirComp

AirComp is a vibrant research area with several open problems:

  • Beyond arithmetic mean: Computing other functions (max, min, geometric mean, polynomial functions) over the air requires nonlinear pre-processing and is generally harder. Nazer and Gastpar's computation coding framework (2007) provides information-theoretic foundations using nested lattice codes.

  • Asynchronous AirComp: In practice, devices cannot be perfectly synchronised. Timing offsets cause inter-carrier interference in OFDM AirComp. Robust designs using guard intervals and timing-error-aware equalisation are needed.

  • Privacy: While FL avoids sharing raw data, the transmitted signals $x_k = \eta_k s_k$ leak information about $s_k$. Differential-privacy noise injection at each device is compatible with AirComp but degrades the MSE; this privacy-utility trade-off is an active research direction.

  • Heterogeneous computing: Devices with different computing capabilities produce gradients at different rates (stragglers). Combining AirComp with partial-participation FL and coded computing is an emerging topic.

Over-the-Air Computation: Channel Superposition

Visualise how $K$ devices simultaneously transmit pre-equalised signals that coherently combine at the server antenna, producing a noisy sum of the local values in a single channel use.
Five devices transmit their local values; the wireless channel naturally computes the sum. The server divides by $K$ to estimate the arithmetic mean, all in one time slot.

Why This Matters: AirComp and Secure Computation in the SC Book

The SC book (Chapters 8--10) develops AirComp in the context of secure and private computation, including differential privacy guarantees for federated learning, Byzantine-resilient aggregation (ByzSecAgg by Jahani-Nezhad/Maddah-Ali/Caire), and coded computing for straggler mitigation. The ITA book (Chapter 28) provides the information-theoretic foundations of computation over MACs, connecting Nazer-Gastpar's lattice coding framework to practical AirComp system design.

See full treatment in Model-Based vs Data-Driven Design

Over-the-Air Computation (AirComp)

A transmission technique that exploits the superposition property of the wireless MAC to compute aggregate functions (e.g., arithmetic mean) of distributed devices' data in a single channel use, regardless of the number of devices.

Related: Federated Learning (FL)

Federated Learning (FL)

A distributed machine learning paradigm where devices collaboratively train a shared model by exchanging gradient updates (not raw data) with a central server. AirComp enables efficient gradient aggregation over the air.

Related: Over-the-Air Computation (AirComp)