Ferkans — Interactive Telecom Tutor

Why Over-the-Air Computation?

In the previous section, we saw that communication is the bottleneck for distributed learning when the total budget $B$ is small. But here is a remarkable idea: the wireless multiple access channel (MAC) already computes a sum. When $K$ users transmit simultaneously, the receiver observes $Y = \sum_{k=1}^K X_k + Z$. If each user wants to send its local gradient $g_k$, and the server only needs the average $\bar{g} = \frac{1}{K}\sum_k g_k$, then the MAC superposition is not interference; it is computation for free.

Over-the-air computation (AirComp) exploits the physics of the wireless channel to reduce the communication cost per aggregated value from $O(K)$ channel uses (separate transmissions) to $O(1)$ (simultaneous transmission). This is one of those beautiful cases where the "bug" of wireless communication (interference) becomes a feature.

Definition: Over-the-Air Computation (AirComp) Model

Consider $K$ users, each with a local value $s_k \in \mathbb{R}$. The users simultaneously transmit over a Gaussian MAC. User $k$ transmits $X_k = \alpha_k s_k$, where $\alpha_k$ is a power control coefficient. The receiver observes $Y = \sum_{k=1}^K h_k X_k + Z = \sum_{k=1}^K h_k \alpha_k s_k + Z$, where $h_k$ is the channel coefficient from user $k$ and $Z \sim \mathcal{N}(0, \sigma^2)$. If each user sets $\alpha_k = \eta / h_k$ (channel inversion, which requires $h_k \neq 0$), the receiver gets $Y = \eta \sum_{k=1}^K s_k + Z$, which is a noisy, scaled version of the desired sum $\sum_k s_k$.
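To make the model concrete, here is a minimal Python sketch of one channel use under channel inversion. It assumes real-valued channel gains that are perfectly known at the transmitters; the function name `aircomp_receive` and all numerical values are illustrative assumptions, not part of the definition above.

```python
import random

def aircomp_receive(s, h, eta, sigma, rng):
    """Simulate one channel use of the AirComp model with channel inversion."""
    # Each user pre-scales by alpha_k = eta / h_k (needs h_k != 0 and known CSI).
    alpha = [eta / hk for hk in h]
    x = [ak * sk for ak, sk in zip(alpha, s)]
    # The MAC superimposes the faded signals and adds Gaussian noise Z.
    z = rng.gauss(0.0, sigma)
    return sum(hk * xk for hk, xk in zip(h, x)) + z

rng = random.Random(0)
s = [0.5, -1.2, 2.0, 0.7]      # local values (illustrative)
h = [0.9, 0.4, 1.3, 0.6]       # channel gains (illustrative)
eta = 0.25

# With the noise switched off, the output is exactly eta * sum(s).
y_noiseless = aircomp_receive(s, h, eta, sigma=0.0, rng=rng)
y_noisy = aircomp_receive(s, h, eta, sigma=0.1, rng=rng)
print(y_noiseless, eta * sum(s), y_noisy)
```

Setting `sigma=0.0` isolates the superposition: the channel gains cancel and only the scaled sum remains, which is the whole point of channel inversion.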

Channel inversion requires channel state information (CSI) at the transmitter and wastes power when channels are in deep fade. The power constraint on user $k$ requires $\alpha_k^2 \mathbb{E}[s_k^2] \leq P_k$, so users with weak channels ($|h_k|$ small) may not be able to participate.

Theorem: MSE of Over-the-Air Aggregation

Under channel inversion with $\alpha_k = \eta/h_k$ and individual power constraint $P_k = P$ for all $k$, the optimal scaling factor and resulting MSE for estimating $\bar{s} = \frac{1}{K}\sum_k s_k$ are $\eta^* = \frac{\sqrt{P}}{\sigma_s} \min_k |h_k|, \qquad \text{MSE} = \frac{\sigma^2}{K^2 (\eta^*)^2} = \frac{\sigma^2 \sigma_s^2}{K^2 P \cdot (\min_k |h_k|)^2}$, where $\sigma_s^2 = \mathbb{E}[s_k^2]$.

The MSE is determined by the weakest user (the one with the smallest $|h_k|$), because channel inversion forces all users to match the weakest link. This is the price of simultaneous transmission: we gain a factor of $K$ in communication efficiency but are limited by the worst channel. In fading environments, the MSE scales as $1/(\min_k |h_k|)^2$, which can be severe.
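To get a feel for how severe the weakest-link penalty can be, the following sketch estimates the median of $\min_k |h_k|^2$ by simulation. It assumes i.i.d. Rayleigh fading with unit average power, so that $|h_k|^2$ is exponentially distributed with mean 1; this fading model and the function name `median_min_gain` are assumptions made for illustration.

```python
import math
import random
import statistics

def median_min_gain(K, trials, rng):
    """Median over `trials` draws of min_k |h_k|^2 for K i.i.d. Rayleigh users."""
    mins = []
    for _ in range(trials):
        # Assumption: unit-power Rayleigh fading, so |h_k|^2 ~ Exponential(1).
        gains = [rng.expovariate(1.0) for _ in range(K)]
        mins.append(min(gains))
    return statistics.median(mins)

rng = random.Random(0)
medians = {K: median_min_gain(K, trials=4000, rng=rng) for K in (1, 10, 100)}
for K, m in medians.items():
    # The minimum of K Exponential(1) variables is Exponential(K): median ln(2)/K.
    print(f"K={K:3d}  simulated median={m:.4f}  theory={math.log(2) / K:.4f}")
```

Under this assumption the typical weakest power gain shrinks like $1/K$, so the factor $1/(\min_k |h_k|)^2$ in the MSE typically grows in proportion to $K$ as more users join.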

Proof

Channel inversion power constraint

User $k$ transmits $X_k = (\eta/h_k) s_k$. The power constraint requires $\mathbb{E}[X_k^2] = \frac{\eta^2}{|h_k|^2} \sigma_s^2 \leq P$. This gives $\eta \leq \frac{\sqrt{P} |h_k|}{\sigma_s}$ for all $k$. The tightest constraint comes from the weakest user. Since the noise term in the estimate shrinks as $\eta$ grows, the optimal choice meets this constraint with equality: $\eta^* = \frac{\sqrt{P} \min_k |h_k|}{\sigma_s}$.

Compute the MSE

The receiver estimates $\hat{\bar{s}} = Y/(K\eta^*)$. The estimation error is $\hat{\bar{s}} - \bar{s} = \frac{Z}{K\eta^*}$, so $\text{MSE} = \frac{\sigma^2}{K^2 (\eta^*)^2}$. Substituting $(\eta^*)^2 = P (\min_k |h_k|)^2 / \sigma_s^2$ gives $\text{MSE} = \frac{\sigma^2 \sigma_s^2}{K^2 P (\min_k |h_k|)^2}$.
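The two proof steps can be checked numerically. The sketch below computes $\eta^*$ from the power constraint, evaluates $\sigma^2 / (K^2 (\eta^*)^2)$, and compares it with a Monte Carlo estimate of the squared error. It assumes real channel gains and zero-mean Gaussian $s_k$ with variance $\sigma_s^2$; the function names and parameter values are illustrative assumptions.

```python
import math
import random

def eta_star(h, P, sigma_s):
    """Largest common scaling that satisfies every user's power constraint."""
    return math.sqrt(P) * min(abs(hk) for hk in h) / sigma_s

def mse_closed_form(h, P, sigma_s, sigma):
    """MSE = sigma^2 / (K^2 * eta_star^2), as derived in the proof."""
    K = len(h)
    return sigma ** 2 / (K ** 2 * eta_star(h, P, sigma_s) ** 2)

def mse_monte_carlo(h, P, sigma_s, sigma, trials, rng):
    """Empirical squared error of the estimate Y / (K * eta_star)."""
    K = len(h)
    eta = eta_star(h, P, sigma_s)
    total = 0.0
    for _ in range(trials):
        s = [rng.gauss(0.0, sigma_s) for _ in range(K)]   # assumed Gaussian data
        y = sum(hk * (eta / hk) * sk for hk, sk in zip(h, s)) + rng.gauss(0.0, sigma)
        total += (y / (K * eta) - sum(s) / K) ** 2
    return total / trials

h = [1.0, 0.5, 0.8, 1.2]          # illustrative gains; the weakest is 0.5
P, sigma_s, sigma = 1.0, 1.0, 0.1
cf = mse_closed_form(h, P, sigma_s, sigma)
mc = mse_monte_carlo(h, P, sigma_s, sigma, trials=20000, rng=random.Random(0))
print(f"eta*={eta_star(h, P, sigma_s):.3f}  closed form={cf:.5f}  Monte Carlo={mc:.5f}")
```

With these values $\eta^* = 0.5$, so the closed form evaluates to $0.01 / (16 \cdot 0.25) = 0.0025$, and the simulated error should land close to it.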

Definition: Computation Capacity

For a $K$-user MAC $Y = \sum_{k=1}^K X_k + Z$ with individual power constraint $P$ and $Z \sim \mathcal{N}(0, \sigma^2)$, the computation capacity for the function $f(s_1, \ldots, s_K) = \sum_k s_k$ is defined as the maximum rate (in function values per channel use) at which the receiver can reliably compute $f$: $C_{\text{comp}} = \sup \{R : \exists \text{ scheme with } P_e \to 0 \text{ at rate } R\}$. For the sum function over the Gaussian MAC, $C_{\text{comp}} = \frac{1}{2}\log\!\left(1 + \frac{KP}{\sigma^2}\right)$.

Notice that the computation capacity equals the sum-rate capacity of the Gaussian MAC (with all users cooperating). This is because computing a sum is "aligned" with the channel's natural operation. For other functions (e.g., maximum, XOR), the computation capacity can be strictly less than the sum-rate capacity.

Theorem: Computation Rate for Nomographic Functions

A function $f(s_1, \ldots, s_K)$ is nomographic if it can be written as $f(s_1, \ldots, s_K) = \psi\!\left(\sum_{k=1}^K \phi_k(s_k)\right)$ for some pre-processing functions $\phi_k$ and post-processing function $\psi$. For nomographic functions over a Gaussian MAC, the computation capacity is $C_{\text{comp}} = \frac{1}{2}\log\!\left(1 + \frac{KP}{\sigma^2}\right)$, provided each user transmits $X_k = \phi_k(s_k)$ and the receiver applies $\psi$ to the noisy sum.

Nomographic functions are precisely those that "match" the MAC structure. The MAC computes sums, and if the desired function can be decomposed as a sum after pre-processing, we get the computation for free. This includes weighted sums (federated averaging), geometric means (via logarithm), and polynomial functions of degree one.
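As a small illustration of the decomposition, the sketch below computes an arithmetic mean and a geometric mean through the same "pre-process, sum, post-process" pipeline. It assumes unit channel gains and ignores the power constraint; the helper `aircomp_nomographic` is a hypothetical name introduced here.

```python
import math
import random

def aircomp_nomographic(values, phi, psi, sigma, rng):
    """Compute psi(sum_k phi(s_k) + Z): the channel performs the summation."""
    # Each user applies phi locally; the MAC adds the results plus noise.
    y = sum(phi(v) for v in values) + rng.gauss(0.0, sigma)
    # The receiver applies psi to the (noisy) sum.
    return psi(y)

values = [2.0, 8.0, 4.0]
K = len(values)
rng = random.Random(1)

# Arithmetic mean: phi is the identity, psi divides by K.
arith = aircomp_nomographic(values, lambda v: v, lambda y: y / K, 0.0, rng)

# Geometric mean: phi = log, psi = exp(y / K). Requires positive inputs.
geo = aircomp_nomographic(values, math.log, lambda y: math.exp(y / K), 0.0, rng)

print(arith, geo)
```

The noise level is set to zero here so that the two decompositions can be checked against the exact means; with `sigma > 0` the post-processing $\psi$ is applied to a noisy sum, so a nonlinear $\psi$ such as the exponential also distorts the noise.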

Proof

Achievability

User $k$ transmits $X_k = \phi_k(s_k)$ with power $\mathbb{E}[\phi_k(s_k)^2] \leq P$. The receiver observes $Y = \sum_k \phi_k(s_k) + Z$, which is the MAC output. Applying $\psi$ (a deterministic function) to a noisy version of $\sum_k \phi_k(s_k)$ yields the function value with distortion determined by the MAC capacity.

Converse

The function value $f$ is a deterministic function of $(s_1, \ldots, s_K)$, so $H(f | s_1, \ldots, s_K) = 0$. By Fano's inequality, reliable computation requires $I(Y^n; f^n) \geq nR - \epsilon_n$, and the MAC capacity provides the upper bound $I(Y^n; f^n) \leq n \cdot \frac{1}{2}\log(1 + KP/\sigma^2)$.

Example: AirComp for Federated Averaging

Consider $K = 100$ users performing federated SGD with $d = 1000$-dimensional gradients. The uplink is a Gaussian MAC with $\text{SNR} = 10$ dB per user. Compare the communication latency of (a) orthogonal TDMA (each user gets a dedicated slot) and (b) AirComp (all users transmit simultaneously).

Solution

TDMA baseline

Each user transmits its $d$-dimensional gradient in its own slot at rate $R_{k} = \frac{1}{2}\log_2(1 + \text{SNR}) = \frac{1}{2}\log_2(11) \approx 1.73$ bits per channel use. To transmit $32d = 32{,}000$ bits (32 bits per float), each user needs $32d / 1.73 \approx 18{,}500$ channel uses. Total: $K \times 18{,}500 = 1{,}850{,}000$ channel uses per round.

AirComp

All $K$ users transmit simultaneously. Each channel use computes one coordinate of the sum. The effective SNR for the sum is $K \cdot \text{SNR} = 1000$ (30 dB), so the computation rate is $\frac{1}{2}\log_2(1 + 1000) \approx 5.0$ bits per channel use. To compute $d = 1000$ coordinates at 32-bit precision, we need approximately $d \cdot 32/5.0 = 6{,}400$ channel uses per round.

Speedup

The speedup is $1{,}850{,}000 / 6{,}400 \approx 289\times$. This is the number of users $K = 100$ multiplied by the rate gain $5.0/1.73 \approx 2.89$: AirComp avoids the $K$-fold overhead of orthogonal access and also benefits from the higher effective SNR of the superimposed signal. AirComp thus converts the MAC from a communication bottleneck into a computation accelerator.
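The arithmetic of this example is easy to reproduce. The sketch below follows the same accounting (32 bits per coordinate, rates in bits per channel use with base-2 logarithms); the variable names are illustrative. Because it uses unrounded rates, its figures differ slightly from the rounded ones in the text.

```python
import math

def rate_bits(snr_linear):
    """Gaussian-channel rate 0.5 * log2(1 + SNR) in bits per channel use."""
    return 0.5 * math.log2(1.0 + snr_linear)

K = 100                  # number of users
d = 1000                 # gradient dimension
bits_per_coord = 32      # 32-bit floats, as in the example
snr = 10 ** (10 / 10)    # 10 dB per user -> linear SNR of 10

# (a) TDMA: every user sends 32*d bits in its own slot.
tdma_uses = K * d * bits_per_coord / rate_bits(snr)

# (b) AirComp: one simultaneous transmission at the effective SNR K * SNR.
aircomp_uses = d * bits_per_coord / rate_bits(K * snr)

speedup = tdma_uses / aircomp_uses
print(f"TDMA: {tdma_uses:,.0f} uses, AirComp: {aircomp_uses:,.0f} uses, "
      f"speedup: {speedup:.0f}x")
```

The speedup factorizes as $K$ times the ratio of the two rates, which makes the two sources of gain explicit: no $K$-fold repetition, and a higher rate per channel use.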

AirComp MSE vs. Number of Users

Compare the MSE of over-the-air computation vs. orthogonal TDMA for federated averaging as a function of the number of users, SNR, and channel fading model.

Parameters: max number of users (100), per-user SNR (10 dB), fading model.

🎓 CommIT Contribution (2020)

Over-the-Air Computation for Federated Learning

G. Caire — IEEE Communications Magazine

The CommIT group has contributed to the information-theoretic foundations of over-the-air computation for federated learning, analyzing the computation capacity of the MAC when users need to aggregate gradient updates rather than decode individual messages. This work shows that the natural superposition property of the wireless channel can be exploited to achieve order-$K$ speedup over orthogonal access, fundamentally changing the communication architecture for distributed learning over wireless networks.

over-the-air computation, federated learning, multiple access channel, function computation

🚨 Critical Engineering Note

Synchronization Requirements for AirComp

Over-the-air computation requires tight symbol-level synchronization among all $K$ users. If user $k$ has a timing offset $\tau_k$, the received signal becomes $Y(t) = \sum_k h_k X_k(t - \tau_k) + Z(t)$, and the desired sum is corrupted by inter-symbol interference. For AirComp to work in practice:

• Timing offsets must be within a fraction of the symbol period (typically $< T_s/10$)
• Phase synchronization is needed for coherent combining (or differential encoding for non-coherent)
• The server must broadcast a synchronization beacon, and users must pre-compensate for round-trip delay

These requirements are similar to those of uplink MU-MIMO with matched filter reception.

Practical Constraints

• Symbol-level timing synchronization across all users
• Phase coherence or differential encoding
• CSI at the transmitter for channel inversion

Common Mistake: Channel Inversion Amplifies Fading

Mistake:

Using channel inversion $\alpha_k = \eta/h_k$ without accounting for the power penalty when some users experience deep fades.

Correction:

With Rayleigh fading, $|h_k|$ can be arbitrarily small, making $\alpha_k$ and the required power arbitrarily large. In practice, users in deep fade must be excluded from the current round (truncated channel inversion) or assigned zero weight. This introduces bias in the gradient estimate, which must be corrected. Alternative approaches include MMSE-based aggregation that trades off bias and variance.
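A minimal sketch of truncated channel inversion is given below. It assumes real channel gains, a fixed gain threshold, and a receiver that normalizes by the number of active users; these choices, along with the function name `truncated_inversion_estimate`, are assumptions for illustration rather than a prescribed design.

```python
import math
import random

def truncated_inversion_estimate(s, h, P, sigma_s, sigma, threshold, rng):
    """Estimate the average using only users with |h_k| >= threshold."""
    active = [k for k, hk in enumerate(h) if abs(hk) >= threshold]
    if not active:
        return None, active  # nobody can participate in this round
    # The common scaling is set by the weakest *active* user only.
    eta = math.sqrt(P) * min(abs(h[k]) for k in active) / sigma_s
    y = sum(h[k] * (eta / h[k]) * s[k] for k in active) + rng.gauss(0.0, sigma)
    # Assumption: the receiver normalizes by the number of active users.
    return y / (len(active) * eta), active

s = [1.0, 2.0, 3.0, 10.0]
h = [1.0, 0.8, 0.9, 0.01]      # the last user is in a deep fade
est, active = truncated_inversion_estimate(
    s, h, P=1.0, sigma_s=1.0, sigma=0.0, threshold=0.1, rng=random.Random(0))
true_mean = sum(s) / len(s)
print(active, est, true_mean)
```

In this noiseless example the estimate averages only the three active users and comes out near 2.0, while the true mean over all four users is 4.0. That gap is the bias the correction refers to. In exchange, the scaling factor rises from $\eta = 0.01$ (all users) to $\eta = 0.8$ (active users only), which greatly reduces the noise amplification.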

Why This Matters: AirComp and Massive MIMO

With a multi-antenna base station ($M$ antennas), AirComp can be enhanced using spatial multiplexing. The server can simultaneously compute $M$ independent function values (e.g., $M$ coordinates of the gradient sum) per channel use, providing an additional $M$-fold speedup. See Book MIMO for the capacity analysis of the multi-antenna MAC.

Quick Check

In over-the-air computation with channel inversion, what determines the MSE floor?

The average channel gain across all users

The weakest user's channel gain $\min_k |h_k|$

The number of users $K$

The noise variance $\sigma^2$ only

Correction:

The weakest user's channel gain $\min_k |h_k|$

The MSE scales as $1/(\min_k |h_k|)^2$ because the common scaling factor $\eta$, and hence the received signal power, is limited by the user with the smallest channel gain.

Over-the-Air Computation (AirComp)

A communication scheme that exploits the superposition property of the wireless MAC to compute functions (typically sums) of distributed data, avoiding the need for separate user transmissions.

Nomographic Function

A function $f(s_1, \ldots, s_K) = \psi(\sum_k \phi_k(s_k))$ that decomposes into pre-processing, summation, and post-processing, making it naturally computable over a MAC.

Over-the-Air Computation: Interference as a Feature

Animates $K$ users transmitting simultaneously over a MAC, showing how the natural superposition $Y = \sum_k h_k \alpha_k g_k + Z$ computes the desired sum for free. Channel inversion ensures coherent combining.

Key Takeaway

Over-the-air computation transforms the wireless MAC from a communication bottleneck into a computation resource. For nomographic functions like gradient averaging, AirComp achieves order-$K$ speedup over orthogonal access. The practical challenges (synchronization, fading, and power control) are significant but tractable with existing MIMO techniques. This paradigm shift from "communicate then compute" to "compute while communicating" is central to the design of next-generation federated learning systems.
