The Wireless FL Pipeline
From Cloud FL to Wireless FL
Chapter 9 developed federated learning on a stylized communication model: in each round, every selected user uploads a gradient digitally to a central server over a dedicated orthogonal channel. This abstraction masks the wireless reality. Real deployments face heterogeneous and time-varying channels, limited uplink bandwidth, energy-constrained devices, contention with other traffic for the channel, and stringent delay budgets.
Wireless federated learning (wireless FL) is FL over a physical wireless channel. The design choices multiply: which users upload each round (scheduling), how much transmit power to spend (resource allocation), and whether to aggregate digitally (per Chapter 10) or in analog form over the air (AirComp, Chapter 16). Each choice affects the FL convergence rate, i.e. the number of rounds needed to reach the target accuracy, which is the real optimization objective.
The point is that wireless-FL joint design is a three-axis optimization: convergence (rounds), per-round MSE (aggregation fidelity), and per-round cost (channel uses, energy). This chapter develops the coupling, derives the convergence rate under bounded aggregation MSE, and closes with the CommIT contribution on information-theoretically secure federated representation learning.
Wireless Federated Learning (Generic Protocol)
Definition: The Wireless-FL Problem
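The wireless-FL problem is to learn a global parameter $\boldsymbol{\theta}$ minimizing the global loss $F(\boldsymbol{\theta})$, the average of the users' local losses $F_k(\boldsymbol{\theta})$, using gradient-based updates, under the following constraints: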
-
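Per-round aggregation noise. The server's estimate $\hat{\mathbf{g}}^{(t)}$ of the aggregate gradient $\mathbf{g}^{(t)}$ has mean-squared error $\epsilon_t^2 = \mathbb{E}\|\hat{\mathbf{g}}^{(t)} - \mathbf{g}^{(t)}\|^2$.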
-
Scheduling. A subset $\mathcal{S}_t$ of the users participates in round $t$; the rest are excluded.
-
Energy/bandwidth. Per-round bandwidth and per-user energy budgets must be respected.
-
Privacy (optional). If the aggregator is AirComp (Chapter 16), the MAC superposition is the privacy mechanism. If digital, cryptographic masking (Chapter 10) is required for information-theoretic privacy.
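The joint design variables for each round $t$ are the scheduled set $\mathcal{S}_t$, the transmit powers $\{p_k^{(t)}\}_{k \in \mathcal{S}_t}$, and the choice of aggregator (digital or AirComp); they are chosen to optimize the convergence objective.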
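As a minimal sketch of the generic protocol, the loop here assumes a least-squares loss at each user, uniform random scheduling of a fixed number of users per round, and an aggregator modeled abstractly as the exact average gradient plus Gaussian error whose expected squared norm equals a chosen MSE. The names `wireless_fl`, `noisy_aggregate`, `num_scheduled`, and `mse` are hypothetical; the point is only to show where scheduling and aggregation noise enter the update.

```python
import numpy as np


def local_gradient(theta, X, y):
    """Gradient of one user's least-squares loss (1 / 2n) * ||X @ theta - y||^2."""
    return X.T @ (X @ theta - y) / len(y)


def noisy_aggregate(grads, mse, rng):
    """Hypothetical aggregator model: true average gradient plus Gaussian error
    whose expected squared norm equals `mse` (the per-round aggregation MSE)."""
    g = np.mean(grads, axis=0)
    e = rng.normal(scale=np.sqrt(mse / g.size), size=g.shape)
    return g + e


def wireless_fl(users, d, rounds, num_scheduled, mse, lr=0.1, seed=0):
    """Generic wireless-FL loop: schedule, compute local gradients, aggregate, update."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)
    for _ in range(rounds):
        # Scheduling: draw the subset S_t of users that transmit in this round.
        scheduled = rng.choice(len(users), size=num_scheduled, replace=False)
        grads = [local_gradient(theta, *users[k]) for k in scheduled]
        # Aggregation: the server only sees an MSE-perturbed estimate of the average.
        g_hat = noisy_aggregate(grads, mse, rng)
        theta = theta - lr * g_hat
    return theta


# Usage: 20 users whose noiseless data share a common optimum theta_star.
rng = np.random.default_rng(1)
d = 5
theta_star = rng.normal(size=d)
users = []
for _ in range(20):
    X = rng.normal(size=(50, d))
    users.append((X, X @ theta_star))
theta = wireless_fl(users, d, rounds=500, num_scheduled=5, mse=0.0)
```

With `mse=0.0` the loop reduces to plain gradient descent on the scheduled users' average loss; a positive `mse` with a constant step size leaves the iterates fluctuating in a noise floor around the optimum rather than converging exactly.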
Digital vs. Analog Aggregation for Wireless FL
| Aspect | Digital (Ch. 10-style) | Analog (AirComp, Ch. 16) |
|---|---|---|
| Channel uses per round | $O(Kd)$: $K$ orthogonal slots of $O(d)$ symbols | $O(d)$: one analog superposition per coordinate |
| Aggregate MSE | Quantization + channel noise per user | Channel noise after pre-equalization (noise-limited floor) |
| Native privacy | None; cryptographic layer needed | Weak-asymptotic IT privacy |
| Scalability in $K$ | Poor: bandwidth scales linearly in $K$ | Excellent: bandwidth constant in $K$ |
| Sync requirement | Symbol-level per user | Symbol + carrier-phase across users |
| CSIT | Not required | Required for pre-equalization |
| Integrity (Byzantine) | Can detect (gradient checksums) | No integrity; any user can spoof the sum |
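To make the first two rows of the comparison concrete, here is a one-round sketch of each aggregator. It assumes real-valued channel gains (carrier phases already compensated), gradient entries in [-1, 1], a uniform `q_bits` quantizer, and an error-free coded digital link, so the digital error is quantization only. All function names and parameter values are hypothetical and chosen for illustration.

```python
import numpy as np


def digital_round(grads, q_bits, b_bits_per_symbol, clip=1.0):
    """Digital aggregation: each user sends a uniformly quantized gradient in its
    own orthogonal slot; the coded link is assumed error-free."""
    K, d = grads.shape
    step = 2 * clip / (2 ** q_bits - 1)               # quantizer step over [-clip, clip]
    clipped = np.clip(grads, -clip, clip)
    quantized = np.round((clipped + clip) / step) * step - clip
    symbols_per_user = int(np.ceil(d * q_bits / b_bits_per_symbol))
    return quantized.sum(axis=0), K * symbols_per_user  # channel uses grow with K


def aircomp_round(grads, gains, power, noise_var, rng):
    """Analog aggregation with zero-forcing pre-equalization over real-valued gains
    (phases assumed already compensated); entries of grads assumed in [-1, 1]."""
    K, d = grads.shape
    eta = power * np.min(gains) ** 2                  # weakest user sets the common scale
    tx = np.sqrt(eta) * grads / gains[:, None]        # per-symbol power <= power
    noise = rng.normal(scale=np.sqrt(noise_var), size=d)
    rx = (gains[:, None] * tx).sum(axis=0) + noise    # the MAC adds the signals
    return rx / np.sqrt(eta), d                       # d channel uses, independent of K


rng = np.random.default_rng(0)
K, d = 50, 1000
grads = rng.uniform(-1.0, 1.0, size=(K, d))
gains = rng.rayleigh(scale=1.0, size=K)
exact = grads.sum(axis=0)

g_dig, uses_dig = digital_round(grads, q_bits=8, b_bits_per_symbol=4)
g_ana, uses_ana = aircomp_round(grads, gains, power=10.0, noise_var=1.0, rng=rng)
mse_dig = np.mean((g_dig - exact) ** 2)
mse_ana = np.mean((g_ana - exact) ** 2)
```

On these settings the digital scheme needs far more channel uses but has a much smaller error, because AirComp's error is the receiver noise divided by a common scale that the weakest user's gain limits.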
When to Choose Which Aggregator
A pragmatic rule of thumb, derived from the comparison table:
-
Few users (small $K$), high per-user bandwidth: digital. The orthogonal-slot overhead is small, quantization MSE is tight, and Byzantine tolerance is easy to layer on.
-
Many users (large $K$) under cellular or Wi-Fi bandwidth limits: AirComp. The $O(Kd)$-to-$O(d)$ saving in channel uses dominates. Pair with privacy dither for differential privacy (DP).
-
Mission-critical / adversarial settings: digital with ByzSecAgg (Chapter 11). AirComp's lack of integrity makes it unsuitable for high-assurance applications.
-
Privacy-focused, many-user FL: AirComp with aggregated Gaussian dither for differential privacy. The $K$-dependent privacy amplification (Theorem 16.4.2) is especially valuable.
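The rules of thumb can be read as a small decision procedure. This sketch assumes its own priority order, with integrity requirements overriding everything else since AirComp cannot detect a spoofed sum. The `Deployment` fields and the `many_users_threshold` cut-off between "few" and "many" users are hypothetical placeholders to be tuned per deployment.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    num_users: int
    adversarial: bool                 # mission-critical or Byzantine threat model
    privacy_focused: bool             # differential privacy is a primary goal
    many_users_threshold: int = 50    # hypothetical "few" vs "many" cut-off


def choose_aggregator(dep: Deployment) -> str:
    # Integrity first: AirComp offers no way to detect a spoofed sum.
    if dep.adversarial:
        return "digital + ByzSecAgg"
    if dep.num_users >= dep.many_users_threshold:
        # Bandwidth saving dominates; dither adds differential privacy.
        if dep.privacy_focused:
            return "AirComp + aggregated Gaussian dither"
        return "AirComp"
    # Few users: digital; IT privacy then needs cryptographic masking.
    if dep.privacy_focused:
        return "digital + cryptographic masking"
    return "digital"
```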
The hybrid approach — AirComp for speed with digital ACK/checksum rounds for integrity — is emerging as a robust compromise. Chapter 18 discusses open problems in this hybrid design.
Example: Digital vs. Analog at Large $K$
A wireless-FL deployment has $K$ users, per-user gradient dimension $d$, per-round bandwidth $B$ (MHz), symbol rate $R_s$ (symbols/s), $b$ bits/symbol for digital transmission, and an AirComp-suitable MAC whose zero-forcing MSE is $\epsilon_{\mathrm{ZF}}^2$ (relative to the gradient norm). The target accuracy requires $T$ rounds. Compare the total FL time for digital vs. analog aggregation.
Digital channel uses
Each user needs $N_u$ symbols to send its quantized $d$-dimensional gradient at $b$ bits/symbol. With orthogonal slots, the MAC carries $K N_u$ symbols per round.
Digital time per round
$t_{\mathrm{dig}} = K N_u / R_s$ seconds.
Digital total
$\tau_{\mathrm{dig}} = T\,K N_u / R_s$ seconds.
Analog channel uses
Per round: $d$ symbols (one AirComp channel use per gradient coordinate, multiplexed in time), independent of $K$.
Analog time per round
$t_{\mathrm{ana}} = d / R_s$ seconds.
Analog total
$\tau_{\mathrm{ana}} = T\,d / R_s$ seconds, a factor $\tau_{\mathrm{dig}} / \tau_{\mathrm{ana}} = K N_u / d$ faster than digital.
Operational interpretation
Analog's bandwidth savings are dramatic at large $K$. The speedup $K N_u / d$ contains the orthogonal-slot multiplier $K$, so it grows linearly with the number of users. The trade-off (per Chapter 16): analog has an MSE floor of $\epsilon_{\mathrm{ZF}}^2$ per round, whereas digital's quantization noise is much smaller. Whether this extra MSE costs more rounds downstream, and so erodes the time savings, is the question of Section 17.2.
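The total-time comparison in this example is simple arithmetic and can be scripted. This sketch assumes, hypothetically, that each gradient coordinate is quantized to `q_bits` bits, so one user needs ceil(d · q_bits / b) symbols in its orthogonal slot. The numbers in the usage lines are illustrative placeholders, not the values of the original example.

```python
import math


def total_fl_time(K, d, rounds, symbol_rate, q_bits, b_bits_per_symbol):
    """Return (digital_seconds, analog_seconds) for `rounds` FL rounds.

    Assumes each gradient coordinate is quantized to q_bits bits for digital
    transmission, so one user needs ceil(d * q_bits / b_bits_per_symbol) symbols.
    """
    n_user = math.ceil(d * q_bits / b_bits_per_symbol)
    per_round_digital = K * n_user / symbol_rate    # K orthogonal slots
    per_round_analog = d / symbol_rate              # one channel use per coordinate
    return rounds * per_round_digital, rounds * per_round_analog


# Illustrative placeholder values only (not the example's original numbers).
t_dig, t_ana = total_fl_time(K=100, d=10**6, rounds=200, symbol_rate=10e6,
                             q_bits=8, b_bits_per_symbol=4)
speedup = t_dig / t_ana   # equals K * n_user / d
```

Doubling `K` doubles `t_dig` and leaves `t_ana` unchanged, which is the linear-in-$K$ advantage of analog aggregation; the sketch deliberately ignores any difference in the number of rounds each scheme needs.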
Total FL Time: Digital vs. Analog Aggregation
Total FL time (s) for digital and analog aggregation as a function of the number of users $K$, the gradient dimension $d$, and the per-round symbol rate, assuming a target accuracy that requires $T$ rounds. Analog dominates once digital's orthogonal-slot overhead outweighs digital's advantage of lower per-round MSE.
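The break-even point described in the figure can be computed directly once the two schemes are allowed different round counts, for example if analog's higher per-round MSE makes it need more rounds. This is a sketch under the same hypothetical `q_bits` quantizer assumption; `rounds_digital` and `rounds_analog` are illustrative inputs, not results derived in this chapter. Under this simplified model both times scale as 1/R_s, so the symbol rate cancels from the break-even condition.

```python
import math


def break_even_users(d, q_bits, b_bits_per_symbol, rounds_digital, rounds_analog):
    """Smallest number of users K at which analog is strictly faster overall.

    Digital time ~ rounds_digital * K * n_user / R_s
    Analog time  ~ rounds_analog * d / R_s
    The symbol rate R_s cancels, so it does not move the break-even point.
    """
    n_user = math.ceil(d * q_bits / b_bits_per_symbol)   # hypothetical q-bit quantizer
    k_star = rounds_analog * d / (rounds_digital * n_user)
    return math.floor(k_star) + 1
```

At exactly `K = k_star` the two totals tie, which is why the function returns the next integer.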
Key Takeaway
Wireless FL is a three-axis optimization: convergence rate (rounds), per-round aggregation MSE, and per-round resource cost (channel uses, energy). Digital aggregation needs $O(Kd)$ channel uses per round with tight MSE; AirComp needs $O(d)$, independent of $K$, with a noise-limited MSE floor. The right choice depends on the number of users $K$, integrity requirements, and the privacy model. Section 17.2 quantifies how per-round MSE translates into convergence-rate degradation.
Quick Check
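In the wireless-FL protocol with AirComp aggregation, which of the following does the server's model update use at round $t$?
The exact aggregate $\mathbf{g}^{(t)}$.
An MSE-perturbed estimate $\hat{\mathbf{g}}^{(t)} = \mathbf{g}^{(t)} + \mathbf{e}^{(t)}$ with $\mathbb{E}\|\mathbf{e}^{(t)}\|^2 = \epsilon_t^2$.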
Only user 's gradient (dropping others for privacy).
A cryptographically-masked aggregate (Chapter 10 only).
The receiver observes the noisy superposition; the update uses this estimate.