The Wireless FL Pipeline

From Cloud FL to Wireless FL

Chapter 9 developed federated learning on a stylized communication model: each round, every selected user uploads a gradient digitally to a central server via a dedicated orthogonal channel. This abstraction masks the wireless reality. Real deployments face: heterogeneous and time-varying channels, limited uplink bandwidth, energy-constrained devices, active channel contention with other traffic, and stringent delay budgets.

Wireless federated learning (wireless FL) is FL over a physical wireless channel. The design choices multiply: which users upload each round (scheduling), how much transmit power to spend (resource allocation), and whether to aggregate digitally (per Chapter 10) or over the air in analog form (AirComp, Chapter 16). Each choice affects the FL convergence rate, measured as the number of rounds needed to reach a target accuracy, which is the real optimization objective.

The point is that wireless-FL joint design is a three-axis optimization: convergence (rounds), per-round MSE (aggregation fidelity), and per-round cost (channel uses, energy). This chapter develops the coupling, derives the convergence rate under bounded aggregation MSE, and closes with the CommIT contribution on information-theoretically secure federated representation learning.


Wireless Federated Learning (Generic Protocol)

Inputs:
    n users, initial global model θ_0, learning rate η_lr,
    round count T, scheduling rule Sched(·),
    aggregator ∈ {Digital, AirComp}.
For t = 0, 1, …, T - 1:
    1. Server broadcasts θ_t to all users.
    2. Each user k computes g_k^(t) = ∇ℓ_k(θ_t) on its local data.
    3. Server selects scheduled set S_t ← Sched(channel state, history).
    4. Users in S_t upload g_k^(t) via the chosen aggregator:
        – Digital: each user transmits quantized g_k^(t) on its dedicated channel; server sums exactly.
        – AirComp: users in S_t transmit synchronously; server observes MSE-perturbed sum.
    5. Server forms estimate Ĝ_t = ∑_{k ∈ S_t} g_k^(t) + noise.
    6. Update: θ_{t+1} = θ_t - η_lr · Ĝ_t / |S_t|.
Output: final model θ_T.
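The protocol above can be sketched as a small simulation. This is a toy illustration under stated assumptions, not the chapter's reference implementation: the loss is least-squares regression, Sched(·) is replaced by uniform random scheduling, the digital aggregator uses a uniform quantizer with a hypothetical step size `quant_step`, and the AirComp error is modeled as additive Gaussian noise with a hypothetical standard deviation `noise_std`.

```python
import numpy as np

def run_wireless_fl(X, y, n_users=10, n_sched=5, T=200, lr=0.1,
                    aggregator="aircomp", noise_std=0.05,
                    quant_step=1e-3, seed=0):
    """Toy run of the generic wireless-FL loop on least-squares regression.

    Assumptions (not from the text): uniform random scheduling, a uniform
    quantizer for the digital path, Gaussian error for the AirComp path.
    """
    rng = np.random.default_rng(seed)
    # Split the samples evenly across users (an i.i.d. split, for simplicity).
    shards = np.array_split(rng.permutation(len(y)), n_users)
    theta = np.zeros(X.shape[1])
    for t in range(T):
        # Steps 1-2: broadcast theta; every user computes its local gradient.
        grads = np.stack([X[i].T @ (X[i] @ theta - y[i]) / len(i)
                          for i in shards])
        # Step 3: scheduling (here: a uniformly random subset of users).
        S = rng.choice(n_users, size=n_sched, replace=False)
        # Steps 4-5: aggregate the scheduled gradients.
        if aggregator == "digital":
            # Each user quantizes; the server sums the quantized values exactly.
            G_hat = (np.round(grads[S] / quant_step) * quant_step).sum(axis=0)
        elif aggregator == "aircomp":
            # The server observes the sum plus an aggregation error.
            G_hat = grads[S].sum(axis=0) + noise_std * rng.standard_normal(theta.shape)
        else:
            raise ValueError(f"unknown aggregator: {aggregator}")
        # Step 6: model update with the averaged estimate.
        theta = theta - lr * G_hat / len(S)
    return theta

# Usage: recover a known parameter vector with both aggregators.
rng = np.random.default_rng(1)
theta_true = rng.standard_normal(5)
X = rng.standard_normal((2000, 5))
y = X @ theta_true
for agg in ("digital", "aircomp"):
    err = np.linalg.norm(run_wireless_fl(X, y, aggregator=agg) - theta_true)
    print(agg, "parameter error:", err)
```

In this sketch the two aggregators differ only in how Ĝ_t deviates from the exact sum, which is the quantity the rest of the chapter tracks as the aggregation MSE.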

Definition:

The Wireless-FL Problem

The wireless-FL problem is to learn a global parameter $\boldsymbol{\theta}^{\star}$ minimizing
$$F(\boldsymbol{\theta}) \;=\; \frac{1}{n}\sum_{k=1}^{n} F_k(\boldsymbol{\theta}), \qquad F_k(\boldsymbol{\theta}) = \mathbb{E}_{\xi \sim \mathcal{D}_k}[\ell(\boldsymbol{\theta}, \xi)],$$
using gradient-based updates, under the following constraints:

  • Per-round aggregation noise. The server's estimate $\hat{\mathbf{G}}_t$ of the aggregate gradient $\mathbf{G}_t = \sum_{k \in \mathcal{S}_t} \mathbf{g}_k^{(t)}$ has mean-squared error $\mathrm{MSE}_{\mathrm{agg}}(t)$.

  • Scheduling. A subset $\mathcal{S}_t \subseteq [n]$ participates in round $t$; the rest are excluded.

  • Energy/bandwidth. Per-round bandwidth $B$ and per-user energy $E_k$ budgets must be respected.

  • Privacy (optional). If the aggregator is AirComp (Chapter 16), the MAC superposition is the privacy mechanism. If digital, cryptographic masking (Chapter 10) is required for information-theoretic privacy.

The joint design variable is the tuple $(\mathcal{S}_t, P_k^{(t)}, b_k^{(t)})$, chosen in each round to optimize the convergence objective under the constraints above.

Digital vs. Analog Aggregation for Wireless FL

| Aspect | Digital (Ch. 10-style) | Analog (AirComp, Ch. 16) |
|---|---|---|
| Channel uses per round | $O(n)$ orthogonal slots | $O(1)$ analog superposition |
| Aggregate MSE | Quantization + channel noise per user | $\mathrm{MSE}_{\mathrm{agg}} = \sigma^2/\min_k \gamma_k$ |
| Native privacy | None: cryptographic layer needed | Weak-asymptotic IT privacy |
| Scalability in $n$ | Poor: bandwidth scales linearly | Excellent: bandwidth constant |
| Sync requirement | Symbol-level per user | Symbol + carrier-phase across users |
| CSIT | Not required | Required for pre-equalization |
| Integrity (Byzantine) | Can detect (gradient checksums) | No integrity: any user can spoof the sum |
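The AirComp entries for aggregate MSE and CSIT can be illustrated with a channel-inversion sketch. It rests on assumptions that this section does not state: real-valued channel gains `h` known at the transmitters, a per-user peak power budget `P`, symbols normalized to magnitude at most 1, and $\gamma_k$ read as $P h_k^2$ (the precise definition belongs to Chapter 16). Under those assumptions the weakest user sets the common receive scaling, and the estimation error variance comes out as $\sigma^2/\min_k \gamma_k$.

```python
import numpy as np

def predicted_mse(h, P, sigma2):
    """sigma^2 / min_k gamma_k, with the assumed definition gamma_k = P * h_k^2."""
    return sigma2 / np.min(P * h**2)

def aircomp_sum(symbols, h, P, sigma2, rng):
    """Channel-inversion AirComp estimate of sum_k symbols[..., k].

    symbols: array of shape (..., n) with |s_k| <= 1 (normalized entries)
    h:       positive real channel gains, shape (n,), known at the users (CSIT)
    """
    eta = np.min(P * h**2)                 # scaling the weakest user can afford
    tx = np.sqrt(eta) / h * symbols        # pre-equalized; tx_k^2 <= P by construction
    noise = np.sqrt(sigma2) * rng.standard_normal(symbols.shape[:-1])
    y = (h * tx).sum(axis=-1) + noise      # superposition on the MAC
    return y / np.sqrt(eta)                # server rescales to estimate the sum

# Usage: compare the empirical error with the table's formula.
rng = np.random.default_rng(0)
h = np.array([0.3, 1.0, 2.0])
s = rng.uniform(-1.0, 1.0, size=(20000, 3))
est = aircomp_sum(s, h, P=1.0, sigma2=0.01, rng=rng)
print("empirical MSE:", np.mean((est - s.sum(axis=1)) ** 2))
print("predicted MSE:", predicted_mse(h, P=1.0, sigma2=0.01))
```

The sketch also shows why a single user in a deep fade degrades the whole aggregate: lowering the smallest entry of `h` raises the MSE for everyone.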

When to Choose Which Aggregator

A pragmatic rule of thumb, derived from the comparison table:

  • Few users ($n \leq 10$), high per-user bandwidth: digital. The orthogonal-slot overhead is small; quantization MSE is tight; Byzantine tolerance is easy to layer.

  • Many users ($n \geq 50$), cellular / WiFi bandwidth limits: AirComp. The $O(n)$-to-$O(1)$ saving dominates. Pair with privacy dither for DP.

  • Mission-critical / adversarial settings: digital with ByzSecAgg (Chapter 11). AirComp's lack of integrity makes it unsuitable for high-assurance applications.

  • Privacy-focused, many-user FL: AirComp with aggregated Gaussian dither for differential privacy. The $\sqrt{n}$-amplification (Theorem 16.4.2) is especially valuable.

The hybrid approach — AirComp for speed with digital ACK/checksum rounds for integrity — is emerging as a robust compromise. Chapter 18 discusses open problems in this hybrid design.
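The rule of thumb above can be written down as a small decision helper. This is a hedged sketch: the function name, the priority order (integrity first, then user count) and the handling of the unspecified range $10 < n < 50$ are assumptions made for illustration, and the bandwidth conditions in the bullets are not modeled.

```python
def choose_aggregator(n, adversarial=False, privacy_focused=False):
    """Encode the chapter's rule of thumb for picking an aggregator.

    Assumption: integrity requirements take priority over the user count.
    """
    if adversarial:
        # Mission-critical / adversarial settings need integrity.
        return "digital + ByzSecAgg"
    if n <= 10:
        # Few users: the orthogonal-slot overhead is small.
        return "digital"
    if n >= 50:
        # Many users: the O(n)-to-O(1) channel-use saving dominates.
        if privacy_focused:
            return "AirComp + Gaussian dither"
        return "AirComp"
    # The rule of thumb gives no recommendation between 10 and 50 users.
    return "no rule given"

# Usage
for n, adv, priv in [(8, False, False), (200, False, True), (200, True, False), (30, False, False)]:
    print(n, adv, priv, "->", choose_aggregator(n, adv, priv))
```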


Example: Digital vs. Analog at $n = 50$

A wireless-FL deployment has $n = 50$ users, per-user gradient dimension $d = 10^5$, per-round bandwidth $B = 1$ MHz, symbol rate $1$ Msymbol/s, $b = 8$ bits/symbol for digital, and an AirComp-suitable MAC with zero-forcing MSE $\mathrm{MSE}_{\mathrm{agg}} = 0.01$ (relative to gradient norm). The target accuracy requires $T = 100$ rounds. Compare the total FL time for digital vs. analog aggregation.
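One way to work the comparison is to count uplink channel uses only. The sketch below makes assumptions the problem statement leaves open: each gradient entry is quantized to 8 bits and so occupies exactly one 8-bit symbol, digital users transmit in orthogonal slots one after another, AirComp sends one analog symbol per gradient entry, both schemes need the same $T = 100$ rounds, and downlink, computation and synchronization time are ignored.

```python
def total_fl_time(n, d, symbol_rate, T, aggregator):
    """Uplink airtime in seconds for T rounds (channel uses only).

    Assumption: one gradient entry per channel symbol for both schemes.
    """
    symbols_per_user = d
    if aggregator == "digital":
        # O(n) orthogonal slots: users take turns.
        symbols_per_round = n * symbols_per_user
    elif aggregator == "aircomp":
        # O(1): all scheduled users superpose on the same symbols.
        symbols_per_round = symbols_per_user
    else:
        raise ValueError(f"unknown aggregator: {aggregator}")
    return T * symbols_per_round / symbol_rate

digital = total_fl_time(n=50, d=10**5, symbol_rate=1e6, T=100, aggregator="digital")
analog = total_fl_time(n=50, d=10**5, symbol_rate=1e6, T=100, aggregator="aircomp")
print(digital, analog, digital / analog)  # 500.0 10.0 50.0
```

Under these assumptions digital aggregation takes 500 s and AirComp 10 s, a factor of $n = 50$. The gap narrows if the AirComp aggregation error forces more rounds than the digital scheme needs.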

Total FL Time: Digital vs. Analog Aggregation

Explore the break-even between digital and analog aggregation as a function of the number of users $n$, gradient dimension $d$, and per-round symbol rate. The plot shows total FL time (in seconds) for both options, assuming a fixed target accuracy requiring $T$ rounds. Analog dominates when orthogonal-slot overhead outweighs the gain from lower per-round MSE.

Key Takeaway

Wireless FL is a three-axis optimization: convergence rate (rounds), per-round aggregation MSE, and per-round resource cost (channel uses, energy). Digital aggregation scales as $O(n)$ bandwidth with tight MSE; AirComp scales as $O(1)$ bandwidth with a noise-limited MSE floor. The right choice depends on $n$, integrity requirements, and privacy model. Section 17.2 quantifies how per-round MSE translates into convergence-rate degradation.

Quick Check

In the wireless-FL protocol with AirComp aggregation, the server's model update uses which of the following at round $t$?

The exact aggregate $\mathbf{G}_t = \sum_k \mathbf{g}_k^{(t)}$.

An MSE-perturbed estimate $\hat{\mathbf{G}}_t = \mathbf{G}_t + \mathbf{e}_t$ with mean-squared error $\mathrm{MSE}_{\mathrm{agg}}(t)$.

Only user $1$'s gradient (dropping others for privacy).

A cryptographically-masked aggregate (Chapter 10 only).