Ferkans — Interactive Telecom Tutor

From Cloud FL to Wireless FL

Chapter 9 developed federated learning on a stylized communication model: each round, every selected user uploads a gradient digitally to a central server via a dedicated orthogonal channel. This abstraction masks the wireless reality. Real deployments face: heterogeneous and time-varying channels, limited uplink bandwidth, energy-constrained devices, active channel contention with other traffic, and stringent delay budgets.

Wireless federated learning (wireless FL) is FL over a physical wireless channel. The design choices multiply: which users upload each round (scheduling), how much transmit power to spend (resource allocation), and whether to aggregate digitally (per Chapter 10) or in analog over the air (AirComp, Chapter 16). Each choice affects the FL convergence rate (the number of rounds needed to reach a target accuracy), which is the real optimization objective.

The point is that wireless-FL joint design is a three-axis optimization: convergence (rounds), per-round MSE (aggregation fidelity), and per-round cost (channel uses, energy). This chapter develops the coupling, derives the convergence rate under bounded aggregation MSE, and closes with the CommIT contribution on information-theoretically secure federated representation learning.


Wireless Federated Learning (Generic Protocol)

Inputs: n users, initial global model θ_0, learning rate η_lr, round count T, scheduling rule Sched(·), aggregator ∈ {Digital, AirComp}.

For t = 0, 1, …, T - 1:

1. Server broadcasts θ_t to all users.

2. Each user k computes g_k^(t) = ∇ℓ_k(θ_t) on its local data.

3. Server selects scheduled set S_t ← Sched(channel state, history).

4. Users in S_t upload g_k^(t) via the chosen aggregator:

– Digital: each user transmits quantized g_k^(t) on its dedicated channel; server sums exactly.

– AirComp: users in S_t transmit synchronously; server observes MSE-perturbed sum.

5. Server forms estimate Ĝ_t = ∑_{k ∈ S_t} g_k^(t) + noise.

6. Update: θ_{t+1} = θ_t - η_lr · Ĝ_t / |S_t|.

Output: final model θ_T.
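The six protocol steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function name `wireless_fl`, its parameters, the Gaussian noise model with standard deviation `noise_std`, and the quadratic toy losses are all hypothetical choices made for the example.

```python
import random


def wireless_fl(local_grads, theta0, lr, rounds, sched, noise_std=0.0, seed=0):
    """Run the generic wireless-FL loop on toy data (illustrative names only).

    local_grads[k](theta) returns user k's gradient as a list of floats.
    sched(t, n, rng) returns the scheduled set S_t as a list of user indices.
    noise_std is the per-coordinate standard deviation of the additive
    aggregation noise: 0.0 models an ideal digital sum, > 0 is a crude
    stand-in for the AirComp perturbation (an assumption of this sketch).
    """
    rng = random.Random(seed)
    theta = list(theta0)
    n = len(local_grads)
    for t in range(rounds):
        # Steps 1-2: broadcast theta_t; every user computes its local gradient.
        grads = [grad(theta) for grad in local_grads]
        # Step 3: the server selects the scheduled set S_t.
        scheduled = sched(t, n, rng)
        # Steps 4-5: the server obtains the sum over S_t plus noise.
        g_hat = [
            sum(grads[k][i] for k in scheduled) + rng.gauss(0.0, noise_std)
            for i in range(len(theta))
        ]
        # Step 6: gradient step with the estimate averaged over |S_t|.
        theta = [th - lr * g / len(scheduled) for th, g in zip(theta, g_hat)]
    return theta


# Toy problem: user k holds loss 0.5 * (theta - c_k)^2, so its gradient is theta - c_k.
centers = [1.0, 2.0, 6.0]
local_grads = [lambda th, c=c: [th[0] - c] for c in centers]
everyone = lambda t, n, rng: list(range(n))  # schedule all users every round

theta_exact = wireless_fl(local_grads, [0.0], lr=0.5, rounds=60, sched=everyone)
theta_noisy = wireless_fl(local_grads, [0.0], lr=0.5, rounds=60, sched=everyone,
                          noise_std=0.1)
print(theta_exact[0], theta_noisy[0])
```

With all users scheduled and no noise, the toy model settles at the mean of the centers (3.0); with noise it hovers near that point. The sketch assumes the scheduler never returns an empty set, since step 6 divides by |S_t|.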


Definition:
The Wireless-FL Problem

The wireless-FL problem is to learn a global parameter $\boldsymbol{\theta}^{\star}$ minimizing $F(\boldsymbol{\theta}) \;=\; \frac{1}{n}\sum_{k=1}^{n} F_k(\boldsymbol{\theta}), \qquad F_k(\boldsymbol{\theta}) = \mathbb{E}_{\xi \sim \mathcal{D}_k}[\ell(\boldsymbol{\theta}, \xi)],$ using gradient-based updates, under the following constraints:

Per-round aggregation noise. The server's estimate $\hat{\mathbf{G}}_t$ of the aggregate gradient $\mathbf{G}_t = \sum_{k \in \mathcal{S}_t} \mathbf{g}_k^{(t)}$ has mean-squared error $\mathrm{MSE}_{\mathrm{agg}}(t)$.
Scheduling. A subset $\mathcal{S}_t \subseteq [n]$ participates in round $t$; the rest are excluded.
Energy/bandwidth. Per-round bandwidth $B$ and per-user energy $E_k$ budgets must be respected.
Privacy (optional). If the aggregator is AirComp (Chapter 16), the MAC superposition is the privacy mechanism. If digital, cryptographic masking (Chapter 10) is required for information-theoretic privacy.

The joint design variable is the tuple $(\mathcal{S}_t, P_k^{(t)}, b_k^{(t)})$ for each round, chosen to optimize the convergence objective subject to these constraints.
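For concreteness, the per-round aggregation MSE in the first constraint can be read as the expected squared error of the server's estimate. The display below is one standard, unnormalized reading, and the symbol name is an assumed notation; the text does not spell out the normalization, and the worked example later quotes the MSE relative to the gradient norm.

```latex
\mathrm{MSE}_{\mathrm{agg}}(t) \;=\; \mathbb{E}\!\left[\,\bigl\|\hat{\mathbf{G}}_t - \mathbf{G}_t\bigr\|^2\,\right],
\qquad
\mathbf{G}_t \;=\; \sum_{k \in \mathcal{S}_t} \mathbf{g}_k^{(t)}.
```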

Digital vs. Analog Aggregation for Wireless FL

Aspect	Digital (Ch. 10-style)	Analog (AirComp, Ch. 16)
Channel uses per round	$O(n)$ orthogonal slots	$O(1)$ analog superposition
Aggregate MSE	Quantization + channel noise per user	$\mathrm{MSE}_{\mathrm{agg}} = \sigma^2/\min_k \gamma_k$
Native privacy	None — cryptographic layer needed	Weak-asymptotic IT privacy
Scalability in $n$	Poor — bandwidth scales linearly	Excellent — bandwidth constant
Sync requirement	Symbol-level per user	Symbol + carrier-phase across users
CSIT	Not required	Required for pre-equalization
Integrity (Byzantine)	Can detect (gradient checksums)	No integrity — any user can spoof sum

When to Choose Which Aggregator

A pragmatic rule of thumb, derived from the comparison table:

Few users ($n \leq 10$), high per-user bandwidth: digital. The orthogonal-slot overhead is small; quantization MSE is tight; Byzantine tolerance is easy to layer.
Many users ($n \geq 50$), cellular / WiFi bandwidth limits: AirComp. The $O(n)$-to-$O(1)$ saving dominates. Pair with privacy dither for DP.
Mission-critical / adversarial settings: digital with ByzSecAgg (Chapter 11). AirComp's lack of integrity makes it unsuitable for high-assurance applications.
Privacy-focused, many-user FL: AirComp with aggregated Gaussian dither for differential privacy. The $\sqrt{n}$-amplification (Theorem 16.4.2) is especially valuable.

The hybrid approach — AirComp for speed with digital ACK/checksum rounds for integrity — is emerging as a robust compromise. Chapter 18 discusses open problems in this hybrid design.
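The rule of thumb can be written down as a small decision function. This is only a sketch: the name `choose_aggregator`, the string labels, and the priority order (adversarial settings checked first) are assumptions made for illustration. The rule gives no guidance for $10 < n < 50$, so the sketch returns "unspecified" there, and it does not model the bandwidth conditions attached to each case.

```python
def choose_aggregator(n, adversarial=False, privacy_focused=False):
    """Encode the rule of thumb for picking an aggregator (illustrative only).

    n: number of users.
    adversarial: True for mission-critical or adversarial deployments.
    privacy_focused: True when differential privacy is a primary goal.
    """
    # Assumed priority: integrity concerns override the bandwidth argument.
    if adversarial:
        return "digital + ByzSecAgg"
    if n >= 50:
        # Many users: the O(n)-to-O(1) channel-use saving dominates.
        return "AirComp + Gaussian dither" if privacy_focused else "AirComp"
    if n <= 10:
        # Few users: orthogonal-slot overhead is small.
        return "digital"
    # The rule of thumb is silent for 10 < n < 50.
    return "unspecified"


for users in (5, 30, 100):
    print(users, choose_aggregator(users))
```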


Example: Digital vs. Analog at $n = 50$

A wireless-FL deployment has $n = 50$ users, per-user gradient dimension $d = 10^5$, per-round bandwidth $B = 1$ MHz, symbol rate $1$ Msymbol/s, $b = 8$ bits/symbol for digital, and an AirComp-suitable MAC with zero-forcing MSE $\mathrm{MSE}_{\mathrm{agg}} = 0.01$ (relative to gradient norm). The target accuracy requires $T = 100$ rounds. Compare the total FL time for digital vs. analog aggregation.

Solution

Digital channel uses

Per user: $d/b = 10^5/8 = 1.25 \cdot 10^4$ symbols (this count implicitly assigns one bit to each gradient coordinate). With orthogonal slots, the total per round is $n \cdot d/b = 50 \cdot 10^5 / 8 = 6.25 \cdot 10^5$ symbols.

Digital time per round

$6.25 \cdot 10^5$ symbols at $10^6$ symbols/s $= 0.625$ s.

Digital total

$100 \times 0.625 = 62.5$ s.

Analog channel uses

Per round: $d = 10^5$ symbols (one AirComp channel use per gradient coordinate, multiplexed in time).

Analog time per round

$10^5$ symbols at $10^6$ symbols/s $= 0.1$ s.

Analog total

$100 \times 0.1 = 10$ s — $6.25\times$ faster than digital.

Operational interpretation

Analog's bandwidth savings are dramatic at $n = 50$. The speedup factor is $n/b$: the orthogonal-slot multiplier $n$ divided by the $b$ bits each digital symbol carries. Here $50/8 = 6.25$; for $n = 500$, analog would be $\sim 60\times$ faster ($500/8 = 62.5$). The trade-off (per Chapter 16): analog has an MSE floor of $0.01$ per round vs. digital's much smaller quantization noise. The downstream effect on FL convergence (whether the extra MSE costs more rounds) is Section 17.2's question.
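The arithmetic in this example is easy to reproduce. The sketch below uses the example's own counting model ($n \cdot d / b$ channel uses per round for digital, $d$ for analog); the function name `fl_time_seconds` and its signature are hypothetical, and the model ignores any extra rounds that a higher aggregation MSE might cost.

```python
def fl_time_seconds(n, d, rounds, symbol_rate, bits_per_symbol=None):
    """Total FL time under the worked example's channel-use count.

    bits_per_symbol=None selects analog (AirComp): d channel uses per round.
    Otherwise digital with orthogonal slots: n * d / bits_per_symbol uses.
    """
    if bits_per_symbol is None:
        uses_per_round = d
    else:
        uses_per_round = n * d / bits_per_symbol
    return rounds * uses_per_round / symbol_rate


# Parameters taken from the example: n = 50, d = 1e5, T = 100, 1 Msymbol/s, b = 8.
digital = fl_time_seconds(50, 10**5, 100, 10**6, bits_per_symbol=8)
analog = fl_time_seconds(50, 10**5, 100, 10**6)
print(digital, analog, digital / analog)  # 62.5 10.0 6.25

# Scaling to n = 500 users multiplies the digital time by ten.
digital_500 = fl_time_seconds(500, 10**5, 100, 10**6, bits_per_symbol=8)
print(digital_500 / analog)  # 62.5
```

Because the analog time does not depend on $n$ in this model, the ratio of the two times is simply $n/b$.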

Total FL Time: Digital vs. Analog Aggregation

Explore the break-even between digital and analog aggregation as a function of the number of users $n$, gradient dimension $d$, and per-round symbol rate. The plot shows total FL time (in seconds) for both options, assuming a fixed target accuracy requiring $T$ rounds. Analog dominates when the orthogonal-slot overhead of digital outweighs the gain from digital's lower per-round MSE.

Parameters (defaults): $n = 50$ users, $\log_{10} d = 5$ (gradient dimension), $T = 100$ FL rounds.

Key Takeaway

Wireless FL is a three-axis optimization: convergence rate (rounds), per-round aggregation MSE, per-round resource cost (channel uses, energy). Digital aggregation scales as $O(n)$ bandwidth with tight MSE; AirComp scales as $O(1)$ bandwidth with a noise-limited MSE floor. The right choice depends on $n$, integrity requirements, and privacy model. Section 17.2 quantifies how per-round MSE translates into convergence-rate degradation.

Quick Check

In the wireless-FL protocol with AirComp aggregation, the server's model update uses which of the following at round $t$?

The exact aggregate $\mathbf{G}_t = \sum_{k \in \mathcal{S}_t} \mathbf{g}_k^{(t)}$.

An MSE-perturbed estimate $\hat{\mathbf{G}}_t = \mathbf{G}_t + \mathbf{e}_t$ with mean-squared error $\mathrm{MSE}_{\mathrm{agg}}(t)$.

Only user $1$'s gradient (dropping others for privacy).

A cryptographically-masked aggregate (Chapter 10 only).

Correct answer: An MSE-perturbed estimate $\hat{\mathbf{G}}_t = \mathbf{G}_t + \mathbf{e}_t$ with mean-squared error $\mathrm{MSE}_{\mathrm{agg}}(t)$. The receiver observes the noisy superposition; the update uses this estimate.

The Wireless FL Pipeline