Ferkans — Interactive Telecom Tutor

ex17-1

Easy

Given $n = 100$ users, gradient dimension $d = 10^5$, symbol rate $10^6$/s, and a digital scheme that carries $b = 8$ bits/symbol, compute the per-round channel-use count for each aggregator.

Hint

Digital: $n \cdot d/b$ orthogonal symbols.

AirComp: $d$ symbols (one per coordinate).

Solution

Digital

$n \cdot d/b = 100 \cdot 10^5 / 8 = 1.25 \times 10^6$ symbols.

AirComp

$d = 10^5$ symbols.

Ratio

Digital is $12.5\times$ longer.
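The count above can be reproduced with a few lines of Python. This is a minimal sketch using the hint's formulas; the function names are illustrative, and the conversion to seconds simply divides by the stated symbol rate.

```python
# Illustrative helpers (names are not from the source); formulas follow the hint.
def digital_symbols(n_users, dim, bits_per_symbol):
    """Orthogonal digital upload: n * d / b symbols per round."""
    return n_users * dim / bits_per_symbol

def aircomp_symbols(dim):
    """Over-the-air aggregation: one symbol per gradient coordinate."""
    return dim

n, d, b, symbol_rate = 100, 10**5, 8, 10**6
digital = digital_symbols(n, d, b)
aircomp = aircomp_symbols(d)

print("digital symbols:", digital)
print("AirComp symbols:", aircomp)
print("ratio:", digital / aircomp)
# Round duration follows by dividing by the symbol rate.
print("digital seconds:", digital / symbol_rate)
print("AirComp seconds:", aircomp / symbol_rate)
```

At the stated symbol rate, the digital round lasts 1.25 s against 0.1 s for AirComp.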

ex17-2

Easy

Given $\sigma_g^2 = 1$, $n = 50$, $\mathsf{MSE}_{\text{agg}} = 0.5$, $\eta_{\text{lr}} = 0.1$, $\mu = 1$, compute the predicted convergence floor per Theorem 17.2.1.

Hint

Floor $= \eta_{\text{lr}}(\sigma_g^2/n + \mathsf{MSE}_{\text{agg}}/n^2)/(2\mu)$.

Solution

Variance

$V = 1/50 + 0.5/2500 = 0.02 + 0.0002 = 0.0202$ .

Floor

$0.1 \cdot 0.0202 / 2 = 0.00101$ .
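As a quick check of the arithmetic, the floor formula from the hint can be wrapped in a small function. This is a sketch; the function and argument names are illustrative.

```python
def convergence_floor(sigma_g2, n, mse_agg, lr, mu):
    """Floor = lr * (sigma_g^2 / n + MSE_agg / n^2) / (2 * mu), as in the hint."""
    variance = sigma_g2 / n + mse_agg / n**2
    return lr * variance / (2 * mu)

floor = convergence_floor(sigma_g2=1.0, n=50, mse_agg=0.5, lr=0.1, mu=1.0)
print(f"floor = {floor:.5f}")
```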

ex17-3

Easy

For $n = 100$ users and $\sigma_g^2 = 2$, what is the matched aggregation MSE target (the value at which the two variance terms are equal)?

Hint

Equality: $\mathsf{MSE}_{\text{agg}}/n^2 = \sigma_g^2/n$.

Solution

Setup

$\mathsf{MSE}_{\text{agg}}/n^2 = \sigma_g^2/n \;\Rightarrow\; \mathsf{MSE}_{\text{agg}} = n \sigma_g^2$.

Target

$\mathsf{MSE}_{\text{agg}}^{\text{matched}} = 100 \cdot 2 = 200$.

Operational

AirComp with MSE $\ll 200$ is over-engineered; MSE $\gg 200$ is under-engineered. Design around 200.
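To see why designing far from the matched point is wasteful, the sketch below evaluates both variance terms at one tenth of, exactly at, and ten times the matched MSE. The function name and the factor-of-ten sweep are illustrative choices, not part of the exercise.

```python
def variance_terms(n, sigma_g2, mse_agg):
    """Return (gradient term, aggregation term) of V = sigma_g^2/n + MSE_agg/n^2."""
    return sigma_g2 / n, mse_agg / n**2

n, sigma_g2 = 100, 2.0
matched = n * sigma_g2  # matched target: MSE_agg = n * sigma_g^2

for mse in (matched / 10, matched, matched * 10):
    grad_term, agg_term = variance_terms(n, sigma_g2, mse)
    print(f"MSE={mse:7.1f}  gradient={grad_term:.3f}  "
          f"aggregation={agg_term:.3f}  V={grad_term + agg_term:.3f}")
```

Below the matched point the total $V$ barely improves (0.022 versus 0.040), while above it the aggregation term dominates (0.22).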

ex17-4

Easy

Users have Rayleigh-fading channels $|h_k|^2 \sim \text{Exp}(1)$ , per-user budget $P = 1$ , source variance $\sigma_s^2 = 1$ . For MSE target $\mathsf{MSE}^{\text{tol}} = 0.02$ and noise $\sigma^2 = 0.01$ , compute the threshold $\tau$ and the expected fraction of users included in $\mathcal{S}_t$ .

Hint

$\tau = \sigma^2/\mathsf{MSE}^{\text{tol}}$ .

$P(|h|^2 \geq \tau) = e^{-\tau}$ for exponential.

Solution

Threshold

$\tau = 0.01/0.02 = 0.5$ .

Fraction

$P(|h|^2 \geq 0.5) = e^{-0.5} \approx 0.607$ .

Operational

About $61\%$ of users are included per round. The remaining $39\%$ are excluded; mitigate via $\alpha$-fairness.
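The closed-form fraction can be cross-checked with a short Monte Carlo run. This is a sketch using only the standard library; the sample size and seed are arbitrary choices.

```python
import math
import random

def inclusion_threshold(noise_var, mse_tol):
    """tau = sigma^2 / MSE_tol, as in the hint."""
    return noise_var / mse_tol

def expected_fraction(tau):
    """P(|h|^2 >= tau) for |h|^2 ~ Exp(1)."""
    return math.exp(-tau)

tau = inclusion_threshold(noise_var=0.01, mse_tol=0.02)
fraction = expected_fraction(tau)

# Empirical check: draw exponential channel gains and count those above tau.
rng = random.Random(0)
gains = [rng.expovariate(1.0) for _ in range(100_000)]
empirical = sum(g >= tau for g in gains) / len(gains)

print(f"tau = {tau}, analytic = {fraction:.4f}, empirical = {empirical:.4f}")
```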

ex17-5

Medium

For Rayleigh channels, if each user participates in at least $\alpha = 0.7$ of their proportional-share rounds, estimate the average MSE relative to unconstrained. Use the exponential channel's $\alpha$ -percentile.

Hint

Percentile: $\gamma_{(\alpha)} = -\ln(1 - \alpha)$ .

Factor $\approx 1/(1 - \alpha \kappa)$ where $\kappa$ depends on channel.

Solution

Percentile

$-\ln(0.3) \approx 1.20$ .

Compute $\kappa$

Under unconstrained scheduling, $\mathbb{E}[\max \gamma] = O(\log n)$ for large $n$. For $n = 20$, this is roughly $\ln 20 \approx 3$, so $\kappa \approx 3/1.2 - 1 = 1.5$.

Factor

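The hint's formula cannot be evaluated with $\kappa \approx 1.5$, since $1 - 0.7 \cdot 1.5 < 0$. The estimate here instead uses $\kappa = 0.5$: $\eta_{0.7} = 1/(1 - 0.7 \cdot 0.5) = 1/0.65 \approx 1.5$. (This value comes from a different derivation; calibrate $\kappa$ empirically.)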

Operational

About $50\%$ more convergence rounds to reach the same aggregate floor. In exchange: unbiased participation.

ex17-6

Medium

User $k$ has energy budget $E_k = 10$ and participates in 5 rounds with $\gamma_k^{(t)} = \{2, 0.5, 1.5, 0.3, 1.0\}$ . Compute the water-filling power allocation.

Hint

$P_k^{(t)} = [1/\lambda - 1/\gamma_k^{(t)}]^+$ .

Solve $\sum_t P_k^{(t)} = 10$ for $\lambda$ .

Solution

Order by $\gamma$

Sort descending: $\{2, 1.5, 1.0, 0.5, 0.3\}$ .

Try all 5 active

$\sum_t (1/\lambda - 1/\gamma_k^{(t)}) = 5/\lambda - (0.5 + 0.67 + 1 + 2 + 3.33) = 5/\lambda - 7.5 = 10$ . $\Rightarrow 1/\lambda = 3.5$ .

Check feasibility

All powers positive: $P^{(t)} = 3.5 - 1/\gamma_k^{(t)} = \{3.0, 2.83, 2.5, 1.5, 0.17\}$ . All positive ✓.

Verify sum

$\sum = 3.0 + 2.83 + 2.5 + 1.5 + 0.17 = 10$ . ✓

Operational

Strong rounds (high $\gamma$ ) receive most power; the weak round (0.3) receives only $0.17$ — nearly idle. Water-filling naturally emphasizes good channel realizations.
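The trial-and-check procedure above generalizes to budgets where some rounds must be switched off. The following is a minimal sketch of that search, assuming the allocation rule from the hint; the function name and return convention are illustrative.

```python
def water_fill(gains, budget):
    """Return (water level 1/lambda, powers) with P_t = max(level - 1/gamma_t, 0).

    Starts with all rounds active and drops the weakest round until the
    weakest remaining round receives strictly positive power.
    """
    inverse = sorted(1.0 / g for g in gains)  # ascending: strongest rounds first
    level = 0.0
    for active in range(len(inverse), 0, -1):
        level = (budget + sum(inverse[:active])) / active
        if level > inverse[active - 1]:
            break
    powers = [max(level - 1.0 / g, 0.0) for g in gains]
    return level, powers

level, powers = water_fill([2, 0.5, 1.5, 0.3, 1.0], budget=10.0)
print("water level:", level)
print("powers:", [round(p, 2) for p in powers])
print("total:", round(sum(powers), 6))
```

With a smaller budget, for example `water_fill([2, 0.5, 1.5, 0.3, 1.0], budget=1.0)`, the two weakest rounds drop out and only the three strongest receive power.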

ex17-7

Medium

For an FL task with target loss gap $\varepsilon = 0.01$, $\sigma_g^2/n = 0.02$, $\mathsf{MSE}_{\text{agg}}/n^2 = 0.001$, $\eta_{\text{lr}} = 0.1$, $\mu = 1$, $L = 10$, and initial gap $10$, compute the required round count $T$.

Hint

Floor: $\eta_{\text{lr}}(V)/(2\mu)$ .

$T \geq \log(\varepsilon/10)/\log(1 - \eta_{\text{lr}}\mu)$ .

Solution

Variance

$V = 0.02 + 0.001 = 0.021$ .

Floor

$0.1 \cdot 0.021 / 2 = 0.00105$ . Below target.

Round count

$(0.9)^T \leq 0.01/10 = 0.001$ $\Rightarrow T \log(0.9) \leq \log(0.001)$ $\Rightarrow T \geq \log(0.001)/\log(0.9) \approx 65.6$ (the inequality flips because $\log 0.9 < 0$). So $T \geq 66$ rounds.
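The same two checks (floor below target, then round count from the contraction factor) can be scripted. This is a sketch following the hint's formulas; the function names are illustrative.

```python
import math

def noise_floor(V, lr, mu):
    """Floor = lr * V / (2 * mu)."""
    return lr * V / (2 * mu)

def rounds_needed(target_gap, initial_gap, lr, mu):
    """Smallest integer T with (1 - lr*mu)^T * initial_gap <= target_gap."""
    contraction = 1.0 - lr * mu
    return math.ceil(math.log(target_gap / initial_gap) / math.log(contraction))

V = 0.02 + 0.001
floor = noise_floor(V, lr=0.1, mu=1.0)
T = rounds_needed(target_gap=0.01, initial_gap=10.0, lr=0.1, mu=1.0)

print(f"floor = {floor:.5f} (target 0.01 is reachable: {floor < 0.01})")
print("rounds needed:", T)
```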

ex17-8

Medium

In the CommIT IT-secure scheme (Theorem 17.4.1), derive the mask variance $\sigma_m^2$ needed so that the per-user MI leak is at most $\varepsilon$ nats. Use $n = 100, d = 64, \sigma_z^2 = 1, |\eta|^2 = 1, \sigma^2 = 0.01, \varepsilon = 0.01$ .

Hint

$I \leq \tfrac{d}{2} \log(1 + 1/((n-1)\sigma_m^2/\sigma_z^2))$ .

Solve for $\sigma_m^2$ .

Solution

Bound

Require $\tfrac{64}{2}\log(1 + 1/(99 \sigma_m^2)) \leq 0.01$ .

Solve

$32 \log(1 + 1/(99\sigma_m^2)) \leq 0.01$ $\Rightarrow$ $\log(1 + 1/(99\sigma_m^2)) \leq 3.125 \cdot 10^{-4}$ $\Rightarrow$ $1 + 1/(99\sigma_m^2) \leq e^{3.125 \cdot 10^{-4}} \approx 1 + 3.125 \cdot 10^{-4}$ $\Rightarrow$ $1/(99\sigma_m^2) \leq 3.125 \cdot 10^{-4}$ $\Rightarrow$ $\sigma_m^2 \geq 1/(99 \cdot 3.125 \cdot 10^{-4}) \approx 32.3$.

Operational

$\sigma_m^2 \approx 32$ times $\sigma_z^2 = 1$. The mask dominates the per-user representation variance by more than an order of magnitude, giving 0.01-nat-per-round privacy.

Note

The masks don't degrade the aggregate MSE (they cancel). The mask variance is a pure privacy knob.
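The inversion of the leakage bound can be done without rounding the intermediate exponent. This is a sketch that inverts the hint's bound exactly with `math.expm1`; the function names are illustrative.

```python
import math

def leak_bound_nats(n, d, sigma_m2, sigma_z2=1.0):
    """(d/2) * ln(1 + sigma_z^2 / ((n - 1) * sigma_m^2)), as in the hint."""
    return 0.5 * d * math.log1p(sigma_z2 / ((n - 1) * sigma_m2))

def min_mask_variance(n, d, eps_nats, sigma_z2=1.0):
    """Smallest sigma_m^2 for which the bound above is at most eps_nats."""
    return sigma_z2 / ((n - 1) * math.expm1(2.0 * eps_nats / d))

sigma_m2 = min_mask_variance(n=100, d=64, eps_nats=0.01)
print(f"sigma_m^2 >= {sigma_m2:.2f}")
print(f"leak at that variance: {leak_bound_nats(100, 64, sigma_m2):.4f} nats")
```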

ex17-9

Medium

Compare convergence over $T = 100$ rounds with (a) constant $\eta_{\text{lr}} = 0.1$ and (b) decreasing $\eta_{\text{lr},t} = 1/t$ . Which reaches a smaller final loss gap?

Hint

(a): Theorem 17.2.1, reaches a floor.

(b): Theorem 17.2.2, converges but slowly.

Solution

(a) Constant

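Exponential decay to a floor of $\sim 0.001$ (the $0.00105$ computed in ex17-7). Both terms of Theorem 17.2.1 matter: it takes $\sim 66$ rounds for the gap to fall from $10$ to $0.01$, and at $T = 100$ the residual transient $10 \cdot 0.9^{100} \approx 2.7 \times 10^{-4}$ still sits on top of the floor, so the loss gap is $\approx 0.0013$.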

(b) Decreasing

$O(L V / (\mu^2 T)) = O(10 \cdot 0.021 / 100) = 0.0021$ . Actually slightly larger than (a) at $T = 100$ .

Conclusion

For $T = 100$ , constant $\eta_{\text{lr}}$ is slightly better. For $T = 1000$ , (b) would outperform (a). Decreasing is asymptotically optimal but pays a slower rate.

Operational

Short horizon $T$ — constant $\eta_{\text{lr}}$ . Long horizon $T$ — decreasing. The crossover depends on the target loss gap.
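The comparison can be tabulated for both horizons. This sketch makes three assumptions that are not stated in the exercise: it reuses the ex17-7 parameters ($L = 10$, $\mu = 1$, $V = 0.021$, initial gap 10), it models case (a) as "geometric transient plus floor", and it takes the hidden constant in the $O(LV/(\mu^2 T))$ expression for case (b) to be 1.

```python
def constant_lr_gap(T, initial_gap, lr, mu, V):
    """Assumed form: (1 - lr*mu)^T * initial_gap + lr*V/(2*mu)."""
    transient = (1.0 - lr * mu) ** T * initial_gap
    floor = lr * V / (2.0 * mu)
    return transient + floor

def decreasing_lr_gap(T, L, mu, V):
    """Order-level expression L*V/(mu^2 * T), with the constant set to 1."""
    return L * V / (mu**2 * T)

L, mu, V, lr, gap0 = 10.0, 1.0, 0.021, 0.1, 10.0
for T in (100, 1000):
    a = constant_lr_gap(T, gap0, lr, mu, V)
    b = decreasing_lr_gap(T, L, mu, V)
    better = "constant" if a < b else "decreasing"
    print(f"T={T:5d}  constant={a:.5f}  decreasing={b:.5f}  better: {better}")
```

Under these assumptions the constant step size wins at $T = 100$ and the decreasing schedule wins at $T = 1000$.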

ex17-10

Hard

For Theorem 17.4.2's rate-privacy region, plot the boundary at $n = 50, d = 128, \sigma_z^2 = 1, \sigma^2 = 0.01$ and varying $\sigma_m^2 \in [0.1, 1000]$ . Show that rate and privacy are decoupled at fixed $|\eta|^2$ .

Hint

Rate depends on $|\eta|^2 n \sigma_z^2$ , independent of $\sigma_m^2$ .

Privacy depends on $\sigma_m^2/\sigma_z^2$ .

Solution

Rate at $|\eta|^2 = 1$

$R = \tfrac{1}{2}\log_2(1 + 50/0.01) = \tfrac{1}{2}\log_2(5001) \approx 6.14$ bits per channel use per scalar. Independent of $\sigma_m^2$.

Privacy vs. $\sigma_m^2$

$I(z_k; r) \leq \tfrac{d}{2} \log(1 + 1/(49 \sigma_m^2))$ . For $\sigma_m^2 = 0.1$ : $\approx 64 \log(1.2) \approx 64 \cdot 0.19 = 12$ nats. $\sigma_m^2 = 1$ : $\approx 64 \cdot 0.02 = 1.3$ nats. $\sigma_m^2 = 10$ : $\approx 0.13$ nats. $\sigma_m^2 = 100$ : $\approx 0.013$ nats. $\sigma_m^2 = 1000$ : $\approx 0.0013$ nats.

Boundary

The feasible region is a rectangle: rate fixed, privacy monotone in $\sigma_m^2$ .

Decoupling

This is the key engineering benefit of the CommIT scheme: rate and privacy are separate design knobs — a privacy improvement is not purchased with a rate loss. Compare with schemes (e.g., digital + cryptographic SecAgg) where privacy adds communication cost.
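The boundary points can be tabulated directly. This is a sketch using the rate and leakage expressions from the solution; the function names are illustrative, and the rate is computed in bits while the leakage is in nats, as in the text.

```python
import math

def rate_bits(n, sigma_z2, noise_var, eta_sq=1.0):
    """R = (1/2) log2(1 + |eta|^2 * n * sigma_z^2 / sigma^2); no sigma_m^2 dependence."""
    return 0.5 * math.log2(1.0 + eta_sq * n * sigma_z2 / noise_var)

def leak_nats(n, d, sigma_m2, sigma_z2=1.0):
    """(d/2) * ln(1 + sigma_z^2 / ((n - 1) * sigma_m^2))."""
    return 0.5 * d * math.log1p(sigma_z2 / ((n - 1) * sigma_m2))

n, d = 50, 128
rate = rate_bits(n, sigma_z2=1.0, noise_var=0.01)
leaks = [leak_nats(n, d, s) for s in (0.1, 1, 10, 100, 1000)]

for s, leak in zip((0.1, 1, 10, 100, 1000), leaks):
    print(f"sigma_m^2={s:7.1f}  rate={rate:.2f} bits  leak<={leak:.4f} nats")
```

The rate column is constant while the leakage column falls roughly tenfold per decade of $\sigma_m^2$, which is the decoupling described above.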

ex17-11

Hard

For a non-convex FL task (Theorem 17.2.3), compute the iterations needed to reach $\mathbb{E}[\|\nabla F\|^2] \leq \varepsilon^2$ starting from $F(\boldsymbol{\theta}_0) - F^{\star} = F_0$ . Use $L, V, \eta_{\text{lr}}$ as parameters.

Hint

$\min_t \mathbb{E}[\|\nabla F\|^2] \leq 2F_0/(\eta_{\text{lr}} T) + L\eta_{\text{lr}} V$ .

Solution

Setup

Require both terms $\leq \varepsilon^2/2$ . Gives: $\eta_{\text{lr}} \leq \varepsilon^2/(2LV)$ and $T \geq 4F_0/(\eta_{\text{lr}} \varepsilon^2)$ .

Substitute

$T \geq 4F_0 \cdot 2LV/\varepsilon^2 \cdot 1/\varepsilon^2 = 8F_0 L V/\varepsilon^4$ .

Operational

Non-convex FL takes $O(1/\varepsilon^4)$ rounds — much worse than $O(\log(1/\varepsilon))$ for strong convexity. Aggregation MSE enters as $V$ , scaling the round count linearly.

Deep-learning implication

For deep-network FL, thousands of rounds are typical. The round count is linear in $V$, so halving $V$ halves the round count; halving the aggregation MSE alone does so only when the aggregation term dominates $V$.
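The two-step recipe (pick the step size from the variance term, then the round count from the transient term) can be written as a small function. This is a sketch; the numeric values in the example are hypothetical and chosen only for illustration.

```python
import math

def nonconvex_rounds(F0, L, V, eps):
    """Return (step size, rounds) making each term of the bound at most eps^2 / 2."""
    lr = eps**2 / (2.0 * L * V)               # from L * lr * V <= eps^2 / 2
    T = math.ceil(4.0 * F0 / (lr * eps**2))   # from 2 * F0 / (lr * T) <= eps^2 / 2
    return lr, T

# Hypothetical example values, not taken from the exercise.
F0, L, V, eps = 10.0, 10.0, 0.021, 0.1
lr, T = nonconvex_rounds(F0, L, V, eps)
bound = 2.0 * F0 / (lr * T) + L * lr * V

print(f"step size = {lr:.5f}, rounds = {T}")
print(f"bound = {bound:.6f} vs eps^2 = {eps**2:.6f}")
print("rounds with V halved:", nonconvex_rounds(F0, L, V / 2, eps)[1])
```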

ex17-12

Hard

Suppose threshold scheduling systematically excludes user 1, whose gradient satisfies $\mathbf{g}_1 = \nabla F(\boldsymbol{\theta}) + \mathbf{b}$ (a bias relative to the true gradient). Derive the convergence bias introduced.

Hint

Missing user 1 means server sees $\sum_{k \geq 2} \mathbf{g}_k$ .

Expected update is biased.

Solution

Aggregate bias

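Write $\mathbf{g}_k = \nabla F + \mathbf{b}_k$ with $\mathbf{b}_1 = \mathbf{b}$, and take $\sum_k \mathbf{b}_k = 0$ w.l.o.g. Then $\sum_{k \geq 2} \mathbf{g}_k = (n-1) \nabla F + \sum_{k \geq 2} \mathbf{b}_k = (n-1)\nabla F - \mathbf{b}_1$. Normalized by $n$, the gradient estimate is therefore biased by $-\mathbf{b}_1/n$.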

Iterative amplification

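Over $T$ rounds the bias does not average out. Under $\mu$-strong convexity, a persistent gradient bias of size $\|\mathbf{b}_1\|/n$ shifts the limit point by $O(\|\mathbf{b}_1\|/(n\mu))$, a steady-state error that additional rounds do not remove.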
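The persistent offset is easy to see on a toy problem. The sketch below is an illustration under assumptions not in the exercise: a scalar quadratic $F(\theta) = \mu\theta^2/2$ with optimum at 0, noiseless per-user gradients $\mu\theta + b_k$ with biases summing to zero, and a server that always drops user 0 but still normalizes by $n$.

```python
def run_with_exclusion(biases, mu, lr, rounds, excluded=0, theta0=1.0):
    """Gradient descent on F(theta) = mu * theta^2 / 2 with one user always dropped."""
    n = len(biases)
    theta = theta0
    for _ in range(rounds):
        grads = [mu * theta + b for k, b in enumerate(biases) if k != excluded]
        theta -= lr * sum(grads) / n  # server normalizes by n, not n - 1
    return theta

n, mu = 10, 1.0
biases = [0.9] + [-0.1] * (n - 1)  # sums to zero; user 0 carries bias b_1 = 0.9
theta_limit = run_with_exclusion(biases, mu, lr=0.1, rounds=2000)

print(f"limit with user 0 excluded: {theta_limit:.6f}")
print(f"b_1 / ((n - 1) * mu)      : {biases[0] / ((n - 1) * mu):.6f}")
```

In this toy model the iterate settles at $b_1/((n-1)\mu)$ instead of the true optimum 0, regardless of how many rounds are run.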

Operational

Systematically excluding any user with biased gradient introduces a persistent convergence bias. Even if MSE is controlled, the model is drawn toward an unrepresentative optimum. The fix: $\alpha$ -fairness, even at MSE cost.

Regulatory angle

In applications with mandatory fairness (e.g., demographic parity in healthcare), systematic exclusion violates law. Fairness is not just nice-to-have; it may be a compliance requirement.

ex17-13

Hard

Sketch a hybrid scheme where the majority of the round uses AirComp but occasional digital ACK/checksum sub-rounds provide integrity. Compare its total overhead with pure digital.

Hint

Digital checksum every $K$ rounds.

AirComp for $K - 1$ rounds.

Solution

Hybrid protocol

Rounds 1 to $K-1$ : AirComp aggregation. Round $K$ : digital upload, use a cryptographic digest (Merkle root or hash) to verify that previous aggregations were not spoofed.

Overhead calculation

Digital round costs $\Theta(n \cdot d/b)$ symbols; AirComp rounds cost $\Theta(d)$ each. Total over $K$ rounds: $n d/b + (K-1) d$ symbols.

Comparison

Pure digital: $K \cdot n d / b$ symbols. Hybrid: $n d/b + (K-1) d$ symbols. Ratio: $\text{hybrid/digital} = \frac{nd/b + (K-1)d}{K n d/b} = 1/K + (K-1)b/(Kn)$. For $n = 50, b = 8, K = 10$: $1/10 + 9 \cdot 8/(10 \cdot 50) = 0.1 + 0.144 = 0.244$, so the hybrid is about four times more efficient than pure digital.

Trade-off

Hybrid gets AirComp's bandwidth gain plus periodic integrity checks. Detection of spoofing is delayed by at most $K - 1$ rounds. Trade $K$ against detection latency.
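To explore the trade between $K$ and overhead, the ratio can be swept over a few checksum periods. This is a sketch using the symbol counts from the overhead calculation; the function name and the chosen values of $K$ are illustrative.

```python
def hybrid_to_digital_ratio(n, b, K, d=1):
    """(n*d/b + (K-1)*d) / (K*n*d/b); the gradient dimension d cancels."""
    digital = K * n * d / b
    hybrid = n * d / b + (K - 1) * d
    return hybrid / digital

n, b = 50, 8
for K in (2, 5, 10, 50):
    ratio = hybrid_to_digital_ratio(n, b, K)
    print(f"K={K:3d}  hybrid/digital={ratio:.3f}  worst-case detection delay={K - 1} rounds")
```

As $K$ grows the ratio approaches $b/n = 0.16$, so most of the saving is already captured at moderate $K$.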

Open design problem

Optimal $K$ given attacker power, FL convergence impact, and integrity requirement: open. Chapter 18 revisits.

ex17-14

Hard

Theorem 17.2.1 gives a noise floor. Is the bound tight? Consider the example of FL with i.i.d. Gaussian losses and Gaussian aggregation noise: does the bound match simulations?

Hint

For exact Gaussian, the SGD recursion is exactly solvable.

Compare analytical bound with simulated expected loss.

Solution

Gaussian setting

$F(\boldsymbol{\theta}) = \tfrac{1}{2}\|\boldsymbol{\theta}\|^2$; gradients are Gaussian with mean $\boldsymbol{\theta}$ and covariance $\sigma_g^2 \mathbf{I}$. Aggregation adds Gaussian noise with covariance $\mathsf{MSE}_{\text{agg}} \mathbf{I}/n^2$.

Exact recursion

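$\boldsymbol{\theta}_{t+1} = (1 - \eta_{\text{lr}})\boldsymbol{\theta}_t - \eta_{\text{lr}} \mathbf{v}_t$, where $\mathbf{v}_t$ is zero-mean Gaussian with variance $V = \sigma_g^2/n + \mathsf{MSE}_{\text{agg}}/n^2$. Treating $V$ as the total variance $\mathbb{E}[\|\mathbf{v}_t\|^2]$, the second moment $e_t = \mathbb{E}[\|\boldsymbol{\theta}_t\|^2]$ satisfies the linear recursion $e_{t+1} = (1 - \eta_{\text{lr}})^2 e_t + \eta_{\text{lr}}^2 V$, which converges to $\eta_{\text{lr}} V/(2 - \eta_{\text{lr}}) \approx \eta_{\text{lr}} V/2$ for small $\eta_{\text{lr}}$.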

Comparison with bound

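Theorem 17.2.1 bound: $\eta_{\text{lr}} V/(2\mu) = \eta_{\text{lr}} V/2$ (using $\mu = L = 1$). Exact steady state: $\eta_{\text{lr}} V/(2 - \eta_{\text{lr}})$, which agrees with the bound to first order in $\eta_{\text{lr}}$ (within about $5\%$ at $\eta_{\text{lr}} = 0.1$). The bound is tight in this regime.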
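The second-moment recursion for this quadratic model can be iterated directly, with no sampling needed. This is a sketch under two assumptions: $V$ is treated as the total noise variance $\mathbb{E}[\|\mathbf{v}_t\|^2]$, and the numeric values ($\eta_{\text{lr}} = 0.1$, $V = 0.0202$) are borrowed from ex17-2 purely for illustration.

```python
def steady_state_second_moment(lr, V, e0=1.0, steps=10_000):
    """Iterate e_{t+1} = (1 - lr)^2 * e_t + lr^2 * V for theta_{t+1} = (1-lr)*theta_t - lr*v_t."""
    e = e0
    for _ in range(steps):
        e = (1.0 - lr) ** 2 * e + lr**2 * V
    return e

lr, V = 0.1, 0.0202
iterated = steady_state_second_moment(lr, V)
closed_form = lr * V / (2.0 - lr)
reference = lr * V / 2.0  # the eta * V / (2 * mu) expression with mu = 1

print(f"iterated    : {iterated:.6f}")
print(f"closed form : {closed_form:.6f}")
print(f"eta*V/2     : {reference:.6f}  (ratio {iterated / reference:.3f})")
```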

Operational

In regimes close to the Gaussian model, the bound is tight. In non-Gaussian settings with bounded gradients, the effective gradient variance can be smaller than $\sigma_g^2$, giving a slightly smaller floor. Practical check: simulate a realistic FL task and compare with the bound; the two typically agree within $30\%$.

ex17-15

Challenge

The joint wireless-FL problem (Definition 17.3.1) is non-convex. Discuss the decomposition, the known suboptimality gap, and what would be needed to close it.

Hint

Per-round Pareto optimality doesn't imply global optimality.

Channel correlations across rounds add complexity.

Solution

The joint structure

Variables: $\{\mathcal{S}_t, P_k^{(t)}\}_{t=0}^{T-1}$ . Objective: $\mathbb{E}[F(\boldsymbol{\theta}_T)]$ . Constraints: per-user energy, per-round power.

Known decomposition

(1) Per-round: threshold + water-filling (Pareto per round). (2) Meta: choose target MSE from convergence.

Suboptimality

Empirically, $< 5\%$ loss vs. optimal in many regimes. The gap comes from failing to coordinate across rounds: if a good-channel round is expected soon, spending less now to preserve energy pays off.

Closing the gap

Methods: dynamic programming on channel-state trajectory (exponential in state space); Lyapunov optimization for stationary channels (tractable, gives competitive bounds); reinforcement learning for non-stationary channels (heuristic, no guarantees).

Open research

Characterize the optimal joint-design for non-stationary channels with energy and fairness constraints. This is an open problem at the intersection of optimization and information theory — the Chapter 18 frontier.
