Ferkans — Interactive Telecom Tutor

ex16-1

Easy

Four users share an AirComp MAC with $|h_k|^2 = \{1.0, 0.8, 0.3, 0.5\}$ , per-user budget $P_k = 1$ , source variance $\sigma_s^2 = 1$ , and $\sigma^2 = 0.02$ . Compute $\eta^{\star}$ and $\mathsf{MSE}^{\star}$ under zero-forcing.

Show Hint

$\gamma_k = |h_k|^2 P_k / \sigma_s^2$ .

$\eta^{\star} = \sqrt{\min_k \gamma_k}$ .

Solution

Compute $\gamma_k$

$\gamma_k = \{1.0, 0.8, 0.3, 0.5\}$ .

Minimum

$\min_k \gamma_k = 0.3$ (user 3).

Results

$\eta^{\star} = \sqrt{0.3} \approx 0.548$ ; $\mathsf{MSE}^{\star} = 0.02/0.3 \approx 0.067$ .
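The three steps above can be checked numerically. This is a minimal sketch using only the values from the exercise statement; the variable names are illustrative and not part of the original.

```python
import math

# Values from the exercise statement.
h_sq = [1.0, 0.8, 0.3, 0.5]   # channel power gains |h_k|^2
P = 1.0                       # per-user power budget P_k
sigma_s_sq = 1.0              # source variance sigma_s^2
noise_var = 0.02              # receiver noise variance sigma^2

# gamma_k = |h_k|^2 P_k / sigma_s^2
gamma = [g * P / sigma_s_sq for g in h_sq]

# The weakest user limits the common scaling: eta* = sqrt(min_k gamma_k).
gamma_min = min(gamma)
weakest_user = gamma.index(gamma_min) + 1   # 1-based user index
eta_star = math.sqrt(gamma_min)
mse_star = noise_var / gamma_min

print(f"weakest user: {weakest_user}, eta* = {eta_star:.3f}, MSE* = {mse_star:.3f}")
```

Changing a single entry of `h_sq` shows directly how the worst channel alone sets the zero-forcing MSE.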

ex16-2

Easy

The harmonic mean of positive reals $s_1, \ldots, s_n$ is $H(s_1, \ldots, s_n) = n / \sum_{k=1}^{n}(1/s_k)$ . Give $\varphi_k$ and $\psi$ showing this is nomographic.

Show Hint

Each term $1/s_k$ is a pre-processing.

The denominator is a sum.

Solution

Identify $\varphi_k$

$\varphi_k(s) = 1/s$ .

Identify $\psi$

$\psi(u) = n/u$ .

Verify

$\psi(\sum_k \varphi_k(s_k)) = \psi(\sum_k 1/s_k) = n / \sum_k (1/s_k)$ . ✓ One AirComp channel use suffices.
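The decomposition can be exercised on an idealized channel. The sketch below assumes a noiseless, perfectly aligned sum (no fading, no receiver noise); the helper names `phi`, `psi`, and `aircomp_harmonic_mean` are hypothetical.

```python
from statistics import harmonic_mean

def phi(s):
    """Pre-processing applied locally by each user: phi_k(s) = 1/s."""
    return 1.0 / s

def psi(u, n):
    """Post-processing applied at the receiver: psi(u) = n/u."""
    return n / u

def aircomp_harmonic_mean(values):
    # Idealized channel (assumption): the air simply adds the
    # pre-processed values, with no noise or misalignment.
    aggregate = sum(phi(s) for s in values)
    return psi(aggregate, len(values))

values = [1.0, 2.0, 4.0]
print(aircomp_harmonic_mean(values), harmonic_mean(values))
```

Both printed numbers should agree up to floating-point rounding, since the standard-library `harmonic_mean` computes the same quantity directly.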

ex16-3

Easy

Users have uniformly distributed phase errors $\phi_k \sim \text{Uniform}[-10°, 10°]$ . Source variance $\sigma_s^2 = 1$ . Compute the misalignment MSE floor (Theorem 16.4.3).

Show Hint

Convert degrees to radians: $10° \approx 0.175$ rad.

$\mathrm{sinc}(x) = \sin(x)/x$ .

Solution

Convert

$\phi_{\max} = 10° = 0.175$ rad.

Compute sinc

$\sin(0.175)/0.175 \approx 0.1736/0.175 \approx 0.9949$ .

Floor

$\sigma_s^2 (1 - 0.9949^2) = 1 \cdot (1 - 0.9898) \approx 0.0102$ .

Comparison with AWGN floor

If the noise floor is $0.01$, misalignment adds about $100\%$, doubling the total MSE. Tightening to $\phi_{\max} = 5°$ would reduce the misalignment floor to $\approx 0.0025$, about a quarter of the noise floor.
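The floor can be evaluated for any phase-error bound. This sketch assumes the floor expression used in the solution, $\sigma_s^2(1 - \mathrm{sinc}^2(\phi_{\max}))$; the function name `misalignment_floor` is hypothetical.

```python
import math

def misalignment_floor(phi_max_deg, sigma_s_sq=1.0):
    """MSE floor sigma_s^2 * (1 - sinc(phi_max)^2), with sinc(x) = sin(x)/x."""
    x = math.radians(phi_max_deg)      # degrees -> radians
    sinc = math.sin(x) / x
    return sigma_s_sq * (1.0 - sinc ** 2)

for deg in (10.0, 5.0):
    print(f"phi_max = {deg:4.1f} deg -> floor = {misalignment_floor(deg):.4f}")
```

For small angles the floor behaves like $\phi_{\max}^2/3$, so halving the phase-error bound cuts the floor by roughly a factor of four.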

ex16-4

Easy

With $n = 100$ users, unit source variance $\sigma_s^2 = 1$ , and $|\eta|^2/\sigma^2 = 10$ (10 dB effective aggregation SNR), compute the per-user MI bound from Theorem 16.4.1.

Show Hint

Use the Gaussian MI formula.

Express bound in nats.

Solution

Per-user SNR

$|\eta|^2 \sigma_s^2 / ((n-1)|\eta|^2\sigma_s^2 + \sigma^2) = 1 / (99 + 0.1) \approx 0.0101$ .

MI bound

$\tfrac{1}{2}\ln(1 + 0.0101) \approx \tfrac{1}{2} \cdot 0.01005 \approx 0.005$ nats.

In bits

$\approx 0.0072$ bits per user per round.

Operational

Less than one percent of a bit of individual information leaks per round at $n = 100$ . For longitudinal privacy over $T$ rounds, the total leakage is bounded by $T \cdot 0.005$ nats; the user can budget a target leakage ceiling to set maximum rounds.
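The bound and its unit conversion can be reproduced as follows. This is a sketch under the same Gaussian assumptions as the solution; `per_user_mi_nats` is a hypothetical helper name, and the leakage budget of 1 nat in the last lines is an illustrative value, not from the exercise.

```python
import math

def per_user_mi_nats(n, agg_snr, sigma_s_sq=1.0):
    """Gaussian MI bound 0.5*ln(1 + SINR) for one user among n.

    agg_snr is |eta|^2 / sigma^2 (linear scale). Dividing the per-user SINR
    through by |eta|^2 leaves sigma_s^2 / ((n-1) sigma_s^2 + 1/agg_snr).
    """
    sinr = sigma_s_sq / ((n - 1) * sigma_s_sq + 1.0 / agg_snr)
    return 0.5 * math.log1p(sinr)

mi_nats = per_user_mi_nats(n=100, agg_snr=10.0)
mi_bits = mi_nats / math.log(2)
print(f"{mi_nats:.4f} nats = {mi_bits:.4f} bits per user per round")

# Illustrative longitudinal budget: rounds allowed before 1 nat has leaked.
max_rounds = math.floor(1.0 / mi_nats)
print(f"rounds within a 1-nat budget: {max_rounds}")
```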

ex16-5

Medium

A system with $n = 20$ heterogeneous users has $\min_k \gamma_k = 0.5$ at $P = 0$ dB (per-user budget $P = 1$ ). The target MSE is $0.01$ , noise variance $0.01$ . The operator has two options: (a) increase $P$ by $x$ dB (assume all $\gamma_k$ scale with $P$ ); (b) drop $k$ weakest users leaving $n - k$ users with $\min_k \gamma_k = 1.2$ (threshold scheduled). Which option needs less "cost," and what is the threshold for (a)?

Show Hint

Option (a): $\eta^2$ scales linearly with $P$ .

Target: $\sigma^2/\eta^2 \leq 0.01$ , i.e., $\eta^2 \geq 1$ .

Solution

Option (a) — increase $P$

Current: $\eta^2 = 0.5$ . Target: $\eta^2 \geq 1$ . Factor: $2\times$ in $\eta^2$ needs $2\times$ in $P$ : $3$ dB increase.

Option (b) — drop weak users

After dropping: $\min \gamma_k = 1.2$ , $\mathsf{MSE} = 0.01/1.2 \approx 0.0083$ — meets target. But aggregate is now $\sum_{k \notin \text{dropped}} s_k$ , missing some users.

Compare

Option (a) costs $3$ dB transmit power (halving battery life). Option (b) costs statistical representativeness ( $k$ users excluded). For FL, (a) is usually preferred because it preserves all users' gradients; (b) is preferred when all users' data is i.i.d. and battery is critical.

Engineering perspective

The operator faces the golden thread in miniature: power vs. statistical representativeness. No universal answer; deployment specifics determine the choice.
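The two options can be compared numerically. The sketch below uses only the numbers from the exercise and assumes, as the problem states, that every $\gamma_k$ scales linearly with $P$; the variable names are illustrative.

```python
import math

noise_var = 0.01        # sigma^2 from the exercise
target_mse = 0.01       # target MSE from the exercise
gamma_min_now = 0.5     # current min_k gamma_k at P = 0 dB
gamma_min_drop = 1.2    # min_k gamma_k after dropping the weakest users

# Zero-forcing MSE is sigma^2 / eta^2 with eta^2 = min_k gamma_k.
eta_sq_needed = noise_var / target_mse

# Option (a): the power factor equals the required factor in eta^2.
power_factor = eta_sq_needed / gamma_min_now
boost_db = 10.0 * math.log10(power_factor)

# Option (b): MSE with the remaining users only.
mse_after_drop = noise_var / gamma_min_drop
meets_target = mse_after_drop <= target_mse

print(f"(a) boost by {boost_db:.2f} dB; (b) MSE = {mse_after_drop:.4f}, ok = {meets_target}")
```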

ex16-6

Medium

Consider the quadratic aggregate $f(s_1, \ldots, s_n) = \frac{1}{n}\sum_{k=1}^{n}(s_k - c)^2$ where $c$ is a fixed constant known to all users. Give a nomographic form for $f$ and identify any pre/post-processing caveats.

Show Hint

Users can compute $(s_k - c)^2$ locally.

The aggregate is a simple average.

Solution

Pre-processing

$\varphi_k(s) = (s - c)^2 / n$ . Each user computes this locally (they know their own $s_k$ and the constant $c$ ).

Post-processing

$\psi(u) = u$ (identity).

Caveat — what if $c$ is unknown?

If $c$ must itself be computed (e.g., $c = \bar{s}$ ), then the aggregate is not nomographic — it's a two-step protocol: first AirComp to compute $\bar{s}$ , then a second AirComp round to compute the empirical variance $\sum_k (s_k - \hat{\bar{s}})^2/n$ . The second round inherits MSE from the first.

Operational

A one-step AirComp of empirical variance requires $c$ to be a known reference value (e.g., zero, or a previous-round parameter). Otherwise, plan two rounds.

ex16-7

Medium

Verify the $\sqrt{n}$ -factor differential privacy amplification (Theorem 16.4.2) for a concrete setting: sensitivity $\Delta f = 1$ , target $(\varepsilon, \delta) = (1, 10^{-5})$ at the aggregate level. What is the required per-user dither $\sigma_z$ (i) without AirComp (pre-sum digital) and (ii) with AirComp? Set $n = 100$ .

Show Hint

Gaussian mechanism: $\sigma = \Delta f \sqrt{2\ln(1.25/\delta)}/\varepsilon$ .

Per-user dither in AirComp is $\sigma^{\text{agg}}/\sqrt{n}$ .

Solution

Aggregate-level $\sigma^{\text{agg}}$

$\sigma^{\text{agg}} = 1 \cdot \sqrt{2\ln(1.25/10^{-5})}/1 = \sqrt{2 \cdot 11.7} \approx 4.84$ .

(i) Digital, without AirComp

Each user must add their own $\sigma_z = \sigma^{\text{agg}} \approx 4.84$ to their individual upload. Aggregate $\sum_k s_k$ has noise $\sqrt{n} \cdot \sigma^{\text{agg}} \approx 48.4$ — privacy still holds but noise is much larger.

(ii) AirComp with amplification

Each user adds $\sigma_z^{\text{per-user}} = \sigma^{\text{agg}}/\sqrt{n} = 4.84/10 = 0.484$ . Aggregate dither is $\sqrt{n \cdot (0.484)^2} = \sqrt{100 \cdot 0.234} = \sqrt{23.4} \approx 4.84$ — the required level.

Privacy-utility gain

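Both cases meet the same aggregate-level $(\varepsilon, \delta)$ target, but AirComp uses $10\times$ less per-user dither and ends with $10\times$ less noise in the aggregate ($\approx 4.84$ versus $\approx 48.4$). The useful gradient survives in the aggregate much better under AirComp.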
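The numbers in parts (i) and (ii) follow from the Gaussian-mechanism formula in the hint. This is a minimal sketch of that arithmetic only; `gaussian_mechanism_sigma` is a hypothetical helper name, and independent per-user dither is assumed so that variances add.

```python
import math

def gaussian_mechanism_sigma(sensitivity, eps, delta):
    """Noise std from the hint: Delta_f * sqrt(2 ln(1.25/delta)) / eps."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

n = 100
sigma_agg = gaussian_mechanism_sigma(sensitivity=1.0, eps=1.0, delta=1e-5)

# (i) Digital: every upload is protected on its own, so each user adds
# sigma_agg and the summed noise std grows by sqrt(n).
sigma_digital_per_user = sigma_agg
noise_digital_aggregate = math.sqrt(n) * sigma_digital_per_user

# (ii) AirComp: only the superposed sum is observed, so independent
# per-user dithers need to add up to sigma_agg.
sigma_aircomp_per_user = sigma_agg / math.sqrt(n)
noise_aircomp_aggregate = math.sqrt(n) * sigma_aircomp_per_user

print(f"sigma_agg = {sigma_agg:.2f}")
print(f"digital: per-user {sigma_digital_per_user:.2f}, aggregate {noise_digital_aggregate:.1f}")
print(f"AirComp: per-user {sigma_aircomp_per_user:.3f}, aggregate {noise_aircomp_aggregate:.2f}")
```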

ex16-8

Medium

AirComp computes the second moment $f = \sum_k s_k^2 / n$ via $\varphi_k(s) = s^2/n$ and $\psi = \mathrm{id}$ . But for computing the variance (an unbiased estimator), one wants $\text{Var}(s) = \mathbb{E}[s^2] - (\mathbb{E}[s])^2$ . Argue that the variance cannot be computed nomographically in one round; then sketch a two-round AirComp.

Show Hint

Variance requires $\mathbb{E}[s]$ first.

Two rounds: AirComp mean, then AirComp of $(s_k - \bar{s})^2$ .

Solution

Why one round fails

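Variance is $\frac{1}{n}\sum_k s_k^2 - (\frac{1}{n}\sum_k s_k)^2$. Each term alone is nomographic (the second via $\varphi_k(s) = s/n$ and $\psi(u) = u^2$), but the two terms need different pre-processing. One channel use delivers a single scalar sum, while the variance depends on both $\sum_k s_k^2$ and $\sum_k s_k$, so it does not fit the single-sum form $\psi(\sum_k \varphi_k(s_k))$ with simple scalar pre-processing.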

Two-round AirComp

Round 1: compute $\hat{\bar{s}} = \frac{1}{n}\sum_k s_k$ . Round 2: users compute $(s_k - \hat{\bar{s}})^2 / n$ locally; AirComp these. Result: empirical variance $\hat{\text{Var}}$ .

Inheritance of MSE

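Write $\hat{\bar{s}} = \bar{s} + e_1$ for the Round-1 estimate and $e_2$ for the Round-2 noise, and assume $\hat{\bar{s}}$ is broadcast back to the users without error. Because $\sum_k (s_k - \bar{s}) = 0$, the cross term vanishes and the Round-2 output is $\hat{\text{Var}} = \text{Var} + e_1^2 + e_2$. The Round-1 noise therefore propagates through the squared-residual computation as a positive bias $\mathbb{E}[e_1^2]$, equal to the Round-1 MSE, on top of the Round-2 noise. For small single-round MSE, $2\mathsf{MSE}$ is a conservative budget for the total.

Operational

Compound-step AirComp doubles the channel-use cost and adds error inherited from the first round. For FL of moments, prefer first-moment aggregates (gradient mean) and accept that the variance is a secondary-round quantity.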
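The two-round protocol can be sketched as a small simulation. The model here is an assumption: an idealized AirComp sum with additive Gaussian receiver noise and an error-free broadcast of the Round-1 mean back to the users. The function names `aircomp_sum` and `two_round_variance` are hypothetical.

```python
import random

def aircomp_sum(values, noise_std, rng):
    """Idealized AirComp: superposed sum plus Gaussian receiver noise."""
    return sum(values) + rng.gauss(0.0, noise_std)

def two_round_variance(samples, noise_std=0.0, seed=0):
    rng = random.Random(seed)
    n = len(samples)
    # Round 1: each user sends s_k / n; the receiver gets a noisy mean.
    mean_hat = aircomp_sum([s / n for s in samples], noise_std, rng)
    # The mean is broadcast back (assumed error-free downlink).
    # Round 2: each user sends its squared residual divided by n.
    var_hat = aircomp_sum([(s - mean_hat) ** 2 / n for s in samples],
                          noise_std, rng)
    return mean_hat, var_hat

samples = [1.0, 2.0, 3.0, 4.0]
mean_clean, var_clean = two_round_variance(samples)                  # noiseless
mean_noisy, var_noisy = two_round_variance(samples, noise_std=0.05)  # noisy
print(mean_clean, var_clean)
print(mean_noisy, var_noisy)
```

With `noise_std=0.0` the protocol returns the exact empirical mean and variance; with noise, the gap between `var_noisy` and `var_clean` combines the Round-2 noise with the error carried over from Round 1.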

ex16-9

Medium

A user has $\gamma_k = 0.1$ (worst of $n = 20$ users, others have $\gamma_k \geq 0.5$ ). Dropping this user improves MSE from $\sigma^2/0.1$ to $\sigma^2/0.5$ — a $5\times$ reduction. Alternatively, the weak user can boost transmit power by $5\times$ at $5\times$ battery cost. Under what condition is "drop" optimal vs. "boost"?

Show Hint

Opex cost: battery life vs. data exclusion.

Think in terms of total task utility.

Solution

Drop analysis

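Dropping leaves 19 users, aggregating $\sum_{k=2}^{20} s_k$ (taking the weak user as $k = 1$). MSE drops $5\times$, but the aggregate now covers only 19 of 20 users: roughly $19/20$ of the full sum when users contribute comparably, i.e., about a $5\%$ bias if the aggregate matters globally.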

Boost analysis

Boost by $5\times$ (7 dB): all users included, MSE still $5\times$ better than status quo. Battery drain for the weak user is $5\times$ .

Optimal decision

Depends on (i) whether the excluded user's data is biased (favor boost) and (ii) battery constraints (favor drop). Quantitatively: let $B$ = value of including user per round; $C$ = cost of $5\times$ battery drain. Drop if $B < C$ ; boost if $B > C$ . In practice, most FL deployments favor rotation across rounds to amortize costs.

ex16-10

Hard

The access point has $n$ receive antennas (not one). Show that the access point can now recover each individual $s_k$ (no AirComp privacy). Sketch the math.

Show Hint

Use the invertibility of a non-degenerate channel matrix.

The $n \times n$ matrix of channel gains can be inverted.

Solution

Signal model with MIMO receiver

Let $\mathbf{h}_k \in \mathbb{C}^n$ be user $k$'s channel to the $n$ receive antennas. The received signal is $\mathbf{r} = \mathbf{H} (\mathbf{b} \odot \mathbf{s}) + \mathbf{w}$, where $\mathbf{H}$ is the $n \times n$ matrix with columns $\mathbf{h}_k$, $\mathbf{b}$ is the user-transmit-scaling vector, $\mathbf{s}$ is the source vector, and $\odot$ denotes the elementwise product.

Zero-forcing decoding

If $\mathbf{H}$ has full rank (which it does w.p. 1 for random fading channels), the receiver can invert: $\mathbf{H}^{-1} \mathbf{r} = \mathbf{b} \odot \mathbf{s} + \mathbf{H}^{-1} \mathbf{w}$ . Dividing by $b_k$ recovers each $s_k$ (with noise).

Privacy collapses

The access point now directly observes each $s_k$ (up to noise). The "superposition privacy" of single-antenna AirComp is gone — MIMO fundamentally changes the threat model.

Engineering implication

Always assume the AP may one day upgrade to MIMO. Layer cryptographic aggregation (Chapter 10) or information-theoretically secure federated learning (Chapter 17) on top for defense in depth.
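The inversion argument can be demonstrated with a small noiseless example. Everything below is an illustrative assumption: $n = 4$, i.i.d. complex Gaussian channel entries, unit transmit scalings, and a hand-written solver (`solve_linear`, a hypothetical helper) so that the sketch needs only the standard library.

```python
import random

def solve_linear(A, y):
    """Solve A x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

rng = random.Random(1)
n = 4
s = [rng.gauss(0.0, 1.0) for _ in range(n)]   # private source values
b = [1.0] * n                                 # transmit scalings b_k
# H[i][k] is the gain from user k to receive antenna i (column k = h_k).
H = [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
     for _ in range(n)]

x = [bk * sk for bk, sk in zip(b, s)]         # b ⊙ s
r = [sum(H[i][k] * x[k] for k in range(n)) for i in range(n)]   # noiseless r

x_hat = solve_linear(H, r)                    # H^{-1} r
s_hat = [(xk / bk).real for xk, bk in zip(x_hat, b)]
print(max(abs(a - c) for a, c in zip(s, s_hat)))   # recovery error
```

With receiver noise added to `r`, the same steps return each $s_k$ plus the corresponding entry of $\mathbf{H}^{-1}\mathbf{w}$ divided by $b_k$, which is the "up to noise" qualifier in the solution.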

ex16-11

Hard

The exact maximum $\max(s_1, \ldots, s_n)$ is continuous but not smoothly nomographic (its nomographic representation per Kolmogorov-Arnold requires $2n+1$ terms). Estimate the required number of AirComp channel uses to compute the exact max versus the smooth-max approximation with tolerance $\varepsilon$ .

Show Hint

Smooth-max: $\max_k s_k \approx (\sum_k s_k^p)^{1/p}$ for $p \gg 1$ .

Kolmogorov's bound: $2n + 1$ .

Solution

Exact max via Kolmogorov

The max is nomographic over $2n+1$ AirComp rounds: each round computes one term of Kolmogorov's sum. Each round incurs its own AirComp MSE.

Smooth-max (single round)

$\varphi_k(s) = s^p$ gives aggregate $\sum_k s_k^p$ ; $\psi(u) = u^{1/p}$ gives smooth-max. One channel use.

Trade-off

Approximation error: $(\sum_k s_k^p)^{1/p}$ converges to $\max_k s_k$ as $p \to \infty$ , but $\varphi_k$ becomes numerically unstable (large $p$ amplifies differences).

Operational

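The worst-case relative error (all users holding the same value) is $n^{1/p} - 1 \approx \ln(n)/p$, so a tolerance $\varepsilon$ needs $p \approx \ln(n)/\varepsilon$. For $\varepsilon = 0.01$ of the max value, $p \approx 100$ suffices when at most two or three users are near the maximum, while the worst case at $n = 100$ needs $p \approx 460$. Either way it is one round versus $2n+1$ rounds, a $>99\%$ communication saving at $n = 100$, at the cost of approximation error. Choose based on the exact-vs-approximate requirement.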
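The approximation error of the smooth-max can be explored directly. This sketch assumes nonnegative source values and a noiseless sum; `smooth_max` and `worst_case_rel_error` are hypothetical helper names, and the sample values are illustrative.

```python
def smooth_max(values, p):
    """One-round smooth-max: phi_k(s) = s^p, psi(u) = u^(1/p).

    Assumes nonnegative values; the result never falls below the true max.
    """
    return sum(s ** p for s in values) ** (1.0 / p)

def worst_case_rel_error(n, p):
    """Relative overshoot when all n users hold the same value: n^(1/p) - 1."""
    return n ** (1.0 / p) - 1.0

values = [0.2, 0.5, 0.9, 1.0]
for p in (2, 10, 100):
    print(p, smooth_max(values, p))

# Worst case for n = 100 users at two choices of p.
print(worst_case_rel_error(100, 100), worst_case_rel_error(100, 500))
```

When one value clearly dominates, moderate $p$ is already accurate; the worst case arises when many users sit at the maximum, because every tied user adds to the sum.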

ex16-12

Hard

Users add Gaussian dither for DP. As transmit power $P$ increases, the aggregation SNR improves but the DP guarantee (at fixed $\sigma_z$ ) does not change. Show that for fixed $(\varepsilon, \delta)$ , increasing $P$ is Pareto-dominated by increasing $P$ and decreasing $\sigma_z$ proportionally. Derive the per-user power-dither trade-off.

Show Hint

AirComp MSE $= \sigma^2/|\eta|^2 + n\sigma_z^2$ , where the second term is the total dither contribution.

$(\varepsilon, \delta)$ -DP fixes $n\sigma_z^2$ .

Solution

Aggregate dither constraint

For aggregate-level $(\varepsilon, \delta)$ -DP, the aggregate dither variance $n \sigma_z^2 = \sigma_{\text{agg}}^2$ is fixed.

AirComp MSE decomposition

$\mathsf{MSE} = \sigma^2/|\eta|^2 + \sigma_{\text{agg}}^2$ where $|\eta|^2 = P \cdot |h|^2/\sigma_s^2$ (assuming aligned best-case channel).

Scaling

Increasing $P$ reduces the first term; the second term (DP floor) is independent of $P$ . As $P \to \infty$ , MSE $\to \sigma_{\text{agg}}^2$ — an irreducible DP floor.

Trade-off Pareto frontier

At fixed $(\varepsilon, \delta)$ , the operator can trade $P$ (battery cost) against the inability to reduce MSE below the DP floor. If MSE floor is above the downstream tolerance, reduce DP guarantee (increase $\varepsilon$ , relax privacy) to reduce $\sigma_{\text{agg}}^2$ and permit lower MSE.

Engineering

The Pareto frontier is sharp: AirComp with DP is a power-and-privacy-constrained system. Production FL must specify both simultaneously.
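The DP floor is easy to see numerically. The sketch below uses the decomposition from the solution, $\mathsf{MSE} = \sigma^2/|\eta|^2 + \sigma_{\text{agg}}^2$ with $|\eta|^2 = P|h|^2/\sigma_s^2$; all numeric values are illustrative assumptions, and `aircomp_dp_mse` is a hypothetical helper name.

```python
def aircomp_dp_mse(P, h_sq, sigma_s_sq, noise_var, sigma_agg_sq):
    """Channel-noise term sigma^2/|eta|^2 plus the fixed DP dither floor."""
    eta_sq = P * h_sq / sigma_s_sq
    return noise_var / eta_sq + sigma_agg_sq

# Illustrative values (assumptions, not from the exercise).
h_sq, sigma_s_sq, noise_var = 1.0, 1.0, 0.02
sigma_agg_sq = 0.05          # aggregate dither variance fixed by (eps, delta)

powers = [1.0, 10.0, 100.0, 1000.0]
mses = [aircomp_dp_mse(P, h_sq, sigma_s_sq, noise_var, sigma_agg_sq)
        for P in powers]
for P, m in zip(powers, mses):
    print(f"P = {P:7.1f} -> MSE = {m:.5f} (floor {sigma_agg_sq})")
```

Each tenfold increase in $P$ shrinks only the channel-noise term, so the returns diminish quickly once that term drops below the dither variance.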

ex16-13

Hard

Digital uplink aggregates $n$ gradients each quantized to $b$ bits. Assume each quantizer contributes uniform quantization noise with variance $\sigma_Q^2 = \text{range}^2 / (12 \cdot 4^b)$ . Compare digital aggregate MSE ( $n\sigma_Q^2$ ) with AirComp MSE at comparable bandwidth.

Show Hint

Digital uses $\sim nb$ channel uses per aggregation; AirComp uses 1.

Compare at equal channel-use budget.

Solution

Digital MSE

Aggregate noise: $n\sigma_Q^2 = n \cdot \text{range}^2 / (12 \cdot 4^b)$ . Bandwidth: $\Theta(nb)$ channel uses.

AirComp MSE at equal bandwidth

With the same channel-use budget $nb$ , AirComp can do $nb$ rounds, each of MSE $\sigma^2/\eta^2$ . With averaging, the effective MSE scales as $\frac{1}{nb} \cdot \sigma^2/\eta^2$ , typically much smaller than $n \sigma_Q^2$ .

Operational

For modest $n$ and $b$ , digital is fine. For large $n$ and modest MSE tolerance, AirComp's bandwidth efficiency dominates.

Realistic break-even

For $n = 100, b = 8$ : $\sigma_Q^2/\text{range}^2 = 1/(12 \cdot 4^8) \approx 1.3 \times 10^{-6}$ , aggregate $\approx 1.3 \times 10^{-4}$ . A single AirComp round with $\sigma^2/\eta^2 = 10^{-4}$ is comparable. Above $n = 100$ , AirComp typically wins on bandwidth by $10\times$ or more.
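The break-even numbers can be recomputed from the quantizer formula in the problem statement. This sketch normalizes the range to 1 and takes the single-round AirComp MSE of $10^{-4}$ from the text; `quant_noise_var` is a hypothetical helper name.

```python
def quant_noise_var(b, value_range=1.0):
    """Uniform quantization noise variance: range^2 / (12 * 4^b)."""
    return value_range ** 2 / (12.0 * 4 ** b)

n, b = 100, 8
sigma_q_sq = quant_noise_var(b)
digital_mse = n * sigma_q_sq                  # n independent quantizers summed

aircomp_single = 1e-4                         # sigma^2 / eta^2, one round
aircomp_equal_bw = aircomp_single / (n * b)   # averaged over n*b rounds

print(f"sigma_Q^2 = {sigma_q_sq:.3e}, digital aggregate = {digital_mse:.3e}")
print(f"AirComp: one round {aircomp_single:.1e}, equal bandwidth {aircomp_equal_bw:.3e}")
```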

ex16-14

Hard

Show that the AirComp MSE $\mathsf{MSE}(P) = \frac{\sigma^2}{P} \left(\min_k |h_k|^2/\sigma_s^2\right)^{-1}$ is convex and decreasing in $P$ (it is linear in $1/P$ ). Use this to derive the water-filling structure of optimal power allocation across multiple AirComp rounds with a total energy budget.

Show Hint

MSE is $\propto 1/P$ .

Convexity of $1/P$ is standard.

Solution

MSE structure

$\mathsf{MSE}(P) = \kappa/P$ where $\kappa = \sigma^2/\min_k(|h_k|^2/\sigma_s^2)$ . As $P \to \infty$ , MSE $\to 0$ .

Convexity

$d^2(\kappa/P)/dP^2 = 2\kappa/P^3 > 0$ . Convex in $P$ .

Multi-round optimization

With $T$ rounds and total energy $E$ , the best split is uniform, $P_t = E/T$ (by convexity / AM-GM). The total MSE is $T \cdot \kappa/(E/T) = T^2 \kappa/E$ . Minimizing over $T$ gives $T = 1$ : when the summed MSE is the objective, additional rounds only hurt.

Refinement: time-varying channels

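Under time-varying channels ( $\kappa_t$ varying across rounds), the allocation is no longer uniform. Minimizing $\sum_t \kappa_t/P_t$ subject to $\sum_t P_t = E$ gives $\kappa_t/P_t^2 = \lambda$ for every round, so $P_t^{\star} = E\sqrt{\kappa_t}/\sum_j \sqrt{\kappa_j} \propto \sqrt{\kappa_t}$ . Since $\kappa_t$ is large when the weakest channel is poor, this sum-MSE objective assigns more power to bad-channel rounds, the reverse of capacity water-filling.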
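The square-root rule can be checked against a uniform split. This sketch assumes the objective is the summed MSE $\sum_t \kappa_t/P_t$ under a total energy constraint $\sum_t P_t = E$ (recall that a larger $\kappa_t$ means a worse channel); the function names and the two-round example values are illustrative.

```python
import math

def optimal_powers(kappas, energy):
    """Allocate P_t proportional to sqrt(kappa_t), summing to the budget."""
    roots = [math.sqrt(k) for k in kappas]
    total = sum(roots)
    return [energy * r / total for r in roots]

def total_mse(kappas, powers):
    """Summed per-round MSE: sum_t kappa_t / P_t."""
    return sum(k / p for k, p in zip(kappas, powers))

kappas = [1.0, 4.0]      # round 2 has the worse channel (larger kappa)
energy = 3.0

p_opt = optimal_powers(kappas, energy)
p_uniform = [energy / len(kappas)] * len(kappas)

print("sqrt rule:", p_opt, total_mse(kappas, p_opt))
print("uniform  :", p_uniform, total_mse(kappas, p_uniform))
```

The optimal value equals $(\sum_t \sqrt{\kappa_t})^2/E$, which reduces to $T^2\kappa/E$ when all $\kappa_t$ are equal.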

ex16-15

Challenge

For a nomographic function $f = \psi(\sum_k \varphi_k(s_k))$ with specific $\psi, \varphi_k$ , what is the fundamental MSE lower bound? Zero-forcing achieves $|\psi'|^2 \sigma^2/\eta^2$ ; is this optimal? Discuss the open problem and possible improvements via non-linear receivers.

Show Hint

ZF is linear; $\psi$ is non-linear.

Post-processing can be Bayesian.

Solution

ZF baseline

$\mathsf{MSE}_{\text{ZF}} = |\psi'|^2 \sigma^2/\eta^2$ . Linear receiver.

Non-linear receiver

Bayesian post-processor: $\hat{f}(r) = \mathbb{E}[f | r]$ . For Gaussian $s_k$ and sub-quadratic $\psi$ , MMSE can outperform ZF by using priors on the source.

Cramer-Rao bound

Per the Cramer-Rao inequality, unbiased estimation of $f$ has MSE $\geq \mathbb{E}[|\psi'|^2] \sigma^2/\eta^2$ — matching ZF up to a constant. Biased estimation (e.g., shrinkage) can go below — but introduces bias.

Open problem

The minimum MSE for nomographic AirComp is characterized only in special cases (linear $\psi$ , Gaussian sources). The general optimal receiver is open.

Research directions

Approximate message passing (AMP) receivers, deep-learning post-processors, and structured-prior Bayesian methods all offer potential improvements. Chapter 18 revisits this frontier.
