Exercises
ex16-1
EasyFour users share an AirComp MAC with , per-user budget , source variance , and . Compute and under zero-forcing.
.
.
Compute $\gamma_k$
.
Minimum
(user 3).
Results
; .
ex16-2
EasyThe harmonic mean of positive reals is . Give and showing this is nomographic.
Each term is a pre-processing.
The denominator is a sum.
Identify $\varphi_k$
.
Identify $\psi$
.
Verify
. β One AirComp channel use suffices.
ex16-3
EasyUsers have uniformly distributed phase errors . Source variance . Compute the misalignment MSE floor (Theorem 16.4.3).
Convert degrees to radians: rad.
.
Convert
rad.
Compute sinc
.
Floor
.
Comparison with AWGN floor
If noise floor is , misalignment adds about β doubling the total MSE. Tightening to would reduce the misalignment floor to , essentially negligible.
ex16-4
EasyWith users, unit source variance , and (10 dB effective aggregation SNR), compute the per-user MI bound from Theorem 16.4.1.
Use the Gaussian MI formula.
Express bound in nats.
Per-user SNR
.
MI bound
nats.
In bits
bits per user per round.
Operational
Less than one percent of a bit of individual information leaks per round at . For longitudinal privacy over rounds, the total leakage is bounded by nats; the user can budget a target leakage ceiling to set maximum rounds.
ex16-5
MediumA system with heterogeneous users has at dB (per-user budget ). The target MSE is , noise variance . The operator has two options: (a) increase by dB (assume all scale with ); (b) drop weakest users leaving users with (threshold scheduled). Which option needs less "cost," and what is the threshold for (a)?
Option (a): scales linearly with .
Target: , i.e., .
Option (a) β increase $P$
Current: . Target: . Factor: in needs in : dB increase.
Option (b) β drop weak users
After dropping: , β meets target. But aggregate is now , missing some users.
Compare
Option (a) costs dB transmit power (halving battery life). Option (b) costs statistical representativeness ( users excluded). For FL, (a) is usually preferred because it preserves all users' gradients; (b) is preferred when all users' data is i.i.d. and battery is critical.
Engineering perspective
The operator faces the golden thread in miniature: power vs. statistical representativeness. No universal answer; deployment specifics determine the choice.
ex16-6
MediumConsider the quadratic aggregate where is a fixed constant known to all users. Give a nomographic form for and identify any pre/post-processing caveats.
Users can compute locally.
The aggregate is a simple average.
Pre-processing
. Each user computes this locally (they know their own and the constant ).
Post-processing
(identity).
Caveat β what if $c$ is unknown?
If must itself be computed (e.g., ), then the aggregate is not nomographic β it's a two-step protocol: first AirComp to compute , then a second AirComp round to compute the empirical variance . The second round inherits MSE from the first.
Operational
A one-step AirComp of empirical variance requires to be a known reference value (e.g., zero, or a previous-round parameter). Otherwise, plan two rounds.
ex16-7
MediumVerify the -factor differential privacy amplification (Theorem 16.4.2) for a concrete setting: sensitivity , target at the aggregate level. What is the required per-user dither (i) without AirComp (pre-sum digital) and (ii) with AirComp? Set .
Gaussian mechanism: .
Per-user dither in AirComp is .
Aggregate-level $\sigma^{\text{agg}}$
.
(i) Digital, without AirComp
Each user must add their own to their individual upload. Aggregate has noise β privacy still holds but noise is much larger.
(ii) AirComp with amplification
Each user adds . Aggregate dither is β the required level.
Privacy-utility gain
The aggregate has the same in both cases, but AirComp uses less per-user dither. The useful gradient survives in the aggregate much better under AirComp.
ex16-8
MediumAirComp computes the second moment via and . But for computing the variance (an unbiased estimator), one wants . Argue that the variance cannot be computed nomographically in one round; then sketch a two-round AirComp.
Variance requires first.
Two rounds: AirComp mean, then AirComp of .
Why one round fails
Variance is . The second term depends on all non-trivially (not via a sum of pre-processing). In particular, is quadratic in the sum β not nomographic.
Two-round AirComp
Round 1: compute . Round 2: users compute locally; AirComp these. Result: empirical variance .
Inheritance of MSE
The Round-2 estimate inherits additional MSE from the Round-1 noise in : perturbation propagates through the squared-residual computation. The total MSE is (first round on variance, second round's propagation).
Operational
Compound-step AirComp doubles the channel-use cost and MSE. For FL of moments, prefer first-moment aggregates (gradient mean) and accept the variance is a secondary-round quantity.
ex16-9
MediumA user has (worst of users, others have ). Dropping this user improves MSE from to β a reduction. Alternatively, the weak user can boost transmit power by at battery cost. Under what condition is "drop" optimal vs. "boost"?
Opex cost: battery life vs. data exclusion.
Think in terms of total task utility.
Drop analysis
Dropping leaves 19 users, aggregating . MSE drops , but data is β a bias if the aggregate matters globally.
Boost analysis
Boost by (7 dB): all users included, MSE still better than status quo. Battery drain for the weak user is .
Optimal decision
Depends on (i) whether the excluded user's data is biased (favor boost) and (ii) battery constraints (favor drop). Quantitatively: let = value of including user per round; = cost of battery drain. Drop if ; boost if . In practice, most FL deployments favor rotation across rounds to amortize costs.
ex16-10
HardThe access point has receive antennas (not one). Show that the access point can now recover each individual (no AirComp privacy). Sketch the math.
Use the invertibility of a non-degenerate channel matrix.
The matrix of channel gains can be inverted.
Signal model with MIMO receiver
Let be user 's channel to the receive antennas. The received signal is , where is the matrix with columns , is the user-transmit-scaling vector, and is the source vector.
Zero-forcing decoding
If has full rank (which it does w.p. 1 for random fading channels), the receiver can invert: . Dividing by recovers each (with noise).
Privacy collapses
The access point now directly observes each (up to noise). The "superposition privacy" of single-antenna AirComp is gone β MIMO fundamentally changes the threat model.
Engineering implication
Always assume the AP may one day upgrade to MIMO. Layer cryptographic aggregation (Chapter 10) or information- theoretically secure federated learning (Chapter 17) on top for defense in depth.
ex16-11
HardThe exact maximum is continuous but not smoothly nomographic (its nomographic representation per Kolmogorov-Arnold requires terms). Estimate the required number of AirComp channel uses to compute the exact max versus the smooth-max approximation with tolerance .
Smooth-max: for .
Kolmogorov's bound: .
Exact max via Kolmogorov
The max is nomographic over AirComp rounds: each round computes one term of Kolmogorov's sum. Each round incurs its own AirComp MSE.
Smooth-max (single round)
gives aggregate ; gives smooth-max. One channel use.
Trade-off
Approximation error: converges to as , but becomes numerically unstable (large amplifies differences).
Operational
For reasonable tolerances ( of the max value), suffices. One round vs. rounds β communication savings at the cost of approximation error. Choose based on exact-vs-approximate requirement.
ex16-12
HardUsers add Gaussian dither for DP. As transmit power increases, the aggregation SNR improves but the DP guarantee (at fixed ) does not change. Show that for fixed , increasing is Pareto-dominated by increasing and decreasing proportionally. Derive the per-user power-dither trade-off.
AirComp MSE with total dither contribution.
-DP fixes .
Aggregate dither constraint
For aggregate-level -DP, the aggregate dither variance is fixed.
AirComp MSE decomposition
where (assuming aligned best-case channel).
Scaling
Increasing reduces the first term; the second term (DP floor) is independent of . As , MSE β an irreducible DP floor.
Trade-off Pareto frontier
At fixed , the operator can trade (battery cost) against the inability to reduce MSE below the DP floor. If MSE floor is above the downstream tolerance, reduce DP guarantee (increase , relax privacy) to reduce and permit lower MSE.
Engineering
The Pareto frontier is sharp: AirComp with DP is a power- and-privacy-constrained system. Production FL must specify both simultaneously.
ex16-13
HardDigital uplink aggregates gradients each quantized to bits. Assume each quantizer contributes uniform quantization noise with variance . Compare digital aggregate MSE () with AirComp MSE at comparable bandwidth.
Digital uses channel uses per aggregation; AirComp uses 1.
Compare at equal channel-use budget.
Digital MSE
Aggregate noise: . Bandwidth: channel uses.
AirComp MSE at equal bandwidth
With the same channel-use budget , AirComp can do rounds β each of MSE . With averaging, the effective MSE scales as β typically much smaller than .
Operational
For modest and , digital is fine. For large and modest MSE tolerance, AirComp's bandwidth efficiency dominates.
Realistic break-even
For : , aggregate . AirComp with β comparable. Above , AirComp typically wins on bandwidth by or more.
ex16-14
HardShow that the AirComp MSE is convex in and decreasing in . Use this to derive the water-filling structure of optimal power allocation across multiple AirComp rounds with a total energy budget.
MSE is .
Convexity of is standard.
MSE structure
where . As , MSE .
Convexity
. Convex in .
Multi-round optimization
With rounds and total energy , distribute as uniformly (by convexity / AM-GM). Total MSE . Minimize over : only one round is optimal (more rounds hurts).
Refinement: time-varying channels
Under time-varying channels ( varying across rounds), water-filling assigns more power to good-channel rounds, breaking uniformity. Optimal .
ex16-15
ChallengeFor a nomographic function with specific , what is the fundamental MSE lower bound? Zero-forcing achieves ; is this optimal? Discuss the open problem and possible improvements via non-linear receivers.
ZF is linear; is non-linear.
Post-processing can be Bayesian.
ZF baseline
. Linear receiver.
Non-linear receiver
Bayesian post-processor: . For Gaussian and sub-quadratic , MMSE can outperform ZF by using priors on the source.
Cramer-Rao bound
Per the Cramer-Rao inequality, unbiased estimation of has MSE β matching ZF up to a constant. Biased estimation (e.g., shrinkage) can go below β but introduces bias.
Open problem
The minimum MSE for nomographic AirComp is characterized only in special cases (linear , Gaussian sources). The general optimal receiver is open.
Research directions
Approximate message passing (AMP) receivers, deep-learning post-processors, and structured-prior Bayesian methods all offer potential improvements. Chapter 18 revisits this frontier.