Exercises
ex17-1
EasyGiven users, gradient dimension , symbol rate /s, and digital uses bits/symbol. Compute the per-round channel-use count for each aggregator.
Digital: orthogonal symbols.
AirComp: symbols (one per coordinate).
Digital
symbols.
AirComp
symbols.
Ratio
Digital is longer.
ex17-2
EasyGiven , compute the predicted convergence floor per Theorem 17.2.1.
Floor .
Variance
.
Floor
.
ex17-3
EasyFor users, , what is the matched aggregation MSE target (where the two variance terms equal)?
Equality: .
Setup
.
Target
.
Operational
AirComp with MSE is over-engineered; MSE is under-engineered. Design around 200.
ex17-4
EasyUsers have Rayleigh-fading channels , per-user budget , source variance . For MSE target and noise , compute the threshold and the expected fraction of users included in .
.
for exponential.
Threshold
.
Fraction
.
Operational
About of users included per round. The remaining are excluded; mitigate via - fairness.
ex17-5
MediumFor Rayleigh channels, if each user participates in at least of their proportional-share rounds, estimate the average MSE relative to unconstrained. Use the exponential channel's -percentile.
Percentile: .
Factor where depends on channel.
Percentile
.
Compute $\kappa$
Under unconstrained: in large . For , this is around . .
Factor
. (Different derivation β calibrate empirically.)
Operational
About more convergence rounds to reach the same aggregate floor. In exchange: unbiased participation.
ex17-6
MediumUser has energy budget and participates in 5 rounds with . Compute the water-filling power allocation.
.
Solve for .
Order by $\gamma$
Sort descending: .
Try all 5 active
. .
Check feasibility
All powers positive: . All positive β.
Verify sum
. β
Operational
Strong rounds (high ) receive most power; the weak round (0.3) receives only β nearly idle. Water-filling naturally emphasizes good channel realizations.
ex17-7
MediumFor an FL task with target loss gap , , , , initial gap , compute the required round count .
Floor: .
.
Variance
.
Floor
. Below target.
Round count
. So rounds.
ex17-8
MediumIn the CommIT IT-secure scheme (Theorem 17.4.1), derive the mask variance needed so that the per-user MI leak is at most nats. Use .
.
Solve for .
Bound
Require .
Solve
.
Operational
times . Mask dominates per-user representation variance by an order of magnitude, giving 0.01-nat-per-round privacy.
Note
The masks don't degrade the aggregate MSE (they cancel). The mask variance is a pure privacy knob.
ex17-9
MediumCompare convergence over rounds with (a) constant and (b) decreasing . Which reaches a smaller final loss gap?
(a): Theorem 17.2.1, reaches a floor.
(b): Theorem 17.2.2, converges but slowly.
(a) Constant
Exponential decay to floor of (computed above). At : both terms of Theorem 17.2.1 matter; need rounds to reach floor. At , loss .
(b) Decreasing
. Actually slightly larger than (a) at .
Conclusion
For , constant is slightly better. For , (b) would outperform (a). Decreasing is asymptotically optimal but pays a slower rate.
Operational
Short horizon β constant . Long horizon β decreasing. The crossover depends on the target loss gap.
ex17-10
HardFor Theorem 17.4.2's rate-privacy region, plot the boundary at and varying . Show that rate and privacy are decoupled at fixed .
Rate depends on , independent of .
Privacy depends on .
Rate at $|\eta|^2 = 1$
bits per channel use per scalar. Independent of .
Privacy vs. $\sigma_m^2$
. For : nats. : nats. : nats. : nats. : nats.
Boundary
The feasible region is a rectangle: rate fixed, privacy monotone in .
Decoupling
This is the key engineering benefit of the CommIT scheme: rate and privacy are separate design knobs β a privacy improvement is not purchased with a rate loss. Compare with schemes (e.g., digital + cryptographic SecAgg) where privacy adds communication cost.
ex17-11
HardFor a non-convex FL task (Theorem 17.2.3), compute the iterations needed to reach starting from . Use as parameters.
.
Setup
Require both terms . Gives: and .
Substitute
.
Operational
Non-convex FL takes rounds β much worse than for strong convexity. Aggregation MSE enters as , scaling the round count linearly.
Deep-learning implication
For deep-network FL, thousands of rounds are typical. Aggregation MSE affects round count linearly; halving MSE halves round count.
ex17-12
HardSuppose threshold scheduling systematically excludes user 1, whose gradient satisfies (a bias relative to the true gradient). Derive the convergence bias introduced.
Missing user 1 means server sees .
Expected update is biased.
Aggregate bias
(using w.l.o.g.). The estimate of the gradient is biased by .
Iterative amplification
Over rounds, the bias accumulates into a steady-state error of .
Operational
Systematically excluding any user with biased gradient introduces a persistent convergence bias. Even if MSE is controlled, the model is drawn toward an unrepresentative optimum. The fix: -fairness, even at MSE cost.
Regulatory angle
In applications with mandatory fairness (e.g., demographic parity in healthcare), systematic exclusion violates law. Fairness is not just nice-to-have; it may be a compliance requirement.
ex17-13
HardSketch a hybrid scheme where the majority of the round uses AirComp but occasional digital ACK/checksum sub-rounds provide integrity. Compare its total overhead with pure digital.
Digital checksum every rounds.
AirComp for rounds.
Hybrid protocol
Rounds 1 to : AirComp aggregation. Round : digital upload, use a cryptographic digest (Merkle root or hash) to verify that previous aggregations were not spoofed.
Overhead calculation
Digital round costs symbols; AirComp rounds cost each. Total over rounds: symbols.
Comparison
Pure digital: symbols. Hybrid: symbols. Ratio: . For : β four times more efficient than pure digital.
Trade-off
Hybrid gets AirComp's bandwidth gain plus periodic integrity checks. Detection of spoofing is delayed by at most rounds. Trade against detection latency.
Open design problem
Optimal given attacker power, FL convergence impact, and integrity requirement: open. Chapter 18 revisits.
ex17-14
HardTheorem 17.2.1 gives a noise floor. Is the bound tight? Consider the example of FL with i.i.d.\ Gaussian losses and Gaussian aggregation noise β does the bound match simulations?
For exact Gaussian, the SGD recursion is exactly solvable.
Compare analytical bound with simulated expected loss.
Gaussian setting
; gradients are Gaussian with mean and covariance . Aggregation adds Gaussian noise with covariance .
Exact recursion
where is Gaussian with covariance . satisfies a linear recursion converging to = for .
Comparison with bound
Theorem 17.2.1 bound: (using ). Simulated: . The bound is tight in this regime.
Operational
In regimes close to the Gaussian model, the bound is tight. For non-Gaussian, bounded gradients can be tighter than , giving a slightly smaller floor. Practical measurement: simulate a realistic FL task; compare with bound; typically within .
ex17-15
ChallengeThe joint wireless-FL problem (Definition 17.3.1) is non-convex. Discuss the decomposition, the known suboptimality gap, and what would be needed to close it.
Per-round Pareto optimality doesn't imply global optimality.
Channel correlations across rounds add complexity.
The joint structure
Variables: . Objective: . Constraints: per-user energy, per-round power.
Known decomposition
(1) Per-round: threshold + water-filling (Pareto per round). (2) Meta: choose target MSE from convergence.
Suboptimality
Empirically, loss vs. optimal in many regimes. The gap comes from failing to coordinate across rounds: if a good-channel round is expected soon, spending less now to preserve energy pays off.
Closing the gap
Methods: dynamic programming on channel-state trajectory (exponential in state space); Lyapunov optimization for stationary channels (tractable, gives competitive bounds); reinforcement learning for non-stationary channels (heuristic, no guarantees).
Open research
Characterize the optimal joint-design for non-stationary channels with energy and fairness constraints. This is an open problem at the intersection of optimization and information theory β the Chapter 18 frontier.