Exercises

ex-ch09-01

Easy

Distinguish between federated learning and data-center distributed SGD. Give three differences.

ex-ch09-02

Easy

A FedAvg deployment has n = 1000 users, C = 0.1, E = 5 local epochs, model size d = 10^7 scalars, and T = 200 rounds. Compute the aggregate uplink traffic.
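To check your answer, a back-of-envelope sketch of this computation, assuming float32 (4-byte) weights and one full-model upload per selected client per round (local epochs E add compute, not uplink volume):

```python
# Uplink traffic sketch for the FedAvg setting above.
n, C, E, d, T = 1000, 0.1, 5, 10**7, 200

clients_per_round = int(C * n)                    # 100 participants per round
bytes_per_client = d * 4                          # 4e7 bytes (float32 assumed)
per_round = clients_per_round * bytes_per_client  # 4e9 bytes per round
total = per_round * T                             # 8e11 bytes overall

print(f"{total / 1e9:.0f} GB aggregate uplink")   # 800 GB
```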

ex-ch09-03

Easy

Why does FedAvg save communication compared to vanilla distributed SGD? What is the savings factor?

ex-ch09-04

Easy

Why is 8-bit quantization not considered a privacy mechanism?

ex-ch09-05

Medium

State and briefly prove: for i.i.d. user data, FedAvg with E local epochs achieves O(1/T) convergence on a μ-strongly-convex objective.

ex-ch09-06

Medium

Derive the variance floor of stochastic b-bit quantization on a gradient scalar with bounded magnitude.

ex-ch09-07

Medium

Implement pseudocode for Top-K SGD with error feedback. Explain why error feedback is needed for convergence.
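One possible shape of such an implementation, as a runnable sketch on a toy least-squares problem (the data, step size, and choice of k are illustrative, not prescribed by the exercise):

```python
# Minimal Top-K SGD with error feedback (EF) on 0.5*||Ax - b||^2 / m.
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
x = np.zeros(20)
e = np.zeros(20)          # error-feedback memory
lr, k = 0.01, 2

for _ in range(500):
    g = A.T @ (A @ x - b) / len(b)   # full gradient
    c = top_k(g + e, k)              # compress the error-corrected gradient
    e = g + e - c                    # carry the dropped mass to the next step
    x -= lr * c                      # apply only the transmitted update
```

Without the memory term `e`, coordinates that Top-K never selects receive no updates at all; error feedback accumulates their gradient mass across steps until it is large enough to be transmitted, which is what restores convergence.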

ex-ch09-08

Medium

For a model with d = 10^8 parameters, compute the communication savings from (a) 4-bit quantization, (b) top-1% sparsification, (c) both combined.
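A sketch of the arithmetic, assuming a float32 (32-bit) baseline and, for sparsification, one index of ceil(log2(d)) = 27 bits per transmitted coordinate (the index encoding is an assumption; practical codecs may use bitmaps or delta coding instead):

```python
# Compression-savings sketch for the three schemes above.
d = 10**8
baseline = d * 32              # bits, float32 dense upload

quant4 = d * 4                 # (a) 4-bit quantization, still dense
k = d // 100                   # top-1% keeps 1e6 coordinates
sparse = k * (32 + 27)         # (b) float32 value + 27-bit index each
both = k * (4 + 27)            # (c) 4-bit value + 27-bit index each

print(baseline / quant4)       # 8.0x
print(baseline / sparse)       # ~54x
print(baseline / both)         # ~103x
```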

ex-ch09-09

Medium

Explain the DLG (Deep Leakage from Gradients) attack and why it succeeds on single-sample gradients.

ex-ch09-10

Medium

Explain why federated learning's "data stays on device" architecture is not sufficient for privacy.

ex-ch09-11

Hard

Prove (informally) that FedAvg on non-IID data with large E converges to a biased fixed point. Characterize the bias.

ex-ch09-12

Hard

Compute the aggregate uplink traffic of a realistic FL deployment: n = 10^6 users, C = 0.01, E = 5, model size d = 10^9 (a small foundation model), 8-bit quantization, top-1% sparsification, T = 100 rounds.
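To check your answer, a sketch of the arithmetic, assuming top-1% sends k = d/100 coordinates, each as an 8-bit quantized value plus a ceil(log2(d)) = 30-bit index (the index encoding is an assumption, not fixed by the exercise; E again affects compute, not uplink):

```python
# Uplink sketch for the compressed FL deployment above.
n, C, E, d, T = 10**6, 0.01, 5, 10**9, 100

clients_per_round = int(C * n)             # 1e4 participants per round
k = d // 100                               # 1e7 kept coordinates
bits_per_coord = 8 + 30                    # quantized value + index
bytes_per_client = k * bits_per_coord / 8  # 4.75e7 bytes ~= 47.5 MB
total = clients_per_round * bytes_per_client * T

print(f"{total / 1e12:.1f} TB aggregate uplink")  # 47.5 TB
```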

ex-ch09-13

Hard

Design an FL experiment to empirically verify the gradient-inversion attack. Specify the dataset, architecture, and evaluation metric.

ex-ch09-14

Hard

Analyze the tension between compression and privacy: argue that aggressive compression can actually hurt privacy because it can make the gradient more, rather than less, distinguishable across training samples.

ex-ch09-15

Challenge

Open problem. Derive an information-theoretic lower bound on the leakage of gradients in FL as a function of model architecture (depth, width), loss function, and batch size. Characterize when gradient-inversion reconstruction becomes information-theoretically infeasible.