Prerequisites & Notation

Before You Begin

Chapter 9 opens Part III by establishing the federated-learning (FL) paradigm that the rest of the book builds on. The prerequisites are the distributed-SGD architecture of Chapter 1, basic SGD convergence intuition, and the gradient-coding framework of Chapter 6 (which is reused for FL straggler handling).

  • Distributed SGD architecture (Chapter 1 §1.3)

    Self-check: State the per-round communication cost of synchronous distributed SGD in terms of $n$, $d$, and $b$.

  • SGD convergence rates on strongly convex losses (Chapter 5)

    Self-check: What is the asymptotic convergence rate of SGD on a $\mu$-strongly-convex, $L$-smooth objective?

  • Gradient inversion attacks (Chapter 1 §1.3)

    Self-check: Why is a plaintext gradient not a privacy-preserving primitive for federated learning?

  • Quantization and rate-distortion fundamentals (Chapter 6)

    Self-check: For a $b$-bit uniform quantizer applied to a real-valued Gaussian scalar, what is the approximate distortion?

  • Coded gradient computation (Chapter 6)

    Self-check: How does $(s, N)$-gradient coding tolerate $s$ stragglers?
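
As a quick numerical check on the quantization self-check, the high-rate approximation predicts a distortion of $\Delta^2/12$ for a uniform quantizer with step size $\Delta$. The sketch below verifies this empirically for a standard Gaussian scalar; the clipping range and sample count are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # samples of a standard Gaussian scalar

b = 8                      # bits per sample (assumed for illustration)
R = 5.0                    # clipping range [-R, R]; 5 sigma makes overload negligible
delta = 2 * R / (2 ** b)   # quantization step size

# Uniform quantization with clipping: round to the nearest reconstruction level.
xq = np.clip(np.round(x / delta) * delta, -R, R)

mse = np.mean((x - xq) ** 2)
print(mse, delta ** 2 / 12)  # empirical distortion vs. the delta^2/12 approximation
```

At this rate the empirical mean-squared error matches $\Delta^2/12$ to within a few percent; the approximation degrades at very low bit budgets, where the quantization error is no longer approximately uniform.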

Notation for This Chapter

Chapter 9 introduces FL-specific notation. We use $n$ (lowercase) for the number of users in FL, distinguishing it from $N$ (uppercase), used for workers in coded computing (Chapters 5–8). Each FL user has a local dataset $\mathcal{D}_k$ and holds a copy of the model parameters $\mathbf{w}_t$ after the server's broadcast.

| Symbol | Meaning | Introduced |
|---|---|---|
| $n$ | Number of users (clients) in FL; lowercase to distinguish from workers $N$ | s01 |
| $C \in [0, 1]$ | Client participation rate per round; fraction of the $n$ users selected | s02 |
| $\mathcal{D}_k$ | Local dataset of user $k$ (private, stays on device) | s01 |
| $\mathbf{w}_t$ | Global model parameters at round $t$ | s02 |
| $\mathbf{w}_t^{(k)}$ | User $k$'s locally updated model after $E$ local epochs | s02 |
| $E$ | Number of local epochs per round | s02 |
| $\mathbf{g}_k$ | User $k$'s local gradient on $\mathcal{D}_k$ (same as Chapter 1) | s01 |
| $F(\mathbf{w}) = (1/n) \sum_k F_k(\mathbf{w})$ | Global objective; average of user-local objectives | s01 |
| $\epsilon$ | Quantization / sparsification relative error tolerance (Chapter 6 §6.3) | s03 |
| $d$ | Model dimensionality (number of parameters) | s01 |
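
The notation above maps directly onto one round of federated averaging: the server broadcasts $\mathbf{w}_t$, a $C$-fraction of the $n$ users each run $E$ local epochs on their own $\mathcal{D}_k$, and the server averages the returned $\mathbf{w}_t^{(k)}$. A minimal sketch under assumed values — the least-squares objective, dataset sizes, learning rate, and round count are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n users, d parameters, E local epochs, participation rate C.
n, d, E, C, lr = 10, 5, 3, 0.5, 0.05

# Each user k holds a private dataset D_k = (X_k, y_k) generated from a
# shared ground-truth model (assumption made so the global loss is learnable).
w_true = rng.standard_normal(d)
datasets = []
for _ in range(n):
    Xk = rng.standard_normal((20, d))
    yk = Xk @ w_true + 0.1 * rng.standard_normal(20)
    datasets.append((Xk, yk))

def global_objective(w):
    """F(w) = (1/n) sum_k F_k(w), each F_k a mean-squared-error loss on D_k."""
    return np.mean([np.mean((X @ w - y) ** 2) / 2 for X, y in datasets])

def local_update(w, Xk, yk):
    """User k: run E local epochs of gradient descent using only D_k."""
    w = w.copy()
    for _ in range(E):
        g_k = Xk.T @ (Xk @ w - yk) / len(yk)  # local gradient g_k
        w -= lr * g_k
    return w

w = np.zeros(d)  # global model w_0
for t in range(100):
    # Server samples a C-fraction of the n users for this round.
    selected = rng.choice(n, size=max(1, int(C * n)), replace=False)
    # Broadcast w_t; each selected user returns w_t^{(k)}.
    local_models = [local_update(w, *datasets[k]) for k in selected]
    # FedAvg aggregation: uniform average (equal-sized local datasets).
    w = np.mean(local_models, axis=0)

print(f"F(w_0) = {global_objective(np.zeros(d)):.3f}, "
      f"F(w_T) = {global_objective(w):.3f}")
```

The uniform average in the last step is valid here because every user holds the same number of samples; with unequal $|\mathcal{D}_k|$, the average would be weighted accordingly.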