Prerequisites & Notation

Before You Begin

Chapter 9 opens Part III by establishing the federated-learning (FL) paradigm that the rest of the book builds on. The prerequisites are the distributed-SGD architecture of Chapter 1, basic SGD convergence intuition, and the gradient-coding framework of Chapter 6 (which is reused for FL straggler handling).

  • Distributed SGD architecture (Chapter 1 §1.3)

    Self-check: State the per-round communication cost of synchronous distributed SGD in terms of $n$, $d$, and $b$.

  • SGD convergence rates on strongly convex losses (Chapter 5)

    Self-check: What is the asymptotic convergence rate of SGD on a $\mu$-strongly-convex, $L$-smooth objective?

  • Gradient inversion attacks (Chapter 1 §1.3)

    Self-check: Why is a plaintext gradient not a privacy-preserving primitive for federated learning?

  • Quantization and rate-distortion fundamentals (Chapter 6)

    Self-check: For a $b$-bit uniform quantizer applied to a real-valued Gaussian scalar, what is the approximate distortion?

  • Coded gradient computation (Chapter 6)

    Self-check: How does $(s, N)$-gradient coding tolerate $s$ stragglers?
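
As a quick numerical check on the quantization self-check, the high-rate approximation predicts a distortion of $\Delta^2/12$ for a uniform quantizer with step size $\Delta$. The sketch below verifies this empirically for a standard Gaussian scalar; the clipping range and sample count are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # samples of a standard Gaussian scalar

b = 8                      # bits per sample (assumed for illustration)
R = 5.0                    # clipping range [-R, R]; 5 sigma makes overload negligible
delta = 2 * R / (2 ** b)   # quantization step size

# Uniform quantization with clipping: round to the nearest reconstruction level.
xq = np.clip(np.round(x / delta) * delta, -R, R)

mse = np.mean((x - xq) ** 2)
print(mse, delta ** 2 / 12)  # empirical distortion vs. the delta^2/12 approximation
```

At this rate the empirical mean-squared error matches $\Delta^2/12$ to within a few percent; the approximation degrades at very low bit budgets, where the quantization error is no longer approximately uniform.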

Notation for This Chapter

Chapter 9 introduces FL-specific notation. We use $n$ (lowercase) for the number of users in FL, distinguishing it from $N$ (uppercase), used for workers in coded computing (Chapters 5–8). Each FL user has a local dataset $\mathcal{D}_k$ and holds a copy of the model parameters $\mathbf{w}_t$ after the server's broadcast.

| Symbol | Meaning | Introduced |
|---|---|---|
| $n$ | Number of users (clients) in FL; lowercase to distinguish from workers $N$ | s01 |
| $C \in [0, 1]$ | Client participation rate per round; fraction of the $n$ users selected | s02 |
| $\mathcal{D}_k$ | Local dataset of user $k$ (private, stays on device) | s01 |
| $\mathbf{w}_t$ | Global model parameters at round $t$ | s02 |
| $\mathbf{w}_t^{(k)}$ | User $k$'s locally updated model after $E$ local epochs | s02 |
| $E$ | Number of local epochs per round | s02 |
| $\mathbf{g}_k$ | User $k$'s local gradient on $\mathcal{D}_k$ (same as Chapter 1) | s01 |
| $F(\mathbf{w}) = (1/n) \sum_k F_k(\mathbf{w})$ | Global objective; average of user-local objectives | s01 |
| $\epsilon$ | Quantization / sparsification relative error tolerance (Chapter 6 §6.3) | s03 |
| $d$ | Model dimensionality (number of parameters) | s01 |
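
The notation above maps directly onto one round of federated averaging: the server broadcasts $\mathbf{w}_t$, a $C$-fraction of the $n$ users each run $E$ local epochs on their own $\mathcal{D}_k$, and the server averages the returned $\mathbf{w}_t^{(k)}$. A minimal sketch under assumed values — the least-squares objective, dataset sizes, learning rate, and round count are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n users, d parameters, E local epochs, participation rate C.
n, d, E, C, lr = 10, 5, 3, 0.5, 0.05

# Each user k holds a private dataset D_k = (X_k, y_k) generated from a
# shared ground-truth model (assumption made so the global loss is learnable).
w_true = rng.standard_normal(d)
datasets = []
for _ in range(n):
    Xk = rng.standard_normal((20, d))
    yk = Xk @ w_true + 0.1 * rng.standard_normal(20)
    datasets.append((Xk, yk))

def global_objective(w):
    """F(w) = (1/n) sum_k F_k(w), each F_k a mean-squared-error loss on D_k."""
    return np.mean([np.mean((X @ w - y) ** 2) / 2 for X, y in datasets])

def local_update(w, Xk, yk):
    """User k: run E local epochs of gradient descent using only D_k."""
    w = w.copy()
    for _ in range(E):
        g_k = Xk.T @ (Xk @ w - yk) / len(yk)  # local gradient g_k
        w -= lr * g_k
    return w

w = np.zeros(d)  # global model w_0
for t in range(100):
    # Server samples a C-fraction of the n users for this round.
    selected = rng.choice(n, size=max(1, int(C * n)), replace=False)
    # Broadcast w_t; each selected user returns w_t^{(k)}.
    local_models = [local_update(w, *datasets[k]) for k in selected]
    # FedAvg aggregation: uniform average (equal-sized local datasets).
    w = np.mean(local_models, axis=0)

print(f"F(w_0) = {global_objective(np.zeros(d)):.3f}, "
      f"F(w_T) = {global_objective(w):.3f}")
```

The uniform average in the last step is valid here because every user holds the same number of samples; with unequal $|\mathcal{D}_k|$, the average would be weighted accordingly.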