Prerequisites & Notation
Before You Begin
Chapter 6 specializes the polynomial-code framework of Chapter 5 to the gradient-aggregation problem of distributed SGD. The prerequisites are the matrix-multiplication background of Chapter 5, the basics of stochastic gradient descent (Chapter 1 §1.3), and standard convex/non-convex optimization vocabulary. Readers comfortable with the polynomial-code construction will find the gradient-coding extension immediate.
- Polynomial codes for matrix multiplication (Chapter 5)
Self-check: State the recovery threshold of the polynomial code for matrix multiplication with $m$ and $n$ partitions.
- Distributed SGD architecture (Chapter 1 §1.3)
Self-check: For a model with $d$ parameters and $N$ users, what is the per-round aggregate uplink cost?
- Smooth (Lipschitz-gradient) and convex / strongly-convex objectives
Self-check: Define the convergence rate of SGD on a $\mu$-strongly convex, $L$-smooth objective. (One standard rate is sketched after this list.)
- Order statistics for stragglers (Chapter 1 §1.2)
Self-check: Why does the expected wait for the $k$-th fastest worker, $\mathbb{E}[T_{(k)}]$, grow only logarithmically in $N$ when $k = N$ but converge to a constant when $k = \alpha N$ for fixed $\alpha < 1$? (A simulation follows this list.)
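For the convexity self-check, one standard statement is sketched below; the variance bound $\sigma^2$ and the step-size schedule are assumptions not spelled out in this section, and the exact constants depend on the analysis the book uses.

```latex
% A sketch, assuming F is \mu-strongly convex and L-smooth, stochastic
% gradients are unbiased with variance at most \sigma^2, and step sizes
% decay as \eta_t = \Theta(1/(\mu t)).
\mathbb{E}\!\left[F(w_T)\right] - F^{\star}
  \;=\; \mathcal{O}\!\left(\frac{\sigma^2}{\mu T}\right),
\qquad\text{versus } \mathcal{O}\!\left(1/\sqrt{T}\right)
\text{ for merely convex } F.
```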
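The order-statistics self-check can also be verified numerically. The sketch below is a Monte Carlo estimate assuming i.i.d. $\mathrm{Exp}(1)$ completion times, a common straggler model (the function name `expected_wait` and all parameters are illustrative): waiting for all $N$ workers costs $H_N \approx \ln N$, while waiting for the fastest 90% approaches the constant $-\ln 0.1 \approx 2.3$.

```python
# Monte Carlo check of straggler order statistics, assuming i.i.d. Exp(1)
# worker completion times (an illustrative model, not necessarily the book's).
import numpy as np

rng = np.random.default_rng(0)

def expected_wait(num_workers, k, trials=5_000):
    """Estimate E[T_(k)]: the time until the k-th fastest of num_workers
    i.i.d. Exp(1) workers has responded."""
    times = rng.exponential(1.0, size=(trials, num_workers))
    kth_fastest = np.sort(times, axis=1)[:, k - 1]   # k-th order statistic
    return kth_fastest.mean()

for n in (10, 100, 1000):
    wait_all = expected_wait(n, n)             # wait for every worker
    wait_90 = expected_wait(n, int(0.9 * n))   # wait for the fastest 90%
    print(f"N={n:4d}  E[T_(N)]={wait_all:4.2f} (ln N = {np.log(n):4.2f})"
          f"  E[T_(0.9N)]={wait_90:4.2f} (-ln 0.1 = {-np.log(0.1):4.2f})")
```

The contrast is the whole point of gradient coding: tolerating even a small fraction of stragglers replaces the $\ln N$ growth with a constant.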
Notation for This Chapter
Gradient-coding notation. We write $N$ for the number of users (matching the federated-learning conventions of Chapter 1), and the same count serves as the number of workers where the chapter inherits Chapter 5's matrix-multiplication framing. Each worker holds an $(s+1)/N$ fraction of the data; the encoding distributes "redundancy" across workers so that any $K$ responses suffice to reconstruct the full gradient sum. (A concrete construction is sketched after the notation table.)
| Symbol | Meaning | Introduced |
|---|---|---|
| $w_t$ | Model parameters at iteration $t$ | s01 |
| $g_i$ | Local gradient computed by worker $i$ on its data partition | s01 |
| $g = \sum_{i=1}^{N} g_i$ | Aggregated gradient, the master's target | s01 |
| $N$ | Number of workers | s01 |
| $K$ | Recovery threshold (any $K$ responses suffice) | s02 |
| $s$ | Per-worker straggler tolerance: each worker runs $s+1$ partitions | s02 |
| $B$ | Encoding matrix (assigns linear combos of partial gradients to workers) | s02 |
| $\eta$ | Learning rate | s01 |
| $T$ | Number of training rounds | s01 |
| $d$ | Model dimensionality | s01 |
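To make the notation concrete, here is a minimal sketch of one classical construction, the fractional-repetition scheme of Tandon et al. (2017), expressed with this chapter's symbols ($N$, $s$, $K = N - s$, $B$). The helper names (`fractional_repetition_B`, `decode`) and the toy parameters are illustrative; the chapter's own construction may differ.

```python
# A minimal sketch, assuming the fractional-repetition construction of
# Tandon et al. (2017): N workers, N data partitions, straggler tolerance s,
# encoding matrix B (all helper names here are illustrative).
import numpy as np

def fractional_repetition_B(N, s):
    """Build B: row i is the 0/1 indicator of the partial gradients that
    worker i computes and sums. Requires (s + 1) to divide N; workers are
    grouped in blocks of s + 1 that replicate the same s + 1 partitions,
    so each worker 'runs s + 1 partitions' as in the notation table."""
    assert N % (s + 1) == 0, "this construction needs (s + 1) | N"
    B = np.zeros((N, N))
    for group in range(N // (s + 1)):
        block = slice(group * (s + 1), (group + 1) * (s + 1))
        B[block, block] = 1.0
    return B

def decode(responses, s):
    """Recover the aggregate g = sum_i g_i from any K = N - s responses.
    At most s workers are missing and each group has s + 1 members, so
    every group has at least one survivor; sum one message per group."""
    seen, total = set(), 0.0
    for i, msg in responses.items():
        group = i // (s + 1)
        if group not in seen:
            seen.add(group)
            total += msg
    return total

# Toy run: N = 6 workers, s = 2 stragglers, scalar partial gradients g_i = i.
N, s = 6, 2
g_partials = np.arange(1.0, N + 1)        # true aggregate is 21
messages = fractional_repetition_B(N, s) @ g_partials
alive = {i: messages[i] for i in range(N) if i not in (1, 4)}  # 2 stragglers
print(decode(alive, s), "== expected", g_partials.sum())
```

The key property is that each group of $s+1$ workers replicates the same partitions, so any $K = N - s$ responses leave at least one survivor per group and the master never waits for the slowest $s$ workers.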