Prerequisites & Notation

Before You Begin

Chapter 6 specializes the polynomial-code framework of Chapter 5 to the gradient-aggregation problem of distributed SGD. The prerequisites are the matrix-multiplication background of Chapter 5, the basics of stochastic gradient descent (Chapter 1 §1.3), and standard convex/non-convex optimization vocabulary. Readers comfortable with the polynomial-code construction will find the gradient-coding extension immediate.

  • Polynomial codes for matrix multiplication (Chapter 5)

    Self-check: State the recovery threshold $K$ of the polynomial code for $\mathbf{A}^T \mathbf{B}$ with $(p, q)$ partitions.

  • Distributed SGD architecture (Chapter 1 §1.3)

    Self-check: For a model with $d$ parameters and $n$ users, what is the per-round aggregate uplink cost?

  • Smooth (Lipschitz-gradient) and convex / strongly-convex objectives

    Self-check: Define the convergence rate of SGD on a $\mu$-strongly convex, $L$-smooth objective.

  • Order statistics for stragglers (Chapter 1 §1.2)

    Self-check: Why does $\mathbb{E}[T_{(K)}]$ grow only logarithmically in $N$ when $K = N$ but converge to a constant when $K = \alpha N$?
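The contrast behind the last self-check is easy to see numerically. The sketch below is a Monte-Carlo estimate assuming i.i.d. Exp(1) worker completion times (a modeling assumption chosen for illustration): waiting for all $N$ workers costs about $H_N \approx \log N$, while waiting for the fastest $K = N/2$ stays near the constant $-\log(1/2) \approx 0.69$.

```python
import random

def expected_kth_completion(n_workers, k, trials=2000, seed=0):
    """Monte-Carlo estimate of E[T_(k)]: the expected time until the
    k-th fastest of n i.i.d. Exp(1) workers finishes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = sorted(rng.expovariate(1.0) for _ in range(n_workers))
        total += times[k - 1]
    return total / trials

for n in (10, 100, 1000):
    h_n = sum(1 / i for i in range(1, n + 1))       # harmonic number ~ log n
    full = expected_kth_completion(n, n)            # wait for all N workers
    half = expected_kth_completion(n, n // 2)       # wait for K = N/2 workers
    print(f"N={n:4d}  E[T_(N)]≈{full:.2f} (H_N={h_n:.2f})  E[T_(N/2)]≈{half:.2f}")
```

The $E[T_{(N)}]$ column tracks $H_N$ and keeps growing, while the $E[T_{(N/2)}]$ column is essentially flat across three orders of magnitude of $N$.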

Notation for This Chapter

Gradient-coding notation. We write $n$ for the number of users (matching the federated-learning conventions of Chapter 1) but use $N$ when the chapter inherits Chapter 5's matrix-multiplication framing. Each worker holds a fraction $1/N$ of the data; the encoding distributes "redundancy" across workers so that any $K$ responses suffice to reconstruct the full gradient sum.
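As a concrete toy illustration of "any $K$ responses suffice," the sketch below hand-picks an encoding matrix $\mathbf{B}$ for $N = 3$ workers and $K = 2$; the specific coefficients are an assumption for illustration, not the chapter's construction. Decoding solves a small linear system for the combining weights.

```python
import itertools
import numpy as np

# Encoding matrix B: row i gives the coefficients worker i applies to
# the three partial gradients (illustrative coefficients, chosen by hand).
B = np.array([
    [0.5, 1.0,  0.0],   # worker 0 sends g1/2 + g2
    [0.0, 1.0, -1.0],   # worker 1 sends g2 - g3
    [0.5, 0.0,  1.0],   # worker 2 sends g1/2 + g3
])

rng = np.random.default_rng(0)
g = rng.standard_normal((3, 4))       # 3 partial gradients, model dimension d = 4
full = g.sum(axis=0)                  # target: g1 + g2 + g3
sent = B @ g                          # what each worker transmits

# Any K = 2 responses suffice: find weights a with a @ B[S] = [1, 1, 1],
# then combine the two received codewords.
for S in itertools.combinations(range(3), 2):
    a, *_ = np.linalg.lstsq(B[list(S)].T, np.ones(3), rcond=None)
    recovered = a @ sent[list(S)]
    assert np.allclose(recovered, full)
    print(f"workers {S}: decode coeffs {np.round(a, 2)} -> full gradient recovered")
```

The same pattern scales to larger $N$ and $K$: the scheme is valid exactly when, for every size-$K$ subset $S$ of rows, the all-ones vector lies in the row span of $\mathbf{B}_S$.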

| Symbol | Meaning | Introduced |
| --- | --- | --- |
| $\mathbf{w}_t \in \mathbb{R}^d$ | Model parameters at iteration $t$ | §6.1 |
| $\mathbf{g}_k(\mathbf{w}_t)$ | Local gradient computed by worker $k$ on its data partition | §6.1 |
| $\mathbf{G}_t = \sum_k \mathbf{g}_k$ | Aggregated gradient (the master's target) | §6.1 |
| $N$ | Number of workers | §6.1 |
| $K$ | Recovery threshold (any $K$ responses suffice) | §6.2 |
| $s$ | Per-worker straggler tolerance: each worker processes $s$ partitions | §6.2 |
| $\mathbf{B} \in \mathbb{R}^{N \times N}$ | Encoding matrix (assigns linear combinations of partial gradients to workers) | §6.2 |
| $\eta$ | Learning rate | §6.1 |
| $T$ | Number of training rounds | §6.1 |
| $d$ | Model dimensionality | §6.1 |
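To tie the notation together, here is a minimal uncoded distributed-SGD loop from the master's point of view, run on a synthetic least-squares problem (the data, shard sizes, and hyperparameters are illustrative assumptions, not values from the chapter):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, T, eta = 4, 3, 300, 0.1         # workers N, dimension d, rounds T, learning rate eta

# Synthetic least-squares data, split evenly across the N workers.
X = rng.standard_normal((40, d))
y = X @ np.array([1.0, -2.0, 0.5])    # noiseless labels from known true weights
shards = np.array_split(np.arange(40), N)

w = np.zeros(d)                       # w_0
for t in range(T):
    # g_k(w_t): local gradient of worker k on its data partition
    g = [X[s].T @ (X[s] @ w - y[s]) / len(s) for s in shards]
    G = sum(g)                        # G_t = sum_k g_k, formed at the master
    w = w - eta * G / N               # gradient step with learning rate eta

print(np.round(w, 3))                 # converges to the true weights [1, -2, 0.5]
```

Gradient coding changes only the aggregation step: instead of waiting for all $N$ local gradients, the master reconstructs $\mathbf{G}_t$ from any $K$ encoded responses using $\mathbf{B}$.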