Chapter Summary
Key Points
1. Distributed gradient aggregation is a matrix-vector product. The aggregate is $\mathbf{1}^\top G$, where the matrix $G$ stacks the partial gradients as its rows. Coded gradient computation is therefore Chapter 5's polynomial codes specialized to the matrix-vector setting.
2. Tandon et al.'s cyclic gradient coding achieves recovery threshold $n - s$ at storage $(s+1)/n$. Each worker stores $s+1$ partitions, computes $s+1$ partial gradients, and sends a single linear combination. The cyclic structure ensures that any $n - s$ responses span a subspace containing $\mathbf{1}^\top$, recovering the all-ones combination, i.e., the full sum (see the first sketch after this list). The recovery threshold matches the information-theoretic lower bound at this storage level.
3. Approximate gradient coding trades exactness for storage. Random sparse encoding with density $p$ achieves $\epsilon$-relative-error aggregation with per-worker storage sub-linear in $n$ for a fixed straggler fraction (see the second sketch after this list). The cost is an $O(\epsilon)$ asymptotic floor in SGD convergence (for strongly convex losses).
4. Coded gradient computation slots into a larger FL protocol stack. It composes additively with secure aggregation (Chapter 10) and is the foundational layer for ByzSecAgg (Chapter 11). The polynomial-code framework supports this composition via the linearity of all involved primitives.
5. Choose the variant by the constraint. Plain SGD when stragglers are rare; exact gradient coding when storage is abundant and exact aggregation is required; approximate gradient coding when storage is tight and bounded error is acceptable; ByzSecAgg when privacy and Byzantine resilience are also needed.
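To make points 1 and 2 concrete, here is a minimal NumPy sketch of cyclic gradient coding. It follows Tandon et al.'s randomized construction in spirit only: the helper names (`cyclic_encoding_matrix`, `decode`) and parameter choices are illustrative, and it assumes, as that analysis guarantees with probability 1, that the random coefficients yield a decodable code.

```python
# Minimal sketch of cyclic gradient coding (illustrative, not the
# paper's code). n workers, n partitions, tolerates any s stragglers.
import numpy as np

rng = np.random.default_rng(0)

def cyclic_encoding_matrix(n, s):
    """n x n matrix B: row i is supported on the cyclic window
    {i, ..., i+s}, and every (n-s)-row subset generically spans a
    subspace containing the all-ones vector."""
    # Random s x n matrix H with zero row sums, so H @ ones = 0 and
    # the all-ones vector lies in null(H).
    H = rng.standard_normal((s, n))
    H -= H.mean(axis=1, keepdims=True)
    B = np.zeros((n, n))
    for i in range(n):
        window = [(i + j) % n for j in range(s + 1)]
        # Row i spans the (generically 1-dim) null space of H restricted
        # to its window, so every row of B lies in the (n-s)-dim null(H).
        B[i, window] = np.linalg.svd(H[:, window])[2][-1]
    return B

def decode(B, responders, messages):
    """Recover the full sum 1^T G from any n - s responses by solving
    a^T B_F = 1^T, then combining the received messages with a."""
    a = np.linalg.lstsq(B[responders].T, np.ones(B.shape[0]), rcond=None)[0]
    return a @ messages

n, s, d = 6, 2, 4
G = rng.standard_normal((n, d))          # row j = partial gradient g_j
B = cyclic_encoding_matrix(n, s)
messages = B @ G                         # worker i sends the d-vector B[i] @ G
alive = rng.choice(n, size=n - s, replace=False)   # any n - s responders
assert np.allclose(decode(B, alive, messages[alive]), G.sum(axis=0))
```

The assert checks the defining property of point 2: any $n - s$ responders suffice, and the decoded combination is exactly the all-ones row applied to $G$, i.e., the full gradient sum of point 1.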
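Point 3 can be sketched the same way. The following uses Bernoulli-sparse encoding as one representative random construction; the density parameter `p` and the unbiased rescaling are our illustration, and the chapter's specific scheme may differ in its details.

```python
# Sketch of approximate gradient coding via Bernoulli-sparse encoding
# (one representative random construction; parameters are illustrative).
import numpy as np

rng = np.random.default_rng(1)
n, d, p = 200, 32, 0.1          # workers/partitions, gradient dim, density

G = rng.standard_normal((n, d))              # row j = partial gradient g_j
B = (rng.random((n, n)) < p).astype(float)   # worker i stores ~p*n partitions
messages = B @ G                             # each worker sums its stored g_j

alive = rng.choice(n, size=int(0.8 * n), replace=False)  # 20% stragglers
# E[each message] = p * sum_j g_j, so rescaling the received sum by
# 1 / (len(alive) * p) gives an unbiased estimate of the full aggregate.
estimate = messages[alive].sum(axis=0) / (len(alive) * p)
true = G.sum(axis=0)
print(np.linalg.norm(estimate - true) / np.linalg.norm(true))
# Nonzero residual: the accuracy cost of sparse storage.
```

The residual relative error printed here is the per-step analogue of the asymptotic floor in point 3: the estimator is unbiased, but its variance does not vanish at fixed density, so SGD converges only to a neighborhood of the optimum.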
Looking Ahead
Part II concludes with Chapter 7 (coded data shuffling, a CommIT-tagged result by Wan, Tuninetti, and Caire) and Chapter 8 (coded convolution and tensor operations). Both build on the polynomial-code foundation of Chapters 5–6 and extend the coded-computing framework to the data-movement and higher-order tensor problems that arise in modern ML pipelines. Part III then turns to the privacy axis of the golden thread, starting with Chapter 9's federated-learning overview and culminating in Chapter 11's ByzSecAgg result, which directly composes the coded gradient computation of this chapter with the secret-sharing framework of Chapter 3.