Chapter Summary
Key Points
1. Distributed gradient aggregation is a matrix-vector product. The aggregate is $\mathbf{1}^\top G$, where the matrix $G$ stacks the partial gradients as its rows. Coded gradient computation is therefore Chapter 5's polynomial codes specialized to the matrix-vector setting.
2. Tandon et al.'s cyclic gradient coding achieves recovery threshold $n - s$ at storage $(s+1)/n$. Each worker stores $s+1$ partitions, computes $s+1$ partial gradients, and sends a single linear combination. The cyclic structure ensures that any $n - s$ responses span a subspace containing $\mathbf{1}^\top$, recovering the all-ones combination, i.e., the full sum (see the first sketch after this list). The recovery threshold matches the information-theoretic lower bound at this storage level.
3. Approximate gradient coding trades exactness for storage. Random sparse encoding with density $p$ achieves $\epsilon$-relative-error aggregation with per-worker storage sub-linear in $n$ for a fixed straggler fraction (see the second sketch after this list). The cost is an $O(\epsilon)$ asymptotic floor in SGD convergence (for strongly convex losses).
4. Coded gradient computation slots into a larger FL protocol stack. It composes additively with secure aggregation (Chapter 10) and is the foundational layer for ByzSecAgg (Chapter 11). The polynomial-code framework supports this composition via the linearity of all involved primitives.
5. Choose the variant by the constraint. Plain SGD when stragglers are rare; exact gradient coding when storage is abundant and exact aggregation is required; approximate gradient coding when storage is tight and bounded error is acceptable; ByzSecAgg when privacy and Byzantine resilience are also needed.
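To make points 1 and 2 concrete, here is a minimal NumPy sketch of cyclic gradient coding. It follows Tandon et al.'s randomized construction in spirit only: the helper names (`cyclic_encoding_matrix`, `decode`) and parameter choices are illustrative, and it assumes, as that analysis guarantees with probability 1, that the random coefficients yield a decodable code.

```python
# Minimal sketch of cyclic gradient coding (illustrative, not the
# paper's code). n workers, n partitions, tolerates any s stragglers.
import numpy as np

rng = np.random.default_rng(0)

def cyclic_encoding_matrix(n, s):
    """n x n matrix B: row i is supported on the cyclic window
    {i, ..., i+s}, and every (n-s)-row subset generically spans a
    subspace containing the all-ones vector."""
    # Random s x n matrix H with zero row sums, so H @ ones = 0 and
    # the all-ones vector lies in null(H).
    H = rng.standard_normal((s, n))
    H -= H.mean(axis=1, keepdims=True)
    B = np.zeros((n, n))
    for i in range(n):
        window = [(i + j) % n for j in range(s + 1)]
        # Row i spans the (generically 1-dim) null space of H restricted
        # to its window, so every row of B lies in the (n-s)-dim null(H).
        B[i, window] = np.linalg.svd(H[:, window])[2][-1]
    return B

def decode(B, responders, messages):
    """Recover the full sum 1^T G from any n - s responses by solving
    a^T B_F = 1^T, then combining the received messages with a."""
    a = np.linalg.lstsq(B[responders].T, np.ones(B.shape[0]), rcond=None)[0]
    return a @ messages

n, s, d = 6, 2, 4
G = rng.standard_normal((n, d))          # row j = partial gradient g_j
B = cyclic_encoding_matrix(n, s)
messages = B @ G                         # worker i sends the d-vector B[i] @ G
alive = rng.choice(n, size=n - s, replace=False)   # any n - s responders
assert np.allclose(decode(B, alive, messages[alive]), G.sum(axis=0))
```

The assert checks the defining property of point 2: any $n - s$ responders suffice, and the decoded combination is exactly the all-ones row applied to $G$, i.e., the full gradient sum of point 1.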
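Point 3 can be sketched the same way. The following uses Bernoulli-sparse encoding as one representative random construction; the density parameter `p` and the unbiased rescaling are our illustration, and the chapter's specific scheme may differ in its details.

```python
# Sketch of approximate gradient coding via Bernoulli-sparse encoding
# (one representative random construction; parameters are illustrative).
import numpy as np

rng = np.random.default_rng(1)
n, d, p = 200, 32, 0.1          # workers/partitions, gradient dim, density

G = rng.standard_normal((n, d))              # row j = partial gradient g_j
B = (rng.random((n, n)) < p).astype(float)   # worker i stores ~p*n partitions
messages = B @ G                             # each worker sums its stored g_j

alive = rng.choice(n, size=int(0.8 * n), replace=False)  # 20% stragglers
# E[each message] = p * sum_j g_j, so rescaling the received sum by
# 1 / (len(alive) * p) gives an unbiased estimate of the full aggregate.
estimate = messages[alive].sum(axis=0) / (len(alive) * p)
true = G.sum(axis=0)
print(np.linalg.norm(estimate - true) / np.linalg.norm(true))
# Nonzero residual: the accuracy cost of sparse storage.
```

The residual relative error printed here is the per-step analogue of the asymptotic floor in point 3: the estimator is unbiased, but its variance does not vanish at fixed density, so SGD converges only to a neighborhood of the optimum.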
Looking Ahead
Part II concludes with Chapter 7 (coded data shuffling, a CommIT-tagged result by Wan, Tuninetti, and Caire) and Chapter 8 (coded convolution and tensor operations). Both build on the polynomial-code foundation of Chapters 5–6 and extend the coded-computing framework to the data-movement and higher-order tensor problems that arise in modern ML pipelines. Part III then turns to the privacy axis of the golden thread, starting with Chapter 9's federated-learning overview and culminating in Chapter 11's ByzSecAgg result, which directly composes the coded gradient computation of this chapter with the secret-sharing framework of Chapter 3.