Chapter Summary

Key Points

  • 1.

    Distributed gradient aggregation is a matrix-vector product. The aggregate $\mathbf{G} = \sum_k \mathbf{g}_k$ equals $\mathbf{G}_{\text{stack}} \mathbf{1}_N$, where $\mathbf{G}_{\text{stack}}$ stacks the partial gradients as its columns (see the first sketch after this list). Coded gradient computation is therefore Chapter 5's polynomial codes specialized to the matrix-vector setting.

  • 2.

    Tandon et al.'s cyclic gradient coding achieves recovery threshold $K = N - s$ at storage fraction $(s+1)/N$. Each worker stores $s+1$ partitions, computes $s+1$ partial gradients, and sends a single linear combination. The cyclic structure ensures that the all-ones vector $\mathbf{1}_N$ lies in the span of any $N - s$ rows of the encoding matrix, so the full sum is recoverable from any $N - s$ responses (a randomized construction is sketched after this list). The recovery threshold matches the information-theoretic lower bound at this storage level.

  • 3.

    Approximate gradient coding trades exactness for storage. Random sparse encoding with density $\rho = \Theta\!\left(\frac{1}{(N-s)\epsilon}\right)$ achieves $\epsilon$-relative-error aggregation with per-worker storage $\Theta(\log N / \epsilon)$, which is sub-linear in $N$ for a fixed straggler fraction (a Monte-Carlo sketch after this list illustrates the trade-off). The cost is an $\mathcal{O}(\epsilon)$ asymptotic error floor in SGD convergence (for strongly convex losses).

  • 4.

    Coded gradient computation slots into a larger FL protocol stack. It composes additively with secure aggregation (Chapter 10) and is the foundational layer for ByzSecAgg (Chapter 11). The polynomial-code framework supports this composition via the linearity of all involved primitives.

  • 5.

    Choose the variant by the constraint. Plain SGD when stragglers are rare; exact coded gradient when storage is abundant and exact aggregation is required; approximate coded gradient when storage is tight and bounded error is acceptable; ByzSecAgg when privacy and Byzantine resilience are also needed.
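
A minimal NumPy sketch of Key Point 1 (the sizes and variable names are illustrative assumptions, not the chapter's notation): summing the partial gradients coincides with multiplying the stacked-gradient matrix by the all-ones vector.

```python
# Check the identity G = G_stack @ 1_N: aggregating partial gradients
# is a matrix-vector product with the all-ones vector.
import numpy as np

rng = np.random.default_rng(0)
N, d = 8, 5                       # N workers, gradient dimension d (illustrative)
g = rng.standard_normal((N, d))   # g[k] = partial gradient from worker k

G_sum = g.sum(axis=0)             # direct aggregation: sum_k g_k
G_stack = g.T                     # d x N matrix whose columns are the g_k
G_mv = G_stack @ np.ones(N)       # matrix-vector product with 1_N

assert np.allclose(G_sum, G_mv)   # the two aggregates coincide
```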
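
For Key Point 2, the following is a hedged sketch of a randomized cyclic construction in the spirit of Tandon et al.; the chapter's own construction may differ in its details, but the decoding condition it verifies (the all-ones vector lies in the span of any $N - s$ rows of the encoding matrix) is the same.

```python
# Randomized cyclic gradient code: worker i covers partitions i, ..., i+s (mod N)
# and sends one linear combination b_i^T [g_1; ...; g_N].  Each row b_i is placed
# in the null space of a random matrix H with H @ 1_N = 0, so 1_N lies in the
# span of any N - s rows and the exact sum is decodable from any N - s responses.
import itertools
import numpy as np

rng = np.random.default_rng(1)
N, s, d = 6, 2, 4                     # workers, stragglers tolerated, gradient dim
g = rng.standard_normal((N, d))       # g[k] = partial gradient of partition k
G = g.sum(axis=0)                     # target: full gradient sum

# Random H in R^{s x N} whose rows sum to zero, so 1_N is in null(H).
H = rng.standard_normal((s, N))
H -= H.mean(axis=1, keepdims=True)

# Row i of B: support {i, ..., i+s} (mod N), entry 1 at column i, remaining
# entries chosen so that H @ b_i = 0.
B = np.zeros((N, N))
for i in range(N):
    support = [(i + j) % N for j in range(s + 1)]
    others = support[1:]
    B[i, i] = 1.0
    B[i, others] = np.linalg.solve(H[:, others], -H[:, i])

coded = B @ g                         # coded[i] = message worker i would send

# For every straggler pattern, recover 1_N from the surviving rows of B and
# hence the exact sum from the surviving messages.
patterns = list(itertools.combinations(range(N), N - s))
for pattern in patterns:
    S = list(pattern)
    a, *_ = np.linalg.lstsq(B[S].T, np.ones(N), rcond=None)   # solve a^T B_S = 1^T
    assert np.allclose(a @ B[S], np.ones(N), atol=1e-6)
    assert np.allclose(a @ coded[S], G, atol=1e-6)
print(f"exact recovery verified for all {len(patterns)} straggler patterns")
```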
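
For Key Point 3, a toy Monte-Carlo sketch of sparse random encoding; the Bernoulli mask and the rescaled decoder below are illustrative assumptions rather than the chapter's construction or its $\Theta(\log N / \epsilon)$ scheme, and are meant only to show how the relative error of the approximate aggregate shrinks as the density $\rho$ (and hence per-worker storage and computation) grows.

```python
# Approximate aggregation with a sparse Bernoulli(rho) mask: each worker holds a
# random rho-fraction of the partitions and sends the sum of its partial
# gradients; the server rescales the surviving responses into an unbiased
# estimate of the full sum.  Sparser masks give larger relative error.
import numpy as np

rng = np.random.default_rng(2)
N, d, s = 200, 10, 20                 # workers/partitions, gradient dim, stragglers
g = rng.standard_normal((N, d))
G = g.sum(axis=0)                     # exact aggregate

for rho in (0.05, 0.2, 0.8):
    M = (rng.random((N, N)) < rho).astype(float)   # worker i holds partition k iff M[i, k] = 1
    coded = M @ g                                  # worker i sends the sum of its partial gradients
    survivors = rng.choice(N, size=N - s, replace=False)
    G_hat = coded[survivors].sum(axis=0) / ((N - s) * rho)   # rescaled (unbiased) estimate
    rel_err = np.linalg.norm(G_hat - G) / np.linalg.norm(G)
    print(f"rho = {rho:4.2f}   relative error = {rel_err:.3f}")
```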

Looking Ahead

Part II concludes with Chapter 7 (coded data shuffling — a CommIT-tagged result by Wan / Tuninetti / Caire) and Chapter 8 (coded convolution / tensor operations). Both build on the polynomial-code foundation of Chapters 5–6 and extend the coded-computing framework to the data-movement and higher-order tensor problems that arise in modern ML pipelines. Part III then turns to the privacy axis of the golden thread, starting with Chapter 9's federated-learning overview and culminating in Chapter 11's ByzSecAgg result — which directly composes the coded-gradient computation of this chapter with the secret-sharing framework of Chapter 3.