Chapter Summary

Key Points

1. MapReduce identifies the shuffle as the dominant cost. With $N$ workers and per-worker storage $\mu = 1/N$, the uncoded shuffle sends $V(1 - 1/N)$ bits across the network, which is essentially the entire intermediate file. This is the cost that coded shuffling (Chapter 7) is designed to attack, and the first instance of the computation/communication trade-off that runs through Part II; a short numerical sketch follows this list.

2. The synchronous straggler penalty grows logarithmically with $N$. For i.i.d. exponential service times with rate $\lambda$, the expected wait for the slowest of $N$ workers is $H_N/\lambda \sim (\ln N)/\lambda$, where $H_N$ is the $N$-th harmonic number. Tolerating a constant fraction of stragglers, so that any $K = \alpha N$ responses suffice, converts the logarithmic tail into a constant $-\ln(1-\alpha)/\lambda$, motivating the coded-computing programme of Chapters 5–8; the second sketch after this list checks both limits by simulation.

3. Distributed SGD is dominated by gradient traffic, not data movement. The per-round communication cost is $2ndb$ bits for $n$ workers, model size $d$, and $b$ bits per scalar. For modern deep networks this is orders of magnitude larger than the dataset itself, making gradient compression, secure aggregation, and over-the-air computation engineering necessities; the third sketch after this list puts rough numbers on it.

4. Gradient inversion attacks break the "FL is private by default" myth. The gradient $\mathbf{g}_k = \nabla \ell(\mathbf{w}; \xi_k)$ contains enough information to reconstruct individual training samples. Privacy in federated learning must come from a cryptographic or information-theoretic protocol layered on top, not from the architecture alone; the fourth sketch after this list shows the leak in closed form for a single linear layer.

5. Three challenges, one thread. Communication efficiency, robustness to stragglers and Byzantine workers, and privacy are coupled trade-offs. Every chapter of this book quantifies one edge of this triangle and identifies the optimal achievable operating point. The formal information-theoretic framework that makes this precise begins in Chapter 2.
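
As a quick numerical illustration of Key Point 1, the following sketch evaluates the uncoded shuffle load $V(1 - 1/N)$; the values of $V$ and $N$ below are illustrative, not taken from the chapter.

```python
def uncoded_shuffle_bits(V: float, N: int) -> float:
    """Bits crossing the network in an uncoded shuffle.

    With per-worker storage mu = 1/N, each worker already holds a 1/N
    fraction of the intermediate values it needs, so the remaining
    V * (1 - 1/N) bits must be fetched from the other workers.
    """
    return V * (1.0 - 1.0 / N)

V = 1e12  # illustrative: 1 Tb of intermediate values
for N in (2, 10, 100, 1000):
    print(f"N={N:4d}: shuffle load = {uncoded_shuffle_bits(V, N) / V:.3f} * V")
```

As $N$ grows the load approaches the full file $V$: distributing the work does not, by itself, reduce what must be shuffled.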
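
The straggler scaling in Key Point 2 is easy to check by simulation. Below is a minimal Monte Carlo sketch, assuming i.i.d. $\mathrm{Exp}(\lambda)$ service times; the parameters $\lambda$, $N$, $\alpha$, and the trial count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, N, alpha, trials = 1.0, 1000, 0.9, 2000
K = int(alpha * N)

# Each row is one round: N i.i.d. exponential service times, sorted.
times = np.sort(rng.exponential(1 / lam, size=(trials, N)), axis=1)
wait_all = times[:, -1].mean()    # wait for the slowest of all N workers
wait_K = times[:, K - 1].mean()   # wait until any K of N have responded

H_N = (1 / np.arange(1, N + 1)).sum()  # N-th harmonic number
print(f"all N : sim {wait_all:.3f}  theory H_N/lam      = {H_N / lam:.3f}")
print(f"K of N: sim {wait_K:.3f}  theory -ln(1-a)/lam = {-np.log(1 - alpha) / lam:.3f}")
```

The first line grows like $(\ln N)/\lambda$ as $N$ increases; the second stays pinned near the constant $-\ln(1-\alpha)/\lambda$ regardless of $N$.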
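
To make Key Point 3 concrete, the sketch below plugs illustrative numbers into the $2ndb$ formula; the worker count, model size, and precision are assumptions for illustration, and the factor of 2 is read here as upload plus download per round.

```python
def round_traffic_bits(n: int, d: int, b: int) -> int:
    # 2*n*d*b: each of n workers uploads a d-dimensional gradient of
    # b-bit scalars and downloads the updated model (factor of 2).
    return 2 * n * d * b

# Illustrative numbers: 32 workers, a 100M-parameter model, fp32 scalars.
bits = round_traffic_bits(n=32, d=100_000_000, b=32)
print(f"{bits / 8 / 1e9:.1f} GB per round")  # 25.6 GB, every single round
```

At thousands of rounds, this traffic dwarfs most training datasets, which are sent nowhere at all.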
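
Key Point 4 can be made concrete without any optimization-based attack. For a fully connected layer $\mathbf{z} = \mathbf{W}\mathbf{x} + \mathbf{b}$, backpropagation gives $\partial\ell/\partial\mathbf{W} = \mathbf{g}_z \mathbf{x}^\top$ and $\partial\ell/\partial\mathbf{b} = \mathbf{g}_z$, so a single sample's input can be read off the gradient exactly. A minimal sketch (dimensions and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
x = rng.normal(size=d_in)                  # one "private" training sample

# For z = W @ x + b, backprop yields dL/dW = outer(g_z, x), dL/db = g_z,
# where g_z = dL/dz.  These gradients are what the worker uploads.
g_z = rng.normal(size=d_out)               # stand-in for dL/dz
dW, db = np.outer(g_z, x), g_z

# The server never sees x, yet recovers it exactly from one gradient row:
i = int(np.argmax(np.abs(db)))             # any row with db[i] != 0 works
x_rec = dW[i] / db[i]
print(np.allclose(x_rec, x))               # True: the sample leaks verbatim
```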

Looking Ahead

Chapter 2 translates the system-designer language of loads, latencies, and threat models into the formal vocabulary of network information theory. Workers become encoders, the master becomes a decoder, and $\mu$, $\Delta$, $K$, $B$ become rates in a coding problem. Once the framework is in place, the results of Parts II–V can be stated as achievability–converse theorems with sharp operational meaning.