Chapter Summary
Key Points
1. MapReduce identifies the shuffle as the dominant cost. With $K$ workers, each storing and mapping a $1/K$ share of the input, the uncoded shuffle sends a $(1 - 1/K)$ fraction of the intermediate bits across the network, essentially the entire intermediate file. This is the cost that coded shuffling (Chapter 7) is designed to attack, and the first instance of the computation/communication trade-off that runs through Part II.
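As a quick numerical check, a minimal sketch of the uncoded shuffle cost, assuming each of the $K$ workers maps a disjoint $1/K$ share of the input, so every reducer already holds $1/K$ of the intermediate values locally and must fetch the rest:

```python
def uncoded_shuffle_fraction(K: int) -> float:
    """Fraction of the intermediate file that crosses the network when
    each reducer already holds the 1/K share it computed locally."""
    return 1.0 - 1.0 / K

# The shuffled fraction approaches 1 as K grows: with many workers,
# nearly the whole intermediate file is moved over the network.
for K in (2, 10, 100):
    print(f"K={K:4d}  shuffled fraction = {uncoded_shuffle_fraction(K):.3f}")
```

The point of the sketch is only that the fraction tends to 1 with $K$; the coded schemes of Chapter 7 trade extra computation for a smaller load.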
2. The synchronous straggler penalty grows logarithmically with $K$. For i.i.d. exponential service times with rate $\mu$, the expected wait for the slowest of $K$ workers is $H_K/\mu \approx (\log K)/\mu$, where $H_K$ is the $K$-th harmonic number. Tolerating a constant fraction $\delta$ of stragglers, i.e. needing only any $(1-\delta)K$ responses, converts the logarithmic tail into a constant $\frac{1}{\mu}\log\frac{1}{\delta}$, motivating the coded-computing programme of Chapters 5–8.
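The order-statistic claim can be checked directly. A small sketch, assuming i.i.d. $\mathrm{Exp}(\mu)$ service times and using the exact formula $\mathbb{E}[T_{(k)}] = (H_K - H_{K-k})/\mu$ for the expected time until the fastest $k$ of $K$ workers have finished:

```python
from math import log

def harmonic(n: int) -> float:
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n (H_0 = 0)."""
    return sum(1.0 / i for i in range(1, n + 1))

def expected_wait(K: int, k: int, mu: float = 1.0) -> float:
    """Expected time until the k fastest of K Exp(mu) workers finish."""
    return (harmonic(K) - harmonic(K - k)) / mu

K = 1000
# Waiting for everyone: ~ log(K)/mu, grows without bound in K.
print("slowest of all:", expected_wait(K, K))
# Waiting for any 90%: ~ log(1/0.1)/mu ~ 2.3, a constant in K.
print("fastest 90%:  ", expected_wait(K, int(0.9 * K)))
```

With $\delta = 0.1$ the constant is $\log(1/0.1) \approx 2.3$, which the 90% figure matches closely even at $K = 1000$.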
3. Distributed SGD is dominated by gradient traffic, not data movement. The per-round communication cost is $K d b$ bits for $K$ workers, model size $d$, and $b$ bits per scalar. For modern deep networks this is orders of magnitude larger than the dataset, making gradient compression, secure aggregation, and over-the-air computation engineering necessities.
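A back-of-the-envelope sketch of the $Kdb$ figure; the worker count and model size below are illustrative assumptions, not numbers from the text:

```python
def round_traffic_bits(K: int, d: int, b: int = 32) -> int:
    """Uplink bits per SGD round: each of K workers sends d scalars of b bits."""
    return K * d * b

# Hypothetical setup: 100 workers, a 100M-parameter model, fp32 gradients.
K, d, b = 100, 10**8, 32
bits = round_traffic_bits(K, d, b)
print(bits / 8 / 1e9, "GB of gradients per round")  # 40.0 GB per round
```

Tens of gigabytes of gradient traffic every round, repeated over thousands of rounds, dwarfs a one-time transfer of most datasets, which is why the compression techniques above are not optional.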
4. Gradient inversion attacks break the "FL is private by default" myth. The gradient contains enough information to reconstruct individual training samples. Privacy in federated learning must come from a cryptographic or information-theoretic protocol on top, not from the architecture alone.
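A toy instance of the leakage, under an assumed single-sample, single-linear-unit setup (a simplification for illustration, not the book's attack): for a unit $y = w^\top x + c$ with any scalar loss, the uploaded gradients are $\nabla_w = (\partial L/\partial y)\,x$ and $\nabla_c = \partial L/\partial y$, so the server recovers the private sample exactly as $x = \nabla_w / \nabla_c$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)             # the worker's private training sample
w, c = rng.normal(size=5), 0.1     # current model parameters
target = 1.0

# One honest local gradient computation (squared-error loss).
y = w @ x + c
dL_dy = 2.0 * (y - target)
grad_w, grad_c = dL_dy * x, dL_dy  # exactly what the worker uploads

# The server, seeing only the gradient, recovers the sample exactly.
x_reconstructed = grad_w / grad_c
assert np.allclose(x_reconstructed, x)
```

Batching and depth make exact division impossible, but optimization-based inversion attacks exploit the same structure, which is why the gradient itself must be protected.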
5. Three challenges, one thread. Communication efficiency, robustness to stragglers / Byzantine workers, and privacy are coupled trade-offs. Every chapter of this book quantifies one edge of this triangle and identifies the optimal achievable operating point. The formal information-theoretic framework that makes this precise begins in Chapter 2.
Looking Ahead
Chapter 2 translates the system-designer language of loads, latencies, and threat models into the formal vocabulary of network information theory. Workers become encoders, the master becomes a decoder, and the loads and latencies just described become rates in a coding problem. Once the framework is in place, the results of Parts II–V can be stated as achievability-converse theorems with sharp operational meaning.