References & Further Reading
References
- J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, 2008
The paper that made MapReduce a household name in distributed systems. Presents the runtime design that underpins every discussion of shuffle cost in this chapter.
- S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, A Fundamental Tradeoff Between Computation and Communication in Distributed Computing, 2018
The seminal paper that proved the exact computation–communication tradeoff for coded distributed computing. Our Section 1.1 uses the uncoded baseline they derived as a foil for Chapter 7.
- K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, Speeding Up Distributed Machine Learning Using Codes, 2018
The definitive paper introducing coded computation for straggler mitigation in distributed ML. Section 1.2's latency calculations follow their formulation.
- J. Dean and L. A. Barroso, The Tail at Scale, 2013
Empirical motivation for why the 99.9th-percentile latency — not the median — is the quantity worth optimizing in distributed systems. This is the production-engineering perspective behind the straggler discussion in Section 1.2.
- M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su, Scaling Distributed Machine Learning with the Parameter Server, 2014
Canonical reference for the parameter-server architecture used throughout Section 1.3. Quantifies the per-round communication cost that motivates gradient compression and secure aggregation.
- L. Zhu, Z. Liu, and S. Han, Deep Leakage from Gradients, 2019
The first influential demonstration that individual training samples can be reconstructed from plaintext gradients. This paper single-handedly moved the federated-learning community to adopt secure aggregation as a baseline privacy requirement.
- J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, Federated Learning: Strategies for Improving Communication Efficiency, 2016
The reference for FL gradient-compression strategies (quantization, structured updates, sparsification). Supports the point in Section 1.3 that compression is orthogonal to privacy.
- K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, Practical Secure Aggregation for Privacy-Preserving Machine Learning, 2017
The production-scale secure-aggregation protocol used in Google Gboard and many follow-on FL systems. Our Section 1.4 lower bound on privacy-preserving aggregation follows the formalization they introduced.
- H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. Agüera y Arcas, Communication-Efficient Learning of Deep Networks from Decentralized Data, 2017
The FedAvg paper that defined federated learning as a research field. Frames the privacy / efficiency / convergence trade-offs that motivate Parts III–V of this book.
- P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, and others, Advances and Open Problems in Federated Learning, 2021
The definitive survey of federated learning. Worth reading end-to-end before tackling Part III — it frames the entire landscape and identifies the problems that are addressed formally later in this book.
Further Reading
Additional resources for building intuition on the systems-level and information-theoretic aspects of distributed computing.
Coding for distributed computing — a full survey
S. Li and A. S. Avestimehr, *Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning*, Foundations and Trends in Communications and Information Theory, 2020
The most complete survey of the coded-computing literature at the time of writing — essential reading before the technical results of Chapters 5–8.
A practical perspective on distributed-training systems
Book SCI-PYTHON, Chapter 13 (Performance) and Chapter 14 (Acceleration)
The implementation-level view that complements this book's information-theoretic perspective. Shows how modern frameworks (PyTorch Distributed, TensorFlow, JAX) expose the aggregation and compression knobs discussed in Section 1.3.
Gradient-inversion attacks — state of the art
H. Yin et al., *See Through Gradients: Image Batch Recovery via GradInversion*, CVPR 2021
Walks through the state of the art in gradient inversion, demonstrating attacks on batch sizes of 48 and on ImageNet-scale inputs. Reinforces why the privacy sought in Section 1.3 cannot come for free from the model architecture alone.
Multi-user information theory — the formal language of Chapter 2
Book ITA, Chapters 14 (MAC), 15 (Broadcast), 21 (Network Coding)
Chapter 2 of this book re-frames distributed computing as a network-information-theory problem. Readers who want the general multi-user capacity theorems should work through the cross-referenced ITA chapters first.