References & Further Reading
References
- J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, 2008
The paper that made MapReduce a household name in distributed systems. Presents the runtime design that underpins every discussion of shuffle cost in this chapter.
- S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, A Fundamental Tradeoff Between Computation and Communication in Distributed Computing, 2018
The seminal paper that proved the exact computation–communication tradeoff for coded distributed computing. Our Section 1.1 uses the uncoded baseline they derived as a foil for Chapter 7.
- K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, Speeding Up Distributed Machine Learning Using Codes, 2018
The definitive paper introducing coded computation for straggler mitigation in distributed ML. Section 1.2's latency calculations follow their formulation.
- J. Dean and L. A. Barroso, The Tail at Scale, 2013
Empirical motivation for why the 99.9th-percentile latency — not the median — is the quantity worth optimizing in distributed systems. This is the production-engineering perspective behind the straggler discussion in Section 1.2.
- M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su, Scaling Distributed Machine Learning with the Parameter Server, 2014
Canonical reference for the parameter-server architecture used throughout Section 1.3. Quantifies the per-round communication cost that motivates gradient compression and secure aggregation.
- L. Zhu, Z. Liu, and S. Han, Deep Leakage from Gradients, 2019
The first influential demonstration that individual training samples can be reconstructed from plaintext gradients. This paper single-handedly moved the federated-learning community to adopt secure aggregation as a baseline privacy requirement.
- J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, Federated Learning: Strategies for Improving Communication Efficiency, 2016
The reference for FL gradient-compression strategies (quantization, structured updates, sparsification). Supports the point in Section 1.3 that compression is orthogonal to privacy.
- K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, Practical Secure Aggregation for Privacy-Preserving Machine Learning, 2017
The production-scale secure-aggregation protocol used in Google Gboard and many follow-on FL systems. Our Section 1.4 lower bound on privacy-preserving aggregation follows the formalization they introduced.
- H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. Agüera y Arcas, Communication-Efficient Learning of Deep Networks from Decentralized Data, 2017
The FedAvg paper that defined federated learning as a research field. Frames the privacy / efficiency / convergence trade-offs that motivate Parts III–V of this book.
- P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, and others, Advances and Open Problems in Federated Learning, 2021
The definitive survey of federated learning. Worth reading end-to-end before tackling Part III — it frames the entire landscape and identifies the problems that are addressed formally later in this book.
Further Reading
Additional resources for building intuition on the systems-level and information-theoretic aspects of distributed computing.
Coding for distributed computing — a full survey
S. Li and A. S. Avestimehr, *Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning*, Foundations and Trends in Communications and Information Theory, 2020
The most complete survey of the coded-computing literature at the time of writing — essential reading before the technical results of Chapters 5–8.
A practical perspective on distributed-training systems
Book SCI-PYTHON, Chapter 13 (Performance) and Chapter 14 (Acceleration)
The implementation-level view that complements this book's information-theoretic perspective. Shows how modern frameworks (PyTorch Distributed, TensorFlow, JAX) expose the aggregation and compression knobs discussed in Section 1.3.
Gradient-inversion attacks — state of the art
H. Yin et al., *See Through Gradients: Image Batch Recovery via GradInversion*, CVPR 2021
Walks through the state of the art in gradient inversion, demonstrating attacks on batch sizes of 48 and on ImageNet-scale inputs. Reinforces why the privacy sought in Section 1.3 cannot come for free from the model architecture alone.
Multi-user information theory — the formal language of Chapter 2
Book ITA, Chapters 14 (MAC), 15 (Broadcast), 21 (Network Coding)
Chapter 2 of this book re-frames distributed computing as a network-information-theory problem. Readers who want the general multi-user capacity theorems should work through the cross-referenced ITA chapters first.