References & Further Reading

References

  1. H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, Communication-Efficient Learning of Deep Networks from Decentralized Data, 2017. [Link]

    The headline paper that introduced federated learning and the FedAvg algorithm; the foundational reference for this chapter. A minimal sketch of one FedAvg round appears after this list.

  2. X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, On the Convergence of FedAvg on Non-IID Data, 2020. [Link]

    Rigorous convergence analysis of FedAvg on both IID and non-IID data. Main reference for §9.2's theorems.

  3. P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, and others, Advances and Open Problems in Federated Learning, 2021.

    The definitive survey of federated learning. Essential background reading for Parts III–V of this book.

  4. K. Bonawitz, H. Eichner, W. Grieskamp, and others, Towards Federated Learning at Scale: System Design, 2019. [Link]

    Google's engineering perspective on production federated learning. Describes the Gboard deployment and the constraints that shape algorithmic choices.

  5. J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, Federated Learning: Strategies for Improving Communication Efficiency, 2016. [Link]

    Pre-FedAvg paper introducing quantization, structured updates, and random masking for communication-efficient FL. Primary reference for §9.3.

  6. D. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. Vojnovic, QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding, 2017. [Link]

    Rigorous convergence analysis of stochastic gradient quantization. Basis for §9.3's quantization theorem; a sketch of the unbiased quantizer appears after this list.

  7. S. U. Stich, J.-B. Cordonnier, and M. Jaggi, Sparsified SGD with Memory, 2018. [Link]

    Top-$K$ sparsification with error feedback, with a proof that the compressed method retains SGD's convergence rate. Main reference for §9.3's sparsification treatment; a sketch of the error-feedback loop appears after this list.

  8. L. Zhu, Z. Liu, and S. Han, Deep Leakage from Gradients, 2019. [Link]

    The landmark gradient-inversion paper. Should be read in full before trusting any "FL is private by design" claim. Primary reference for §9.4; a sketch of the attack's core optimization loop appears after this list.

  9. H. Yin, A. Mallya, A. Vahdat, J. M. Alvarez, J. Kautz, and P. Molchanov, See through Gradients: Image Batch Recovery via GradInversion, 2021. [Link]

    Extends gradient inversion to batch sizes up to 48 on ImageNet, showing that gradient averaging over a batch does not by itself protect privacy.

  10. K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, Practical Secure Aggregation for Privacy-Preserving Machine Learning, 2017.

    Forward reference: the secure-aggregation protocol that Chapter 10 develops. Reading this paper early shapes how one thinks about FL privacy.

  11. T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, Federated Optimization in Heterogeneous Networks, 2020. [Link]

    Introduces FedProx, a generalization of FedAvg that handles non-IID data and variable local work more robustly. Addresses the client-drift issue of §9.2; a sketch of the proximal local step appears after this list.

  12. S. P. Karimireddy, S. Kale, M. Mohri, S. J. Reddi, S. U. Stich, and A. T. Suresh, SCAFFOLD: Stochastic Controlled Averaging for Federated Learning, 2020. [Link]

    Variance-reduced FL algorithm that corrects client drift with control variates. Among the strongest known convergence guarantees on non-IID data; a sketch of the control-variate step appears after this list.

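To make reference [1] concrete, here is a minimal sketch of FedAvg on synthetic linear-regression clients. The model, the non-IID client data, all constants, and the helper name `local_sgd` are illustrative assumptions, not the paper's setup.

```python
# Minimal FedAvg sketch: data-size-weighted averaging of locally trained
# models. The synthetic non-IID data and all constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, X, y, lr=0.05, epochs=5):
    """A client's local training: a few epochs of full-batch gradient
    descent on its own data, starting from the current global model."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

# Synthetic non-IID clients: each draws features from a shifted distribution.
d, n_clients = 5, 4
w_true = rng.normal(size=d)
data = []
for k in range(n_clients):
    X = rng.normal(loc=0.3 * k, size=(20, d))
    data.append((X, X @ w_true + 0.01 * rng.normal(size=20)))

w_global = np.zeros(d)
for rnd in range(50):                      # communication rounds
    updates, sizes = [], []
    for X, y in data:                      # full participation for simplicity
        updates.append(local_sgd(w_global, X, y))
        sizes.append(len(y))
    # The FedAvg step: average client models, weighted by local data size.
    weights = np.array(sizes) / sum(sizes)
    w_global = sum(p * u for p, u in zip(weights, updates))

print("distance to w_true:", np.linalg.norm(w_global - w_true))
```

Under heterogeneity the fixed point of this loop drifts slightly away from the joint least-squares solution; that is the client-drift effect that references [11] and [12] target.
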
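For reference [6], a sketch of the unbiased stochastic quantizer at QSGD's core, with the paper's variable-length encoding omitted; the function name and the s = 4 level count are assumptions for illustration.

```python
# Sketch of QSGD-style quantization: each coordinate is stochastically
# rounded to one of s magnitude levels so the quantized vector is unbiased.
import numpy as np

def qsgd_quantize(v, s=4, rng=None):
    """Quantize v to s magnitude levels; E[output] equals v exactly."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v
    scaled = np.abs(v) / norm * s          # each coordinate lands in [0, s]
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part (unbiasedness).
    level = lower + (rng.random(v.shape) < scaled - lower)
    return np.sign(v) * norm * level / s

# Averaging many independent quantizations closely matches the original.
g = np.random.default_rng(1).normal(size=6)
trials = [qsgd_quantize(g, rng=np.random.default_rng(i)) for i in range(10_000)]
print("original:         ", np.round(g, 3))
print("mean quantization:", np.round(np.mean(trials, axis=0), 3))
```
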
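For reference [7], a sketch of top-$K$ sparsification with the error-feedback memory; the quadratic objective and constants are illustrative assumptions.

```python
# Sketch of top-K sparsification with error feedback: only k coordinates of
# each update are transmitted, and the dropped residual is remembered and
# added back on later steps, which is what preserves convergence.
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zeroing the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

d, k, lr = 100, 5, 0.02
w = np.random.default_rng(0).normal(size=d)
memory = np.zeros(d)                  # accumulated compression error

for step in range(2000):
    grad = w                          # gradient of f(w) = ||w||^2 / 2
    corrected = lr * grad + memory    # re-inject previously dropped mass
    sent = top_k(corrected, k)        # only these k coordinates are sent
    memory = corrected - sent         # remember what was dropped
    w -= sent

print("final ||w||:", np.linalg.norm(w))  # driven toward 0 despite 95% sparsity
```
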
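For reference [8], a minimal sketch of the dummy-data optimization at the heart of gradient inversion, in PyTorch on a deliberately tiny linear model. Every constant here is an illustrative assumption; real attacks add image priors and far more careful optimization.

```python
# Sketch of gradient inversion: given one shared gradient, optimize a dummy
# input (and soft label) until its gradient matches the observed one.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 3)
loss_fn = torch.nn.CrossEntropyLoss()

# The "victim" computes a gradient on a secret example and shares it.
x_secret = torch.randn(1, 8)
y_secret = torch.tensor([2])
true_grad = torch.autograd.grad(loss_fn(model(x_secret), y_secret),
                                model.parameters())

# The attacker tunes a dummy example so its gradient matches the shared one.
x_dummy = torch.randn(1, 8, requires_grad=True)
y_dummy = torch.randn(1, 3, requires_grad=True)    # soft-label logits
opt = torch.optim.Adam([x_dummy, y_dummy], lr=0.1)

for step in range(300):
    opt.zero_grad()
    dummy_loss = loss_fn(model(x_dummy), y_dummy.softmax(dim=-1))
    dummy_grad = torch.autograd.grad(dummy_loss, model.parameters(),
                                     create_graph=True)
    mismatch = sum(((dg - tg) ** 2).sum()
                   for dg, tg in zip(dummy_grad, true_grad))
    mismatch.backward()
    opt.step()

# If the attack succeeds, this error shrinks far below the initial distance.
print("reconstruction error:", (x_dummy - x_secret).norm().item())
```
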
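For reference [11], the essence of FedProx is one change to the local objective: a proximal term $\frac{\mu}{2}\|w - w^t\|^2$ that anchors local training to the global model. A sketch, reusing the linear-regression setup from the FedAvg example (again with illustrative constants):

```python
# Sketch of a FedProx local step: identical to FedAvg's local training except
# the gradient includes mu * (w - w_global), which limits client drift.
import numpy as np

def local_prox_step(w_global, X, y, mu=0.1, lr=0.05, epochs=5):
    """Local training on F_k(w) + (mu/2)||w - w_global||^2."""
    w = w_global.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y) + mu * (w - w_global)
        w -= lr * grad
    return w
```

Swapping this in for the plain local step in a FedAvg loop gives FedProx; larger μ trades local progress for less drift, and μ = 0 recovers FedAvg.
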
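For reference [12], a sketch of SCAFFOLD's control-variate correction, again on the linear-regression setup; the names and setup are illustrative, and the server-side maintenance and averaging of control variates is omitted here.

```python
# Sketch of a SCAFFOLD-style local step: the client gradient is corrected by
# (c_global - c_local) so local training points toward the global descent
# direction, reducing client drift.
import numpy as np

def local_scaffold(w_global, c_global, c_local, X, y, lr=0.05, epochs=5):
    """Local training with control-variate correction; returns the new model
    and the updated local control variate (the paper's Option II)."""
    w = w_global.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * (grad - c_local + c_global)
    # Updated local control variate: the average drift direction this round.
    c_new = c_local - c_global + (w_global - w) / (lr * epochs)
    return w, c_new
```
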
Further Reading

Resources for going deeper into federated learning and its challenges.

  • Comprehensive FL survey

    Kairouz, McMahan, et al., *Advances and Open Problems in Federated Learning*, FnT-ML 2021

    The definitive survey; essential reading before Parts III–V of this book. Covers convergence, privacy, fairness, personalization, and heterogeneity at book-length depth.

  • Gradient-inversion attacks: state of the art

    Geiping et al., *Inverting Gradients — How Easy Is It to Break Privacy in Federated Learning?*, NeurIPS 2020

    Extends gradient inversion to trained models and ImageNet-scale inputs, cementing the case that plaintext FL provides no meaningful privacy guarantee on its own.

  • Non-IID federated learning

    Zhao et al., *Federated Learning with Non-IID Data*, arXiv:1806.00582, 2018

    Early empirical study demonstrating FedAvg degradation on non-IID splits. Relevant context for the §9.2 non-IID convergence theorem.

  • System-level FL engineering

    Bonawitz et al., *Towards Federated Learning at Scale: System Design*, MLSys 2019

    Google's production-engineering view. Complements this chapter's theoretical treatment with concrete engineering constraints.