Part 4: Extensions and Applications

Chapter 15: Coded Data Shuffling

Advanced · 165 min

Learning Objectives

  • Define the data shuffling problem in distributed machine learning
  • State the Wan–Tuninetti–Caire result: coded shuffling reduces inter-epoch communication by a factor of $1 + Ks$ (see the numerical sketch after this list)
  • Understand the coded-caching analogy: worker memory plays the role of the user cache, and the data shuffle plays the role of the delivery phase
  • Derive the MAN-style coded shuffling scheme
  • Analyze straggler-tolerant gradient coding as a related coded-computing primitive (a toy example follows this list)
  • Connect coded shuffling to distributed ML system design (parameter server, all-reduce)
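To get a quick feel for the $1 + Ks$ factor, here is a minimal numerical sketch, assuming $K$ is the number of workers and $s$ is the fraction of the dataset each worker can store beyond its current working partition. The uncoded baseline of $(K-1)/K$ and the function names are illustrative assumptions, not the chapter's formal model.

```python
# Illustrative sketch: coded shuffling divides the uncoded shuffle
# traffic by the gain factor 1 + K*s (normalizations assumed above).

def uncoded_load(K: int) -> float:
    """Uncoded shuffling: each worker fetches a fresh (K-1)/K of the data."""
    return (K - 1) / K

def coded_load(K: int, s: float) -> float:
    """Coded shuffling: the same traffic divided by the gain 1 + K*s."""
    return uncoded_load(K) / (1 + K * s)

K = 8
for s in (0.1, 0.25, 0.5):
    print(f"K={K}, s={s}: uncoded={uncoded_load(K):.3f}, "
          f"coded={coded_load(K, s):.3f}, gain={1 + K * s:.1f}x")
```

Note how the gain grows linearly in the aggregate storage $Ks$, mirroring the global caching gain from the coded caching chapters.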
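As a preview of the straggler-tolerance objective, below is a toy gradient-coding sketch in the spirit of the cyclic example of Tandon et al., with $K = 3$ workers tolerating one straggler. The random partial gradients and the dictionary layout are made up for illustration; the encoding and decoding coefficients do realize the any-2-of-3 recovery property, as the assertions verify.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.standard_normal((3, 4))   # partial gradients g1, g2, g3 (dim 4)
full = g.sum(axis=0)              # the full gradient the master wants

# Each worker stores two partitions and sends one coded combination.
messages = {
    1: 0.5 * g[0] + g[1],         # worker 1 sends g1/2 + g2
    2: g[1] - g[2],               # worker 2 sends g2 - g3
    3: 0.5 * g[0] + g[2],         # worker 3 sends g1/2 + g3
}

# Decoding coefficients for every pair of surviving workers.
decode = {
    frozenset({1, 2}): {1: 2.0, 2: -1.0},
    frozenset({1, 3}): {1: 1.0, 3: 1.0},
    frozenset({2, 3}): {2: 1.0, 3: 2.0},
}

for straggler in (1, 2, 3):
    alive = frozenset({1, 2, 3}) - {straggler}
    est = sum(c * messages[w] for w, c in decode[alive].items())
    assert np.allclose(est, full)
    print(f"straggler={straggler}: full gradient recovered")
```

Whichever single worker straggles, the master linearly combines the two remaining messages to recover $g_1 + g_2 + g_3$ exactly; the price is that each worker computes two partial gradients instead of one.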

Sections

Prerequisites
