Workers as Encoders, Master as Decoder
From Engineering to Information Theory
Chapter 1 introduced distributed computing as a system-design problem. We now formalize it as a network information theory problem. The move is conceptually the same one Shannon made for point-to-point communication: strip the engineering away, keep the information flow, and ask what is fundamentally achievable regardless of implementation. The framework we build here will be the common language of the whole book — every subsequent chapter will specialize it (adding privacy, Byzantine robustness, a wireless channel, ...) but the skeleton stays the same.
The payoff is that once workers are encoders and the master is a decoder, the laws that govern source and channel coding (entropy bounds, the cut-set converse, Fano's inequality) apply directly. This is what makes the trade-offs we identified in Chapter 1 provable, not merely plausible.
Definition: Distributed Computing as a Multi-Terminal Coding Problem
Let $X$ be the input dataset, modeled as a random variable taking values in a finite alphabet $\mathcal{X}$ (typically uniform over its support, so that $H(X) = \log|\mathcal{X}|$). A distributed computing scheme with $N$ workers consists of:
- Storage mapping. Each worker $n \in \{1, \ldots, N\}$ stores a function $Z_n = \phi_n(X)$ of the dataset, where $Z_n \in \mathcal{Z}_n$ for some finite alphabet $\mathcal{Z}_n$. The storage function is fixed before the inputs are revealed.
- Encoders. Each worker applies a local encoder $f_n$ that takes $Z_n$ (and any shared randomness) and produces a message $M_n = f_n(Z_n)$ in a finite alphabet $\mathcal{M}_n$.
- Decoder. The master applies a decoder $g$ that reconstructs the output $\hat{Y} = g(M_1, \ldots, M_N)$.
- Correctness. The output must satisfy $\Pr[\hat{Y} \neq Y] \leq \epsilon$, where $Y = h(X)$ is the ground-truth function of the dataset.
The communication load is the normalized total message length $L \triangleq \frac{1}{T} \sum_{n=1}^{N} \log|\mathcal{M}_n|$, where $T$ is a reference intermediate-file size. The storage load is $r \triangleq \frac{1}{H(X)} \sum_{n=1}^{N} \log|\mathcal{Z}_n|$. A scheme is $(r, L)$-achievable if there exists a sequence of storage mappings, encoders, and a decoder whose block error probability vanishes as the dataset size grows.
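To ground the definition, here is a minimal Python sketch of the smallest instance we can write down: $N$ workers computing a sum of the records over an uncoded partition. The comments map each piece to the definition's $\phi_n$, $f_n$, and $g$; the choice of function, the partitioning, and the 32-bit message width are our illustrative assumptions, not a scheme from later chapters.

```python
import numpy as np

# Minimal sketch of the definition above for a toy problem: N workers
# jointly compute h(X) = sum of all records.  The comments map each
# piece to the definition's phi_n, f_n, g; the partitioning is just an
# illustrative (uncoded) choice, not the book's canonical scheme.

N = 4                                # number of workers
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=1000)  # the dataset: 1000 one-byte records

# Storage mapping phi_n: worker n stores the n-th disjoint partition of X.
Z = np.array_split(X, N)

# Encoder f_n: each worker sends the partial sum of its partition.
M = [int(z.sum()) for z in Z]

# Decoder g: the master adds the N partial sums.
Y_hat = sum(M)
assert Y_hat == X.sum()              # correctness: Y_hat = h(X) exactly

# Loads, normalized as in the definition (T = one partial sum's width).
bits_per_message = 32                # reference intermediate-file size T
L = N * bits_per_message / bits_per_message  # communication load = N
r = sum(len(z) for z in Z) / len(X)  # storage load = 1 (uncoded; the
                                     # bits-per-record factors cancel)
print(f"L = {L}, r = {r}")
```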
Storage Mapping
The (fixed) function $\phi_n$ that determines what worker $n$ stores given the dataset $X$. Chosen before the inputs are revealed; may be stochastic (shared randomness) or deterministic. Uncoded storage uses identity mappings on disjoint partitions; coded storage uses MDS or polynomial codes.
Worker Encoder
Per-worker encoder $f_n$ that produces the message $M_n$ sent to the master. In coded matrix multiplication, $f_n$ is polynomial evaluation at the worker's assigned evaluation point; in secure aggregation, $f_n$ adds pairwise masks to the gradient.
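As a concrete, simplified illustration of both glossary entries, the sketch below builds a coded storage mapping and a polynomial-evaluation encoder for a matrix-vector product: each worker stores a degree-1 polynomial in the matrix blocks evaluated at its own point, and any $K = 2$ of the $N = 3$ responses decode the product. Real polynomial codes operate over finite fields and handle general degrees; everything here, including the variable names, is our toy instance.

```python
import numpy as np

# Illustrative sketch of a coded storage mapping plus a polynomial-
# evaluation encoder.  Real constructions work over finite fields;
# here we use reals and a degree-1 polynomial for readability.

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))     # the dataset: a 4x3 matrix
x = rng.standard_normal(3)          # the input vector; goal: Y = A @ x

K = 2                               # recovery threshold
A1, A2 = np.vsplit(A, K)            # two uncoded row blocks

# Storage mapping phi_n: worker n stores the polynomial A1 + t*A2
# evaluated at its assigned point t (fixed before x is revealed).
points = [1.0, 2.0, 3.0]            # N = 3 workers; any K = 2 suffice
storage = [A1 + t * A2 for t in points]

# Encoder f_n: worker n sends its stored block times x.
messages = [S @ x for S in storage]

# Decoder g: interpolate the degree-1 polynomial p(t) = A1@x + t*(A2@x)
# from any K = 2 of the N messages (here, workers 0 and 2).
t0, t2 = points[0], points[2]
A1x = (t2 * messages[0] - t0 * messages[2]) / (t2 - t0)
A2x = (messages[2] - messages[0]) / (t2 - t0)
Y_hat = np.concatenate([A1x, A2x])

assert np.allclose(Y_hat, A @ x)    # any K responses decode Y exactly
```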
The Distributed-Computing Network Model
How Each Problem Specializes the Model
The same network model captures every problem in Parts II–V, with different constraints and adversary models:
| Problem | Extra constraint / adversary | Extra output constraint |
|---|---|---|
| Coded matrix multiply (Ch 5) | Any $K$ of the $N$ messages suffice to decode | $\hat{Y} = \mathbf{AB}$ exactly |
| Coded shuffling (Ch 7) | Storage budget fixed across workers | $\hat{Y}$ = full shuffled set |
| Secure aggregation (Ch 10) | Server is honest-but-curious | $\hat{Y} = \sum_n g_n$ (the gradient sum); server learns nothing else |
| Private info. retrieval (Ch 13) | Any $T$ colluding databases cannot learn the query | $\hat{Y}$ = requested file |
| AirComp / wireless FL (Ch 16) | Multiple-access channel for uplink | $\hat{Y} \approx \sum_n x_n$; noise tolerated |
Each row specializes the decoder's correctness constraint, the adversary's view, or the physical channel between workers and master. The mathematical skeleton — encoders, decoder, rate region — is the same throughout. This is what makes the information-theoretic formulation worth the effort: one framework, many specializations.
Theorem: Output Entropy Bounds Aggregate Message Entropy
Any distributed-computing scheme satisfying the correctness condition must satisfy $H(Y) \leq \sum_{n=1}^{N} H(M_n)$, and for any subset $\mathcal{S} \subseteq \{1, \ldots, N\}$ of workers whose responses the master uses for decoding, $\sum_{n \in \mathcal{S}} H(M_n) \geq H(Y \mid M_{\mathcal{S}^c})$. The second inequality is a cut-set bound: the traffic across the "cut" between $\mathcal{S}$ and the master must be at least the conditional entropy of the output given the other workers' messages.
The decoder produces a (nearly) deterministic function of the messages, so by the data-processing inequality $I(Y; \hat{Y}) \leq I(Y; M_1, \ldots, M_N)$. Applying the chain rule and non-negativity of mutual information yields the stated bound. The cut-set version is the specialization to any subset of messages.
From correctness to mutual information
Correctness gives $\Pr[\hat{Y} \neq Y] \leq \epsilon$. By Fano's inequality, $H(Y \mid \hat{Y}) \leq 1 + \epsilon \log|\mathcal{Y}|$, hence $H(Y) \leq I(Y; \hat{Y}) + 1 + \epsilon \log|\mathcal{Y}|$. Since $\hat{Y}$ is a function of $(M_1, \ldots, M_N)$, the data-processing inequality gives $I(Y; \hat{Y}) \leq I(Y; M_1, \ldots, M_N)$.
Apply chain rule
Using $I(Y; M_1, \ldots, M_N) \leq H(M_1, \ldots, M_N) \leq \sum_{n=1}^{N} H(M_n)$, we obtain $H(Y) \leq \sum_{n=1}^{N} H(M_n) + 1 + \epsilon \log|\mathcal{Y}|$. Taking the block-length limit, in which the Fano term is negligible, gives the stated inequality.
Cut-set specialization
Conditioning on $M_{\mathcal{S}^c}$ and repeating the argument with $H(Y \mid M_{\mathcal{S}^c})$ replacing $H(Y)$ gives $\sum_{n \in \mathcal{S}} H(M_n) \geq H(Y \mid M_{\mathcal{S}^c})$.
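It can be instructive to check both inequalities numerically on a toy scheme. In the sketch below (our construction, chosen purely to exercise the bounds) the dataset $X$ is three uniform bits, worker $n$'s message is bit $n$, and the output $Y$ is their XOR.

```python
import itertools, math
from collections import Counter

# Numerical sanity check of the theorem on a tiny scheme: X is 3
# uniform bits, worker n's message is bit n, and Y = XOR of all bits.
# This is our toy choice, purely to exercise the inequalities.

def H(samples):
    """Empirical Shannon entropy (bits) of a list of symbols."""
    counts = Counter(samples)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

X_space = list(itertools.product([0, 1], repeat=3))   # uniform dataset
M = [[x[n] for x in X_space] for n in range(3)]       # messages M_1..M_3
Y = [x[0] ^ x[1] ^ x[2] for x in X_space]             # output Y

# First inequality: H(Y) <= sum_n H(M_n)   (here: 1 <= 3).
assert H(Y) <= sum(H(m) for m in M)

# Cut-set bound for S = {1}: H(M_1) >= H(Y | M_2, M_3).  M_2 and M_3
# alone leave the XOR uniform, so the conditional entropy is a full bit.
H_joint = H(list(zip(Y, M[1], M[2])))                 # H(Y, M_2, M_3)
H_cond = H_joint - H(list(zip(M[1], M[2])))           # H(Y | M_2, M_3)
assert H(M[0]) >= H_cond - 1e-9
print(f"H(Y)={H(Y):.2f}, sum H(M_n)={sum(H(m) for m in M):.2f}, "
      f"H(Y|M_2,M_3)={H_cond:.2f}")
```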
Key Takeaway
The cut-set bound is the structural lower bound throughout this book. Every converse we prove (the coded-computing lower bound in Chapter 5, the secure-aggregation converse in Chapter 10, the PIR capacity upper bound in Chapter 13) is a carefully chosen cut-set argument specialized to the problem at hand. It is worth understanding the generic form well.
What the Model Does and Does Not Assume
Our formal model assumes:
- Noiseless, bit-pipe communication between each worker and the master. Part V (AirComp) and the wireless-FL chapters replace this with a multiple-access channel.
- Full worker compliance with the protocol. Byzantine robustness is added in Chapter 11 by allowing a bounded number of workers to deviate arbitrarily.
- Honest-but-curious master (when privacy matters). Stronger adversarial models, e.g. a malicious server colluding with some workers, are obtained by relaxing this assumption in Chapters 10–12.
The model is simplest when all three assumptions hold; each relaxation adds constraints to the rate region in a predictable way. The central value of the formal framework is precisely this modularity.
Historical Note: From Network Coding to Coded Computing
2000–2018: The move from thinking about distributed computing in system-design terms to thinking about it information-theoretically mirrors the arc that network coding took in the early 2000s. Ahlswede, Cai, Li, and Yeung's landmark paper (2000) proved that for multicast networks, coding across messages at intermediate nodes can achieve rates unreachable by routing alone. Their "butterfly network" example is structurally identical to the MapReduce shuffle: an all-to-all exchange in which coded messages at the edges allow the receivers to decode what they need from fewer total transmissions. The coded-computing program of Part II of this book is, in a precise sense, network coding applied to the distributed-computation setting.
Bit-Pipes Are a Modeling Choice, Not a Law
Modeling worker-to-master links as noiseless bit-pipes is appropriate inside a data center or on a dedicated backhaul, but it is the worst choice for a wireless federated-learning deployment. Over a wireless multiple-access channel, users' signals superpose in the analog domain: the "channel" literally computes the sum $\sum_n x_n$ for free, modulo receiver noise. This observation is the seed of over-the-air computation (Chapter 16) and is an example of how relaxing one modeling assumption can reveal a fundamentally different rate region. Designers who keep the bit-pipe model when the channel is wireless forfeit a free order of magnitude in spectral efficiency; the sketch after the list below makes the accounting concrete.
- Wireless MAC superposition is analog; digital relaying reintroduces the per-user uplink cost.
- Analog AirComp requires synchronization (subcarrier, CFO, and timing) but no heavy coding.
- Current AirComp deployments (e.g., NEXT-G FL testbeds) achieve order-of-magnitude spectrum savings over orthogonal uplink.
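A back-of-the-envelope simulation makes the superposition accounting concrete. It assumes an idealized Gaussian multiple-access channel with perfect synchronization and unit gains; the user count and noise level are made-up parameters, not testbed figures.

```python
import numpy as np

# Sketch of the superposition argument: K users each hold a scalar x_k.
# Over an idealized Gaussian MAC their analog transmissions add in the
# air, so the receiver gets the sum in ONE channel use (plus noise).
# A digital orthogonal uplink needs K separate uses.

rng = np.random.default_rng(2)
K = 100                             # number of users (illustrative)
x = rng.standard_normal(K)          # local values (e.g., gradient entries)
noise_std = 0.1                     # receiver noise level (illustrative)

# Analog AirComp: one channel use; signals superpose, noise added once.
y_aircomp = x.sum() + rng.normal(0, noise_std)

# Digital orthogonal uplink: K channel uses, one per user over error-free
# bit pipes, after which the server adds the decoded values.
y_digital = x.sum()                 # exact, but costs K uses instead of 1

print(f"true sum         = {x.sum():+.3f}")
print(f"AirComp (1 use)  = {y_aircomp:+.3f}  (noisy)")
print(f"digital ({K} uses) = {y_digital:+.3f}  (exact)")
```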
Common Mistake: Load Is Not the Same as Rate
Mistake:
Treating the communication load $L$ as a Shannon rate in bits per channel use.
Correction:
The load $L$ is a normalized communication cost: total bits exchanged, divided by a reference intermediate-file size $T$ (or, in other conventions, per input bit). A Shannon rate is bits per channel use and depends on a channel model. For a noiseless bit-pipe the two agree up to units; for a wireless channel the rate is constrained by capacity while the load is not. This distinction will matter in Chapter 16; a toy calculation follows.
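The following toy calculation makes the units explicit. All numbers (worker count, message size, reference file size, channel capacity) are invented for illustration.

```python
# Toy calculation of the load-versus-rate distinction.
N = 10                  # workers
B = 1_000_000           # bits each worker sends
T = 1_000_000           # reference intermediate-file size (bits)

L = N * B / T           # communication LOAD: dimensionless, here 10.0
print(f"load L = {L}")  # same value regardless of the channel

# The RATE depends on the channel.  On a noiseless bit pipe, the N*B
# bits take N*B channel uses at 1 bit/use.  On a wireless link with
# capacity C bits/use, the same load costs more channel uses:
C = 0.5                 # assumed capacity of a noisy uplink, bits/use
uses_bitpipe = N * B / 1.0
uses_wireless = N * B / C
print(f"channel uses: bit pipe {uses_bitpipe:.0f}, wireless {uses_wireless:.0f}")
```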
Quick Check
In the formal network model of this section, what is the role of the cut-set inequality $\sum_{n \in \mathcal{S}} H(M_n) \geq H(Y \mid M_{\mathcal{S}^c})$?
It is an achievability bound telling us how to design the encoders.
It is a converse: any scheme must carry at least $H(Y \mid M_{\mathcal{S}^c})$ bits across the cut between $\mathcal{S}$ and the master.
It requires the master to know the sources' joint distribution.
It shows that uncoded routing is always optimal.
Exactly: cut-set arguments give structural lower bounds on network information flow, and this book specializes the argument to each subsequent problem.