Workers as Encoders, Master as Decoder

From Engineering to Information Theory

Chapter 1 introduced distributed computing as a system-design problem. We now formalize it as a network information theory problem. The move is conceptually the same one Shannon made for point-to-point communication: strip the engineering away, keep the information flow, and ask what is fundamentally achievable regardless of implementation. The framework we build here will be the common language of the whole book — every subsequent chapter will specialize it (adding privacy, Byzantine robustness, a wireless channel, ...) but the skeleton stays the same.

The point is that once workers are encoders and the master is a decoder, the laws that govern source and channel coding (entropy bounds, cut-set converse, Fano's inequality) apply directly. This is what makes the trade-offs we identified in Chapter 1 provable — not merely plausible.

Definition: Distributed Computing as a Multi-Terminal Coding Problem

Let $\mathcal{D}$ be the input dataset, modeled as a random variable taking values in a finite alphabet (typically uniform over its support, so that $H(\mathcal{D}) = \log |\mathrm{supp}(\mathcal{D})|$). A distributed computing scheme with $N$ workers consists of:

  • Storage mapping. Each worker $k$ stores a function $\mathcal{D}_k = \varphi_k(\mathcal{D})$ of the dataset, where $|\mathcal{D}_k| \leq \mu \cdot |\mathcal{D}|$ for some $\mu \in [0, 1]$. The storage function $\varphi_k$ is fixed before the inputs are revealed.
  • Encoders. Each worker applies a local encoder $E_k$ that takes $\mathcal{D}_k$ (and any shared randomness) and produces a message $X_k = E_k(\mathcal{D}_k)$ in a finite alphabet.
  • Decoder. The master applies a decoder $\psi$ that reconstructs the output $\hat Y = \psi(X_1, X_2, \ldots, X_N)$.
  • Correctness. The output must satisfy $\Pr[\hat Y \neq Y] \to 0$, where $Y = f(\mathcal{D})$ is the ground-truth function of the dataset.

The communication load is the normalized total message length $\Delta = \sum_k \log|\mathrm{supp}(X_k)| / |V|$, where $|V|$ is a reference intermediate-file size. The storage load is $\mu$. A scheme is $(\mu, \Delta)$-achievable if there exists a sequence of storage mappings, encoders, and a decoder whose block error probability vanishes as the dataset size grows.
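
To make the definition concrete, here is a minimal Python sketch of an uncoded scheme, assuming a toy target function (the sum of all records) and hypothetical sizes: the storage mapping partitions the dataset, each encoder sends its local partial sum, and the decoder adds the messages. The final `mu` and `delta` correspond to the storage and communication loads just defined.

```python
import numpy as np

# Toy instance (hypothetical): the target function is Y = f(D) = sum of all records.
rng = np.random.default_rng(0)
D = rng.integers(0, 256, size=12)            # dataset of 12 records
N = 3                                        # number of workers

# Storage mapping phi_k: identity on disjoint partitions (uncoded storage).
partitions = np.array_split(np.arange(len(D)), N)
stored = [D[idx] for idx in partitions]      # D_k = phi_k(D)

# Worker encoders E_k: each worker sends its local partial sum as the message X_k.
messages = [int(D_k.sum()) for D_k in stored]

# Decoder psi: reconstruct Y_hat from the N messages.
Y_hat = sum(messages)
assert Y_hat == int(D.sum())                 # correctness: Y_hat equals Y = f(D)

# Loads, taking |V| = the size of one intermediate value (one partial sum).
mu = max(len(D_k) for D_k in stored) / len(D)   # storage load: fraction of D per worker
delta = len(messages)                           # communication load: N messages of size |V|
print(f"mu = {mu:.2f}, Delta = {delta}")
```

Coded schemes change only $\varphi_k$ and $E_k$; the decoder and the load bookkeeping stay the same.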


Storage Mapping

The (fixed) function $\varphi_k$ that determines what worker $k$ stores given the dataset $\mathcal{D}$. Chosen before the inputs are revealed; may be stochastic (shared randomness) or deterministic. Uncoded storage uses identity mappings on disjoint partitions; coded storage uses MDS or polynomial codes.
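
The contrast between the two styles can be sketched as follows — a toy example with hypothetical block sizes, not a scheme from this chapter: uncoded storage places each block on exactly one worker, while a simple XOR-parity (MDS-style) mapping adds a redundant worker so that any two of the three stored pieces determine the whole dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
# Dataset split into two equal blocks D1, D2 (bytes).
D1 = rng.integers(0, 256, 8, dtype=np.uint8)
D2 = rng.integers(0, 256, 8, dtype=np.uint8)

# Uncoded storage: identity mappings on disjoint partitions (mu = 1/2, no redundancy).
uncoded = {1: D1, 2: D2}

# Coded storage: worker 3 stores the XOR parity, an MDS-style (3, 2) mapping.
coded = {1: D1, 2: D2, 3: D1 ^ D2}

# Any two coded shares recover the dataset, e.g. if worker 1 is unavailable:
D1_recovered = coded[2] ^ coded[3]
assert np.array_equal(D1_recovered, D1)
```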

Worker Encoder

Per-worker encoder $E_k: \mathcal{D}_k \mapsto X_k$ that produces the message sent to the master. In coded matrix multiplication, $E_k$ is polynomial evaluation at the worker's assigned evaluation point; in secure aggregation, $E_k$ adds pairwise masks to the gradient.
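
As a sketch of the secure-aggregation case (hypothetical names; the shared randomness is modeled by a per-pair seed), each encoder adds pairwise masks that cancel in the sum, so the master recovers $\sum_k \mathbf{g}_k$ while any single message on its own looks like uniform noise.

```python
import numpy as np

N, d, q = 3, 4, 2**16            # workers, gradient dimension, modulus for the arithmetic
rng = np.random.default_rng(2)
grads = [rng.integers(0, q, d) for _ in range(N)]   # local gradients g_k

def pair_mask(j, k):
    # Pairwise mask r_{jk}, derived from shared randomness (a common seed per pair j < k).
    return np.random.default_rng(1000 * j + k).integers(0, q, d)

def encode(k):
    # E_k: add masks toward higher-indexed peers, subtract those toward lower-indexed ones.
    x = grads[k].copy()
    for j in range(N):
        if j > k:
            x = (x + pair_mask(k, j)) % q
        elif j < k:
            x = (x - pair_mask(j, k)) % q
    return x

messages = [encode(k) for k in range(N)]
# Decoder: the masks cancel pairwise, leaving only the sum of the gradients mod q.
Y_hat = sum(messages) % q
assert np.array_equal(Y_hat, sum(grads) % q)
```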

The Distributed-Computing Network Model

Figure: Dataset $\mathcal{D}$ is stored across $N$ workers via storage mappings $\varphi_k$. Each worker runs an encoder $E_k$ and sends message $X_k$ to the master, which runs a decoder $\psi$ to produce $\hat Y$. The storage load $\mu$ bounds what each worker holds; the communication load $\Delta$ bounds the total message traffic; the recovery threshold $K$ bounds the number of responses the master waits for.

How Each Problem Specializes the Model

The same network model captures every problem in Parts II–V, with different constraints and adversary models:

| Problem | Extra constraint / adversary | Extra output constraint |
| --- | --- | --- |
| Coded matrix multiply (Ch 5) | Any $K$ messages suffice to decode | $Y = \mathbf{A}^T \mathbf{B}$ |
| Coded shuffling (Ch 7) | Storage $\mu$ fixed across workers | $Y$ = full shuffled set |
| Secure aggregation (Ch 10) | Server is honest-but-curious | $Y = \sum_k \mathbf{g}_k$; server learns nothing else |
| Private info. retrieval (Ch 13) | Any $T$ colluding databases learn nothing about the query | $Y$ = requested file |
| AirComp / wireless FL (Ch 16) | Multiple-access channel for uplink | $Y = \sum_k f_k(x_k)$; noise tolerated |

Each row specializes the decoder's correctness constraint, the adversary's view, or the physical channel between workers and master. The mathematical skeleton — encoders, decoder, rate region — is the same throughout. This is what makes the information-theoretic formulation worth the effort: one framework, many specializations.
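
To see one row concretely, here is a sketch of the coded-matrix-multiply specialization, simplified to a matrix–vector product with hypothetical dimensions: an MDS-style assignment over $N = 3$ workers gives recovery threshold $K = 2$, so the decoder succeeds from any two responses and can ignore a straggler.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))            # tall matrix, split row-wise into A1, A2
x = rng.standard_normal(6)
A1, A2 = A[:2], A[2:]

# Coded storage: three workers store A1, A2, and A1 + A2 (an MDS (3, 2) assignment).
shards = [A1, A2, A1 + A2]

# Worker encoders: each worker sends the product of its coded shard with x.
messages = [S @ x for S in shards]

# Decoder: any K = 2 responses suffice. Suppose worker 1 (storing A1) straggles.
y2, y3 = messages[1], messages[2]
y1 = y3 - y2                               # recover A1 @ x from the parity response
Y_hat = np.concatenate([y1, y2])
assert np.allclose(Y_hat, A @ x)
```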

Theorem: Output Entropy Bounds Aggregate Message Entropy

Any distributed-computing scheme satisfying the correctness condition $\Pr[\hat Y \neq Y] \to 0$ must satisfy
$$\sum_{k=1}^N H(X_k) \;\geq\; H(Y),$$
and, for any subset $\mathcal{S} \subseteq \{1, \ldots, N\}$ of workers whose responses the master uses for decoding,
$$H(X_{\mathcal{S}}) \;\geq\; H(Y \mid X_{\mathcal{S}^c}).$$
The second inequality is a cut-set bound: the traffic across the "cut" between $\mathcal{S}$ and the master must be at least the conditional entropy of the output given the other workers' messages.

Proof sketch. The decoder output $\hat Y = \psi(X_1, \ldots, X_N)$ is a function of the messages that agrees with $Y$ with probability approaching one. By the data-processing inequality and Fano's inequality, $I(Y; X_1, \ldots, X_N) \geq I(Y; \hat Y) \geq H(Y) - o(H(Y))$. Combining this with $\sum_{k=1}^N H(X_k) \geq H(X_1, \ldots, X_N) \geq I(Y; X_1, \ldots, X_N)$ yields the first bound. The cut-set version is the same argument applied to the subset $\mathcal{S}$ after conditioning on the messages $X_{\mathcal{S}^c}$ outside the cut.
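
A toy numerical check of both inequalities (an illustration, not part of the formal development): take $\mathcal{D}$ to be two fair bits, $Y$ their XOR, and let each worker store and send one bit uncoded.

```python
import numpy as np
from collections import Counter
from itertools import product

def H(counts):
    """Shannon entropy in bits of the distribution given by a collection of counts."""
    p = np.asarray(list(counts), dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# D = (D1, D2): two i.i.d. fair bits. Worker k stores bit D_k and sends X_k = D_k.
# The target function is Y = D1 XOR D2.
joint = list(product([0, 1], repeat=2))             # four equally likely realizations
cY  = Counter(d1 ^ d2 for d1, d2 in joint)
cX1 = Counter(d1 for d1, _ in joint)
cX2 = Counter(d2 for _, d2 in joint)

# sum_k H(X_k) >= H(Y):  1 + 1 >= 1.
assert H(cX1.values()) + H(cX2.values()) >= H(cY.values())

# Cut-set bound for S = {1}: H(X_1) >= H(Y | X_2). Knowing D2 alone says nothing
# about the XOR, so H(Y | X_2) = 1 bit and worker 1 must send at least 1 bit.
assert H(cX1.values()) >= 1.0
```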

Key Takeaway

The cut-set bound is the structural lower bound throughout this book. Every converse we prove — the coded-computing lower bound in Chapter 5, the secure-aggregation converse in Chapter 10, the PIR capacity upper bound in Chapter 13 — is a carefully chosen cut-set argument specialized to the problem at hand. It is worth understanding the generic form well.

What the Model Does and Does Not Assume

Our formal model assumes:

  1. Noiseless, bit-pipe communication between each worker and the master. Part V (AirComp) and the wireless-FL chapters replace this with a multiple-access channel.
  2. Full worker compliance with the protocol. Byzantine robustness is added in Chapter 11 by allowing up to $B$ workers to deviate arbitrarily.
  3. Honest-but-curious master (when privacy matters). Stronger adversarial models — e.g., a malicious server colluding with some workers — are introduced in Chapters 10–12 by relaxing this assumption.

The model is simplest when all three assumptions hold; each relaxation reshapes the rate region in a predictable way. The central value of the formal framework is precisely this modularity.

Historical Note: From Network Coding to Coded Computing

2000–2018

The move from thinking about distributed computing in system-design terms to thinking about it information-theoretically mirrors the arc that network coding took in the early 2000s. Ahlswede, Cai, Li, and Yeung's landmark paper (2000) proved that for multicast networks, coding across messages at intermediate nodes can achieve rates unreachable by routing alone. Their "butterfly network" example is structurally identical to the MapReduce shuffle: an all-to-all exchange in which coded messages at the edges allow the receivers to decode what they need from fewer total transmissions. The coded-computing programme of Part II of this book is, in a precise sense, network coding applied to the distributed-computation setting.

🔧Engineering Note

Bit-Pipes Are a Modeling Choice, Not a Law

Modeling worker-to-master links as noiseless bit-pipes is appropriate inside a data center or on a dedicated backhaul, but it is the worst choice for a wireless federated-learning deployment. Over a wireless multiple-access channel, users' signals superpose in the analog domain — the "channel" literally computes $\sum_k X_k$ for free, modulo receiver noise. This observation is the seed of over-the-air computation (Chapter 16) and is an example of how relaxing one modeling assumption can reveal a fundamentally different rate region. Designers who keep the bit-pipe model when the channel is wireless forfeit a free order of magnitude in spectral efficiency.
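
A minimal sketch of the superposition idea, assuming unit channel gains and a hypothetical noise level: all workers transmit their analog values in the same slot, the receiver observes the noisy sum in one channel use, whereas an orthogonal digital uplink spends one slot per worker.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10
values = rng.standard_normal(N)            # each worker's scalar (e.g., one gradient entry)

# Over-the-air computation: all workers transmit simultaneously; the MAC superposes
# their analog signals, so the receiver sees sum_k x_k plus noise in ONE channel use.
noise = rng.normal(scale=0.01)
air_sum = values.sum() + noise

# Orthogonal digital uplink: one channel use per worker, then the master adds them.
digital_sum = sum(values[k] for k in range(N))      # N channel uses

print(f"true sum {values.sum():.3f}, AirComp {air_sum:.3f} (1 channel use), "
      f"digital {digital_sum:.3f} ({N} channel uses)")
```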

Practical Constraints
  • Wireless MAC superposition is analog; digital relaying reintroduces the per-user uplink cost

  • Analog AirComp requires synchronization (subcarrier, CFO, and timing) but no heavy coding

  • Current AirComp deployments (e.g., NEXT-G FL testbeds) achieve 5–20× spectrum savings over orthogonal uplink

📋 Ref: 3GPP TR 38.843 (AI/ML for NR air-interface)

Common Mistake: Load $\Delta$ Is Not the Same as Rate $R$

Mistake:

Treating the communication load $\Delta$ as a Shannon rate in bits per channel use.

Correction:

The load $\Delta$ is a normalized communication cost: total bits exchanged, divided by a reference intermediate-file size (or, in other conventions, per input bit). A Shannon rate is bits per channel use and depends on a channel model. For a noiseless bit-pipe the two agree up to units; for a wireless channel the rate is constrained by capacity while the load is not. This distinction will matter in Chapter 16.
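
A toy calculation with hypothetical numbers makes the units explicit: the load is a dimensionless, channel-agnostic count of normalized bits, while a rate (here, the capacity of an assumed real AWGN uplink) converts those bits into channel uses.

```python
import math

# Hypothetical scheme: N workers each send a 1 MB intermediate result to the master.
N = 10
bits_per_message = 8 * 2**20          # 1 MB in bits
V = 8 * 2**20                         # reference intermediate-file size |V|, also 1 MB

# Communication load: dimensionless and channel-agnostic.
delta = N * bits_per_message / V      # = 10

# Shannon rate of a real AWGN uplink at 10 dB SNR: bits per channel use.
snr = 10.0
rate = 0.5 * math.log2(1 + snr)       # ~1.73 bits per channel use

# The channel model turns the load into a time cost; the load alone cannot.
channel_uses = N * bits_per_message / rate
print(f"Delta = {delta}, rate = {rate:.2f} b/use, uses needed = {channel_uses:.2e}")
```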

Quick Check

In the formal network model of this section, what is the role of the cut-set inequality $H(X_{\mathcal{S}}) \geq H(Y \mid X_{\mathcal{S}^c})$?

  • It is an achievability bound telling us how to design the encoders.
  • It is a converse: any scheme must carry at least $H(Y \mid X_{\mathcal{S}^c})$ bits across the cut between $\mathcal{S}$ and the master.
  • It requires the master to know the sources' joint distribution.
  • It shows that uncoded routing is always optimal.