Fundamental Quantities: Storage Load, Communication Load, Recovery Threshold

Three Numbers That Characterize a Scheme

Every distributed-computing scheme in this book lives at a point $(\mu, \Delta, K)$ in a three-dimensional design space. Whether we are coding matrix multiplication, shuffling gradients, or aggregating federated models, we ultimately want to locate our scheme in this space and compare it to the best achievable operating point.

This section gives the three quantities their precise information-theoretic definitions. In later chapters the space is augmented with a privacy parameter (Part III), a PIR rate (Part IV), or a distortion (Part V), but the $(\mu, \Delta, K)$ triple is always the core.

Definition:

Computation (Storage) Load $\mu$

The computation load (equivalently, storage load) of a distributed-computing scheme is

$$\mu \;\triangleq\; \frac{\sum_{k=1}^{N} H(\mathcal{D}_k)}{N \cdot H(\mathcal{D})} \;\in\; \Bigl[\tfrac{1}{N},\, 1\Bigr],$$

the average per-worker storage normalized by the full dataset entropy. Here $\mu = 1/N$ corresponds to uncoded disjoint partitioning (each worker stores a $1/N$ fraction with no overlap); $\mu = 1$ corresponds to full replication (every worker stores the entire dataset). Any intermediate value trades storage for redundancy.

In the deterministic case where $\mathcal{D}$ is a fixed file of length $F$ bits and $\mathcal{D}_k$ is a subfile of length $F_k$, the definition reduces to $\mu = \sum_k F_k / (N F)$.
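To make the deterministic formula concrete, here is a minimal Python sketch; the subfile sizes and the helper name `storage_load` are illustrative assumptions, not notation from the book:

```python
# Storage load mu = sum_k F_k / (N * F) for a deterministic file assignment.

def storage_load(subfile_bits, total_bits):
    """Average per-worker storage as a fraction of the full dataset."""
    n = len(subfile_bits)
    return sum(subfile_bits) / (n * total_bits)

F, N = 1_000_000, 4                   # example: 10^6-bit dataset, 4 workers
print(storage_load([F // N] * N, F))  # disjoint partition -> 0.25 = 1/N
print(storage_load([F] * N, F))       # full replication   -> 1.0
```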

Computation / Storage Load $\mu$

The average per-worker storage as a fraction of the full dataset size. $\mu = 1/N$ is no redundancy (disjoint partition); $\mu = 1$ is full replication. Larger $\mu$ enables coding gains but costs more memory/disk at each worker.

Definition:

Communication Load $\Delta$

The communication load of a distributed-computing scheme is

$$\Delta \;\triangleq\; \frac{\sum_{k=1}^{N} H(X_k)}{H(Y_{\text{int}})},$$

the aggregate message entropy normalized by the reference intermediate-file entropy $H(Y_{\text{int}})$ (typically the size of the intermediate-value file in MapReduce, or the size of the desired output in matrix multiplication). Smaller $\Delta$ means less network traffic per unit of useful output.

Different problems use slightly different normalizations. In PIR (Chapter 13) the analogous quantity is the PIR rate $R_{\text{PIR}} = F / D$, where $F$ is the file size and $D$ the download size; $\Delta = 1/R_{\text{PIR}}$ is then the natural communication cost. The essential idea is common to all: normalize by what the master actually needs.

Communication Load $\Delta$

Aggregate inter-worker traffic normalized by the reference output size. In coded shuffling $\Delta = (1-\mu)/(N\mu)$; in uncoded shuffling $\Delta = 1 - \mu$. The computation–communication tradeoff curve is $\Delta(\mu)$.
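As a quick numerical comparison, a sketch evaluating the two formulas above (the function names and the example operating point are our own, hypothetical):

```python
# Communication load Delta(mu) for uncoded vs. coded shuffling.

def delta_uncoded(mu):
    return 1 - mu

def delta_coded(mu, n):
    # Valid for mu >= 1/n; at mu = 1/n it reduces to the uncoded load.
    return (1 - mu) / (n * mu)

N, mu = 10, 0.2
print(delta_uncoded(mu))   # 0.8
print(delta_coded(mu, N))  # 0.4 -- half the traffic at 2x the minimum storage
```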

Definition:

Recovery Threshold $K$

A distributed-computing scheme has recovery threshold $K \leq N$ if the decoder satisfies

$$\Pr\bigl[\hat{Y} \neq Y \bigm| \text{any } K \text{ messages are received}\bigr] \to 0,$$

i.e., any $K$ of the $N$ encoded messages suffice to reconstruct the output. Smaller $K$ means more straggler tolerance: the master waits for fewer responses.

A scheme is optimal in the recovery-threshold sense if no other scheme with the same storage load $\mu$ achieves a smaller $K$.

Recovery Threshold $K$

The minimum number of worker responses needed to reconstruct the output. Smaller $K$ gives better straggler tolerance. For polynomial-coded matrix multiplication (Chapter 5), $K$ matches an information-theoretic lower bound.
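The any-$K$-of-$N$ property is exactly what polynomial codes provide. Below is a minimal, self-contained sketch (toy sizes and our own variable names, not the book's construction): the data are the coefficients of a degree-$(K-1)$ polynomial, each worker returns one evaluation, and any $K$ responses determine the polynomial by interpolation.

```python
# Any-K-of-N recovery via polynomial (Reed-Solomon-style) encoding.
import random

import numpy as np

K, N = 3, 7                            # any K of N evaluations suffice
coeffs = np.array([5.0, -2.0, 1.0])    # "data": degree-(K-1) polynomial coefficients

# Encode: worker i receives the polynomial evaluated at point x_i.
xs = np.arange(1, N + 1, dtype=float)
ys = np.polyval(coeffs[::-1], xs)      # np.polyval expects highest degree first

# Simulate stragglers: only K randomly chosen workers respond.
alive = random.sample(range(N), K)
x_k, y_k = xs[alive], ys[alive]

# Decode: interpolate the unique degree-(K-1) polynomial through K points.
recovered = np.polyfit(x_k, y_k, K - 1)[::-1]
print(np.allclose(recovered, coeffs))  # True: any K responses reconstruct the data
```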

Example: The Uncoded Benchmark

For the uncoded MapReduce scheme of Chapter 1 (each of $N$ workers stores a disjoint $1/N$ partition and transmits raw intermediate values), compute $(\mu, \Delta, K)$. You can check your answer against the first row of the regime table below.

Definition:

Computation and Communication Rates

It is often convenient to express storage and communication as rates, in absolute bits rather than as fractions of the dataset. Let $F = H(\mathcal{D})$ be the dataset entropy. Define

$$R_{\text{comp}} \;\triangleq\; \mu \cdot F, \qquad R_{\text{comm}} \;\triangleq\; \Delta \cdot F.$$

$R_{\text{comp}}$ is the per-worker storage cost in bits; $R_{\text{comm}}$ is the aggregate communication cost in bits. For example, with $F = 10^9$ bits and $\mu = 0.25$, each worker stores $R_{\text{comp}} = 2.5 \times 10^8$ bits. Expressing results as rates makes the coding-theoretic intuition (Reed–Solomon, MDS, Shamir) more transparent.

Key Takeaway

Three numbers characterize a distributed-computing scheme: storage $\mu$, communication $\Delta$, recovery threshold $K$. Each chapter of this book is ultimately about locating the achievable region in this three-dimensional space. The converse of Section 2.4 will show that the uncoded baseline sits in the interior of the region: coded schemes can be strictly better on all three axes simultaneously.

Communication Load vs. Number of Workers

Plot the communication load $\Delta$ as a function of the number of workers $N$ for three schemes: (i) uncoded shuffling ($\Delta = 1 - 1/N$), (ii) coded shuffling at fixed storage $\mu$ ($\Delta = (1-\mu)/(N\mu)$), and (iii) full replication ($\Delta = 0$). Adjust $\mu$ to see how the coded curve shifts. Notice that the coded curve decays like $1/N$ while the uncoded curve asymptotes to $1$; this is the coding gain we quantify in Chapters 5–7.

Parameters (interactive defaults): per-worker storage fraction $\mu = 0.25$; maximum number of workers on the plot $100$.
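For the static text, a minimal matplotlib sketch reproducing the described plot at the listed defaults ($\mu = 0.25$, up to $100$ workers):

```python
# Communication load vs. number of workers for the three schemes above.
import matplotlib.pyplot as plt
import numpy as np

mu = 0.25                              # per-worker storage fraction (adjustable)
N = np.arange(2, 101)                  # number of workers

plt.plot(N, 1 - 1 / N, label=r"uncoded: $\Delta = 1 - 1/N$")
N_coded = N[N >= 1 / mu]               # coded formula requires mu >= 1/N
plt.plot(N_coded, (1 - mu) / (N_coded * mu),
         label=r"coded: $\Delta = (1-\mu)/(N\mu)$")
plt.axhline(0, linestyle="--", label=r"full replication: $\Delta = 0$")
plt.xlabel("number of workers $N$")
plt.ylabel(r"communication load $\Delta$")
plt.legend()
plt.show()
```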

Three Operating Regimes of the $(\mu, \Delta)$ Plane

| Regime | $\mu$ | $\Delta$ | $K$ | When used |
|---|---|---|---|---|
| Uncoded | $1/N$ | $1 - 1/N$ | $N$ | Legacy MapReduce, naive FL, no redundancy budget |
| Coded, minimum storage | $1/N$ | $(N-1)/N$ | $N$ | No improvement at min. storage (equivalent to uncoded) |
| Coded, intermediate | $\mu \in (1/N, 1)$ | $(1-\mu)/(N\mu)$ | $K = \lceil (1-\mu)/(N\mu) + 1 \rceil$ | Data-center FL, resilient coded computing (Chapters 5–7) |
| Full replication | $1$ | $0$ | $1$ | Small clusters, maximum straggler tolerance, luxury regime |

Common Mistake: Coding at Minimum Storage Does Not Help

Mistake:

Claiming that applying a coding scheme at minimum storage $\mu = 1/N$ gives any improvement over uncoded.

Correction:

At $\mu = 1/N$ every worker stores a disjoint $1/N$ fraction; there is no redundancy to exploit, regardless of the coding scheme. The coded-shuffling formula $\Delta(\mu) = (1-\mu)/(N\mu)$ evaluated at $\mu = 1/N$ gives $(N-1)/N = 1 - 1/N$, exactly the uncoded load. Coding gains require extra storage, quantified by $\mu > 1/N$. This is easy to verify from the formula, but surprisingly often miscommunicated in system-design discussions.
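A one-line check of that correction, using throwaway variables rather than the book's notation:

```python
N = 10
mu = 1 / N                  # minimum storage: disjoint partition
print((1 - mu) / (N * mu))  # coded formula at mu = 1/N -> 0.9
print(1 - mu)               # uncoded load              -> 0.9 (identical)
```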

Quick Check

A coded shuffling scheme with $N = 20$ workers uses storage load $\mu = 0.25$. What is the communication load $\Delta$?

$0.15$

$0.75$

$0.95$

$0.04$