Exercises
ex-ch02-01
Easy: A coded distributed-computing scheme uses $K$ workers with storage load $\mu$. Compute the communication load $L(\mu)$ and the gain factor over the uncoded baseline.
Use $L(\mu) = \frac{1}{K\mu}(1-\mu)$.
Uncoded load is $1 - \mu$.
Coded load
$L(\mu) = \frac{1}{K\mu}(1-\mu)$.
Gain
Uncoded: $L_{\text{uncoded}} = 1-\mu$. Gain: $\frac{L_{\text{uncoded}}}{L(\mu)} = K\mu$.
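As a quick numeric sanity check, a few lines of Python evaluate both loads and the gain; the values $K = 10$, $\mu = 0.2$ below are illustrative choices, not part of the exercise statement.

```python
# Sanity check of coded vs. uncoded load (illustrative K and mu).
def uncoded_load(mu):
    return 1 - mu

def coded_load(K, mu):
    return (1 - mu) / (K * mu)

K, mu = 10, 0.2                      # illustrative parameters
L_unc = uncoded_load(mu)             # 0.8
L_cod = coded_load(K, mu)            # 0.4
print(L_unc, L_cod, L_unc / L_cod)   # gain = K * mu = 2.0
```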
ex-ch02-02
Easy: State the four-step recipe for a cut-set converse, and identify which step introduces the storage constraint.
Four steps
(1) Identify the cut (subset of workers whose messages the master uses to decode).
(2) Apply the output-entropy bound: aggregate message entropy $\geq$ conditional output entropy given the other side.
(3) Symmetrize over all equivalent cuts (all subsets of the same size).
(4) Normalize by the output size to get the rate/load.
Where $\mu$ enters
In step 2, the conditional entropy is reduced by the storage that the workers in the cut already hold — the reduction is governed by $\mu$.
ex-ch02-03
Easy: Show that at $\mu = 1$ (full replication) the optimal communication load is $L(1) = 0$, and interpret the result.
Plug in
$L(1) = \frac{1}{K \cdot 1}(1 - 1) = 0$.
Interpretation
At full replication every worker has the entire dataset, so no inter-worker communication is needed: each worker produces the full output independently and the master reads one copy. The cost is paid in per-worker storage equal to the full dataset.
ex-ch02-04
Easy: Suppose a scheme claims to achieve a communication load below $\frac{1}{K\mu}(1-\mu)$ at storage $\mu$ on $K$ workers. Can this be correct?
Check the cut-set lower bound at the claimed $\mu$.
Lower bound
$L^*(\mu) = \frac{1}{K\mu}(1-\mu)$.
Verdict
The claim violates the cut-set converse, which mandates $L \geq \frac{1}{K\mu}(1-\mu)$ at this $\mu$. Either the claim is simply wrong, the scheme cheats by using more storage than $\mu$, or the problem does not match the coded-shuffling setting of §2.3.
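A claim like this can be screened mechanically. The following sketch checks a claimed $(\mu, L)$ operating point against the converse; the sample claim it tests is hypothetical.

```python
# Screen a claimed (mu, L) operating point against the cut-set converse.
def cut_set_bound(K, mu):
    """Lower bound L*(mu) = (1 - mu) / (K * mu)."""
    return (1 - mu) / (K * mu)

def claim_is_feasible(K, mu, L_claimed):
    return L_claimed >= cut_set_bound(K, mu)

# Hypothetical claim: load 0.3 at mu = 0.2 on K = 10 workers.
print(cut_set_bound(10, 0.2))           # 0.4
print(claim_is_feasible(10, 0.2, 0.3))  # False: violates the converse
```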
ex-ch02-05
Medium: Derive the formula $L(\mu) = \frac{1}{K\mu}(1-\mu)$ by counting broadcasts in the coded-shuffling construction of §2.3. Assume $K\mu$ is an integer.
Every subset of size $K\mu + 1$ indexes a broadcast group.
Each broadcast serves $K\mu$ recipients simultaneously.
Number of broadcast groups
There are $\binom{K}{K\mu+1}$ subsets of size $K\mu+1$ worth considering. For each subset, each member broadcasts a $K\mu$-wise XOR-combination useful to the remaining $K\mu$ workers of the subset.
Broadcast count
By a counting argument (see Li et al. 2018 §III.B), the total number of broadcast messages is $(K\mu+1)\binom{K}{K\mu+1}$, and each message is a single chunk of the intermediate-value file.
Normalization
Dividing by the intermediate-file size and simplifying gives $L(\mu) = \frac{1}{K\mu}(1-\mu)$, matching the theorem. (A fully worked combinatorial derivation is Exercise IV-2 in Li et al.'s paper; this exercise mainly tests understanding of the mechanism.)
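The counting identity can be verified with exact rational arithmetic. The chunk size $1/\bigl(Kt\binom{K}{t}\bigr)$ used below, with $t = K\mu$, is our reading of the chunking in §2.3 (an assumption, not stated in this exercise); under it, $(t+1)\binom{K}{t+1} = (K-t)\binom{K}{t}$ broadcasts sum to exactly $\frac{K-t}{Kt} = \frac{1-\mu}{K\mu}$.

```python
# Verify that (t+1) * C(K, t+1) broadcasts of the assumed chunk size
# sum to the load (1 - mu) / (K * mu) = (K - t) / (K * t), t = K * mu.
from fractions import Fraction
from math import comb

def broadcast_load(K, t):
    num_messages = (t + 1) * comb(K, t + 1)
    chunk = Fraction(1, K * t * comb(K, t))   # assumed chunk fraction
    return num_messages * chunk

for K in range(2, 12):
    for t in range(1, K):
        assert broadcast_load(K, t) == Fraction(K - t, K * t)
print("counting identity verified for all K <= 11")
```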
ex-ch02-06
Medium: Compute the memory-shared communication load at $\mu = 0.3$ for $K = 10$ workers. (The discrete optimal curve is defined only at $\mu = t/K$, $t \in \{1, \dots, K\}$.)
Interpolate linearly between the two nearest discrete points.
$\mu$ lies on the segment between adjacent grid points $t/K$ and $(t+1)/K$ — find its endpoint values and interpolate if needed.
Endpoints
At $K = 10$, the nearest discrete points are $\mu = 0.2$ ($L = 0.4$) and $\mu = 0.3$ ($L = 7/30 \approx 0.233$). Since $\mu = 0.3$ is itself a discrete point, its value can be read off directly and no interpolation is needed here.
If asked at $\mu = 0.25$
Interpolate between $\mu = 0.2$ ($L = 0.4$) and $\mu = 0.3$ ($L \approx 0.233$): the midpoint gives $L \approx 0.317$. This is an upper bound on the smooth convex curve at $\mu = 0.25$, where $L(0.25) = 0.3$.
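The interpolation and the convexity gap can be checked numerically; this snippet reproduces the numbers above under the $K = 10$ reading of the exercise.

```python
# Memory-sharing (linear interpolation) vs. the smooth convex curve.
def L(K, mu):
    return (1 - mu) / (K * mu)

K = 10
mu_lo, mu_hi, mu = 0.2, 0.3, 0.25
theta = (mu - mu_lo) / (mu_hi - mu_lo)
L_shared = (1 - theta) * L(K, mu_lo) + theta * L(K, mu_hi)
print(L_shared)   # ~0.3167: memory-sharing upper bound at mu = 0.25
print(L(K, mu))   # 0.3: the smooth curve lies strictly below
```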
ex-ch02-07
Medium: Prove that the tradeoff curve $L(\mu) = \frac{1}{K\mu}(1-\mu)$ is strictly convex in $\mu$ on $(0, 1]$.
Rewrite as $L(\mu) = \frac{1}{K}\left(\frac{1}{\mu} - 1\right)$.
Compute $L''(\mu)$.
Rewrite
$L(\mu) = \frac{1}{K\mu} - \frac{1}{K}$.
Differentiate
$L'(\mu) = -\frac{1}{K\mu^2}$, so $L''(\mu) = \frac{2}{K\mu^3} > 0$ for $\mu \in (0, 1]$. Hence $L$ is strictly convex.
Implication
Strict convexity means that memory-sharing between two discrete points gives a strictly higher communication load than the smooth curve — the discrete optimal is optimal only at its grid points.
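A symbolic check of the second derivative confirms the computation; sympy is used here purely as an illustration.

```python
# Symbolic verification that L''(mu) = 2 / (K * mu**3) > 0 on (0, 1].
import sympy as sp

mu, K = sp.symbols('mu K', positive=True)
L = (1 - mu) / (K * mu)
L2 = sp.simplify(sp.diff(L, mu, 2))
print(L2)                                     # 2/(K*mu**3)
assert sp.simplify(L2 - 2 / (K * mu**3)) == 0
```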
ex-ch02-08
Medium: Explain precisely how the storage mappings in the network model of §2.1 are chosen before the inputs are revealed, and why this matters for the information-theoretic analysis.
Think about conditional entropies given the storage.
Consider what happens if the stored content $Z_k$ could depend adaptively on the input $X$.
Pre-commitment
The storage mappings are fixed protocols: worker $k$ always stores $Z_k = \phi_k(X)$, a predetermined function of the input, regardless of the realization. This ensures that the joint distribution of $(X, Z_1, \dots, Z_K)$ is well-defined from the prior on $X$ alone.
Why it matters for the analysis
Cut-set bounds use conditional entropies like $H(X \mid Z_{\mathcal{S}})$, which require a well-defined joint distribution. If storage could depend on the input in an adaptive way, the analysis would have to track this adaptation — possible, but considerably more complicated. Fortunately, all results in this book are proved under the fixed-mapping assumption and extend naturally when needed.
ex-ch02-09
Medium: Consider a non-symmetric setting where workers have different storage budgets $\mu_1, \dots, \mu_K$. Guess a generalization of the tradeoff formula and discuss whether the cut-set argument still applies.
The cut-set argument does not require symmetry.
What is the average storage?
Generalization
Define $\bar{\mu} = \frac{1}{K}\sum_{k=1}^{K} \mu_k$. A plausible generalization is $L \geq \frac{1}{K\bar{\mu}}(1-\bar{\mu})$. Matching achievability requires a scheme that spreads load in proportion to the storage distribution — a non-trivial combinatorial problem.
What the cut-set argument gives directly
Fixing a cut between some subset $\mathcal{S}$ of workers and the master, the bound becomes the output entropy minus what can be reconstructed locally by the complement of $\mathcal{S}$. Averaging over cuts weighted by the storage distribution gives the generalized converse — a good research-level exercise.
Caveat
The result for non-symmetric storage is not fully characterized in general; partial results exist for special storage profiles. This is one of the open problems of Chapter 18.
ex-ch02-10
Medium: Use the cut-set recipe to give a lower bound on the communication in a federated-learning round where the server aggregates the users' gradients $g_1, \dots, g_K$.
Cut: server vs. all users.
The server's output is $\sum_{k=1}^{K} g_k$, with entropy roughly $d$ scalars.
Step 1: Cut
The natural cut is between the server and the union of all users: all uplink traffic crosses this cut.
Step 2: Entropy bound
The server's output $\sum_k g_k$ has entropy $\approx db$ bits at $b$-bit precision, where $d$ is the gradient dimension.
Step 3 + 4: Symmetrize and normalize
Since each user must contribute information about its gradient, symmetry gives per-user uplink at least $\frac{1}{K} H\!\left(\sum_k g_k\right)$, which for independent gradients is roughly $db/K$ bits per user — but the aggregate is still $\Omega(db)$. This recovers the aggregation-cost theorem of Chapter 1 from the cut-set recipe.
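To make the bound concrete, here is the arithmetic at illustrative values of the model dimension, precision, and user count; all three numbers are assumptions for the example.

```python
# Cut-set numbers for one aggregation round (illustrative parameters).
d = 1_000_000   # model dimension (assumed)
b = 32          # bits of precision (assumed)
K = 100         # number of users (assumed)

aggregate_bits = d * b                 # entropy of the sum: 3.2e7 bits
per_user_bits = aggregate_bits / K     # symmetric per-user share
print(aggregate_bits / 8e6, "MB total uplink (lower bound)")   # 4.0
print(per_user_bits / 8e6, "MB per user")                      # 0.04
```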
ex-ch02-11
Hard: Extend the framework by adding a privacy parameter $T$: the protocol must leak no information about any single worker's share to any set of $T$ colluding workers. Conjecture how the tradeoff curve changes and compare with the statement of the secure-aggregation theorem in Chapter 10.
Introduce shared randomness to pad messages.
The aggregate randomness cannot be learned from fewer than $T + 1$ shares.
Informal argument
Privacy against $T$ colluders is achieved by adding random masks that cancel in the aggregate. Each mask contributes to the communication load; the aggregate load rises to roughly $L(\mu) + \frac{T}{K}$, where the second term is the per-user privacy overhead.
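A minimal sketch of the cancelling-mask mechanism, using pairwise masks over a prime field; this is a toy illustration of the principle, not the Chapter 10 construction.

```python
# Toy pairwise masking: masks cancel in the sum, so the aggregator learns
# only the aggregate, while any single masked share looks random.
import random

P = 2**31 - 1                    # prime modulus (illustrative)
K = 5
x = [random.randrange(P) for _ in range(K)]   # private inputs

# Pairwise masks: s[(i, j)] is added by worker i, subtracted by j (i < j).
s = {(i, j): random.randrange(P)
     for i in range(K) for j in range(i + 1, K)}

def masked_share(k):
    y = x[k]
    for (i, j), m in s.items():
        if i == k: y += m
        if j == k: y -= m
    return y % P

shares = [masked_share(k) for k in range(K)]
assert sum(shares) % P == sum(x) % P   # masks cancel in the aggregate
print("aggregate recovered:", sum(shares) % P == sum(x) % P)
```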
Matching achievability
Achievability uses ramp secret sharing (Chapter 3) and pairwise masking (Chapter 10). The exact tradeoff region is established in Caire et al.'s Optimality of Secure Aggregation with Uncoded Groupwise Keys paper, which we tag in Chapter 10. The conjecture here is qualitatively correct; the constants require the full construction.
Lesson
The cut-set recipe transfers, but each additional adversary parameter adds its own structural term. This is the "Three challenges, one thread" principle made quantitative.
ex-ch02-12
Hard: Give an example where the cut-set bound is not tight and argue briefly why. Consider a scenario from the coded-caching literature or from distributed storage.
Multi-unicast interference channels.
Non-symmetric cache placements.
Example: Non-symmetric caching
For asymmetric coded caching with unequal user memories, the cut-set bound can be improved via more delicate linear programming arguments (see Yu, Maddah-Ali, Avestimehr, 2017). The cut-set's gap to the true optimum is small in absolute terms but demonstrably non-zero.
Why the cut-set loses
Cut-set bounds assume the cut can be "fully" saturated with independent information. When multiple cuts share constraints (as in coded caching with asymmetric memories), the cut-set is not tight because it does not capture the joint constraints between cuts. Polyhedral methods (linear programming on entropy inequalities) give tighter bounds.
Research context
Characterizing when cut-set bounds are tight vs. loose is one of the persistent themes of multi-user information theory. For the vanilla coded-shuffling problem of this chapter, it happens to be tight; for richer settings it need not be.
ex-ch02-13
Hard: Suppose the tradeoff is achieved by a scheme with an $(n, k)$ Reed–Solomon-like encoding. What is the decoder's computational complexity? Discuss the engineering implication.
Reed–Solomon decoding is $O(n \log^2 n)$ with FFTs.
Coded-shuffling decoder uses only XORs in the finite field $\mathbb{F}_2$.
Decoder cost
Reed–Solomon decoding uses FFT-based polynomial interpolation with complexity $O(n \log^2 n)$ per block, i.e., $O(\log^2 n)$ per symbol. For the canonical coded-shuffling scheme, the decoding is much simpler — each recipient performs one XOR per broadcast message — hence $O(1)$ operations per chunk.
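The $O(1)$-per-chunk claim is easy to see in code. This toy shows one XOR broadcast serving two recipients at once; it illustrates the mechanism, not the full §2.3 scheme.

```python
# One XOR broadcast serving multiple recipients: each recipient removes
# the chunk it already stores with a single XOR -- O(1) work per chunk.
import os

chunk_a = os.urandom(16)   # needed by worker 2, stored by workers 1 and 3
chunk_b = os.urandom(16)   # needed by worker 3, stored by workers 1 and 2

def xor(u, v):
    return bytes(a ^ b for a, b in zip(u, v))

broadcast = xor(chunk_a, chunk_b)      # one coded message for both

# Worker 2 holds chunk_b, so it peels it off to recover chunk_a.
assert xor(broadcast, chunk_b) == chunk_a
# Worker 3 holds chunk_a and recovers chunk_b the same way.
assert xor(broadcast, chunk_a) == chunk_b
print("both recipients decoded with one XOR each")
```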
Engineering implication
The coded-shuffling decoder is extremely cheap compared to general Reed–Solomon. This cost asymmetry (cheap decoding, moderate storage) is what makes coded shuffling practical even at the wireless edge. Polynomial codes for matrix multiplication (Chapter 5) have more expensive decoders but hit a stronger recovery threshold.
ex-ch02-14
Hard: Prove or disprove: on a wireless multiple-access channel with additive white Gaussian noise, the achievable $(\mu, L)$ region strictly exceeds the bit-pipe region of §2.3 in the high-SNR regime.
Analog AirComp computes $\sum_k x_k$ in the channel itself.
At high SNR, the MAC channel is nearly noiseless.
Sketch proof
Over an AWGN MAC at high SNR, all $K$ users can transmit simultaneously and the receiver obtains $y = \sum_k x_k + z$ with the noise $z$ small. For an aggregation task the receiver's desired output is $\sum_k x_k$ itself, so the cost in "channel uses" is $O(d)$ rather than $O(Kd)$ — a factor-$K$ saving over the bit-pipe baseline.
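A quick simulation makes the point: one channel use per coordinate delivers the sum of all $K$ vectors, with error set by the noise alone. The SNR and dimensions below are illustrative assumptions.

```python
# Analog over-the-air aggregation on an AWGN MAC: one channel use per
# coordinate delivers the sum of all K transmitted vectors at once.
import numpy as np

rng = np.random.default_rng(0)
K, d, noise_std = 10, 1000, 1e-3      # high-SNR regime (illustrative)

x = rng.standard_normal((K, d))       # users' analog signals
y = x.sum(axis=0) + noise_std * rng.standard_normal(d)  # superposition

mse = np.mean((y - x.sum(axis=0)) ** 2)
print(f"MSE of the over-the-air sum: {mse:.2e}")  # ~noise_std**2
```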
Implication
The $(\mu, L)$ region on a wireless MAC can be strictly larger than on bit-pipes, because the channel itself performs aggregation. Chapter 16 makes this precise: the analog AirComp region sits above the digital region by a factor of $K$ in the high-SNR limit, at the cost of additional MSE.
ex-ch02-15
Challenge: Develop a joint framework that incorporates (i) computation load $\mu$, (ii) communication load $L$, (iii) privacy parameter $T$ (colluding users), and (iv) Byzantine tolerance $B$ (malicious workers). Conjecture the leading-order behaviour of each axis's contribution.
Decompose the total load into structural, privacy, and integrity components.
Consult Chapter 11 (ByzSecAgg) for the Byzantine baseline.
Heuristic decomposition
$L_{\text{total}}(\mu, T, B) \;\approx\; \frac{1-\mu}{K\mu} \;+\; \frac{T}{K} \;+\; \frac{2B}{K}.$ The first term is the coded-shuffling converse of §2.4. The second is the privacy overhead from ramp secret sharing (Chapter 11). The third is the Byzantine-correction overhead from Reed–Solomon in the presence of adversarial errors.
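Evaluating the conjectured decomposition at illustrative parameters shows how the three axes contribute; recall that the form of the second and third terms is a conjecture from this exercise, not an established result.

```python
# Evaluate the conjectured three-term load decomposition (illustrative).
def total_load(K, mu, T, B):
    structural = (1 - mu) / (K * mu)   # coded-shuffling converse
    privacy = T / K                    # conjectured privacy overhead
    integrity = 2 * B / K              # conjectured Byzantine overhead
    return structural + privacy + integrity

K, mu, T, B = 100, 0.1, 10, 5          # assumed parameters
print(total_load(K, mu, T, B))         # 0.09 + 0.1 + 0.1 = 0.29
```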
Status
The conjecture matches the achievability bounds of Chapters 10–12 and the ByzSecAgg CommIT contribution in Chapter 11, but a matching converse for the joint region is one of the open problems listed in Chapter 18. Interested readers are invited to formalize the decomposition and verify it against the existing constructions.
What this exercise is really asking
The exercise is a gentle invitation to treat the framework as the design space for the remainder of the book. Every later chapter specializes to one slice of this 4-dimensional region; understanding the joint structure is what the book is about.