Exercises
ex-ch02-01
Easy: A coded distributed-computing scheme uses $K$ workers with storage load $\mu$. Compute the communication load $L(\mu)$ and the gain factor over the uncoded baseline.
Use $L(\mu) = \frac{1}{K\mu}(1-\mu)$.
Uncoded load is $1 - \mu$.
Coded load
$L(\mu) = \frac{1}{K\mu}(1-\mu)$.
Gain
Uncoded: $L_{\text{uncoded}} = 1-\mu$. Gain: $\frac{L_{\text{uncoded}}}{L(\mu)} = K\mu$.
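As a quick numeric sanity check, a few lines of Python evaluate both loads and the gain; the values $K = 10$, $\mu = 0.2$ below are illustrative choices, not part of the exercise statement.

```python
# Sanity check of coded vs. uncoded load (illustrative K and mu).
def uncoded_load(mu):
    return 1 - mu

def coded_load(K, mu):
    return (1 - mu) / (K * mu)

K, mu = 10, 0.2                      # illustrative parameters
L_unc = uncoded_load(mu)             # 0.8
L_cod = coded_load(K, mu)            # 0.4
print(L_unc, L_cod, L_unc / L_cod)   # gain = K * mu = 2.0
```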
ex-ch02-02
Easy: State the four-step recipe for a cut-set converse, and identify which step introduces the storage constraint.
Four steps
(1) Identify the cut (subset of workers whose messages the master uses to decode).
(2) Apply the output-entropy bound: aggregate message entropy $\geq$ conditional output entropy given the other side.
(3) Symmetrize over all equivalent cuts (all subsets of the same size).
(4) Normalize by the output size to get the rate/load.
Where $\mu$ enters
In step 2, the conditional entropy is reduced by the storage that the workers in the cut already hold — the reduction is governed by $\mu$.
ex-ch02-03
Easy: Show that at $\mu = 1$ (full replication) the optimal communication load is $L(1) = 0$, and interpret the result.
Plug in
$L(1) = \frac{1}{K \cdot 1}(1 - 1) = 0$.
Interpretation
At full replication every worker has the entire dataset, so no inter-worker communication is needed: each worker produces the full output independently and the master reads one copy. The cost is paid in per-worker storage equal to the full dataset.
ex-ch02-04
Easy: Suppose a scheme claims to achieve a communication load below $\frac{1}{K\mu}(1-\mu)$ at storage $\mu$ on $K$ workers. Can this be correct?
Check the cut-set lower bound at the claimed $\mu$.
Lower bound
$L^*(\mu) = \frac{1}{K\mu}(1-\mu)$.
Verdict
The claim violates the cut-set converse, which mandates $L \geq \frac{1}{K\mu}(1-\mu)$ at this $\mu$. Either the claim is simply wrong, the scheme cheats by using more storage than $\mu$, or the problem does not match the coded-shuffling setting of §2.3.
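A claim like this can be screened mechanically. The following sketch checks a claimed $(\mu, L)$ operating point against the converse; the sample claim it tests is hypothetical.

```python
# Screen a claimed (mu, L) operating point against the cut-set converse.
def cut_set_bound(K, mu):
    """Lower bound L*(mu) = (1 - mu) / (K * mu)."""
    return (1 - mu) / (K * mu)

def claim_is_feasible(K, mu, L_claimed):
    return L_claimed >= cut_set_bound(K, mu)

# Hypothetical claim: load 0.3 at mu = 0.2 on K = 10 workers.
print(cut_set_bound(10, 0.2))           # 0.4
print(claim_is_feasible(10, 0.2, 0.3))  # False: violates the converse
```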
ex-ch02-05
Medium: Derive the formula $L(\mu) = \frac{1}{K\mu}(1-\mu)$ by counting broadcasts in the coded-shuffling construction of §2.3. Assume $K\mu$ is an integer.
Every subset of size $K\mu + 1$ indexes a broadcast group.
Each broadcast serves $K\mu$ recipients simultaneously.
Number of broadcast groups
There are $\binom{K}{K\mu+1}$ subsets of size $K\mu+1$ worth considering. For each subset, each member broadcasts a $K\mu$-wise XOR-combination useful to the remaining $K\mu$ workers of the subset.
Broadcast count
By a counting argument (see Li et al. 2018 §III.B), the total number of broadcast messages is $(K\mu+1)\binom{K}{K\mu+1}$, and each message is a single chunk of the intermediate-value file.
Normalization
Dividing by the intermediate-file size and simplifying gives $L(\mu) = \frac{1}{K\mu}(1-\mu)$, matching the theorem. (A fully worked combinatorial derivation is Exercise IV-2 in Li et al.'s paper; this exercise mainly tests understanding of the mechanism.)
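The counting identity can be verified with exact rational arithmetic. The chunk size $1/\bigl(Kt\binom{K}{t}\bigr)$ used below, with $t = K\mu$, is our reading of the chunking in §2.3 (an assumption, not stated in this exercise); under it, $(t+1)\binom{K}{t+1} = (K-t)\binom{K}{t}$ broadcasts sum to exactly $\frac{K-t}{Kt} = \frac{1-\mu}{K\mu}$.

```python
# Verify that (t+1) * C(K, t+1) broadcasts of the assumed chunk size
# sum to the load (1 - mu) / (K * mu) = (K - t) / (K * t), t = K * mu.
from fractions import Fraction
from math import comb

def broadcast_load(K, t):
    num_messages = (t + 1) * comb(K, t + 1)
    chunk = Fraction(1, K * t * comb(K, t))   # assumed chunk fraction
    return num_messages * chunk

for K in range(2, 12):
    for t in range(1, K):
        assert broadcast_load(K, t) == Fraction(K - t, K * t)
print("counting identity verified for all K <= 11")
```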
ex-ch02-06
Medium: Compute the memory-shared communication load at $\mu = 0.3$ for $K = 10$ workers. (The discrete optimal curve is defined only at $\mu = t/K$, $t \in \{1, \dots, K\}$.)
Interpolate linearly between the two nearest discrete points.
$\mu$ lies on the segment between adjacent grid points $t/K$ and $(t+1)/K$ — find its endpoint values and interpolate if needed.
Endpoints
At $K = 10$, the nearest discrete points are $\mu = 0.2$ ($L = 0.4$) and $\mu = 0.3$ ($L = 7/30 \approx 0.233$). Since $\mu = 0.3$ is itself a discrete point, its value can be read off directly and no interpolation is needed here.
If asked at $\mu = 0.25$
Interpolate between $\mu = 0.2$ ($L = 0.4$) and $\mu = 0.3$ ($L \approx 0.233$): the midpoint gives $L \approx 0.317$. This is an upper bound on the smooth convex curve at $\mu = 0.25$, where $L(0.25) = 0.3$.
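The interpolation and the convexity gap can be checked numerically; this snippet reproduces the numbers above under the $K = 10$ reading of the exercise.

```python
# Memory-sharing (linear interpolation) vs. the smooth convex curve.
def L(K, mu):
    return (1 - mu) / (K * mu)

K = 10
mu_lo, mu_hi, mu = 0.2, 0.3, 0.25
theta = (mu - mu_lo) / (mu_hi - mu_lo)
L_shared = (1 - theta) * L(K, mu_lo) + theta * L(K, mu_hi)
print(L_shared)   # ~0.3167: memory-sharing upper bound at mu = 0.25
print(L(K, mu))   # 0.3: the smooth curve lies strictly below
```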
ex-ch02-07
Medium: Prove that the tradeoff curve $L(\mu) = \frac{1}{K\mu}(1-\mu)$ is strictly convex in $\mu$ on $(0, 1]$.
Rewrite as $L(\mu) = \frac{1}{K}\left(\frac{1}{\mu} - 1\right)$.
Compute $L''(\mu)$.
Rewrite
$L(\mu) = \frac{1}{K\mu} - \frac{1}{K}$.
Differentiate
$L'(\mu) = -\frac{1}{K\mu^2}$, so $L''(\mu) = \frac{2}{K\mu^3} > 0$ for $\mu \in (0, 1]$. Hence $L$ is strictly convex.
Implication
Strict convexity means that memory-sharing between two discrete points gives a strictly higher communication load than the smooth curve — the discrete optimal is optimal only at its grid points.
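A symbolic check of the second derivative confirms the computation; sympy is used here purely as an illustration.

```python
# Symbolic verification that L''(mu) = 2 / (K * mu**3) > 0 on (0, 1].
import sympy as sp

mu, K = sp.symbols('mu K', positive=True)
L = (1 - mu) / (K * mu)
L2 = sp.simplify(sp.diff(L, mu, 2))
print(L2)                                     # 2/(K*mu**3)
assert sp.simplify(L2 - 2 / (K * mu**3)) == 0
```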
ex-ch02-08
Medium: Explain precisely how the storage mappings in the network model of §2.1 are chosen before the inputs are revealed, and why this matters for the information-theoretic analysis.
Think about conditional entropies given the storage.
Consider what happens if the stored content $Z_k$ could depend adaptively on the input $X$.
Pre-commitment
The storage mappings are fixed protocols: worker $k$ always stores $Z_k = \phi_k(X)$, a predetermined function of the input, regardless of the realization. This ensures that the joint distribution of $(X, Z_1, \dots, Z_K)$ is well-defined from the prior on $X$ alone.
Why it matters for the analysis
Cut-set bounds use conditional entropies like $H(X \mid Z_{\mathcal{S}})$, which require a well-defined joint distribution. If storage could depend on the input in an adaptive way, the analysis would have to track this adaptation — possible, but considerably more complicated. Fortunately, all results in this book are proved under the fixed-mapping assumption and extend naturally when needed.
ex-ch02-09
Medium: Consider a non-symmetric setting where workers have different storage budgets $\mu_1, \dots, \mu_K$. Guess a generalization of the tradeoff formula and discuss whether the cut-set argument still applies.
The cut-set argument does not require symmetry.
What is the average storage?
Generalization
Define $\bar{\mu} = \frac{1}{K}\sum_{k=1}^{K} \mu_k$. A plausible generalization is $L \geq \frac{1}{K\bar{\mu}}(1-\bar{\mu})$. Matching achievability requires a scheme that spreads load in proportion to the storage distribution — a non-trivial combinatorial problem.
What the cut-set argument gives directly
Fixing a cut between some subset $\mathcal{S}$ of workers and the master, the bound becomes the output entropy minus what can be reconstructed locally by the complement of $\mathcal{S}$. Averaging over cuts weighted by the storage distribution gives the generalized converse — a good research-level exercise.
Caveat
The result for non-symmetric storage is not fully characterized in general; partial results exist for special storage profiles. This is one of the open problems of Chapter 18.
ex-ch02-10
Medium: Use the cut-set recipe to give a lower bound on the communication in a federated-learning round where the server aggregates the users' gradients $g_1, \dots, g_K$.
Cut: server vs. all users.
The server's output is $\sum_{k=1}^{K} g_k$, with entropy roughly $d$ scalars.
Step 1: Cut
The natural cut is between the server and the union of all users: all uplink traffic crosses this cut.
Step 2: Entropy bound
The server's output $\sum_k g_k$ has entropy $\approx db$ bits at $b$-bit precision, where $d$ is the gradient dimension.
Step 3 + 4: Symmetrize and normalize
Since each user must contribute information about its gradient, symmetry gives per-user uplink at least $\frac{1}{K} H\!\left(\sum_k g_k\right)$, which for independent gradients is roughly $db/K$ bits per user — but the aggregate is still $\Omega(db)$. This recovers the aggregation-cost theorem of Chapter 1 from the cut-set recipe.
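To make the bound concrete, here is the arithmetic at illustrative values of the model dimension, precision, and user count; all three numbers are assumptions for the example.

```python
# Cut-set numbers for one aggregation round (illustrative parameters).
d = 1_000_000   # model dimension (assumed)
b = 32          # bits of precision (assumed)
K = 100         # number of users (assumed)

aggregate_bits = d * b                 # entropy of the sum: 3.2e7 bits
per_user_bits = aggregate_bits / K     # symmetric per-user share
print(aggregate_bits / 8e6, "MB total uplink (lower bound)")   # 4.0
print(per_user_bits / 8e6, "MB per user")                      # 0.04
```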
ex-ch02-11
Hard: Extend the framework by adding a privacy parameter $T$: the protocol must leak no information about any single worker's share to any set of $T$ colluding workers. Conjecture how the tradeoff curve changes and compare with the statement of the secure-aggregation theorem in Chapter 10.
Introduce shared randomness to pad messages.
The aggregate randomness cannot be learned from fewer than $T + 1$ shares.
Informal argument
Privacy against $T$ colluders is achieved by adding random masks that cancel in the aggregate. Each mask contributes to the communication load; the aggregate load rises to roughly $L(\mu) + \frac{T}{K}$, where the second term is the per-user privacy overhead.
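A minimal sketch of the cancelling-mask mechanism, using pairwise masks over a prime field; this is a toy illustration of the principle, not the Chapter 10 construction.

```python
# Toy pairwise masking: masks cancel in the sum, so the aggregator learns
# only the aggregate, while any single masked share looks random.
import random

P = 2**31 - 1                    # prime modulus (illustrative)
K = 5
x = [random.randrange(P) for _ in range(K)]   # private inputs

# Pairwise masks: s[(i, j)] is added by worker i, subtracted by j (i < j).
s = {(i, j): random.randrange(P)
     for i in range(K) for j in range(i + 1, K)}

def masked_share(k):
    y = x[k]
    for (i, j), m in s.items():
        if i == k: y += m
        if j == k: y -= m
    return y % P

shares = [masked_share(k) for k in range(K)]
assert sum(shares) % P == sum(x) % P   # masks cancel in the aggregate
print("aggregate recovered:", sum(shares) % P == sum(x) % P)
```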
Matching achievability
Achievability uses ramp secret sharing (Chapter 3) and pairwise masking (Chapter 10). The exact tradeoff region is established in Caire et al.'s Optimality of Secure Aggregation with Uncoded Groupwise Keys paper, which we tag in Chapter 10. The conjecture here is qualitatively correct; the constants require the full construction.
Lesson
The cut-set recipe transfers, but each additional adversary parameter adds its own structural term. This is the "Three challenges, one thread" principle made quantitative.
ex-ch02-12
Hard: Give an example where the cut-set bound is not tight and argue briefly why. Consider a scenario from the coded-caching literature or from distributed storage.
Multi-unicast interference channels.
Non-symmetric cache placements.
Example: Non-symmetric caching
For asymmetric coded caching with unequal user memories, the cut-set bound can be improved via more delicate linear programming arguments (see Yu, Maddah-Ali, Avestimehr, 2017). The cut-set's gap to the true optimum is small in absolute terms but demonstrably non-zero.
Why the cut-set loses
Cut-set bounds assume the cut can be "fully" saturated with independent information. When multiple cuts share constraints (as in coded caching with asymmetric memories), the cut-set is not tight because it does not capture the joint constraints between cuts. Polyhedral methods (linear programming on entropy inequalities) give tighter bounds.
Research context
Characterizing when cut-set bounds are tight vs. loose is one of the persistent themes of multi-user information theory. For the vanilla coded-shuffling problem of this chapter, it happens to be tight; for richer settings it need not be.
ex-ch02-13
Hard: Suppose the tradeoff is achieved by a scheme with an $(n, k)$ Reed–Solomon-like encoding. What is the decoder's computational complexity? Discuss the engineering implication.
Reed–Solomon decoding is $O(n \log^2 n)$ with FFTs.
Coded-shuffling decoder uses only XORs in the finite field $\mathbb{F}_2$.
Decoder cost
Reed–Solomon decoding uses FFT-based polynomial interpolation with complexity $O(n \log^2 n)$ per block, i.e., $O(\log^2 n)$ per symbol. For the canonical coded-shuffling scheme, the decoding is much simpler — each recipient performs one XOR per broadcast message — hence $O(1)$ operations per chunk.
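The $O(1)$-per-chunk claim is easy to see in code. This toy shows one XOR broadcast serving two recipients at once; it illustrates the mechanism, not the full §2.3 scheme.

```python
# One XOR broadcast serving multiple recipients: each recipient removes
# the chunk it already stores with a single XOR -- O(1) work per chunk.
import os

chunk_a = os.urandom(16)   # needed by worker 2, stored by workers 1 and 3
chunk_b = os.urandom(16)   # needed by worker 3, stored by workers 1 and 2

def xor(u, v):
    return bytes(a ^ b for a, b in zip(u, v))

broadcast = xor(chunk_a, chunk_b)      # one coded message for both

# Worker 2 holds chunk_b, so it peels it off to recover chunk_a.
assert xor(broadcast, chunk_b) == chunk_a
# Worker 3 holds chunk_a and recovers chunk_b the same way.
assert xor(broadcast, chunk_a) == chunk_b
print("both recipients decoded with one XOR each")
```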
Engineering implication
The coded-shuffling decoder is extremely cheap compared to general Reed–Solomon. This cost asymmetry (cheap decoding, moderate storage) is what makes coded shuffling practical even at the wireless edge. Polynomial codes for matrix multiplication (Chapter 5) have more expensive decoders but hit a stronger recovery threshold.
ex-ch02-14
Hard: Prove or disprove: on a wireless multiple-access channel with additive white Gaussian noise, the achievable $(\mu, L)$ region strictly exceeds the bit-pipe region of §2.3 in the high-SNR regime.
Analog AirComp computes $\sum_k x_k$ in the channel itself.
At high SNR, the MAC channel is nearly noiseless.
Sketch proof
Over an AWGN MAC at high SNR, all $K$ users can transmit simultaneously and the receiver obtains $y = \sum_k x_k + z$ with the noise $z$ small. For an aggregation task the receiver's desired output is $\sum_k x_k$ itself, so the cost in "channel uses" is $O(d)$ rather than $O(Kd)$ — a factor-$K$ saving over the bit-pipe baseline.
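A quick simulation makes the point: one channel use per coordinate delivers the sum of all $K$ vectors, with error set by the noise alone. The SNR and dimensions below are illustrative assumptions.

```python
# Analog over-the-air aggregation on an AWGN MAC: one channel use per
# coordinate delivers the sum of all K transmitted vectors at once.
import numpy as np

rng = np.random.default_rng(0)
K, d, noise_std = 10, 1000, 1e-3      # high-SNR regime (illustrative)

x = rng.standard_normal((K, d))       # users' analog signals
y = x.sum(axis=0) + noise_std * rng.standard_normal(d)  # superposition

mse = np.mean((y - x.sum(axis=0)) ** 2)
print(f"MSE of the over-the-air sum: {mse:.2e}")  # ~noise_std**2
```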
Implication
The $(\mu, L)$ region on a wireless MAC can be strictly larger than on bit-pipes, because the channel itself performs aggregation. Chapter 16 makes this precise: the analog AirComp region sits above the digital region by a factor of $K$ in the high-SNR limit, at the cost of additional MSE.
ex-ch02-15
Challenge: Develop a joint framework that incorporates (i) computation load $\mu$, (ii) communication load $L$, (iii) privacy parameter $T$ (colluding users), and (iv) Byzantine tolerance $B$ (malicious workers). Conjecture the leading-order behaviour of each axis's contribution.
Decompose the total load into structural, privacy, and integrity components.
Consult Chapter 11 (ByzSecAgg) for the Byzantine baseline.
Heuristic decomposition
$L_{\text{total}}(\mu, T, B) \;\approx\; \frac{1-\mu}{K\mu} \;+\; \frac{T}{K} \;+\; \frac{2B}{K}.$ The first term is the coded-shuffling converse of §2.4. The second is the privacy overhead from ramp secret sharing (Chapter 11). The third is the Byzantine-correction overhead from Reed–Solomon in the presence of adversarial errors.
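Evaluating the conjectured decomposition at illustrative parameters shows how the three axes contribute; recall that the form of the second and third terms is a conjecture from this exercise, not an established result.

```python
# Evaluate the conjectured three-term load decomposition (illustrative).
def total_load(K, mu, T, B):
    structural = (1 - mu) / (K * mu)   # coded-shuffling converse
    privacy = T / K                    # conjectured privacy overhead
    integrity = 2 * B / K              # conjectured Byzantine overhead
    return structural + privacy + integrity

K, mu, T, B = 100, 0.1, 10, 5          # assumed parameters
print(total_load(K, mu, T, B))         # 0.09 + 0.1 + 0.1 = 0.29
```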
Status
The conjecture matches the achievability bounds of Chapters 10–12 and the ByzSecAgg CommIT contribution in Chapter 11, but a matching converse for the joint region is one of the open problems listed in Chapter 18. Interested readers are invited to formalize the decomposition and verify it against the existing constructions.
What this exercise is really asking
The exercise is a gentle invitation to treat the framework as the design space for the remainder of the book. Every later chapter specializes to one slice of this 4-dimensional region; understanding the joint structure is what the book is about.