Optimality and the Lower Bound
Why We Need a Converse for Polynomial Codes
Section 5.2 established that polynomial codes achieve recovery threshold $K = mn$. The next question is whether any cleverer scheme, at the same storage level $(1/m, 1/n)$, could achieve $K < mn$. A rigorous answer requires a converse: a proof that no such scheme exists, i.e., a lower bound that matches the achievability. If we do not have the converse, we do not know whether polynomial codes are the final word or merely a good construction waiting to be beaten.
This section develops the converse argument. The structure mirrors Chapter 2's cut-set converse: identify the cut, apply an entropy bound, symmetrize, normalize. The point is that the four-step recipe transfers from generic coded shuffling to the specific matrix-multiplication setting, an instance of the golden thread from Chapter 2.
Theorem: Matching Lower Bound
Consider any distributed matrix-multiplication scheme with per-worker storage $(1/m, 1/n)$ and independent, uniform $A \in \mathbb{F}_q^{r \times s}$, $B \in \mathbb{F}_q^{s \times t}$. Suppose the scheme's decoder must succeed on every size-$K$ subset of worker responses. Then $K \geq mn$. The polynomial-code construction of §5.2 achieves this bound with equality, so the construction is information-theoretically optimal.
The master must reconstruct $mn$ matrix blocks; each worker response is a single block-valued observation. Without structure, $mn$ observations are needed. The proof shows that the storage constraint does not enable the scheme to do better: every coding scheme at this storage level retains at least $\frac{rt}{mn}\log q$ bits of ambiguity after any $mn - 1$ responses.
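To see why the bound is met with equality, recall the mechanism: each worker returns one evaluation of a degree-$(mn-1)$ matrix polynomial whose $mn$ coefficients are the output blocks, so decoding from any $mn$ responses is Vandermonde interpolation on distinct points. The sketch below checks that property numerically; the field size $p = 101$, worker count $N = 6$, and evaluation points are illustrative assumptions, not part of the construction's statement.

```python
# Sanity check of the achievability side (a sketch, not the book's code):
# polynomial codes decode from ANY K = m*n responses because the decoding
# matrix is a Vandermonde matrix on distinct points of F_p, hence invertible.
# p, N, and the evaluation points below are illustrative assumptions.
from itertools import combinations

p = 101          # prime field size (assumed > N so points stay distinct)
m, n = 2, 2      # A split into m blocks, B into n blocks
N = 6            # number of workers
K = m * n        # claimed recovery threshold

def vandermonde_det_mod_p(points, p):
    """Vandermonde determinant prod_{i<j} (x_j - x_i), computed mod p."""
    det = 1
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            det = det * (points[j] - points[i]) % p
    return det

points = list(range(1, N + 1))   # distinct evaluation points x_1..x_N in F_p
for subset in combinations(points, K):
    # nonzero determinant <=> interpolation (decoding) succeeds for this subset
    assert vandermonde_det_mod_p(subset, p) != 0
print(f"all {K}-subsets of {N} responses decode: K = mn = {K} is achieved")
```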
Set up the cut
Fix a subset $S$ of $K$ responses. Let $Y_S$ be their joint view and $C = AB$ the target output. The cut separates the master (with access to $Y_S$) from the missing workers.
Apply the output-entropy bound
From Chapter 2's Output-Entropy Bound, any correct scheme must satisfy $H(C \mid Y_S) = 0$. The storage constraint gives $H(Y_i) \leq \frac{rt}{mn}\log q$ for each individual response $Y_i$.
Count blocks
$A$ and $B$ together have entropy $(rs + st)\log q$ (uniform over $\mathbb{F}_q$). The output $C$ has entropy $rt\log q$. The per-worker output block has entropy $\frac{rt}{mn}\log q$. Hence $mn$ worker responses are needed to cover the output's entropy.
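Spelled out, the count is a single chain: correctness gives the first inequality, subadditivity of entropy the second, and the storage constraint the third.

$$rt\log q \;=\; H(C) \;\le\; H(Y_S) \;\le\; \sum_{i \in S} H(Y_i) \;\le\; |S|\,\frac{rt}{mn}\log q \quad\Longrightarrow\quad |S| \;\ge\; mn.$$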
Conclude
Any scheme decoding from fewer than $mn$ responses has positive error probability in the worst case. Hence $K \geq mn$.
Key Takeaway
Polynomial codes are information-theoretically optimal at storage $(1/m, 1/n)$. No scheme can simultaneously have a smaller recovery threshold and the same storage. The result is tight and matches the cut-set converse. This is a genuine achievability-converse closure, one of the rare instances in the coded-computing literature.
What 'Optimal' Actually Means Here
The optimality statement is subtle. Polynomial codes achieve the minimum $K = mn$ at their specific storage level $(1/m, 1/n)$. The tradeoff is two-dimensional: you can buy a smaller $K$ by increasing per-worker storage. For instance:
- Large $m, n$ (fractional storage, each worker stores one block pair): $K = mn$ grows with the partition; this is the uncoded-minimum-storage end.
- $m = n = 1$ (full replication): $K = 1$; any single response reconstructs.
Polynomial codes sit between these extremes at intermediate $(m, n)$, with storage $(1/m, 1/n)$ and $K = mn$. The shape of the tradeoff curve is not fully characterized for all storage pairs, but piecewise-linear interpolation gives a good upper bound on the achievable $K$. Section 5.4 compares polynomial codes with MDS schemes that occupy different points on this curve.
Example: Converse at $m = n = 2$
For $m = n = 2$ (so $mn = 4$), show explicitly that any scheme using 3 workers' responses to reconstruct the product $C$ must have error probability bounded away from zero.
Setup
Output has 4 blocks, each with entropy $\frac{rt}{4}\log q$. Total output entropy: $rt\log q$.
Bound the reconstructable information
Each response is one matrix of size $\frac{r}{2} \times \frac{t}{2}$, so its entropy is at most $\frac{rt}{4}\log q$. With 3 responses, the master can access at most $\frac{3rt}{4}\log q$ of independent information.
The gap
The output entropy is $rt\log q$; the available information is at most $\frac{3rt}{4}\log q$. The master must guess at least $\frac{rt}{4}\log q$ bits' worth of "hidden" output blocks. The error probability is bounded below by Fano's inequality: $P_e \geq \frac{(rt/4)\log q - 1}{rt\log q} \to \frac{1}{4}$ in the large-$q$ limit.
Implication
The scheme cannot decode with arbitrarily small error from 3 responses. Hence $K \geq 4 = mn$.
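A quick numeric check of the gap follows; the sizes $r = t = 100$ and $q = 2^8$ are illustrative assumptions, and any sizes give the same limit of $1/4$ up to the $-1$ slack in Fano's inequality.

```python
# Numeric check of the m = n = 2 converse gap via Fano's inequality (sketch).
# r, t, q are illustrative assumptions; the bound tends to 1/4 regardless.
import math

r, t, q = 100, 100, 2 ** 8
H_C  = r * t * math.log2(q)              # total output entropy, in bits
H_Y3 = 3 * (r * t / 4) * math.log2(q)    # max information in 3 responses
gap  = H_C - H_Y3                        # residual uncertainty H(C | Y_S)

# Fano: P_e >= (H(C | Y_S) - 1) / log2(number of possible outputs)
P_e = (gap - 1) / H_C
print(f"residual uncertainty: {gap:.0f} bits")
print(f"Fano lower bound on P_e: {P_e:.4f}")   # ~0.25, bounded away from 0
```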
Storage vs. Recovery Threshold: the Frontier
Plot the achievable operating points for three schemes: (i) uncoded disjoint storage, (ii) MDS-coded replication, (iii) polynomial codes with varying $(m, n)$. The frontier shows how $K$ decreases as per-worker storage increases. Each scheme occupies a specific point; polynomial codes dominate at their specific $(1/m, 1/n)$.
Parameters: number of workers $N$; range of partition products $mn$ to plot.
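A minimal plotting sketch of this frontier follows. The polynomial-code points come from the theorem ($K = mn$ at storage $(1/m, 1/n)$, shown for symmetric $m = n$); the uncoded worst-case formula $K = N - \lfloor N/(mn) \rfloor + 1$ and the choice $N = 36$ are illustrative assumptions, and the MDS-coded-replication point is left out here since its threshold formula belongs to Section 5.4.

```python
# Sketch of the storage-vs-recovery-threshold frontier (illustrative only).
import matplotlib.pyplot as plt

N = 36                      # number of workers (assumption)
ms = [1, 2, 3, 4, 5, 6]     # symmetric partitions m = n

storage = [1 / m for m in ms]                  # fraction of each input stored
K_poly  = [m * m for m in ms]                  # polynomial codes: K = mn = m^2
K_unc   = [N - N // (m * m) + 1 for m in ms]   # uncoded repetition, worst case

plt.plot(storage, K_poly, "o-", label="polynomial codes (K = mn)")
plt.plot(storage, K_unc, "s--", label="uncoded repetition (worst case)")
plt.scatter([1.0], [1.0], marker="*", s=150, label="full replication (K = 1)")
plt.xlabel("per-worker storage fraction (1/m of each input, m = n)")
plt.ylabel("recovery threshold K")
plt.legend()
plt.title("Storage vs. recovery threshold, N = 36 workers")
plt.show()
```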
Common Mistake: Optimality Is Per-Storage, Not Global
Mistake:
Claim polynomial codes are "the" optimal coded-computing scheme for matrix multiplication.
Correction:
Polynomial codes are optimal at their specific storage level $(1/m, 1/n)$. At lower storage (e.g., finer partitions with larger $m, n$), different schemes might have different optimal $K$; at higher storage (e.g., each worker storing multiple coded blocks), MDS-coded replication achieves a smaller $K$. The right statement is: polynomial codes achieve the minimum $K = mn$ at their per-worker storage level $(1/m, 1/n)$. The full frontier across all storage levels is piecewise and requires multiple constructions.
Converse-Matching Achievability in Production
The fact that polynomial codes match the converse is not merely aesthetic: it tells system architects when to stop optimizing. If a hypothetical "better" scheme promises $K < mn$ at the same storage, the information-theoretic bound tells us the promise is impossible. Engineering time spent looking for one is wasted. This is perhaps the single most practical consequence of the converse proof: it saves teams from pursuing chimeras. The same lesson will reappear for PIR (Chapter 13: Sun-Jafar capacity is tight) and secure aggregation (Chapter 10: Caire et al.'s optimality result).
- At storage $(1/m, 1/n)$, $K = mn$ is tight: no cleverer scheme is possible.
- Moving to higher storage (smaller $m$, $n$) can reduce $K$ further.
- A matching converse means no 'secret' gain at this storage.
Historical Note: The Converse Got Cleaner Over Time
2017–2019: Yu, Maddah-Ali, and Avestimehr's original 2017 paper established the bound via a counting argument specialized to matrix multiplication. Dutta, Cadambe, and Grover's follow-on work (2019) gave a cleaner cut-set-based proof that generalizes to other linear computations (matrix-vector products, batched linear algebra). The modern treatment, which we present in this section, uses the Chapter 2 output-entropy recipe, making the converse a case study in the "cut, bound, symmetrize, normalize" template.
Why This Matters: The Same Converse Drives Chapters 6, 10, 13
The four-step converse recipe applied here (cut, entropy bound, symmetrize, normalize) drives the converses in multiple later chapters. Chapter 6's gradient-coding converse, Chapter 10's secure-aggregation optimality (the Caire et al. CommIT contribution), and Chapter 13's Sun-Jafar PIR capacity are all instances of the same template, specialized to different computation tasks and threat models. Reading Section 5.3 carefully pays dividends many times over.
Quick Check
Which step of the cut-set recipe introduces the storage constraint in the polynomial-code converse?
Identifying the cut
Applying the output-entropy bound
Symmetrizing over all equivalent cuts
Normalizing by the output size
The conditional entropy $H(C \mid Y_S)$ depends on the storage: workers with more stored material leave less residual uncertainty.