Ferkans — Interactive Telecom Tutor

Three Schemes, Three Storage-Threshold Points

The polynomial code of Section 5.2 is optimal at its specific storage level, but the full $(K, \mu)$ -tradeoff has other operating points that serve other system constraints. This section briefly compares the three standard schemes — uncoded replication, MDS-coded replication, and polynomial codes — and then extends polynomial codes to the $T$ -private setting, where we demand that no coalition of $T$ workers learns anything about $\mathbf{A}$ or $\mathbf{B}$ .

The $T$ -private extension is small from an algebraic standpoint (add $T$ random matrices to each encoding polynomial) but carries the full weight of the privacy-robustness-efficiency tradeoff: every unit of privacy costs two extra worker responses. This is the golden thread, now instantiated in a concrete code.

Three Schemes for Distributed Matrix Multiplication

Scheme	Per-worker storage $\mu$	Recovery threshold $K$	Decoder complexity	Best for
Uncoded replication ( $r$ -replicas of each $(i,j)$ )	$1/(pq)$	$pq$ (one copy per block)	$O(1)$ per block	Small clusters, low redundancy
MDS-coded replication (row/column codes)	$1/\min(p, q)$	$p + q - 1$	$O((p+q)^2)$ Reed-Solomon	Balanced storage / threshold
Polynomial code (this chapter)	$1/p + 1/q$	$pq$ (matches lower bound)	$O((pq)^2)$ Lagrange, or $O(pq \log^2 pq)$ FFT	Maximum straggler tolerance at this storage
Entangled polynomial code (MatDot)	$1/p + 1/q$	$p + q - 1$	$O(pq)$ Lagrange	Minimum recovery threshold — at higher storage cost

When Each Scheme Wins

Uncoded replication wins when the system has ample replicas per block and the tail-latency budget is generous. Production ML pipelines at FAANG-scale sometimes use naive replication because its implementation is trivial and the tail-latency overhead is absorbed by parallel workloads.
MDS-coded replication wins when recovery-threshold $K$ matters more than per-worker storage. For matrix-vector products (where $q = 1$ ), MDS is the canonical choice because $p + q - 1 = p$ equals the polynomial-code bound.
Polynomial codes win when straggler tolerance matters and per-worker storage is tight. The EC2 benchmark shows $3$ – $7\times$ speedup over uncoded at the same storage level, making this the default choice in modern coded-computing pipelines.
Entangled polynomial codes (MatDot) achieve the minimum recovery threshold $p + q - 1$ by relaxing the storage-optimality constraint. They live at a different point on the full tradeoff curve and are the right choice when the straggler pattern is extreme (e.g., 50% stragglers).

Definition:
$T$ -Private Polynomial Code

A $T$ -private polynomial code for matrix multiplication guarantees that any set of $T$ workers — even if they collude — learns no information (in the Shannon sense) about $\mathbf{A}$ or $\mathbf{B}$ .

The construction extends §5.2's polynomial code by appending $T$ random matrices to each encoding polynomial:

$p_A(x) \;=\; \sum_{i=0}^{p-1} \mathbf{A}_{i+1}\, x^i \;+\; \sum_{\ell = p}^{p + T - 1} \mathbf{Z}_{A, \ell} \, x^\ell,$

$p_B(x) \;=\; \sum_{j=0}^{q-1} \mathbf{B}_{j+1}\, x^{pj + T} \;+\; \sum_{\ell = pq + T}^{pq + 2T - 1} \mathbf{Z}_{B, \ell} \, x^\ell,$

where $\mathbf{Z}_{A, \ell}, \mathbf{Z}_{B, \ell}$ are i.i.d. uniform random matrices over $\mathbb{F}_q$ . The product $p_C(x) = p_A(x)^T p_B(x)$ now has degree $pq + 2T - 1$ ; its coefficients include the desired $\mathbf{C}_{ij}$ plus $2T + 2Tp$ "random" cross terms.

The random terms serve as a "one-time pad" against any size- $T$ coalition, just as in Shamir secret sharing (Chapter 3). The cost is $2T$ extra worker responses needed to interpolate the product polynomial.


Theorem: $T$ -Private Recovery Threshold

The $T$ -private polynomial code with partition counts $(p, q)$ and $N$ workers over $\mathbb{F}_q$ (where the subscript $q$ denotes the field size, not the partition count of $\mathbf{B}$ , and satisfies $q \geq N + 2T$ ) achieves:

$T$ -privacy. For any subset $\mathcal{U} \subseteq [N]$ with $|\mathcal{U}| \leq T$ , $I(\mathbf{A}, \mathbf{B}; \{\tilde{\mathbf{A}}_k, \tilde{\mathbf{B}}_k\}_{k \in \mathcal{U}}) = 0$ .
Correctness. Any $K = pq + 2T$ worker responses uniquely determine $p_C$ and hence $\mathbf{A}^T \mathbf{B}$ .
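Storage. Each worker stores one evaluation of $p_A$ and one of $p_B$ , each the size of a single block, so $\mu = 1/p + 1/q$ regardless of $T$ ; the $T$ masks add randomness at the master, not storage at the workers.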

The extra $T$ random terms in each encoding polynomial act as information-theoretic masks: for every candidate value of the secrets, a size- $T$ coalition can find exactly one assignment of the masks consistent with what it observed, so its observations carry no information about the secrets. The overhead is $2T$ additional responses at the master (to interpolate the longer product polynomial). Compared to the public code's $K = pq$ , the private version pays a linear cost in the privacy parameter.

Proof

Privacy argument

A size- $T$ coalition observes $T$ evaluations of $p_A$ and $p_B$ . Each polynomial has $p + T$ (resp. $q + T$ ) coefficients, and the observations form a full-rank Vandermonde linear system in the blocks of $\mathbf{A}$ (resp. $\mathbf{B}$ ) plus the $T$ random matrices. Because the evaluation points are distinct and nonzero, the $T \times T$ subsystem acting on the random matrices is invertible, so the randomness absorbs exactly $T$ degrees of freedom and the $T$ observations are uniform over the masking space, independent of $\mathbf{A}$ and $\mathbf{B}$ .
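The masking argument can be checked by brute force on a toy instance. The sketch below is an illustration under stated assumptions, not the chapter's implementation: blocks are replaced by scalars in $\mathbb{F}_{17}$ , $p = 2$ , $T = 2$ , and the colluding workers hold the evaluation points $2$ and $5$ . The names `p_A`, `coalition_views`, `P`, and `ALPHAS` are hypothetical. It enumerates every choice of the two masks and confirms that the coalition's joint view takes each of the $17^2$ possible values exactly once, whatever the secret blocks are.

```python
from itertools import product

P = 17            # prime field size (illustrative assumption)
ALPHAS = (2, 5)   # evaluation points of the two colluding workers (assumed)

def p_A(x, blocks, masks):
    """Evaluate p_A(x) = sum_i A_{i+1} x^i + sum_l Z_l x^(p+l) over F_P."""
    coeffs = list(blocks) + list(masks)
    return sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P

def coalition_views(blocks, T=2):
    """Every joint view the coalition can see as the T masks range over F_P."""
    return sorted(
        tuple(p_A(a, blocks, masks) for a in ALPHAS)
        for masks in product(range(P), repeat=T)
    )

views_1 = coalition_views((3, 5))    # one choice of secret blocks (A_1, A_2)
views_2 = coalition_views((11, 2))   # a different choice

# Each view occurs exactly once, and the set of views does not depend on the secret.
print(len(set(views_1)), views_1 == views_2)  # prints: 289 True
```

Because the two view distributions coincide, observing the pair of evaluations tells the coalition nothing about which secret was encoded. The same check applies to $p_B$ with its own masks.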

Correctness argument

$p_C(x) = p_A(x)^T p_B(x)$ has degree $pq + 2T - 1$ . Any $pq + 2T$ distinct evaluations interpolate the polynomial uniquely, and $p_C$ 's coefficients in positions $\{i + pj + T : i \in [p], j \in [q]\}$ are exactly the desired output blocks $\mathbf{C}_{i+1, j+1}$ .

Straggler tolerance

The scheme tolerates up to $N - (pq + 2T)$ stragglers, which is $2T$ fewer than the $N - pq$ tolerated by the non-private version. Setting $T = 0$ recovers the standard polynomial code's $K = pq$ . $\blacksquare$
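The interpolation step at the master can be sketched on the $T = 0$ base case, where the encoding reduces to $p_A(x) = \mathbf{A}_1 + \mathbf{A}_2 x$ and $p_B(x) = \mathbf{B}_1 + \mathbf{B}_2 x^{p}$ for $p = q = 2$ . This is a minimal illustration under assumptions: scalars in $\mathbb{F}_{17}$ stand in for matrix blocks, and `solve_mod`, `worker`, and the chosen evaluation points are hypothetical names and values. Two of the six workers straggle, and the remaining $K = pq = 4$ responses recover all four products.

```python
P = 17  # small prime field, assumed for illustration

def solve_mod(M, y, p=P):
    """Solve M c = y over F_p by Gauss-Jordan elimination."""
    n = len(M)
    aug = [row[:] + [rhs] for row, rhs in zip(M, y)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % p)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)          # modular inverse (Python 3.8+)
        aug[col] = [v * inv % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % p for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

A = [3, 5]  # scalar stand-ins for blocks A_1, A_2 (p = 2)
B = [7, 2]  # scalar stand-ins for blocks B_1, B_2 (q = 2)

def worker(alpha):
    """One worker's response p_A(alpha) * p_B(alpha), with B exponents p*j."""
    a = (A[0] + A[1] * alpha) % P
    b = (B[0] + B[1] * alpha ** 2) % P
    return a * b % P

alphas = [1, 2, 3, 4, 5, 6]                           # N = 6 workers
responded = [a for a in alphas if a not in (2, 5)]    # two stragglers, K = 4 remain

# Vandermonde system in the 4 unknown coefficients of p_C.
V = [[pow(a, e, P) for e in range(4)] for a in responded]
coeffs = solve_mod(V, [worker(a) for a in responded])

# Coefficient of x^(i + p*j) is A_{i+1} * B_{j+1}.
expected = [A[i] * B[j] % P for j in range(2) for i in range(2)]
print(coeffs, coeffs == expected)  # prints: [4, 1, 6, 10] True
```

Any other four of the six evaluation points would work equally well, which is what the recovery threshold expresses.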

Example: $1$ -Private Code for $p = q = 2$ , $N = 7$

Construct a $1$ -private polynomial code for $p = q = 2$ with $N = 7$ workers. How many stragglers can be tolerated?

Solution

Encoding polynomials

$p_A(x) = \mathbf{A}_1 + x \mathbf{A}_2 + x^2 \mathbf{Z}_A$ (one random term, $T = 1$ ). $p_B(x) = \mathbf{B}_1 + x^3 \mathbf{B}_2 + x^6 \mathbf{Z}_B$ (shift indices to avoid collision with $p_A$ ).

Product polynomial

$p_C(x) = p_A(x)^T p_B(x)$ has degree $2 + 6 = 8$ . Coefficients include $\mathbf{C}_{11}, \mathbf{C}_{21}, \mathbf{C}_{12}, \mathbf{C}_{22}$ (the four desired blocks) plus cross terms with $\mathbf{Z}_A, \mathbf{Z}_B$ (random, not needed).

Recovery

By the theorem, $\deg p_C = pq + 2T - 1 = 4 + 2 - 1 = 5$ , so $K = 6$ evaluations suffice. With $N = 7$ , the straggler budget is $N - K = 1$ , in addition to $1$ -privacy.

Operational

Two extra worker responses ( $K = 6$ versus $K = 4$ for the non-private code) buy perfect privacy against any single worker. This is the canonical tradeoff used in Chapter 11's ByzSecAgg, where ramp secret sharing lets us amortize this cost across many gradient coordinates.

Privacy Cost: $K$ vs. $T$ in Private Polynomial Codes

For fixed $(p, q) = (4, 4)$ partitions, plot the recovery threshold $K = pq + 2T$ as a function of the privacy parameter $T$ . Each unit of $T$ costs $2$ extra workers — a linear privacy-efficiency tradeoff. The plot also shows the minimum $N$ required to host the scheme (which must be at least $K$ ).

Parameters: $p = 4$ ( $\mathbf{A}$ partitions), $q = 4$ ( $\mathbf{B}$ partitions), $T_{\max} = 8$ (privacy threshold).
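The data behind this plot can be tabulated directly from the section's formula $K = pq + 2T$ . The short sketch below assumes the parameter values listed above ( $p = q = 4$ , $T$ from $0$ to $8$ ); the function name `recovery_threshold` is a hypothetical helper, not part of any library.

```python
def recovery_threshold(p, q, T):
    """Recovery threshold K = pq + 2T, as stated in this section."""
    return p * q + 2 * T

p, q, T_max = 4, 4, 8
table = [(T, recovery_threshold(p, q, T)) for T in range(T_max + 1)]

for T, K in table:
    # The scheme needs at least K workers, so the minimum N equals K.
    print(f"T={T}  K={K}  min N={K}")
```

The threshold rises from $16$ at $T = 0$ to $32$ at $T = 8$ , in steps of two.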

Key Takeaway

Adding $T$ -privacy costs $2T$ extra worker responses. The polynomial-code framework extends cleanly: the master's recovery threshold goes from $pq$ to $pq + 2T$ . This is the precise quantification of the privacy-efficiency tradeoff: each additional colluder the scheme tolerates costs two extra workers. The linearity is information-theoretic, not a design accident.

Common Mistake: Extended Field Size for Privacy

Mistake:

Apply a $1$ -private code over $\mathbb{F}_{16}$ with $N = 15$ workers and expect perfect privacy.

Correction:

The privacy guarantee holds information-theoretically over any field, but the theorem requires a field of size $q \geq N + 2T$ so that enough distinct evaluation points exist. For $N = 15, T = 1$ , this means $q \geq 17$ . $\mathbb{F}_{16}$ has only $16$ elements, which is insufficient. Use $\mathbb{F}_{17}$ or an extension field such as $\mathbb{F}_{2^5} = \mathbb{F}_{32}$ .
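A quick pre-deployment check of this condition might look like the sketch below. It encodes only the field-size requirement $q \geq N + 2T$ from the theorem; the helper names `field_size_ok`, `is_prime`, and `smallest_prime_field` are hypothetical, and the search is restricted to prime fields for simplicity (extension fields such as $\mathbb{F}_{32}$ are checked by size only).

```python
def is_prime(n):
    """Trial-division primality test, adequate for small field sizes."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def field_size_ok(field_size, N, T):
    """Field-size condition from the theorem: q >= N + 2T."""
    return field_size >= N + 2 * T

def smallest_prime_field(N, T):
    """Smallest prime field size satisfying the condition."""
    size = N + 2 * T
    while not is_prime(size):
        size += 1
    return size

N, T = 15, 1
for size in (16, 17, 32):
    print(size, field_size_ok(size, N, T))
print("smallest prime field:", smallest_prime_field(N, T))
```

For $N = 15$ and $T = 1$ this rejects size $16$ and accepts $17$ and $32$ .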

🎓CommIT Contribution(2023)

ByzSecAgg Builds on This Construction

T. Jahani-Nezhad, M. A. Maddah-Ali, G. Caire — IEEE Journal on Selected Areas in Information Theory

The $T$ -private polynomial-code extension of this section is the technical starting point for ByzSecAgg (Chapter 11), one of the CommIT group's signature contributions to this book. ByzSecAgg combines (i) ramp secret sharing (Chapter 3) for privacy-efficient sharing of long gradient vectors, (ii) a $T$ -private polynomial-code-like scheme for distance-based Byzantine detection, and (iii) vector commitments for integrity. The construction achieves communication complexity $O(n \log n + B d)$ for $n$ users, $d$ -dimensional gradients, and $B$ Byzantine workers — a substantial improvement over the $O(n^2)$ of prior secure-aggregation schemes.

Readers who want the full deployment of the $T$ -private polynomial code should work through Chapter 11; this note is a forward pointer, not a full treatment.


🚨Critical Engineering Note

Numerical Stability in Floating-Point Polynomial Codes

While the mathematical construction is over $\mathbb{F}_q$ , production deployments often apply the same polynomial-code machinery over the reals / floats for speed. The issue is that Vandermonde matrices at integer evaluation points are notoriously ill-conditioned in floating-point: condition numbers grow exponentially with $K$ . Recent work (Fahim et al. 2021, Das et al. 2023) has proposed Chebyshev-style evaluation points and other reconditioning tricks to stabilize the decoder. The takeaway: if you implement polynomial codes in practice, do not use $\alpha_k = k$ ; use Chebyshev roots $\alpha_k = \cos((2k - 1)\pi / (2K))$ for $k = 1, \dots, K$ , or stay in exact finite-field arithmetic. Ignoring this costs accuracy and can silently corrupt outputs.

Practical Constraints

• Integer evaluation points give condition number $\sim K^K$ , unusable for $K > 20$
• Chebyshev points yield condition number $\sim 2^K$ , much better but still bad for $K > 40$
• Exact finite-field arithmetic avoids all conditioning issues
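The gap between the two choices of evaluation points can be measured on a small instance. The sketch below is an illustration under assumptions: it uses $K = 10$ , the infinity-norm condition number $\|V\|_\infty \|V^{-1}\|_\infty$ , and exact rational arithmetic from the Python standard library so that the measurement itself is not distorted by rounding. The helper names are hypothetical. It does not attempt to reproduce the asymptotic rates quoted above, only the ordering.

```python
import math
from fractions import Fraction

def vandermonde(points):
    """Square Vandermonde matrix with rows [1, x, x^2, ...], in exact rationals."""
    return [[Fraction(x) ** e for e in range(len(points))] for x in points]

def inverse(M):
    """Exact matrix inverse by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [row[:] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def inf_norm(M):
    """Matrix infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)

def cond_inf(points):
    V = vandermonde(points)
    return float(inf_norm(V) * inf_norm(inverse(V)))

K = 10
integer_pts = list(range(1, K + 1))                     # alpha_k = k
cheb_pts = [math.cos((2 * k - 1) * math.pi / (2 * K))   # Chebyshev roots
            for k in range(1, K + 1)]

cond_int = cond_inf(integer_pts)
cond_cheb = cond_inf(cheb_pts)
print(f"integer points:   {cond_int:.3e}")
print(f"Chebyshev points: {cond_cheb:.3e}")
```

Even at $K = 10$ the integer-point matrix is worse conditioned by many orders of magnitude, which is why a floating-point decoder built on $\alpha_k = k$ loses accuracy so quickly.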

📋 Ref: Fahim, Cadambe, Grover 2021 IEEE T-IT

Historical Note: From Polynomial Codes to a Whole Sub-Field

2017–present

Polynomial codes (Yu et al. 2017) gave the coded-computing community its first clean, optimal, deterministic construction. The follow-on literature is substantial: MatDot codes (Dutta-Fahim-Cadambe 2019) for minimum recovery threshold; Entangled polynomial codes (Yu-Maddah-Ali-Avestimehr 2020) for the tradeoff frontier between polynomial and MatDot; and the $T$ -private extensions (Chen-Tandon 2018, Yang-Jafar 2019) that power the secure-aggregation work of Part III.

A decade later, coded computing is a recognized sub-field of information theory with its own dedicated journal sections and workshops. The Li / Avestimehr 2020 survey is the standard reference; Chapter 11's ByzSecAgg result is a CommIT-group contribution extending the line.


Quick Check

A $3$ -private polynomial code for $p = 3, q = 4$ matrix multiplication has what recovery threshold $K$ ?

$K = pq + 2T = 18$

$K = pq + T = 15$

$K = pq = 12$

$K = 2pq = 24$