Entangled Polynomial Codes

The Entangled Trick

Section 8.1 showed that convolution's recovery threshold K = p + q - 1 is linear in the number of partitions, not quadratic. The construction that achieves this is the entangled polynomial code introduced by Yu, Maddah-Ali, and Avestimehr (2020), which generalizes to the broader class of computations whose output has fewer degrees of freedom than matrix multiplication.

The key observation: the standard polynomial code of Chapter 5 "wastes" degrees of freedom by encoding each output block as a separate coefficient of the product polynomial. The entangled variant assigns exponents so that related output blocks share coefficients; the algebraic identity of convolution is exploited directly in the encoding.

This section develops the entangled construction formally, proves the K = p + q - 1 recovery threshold, and characterizes the storage/communication tradeoff. Section 8.3 generalizes further to the Lagrange Coded Computing framework for arbitrary multivariate polynomials.

Definition: Entangled Polynomial Code

Consider a convolution-like bilinear operation where the desired output is a vector (or tensor) c whose components are bilinear in two inputs a = (a_1, ..., a_p) and b = (b_1, ..., b_q):

    c_k = Σ_{i+j=k} a_i b_j,    k = 0, 1, ..., p + q - 2.

The entangled polynomial code with N workers, over a finite field with at least N distinct nonzero elements, consists of:

  • Encoding polynomials. p_a(x) = Σ_{i=0}^{p-1} a_{i+1} x^i and p_b(x) = Σ_{j=0}^{q-1} b_{j+1} x^j.
  • Worker storage. ã_k = p_a(α_k), b̃_k = p_b(α_k) for distinct nonzero evaluation points α_k.
  • Worker computation. c̃_k = ã_k * b̃_k, the bilinear operation applied to the encoded inputs. This equals p_c(α_k) = p_a(α_k) p_b(α_k), the product polynomial evaluated at α_k.
  • Decoder. The master collects any K = p + q - 1 evaluations and interpolates the degree-(p + q - 2) polynomial p_c via Lagrange interpolation. Its coefficients are the output blocks c_0, ..., c_{p+q-2}.

The key difference from Chapter 5's standard polynomial code: the exponents in p_a and p_b run directly over i = 0, ..., p - 1 and j = 0, ..., q - 1, with no "separation" factor p in b's exponents (the standard code uses x^{pj} there). The product polynomial therefore has degree p + q - 2, not pq - 1. The convolution structure makes this possible; general matrix multiplication cannot use this trick at the same storage.
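This coefficient alignment can be checked numerically: multiplying p_a and p_b is exactly a convolution of their coefficient vectors, so the product polynomial's p + q - 1 coefficients are the desired outputs c_k. A minimal sketch over the reals with scalar "blocks" (a production code would work over a finite field):

```python
import numpy as np

p, q = 3, 4
rng = np.random.default_rng(0)
a = rng.standard_normal(p)   # p scalar "blocks" of the first input
b = rng.standard_normal(q)   # q scalar "blocks" of the second input

# Polynomial multiplication is convolution of coefficient vectors, so
# p_a(x) * p_b(x) has exactly p + q - 1 coefficients (degree p + q - 2).
c = np.convolve(a, b)
assert len(c) == p + q - 1

# Coefficient k is the convolution output c_k = sum_{i+j=k} a_i b_j.
k = 2
assert np.isclose(c[k], sum(a[i] * b[k - i] for i in range(p) if 0 <= k - i < q))
```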

Entangled Polynomial-Code Encoding

Complexity: O(p + q) block linear combinations per worker, O(N(p + q)) total
Input: matrices A = [A_1, ..., A_p] (in the convolution context, a is the filter split into p pieces), B = [B_1, ..., B_q], and N distinct nonzero evaluation points α_1, ..., α_N.
Output: per-worker encoded matrices {(Ã_k, B̃_k)} for k = 1, ..., N.
1. for k = 1, 2, ..., N do
2.   Ã_k ← Σ_{i=0}^{p-1} α_k^i A_{i+1}
3.   B̃_k ← Σ_{j=0}^{q-1} α_k^j B_{j+1}
4. end for
5. return {(Ã_k, B̃_k)}
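The loop above translates directly to NumPy. This is an illustrative real-valued sketch (the algorithm itself runs over a finite field), with blocks represented as arrays:

```python
import numpy as np

def entangled_encode(A_blocks, B_blocks, alphas):
    """Entangled polynomial-code encoding (sketch over the reals).

    A_blocks: list of p equal-shape arrays; B_blocks: list of q arrays.
    Worker k receives the pair (p_A(alpha_k), p_B(alpha_k)).
    """
    encoded = []
    for alpha in alphas:
        # Lines 2-3 of the algorithm: powers of alpha weight the blocks.
        A_tilde = sum(alpha**i * A for i, A in enumerate(A_blocks))
        B_tilde = sum(alpha**j * B for j, B in enumerate(B_blocks))
        encoded.append((A_tilde, B_tilde))
    return encoded
```

For instance, with blocks A = [1, 2] and B = [3] and α = 2, worker 0 stores p_A(2) = 1 + 2·2 = 5 and p_B(2) = 3.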

Contrast with the standard polynomial code (Chapter 5, Alg. 5.2): B uses exponent pj there, and j here. This small change is the entire difference between K = pq and K = p + q - 1.

Theorem: Entangled Polynomial Code Achieves K = p + q - 1

For convolution-like bilinear operations with inputs partitioned as (p, q), the entangled polynomial code with N workers over a finite field with at least N distinct nonzero elements satisfies:

  1. Correctness. Any K = p + q - 1 worker responses recover the entire output c.
  2. Storage. Per-worker storage is |A|/p + |B|/q, identical to the standard polynomial code.
  3. Optimality. K = p + q - 1 is information-theoretically tight for convolution-like computations at this storage level.

The improvement over the K = pq of general matrix multiplication is achieved without extra storage, exclusively by exploiting the reduced degree structure of the convolution output.

The convolution output has p + q - 1 distinct coefficients (indexed by the sum i + j), not pq. The entangled encoding aligns the exponents so that each worker's response is one evaluation of a polynomial of degree p + q - 2. Any p + q - 1 evaluations interpolate it, matching the theoretical minimum.

The practical impact: for a p = q = 16 convolution, the recovery threshold drops from K = 256 (matrix multiplication) to K = 31, a roughly 8× improvement in straggler tolerance at the same storage.

Example: Entangled Code for (p, q) = (3, 4)

Construct the entangled polynomial code for a (p = 3, q = 4)-partitioned convolution with N = 10 workers, and compare with the standard polynomial code's recovery threshold.
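One way to work this example numerically, as an illustrative sketch over the reals (the points α_k = 1, ..., 10 play the role of the distinct nonzero field elements):

```python
import numpy as np

p, q, N = 3, 4, 10
K = p + q - 1                      # = 6, vs. K = pq = 12 for the standard code
rng = np.random.default_rng(1)
a, b = rng.standard_normal(p), rng.standard_normal(q)
alphas = np.arange(1.0, N + 1)     # N distinct nonzero evaluation points

# Encode: worker k holds p_a(alpha_k) and p_b(alpha_k) and multiplies them,
# producing one evaluation of the product polynomial p_c.
pa = np.array([np.polynomial.polynomial.polyval(al, a) for al in alphas])
pb = np.array([np.polynomial.polynomial.polyval(al, b) for al in alphas])
responses = pa * pb                # = p_c(alpha_k)

# Decode from ANY K = 6 of the 10 responses: interpolate the degree-5
# polynomial p_c and read off its coefficients.
survivors = [0, 2, 3, 5, 7, 9]     # indices of the 6 fastest workers
V = np.vander(alphas[survivors], K, increasing=True)
c_hat = np.linalg.solve(V, responses[survivors])

assert np.allclose(c_hat, np.convolve(a, b))   # recovers the convolution
```

Any other size-6 subset of workers decodes equally well, which is the straggler tolerance the theorem promises; the standard polynomial code would need 12 responses here.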

Entangled vs. Standard Polynomial-Code Frontier

Plot the storage-vs-recovery-threshold frontier for (i) standard polynomial codes (K = pq at storage μ = 1/p + 1/q) and (ii) entangled polynomial codes (K = p + q - 1 for convolution at the same storage). Both occupy specific points in the two-dimensional tradeoff space; the entangled variant dominates only for structured outputs (convolution, matrix-vector products).


Why the Output Structure Matters

The key difference between matrix multiplication and convolution is algebraic: matrix multiplication produces pq independent output blocks (each C_ij is a genuinely distinct sum), while convolution produces p + q - 1 output coefficients (each c_k is a sum over the single index constraint i + j = k). The output structure has fewer degrees of freedom, allowing fewer responses to recover it.

More generally: any bilinear operation whose output is carried by a degree-(p + q - 2) polynomial (in some variable) can be coded with recovery threshold K = p + q - 1. Matrix multiplication's K = pq is a "worst case" over bilinear operations; convolution, matrix-vector multiplication, and Kronecker products all fall into the "better" class.
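Matrix-vector multiplication is the simplest member of this class: split A row-wise into p blocks and take q = 1, so each worker stores one coded block and K = p responses suffice. A sketch over the reals, under the same illustrative assumptions as before:

```python
import numpy as np

p, N = 4, 7
rng = np.random.default_rng(2)
A_blocks = [rng.standard_normal((2, 3)) for _ in range(p)]  # row-splits of A
x = rng.standard_normal(3)
alphas = np.arange(1.0, N + 1)

# Each worker stores one coded block sum_i alpha^i A_i and returns its
# product with x: a degree-(p-1) polynomial in alpha (the q = 1 case).
responses = np.array([
    sum(al**i * Ai for i, Ai in enumerate(A_blocks)) @ x for al in alphas
])

# Any K = p responses interpolate the polynomial; its coefficients are
# the p output blocks A_i x.
survivors = [1, 2, 4, 6]
V = np.vander(alphas[survivors], p, increasing=True)
blocks_hat = np.linalg.solve(V, responses[survivors])
assert np.allclose(blocks_hat, [Ai @ x for Ai in A_blocks])
```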

Coded Bilinear Operations: Output Structure vs. Recovery Threshold

Operation                    Output structure              Recovery threshold K     Code
Matrix mult. A^T B           pq independent blocks         K = pq                   Standard polynomial code (Ch. 5)
Matrix-vector Ax             p blocks (vector output)      K = p                    Polynomial code with q = 1
Convolution a * b            p + q - 1 coefficients        K = p + q - 1            Entangled polynomial code (this chapter)
Tensor contraction, order 3  Varies by contraction type    K = p + q - 1 or higher  Lagrange Coded Computing (§8.3)

Common Mistake: Don't Use Entangled Codes for General Matrix Multiplication

Mistake:

Applying the entangled polynomial code to general matrix multiplication, expecting K = p + q - 1.

Correction:

The entangled code is correct only for bilinear operations whose output has at most p + q - 1 distinct blocks (convolution, matrix-vector products, Kronecker products). For general matrix multiplication, which has pq distinct output blocks, applying the entangled code produces an incorrect result: the pq blocks cannot be recovered from evaluations of a degree-(p + q - 2) polynomial, because blocks with the same index sum i + j collapse into a single coefficient. Use the standard polynomial code (Chapter 5) for general matrix multiplication.
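The failure mode can be seen directly: with the entangled exponents, the coefficient of x^k in p_A(x) p_B(x) is Σ_{i+j=k} A_i B_j, a sum of block products, so blocks like A_0 B_1 and A_1 B_0 are merged and cannot be separated. A small sketch with scalar blocks over the reals:

```python
import numpy as np

p = q = 2
A = np.array([1.0, 2.0])   # "blocks" A_0, A_1
B = np.array([3.0, 4.0])   # "blocks" B_0, B_1

# Entangled exponents: the coefficients of p_A(x) p_B(x) are the
# convolution of the block vectors -- only p + q - 1 = 3 numbers,
# though matmul would need all pq = 4 products A_i B_j.
coeffs = np.convolve(A, B)
assert len(coeffs) == 3

# The middle coefficient merges TWO of the four block products:
assert coeffs[1] == A[0] * B[1] + A[1] * B[0]   # 1*4 + 2*3 = 10
# A_0 B_1 = 4 and A_1 B_0 = 6 are individually unrecoverable from 10.
```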

🔧 Engineering Note

Entangled Codes in Production

Entangled polynomial codes have seen some deployment for convolution-heavy workloads (CNN training, image and audio signal processing in ML pipelines). The main advantage is the linear-in-partition recovery threshold. The main engineering barrier is that the encoding is more tightly coupled to the operation's algebraic structure: a generic coded-computing framework must detect the operation type (matmul vs. convolution vs. tensor contraction) and select the appropriate code.

Frameworks that integrate coded computing (NVIDIA DALI, research PyTorch extensions) usually offer entangled codes for convolution specifically, and fall back to the standard polynomial code for general matmul. Users specify the operation via a tensor-operation API; the framework picks the right code.

Practical Constraints
  • Entangled code requires convolution-like output structure
  • Standard polynomial code still needed for general matmul
  • Framework must auto-detect operation type

📋 Ref: NVIDIA DALI; PyTorch CodedDist research fork

Key Takeaway

Entangled polynomial codes achieve K = p + q - 1 for convolution and matrix-vector products: linear in the partitions, vs. quadratic for general matrix multiplication. The gain comes from exploiting the reduced degree structure of the output polynomial. Use entangled codes when the output has convolution-like structure; stick to standard polynomial codes for general matrix multiplication.

Quick Check

Compared to the standard polynomial code for matrix multiplication with (p, q) = (4, 6), the entangled polynomial code for convolution at the same partitions:

  • Uses the same per-worker storage and achieves K = 9 instead of K = 24
  • Uses less per-worker storage but achieves the same K
  • Uses more storage and is never worth it
  • Is equivalent to MatDot codes