Entangled Polynomial Codes

The Entangled Trick

Section 8.1 showed that convolution's recovery threshold K = p + q - 1 is linear in the number of partitions, not quadratic. The construction that achieves this is the entangled polynomial code introduced by Yu, Maddah-Ali, and Avestimehr (2020), which generalizes to the broader class of computations whose output has fewer degrees of freedom than matrix multiplication.

The key observation: the standard polynomial code of Chapter 5 "wastes" degrees of freedom by encoding each output block as a separate coefficient of the product polynomial. The entangled variant assigns exponents so that related output blocks share coefficients; the algebraic identity of convolution is exploited directly in the encoding.

This section develops the entangled construction formally, proves the K = p + q - 1 recovery threshold, and characterizes the storage/communication tradeoff. Section 8.3 generalizes further to the Lagrange Coded Computing framework for arbitrary multivariate polynomials.

Definition: Entangled Polynomial Code

Consider a convolution-like bilinear operation where the desired output is a vector (or tensor) c whose components are bilinear in two inputs a = (a_1, ..., a_p) and b = (b_1, ..., b_q):

    c_k = Σ_{i+j=k} a_i b_j,    k = 0, 1, ..., p + q - 2.

The entangled polynomial code with N workers, over a finite field with at least N distinct nonzero elements, consists of:

  • Encoding polynomials. p_a(x) = Σ_{i=0}^{p-1} a_{i+1} x^i and p_b(x) = Σ_{j=0}^{q-1} b_{j+1} x^j.
  • Worker storage. ã_k = p_a(α_k), b̃_k = p_b(α_k) for distinct nonzero evaluation points α_k.
  • Worker computation. c̃_k = ã_k * b̃_k, the bilinear operation applied to the encoded inputs. This equals p_c(α_k) = p_a(α_k) p_b(α_k), the product polynomial evaluated at α_k.
  • Decoder. The master collects any K = p + q - 1 evaluations and interpolates the degree-(p + q - 2) polynomial p_c via Lagrange interpolation. Its coefficients are the output blocks c_0, ..., c_{p+q-2}.

The key difference from Chapter 5's standard polynomial code: the exponents in p_a and p_b run directly over i = 0, ..., p - 1 and j = 0, ..., q - 1, with no "separation" factor p in b's exponents (the standard code uses x^{pj} there). The product polynomial therefore has degree p + q - 2, not pq - 1. The convolution structure makes this possible; general matrix multiplication cannot use this trick at the same storage.
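This coefficient alignment can be checked numerically: multiplying p_a and p_b is exactly a convolution of their coefficient vectors, so the product polynomial's p + q - 1 coefficients are the desired outputs c_k. A minimal sketch over the reals with scalar "blocks" (a production code would work over a finite field):

```python
import numpy as np

p, q = 3, 4
rng = np.random.default_rng(0)
a = rng.standard_normal(p)   # p scalar "blocks" of the first input
b = rng.standard_normal(q)   # q scalar "blocks" of the second input

# Polynomial multiplication is convolution of coefficient vectors, so
# p_a(x) * p_b(x) has exactly p + q - 1 coefficients (degree p + q - 2).
c = np.convolve(a, b)
assert len(c) == p + q - 1

# Coefficient k is the convolution output c_k = sum_{i+j=k} a_i b_j.
k = 2
assert np.isclose(c[k], sum(a[i] * b[k - i] for i in range(p) if 0 <= k - i < q))
```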

Entangled Polynomial-Code Encoding

Complexity: O(p + q) block linear combinations per worker, O(N(p + q)) total
Input: matrices A = [A_1, ..., A_p] (in the convolution context, a is the filter split into p pieces), B = [B_1, ..., B_q], and N distinct nonzero evaluation points α_1, ..., α_N.
Output: per-worker encoded matrices {(Ã_k, B̃_k)} for k = 1, ..., N.
1. for k = 1, 2, ..., N do
2.   Ã_k ← Σ_{i=0}^{p-1} α_k^i A_{i+1}
3.   B̃_k ← Σ_{j=0}^{q-1} α_k^j B_{j+1}
4. end for
5. return {(Ã_k, B̃_k)}
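The loop above translates directly to NumPy. This is an illustrative real-valued sketch (the algorithm itself runs over a finite field), with blocks represented as arrays:

```python
import numpy as np

def entangled_encode(A_blocks, B_blocks, alphas):
    """Entangled polynomial-code encoding (sketch over the reals).

    A_blocks: list of p equal-shape arrays; B_blocks: list of q arrays.
    Worker k receives the pair (p_A(alpha_k), p_B(alpha_k)).
    """
    encoded = []
    for alpha in alphas:
        # Lines 2-3 of the algorithm: powers of alpha weight the blocks.
        A_tilde = sum(alpha**i * A for i, A in enumerate(A_blocks))
        B_tilde = sum(alpha**j * B for j, B in enumerate(B_blocks))
        encoded.append((A_tilde, B_tilde))
    return encoded
```

For instance, with blocks A = [1, 2] and B = [3] and α = 2, worker 0 stores p_A(2) = 1 + 2·2 = 5 and p_B(2) = 3.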

Contrast with the standard polynomial code (Chapter 5, Alg. 5.2): B uses exponent pj there, and j here. This small change is the entire difference between K = pq and K = p + q - 1.

Theorem: Entangled Polynomial Code Achieves K = p + q - 1

For convolution-like bilinear operations with inputs partitioned as (p, q), the entangled polynomial code with N workers over a finite field with at least N distinct nonzero elements satisfies:

  1. Correctness. Any K = p + q - 1 worker responses recover the entire output c.
  2. Storage. Per-worker storage is |A|/p + |B|/q, identical to the standard polynomial code.
  3. Optimality. K = p + q - 1 is information-theoretically tight for convolution-like computations at this storage level.

The improvement over the K = pq of general matrix multiplication is achieved without extra storage, exclusively by exploiting the reduced degree structure of the convolution output.

The convolution output has p + q - 1 distinct coefficients (indexed by the sum i + j), not pq. The entangled encoding aligns the exponents so that each worker's response is one evaluation of a polynomial of degree p + q - 2. Any p + q - 1 evaluations interpolate it, matching the theoretical minimum.

The practical impact: for a p = q = 16 convolution, the recovery threshold drops from K = 256 (matrix multiplication) to K = 31, a roughly 8× improvement in straggler tolerance at the same storage.

Example: Entangled Code for (p, q) = (3, 4)

Construct the entangled polynomial code for a (p = 3, q = 4)-partitioned convolution with N = 10 workers, and compare with the standard polynomial code's recovery threshold.
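One way to work this example numerically, as an illustrative sketch over the reals (the points α_k = 1, ..., 10 play the role of the distinct nonzero field elements):

```python
import numpy as np

p, q, N = 3, 4, 10
K = p + q - 1                      # = 6, vs. K = pq = 12 for the standard code
rng = np.random.default_rng(1)
a, b = rng.standard_normal(p), rng.standard_normal(q)
alphas = np.arange(1.0, N + 1)     # N distinct nonzero evaluation points

# Encode: worker k holds p_a(alpha_k) and p_b(alpha_k) and multiplies them,
# producing one evaluation of the product polynomial p_c.
pa = np.array([np.polynomial.polynomial.polyval(al, a) for al in alphas])
pb = np.array([np.polynomial.polynomial.polyval(al, b) for al in alphas])
responses = pa * pb                # = p_c(alpha_k)

# Decode from ANY K = 6 of the 10 responses: interpolate the degree-5
# polynomial p_c and read off its coefficients.
survivors = [0, 2, 3, 5, 7, 9]     # indices of the 6 fastest workers
V = np.vander(alphas[survivors], K, increasing=True)
c_hat = np.linalg.solve(V, responses[survivors])

assert np.allclose(c_hat, np.convolve(a, b))   # recovers the convolution
```

Any other size-6 subset of workers decodes equally well, which is the straggler tolerance the theorem promises; the standard polynomial code would need 12 responses here.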

Entangled vs. Standard Polynomial-Code Frontier

Plot the storage-vs-recovery-threshold frontier for (i) standard polynomial codes (K = pq at storage μ = 1/p + 1/q) and (ii) entangled polynomial codes (K = p + q - 1 for convolution at the same storage). Both occupy specific points in the two-dimensional tradeoff space; the entangled variant dominates only for structured outputs (convolution, matrix-vector products).


Why the Output Structure Matters

The key difference between matrix multiplication and convolution is algebraic: matrix multiplication produces pq independent output blocks (each C_ij is a genuinely distinct sum), while convolution produces p + q - 1 output coefficients (each c_k is a sum over the single index constraint i + j = k). The output structure has fewer degrees of freedom, allowing fewer responses to recover it.

More generally: any bilinear operation whose output is carried by a degree-(p + q - 2) polynomial (in some variable) can be coded with recovery threshold K = p + q - 1. Matrix multiplication's K = pq is a "worst case" over bilinear operations; convolution, matrix-vector multiplication, and Kronecker products all fall into the "better" class.
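Matrix-vector multiplication is the simplest member of this class: split A row-wise into p blocks and take q = 1, so each worker stores one coded block and K = p responses suffice. A sketch over the reals, under the same illustrative assumptions as before:

```python
import numpy as np

p, N = 4, 7
rng = np.random.default_rng(2)
A_blocks = [rng.standard_normal((2, 3)) for _ in range(p)]  # row-splits of A
x = rng.standard_normal(3)
alphas = np.arange(1.0, N + 1)

# Each worker stores one coded block sum_i alpha^i A_i and returns its
# product with x: a degree-(p-1) polynomial in alpha (the q = 1 case).
responses = np.array([
    sum(al**i * Ai for i, Ai in enumerate(A_blocks)) @ x for al in alphas
])

# Any K = p responses interpolate the polynomial; its coefficients are
# the p output blocks A_i x.
survivors = [1, 2, 4, 6]
V = np.vander(alphas[survivors], p, increasing=True)
blocks_hat = np.linalg.solve(V, responses[survivors])
assert np.allclose(blocks_hat, [Ai @ x for Ai in A_blocks])
```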

Coded Bilinear Operations: Output Structure vs. Recovery Threshold

Operation                    Output structure              Recovery threshold K     Code
Matrix mult. A^T B           pq independent blocks         K = pq                   Standard polynomial code (Ch. 5)
Matrix-vector Ax             p blocks (vector output)      K = p                    Polynomial code with q = 1
Convolution a * b            p + q - 1 coefficients        K = p + q - 1            Entangled polynomial code (this chapter)
Tensor contraction, order 3  Varies by contraction type    K = p + q - 1 or higher  Lagrange Coded Computing (§8.3)

Common Mistake: Don't Use Entangled Codes for General Matrix Multiplication

Mistake:

Applying the entangled polynomial code to general matrix multiplication, expecting K = p + q - 1.

Correction:

The entangled code is correct only for bilinear operations whose output has at most p + q - 1 distinct blocks (convolution, matrix-vector products, Kronecker products). For general matrix multiplication, which has pq distinct output blocks, applying the entangled code produces an incorrect result: the pq blocks cannot be recovered from evaluations of a degree-(p + q - 2) polynomial, because blocks with the same index sum i + j collapse into a single coefficient. Use the standard polynomial code (Chapter 5) for general matrix multiplication.
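The failure mode can be seen directly: with the entangled exponents, the coefficient of x^k in p_A(x) p_B(x) is Σ_{i+j=k} A_i B_j, a sum of block products, so blocks like A_0 B_1 and A_1 B_0 are merged and cannot be separated. A small sketch with scalar blocks over the reals:

```python
import numpy as np

p = q = 2
A = np.array([1.0, 2.0])   # "blocks" A_0, A_1
B = np.array([3.0, 4.0])   # "blocks" B_0, B_1

# Entangled exponents: the coefficients of p_A(x) p_B(x) are the
# convolution of the block vectors -- only p + q - 1 = 3 numbers,
# though matmul would need all pq = 4 products A_i B_j.
coeffs = np.convolve(A, B)
assert len(coeffs) == 3

# The middle coefficient merges TWO of the four block products:
assert coeffs[1] == A[0] * B[1] + A[1] * B[0]   # 1*4 + 2*3 = 10
# A_0 B_1 = 4 and A_1 B_0 = 6 are individually unrecoverable from 10.
```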

🔧 Engineering Note

Entangled Codes in Production

Entangled polynomial codes have seen some deployment for convolution-heavy workloads (CNN training, image and audio signal processing in ML pipelines). The main advantage is the linear-in-partition recovery threshold. The main engineering barrier is that the encoding is more tightly coupled to the operation's algebraic structure: a generic coded-computing framework must detect the operation type (matmul vs. convolution vs. tensor contraction) and select the appropriate code.

Frameworks that integrate coded computing (NVIDIA DALI, research PyTorch extensions) usually offer entangled codes for convolution specifically, and fall back to the standard polynomial code for general matmul. Users specify the operation via a tensor-operation API; the framework picks the right code.

Practical Constraints
  • Entangled code requires convolution-like output structure
  • Standard polynomial code still needed for general matmul
  • Framework must auto-detect operation type

📋 Ref: NVIDIA DALI; PyTorch CodedDist research fork

Key Takeaway

Entangled polynomial codes achieve K = p + q - 1 for convolution and matrix-vector products: linear in the partitions, vs. quadratic for general matrix multiplication. The gain comes from exploiting the reduced degree structure of the output polynomial. Use entangled codes when the output has convolution-like structure; stick to standard polynomial codes for general matrix multiplication.

Quick Check

Compared to the standard polynomial code for matrix multiplication with (p, q) = (4, 6), the entangled polynomial code for convolution at the same partitions:

  • Uses the same per-worker storage and achieves K = 9 instead of K = 24
  • Uses less per-worker storage but achieves the same K
  • Uses more storage and is never worth it
  • Is equivalent to MatDot codes