IA for Distributed Matrix Multiplication

Why Matrix Multiplication Looks Like Interference

Distributed matrix multiplication is the canonical computation we will study in detail in Chapter 5. The question is: given $\mathbf{A} \in \mathbb{F}_q^{m \times d}$ and $\mathbf{B} \in \mathbb{F}_q^{m \times d}$, compute $\mathbf{A}^T \mathbf{B}$ using $N$ workers, each storing only a fraction of $\mathbf{A}$ and $\mathbf{B}$.

The point is that each worker's local product is a sum of many sub-products $\mathbf{a}_i^T \mathbf{b}_j$, most of which are "interferers": entries of $\mathbf{A}^T \mathbf{B}$ that the master needs from other workers, not this one. The master must invert $N$ such observations to recover the desired $\mathbf{A}^T \mathbf{B}$ block by block. The structure is exactly that of an interference channel where the cross-channel matrices are determined by the coding scheme: choose them to make the interference align, and the recovery becomes efficient.

This section formalizes the analogy and gives the IA-based achievability for a basic version of coded matrix multiplication; the generalization that hits the optimal recovery threshold is polynomial coding, the subject of Chapter 5. Reading this section first explains why polynomial codes work; Chapter 5 will work out how.

Definition: Coded Matrix Multiplication as an Interference Channel

Partition $\mathbf{A} = [\mathbf{A}_1, \ldots, \mathbf{A}_p]$ and $\mathbf{B} = [\mathbf{B}_1, \ldots, \mathbf{B}_q]$ column-wise into $p$ and $q$ equal-size blocks respectively. The desired product has $pq$ blocks
$$\mathbf{C}_{ij} = \mathbf{A}_i^T \mathbf{B}_j, \qquad i \in [p],\ j \in [q].$$
A storage scheme assigns to worker $k$ a pair of encoded matrices $\tilde{\mathbf{A}}_k = \sum_i \alpha_{ki} \mathbf{A}_i$ and $\tilde{\mathbf{B}}_k = \sum_j \beta_{kj} \mathbf{B}_j$ (over $\mathbb{F}_q$). Worker $k$ computes
$$\tilde{\mathbf{C}}_k = \tilde{\mathbf{A}}_k^T \tilde{\mathbf{B}}_k = \sum_{i,j} \alpha_{ki} \beta_{kj} \, \mathbf{C}_{ij},$$
which is a linear combination of all $pq$ desired blocks. The master receives one such combination from each worker, giving an $N$-equation linear system in $pq$ unknowns:
$$\tilde{\mathbf{C}} = \mathbf{M} \, \mathbf{c},$$
where $\mathbf{c}$ stacks the $pq$ desired blocks, each of size $(d/p) \times (d/q)$, and $\mathbf{M} \in \mathbb{F}_q^{N \times pq}$ is the encoding matrix with entries $\mathbf{M}_{k,(i,j)} = \alpha_{ki} \beta_{kj}$.
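The definition can be sketched numerically. Below is a minimal Python illustration (not from the text) over the small prime field $\mathbb{F}_{101}$, with illustrative sizes $m = d = 4$, $p = q = 2$, $N = 5$; it checks that each worker's computed product really is the linear combination $\sum_{i,j} \alpha_{ki} \beta_{kj} \mathbf{C}_{ij}$ of the desired blocks:

```python
import numpy as np

P = 101                                    # illustrative prime field modulus
rng = np.random.default_rng(0)
m, d, p, q, N = 4, 4, 2, 2, 5

A = rng.integers(0, P, (m, d))
B = rng.integers(0, P, (m, d))
A_blocks = np.split(A, p, axis=1)          # A_1, ..., A_p  (m x d/p each)
B_blocks = np.split(B, q, axis=1)          # B_1, ..., B_q  (m x d/q each)

alpha = rng.integers(0, P, (N, p))         # storage coefficients for A
beta = rng.integers(0, P, (N, q))          # storage coefficients for B

# Worker k multiplies its two encoded matrices.
responses = []
for k in range(N):
    A_tilde = sum(int(alpha[k, i]) * A_blocks[i] for i in range(p)) % P
    B_tilde = sum(int(beta[k, j]) * B_blocks[j] for j in range(q)) % P
    responses.append(A_tilde.T @ B_tilde % P)

# The same response, written as a row of M applied to the stacked blocks c:
# M_{k,(i,j)} = alpha_{ki} * beta_{kj}.
C_blocks = [[A_blocks[i].T @ B_blocks[j] % P for j in range(q)] for i in range(p)]
for k in range(N):
    lin_comb = sum(int(alpha[k, i]) * int(beta[k, j]) * C_blocks[i][j]
                   for i in range(p) for j in range(q)) % P
    assert np.array_equal(responses[k], lin_comb)
print("each worker response is a linear combination of the pq blocks")
```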

The interference-channel analogy is exact: the master is the receiver, the $pq$ desired blocks are the messages, the workers are the symbol slots, and the encoding matrix $\mathbf{M}$ plays the role of the channel matrix. The master "decodes" by inverting $\mathbf{M}$ restricted to any $K$ of its rows.

Encoding Matrix $\mathbf{M}$

The $N \times pq$ matrix whose $(k, (i,j))$ entry is the coefficient of $\mathbf{C}_{ij} = \mathbf{A}_i^T \mathbf{B}_j$ in worker $k$'s response. The number of rows the master needs in order to invert $\mathbf{M}$ is the recovery threshold $K$ (Chapter 5).

Theorem: IA-Based Recovery Threshold for Coded Matrix Multiplication

For the distributed matrix-multiplication scheme above with partition counts $p, q$, the recovery threshold $K$ (the minimum number of worker responses sufficient to reconstruct $\mathbf{A}^T \mathbf{B}$) satisfies
$$K \geq pq,$$
with equality achieved by a generic random encoding matrix $\mathbf{M} \in \mathbb{F}_q^{N \times pq}$ over a sufficiently large field $\mathbb{F}_q$, $q \geq pq$. The proof is a direct application of finite-field IA: any $pq$ rows of a generic $\mathbf{M}$ form an invertible square submatrix.

The polynomial-code construction of Chapter 5 achieves $K = pq$ deterministically, without random sampling, by choosing $\alpha_{ki} = \alpha_k^i$ and $\beta_{kj} = \alpha_k^{pj}$ for distinct $\alpha_k \in \mathbb{F}_q^*$. The connection to IA is that this deterministic choice makes the encoding matrix a (row-scaled) Vandermonde matrix, which has the same generic-rank property by construction.
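A small sanity check of this claim, assuming the coefficient choice above and an illustrative prime field $\mathbb{F}_{101}$: the sketch below builds the encoding matrix for $p = q = 2$, $N = 6$ and verifies by brute force (Gaussian elimination over the field) that every choice of $pq = 4$ rows is invertible:

```python
import numpy as np
from itertools import combinations

P, p, q, N = 101, 2, 2, 6                  # illustrative field and sizes
a = range(1, N + 1)                        # distinct nonzero evaluation points

# Row k has entries alpha_ki * beta_kj = a_k^(i + p*j), i in [p], j in [q].
M = np.array([[pow(ak, i + p * j, P) for j in range(1, q + 1)
                                     for i in range(1, p + 1)]
              for ak in a])

def rank_mod_p(mat, P):
    """Rank over F_P via Gaussian elimination (inverses by Fermat's little theorem)."""
    mat = mat.copy() % P
    r = 0
    for c in range(mat.shape[1]):
        pivot = next((i for i in range(r, mat.shape[0]) if mat[i, c]), None)
        if pivot is None:
            continue
        mat[[r, pivot]] = mat[[pivot, r]]
        inv = pow(int(mat[r, c]), P - 2, P)
        mat[r] = mat[r] * inv % P
        for i in range(mat.shape[0]):
            if i != r and mat[i, c]:
                mat[i] = (mat[i] - int(mat[i, c]) * mat[r]) % P
        r += 1
    return r

# Any pq = 4 of the N = 6 rows should form an invertible (full-rank) submatrix.
assert all(rank_mod_p(M[list(rows)], P) == p * q
           for rows in combinations(range(N), p * q))
print("every 4-row submatrix of the 6 x 4 encoding matrix is invertible: K = pq = 4")
```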

Each worker contributes one linear equation in the $pq$ unknown blocks. Generically, $pq$ equations are needed to invert. The IA insight is that we can choose the encoding matrix so that any $pq$ rows are jointly informative (no coincidental rank deficiencies): the coding analogue of "interferers all aligning into the same subspace at every receiver".


Example: $p = q = 2$, $N = 4$ Workers

Set $p = q = 2$, so the desired product has $pq = 4$ blocks. With $N = 4$ workers, what is the minimum recovery threshold, and how does the polynomial code achieve it? Since all four blocks must be determined, $K = pq = 4$: every worker's response is needed, so at $N = K$ there is no straggler slack.
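One way to work the example end to end in Python (the field $\mathbb{F}_{101}$ and the helper `solve_mod` are illustrative choices, not from the text): encode with the polynomial-code coefficients $\alpha_{ki} = \alpha_k^i$, $\beta_{kj} = \alpha_k^{pj}$, collect all $N = K = 4$ responses, and decode by solving the resulting $4 \times 4$ system over the field:

```python
import numpy as np

P, m, d, p, q = 101, 4, 4, 2, 2            # illustrative field and sizes
rng = np.random.default_rng(1)
A = rng.integers(0, P, (m, d))
B = rng.integers(0, P, (m, d))
A_blocks = np.split(A, p, axis=1)
B_blocks = np.split(B, q, axis=1)

# Polynomial-code encoding at distinct evaluation points 1..4.
a = [1, 2, 3, 4]
responses = []
for ak in a:
    A_tilde = sum(pow(ak, i + 1, P) * A_blocks[i] for i in range(p)) % P
    B_tilde = sum(pow(ak, p * (j + 1), P) * B_blocks[j] for j in range(q)) % P
    responses.append(A_tilde.T @ B_tilde % P)

# Encoding matrix: entry for unknown C_ij in row k is a_k^(i + p*j).
M = np.array([[pow(ak, (i + 1) + p * (j + 1), P)
               for j in range(q) for i in range(p)] for ak in a])

def solve_mod(M, R, P):
    """Solve M X = R over F_P by Gauss-Jordan (M square and invertible mod P)."""
    M = M.copy() % P
    R = R.copy() % P
    n = M.shape[0]
    for c in range(n):
        piv = next(i for i in range(c, n) if M[i, c])
        M[[c, piv]] = M[[piv, c]]
        R[[c, piv]] = R[[piv, c]]
        inv = pow(int(M[c, c]), P - 2, P)
        M[c] = M[c] * inv % P
        R[c] = R[c] * inv % P
        for i in range(n):
            if i != c and M[i, c]:
                f = int(M[i, c])
                M[i] = (M[i] - f * M[c]) % P
                R[i] = (R[i] - f * R[c]) % P
    return R

R = np.stack([r.reshape(-1) for r in responses])   # one flattened response per row
X = solve_mod(M, R, P)                             # row for (i,j) = flattened C_ij

# Check every decoded block against the direct product A_i^T B_j.
idx = 0
for j in range(q):
    for i in range(p):
        direct = A_blocks[i].T @ B_blocks[j] % P
        assert np.array_equal(X[idx].reshape(direct.shape), direct)
        idx += 1
print("all 4 blocks of A^T B recovered from K = pq = 4 responses")
```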

Recovery Threshold $K$ vs. Workers $N$ at Various Partitions

[Interactive plot: recovery threshold $K = pq$ as a function of the partition counts $(p, q)$ and the number of workers $N$. Default parameters: $N = 16$ total workers, $p = 4$ column blocks of $\mathbf{A}$, $q = 4$ column blocks of $\mathbf{B}$.] As $N$ grows beyond $K$, the straggler tolerance $N - K$ improves, but the per-worker storage stays at $\mu = 1/p + 1/q$ (a $1/p$ fraction of $\mathbf{A}$ plus a $1/q$ fraction of $\mathbf{B}$). The plot shows the trade-off between recovery threshold (smaller is better for stragglers) and storage (smaller is better for memory).
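A few representative points of this trade-off can be tabulated directly (an illustrative sketch; the partition choices are mine, with $N = 16$ as in the default parameters above):

```python
# Tabulate recovery threshold K = pq, straggler tolerance N - K, and
# per-worker storage fraction 1/p + 1/q for a few partition choices.
N = 16
print(f"{'p':>2} {'q':>2} {'K=pq':>5} {'N-K':>4} {'storage':>8}")
for p, q in [(1, 1), (2, 2), (2, 4), (4, 4)]:
    K = p * q
    print(f"{p:>2} {q:>2} {K:>5} {N - K:>4} {1 / p + 1 / q:>8.3f}")
```

Finer partitions shrink per-worker storage but raise $K$ quadratically, eating into the straggler slack $N - K$.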

Why This Matters: Polynomial Codes Are the Optimal IA Construction

The deterministic polynomial-code construction of Chapter 5 achieves the same recovery threshold $K = pq$ as the generic random IA scheme of this section, but with two advantages: (i) zero failure probability (the Vandermonde matrix is always invertible), and (ii) explicit decoding via Lagrange interpolation (cheaper than general matrix inversion). The theoretical content is identical; the engineering story is that explicit constructions matter when reproducibility and complexity are at stake.
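To make the decoding claim concrete, here is a minimal sketch of Lagrange interpolation over an illustrative prime field (`lagrange_coeffs` is a hypothetical helper; this naive per-basis construction costs $\mathcal{O}(K^3)$, while the $\mathcal{O}(K^2)$ form precomputes the master polynomial $\prod_k (x - x_k)$ once). Applied entry-wise to worker responses, it recovers the $pq$ coefficient blocks:

```python
P = 101  # illustrative prime field modulus

def lagrange_coeffs(xs, ys, P):
    """Coefficients (lowest degree first) of the unique polynomial of degree
    < len(xs) through the points (xs[k], ys[k]), over F_P."""
    K = len(xs)
    coeffs = [0] * K
    for k in range(K):
        # Basis polynomial l_k(x) = prod_{j != k} (x - x_j) / (x_k - x_j).
        num = [1]
        denom = 1
        for j in range(K):
            if j == k:
                continue
            out = [0] * (len(num) + 1)     # multiply num by (x - xs[j])
            for e, cc in enumerate(num):
                out[e + 1] = (out[e + 1] + cc) % P
                out[e] = (out[e] - xs[j] * cc) % P
            num = out
            denom = denom * (xs[k] - xs[j]) % P
        scale = ys[k] * pow(denom, P - 2, P) % P   # division via Fermat inverse
        for e in range(K):
            coeffs[e] = (coeffs[e] + scale * num[e]) % P
    return coeffs

# Recover the coefficients of h(x) = 5 + 17x + 42x^3 from 4 evaluations.
true_coeffs = [5, 17, 0, 42]
xs = [1, 2, 3, 4]
ys = [sum(c * pow(x, e, P) for e, c in enumerate(true_coeffs)) % P for x in xs]
assert lagrange_coeffs(xs, ys, P) == true_coeffs
print("recovered coefficients:", lagrange_coeffs(xs, ys, P))
```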

Common Mistake: Recovery Threshold Is $pq$, Not $p + q$

Mistake:

Confusing the partition counts $p$ and $q$ with their sum, and expecting a recovery threshold of $K = p + q$.

Correction:

The desired output has $pq$ matrix blocks (the entries of $\mathbf{A}^T \mathbf{B}$ partitioned into a $p \times q$ grid). Each block must be reconstructed, so $K \geq pq$. This is why coded matrix multiplication has a recovery threshold quadratic in the partition counts, and why approximate schemes (Chapter 6's gradient coding) try to beat it by settling for an approximation of the product rather than every block.
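The counting argument can be checked numerically (a quick illustrative sketch, not a proof):

```python
import numpy as np

# With p = 3, q = 4 there are pq = 12 unknown blocks, so p + q = 7 worker
# responses can never suffice: 7 linear equations have rank at most 7 < 12,
# regardless of how cleverly the encoding matrix is chosen.
p, q = 3, 4
n_responses = p + q                        # the mistaken threshold
M = np.random.default_rng(2).integers(0, 101, (n_responses, p * q))
assert np.linalg.matrix_rank(M) <= n_responses < p * q
print(f"{n_responses} equations cannot determine {p * q} unknowns")
```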

🔧 Engineering Note

Why Polynomial Codes Replace Random IA in Practice

On a 24-node Amazon EC2 cluster, Yu et al. measured a $7\times$ speedup of polynomial-coded matrix multiplication over the uncoded baseline for a $10^4 \times 10^4$ matrix product. Random IA-based schemes achieve the same theoretical recovery threshold but suffer from (i) a non-zero failure probability over fields too small for exact alignment, and (ii) the cost of inverting an arbitrary $pq \times pq$ matrix versus $\mathcal{O}(K^2)$ Lagrange interpolation. In production, deterministic polynomial codes are the rule and randomized IA is reserved for theoretical existence proofs.

Practical Constraints

  • Polynomial-code decoder runs in $\mathcal{O}(K^2)$ field operations; random IA needs $\mathcal{O}(K^3)$ for general matrix inversion

  • Deterministic constructions allow worst-case analysis; random IA gives only with-high-probability guarantees

  • Yu et al.: $K = 16$, EC2, $7\times$ speedup over uncoded

📋 Ref: Yu/Maddah-Ali/Avestimehr 2017 NeurIPS

Key Takeaway

IA is the algebraic tool; polynomial codes are the deterministic construction. The IA framework explains why a recovery threshold of $K = pq$ is achievable for coded matrix multiplication, by reducing the question to invertibility of generic submatrices. Polynomial codes (Chapter 5) make the construction explicit and remove all randomness from the analysis.

Quick Check

For coded matrix multiplication with column partitions $p = 3$, $q = 4$, the optimal recovery threshold is:

$K = 7$ ($p + q$)

$K = 12$ ($pq$)

$K = 1$

$K = N$