IA for Distributed Matrix Multiplication

Why Matrix Multiplication Looks Like Interference

Distributed matrix multiplication is the canonical computation we will study in detail in Chapter 5. The question is: given $\mathbf{A} \in \mathbb{F}_q^{m \times d}$ and $\mathbf{B} \in \mathbb{F}_q^{m \times d}$, compute $\mathbf{A}^T \mathbf{B}$ using $N$ workers, each storing only a fraction of $\mathbf{A}$ and $\mathbf{B}$.

The point is that each worker's local product is a sum of many sub-products $\mathbf{a}_i^T \mathbf{b}_j$, most of which are "interferers": entries of $\mathbf{A}^T \mathbf{B}$ that the master needs from other workers, not this one. The master must invert $N$ such observations to recover the desired $\mathbf{A}^T \mathbf{B}$ block by block. The structure is exactly that of an interference channel where the cross-channel matrices are determined by the coding scheme: choose them to make the interference align, and the recovery becomes efficient.

This section formalizes the analogy and gives the IA-based achievability for a basic version of coded matrix multiplication; the generalization that hits the optimal recovery threshold is polynomial coding, the subject of Chapter 5. Reading this section first explains why polynomial codes work; Chapter 5 will work out how.

Definition: Coded Matrix Multiplication as an Interference Channel

Partition $\mathbf{A} = [\mathbf{A}_1, \ldots, \mathbf{A}_p]$ and $\mathbf{B} = [\mathbf{B}_1, \ldots, \mathbf{B}_q]$ column-wise into $p$ and $q$ equal-size blocks respectively. The desired product has $pq$ blocks
$$\mathbf{C}_{ij} = \mathbf{A}_i^T \mathbf{B}_j, \qquad i \in [p],\ j \in [q].$$
A storage scheme assigns to worker $k$ a pair of encoded matrices $\tilde{\mathbf{A}}_k = \sum_i \alpha_{ki} \mathbf{A}_i$ and $\tilde{\mathbf{B}}_k = \sum_j \beta_{kj} \mathbf{B}_j$ (over $\mathbb{F}_q$). Worker $k$ computes
$$\tilde{\mathbf{C}}_k = \tilde{\mathbf{A}}_k^T \tilde{\mathbf{B}}_k = \sum_{i,j} \alpha_{ki} \beta_{kj} \, \mathbf{C}_{ij},$$
which is a linear combination of all $pq$ desired blocks. The master receives one such combination from each worker, giving an $N$-equation linear system in $pq$ unknowns:
$$\tilde{\mathbf{C}} = \mathbf{M} \, \mathbf{c},$$
where $\mathbf{c}$ stacks the $pq$ desired blocks, each of size $(d/p) \times (d/q)$, and $\mathbf{M} \in \mathbb{F}_q^{N \times pq}$ is the encoding matrix with entries $\mathbf{M}_{k,(i,j)} = \alpha_{ki} \beta_{kj}$.
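The definition can be sketched numerically. Below is a minimal Python illustration (not from the text) over the small prime field $\mathbb{F}_{101}$, with illustrative sizes $m = d = 4$, $p = q = 2$, $N = 5$; it checks that each worker's computed product really is the linear combination $\sum_{i,j} \alpha_{ki} \beta_{kj} \mathbf{C}_{ij}$ of the desired blocks:

```python
import numpy as np

P = 101                                    # illustrative prime field modulus
rng = np.random.default_rng(0)
m, d, p, q, N = 4, 4, 2, 2, 5

A = rng.integers(0, P, (m, d))
B = rng.integers(0, P, (m, d))
A_blocks = np.split(A, p, axis=1)          # A_1, ..., A_p  (m x d/p each)
B_blocks = np.split(B, q, axis=1)          # B_1, ..., B_q  (m x d/q each)

alpha = rng.integers(0, P, (N, p))         # storage coefficients for A
beta = rng.integers(0, P, (N, q))          # storage coefficients for B

# Worker k multiplies its two encoded matrices.
responses = []
for k in range(N):
    A_tilde = sum(int(alpha[k, i]) * A_blocks[i] for i in range(p)) % P
    B_tilde = sum(int(beta[k, j]) * B_blocks[j] for j in range(q)) % P
    responses.append(A_tilde.T @ B_tilde % P)

# The same response, written as a row of M applied to the stacked blocks c:
# M_{k,(i,j)} = alpha_{ki} * beta_{kj}.
C_blocks = [[A_blocks[i].T @ B_blocks[j] % P for j in range(q)] for i in range(p)]
for k in range(N):
    lin_comb = sum(int(alpha[k, i]) * int(beta[k, j]) * C_blocks[i][j]
                   for i in range(p) for j in range(q)) % P
    assert np.array_equal(responses[k], lin_comb)
print("each worker response is a linear combination of the pq blocks")
```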

The interference-channel analogy is exact: the master is the receiver, the $pq$ desired blocks are the messages, the workers are the symbol slots, and the encoding matrix $\mathbf{M}$ plays the role of the channel matrix. The master "decodes" by inverting $\mathbf{M}$ restricted to any $K$ of its rows.

Encoding Matrix $\mathbf{M}$

The $N \times pq$ matrix whose $(k, (i,j))$ entry is the coefficient of $\mathbf{C}_{ij} = \mathbf{A}_i^T \mathbf{B}_j$ in worker $k$'s response. The number of rows the master needs in order to invert $\mathbf{M}$ is the recovery threshold $K$ (Chapter 5).

Theorem: IA-Based Recovery Threshold for Coded Matrix Multiplication

For the distributed matrix-multiplication scheme above with partition counts $p, q$, the recovery threshold $K$ (the minimum number of worker responses sufficient to reconstruct $\mathbf{A}^T \mathbf{B}$) satisfies
$$K \geq pq,$$
with equality achieved by a generic random encoding matrix $\mathbf{M} \in \mathbb{F}_q^{N \times pq}$ over a sufficiently large field $\mathbb{F}_q$, $q \geq pq$. The proof is a direct application of finite-field IA: any $pq$ rows of a generic $\mathbf{M}$ form an invertible square submatrix.

The polynomial-code construction of Chapter 5 achieves $K = pq$ deterministically, without random sampling, by choosing $\alpha_{ki} = \alpha_k^i$ and $\beta_{kj} = \alpha_k^{pj}$ for distinct $\alpha_k \in \mathbb{F}_q^*$. The connection to IA is that this deterministic choice makes the encoding matrix a (row-scaled) Vandermonde matrix, which has the same generic-rank property by construction.
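A small sanity check of this claim, assuming the coefficient choice above and an illustrative prime field $\mathbb{F}_{101}$: the sketch below builds the encoding matrix for $p = q = 2$, $N = 6$ and verifies by brute force (Gaussian elimination over the field) that every choice of $pq = 4$ rows is invertible:

```python
import numpy as np
from itertools import combinations

P, p, q, N = 101, 2, 2, 6                  # illustrative field and sizes
a = range(1, N + 1)                        # distinct nonzero evaluation points

# Row k has entries alpha_ki * beta_kj = a_k^(i + p*j), i in [p], j in [q].
M = np.array([[pow(ak, i + p * j, P) for j in range(1, q + 1)
                                     for i in range(1, p + 1)]
              for ak in a])

def rank_mod_p(mat, P):
    """Rank over F_P via Gaussian elimination (inverses by Fermat's little theorem)."""
    mat = mat.copy() % P
    r = 0
    for c in range(mat.shape[1]):
        pivot = next((i for i in range(r, mat.shape[0]) if mat[i, c]), None)
        if pivot is None:
            continue
        mat[[r, pivot]] = mat[[pivot, r]]
        inv = pow(int(mat[r, c]), P - 2, P)
        mat[r] = mat[r] * inv % P
        for i in range(mat.shape[0]):
            if i != r and mat[i, c]:
                mat[i] = (mat[i] - int(mat[i, c]) * mat[r]) % P
        r += 1
    return r

# Any pq = 4 of the N = 6 rows should form an invertible (full-rank) submatrix.
assert all(rank_mod_p(M[list(rows)], P) == p * q
           for rows in combinations(range(N), p * q))
print("every 4-row submatrix of the 6 x 4 encoding matrix is invertible: K = pq = 4")
```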

Each worker contributes one linear equation in the $pq$ unknown blocks. Generically, $pq$ equations are needed to invert. The IA insight is that we can choose the encoding matrix so that any $pq$ rows are jointly informative (no coincidental rank deficiencies): the coding analogue of "interferers all aligning into the same subspace at every receiver".


Example: $p = q = 2$, $N = 4$ Workers

Set $p = q = 2$, so the desired product has $pq = 4$ blocks. With $N = 4$ workers, what is the minimum recovery threshold, and how does the polynomial code achieve it? Since all four blocks must be determined, $K = pq = 4$: every worker's response is needed, so at $N = K$ there is no straggler slack.
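One way to work the example end to end in Python (the field $\mathbb{F}_{101}$ and the helper `solve_mod` are illustrative choices, not from the text): encode with the polynomial-code coefficients $\alpha_{ki} = \alpha_k^i$, $\beta_{kj} = \alpha_k^{pj}$, collect all $N = K = 4$ responses, and decode by solving the resulting $4 \times 4$ system over the field:

```python
import numpy as np

P, m, d, p, q = 101, 4, 4, 2, 2            # illustrative field and sizes
rng = np.random.default_rng(1)
A = rng.integers(0, P, (m, d))
B = rng.integers(0, P, (m, d))
A_blocks = np.split(A, p, axis=1)
B_blocks = np.split(B, q, axis=1)

# Polynomial-code encoding at distinct evaluation points 1..4.
a = [1, 2, 3, 4]
responses = []
for ak in a:
    A_tilde = sum(pow(ak, i + 1, P) * A_blocks[i] for i in range(p)) % P
    B_tilde = sum(pow(ak, p * (j + 1), P) * B_blocks[j] for j in range(q)) % P
    responses.append(A_tilde.T @ B_tilde % P)

# Encoding matrix: entry for unknown C_ij in row k is a_k^(i + p*j).
M = np.array([[pow(ak, (i + 1) + p * (j + 1), P)
               for j in range(q) for i in range(p)] for ak in a])

def solve_mod(M, R, P):
    """Solve M X = R over F_P by Gauss-Jordan (M square and invertible mod P)."""
    M = M.copy() % P
    R = R.copy() % P
    n = M.shape[0]
    for c in range(n):
        piv = next(i for i in range(c, n) if M[i, c])
        M[[c, piv]] = M[[piv, c]]
        R[[c, piv]] = R[[piv, c]]
        inv = pow(int(M[c, c]), P - 2, P)
        M[c] = M[c] * inv % P
        R[c] = R[c] * inv % P
        for i in range(n):
            if i != c and M[i, c]:
                f = int(M[i, c])
                M[i] = (M[i] - f * M[c]) % P
                R[i] = (R[i] - f * R[c]) % P
    return R

R = np.stack([r.reshape(-1) for r in responses])   # one flattened response per row
X = solve_mod(M, R, P)                             # row for (i,j) = flattened C_ij

# Check every decoded block against the direct product A_i^T B_j.
idx = 0
for j in range(q):
    for i in range(p):
        direct = A_blocks[i].T @ B_blocks[j] % P
        assert np.array_equal(X[idx].reshape(direct.shape), direct)
        idx += 1
print("all 4 blocks of A^T B recovered from K = pq = 4 responses")
```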

Recovery Threshold $K$ vs. Workers $N$ at Various Partitions

[Interactive plot: recovery threshold $K = pq$ as a function of the partition counts $(p, q)$ and the number of workers $N$. Default parameters: $N = 16$ total workers, $p = 4$ column blocks of $\mathbf{A}$, $q = 4$ column blocks of $\mathbf{B}$.] As $N$ grows beyond $K$, the straggler tolerance $N - K$ improves, but the per-worker storage stays at $\mu = 1/p + 1/q$ (a $1/p$ fraction of $\mathbf{A}$ plus a $1/q$ fraction of $\mathbf{B}$). The plot shows the trade-off between recovery threshold (smaller is better for stragglers) and storage (smaller is better for memory).
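A few representative points of this trade-off can be tabulated directly (an illustrative sketch; the partition choices are mine, with $N = 16$ as in the default parameters above):

```python
# Tabulate recovery threshold K = pq, straggler tolerance N - K, and
# per-worker storage fraction 1/p + 1/q for a few partition choices.
N = 16
print(f"{'p':>2} {'q':>2} {'K=pq':>5} {'N-K':>4} {'storage':>8}")
for p, q in [(1, 1), (2, 2), (2, 4), (4, 4)]:
    K = p * q
    print(f"{p:>2} {q:>2} {K:>5} {N - K:>4} {1 / p + 1 / q:>8.3f}")
```

Finer partitions shrink per-worker storage but raise $K$ quadratically, eating into the straggler slack $N - K$.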

Why This Matters: Polynomial Codes Are the Optimal IA Construction

The deterministic polynomial-code construction of Chapter 5 achieves the same recovery threshold $K = pq$ as the generic random IA scheme of this section, but with two advantages: (i) zero failure probability (the Vandermonde matrix is always invertible), and (ii) explicit decoding via Lagrange interpolation (cheaper than general matrix inversion). The theoretical content is identical; the engineering story is that explicit constructions matter when reproducibility and complexity are at stake.
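To make the decoding claim concrete, here is a minimal sketch of Lagrange interpolation over an illustrative prime field (`lagrange_coeffs` is a hypothetical helper; this naive per-basis construction costs $\mathcal{O}(K^3)$, while the $\mathcal{O}(K^2)$ form precomputes the master polynomial $\prod_k (x - x_k)$ once). Applied entry-wise to worker responses, it recovers the $pq$ coefficient blocks:

```python
P = 101  # illustrative prime field modulus

def lagrange_coeffs(xs, ys, P):
    """Coefficients (lowest degree first) of the unique polynomial of degree
    < len(xs) through the points (xs[k], ys[k]), over F_P."""
    K = len(xs)
    coeffs = [0] * K
    for k in range(K):
        # Basis polynomial l_k(x) = prod_{j != k} (x - x_j) / (x_k - x_j).
        num = [1]
        denom = 1
        for j in range(K):
            if j == k:
                continue
            out = [0] * (len(num) + 1)     # multiply num by (x - xs[j])
            for e, cc in enumerate(num):
                out[e + 1] = (out[e + 1] + cc) % P
                out[e] = (out[e] - xs[j] * cc) % P
            num = out
            denom = denom * (xs[k] - xs[j]) % P
        scale = ys[k] * pow(denom, P - 2, P) % P   # division via Fermat inverse
        for e in range(K):
            coeffs[e] = (coeffs[e] + scale * num[e]) % P
    return coeffs

# Recover the coefficients of h(x) = 5 + 17x + 42x^3 from 4 evaluations.
true_coeffs = [5, 17, 0, 42]
xs = [1, 2, 3, 4]
ys = [sum(c * pow(x, e, P) for e, c in enumerate(true_coeffs)) % P for x in xs]
assert lagrange_coeffs(xs, ys, P) == true_coeffs
print("recovered coefficients:", lagrange_coeffs(xs, ys, P))
```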

Common Mistake: Recovery Threshold Is $pq$, Not $p + q$

Mistake:

Confusing the partition counts $p$ and $q$ with their sum, and expecting a recovery threshold of $K = p + q$.

Correction:

The desired output has $pq$ matrix blocks (the entries of $\mathbf{A}^T \mathbf{B}$ partitioned into a $p \times q$ grid). Each block must be reconstructed, so $K \geq pq$. This is why coded matrix multiplication has a recovery threshold quadratic in the partition counts, and why approximate schemes (Chapter 6's gradient coding) try to beat it by settling for an approximation of the product rather than every block.
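The counting argument can be checked numerically (a quick illustrative sketch, not a proof):

```python
import numpy as np

# With p = 3, q = 4 there are pq = 12 unknown blocks, so p + q = 7 worker
# responses can never suffice: 7 linear equations have rank at most 7 < 12,
# regardless of how cleverly the encoding matrix is chosen.
p, q = 3, 4
n_responses = p + q                        # the mistaken threshold
M = np.random.default_rng(2).integers(0, 101, (n_responses, p * q))
assert np.linalg.matrix_rank(M) <= n_responses < p * q
print(f"{n_responses} equations cannot determine {p * q} unknowns")
```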

🔧 Engineering Note

Why Polynomial Codes Replace Random IA in Practice

On a 24-node Amazon EC2 cluster, Yu et al. measured a $7\times$ speedup of polynomial-coded matrix multiplication over the uncoded baseline for a $10^4 \times 10^4$ matrix product. Random IA-based schemes achieve the same theoretical recovery threshold but suffer from (i) a non-zero failure probability over fields too small for exact alignment, and (ii) the cost of inverting an arbitrary $pq \times pq$ matrix versus $\mathcal{O}(K^2)$ Lagrange interpolation. In production, deterministic polynomial codes are the rule and randomized IA is reserved for theoretical existence proofs.

Practical Constraints

  • Polynomial-code decoder runs in $\mathcal{O}(K^2)$ field operations; random IA needs $\mathcal{O}(K^3)$ for general matrix inversion

  • Deterministic constructions allow worst-case analysis; random IA gives only with-high-probability guarantees

  • Yu et al.: $K = 16$, EC2, $7\times$ speedup over uncoded

📋 Ref: Yu/Maddah-Ali/Avestimehr 2017 NeurIPS

Key Takeaway

IA is the algebraic tool; polynomial codes are the deterministic construction. The IA framework explains why a recovery threshold of $K = pq$ is achievable for coded matrix multiplication, by reducing the question to invertibility of generic submatrices. Polynomial codes (Chapter 5) make the construction explicit and remove all randomness from the analysis.

Quick Check

For coded matrix multiplication with column partitions $p = 3$, $q = 4$, the optimal recovery threshold is:

$K = 7$ ($p + q$)

$K = 12$ ($pq$)

$K = 1$

$K = N$