PyTorch Linear Algebra
Definition: The torch.linalg Module
PyTorch provides torch.linalg, a namespace that closely mirrors NumPy's
numpy.linalg, with largely matching function signatures:
import torch
A = torch.randn(4, 4, dtype=torch.float64)
U, S, Vh = torch.linalg.svd(A) # SVD
eigenvalues = torch.linalg.eigvalsh(A @ A.T) # Hermitian eigenvalues
x = torch.linalg.solve(A, torch.randn(4, dtype=torch.float64)) # solve
n = torch.linalg.norm(A) # Frobenius norm
All torch.linalg functions:
- Support GPU tensors (CUDA via cuSOLVER/cuBLAS).
- Are differentiable via autograd.
- Support batched operations on leading dimensions.
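As a quick illustration of the differentiability point above (a sketch of my own, not from the chapter code), autograd flows straight through torch.linalg.solve:

import torch

A = torch.randn(4, 4, dtype=torch.float64, requires_grad=True)
b = torch.randn(4, dtype=torch.float64)

x = torch.linalg.solve(A, b)       # differentiable linear solve
loss = torch.linalg.norm(x) ** 2   # scalar loss built from the solution
loss.backward()                    # gradient of loss w.r.t. A via autograd
print(A.grad.shape)                # torch.Size([4, 4])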
Definition: Batched Linear Algebra
Most torch.linalg functions support batched inputs: if the input has shape
(*, m, n), where * denotes any number of leading batch dimensions, the
operation is applied independently to each m × n matrix in the batch.
# Batch of 100 channel matrices, each 4x4
H = torch.randn(100, 4, 4, dtype=torch.complex128)
# SVD of all 100 matrices in one call
U, S, Vh = torch.linalg.svd(H)
# U: (100, 4, 4), S: (100, 4), Vh: (100, 4, 4)
# Solve 100 systems simultaneously
b = torch.randn(100, 4, 1, dtype=torch.complex128)
x = torch.linalg.solve(H, b) # (100, 4, 1)
Batching is vastly more efficient than Python loops because the GPU kernel processes all matrices in parallel.
In MIMO wireless, you often have thousands of channel realizations to process. A single batched SVD call on the GPU can process tens of thousands of small channel matrices in a few milliseconds, far faster than looping over them one matrix at a time with NumPy on the CPU.
Theorem: Differentiating Through SVD
For \(A = U \Sigma V^{\mathsf{T}}\) with singular values \(\sigma_1 \ge \dots \ge \sigma_n\), the gradient of a scalar loss w.r.t. \(A\) involves the matrix
\[
F_{ij} = \frac{1}{\sigma_j^2 - \sigma_i^2} \quad (i \ne j), \qquad F_{ii} = 0.
\]
PyTorch handles this automatically, but the formula reveals a numerical instability when singular values are close or repeated.
Differentiating through eigenvalue and singular value decompositions is tricky because the eigenvectors and singular vectors are not unique (sign flips, rotations in degenerate subspaces). The matrix \(F\) has poles where singular values collide.
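A minimal sketch (not from the source) of how this instability shows up in practice: build a matrix with two nearly equal singular values and backpropagate a loss that depends on the singular vectors.

import torch

# Construct A with singular values (3, 1 + 1e-12, 1, 0.5): two are nearly degenerate
U0, _ = torch.linalg.qr(torch.randn(4, 4, dtype=torch.float64))
V0, _ = torch.linalg.qr(torch.randn(4, 4, dtype=torch.float64))
S0 = torch.tensor([3.0, 1.0 + 1e-12, 1.0, 0.5], dtype=torch.float64)
A = (U0 @ torch.diag(S0) @ V0.T).detach().requires_grad_(True)

U, S, Vh = torch.linalg.svd(A)
loss = U.sum() + Vh.sum()   # any loss that depends on the singular *vectors*
loss.backward()
print(A.grad.abs().max())   # huge (or non-finite) entries caused by the near-degeneracy

A loss that depends only on S stays well behaved; the blow-up comes from the singular-vector terms.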
Example: Batched SVD on GPU
Compute the SVD of 1000 random 8×8 complex matrices on GPU and verify the reconstruction \(A = U \operatorname{diag}(S) V^{\mathsf H}\).
Implementation
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
A = torch.randn(1000, 8, 8, dtype=torch.complex128, device=device)
U, S, Vh = torch.linalg.svd(A)
# Reconstruct A = U diag(S) Vh; S is real, so cast it to the complex dtype
A_hat = U @ torch.diag_embed(S.to(A.dtype)) @ Vh
# Check the worst-case Frobenius reconstruction error across the batch
error = torch.linalg.norm(A - A_hat, dim=(-2, -1)).max()
print(f"Max reconstruction error: {error.item():.2e}")
Performance Note
On a modern GPU, batched SVD of 1000 matrices takes about 2-5 ms, compared to 50-100 ms on CPU with NumPy loops.
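To reproduce this kind of measurement yourself, a minimal timing sketch (mine, not the chapter code) looks like the following; the explicit synchronization matters because CUDA kernels launch asynchronously.

import time
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
A = torch.randn(1000, 8, 8, dtype=torch.complex128, device=device)

torch.linalg.svd(A)                      # warm-up (allocations, kernel setup)
if device.type == "cuda":
    torch.cuda.synchronize()
start = time.perf_counter()
torch.linalg.svd(A)
if device.type == "cuda":
    torch.cuda.synchronize()
print(f"Batched SVD of 1000 matrices: {(time.perf_counter() - start) * 1e3:.2f} ms")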
Example: Optimizing Eigenvalues via Autograd
Find a symmetric positive semidefinite matrix \(A\) that maximizes the smallest
eigenvalue \(\lambda_{\min}(A)\) while keeping \(\operatorname{tr}(A) = 1\), using
gradient ascent through torch.linalg.eigvalsh.
Implementation
import torch

n = 4
# Parameterize as A = L @ L.T to ensure symmetry and positive semidefiniteness
L = torch.randn(n, n, dtype=torch.float64, requires_grad=True)
optimizer = torch.optim.Adam([L], lr=0.01)

for step in range(500):
    A = L @ L.T
    A = A / A.trace()                    # normalize trace to 1
    eigvals = torch.linalg.eigvalsh(A)   # ascending order
    loss = -eigvals[0]                   # maximize the smallest eigenvalue
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(f"Step {step}: lambda_min = {eigvals[0].item():.6f}")

# Optimal: A = I/n, all eigenvalues = 1/n
with torch.no_grad():
    A_final = L @ L.T
    A_final = A_final / A_final.trace()
    print(f"Final eigenvalues: {torch.linalg.eigvalsh(A_final)}")
Expected Result
The optimal solution is \(A^\star = I/n\), where all eigenvalues equal \(1/n\) (here \(1/4 = 0.25\)). This maximizes the minimum eigenvalue subject to the trace constraint.
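The optimality argument is short (my own sketch, in the notation of the example): since the eigenvalues of \(A\) are nonnegative and sum to \(\operatorname{tr}(A) = 1\),
\[
\lambda_{\min}(A) \;\le\; \frac{1}{n}\sum_{i=1}^{n}\lambda_i(A) \;=\; \frac{\operatorname{tr}(A)}{n} \;=\; \frac{1}{n},
\]
with equality exactly when all eigenvalues are equal, which forces \(A = I/n\).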
SVD Low-Rank Approximation (interactive)
Adjust the rank and see how the truncated SVD approximation converges to the original matrix.
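Without the interactive widget, the same idea can be sketched in a few lines (names and sizes are my own, not from the chapter code): keep the k largest singular triplets and measure the error, which by the Eckart-Young theorem is the best possible for rank k.

import torch

A = torch.randn(64, 64, dtype=torch.float64)
U, S, Vh = torch.linalg.svd(A, full_matrices=False)

for k in (1, 4, 16, 64):
    # Rank-k truncation: keep the k largest singular values and their vectors
    A_k = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]
    rel_err = torch.linalg.norm(A - A_k) / torch.linalg.norm(A)
    print(f"rank {k:2d}: relative Frobenius error = {rel_err.item():.3e}")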
NumPy vs. PyTorch Linear Algebra
| Operation | NumPy | PyTorch | Differentiable? |
|---|---|---|---|
| SVD | np.linalg.svd | torch.linalg.svd | Yes |
| Eigenvalues (Hermitian) | np.linalg.eigh | torch.linalg.eigh | Yes |
| Solve Ax=b | np.linalg.solve | torch.linalg.solve | Yes |
| Cholesky | np.linalg.cholesky | torch.linalg.cholesky | Yes |
| QR | np.linalg.qr | torch.linalg.qr | Yes |
| Norm | np.linalg.norm | torch.linalg.norm | Yes |
| Pseudoinverse | np.linalg.pinv | torch.linalg.pinv | Yes |
| Batched | Partial (stacked arrays, CPU only) | Yes (leading dims, CPU/GPU) | Yes |
Quick Check
What is the shape of S when calling torch.linalg.svd on a tensor
of shape (batch, 4, 6)?
- (batch, 4, 6)
- (batch, 4) ← correct
- (batch, 6)
- (batch, 4, 4)
SVD returns min(m, n) singular values per matrix, so S has shape (batch, 4).
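You can confirm the shapes directly (a quick snippet of my own, not part of the chapter code):

import torch

A = torch.randn(10, 4, 6)
U, S, Vh = torch.linalg.svd(A)
print(U.shape, S.shape, Vh.shape)
# torch.Size([10, 4, 4]) torch.Size([10, 4]) torch.Size([10, 6, 6])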
Common Mistake: NaN Gradients from Degenerate SVD
Mistake:
Differentiating through SVD when singular values are repeated or near-zero produces NaN or Inf gradients because the backward formula involves factors of \(1/(\sigma_j^2 - \sigma_i^2)\).
Correction:
Add a small perturbation to the matrix before the SVD when differentiating, or
check the resulting gradients for NaN/Inf:
# Tiny diagonal perturbation breaks exact degeneracies before differentiating
A_perturbed = A + 1e-6 * torch.eye(A.shape[-1], dtype=A.dtype, device=A.device)
U, S, Vh = torch.linalg.svd(A_perturbed)
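A simple runtime guard (my own snippet, assuming A is a leaf tensor with requires_grad=True):

if A.grad is not None and not torch.isfinite(A.grad).all():
    raise RuntimeError("SVD backward produced non-finite gradients")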
Why This Matters: Batched SVD for MIMO Channel Analysis
In MIMO-OFDM systems, each subcarrier has its own channel matrix
\(H_k\). With 1024 subcarriers and 100 time slots, you need
SVDs of 102,400 matrices. Batched torch.linalg.svd on GPU handles
this in milliseconds, enabling real-time precoding and waterfilling
capacity computation.
See full treatment in Chapter 35
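A minimal sketch of that workflow (dimensions, names, and the equal-power capacity formula are illustrative, not taken from Chapter 35): stack every channel realization into one batch, run a single SVD, and read off the per-subcarrier singular values used for precoding and capacity.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# 1024 subcarriers x 100 time slots of 4x4 MIMO channels
H = torch.randn(1024, 100, 4, 4, dtype=torch.complex64, device=device)

U, S, Vh = torch.linalg.svd(H)                # one call covers all 102,400 matrices
snr = 10.0                                    # illustrative per-stream SNR
# Equal power per eigenmode; waterfilling would redistribute power across modes
capacity = torch.log2(1 + snr * S**2 / S.shape[-1]).sum(dim=-1)
print(S.shape, capacity.shape)                # (1024, 100, 4) and (1024, 100)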
Key Takeaway
The torch.linalg module mirrors NumPy's linear algebra API but adds
GPU acceleration, autograd support, and batched operations. Batching
is the key to performance: process thousands of matrices in a single
kernel launch instead of Python loops.
Batched Operation
A linear algebra operation applied independently to each matrix in a batch, indexed by the leading dimensions of the tensor.
PyTorch Linear Algebra Operations
# Code from: ch12/python/pytorch_linalg.py