PyTorch Linear Algebra
Definition: The torch.linalg Module
PyTorch provides torch.linalg, a namespace that closely mirrors NumPy's
numpy.linalg, with largely matching function signatures:
import torch
A = torch.randn(4, 4, dtype=torch.float64)
U, S, Vh = torch.linalg.svd(A) # SVD
eigenvalues = torch.linalg.eigvalsh(A @ A.T) # Hermitian eigenvalues
x = torch.linalg.solve(A, torch.randn(4, dtype=torch.float64)) # solve
n = torch.linalg.norm(A) # Frobenius norm
All torch.linalg functions:
- Support GPU tensors (CUDA via cuSOLVER/cuBLAS).
- Are differentiable via autograd.
- Support batched operations on leading dimensions.
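As a quick illustration of the differentiability point above (a sketch of my own, not from the chapter code), autograd flows straight through torch.linalg.solve:

import torch

A = torch.randn(4, 4, dtype=torch.float64, requires_grad=True)
b = torch.randn(4, dtype=torch.float64)

x = torch.linalg.solve(A, b)       # differentiable linear solve
loss = torch.linalg.norm(x) ** 2   # scalar loss built from the solution
loss.backward()                    # gradient of loss w.r.t. A via autograd
print(A.grad.shape)                # torch.Size([4, 4])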
Definition: Batched Linear Algebra
Most torch.linalg functions support batched inputs: if the input has shape
(*, m, n), where * denotes any number of leading batch dimensions, the
operation is applied independently to each m × n matrix in the batch.
# Batch of 100 channel matrices, each 4x4
H = torch.randn(100, 4, 4, dtype=torch.complex128)
# SVD of all 100 matrices in one call
U, S, Vh = torch.linalg.svd(H)
# U: (100, 4, 4), S: (100, 4), Vh: (100, 4, 4)
# Solve 100 systems simultaneously
b = torch.randn(100, 4, 1, dtype=torch.complex128)
x = torch.linalg.solve(H, b) # (100, 4, 1)
Batching is vastly more efficient than Python loops because the GPU kernel processes all matrices in parallel.
In MIMO wireless, you often have thousands of channel realizations to process. A single batched SVD call on the GPU can process tens of thousands of small channel matrices in a few milliseconds, far faster than looping over them one matrix at a time with NumPy on the CPU.
Theorem: Differentiating Through SVD
For \(A = U \Sigma V^{\mathsf{T}}\) with singular values \(\sigma_1 \ge \dots \ge \sigma_n\), the gradient of a scalar loss w.r.t. \(A\) involves the matrix
\[
F_{ij} = \frac{1}{\sigma_j^2 - \sigma_i^2} \quad (i \ne j), \qquad F_{ii} = 0.
\]
PyTorch handles this automatically, but the formula reveals a numerical instability when singular values are close or repeated.
Differentiating through eigenvalue and singular value decompositions is tricky because the eigenvectors and singular vectors are not unique (sign flips, rotations in degenerate subspaces). The matrix \(F\) has poles where singular values collide.
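A minimal sketch (not from the source) of how this instability shows up in practice: build a matrix with two nearly equal singular values and backpropagate a loss that depends on the singular vectors.

import torch

# Construct A with singular values (3, 1 + 1e-12, 1, 0.5): two are nearly degenerate
U0, _ = torch.linalg.qr(torch.randn(4, 4, dtype=torch.float64))
V0, _ = torch.linalg.qr(torch.randn(4, 4, dtype=torch.float64))
S0 = torch.tensor([3.0, 1.0 + 1e-12, 1.0, 0.5], dtype=torch.float64)
A = (U0 @ torch.diag(S0) @ V0.T).detach().requires_grad_(True)

U, S, Vh = torch.linalg.svd(A)
loss = U.sum() + Vh.sum()   # any loss that depends on the singular *vectors*
loss.backward()
print(A.grad.abs().max())   # huge (or non-finite) entries caused by the near-degeneracy

A loss that depends only on S stays well behaved; the blow-up comes from the singular-vector terms.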
Example: Batched SVD on GPU
Compute the SVD of 1000 random 8×8 complex matrices on GPU and verify the reconstruction \(A = U \operatorname{diag}(S) V^{\mathsf H}\).
Implementation
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
A = torch.randn(1000, 8, 8, dtype=torch.complex128, device=device)
U, S, Vh = torch.linalg.svd(A)
# Reconstruct A = U diag(S) Vh; S is real, so cast it to the complex dtype
A_hat = U @ torch.diag_embed(S.to(A.dtype)) @ Vh
# Check the worst-case Frobenius reconstruction error across the batch
error = torch.linalg.norm(A - A_hat, dim=(-2, -1)).max()
print(f"Max reconstruction error: {error.item():.2e}")
Performance Note
On a modern GPU, batched SVD of 1000 matrices takes about 2-5 ms, compared to 50-100 ms on CPU with NumPy loops.
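To reproduce this kind of measurement yourself, a minimal timing sketch (mine, not the chapter code) looks like the following; the explicit synchronization matters because CUDA kernels launch asynchronously.

import time
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
A = torch.randn(1000, 8, 8, dtype=torch.complex128, device=device)

torch.linalg.svd(A)                      # warm-up (allocations, kernel setup)
if device.type == "cuda":
    torch.cuda.synchronize()
start = time.perf_counter()
torch.linalg.svd(A)
if device.type == "cuda":
    torch.cuda.synchronize()
print(f"Batched SVD of 1000 matrices: {(time.perf_counter() - start) * 1e3:.2f} ms")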
Example: Optimizing Eigenvalues via Autograd
Find a symmetric positive semidefinite matrix \(A\) that maximizes the smallest
eigenvalue \(\lambda_{\min}(A)\) while keeping \(\operatorname{tr}(A) = 1\), using
gradient ascent through torch.linalg.eigvalsh.
Implementation
import torch

n = 4
# Parameterize as A = L @ L.T to ensure symmetry and positive semidefiniteness
L = torch.randn(n, n, dtype=torch.float64, requires_grad=True)
optimizer = torch.optim.Adam([L], lr=0.01)

for step in range(500):
    A = L @ L.T
    A = A / A.trace()                    # normalize trace to 1
    eigvals = torch.linalg.eigvalsh(A)   # ascending order
    loss = -eigvals[0]                   # maximize the smallest eigenvalue
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(f"Step {step}: lambda_min = {eigvals[0].item():.6f}")

# Optimal: A = I/n, all eigenvalues = 1/n
with torch.no_grad():
    A_final = L @ L.T
    A_final = A_final / A_final.trace()
    print(f"Final eigenvalues: {torch.linalg.eigvalsh(A_final)}")
Expected Result
The optimal solution is \(A^\star = I/n\), where all eigenvalues equal \(1/n\) (here \(1/4 = 0.25\)). This maximizes the minimum eigenvalue subject to the trace constraint.
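The optimality argument is short (my own sketch, in the notation of the example): since the eigenvalues of \(A\) are nonnegative and sum to \(\operatorname{tr}(A) = 1\),
\[
\lambda_{\min}(A) \;\le\; \frac{1}{n}\sum_{i=1}^{n}\lambda_i(A) \;=\; \frac{\operatorname{tr}(A)}{n} \;=\; \frac{1}{n},
\]
with equality exactly when all eigenvalues are equal, which forces \(A = I/n\).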
SVD Low-Rank Approximation (interactive)
Adjust the rank and see how the truncated SVD approximation converges to the original matrix.
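Without the interactive widget, the same idea can be sketched in a few lines (names and sizes are my own, not from the chapter code): keep the k largest singular triplets and measure the error, which by the Eckart-Young theorem is the best possible for rank k.

import torch

A = torch.randn(64, 64, dtype=torch.float64)
U, S, Vh = torch.linalg.svd(A, full_matrices=False)

for k in (1, 4, 16, 64):
    # Rank-k truncation: keep the k largest singular values and their vectors
    A_k = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]
    rel_err = torch.linalg.norm(A - A_k) / torch.linalg.norm(A)
    print(f"rank {k:2d}: relative Frobenius error = {rel_err.item():.3e}")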
NumPy vs. PyTorch Linear Algebra
| Operation | NumPy | PyTorch | Differentiable? |
|---|---|---|---|
| SVD | np.linalg.svd | torch.linalg.svd | Yes |
| Eigenvalues (Hermitian) | np.linalg.eigh | torch.linalg.eigh | Yes |
| Solve Ax=b | np.linalg.solve | torch.linalg.solve | Yes |
| Cholesky | np.linalg.cholesky | torch.linalg.cholesky | Yes |
| QR | np.linalg.qr | torch.linalg.qr | Yes |
| Norm | np.linalg.norm | torch.linalg.norm | Yes |
| Pseudoinverse | np.linalg.pinv | torch.linalg.pinv | Yes |
| Batched | Partial (stacked arrays, CPU only) | Yes (leading dims, CPU/GPU) | Yes |
Quick Check
What is the shape of S when calling torch.linalg.svd on a tensor
of shape (batch, 4, 6)?
- (batch, 4, 6)
- (batch, 4) ← correct
- (batch, 6)
- (batch, 4, 4)
SVD returns min(m, n) singular values per matrix, so S has shape (batch, 4).
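You can confirm the shapes directly (a quick snippet of my own, not part of the chapter code):

import torch

A = torch.randn(10, 4, 6)
U, S, Vh = torch.linalg.svd(A)
print(U.shape, S.shape, Vh.shape)
# torch.Size([10, 4, 4]) torch.Size([10, 4]) torch.Size([10, 6, 6])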
Common Mistake: NaN Gradients from Degenerate SVD
Mistake:
Differentiating through SVD when singular values are repeated or near-zero produces NaN or Inf gradients because the backward formula involves factors of \(1/(\sigma_j^2 - \sigma_i^2)\).
Correction:
Add a small perturbation to the matrix before the SVD when differentiating, or
check the resulting gradients for NaN/Inf:
# Tiny diagonal perturbation breaks exact degeneracies before differentiating
A_perturbed = A + 1e-6 * torch.eye(A.shape[-1], dtype=A.dtype, device=A.device)
U, S, Vh = torch.linalg.svd(A_perturbed)
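A simple runtime guard (my own snippet, assuming A is a leaf tensor with requires_grad=True):

if A.grad is not None and not torch.isfinite(A.grad).all():
    raise RuntimeError("SVD backward produced non-finite gradients")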
Why This Matters: Batched SVD for MIMO Channel Analysis
In MIMO-OFDM systems, each subcarrier has its own channel matrix
\(H_k\). With 1024 subcarriers and 100 time slots, you need
SVDs of 102,400 matrices. Batched torch.linalg.svd on GPU handles
this in milliseconds, enabling real-time precoding and waterfilling
capacity computation.
See full treatment in Chapter 35
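A minimal sketch of that workflow (dimensions, names, and the equal-power capacity formula are illustrative, not taken from Chapter 35): stack every channel realization into one batch, run a single SVD, and read off the per-subcarrier singular values used for precoding and capacity.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# 1024 subcarriers x 100 time slots of 4x4 MIMO channels
H = torch.randn(1024, 100, 4, 4, dtype=torch.complex64, device=device)

U, S, Vh = torch.linalg.svd(H)                # one call covers all 102,400 matrices
snr = 10.0                                    # illustrative per-stream SNR
# Equal power per eigenmode; waterfilling would redistribute power across modes
capacity = torch.log2(1 + snr * S**2 / S.shape[-1]).sum(dim=-1)
print(S.shape, capacity.shape)                # (1024, 100, 4) and (1024, 100)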
Key Takeaway
The torch.linalg module mirrors NumPy's linear algebra API but adds
GPU acceleration, autograd support, and batched operations. Batching
is the key to performance: process thousands of matrices in a single
kernel launch instead of Python loops.
Batched Operation
A linear algebra operation applied independently to each matrix in a batch, indexed by the leading dimensions of the tensor.
PyTorch Linear Algebra Operations
# Code from: ch12/python/pytorch_linalg.py