Sparse Matrices with scipy.sparse
When Most of Your Matrix Is Zero
Many real-world matrices are sparse: the vast majority of entries are zero. A dense 10,000 × 10,000 matrix requires 800 MB of memory (double precision), but if only 0.01% of entries are nonzero, a sparse format stores just 80 KB of data plus indexing overhead.
Sparse matrices arise in finite element methods, graph Laplacians, network adjacency matrices, and large-scale channel models in massive MIMO where the propagation matrix has a sparse structure due to limited scattering.
Definition: Sparse Matrix
Sparse Matrix
A matrix $A \in \mathbb{R}^{m \times n}$ is called sparse when the number of nonzero entries $\mathrm{nnz}(A)$ is much smaller than the total number of entries $mn$. The sparsity is:
$$\mathrm{sparsity}(A) = 1 - \frac{\mathrm{nnz}(A)}{mn}$$
Sparse storage formats exploit this by storing only the nonzero entries and their positions, reducing both memory and computation from $O(mn)$ to $O(\mathrm{nnz})$.
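Density is easy to check directly via the nnz attribute that all scipy.sparse matrices carry. A minimal sketch (the test matrix here is arbitrary):
import numpy as np
from scipy.sparse import random

A = random(1000, 1000, density=0.001, format='csr')  # arbitrary test matrix
density = A.nnz / (A.shape[0] * A.shape[1])          # fraction of entries stored
print(f"nnz = {A.nnz}, sparsity = {1 - density:.4%}")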
Definition: COO (Coordinate) Format
COO (Coordinate) Format
The COO (coordinate list) format stores three arrays:
- row: row indices of nonzero entries
- col: column indices of nonzero entries
- data: values of nonzero entries
from scipy.sparse import coo_matrix
row = [0, 0, 1, 2, 2]
col = [0, 2, 1, 0, 2]
data = [1.0, 3.0, 2.0, 4.0, 5.0]
A = coo_matrix((data, (row, col)), shape=(3, 3))
COO is ideal for construction (easy to add entries) but inefficient for arithmetic and slicing. Convert to CSR or CSC for computation.
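One COO detail worth knowing (standard scipy.sparse behavior): duplicate (row, col) pairs are summed during conversion, which is convenient for finite-element-style assembly where several contributions land on the same entry. A small sketch:
from scipy.sparse import coo_matrix

row = [0, 0, 1]
col = [0, 0, 1]
data = [1.0, 2.0, 5.0]                  # two entries at position (0, 0)
A = coo_matrix((data, (row, col)), shape=(2, 2))
print(A.tocsr()[0, 0])                  # 3.0 -- the duplicates were summed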
Definition: CSR (Compressed Sparse Row) Format
CSR (Compressed Sparse Row) Format
The CSR format stores:
- data: nonzero values (length nnz)
- indices: column indices of nonzeros (length nnz)
- indptr: row pointer array (length $m+1$), where row $i$ has its nonzeros at data[indptr[i]:indptr[i+1]]
from scipy.sparse import csr_matrix
A_csr = csr_matrix(A)  # convert the COO matrix from above
A_csr = csr_matrix((data, indices, indptr), shape=(m, n))  # or build directly from the three arrays
CSR is optimal for:
- Row slicing: A[i, :]
- Matrix-vector products: A @ x
- Sparse solvers (SuperLU and ARPACK work on CSR or CSC)
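To make the layout concrete, here are the three CSR arrays for the 3 × 3 matrix from the COO example above (a quick inspection sketch):
from scipy.sparse import coo_matrix

row = [0, 0, 1, 2, 2]
col = [0, 2, 1, 0, 2]
data = [1.0, 3.0, 2.0, 4.0, 5.0]
A_csr = coo_matrix((data, (row, col)), shape=(3, 3)).tocsr()
print(A_csr.data)     # [1. 3. 2. 4. 5.]
print(A_csr.indices)  # [0 2 1 0 2] -- column index of each stored value
print(A_csr.indptr)   # [0 2 3 5]   -- row i lives in data[indptr[i]:indptr[i+1]]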
Definition: CSC (Compressed Sparse Column) Format
CSC (Compressed Sparse Column) Format
The CSC format is the column-oriented analog of CSR:
- data: nonzero values
- indices: row indices of nonzeros
- indptr: column pointer array (length $n+1$)
from scipy.sparse import csc_matrix
A_csc = csc_matrix(A_coo)
CSC is optimal for column slicing (A[:, j]) and is the format
used internally by many direct sparse solvers (e.g., SuperLU, CHOLMOD).
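A quick sketch of why the orientation matters (the matrix here is an arbitrary random one): a column slice is a contiguous read in CSC but a scan over every row's index array in CSR.
from scipy.sparse import random

A_csr = random(2000, 2000, density=0.01, format='csr')
A_csc = A_csr.tocsc()
col_fast = A_csc[:, 42]   # cheap: one indptr lookup, contiguous data
col_slow = A_csr[:, 42]   # same result, but touches every row's indices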
Definition: Sparse Construction Functions
Sparse Construction Functions
SciPy provides efficient constructors for common sparse patterns:
from scipy.sparse import eye, diags, block_diag, kron, random

n = 100
I = eye(n, format='csr')                         # sparse identity
D = diags([1, -2, 1], [-1, 0, 1], shape=(n, n))  # tridiagonal
B = block_diag([I, D, I])                        # block diagonal from sparse blocks
K = kron(I, D, format='csr')                     # Kronecker product
S = random(1000, 1000, density=0.01)             # random sparse, 1% fill
The diags function is especially useful for constructing
banded matrices like the finite-difference Laplacian.
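As a sketch of how these constructors compose, the 2D finite-difference Laplacian on an n × n grid can be assembled from the 1D operator via the standard Kronecker-sum identity:
import scipy.sparse as sp

n = 100
D = sp.diags([1, -2, 1], [-1, 0, 1], shape=(n, n), format='csr')  # 1D Laplacian
I = sp.eye(n, format='csr')
L2 = sp.kron(I, D) + sp.kron(D, I)   # 2D Laplacian, acts on length n*n vectors
print(L2.shape, L2.nnz)              # (10000, 10000), about 5 nonzeros per row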
Theorem: Sparse vs Dense: When Sparse Wins
For an $m \times n$ matrix with $\mathrm{nnz}$ nonzero entries:
- Sparse matrix-vector product costs $O(\mathrm{nnz})$ vs dense $O(mn)$
- Sparse memory is $O(\mathrm{nnz})$ vs dense $O(mn)$
Sparse format overhead means dense is faster when the density $\mathrm{nnz}/(mn)$ exceeds some constant $c$ depending on the format and hardware.
Sparse formats trade computational simplicity for memory efficiency. The indirection through index arrays causes cache misses, so sparse operations have higher per-element cost. The crossover depends on the sparsity pattern and the specific operation.
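A rough benchmark sketch of the crossover (the densities are arbitrary choices and the timings are illustrative, not definitive):
import time
import numpy as np
from scipy.sparse import random

n = 2000
x = np.random.randn(n)
for density in [0.001, 0.01, 0.1, 0.5]:
    A = random(n, n, density=density, format='csr')
    Ad = A.toarray()                      # dense copy for comparison
    t0 = time.perf_counter()
    for _ in range(50):
        A @ x
    ts = (time.perf_counter() - t0) / 50
    t0 = time.perf_counter()
    for _ in range(50):
        Ad @ x
    td = (time.perf_counter() - t0) / 50
    print(f"density {density:5.3f}: sparse {ts*1e3:.3f} ms, dense {td*1e3:.3f} ms")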
Example: 1D Laplacian as a Sparse Tridiagonal Matrix
Construct the second-difference matrix $L$ with $-2$ on the diagonal and $1$ on the sub/super-diagonals. Solve the Poisson equation $-Lu = f$ for $u$ using a sparse solver.
Construct sparse Laplacian
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve
n = 10_000
L = diags([1, -2, 1], [-1, 0, 1], shape=(n, n), format='csr')
Solve the system
f = np.ones(n)
u = spsolve(-L, f)
print(f"Solution shape: {u.shape}")
print(f"Max value: {u.max():.4f}")
# For uniform f, solution is a parabola
Memory comparison
dense_mb = n * n * 8 / 1e6
sparse_mb = (L.data.nbytes + L.indices.nbytes + L.indptr.nbytes) / 1e6
print(f"Dense: {dense_mb:.0f} MB")
print(f"Sparse: {sparse_mb:.3f} MB")
print(f"Ratio: {dense_mb / sparse_mb:.0f}x")
Example: Finding a Few Eigenvalues of a Large Sparse Matrix
Find the 5 smallest eigenvalues of a sparse 5000 × 5000
Laplacian matrix using scipy.sparse.linalg.eigsh.
Sparse eigensolver
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh
n = 5000
L = diags([1, -2, 1], [-1, 0, 1], shape=(n, n), format='csr')
L = -L # flip the sign: -L is symmetric positive definite
# Find 5 smallest eigenvalues
eigenvalues, eigenvectors = eigsh(L, k=5, which='SM')
print("Smallest eigenvalues:")
for i, lam in enumerate(eigenvalues):
    print(f"  lambda_{i} = {lam:.6f}")
Compare with analytical solution
The eigenvalues of the discrete Laplacian are $\lambda_k = 4\sin^2\!\left(\frac{k\pi}{2(n+1)}\right)$ for $k = 1, \dots, n$. The smallest eigenvalues approach $(k\pi/(n+1))^2$ as $n \to \infty$.
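A verification sketch comparing eigsh output against this formula (using shift-invert as discussed above):
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

n = 5000
L = -diags([1, -2, 1], [-1, 0, 1], shape=(n, n), format='csr')
vals, _ = eigsh(L, k=5, sigma=0, which='LM')           # shift-invert
k = np.arange(1, 6)
analytic = 4 * np.sin(k * np.pi / (2 * (n + 1)))**2
print(np.max(np.abs(np.sort(vals) - analytic)))        # near machine precision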
Example: Converting Between Sparse Formats
Build a sparse matrix in COO format, convert to CSR and CSC, and benchmark matrix-vector products in each format.
Build and convert
import numpy as np
from scipy.sparse import random
import time
n = 5000
A_coo = random(n, n, density=0.01, format='coo')
A_csr = A_coo.tocsr()
A_csc = A_coo.tocsc()
x = np.random.randn(n)
for name, A in [('COO', A_coo), ('CSR', A_csr), ('CSC', A_csc)]:
    t0 = time.perf_counter()
    for _ in range(100):
        y = A @ x
    elapsed = (time.perf_counter() - t0) / 100
    print(f"{name}: {elapsed*1000:.2f} ms per matvec")
Result
CSR is typically fastest for matrix-vector products because it streams through each row's values and column indices contiguously. COO is usually slowest: its entries carry no row grouping, so the product scatters contributions into the output vector in whatever order the entries happen to be stored, with poor cache locality.
Sparse Format Performance Comparison
Compare memory usage and matvec speed of COO, CSR, and CSC formats as a function of matrix size and density.
[Interactive figure: Sparse Storage Formats Compared, with adjustable matrix size and density parameters]
sparse matrix
A matrix where the vast majority of entries are zero. Stored in compressed formats (COO, CSR, CSC) that use $O(\mathrm{nnz})$ memory instead of $O(mn)$.
Related: CSR (Compressed Sparse Row)
CSR (Compressed Sparse Row)
A sparse matrix format using three arrays: data (nonzero values), indices (column indices), and indptr (row pointers). Efficient for row slicing and matrix-vector products.
Related: sparse matrix
Common Mistake: Converting Sparse to Dense Unnecessarily
Mistake:
Calling .toarray() or .todense() on a large sparse matrix
to use dense NumPy operations, defeating the purpose of sparse storage.
Correction:
Use scipy.sparse.linalg functions (spsolve, eigsh, svds,
gmres, cg) that operate directly on sparse matrices. If you
must convert, first check that the dense matrix fits in memory:
n * m * 8 / 1e9 gives the size in GB.
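A defensive pattern, as a sketch (the helper name and threshold are hypothetical):
def to_dense_checked(A, limit_gb=1.0):
    """Hypothetical guard: densify only if the result fits in limit_gb."""
    size_gb = A.shape[0] * A.shape[1] * 8 / 1e9   # float64 dense size
    if size_gb > limit_gb:
        raise MemoryError(f"dense copy would need {size_gb:.1f} GB")
    return A.toarray()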
Common Mistake: Doing Arithmetic in COO Format
Mistake:
Performing repeated matrix-vector products or element access on a COO matrix, which requires sorting and is much slower than CSR.
Correction:
Build in COO (easy to append entries), then convert to CSR or CSC
for computation: A_csr = A_coo.tocsr(). The conversion is $O(\mathrm{nnz})$.
Key Takeaway
Build sparse in COO, compute in CSR. Use scipy.sparse.diags,
eye, and block_diag for structured matrices. Use spsolve and
eigsh instead of converting to dense. A sparse
matrix with 1% density stores roughly 100x less data than its dense counterpart (index arrays add modest overhead).
Sparse Matrix Operations
# Code from: ch06/python/sparse_matrices.py

Sparse Eigenproblems
# Code from: ch06/python/sparse_eigenproblems.py