Sparse Matrices with scipy.sparse

When Most of Your Matrix Is Zero

Many real-world matrices are sparse: the vast majority of entries are zero. A 10,000 × 10,000 dense matrix requires 800 MB of memory (double precision), but if only 0.01% of entries are nonzero, a sparse format stores just 80 KB of data plus indexing overhead.

Sparse matrices arise in finite element methods, graph Laplacians, network adjacency matrices, and large-scale channel models in massive MIMO where the propagation matrix has a sparse structure due to limited scattering.

Definition:

Sparse Matrix

A matrix is called sparse when the number of nonzero entries nnz is much smaller than the total number of entries mn. The sparsity is:

sparsity = 1 − nnz / (mn)

Sparse storage formats exploit this by storing only the nonzero entries and their positions, reducing both memory and computation from O(mn) to O(nnz).

Definition:

COO (Coordinate) Format

The COO (coordinate list) format stores three arrays:

  • row: row indices of nonzero entries
  • col: column indices of nonzero entries
  • data: values of nonzero entries
from scipy.sparse import coo_matrix

row = [0, 0, 1, 2, 2]
col = [0, 2, 1, 0, 2]
data = [1.0, 3.0, 2.0, 4.0, 5.0]
A = coo_matrix((data, (row, col)), shape=(3, 3))

COO is ideal for construction (easy to add entries) but inefficient for arithmetic and slicing. Convert to CSR or CSC for computation.
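One COO behavior worth knowing: duplicate (row, col) pairs are legal, and their values are summed when the matrix is converted or densified, which is exactly what finite-element assembly needs. A small sketch:

```python
from scipy.sparse import coo_matrix

# Two entries share position (0, 2); COO keeps both and sums them
# on conversion to CSR/CSC or on densification.
row  = [0, 0, 1]
col  = [2, 2, 1]
data = [1.0, 3.0, 2.0]
A = coo_matrix((data, (row, col)), shape=(2, 3))

dense = A.toarray()
print(dense[0, 2])   # 4.0: the duplicate (0, 2) entries were summed
```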

Definition:

CSR (Compressed Sparse Row) Format

The CSR format stores:

  • data: nonzero values (length nnz)
  • indices: column indices of nonzeros (length nnz)
  • indptr: row pointer array (length m + 1), where row i has its nonzeros at data[indptr[i]:indptr[i+1]]
from scipy.sparse import csr_matrix

A_csr = csr_matrix(A)   # convert the COO matrix A from above
A_csr = csr_matrix((data, indices, indptr), shape=(m, n))   # or build directly from CSR arrays

CSR is optimal for:

  • Row slicing: A[i, :]
  • Matrix-vector products: A @ x
  • Sparse solvers (SuperLU and ARPACK, used by scipy.sparse.linalg, expect CSR or CSC)
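To make the indptr convention concrete, here is what CSR stores for the 3 × 3 matrix from the COO example above (a small inspection sketch):

```python
from scipy.sparse import coo_matrix

row  = [0, 0, 1, 2, 2]
col  = [0, 2, 1, 0, 2]
data = [1.0, 3.0, 2.0, 4.0, 5.0]
A = coo_matrix((data, (row, col)), shape=(3, 3)).tocsr()

print(A.data)      # [1. 3. 2. 4. 5.]
print(A.indices)   # [0 2 1 0 2]
print(A.indptr)    # [0 2 3 5]: row 0 owns data[0:2], row 1 data[2:3], row 2 data[3:5]

# Extract the nonzeros of row i directly from the arrays:
i = 2
row_vals = A.data[A.indptr[i]:A.indptr[i + 1]]   # [4. 5.]
```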

Definition:

CSC (Compressed Sparse Column) Format

The CSC format is the column-oriented analog of CSR:

  • data: nonzero values
  • indices: row indices of nonzeros
  • indptr: column pointer array
from scipy.sparse import csc_matrix
A_csc = csc_matrix(A)   # convert the COO matrix A from above

CSC is optimal for column slicing (A[:, j]) and is the format used internally by many direct sparse solvers (e.g., SuperLU, CHOLMOD).

Definition:

Sparse Construction Functions

SciPy provides efficient constructors for common sparse patterns:

from scipy.sparse import eye, diags, block_diag, kron, random

I = eye(n, format='csr')                     # sparse identity
D = diags([1, -2, 1], [-1, 0, 1], shape=(n, n))  # tridiagonal
B = block_diag([A1, A2, A3])                 # block diagonal
K = kron(A, B, format='csr')                 # Kronecker product
S = random(1000, 1000, density=0.01)         # random sparse

The diags function is especially useful for constructing banded matrices like the finite-difference Laplacian.
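As a sketch of that use, diags builds the 1D second-difference matrix in one call; the scalars in the first list are broadcast along the diagonals named by offsets:

```python
from scipy.sparse import diags

n = 5
# Tridiagonal second-difference matrix: -2 on the main diagonal,
# 1 on the sub- and super-diagonals.
L = diags([1, -2, 1], offsets=[-1, 0, 1], shape=(n, n), format='csr')
print(L.toarray())
```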

Theorem: Sparse vs Dense: When Sparse Wins

For an n × n matrix A with nnz nonzero entries:

  • Sparse matrix-vector product costs O(nnz) vs O(n²) dense
  • Sparse memory is O(nnz) vs O(n²) dense

Sparse format overhead means dense is faster when nnz > c · n² for some constant c ≈ 0.1–0.3, depending on the format and hardware.

Sparse formats trade computational simplicity for memory efficiency. The indirection through index arrays causes cache misses, so sparse operations have higher per-element cost. The crossover depends on the sparsity pattern and the specific operation.

Example: 1D Laplacian as a Sparse Tridiagonal Matrix

Construct the n × n second-difference matrix L with −2 on the diagonal and 1 on the sub/super-diagonals. Solve the Poisson equation −Lu = f for n = 10,000 using a sparse solver.
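A minimal sketch of this example, with f chosen arbitrarily as the all-ones vector (the exercise leaves f unspecified):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

n = 10_000
# Second-difference matrix: -2 on the diagonal, 1 on the off-diagonals.
L = diags([1.0, -2.0, 1.0], offsets=[-1, 0, 1], shape=(n, n), format='csc')

f = np.ones(n)        # right-hand side (an assumption for illustration)
u = spsolve(-L, f)    # direct sparse solve of -L u = f

# Sanity check: the residual should be tiny relative to ||f||.
print(np.linalg.norm(-L @ u - f))
```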

Example: Finding a Few Eigenvalues of a Large Sparse Matrix

Find the 5 smallest eigenvalues of a 5000 × 5000 sparse Laplacian matrix using scipy.sparse.linalg.eigsh.
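A sketch of this example using shift-invert mode, which converges much faster than which='SM' for the smallest eigenvalues of a positive definite matrix; the test matrix is the 1D Laplacian, whose eigenvalues are known in closed form:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

n = 5000
# Positive definite 1D Laplacian: 2 on the diagonal, -1 off-diagonal.
A = -diags([1.0, -2.0, 1.0], offsets=[-1, 0, 1], shape=(n, n), format='csc')

# sigma=0 shift-invert targets eigenvalues nearest 0, i.e. the smallest ones.
vals = np.sort(eigsh(A, k=5, sigma=0, which='LM', return_eigenvectors=False))

# Analytic eigenvalues: 4 sin^2(k*pi / (2*(n+1))), k = 1..n
k = np.arange(1, 6)
exact = 4 * np.sin(k * np.pi / (2 * (n + 1)))**2
print(vals)
```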

Example: Converting Between Sparse Formats

Build a sparse matrix in COO format, convert to CSR and CSC, and benchmark matrix-vector products in each format.
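A minimal benchmarking sketch along these lines (timings vary by machine; the size and repetition count here are assumptions, not the chapter's parameters):

```python
import time
import numpy as np
from scipy.sparse import random as sparse_random

rng = np.random.default_rng(0)
A_coo = sparse_random(2000, 2000, density=0.01, format='coo', random_state=rng)
A_csr = A_coo.tocsr()
A_csc = A_coo.tocsc()
x = rng.standard_normal(2000)

for name, M in [('COO', A_coo), ('CSR', A_csr), ('CSC', A_csc)]:
    t0 = time.perf_counter()
    for _ in range(100):
        y = M @ x
    print(f'{name}: {(time.perf_counter() - t0) * 1e3:.1f} ms / 100 matvecs')

# All three formats compute the same product.
assert np.allclose(A_coo @ x, A_csr @ x)
```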

Sparse Format Performance Comparison

Compare memory usage and matvec speed of COO, CSR, and CSC formats as a function of matrix size and density.

Parameters: matrix size n = 5000, density = 0.01

Sparse Storage Formats Compared


sparse matrix

A matrix where the vast majority of entries are zero. Stored in compressed formats (COO, CSR, CSC) that use O(nnz) memory instead of O(mn).

Related: CSR (Compressed Sparse Row)

CSR (Compressed Sparse Row)

A sparse matrix format using three arrays: data (nonzero values), indices (column indices), and indptr (row pointers). Efficient for row slicing and matrix-vector products.

Related: sparse matrix

Common Mistake: Converting Sparse to Dense Unnecessarily

Mistake:

Calling .toarray() or .todense() on a large sparse matrix to use dense NumPy operations, defeating the purpose of sparse storage.

Correction:

Use scipy.sparse.linalg functions (spsolve, eigsh, svds, gmres, cg) that operate directly on sparse matrices. If you must convert, first check that the dense matrix fits in memory: n * m * 8 / 1e9 gives the size in GB.
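A tiny helper (hypothetical name, assuming float64) that applies this rule of thumb before converting:

```python
def dense_size_gb(shape, itemsize=8):
    """Memory a dense array of this shape would need, in GB (float64 by default)."""
    m, n = shape
    return m * n * itemsize / 1e9

print(dense_size_gb((10_000, 10_000)))    # 0.8 GB -- fits on most machines
print(dense_size_gb((200_000, 200_000)))  # 320.0 GB -- do not call .toarray()
```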

Common Mistake: Doing Arithmetic in COO Format

Mistake:

Performing repeated matrix-vector products or element access on a COO matrix, which requires sorting and is much slower than CSR.

Correction:

Build in COO (easy to append entries), then convert to CSR or CSC for computation: A_csr = A_coo.tocsr(). The conversion is O(nnz).

Key Takeaway

Build sparse in COO, compute in CSR. Use scipy.sparse.diags, eye, and block_diag for structured matrices. Use spsolve and eigsh instead of converting to dense. A 10,000 × 10,000 sparse matrix with 1% density uses roughly 65x less memory than dense (about 12 MB in CSR vs 800 MB).

Sparse Matrix Operations

python
CSR/CSC/COO construction, format conversion, sparse solvers, and sparse eigenproblems with performance benchmarks.
# Code from: ch06/python/sparse_matrices.py
# Load from backend supplements endpoint

Sparse Eigenproblems

python
Using eigsh and svds for large sparse problems, shift-invert mode, and preconditioned iterative solvers.
# Code from: ch06/python/sparse_eigenproblems.py
# Load from backend supplements endpoint