CuPy Sparse Matrices
cuSPARSE: Sparse Linear Algebra on the GPU
CuPy provides GPU-accelerated sparse matrix operations through
NVIDIA's cuSPARSE library. The cupyx.scipy.sparse module
mirrors SciPy's sparse API, supporting CSR, CSC, COO, and DIA
formats. For large sparse matrices (millions of rows), GPU
sparse matvec can be 5-20x faster than SciPy on CPU.
Definition: Sparse Matrix Formats on GPU
Sparse Matrix Formats on GPU
CuPy supports the standard sparse formats, all backed by cuSPARSE:
import cupy as cp
import cupyx.scipy.sparse as csp
# COO format β good for construction
row = cp.array([0, 1, 2, 0])
col = cp.array([0, 1, 2, 2])
data = cp.array([1.0, 2.0, 3.0, 4.0])
A_coo = csp.coo_matrix((data, (row, col)), shape=(3, 3))
# CSR format β best for matvec and row slicing
A_csr = A_coo.tocsr()
# CSC format β best for column slicing
A_csc = A_coo.tocsc()
The rule of thumb: build in COO, compute in CSR (same as SciPy, but on the GPU).
CuPy sparse matrices store their index and value arrays in GPU
memory. Use csp.csr_matrix(scipy_sparse_matrix) to transfer
a SciPy sparse matrix to the GPU.
Definition: Sparse Matrix-Vector Product on GPU
Sparse Matrix-Vector Product on GPU
The sparse matrix-vector product
for a CSR matrix uses cuSPARSE's csrmv kernel:
x = cp.random.randn(n)
y = A_csr @ x # or A_csr.dot(x)
The operation touches only the nonzero elements, giving complexity instead of for dense.
For sparse matrix-matrix products:
C = A_csr @ B_csr # sparse-sparse multiply
D = A_csr @ X_dense # sparse-dense multiply (SpMM)
Theorem: Sparse vs Dense Crossover on GPU
For a matrix with density , the sparse CSR matvec is faster than dense matvec when:
In practice, sparse GPU matvec wins when -- (1-5% density), depending on matrix size and GPU model. Below this threshold, the sparse format's irregular memory access pattern is outweighed by the reduction in data volume.
Dense matvec has perfect memory access patterns (coalesced reads) but touches elements. Sparse matvec touches only elements but with irregular access. The crossover depends on how much irregularity penalty the GPU incurs.
Example: Sparse Laplacian Matvec on GPU
Build the 2-D Laplacian stencil as a sparse CSR matrix on the GPU and apply it to a random vector. Compare speed against dense matvec.
Construction
import cupy as cp
import cupyx.scipy.sparse as csp
import numpy as np
import scipy.sparse as sp
n = 500 # grid size -> matrix is n^2 x n^2
N = n * n # 250,000
# Build on CPU with SciPy, transfer to GPU
L1d = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
I = sp.eye(n)
L2d = sp.kron(I, L1d) + sp.kron(L1d, I) # 2D Laplacian
# Transfer to GPU
L_gpu = csp.csr_matrix(L2d.tocsr())
x_gpu = cp.random.randn(N)
Benchmark
import time
# Sparse matvec on GPU
cp.cuda.Stream.null.synchronize()
t0 = time.perf_counter()
y_sparse = L_gpu @ x_gpu
cp.cuda.Stream.null.synchronize()
t_sparse = time.perf_counter() - t0
density = L2d.nnz / N**2
print(f"Matrix: {N}x{N}, nnz={L2d.nnz}, density={density:.6f}")
print(f"Sparse GPU matvec: {t_sparse*1e6:.0f} us")
The 2-D Laplacian has density ~, which for gives β far below the sparse crossover, making sparse GPU matvec the clear winner.
Sparse vs Dense GPU Matvec
Compare sparse CSR and dense matrix-vector product performance on GPU across different matrix sizes and densities.
Parameters
Quick Check
You are building a sparse matrix by inserting elements one at a time on the GPU. Which format should you use for construction?
CSR β it is the fastest for computation
COO β it allows O(1) element insertion
Dense β convert to sparse only after filling
COO stores (row, col, value) triples with no ordering requirement, making insertion trivial. Convert to CSR after construction for fast computation.
CSR (Compressed Sparse Row)
Sparse format storing values, column indices, and row pointers. Optimal for sparse matvec and row slicing on both CPU and GPU.
Related: COO (Coordinate Format)
COO (Coordinate Format)
Sparse format storing (row, col, value) triples. Best for construction; convert to CSR for computation.
Related: CSR (Compressed Sparse Row)
CuPy Sparse Matrices β Complete Examples
# Code from: ch11/python/s04_cupy_sparse.py
# Load from backend supplements endpoint