Chapter Summary

Key Points

  1. Always solve, never invert. Use np.linalg.solve(A, b) instead of np.linalg.inv(A) @ b. The solve approach is 3x faster, uses half the memory, and produces smaller numerical errors. Use slogdet for determinants of large matrices to avoid overflow.
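A minimal sketch of the solve-then-slogdet pattern; the random, well-conditioned system here is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
b = rng.standard_normal(500)

x = np.linalg.solve(A, b)        # one LU factorization + two triangular solves
# x_bad = np.linalg.inv(A) @ b   # avoid: explicitly forms the full inverse

# log|det A| without the overflow/underflow that np.linalg.det risks at this size
sign, logabsdet = np.linalg.slogdet(A)
```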

  2. Match the decomposition to the matrix structure. Use eigh for Hermitian matrices (covariance, correlation, Gram matrices) — it is 3x faster than eig and guarantees real eigenvalues. Use SVD for any matrix when you need rank, condition number, pseudoinverse, or low-rank approximation. The Eckart-Young theorem guarantees SVD gives the optimal low-rank approximation.
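A short sketch of both choices, using a synthetic data matrix; the shapes and rank k are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
C = (X.T @ X) / 200                 # Gram/covariance-style symmetric matrix

# eigh exploits symmetry: real eigenvalues, returned in ascending order
w, V = np.linalg.eigh(C)

# SVD gives rank, condition number, and the Eckart-Young optimal rank-k approximation
U, s, Vt = np.linalg.svd(X, full_matrices=False)
cond = s[0] / s[-1]
k = 10
X_k = (U[:, :k] * s[:k]) @ Vt[:k]   # best rank-k approximation in Frobenius norm
```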

  3. Build sparse in COO, compute in CSR. For matrices where most entries are zero, sparse formats reduce memory from O(n^2) to O(nnz). Use scipy.sparse.diags for banded matrices, spsolve for sparse linear systems, and eigsh/svds for finding a few eigenvalues of large sparse matrices. Never convert sparse to dense just to use NumPy functions.
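The workflow above can be sketched as follows; the 1D discrete Laplacian is a stand-in example of a banded matrix:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve, eigsh

# Assemble in COO (cheap to build from triplets) ...
rows, cols, vals = [0, 1, 2], [1, 2, 0], [1.0, 2.0, 3.0]
M = sp.coo_matrix((vals, (rows, cols)), shape=(3, 3)).tocsr()  # ... compute in CSR

# Banded matrix via diags: tridiagonal 1D Laplacian, stored sparsely
n = 1000
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = spsolve(A, b)                   # sparse direct solve — never densify

# A few extreme eigenvalues of a large sparse symmetric matrix
w = eigsh(A, k=3, which="LM", return_eigenvectors=False)
```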

  4. Never form the full Kronecker product. The identity (A ⊗ B) vec(X) = vec(B X A^T) converts an O(n^4) operation into two O(n^3) matrix multiplies. This pattern appears in the Kronecker MIMO channel model, 2D filtering, and multidimensional transforms.
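The identity can be verified directly on small random matrices (sizes here are arbitrary); note that vec is column-major stacking, hence order="F":

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))

y_naive = np.kron(A, B) @ X.flatten(order="F")   # O(n^4): forms an n^2 x n^2 matrix
y_fast = (B @ X @ A.T).flatten(order="F")        # two O(n^3) multiplies, same result
```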

  5. Exploit matrix structure for speed. Toeplitz matrices (convolution) can be applied in O(n log n) via FFT. Circulant matrices are diagonalized by the DFT — the mathematical foundation of OFDM. Use scipy.linalg.expm (not np.exp) for the matrix exponential. Distinguish element-wise operations from matrix functions.
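A sketch of both ideas with illustrative inputs: a circulant matrix-vector product computed through the FFT, and the matrix exponential contrasted with element-wise exp:

```python
import numpy as np
from scipy.linalg import circulant, expm

rng = np.random.default_rng(3)
n = 64
c = rng.standard_normal(n)          # first column of the circulant matrix
x = rng.standard_normal(n)

# Circulant matrices are diagonalized by the DFT, so C @ x costs O(n log n)
y_direct = circulant(c) @ x
y_fft = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real

# Matrix exponential vs element-wise exponential: entirely different operations
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
R = expm(A)                          # matrix exponential: a rotation by 1 radian
E = np.exp(A)                        # element-wise e^{a_ij}
```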

  6. Least squares is the bridge between linear algebra and estimation theory. Use np.linalg.lstsq as the default for overdetermined systems. Add Tikhonov regularization (alpha = sigma_n^2) for noisy, ill-conditioned problems — this is exactly the MMSE estimator. Use QR factorization instead of normal equations for numerical stability. Use total least squares when both sides of the equation are noisy.
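A minimal sketch on synthetic data (the noise level and problem size are illustrative): lstsq as the default, then Tikhonov regularization solved stably by QR on a stacked system rather than the normal equations:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((100, 10))
x_true = rng.standard_normal(10)
sigma_n = 0.1
b = A @ x_true + sigma_n * rng.standard_normal(100)

# Default overdetermined solver (SVD-based, handles rank deficiency)
x_ls, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# Tikhonov/ridge: minimize ||Ax - b||^2 + alpha ||x||^2 with alpha = sigma_n^2.
# Stacking sqrt(alpha)*I under A and solving by QR avoids forming A^T A.
alpha = sigma_n ** 2
A_aug = np.vstack([A, np.sqrt(alpha) * np.eye(10)])
b_aug = np.concatenate([b, np.zeros(10)])
Q, R = np.linalg.qr(A_aug)
x_ridge = np.linalg.solve(R, Q.T @ b_aug)
```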

Looking Ahead

Chapter 7 applies these linear algebra tools to optimization: gradient descent, Newton's method, and convex optimization with SciPy's minimize interface. The connection is direct — the Hessian matrix in Newton's method requires solving a linear system at every step, and the condition number of the Hessian determines convergence speed. Regularization from Section 6.6 reappears as penalty terms in regularized optimization.