Least Squares and Pseudoinverse

When There Is No Exact Solution

Most real-world systems are overdetermined ($m > n$): more equations than unknowns. The channel estimation equation $\mathbf{Y} = \mathbf{X}\mathbf{h} + \mathbf{n}$ typically has more pilot observations than channel taps. Least squares finds the $\mathbf{h}$ that minimizes $\|\mathbf{Y} - \mathbf{X}\mathbf{h}\|_2^2$; this criterion is the foundation of estimation theory in signal processing.

Definition:

Ordinary Least Squares (OLS)

Given an overdetermined system $\mathbf{A}\mathbf{x} \approx \mathbf{b}$ with $\mathbf{A} \in \mathbb{C}^{m \times n}$, $m > n$, the least-squares solution minimizes the residual:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2$$

The solution satisfies the normal equations: $\mathbf{A}^H\mathbf{A}\hat{\mathbf{x}} = \mathbf{A}^H\mathbf{b}$

x_hat, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

lstsq returns:

  • x_hat: the least-squares solution
  • residuals: $\|\mathbf{A}\hat{\mathbf{x}} - \mathbf{b}\|_2^2$ (returned only when $m > n$ and $\mathbf{A}$ has full rank)
  • rank: effective rank of $\mathbf{A}$
  • sv: singular values of $\mathbf{A}$
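
A minimal sketch of these return values on a small synthetic system (the dimensions and noise level here are illustrative):

import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 5                                   # overdetermined: more rows than columns
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 0.01 * rng.standard_normal(m)   # noisy observations

x_hat, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x_hat - x_true)    # small estimation error
print(residuals)         # squared residual norm, shape (1,) here since rank == n
print(rank, sv)          # effective rank and singular values of A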

Definition:

Moore-Penrose Pseudoinverse

The Moore-Penrose pseudoinverse $\mathbf{A}^\dagger$ generalizes the inverse to rectangular and rank-deficient matrices. If $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ is the SVD, then:

$$\mathbf{A}^\dagger = \mathbf{V}\boldsymbol{\Sigma}^\dagger\mathbf{U}^H$$

where $\Sigma^\dagger_{ii} = 1/\sigma_i$ for $\sigma_i > 0$ and $0$ otherwise.

A_pinv = np.linalg.pinv(A)
x_hat = A_pinv @ b   # equivalent to lstsq

For full-rank $\mathbf{A}$:

  • $m > n$: $\mathbf{A}^\dagger = (\mathbf{A}^H\mathbf{A})^{-1}\mathbf{A}^H$ (left inverse)
  • $m < n$: $\mathbf{A}^\dagger = \mathbf{A}^H(\mathbf{A}\mathbf{A}^H)^{-1}$ (right inverse)
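
A short numerical check of both the SVD construction and the left-inverse identity, assuming a full-column-rank matrix (the dimensions are illustrative):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))   # m > n, full column rank

# Pseudoinverse built directly from the SVD definition
U, s, Vh = np.linalg.svd(A, full_matrices=False)
A_pinv_svd = Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T

# Left-inverse formula, valid for full column rank
A_pinv_left = np.linalg.solve(A.conj().T @ A, A.conj().T)

print(np.allclose(A_pinv_svd, np.linalg.pinv(A)))    # True
print(np.allclose(A_pinv_left, np.linalg.pinv(A)))   # True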

Definition:

Tikhonov (Ridge) Regularization

When $\mathbf{A}$ is ill-conditioned, OLS amplifies noise. Tikhonov regularization adds a penalty on the solution norm:

$$\hat{\mathbf{x}}_{\mathrm{Tik}} = \arg\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2 + \alpha\|\mathbf{x}\|_2^2$$

The closed-form solution is: $\hat{\mathbf{x}}_{\mathrm{Tik}} = (\mathbf{A}^H\mathbf{A} + \alpha\mathbf{I})^{-1}\mathbf{A}^H\mathbf{b}$

alpha = 0.1
x_tik = np.linalg.solve(A.conj().T @ A + alpha * np.eye(n),
                         A.conj().T @ b)

This is exactly the MMSE estimator in wireless communications: with $\alpha = \sigma_n^2$ (assuming unit-variance, uncorrelated channel taps), Tikhonov regularization gives the minimum mean-squared error (MMSE) estimate.

In machine learning, this is called ridge regression. The parameter $\alpha$ trades bias for variance: a larger $\alpha$ shrinks the estimate (more bias) but suppresses noise amplification (less variance).
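
One standard, numerically friendlier way to compute the same solution is to recast ridge as an ordinary least-squares problem on an augmented system. A sketch of that equivalence (the value of alpha and the problem size are illustrative):

import numpy as np

rng = np.random.default_rng(2)
m, n = 30, 8
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
alpha = 0.1

# ||Ax - b||^2 + alpha*||x||^2 equals the OLS objective for the stacked system below
A_aug = np.vstack([A, np.sqrt(alpha) * np.eye(n)])
b_aug = np.concatenate([b, np.zeros(n)])
x_aug, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

# Closed form from the definition above
x_tik = np.linalg.solve(A.conj().T @ A + alpha * np.eye(n), A.conj().T @ b)
print(np.allclose(x_aug, x_tik))   # True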

Definition:

QR Factorization for Least Squares

The QR factorization $\mathbf{A} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ has orthonormal columns and $\mathbf{R}$ is upper triangular, provides a numerically stable way to solve least squares:

$$\hat{\mathbf{x}} = \mathbf{R}^{-1}\mathbf{Q}^H\mathbf{b}$$

from scipy.linalg import qr, solve_triangular

Q, R = qr(A, mode='economic')   # reduced QR
x_qr = solve_triangular(R, Q.conj().T @ b)

QR is more numerically stable than the normal equations $(\mathbf{A}^H\mathbf{A})\mathbf{x} = \mathbf{A}^H\mathbf{b}$ because it avoids forming $\mathbf{A}^H\mathbf{A}$ (which squares the condition number).

Definition:

Total Least Squares (TLS)

When both $\mathbf{A}$ and $\mathbf{b}$ are noisy, ordinary least squares is biased. Total least squares finds the smallest joint perturbation that makes the system consistent:

$$\min_{\Delta\mathbf{A}, \Delta\mathbf{b}} \|[\Delta\mathbf{A} \mid \Delta\mathbf{b}]\|_F \quad \text{s.t.} \quad (\mathbf{A} + \Delta\mathbf{A})\mathbf{x} = \mathbf{b} + \Delta\mathbf{b}$$

The solution uses the SVD of the augmented matrix $[\mathbf{A} \mid \mathbf{b}]$:

C = np.column_stack([A, b])                        # augmented matrix [A | b], shape (m, n+1)
U, s, Vh = np.linalg.svd(C, full_matrices=False)
V = Vh.conj().T
x_tls = -V[:n, -1] / V[-1, -1]                     # right singular vector of the smallest singular value

Theorem: Normal Equations and the Least-Squares Solution

The least-squares solution $\hat{\mathbf{x}}$ of $\mathbf{A}\mathbf{x} \approx \mathbf{b}$ satisfies: $\mathbf{A}^H\mathbf{A}\hat{\mathbf{x}} = \mathbf{A}^H\mathbf{b}$

When $\mathbf{A}$ has full column rank ($\mathrm{rank}(\mathbf{A}) = n$), the solution is unique: $\hat{\mathbf{x}} = (\mathbf{A}^H\mathbf{A})^{-1}\mathbf{A}^H\mathbf{b} = \mathbf{A}^\dagger\mathbf{b}$
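
A quick numerical check of the theorem: the normal equations are equivalent to the residual being orthogonal to the columns of $\mathbf{A}$, so $\mathbf{A}^H(\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}) \approx \mathbf{0}$ (problem sizes are illustrative):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((12, 4))
b = rng.standard_normal(12)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
residual = b - A @ x_hat

print(np.allclose(A.conj().T @ residual, 0))       # residual orthogonal to the column space of A
print(np.allclose(x_hat, np.linalg.pinv(A) @ b))   # unique solution for full column rank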

Theorem: QR Is More Stable Than Normal Equations

Solving least squares via the normal equations squares the condition number: $\kappa(\mathbf{A}^H\mathbf{A}) = \kappa(\mathbf{A})^2$. The QR-based approach operates at $\kappa(\mathbf{A})$, preserving roughly twice as many digits of accuracy.

Forming $\mathbf{A}^H\mathbf{A}$ explicitly loses information through rounding and cancellation. QR factorization avoids this by applying orthogonal transformations directly to $\mathbf{A}$, which do not amplify rounding errors.
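
A small check of the condition-number claim (the Vandermonde test matrix is illustrative):

import numpy as np

A = np.vander(np.linspace(0, 1, 20), 6)    # mildly ill-conditioned Vandermonde matrix
print(np.linalg.cond(A))                    # kappa(A)
print(np.linalg.cond(A.conj().T @ A))       # approximately kappa(A)**2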

Example: Least-Squares Channel Estimation

Estimate a wireless channel with $L = 5$ taps from $M = 20$ pilot observations using least squares. Compare OLS with Tikhonov regularization at different noise levels.
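
A sketch of one way to set up this experiment; the BPSK pilot sequence, noise level, and use of $\sigma_n^2$ as the regularization parameter are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(4)
L, M = 5, 20                                   # channel taps, pilot observations
h_true = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)

# Pilot convolution matrix X (M x L): each row holds L consecutive pilot samples
pilots = rng.choice([-1.0, 1.0], size=M + L - 1)              # BPSK pilots (illustrative)
X = np.array([pilots[i:i + L][::-1] for i in range(M)]).astype(complex)

sigma_n = 0.3
noise = sigma_n / np.sqrt(2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = X @ h_true + noise

# OLS (zero-forcing) vs Tikhonov (MMSE-style) estimates
h_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha = sigma_n**2
h_tik = np.linalg.solve(X.conj().T @ X + alpha * np.eye(L), X.conj().T @ y)

print(np.linalg.norm(h_ols - h_true))   # OLS error
print(np.linalg.norm(h_tik - h_true))   # regularized error, typically smaller at low SNR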

Example: Least Squares via QR Factorization

Solve an overdetermined system using QR factorization and compare the accuracy with the normal equations approach for an ill-conditioned matrix.
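
A sketch of one such comparison, using a Vandermonde matrix as the ill-conditioned example (the sizes and nodes are illustrative):

import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(5)
m, n = 50, 10
A = np.vander(np.linspace(0, 1, m), n)     # ill-conditioned Vandermonde matrix
x_true = rng.standard_normal(n)
b = A @ x_true

# Normal equations: effective condition number is kappa(A)**2
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# QR: works at kappa(A)
Q, R = qr(A, mode='economic')
x_qr = solve_triangular(R, Q.T @ b)

print(np.linalg.norm(x_ne - x_true) / np.linalg.norm(x_true))   # noticeably larger error
print(np.linalg.norm(x_qr - x_true) / np.linalg.norm(x_true))   # much smaller error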

Example: Total Least Squares for Errors-in-Variables

Compare OLS and TLS when both the matrix $\mathbf{A}$ and the observation vector $\mathbf{b}$ are noisy.
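
A sketch of this comparison; the noise levels, problem size, and true coefficients are illustrative:

import numpy as np

rng = np.random.default_rng(6)
m, n = 500, 3
A_clean = rng.standard_normal((m, n))
x_true = np.array([1.0, -2.0, 0.5])
b_clean = A_clean @ x_true

sigma = 0.5
A_noisy = A_clean + sigma * rng.standard_normal((m, n))   # errors in the regressors
b_noisy = b_clean + sigma * rng.standard_normal(m)

# OLS is biased (attenuated) when A itself is noisy
x_ols, *_ = np.linalg.lstsq(A_noisy, b_noisy, rcond=None)

# TLS via the SVD of the augmented matrix [A | b]
C = np.column_stack([A_noisy, b_noisy])
U, s, Vh = np.linalg.svd(C, full_matrices=False)
V = Vh.conj().T
x_tls = -V[:n, -1] / V[-1, -1]

print(np.linalg.norm(x_ols - x_true))   # typically shows the attenuation bias
print(np.linalg.norm(x_tls - x_true))   # typically closer to x_true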

Least Squares: OLS vs Tikhonov Regularization

Compare ordinary least squares with Tikhonov regularization. Adjust the noise level and regularization parameter to see how they affect the estimation error.


Least Squares Methods Compared

| Method | Solves | Stability | Cost (flops) | When to Use |
|---|---|---|---|---|
| Normal Equations | $\mathbf{A}^H\mathbf{A}\mathbf{x} = \mathbf{A}^H\mathbf{b}$ | Squares $\kappa$ | $mn^2 + n^3/3$ | Well-conditioned, small $n$ |
| QR Factorization | $\mathbf{R}\mathbf{x} = \mathbf{Q}^H\mathbf{b}$ | Preserves $\kappa$ | $2mn^2$ | Default choice for dense systems |
| SVD (lstsq) | $\hat{\mathbf{x}} = \mathbf{V}\boldsymbol{\Sigma}^\dagger\mathbf{U}^H\mathbf{b}$ | Best stability | $2mn^2 + 11n^3$ | Rank-deficient systems |
| Tikhonov | $(\mathbf{A}^H\mathbf{A} + \alpha\mathbf{I})\mathbf{x} = \mathbf{A}^H\mathbf{b}$ | Regularized | $mn^2 + n^3/3$ | Ill-conditioned, noisy |
| Total LS | SVD of $[\mathbf{A} \mid \mathbf{b}]$ | Best when both noisy | $2m(n+1)^2$ | Errors in variables |

least squares

The problem of finding $\mathbf{x}$ that minimizes $\|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2$. The solution is $\hat{\mathbf{x}} = \mathbf{A}^\dagger\mathbf{b}$.

Related: Moore-Penrose pseudoinverse

Moore-Penrose pseudoinverse

The unique matrix $\mathbf{A}^\dagger$ satisfying the four Penrose conditions. Generalizes the inverse to rectangular and rank-deficient matrices. Computed via the SVD; see the numerical check after this entry.

Related: least squares
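
As a numerical illustration, the four Penrose conditions can be checked directly with np.linalg.pinv (the test matrix is arbitrary):

import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))
Ap = np.linalg.pinv(A)

# The four Penrose conditions that uniquely determine the pseudoinverse
print(np.allclose(A @ Ap @ A, A))                 # 1) A A+ A = A
print(np.allclose(Ap @ A @ Ap, Ap))               # 2) A+ A A+ = A+
print(np.allclose((A @ Ap).conj().T, A @ Ap))     # 3) A A+ is Hermitian
print(np.allclose((Ap @ A).conj().T, Ap @ A))     # 4) A+ A is Hermitian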

Common Mistake: Normal Equations for Ill-Conditioned Systems

Mistake:

Solving least squares via x = np.linalg.solve(A.T @ A, A.T @ b) for an ill-conditioned matrix, which squares the condition number.

Correction:

Use np.linalg.lstsq(A, b) or QR factorization. The lstsq function uses SVD internally and handles rank deficiency gracefully.

Quick Check

In wireless communications, Tikhonov regularization with $\alpha = \sigma_n^2$ corresponds to which estimator?

MMSE (Minimum Mean Squared Error)

Zero-Forcing (ZF)

Maximum Likelihood (ML)

Matched Filter

Key Takeaway

Use np.linalg.lstsq as your default solver for overdetermined systems. For ill-conditioned problems, add Tikhonov regularization. For errors-in-variables problems, use total least squares. Never form the normal equations $\mathbf{A}^H\mathbf{A}$ explicitly unless the system is well-conditioned and speed is critical.

Why This Matters: MMSE Channel Estimation Is Tikhonov Regularization

The MMSE channel estimate is: $\hat{\mathbf{h}}_{\mathrm{MMSE}} = (\mathbf{X}^H\mathbf{X} + \sigma_n^2\mathbf{I})^{-1}\mathbf{X}^H\mathbf{y}$

This is exactly Tikhonov regularization with $\alpha = \sigma_n^2$. At high SNR ($\sigma_n^2 \to 0$), the MMSE estimate approaches the zero-forcing (OLS) solution. At low SNR, the regularization prevents noise amplification, giving superior estimation accuracy.

Least Squares Methods

Supplementary Python code: ordinary least squares, Tikhonov regularization, QR factorization, total least squares, and comparisons with wireless examples (ch06/python/least_squares.py).