Least Squares and Pseudoinverse

When There Is No Exact Solution

Most real-world systems are overdetermined ($m > n$): more equations than unknowns. The channel estimation equation $\mathbf{Y} = \mathbf{X}\mathbf{h} + \mathbf{n}$ typically has more pilot observations than channel taps. Least squares finds the $\mathbf{h}$ that minimizes $\|\mathbf{Y} - \mathbf{X}\mathbf{h}\|_2^2$; this criterion is the foundation of estimation theory in signal processing.

Definition:

Ordinary Least Squares (OLS)

Given an overdetermined system $\mathbf{A}\mathbf{x} \approx \mathbf{b}$ with $\mathbf{A} \in \mathbb{C}^{m \times n}$, $m > n$, the least-squares solution minimizes the residual:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2$$

The solution satisfies the normal equations: $\mathbf{A}^H\mathbf{A}\hat{\mathbf{x}} = \mathbf{A}^H\mathbf{b}$

x_hat, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

lstsq returns:

  • x_hat: the least-squares solution
  • residuals: $\|\mathbf{A}\hat{\mathbf{x}} - \mathbf{b}\|_2^2$ (returned only when $m > n$ and $\mathbf{A}$ has full rank)
  • rank: effective rank of $\mathbf{A}$
  • sv: singular values of $\mathbf{A}$
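
A minimal sketch of these return values on a small synthetic system (the dimensions and noise level here are illustrative):

import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 5                                   # overdetermined: more rows than columns
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 0.01 * rng.standard_normal(m)   # noisy observations

x_hat, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x_hat - x_true)    # small estimation error
print(residuals)         # squared residual norm, shape (1,) here since rank == n
print(rank, sv)          # effective rank and singular values of A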

Definition:

Moore-Penrose Pseudoinverse

The Moore-Penrose pseudoinverse $\mathbf{A}^\dagger$ generalizes the inverse to rectangular and rank-deficient matrices. If $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$ is the SVD, then:

$$\mathbf{A}^\dagger = \mathbf{V}\boldsymbol{\Sigma}^\dagger\mathbf{U}^H$$

where $\Sigma^\dagger_{ii} = 1/\sigma_i$ for $\sigma_i > 0$ and $0$ otherwise.

A_pinv = np.linalg.pinv(A)
x_hat = A_pinv @ b   # equivalent to lstsq

For full-rank $\mathbf{A}$:

  • $m > n$: $\mathbf{A}^\dagger = (\mathbf{A}^H\mathbf{A})^{-1}\mathbf{A}^H$ (left inverse)
  • $m < n$: $\mathbf{A}^\dagger = \mathbf{A}^H(\mathbf{A}\mathbf{A}^H)^{-1}$ (right inverse)
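
A short numerical check of both the SVD construction and the left-inverse identity, assuming a full-column-rank matrix (the dimensions are illustrative):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))   # m > n, full column rank

# Pseudoinverse built directly from the SVD definition
U, s, Vh = np.linalg.svd(A, full_matrices=False)
A_pinv_svd = Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T

# Left-inverse formula, valid for full column rank
A_pinv_left = np.linalg.solve(A.conj().T @ A, A.conj().T)

print(np.allclose(A_pinv_svd, np.linalg.pinv(A)))    # True
print(np.allclose(A_pinv_left, np.linalg.pinv(A)))   # True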

Definition:

Tikhonov (Ridge) Regularization

When $\mathbf{A}$ is ill-conditioned, OLS amplifies noise. Tikhonov regularization adds a penalty on the solution norm:

$$\hat{\mathbf{x}}_{\mathrm{Tik}} = \arg\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2 + \alpha\|\mathbf{x}\|_2^2$$

The closed-form solution is: $\hat{\mathbf{x}}_{\mathrm{Tik}} = (\mathbf{A}^H\mathbf{A} + \alpha\mathbf{I})^{-1}\mathbf{A}^H\mathbf{b}$

alpha = 0.1
x_tik = np.linalg.solve(A.conj().T @ A + alpha * np.eye(n),
                         A.conj().T @ b)

This is exactly the MMSE estimator in wireless communications: with $\alpha = \sigma_n^2$ (assuming unit-variance, uncorrelated channel taps), Tikhonov regularization gives the minimum mean-squared error (MMSE) estimate.

In machine learning, this is called ridge regression. The parameter $\alpha$ trades bias for variance: a larger $\alpha$ shrinks the estimate (more bias) but suppresses noise amplification (less variance).
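
One standard, numerically friendlier way to compute the same solution is to recast ridge as an ordinary least-squares problem on an augmented system. A sketch of that equivalence (the value of alpha and the problem size are illustrative):

import numpy as np

rng = np.random.default_rng(2)
m, n = 30, 8
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
alpha = 0.1

# ||Ax - b||^2 + alpha*||x||^2 equals the OLS objective for the stacked system below
A_aug = np.vstack([A, np.sqrt(alpha) * np.eye(n)])
b_aug = np.concatenate([b, np.zeros(n)])
x_aug, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

# Closed form from the definition above
x_tik = np.linalg.solve(A.conj().T @ A + alpha * np.eye(n), A.conj().T @ b)
print(np.allclose(x_aug, x_tik))   # True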

Definition:

QR Factorization for Least Squares

The QR factorization $\mathbf{A} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ has orthonormal columns and $\mathbf{R}$ is upper triangular, provides a numerically stable way to solve least squares:

$$\hat{\mathbf{x}} = \mathbf{R}^{-1}\mathbf{Q}^H\mathbf{b}$$

from scipy.linalg import qr, solve_triangular

Q, R = qr(A, mode='economic')   # reduced QR
x_qr = solve_triangular(R, Q.conj().T @ b)

QR is more numerically stable than the normal equations $(\mathbf{A}^H\mathbf{A})\mathbf{x} = \mathbf{A}^H\mathbf{b}$ because it avoids forming $\mathbf{A}^H\mathbf{A}$ (which squares the condition number).

Definition:

Total Least Squares (TLS)

When both $\mathbf{A}$ and $\mathbf{b}$ are noisy, ordinary least squares is biased. Total least squares finds the smallest joint perturbation that makes the system consistent:

$$\min_{\Delta\mathbf{A}, \Delta\mathbf{b}} \|[\Delta\mathbf{A} \mid \Delta\mathbf{b}]\|_F \quad \text{s.t.} \quad (\mathbf{A} + \Delta\mathbf{A})\mathbf{x} = \mathbf{b} + \Delta\mathbf{b}$$

The solution uses the SVD of the augmented matrix $[\mathbf{A} \mid \mathbf{b}]$:

C = np.column_stack([A, b])                        # augmented matrix [A | b], shape (m, n+1)
U, s, Vh = np.linalg.svd(C, full_matrices=False)
V = Vh.conj().T
x_tls = -V[:n, -1] / V[-1, -1]                     # right singular vector of the smallest singular value

Theorem: Normal Equations and the Least-Squares Solution

The least-squares solution $\hat{\mathbf{x}}$ of $\mathbf{A}\mathbf{x} \approx \mathbf{b}$ satisfies: $\mathbf{A}^H\mathbf{A}\hat{\mathbf{x}} = \mathbf{A}^H\mathbf{b}$

When $\mathbf{A}$ has full column rank ($\mathrm{rank}(\mathbf{A}) = n$), the solution is unique: $\hat{\mathbf{x}} = (\mathbf{A}^H\mathbf{A})^{-1}\mathbf{A}^H\mathbf{b} = \mathbf{A}^\dagger\mathbf{b}$
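
A quick numerical check of the theorem: the normal equations are equivalent to the residual being orthogonal to the columns of $\mathbf{A}$, so $\mathbf{A}^H(\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}) \approx \mathbf{0}$ (problem sizes are illustrative):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((12, 4))
b = rng.standard_normal(12)

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
residual = b - A @ x_hat

print(np.allclose(A.conj().T @ residual, 0))       # residual orthogonal to the column space of A
print(np.allclose(x_hat, np.linalg.pinv(A) @ b))   # unique solution for full column rank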

Theorem: QR Is More Stable Than Normal Equations

Solving least squares via the normal equations squares the condition number: $\kappa(\mathbf{A}^H\mathbf{A}) = \kappa(\mathbf{A})^2$. The QR-based approach operates at $\kappa(\mathbf{A})$, preserving roughly twice as many digits of accuracy.

Forming $\mathbf{A}^H\mathbf{A}$ explicitly loses information through rounding and cancellation. QR factorization avoids this by applying orthogonal transformations directly to $\mathbf{A}$, which do not amplify rounding errors.
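
A small check of the condition-number claim (the Vandermonde test matrix is illustrative):

import numpy as np

A = np.vander(np.linspace(0, 1, 20), 6)    # mildly ill-conditioned Vandermonde matrix
print(np.linalg.cond(A))                    # kappa(A)
print(np.linalg.cond(A.conj().T @ A))       # approximately kappa(A)**2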

Example: Least-Squares Channel Estimation

Estimate a wireless channel with $L = 5$ taps from $M = 20$ pilot observations using least squares. Compare OLS with Tikhonov regularization at different noise levels.
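
A sketch of one way to set up this experiment; the BPSK pilot sequence, noise level, and use of $\sigma_n^2$ as the regularization parameter are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(4)
L, M = 5, 20                                   # channel taps, pilot observations
h_true = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)

# Pilot convolution matrix X (M x L): each row holds L consecutive pilot samples
pilots = rng.choice([-1.0, 1.0], size=M + L - 1)              # BPSK pilots (illustrative)
X = np.array([pilots[i:i + L][::-1] for i in range(M)]).astype(complex)

sigma_n = 0.3
noise = sigma_n / np.sqrt(2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = X @ h_true + noise

# OLS (zero-forcing) vs Tikhonov (MMSE-style) estimates
h_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha = sigma_n**2
h_tik = np.linalg.solve(X.conj().T @ X + alpha * np.eye(L), X.conj().T @ y)

print(np.linalg.norm(h_ols - h_true))   # OLS error
print(np.linalg.norm(h_tik - h_true))   # regularized error, typically smaller at low SNR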

Example: Least Squares via QR Factorization

Solve an overdetermined system using QR factorization and compare the accuracy with the normal equations approach for an ill-conditioned matrix.
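
A sketch of one such comparison, using a Vandermonde matrix as the ill-conditioned example (the sizes and nodes are illustrative):

import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(5)
m, n = 50, 10
A = np.vander(np.linspace(0, 1, m), n)     # ill-conditioned Vandermonde matrix
x_true = rng.standard_normal(n)
b = A @ x_true

# Normal equations: effective condition number is kappa(A)**2
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# QR: works at kappa(A)
Q, R = qr(A, mode='economic')
x_qr = solve_triangular(R, Q.T @ b)

print(np.linalg.norm(x_ne - x_true) / np.linalg.norm(x_true))   # noticeably larger error
print(np.linalg.norm(x_qr - x_true) / np.linalg.norm(x_true))   # much smaller error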

Example: Total Least Squares for Errors-in-Variables

Compare OLS and TLS when both the matrix $\mathbf{A}$ and the observation vector $\mathbf{b}$ are noisy.
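
A sketch of this comparison; the noise levels, problem size, and true coefficients are illustrative:

import numpy as np

rng = np.random.default_rng(6)
m, n = 500, 3
A_clean = rng.standard_normal((m, n))
x_true = np.array([1.0, -2.0, 0.5])
b_clean = A_clean @ x_true

sigma = 0.5
A_noisy = A_clean + sigma * rng.standard_normal((m, n))   # errors in the regressors
b_noisy = b_clean + sigma * rng.standard_normal(m)

# OLS is biased (attenuated) when A itself is noisy
x_ols, *_ = np.linalg.lstsq(A_noisy, b_noisy, rcond=None)

# TLS via the SVD of the augmented matrix [A | b]
C = np.column_stack([A_noisy, b_noisy])
U, s, Vh = np.linalg.svd(C, full_matrices=False)
V = Vh.conj().T
x_tls = -V[:n, -1] / V[-1, -1]

print(np.linalg.norm(x_ols - x_true))   # typically shows the attenuation bias
print(np.linalg.norm(x_tls - x_true))   # typically closer to x_true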

Least Squares: OLS vs Tikhonov Regularization

Compare ordinary least squares with Tikhonov regularization. Adjust the noise level and regularization parameter to see how they affect the estimation error.


Least Squares Methods Compared

| Method | Solves | Stability | Cost (flops) | When to Use |
|---|---|---|---|---|
| Normal Equations | $\mathbf{A}^H\mathbf{A}\mathbf{x} = \mathbf{A}^H\mathbf{b}$ | Squares $\kappa$ | $mn^2 + n^3/3$ | Well-conditioned, small $n$ |
| QR Factorization | $\mathbf{R}\mathbf{x} = \mathbf{Q}^H\mathbf{b}$ | Preserves $\kappa$ | $2mn^2$ | Default choice for dense systems |
| SVD (lstsq) | $\hat{\mathbf{x}} = \mathbf{V}\boldsymbol{\Sigma}^\dagger\mathbf{U}^H\mathbf{b}$ | Best stability | $2mn^2 + 11n^3$ | Rank-deficient systems |
| Tikhonov | $(\mathbf{A}^H\mathbf{A} + \alpha\mathbf{I})\mathbf{x} = \mathbf{A}^H\mathbf{b}$ | Regularized | $mn^2 + n^3/3$ | Ill-conditioned, noisy |
| Total LS | SVD of $[\mathbf{A} \mid \mathbf{b}]$ | Best when both noisy | $2m(n+1)^2$ | Errors in variables |

least squares

The problem of finding $\mathbf{x}$ that minimizes $\|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2$. The solution is $\hat{\mathbf{x}} = \mathbf{A}^\dagger\mathbf{b}$.

Related: Moore-Penrose pseudoinverse

Moore-Penrose pseudoinverse

The unique matrix $\mathbf{A}^\dagger$ satisfying the four Penrose conditions. Generalizes the inverse to rectangular and rank-deficient matrices. Computed via the SVD; see the numerical check after this entry.

Related: least squares
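
As a numerical illustration, the four Penrose conditions can be checked directly with np.linalg.pinv (the test matrix is arbitrary):

import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))
Ap = np.linalg.pinv(A)

# The four Penrose conditions that uniquely determine the pseudoinverse
print(np.allclose(A @ Ap @ A, A))                 # 1) A A+ A = A
print(np.allclose(Ap @ A @ Ap, Ap))               # 2) A+ A A+ = A+
print(np.allclose((A @ Ap).conj().T, A @ Ap))     # 3) A A+ is Hermitian
print(np.allclose((Ap @ A).conj().T, Ap @ A))     # 4) A+ A is Hermitian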

Common Mistake: Normal Equations for Ill-Conditioned Systems

Mistake:

Solving least squares via x = np.linalg.solve(A.T @ A, A.T @ b) for an ill-conditioned matrix, which squares the condition number.

Correction:

Use np.linalg.lstsq(A, b) or QR factorization. The lstsq function uses SVD internally and handles rank deficiency gracefully.

Quick Check

In wireless communications, Tikhonov regularization with $\alpha = \sigma_n^2$ corresponds to which estimator?

MMSE (Minimum Mean Squared Error)

Zero-Forcing (ZF)

Maximum Likelihood (ML)

Matched Filter

Key Takeaway

Use np.linalg.lstsq as your default solver for overdetermined systems. For ill-conditioned problems, add Tikhonov regularization. For errors-in-variables problems, use total least squares. Never form the normal equations $\mathbf{A}^H\mathbf{A}$ explicitly unless the system is well-conditioned and speed is critical.

Why This Matters: MMSE Channel Estimation Is Tikhonov Regularization

The MMSE channel estimate is: $\hat{\mathbf{h}}_{\mathrm{MMSE}} = (\mathbf{X}^H\mathbf{X} + \sigma_n^2\mathbf{I})^{-1}\mathbf{X}^H\mathbf{y}$

This is exactly Tikhonov regularization with $\alpha = \sigma_n^2$. At high SNR ($\sigma_n^2 \to 0$), the MMSE estimate approaches the zero-forcing (OLS) solution. At low SNR, the regularization prevents noise amplification, giving superior estimation accuracy.

Least Squares Methods

Supplementary Python code: ordinary least squares, Tikhonov regularization, QR factorization, total least squares, and comparisons with wireless examples (ch06/python/least_squares.py).