The Moore–Penrose Pseudoinverse

The Need for a Generalized Inverse

When $\mathcal{A}$ is not invertible, because it has a nontrivial null space or because its range is not all of $\mathcal{Y}$, we need a generalized inverse that produces the best possible solution. The Moore–Penrose pseudoinverse provides exactly this: among all least-squares solutions, it selects the one with minimum norm.

In imaging, this is the natural starting point: find the smallest reconstruction consistent with the data. The trouble, as we shall see, is that for ill-posed problems the pseudoinverse is unbounded and therefore useless in the presence of noise. This failure motivates the entire regularization theory of Sections 2.3–2.6.

Definition: The Moore–Penrose Pseudoinverse

Let $\mathcal{A} \colon \mathcal{X} \to \mathcal{Y}$ be a bounded linear operator between Hilbert spaces. The Moore–Penrose pseudoinverse $\mathcal{A}^\dagger$ is the (possibly unbounded) operator defined on $\mathcal{D}(\mathcal{A}^\dagger) = \mathcal{R}(\mathcal{A}) \oplus \mathcal{R}(\mathcal{A})^\perp$ by

$$\mathcal{A}^\dagger y = \arg\min_{x \in \mathcal{X}} \|x\| \quad \text{subject to} \quad \|\mathcal{A}x - y\| = \min.$$

Equivalently, $\mathcal{A}^\dagger y$ is the minimum-norm least-squares solution: the element of smallest norm among all minimizers of $\|\mathcal{A}x - y\|$.

For $y \in \mathcal{R}(\mathcal{A})$, $\mathcal{A}^\dagger y$ is the unique element of $\mathcal{N}(\mathcal{A})^\perp$ satisfying $\mathcal{A}(\mathcal{A}^\dagger y) = y$.

The four Moore–Penrose conditions characterise $\mathcal{A}^\dagger$ uniquely (writing $\mathbf{B} = \mathbf{A}^\dagger$ for matrices): (i) $\mathbf{A}\mathbf{B}\mathbf{A} = \mathbf{A}$, (ii) $\mathbf{B}\mathbf{A}\mathbf{B} = \mathbf{B}$, (iii) $(\mathbf{A}\mathbf{B})^* = \mathbf{A}\mathbf{B}$, (iv) $(\mathbf{B}\mathbf{A})^* = \mathbf{B}\mathbf{A}$.
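Both characterisations are easy to check numerically in the matrix setting. A minimal sketch with NumPy (the matrix sizes and random seed are arbitrary illustrations): it verifies the four Penrose conditions for `np.linalg.pinv` and checks the minimum-norm least-squares property against `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-deficient 6x4 matrix: rank 2, so both the null space of A
# and the orthogonal complement of its range are nontrivial.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
B = np.linalg.pinv(A)

# The four Moore-Penrose conditions.
assert np.allclose(A @ B @ A, A)        # (i)   ABA = A
assert np.allclose(B @ A @ B, B)        # (ii)  BAB = B
assert np.allclose((A @ B).T, A @ B)    # (iii) AB is self-adjoint
assert np.allclose((B @ A).T, B @ A)    # (iv)  BA is self-adjoint

# Minimum-norm least-squares property: for any data y, pinv(A) @ y
# agrees with the minimum-norm solution computed by lstsq.
y = rng.standard_normal(6)
x_pinv = B @ y
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_pinv, x_lstsq)
```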


Historical Note: Moore, Penrose, and the Generalized Inverse

1920–1955

Eliakim Hastings Moore introduced a generalized inverse for finite matrices in 1920, motivated by problems in projective geometry and the calculus of variations. His work went largely unnoticed for decades.

Roger Penrose independently rediscovered and axiomatised the same concept in 1955, giving the four conditions that now bear both names. Penrose's algebraic characterisation made the pseudoinverse tractable for computation, and with the advent of the SVD algorithm in the 1960s, it became a standard tool in numerical linear algebra and statistics.

In the infinite-dimensional setting relevant to imaging, the unboundedness of $\mathcal{A}^\dagger$ was the key observation that drove Tikhonov's regularization theory, making the pseudoinverse both the motivation and the target that regularization approximates.

Theorem: SVD Representation of the Pseudoinverse

Let $\mathcal{A} \colon \mathcal{X} \to \mathcal{Y}$ be a compact operator with singular system $\{(\sigma_k, v_k, u_k)\}_{k=1}^\infty$. Then for $y \in \mathcal{D}(\mathcal{A}^\dagger)$,

$$\mathcal{A}^\dagger y = \sum_{k=1}^{\infty} \frac{1}{\sigma_k}\, \langle y, u_k \rangle\, v_k.$$

This series converges if and only if the Picard condition holds:

$$\sum_{k=1}^{\infty} \frac{|\langle y, u_k \rangle|^2}{\sigma_k^2} < \infty.$$

The SVD decomposes the action of $\mathcal{A}$ into independent channels: $\mathcal{A}$ maps $v_k$ to $\sigma_k u_k$. Inversion requires dividing by $\sigma_k$ in each channel. The Picard condition says the data coefficients $\langle y, u_k \rangle$ must decay faster than $\sigma_k$ for the sum to converge. For exact data from a true solution, this holds; for noisy data, it generically fails.
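This channel-by-channel inversion is a one-liner for matrices. A minimal sketch (arbitrary test matrix; `np.linalg.svd` returns the singular system): summing $\sigma_k^{-1}\langle y, u_k\rangle\, v_k$ over the nonzero singular values reproduces `np.linalg.pinv`.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
y = rng.standard_normal(8)

# Reduced SVD: columns of U and rows of Vt are the singular vectors u_k, v_k.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Pseudoinverse as a sum over SVD channels: (1/sigma_k) <y, u_k> v_k.
x = sum((U[:, k] @ y) / s[k] * Vt[k, :] for k in range(len(s)) if s[k] > 1e-12)

assert np.allclose(x, np.linalg.pinv(A) @ y)
```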

Definition: The Picard Condition

Let $\mathcal{A}$ be a compact operator with singular system $\{(\sigma_k, v_k, u_k)\}$. A datum $y \in \mathcal{Y}$ satisfies the Picard condition if

$$\sum_{k=1}^{\infty} \frac{|\langle y, u_k \rangle|^2}{\sigma_k^2} < \infty.$$

This is equivalent to $y \in \mathcal{D}(\mathcal{A}^\dagger)$: the Fourier coefficients of $y$ with respect to the left singular vectors must decay sufficiently fast relative to the singular values.

For exact data $y = \mathcal{A} x^\dagger$, the Picard condition is automatically satisfied: the coefficients are $\langle y, u_k\rangle = \sigma_k \langle x^\dagger, v_k\rangle$, so the ratios $\langle y, u_k\rangle/\sigma_k = \langle x^\dagger, v_k\rangle$ form an $\ell^2$ sequence. For noisy data $y^\delta = y + \eta$, the noise coefficients $\langle \eta, u_k\rangle$ do not decay, so the Picard condition fails and $\mathcal{A}^\dagger y^\delta$ diverges.
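The contrast is easy to reproduce numerically. A sketch assuming the decay rates of the example below ($\sigma_k = k^{-2}$, $\langle x^\dagger, v_k\rangle = k^{-3}$; the noise level and channel count are arbitrary choices): partial sums of the Picard series stabilise for exact data and explode for noisy data.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 200                        # number of SVD channels retained
k = np.arange(1, K + 1)

sigma = k**-2.0                # singular values sigma_k = k^-2
y_exact = sigma * k**-3.0      # <y, u_k> = sigma_k <x, v_k>, with <x, v_k> = k^-3
y_noisy = y_exact + 0.01 * rng.standard_normal(K)   # noise level delta = 0.01

# Partial sums of the Picard series  sum_k |<y, u_k>|^2 / sigma_k^2.
picard_exact = np.cumsum((y_exact / sigma)**2)
picard_noisy = np.cumsum((y_noisy / sigma)**2)

print(f"exact data: partial sum at K={K}: {picard_exact[-1]:.3f}  (bounded)")
print(f"noisy data: partial sum at K={K}: {picard_noisy[-1]:.3e}  (terms grow like delta^2 k^4)")
```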


Theorem: Unboundedness of the Pseudoinverse for Compact Operators

Let $\mathcal{A} \colon \mathcal{X} \to \mathcal{Y}$ be a compact operator between infinite-dimensional Hilbert spaces with $\overline{\mathcal{R}(\mathcal{A})}$ infinite-dimensional. Then $\mathcal{A}^\dagger$ is unbounded.

Indeed, $\mathcal{A}^\dagger u_k = \sigma_k^{-1} v_k$, so $\|\mathcal{A}^\dagger u_k\| = \sigma_k^{-1} \to \infty$ while $\|u_k\| = 1$.
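The same mechanism is visible in any discretization. A sketch assuming singular values $\sigma_k = k^{-2}$ (any decaying model would do): keeping the first $n$ SVD channels gives a matrix whose pseudoinverse has spectral norm $\sigma_n^{-1} = n^2$, which grows without bound as the discretization is refined.

```python
import numpy as np

# Discretize the operator by keeping its first n SVD channels:
# A_n = diag(1, 2^-2, ..., n^-2) in the singular bases.
for n in [10, 100, 1000]:
    A_n = np.diag(np.arange(1, n + 1, dtype=float)**-2)
    norm_pinv = np.linalg.norm(np.linalg.pinv(A_n), 2)
    print(f"n = {n:5d}:  ||A_n^+|| = {norm_pinv:.1e}")   # equals n^2
```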

Example: Noise Amplification Through the Pseudoinverse

Consider an integral equation $\mathcal{A}x = y$ on $L^2([0,1])$ with singular values $\sigma_k = k^{-2}$ (mildly ill-posed). The true solution has coefficients $\langle x^\dagger, v_k \rangle = k^{-3}$.

The data are corrupted by noise $\eta$ with i.i.d. coefficients $\langle \eta, u_k \rangle \sim \mathcal{N}(0, \delta^2)$. Compute the expected reconstruction error $\mathbb{E}\|\mathcal{A}^\dagger y^\delta - x^\dagger\|^2$.
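A sketch of the computation: since $x^\dagger$ lies in the span of the $v_k$, we have $\mathcal{A}^\dagger y = x^\dagger$ for the exact data, and by linearity the error is $\mathcal{A}^\dagger \eta$ with coefficients $\langle \eta, u_k\rangle / \sigma_k$. Hence

$$\mathbb{E}\|\mathcal{A}^\dagger y^\delta - x^\dagger\|^2 = \sum_{k=1}^{\infty} \frac{\mathbb{E}|\langle \eta, u_k \rangle|^2}{\sigma_k^2} = \delta^2 \sum_{k=1}^{\infty} k^4 = \infty.$$

Every finite truncation of this sum is finite, but the full expected error diverges: no matter how small $\delta$ is, the high-index channels dominate.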

Interactive Demo: Noise Amplification Through the Pseudoinverse

Demonstrates the catastrophic noise amplification inherent in the pseudoinverse. A 1D signal is mapped through a compact forward operator with singular values $\sigma_k \propto k^{-p}$, and white noise of level $\delta$ is added.

Left panel: SVD coefficients of the exact data (blue, decaying) and the noisy data (red, levelling off at $\delta$). The crossover point where noise dominates signal is clearly visible.

Right panel: The pseudoinverse reconstruction using the first $N$ components. When $N$ is too large, noise-dominated components destroy the reconstruction. This motivates truncated SVD and Tikhonov regularization (Section 2.4).

Increase the noise level to see how the safe truncation index decreases. This trade-off between resolution and stability is the fundamental dilemma of ill-posed problems.

(Default demo parameters: noise level $\delta = 0.05$, truncation index $N = 20$, decay exponent $p = 2$.)
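For readers without the interactive version, here is a minimal self-contained sketch of what the demo computes (operator construction, signal coefficients, noise level, and truncation indices are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, delta = 400, 2.0, 1e-6   # channels, singular value decay, noise level

# Synthetic compact-like operator A = U diag(sigma) V^T with sigma_k = k^-p.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
k = np.arange(1, n + 1, dtype=float)
sigma = k**-p
A = U @ np.diag(sigma) @ V.T

# True solution with coefficients <x, v_k> = k^-3, as in the example above.
x_true = V @ k**-3.0
y_noisy = A @ x_true + delta * rng.standard_normal(n)

# Truncated-SVD reconstruction: invert only the first N channels and
# discard the rest, instead of dividing by every sigma_k.
def tsvd(y, N):
    return V[:, :N] @ ((U.T @ y)[:N] / sigma[:N])

for N in [5, 15, 50, 200]:
    err = np.linalg.norm(tsvd(y_noisy, N) - x_true)
    print(f"N = {N:3d}:  error = {err:.2e}")
# The error first decreases with N (more resolution), then blows up as
# noise is amplified by 1/sigma_k: the resolution-stability trade-off.
```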

Common Mistake: The Pseudoinverse Is Not a Reconstruction Method

Mistake:

Computing $\mathbf{A}^\dagger \mathbf{y}^\delta$ (the pseudoinverse applied to noisy data) as the reconstruction for an ill-conditioned imaging problem, expecting it to give a reasonable image.

Correction:

For ill-conditioned systems, $\mathbf{A}^\dagger \mathbf{y}^\delta$ catastrophically amplifies noise: the $k$-th SVD component of the result is $\langle \mathbf{y}^\delta, \mathbf{u}_k\rangle / \sigma_k$, which grows without bound as $\sigma_k \to 0$. The pseudoinverse is a mathematical concept (the minimum-norm least-squares solution), not a practical algorithm for noisy data. Always apply regularization.
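A compact illustration of the mistake (the Hilbert matrix is a standard ill-conditioned test case; the size and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 12

# Hilbert matrix: a classic ill-conditioned operator, H[i, j] = 1/(i+j+1).
i, j = np.indices((n, n))
H = 1.0 / (i + j + 1)

x_true = np.ones(n)
y_noisy = H @ x_true + 1e-8 * rng.standard_normal(n)  # tiny noise

x_naive = np.linalg.pinv(H) @ y_noisy                 # the mistake
rel_err = np.linalg.norm(x_naive - x_true) / np.linalg.norm(x_true)
print(f"cond(H) = {np.linalg.cond(H):.1e}")
print(f"relative error of naive pinv reconstruction: {rel_err:.1e}")
```

Note that raising the `rcond` cutoff of `np.linalg.pinv` already acts as a crude regularizer (it is a truncated SVD), which is exactly where the following sections pick up.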

Picard Condition

A datum $y$ satisfies the Picard condition for the operator $\mathcal{A}$ if $\sum_k |\langle y, u_k\rangle|^2/\sigma_k^2 < \infty$, which is necessary and sufficient for $\mathcal{A}^\dagger y$ to be well-defined.

Related: Well-Posed Problem, Degree of Ill-Posedness

Key Takeaway

The Moore–Penrose pseudoinverse $\mathcal{A}^\dagger$ gives the minimum-norm least-squares solution via $\mathcal{A}^\dagger y = \sum_k \sigma_k^{-1}\langle y,u_k\rangle v_k$. For compact operators with infinite-dimensional range closure, $\mathcal{A}^\dagger$ is unbounded; it cannot be used directly with noisy data. The Picard condition determines when $\mathcal{A}^\dagger y$ is well-defined: the data coefficients must decay faster than $\sigma_k$. Noise violates this condition, driving the reconstruction error to infinity. This is the mathematical justification for regularization theory.