The Moore–Penrose Pseudoinverse

The Need for a Generalized Inverse

When $\mathcal{A}$ is not invertible, because it has a nontrivial null space or because its range is not all of $\mathcal{Y}$, we need a generalized inverse that produces the best possible solution. The Moore–Penrose pseudoinverse provides exactly this: among all least-squares solutions, it selects the one with minimum norm.

In imaging, this is the natural starting point: find the smallest reconstruction consistent with the data. The trouble, as we shall see, is that for ill-posed problems the pseudoinverse is unbounded and therefore useless in the presence of noise. This failure motivates the entire regularization theory of Sections 2.3–2.6.

Definition: The Moore–Penrose Pseudoinverse

Let $\mathcal{A} \colon \mathcal{X} \to \mathcal{Y}$ be a bounded linear operator between Hilbert spaces. The Moore–Penrose pseudoinverse $\mathcal{A}^\dagger$ is the (possibly unbounded) operator defined on $\mathcal{D}(\mathcal{A}^\dagger) = \mathcal{R}(\mathcal{A}) \oplus \mathcal{R}(\mathcal{A})^\perp$ by

$$\mathcal{A}^\dagger y = \arg\min_{x \in \mathcal{X}} \|x\| \quad \text{subject to} \quad \|\mathcal{A}x - y\| = \min.$$

Equivalently, $\mathcal{A}^\dagger y$ is the minimum-norm least-squares solution: the element of smallest norm among all minimizers of $\|\mathcal{A}x - y\|$.

For $y \in \mathcal{R}(\mathcal{A})$, $\mathcal{A}^\dagger y$ is the unique element of $\mathcal{N}(\mathcal{A})^\perp$ satisfying $\mathcal{A}(\mathcal{A}^\dagger y) = y$.

The four Moore–Penrose conditions characterise $\mathcal{A}^\dagger$ uniquely (writing $\mathbf{B} = \mathbf{A}^\dagger$ for matrices): (i) $\mathbf{A}\mathbf{B}\mathbf{A} = \mathbf{A}$, (ii) $\mathbf{B}\mathbf{A}\mathbf{B} = \mathbf{B}$, (iii) $(\mathbf{A}\mathbf{B})^* = \mathbf{A}\mathbf{B}$, (iv) $(\mathbf{B}\mathbf{A})^* = \mathbf{B}\mathbf{A}$.
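Both characterisations are easy to check numerically in the matrix setting. A minimal sketch with NumPy (the matrix sizes and random seed are arbitrary illustrations): it verifies the four Penrose conditions for `np.linalg.pinv` and checks the minimum-norm least-squares property against `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-deficient 6x4 matrix: rank 2, so both the null space of A
# and the orthogonal complement of its range are nontrivial.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
B = np.linalg.pinv(A)

# The four Moore-Penrose conditions.
assert np.allclose(A @ B @ A, A)        # (i)   ABA = A
assert np.allclose(B @ A @ B, B)        # (ii)  BAB = B
assert np.allclose((A @ B).T, A @ B)    # (iii) AB is self-adjoint
assert np.allclose((B @ A).T, B @ A)    # (iv)  BA is self-adjoint

# Minimum-norm least-squares property: for any data y, pinv(A) @ y
# agrees with the minimum-norm solution computed by lstsq.
y = rng.standard_normal(6)
x_pinv = B @ y
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_pinv, x_lstsq)
```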


Historical Note: Moore, Penrose, and the Generalized Inverse

1920–1955

Eliakim Hastings Moore introduced a generalized inverse for finite matrices in 1920, motivated by problems in projective geometry and the calculus of variations. His work went largely unnoticed for decades.

Roger Penrose independently rediscovered and axiomatised the same concept in 1955, giving the four conditions that now bear both names. Penrose's algebraic characterisation made the pseudoinverse tractable for computation, and with the advent of the SVD algorithm in the 1960s, it became a standard tool in numerical linear algebra and statistics.

In the infinite-dimensional setting relevant to imaging, the unboundedness of $\mathcal{A}^\dagger$ was the key observation that drove Tikhonov's regularization theory, making the pseudoinverse both the motivation and the target that regularization approximates.

Theorem: SVD Representation of the Pseudoinverse

Let $\mathcal{A} \colon \mathcal{X} \to \mathcal{Y}$ be a compact operator with singular system $\{(\sigma_k, v_k, u_k)\}_{k=1}^\infty$. Then for $y \in \mathcal{D}(\mathcal{A}^\dagger)$,

$$\mathcal{A}^\dagger y = \sum_{k=1}^{\infty} \frac{1}{\sigma_k}\, \langle y, u_k \rangle\, v_k.$$

This series converges if and only if the Picard condition holds:

$$\sum_{k=1}^{\infty} \frac{|\langle y, u_k \rangle|^2}{\sigma_k^2} < \infty.$$

The SVD decomposes the action of $\mathcal{A}$ into independent channels: $\mathcal{A}$ maps $v_k$ to $\sigma_k u_k$. Inversion requires dividing by $\sigma_k$ in each channel. The Picard condition says the data coefficients $\langle y, u_k \rangle$ must decay faster than $\sigma_k$ for the sum to converge. For exact data from a true solution, this holds; for noisy data, it generically fails.
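This channel-by-channel inversion is a one-liner for matrices. A minimal sketch (arbitrary test matrix; `np.linalg.svd` returns the singular system): summing $\sigma_k^{-1}\langle y, u_k\rangle\, v_k$ over the nonzero singular values reproduces `np.linalg.pinv`.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
y = rng.standard_normal(8)

# Reduced SVD: columns of U and rows of Vt are the singular vectors u_k, v_k.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Pseudoinverse as a sum over SVD channels: (1/sigma_k) <y, u_k> v_k.
x = sum((U[:, k] @ y) / s[k] * Vt[k, :] for k in range(len(s)) if s[k] > 1e-12)

assert np.allclose(x, np.linalg.pinv(A) @ y)
```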

Definition: The Picard Condition

Let $\mathcal{A}$ be a compact operator with singular system $\{(\sigma_k, v_k, u_k)\}$. A datum $y \in \mathcal{Y}$ satisfies the Picard condition if

$$\sum_{k=1}^{\infty} \frac{|\langle y, u_k \rangle|^2}{\sigma_k^2} < \infty.$$

This is equivalent to $y \in \mathcal{D}(\mathcal{A}^\dagger)$: the Fourier coefficients of $y$ with respect to the left singular vectors must decay sufficiently fast relative to the singular values.

For exact data $y = \mathcal{A} x^\dagger$, the Picard condition is automatically satisfied: the coefficients are $\langle y, u_k\rangle = \sigma_k \langle x^\dagger, v_k\rangle$, so the ratios $\langle y, u_k\rangle/\sigma_k = \langle x^\dagger, v_k\rangle$ form an $\ell^2$ sequence. For noisy data $y^\delta = y + \eta$, the noise coefficients $\langle \eta, u_k\rangle$ do not decay, so the Picard condition fails and $\mathcal{A}^\dagger y^\delta$ diverges.
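The contrast is easy to reproduce numerically. A sketch assuming the decay rates of the example below ($\sigma_k = k^{-2}$, $\langle x^\dagger, v_k\rangle = k^{-3}$; the noise level and channel count are arbitrary choices): partial sums of the Picard series stabilise for exact data and explode for noisy data.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 200                        # number of SVD channels retained
k = np.arange(1, K + 1)

sigma = k**-2.0                # singular values sigma_k = k^-2
y_exact = sigma * k**-3.0      # <y, u_k> = sigma_k <x, v_k>, with <x, v_k> = k^-3
y_noisy = y_exact + 0.01 * rng.standard_normal(K)   # noise level delta = 0.01

# Partial sums of the Picard series  sum_k |<y, u_k>|^2 / sigma_k^2.
picard_exact = np.cumsum((y_exact / sigma)**2)
picard_noisy = np.cumsum((y_noisy / sigma)**2)

print(f"exact data: partial sum at K={K}: {picard_exact[-1]:.3f}  (bounded)")
print(f"noisy data: partial sum at K={K}: {picard_noisy[-1]:.3e}  (terms grow like delta^2 k^4)")
```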


Theorem: Unboundedness of the Pseudoinverse for Compact Operators

Let $\mathcal{A} \colon \mathcal{X} \to \mathcal{Y}$ be a compact operator between infinite-dimensional Hilbert spaces with $\overline{\mathcal{R}(\mathcal{A})}$ infinite-dimensional. Then $\mathcal{A}^\dagger$ is unbounded.

Indeed, $\mathcal{A}^\dagger u_k = \sigma_k^{-1} v_k$, so $\|\mathcal{A}^\dagger u_k\| = \sigma_k^{-1} \to \infty$ while $\|u_k\| = 1$.
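The same mechanism is visible in any discretization. A sketch assuming singular values $\sigma_k = k^{-2}$ (any decaying model would do): keeping the first $n$ SVD channels gives a matrix whose pseudoinverse has spectral norm $\sigma_n^{-1} = n^2$, which grows without bound as the discretization is refined.

```python
import numpy as np

# Discretize the operator by keeping its first n SVD channels:
# A_n = diag(1, 2^-2, ..., n^-2) in the singular bases.
for n in [10, 100, 1000]:
    A_n = np.diag(np.arange(1, n + 1, dtype=float)**-2)
    norm_pinv = np.linalg.norm(np.linalg.pinv(A_n), 2)
    print(f"n = {n:5d}:  ||A_n^+|| = {norm_pinv:.1e}")   # equals n^2
```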

Example: Noise Amplification Through the Pseudoinverse

Consider an integral equation $\mathcal{A}x = y$ on $L^2([0,1])$ with singular values $\sigma_k = k^{-2}$ (mildly ill-posed). The true solution has coefficients $\langle x^\dagger, v_k \rangle = k^{-3}$.

The data are corrupted by noise $\eta$ with i.i.d. coefficients $\langle \eta, u_k \rangle \sim \mathcal{N}(0, \delta^2)$. Compute the expected reconstruction error $\mathbb{E}\|\mathcal{A}^\dagger y^\delta - x^\dagger\|^2$.
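A sketch of the computation: since $x^\dagger$ lies in the span of the $v_k$, we have $\mathcal{A}^\dagger y = x^\dagger$ for the exact data, and by linearity the error is $\mathcal{A}^\dagger \eta$ with coefficients $\langle \eta, u_k\rangle / \sigma_k$. Hence

$$\mathbb{E}\|\mathcal{A}^\dagger y^\delta - x^\dagger\|^2 = \sum_{k=1}^{\infty} \frac{\mathbb{E}|\langle \eta, u_k \rangle|^2}{\sigma_k^2} = \delta^2 \sum_{k=1}^{\infty} k^4 = \infty.$$

Every finite truncation of this sum is finite, but the full expected error diverges: no matter how small $\delta$ is, the high-index channels dominate.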

Interactive Demo: Noise Amplification Through the Pseudoinverse

Demonstrates the catastrophic noise amplification inherent in the pseudoinverse. A 1D signal is mapped through a compact forward operator with singular values $\sigma_k \propto k^{-p}$, and white noise of level $\delta$ is added.

Left panel: SVD coefficients of the exact data (blue, decaying) and the noisy data (red, levelling off at $\delta$). The crossover point where noise dominates signal is clearly visible.

Right panel: The pseudoinverse reconstruction using the first $N$ components. When $N$ is too large, noise-dominated components destroy the reconstruction. This motivates truncated SVD and Tikhonov regularization (Section 2.4).

Increase the noise level to see how the safe truncation index decreases. This trade-off between resolution and stability is the fundamental dilemma of ill-posed problems.

(Default demo parameters: noise level $\delta = 0.05$, truncation index $N = 20$, decay exponent $p = 2$.)
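For readers without the interactive version, here is a minimal self-contained sketch of what the demo computes (operator construction, signal coefficients, noise level, and truncation indices are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, delta = 400, 2.0, 1e-6   # channels, singular value decay, noise level

# Synthetic compact-like operator A = U diag(sigma) V^T with sigma_k = k^-p.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
k = np.arange(1, n + 1, dtype=float)
sigma = k**-p
A = U @ np.diag(sigma) @ V.T

# True solution with coefficients <x, v_k> = k^-3, as in the example above.
x_true = V @ k**-3.0
y_noisy = A @ x_true + delta * rng.standard_normal(n)

# Truncated-SVD reconstruction: invert only the first N channels and
# discard the rest, instead of dividing by every sigma_k.
def tsvd(y, N):
    return V[:, :N] @ ((U.T @ y)[:N] / sigma[:N])

for N in [5, 15, 50, 200]:
    err = np.linalg.norm(tsvd(y_noisy, N) - x_true)
    print(f"N = {N:3d}:  error = {err:.2e}")
# The error first decreases with N (more resolution), then blows up as
# noise is amplified by 1/sigma_k: the resolution-stability trade-off.
```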

Common Mistake: The Pseudoinverse Is Not a Reconstruction Method

Mistake:

Computing $\mathbf{A}^\dagger \mathbf{y}^\delta$ (the pseudoinverse applied to noisy data) as the reconstruction for an ill-conditioned imaging problem, expecting it to give a reasonable image.

Correction:

For ill-conditioned systems, $\mathbf{A}^\dagger \mathbf{y}^\delta$ catastrophically amplifies noise: the $k$-th SVD component of the result is $\langle \mathbf{y}^\delta, \mathbf{u}_k\rangle / \sigma_k$, which grows without bound as $\sigma_k \to 0$. The pseudoinverse is a mathematical concept (the minimum-norm least-squares solution), not a practical algorithm for noisy data. Always apply regularization.
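A compact illustration of the mistake (the Hilbert matrix is a standard ill-conditioned test case; the size and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 12

# Hilbert matrix: a classic ill-conditioned operator, H[i, j] = 1/(i+j+1).
i, j = np.indices((n, n))
H = 1.0 / (i + j + 1)

x_true = np.ones(n)
y_noisy = H @ x_true + 1e-8 * rng.standard_normal(n)  # tiny noise

x_naive = np.linalg.pinv(H) @ y_noisy                 # the mistake
rel_err = np.linalg.norm(x_naive - x_true) / np.linalg.norm(x_true)
print(f"cond(H) = {np.linalg.cond(H):.1e}")
print(f"relative error of naive pinv reconstruction: {rel_err:.1e}")
```

Note that raising the `rcond` cutoff of `np.linalg.pinv` already acts as a crude regularizer (it is a truncated SVD), which is exactly where the following sections pick up.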

Picard Condition

A datum $y$ satisfies the Picard condition for the operator $\mathcal{A}$ if $\sum_k |\langle y, u_k\rangle|^2/\sigma_k^2 < \infty$, which is necessary and sufficient for $\mathcal{A}^\dagger y$ to be well-defined.

Related: Well-Posed Problem, Degree of Ill-Posedness

Key Takeaway

The Moore–Penrose pseudoinverse $\mathcal{A}^\dagger$ gives the minimum-norm least-squares solution via $\mathcal{A}^\dagger y = \sum_k \sigma_k^{-1}\langle y,u_k\rangle v_k$. For compact operators with infinite-dimensional range closure, $\mathcal{A}^\dagger$ is unbounded; it cannot be used directly with noisy data. The Picard condition determines when $\mathcal{A}^\dagger y$ is well-defined: the data coefficients must decay faster than $\sigma_k$. Noise violates this condition, driving the reconstruction error to infinity. This is the mathematical justification for regularization theory.