Singular Value Decomposition

Why SVD Is the Backbone of MIMO

If you remember only one matrix decomposition from this entire textbook, let it be the singular value decomposition (SVD). Every major result in multi-antenna wireless communications either uses SVD directly or is a consequence of it:

  • Channel decomposition. The SVD of the channel matrix $\mathbf{H} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^H$ converts a coupled MIMO channel into parallel, independent scalar sub-channels. The singular values $\sigma_i$ are the gains of these sub-channels.
  • Capacity and water-filling. The MIMO capacity formula $C = \sum_i \log_2(1 + p_i \sigma_i^2 / \sigma_n^2)$ is expressed entirely in terms of singular values. Power allocation (water-filling) assigns power to sub-channels according to their singular values.
  • Beamforming. The optimal transmit beamformer is the right singular vector $\mathbf{v}_1$ corresponding to the largest singular value $\sigma_1$; the optimal receive combiner is the left singular vector $\mathbf{u}_1$.
  • Low-rank channel models. Real-world channels often have a small number of dominant scattering clusters, leading to a channel matrix with rapidly decaying singular values. The Eckart–Young theorem tells us that the truncated SVD is the best rank-$k$ approximation in any unitarily invariant norm.
  • Condition number. The ratio $\kappa = \sigma_1 / \sigma_r$ determines how sensitive channel inversion (zero-forcing) is to noise. An ill-conditioned channel ($\kappa \gg 1$) means some sub-channels are nearly unusable.

Unlike eigendecomposition, the SVD applies to any matrix — square or rectangular, Hermitian or not. It is the universal tool of linear algebra, and mastering it is prerequisite to everything that follows in this book.

Definition: Singular Values and Singular Vectors

Let $\mathbf{A} \in \mathbb{C}^{m \times n}$ with $r = \operatorname{rank}(\mathbf{A})$.

The singular values of $\mathbf{A}$ are the non-negative real numbers $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$, defined as the positive square roots of the nonzero eigenvalues of the Hermitian positive semidefinite matrix $\mathbf{A}^H \mathbf{A} \in \mathbb{C}^{n \times n}$. That is, if $\lambda_1 \geq \cdots \geq \lambda_r > 0$ are the nonzero eigenvalues of $\mathbf{A}^H \mathbf{A}$, then $\sigma_i = \sqrt{\lambda_i}$ for $i = 1, \ldots, r$.

The right singular vectors of $\mathbf{A}$ are the orthonormal eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_n \in \mathbb{C}^n$ of $\mathbf{A}^H \mathbf{A}$, ordered so that $\mathbf{A}^H \mathbf{A} \mathbf{v}_i = \sigma_i^2 \mathbf{v}_i$ for $i \leq r$, and $\mathbf{A}^H \mathbf{A} \mathbf{v}_i = \mathbf{0}$ for $i > r$.

The left singular vectors $\mathbf{u}_1, \ldots, \mathbf{u}_m \in \mathbb{C}^m$ are defined by $\mathbf{u}_i = \frac{1}{\sigma_i} \mathbf{A} \mathbf{v}_i$ for $i = 1, \ldots, r$. The remaining $m - r$ left singular vectors are any orthonormal basis of $\mathcal{N}(\mathbf{A}^H) = \mathcal{R}(\mathbf{A})^{\perp}$.

The singular values are intrinsic to $\mathbf{A}$: they do not depend on the choice of bases. The singular vectors, however, are not unique when singular values are repeated (one may choose any orthonormal basis within the corresponding subspace), and each singular vector is determined only up to a unit-modulus scalar $e^{j\theta}$.

Theorem: SVD Existence Theorem

Every matrix $\mathbf{A} \in \mathbb{C}^{m \times n}$ can be decomposed as
$$\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^H,$$
where $\mathbf{U} \in \mathbb{C}^{m \times m}$ is unitary, $\mathbf{V} \in \mathbb{C}^{n \times n}$ is unitary, and $\mathbf{\Sigma} \in \mathbb{R}^{m \times n}$ is a (possibly rectangular) diagonal matrix with non-negative entries on the main diagonal, ordered as $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min(m,n)} \geq 0$. The diagonal entries $\sigma_i$ are the singular values of $\mathbf{A}$, the columns of $\mathbf{U}$ are the left singular vectors, and the columns of $\mathbf{V}$ are the right singular vectors.

The SVD says that every linear map, no matter how complicated, is secretly just a rotation, followed by axis-aligned scaling, followed by another rotation. The right singular vectors $\mathbf{v}_i$ are the "input directions" that the matrix treats independently; the left singular vectors $\mathbf{u}_i$ are the corresponding "output directions"; and the singular values $\sigma_i$ are the gains along each direction. For a MIMO channel, this means: transmit along $\mathbf{v}_i$, receive along $\mathbf{u}_i$, and the effective scalar gain is $\sigma_i$.
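The following minimal numpy sketch illustrates this interpretation (the matrix and its dimensions are illustrative, not taken from the text): transmitting along $\mathbf{v}_1$ and combining with $\mathbf{u}_1$ yields exactly the scalar gain $\sigma_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
# A random 4x4 complex matrix standing in for a MIMO channel H
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)

U, s, Vh = np.linalg.svd(H)      # H = U @ diag(s) @ Vh, with s sorted descending
v1 = Vh[0].conj()                # top right singular vector (optimal transmit direction)
u1 = U[:, 0]                     # top left singular vector (optimal receive combiner)

# u1^H H v1 = sigma_1: the strongest sub-channel gain
gain = u1.conj() @ (H @ v1)
print(np.allclose(gain, s[0]))   # True
```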

SVD Geometry: Rotation–Scaling–Rotation

For any matrix $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^H$, the action on the unit circle reveals the geometric essence of the SVD. The transformation proceeds in three stages: $\mathbf{V}^H$ rotates, $\mathbf{\Sigma}$ scales the circle to an ellipse, and $\mathbf{U}$ rotates to the final image. The singular values are the semi-axis lengths of the ellipse.

SVD in 3D: Unit Sphere to Ellipsoid

The same SVD transformation in three dimensions: the unit sphere is stretched into an ellipsoid along the principal axes. The three singular values are the semi-axis lengths. The camera rotates to reveal the full 3D structure.
For a $3 \times 3$ matrix, the SVD maps the unit sphere to an ellipsoid whose semi-axes have lengths $\sigma_1, \sigma_2, \sigma_3$.

Geometric Interpretation: Rotation–Scaling–Rotation

The SVD $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^H$ reveals that every linear map decomposes into three elementary operations:

  1. Rotation (and reflection) in the domain: $\mathbf{V}^H$ is a unitary transformation that rotates the input space $\mathbb{C}^n$, aligning the coordinate axes with the right singular vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$.

  2. Axis-aligned scaling: $\mathbf{\Sigma}$ stretches each axis by $\sigma_i$, and also changes the dimension of the space (from $n$ to $m$) by padding with zeros or truncating.

  3. Rotation (and reflection) in the codomain: $\mathbf{U}$ rotates the output space $\mathbb{C}^m$, mapping the scaled standard basis vectors to the left singular vectors $\mathbf{u}_1, \ldots, \mathbf{u}_m$.

Unit sphere to ellipsoid. Consider the unit sphere $\mathcal{S} = \{\mathbf{x} \in \mathbb{C}^n : \|\mathbf{x}\| = 1\}$. Under $\mathbf{A}$, this sphere maps to an ellipsoid (possibly degenerate) $\mathbf{A}(\mathcal{S}) = \{\mathbf{A}\mathbf{x} : \|\mathbf{x}\| = 1\}$ whose semi-axes have lengths $\sigma_1, \sigma_2, \ldots, \sigma_r$ and point along the directions $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$.
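This picture is easy to check numerically. The sketch below (with an arbitrary random real matrix) pushes many unit vectors through $\mathbf{A}$ and confirms that the largest and smallest stretches match $\sigma_1$ and $\sigma_r$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending

# Push random unit vectors through A and measure the stretch ||Ax||
X = rng.standard_normal((3, 20000))
X /= np.linalg.norm(X, axis=0)
stretch = np.linalg.norm(A @ X, axis=0)

print(stretch.max(), sigma[0])    # max stretch approaches sigma_1 (longest semi-axis)
print(stretch.min(), sigma[-1])   # min stretch approaches sigma_3 (shortest semi-axis)
```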

This geometric picture is the reason SVD is so natural for MIMO: the channel $\mathbf{H}$ maps the transmit signal sphere into an ellipsoid at the receiver, and the semi-axes of that ellipsoid are precisely the sub-channel gains.

Theorem: Eckart–Young Low-Rank Approximation Theorem

Let $\mathbf{A} \in \mathbb{C}^{m \times n}$ have SVD $\mathbf{A} = \sum_{i=1}^{r} \sigma_i \, \mathbf{u}_i \mathbf{v}_i^H$ with $\sigma_1 \geq \cdots \geq \sigma_r > 0$ and $r = \operatorname{rank}(\mathbf{A})$. For $1 \leq k \leq r$, define the rank-$k$ truncated SVD
$$\mathbf{A}_k = \sum_{i=1}^{k} \sigma_i \, \mathbf{u}_i \mathbf{v}_i^H.$$
Then $\mathbf{A}_k$ is a best rank-$k$ approximation of $\mathbf{A}$ in the Frobenius norm:
$$\mathbf{A}_k = \arg\min_{\substack{\mathbf{B} \in \mathbb{C}^{m \times n} \\ \operatorname{rank}(\mathbf{B}) \leq k}} \|\mathbf{A} - \mathbf{B}\|_F,$$
and the approximation error is
$$\|\mathbf{A} - \mathbf{A}_k\|_F = \sqrt{\sigma_{k+1}^2 + \sigma_{k+2}^2 + \cdots + \sigma_r^2}.$$
Moreover, the same result holds for the spectral (operator) norm:
$$\min_{\substack{\mathbf{B} \\ \operatorname{rank}(\mathbf{B}) \leq k}} \|\mathbf{A} - \mathbf{B}\|_2 = \sigma_{k+1}.$$

The SVD sorts the "energy" of a matrix by importance. The first singular vector pair $(\mathbf{u}_1, \mathbf{v}_1)$ captures the single rank-one matrix that best approximates $\mathbf{A}$. Adding more terms improves the approximation greedily, and the Eckart–Young theorem says this greedy approach is globally optimal. In wireless communications, if a channel has rapidly decaying singular values, only a few dominant modes carry significant energy, and the channel is effectively low-rank.
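Both error formulas are straightforward to verify with numpy; the matrix below is an arbitrary random example:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]             # rank-k truncated SVD

fro_err = np.linalg.norm(A - Ak, 'fro')
print(np.isclose(fro_err, np.sqrt(np.sum(s[k:]**2))))  # True: Frobenius error formula
print(np.isclose(np.linalg.norm(A - Ak, 2), s[k]))     # True: spectral error = sigma_{k+1}
```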

SVD Geometry: Rotation–Scaling–Rotation

Visualize how the SVD transforms the unit sphere through $\mathbf{V}^H$ (rotate), $\mathbf{\Sigma}$ (scale to ellipsoid), and $\mathbf{U}$ (rotate again). At each step the transformed surface is shown, along with the singular vector directions and their corresponding singular values as axis labels.


Singular Values of a Parameterized Channel Matrix

Explore how singular values change as the channel matrix varies. The condition number $\sigma_1/\sigma_r$ indicates how well-conditioned the channel is. A fully uncorrelated channel ($\rho = 0$) has nearly equal singular values, while a highly correlated channel ($\rho \to 1$) has one dominant singular value and the rest collapse toward zero, resulting in a near-rank-one channel with dramatically reduced spatial multiplexing gain.

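The experiment behind this widget can be reproduced in a few lines. The sketch below assumes the common exponential (Kronecker) correlation model $R_{ij} = \rho^{|i-j|}$; the widget's exact parameterization is not specified, so treat this as one plausible choice:

```python
import numpy as np

def corr_matrix(n, rho):
    """Exponential correlation model: R[i, j] = rho^|i-j| (an assumed model)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def correlated_channel(n, rho, rng):
    """Kronecker model: H = R^(1/2) Hw (R^(1/2))^H with i.i.d. Rayleigh Hw."""
    L = np.linalg.cholesky(corr_matrix(n, rho) + 1e-12 * np.eye(n))
    Hw = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return L @ Hw @ L.conj().T

rng = np.random.default_rng(3)
for rho in [0.0, 0.5, 0.9, 0.99]:
    s = np.linalg.svd(correlated_channel(4, rho, rng), compute_uv=False)
    print(f"rho={rho:4.2f}  sigma={np.round(s, 2)}  kappa={s[0] / s[-1]:7.1f}")
```

As $\rho \to 1$ the trailing singular values collapse and the condition number blows up, exactly the near-rank-one behavior described above.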

Progressive Rank-$k$ Approximation

Watch how adding successive rank-1 terms $\sigma_i \mathbf{u}_i \mathbf{v}_i^H$ progressively reconstructs the original matrix. At each frame $k$, the approximation $\mathbf{A}_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^H$ is displayed alongside the Frobenius-norm error $\|\mathbf{A} - \mathbf{A}_k\|_F$. The Eckart–Young theorem guarantees this is the optimal rank-$k$ approximation.


Eigendecomposition vs. Singular Value Decomposition

| Property | Eigendecomposition | Singular Value Decomposition (SVD) |
|---|---|---|
| Applicable to | Square matrices ($n \times n$) only | Any matrix ($m \times n$), square or rectangular |
| Decomposition form | $\mathbf{A} = \mathbf{P}\mathbf{\Lambda}\mathbf{P}^{-1}$ (if diagonalizable) | $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^H$ (always exists) |
| Factor structure | $\mathbf{P}$ generally not unitary; $\mathbf{\Lambda}$ complex diagonal | $\mathbf{U}, \mathbf{V}$ both unitary; $\mathbf{\Sigma}$ real non-negative diagonal |
| Existence | Not guaranteed (defective matrices have no eigendecomposition) | Always exists for every matrix |
| Spectrum | Eigenvalues $\lambda_i \in \mathbb{C}$ (complex in general) | Singular values $\sigma_i \in \mathbb{R}_{\geq 0}$ (always real, non-negative) |
| Geometric meaning | Directions scaled without rotation (eigenvectors) | Rotation $\to$ scaling $\to$ rotation (unit sphere $\to$ ellipsoid) |
| Relation | Eigenvalues of $\mathbf{A}^H\mathbf{A}$ are $\sigma_i^2$ | For Hermitian $\mathbf{A} \succeq 0$: $\sigma_i = \lambda_i$ (eigenvalues = singular values) |
| Numerical stability | Can be ill-conditioned for non-normal matrices | Always numerically stable (backward-stable algorithms exist) |
| Low-rank approximation | No direct optimality guarantee | Truncated SVD is optimal (Eckart–Young theorem) |
| Telecom application | Covariance eigenanalysis, PCA, stability analysis | MIMO channel decomposition, beamforming, channel estimation |

Example: SVD of a $2 \times 3$ Matrix

Compute the full SVD of
$$\mathbf{A} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix} \in \mathbb{R}^{2 \times 3}.$$
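Working this by hand, $\mathbf{A}\mathbf{A}^T = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ has eigenvalues $3$ and $1$, so $\sigma_1 = \sqrt{3}$ and $\sigma_2 = 1$. A short numpy check of this answer (a verification sketch, not part of the original exercise):

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [0., 1., 1.]])

U, s, Vh = np.linalg.svd(A)        # full SVD: U is 2x2, Vh is 3x3
print(s)                           # [1.7320... 1.0] = [sqrt(3), 1]

# The third right singular vector spans the null space of A,
# proportional to (1, -1, 1):
print(np.allclose(A @ Vh[2], 0))   # True
```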

Historical Note: From Beltrami to Eckart–Young: The History of SVD

1873–1965

The singular value decomposition has a remarkably long and multi-threaded history, with key ideas discovered independently by several mathematicians:

Beltrami (1873). Eugenio Beltrami, an Italian mathematician best known for his work in differential geometry, was the first to consider the decomposition of a bilinear form into canonical form. In his 1873 paper "Sulle funzioni bilineari," he showed that a real bilinear form can be reduced to a sum of products of orthogonal linear forms, essentially the real SVD for square matrices. He identified the singular values as the positive square roots of the eigenvalues of $\mathbf{A}^T\mathbf{A}$.

Jordan (1874). Just one year later, Camille Jordan independently obtained the same canonical decomposition. Jordan, already famous for the Jordan normal form (1870), approached the problem from the theory of bilinear and quadratic forms. His proof, published in the Journal de mathématiques pures et appliquées, used different techniques from Beltrami's but arrived at the same result.

Schmidt (1907). Erhard Schmidt extended the SVD to integral operators (compact operators on function spaces), presaging the modern functional-analytic viewpoint. His work introduced the "Schmidt pairs" (singular vector pairs) and established the convergence of the singular-value expansion.

Eckart and Young (1936). Carl Eckart and Gale Young proved the optimality of the truncated SVD for low-rank approximation. Their 1936 paper "The Approximation of One Matrix by Another of Lower Rank" in Psychometrika showed that the best rank-$k$ approximation (in the Frobenius norm) is obtained by keeping the $k$ largest singular values. This result is now fundamental in data compression, signal processing, and machine learning.

Golub and Kahan (1965). Gene Golub and William Kahan developed the first numerically stable algorithm for computing the SVD, based on bidiagonalization followed by QR-type iterations. This algorithmic breakthrough made the SVD practical for large-scale computation and is the ancestor of the SVD routines in LAPACK and MATLAB.

⚠️ Engineering Note

SVD Computation: Cost and Algorithm Choice

The SVD of an $m \times n$ matrix (with $m \geq n$) costs $O(mn^2)$ flops using the Golub–Kahan bidiagonalization algorithm (LAPACK's dgesvd). For typical MIMO dimensions:

  • $4 \times 4$ MIMO: ~256 flops (negligible)
  • $64 \times 16$ massive MIMO: ~$10^5$ flops (fast)
  • $256 \times 64$ XL-MIMO: ~$10^8$ flops (requires optimization)

When only the top $k$ singular values/vectors are needed (e.g., for rank-$k$ channel approximation), use a truncated SVD (scipy.sparse.linalg.svds), which costs $O(mnk)$, a major savings when $k \ll n$; a usage sketch follows below. In real-time MIMO receivers, the full SVD is computed once per coherence interval ($\sim$1 ms in 5G NR at 30 kHz SCS). At $64 \times 16$ dimensions and a 1 ms update rate, this is well within the computational budget of a modern baseband processor.
Practical Constraints
  • LAPACK dgesvd: $\sim 4mn^2 + 8n^3$ flops for the full SVD

  • 5G NR slot duration at 30 kHz SCS: 0.5 ms (14 OFDM symbols)

  • For real-time: prefer the thin (economy-size) SVD ($\mathbf{U}$ is $m \times n$, not $m \times m$) to save memory
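A usage sketch of the truncated SVD mentioned above (SciPy's svds; the matrix size is illustrative). Note that svds returns the singular values in ascending order:

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(4)
H = rng.standard_normal((256, 64))             # XL-MIMO-sized example

k = 4                                          # only the top-k triplets: O(mnk)
U, s, Vt = svds(H, k=k)
s = s[::-1]                                    # ascending -> descending

print(s)
print(np.linalg.svd(H, compute_uv=False)[:k])  # matches the top-k of the full SVD
```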

Key Takeaway

The SVD is the decomposition for wireless communications. For any channel matrix $\mathbf{H} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^H$: (1) the singular values $\sigma_i$ are the sub-channel gains; (2) the right singular vectors $\mathbf{v}_i$ are the optimal transmit directions (beamformers); (3) the left singular vectors $\mathbf{u}_i$ are the optimal receive combiners; (4) the number of nonzero singular values equals the spatial multiplexing rank; and (5) the condition number $\sigma_1/\sigma_r$ governs sensitivity to noise. Unlike eigendecomposition, SVD works for any matrix, square or rectangular, Hermitian or not, making it the universally applicable tool. Every capacity formula, every beamforming design, and every channel estimation algorithm in MIMO communications is, at its core, an application of SVD.

Why This Matters: MIMO Channel SVD: Parallel Sub-channels

Consider a narrowband MIMO system with $n_t$ transmit and $n_r$ receive antennas:
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$$
where $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ is the channel matrix and $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma_n^2 \mathbf{I})$.

Let $\mathbf{H} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^H$ be the SVD with singular values $\sigma_1 \geq \cdots \geq \sigma_r > 0$ and $r = \operatorname{rank}(\mathbf{H}) \leq \min(n_t, n_r)$.

Precoding and combining. If the transmitter precodes with $\mathbf{V}$ (i.e., transmits $\mathbf{x} = \mathbf{V}\tilde{\mathbf{x}}$) and the receiver applies $\mathbf{U}^H$ (i.e., forms $\tilde{\mathbf{y}} = \mathbf{U}^H \mathbf{y}$), then:
$$\tilde{\mathbf{y}} = \mathbf{U}^H \mathbf{H} \mathbf{V} \tilde{\mathbf{x}} + \mathbf{U}^H \mathbf{n} = \mathbf{\Sigma} \tilde{\mathbf{x}} + \tilde{\mathbf{n}},$$
where $\tilde{\mathbf{n}} = \mathbf{U}^H \mathbf{n}$ has the same distribution as $\mathbf{n}$ (since $\mathbf{U}$ is unitary and preserves the i.i.d. Gaussian distribution).

This decouples into $r$ independent scalar sub-channels:
$$\tilde{y}_i = \sigma_i \tilde{x}_i + \tilde{n}_i, \qquad i = 1, \ldots, r.$$

The capacity with full CSI at both ends is therefore:
$$C = \sum_{i=1}^{r} \log_2\!\left(1 + \frac{p_i \sigma_i^2}{\sigma_n^2}\right) \quad \text{bits/s/Hz},$$
where the power allocation $\{p_i\}$ is determined by water-filling: $p_i = (\mu - \sigma_n^2/\sigma_i^2)^+$ subject to $\sum_i p_i \leq P$.
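A compact end-to-end sketch of this pipeline: SVD precoding/combining, then water-filling computed by bisection on the water level $\mu$. The dimensions, power budget, and noise variance are illustrative assumptions:

```python
import numpy as np

def waterfill(sigma, P, noise_var=1.0):
    """Water-filling: p_i = (mu - noise_var / sigma_i^2)^+ with sum(p) = P."""
    floor = noise_var / sigma**2
    lo, hi = floor.min(), floor.max() + P
    for _ in range(60):                              # bisection on mu
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - floor, 0.0).sum() > P:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.5 * (lo + hi) - floor, 0.0)

rng = np.random.default_rng(5)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
U, s, Vh = np.linalg.svd(H)

# Precoding with V and combining with U^H diagonalizes the channel:
print(np.allclose(U.conj().T @ H @ Vh.conj().T, np.diag(s)))  # True

p = waterfill(s, P=10.0)
C = np.log2(1.0 + p * s**2).sum()                    # capacity at noise_var = 1
print(np.round(p, 3), round(C, 2))
```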

Key insight: The SVD simultaneously diagonalizes the channel, orthogonalizes the noise, and reveals the optimal signaling directions. No other decomposition achieves all three.

See full treatment in MIMO Capacity: Deterministic Channels

Common Mistake: Singular Values Are Not Eigenvalues

Mistake:

A common error is to conflate the singular values of a matrix $\mathbf{A}$ with its eigenvalues. Students often write $\sigma_i(\mathbf{A}) = |\lambda_i(\mathbf{A})|$ or assume that the SVD and eigendecomposition produce the same factors.

Correction:

Singular values and eigenvalues are distinct concepts that coincide only in special cases:

1. Different definitions. Eigenvalues satisfy $\mathbf{A}\mathbf{v} = \lambda \mathbf{v}$ (require $\mathbf{A}$ square). Singular values are $\sigma_i = \sqrt{\lambda_i(\mathbf{A}^H\mathbf{A})}$ (work for any matrix).

2. $\sigma_i \neq |\lambda_i|$ in general. For the matrix $\mathbf{A} = \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}$, both eigenvalues are $0$, yet $\sigma_1 = 2$, $\sigma_2 = 0$.

3. When they do agree.

  • Hermitian PSD matrices: $\mathbf{A} = \mathbf{A}^H \succeq 0$ implies $\sigma_i = \lambda_i$ (eigenvalues are the singular values).
  • Hermitian matrices (general): $\sigma_i = |\lambda_i|$.
  • Normal matrices: $\sigma_i = |\lambda_i|$ (with the $|\lambda_i|$ sorted in decreasing order).
  • Non-normal matrices: No simple relation. The singular values depend on the interaction between $\mathbf{A}$ and $\mathbf{A}^H$, not just on the eigenvalues.

4. Practical consequence. In MIMO, the channel gains are the singular values of $\mathbf{H}$, not its eigenvalues (which may not even exist if $\mathbf{H}$ is rectangular). The eigenvalues of $\mathbf{H}^H\mathbf{H}$ are $\sigma_i^2$, which is a related but different quantity.
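The nilpotent matrix from point 2 makes a quick sanity check in numpy:

```python
import numpy as np

A = np.array([[0., 2.],
              [0., 0.]])
print(np.linalg.eigvals(A))                # [0. 0.]  both eigenvalues vanish
print(np.linalg.svd(A, compute_uv=False))  # [2. 0.]  yet sigma_1 = 2
```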

Singular Value

For a matrix $\mathbf{A} \in \mathbb{C}^{m \times n}$, the $i$-th singular value $\sigma_i$ is the positive square root of the $i$-th largest eigenvalue of $\mathbf{A}^H \mathbf{A}$. Equivalently, $\sigma_i$ is the $i$-th semi-axis length of the ellipsoid obtained by mapping the unit sphere through $\mathbf{A}$. Singular values are always real and non-negative, ordered $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$.

Related: singular vector, SVD Existence Theorem, Eigenvalue and Eigenvector, Frobenius Norm

Frobenius Norm

The Frobenius norm of a matrix $\mathbf{A} \in \mathbb{C}^{m \times n}$ is
$$\|\mathbf{A}\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2} = \sqrt{\operatorname{tr}(\mathbf{A}^H \mathbf{A})} = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2},$$
where $\sigma_1, \ldots, \sigma_r$ are the singular values. The Frobenius norm is unitarily invariant: $\|\mathbf{U}\mathbf{A}\mathbf{V}\|_F = \|\mathbf{A}\|_F$ for any unitary $\mathbf{U}$ and $\mathbf{V}$.

Related: Singular Value, spectral norm, Trace of a Matrix

Condition Number

The condition number of a matrix $\mathbf{A}$ with respect to the 2-norm is
$$\kappa(\mathbf{A}) = \frac{\sigma_1}{\sigma_r},$$
where $\sigma_1$ is the largest singular value and $\sigma_r$ is the smallest nonzero singular value. A matrix with $\kappa \approx 1$ is well-conditioned; $\kappa \gg 1$ indicates ill-conditioning. In MIMO, the condition number of the channel matrix determines how sensitive zero-forcing receivers are to noise amplification.

Related: Singular Value, Why SVD Is the Backbone of MIMO, Null Space and Zero-Forcing in Multiuser MIMO

Quick Check

Let $\mathbf{A} \in \mathbb{C}^{3 \times 2}$ have singular values $\sigma_1 = 5$ and $\sigma_2 = 3$. What is $\|\mathbf{A}\|_F$?

$8$

$\sqrt{34}$

$5$

$15$

Quick Check

A $4 \times 4$ MIMO channel matrix $\mathbf{H}$ has singular values $\sigma_1 = 3$, $\sigma_2 = 2$, $\sigma_3 = 1$, $\sigma_4 = 0.5$. What is the Frobenius-norm error of the best rank-2 approximation $\mathbf{H}_2$?

$1.5$

$\sqrt{1.25}$

$1$

$\sqrt{14.25}$