Exercises
ex-ch01-01
Easy. Find the eigenvalues and eigenvectors of the given matrix. Then verify the spectral decomposition by explicitly multiplying the factors.
The characteristic polynomial is .
After factoring, the eigenvalues are and . Find each eigenvector by solving .
Normalize the eigenvectors and arrange them as columns of . Because is real symmetric, is orthogonal.
Characteristic polynomial
The characteristic polynomial is Therefore the eigenvalues are and .
Eigenvectors
For :
For :
Spectral decomposition and verification
Set Then
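As a numerical sanity check, the decomposition can be reproduced with NumPy. The sketch below is minimal and uses a placeholder symmetric matrix, since the exercise matrix is not reproduced in this text.
import numpy as np
# Placeholder 2x2 symmetric matrix standing in for the exercise matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
# eigh returns ascending eigenvalues and orthonormal eigenvectors (columns of Q)
eigvals, Q = np.linalg.eigh(A)
Lam = np.diag(eigvals)
# Verify A = Q Lam Q^T and that Q is orthogonal
print(np.allclose(Q @ Lam @ Q.T, A))        # True
print(np.allclose(Q.T @ Q, np.eye(2)))      # True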
ex-ch01-02
Easy. Compute the singular value decomposition (SVD) of the given matrix and verify the factorization $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$.
Start with . Its eigenvalues give .
The eigenvalues of are and , so and . The eigenvectors of form .
Compute to get the left singular vectors.
Compute $\mathbf{A}^T\mathbf{A}$ and its eigen-decomposition
Compute left singular vectors
Assemble the SVD and verify
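The construction above can be mirrored numerically. The sketch below uses a placeholder matrix (the exercise matrix is not reproduced here), builds the SVD from the eigendecomposition of $\mathbf{A}^T\mathbf{A}$, and checks the factorization.
import numpy as np
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])  # placeholder matrix
# Eigen-decomposition of A^T A gives V and the squared singular values
w, V = np.linalg.eigh(A.T @ A)     # ascending order
idx = np.argsort(w)[::-1]          # sort descending
sigma = np.sqrt(w[idx])
V = V[:, idx]
# Left singular vectors: u_i = A v_i / sigma_i (assumes all sigma_i are nonzero)
U = (A @ V) / sigma
print(np.allclose(U @ np.diag(sigma) @ V.T, A))  # factorization holds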
ex-ch01-03
Easy. For the given matrices $\mathbf{A}$ and $\mathbf{B}$, compute the Kronecker product $\mathbf{A} \otimes \mathbf{B}$ and verify that the result is $4 \times 4$.
Recall replaces each entry of by the block .
The result is a matrix.
For example, the upper-left block is .
Apply the definition block by block
Write the full $4 \times 4$ matrix
Verify dimensions
is and is , so is . This matches the matrix above.
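A quick check with np.kron confirms the block structure and the $4 \times 4$ dimensions; the $2 \times 2$ matrices below are placeholders, since the exercise matrices are not reproduced here.
import numpy as np
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
K = np.kron(A, B)      # each entry a_ij of A is replaced by the block a_ij * B
print(K.shape)         # (4, 4)
print(np.allclose(K[:2, :2], A[0, 0] * B))  # upper-left block is a_11 * B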
ex-ch01-04
Easy. Apply the Gram–Schmidt process to the given set of vectors to produce an orthonormal basis.
Set . Then project onto and subtract.
After the first step, . The projection of onto has coefficient .
Repeat for the third vector: subtract projections onto both and , then normalize.
First vector
Let . Then
Second vector
Let . Compute the projection coefficient: Subtract the projection: Normalize: , so
Third vector
Let . Compute projections: Subtract both projections: Normalize: , so
Final orthonormal basis
The Gram–Schmidt process yields the orthonormal basis One can verify for all pairs.
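The procedure translates directly into code. The sketch below is a minimal classical Gram–Schmidt implementation with arbitrary example vectors (the exercise vectors are not reproduced here); it checks that the output is orthonormal.
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis (as rows) for the span of the input vectors."""
    basis = []
    for v in vectors:
        w = v.astype(float)
        for q in basis:
            w = w - (q @ v) * q       # subtract the projection of v onto q
        norm = np.linalg.norm(w)
        if norm > 1e-12:              # skip (near-)dependent vectors
            basis.append(w / norm)
    return np.array(basis)

# Example input (placeholder vectors)
V = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
Q = gram_schmidt(V)
print(np.allclose(Q @ Q.T, np.eye(len(Q))))  # rows are orthonormal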
ex-ch01-05
Easy. Given the complex vector $\mathbf{x}$, compute the $\ell_1$-norm $\|\mathbf{x}\|_1$, the $\ell_2$-norm $\|\mathbf{x}\|_2$, and the $\ell_\infty$-norm $\|\mathbf{x}\|_\infty$.
For complex vectors, where is the complex modulus.
Compute , , .
and .
Moduli of the entries
$\ell_1$-norm
$\ell_2$-norm
$\ell_\infty$-norm
$\|\mathbf{x}\|_\infty = \sqrt{2}$. As a check, the standard ordering $\|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_2 \le \|\mathbf{x}\|_1$ holds: $\sqrt{2} \le 2 \le 2 + \sqrt{2}$. $\square$
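These hand computations are easy to cross-check in NumPy. The sketch below uses a placeholder complex vector, since the exercise vector is not reproduced here.
import numpy as np
x = np.array([1.0, 1j, 1.0 + 1j])   # placeholder complex vector
l1 = np.sum(np.abs(x))              # sum of complex moduli
l2 = np.linalg.norm(x)              # sqrt of the sum of squared moduli
linf = np.max(np.abs(x))            # largest modulus
print(l1, l2, linf)                 # ordering: linf <= l2 <= l1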
ex-ch01-06
Medium. Let $\mathbf{A} \in \mathbb{C}^{m \times n}$ have full column rank (i.e., $\operatorname{rank}(\mathbf{A}) = n$). Show that $\mathbf{A}^H\mathbf{A}$ is positive definite.
A matrix $\mathbf{M}$ is positive definite if $\mathbf{x}^H \mathbf{M} \mathbf{x} > 0$ for all nonzero $\mathbf{x}$.
Write and think about when this can be zero.
Full column rank means , so implies .
Hermitian property
First, is Hermitian: So it makes sense to ask whether it is positive definite.
Quadratic form as a squared norm
For any , This shows is at least positive semidefinite.
Strict positivity from full column rank
Now suppose . Since , the null space of is . Therefore , which gives Hence for all nonzero , proving that is positive definite.
ex-ch01-07
Medium. Prove that $\operatorname{tr}(\mathbf{A}^H\mathbf{A}) = \|\mathbf{A}\|_F^2$ for any $\mathbf{A} \in \mathbb{C}^{m \times n}$, where $\|\cdot\|_F$ is the Frobenius norm.
Write out the diagonal entry of using the definition of matrix multiplication.
. This is the squared -norm of column .
Sum the diagonal entries over to get the trace.
Diagonal entries of $\mathbf{A}^H\mathbf{A}$
Let with and . The entry of is Setting :
Take the trace
$\operatorname{tr}(\mathbf{A}^H\mathbf{A}) = \sum_{j=1}^{n} (\mathbf{A}^H\mathbf{A})_{jj} = \sum_{j=1}^{n}\sum_{i=1}^{m} |a_{ij}|^2 = \|\mathbf{A}\|_F^2. \quad \square$
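The identity is easy to verify numerically for a random complex matrix; a minimal check:
import numpy as np
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
lhs = np.trace(A.conj().T @ A).real      # tr(A^H A) is real for any A
rhs = np.linalg.norm(A, 'fro') ** 2      # squared Frobenius norm
print(np.isclose(lhs, rhs))              # True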
ex-ch01-08
Medium. Show that for any unitary matrix $\mathbf{U}$ (i.e., $\mathbf{U}^H\mathbf{U} = \mathbf{I}$), the Euclidean norm is preserved: $\|\mathbf{U}\mathbf{x}\|_2 = \|\mathbf{x}\|_2$ for all $\mathbf{x}$.
Expand .
Use the associativity of the Hermitian transpose: .
Apply and take square roots.
Expand the squared norm
$\|\mathbf{U}\mathbf{x}\|_2^2 = (\mathbf{U}\mathbf{x})^H(\mathbf{U}\mathbf{x}) = \mathbf{x}^H\mathbf{U}^H\mathbf{U}\mathbf{x}$
Apply the unitary property
Since ,
Conclude
We have shown . Since norms are non-negative, taking square roots gives Geometrically, unitary transformations are rotations (possibly with reflections) in — they preserve lengths and angles.
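A random unitary matrix, obtained here from the QR factorization of a random complex matrix, illustrates the length-preservation property:
import numpy as np
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(M)                   # Q factor of a full-rank square matrix is unitary
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))  # True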
ex-ch01-09
Medium. For real vectors $\mathbf{a}, \mathbf{x} \in \mathbb{R}^n$, derive the gradients $\nabla_{\mathbf{x}}(\mathbf{a}^T\mathbf{x}) = \mathbf{a}$ and $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{a}) = \mathbf{a}$.
Write the scalar and differentiate with respect to each .
The gradient is the vector whose -th entry is .
Note since both are the same scalar.
Expand as a sum
Let . This is a scalar-valued function of the vector .
Differentiate component-wise
The -th component of the gradient is since each term with has zero derivative with respect to , and the term contributes .
Assemble the gradient
Stacking all components:
Second identity
Since (both equal the same scalar for real vectors), we immediately have Note: for complex vectors with Wirtinger derivatives, the two expressions differ, but in the real case they coincide.
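A finite-difference check confirms the result; the sketch below compares the numerical gradient of $f(\mathbf{x}) = \mathbf{a}^T\mathbf{x}$ against $\mathbf{a}$ for a random example.
import numpy as np
rng = np.random.default_rng(2)
a = rng.standard_normal(5)
x = rng.standard_normal(5)
f = lambda x: a @ x                      # f(x) = a^T x
eps = 1e-6
# Central finite differences along each coordinate direction
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(5)])
print(np.allclose(grad_fd, a))           # gradient equals a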
ex-ch01-10
Medium. Prove that all eigenvalues of a positive definite matrix are strictly positive.
Let be an eigenvalue with eigenvector , so .
Premultiply both sides by and use the definition of positive definiteness.
Isolate — the Rayleigh quotient.
Set up the eigenvalue equation
Let be an eigenvalue of with corresponding eigenvector :
Premultiply by $\mathbf{v}^H$
$\mathbf{v}^H\mathbf{A}\mathbf{v} = \lambda\,\mathbf{v}^H\mathbf{v} = \lambda\,\|\mathbf{v}\|_2^2$
Apply positive definiteness
Since is positive definite and , we have . Also . Therefore Since was an arbitrary eigenvalue, all eigenvalues of are strictly positive.
Remark. The converse also holds for Hermitian matrices: if all eigenvalues are positive and is Hermitian, then is positive definite (since and for ).
ex-ch01-11
Medium. Let $\mathbf{A} \succeq 0$ and $\mathbf{B} \succeq 0$ (both positive semidefinite). Prove that $\operatorname{tr}(\mathbf{A}\mathbf{B}) \ge 0$.
Since , it has a matrix square root such that .
Use the cyclic property of the trace: .
After rearranging, you should arrive at , which is the trace of a PSD matrix.
Matrix square root
Since , it has an eigendecomposition with . Define where . Then and .
Rewrite using the cyclic property of trace
By the cyclic property of the trace, $\operatorname{tr}(\mathbf{X}\mathbf{Y}\mathbf{Z}) = \operatorname{tr}(\mathbf{Z}\mathbf{X}\mathbf{Y})$. Applying it with $\mathbf{A}\mathbf{B} = \mathbf{A}\mathbf{B}^{1/2}\mathbf{B}^{1/2}$ gives $\operatorname{tr}(\mathbf{A}\mathbf{B}) = \operatorname{tr}(\mathbf{B}^{1/2}\mathbf{A}\mathbf{B}^{1/2})$.
The product $\mathbf{B}^{1/2}\mathbf{A}\mathbf{B}^{1/2}$ is PSD
For any , since . Hence .
Trace of a PSD matrix is non-negative
The trace of a PSD matrix equals the sum of its (non-negative) eigenvalues, hence
ex-ch01-12
Medium. Prove the Eckart–Young theorem: if $\mathbf{A} = \sum_{i=1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^H$ is the SVD of $\mathbf{A}$ (with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$), then the best rank-$k$ approximation in Frobenius norm is the truncated SVD $\mathbf{A}_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^H$; i.e., $\mathbf{A}_k$ minimizes $\|\mathbf{A} - \mathbf{B}\|_F$ over all matrices $\mathbf{B}$ of rank at most $k$.
First show that by orthogonality of the rank-1 terms in the SVD.
For any rank- matrix , its null space has dimension . Consider the intersection of with .
By a dimension-counting argument, this intersection contains a unit vector , and .
Error of the truncated SVD
Write where . Since the sets are orthonormal in the Frobenius inner product (because ), we have
Lower bound for any rank-$k$ approximation
Let be any matrix with . Then .
Consider the subspace , which has dimension .
By the dimension formula for subspace intersections: So there exists a unit vector .
Bound using this vector
Write with . Since : Now compute : since ,
Summing the lower bounds
A more refined version of the argument applies to the sum of the bottom singular values. Consider the dimensional null space of and the subspaces for . By applying the Courant–Fischer minimax characterization, one can show Since the truncated SVD achieves this lower bound (from Step 1), it is optimal:
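A small numerical experiment illustrates the theorem: the truncated SVD matches the tail-singular-value error formula and is never beaten by random rank-$k$ candidates.
import numpy as np
rng = np.random.default_rng(3)
A = rng.standard_normal((8, 6))
k = 2
U, S, Vt = np.linalg.svd(A, full_matrices=False)
Ak = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]               # truncated SVD
err_svd = np.linalg.norm(A - Ak, 'fro')
print(np.isclose(err_svd, np.sqrt(np.sum(S[k:] ** 2))))  # equals sqrt of tail energy
# Random rank-k competitors never do better
for _ in range(100):
    B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))  # rank <= k
    assert np.linalg.norm(A - B, 'fro') >= err_svd - 1e-12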
ex-ch01-13
Medium. (Water-filling) Consider the optimization problem of allocating non-negative powers across parallel channels, where the channel gains are given constants and $P$ is a total power budget. Derive the water-filling solution using the KKT conditions.
Form the Lagrangian .
The KKT stationarity condition gives . Use complementary slackness .
When , we get . The water level is chosen so that .
Form the Lagrangian
The Lagrangian is where is the multiplier for the power constraint and are the multipliers for .
KKT stationarity
Differentiating with respect to and setting to zero: which gives
Complementary slackness
The KKT complementary slackness conditions require for each . There are two cases:
Case 1: . Then , so from (1):
Case 2: . Then and from (1): , i.e., .
Combining both cases with the non-negativity requirement: where .
Water level interpretation
Define the water level . The solution becomes The water level is determined by the power constraint Geometrically, one "pours water" of total volume over steps of height : channels with small (high step) receive no power, while channels with large (low step) receive more power. The water surface level is uniform across all active channels.
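The closed-form structure lends itself to a short numerical routine: bisect on the water level until the power budget is met. The sketch below is a minimal implementation under assumed notation (channel gains $g_i$, so the step heights are $1/g_i$); the function name water_filling is introduced here for illustration.
import numpy as np

def water_filling(g, P, iters=100):
    """Solve max sum log(1 + g_i p_i) s.t. sum p_i <= P, p_i >= 0.
    Returns p_i = max(mu - 1/g_i, 0) with the water level mu found by bisection."""
    g = np.asarray(g, dtype=float)
    lo, hi = 0.0, P + np.max(1.0 / g)          # bracket for the water level
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - 1.0 / g, 0.0).sum() > P:
            hi = mu                            # too much water: lower the level
        else:
            lo = mu                            # budget not exhausted: raise the level
    return np.maximum(lo - 1.0 / g, 0.0)

p = water_filling(g=[2.0, 1.0, 0.25], P=1.0)
print(p, p.sum())   # the weak channel (g = 0.25) receives zero power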
ex-ch01-14
Medium. Show that the function $f(\mathbf{X}) = \log\det\mathbf{X}$ is concave on the cone of positive definite matrices. That is, for any $\mathbf{A} \succ 0$, $\mathbf{B} \succ 0$, and $t \in [0,1]$: $\log\det\bigl(t\mathbf{A} + (1-t)\mathbf{B}\bigr) \ge t\log\det\mathbf{A} + (1-t)\log\det\mathbf{B}$.
Factor out : write .
Use on a product: .
Reduce to proving concavity of where are eigenvalues of , and use the concavity of .
Reduce to a simpler problem
Write , which is positive definite with eigenvalues . Then so
Express in terms of eigenvalues
Since has eigenvalues , the matrix has eigenvalues , so Also, .
Apply concavity of $\log$
By the concavity of on : Summing over :
Combine to get the concavity inequality
Combining the previous two steps, $f\bigl(t\mathbf{A} + (1-t)\mathbf{B}\bigr) \ge t f(\mathbf{A}) + (1-t) f(\mathbf{B})$, which is the desired concavity inequality. $\square$
ex-ch01-15
Medium. Prove the mixed-product property of Kronecker products: if $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, $\mathbf{D}$ are matrices of compatible dimensions (so that $\mathbf{A}\mathbf{B}$ and $\mathbf{C}\mathbf{D}$ exist), then $(\mathbf{A} \otimes \mathbf{C})(\mathbf{B} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{B}) \otimes (\mathbf{C}\mathbf{D})$.
Use the block structure of the Kronecker product: if is , then is the block matrix with -block .
Compute the -block of the product using block matrix multiplication.
You should get , which is exactly the -block of .
Block structure of Kronecker products
Let , , , .
The Kronecker product is the block matrix whose -block (each block being ) is : Similarly, for and .
Block multiplication
The -block of the product is obtained by multiplying the -th block row of with the -th block column of :
Identify as the Kronecker product
Recall that . Therefore the -block above equals , which is precisely the -block of .
Since the block structures match for all :
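A randomized check of the mixed-product property, using arbitrary compatible dimensions:
import numpy as np
rng = np.random.default_rng(4)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((5, 2))
D = rng.standard_normal((2, 6))
lhs = np.kron(A, C) @ np.kron(B, D)
rhs = np.kron(A @ B, C @ D)
print(np.allclose(lhs, rhs))   # True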
ex-ch01-16
Hard. Prove the spectral theorem for Hermitian matrices from scratch: every Hermitian matrix $\mathbf{A} \in \mathbb{C}^{n \times n}$ (i.e., $\mathbf{A}^H = \mathbf{A}$) can be written as $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$, where $\mathbf{U}$ is unitary and $\boldsymbol{\Lambda}$ is real diagonal. Use induction on the matrix size $n$.
Base case : a Hermitian matrix is just a real scalar.
For the inductive step, first show that all eigenvalues of a Hermitian matrix are real (use ). Then let be an eigenvalue/eigenvector pair.
Extend to an orthonormal basis and apply a unitary change of basis to reduce to a block form with an Hermitian sub-block. Then apply the induction hypothesis.
Eigenvalues are real
Let be an eigenvalue with eigenvector : . Then Taking the conjugate: where we used . Thus , and since , we conclude , i.e., .
Existence of at least one eigenvalue
Every matrix over has at least one eigenvalue (the characteristic polynomial of degree has at least one root in by the fundamental theorem of algebra). Combined with the result above, has at least one real eigenvalue with a unit eigenvector .
Base case ($n = 1$)
If , then with , so . We can write with (trivially unitary) and .
Inductive step: unitary reduction
Assume the theorem holds for all Hermitian matrices. Let be Hermitian with eigenvalue and unit eigenvector .
Extend to an orthonormal basis of (e.g., via Gram–Schmidt). Form the unitary matrix Then where , (since ), and .
Since :
The sub-block is Hermitian
is and Hermitian:
Apply the induction hypothesis
By the induction hypothesis, where is unitary and is real diagonal. Therefore Define (unitary) and (real diagonal). Then where is unitary (product of unitary matrices is unitary).
ex-ch01-17
Hard. Prove Hadamard's inequality using Fischer's inequality. Specifically:
(a) First prove Fischer's inequality: if $\mathbf{A} \succ 0$ is partitioned as $\mathbf{A} = \begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{12}^H & \mathbf{A}_{22} \end{pmatrix}$, then $\det(\mathbf{A}) \le \det(\mathbf{A}_{11})\,\det(\mathbf{A}_{22})$.
(b) Then apply Fischer's inequality iteratively to obtain Hadamard's inequality: $\det(\mathbf{A}) \le \prod_{i=1}^{n} a_{ii}$.
For Fischer's inequality, use the Schur complement: .
The Schur complement satisfies , so .
For part (b), repeatedly split off one row/column at a time.
(a) Schur complement factorization
Since , the leading principal sub-block is also positive definite, hence invertible. The Schur complement of in is The block LDU factorization gives Taking determinants (the outer block-triangular matrices have unit determinant):
(a) Bounding the Schur complement
Since , we can write (using since is Hermitian). Therefore Since implies , and , all eigenvalues of lie in . Therefore
(a) Fischer's inequality
Combining with the bound on :
(b) Iterative application: Hadamard's inequality
We apply Fischer's inequality iteratively. Partition by splitting off the first row and column: where is the lower-right sub-matrix. By Fischer's inequality: Now apply Fischer's inequality to by splitting off its first row/column (which corresponds to ): Continuing inductively: This is Hadamard's inequality. Equality holds if and only if is diagonal.
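A quick numerical check of Hadamard's inequality on a random positive definite matrix:
import numpy as np
rng = np.random.default_rng(5)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)            # positive definite by construction
print(np.linalg.det(A) <= np.prod(np.diag(A)))   # Hadamard: det(A) <= product of diagonal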
ex-ch01-18
HardLet be a MIMO channel matrix with SVD . Prove that the MIMO capacity is achieved by where with , reducing the capacity to where the satisfy the water-filling solution from Exercise 13 with .
Substitute the SVD into the objective and use where .
Use Hadamard's inequality: , with equality when is diagonal.
The constraint is preserved under unitary conjugation, and if and only if .
Substitute the SVD
Let where and are unitary, and has diagonal entries . Substituting: where .
Simplify the objective using $\det(\mathbf{I} + \mathbf{A}\mathbf{B}) = \det(\mathbf{I} + \mathbf{B}\mathbf{A})$
Constraint preservation
Since is unitary, the map is a bijection on the PSD cone. Moreover:
- $\operatorname{tr}(\widetilde{\mathbf{Q}}) = \operatorname{tr}(\mathbf{V}^H\mathbf{Q}\mathbf{V}) = \operatorname{tr}(\mathbf{Q})$,
- $\widetilde{\mathbf{Q}} \succeq 0$ if and only if $\mathbf{Q} \succeq 0$.
So the optimization over is equivalent to optimizing over :
Apply Hadamard's inequality
Let . The matrix is positive definite. Its diagonal entry (for ) is , and for it is . By Hadamard's inequality: Equality holds when is diagonal, which occurs when is diagonal (since is diagonal).
Diagonal $\widetilde{\mathbf{Q}}$ is optimal
Furthermore, for any feasible $\widetilde{\mathbf{Q}}$ with diagonal entries $p_1, \dots, p_r$, the diagonal matrix $\operatorname{diag}(p_1, \dots, p_r)$ satisfies:
- $p_i \ge 0$ (diagonal entries of a PSD matrix are non-negative),
- the trace is unchanged: $\sum_i p_i = \operatorname{tr}(\widetilde{\mathbf{Q}})$,
- the objective is at least as large, by the Hadamard bound above.
So we may restrict to diagonal without loss of optimality. Setting :
Recover the optimal $\mathbf{Q}$ and water-filling
The optimal maps back to This means the transmitter beamforms along the right singular vectors of the channel.
The scalar optimization over is exactly the water-filling problem from Exercise 13 with . The solution is where the water level is chosen so that . The resulting capacity is
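As a numerical companion, the sketch below combines the SVD step with water-filling (bisection on the water level, as in the Exercise 13 sketch) to evaluate the capacity. The unit-noise normalization sigma2 = 1 and the helper structure are assumptions, since the exercise's exact constants are not reproduced here.
import numpy as np

def mimo_capacity(H, P, sigma2=1.0):
    """Capacity (bits per channel use) under a total power budget P,
    assuming i.i.d. Gaussian noise of variance sigma2 (normalization assumed)."""
    s = np.linalg.svd(H, compute_uv=False)      # singular values of the channel
    g = (s[s > 1e-12] ** 2) / sigma2            # per-mode gains sigma_i^2 / sigma^2
    lo, hi = 0.0, P + np.max(1.0 / g)           # bisect on the water level mu
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - 1.0 / g, 0.0).sum() > P:
            hi = mu
        else:
            lo = mu
    p = np.maximum(lo - 1.0 / g, 0.0)           # water-filling powers per mode
    return np.sum(np.log2(1.0 + g * p))

rng = np.random.default_rng(6)
H = rng.standard_normal((4, 4))
print(mimo_capacity(H, P=10.0))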
ex-ch01-19
Challenge. Power Iteration Algorithm.
Implement the power iteration algorithm in Python to find the dominant eigenvalue and eigenvector of a real symmetric matrix.
- Generate a random $5 \times 5$ symmetric matrix $\mathbf{A} = \mathbf{M} + \mathbf{M}^T$, where $\mathbf{M}$ has i.i.d. standard normal entries.
- Starting from a random unit vector $\mathbf{q}^{(0)}$, iterate $\mathbf{q}^{(k+1)} = \mathbf{A}\mathbf{q}^{(k)} / \|\mathbf{A}\mathbf{q}^{(k)}\|_2$.
- Plot $|\lambda^{(k)} - \lambda_1|$ versus $k$ on a log scale, where $\lambda^{(k)} = (\mathbf{q}^{(k)})^T\mathbf{A}\mathbf{q}^{(k)}$ and $\lambda_1$ is the true dominant eigenvalue (from numpy.linalg.eigh).
- Verify that the convergence rate is $O\bigl((|\lambda_2|/|\lambda_1|)^{2k}\bigr)$.
Reference implementation: ch01/python/power_iteration.py.
Use numpy.random.randn(5,5) for , then symmetrize. Use numpy.linalg.eigh for the ground-truth eigenvalues.
The Rayleigh quotient converges to at rate (quadratic in the eigenvalue ratio per iteration for the Rayleigh quotient, versus linear for the eigenvector).
On a semilog plot, the error should appear as an approximately straight line with slope .
Algorithm description
The power iteration exploits the fact that for a symmetric matrix with eigenvalues and corresponding orthonormal eigenvectors , the iteration converges to as long as has a nonzero component along .
Expanding with : Since for , the sum vanishes as .
Convergence rate analysis
The error in the eigenvector direction decays as .
For the Rayleigh quotient, the convergence is faster. Let where . Then So the eigenvalue estimate converges at rate — twice as fast on a log scale.
Implementation outline
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
M = np.random.randn(5, 5)
A = M + M.T # symmetric
# Ground truth
eigvals, eigvecs = np.linalg.eigh(A)
lambda1 = eigvals[np.argmax(np.abs(eigvals))]
# Power iteration
q = np.random.randn(5)
q = q / np.linalg.norm(q)
errors = []
n_iter = 50
for k in range(n_iter):
    z = A @ q
    q = z / np.linalg.norm(z)
    lam_k = q.T @ A @ q           # Rayleigh quotient estimate of lambda_1
    errors.append(abs(lam_k - lambda1))
# Plot
plt.semilogy(range(n_iter), errors, 'b-o', markersize=3)
plt.xlabel('Iteration k')
plt.ylabel(r'$|\lambda^{(k)} - \lambda_1|$')
plt.title('Power Iteration Convergence')
# Reference slope
ratio = np.sort(np.abs(eigvals))[-2] / np.max(np.abs(eigvals))
plt.semilogy(range(n_iter),
             errors[0] * ratio**(2*np.arange(n_iter)),
             'r--', label=rf'$(\lambda_2/\lambda_1)^{{2k}}$, ratio={ratio:.3f}')
plt.legend()
plt.grid(True)
plt.savefig('power_iteration_convergence.png', dpi=150)
plt.show()
Expected output
The semilog plot should show the error decreasing approximately linearly on the log scale, with slope per iteration. The red dashed reference line should closely track the blue error curve, confirming the theoretical convergence rate.
If the eigenvalue ratio is close to 1 (eigenvalues nearly equal in magnitude), convergence will be slow. If the ratio is small, convergence is rapid — often reaching machine precision in fewer than 20 iterations.
The complete implementation is available at ch01/python/power_iteration.py.
ex-ch01-20
Challenge. SVD-Based Image Compression.
Implement SVD-based image compression in Python.
- Load a grayscale image as a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$.
- Compute the full SVD: $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$.
- Form rank-$k$ approximations $\mathbf{A}_k$ for $k \in \{1, 5, 10, 20, 50\}$.
- Display the reconstructed images side by side.
- Plot the relative reconstruction error $\|\mathbf{A} - \mathbf{A}_k\|_F / \|\mathbf{A}\|_F$ versus $k$ and verify that it equals $\sqrt{\sum_{i > k} \sigma_i^2 / \sum_i \sigma_i^2}$.
- Also plot the compression ratio: the rank-$k$ approximation stores $k(m + n + 1)$ numbers versus $mn$ for the full image.
Reference implementation: ch01/python/svd_compression.py.
Use numpy.linalg.svd with full_matrices=False for the economy SVD. The reconstruction is U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :].
The Frobenius norm error follows directly from the SVD: $\|\mathbf{A} - \mathbf{A}_k\|_F^2 = \sum_{i > k} \sigma_i^2$ (by the Eckart–Young theorem from Exercise 12).
For a typical $512 \times 512$ image, rank $k = 50$ stores $51{,}250$ numbers versus $262{,}144$ for the full image (about $5\times$ compression) while retaining most visual quality.
Mathematical foundation
By the Eckart–Young theorem (Exercise 12), the best rank- approximation in Frobenius norm is the truncated SVD: The reconstruction error is and the relative error is
Storage analysis
The full image requires numbers. The rank- approximation stores:
- : numbers,
- : numbers,
- : numbers.
Total: . The compression ratio is
Implementation outline
import numpy as np
import matplotlib.pyplot as plt
from skimage import data, color
# Load grayscale image
img = color.rgb2gray(data.astronaut()) # 512 x 512
A = img.astype(np.float64)
m, n = A.shape
# Economy SVD
U, S, Vt = np.linalg.svd(A, full_matrices=False)
# Rank-k approximations
ks = [1, 5, 10, 20, 50]
fig, axes = plt.subplots(1, len(ks)+1, figsize=(18, 3))
axes[0].imshow(A, cmap='gray')
axes[0].set_title('Original')
axes[0].axis('off')
for idx, k in enumerate(ks):
    Ak = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
    axes[idx+1].imshow(np.clip(Ak, 0, 1), cmap='gray')
    ratio = m*n / (k*(m+n+1))
    axes[idx+1].set_title(f'Rank {k}\n({ratio:.1f}x)')
    axes[idx+1].axis('off')
plt.tight_layout()
plt.savefig('svd_compression_images.png', dpi=150)
plt.show()
# Error plot
r = len(S)
total_energy = np.sum(S**2)
rel_errors_theory = []
rel_errors_actual = []
k_range = range(1, r+1)
for k in k_range:
    # Theoretical error from the tail singular values
    err_theory = np.sqrt(np.sum(S[k:]**2) / total_energy)
    rel_errors_theory.append(err_theory)
    # Actual reconstruction error
    Ak = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
    err_actual = np.linalg.norm(A - Ak, 'fro') / np.linalg.norm(A, 'fro')
    rel_errors_actual.append(err_actual)
plt.figure(figsize=(8, 5))
plt.semilogy(k_range, rel_errors_theory, 'b-', label='Theory')
plt.semilogy(k_range, rel_errors_actual, 'r--', label='Actual')
plt.xlabel('Rank k')
plt.ylabel(r'$\|A - A_k\|_F / \|A\|_F$')
plt.title('SVD Compression: Relative Error vs Rank')
plt.legend()
plt.grid(True)
plt.savefig('svd_compression_error.png', dpi=150)
plt.show()
Expected output and verification
Visual output: The rank-1 image captures only the gross intensity pattern. Rank 5 shows rough structure. Rank 10 begins to show recognizable features. Rank 20 is a reasonable approximation with some blurring. Rank 50 is nearly indistinguishable from the original for most images.
Error plot: The theory and actual curves should coincide to machine precision, confirming the identity
Compression ratios (for a $512 \times 512$ image):
| Rank $k$ | Numbers stored | Compression ratio |
|---|---|---|
| 1 | 1,025 | 255.8 |
| 5 | 5,125 | 51.1 |
| 10 | 10,250 | 25.6 |
| 20 | 20,500 | 12.8 |
| 50 | 51,250 | 5.1 |
The rapid decay of singular values in natural images explains why SVD-based low-rank approximation achieves effective compression — most of the image "energy" (Frobenius norm) is captured by a small number of singular values.
The complete implementation is available at ch01/python/svd_compression.py.