Exercises
ex-ch01-01
Easy. Find the eigenvalues and eigenvectors of the given matrix. Then verify the spectral decomposition by explicitly multiplying the factors.
The characteristic polynomial is .
After factoring, the eigenvalues are and . Find each eigenvector by solving .
Normalize the eigenvectors and arrange them as columns of . Because is real symmetric, is orthogonal.
Characteristic polynomial
The characteristic polynomial is Therefore the eigenvalues are and .
Eigenvectors
For :
For :
Spectral decomposition and verification
Set Then
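As a numerical sanity check, the decomposition can be reproduced with NumPy. The sketch below is minimal and uses a placeholder symmetric matrix, since the exercise matrix is not reproduced in this text.
import numpy as np
# Placeholder 2x2 symmetric matrix standing in for the exercise matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
# eigh returns ascending eigenvalues and orthonormal eigenvectors (columns of Q)
eigvals, Q = np.linalg.eigh(A)
Lam = np.diag(eigvals)
# Verify A = Q Lam Q^T and that Q is orthogonal
print(np.allclose(Q @ Lam @ Q.T, A))        # True
print(np.allclose(Q.T @ Q, np.eye(2)))      # True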
ex-ch01-02
Easy. Compute the singular value decomposition (SVD) of the given matrix and verify the factorization $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$.
Start with . Its eigenvalues give .
The eigenvalues of are and , so and . The eigenvectors of form .
Compute to get the left singular vectors.
Compute $\mathbf{A}^T\mathbf{A}$ and its eigen-decomposition
Compute left singular vectors
Assemble the SVD and verify
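The construction above can be mirrored numerically. The sketch below uses a placeholder matrix (the exercise matrix is not reproduced here), builds the SVD from the eigendecomposition of $\mathbf{A}^T\mathbf{A}$, and checks the factorization.
import numpy as np
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])  # placeholder matrix
# Eigen-decomposition of A^T A gives V and the squared singular values
w, V = np.linalg.eigh(A.T @ A)     # ascending order
idx = np.argsort(w)[::-1]          # sort descending
sigma = np.sqrt(w[idx])
V = V[:, idx]
# Left singular vectors: u_i = A v_i / sigma_i (assumes all sigma_i are nonzero)
U = (A @ V) / sigma
print(np.allclose(U @ np.diag(sigma) @ V.T, A))  # factorization holds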
ex-ch01-03
Easy. For the given matrices $\mathbf{A}$ and $\mathbf{B}$, compute the Kronecker product $\mathbf{A} \otimes \mathbf{B}$ and verify that the result is $4 \times 4$.
Recall replaces each entry of by the block .
The result is a matrix.
For example, the upper-left block is .
Apply the definition block by block
Write the full $4 \times 4$ matrix
Verify dimensions
is and is , so is . This matches the matrix above.
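A quick check with np.kron confirms the block structure and the $4 \times 4$ dimensions; the $2 \times 2$ matrices below are placeholders, since the exercise matrices are not reproduced here.
import numpy as np
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
K = np.kron(A, B)      # each entry a_ij of A is replaced by the block a_ij * B
print(K.shape)         # (4, 4)
print(np.allclose(K[:2, :2], A[0, 0] * B))  # upper-left block is a_11 * B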
ex-ch01-04
Easy. Apply the Gram–Schmidt process to the given set of vectors to produce an orthonormal basis.
Set . Then project onto and subtract.
After the first step, . The projection of onto has coefficient .
Repeat for the third vector: subtract projections onto both and , then normalize.
First vector
Let . Then
Second vector
Let . Compute the projection coefficient: Subtract the projection: Normalize: , so
Third vector
Let . Compute projections: Subtract both projections: Normalize: , so
Final orthonormal basis
The Gram–Schmidt process yields the orthonormal basis One can verify for all pairs.
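The procedure translates directly into code. The sketch below is a minimal classical Gram–Schmidt implementation with arbitrary example vectors (the exercise vectors are not reproduced here); it checks that the output is orthonormal.
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis (as rows) for the span of the input vectors."""
    basis = []
    for v in vectors:
        w = v.astype(float)
        for q in basis:
            w = w - (q @ v) * q       # subtract the projection of v onto q
        norm = np.linalg.norm(w)
        if norm > 1e-12:              # skip (near-)dependent vectors
            basis.append(w / norm)
    return np.array(basis)

# Example input (placeholder vectors)
V = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
Q = gram_schmidt(V)
print(np.allclose(Q @ Q.T, np.eye(len(Q))))  # rows are orthonormal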
ex-ch01-05
Easy. Given the complex vector $\mathbf{x}$, compute the $\ell_1$-norm $\|\mathbf{x}\|_1$, the $\ell_2$-norm $\|\mathbf{x}\|_2$, and the $\ell_\infty$-norm $\|\mathbf{x}\|_\infty$.
For complex vectors, where is the complex modulus.
Compute , , .
and .
Moduli of the entries
$\ell_1$-norm
$\ell_2$-norm
$\ell_\infty$-norm
$\|\mathbf{x}\|_\infty = \sqrt{2}$. As a check, the standard ordering $\|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_2 \le \|\mathbf{x}\|_1$ holds: $\sqrt{2} \le 2 \le 2 + \sqrt{2}$. $\square$
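These hand computations are easy to cross-check in NumPy. The sketch below uses a placeholder complex vector, since the exercise vector is not reproduced here.
import numpy as np
x = np.array([1.0, 1j, 1.0 + 1j])   # placeholder complex vector
l1 = np.sum(np.abs(x))              # sum of complex moduli
l2 = np.linalg.norm(x)              # sqrt of the sum of squared moduli
linf = np.max(np.abs(x))            # largest modulus
print(l1, l2, linf)                 # ordering: linf <= l2 <= l1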
ex-ch01-06
Medium. Let $\mathbf{A} \in \mathbb{C}^{m \times n}$ have full column rank (i.e., $\operatorname{rank}(\mathbf{A}) = n$). Show that $\mathbf{A}^H\mathbf{A}$ is positive definite.
A matrix $\mathbf{M}$ is positive definite if $\mathbf{x}^H \mathbf{M} \mathbf{x} > 0$ for all nonzero $\mathbf{x}$.
Write and think about when this can be zero.
Full column rank means , so implies .
Hermitian property
First, is Hermitian: So it makes sense to ask whether it is positive definite.
Quadratic form as a squared norm
For any , This shows is at least positive semidefinite.
Strict positivity from full column rank
Now suppose . Since , the null space of is . Therefore , which gives Hence for all nonzero , proving that is positive definite.
ex-ch01-07
Medium. Prove that $\operatorname{tr}(\mathbf{A}^H\mathbf{A}) = \|\mathbf{A}\|_F^2$ for any $\mathbf{A} \in \mathbb{C}^{m \times n}$, where $\|\cdot\|_F$ is the Frobenius norm.
Write out the diagonal entry of using the definition of matrix multiplication.
. This is the squared -norm of column .
Sum the diagonal entries over to get the trace.
Diagonal entries of $\mathbf{A}^H\mathbf{A}$
Let with and . The entry of is Setting :
Take the trace
$\operatorname{tr}(\mathbf{A}^H\mathbf{A}) = \sum_{j=1}^{n} (\mathbf{A}^H\mathbf{A})_{jj} = \sum_{j=1}^{n}\sum_{i=1}^{m} |a_{ij}|^2 = \|\mathbf{A}\|_F^2. \quad \square$
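The identity is easy to verify numerically for a random complex matrix; a minimal check:
import numpy as np
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
lhs = np.trace(A.conj().T @ A).real      # tr(A^H A) is real for any A
rhs = np.linalg.norm(A, 'fro') ** 2      # squared Frobenius norm
print(np.isclose(lhs, rhs))              # True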
ex-ch01-08
Medium. Show that for any unitary matrix $\mathbf{U}$ (i.e., $\mathbf{U}^H\mathbf{U} = \mathbf{I}$), the Euclidean norm is preserved: $\|\mathbf{U}\mathbf{x}\|_2 = \|\mathbf{x}\|_2$ for all $\mathbf{x}$.
Expand .
Use the associativity of the Hermitian transpose: .
Apply and take square roots.
Expand the squared norm
$\|\mathbf{U}\mathbf{x}\|_2^2 = (\mathbf{U}\mathbf{x})^H(\mathbf{U}\mathbf{x}) = \mathbf{x}^H\mathbf{U}^H\mathbf{U}\mathbf{x}$
Apply the unitary property
Since ,
Conclude
We have shown . Since norms are non-negative, taking square roots gives Geometrically, unitary transformations are rotations (possibly with reflections) in — they preserve lengths and angles.
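A random unitary matrix, obtained here from the QR factorization of a random complex matrix, illustrates the length-preservation property:
import numpy as np
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(M)                   # Q factor of a full-rank square matrix is unitary
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))  # True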
ex-ch01-09
Medium. For real vectors $\mathbf{a}, \mathbf{x} \in \mathbb{R}^n$, derive the gradients $\nabla_{\mathbf{x}}(\mathbf{a}^T\mathbf{x}) = \mathbf{a}$ and $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{a}) = \mathbf{a}$.
Write the scalar and differentiate with respect to each .
The gradient is the vector whose -th entry is .
Note since both are the same scalar.
Expand as a sum
Let . This is a scalar-valued function of the vector .
Differentiate component-wise
The -th component of the gradient is since each term with has zero derivative with respect to , and the term contributes .
Assemble the gradient
Stacking all components:
Second identity
Since (both equal the same scalar for real vectors), we immediately have Note: for complex vectors with Wirtinger derivatives, the two expressions differ, but in the real case they coincide.
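A finite-difference check confirms the result; the sketch below compares the numerical gradient of $f(\mathbf{x}) = \mathbf{a}^T\mathbf{x}$ against $\mathbf{a}$ for a random example.
import numpy as np
rng = np.random.default_rng(2)
a = rng.standard_normal(5)
x = rng.standard_normal(5)
f = lambda x: a @ x                      # f(x) = a^T x
eps = 1e-6
# Central finite differences along each coordinate direction
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(5)])
print(np.allclose(grad_fd, a))           # gradient equals a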
ex-ch01-10
Medium. Prove that all eigenvalues of a positive definite matrix are strictly positive.
Let be an eigenvalue with eigenvector , so .
Premultiply both sides by and use the definition of positive definiteness.
Isolate — the Rayleigh quotient.
Set up the eigenvalue equation
Let be an eigenvalue of with corresponding eigenvector :
Premultiply by $\mathbf{v}^H$
$\mathbf{v}^H\mathbf{A}\mathbf{v} = \lambda\,\mathbf{v}^H\mathbf{v} = \lambda\,\|\mathbf{v}\|_2^2$
Apply positive definiteness
Since is positive definite and , we have . Also . Therefore Since was an arbitrary eigenvalue, all eigenvalues of are strictly positive.
Remark. The converse also holds for Hermitian matrices: if all eigenvalues are positive and is Hermitian, then is positive definite (since and for ).
ex-ch01-11
Medium. Let $\mathbf{A} \succeq 0$ and $\mathbf{B} \succeq 0$ (both positive semidefinite). Prove that $\operatorname{tr}(\mathbf{A}\mathbf{B}) \ge 0$.
Since , it has a matrix square root such that .
Use the cyclic property of the trace: .
After rearranging, you should arrive at , which is the trace of a PSD matrix.
Matrix square root
Since , it has an eigendecomposition with . Define where . Then and .
Rewrite using the cyclic property of trace
By the cyclic property of the trace, $\operatorname{tr}(\mathbf{X}\mathbf{Y}\mathbf{Z}) = \operatorname{tr}(\mathbf{Z}\mathbf{X}\mathbf{Y})$. Applying it with $\mathbf{A}\mathbf{B} = \mathbf{A}\mathbf{B}^{1/2}\mathbf{B}^{1/2}$ gives $\operatorname{tr}(\mathbf{A}\mathbf{B}) = \operatorname{tr}(\mathbf{B}^{1/2}\mathbf{A}\mathbf{B}^{1/2})$.
The product $\mathbf{B}^{1/2}\mathbf{A}\mathbf{B}^{1/2}$ is PSD
For any , since . Hence .
Trace of a PSD matrix is non-negative
The trace of a PSD matrix equals the sum of its (non-negative) eigenvalues, hence
ex-ch01-12
Medium. Prove the Eckart–Young theorem: if $\mathbf{A} = \sum_{i=1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^H$ is the SVD of $\mathbf{A}$ (with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$), then the best rank-$k$ approximation in Frobenius norm is the truncated SVD $\mathbf{A}_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^H$; i.e., $\mathbf{A}_k$ minimizes $\|\mathbf{A} - \mathbf{B}\|_F$ over all matrices $\mathbf{B}$ of rank at most $k$.
First show that by orthogonality of the rank-1 terms in the SVD.
For any rank- matrix , its null space has dimension . Consider the intersection of with .
By a dimension-counting argument, this intersection contains a unit vector , and .
Error of the truncated SVD
Write where . Since the sets are orthonormal in the Frobenius inner product (because ), we have
Lower bound for any rank-$k$ approximation
Let be any matrix with . Then .
Consider the subspace , which has dimension .
By the dimension formula for subspace intersections: So there exists a unit vector .
Bound using this vector
Write with . Since : Now compute : since ,
Summing the lower bounds
A more refined version of the argument applies to the sum of the bottom singular values. Consider the dimensional null space of and the subspaces for . By applying the Courant–Fischer minimax characterization, one can show Since the truncated SVD achieves this lower bound (from Step 1), it is optimal:
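A small numerical experiment illustrates the theorem: the truncated SVD matches the tail-singular-value error formula and is never beaten by random rank-$k$ candidates.
import numpy as np
rng = np.random.default_rng(3)
A = rng.standard_normal((8, 6))
k = 2
U, S, Vt = np.linalg.svd(A, full_matrices=False)
Ak = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]               # truncated SVD
err_svd = np.linalg.norm(A - Ak, 'fro')
print(np.isclose(err_svd, np.sqrt(np.sum(S[k:] ** 2))))  # equals sqrt of tail energy
# Random rank-k competitors never do better
for _ in range(100):
    B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))  # rank <= k
    assert np.linalg.norm(A - B, 'fro') >= err_svd - 1e-12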
ex-ch01-13
Medium. (Water-filling) Consider the optimization problem of allocating non-negative powers across parallel channels, where the channel gains are given constants and $P$ is a total power budget. Derive the water-filling solution using the KKT conditions.
Form the Lagrangian .
The KKT stationarity condition gives . Use complementary slackness .
When , we get . The water level is chosen so that .
Form the Lagrangian
The Lagrangian is where is the multiplier for the power constraint and are the multipliers for .
KKT stationarity
Differentiating with respect to and setting to zero: which gives
Complementary slackness
The KKT complementary slackness conditions require for each . There are two cases:
Case 1: . Then , so from (1):
Case 2: . Then and from (1): , i.e., .
Combining both cases with the non-negativity requirement: where .
Water level interpretation
Define the water level . The solution becomes The water level is determined by the power constraint Geometrically, one "pours water" of total volume over steps of height : channels with small (high step) receive no power, while channels with large (low step) receive more power. The water surface level is uniform across all active channels.
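The closed-form structure lends itself to a short numerical routine: bisect on the water level until the power budget is met. The sketch below is a minimal implementation under assumed notation (channel gains $g_i$, so the step heights are $1/g_i$); the function name water_filling is introduced here for illustration.
import numpy as np

def water_filling(g, P, iters=100):
    """Solve max sum log(1 + g_i p_i) s.t. sum p_i <= P, p_i >= 0.
    Returns p_i = max(mu - 1/g_i, 0) with the water level mu found by bisection."""
    g = np.asarray(g, dtype=float)
    lo, hi = 0.0, P + np.max(1.0 / g)          # bracket for the water level
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - 1.0 / g, 0.0).sum() > P:
            hi = mu                            # too much water: lower the level
        else:
            lo = mu                            # budget not exhausted: raise the level
    return np.maximum(lo - 1.0 / g, 0.0)

p = water_filling(g=[2.0, 1.0, 0.25], P=1.0)
print(p, p.sum())   # the weak channel (g = 0.25) receives zero power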
ex-ch01-14
Medium. Show that the function $f(\mathbf{X}) = \log\det\mathbf{X}$ is concave on the cone of positive definite matrices. That is, for any $\mathbf{A} \succ 0$, $\mathbf{B} \succ 0$, and $t \in [0,1]$: $\log\det\bigl(t\mathbf{A} + (1-t)\mathbf{B}\bigr) \ge t\log\det\mathbf{A} + (1-t)\log\det\mathbf{B}$.
Factor out : write .
Use on a product: .
Reduce to proving concavity of where are eigenvalues of , and use the concavity of .
Reduce to a simpler problem
Write , which is positive definite with eigenvalues . Then so
Express in terms of eigenvalues
Since has eigenvalues , the matrix has eigenvalues , so Also, .
Apply concavity of $\log$
By the concavity of on : Summing over :
Combine to get the concavity inequality
Combining the previous two steps, $f\bigl(t\mathbf{A} + (1-t)\mathbf{B}\bigr) \ge t f(\mathbf{A}) + (1-t) f(\mathbf{B})$, which is the desired concavity inequality. $\square$
ex-ch01-15
Medium. Prove the mixed-product property of Kronecker products: if $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, $\mathbf{D}$ are matrices of compatible dimensions (so that $\mathbf{A}\mathbf{B}$ and $\mathbf{C}\mathbf{D}$ exist), then $(\mathbf{A} \otimes \mathbf{C})(\mathbf{B} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{B}) \otimes (\mathbf{C}\mathbf{D})$.
Use the block structure of the Kronecker product: if is , then is the block matrix with -block .
Compute the -block of the product using block matrix multiplication.
You should get , which is exactly the -block of .
Block structure of Kronecker products
Let , , , .
The Kronecker product is the block matrix whose -block (each block being ) is : Similarly, for and .
Block multiplication
The -block of the product is obtained by multiplying the -th block row of with the -th block column of :
Identify as the Kronecker product
Recall that . Therefore the -block above equals , which is precisely the -block of .
Since the block structures match for all :
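A randomized check of the mixed-product property, using arbitrary compatible dimensions:
import numpy as np
rng = np.random.default_rng(4)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((5, 2))
D = rng.standard_normal((2, 6))
lhs = np.kron(A, C) @ np.kron(B, D)
rhs = np.kron(A @ B, C @ D)
print(np.allclose(lhs, rhs))   # True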
ex-ch01-16
Hard. Prove the spectral theorem for Hermitian matrices from scratch: every Hermitian matrix $\mathbf{A} \in \mathbb{C}^{n \times n}$ (i.e., $\mathbf{A}^H = \mathbf{A}$) can be written as $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$, where $\mathbf{U}$ is unitary and $\boldsymbol{\Lambda}$ is real diagonal. Use induction on the matrix size $n$.
Base case : a Hermitian matrix is just a real scalar.
For the inductive step, first show that all eigenvalues of a Hermitian matrix are real (use ). Then let be an eigenvalue/eigenvector pair.
Extend to an orthonormal basis and apply a unitary change of basis to reduce to a block form with an Hermitian sub-block. Then apply the induction hypothesis.
Eigenvalues are real
Let be an eigenvalue with eigenvector : . Then Taking the conjugate: where we used . Thus , and since , we conclude , i.e., .
Existence of at least one eigenvalue
Every matrix over has at least one eigenvalue (the characteristic polynomial of degree has at least one root in by the fundamental theorem of algebra). Combined with the result above, has at least one real eigenvalue with a unit eigenvector .
Base case ($n = 1$)
If , then with , so . We can write with (trivially unitary) and .
Inductive step: unitary reduction
Assume the theorem holds for all Hermitian matrices. Let be Hermitian with eigenvalue and unit eigenvector .
Extend to an orthonormal basis of (e.g., via Gram–Schmidt). Form the unitary matrix Then where , (since ), and .
Since :
The sub-block is Hermitian
is and Hermitian:
Apply the induction hypothesis
By the induction hypothesis, where is unitary and is real diagonal. Therefore Define (unitary) and (real diagonal). Then where is unitary (product of unitary matrices is unitary).
ex-ch01-17
Hard. Prove Hadamard's inequality using Fischer's inequality. Specifically:
(a) First prove Fischer's inequality: if $\mathbf{A} \succ 0$ is partitioned as $\mathbf{A} = \begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{12}^H & \mathbf{A}_{22} \end{pmatrix}$, then $\det(\mathbf{A}) \le \det(\mathbf{A}_{11})\,\det(\mathbf{A}_{22})$.
(b) Then apply Fischer's inequality iteratively to obtain Hadamard's inequality: $\det(\mathbf{A}) \le \prod_{i=1}^{n} a_{ii}$.
For Fischer's inequality, use the Schur complement: .
The Schur complement satisfies , so .
For part (b), repeatedly split off one row/column at a time.
(a) Schur complement factorization
Since , the leading principal sub-block is also positive definite, hence invertible. The Schur complement of in is The block LDU factorization gives Taking determinants (the outer block-triangular matrices have unit determinant):
(a) Bounding the Schur complement
Since , we can write (using since is Hermitian). Therefore Since implies , and , all eigenvalues of lie in . Therefore
(a) Fischer's inequality
Combining with the bound on :
(b) Iterative application: Hadamard's inequality
We apply Fischer's inequality iteratively. Partition by splitting off the first row and column: where is the lower-right sub-matrix. By Fischer's inequality: Now apply Fischer's inequality to by splitting off its first row/column (which corresponds to ): Continuing inductively: This is Hadamard's inequality. Equality holds if and only if is diagonal.
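A quick numerical check of Hadamard's inequality on a random positive definite matrix:
import numpy as np
rng = np.random.default_rng(5)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)            # positive definite by construction
print(np.linalg.det(A) <= np.prod(np.diag(A)))   # Hadamard: det(A) <= product of diagonal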
ex-ch01-18
HardLet be a MIMO channel matrix with SVD . Prove that the MIMO capacity is achieved by where with , reducing the capacity to where the satisfy the water-filling solution from Exercise 13 with .
Substitute the SVD into the objective and use where .
Use Hadamard's inequality: , with equality when is diagonal.
The constraint is preserved under unitary conjugation, and if and only if .
Substitute the SVD
Let where and are unitary, and has diagonal entries . Substituting: where .
Simplify the objective using $\det(\mathbf{I} + \mathbf{A}\mathbf{B}) = \det(\mathbf{I} + \mathbf{B}\mathbf{A})$
Constraint preservation
Since is unitary, the map is a bijection on the PSD cone. Moreover:
- $\operatorname{tr}(\widetilde{\mathbf{Q}}) = \operatorname{tr}(\mathbf{V}^H\mathbf{Q}\mathbf{V}) = \operatorname{tr}(\mathbf{Q})$,
- $\widetilde{\mathbf{Q}} \succeq 0$ if and only if $\mathbf{Q} \succeq 0$.
So the optimization over is equivalent to optimizing over :
Apply Hadamard's inequality
Let . The matrix is positive definite. Its diagonal entry (for ) is , and for it is . By Hadamard's inequality: Equality holds when is diagonal, which occurs when is diagonal (since is diagonal).
Diagonal $\widetilde{\mathbf{Q}}$ is optimal
Furthermore, for any feasible $\widetilde{\mathbf{Q}}$ with diagonal entries $p_1, \dots, p_r$, the diagonal matrix $\operatorname{diag}(p_1, \dots, p_r)$ satisfies:
- $p_i \ge 0$ (diagonal entries of a PSD matrix are non-negative),
- the trace is unchanged: $\sum_i p_i = \operatorname{tr}(\widetilde{\mathbf{Q}})$,
- the objective is at least as large, by the Hadamard bound above.
So we may restrict to diagonal without loss of optimality. Setting :
Recover the optimal $\mathbf{Q}$ and water-filling
The optimal maps back to This means the transmitter beamforms along the right singular vectors of the channel.
The scalar optimization over is exactly the water-filling problem from Exercise 13 with . The solution is where the water level is chosen so that . The resulting capacity is
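As a numerical companion, the sketch below combines the SVD step with water-filling (bisection on the water level, as in the Exercise 13 sketch) to evaluate the capacity. The unit-noise normalization sigma2 = 1 and the helper structure are assumptions, since the exercise's exact constants are not reproduced here.
import numpy as np

def mimo_capacity(H, P, sigma2=1.0):
    """Capacity (bits per channel use) under a total power budget P,
    assuming i.i.d. Gaussian noise of variance sigma2 (normalization assumed)."""
    s = np.linalg.svd(H, compute_uv=False)      # singular values of the channel
    g = (s[s > 1e-12] ** 2) / sigma2            # per-mode gains sigma_i^2 / sigma^2
    lo, hi = 0.0, P + np.max(1.0 / g)           # bisect on the water level mu
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - 1.0 / g, 0.0).sum() > P:
            hi = mu
        else:
            lo = mu
    p = np.maximum(lo - 1.0 / g, 0.0)           # water-filling powers per mode
    return np.sum(np.log2(1.0 + g * p))

rng = np.random.default_rng(6)
H = rng.standard_normal((4, 4))
print(mimo_capacity(H, P=10.0))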
ex-ch01-19
Challenge. Power Iteration Algorithm.
Implement the power iteration algorithm in Python to find the dominant eigenvalue and eigenvector of a real symmetric matrix.
- Generate a random $5 \times 5$ symmetric matrix $\mathbf{A} = \mathbf{M} + \mathbf{M}^T$, where $\mathbf{M}$ has i.i.d. standard normal entries.
- Starting from a random unit vector $\mathbf{q}^{(0)}$, iterate $\mathbf{q}^{(k+1)} = \mathbf{A}\mathbf{q}^{(k)} / \|\mathbf{A}\mathbf{q}^{(k)}\|_2$.
- Plot $|\lambda^{(k)} - \lambda_1|$ versus $k$ on a log scale, where $\lambda^{(k)} = (\mathbf{q}^{(k)})^T\mathbf{A}\mathbf{q}^{(k)}$ and $\lambda_1$ is the true dominant eigenvalue (from numpy.linalg.eigh).
- Verify that the convergence rate is $O\bigl((|\lambda_2|/|\lambda_1|)^{2k}\bigr)$.
Reference implementation: ch01/python/power_iteration.py.
Use numpy.random.randn(5,5) for , then symmetrize. Use numpy.linalg.eigh for the ground-truth eigenvalues.
The Rayleigh quotient converges to at rate (quadratic in the eigenvalue ratio per iteration for the Rayleigh quotient, versus linear for the eigenvector).
On a semilog plot, the error should appear as an approximately straight line with slope .
Algorithm description
The power iteration exploits the fact that for a symmetric matrix with eigenvalues and corresponding orthonormal eigenvectors , the iteration converges to as long as has a nonzero component along .
Expanding with : Since for , the sum vanishes as .
Convergence rate analysis
The error in the eigenvector direction decays as .
For the Rayleigh quotient, the convergence is faster. Let where . Then So the eigenvalue estimate converges at rate — twice as fast on a log scale.
Implementation outline
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
M = np.random.randn(5, 5)
A = M + M.T # symmetric
# Ground truth
eigvals, eigvecs = np.linalg.eigh(A)
lambda1 = eigvals[np.argmax(np.abs(eigvals))]
# Power iteration
q = np.random.randn(5)
q = q / np.linalg.norm(q)
errors = []
n_iter = 50
for k in range(n_iter):
    z = A @ q
    q = z / np.linalg.norm(z)
    lam_k = q.T @ A @ q           # Rayleigh quotient estimate of lambda_1
    errors.append(abs(lam_k - lambda1))
# Plot
plt.semilogy(range(n_iter), errors, 'b-o', markersize=3)
plt.xlabel('Iteration k')
plt.ylabel(r'$|\lambda^{(k)} - \lambda_1|$')
plt.title('Power Iteration Convergence')
# Reference slope
ratio = np.sort(np.abs(eigvals))[-2] / np.max(np.abs(eigvals))
plt.semilogy(range(n_iter),
             errors[0] * ratio**(2*np.arange(n_iter)),
             'r--', label=rf'$(\lambda_2/\lambda_1)^{{2k}}$, ratio={ratio:.3f}')
plt.legend()
plt.grid(True)
plt.savefig('power_iteration_convergence.png', dpi=150)
plt.show()
Expected output
The semilog plot should show the error decreasing approximately linearly on the log scale, with slope per iteration. The red dashed reference line should closely track the blue error curve, confirming the theoretical convergence rate.
If the eigenvalue ratio is close to 1 (eigenvalues nearly equal in magnitude), convergence will be slow. If the ratio is small, convergence is rapid — often reaching machine precision in fewer than 20 iterations.
The complete implementation is available at ch01/python/power_iteration.py.
ex-ch01-20
Challenge. SVD-Based Image Compression.
Implement SVD-based image compression in Python.
- Load a grayscale image as a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$.
- Compute the full SVD: $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$.
- Form rank-$k$ approximations $\mathbf{A}_k$ for $k \in \{1, 5, 10, 20, 50\}$.
- Display the reconstructed images side by side.
- Plot the relative reconstruction error $\|\mathbf{A} - \mathbf{A}_k\|_F / \|\mathbf{A}\|_F$ versus $k$ and verify that it equals $\sqrt{\sum_{i > k} \sigma_i^2 / \sum_i \sigma_i^2}$.
- Also plot the compression ratio: the rank-$k$ approximation stores $k(m + n + 1)$ numbers versus $mn$ for the full image.
Reference implementation: ch01/python/svd_compression.py.
Use numpy.linalg.svd with full_matrices=False for the economy SVD. The reconstruction is U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :].
The Frobenius norm error follows directly from the SVD: $\|\mathbf{A} - \mathbf{A}_k\|_F^2 = \sum_{i > k} \sigma_i^2$ (by the Eckart–Young theorem from Exercise 12).
For a typical $512 \times 512$ image, rank $k = 50$ stores $51{,}250$ numbers versus $262{,}144$ for the full image (about $5\times$ compression) while retaining most visual quality.
Mathematical foundation
By the Eckart–Young theorem (Exercise 12), the best rank- approximation in Frobenius norm is the truncated SVD: The reconstruction error is and the relative error is
Storage analysis
The full image requires numbers. The rank- approximation stores:
- : numbers,
- : numbers,
- : numbers.
Total: . The compression ratio is
Implementation outline
import numpy as np
import matplotlib.pyplot as plt
from skimage import data, color
# Load grayscale image
img = color.rgb2gray(data.astronaut()) # 512 x 512
A = img.astype(np.float64)
m, n = A.shape
# Economy SVD
U, S, Vt = np.linalg.svd(A, full_matrices=False)
# Rank-k approximations
ks = [1, 5, 10, 20, 50]
fig, axes = plt.subplots(1, len(ks)+1, figsize=(18, 3))
axes[0].imshow(A, cmap='gray')
axes[0].set_title('Original')
axes[0].axis('off')
for idx, k in enumerate(ks):
    Ak = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
    axes[idx+1].imshow(np.clip(Ak, 0, 1), cmap='gray')
    ratio = m*n / (k*(m+n+1))
    axes[idx+1].set_title(f'Rank {k}\n({ratio:.1f}x)')
    axes[idx+1].axis('off')
plt.tight_layout()
plt.savefig('svd_compression_images.png', dpi=150)
plt.show()
# Error plot
r = len(S)
total_energy = np.sum(S**2)
rel_errors_theory = []
rel_errors_actual = []
k_range = range(1, r+1)
for k in k_range:
    # Theoretical error from the tail singular values
    err_theory = np.sqrt(np.sum(S[k:]**2) / total_energy)
    rel_errors_theory.append(err_theory)
    # Actual reconstruction error
    Ak = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
    err_actual = np.linalg.norm(A - Ak, 'fro') / np.linalg.norm(A, 'fro')
    rel_errors_actual.append(err_actual)
plt.figure(figsize=(8, 5))
plt.semilogy(k_range, rel_errors_theory, 'b-', label='Theory')
plt.semilogy(k_range, rel_errors_actual, 'r--', label='Actual')
plt.xlabel('Rank k')
plt.ylabel(r'$\|A - A_k\|_F / \|A\|_F$')
plt.title('SVD Compression: Relative Error vs Rank')
plt.legend()
plt.grid(True)
plt.savefig('svd_compression_error.png', dpi=150)
plt.show()
Expected output and verification
Visual output: The rank-1 image captures only the gross intensity pattern. Rank 5 shows rough structure. Rank 10 begins to show recognizable features. Rank 20 is a reasonable approximation with some blurring. Rank 50 is nearly indistinguishable from the original for most images.
Error plot: The theory and actual curves should coincide to machine precision, confirming the identity
Compression ratios (for a $512 \times 512$ image):
| Rank $k$ | Numbers stored | Compression ratio |
|---|---|---|
| 1 | 1,025 | 255.8 |
| 5 | 5,125 | 51.1 |
| 10 | 10,250 | 25.6 |
| 20 | 20,500 | 12.8 |
| 50 | 51,250 | 5.1 |
The rapid decay of singular values in natural images explains why SVD-based low-rank approximation achieves effective compression — most of the image "energy" (Frobenius norm) is captured by a small number of singular values.
The complete implementation is available at ch01/python/svd_compression.py.