Exercises
ex-ch13-01
Easy: Show that the $\ell_0$ "norm" $\|x\|_0 = \#\{i : x_i \neq 0\}$ is not a norm (it fails the positive-homogeneity axiom).
Test $\|\alpha x\|_0$ for $\alpha \neq 0$.
Scale invariance
For $\alpha \neq 0$, $\|\alpha x\|_0 = \|x\|_0$: scaling does not change which entries are nonzero. A norm requires $\|\alpha x\| = |\alpha|\,\|x\|$, which fails (e.g.\ $\alpha = 2$, $x \neq 0$).
ex-ch13-02
Easy: Compute $\|x\|_0$ and $\|x\|_1$ for the given vector $x$. Compare with $\|x\|_2$.
Apply the definitions directly.
Compute
Count the nonzero entries for $\|x\|_0$, sum the absolute values for $\|x\|_1$, and take the Euclidean length for $\|x\|_2$.
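The three norms are one-liners in NumPy; the vector below is illustrative, since the exercise's specific $x$ is not reproduced here.

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0, 0.0, 1.0])  # illustrative vector, not the exercise's
l0 = np.count_nonzero(x)                   # "norm": number of nonzero entries
l1 = np.sum(np.abs(x))                     # sum of absolute values
l2 = np.linalg.norm(x)                     # Euclidean length
print(l0, l1, l2)
```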
ex-ch13-03
Easy: Write down the soft-thresholding operator $S_\lambda$ and evaluate it at the given points for the given $\lambda$.
$S_\lambda(x) = \operatorname{sign}(x)\max(|x| - \lambda, 0)$.
Evaluations
Apply the formula at each point: entries with $|x| \le \lambda$ are set to $0$; the rest are shrunk toward $0$ by $\lambda$.
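The operator can be checked numerically; the inputs below are illustrative, since the exercise's specific points and $\lambda$ are not reproduced here.

```python
import numpy as np

def soft_threshold(x, lam):
    # S_lam(x) = sign(x) * max(|x| - lam, 0), applied entrywise
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Illustrative inputs: entries below the threshold are zeroed,
# the rest are shrunk toward zero by lam.
x = np.array([3.0, -1.5, 0.5, -0.2])
print(soft_threshold(x, 1.0))
```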
ex-ch13-04
Easy: A dictionary has unit-norm columns with $|\langle d_i, d_j \rangle| \le \mu$ for all $i \neq j$. What is the largest $s$ for which coherence-based recovery guarantees sparse recovery?
Use $s < \frac{1}{2}\left(1 + \frac{1}{\mu}\right)$.
Plug in
Compute $1/\mu$ and substitute: exact recovery is guaranteed for every $s < \frac{1}{2}(1 + 1/\mu)$, so the answer is the largest integer strictly below this bound.
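The floor computation is easy to mechanize; the coherence value below is hypothetical, since the exercise's $\mu$ is not reproduced here.

```python
import math

def max_guaranteed_sparsity(mu):
    # Largest integer s with s < (1 + 1/mu) / 2 (coherence-based guarantee)
    bound = 0.5 * (1.0 + 1.0 / mu)
    return math.ceil(bound) - 1  # largest integer strictly below the bound

# Hypothetical coherence mu = 0.1: the bound is 5.5, so s = 5
print(max_guaranteed_sparsity(0.1))
```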
ex-ch13-05
Easy: State the Welch lower bound on coherence for a unit-norm dictionary $D \in \mathbb{R}^{m \times N}$.
$\mu(D) \ge \sqrt{\dfrac{N - m}{m(N - 1)}}$.
Compute
For $N > m$ the bound is strictly positive: no overcomplete unit-norm dictionary can be perfectly incoherent. Equality holds exactly for equiangular tight frames.
ex-ch13-06
Medium: Prove that the $\ell_1$-ball in $\mathbb{R}^n$ has exactly $2n$ vertices, and identify them.
Vertices correspond to $1$-sparse signals.
Characterize vertices
A vertex is an extreme point: a point that is not a convex combination of two other points of the set. In the $\ell_1$-ball these are $\pm e_i$ for $i = 1, \dots, n$; any other boundary point has at least two nonzero coordinates and is a convex combination of sparser boundary points.
Count
There are $2n$ such points, all $1$-sparse with $\|x\|_1 = 1$.
ex-ch13-07
Medium: Let $A = [\,I \;\; H\,] \in \mathbb{R}^{n \times 2n}$ where $H$ is the Hadamard matrix scaled to unit-norm columns. Compute the coherence.
Columns of $I$ vs columns of $H$: each inner product is $\pm 1/\sqrt{n}$.
Worst-case pair
Any column of $I$ paired with any column of $H$ has inner product of magnitude $1/\sqrt{n}$, since every entry of the scaled Hadamard matrix is $\pm 1/\sqrt{n}$. Columns of $I$ are pairwise orthogonal; columns of $H$ are pairwise orthogonal.
Coherence
$\mu(A) = 1/\sqrt{n}$, which meets the lower bound $\mu \ge 1/\sqrt{n}$ for a union of two orthonormal bases.
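A quick numerical check of the claim, building the Sylvester Hadamard matrix for an assumed size $n = 8$:

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of 2
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

n = 8
A = np.hstack([np.eye(n), hadamard(n) / np.sqrt(n)])  # unit-norm columns
G = np.abs(A.T @ A)               # absolute Gram matrix
np.fill_diagonal(G, 0.0)          # ignore the unit diagonal
print(G.max(), 1.0 / np.sqrt(n))  # coherence equals 1/sqrt(n)
```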
ex-ch13-08
Medium: Show that LASSO is a convex optimization problem. Identify the role of convexity.
Both terms are convex, and a sum of convex functions is convex.
Quadratic term
$\frac{1}{2}\|y - Ax\|_2^2$ is a convex quadratic in $x$ (Hessian $A^\top A \succeq 0$).
$\ell_1$ norm
Every norm is convex: the triangle inequality and absolute homogeneity give $\|\theta a + (1-\theta)b\| \le \theta\|a\| + (1-\theta)\|b\|$.
Convexity matters
Every local minimum is global. Descent algorithms converge to the global optimum. The dual is well-posed and the KKT conditions characterize optima. Convexity is why LASSO scales to millions of variables.
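Convexity of the objective can also be sanity-checked numerically by testing the chord inequality on random data; all dimensions and the value of $\lambda$ below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
y = rng.standard_normal(20)
lam = 0.1

def lasso_obj(x):
    # LASSO objective: 0.5 * ||y - Ax||_2^2 + lam * ||x||_1
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

# Chord inequality: f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b)
for _ in range(1000):
    a, b = rng.standard_normal(50), rng.standard_normal(50)
    t = rng.uniform()
    assert lasso_obj(t * a + (1 - t) * b) <= t * lasso_obj(a) + (1 - t) * lasso_obj(b) + 1e-9
```

A passing check is of course not a proof, but it catches sign and modeling errors instantly.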
ex-ch13-09
Medium: Let $A$ satisfy the RIP with $\delta_{2s} < 1$. Show that every $s$-sparse vector $x$ is uniquely determined from $y = Ax$.
$\delta_{2s} < 1$ implies injectivity of $A$ on $s$-sparse vectors.
Suppose two distinct $s$-sparse recoveries
If $Ax_1 = Ax_2$ with $x_1 \neq x_2$ both $s$-sparse, then $h = x_1 - x_2$ is $2s$-sparse, nonzero, and $Ah = 0$.
Contradiction
The RIP gives $\|Ah\|_2^2 \ge (1 - \delta_{2s})\|h\|_2^2 > 0$, contradicting $Ah = 0$.
ex-ch13-10
Medium: Derive the KKT conditions for LASSO and interpret them as a sign-consistency condition.
The subdifferential of $|t|$ at $t = 0$ is $[-1, 1]$.
Subgradient optimality
$0 \in -A^\top(y - A\hat{x}) + \lambda\,\partial\|\hat{x}\|_1$, i.e.\ $A^\top(y - A\hat{x}) = \lambda v$ for some $v \in \partial\|\hat{x}\|_1$.
Componentwise
For $j$ with $\hat{x}_j \neq 0$: $a_j^\top(y - A\hat{x}) = \lambda \operatorname{sign}(\hat{x}_j)$. For $j$ with $\hat{x}_j = 0$: $|a_j^\top(y - A\hat{x})| \le \lambda$.
Interpretation
Active coordinates have correlation with the residual equal to $\lambda$ in magnitude, with sign matching $\hat{x}_j$; inactive coordinates must have correlation with the residual bounded by $\lambda$.
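These conditions can be verified numerically: run proximal gradient descent (whose prox step is soft thresholding) to near-convergence and inspect the correlations. All problem sizes, the seed, and $\lambda$ below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, lam = 30, 60, 0.05
A = rng.standard_normal((m, N)) / np.sqrt(m)
x_true = np.zeros(N)
x_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
y = A @ x_true

# ISTA: gradient step on the quadratic, soft-threshold prox on the l1 term
L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
x = np.zeros(N)
for _ in range(20000):
    z = x - A.T @ (A @ x - y) / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

c = A.T @ (y - A @ x)                # correlation with the residual
active = x != 0
assert np.allclose(c[active], lam * np.sign(x[active]), atol=1e-4)  # active: exactly lam
assert np.all(np.abs(c[~active]) <= lam + 1e-6)                     # inactive: at most lam
```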
ex-ch13-11
Medium: A sensing matrix $A \in \mathbb{R}^{m \times N}$ is drawn i.i.d.\ $\mathcal{N}(0, 1/m)$. Estimate the maximum sparsity $s$ for which RIP-based recovery is likely.
Use $m \gtrsim C\, s \log(N/s)$.
Solve for $s$
With $C \approx 1$ (a typical constant), solve $s \log(N/s) \le m$ for the given $m$ and $N$ by trying successive integer values of $s$; the largest $s$ satisfying the inequality is the estimate.
Practical answer
Sparsity levels at or below this threshold are likely recoverable; well above it, recovery is unlikely.
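The trial-and-error solve is easy to automate; the dimensions below are hypothetical, since the exercise's $m$ and $N$ are not reproduced here.

```python
import math

def max_sparsity(m, N, C=1.0):
    # Largest s with C * s * log(N/s) <= m (RIP sample-complexity heuristic)
    s = 1
    while s + 1 < N and C * (s + 1) * math.log(N / (s + 1)) <= m:
        s += 1
    return s

# Hypothetical dimensions m = 200, N = 1000:
print(max_sparsity(200, 1000))
```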
ex-ch13-12
Medium: Show that the proximal operator of $\lambda\|\cdot\|_1$ is componentwise soft thresholding.
The objective separates across coordinates; minimize componentwise.
Scalar problem
$\operatorname{prox}_{\lambda|\cdot|}(v) = \arg\min_t \tfrac{1}{2}(t - v)^2 + \lambda|t|$. Subgradient condition: $0 \in t - v + \lambda\,\partial|t|$.
Cases
If $v > \lambda$: $t = v - \lambda$. If $v < -\lambda$: $t = v + \lambda$. Otherwise $t = 0$. This is $S_\lambda(v) = \operatorname{sign}(v)\max(|v| - \lambda, 0)$.
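The case analysis can be confirmed by brute force: minimize the scalar prox objective over a fine grid and compare with the closed form (the value of $\lambda$, the grid, and the test points are illustrative).

```python
import numpy as np

lam = 0.7

def S(v):
    # Closed-form minimizer: soft thresholding
    return np.sign(v) * max(abs(v) - lam, 0.0)

t = np.linspace(-5, 5, 200001)  # fine grid of candidate minimizers
for v in [-3.0, -0.7, -0.2, 0.0, 0.4, 1.1, 2.5]:
    obj = 0.5 * (t - v) ** 2 + lam * np.abs(t)  # scalar prox objective
    t_star = t[np.argmin(obj)]
    assert abs(t_star - S(v)) < 1e-4            # grid minimizer matches closed form
```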
ex-ch13-13
Hard: Prove that if $2s < \operatorname{spark}(A)$, then every $s$-sparse $x$ is the unique $s$-sparse solution of $Ax = y$.
$\operatorname{spark}(A)$ is the smallest number of linearly dependent columns of $A$.
Contrapositive
Suppose distinct $s$-sparse $x_1 \neq x_2$ give the same $y = Ax_1 = Ax_2$. Then $h = x_1 - x_2$ is $2s$-sparse, nonzero, and $Ah = 0$, i.e.\ the columns of $A$ indexed by $\operatorname{supp}(h)$ are linearly dependent.
Conclude
This exhibits at most $2s$ linearly dependent columns, contradicting $\operatorname{spark}(A) > 2s$. Hence the $s$-sparse solution is unique.
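For tiny matrices the spark can be checked exhaustively from the definition; this sketch and its example matrix are illustrative.

```python
import itertools
import numpy as np

def spark(A, tol=1e-10):
    # Smallest k such that some k columns are linearly dependent.
    # Brute force over subsets: only feasible for tiny matrices.
    n = A.shape[1]
    for k in range(1, n + 1):
        for cols in itertools.combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                return k
    return n + 1  # all columns independent (convention: spark = n + 1)

# Columns 1, 2, 4 are dependent (e1 + e2 = column 4), so spark = 3:
A = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
print(spark(A))  # uniqueness guaranteed only for 2s < 3, i.e. s = 1
```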
ex-ch13-14
Hard: Derive the Welch bound: for a unit-norm dictionary $D \in \mathbb{R}^{m \times N}$, $\mu(D) \ge \sqrt{\frac{N - m}{m(N - 1)}}$.
Compute $\|D^\top D\|_F^2$ two ways.
Frobenius identity
$\|D^\top D\|_F^2 = \sum_{i,j} |\langle d_i, d_j \rangle|^2 = N + \sum_{i \neq j} |\langle d_i, d_j \rangle|^2$.
Use rank bound
Since $D^\top D$ is positive semidefinite of rank at most $m$ with $\operatorname{tr}(D^\top D) = N$, Cauchy–Schwarz on its eigenvalues gives $\|D^\top D\|_F^2 \ge \frac{(\operatorname{tr} D^\top D)^2}{m} = \frac{N^2}{m}$.
Combine
$N + N(N-1)\mu^2 \ge N + \sum_{i \neq j} |\langle d_i, d_j \rangle|^2 \ge \frac{N^2}{m}$ (bounding each off-diagonal term by $\mu^2$). Solving for $\mu$: $\mu^2 \ge \frac{N^2/m - N}{N(N-1)} = \frac{N - m}{m(N-1)}$.
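The bound can be checked numerically against a random unit-norm dictionary (the dimensions and seed below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, N = 5, 12
D = rng.standard_normal((m, N))
D /= np.linalg.norm(D, axis=0)       # normalize columns to unit norm

G = np.abs(D.T @ D)                  # absolute Gram matrix
np.fill_diagonal(G, 0.0)             # ignore the unit diagonal
mu = G.max()                         # coherence
welch = np.sqrt((N - m) / (m * (N - 1)))
print(mu, welch)                     # mu always sits above the Welch bound
assert mu >= welch
```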
ex-ch13-15
Hard: Show that if $\delta_{2s} < \sqrt{2} - 1$, BPDN's error is bounded by $\|\hat{x} - x\|_2 \le C\varepsilon$ for a constant $C$ depending only on $\delta_{2s}$.
Use the cone condition and the RIP-based restricted eigenvalue property.
Cone condition
Let $h = \hat{x} - x$. Feasibility of both $x$ and $\hat{x}$ gives $\|Ah\|_2 \le \|A\hat{x} - y\|_2 + \|y - Ax\|_2 \le 2\varepsilon$. Optimality of $\hat{x}$ gives $\|h_{T_0^c}\|_1 \le \|h_{T_0}\|_1$, where $T_0$ is the support of the $s$-sparse $x$.
Norm compression
Let $h_{T_1}$ collect the $s$ largest-magnitude entries of $h$ outside $T_0$, $h_{T_2}$ the next $s$, etc. A standard bound gives $\sum_{j \ge 2} \|h_{T_j}\|_2 \le s^{-1/2}\|h_{T_0^c}\|_1 \le \|h_{T_0}\|_2$.
Apply RIP
Applying the RIP to $h_{T_0 \cup T_1}$ and bounding the tail terms yields an inequality of the form $\bigl(1 - (1+\sqrt{2})\,\delta_{2s}\bigr)\|h_{T_0 \cup T_1}\|_2 \lesssim \sqrt{1 + \delta_{2s}}\;\varepsilon$. The condition $\delta_{2s} < \sqrt{2} - 1$ makes the coefficient positive; solving yields $\|h\|_2 \le C\varepsilon$ with $C$ depending only on $\delta_{2s}$.