Exercises
ex-ch14-01
Easy. Write the ISTA update for LASSO in one line. Identify the step size and the proximal operator.
Gradient step + soft threshold.
Update
$x^{k+1} = S_{\lambda t}\big(x^k - t\,A^\top(Ax^k - y)\big)$, with step size $t \le 1/L$, $L = \|A^\top A\|_2$, and proximal operator the soft threshold $S_\tau(z)_i = \operatorname{sign}(z_i)\max(|z_i| - \tau, 0)$.
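The one-line update can be sketched in NumPy as follows (problem sizes, seed, and the choice $\lambda = 0.5$ are ours, for illustration only):

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau*||.||_1 (element-wise soft threshold)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista_step(x, A, y, lam, t):
    """One ISTA iteration: gradient step on 0.5||Ax-y||^2, then prox of lam*||.||_1."""
    return soft_threshold(x - t * A.T @ (A @ x - y), lam * t)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
y = rng.standard_normal(20)
t = 1.0 / np.linalg.norm(A.T @ A, 2)   # step size 1/L with L = ||A^T A||_2
x = np.zeros(50)
for _ in range(200):
    x = ista_step(x, A, y, lam=0.5, t=t)
```

After a few hundred iterations the iterate is sparse and has a lower objective value than the zero vector.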
ex-ch14-02
Easy. FISTA improves ISTA's rate from $O(1/k)$ to $O(1/k^2)$. By how many iterations must ISTA run to match FISTA at iteration $k$?
Set $1/k_{\mathrm{ISTA}} = 1/k^2$.
Match rates
$C/k_{\mathrm{ISTA}} \approx C/k^2$ gives $k_{\mathrm{ISTA}} = \Theta(k^2)$: ISTA needs on the order of $k^2$ iterations.
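The rate-matching arithmetic, assuming (as above) the same constant in both bounds:

```python
# If ISTA error ~ C/k and FISTA error ~ C/k^2 with the same constant C
# (an assumption for illustration), matching FISTA at iteration k
# requires k^2 ISTA iterations.
k = 30
fista_err = 1.0 / k**2          # FISTA bound at iteration k
ista_iters = round(1.0 / fista_err)  # iterations for ISTA to reach the same bound
```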
ex-ch14-03
Easy. Write the ADMM updates for LASSO.
Split: $\min_{x,z}\ \tfrac12\|Ax - y\|^2 + \lambda\|z\|_1$ subject to $x - z = 0$.
Updates
$x^{k+1} = (A^\top A + \rho I)^{-1}\big(A^\top y + \rho(z^k - u^k)\big)$, $z^{k+1} = S_{\lambda/\rho}(x^{k+1} + u^k)$, $u^{k+1} = u^k + x^{k+1} - z^{k+1}$.
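The three updates translate directly into a minimal NumPy sketch (the penalty $\rho = 1$, sizes, and test signal are illustrative choices of ours):

```python
import numpy as np

def admm_lasso(A, y, lam, rho=1.0, iters=200):
    """ADMM for LASSO: x-update (ridge solve), z-update (soft threshold), u-update."""
    m, n = A.shape
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))   # cached: shared across iterations
    Aty = A.T @ y
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(iters):
        x = M @ (Aty + rho * (z - u))                                  # x-update
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0)  # z-update
        u = u + x - z                                                  # scaled dual
    return z

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 60))
x_true = np.zeros(60)
x_true[[3, 17, 42]] = [2.0, -1.5, 1.0]
y = A @ x_true
x_hat = admm_lasso(A, y, lam=0.1)
```

Returning $z$ rather than $x$ gives the exactly sparse iterate.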
ex-ch14-04
Easy. Describe one OMP iteration in words.
Correlation, selection, least squares, residual.
Steps
(1) Compute correlations $c = A^\top r^k$; (2) add $j = \arg\max_i |c_i|$ to the support; (3) solve least-squares on the active set; (4) update the residual $r^{k+1} = y - A x^{k+1}$.
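The four steps above can be sketched as code (variable names and the test instance are ours):

```python
import numpy as np

def omp(A, y, K):
    """K iterations of OMP: correlate, select, least-squares, update residual."""
    m, n = A.shape
    r = y.copy()
    support = []
    for _ in range(K):
        c = A.T @ r                                   # (1) correlate residual with atoms
        j = int(np.argmax(np.abs(c)))                 # (2) select the best new atom
        if j not in support:
            support.append(j)
        x_S, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)  # (3) LS on active set
        r = y - A[:, support] @ x_S                   # (4) update residual
    x = np.zeros(n)
    x[support] = x_S
    return x, support

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 100))
A /= np.linalg.norm(A, axis=0)        # unit-norm atoms
x_true = np.zeros(100)
x_true[[5, 50, 77]] = [3.0, -2.0, 1.5]
y = A @ x_true
x_hat, S = omp(A, y, K=3)
```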
ex-ch14-05
Easy. Explain why the posterior mean under the Bernoulli-Gaussian model is generically not sparse.
It averages over supports.
Mixture average
The posterior is a mixture over all $2^n$ supports. Entries of the posterior mean are weighted averages across supports; the weight on supports with $x_i \neq 0$ shrinks as the evidence against $x_i$ grows, but it does not make the mean of $x_i$ exactly zero unless the posterior collapses onto supports excluding $i$.
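A scalar instance makes the mixture-average point concrete. The toy model below (our choice of $p$ and noise level) is $x = s\,g$ with $s \sim \mathrm{Bern}(p)$, $g \sim \mathcal{N}(0,1)$, observed as $y = x + \mathcal{N}(0,\sigma^2)$; the posterior mean is the "nonzero support" weight times a shrunk estimate, and is never exactly zero for $y \neq 0$:

```python
import numpy as np

p, sig2 = 0.1, 0.25   # prior activity and noise variance (illustrative values)

def posterior_mean(y):
    """Posterior mean E[x|y] for the scalar Bernoulli-Gaussian model."""
    # Marginal likelihoods of the two supports s=0 and s=1
    l0 = (1 - p) * np.exp(-y**2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)
    v1 = 1.0 + sig2
    l1 = p * np.exp(-y**2 / (2 * v1)) / np.sqrt(2 * np.pi * v1)
    w1 = l1 / (l0 + l1)           # posterior weight of the nonzero support
    return w1 * y / v1            # mixture average: w1 * E[x | y, s=1]

m_small = posterior_mean(0.1)     # heavily down-weighted, but not exactly 0
m_large = posterior_mean(3.0)     # nonzero support dominates
```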
ex-ch14-06
Medium. Prove that ISTA with step size $t \le 1/L$ monotonically decreases the LASSO objective and converges at rate $O(1/k)$.
Use Lipschitz continuity of the gradient of the smooth part and the non-expansiveness of the proximal operator.
Smooth descent
Let $f(x) = \tfrac12\|Ax - y\|^2$, whose gradient is Lipschitz with constant $L = \|A^\top A\|_2$. The descent lemma gives $f(x - t\nabla f(x)) \le f(x) - \tfrac{t}{2}\|\nabla f(x)\|^2$ for a full gradient step with $t \le 1/L$.
Prox non-expansiveness
$\|\operatorname{prox}_g(u) - \operatorname{prox}_g(v)\| \le \|u - v\|$ for any convex $g$, in particular $g = \lambda t\|\cdot\|_1$.
Combine
The composite iteration satisfies $F(x^k) - F^\star \le \frac{\|x^0 - x^\star\|^2}{2tk}$, i.e.\ the objective decreases at rate $O(1/k)$.
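A numeric sanity check of the descent property (not a proof; the instance is ours): with $t = 1/L$, the LASSO objective never increases along the ISTA iterates.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((25, 40))
y = rng.standard_normal(25)
lam = 0.3
L = np.linalg.norm(A.T @ A, 2)   # Lipschitz constant of the smooth part
t = 1.0 / L

def F(x):
    """LASSO objective 0.5||Ax-y||^2 + lam*||x||_1."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

x = np.zeros(40)
vals = [F(x)]
for _ in range(100):
    z = x - t * A.T @ (A @ x - y)                     # gradient step
    x = np.sign(z) * np.maximum(np.abs(z) - lam * t, 0.0)  # soft threshold
    vals.append(F(x))
```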
ex-ch14-07
Medium. Derive the ADMM $x$-update for LASSO when $A \in \mathbb{R}^{m \times n}$ with $m \ll n$ (use the Woodbury identity).
The $n \times n$ solve becomes an $m \times m$ solve via Woodbury.
Woodbury
$(A^\top A + \rho I)^{-1} = \tfrac{1}{\rho}\Big(I - A^\top(\rho I_m + AA^\top)^{-1}A\Big)$.
Cost
One Cholesky factorization of $\rho I_m + AA^\top$ ($O(m^3)$, plus $O(m^2 n)$ to form it) shared across iterations; per-iteration cost becomes $O(mn)$.
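The identity can be checked numerically by comparing the Woodbury route against the direct $n \times n$ solve (sizes and $\rho$ are illustrative; `b` stands for the right-hand side $A^\top y + \rho(z - u)$):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, rho = 15, 200, 1.0
A = rng.standard_normal((m, n))
b = rng.standard_normal(n)          # stands in for A^T y + rho (z - u)

# Direct: (A^T A + rho I)^{-1} b  -- an O(n^3) solve
x_direct = np.linalg.solve(A.T @ A + rho * np.eye(n), b)

# Woodbury: (1/rho) (b - A^T (rho I_m + A A^T)^{-1} A b)
# -- only an m x m system; factor once, reuse every iteration
small = rho * np.eye(m) + A @ A.T
x_wood = (b - A.T @ np.linalg.solve(small, A @ b)) / rho
```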
ex-ch14-08
Medium. Show that OMP with exact arithmetic recovers every $K$-sparse signal when the mutual coherence satisfies $\mu < \frac{1}{2K - 1}$.
Tropp 2004, coherence-based guarantee.
Inductive argument
At each iteration, the correlation of the residual with any in-support atom dominates the correlation with any out-of-support atom, provided the coherence is small.
Condition
An in-support atom correlates with the residual by at least $(1 - (K-1)\mu)\,x_{\max}$, while any out-of-support atom correlates by at most $K\mu\,x_{\max}$, where $x_{\max}$ is the largest residual coefficient. Requiring $1 - (K-1)\mu > K\mu$, i.e.\ $\mu < \frac{1}{2K-1}$, ensures OMP never selects a wrong atom and exits after exactly $K$ iterations.
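The condition is easy to evaluate for a concrete dictionary (the random dictionary below is our example; for generic Gaussian atoms the coherence bound is pessimistic, certifying only small $K$):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((64, 80))
A /= np.linalg.norm(A, axis=0)        # unit-norm atoms

G = A.T @ A
mu = np.max(np.abs(G - np.eye(80)))   # mutual coherence: max off-diagonal |<a_i, a_j>|

# Largest K certified by mu < 1/(2K - 1), i.e. K < (1 + 1/mu)/2
K_max = int(np.floor(0.5 * (1 + 1 / mu)))
```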
ex-ch14-09
Medium. Explain why hard thresholding is non-convex while soft thresholding is the proximal operator of a convex function.
Hard = prox of $\ell_0$ (non-convex); soft = prox of $\ell_1$ (convex).
Hard threshold
$H_\tau(z)_i = z_i\,\mathbf{1}\{|z_i| > \tau\}$. Discontinuous; it is the proximal operator of $\lambda\|x\|_0$ with $\tau = \sqrt{2\lambda}$, which is nonconvex.
Soft threshold
$S_\tau(z)_i = \operatorname{sign}(z_i)\max(|z_i| - \tau, 0)$ is $1$-Lipschitz and monotone, and equals $\operatorname{prox}_{\tau\|\cdot\|_1}$, where $\tau\|\cdot\|_1$ is convex.
Implication
Hard-thresholding algorithms (IHT) lack global convergence guarantees without extra assumptions such as RIP; soft-thresholding algorithms (ISTA) converge globally on the convex objective.
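The two operators side by side (inputs chosen by us to straddle the threshold); note the jump of the hard threshold at $|z| = \tau$ versus the continuous shrinkage of the soft threshold:

```python
import numpy as np

def hard(z, tau):
    """Hard threshold: keep entries above tau, zero the rest. Discontinuous at |z|=tau."""
    return np.where(np.abs(z) > tau, z, 0.0)

def soft(z, tau):
    """Soft threshold: prox of tau*||.||_1. Continuous (1-Lipschitz) shrinkage."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

z = np.array([-2.0, -1.01, -0.99, 0.5, 1.5])
h = hard(z, 1.0)
s = soft(z, 1.0)
```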
ex-ch14-10
Medium. Compute the proximal operator of the group $\ell_{2,1}$ norm.
Block soft threshold.
Group-wise
For each group $g$: $\operatorname{prox}_{\lambda\|\cdot\|_{2,1}}(z)_g = \Big(1 - \frac{\lambda}{\|z_g\|_2}\Big)_{+} z_g$.
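The block soft threshold as code (the group partition here is our example):

```python
import numpy as np

def prox_group(z, groups, lam):
    """Block soft threshold: prox of lam * sum_g ||z_g||_2 over a group partition."""
    out = np.zeros_like(z)
    for g in groups:
        ng = np.linalg.norm(z[g])
        if ng > lam:
            out[g] = (1.0 - lam / ng) * z[g]   # shrink the whole block toward zero
        # otherwise the entire group is set to zero
    return out

z = np.array([3.0, 4.0, 0.1, 0.2])
groups = [[0, 1], [2, 3]]      # illustrative partition into two blocks
p = prox_group(z, groups, lam=1.0)
```

The first block (norm $5$) is scaled by $1 - 1/5 = 0.8$; the second (norm $\approx 0.22$) is zeroed entirely.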
ex-ch14-11
Medium. Show that ADMM's primal and dual residuals converge to zero as $k \to \infty$ for convex LASSO.
Use Boyd et al. Sec. 3.2.
Lyapunov function
$V^k = \tfrac{1}{\rho}\|y^k - y^\star\|^2 + \rho\|z^k - z^\star\|^2$ (with $y^k$ the unscaled dual variable) is nonincreasing and bounded below.
Residual decay
$V^{k+1} \le V^k - \rho\|r^{k+1}\|^2 - \rho\|z^{k+1} - z^k\|^2$, where $r^k = x^k - z^k$ and $s^k = \rho(z^k - z^{k-1})$ are the primal and dual residuals. Summability of $\sum_k\big(\|r^k\|^2 + \|s^k\|^2\big)$ forces $r^k \to 0$ and $s^k \to 0$.
ex-ch14-12
Medium. For SBL, derive the M-step update $\gamma_i \leftarrow \mu_i^2 + \Sigma_{ii}$.
Type-II ML in the EM framework.
E-step
Compute the posterior $p(x \mid y; \gamma)$: Gaussian with covariance $\Sigma = (\sigma^{-2}A^\top A + \Gamma^{-1})^{-1}$ and mean $\mu = \sigma^{-2}\Sigma A^\top y$.
M-step
Maximize $\mathbb{E}_{x \sim p(x \mid y;\, \gamma^{\mathrm{old}})}\big[\log p(y, x; \gamma)\big]$ in $\gamma$: each $\gamma_i$ decouples and gives $\gamma_i^{\mathrm{new}} = \mathbb{E}[x_i^2] = \mu_i^2 + \Sigma_{ii}$.
ex-ch14-13
Hard. Prove Nesterov's $O(1/k^2)$ rate for FISTA on the LASSO objective.
Construct the potential and bound its increment.
Potential construction
Define $E_k = \tfrac{2}{L}t_k^2\big(F(x^k) - F^\star\big) + \|u^k - x^\star\|^2$ with $u^k = t_k x^k - (t_k - 1)x^{k-1}$ and $t_{k+1} = \tfrac{1 + \sqrt{1 + 4t_k^2}}{2}$.
Monotonicity
Using the descent lemma at the momentum point $w^k = x^k + \tfrac{t_k - 1}{t_{k+1}}(x^k - x^{k-1})$, one shows $E_{k+1} \le E_k$.
Conclude
$t_k \ge \tfrac{k+1}{2}$ gives $F(x^k) - F^\star \le \frac{2L\|x^0 - x^\star\|^2}{(k+1)^2}$.
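The momentum schedule from the proof can be run directly; as a numeric illustration (not a proof; instance and budget are ours), FISTA reaches a lower objective than plain ISTA for the same iteration budget:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((30, 60))
y = rng.standard_normal(30)
lam = 0.2
L = np.linalg.norm(A.T @ A, 2)

def prox_step(v, t):
    """Gradient step on the smooth part followed by soft thresholding."""
    z = v - t * A.T @ (A @ v - y)
    return np.sign(z) * np.maximum(np.abs(z) - lam * t, 0.0)

def F(x):
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

# FISTA with the Beck-Teboulle t_k schedule
x = np.zeros(60)
x_prev = np.zeros(60)
t_k = 1.0
for _ in range(100):
    t_next = (1 + np.sqrt(1 + 4 * t_k**2)) / 2
    w = x + ((t_k - 1) / t_next) * (x - x_prev)   # momentum point
    x_prev, x = x, prox_step(w, 1.0 / L)
    t_k = t_next

# Plain ISTA for the same budget
x_ista = np.zeros(60)
for _ in range(100):
    x_ista = prox_step(x_ista, 1.0 / L)
```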
ex-ch14-14
Hard. Prove the CoSaMP recovery guarantee: if $\delta_{4K} \le 0.1$, CoSaMP recovers $K$-sparse signals to accuracy $O(\|e\|_2)$ in the measurement noise $e$.
Four steps: identification, support merge, LS estimation, pruning. Use RIP at each step.
Identification
Correlate the residual with the columns of $A$; select the top $2K$ indices. RIP guarantees signal components dominate noise.
Merge and LS
Merge with the current support (at most $3K$ atoms); solve least-squares restricted to this support.
Pruning + induction
Keep the top $K$ magnitudes. Using the $4K$-RIP, the error contracts by a constant factor (e.g.\ $1/2$) per iteration, giving geometric convergence to an $O(\|e\|_2)$-neighborhood.
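The four proof steps mirror the algorithm itself, sketched here in NumPy (variable names and the noiseless test instance are ours):

```python
import numpy as np

def cosamp(A, y, K, iters=20):
    """CoSaMP: identify top 2K, merge supports, least-squares, prune to top K."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        r = y - A @ x
        c = np.abs(A.T @ r)
        omega = np.argsort(c)[-2 * K:]                     # identification: top 2K
        T = np.union1d(omega, np.flatnonzero(x))           # merge (<= 3K atoms)
        b_T, *_ = np.linalg.lstsq(A[:, T], y, rcond=None)  # LS on merged support
        b = np.zeros(n)
        b[T] = b_T
        keep = np.argsort(np.abs(b))[-K:]                  # prune to top K
        x = np.zeros(n)
        x[keep] = b[keep]
    return x

rng = np.random.default_rng(7)
A = rng.standard_normal((50, 120))
A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(120)
x_true[[10, 30, 90]] = [1.0, -2.0, 1.5]
y = A @ x_true                     # noiseless, so recovery is exact
x_hat = cosamp(A, y, K=3)
```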
ex-ch14-15
Hard. Show that SBL fixed points have at most $m$ nonzero entries, where $m$ is the number of measurements.
Use the rank structure of the marginal covariance $\Sigma_y = \sigma^2 I + A\Gamma A^\top$.
Marginal likelihood
$p(y; \gamma) = \mathcal{N}(y; 0, \Sigma_y)$ with $\Sigma_y = \sigma^2 I + A\Gamma A^\top$, $\Gamma = \operatorname{diag}(\gamma)$.
Rank argument
$\Sigma_y$ is $m \times m$; each nonzero $\gamma_i$ adds at most rank $1$ (the term $\gamma_i a_i a_i^\top$). Once rank $m$ is reached, further positive $\gamma_i$ give no improvement to the marginal likelihood.
Conclude
Wipf-Rao (2004) show the unconstrained maxima lie on faces of the positive orthant with at most $m$ positive coordinates.