Exercises
ex-ch06-01
Easy. Let $Y_1,\dots,Y_n$ be i.i.d. Bernoulli$(\theta)$. Derive the MLE of $\theta$ and verify that it is unbiased.
Write the log-likelihood as $\ell(\theta)=S\log\theta+(n-S)\log(1-\theta)$ where $S=\sum_{i=1}^n Y_i$.
Set the derivative to zero.
Log-likelihood
$\ell(\theta)=S\log\theta+(n-S)\log(1-\theta)$ with $S=\sum_{i=1}^n Y_i$.
Solve the score equation
$\ell'(\theta)=S/\theta-(n-S)/(1-\theta)=0$ gives $\hat\theta=S/n=\bar Y$.
Unbiasedness
$\mathbb{E}[\hat\theta]=\mathbb{E}[\bar Y]=\theta$, so $\hat\theta$ is unbiased.
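A quick Monte Carlo sanity check of the unbiasedness claim; the parameter value, sample size, and replication count below are arbitrary illustration choices.

```python
import numpy as np

# Monte Carlo check of E[theta_hat] = theta for the Bernoulli MLE.
# theta, n, and reps are hypothetical illustration values.
rng = np.random.default_rng(0)
theta, n, reps = 0.3, 50, 20000
Y = rng.binomial(1, theta, size=(reps, n))   # reps independent samples of size n
theta_hat = Y.mean(axis=1)                   # MLE = sample mean, per replication
mc_mean = theta_hat.mean()                   # Monte Carlo estimate of E[theta_hat]
```

The Monte Carlo average should sit within a few Monte Carlo standard errors of $\theta$.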
ex-ch06-02
Easy. Let $Y_1,\dots,Y_n$ be i.i.d. Poisson$(\lambda)$. Find the MLE of $\lambda$ and compute its Fisher information.
Recall $\log p(y;\lambda)=y\log\lambda-\lambda-\log y!$.
MLE
$\ell'(\lambda)=\frac{1}{\lambda}\sum_i Y_i-n=0$, giving $\hat\lambda=\bar Y$.
Fisher information
$-\ell''(\lambda)=\frac{1}{\lambda^2}\sum_i Y_i$, so $I_n(\lambda)=\mathbb{E}[-\ell''(\lambda)]=n/\lambda$. The asymptotic variance of the MLE is $I_n(\lambda)^{-1}=\lambda/n$, matching $\mathrm{Var}(\bar Y)=\lambda/n$ exactly.
ex-ch06-03
Medium. Let $Y_1,\dots,Y_n$ be i.i.d. $\mathcal N(\theta,\theta)$ with $\theta>0$ (mean and variance equal). Derive the score equation and find the MLE in closed form.
Both the mean and the variance equal $\theta$, so there is a single parameter.
Use the Gaussian log-density and differentiate carefully in $\theta$, which appears in both the mean and the variance.
Log-likelihood
$\ell(\theta)=-\frac{n}{2}\log(2\pi\theta)-\frac{1}{2\theta}\sum_i (Y_i-\theta)^2$.
Score
$\ell'(\theta)=-\frac{n}{2\theta}+\frac{1}{2\theta^2}\sum_i (Y_i-\theta)^2+\frac{1}{\theta}\sum_i (Y_i-\theta)$. Simplify to $\ell'(\theta)=-\frac{n}{2\theta}-\frac{n}{2}+\frac{1}{2\theta^2}\sum_i Y_i^2$ (collect like terms carefully).
Quadratic in $\theta$
After simplification, the score equation is $\theta^2+\theta-\overline{Y^2}=0$ with $\overline{Y^2}=\frac{1}{n}\sum_i Y_i^2$ (using the identity $\sum_i (Y_i-\theta)^2=\sum_i Y_i^2-2\theta\sum_i Y_i+n\theta^2$). The positive root is $\hat\theta=\frac{1}{2}\left(-1+\sqrt{1+4\,\overline{Y^2}}\right)$.
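The closed-form root can be checked numerically against a brute-force maximizer; the true $\theta$ and sample size below are hypothetical, and the grid search is only there to confirm the closed form.

```python
import numpy as np

# Verify the closed-form MLE for N(theta, theta) against a grid maximizer.
# theta0 and n are hypothetical illustration values.
rng = np.random.default_rng(1)
theta0, n = 2.0, 5000
Y = rng.normal(theta0, np.sqrt(theta0), size=n)

m2 = np.mean(Y**2)                                  # mean of Y_i^2
theta_hat = 0.5 * (-1.0 + np.sqrt(1.0 + 4.0 * m2))  # positive root of t^2 + t - m2 = 0

def negloglik(t):
    # negative Gaussian log-likelihood with mean = variance = t
    return 0.5 * n * np.log(2 * np.pi * t) + np.sum((Y - t) ** 2) / (2 * t)

grid = np.linspace(0.5, 4.0, 4001)
theta_grid = grid[np.argmin([negloglik(t) for t in grid])]
```

The grid minimizer of the negative log-likelihood should agree with the quadratic root up to the grid spacing.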
ex-ch06-04
Easy. Use the invariance property to find the MLE of the standard deviation $\sigma$ from i.i.d. observations $Y_i\sim\mathcal N(\mu,\sigma^2)$, given the MLE of $\sigma^2$.
$\sigma=g(\sigma^2)=\sqrt{\sigma^2}$ and the square root is one-to-one on $[0,\infty)$.
Apply invariance
With $g(v)=\sqrt v$ and $\widehat{\sigma^2}=\frac{1}{n}\sum_i (Y_i-\bar Y)^2$, invariance gives $\hat\sigma=\sqrt{\frac{1}{n}\sum_i (Y_i-\bar Y)^2}$.
ex-ch06-05
Medium. For i.i.d. exponential $Y_i\sim\mathrm{Exp}(\lambda)$, compute the finite-sample bias of $\hat\lambda=1/\bar Y$ and show it is $O(1/n)$.
$S=\sum_i Y_i\sim\mathrm{Gamma}(n,\lambda)$, so $\hat\lambda=n/S$ is inverse-Gamma distributed.
Distribution of $\bar Y$
$S=\sum_i Y_i\sim\mathrm{Gamma}(n,\lambda)$, so $\hat\lambda=n/S$ and we need $\mathbb{E}[1/S]$. For $n>1$, $\mathbb{E}[1/S]=\lambda/(n-1)$ (standard Gamma identity, finite only for $n>1$).
Bias computation
$\mathbb{E}[\hat\lambda]=n\,\mathbb{E}[1/S]=\frac{n\lambda}{n-1}$. Therefore bias$(\hat\lambda)=\frac{\lambda}{n-1}=O(1/n)$. The bias-corrected estimator is $\tilde\lambda=\frac{n-1}{n}\hat\lambda=(n-1)/S$, which is exactly unbiased.
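A Monte Carlo check of both the bias formula and the correction; the rate, sample size, and replication count are hypothetical.

```python
import numpy as np

# Monte Carlo check: bias of 1/Y_bar is lambda/(n-1) and the
# (n-1)/n correction removes it. lam, n, reps are hypothetical values.
rng = np.random.default_rng(2)
lam, n, reps = 2.0, 10, 100000
Y = rng.exponential(1.0 / lam, size=(reps, n))
lam_hat = 1.0 / Y.mean(axis=1)               # MLE, biased upward
lam_tilde = (n - 1) / n * lam_hat            # bias-corrected estimator
bias_mc = lam_hat.mean() - lam               # theory: lam/(n-1) = 2/9
bias_corrected_mc = lam_tilde.mean() - lam   # theory: 0
```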
ex-ch06-06
Medium. Show that for i.i.d. $Y_i\sim\mathcal N(\mu,\sigma^2)$ the MLE $\widehat{\sigma^2}$ has asymptotic variance $2\sigma^4/n$, matching the CRLB for $\sigma^2$.
Compute the per-sample Fisher information for $\sigma^2$ treating $\mu$ as known; then check that the FIM is block-diagonal in $(\mu,\sigma^2)$.
Fisher information for $\sigma^2$
Writing $v=\sigma^2$, $\frac{\partial^2}{\partial v^2}\log p(y;v)=\frac{1}{2v^2}-\frac{(y-\mu)^2}{v^3}$. Taking the negative expectation at the truth gives $I_1(\sigma^2)=\frac{1}{2\sigma^4}$.
Asymptotic variance
By asymptotic normality of the MLE, $\sqrt n\,(\widehat{\sigma^2}-\sigma^2)\xrightarrow{d}\mathcal N(0,2\sigma^4)$, so the asymptotic variance $2\sigma^4/n$ attains the CRLB.
ex-ch06-07
Medium. Let $Y=X\beta+\varepsilon$ with $\varepsilon\sim\mathcal N(0,\sigma^2 I)$ and $X$ of full column rank. Write the MLE of $\beta$ and its covariance, and identify it as a BLUE.
Apply Theorem thm-linear-gaussian-mle with $C=\sigma^2 I$.
MLE formula
$\hat\beta=(X^\top X)^{-1}X^\top Y$ (ordinary least squares).
Covariance
$\mathrm{Cov}(\hat\beta)=\sigma^2(X^\top X)^{-1}$, which equals the CRLB and makes the estimator both MVUE and BLUE (Gauss-Markov).
ex-ch06-08
Medium. For i.i.d. Laplace observations with density $p(y;\mu)=\frac{1}{2b}e^{-|y-\mu|/b}$, show that the MLE of the location $\mu$ is the sample median.
Minimize $\sum_i |Y_i-\mu|$; its derivative (where it exists) is $-\sum_i \mathrm{sign}(Y_i-\mu)$.
Log-likelihood
$\ell(\mu)=-n\log(2b)-\frac{1}{b}\sum_i |Y_i-\mu|$. Maximizing is equivalent to minimizing $\sum_i |Y_i-\mu|$.
Median minimizes L1
The function $\mu\mapsto\sum_i |Y_i-\mu|$ is convex and piecewise linear, minimized at the sample median (any point between the two middle order statistics when $n$ is even). Thus $\hat\mu=\mathrm{med}(Y_1,\dots,Y_n)$.
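The median-minimizes-L1 claim is easy to verify on a grid; the Laplace parameters and sample size below are hypothetical.

```python
import numpy as np

# Check that the sample median minimizes mu -> sum_i |Y_i - mu|
# on a fine grid. The Laplace parameters are hypothetical.
rng = np.random.default_rng(3)
Y = rng.laplace(loc=1.5, scale=0.7, size=101)   # odd n, so the median is unique
med = np.median(Y)

grid = np.linspace(Y.min(), Y.max(), 50001)
l1 = np.abs(Y[None, :] - grid[:, None]).sum(axis=1)  # L1 cost at each grid point
mu_star = grid[np.argmin(l1)]
```

The grid minimizer should land on the sample median up to the grid spacing.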
ex-ch06-09
Hard. (Pareto tail index) $Y_1,\dots,Y_n$ i.i.d. with density $p(y;\alpha)=\alpha y_m^{\alpha} y^{-(\alpha+1)}$ for $y\ge y_m$, $y_m$ known. Find the MLE of $\alpha$ and its asymptotic distribution.
Differentiate the log-likelihood in $\alpha$.
$\ell(\alpha)=n\log\alpha+n\alpha\log y_m-(\alpha+1)\sum_i\log Y_i$.
Score equation
$\ell'(\alpha)=\frac{n}{\alpha}+n\log y_m-\sum_i\log Y_i=0$ gives $\hat\alpha=\dfrac{n}{\sum_i \log(Y_i/y_m)}$.
Asymptotic normality
$I_1(\alpha)=\mathbb{E}\big[-\partial^2_\alpha\log p\big]=1/\alpha^2$. Hence $\sqrt n\,(\hat\alpha-\alpha)\xrightarrow{d}\mathcal N(0,\alpha^2)$.
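The asymptotic distribution can be checked by simulation; the tail index, scale, sample size, and replication count below are hypothetical.

```python
import numpy as np

# Monte Carlo check that sqrt(n)(alpha_hat - alpha)/alpha is roughly N(0, 1).
# alpha, ym, n, reps are hypothetical illustration values.
rng = np.random.default_rng(9)
alpha, ym, n, reps = 3.0, 1.0, 400, 20000
U = rng.random(size=(reps, n))
Y = ym * U ** (-1.0 / alpha)                  # inverse-CDF Pareto sampling
alpha_hat = n / np.log(Y / ym).sum(axis=1)    # MLE per replication
z = np.sqrt(n) * (alpha_hat - alpha) / alpha  # standardized errors
```

The standardized errors should have mean near zero (up to the $O(1/n)$ bias of the MLE) and standard deviation near one.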
ex-ch06-10
Medium. Implement Fisher scoring for the Poisson GLM: $Y_i\sim\mathrm{Poisson}(\mu_i)$ with $\log\mu_i=x_i^\top\beta$. Derive the score, the FIM, and the scoring update.
Let $\mu_i=e^{x_i^\top\beta}$ and collect the rows $x_i^\top$ into the design matrix $X$.
Score
$s(\beta)=\sum_i (Y_i-\mu_i)\,x_i=X^\top(Y-\mu)$.
FIM
$I(\beta)=\sum_i \mu_i\,x_i x_i^\top=X^\top W X$ with $W=\mathrm{diag}(\mu_1,\dots,\mu_n)$.
Scoring update
$\beta^{(t+1)}=\beta^{(t)}+(X^\top W^{(t)}X)^{-1}X^\top(Y-\mu^{(t)})$, a weighted least squares step; this is the IRLS algorithm.
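The scoring update above can be sketched directly in NumPy; the design matrix, true coefficients, iteration budget, and tolerance are all hypothetical choices.

```python
import numpy as np

# Fisher scoring / IRLS for the Poisson GLM with log link.
# n, p, beta_true are hypothetical; beta_true is used only to simulate data.
rng = np.random.default_rng(4)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, 0.3, -0.2])
Y = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(p)
for _ in range(50):
    mu = np.exp(X @ beta)                 # current mean vector
    score = X.T @ (Y - mu)                # s(beta) = X^T (Y - mu)
    fim = X.T @ (mu[:, None] * X)         # I(beta) = X^T W X with W = diag(mu)
    step = np.linalg.solve(fim, score)    # Fisher scoring direction
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:      # stop once the update stalls
        break

final_score = X.T @ (Y - np.exp(X @ beta))  # should be ~0 at the MLE
```

At convergence the score vanishes, which is the defining property of the MLE.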
ex-ch06-11
Medium. Derive the Cramer-Rao bound for the frequency $\omega$ of a single complex sinusoid $y[n]=A e^{j\omega n}+w[n]$, $n=0,\dots,N-1$, with $w[n]\sim\mathcal{CN}(0,\sigma^2)$, treating $A$ and $\sigma^2$ as known.
Compute the score in $\omega$; for this complex Gaussian model the Fisher information is $I(\omega)=\frac{2}{\sigma^2}\sum_n\left|\frac{\partial}{\partial\omega}A e^{j\omega n}\right|^2$.
Use $\sum_{n=0}^{N-1} n^2\approx N^3/3$ for large $N$.
Log-likelihood
$\ell(\omega)=-\frac{1}{\sigma^2}\sum_{n=0}^{N-1}\left|y[n]-A e^{j\omega n}\right|^2+\text{const}$.
Fisher information
$I(\omega)=\frac{2|A|^2}{\sigma^2}\sum_{n=0}^{N-1} n^2\approx\frac{2|A|^2 N^3}{3\sigma^2}$.
CRLB
$\mathrm{Var}(\hat\omega)\ge\frac{3\sigma^2}{2|A|^2 N^3}$, the $O(N^{-3})$ ("super-efficient") scaling peculiar to frequency estimation.
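The exact bound and its large-$N$ form are easy to compare numerically; the SNR value below is an assumed illustration.

```python
import numpy as np

# Compare the exact frequency CRLB (using sum n^2) with the large-N
# approximation 3 sigma^2 / (2 |A|^2 N^3). The SNR value is hypothetical.
A2_over_sigma2 = 2.0   # assumed |A|^2 / sigma^2
N = 1024
n_idx = np.arange(N, dtype=float)
crlb_exact = 1.0 / (2 * A2_over_sigma2 * np.sum(n_idx**2))
crlb_approx = 3.0 / (2 * A2_over_sigma2 * N**3)

# doubling N shrinks the bound by a factor of 8 (the N^{-3} law)
crlb_2N = 3.0 / (2 * A2_over_sigma2 * (2 * N) ** 3)
```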
ex-ch06-12
Easy. Show that for the Gaussian mean model $Y_i\sim\mathcal N(\theta,\sigma^2)$ with known $\sigma^2$, the MLE $\hat\theta=\bar Y$ is efficient (achieves the CRLB) for every finite $n$.
Compute $I_n(\theta)$ and compare the CRLB to $\mathrm{Var}(\bar Y)$.
CRLB
$I_n(\theta)=n/\sigma^2$, so $\mathrm{CRLB}=\sigma^2/n$.
MLE variance
$\mathrm{Var}(\bar Y)=\sigma^2/n$. The bound is attained exactly for all $n$, not just asymptotically.
ex-ch06-13
Hard. (Consistency via Jensen) For i.i.d. observations, show that $\mathbb{E}_{\theta_0}[\log p(Y;\theta)]\le\mathbb{E}_{\theta_0}[\log p(Y;\theta_0)]$ using Jensen's inequality, and interpret the difference as the KL divergence.
Apply Jensen to $\mathbb{E}_{\theta_0}\!\left[\log\frac{p(Y;\theta)}{p(Y;\theta_0)}\right]$.
Jensen
$\mathbb{E}_{\theta_0}\!\left[\log\frac{p(Y;\theta)}{p(Y;\theta_0)}\right]\le\log\mathbb{E}_{\theta_0}\!\left[\frac{p(Y;\theta)}{p(Y;\theta_0)}\right]=\log\int p(y;\theta)\,dy=\log 1=0$.
KL identification
Negating, $\mathrm{KL}\big(p_{\theta_0}\,\|\,p_{\theta}\big)=\mathbb{E}_{\theta_0}\!\left[\log\frac{p(Y;\theta_0)}{p(Y;\theta)}\right]\ge 0$, with equality iff $p(Y;\theta)=p(Y;\theta_0)$ a.s. By identifiability this forces $\theta=\theta_0$, so $\theta_0$ is the unique population maximizer.
ex-ch06-14
Medium. (Profile likelihood) In the Gaussian AR(1) model $Y_t=\rho Y_{t-1}+\varepsilon_t$ with $\varepsilon_t\sim\mathcal N(0,\sigma^2)$ i.i.d., find the MLE of $\rho$ by profiling out $\sigma^2$.
Conditional on $Y_1$, the likelihood of $Y_2,\dots,Y_n$ is a product of Gaussian terms, each with mean $\rho Y_{t-1}$.
Conditional log-likelihood
$\ell(\rho,\sigma^2)=-\frac{n-1}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{t=2}^n (Y_t-\rho Y_{t-1})^2$.
Profile $\sigma^2$
$\widehat{\sigma^2}(\rho)=\frac{1}{n-1}\sum_{t=2}^n (Y_t-\rho Y_{t-1})^2$. Substituting makes the profiled log-likelihood $\ell_p(\rho)=-\frac{n-1}{2}\log\widehat{\sigma^2}(\rho)+\text{const}$.
MLE of $\rho$
Minimizing $\widehat{\sigma^2}(\rho)$ in $\rho$ yields the least-squares estimator $\hat\rho=\dfrac{\sum_{t=2}^n Y_t Y_{t-1}}{\sum_{t=2}^n Y_{t-1}^2}$.
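A simulation check that the profiled log-likelihood peaks at the least-squares ratio; the AR(1) parameters and sample size are hypothetical.

```python
import numpy as np

# Check that the profiled log-likelihood peaks at the least-squares ratio.
# rho0, sigma, n are hypothetical illustration values.
rng = np.random.default_rng(5)
rho0, sigma, n = 0.6, 1.0, 4000
Y = np.zeros(n)
for t in range(1, n):
    Y[t] = rho0 * Y[t - 1] + sigma * rng.normal()

rho_hat = np.sum(Y[1:] * Y[:-1]) / np.sum(Y[:-1] ** 2)  # closed-form profile MLE

def profile_ll(rho):
    # profiled log-likelihood up to constants: -((n-1)/2) log sigma2_hat(rho)
    s2 = np.mean((Y[1:] - rho * Y[:-1]) ** 2)
    return -0.5 * (n - 1) * np.log(s2)

grid = np.linspace(-0.95, 0.95, 3801)
rho_grid = grid[np.argmax([profile_ll(r) for r in grid])]
```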
ex-ch06-15
Hard. (Two-sinusoid MLE is non-convex) For $y[n]=A_1 e^{j\omega_1 n}+A_2 e^{j\omega_2 n}+w[n]$ with $\omega_1\ne\omega_2$, explain why the joint MLE in $(\omega_1,\omega_2)$ has multiple local maxima and recommend a practical algorithm.
Consider label symmetry and near-collision behaviour.
Identify non-convexity
The likelihood is invariant under the permutation $(\omega_1,\omega_2)\mapsto(\omega_2,\omega_1)$, so there are at least two global maxima. Additionally, when the periodogram has two dominant peaks, the likelihood surface has a ridge along the line $\omega_1=\omega_2$ that creates a saddle between the symmetric maxima.
Practical algorithm
(i) Compute the periodogram and identify the two largest peaks as initializations. (ii) Run alternating Newton updates: fix $\omega_2$, refine $\omega_1$; fix $\omega_1$, refine $\omega_2$. (iii) Alternatively apply ESPRIT/MUSIC to the sample covariance for a closed-form starting point. Without good initialization, pure Newton-Raphson will often return a wrong local maximum.
ex-ch06-16
Medium. Show that the Kaczmarz/normal-equation fit $X\hat\beta$, with $\hat\beta=(X^\top X)^{-1}X^\top Y$, for the Gaussian linear model is the orthogonal projection of $Y$ onto the column space of $X$.
The fitted values are $\hat Y=HY$ with $H=X(X^\top X)^{-1}X^\top$.
Compute the fitted values
$\hat Y=X\hat\beta=X(X^\top X)^{-1}X^\top Y=HY$.
Show projection properties
$H^2=H$ and $H^\top=H$, so $H$ is an orthogonal projector onto $\mathrm{col}(X)$. The residual $Y-\hat Y=(I-H)Y$ satisfies $X^\top(Y-\hat Y)=0$, i.e. it is orthogonal to every column of $X$.
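The projector properties can be confirmed numerically on a random design; the dimensions below are arbitrary.

```python
import numpy as np

# Numerical check of the projector properties on a random design.
# Dimensions are hypothetical.
rng = np.random.default_rng(6)
n, p = 30, 4
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix X (X^T X)^{-1} X^T
resid = Y - H @ Y                       # residual (I - H) Y

sym_err = np.max(np.abs(H - H.T))       # H^T = H
idem_err = np.max(np.abs(H @ H - H))    # H^2 = H
orth_err = np.max(np.abs(X.T @ resid))  # X^T (Y - HY) = 0
```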
ex-ch06-17
Challenge. (Regularity failure) For the shifted exponential $p(y;\theta)=e^{-(y-\theta)}\,\mathbf 1\{y\ge\theta\}$, find the MLE of $\theta$, its exact distribution, and its convergence rate. Is it asymptotically Gaussian?
The MLE is $\hat\theta=\min_i Y_i$; compute the distribution of the minimum.
MLE
The likelihood is zero unless $\theta\le\min_i Y_i$; on that range it equals $e^{n\theta}e^{-\sum_i Y_i}$, increasing in $\theta$. Hence $\hat\theta=Y_{(1)}=\min_i Y_i$.
Distribution
$P(\hat\theta-\theta>t)=\big(P(Y_1>\theta+t)\big)^n=e^{-nt}$ for $t\ge 0$, so $n(\hat\theta-\theta)\sim\mathrm{Exp}(1)$, independent of $\theta$.
Convergence rate
The rate is $n^{-1}$ (super-efficient, faster than the regular $n^{-1/2}$), and the limit is exponential, not Gaussian. CRLB machinery does not apply: the support boundary breaks regularity.
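The exact Exp(1) limit law is easy to see in simulation; the shift, sample size, and replication count below are hypothetical.

```python
import numpy as np

# Monte Carlo check that n (min_i Y_i - theta) is Exp(1).
# theta, n, reps are hypothetical illustration values.
rng = np.random.default_rng(7)
theta, n, reps = 1.0, 200, 20000
Y = theta + rng.exponential(1.0, size=(reps, n))
theta_hat = Y.min(axis=1)        # MLE per replication, always >= theta
Z = n * (theta_hat - theta)      # should be Exp(1) regardless of n
```

The scaled errors are nonnegative with mean one, consistent with the exponential limit rather than a Gaussian.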
ex-ch06-18
Medium. Show that if $\hat\theta_n$ is an asymptotically efficient estimator, then so is $\hat\theta_n+c/n$ for any fixed constant $c$.
Compute $\sqrt n\,(\hat\theta_n+c/n-\theta)$ and take the limit.
Reduce to original estimator
$\sqrt n\,(\hat\theta_n+c/n-\theta)=\sqrt n\,(\hat\theta_n-\theta)+c/\sqrt n$. The second term goes to zero, so by Slutsky the limit is the same Gaussian $\mathcal N(0,I_1(\theta)^{-1})$.
Interpret
Any $O(1/n)$ shift of an efficient estimator is asymptotically equivalent to it; bias correction at the $1/n$ level does not change the asymptotic variance.
ex-ch06-19
Medium. Use the delta method with invariance to compute the asymptotic variance of $\hat\mu=1/\hat\lambda=\bar Y$ in the exponential model, and verify it equals $\mu^2/n$ plus higher-order terms.
Delta method: if $\sqrt n\,(\hat\theta_n-\theta)\xrightarrow{d}\mathcal N(0,v)$, then $\sqrt n\,(g(\hat\theta_n)-g(\theta))\xrightarrow{d}\mathcal N(0,g'(\theta)^2 v)$.
Variance of the rate MLE
$\sqrt n\,(\hat\lambda-\lambda)\xrightarrow{d}\mathcal N(0,\lambda^2)$ (CRLB for the rate).
Delta method with $g(v)=1/v$
$g'(\lambda)=-1/\lambda^2$. The asymptotic variance of $\hat\mu=g(\hat\lambda)$ is $\lambda^{-4}\cdot\lambda^2/n=1/(\lambda^2 n)=\mu^2/n$, which is the CRLB for the mean parameter $\mu=1/\lambda$.
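A Monte Carlo check of the delta-method value; the rate, sample size, and replication count are hypothetical.

```python
import numpy as np

# Monte Carlo check of the delta-method variance mu^2 / n for
# mu_hat = 1/lambda_hat = Y_bar. lam, n, reps are hypothetical values.
rng = np.random.default_rng(10)
lam, n, reps = 2.0, 100, 50000
Y = rng.exponential(1.0 / lam, size=(reps, n))
mu_hat = Y.mean(axis=1)            # = 1/lambda_hat by invariance
var_mc = mu_hat.var()              # Monte Carlo variance
var_delta = (1.0 / lam) ** 2 / n   # delta-method value mu^2 / n = 0.0025
```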
ex-ch06-20
Hard. (Tone in colored noise) Let $y=A\,s(\omega)+w$ with $s(\omega)=[1,e^{j\omega},\dots,e^{j\omega(N-1)}]^\top$ and $w\sim\mathcal{CN}(0,C)$, where $C$ is known. Derive the MLE of $\omega$ and interpret it as a generalized periodogram.
Whiten the observation: $\tilde y=C^{-1/2}y$, and apply Theorem thm-linear-gaussian-mle as a profile over $\omega$.
Whiten
$\tilde y=C^{-1/2}y$ and $\tilde s(\omega)=C^{-1/2}s(\omega)$. The whitened model has i.i.d. noise, so Theorem thm-linear-gaussian-mle (Gaussian Linear Model: Closed-Form MLE) applies.
Profile $A$
For fixed $\omega$, $\hat A(\omega)=\dfrac{\tilde s(\omega)^H\tilde y}{\tilde s(\omega)^H\tilde s(\omega)}=\dfrac{s(\omega)^H C^{-1}y}{s(\omega)^H C^{-1}s(\omega)}$.
Generalized periodogram
The MLE of $\omega$ is $\hat\omega=\arg\max_\omega\dfrac{\left|s(\omega)^H C^{-1}y\right|^2}{s(\omega)^H C^{-1}s(\omega)}$, the peak of the noise-whitened ("generalized") periodogram.
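A small simulation sketch of the generalized periodogram; the record length, true frequency, amplitude, and the AR(1)-type covariance $C_{k,l}=0.8^{|k-l|}$ are all assumed for illustration, not taken from the exercise.

```python
import numpy as np

# Sketch of the generalized (noise-whitened) periodogram estimator.
# N, omega0, A, and the AR(1)-type covariance C are hypothetical choices.
rng = np.random.default_rng(8)
N, omega0, A = 64, 1.1, 2.0
n_idx = np.arange(N)

C = 0.8 ** np.abs(n_idx[:, None] - n_idx[None, :])   # assumed known covariance
L = np.linalg.cholesky(C)
w = L @ (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)  # CN(0, C)
y = A * np.exp(1j * omega0 * n_idx) + w

Cinv = np.linalg.inv(C)
def gen_periodogram(omega):
    # |s(w)^H C^{-1} y|^2 / (s(w)^H C^{-1} s(w))
    s = np.exp(1j * omega * n_idx)
    return np.abs(s.conj() @ Cinv @ y) ** 2 / np.real(s.conj() @ Cinv @ s)

grid = np.linspace(0.0, np.pi, 4096)
omega_hat = grid[np.argmax([gen_periodogram(g) for g in grid])]
```

At this SNR the peak of the whitened periodogram sits close to the true frequency; in practice one would refine the grid peak with a local Newton step.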