MLE for Signal Processing

Why Signal Processing Is ML in Disguise

Some of the most widely deployed signal-processing procedures β€” the periodogram, the matched filter, the DOA spectrum β€” were not originally derived as maximum likelihood estimators, but they are MLEs under standard AWGN assumptions. Recognizing the ML structure explains why these algorithms are near-optimal, why they achieve the CRLB asymptotically, and when they can be improved by subspace methods (MUSIC, ESPRIT) that exploit additional structure. This section walks through the three canonical signal-processing MLEs and names the parameter each one estimates.


Definition:

Deterministic Signal in Complex AWGN

Let $s(\boldsymbol{\theta})[n]$ be a known deterministic signal depending on an unknown parameter $\boldsymbol{\theta}$. Observe
$$Y[n] = s(\boldsymbol{\theta})[n] + W[n], \qquad n = 0, \ldots, N-1,$$
with $W[n] \overset{\text{i.i.d.}}{\sim} \mathcal{CN}(0, \sigma^2)$. The log-likelihood (discarding constants independent of $\boldsymbol{\theta}$) is
$$\ell(\boldsymbol{\theta}) = -\frac{1}{\sigma^2} \sum_{n=0}^{N-1} \bigl|Y[n] - s(\boldsymbol{\theta})[n]\bigr|^2.$$
Maximizing $\ell$ over $\boldsymbol{\theta}$ is equivalent to the least-squares problem $\min_{\boldsymbol{\theta}} \|Y - s(\boldsymbol{\theta})\|^2$.
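A minimal numerical sketch of this equivalence (the signal shape, noise level, and seed below are illustrative assumptions, not from the text): for a scalar complex amplitude parameter, the ML estimate under complex AWGN is the least-squares projection of $Y$ onto the known signal.

```python
import numpy as np

# Minimal check that ML under complex AWGN = least squares (hypothetical
# setup: unknown scalar amplitude theta on a known unit-modulus signal).
rng = np.random.default_rng(0)
N, theta_true, sigma = 256, 1.5 - 0.5j, 0.5
n = np.arange(N)
s = np.exp(2j * np.pi * 0.1 * n)                    # known signal shape
W = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) * sigma / np.sqrt(2)
Y = theta_true * s + W

# Least-squares (= ML) solution: project Y onto s
theta_hat = np.vdot(s, Y) / np.vdot(s, s).real      # np.vdot conjugates s
print(abs(theta_hat - theta_true))                  # small, O(sigma / sqrt(N))
```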

Theorem: Matched Filter = MLE for Delay Estimation

Let $s[n] = p[n - \tau]$ for a known pulse $p$ and unknown delay $\tau$. Under the AWGN model above, the MLE of $\tau$ is
$$\hat\tau_{\text{ml}} = \arg\max_{\tau}\; \operatorname{Re}\!\left\{\sum_{n=0}^{N-1} Y[n]\, p^*[n - \tau]\right\},$$
the peak of the matched-filter output.

Minimizing $\|Y - p_\tau\|^2 = \|Y\|^2 - 2\operatorname{Re}\langle Y, p_\tau\rangle + \|p_\tau\|^2$ reduces, when the pulse energy $\|p_\tau\|^2$ does not depend on the delay, to maximizing $\operatorname{Re}\langle Y, p_\tau\rangle$. Correlating $Y$ against delayed copies of the pulse (equivalently, convolving $Y$ with the time-reversed conjugate pulse) is exactly matched filtering.
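The argument above can be sketched numerically. In this hedged example the pulse (a Barker-13 sequence), the delay, and the noise level are all illustrative choices, not part of the theorem:

```python
import numpy as np

# Sketch of the delay MLE: slide the known pulse across the record and take
# the correlation peak. Barker-13 pulse and SNR are illustrative choices.
rng = np.random.default_rng(1)
p = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], dtype=float)
N, tau_true, sigma = 256, 100, 0.3
s = np.zeros(N)
s[tau_true:tau_true + len(p)] = p                   # s[n] = p[n - tau]
Y = s + sigma * rng.standard_normal(N)

# corr[tau] = sum_n Y[n] p[n - tau]: the matched-filter output per delay
corr = np.correlate(Y, p, mode="valid")
tau_hat = int(np.argmax(corr))
print(tau_hat)                                      # peak at the true delay, 100
```

The Barker pulse is chosen because its autocorrelation sidelobes are small, so the correlation peak stands out clearly even in noise.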


Theorem: Periodogram = MLE for Frequency of a Single Sinusoid

Let $Y[n] = A e^{j(2\pi f_0 n + \phi)} + W[n]$, $n = 0, \ldots, N-1$, with $W[n] \sim \mathcal{CN}(0, \sigma^2)$ and $A > 0$, $\phi$, and $f_0$ unknown. The MLE of $f_0$ is the peak of the periodogram:
$$\hat f_{0,\text{ml}} = \arg\max_{f}\; \bigl|\hat Y(f)\bigr|^2, \qquad \hat Y(f) = \sum_{n=0}^{N-1} Y[n] e^{-j 2\pi f n}.$$
Once $\hat f_{0,\text{ml}}$ is found, the MLEs of $A$ and $\phi$ follow from $\hat A e^{j\hat\phi} = \hat Y(\hat f_{0,\text{ml}})/N$.

Profile likelihood: for any candidate frequency $f$, the best-fitting complex amplitude is the normalized DFT coefficient $\hat Y(f)/N$. Substituting it back, the residual error is minimized exactly when the DFT magnitude at $f$ is maximized, and the squared DFT magnitude is precisely the periodogram.
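A minimal sketch of the theorem (the frequency, amplitude, phase, and noise level below are illustrative assumptions): compute the periodogram on a zero-padded FFT grid and take its peak.

```python
import numpy as np

# Sketch of the frequency MLE: periodogram peak on a zero-padded FFT grid.
# Frequency, amplitude, phase, and noise level are illustrative choices.
rng = np.random.default_rng(2)
N, f0, A, phi, sigma = 128, 0.17, 1.0, 0.4, 0.3
n = np.arange(N)
W = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) * sigma / np.sqrt(2)
Y = A * np.exp(1j * (2 * np.pi * f0 * n + phi)) + W

K = 8 * N                                  # zero-pad for a finer frequency grid
Yf = np.fft.fft(Y, K)
k = int(np.argmax(np.abs(Yf) ** 2))        # periodogram peak
f_hat = k / K                              # frequency MLE (on the fine grid)
amp_hat = Yf[k] / N                        # joint MLE of A e^{j phi}
print(f_hat)                               # close to f0 = 0.17
```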


Example: Phase Estimation at Known Frequency

Given $Y_i = A\cos(2\pi f_0 i + \phi) + Z_i$ with $Z_i \sim \mathcal{N}(0, \sigma^2)$ and $A$, $f_0$ known, find the MLE of $\phi$ and its asymptotic variance.
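One way to check the answer numerically. In this sketch (all parameter values are illustrative), the MLE is computed from the two quadrature correlations, which is the stationarity condition of the likelihood up to large-$N$ orthogonality terms, and its Monte Carlo variance is compared with the CRLB $2\sigma^2/(NA^2)$:

```python
import numpy as np

# Numerical check of this example (values illustrative). For known A and f0,
# the phase MLE is, using large-N orthogonality of the quadratures,
#   phi_hat = -atan2(sum_i Y_i sin(2 pi f0 i), sum_i Y_i cos(2 pi f0 i)),
# with asymptotic variance equal to the CRLB, 2 sigma^2 / (N A^2).
rng = np.random.default_rng(3)
N, A, f0, phi, sigma, trials = 200, 1.0, 0.123, 0.7, 0.5, 2000
i = np.arange(N)
c, s = np.cos(2 * np.pi * f0 * i), np.sin(2 * np.pi * f0 * i)

Y = A * np.cos(2 * np.pi * f0 * i + phi) + sigma * rng.standard_normal((trials, N))
phi_hat = -np.arctan2(Y @ s, Y @ c)        # one estimate per Monte Carlo trial

print(phi_hat.var())                       # empirical variance of the MLE
print(2 * sigma**2 / (N * A**2))           # CRLB prediction: 0.0025
```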


Periodogram MLE for Sinusoidal Frequency

Simulate a noisy complex sinusoid with unknown frequency $f_0$ and display the periodogram $|\hat Y(f)|^2/N$. The peak of the periodogram is the MLE $\hat f_0$. As SNR increases, the peak sharpens and the estimator variance shrinks toward the CRLB.


DOA Spectrum: MLE vs Beamforming

For a ULA of $M$ elements observing a single source from angle $\theta_0$, plot the likelihood-based MLE cost function and the conventional beamforming spectrum as functions of the hypothesized angle. The MLE is the peak of the matched-beam response; resolution improves with $M$ and SNR.


DOA Estimation: the MLE and Its Alternatives

For a narrowband ULA with $M$ elements and $L$ snapshots, the observation is $\mathbf{y}[\ell] = \mathbf{a}(\theta_0) s[\ell] + \mathbf{w}[\ell]$, where $\mathbf{a}(\theta) = [1, e^{j\kappa d \sin\theta}, \ldots, e^{j(M-1)\kappa d \sin\theta}]^{\mathsf{T}}$ is the steering vector and $\kappa = 2\pi/\lambda$. The MLE for a single source is
$$\hat\theta_{\text{ml}} = \arg\max_{\theta}\; \frac{1}{L}\sum_{\ell=1}^{L} \frac{|\mathbf{a}(\theta)^H \mathbf{y}[\ell]|^2}{\|\mathbf{a}(\theta)\|^2}.$$
For multiple sources the MLE becomes a high-dimensional non-convex optimization; in practice one uses MUSIC (eigenstructure of the sample covariance) or ESPRIT (rotational invariance of the array) as computationally efficient alternatives that match MLE performance in the high-SNR, many-snapshot regime.
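The single-source MLE above can be sketched as a grid search. In this hedged example the array size, element spacing, source angle, and SNR are all illustrative assumptions:

```python
import numpy as np

# Sketch of the single-source DOA MLE: grid search over the normalized
# beamformer output. Array size, spacing, angle, and SNR are illustrative.
rng = np.random.default_rng(4)
M, L, theta0, sigma = 16, 50, np.deg2rad(20.0), 0.5
d_over_lambda = 0.5                                  # half-wavelength spacing

def steering(theta):
    # a(theta) with kappa * d = 2 * pi * (d / lambda)
    return np.exp(2j * np.pi * d_over_lambda * np.sin(theta) * np.arange(M))

s = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
W = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) * sigma / np.sqrt(2)
Y = np.outer(steering(theta0), s) + W                # y[l] = a(theta0) s[l] + w[l]

grid = np.deg2rad(np.linspace(-90, 90, 3601))        # 0.05 degree search grid
cost = [np.sum(np.abs(steering(th).conj() @ Y) ** 2) / (L * M) for th in grid]
theta_hat = float(np.rad2deg(grid[int(np.argmax(cost))]))
print(theta_hat)                                     # close to 20 degrees
```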


DOA Estimators: MLE, MUSIC, ESPRIT

| Method | Search Type | Sources | Asymptotic Optimality | Cost |
|---|---|---|---|---|
| Conventional beamforming | Grid over $\theta$ | Arbitrary | Rayleigh-limited resolution | $O(MG)$ per snapshot |
| MLE | Nonlinear optimization | Any (known $K$) | Achieves CRLB | Expensive, non-convex |
| MUSIC | 1-D spectral search | $K < M$ | Consistent, asymptotic CRLB | Eigendecomposition, $O(M^3)$ |
| ESPRIT | Closed-form eigenvalues | $K < M$ | Consistent, asymptotic CRLB | Two eigendecompositions |

Why This Matters: Timing and Carrier Synchronization in Wireless Receivers

Every digital receiver contains ML estimators: the timing recovery loop is a delay MLE running on a known preamble (matched filter peak detection); the carrier-frequency offset estimator is a frequency MLE (periodogram-based or autocorrelation-based); the channel estimator is an amplitude MLE on pilot subcarriers. Subspace DOA methods like MUSIC are used in mmWave beam-management and angle-domain channel estimation for massive MIMO. Understanding these blocks as MLEs lets you reason about their CRLB-limited accuracy directly from the waveform design.

⚠️ Engineering Note

Grid Resolution for Periodogram and DOA MLE

The DFT-based periodogram MLE has grid spacing $\Delta f = 1/N$. The true frequency rarely lies on the grid, so the naive peak misses the true location by up to $O(1/N)$, and the estimator variance is floored by the grid resolution. Standard fixes: zero-pad to a longer FFT for finer sampling, use quadratic interpolation on the three largest bins, or follow the coarse peak with a Newton step. The same applies to angular grids in DOA: a $1°$ grid is too coarse for large apertures at high SNR.

Practical Constraints
• Zero-padding factor typically 4–8× for off-grid accuracy
• Quadratic interpolation correction is $O(1)$ extra work
• Newton refinement recovers the full CRLB asymptotically
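A sketch of the zero-padding fix under illustrative values (the frequency below is deliberately chosen off the $1/N$ grid; noise is omitted for a clean comparison):

```python
import numpy as np

# Sketch of the engineering note: an off-grid frequency leaves the naive
# DFT peak grid-limited; zero-padding restores off-grid accuracy.
N, f0 = 64, 0.207                        # f0 lies between the 1/N grid points
n = np.arange(N)
Y = np.exp(2j * np.pi * f0 * n)          # noiseless sinusoid for clarity

f_coarse = np.argmax(np.abs(np.fft.fft(Y)) ** 2) / N      # grid spacing 1/64
K = 16 * N                                                # 16x zero-padding
f_fine = np.argmax(np.abs(np.fft.fft(Y, K)) ** 2) / K     # grid spacing 1/1024
print(abs(f_coarse - f0))                # ~4e-3, limited by the 1/N grid
print(abs(f_fine - f0))                  # ~3e-5, two orders of magnitude better
```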

📋 Ref: 3GPP TS 38.211 / 38.212 (5G NR synchronization signals)

Historical Note: Fisher's 1922 Invention of Maximum Likelihood

1912-1925

R. A. Fisher introduced the term likelihood and the principle of maximum likelihood in a 1922 paper, sharply distinguishing it from Bayesian posterior inference. He argued that for frequentist inference without prior information, the data themselves determine which parameter value is most "plausible" β€” a novel foundational stance. Fisher also introduced the information quantity now called Fisher information and identified the asymptotic efficiency of the MLE. Much of the modern theory of parameter estimation builds directly on Fisher's framework.

Historical Note: Wilks, Wald, and Large-Sample Theory

1938-1949

Fisher's asymptotic claims were made rigorous by S. S. Wilks (1938, who proved the chi-squared limit of the log-likelihood ratio) and A. Wald (1949, who gave the first complete proof of MLE consistency under compactness and continuity assumptions). H. Cramer's 1946 book assembled the modern treatment of MLE, the Cramer-Rao bound, and asymptotic normality into a coherent framework that remains the textbook standard.


Score function

The gradient of the log-density with respect to the parameter, $s(\theta; y) = \nabla_\theta \log f_\theta(y)$. Its expectation is zero and its variance equals the Fisher information.

Related: Fisher information, Maximum likelihood estimator

Fisher information

The variance of the score, $J(\theta) = \operatorname{Var}_\theta(s(\theta; Y))$, equivalently $-\mathbb{E}[\partial^2 \log f_\theta(Y)/\partial\theta^2]$ under regularity. It lower-bounds the variance of unbiased estimators (Cramer-Rao).

Related: Score function, Cramer-Rao lower bound (CRLB)

Maximum likelihood estimator

The parameter value that maximizes the likelihood of the observed data, $g_{\text{ml}}(\mathbf{y}) = \arg\max_\theta f_\theta(\mathbf{y})$. Under regularity it is consistent, asymptotically normal, asymptotically efficient, and invariant under reparameterization.

Related: Score function, Asymptotic efficiency

Asymptotic efficiency

Property of an estimator whose asymptotic variance equals the Cramer-Rao lower bound. The MLE is asymptotically efficient in regular models.

Related: Maximum likelihood estimator, Cramer-Rao lower bound (CRLB)

Cramer-Rao lower bound (CRLB)

Universal lower bound on the variance of any unbiased estimator, $\operatorname{Var}(\hat\theta) \geq 1/J(\theta)$. Developed in Chapter 5; in this chapter we show the MLE attains it asymptotically.

Related: Fisher information, Maximum likelihood estimator

Periodogram

$|\hat Y(f)|^2$, the squared DFT magnitude of a sequence at frequency $f$. Its peak is the MLE of the frequency of a single complex sinusoid in AWGN.

Related: Maximum likelihood estimator

Matched filter

Correlation of the received signal with a time-reversed conjugate of the transmitted pulse. It maximizes output SNR and is the MLE for delay estimation in AWGN.

Related: Maximum likelihood estimator

Quick Check

The peak of the periodogram is the MLE of the sinusoidal frequency under which assumption?

• Observations are i.i.d. Laplace noise around a sinusoid.

• Observations are a sinusoid in additive complex Gaussian noise.

• Frequency is on the DFT grid.

• Sample size is a power of two.