MLE for Signal Processing

Why Signal Processing Is ML in Disguise

Some of the most widely deployed signal-processing procedures β€” the periodogram, the matched filter, the DOA spectrum β€” were not originally derived as maximum likelihood estimators, but they are MLEs under standard AWGN assumptions. Recognizing the ML structure explains why these algorithms are near-optimal, why they achieve the CRLB asymptotically, and when they can be improved by subspace methods (MUSIC, ESPRIT) that exploit additional structure. This section walks through the three canonical signal-processing MLEs and names the parameter each one estimates.


Definition:

Deterministic Signal in Complex AWGN

Let $s(\boldsymbol{\theta})[n]$ be a known deterministic signal depending on an unknown parameter $\boldsymbol{\theta}$. Observe
$$Y[n] = s(\boldsymbol{\theta})[n] + W[n], \qquad n = 0, \ldots, N-1,$$
with $W[n] \overset{\text{i.i.d.}}{\sim} \mathcal{CN}(0, \sigma^2)$. The log-likelihood (discarding constants independent of $\boldsymbol{\theta}$) is
$$\ell(\boldsymbol{\theta}) = -\frac{1}{\sigma^2} \sum_{n=0}^{N-1} \bigl|Y[n] - s(\boldsymbol{\theta})[n]\bigr|^2.$$
Maximizing $\ell$ over $\boldsymbol{\theta}$ is equivalent to the least-squares problem $\min_{\boldsymbol{\theta}} \|Y - s(\boldsymbol{\theta})\|^2$.
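A minimal numerical sketch of this equivalence (the signal shape, noise level, and seed below are illustrative assumptions, not from the text): for a scalar complex amplitude parameter, the ML estimate under complex AWGN is the least-squares projection of $Y$ onto the known signal.

```python
import numpy as np

# Minimal check that ML under complex AWGN = least squares (hypothetical
# setup: unknown scalar amplitude theta on a known unit-modulus signal).
rng = np.random.default_rng(0)
N, theta_true, sigma = 256, 1.5 - 0.5j, 0.5
n = np.arange(N)
s = np.exp(2j * np.pi * 0.1 * n)                    # known signal shape
W = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) * sigma / np.sqrt(2)
Y = theta_true * s + W

# Least-squares (= ML) solution: project Y onto s
theta_hat = np.vdot(s, Y) / np.vdot(s, s).real      # np.vdot conjugates s
print(abs(theta_hat - theta_true))                  # small, O(sigma / sqrt(N))
```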

Theorem: Matched Filter = MLE for Delay Estimation

Let $s[n] = p[n - \tau]$ for a known pulse $p$ and unknown delay $\tau$. Under the AWGN model above, the MLE of $\tau$ is
$$\hat\tau_{\text{ml}} = \arg\max_{\tau}\; \operatorname{Re}\!\left\{\sum_{n=0}^{N-1} Y[n]\, p^*[n - \tau]\right\},$$
the peak of the matched-filter output.

Minimizing $\|Y - p_\tau\|^2 = \|Y\|^2 - 2\operatorname{Re}\langle Y, p_\tau\rangle + \|p_\tau\|^2$ reduces, when the pulse energy $\|p_\tau\|^2$ does not depend on the delay, to maximizing $\operatorname{Re}\langle Y, p_\tau\rangle$. Correlating $Y$ against delayed copies of the pulse (equivalently, convolving $Y$ with the time-reversed conjugate pulse) is exactly matched filtering.
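The argument above can be sketched numerically. In this hedged example the pulse (a Barker-13 sequence), the delay, and the noise level are all illustrative choices, not part of the theorem:

```python
import numpy as np

# Sketch of the delay MLE: slide the known pulse across the record and take
# the correlation peak. Barker-13 pulse and SNR are illustrative choices.
rng = np.random.default_rng(1)
p = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], dtype=float)
N, tau_true, sigma = 256, 100, 0.3
s = np.zeros(N)
s[tau_true:tau_true + len(p)] = p                   # s[n] = p[n - tau]
Y = s + sigma * rng.standard_normal(N)

# corr[tau] = sum_n Y[n] p[n - tau]: the matched-filter output per delay
corr = np.correlate(Y, p, mode="valid")
tau_hat = int(np.argmax(corr))
print(tau_hat)                                      # peak at the true delay, 100
```

The Barker pulse is chosen because its autocorrelation sidelobes are small, so the correlation peak stands out clearly even in noise.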


Theorem: Periodogram = MLE for Frequency of a Single Sinusoid

Let $Y[n] = A e^{j(2\pi f_0 n + \phi)} + W[n]$, $n = 0, \ldots, N-1$, with $W[n] \sim \mathcal{CN}(0, \sigma^2)$ and $A > 0$, $\phi$, and $f_0$ unknown. The MLE of $f_0$ is the peak of the periodogram:
$$\hat f_{0,\text{ml}} = \arg\max_{f}\; \bigl|\hat Y(f)\bigr|^2, \qquad \hat Y(f) = \sum_{n=0}^{N-1} Y[n] e^{-j 2\pi f n}.$$
Once $\hat f_{0,\text{ml}}$ is found, the MLEs of $A$ and $\phi$ follow from $\hat A e^{j\hat\phi} = \hat Y(\hat f_{0,\text{ml}})/N$.

Profile likelihood: for any candidate frequency $f$, the best-fitting complex amplitude is the normalized DFT coefficient $\hat Y(f)/N$. Substituting it back, the residual error is minimized exactly when the DFT magnitude at $f$ is maximized, and the squared DFT magnitude is precisely the periodogram.
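A minimal sketch of the theorem (the frequency, amplitude, phase, and noise level below are illustrative assumptions): compute the periodogram on a zero-padded FFT grid and take its peak.

```python
import numpy as np

# Sketch of the frequency MLE: periodogram peak on a zero-padded FFT grid.
# Frequency, amplitude, phase, and noise level are illustrative choices.
rng = np.random.default_rng(2)
N, f0, A, phi, sigma = 128, 0.17, 1.0, 0.4, 0.3
n = np.arange(N)
W = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) * sigma / np.sqrt(2)
Y = A * np.exp(1j * (2 * np.pi * f0 * n + phi)) + W

K = 8 * N                                  # zero-pad for a finer frequency grid
Yf = np.fft.fft(Y, K)
k = int(np.argmax(np.abs(Yf) ** 2))        # periodogram peak
f_hat = k / K                              # frequency MLE (on the fine grid)
amp_hat = Yf[k] / N                        # joint MLE of A e^{j phi}
print(f_hat)                               # close to f0 = 0.17
```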


Example: Phase Estimation at Known Frequency

Given $Y_i = A\cos(2\pi f_0 i + \phi) + Z_i$ with $Z_i \sim \mathcal{N}(0, \sigma^2)$ and $A$, $f_0$ known, find the MLE of $\phi$ and its asymptotic variance.
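One way to check the answer numerically. In this sketch (all parameter values are illustrative), the MLE is computed from the two quadrature correlations, which is the stationarity condition of the likelihood up to large-$N$ orthogonality terms, and its Monte Carlo variance is compared with the CRLB $2\sigma^2/(NA^2)$:

```python
import numpy as np

# Numerical check of this example (values illustrative). For known A and f0,
# the phase MLE is, using large-N orthogonality of the quadratures,
#   phi_hat = -atan2(sum_i Y_i sin(2 pi f0 i), sum_i Y_i cos(2 pi f0 i)),
# with asymptotic variance equal to the CRLB, 2 sigma^2 / (N A^2).
rng = np.random.default_rng(3)
N, A, f0, phi, sigma, trials = 200, 1.0, 0.123, 0.7, 0.5, 2000
i = np.arange(N)
c, s = np.cos(2 * np.pi * f0 * i), np.sin(2 * np.pi * f0 * i)

Y = A * np.cos(2 * np.pi * f0 * i + phi) + sigma * rng.standard_normal((trials, N))
phi_hat = -np.arctan2(Y @ s, Y @ c)        # one estimate per Monte Carlo trial

print(phi_hat.var())                       # empirical variance of the MLE
print(2 * sigma**2 / (N * A**2))           # CRLB prediction: 0.0025
```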


Periodogram MLE for Sinusoidal Frequency

Simulate a noisy complex sinusoid with unknown frequency $f_0$ and display the periodogram $|\hat Y(f)|^2/N$. The peak of the periodogram is the MLE $\hat f_0$. As SNR increases, the peak sharpens and the estimator variance shrinks toward the CRLB.


DOA Spectrum: MLE vs Beamforming

For a ULA of $M$ elements observing a single source from angle $\theta_0$, plot the likelihood-based MLE cost function and the conventional beamforming spectrum as functions of the hypothesized angle. The MLE is the peak of the matched-beam response; resolution improves with $M$ and SNR.


DOA Estimation: the MLE and Its Alternatives

For a narrowband ULA with $M$ elements and $L$ snapshots, the observation is $\mathbf{y}[\ell] = \mathbf{a}(\theta_0) s[\ell] + \mathbf{w}[\ell]$, where $\mathbf{a}(\theta) = [1, e^{j\kappa d \sin\theta}, \ldots, e^{j(M-1)\kappa d \sin\theta}]^{\mathsf{T}}$ is the steering vector and $\kappa = 2\pi/\lambda$. The MLE for a single source is
$$\hat\theta_{\text{ml}} = \arg\max_{\theta}\; \frac{1}{L}\sum_{\ell=1}^{L} \frac{|\mathbf{a}(\theta)^H \mathbf{y}[\ell]|^2}{\|\mathbf{a}(\theta)\|^2}.$$
For multiple sources the MLE becomes a high-dimensional non-convex optimization; in practice one uses MUSIC (eigenstructure of the sample covariance) or ESPRIT (rotational invariance of the array) as computationally efficient alternatives that match MLE performance in the high-SNR, many-snapshot regime.
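The single-source MLE above can be sketched as a grid search. In this hedged example the array size, element spacing, source angle, and SNR are all illustrative assumptions:

```python
import numpy as np

# Sketch of the single-source DOA MLE: grid search over the normalized
# beamformer output. Array size, spacing, angle, and SNR are illustrative.
rng = np.random.default_rng(4)
M, L, theta0, sigma = 16, 50, np.deg2rad(20.0), 0.5
d_over_lambda = 0.5                                  # half-wavelength spacing

def steering(theta):
    # a(theta) with kappa * d = 2 * pi * (d / lambda)
    return np.exp(2j * np.pi * d_over_lambda * np.sin(theta) * np.arange(M))

s = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
W = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) * sigma / np.sqrt(2)
Y = np.outer(steering(theta0), s) + W                # y[l] = a(theta0) s[l] + w[l]

grid = np.deg2rad(np.linspace(-90, 90, 3601))        # 0.05 degree search grid
cost = [np.sum(np.abs(steering(th).conj() @ Y) ** 2) / (L * M) for th in grid]
theta_hat = float(np.rad2deg(grid[int(np.argmax(cost))]))
print(theta_hat)                                     # close to 20 degrees
```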


DOA Estimators: MLE, MUSIC, ESPRIT

| Method | Search Type | Sources | Asymptotic Optimality | Cost |
|---|---|---|---|---|
| Conventional beamforming | Grid over $\theta$ | Arbitrary | Rayleigh-limited resolution | $O(MG)$ per snapshot |
| MLE | Nonlinear optimization | Any (known $K$) | Achieves CRLB | Expensive, non-convex |
| MUSIC | 1-D spectral search | $K < M$ | Consistent, asymptotic CRLB | Eigendecomposition, $O(M^3)$ |
| ESPRIT | Closed-form eigenvalues | $K < M$ | Consistent, asymptotic CRLB | Two eigendecompositions |

Why This Matters: Timing and Carrier Synchronization in Wireless Receivers

Every digital receiver contains ML estimators: the timing recovery loop is a delay MLE running on a known preamble (matched filter peak detection); the carrier-frequency offset estimator is a frequency MLE (periodogram-based or autocorrelation-based); the channel estimator is an amplitude MLE on pilot subcarriers. Subspace DOA methods like MUSIC are used in mmWave beam-management and angle-domain channel estimation for massive MIMO. Understanding these blocks as MLEs lets you reason about their CRLB-limited accuracy directly from the waveform design.

⚠️ Engineering Note

Grid Resolution for Periodogram and DOA MLE

The DFT-based periodogram MLE has grid spacing $\Delta f = 1/N$. The true frequency rarely lies on the grid, so the naive peak misses the true location by up to $O(1/N)$, and the estimator variance is floored by the grid resolution. Standard fixes: zero-pad to a longer FFT for finer sampling, use quadratic interpolation on the three largest bins, or follow the coarse peak with a Newton step. The same applies to angular grids in DOA: a $1°$ grid is too coarse for large apertures at high SNR.

Practical Constraints
• Zero-padding factor typically 4–8× for off-grid accuracy
• Quadratic interpolation correction is $O(1)$ extra work
• Newton refinement recovers the full CRLB asymptotically
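A sketch of the zero-padding fix under illustrative values (the frequency below is deliberately chosen off the $1/N$ grid; noise is omitted for a clean comparison):

```python
import numpy as np

# Sketch of the engineering note: an off-grid frequency leaves the naive
# DFT peak grid-limited; zero-padding restores off-grid accuracy.
N, f0 = 64, 0.207                        # f0 lies between the 1/N grid points
n = np.arange(N)
Y = np.exp(2j * np.pi * f0 * n)          # noiseless sinusoid for clarity

f_coarse = np.argmax(np.abs(np.fft.fft(Y)) ** 2) / N      # grid spacing 1/64
K = 16 * N                                                # 16x zero-padding
f_fine = np.argmax(np.abs(np.fft.fft(Y, K)) ** 2) / K     # grid spacing 1/1024
print(abs(f_coarse - f0))                # ~4e-3, limited by the 1/N grid
print(abs(f_fine - f0))                  # ~3e-5, two orders of magnitude better
```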

📋 Ref: 3GPP TS 38.211 / 38.212 (5G NR synchronization signals)

Historical Note: Fisher's 1922 Invention of Maximum Likelihood

1912-1925

R. A. Fisher introduced the term likelihood and the principle of maximum likelihood in a 1922 paper, sharply distinguishing it from Bayesian posterior inference. He argued that for frequentist inference without prior information, the data themselves determine which parameter value is most "plausible" β€” a novel foundational stance. Fisher also introduced the information quantity now called Fisher information and identified the asymptotic efficiency of the MLE. Much of the modern theory of parameter estimation builds directly on Fisher's framework.

Historical Note: Wilks, Wald, and Large-Sample Theory

1938-1949

Fisher's asymptotic claims were made rigorous by S. S. Wilks (1938, who proved the chi-squared limit of the log-likelihood ratio) and A. Wald (1949, who gave the first complete proof of MLE consistency under compactness and continuity assumptions). H. Cramer's 1946 book assembled the modern treatment of MLE, the Cramer-Rao bound, and asymptotic normality into a coherent framework that remains the textbook standard.


Score function

The gradient of the log-density with respect to the parameter, $s(\theta; y) = \nabla_\theta \log f_\theta(y)$. Its expectation is zero and its variance equals the Fisher information.

Related: Fisher information, Maximum likelihood estimator

Fisher information

The variance of the score, $J(\theta) = \operatorname{Var}_\theta(s(\theta; Y))$, equivalently $-\mathbb{E}[\partial^2 \log f_\theta(Y)/\partial\theta^2]$ under regularity. It lower-bounds the variance of unbiased estimators (Cramer-Rao).

Related: Score function, Cramer-Rao lower bound (CRLB)

Maximum likelihood estimator

The parameter value that maximizes the likelihood of the observed data, $g_{\text{ml}}(\mathbf{y}) = \arg\max_\theta f_\theta(\mathbf{y})$. Under regularity it is consistent, asymptotically normal, asymptotically efficient, and invariant under reparameterization.

Related: Score function, Asymptotic efficiency

Asymptotic efficiency

Property of an estimator whose asymptotic variance equals the Cramer-Rao lower bound. The MLE is asymptotically efficient in regular models.

Related: Maximum likelihood estimator, Cramer-Rao lower bound (CRLB)

Cramer-Rao lower bound (CRLB)

Universal lower bound on the variance of any unbiased estimator, $\operatorname{Var}(\hat\theta) \geq 1/J(\theta)$. Developed in Chapter 5; in this chapter we show the MLE attains it asymptotically.

Related: Fisher information, Maximum likelihood estimator

Periodogram

$|\hat Y(f)|^2$, the squared DFT magnitude of a sequence at frequency $f$. Its peak is the MLE of the frequency of a single complex sinusoid in AWGN.

Related: Maximum likelihood estimator

Matched filter

Correlation of the received signal with a time-reversed conjugate of the transmitted pulse. It maximizes output SNR and is the MLE for delay estimation in AWGN.

Related: Maximum likelihood estimator

Quick Check

The peak of the periodogram is the MLE of the sinusoidal frequency under which assumption?

• Observations are i.i.d. Laplace noise around a sinusoid.

• Observations are a sinusoid in additive complex Gaussian noise.

• Frequency is on the DFT grid.

• Sample size is a power of two.