MLE for Signal Processing
Why Signal Processing Is ML in Disguise
Some of the most widely deployed signal-processing procedures (the periodogram, the matched filter, the DOA spectrum) were not originally derived as maximum likelihood estimators, but they are MLEs under standard AWGN assumptions. Recognizing the ML structure explains why these algorithms are near-optimal, why they achieve the CRLB asymptotically, and when they can be improved by subspace methods (MUSIC, ESPRIT) that exploit additional structure. This section walks through the three canonical signal-processing MLEs and names the parameter each one estimates.
Definition: Deterministic Signal in Complex AWGN
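Let $\mathbf{s}(\theta) \in \mathbb{C}^N$ be a known deterministic signal depending on an unknown parameter $\theta$. Observe $\mathbf{y} = \mathbf{s}(\theta) + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})$. The log-likelihood (discarding constants independent of $\theta$) is $$\ell(\theta) = -\frac{1}{\sigma^2}\,\|\mathbf{y} - \mathbf{s}(\theta)\|^2.$$ Maximizing $\ell(\theta)$ over $\theta$ is equivalent to the least squares problem $\hat\theta = \arg\min_\theta \|\mathbf{y} - \mathbf{s}(\theta)\|^2$.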
Theorem: Matched Filter = MLE for Delay Estimation
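Let $s_n(\tau) = p(n - \tau)$, $n = 0, \dots, N-1$, for a known pulse $p(\cdot)$ and unknown delay $\tau$. Under the AWGN model above, the MLE of $\tau$ is the peak of the matched-filter output.
Minimizing $\|\mathbf{y} - \mathbf{s}(\tau)\|^2$ becomes, for a delay-invariant pulse energy, maximization of $\mathrm{Re}\{\mathbf{s}(\tau)^H \mathbf{y}\}$. Correlating $\mathbf{y}$ with the delayed pulse is the same as convolving it with the time-reversed conjugate pulse, which is exactly matched filtering.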
Expand the squared error
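$\|\mathbf{y} - \mathbf{s}(\tau)\|^2 = \|\mathbf{y}\|^2 - 2\,\mathrm{Re}\{\mathbf{s}(\tau)^H \mathbf{y}\} + \|\mathbf{s}(\tau)\|^2$.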
Drop $\tau$-independent terms
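$\|\mathbf{y}\|^2$ is a constant. For a pulse with compact support contained in the observation window $[0, N-1]$, the pulse energy $\|\mathbf{s}(\tau)\|^2 = \sum_n |p(n - \tau)|^2$ is independent of $\tau$ up to edge effects.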
Identify the matched filter
Maximizing the likelihood reduces to maximizing $\mathrm{Re}\{\sum_n p^*(n - \tau)\, y_n\}$, the real part of the cross-correlation. This is the matched filter output sampled at delay $\tau$.
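The three proof steps can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes integer delays, a Barker-7 pulse chosen purely for illustration, and hypothetical helper names (`matched_filter_delay`, `simulate_delay`). It slides the known pulse over the observation and keeps the delay with the largest real correlation.

```python
import random

def matched_filter_delay(y, pulse):
    """Integer delay maximizing Re{sum_k conj(pulse[k]) * y[tau + k]}."""
    n_delays = len(y) - len(pulse) + 1  # keep the whole pulse inside the window

    def mf_output(tau):
        corr = sum(p.conjugate() * y[tau + k] for k, p in enumerate(pulse))
        return corr.real

    return max(range(n_delays), key=mf_output)

def simulate_delay(true_tau=17, n=64, sigma=0.3, seed=0):
    """Pulse at delay true_tau plus complex noise of total variance sigma^2."""
    rng = random.Random(seed)
    # Barker-7 code: an illustrative pulse, not one named in the text.
    pulse = [complex(c) for c in (1, 1, 1, -1, -1, 1, -1)]
    s = sigma / 2 ** 0.5  # per-component standard deviation
    y = [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)]
    for k, p in enumerate(pulse):
        y[true_tau + k] += p
    return y, pulse

y, pulse = simulate_delay()
print("estimated delay:", matched_filter_delay(y, pulse))
```

Because every candidate delay keeps the full pulse inside the window, the pulse energy is the same for all of them, which is the condition under which the least-squares problem and the correlation peak coincide.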
Theorem: Periodogram = MLE for Frequency of a Single Sinusoid
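Let $y_n = A e^{j(\omega n + \phi)} + w_n$, $n = 0, \dots, N-1$, with $w_n \sim \mathcal{CN}(0, \sigma^2)$ i.i.d., and $A > 0$, $\phi \in [-\pi, \pi)$, and $\omega \in [-\pi, \pi)$ unknown. The MLE of $\omega$ is the peak of the periodogram: $$\hat\omega = \arg\max_\omega P(\omega), \qquad P(\omega) = \frac{1}{N}\left|\sum_{n=0}^{N-1} y_n e^{-j\omega n}\right|^2.$$ Once $\hat\omega$ is found, the MLEs of $A$ and $\phi$ are $\hat A = |\hat\alpha|$ and $\hat\phi = \arg\hat\alpha$, where $\hat\alpha = \frac{1}{N}\sum_{n=0}^{N-1} y_n e^{-j\hat\omega n}$.
Profile-likelihood: for any candidate frequency $\omega$, the best-fitting complex amplitude is the (normalized) DFT coefficient at $\omega$. The residual error is minimized when the DFT magnitude at $\omega$ is maximized. The DFT magnitude squared, scaled by $1/N$, is precisely the periodogram.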
Concentrate the amplitude
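For fixed $\omega$, let $\alpha = A e^{j\phi}$. The least-squares problem $\min_\alpha \sum_n |y_n - \alpha e^{j\omega n}|^2$ is linear in $\alpha$ with solution $\hat\alpha(\omega) = \frac{1}{N}\sum_{n=0}^{N-1} y_n e^{-j\omega n}$.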
Profile log-likelihood
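Substituting $\hat\alpha(\omega)$ into $\sum_n |y_n - \alpha e^{j\omega n}|^2$, the residual energy is $\sum_n |y_n|^2 - \frac{1}{N}\left|\sum_n y_n e^{-j\omega n}\right|^2$. Minimizing the residual over $\omega$ is the same as maximizing the periodogram $P(\omega) = \frac{1}{N}\left|\sum_n y_n e^{-j\omega n}\right|^2$.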
Computation via FFT
On the DFT grid $\omega_k = 2\pi k / N$, $P(\omega_k)$ is computed in $O(N \log N)$ via the FFT. For off-grid refinement, one performs a local quadratic interpolation or a Newton step.
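The concentration argument can be sketched in a few lines. This is an illustrative sketch only: it evaluates the periodogram by a direct sum on a zero-padded grid rather than with an FFT (to stay within the standard library), and the function names and simulation parameters are assumptions made for the example.

```python
import cmath
import math
import random

def dft_at(y, omega):
    """sum_n y[n] * exp(-j * omega * n) at an arbitrary frequency."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(y))

def periodogram(y, omega):
    return abs(dft_at(y, omega)) ** 2 / len(y)

def freq_mle_grid(y, pad=8):
    """Coarse frequency MLE: periodogram peak on a grid of pad * N points."""
    n_grid = pad * len(y)
    grid = (2 * math.pi * k / n_grid for k in range(n_grid))
    return max(grid, key=lambda w: periodogram(y, w))

def amp_phase_mle(y, omega_hat):
    """Amplitude and phase from the concentrated complex amplitude."""
    alpha_hat = dft_at(y, omega_hat) / len(y)
    return abs(alpha_hat), cmath.phase(alpha_hat)

def simulate_tone(n=64, amp=1.0, omega=1.0, phi=0.5, sigma=0.3, seed=1):
    rng = random.Random(seed)
    s = sigma / math.sqrt(2)  # per-component noise standard deviation
    return [amp * cmath.exp(1j * (omega * k + phi))
            + complex(rng.gauss(0, s), rng.gauss(0, s)) for k in range(n)]

y = simulate_tone()
omega_hat = freq_mle_grid(y)
amp_hat, phi_hat = amp_phase_mle(y, omega_hat)
print(omega_hat, amp_hat, phi_hat)
```

The phase estimate is sensitive to the residual frequency error, since an offset of $\delta$ in $\hat\omega$ rotates $\hat\alpha$ by roughly $\delta (N-1)/2$; this is one reason a refinement step after the grid search is usually worthwhile.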
Example: Phase Estimation at Known Frequency
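Given $y_n = A\cos(\omega_0 n + \phi) + w_n$, $n = 0, \dots, N-1$, with real white Gaussian noise $w_n \sim \mathcal{N}(0, \sigma^2)$ and $A$ and $\omega_0$ known, find the MLE of $\phi$ and its asymptotic variance.
Least squares
The log-likelihood is proportional to $-\sum_n \left(y_n - A\cos(\omega_0 n + \phi)\right)^2$. Expand using $\cos(\omega_0 n + \phi) = \cos(\omega_0 n)\cos\phi - \sin(\omega_0 n)\sin\phi$.
Closed form via arctangent
$$\hat\phi = -\arctan\frac{\sum_n y_n \sin(\omega_0 n)}{\sum_n y_n \cos(\omega_0 n)}$$ for large $N$ and $\omega_0$ away from $0$ and $\pi$. The arctangent ratio takes the in-phase and quadrature correlations of the data with the known tone.
Asymptotic variance
The Fisher information for $\phi$ (at large $N$) is $I(\phi) \approx N A^2 / (2\sigma^2)$, giving asymptotic variance $2\sigma^2 / (N A^2)$. The MLE achieves the CRLB asymptotically. In terms of SNR with $\mathrm{SNR} = A^2 / (2\sigma^2)$, $\mathrm{var}(\hat\phi) \approx 1 / (N\,\mathrm{SNR})$.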
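As a rough numerical companion to this example, the sketch below assumes a real tone $y_n = A\cos(\omega_0 n + \phi) + w_n$ in real white Gaussian noise of variance $\sigma^2$ (an assumption of the sketch) and uses `atan2` in place of a plain arctangent so that the quadrant is resolved. The function names and parameter values are illustrative.

```python
import math
import random

def phase_mle(y, omega0):
    """Large-N phase MLE from in-phase and quadrature correlations."""
    i_corr = sum(v * math.cos(omega0 * n) for n, v in enumerate(y))
    q_corr = sum(v * math.sin(omega0 * n) for n, v in enumerate(y))
    # For large N: i_corr ~ (N A / 2) cos(phi), q_corr ~ -(N A / 2) sin(phi).
    return -math.atan2(q_corr, i_corr)

def simulate_real_tone(n=256, amp=1.0, omega0=0.7, phi=0.4, sigma=0.5, seed=4):
    rng = random.Random(seed)
    return [amp * math.cos(omega0 * k + phi) + rng.gauss(0, sigma)
            for k in range(n)]

def crlb_phase(n, amp, sigma):
    """Asymptotic CRLB 2 sigma^2 / (N A^2) under the model assumed here."""
    return 2 * sigma ** 2 / (n * amp ** 2)

y = simulate_real_tone()
print("phi_hat:", phase_mle(y, 0.7))
print("CRLB std:", math.sqrt(crlb_phase(256, 1.0, 0.5)))
```

A single run only shows that the estimate lands near the true phase; comparing against the bound properly would need the empirical variance over many independent noise realizations.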
Periodogram MLE for Sinusoidal Frequency
Simulate a noisy complex sinusoid with unknown frequency $\omega$ and display the periodogram $P(\omega)$. The peak of the periodogram is the MLE $\hat\omega$. As $N$ increases, the peak sharpens and the estimator variance shrinks toward the CRLB.
DOA Spectrum: MLE vs Beamforming
For a ULA of $M$ elements observing a single source from angle $\theta$, plot the MLE cost function (likelihood-based) and the conventional beamforming spectrum as functions of hypothesized angle. The MLE is the peak of the matched-beam response; resolution improves with $M$ and SNR.
DOA Estimation: the MLE and Its Alternatives
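For a narrowband ULA with $M$ elements and $T$ snapshots, the observation is $\mathbf{y}_t = \mathbf{a}(\theta)\, s_t + \mathbf{w}_t$, $t = 1, \dots, T$, where $\mathbf{a}(\theta)$ is the steering vector and $\mathbf{w}_t \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})$. The MLE for a single source is $$\hat\theta = \arg\max_\theta \frac{\mathbf{a}(\theta)^H \hat{\mathbf{R}}\, \mathbf{a}(\theta)}{\|\mathbf{a}(\theta)\|^2}, \qquad \hat{\mathbf{R}} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{y}_t \mathbf{y}_t^H.$$ For multiple sources the MLE becomes a high-dimensional non-convex optimization. In practice one uses MUSIC (eigenstructure of the sample covariance) or ESPRIT (rotational invariance of the array) as computationally efficient alternatives that match MLE performance in the high-SNR, many-snapshot regime.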
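For the single-source case, the likelihood search can be sketched as a grid scan of the matched-beam power. The sketch assumes half-wavelength element spacing, so that $[\mathbf{a}(\theta)]_m = e^{j\pi m \sin\theta}$, unit-modulus source symbols with random phase, and a 0.5 degree grid; none of these choices come from the text, and the helper names are hypothetical.

```python
import cmath
import math
import random

def steering(theta_deg, m):
    """ULA steering vector, assuming half-wavelength element spacing."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * i) for i in range(m)]

def doa_mle_single(snapshots, grid_deg):
    """Single-source MLE: maximize sum_t |a(theta)^H y_t|^2 / M over the grid."""
    m = len(snapshots[0])

    def beam_power(theta):
        a = steering(theta, m)
        return sum(abs(sum(ai.conjugate() * yi for ai, yi in zip(a, y))) ** 2
                   for y in snapshots) / m

    return max(grid_deg, key=beam_power)

def simulate_ula(theta_deg=20.0, m=8, t=20, sigma=0.3, seed=2):
    rng = random.Random(seed)
    a = steering(theta_deg, m)
    s = sigma / math.sqrt(2)  # per-component noise standard deviation
    snaps = []
    for _ in range(t):
        sym = cmath.exp(1j * rng.uniform(0, 2 * math.pi))  # unknown source symbol
        snaps.append([ai * sym + complex(rng.gauss(0, s), rng.gauss(0, s))
                      for ai in a])
    return snaps

grid = [-90 + 0.5 * k for k in range(361)]
print("theta_hat (deg):", doa_mle_single(simulate_ula(), grid))
```

For one source this cost coincides with the conventional beamforming spectrum, so the two curves differ only once several sources are present.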
DOA Estimators: MLE, MUSIC, ESPRIT
| Method | Search Type | Sources | Asymptotic Optimality | Cost |
|---|---|---|---|---|
| Conventional beamforming | Grid over $\theta$ | Arbitrary | Rayleigh-limited resolution | $O(M)$ per grid angle per snapshot |
| MLE | Nonlinear optimization | Any (known $K$) | Achieves CRLB | Expensive, non-convex |
| MUSIC | 1-D spectral search | $K < M$ | Consistent, asymptotic CRLB | Eigendecomposition |
| ESPRIT | Closed-form eigenvalues | $K < M$ | Consistent, asymptotic CRLB | Two eigendecompositions |
Why This Matters: Timing and Carrier Synchronization in Wireless Receivers
Every digital receiver contains ML estimators: the timing recovery loop is a delay MLE running on a known preamble (matched filter peak detection); the carrier-frequency offset estimator is a frequency MLE (periodogram-based or autocorrelation-based); the channel estimator is an amplitude MLE on pilot subcarriers. Subspace DOA methods like MUSIC are used in mmWave beam-management and angle-domain channel estimation for massive MIMO. Understanding these blocks as MLEs lets you reason about their CRLB-limited accuracy directly from the waveform design.
Grid Resolution for Periodogram and DOA MLE
DFT-based periodogram MLE has grid spacing $2\pi/N$. The true frequency rarely lies on the grid, so the naive grid peak can miss the true location by up to half a bin ($\pi/N$), and the estimator's error is floored by the grid resolution no matter how high the SNR. Standard fixes: zero-pad to a longer FFT for finer sampling, use quadratic interpolation on the peak bin and its two neighbors, or follow the coarse peak with a Newton step. The same applies to angular grids in DOA: a fixed angular grid can be too coarse for large apertures at high SNR.
- Zero-padding factor typically 4-8x for off-grid accuracy
- Quadratic interpolation correction is $O(1)$ extra work
- Newton refinement recovers the full CRLB asymptotically
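The interpolation fix is short enough to sketch directly. The following is an illustrative version, not a tuned implementation: it fits a parabola through the periodogram at the peak bin and its two neighbors on a 4x zero-padded grid, with the periodogram evaluated by a direct sum rather than an FFT. The name `refine_peak` and the padding factor are assumptions of the sketch.

```python
import cmath
import math

def periodogram(y, omega):
    acc = sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(y))
    return abs(acc) ** 2 / len(y)

def refine_peak(y, pad=4):
    """Return (coarse, refined) frequency estimates in radians per sample."""
    n_grid = pad * len(y)
    step = 2 * math.pi / n_grid
    power = [periodogram(y, k * step) for k in range(n_grid)]
    k = max(range(n_grid), key=power.__getitem__)
    # Neighbors wrap around because the periodogram is 2*pi periodic.
    p_m, p_0, p_p = power[k - 1], power[k], power[(k + 1) % n_grid]
    denom = p_m - 2 * p_0 + p_p
    # Vertex of the parabola through the three points, in units of bins.
    delta = 0.0 if denom == 0 else 0.5 * (p_m - p_p) / denom
    return k * step, (k + delta) * step

# Noise-free tone at an off-grid frequency, for illustration only.
tone = [cmath.exp(1j * 1.0 * n) for n in range(64)]
coarse, refined = refine_peak(tone)
print("coarse error:", abs(coarse - 1.0))
print("refined error:", abs(refined - 1.0))
```

The parabola is only an approximation to the main lobe, so a small residual bias remains; more zero-padding or a Newton step on the likelihood reduces it further.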
Historical Note: Fisher's 1922 Invention of Maximum Likelihood
1912-1925: R. A. Fisher introduced the term likelihood and the principle of maximum likelihood in a 1922 paper, sharply distinguishing it from Bayesian posterior inference. He argued that for frequentist inference without prior information, the data themselves determine which parameter value is most "plausible", a novel foundational stance. Fisher also introduced the information quantity now called Fisher information and identified the asymptotic efficiency of the MLE. Much of the modern theory of parameter estimation builds directly on Fisher's framework.
Historical Note: Wilks, Wald, and Large-Sample Theory
1938-1949: Fisher's asymptotic claims were made rigorous by S. S. Wilks (1938, who proved the chi-squared limit of the log-likelihood ratio) and A. Wald (1949, who gave the first complete proof of MLE consistency under compactness and continuity assumptions). H. Cramer's 1946 book assembled the modern treatment of MLE, the Cramer-Rao bound, and asymptotic normality into a coherent framework that remains the textbook standard.
Score function
The gradient of the log-density in the parameter, $\nabla_\theta \log p(\mathbf{y}; \theta)$. Under regularity, its expectation is zero and its variance equals the Fisher information.
Fisher information
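The variance of the score, $I(\theta) = \mathbb{E}\left[\left(\partial_\theta \log p(\mathbf{y}; \theta)\right)^2\right]$, equivalently $-\mathbb{E}\left[\partial_\theta^2 \log p(\mathbf{y}; \theta)\right]$ under regularity. Its inverse lower-bounds the variance of unbiased estimators (Cramer-Rao).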
Related: Score function, Cramer-Rao lower bound (CRLB)
Maximum likelihood estimator
The parameter value that maximizes the likelihood of the observed data, $\hat\theta_{\mathrm{ML}} = \arg\max_\theta p(\mathbf{y}; \theta)$. Under regularity it is consistent, asymptotically normal, asymptotically efficient, and invariant under reparameterization.
Related: Score function, Asymptotic efficiency
Asymptotic efficiency
Property of an estimator whose asymptotic variance equals the Cramer-Rao lower bound. The MLE is asymptotically efficient in regular models.
Related: Maximum likelihood estimator, Cramer-Rao lower bound (CRLB)
Cramer-Rao lower bound (CRLB)
Universal lower bound on the variance of any unbiased estimator, $\mathrm{var}(\hat\theta) \ge I(\theta)^{-1}$. Developed in Chapter 5; in this chapter we show the MLE attains it asymptotically.
Periodogram
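$P(\omega) = \frac{1}{N}\left|\sum_{n=0}^{N-1} y_n e^{-j\omega n}\right|^2$, the squared DFT magnitude (scaled by $1/N$) of a sequence $y_n$ at frequency $\omega$. Its peak is the MLE of the frequency of a single complex sinusoid in AWGN.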
Related: Maximum likelihood estimator
Matched filter
Filter whose impulse response is the time-reversed conjugate of the transmitted pulse, so that its output is the correlation of the received signal with the pulse. It maximizes output SNR and is the MLE for delay estimation in AWGN.
Related: Maximum likelihood estimator
Quick Check
The peak of the periodogram is the MLE of the sinusoidal frequency under which assumption?
Observations are i.i.d. Laplace noise around a sinusoid.
Observations are a sinusoid in additive complex Gaussian noise.
Frequency is on the DFT grid.
Sample size is a power of two.
The squared-residual log-likelihood reduces to maximizing $\left|\sum_n y_n e^{-j\omega n}\right|^2$.