The ISI Channel as an Inference Problem
Why Inference? Why Now?
Every chapter of this book has been about extracting a hidden quantity from noisy observations. We have estimated parameters, reconstructed signals, decided between hypotheses, all under the assumption that the channel between source and sensor was benign, or at most a scaling. Reality is rarely so kind. When symbols are transmitted faster than the channel's coherence bandwidth allows, each received sample is a linear combination of several past symbols plus noise. The operational question is simple: what did the transmitter actually send?
This is the equalization problem, and we will approach it exactly as we approached estimation and detection in earlier chapters: by writing down the likelihood and asking what the optimal inference rule does. The pattern will feel familiar. The twist is that the unknown is no longer a parameter but a sequence, and sequences live in a space whose size grows exponentially with length. That growth is the central computational tension of the chapter: the ML rule is well defined, but evaluating it naively is hopeless. The art lies in exploiting structure, namely the finite memory of the channel, to reduce an exponential search to one that is merely linear in sequence length.
Definition: Discrete-Time ISI Channel
Let $x_0, x_1, \ldots, x_{N-1}$ be a sequence of transmitted symbols drawn from a finite alphabet $\mathcal{A}$ with $|\mathcal{A}| = M$ and unit average energy $\mathbb{E}\left[|x_k|^2\right] = 1$. Let $h = (h_0, h_1, \ldots, h_L)$ be a causal FIR channel of memory $L$, meaning $h_k = 0$ for $k < 0$ and for $k > L$. The discrete-time ISI channel produces
$$y_k = \sum_{\ell=0}^{L} h_\ell\, x_{k-\ell} + w_k, \qquad k = 0, 1, \ldots, N-1,$$
where $w_k \sim \mathcal{CN}(0, \sigma^2)$ is circularly symmetric complex Gaussian noise, independent across $k$.
The channel taps $h_0, \ldots, h_L$ are assumed known at the receiver (having been estimated from pilot symbols in an earlier stage). The unknowns are the transmitted symbols $x_0, \ldots, x_{N-1}$.
The terminology "intersymbol interference" reflects what happens when $L \geq 1$: the sample $y_k$ depends not only on $x_k$ but on up to $L$ past symbols, which interfere with one another at the receiver. When $L = 0$ the channel collapses to a memoryless AWGN channel and equalization is unnecessary.
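To make the model concrete, here is a minimal simulation sketch. The tap values, block length, and noise level below are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed): a 3-tap channel, so memory L = 2.
h = np.array([1.0, 0.5, 0.2])          # taps h_0, ..., h_L, known at receiver
N, sigma2 = 8, 0.1                     # block length, noise variance
x = rng.choice([+1.0, -1.0], size=N)   # BPSK symbols, so M = 2

# y_k = sum_{l=0}^{L} h_l x_{k-l} + w_k, with x_k = 0 for k < 0
# (a known zero preamble); np.convolve implements exactly this.
y_clean = np.convolve(x, h)[:N]

# Circularly symmetric complex Gaussian noise CN(0, sigma2).
w = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = y_clean + w
```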
Definition: Maximum-Likelihood Sequence Estimate (MLSE)
Given the received block $y = (y_0, \ldots, y_{N-1})$ and known channel $h$, the maximum-likelihood sequence estimate is
$$\hat{x}_{\mathrm{ML}} = \arg\max_{x \in \mathcal{A}^N} \; p(y \mid x).$$
Under the white Gaussian noise model of Definition (Discrete-Time ISI Channel), the likelihood reduces to a squared-error criterion:
$$\hat{x}_{\mathrm{ML}} = \arg\min_{x \in \mathcal{A}^N} \; \sum_{k=0}^{N-1} \Bigl| y_k - \sum_{\ell=0}^{L} h_\ell\, x_{k-\ell} \Bigr|^2.$$
We refer to this rule as maximum-likelihood sequence estimation, or MLSE.
The MLSE rule is a statement, not an algorithm. It specifies what answer we want (the sequence of symbols that best explains the observations) without saying how to find it. The set $\mathcal{A}^N$ contains $M^N$ candidates; for a modest block of $N$ BPSK symbols, that is $2^N$ sequences, a count that is already astronomical for $N$ in the low hundreds. The problem is not to define optimality but to compute it.
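To see the statement-versus-algorithm distinction concretely, here is a direct implementation of the MLSE rule by exhaustive search; it is a sketch for tiny blocks only, assuming BPSK symbols and a zero initial state.

```python
import itertools
import numpy as np

def mlse_bruteforce(y, h, alphabet=(+1.0, -1.0)):
    """Exact MLSE by exhaustive search over A^N -- feasible only for tiny N.

    Assumes x_k = 0 for k < 0 (a known zero preamble), so np.convolve
    reproduces the noise-free channel outputs for each candidate.
    """
    N = len(y)
    best, best_cost = None, np.inf
    for cand in itertools.product(alphabet, repeat=N):   # M^N candidates
        y_clean = np.convolve(cand, h)[:N]               # noise-free outputs
        cost = np.sum(np.abs(y - y_clean) ** 2)          # squared-error metric
        if cost < best_cost:
            best, best_cost = cand, cost
    return np.array(best), best_cost
```

Doubling $N$ squares the number of candidates; by $N = 30$ the loop already visits about a billion sequences.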
Definition: Channel State
The channel state at time $k$ is the vector of the $L$ most recent past symbols,
$$s_k = (x_{k-1}, x_{k-2}, \ldots, x_{k-L}) \in \mathcal{S} = \mathcal{A}^L,$$
where $\mathcal{S}$ has cardinality $|\mathcal{S}| = M^L$.
The state summarizes everything about past transmissions that is relevant to the next observation $y_k$: given $s_k$ and the current input $x_k$, the noise-free output $\sum_{\ell=0}^{L} h_\ell\, x_{k-\ell}$ is completely determined.
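That determinism is a two-line computation. The helper below is a hypothetical illustration (the function name and interface are not from the text):

```python
def step(state, x_k, h):
    """One trellis transition: state s_k = (x_{k-1}, ..., x_{k-L}).

    Returns the noise-free output mu_k = h_0 x_k + sum_{l>=1} h_l x_{k-l}
    and the next state s_{k+1}, obtained by shifting x_k into the register.
    """
    mu = h[0] * x_k + sum(h_l * x_past for h_l, x_past in zip(h[1:], state))
    return mu, (x_k,) + tuple(state)[:-1]
```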
Theorem: Markov Structure of the ISI Likelihood
For the channel of Definition (Discrete-Time ISI Channel), assume the initial state $s_0$ is known (for instance through a preamble of zeros). Then the log-likelihood of a candidate sequence $x \in \mathcal{A}^N$ factorizes over transitions of the state process:
$$\log p(y \mid x) = C - \frac{1}{\sigma^2} \sum_{k=0}^{N-1} \lambda_k(s_k, s_{k+1}),$$
where the branch metric is
$$\lambda_k(s_k, s_{k+1}) = \Bigl| y_k - \sum_{\ell=0}^{L} h_\ell\, x_{k-\ell} \Bigr|^2,$$
and the symbols $x_{k-L}, \ldots, x_k$ entering the metric are read off from the pair $(s_k, s_{k+1})$, whose transition identifies the input $x_k$.
The ISI channel is finite-memory, so knowing the state at time $k$ and the current input $x_k$ determines both the next state $s_{k+1}$ and the noise-free output. The likelihood therefore decomposes into per-transition terms: a sum of local costs along a path through the state graph. This is the same structure one sees in hidden Markov models and in the Viterbi decoder for convolutional codes, and it is precisely the structure that makes dynamic programming the right tool.
Observation model
Under the Gaussian noise assumption, the observations are conditionally independent given the input sequence:
$$p(y \mid x) = \prod_{k=0}^{N-1} p(y_k \mid x).$$
Each conditional density is circularly symmetric complex Gaussian with mean $\mu_k = \sum_{\ell=0}^{L} h_\ell\, x_{k-\ell}$ and variance $\sigma^2$.
Log-likelihood expansion
Taking logarithms,
$$\log p(y \mid x) = C - \frac{1}{\sigma^2} \sum_{k=0}^{N-1} \bigl| y_k - \mu_k \bigr|^2,$$
where the constant $C$ collects normalization factors independent of $x$.
Localization via the state
The mean $\mu_k$ depends only on $(x_{k-L}, \ldots, x_k)$, which is exactly the pair $(s_k, x_k)$. Equivalently, knowing $s_k$ and $s_{k+1}$ determines $x_k$ (as the new symbol that has just entered the state register) and hence determines $\mu_k$ completely.
Rewrite as a path metric
Define $\lambda_k(s_k, s_{k+1}) = |y_k - \mu_k|^2$, which depends only on the transition $s_k \to s_{k+1}$. Then
$$\log p(y \mid x) = C - \frac{1}{\sigma^2} \sum_{k=0}^{N-1} \lambda_k(s_k, s_{k+1}).$$
Maximizing the log-likelihood over $x \in \mathcal{A}^N$ is therefore equivalent to finding the sequence of states $s_0, s_1, \ldots, s_N$ that minimizes the cumulative branch-metric sum along a path in the state graph.
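The factorization is easy to confirm numerically: accumulating branch metrics along the state path reproduces the global squared-error cost exactly. All parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.array([1.0, 0.5, 0.2])             # illustrative taps, memory L = 2
L, N = len(h) - 1, 6
x = rng.choice([+1.0, -1.0], size=N)      # a candidate input sequence
y = np.convolve(x, h)[:N] + 0.3 * rng.standard_normal(N)

# Global squared-error cost, as in the MLSE criterion (zero initial state).
global_cost = np.sum(np.abs(y - np.convolve(x, h)[:N]) ** 2)

# Path cost: walk the state register, summing one branch metric per transition.
state = np.zeros(L)                        # s_0 = (0, ..., 0), known preamble
path_cost = 0.0
for k in range(N):
    mu = h[0] * x[k] + h[1:] @ state       # mean determined by (s_k, x_k)
    path_cost += np.abs(y[k] - mu) ** 2    # branch metric lambda_k
    state = np.concatenate([[x[k]], state[:-1]])  # shift register -> s_{k+1}

assert np.isclose(global_cost, path_cost)  # the factorization holds exactly
```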
Key Takeaway
The global optimum of MLSE decomposes into a sum of local costs along paths through a finite-state graph. Finite memory begets dynamic programming; this is the structural observation on which the Viterbi algorithm rests.
Example: Branch Metrics for a Two-Tap Channel with BPSK
Consider a two-tap channel $h = (h_0, h_1)$ with BPSK modulation (so $\mathcal{A} = \{+1, -1\}$, $M = 2$, $L = 1$). The initial state $s_0 = (x_{-1})$ is known to the receiver. Suppose the receiver observes $y_0$ and $y_1$.
Enumerate all state transitions, compute the corresponding branch metrics, and identify the two-symbol sequence that is most likely to have been transmitted.
Enumerate state transitions
The state records the previous symbol. From any state there are two transitions, one for each value of the new symbol $x_k \in \{+1, -1\}$. The noise-free output is $\mu_k = h_0 x_k + h_1 x_{k-1}$.

| $k$ | $s_k$ (= $x_{k-1}$) | $x_k$ | $s_{k+1}$ | $\mu_k = h_0 x_k + h_1 x_{k-1}$ |
|---|---|---|---|---|
| 0 | $(x_{-1})$ | $+1$ | $(+1)$ | $h_0 + h_1 x_{-1}$ |
| 0 | $(x_{-1})$ | $-1$ | $(-1)$ | $-h_0 + h_1 x_{-1}$ |
Branch metrics at $k=0$
With $s_0 = (x_{-1})$ known, the two branch metrics are
$$\lambda_0(+1) = \bigl| y_0 - h_0 - h_1 x_{-1} \bigr|^2, \qquad \lambda_0(-1) = \bigl| y_0 + h_0 - h_1 x_{-1} \bigr|^2.$$
The surviving path to state $(+1)$ has cumulative metric $\lambda_0(+1)$; the surviving path to $(-1)$ has cumulative metric $\lambda_0(-1)$.
Branch metrics at $k=1$
From each of the two possible states $s_1 \in \{(+1), (-1)\}$, there are again two outgoing transitions. The noise-free outputs and metrics (with $\mu_1 = h_0 x_1 + h_1 x_0$) are
$$\lambda_1(x_0, x_1) = \bigl| y_1 - h_0 x_1 - h_1 x_0 \bigr|^2,$$
one for each of the four sign combinations $(x_0, x_1)$.
Sum cumulative metrics and take the minimum
The four candidate paths have total metrics
$$\Lambda(x_0, x_1) = \lambda_0(x_0) + \lambda_1(x_0, x_1), \qquad (x_0, x_1) \in \{+1, -1\}^2.$$
The pair minimizing $\Lambda$ is the MLSE estimate: substituting the observed $y_0$ and $y_1$ into the four sums and selecting the smallest identifies the two-symbol sequence most consistent with the data.
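Because the specific numbers drive the final comparison, here is the full four-path enumeration in code, with hypothetical values ($h_0 = 1$, $h_1 = 0.5$, $x_{-1} = +1$, $y_0 = 1.3$, $y_1 = -0.9$) standing in for the example's data.

```python
import itertools

# Hypothetical values for illustration only:
h0, h1 = 1.0, 0.5           # assumed two-tap channel
x_prev = +1.0               # known initial state s_0 = (x_{-1})
y0, y1 = 1.3, -0.9          # assumed observations

paths = {}
for x0, x1 in itertools.product([+1.0, -1.0], repeat=2):
    lam0 = (y0 - (h0 * x0 + h1 * x_prev)) ** 2   # branch metric at k = 0
    lam1 = (y1 - (h0 * x1 + h1 * x0)) ** 2       # branch metric at k = 1
    paths[(x0, x1)] = lam0 + lam1                # total path metric Lambda

best = min(paths, key=paths.get)
print(paths, "->", best)
```

With these assumed values the minimizer is $(x_0, x_1) = (+1, -1)$, with total metric $0.20$.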
Operational reading
Even for this tiny example, the number of candidate paths grows as $2^N$ with block length. The next section introduces the Viterbi algorithm, which performs the same minimization in $\mathcal{O}(N M^L)$ operations by discarding dominated partial paths at each step.
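As a preview, the sketch below implements that survivor-discarding recursion for a general known channel; the function and its interface are illustrative, assuming the initial state `s0` is known (for instance from a preamble).

```python
import numpy as np
from itertools import product

def viterbi_mlse(y, h, s0, alphabet=(+1.0, -1.0)):
    """MLSE by dynamic programming: O(N * M^L) instead of O(M^N).

    A preview sketch of the next section's algorithm. s0 is the known
    initial state (x_{-1}, ..., x_{-L}).
    """
    h = np.asarray(h)
    L = len(h) - 1
    states = list(product(alphabet, repeat=L))         # all M^L states
    cost = {s: (0.0 if s == s0 else np.inf) for s in states}
    back = []                                          # backpointers, one dict per step
    for y_k in y:
        new_cost, ptr = {}, {}
        for s in states:                               # extend each survivor
            for x_k in alphabet:
                mu = h[0] * x_k + h[1:] @ np.array(s)  # mean fixed by (s, x_k)
                c = cost.get(s, np.inf) + abs(y_k - mu) ** 2
                s_next = (x_k,) + s[:-1]               # shift-register update
                if c < new_cost.get(s_next, np.inf):   # keep only the best path
                    new_cost[s_next] = c
                    ptr[s_next] = (s, x_k)
        cost = new_cost
        back.append(ptr)
    s = min(cost, key=cost.get)                        # best terminal state
    x_hat = []
    for ptr in reversed(back):                         # trace survivors backward
        s, x_k = ptr[s]
        x_hat.append(x_k)
    return x_hat[::-1], min(cost.values())
```

On the two-tap example above, `viterbi_mlse([1.3, -0.9], [1.0, 0.5], s0=(1.0,))` returns the same minimizer as the exhaustive enumeration.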
Figure: Output SNR of ZF vs. MMSE as the Channel Null Deepens
A preview of the tension that motivates MLSE: when the channel develops a deep spectral null, the zero-forcing equalizer's output SNR collapses while the MMSE equalizer gracefully trades bias for variance. MLSE avoids inversion entirely and suffers no null-depth penalty, at the price of exponential complexity.
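The figure's comparison can be reproduced numerically. The sketch below assumes unit-power symbols, a two-tap channel with taps $(1, a)$, and the standard infinite-length linear-equalizer SNR formulas; the noise variance and tap values are illustrative.

```python
import numpy as np

sigma2 = 0.1                                   # assumed noise variance
w = np.linspace(-np.pi, np.pi, 4096, endpoint=False)

for a in (0.5, 0.9, 0.99):                     # deepening spectral null at w = pi
    H2 = np.abs(1 + a * np.exp(-1j * w)) ** 2  # |H(e^{jw})|^2 for taps (1, a)
    snr_zf = 1.0 / (sigma2 * np.mean(1.0 / H2))    # ZF: full noise enhancement
    j_min = np.mean(sigma2 / (H2 + sigma2))        # MMSE of the Wiener equalizer
    snr_mmse = 1.0 / j_min - 1.0                   # unbiased MMSE output SNR
    print(f"a = {a:4.2f}:  SNR_ZF = {snr_zf:6.3f},  SNR_MMSE = {snr_mmse:6.3f}")
```

As $a$ approaches $1$, the mean of $1/|H|^2$ blows up and the ZF output SNR collapses toward zero, while the MMSE figure degrades only gradually.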
The Known-Channel Assumption
Every equalizer in this chapter presumes that $h$ has been estimated elsewhere, typically from a pilot or training sequence inserted at the start of each packet. The estimate $\hat{h}$ is not the true $h$; its error feeds directly into every subsequent computation. In practice one budgets SNR for pilots, designs pilots to have favorable autocorrelation (pseudonoise, Zadoff-Chu, or Golay sequences), and re-estimates the channel at the block rate dictated by the coherence time. Treating channel estimation and equalization as separate subsystems is the dominant paradigm in standards such as LTE and 5G NR, but it is not optimal: joint estimation and detection can do better when pilots are sparse. A least-squares sketch of the pilot stage follows the list below.
- Estimate $\hat{h}$ once per coherence interval
- Pilot overhead scales with channel memory and mobility
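A minimal least-squares sketch of the pilot stage, under assumed parameters (a 32-symbol BPSK pilot, a memory-2 channel, real-valued noise for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)
L, P, sigma2 = 2, 32, 0.05                    # memory, pilot length, noise var
h_true = np.array([1.0, 0.5, 0.2])            # unknown to the receiver
p = rng.choice([+1.0, -1.0], size=P)          # known pilot symbols

y = np.convolve(p, h_true)[:P] + np.sqrt(sigma2) * rng.standard_normal(P)

# Stack the model y_k = sum_l h_l p_{k-l} as y = A h and solve least squares.
A = np.zeros((P, L + 1))
for l in range(L + 1):
    A[l:, l] = p[: P - l]                     # column l is the pilot delayed by l
h_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print("true:", h_true, " estimate:", h_hat.round(3))
```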
Common Mistake: MLSE Complexity Is Exponential in Memory, Not Block Length
Mistake:
Students often assume that because MLSE searches over $\mathcal{A}^N$, its complexity must scale as $M^N$, and therefore conclude that MLSE is simply infeasible for realistic block sizes.
Correction:
The naive enumeration has complexity $\mathcal{O}(M^N)$, but the Viterbi algorithm exploits the Markov structure of the likelihood (Theorem: Markov Structure of the ISI Likelihood) and achieves complexity $\mathcal{O}(N M^L)$: linear in block length and exponential in channel memory, not block length. For short-memory channels (small $L$) MLSE is eminently practical; for long wireless channels whose memory spans tens of symbols, it is not.
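The scaling is quick to sanity-check with raw operation counts; the block length and memories below are illustrative:

```python
M, N = 2, 1000                      # BPSK alphabet, a 1000-symbol block
for L in (2, 5, 50):                # channel memory
    # Naive search visits M^N sequences; Viterbi does ~N * M^L metric updates.
    print(f"L = {L:2d}:  naive M^N = 2^{N},  Viterbi N*M^L = {N * M**L:.3e}")
```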
Historical Note: Forney Recasts Viterbi for ISI (1972)
The Viterbi algorithm was introduced in 1967 for decoding convolutional codes. Its applicability to equalization was not immediate; it was G. David Forney Jr. who, in a landmark 1972 paper in the IEEE Transactions on Information Theory, recognized that an ISI channel is structurally identical to a convolutional code with memory $L$, and that Viterbi decoding therefore yields the maximum-likelihood symbol sequence estimate. Forney's paper did more than transfer an algorithm; it established the unified view that channels with memory, convolutional codes, and Markov sources all share a single inference framework. That view is the reason this book places equalization and detection in the same chapter of an inference textbook.
Three Views of the Same Channel
| Quantity | Estimation view | Detection view | Equalization view |
|---|---|---|---|
| Unknown | Continuous parameter | Discrete hypothesis | Discrete sequence |
| Objective | Minimize mean-squared error | Minimize error probability | Minimize sequence error probability |
| ML rule complexity | Closed form (often) | Single comparison | Exponential in $N$ (naive) |
| Structural exploit | Linearity, Gaussianity | Sufficient statistic | Finite memory $\to$ state machine |
| Canonical algorithm | LMMSE, Wiener filter | Likelihood ratio test | Viterbi / dynamic programming |
Intersymbol interference (ISI)
The phenomenon in which a received sample depends on multiple past transmitted symbols, arising from any channel whose impulse response extends beyond one symbol period. ISI is the signal-processing shadow of bandwidth limitation: the more tightly a symbol's energy is packed in time, the more it spills into adjacent symbol periods through the channel.
Related: Maximum-likelihood sequence estimation (MLSE), Channel memory
Maximum-likelihood sequence estimation (MLSE)
The rule that returns the transmitted sequence maximizing the conditional density of the received block under a known channel model. For Gaussian noise, MLSE reduces to a minimum-distance search over sequences. Implemented efficiently via the Viterbi algorithm when the channel has finite memory.
Channel memory
The integer $L$ such that the impulse response satisfies $h_k = 0$ for $k > L$. A channel with memory $L$ causes each output sample to depend on $L + 1$ consecutive input symbols. The size of the MLSE trellis, $M^L$ states per stage, is exponential in this quantity.
Related: Intersymbol interference (ISI)