The ISI Channel as an Inference Problem

Why Inference? Why Now?

Every chapter of this book has been about extracting a hidden quantity from noisy observations. We have estimated parameters, reconstructed signals, and decided between hypotheses, all under the assumption that the channel between source and sensor was benign, or at most a scaling. Reality is rarely so kind. When symbols are transmitted faster than the channel's coherence bandwidth allows, each received sample is a linear combination of several past symbols plus noise. The operational question is simple: what did the transmitter actually send?

This is the equalization problem, and we will approach it exactly as we approached estimation and detection in earlier chapters: by writing down the likelihood and asking what the optimal inference rule does. The pattern will feel familiar. The twist is that the unknown is no longer a parameter but a sequence, and sequences live in a space whose size grows exponentially with length. That growth is the central computational tension of the chapter: the ML rule is well defined, but evaluating it naively is hopeless. The art lies in exploiting structure, the finite memory of the channel, to reduce an exponential search to one that is merely linear in sequence length.

Definition:

Discrete-Time ISI Channel

Let $\{x[n]\}$ be a sequence of transmitted symbols drawn from a finite alphabet $\mathcal{A} \subset \mathbb{C}$ with $M = |\mathcal{A}|$ and $\mathbb{E}[|x[n]|^2] = E_s$. Let $\{h[k]\}_{k=0}^{L}$ be a causal FIR channel of memory $L$, meaning $h[k] = 0$ for $k < 0$ and for $k > L$. The discrete-time ISI channel produces

$$y[n] = \sum_{k=0}^{L} h[k]\, x[n-k] + w[n], \qquad n = 0, 1, \ldots, T-1,$$

where $\{w[n]\}$ is circularly symmetric complex Gaussian noise, $w[n] \sim \mathcal{CN}(0, N_0)$, independent across $n$.

The channel taps $\mathbf{h} = [h[0], \ldots, h[L]]^T$ are assumed known at the receiver (having been estimated from pilot symbols in an earlier stage). The unknowns are the transmitted symbols $\mathbf{x} = [x[0], \ldots, x[T-1]]^T \in \mathcal{A}^T$.

The terminology "intersymbol interference" reflects what happens when $L \geq 1$: the sample $y[n]$ depends not only on $x[n]$ but on up to $L$ past symbols, which interfere with one another at the receiver. When $L = 0$ the channel collapses to a memoryless AWGN channel and equalization is unnecessary.
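Before turning to inference, it helps to see the forward model in code. The following sketch simulates a block through the discrete-time ISI channel; the taps, block length, and noise level are illustrative assumptions, not values fixed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed for this sketch, not prescribed by the text)
h = np.array([1.0, 0.5])          # channel taps h[0..L], so L = 1
T = 8                             # block length
N0 = 0.1                          # noise variance per complex sample

# BPSK symbols, preceded by L known symbols (+1) that fix the symbols before the block
x = rng.choice([-1.0, +1.0], size=T)
x_padded = np.concatenate([np.ones(len(h) - 1), x])

# y[n] = sum_k h[k] x[n-k] + w[n]; 'valid' convolution keeps exactly T samples
noise = np.sqrt(N0 / 2) * (rng.standard_normal(T) + 1j * rng.standard_normal(T))
y = np.convolve(x_padded, h, mode="valid") + noise

print("transmitted:", x)
print("received   :", np.round(y, 3))
```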

Definition:

Maximum-Likelihood Sequence Estimate (MLSE)

Given the received block $\mathbf{y} = [y[0], \ldots, y[T-1]]^T$ and known channel $\mathbf{h}$, the maximum-likelihood sequence estimate is

$$\hat{\mathbf{x}}_{\text{ML}} = \arg\max_{\mathbf{x} \in \mathcal{A}^T} f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}).$$

Under the white Gaussian noise model of the Discrete-Time ISI Channel definition, the likelihood reduces to a squared-error criterion:

$$\hat{\mathbf{x}}_{\text{ML}} = \arg\min_{\mathbf{x} \in \mathcal{A}^T} \sum_{n=0}^{T-1} \left| y[n] - \sum_{k=0}^{L} h[k]\, x[n-k] \right|^2.$$

We refer to this rule as maximum-likelihood sequence estimation, or MLSE.

The MLSE rule is a statement, not an algorithm. It specifies what answer we want, the sequence of symbols that best explains the observations, without saying how to find it. The set $\mathcal{A}^T$ contains $M^T$ candidates; for a modest block of $T = 100$ BPSK symbols, that is $2^{100} \approx 10^{30}$ sequences. The problem is not to define optimality but to compute it.
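For very short blocks the rule can be evaluated literally. The sketch below enumerates all $M^T$ candidates, assuming BPSK and a known run of $L$ symbols before the block; it exists only to make the criterion concrete, since the cost explodes immediately. The function name and the placeholder values are illustrative.

```python
import numpy as np
from itertools import product

def mlse_brute_force(y, h, alphabet=(-1.0, +1.0), prefix=(+1.0,)):
    """Exhaustive MLSE: minimize sum_n |y[n] - sum_k h[k] x[n-k]|^2 over A^T.

    `prefix` holds the L known symbols transmitted before the block.
    Complexity is M**T, so this is usable only for very short blocks.
    """
    T, L = len(y), len(h) - 1
    assert len(prefix) == L, "prefix must supply the L symbols before the block"
    best_x, best_cost = None, np.inf
    for x in product(alphabet, repeat=T):
        full = list(prefix) + list(x)                   # x[-L], ..., x[T-1]
        y_hat = [sum(h[k] * full[L + n - k] for k in range(L + 1)) for n in range(T)]
        cost = sum(abs(y[n] - y_hat[n]) ** 2 for n in range(T))
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost

# Placeholder two-tap example; any short block works
print(mlse_brute_force([1.2, -0.7, 0.3], [1.0, 0.5]))
```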

Definition:

Channel State

The channel state at time $k$ is the vector of the $L$ most recent past symbols,

$$s_k = (x[k-1], x[k-2], \ldots, x[k-L]) \in \mathcal{S},$$

where $\mathcal{S} = \mathcal{A}^L$ has cardinality $|\mathcal{S}| = M^L$.

The state $s_k$ summarizes everything about past transmissions that is relevant to the next observation $y[k]$: given $s_k$ and the current input $x[k]$, the noise-free output $\sum_{\ell=0}^{L} h[\ell]\, x[k-\ell]$ is completely determined.
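The state update is simple enough to write in a few lines. The helper below, with illustrative names, maps a state and a current input to the noise-free output and the next state:

```python
def step(state, x_k, h):
    """Given state s_k = (x[k-1], ..., x[k-L]), input x[k], and taps h[0..L],
    return the noise-free output and the next state s_{k+1}."""
    symbols = (x_k,) + state                      # (x[k], x[k-1], ..., x[k-L])
    noise_free = sum(h[l] * symbols[l] for l in range(len(h)))
    next_state = symbols[:-1]                     # drop the oldest symbol x[k-L]
    return noise_free, next_state

# Example: two-tap channel, BPSK, state (x[k-1],) = (+1,), input x[k] = -1
h = [1.0, 0.5]
print(step((+1.0,), -1.0, h))   # -> (-0.5, (-1.0,))
```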

Theorem: Markov Structure of the ISI Likelihood

For the channel of the Discrete-Time ISI Channel definition, assume the initial state $s_0$ is known (for instance through a preamble of zeros). Then the log-likelihood of a candidate sequence $\mathbf{x} \in \mathcal{A}^T$ factorizes over transitions $(s_k, s_{k+1})$ of the state process:

$$\ln f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = -\frac{1}{N_0} \sum_{k=0}^{T-1} \gamma_k(s_k, s_{k+1}) + \text{const},$$

where the branch metric is

$$\gamma_k(s_k, s_{k+1}) = \left| y[k] - \sum_{\ell=0}^{L} h[\ell]\, x[k-\ell] \right|^2,$$

and the symbols $x[k], x[k-1], \ldots, x[k-L]$ entering the branch metric are read off from the pair $(s_k, s_{k+1})$: the state $s_k$ supplies the past symbols and the transition to $s_{k+1}$ identifies the current input $x[k]$.

The ISI channel is finite-memory, so knowing the state at time $k$ and the current input determines both the next state and the noise-free output. The likelihood therefore decomposes into per-transition terms: a sum of local costs along a path through the state graph. This is the same structure one sees in hidden Markov models and in the Viterbi decoder for convolutional codes, and it is precisely the structure that makes dynamic programming the right tool.

Key Takeaway

The global optimum of MLSE decomposes into a sum of local costs along paths through a finite-state graph. Finite memory begets dynamic programming; this is the structural observation on which the Viterbi algorithm rests.
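The resulting dynamic program is the Viterbi algorithm. The sketch below implements it for the model of this section, assuming real BPSK symbols and an all-$(+1)$ initial state by default; the function name, variable names, and traceback bookkeeping are illustrative choices, not notation fixed by the text.

```python
import numpy as np
from itertools import product

def viterbi_mlse(y, h, alphabet=(-1.0, +1.0), initial=None):
    """MLSE via the Viterbi algorithm for a known FIR channel h[0..L].

    States are tuples (x[k-1], ..., x[k-L]); the branch metric is the
    squared error |y[k] - sum_l h[l] x[k-l]|^2 from the theorem above.
    Complexity is O(M**(L+1) * T) instead of M**T.
    """
    L = len(h) - 1
    states = list(product(alphabet, repeat=L))
    initial = tuple([+1.0] * L) if initial is None else tuple(initial)

    cost = {s: (0.0 if s == initial else np.inf) for s in states}   # best metric into s
    path = {s: [] for s in states}                                   # surviving symbols

    for k in range(len(y)):
        new_cost = {s: np.inf for s in states}
        new_path = {}
        for s in states:
            if not np.isfinite(cost[s]):
                continue
            for x_k in alphabet:
                symbols = (x_k,) + s                      # (x[k], x[k-1], ..., x[k-L])
                y_hat = sum(h[l] * symbols[l] for l in range(L + 1))
                metric = cost[s] + abs(y[k] - y_hat) ** 2
                s_next = symbols[:L]                      # next state drops x[k-L]
                if metric < new_cost[s_next]:             # keep only the survivor
                    new_cost[s_next] = metric
                    new_path[s_next] = path[s] + [x_k]
        cost, path = new_cost, new_path

    best = min(cost, key=cost.get)
    return path[best], cost[best]

# Illustrative call (taps and observations are placeholder values)
print(viterbi_mlse([1.2, -0.7, 0.3], [1.0, 0.5]))
```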

Example: Branch Metrics for a Two-Tap Channel with BPSK

Consider a two-tap channel $\mathbf{h} = [h[0], h[1]]^T = [1, 0.5]^T$ with BPSK modulation $\mathcal{A} = \{-1, +1\}$ (so $M = 2$, $L = 1$, $|\mathcal{S}| = 2$). The initial state is $s_0 = +1$. Suppose the receiver observes $y[0] = 1.2$ and $y[1] = -0.7$.

Enumerate all state transitions, compute the corresponding branch metrics, and identify the two-symbol sequence that is most likely to have been transmitted.
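A brute-force check of the four candidate pairs, sketched below with illustrative variable names, confirms the hand enumeration: the total metrics are 3.53 for $(-1,-1)$, 4.33 for $(-1,+1)$, 0.13 for $(+1,-1)$, and 4.93 for $(+1,+1)$, so the ML decision is $(x[0], x[1]) = (+1, -1)$.

```python
from itertools import product

h = [1.0, 0.5]          # h[0], h[1]
y = [1.2, -0.7]         # received samples
x_prev = +1.0           # initial state s_0 = +1

for x0, x1 in product([-1.0, +1.0], repeat=2):
    # noise-free outputs for this candidate sequence
    y0_hat = h[0] * x0 + h[1] * x_prev
    y1_hat = h[0] * x1 + h[1] * x0
    metric = (y[0] - y0_hat) ** 2 + (y[1] - y1_hat) ** 2
    print(f"(x[0], x[1]) = ({x0:+.0f}, {x1:+.0f}):  metric = {metric:.2f}")
```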

Output SNR of ZF vs MMSE as Channel Null Deepens

A preview of the tension that motivates MLSE: when the channel $h = [1, -\alpha]$ develops a spectral null as $\alpha \to 1$, the zero-forcing equalizer's output SNR collapses while the MMSE equalizer gracefully trades bias for variance. MLSE avoids inversion entirely and suffers no null-depth penalty, at the price of exponential complexity.

⚠️Engineering Note

The Known-Channel Assumption

Every equalizer in this chapter presumes that $\mathbf{h}$ has been estimated elsewhere, typically from a pilot or training sequence inserted at the start of each packet. The estimate $\hat{\mathbf{h}}$ is not the true $\mathbf{h}$; its error feeds directly into every subsequent computation. In practice one budgets SNR for pilots, designs pilots to have favorable autocorrelation (pseudonoise, Zadoff-Chu, or Golay sequences), and re-estimates the channel at the block rate dictated by the coherence time. Treating channel estimation and equalization as separate subsystems is the dominant paradigm in standards such as LTE and 5G NR, but it is not optimal: joint estimation and detection can do better when pilots are sparse. A minimal least-squares pilot estimator is sketched after the constraints listed below.

Practical Constraints

  • Estimate $\hat{\mathbf{h}}$ once per coherence interval
  • Pilot overhead scales with channel memory $L$ and mobility
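To make the pilot stage concrete, here is a minimal least-squares channel estimator; it is a sketch under assumed values (pilot length, taps, noise level), and the variable names are illustrative rather than drawn from any standard.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumed values, not from any standard)
h_true = np.array([1.0, 0.5, -0.2])        # true taps h[0..L], L = 2
L = len(h_true) - 1
Np = 32                                     # number of usable pilot observations
N0 = 0.05

# Known BPSK pilot, preceded by L known symbols so every output sample is usable
pilot = rng.choice([-1.0, 1.0], size=Np + L)
y = np.convolve(pilot, h_true, mode="valid")            # Np noise-free samples
y = y + np.sqrt(N0 / 2) * (rng.standard_normal(Np) + 1j * rng.standard_normal(Np))

# Build the convolution matrix A so that y ≈ A @ h, then solve in least squares
A = np.column_stack([pilot[L - k : L - k + Np] for k in range(L + 1)]).astype(complex)
h_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

print("true taps     :", h_true)
print("estimated taps:", np.round(h_hat, 3))
```

The same construction works for any known training sequence; pilots with good autocorrelation keep the matrix $\mathbf{A}^H \mathbf{A}$ well conditioned, which is the practical reason for the sequence families named in the note above.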

Common Mistake: MLSE Complexity Is Exponential in Memory, Not Block Length

Mistake:

Students often assume that because MLSE searches over $\mathcal{A}^T$, its complexity must scale as $M^T$, and therefore conclude that MLSE is simply infeasible for realistic block sizes.

Correction:

The naive enumeration has complexity $M^T$, but the Viterbi algorithm exploits the Markov structure of the likelihood (Theorem: Markov Structure of the ISI Likelihood) and achieves complexity $O(M^{L+1} \cdot T)$. Complexity is linear in the block length and exponential in the channel memory, not in the block length. For short-memory channels ($L = 1, 2, 3$) MLSE is eminently practical; for long wireless channels with $L \sim 50$, it is not.
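A back-of-the-envelope comparison, using assumed values of $M$, $T$, and $L$, makes the scaling concrete:

```python
import math

M, T = 2, 1000                                   # BPSK, 1000-symbol block (assumed)
for L in (1, 3, 50):
    naive_log10 = T * math.log10(M)              # log10 of the M**T candidate sequences
    viterbi_ops = (M ** (L + 1)) * T             # branch metrics evaluated by Viterbi
    print(f"L = {L:2d}: naive search ~ 10^{naive_log10:.0f} sequences, "
          f"Viterbi ~ {viterbi_ops:.2e} branch metrics")
```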

Historical Note: Forney Recasts Viterbi for ISI (1972)


The Viterbi algorithm was introduced in 1967 for decoding convolutional codes. Its applicability to equalization was not immediate; it was G. David Forney Jr. who, in a landmark 1972 paper in the IEEE Transactions on Information Theory, recognized that an ISI channel is structurally identical to a convolutional code with memory $L$, and that Viterbi decoding therefore yields the maximum-likelihood symbol sequence estimate. Forney's paper did more than transfer an algorithm; it established the unified view that channels with memory, convolutional codes, and Markov sources all share a single inference framework. That view is the reason this book places equalization and detection in the same chapter of an inference textbook.

Three Views of the Same Channel

| Quantity | Estimation view | Detection view | Equalization view |
|---|---|---|---|
| Unknown | Continuous parameter $\theta$ | Discrete hypothesis $H \in \{0, 1\}$ | Discrete sequence $\mathbf{x} \in \mathcal{A}^T$ |
| Objective | Minimize $\mathbb{E}[\lVert \hat{\theta} - \theta \rVert^2]$ | Minimize $P(\text{error})$ | Minimize sequence error probability |
| ML rule complexity | Closed form (often) | Single comparison | Exponential in $T$ (naive) |
| Structural exploit | Linearity, Gaussianity | Sufficient statistic | Finite memory $\to$ state machine |
| Canonical algorithm | LMMSE, Wiener filter | Likelihood ratio test | Viterbi / dynamic programming |

Intersymbol interference (ISI)

The phenomenon in which a received sample depends on multiple past transmitted symbols, arising from any channel whose impulse response extends beyond one symbol period. ISI is the signal-processing shadow of bandwidth limitation: the more tightly a symbol's energy is packed in time, the more it spills into adjacent symbol periods through the channel.

Related: Maximum-likelihood sequence estimation (MLSE), Channel memory

Maximum-likelihood sequence estimation (MLSE)

The rule that returns the transmitted sequence maximizing the conditional density of the received block under a known channel model. For Gaussian noise, MLSE reduces to a minimum-distance search over sequences. Implemented efficiently via the Viterbi algorithm when the channel has finite memory.

Related: Intersymbol interference (ISI), Viterbi algorithm

Channel memory

The integer $L$ such that the impulse response $h[k]$ vanishes for $k > L$. A channel with memory $L$ causes each output sample to depend on $L+1$ consecutive input symbols. The number of states in the MLSE trellis, $M^L$, is exponential in this quantity.

Related: Intersymbol interference (ISI)