Random (Stochastic) Processes — Fundamentals

Why Stochastic Processes for Communications?

A communication signal is not a single number drawn from a probability distribution --- it is a random function of time. When we write the received baseband waveform

r(t) = s(t) + n(t),

neither the noise n(t) nor, in a fading channel, the signal s(t) = h(t)x(t) is deterministic. At each instant t these are random variables, but they also possess temporal structure: the noise sample at time t is correlated with the sample at t + \tau if \tau is small enough, and the channel gain h(t) drifts according to the Doppler spread.

A single random variable captures the statistics of one observation. A random vector captures finitely many observations. A stochastic process captures the statistics of an entire time-indexed family of random variables and is therefore the natural mathematical language for signals, noise, interference, and channels.

This section introduces the foundational concepts:

  • Stationarity (when does the statistical character of a process not change with time?).
  • Autocorrelation and power spectral density (how is power distributed across frequency?).
  • Ergodicity (when can we replace ensemble averages with time averages?).
  • LTI filtering of random processes (the key link between signal processing and probability).

These tools are prerequisites for everything that follows: noise analysis, matched-filter detection, Wiener filtering, and the capacity of band-limited channels.

Definition:

Stochastic Process

A stochastic (random) process is a family of random variables

\{X(t),\; t \in T\}

defined on a common probability space (\Omega, \mathcal{F}, P) and indexed by a parameter t belonging to an index set T.

Interpretations and terminology:

  1. Fixed t, varying \omega: For a fixed time t_0 \in T, X(t_0) = X(t_0, \omega) is an ordinary random variable on (\Omega, \mathcal{F}, P).

  2. Fixed \omega, varying t: For a fixed outcome \omega_0 \in \Omega, the function t \mapsto X(t, \omega_0) is a deterministic function of time called a sample path (or realisation) of the process.

  3. Both varying: The full object X(t, \omega) is a function of two variables: time and randomness.

Classification by index set:

  • T = \mathbb{R} (or an interval): continuous-time process, written X(t).
  • T = \mathbb{Z} (or \mathbb{N}_0): discrete-time process, written X[n].

Classification by state space:

  • Continuous-valued: X(t) \in \mathbb{R} or \mathbb{C} (e.g., thermal noise voltage).
  • Discrete-valued: X(t) \in \{s_1, s_2, \ldots\} (e.g., the state of a Markov chain modelling a fading channel).

In this text, unless stated otherwise, X(t) denotes a continuous-time, complex-valued stochastic process.

In telecommunications, the most common stochastic processes are: (i) additive white Gaussian noise (AWGN), (ii) the time-varying channel gain h(t) in a fading environment, and (iii) the information-bearing signal x(t) itself, which is modelled as random to apply information-theoretic results. Each of these is a family of random variables indexed by continuous time.


Definition:

Mean, Autocorrelation, and Autocovariance Functions

Let \{X(t),\; t \in T\} be a stochastic process.

Mean function: \mu_X(t) = E[X(t)].

Autocorrelation function: R_X(t_1, t_2) = E[X(t_1)\,X^*(t_2)], where * denotes complex conjugation.

Autocovariance function: C_X(t_1, t_2) = E\bigl[(X(t_1) - \mu_X(t_1))(X(t_2) - \mu_X(t_2))^*\bigr] = R_X(t_1, t_2) - \mu_X(t_1)\,\mu_X^*(t_2).

Cross-correlation between two processes X(t) and Y(t): R_{XY}(t_1, t_2) = E[X(t_1)\,Y^*(t_2)].

The autocorrelation function is the fundamental second-order descriptor of a stochastic process. It captures how the process at time t_1 is statistically related to the process at time t_2.

The complex conjugation in E[X(t_1)\,X^*(t_2)] ensures that R_X(t,t) = E[|X(t)|^2] \geq 0, which we interpret as the instantaneous power of the process at time t. For real-valued processes, the conjugation has no effect and may be dropped.


Definition:

Strict-Sense Stationarity (SSS)

A stochastic process \{X(t),\; t \in T\} is strict-sense stationary (SSS) if its complete statistical description is invariant under time shifts. Formally, for every positive integer n, every set of time instants t_1, t_2, \ldots, t_n \in T, and every time shift \tau such that t_1+\tau, \ldots, t_n+\tau \in T, the joint distribution of (X(t_1+\tau), X(t_2+\tau), \ldots, X(t_n+\tau)) is identical to that of (X(t_1), X(t_2), \ldots, X(t_n)):

F_{X(t_1+\tau),\ldots,X(t_n+\tau)}(x_1,\ldots,x_n) = F_{X(t_1),\ldots,X(t_n)}(x_1,\ldots,x_n)

for all (x_1, \ldots, x_n) \in \mathbb{R}^n (or \mathbb{C}^n) and all valid \tau.

Consequences of SSS:

  • Setting n = 1: the first-order distribution F_{X(t)} does not depend on t. In particular, \mu_X(t) = \mu (constant mean) and \mathrm{Var}(X(t)) is constant.

  • Setting n = 2: the joint distribution of (X(t_1), X(t_2)) depends only on the difference t_1 - t_2, not on the absolute times. Hence R_X(t_1, t_2) = R_X(t_1 - t_2).

SSS is a very strong condition: it requires all finite-dimensional distributions to be time-invariant. In practice it is rarely verified directly, and the weaker notion of wide-sense stationarity is used instead.

Strict-sense stationarity implies wide-sense stationarity (WSS) but the converse is false in general. The one important exception is the Gaussian process: because a Gaussian process is completely determined by its mean and autocorrelation, a Gaussian WSS process is automatically SSS.


Definition:

Wide-Sense Stationarity (WSS)

A stochastic process \{X(t)\} is wide-sense stationary (WSS) if it satisfies two conditions:

  1. Constant mean: \mu_X(t) = E[X(t)] = \mu for all t.

  2. Autocorrelation depends only on the time difference (lag): R_X(t_1, t_2) = R_X(t_1 - t_2). Writing \tau = t_1 - t_2, this becomes R_X(\tau) = E[X(t + \tau)\,X^*(t)] for all t.

Key properties of the WSS autocorrelation function R_X(\tau):

  • R_X(0) \geq 0: R_X(0) = E[|X(t)|^2] is the average power of the process.

  • Maximum at the origin: |R_X(\tau)| \leq R_X(0) for all \tau.

  • Hermitian symmetry: R_X(-\tau) = R_X^*(\tau). For real-valued processes this simplifies to R_X(-\tau) = R_X(\tau) (even symmetry).

  • Positive semidefiniteness: For any times t_1, \ldots, t_n and complex coefficients a_1, \ldots, a_n: \sum_{i=1}^{n}\sum_{k=1}^{n} a_i\,a_k^*\,R_X(t_i - t_k) \geq 0.

Wide-sense stationarity is the "working assumption" throughout signal processing and communications. It is much easier to verify than SSS (only the first two moments must be checked), and it suffices for all linear processing operations: matched filtering, Wiener filtering, linear MMSE estimation, and spectral analysis all require only the mean and autocorrelation function.
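The two WSS conditions can be checked by simulation. The sketch below uses a toy example of our own (not from the text), the classic random-phase sinusoid X(t) = \cos(2\pi f_0 t + \Theta) with \Theta uniform on [0, 2\pi): Monte Carlo estimates of the mean and of the autocorrelation at the same lag but two different absolute times should agree, and the lag-\tau autocorrelation should match the closed form \frac{1}{2}\cos(2\pi f_0 \tau).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy example (our own, not from the text): the random-phase sinusoid
# X(t) = cos(2*pi*f0*t + Theta), Theta uniform on [0, 2*pi), is WSS.
# We estimate the mean and the autocorrelation at two different absolute
# times and check that they agree, as conditions 1 and 2 require.
f0 = 5.0                     # Hz
n_trials = 200_000
theta = rng.uniform(0.0, 2.0 * np.pi, n_trials)

def X(t):
    # One sample of X(t) per realisation of the random phase.
    return np.cos(2.0 * np.pi * f0 * t + theta)

tau = 0.03
m1, m2 = X(0.1).mean(), X(0.7).mean()          # ensemble means, both ~ 0
r1 = (X(0.1 + tau) * X(0.1)).mean()            # R_X at lag tau, around t = 0.1
r2 = (X(0.7 + tau) * X(0.7)).mean()            # same lag, around t = 0.7

r_theory = 0.5 * np.cos(2.0 * np.pi * f0 * tau)   # R_X(tau) = cos(2*pi*f0*tau)/2
print(m1, m2, r1, r2, r_theory)
```

The estimates at t = 0.1 and t = 0.7 coincide (up to Monte Carlo error) even though the absolute times differ, which is exactly what lag-only dependence means.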


Theorem: Properties of the WSS Autocorrelation Function

Let \{X(t)\} be a WSS process with autocorrelation R_X(\tau). Then:

  1. R_X(0) \geq 0.
  2. R_X(-\tau) = R_X^*(\tau) for all \tau (Hermitian symmetry).
  3. |R_X(\tau)| \leq R_X(0) for all \tau (maximum at the origin).

Property 1 says average power is nonnegative. Property 2 reflects the conjugate symmetry inherent in the inner product E[X(t+\tau)X^*(t)]. Property 3 states that a process is most correlated with itself at zero lag: intuitively, the best predictor of X(t) is X(t) itself.


Definition:

Power Spectral Density (PSD)

Let \{X(t)\} be a WSS process with autocorrelation function R_X(\tau). The power spectral density (PSD) of X(t) is the Fourier transform of R_X(\tau):

S_X(f) = \int_{-\infty}^{\infty} R_X(\tau)\,e^{-j2\pi f\tau}\,d\tau.

Conversely, the autocorrelation function is recovered by the inverse Fourier transform:

R_X(\tau) = \int_{-\infty}^{\infty} S_X(f)\,e^{j2\pi f\tau}\,df.

Units: If X(t) has units of volts (V), then R_X(\tau) has units of \mathrm{V}^2 and S_X(f) has units of \mathrm{V}^2/\mathrm{Hz} (watts per hertz across a 1-ohm load).

Key properties of the PSD:

  • S_X(f) \geq 0 for all f (nonnegative).
  • For real-valued processes: S_X(-f) = S_X(f) (even symmetry).
  • For complex-valued processes: S_X(f) is still real and nonnegative, because R_X(\tau) is Hermitian-symmetric, but it need not be even in f; the spectrum of a complex baseband process can be asymmetric about f = 0.

The PSD tells us how the average power of the process is distributed across frequency. In communications, the PSD of the transmitted signal determines the occupied bandwidth and hence the spectral efficiency, while the PSD of the noise determines the noise power in any given frequency band.


Theorem: Wiener--Khinchin Theorem

Let \{X(t)\} be a WSS process with autocorrelation function R_X(\tau). Then the power spectral density S_X(f) and the autocorrelation R_X(\tau) form a Fourier transform pair:

S_X(f) = \mathcal{F}\{R_X\}(f) = \int_{-\infty}^{\infty} R_X(\tau)\,e^{-j2\pi f\tau}\,d\tau,

R_X(\tau) = \mathcal{F}^{-1}\{S_X\}(\tau) = \int_{-\infty}^{\infty} S_X(f)\,e^{j2\pi f\tau}\,df.

Moreover:

  1. Nonnegativity: S_X(f) \geq 0 for all f.
  2. Average power: Setting \tau = 0 in the inverse relation gives P_X = R_X(0) = E[|X(t)|^2] = \int_{-\infty}^{\infty} S_X(f)\,df. The total area under the PSD equals the average power of the process.

The Wiener--Khinchin theorem is the stochastic analog of Parseval's theorem for deterministic signals. Just as the energy of a deterministic signal can be computed by integrating its energy spectral density |X(f)|^2, the average power of a WSS random process can be computed by integrating its PSD S_X(f). The PSD replaces the (non-existent) Fourier transform of a random signal with a well-defined, deterministic spectral description.
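The theorem has an exact finite-data shadow in discrete time that is easy to verify numerically. The sketch below (a toy colored sequence of our own) checks that the periodogram of a finite record equals the DFT of its biased sample autocorrelation, which is the discrete-time counterpart of the S_X(f) \leftrightarrow R_X(\tau) pairing.

```python
import numpy as np

rng = np.random.default_rng(1)

# Discrete-time illustration (our own toy process): for a finite record,
# the periodogram equals the DFT of the biased sample autocorrelation --
# the finite-data analogue of the Wiener-Khinchin pairing.
N = 256
w = rng.standard_normal(N + 2)
x = w[2:] + 0.5 * w[1:-1] + 0.25 * w[:-2]    # a colored (MA(2)) sequence

# Periodogram on a 2N-point grid (zero-padding makes the correlation linear).
periodogram = np.abs(np.fft.fft(x, 2 * N))**2 / N

# Biased sample autocorrelation at lags -(N-1)..N-1, arranged circularly:
# index k holds lag k for k < N, and index 2N-k holds lag -k.
r = np.correlate(x, x, mode="full") / N      # lags -(N-1)..N-1
circ = np.concatenate([r[N - 1:], [0.0], r[:N - 1]])

S_from_R = np.real(np.fft.fft(circ))         # "Fourier transform of R"
print(np.max(np.abs(periodogram - S_from_R)))   # agree to machine precision
```

The two spectra coincide realization by realization; averaging many such periodograms is then the standard way to estimate the true S_X(f).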


Definition:

Ergodicity

A WSS stochastic process \{X(t)\} is ergodic if time averages computed from a single, infinitely long sample path converge to the corresponding ensemble (statistical) averages.

Mean-ergodic: The process is mean-ergodic if the time-averaged mean converges (in mean square) to the ensemble mean:

\langle X(t) \rangle_T \triangleq \frac{1}{2T}\int_{-T}^{T} X(t)\,dt \xrightarrow{T \to \infty} E[X(t)] = \mu.

Autocorrelation-ergodic: The process is autocorrelation-ergodic if

\frac{1}{2T}\int_{-T}^{T} X(t+\tau)\,X^*(t)\,dt \xrightarrow{T \to \infty} R_X(\tau) \quad \text{for each } \tau.

A process that is both mean-ergodic and autocorrelation-ergodic is simply called ergodic (in the wide sense).

Sufficient condition for mean-ergodicity:

\lim_{T \to \infty} \frac{1}{2T}\int_{-2T}^{2T} \left(1 - \frac{|\tau|}{2T}\right) C_X(\tau)\,d\tau = 0,

where C_X(\tau) = R_X(\tau) - |\mu|^2 is the autocovariance. In particular, if C_X(\tau) \to 0 as |\tau| \to \infty (i.e., the process "forgets" its past), the process is mean-ergodic.

Ergodicity is the bridge between theory and measurement. In practice, we observe one realisation of a stochastic process (e.g., one received signal waveform) and estimate its statistics (mean, power, PSD) by time-averaging. This procedure is justified only if the process is ergodic. Most WSS processes encountered in communications (stationary noise, stationary fading channels observed over time scales much longer than the coherence time) are assumed ergodic, but the assumption must be verified --- see the Pitfall below.


Theorem: WSS Process Through a Linear Time-Invariant (LTI) System

Let \{X(t)\} be a WSS process with autocorrelation R_X(\tau) and PSD S_X(f). Let h(t) be the impulse response of a stable LTI system with frequency response H(f) = \mathcal{F}\{h\}(f). The output process

Y(t) = \int_{-\infty}^{\infty} h(\alpha)\,X(t - \alpha)\,d\alpha = (h * X)(t)

is also WSS, and its statistics are:

  1. Mean: \mu_Y = \mu_X \cdot H(0).

  2. Autocorrelation: R_Y(\tau) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\alpha)\,h^*(\beta)\,R_X(\tau - \alpha + \beta)\,d\alpha\,d\beta.

  3. Power spectral density: \boxed{S_Y(f) = |H(f)|^2\,S_X(f).}

  4. Cross-PSD (input--output): S_{YX}(f) = H(f)\,S_X(f).

  5. Output power: P_Y = R_Y(0) = \int_{-\infty}^{\infty} |H(f)|^2\,S_X(f)\,df.

The PSD filtering rule S_Y(f) = |H(f)|^2 S_X(f) is arguably the single most important result in stochastic signal processing. It says that an LTI filter shapes the power spectrum of a random process by multiplying by the squared magnitude of its frequency response --- exactly as one would expect from the deterministic relation Y(f) = H(f)X(f), but now applied to power densities rather than amplitude spectra. Phase information in H(f) does not affect the output PSD.
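The output-power consequence of the filtering rule can be checked directly. The following discrete-time sketch (numpy only; the filter taps are our own arbitrary choice) passes white noise through an FIR filter and compares the time-domain power of the output against the frequency-domain integral \int |H(f)|^2 S_X(f)\,df.

```python
import numpy as np

rng = np.random.default_rng(2)

# Discrete-time sketch (our own filter taps): pass white noise of power
# sigma2 through an FIR filter h and compare the two routes to the output
# power that the theorem provides:
#   time domain:      P_Y = average of Y^2 over the filtered samples,
#   frequency domain: P_Y = mean over f of |H(f)|^2 * S_X(f), with S_X = sigma2.
sigma2 = 2.0
h = np.array([0.5, 1.0, 0.5, 0.25])           # arbitrary stable FIR filter

x = np.sqrt(sigma2) * rng.standard_normal(1_000_000)
y = np.convolve(x, h, mode="valid")           # LTI filtering of the noise
P_time = np.mean(y**2)

H = np.fft.fft(h, 4096)                       # frequency response on 4096 bins
P_freq = np.mean(np.abs(H)**2) * sigma2       # integral of |H|^2 S_X over f

print(P_time, P_freq)                         # both ~ sigma2 * sum(h**2)
```

For white input the frequency-domain integral collapses (by Parseval) to \sigma^2 \sum_k h[k]^2, which the simulated output power matches to within Monte Carlo error.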


PSD Filtering of a WSS Process Through an LTI System

Interactive figure: visualise the key result S_Y(f) = |H(f)|^2\,S_X(f). Choose an input PSD shape (white noise, band-limited, or colored/shaped), a filter type (lowpass, bandpass, or raised cosine), and adjust bandwidth and roll-off. The plot shows three panels: (1) the input PSD S_X(f), (2) the filter power response |H(f)|^2, and (3) the output PSD S_Y(f). The shaded area under S_Y(f) equals the output power P_Y.


Example: Bandpass Filtering of White Noise

White noise X(t) with two-sided PSD S_X(f) = N_0/2 (watts/Hz) is passed through an ideal bandpass filter centred at f_c with one-sided bandwidth W (i.e., the filter passes |f - f_c| \leq W/2 and |f + f_c| \leq W/2, and rejects everything else).

(a) Find the output PSD S_Y(f).

(b) Compute the output noise power P_Y.

(c) Define and compute the noise equivalent bandwidth of the filter.
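Parts (a) and (b) can be sanity-checked numerically. The sketch below uses arbitrary sample values of our own for N_0, f_c, and W, builds the ideal bandpass response on a frequency grid, and integrates the output PSD; the result should match (N_0/2)\cdot(2W) = N_0 W.

```python
import numpy as np

# Numerical check of the example above, with arbitrary sample values for
# N0, fc and W (our choices, not from the text). Part (a): the output PSD
# is N0/2 on the two passbands and zero elsewhere; part (b): integrating
# it gives P_Y = (N0/2) * (2W) = N0 * W.
N0, fc, W = 4e-21, 2.4e9, 20e6          # W/Hz, Hz, Hz

f = np.linspace(-3e9, 3e9, 600_001)     # 10 kHz frequency grid
df = f[1] - f[0]
passband = (np.abs(f - fc) <= W / 2) | (np.abs(f + fc) <= W / 2)
S_Y = np.where(passband, N0 / 2, 0.0)   # part (a): ideal bandpass output PSD

P_Y = np.sum(S_Y) * df                  # part (b), numerically
print(P_Y, N0 * W)                      # the two agree to grid resolution
```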


Strict-Sense Stationarity vs. Wide-Sense Stationarity vs. Ergodicity

  • Definition. SSS: all finite-dimensional distributions are shift-invariant. WSS: constant mean and autocorrelation depending only on the lag \tau. Ergodic: time averages converge to ensemble averages.
  • What it constrains. SSS: all joint distributions of (X(t_1), \ldots, X(t_n)). WSS: only the first two moments (mean and autocorrelation). Ergodic: the relationship between time and ensemble statistics.
  • Implication hierarchy. SSS \Rightarrow WSS (always); WSS \not\Rightarrow SSS in general; Ergodic \Rightarrow WSS (by assumption); WSS \not\Rightarrow Ergodic.
  • Gaussian exception. For Gaussian processes WSS \Leftrightarrow SSS, because second-order statistics fully determine all distributions; a Gaussian WSS process with C_X(\tau) \to 0 is also ergodic.
  • Practical verification. SSS: extremely difficult (requires all joint distributions). WSS: moderate (check constant mean and lag-dependent autocorrelation). Ergodic: requires checking the decay of the autocovariance.
  • Wireless example. SSS: AWGN (Gaussian i.i.d. samples). WSS: a stationary fading channel observed over many coherence times. Ergodic: noise in a long observation window, where the time average \approx the ensemble average.
  • Failure example. SSS: non-stationary interference (e.g., bursty traffic). WSS: a birth-death process with time-varying rate. Ergodic: X(t) = A for all t with random A, which is WSS but not mean-ergodic because A is constant per realisation.
  • Used for. SSS: theoretical completeness and Gaussian process analysis. WSS: PSD, the Wiener--Khinchin theorem, LTI filter analysis. Ergodic: justifying estimation of \mu, R_X(\tau), and S_X(f) from a single realisation.

The Wiener-Khinchin Theorem: From Autocorrelation to Power Spectrum

A split-screen animation showing a WSS process realisation, its autocorrelation function R_X(\tau), and the resulting power spectral density S_X(f) connected by the Fourier transform.
The Wiener--Khinchin theorem establishes a Fourier-transform pair between the autocorrelation function R_X(\tau) and the power spectral density S_X(f). A narrow autocorrelation (fast decorrelation) corresponds to a wide spectrum, and vice versa.

Why This Matters: Power Spectral Density of Digitally Modulated Signals

In a linearly modulated digital communication system, the transmitted baseband signal is

x(t) = \sum_{n=-\infty}^{\infty} a_n\,p(t - nT_s),

where \{a_n\} are the (random) data symbols with E[a_n] = 0, E[|a_n|^2] = \sigma_a^2, and E[a_n a_m^*] = 0 for n \neq m (uncorrelated symbols), p(t) is the pulse-shaping filter, and T_s is the symbol period.

Treating \{a_n\} as a WSS discrete-time process, the PSD of x(t) is given by

S_x(f) = \frac{\sigma_a^2}{T_s}\,|P(f)|^2,

where P(f) = \mathcal{F}\{p(t)\} is the Fourier transform of the pulse shape.

Key implications for system design:

  • The occupied bandwidth of x(t) is determined entirely by |P(f)|^2. A rectangular pulse gives a \mathrm{sinc}^2 PSD with slow spectral roll-off; a raised-cosine pulse confines the spectrum to bandwidth (1+\beta)/(2T_s) with roll-off factor \beta \in [0,1].

  • Spectral efficiency (bits/s/Hz) is \eta = R_b / W, where R_b is the bit rate and W is the occupied bandwidth. For M-QAM with a raised-cosine pulse: \eta = \frac{\log_2 M}{1 + \beta} bits/s/Hz.

  • When the modulated signal passes through a channel with transfer function H_c(f), the received PSD is S_r(f) = |H_c(f)|^2\,S_x(f), directly applying the LTI filtering theorem of this section.

  • The noise power in the receiver bandwidth W is P_n = N_0 W, so the received SNR is \mathrm{SNR} = P_s / (N_0 W), linking the PSD framework to link-budget analysis.
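The formula S_x(f) = (\sigma_a^2/T_s)|P(f)|^2 can be illustrated by simulation. The sketch below is a toy setup of our own: BPSK symbols (a_n = \pm 1, so \sigma_a^2 = 1) with a rectangular pulse of duration T_s, for which the theory predicts S_x(f) = T_s\,\mathrm{sinc}^2(f T_s); averaged periodograms should approach that shape.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy illustration (our own setup) of S_x(f) = (sigma_a^2/Ts)|P(f)|^2:
# BPSK symbols with a rectangular pulse of duration Ts, so the theory
# gives S_x(f) = Ts * sinc^2(f*Ts).
Ts, L = 1.0, 8                 # symbol period; samples per symbol
dt = Ts / L
K = 512                        # symbols per record
N = K * L                      # samples per record
trials = 2000

S_est = np.zeros(N)
for _ in range(trials):
    a = rng.choice([-1.0, 1.0], size=K)          # i.i.d. BPSK symbols
    x = np.repeat(a, L)                          # rectangular pulse shaping
    S_est += dt * np.abs(np.fft.fft(x))**2 / N   # periodogram PSD estimate
S_est /= trials

f = np.fft.fftfreq(N, d=dt)
S_theory = Ts * np.sinc(f * Ts)**2               # sigma_a^2 = 1
print(S_est[0], S_theory[0])                     # both close to Ts at f = 0
```

The integral of the estimated PSD also recovers the total power E[|x(t)|^2] = \sigma_a^2 = 1, matching the Wiener--Khinchin power relation.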

Quick Check

Let \{X(t)\} be a WSS process with autocorrelation R_X(\tau). Which of the following is guaranteed to be true?

R_X(\tau) \geq 0 for all \tau

|R_X(\tau)| \leq R_X(0) for all \tau

R_X(\tau) is always a monotonically decreasing function of |\tau|

S_X(f) can take negative values for some frequencies

Quick Check

White noise with PSD S_X(f) = N_0/2 is passed through a filter with |H(f)|^2 = \frac{1}{1 + (f/B)^2} (a Lorentzian/first-order RC filter). The total output noise power is:

N_0 B

\pi N_0 B

\frac{\pi N_0 B}{2}

N_0 B / \pi

Quick Check

Consider the process X(t) = A for all t, where A is a random variable with E[A] = 0 and E[A^2] = 1. This process is:

WSS and ergodic

WSS but not mean-ergodic

Not WSS

SSS but not WSS

Stochastic Process

A family \{X(t), t \in T\} of random variables defined on a common probability space, indexed by a parameter t (usually time). Each fixed t yields a random variable; each fixed outcome \omega yields a deterministic sample path.

Related: Stochastic Process, Wide-Sense Stationarity (WSS)

Wide-Sense Stationarity (WSS)

A stochastic process is WSS if its mean is constant and its autocorrelation function depends only on the time lag \tau = t_1 - t_2: E[X(t)] = \mu for all t and R_X(t_1,t_2) = R_X(t_1 - t_2). WSS is the standard assumption for spectral analysis and LTI filter theory applied to random signals.

Related: Wide-Sense Stationarity (WSS), Strict-Sense Stationarity (SSS), Properties of the WSS Autocorrelation Function

Autocorrelation Function

For a WSS process, R_X(\tau) = E[X(t+\tau)X^*(t)]. It measures the statistical similarity between the process at two time instants separated by lag \tau. The Fourier transform of R_X(\tau) is the power spectral density S_X(f).

Related: Mean, Autocorrelation, and Autocovariance Functions, Power Spectral Density (PSD), Wiener--Khinchin Theorem

Power Spectral Density (PSD)

The Fourier transform of the autocorrelation function of a WSS process: S_X(f) = \int R_X(\tau) e^{-j2\pi f\tau}\,d\tau. It describes the distribution of average power across frequency, is always nonnegative, and satisfies \int S_X(f)\,df = R_X(0) = E[|X(t)|^2].

Related: Power Spectral Density (PSD), Wiener--Khinchin Theorem, WSS Process Through a Linear Time-Invariant (LTI) System

Wiener--Khinchin Theorem

The theorem establishing that the PSD and autocorrelation function of a WSS process form a Fourier transform pair: S_X(f) = \mathcal{F}\{R_X(\tau)\}. It guarantees S_X(f) \geq 0 and provides the average-power relation P_X = \int S_X(f)\,df = R_X(0).

Related: Wiener--Khinchin Theorem, Power Spectral Density (PSD), Mean, Autocorrelation, and Autocovariance Functions

Ergodicity

The property that time averages from a single sample path converge to ensemble averages. A mean-ergodic process satisfies \frac{1}{2T}\int_{-T}^{T} X(t)\,dt \to E[X(t)] as T \to \infty. Ergodicity justifies estimating statistical quantities (mean, PSD) from a single long observation of the process.

Related: Ergodicity, Wide-Sense Stationarity (WSS)

Common Mistake: Assuming Ergodicity Without Checking

Mistake:

"The process is WSS, so we can estimate its mean and PSD from a single time record."

Correction:

WSS does not imply ergodicity. A classic counterexample is the constant-amplitude process X(t) = A for all t, where A is a zero-mean unit-variance random variable. This process is WSS:

  • E[X(t)] = E[A] = 0 (constant mean).
  • R_X(\tau) = E[A^2] = 1 (depends only on \tau, trivially).

Yet the time average of every sample path is \langle X(t)\rangle_T = A, which is a random variable --- it does not converge to E[A] = 0. The process is WSS but not ergodic.

When is the assumption safe? A sufficient condition for mean-ergodicity is that the autocovariance C_X(\tau) decays to zero as |\tau| \to \infty. Physically, this means the process "forgets" its past --- future samples are asymptotically uncorrelated with past samples. Most stationary noise and interference processes in communications satisfy this condition. However:

  • Processes with a nonzero DC component (e.g., a random but time-invariant mean) may fail mean-ergodicity.
  • Processes with persistent periodic components (e.g., X(t) = A\cos(2\pi f_0 t + \Theta) with random A and \Theta) require separate analysis for each component.

Always verify the decay of C_X(\tau) before replacing ensemble averages with time averages in a measurement or simulation.
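The counterexample above can be reproduced in a few lines. This discrete-time sketch (toy sizes of our own) draws one amplitude A per realisation, so every time average equals A itself and never concentrates around E[A] = 0; an i.i.d.-noise path is shown alongside as the ergodic contrast.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulation of the counterexample above (discrete-time toy version): every
# sample path of X(t) = A is constant, so its time average equals A itself
# and does not converge to E[A] = 0, no matter how long the record.
trials, N = 10_000, 100_000
A = rng.standard_normal(trials)     # one amplitude per realisation

time_avgs = A                       # time average of a constant path is A
ensemble_avg = A.mean()             # ensemble average over realisations ~ 0

# Ergodic contrast: for i.i.d. noise, one long time average does converge.
W_time_avg = rng.standard_normal(N).mean()

print(np.std(time_avgs), ensemble_avg, W_time_avg)
```

The spread of the time averages stays at 1 (the standard deviation of A) regardless of record length, while the single i.i.d.-noise time average is already near zero.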

Key Takeaway

Two results form the bedrock of noise and signal analysis in communication systems:

  1. Wiener--Khinchin theorem: The PSD and autocorrelation of a WSS process are a Fourier transform pair, and S_X(f) \geq 0. This lets us move freely between the time domain (autocorrelation, correlation times, fading memory) and the frequency domain (bandwidth, spectral shape, noise floors).

  2. LTI filtering rule: S_Y(f) = |H(f)|^2\,S_X(f). This single equation answers the core question of receiver design: how much noise power appears at the output of a filter? It directly yields:

    • Output noise power: P_n = \int |H(f)|^2 S_n(f)\,df = N_0 B_{\text{eq}} for white noise through a unity-gain filter.
    • Noise equivalent bandwidth: B_{\text{eq}} = \frac{1}{2|H(f_{\max})|^2}\int |H(f)|^2\,df.
    • SNR at the filter output, and hence bit-error-rate performance.

Together, these results transform the abstract notion of a "random waveform" into concrete, computable quantities --- bandwidth, power, and signal-to-noise ratio --- that drive every design decision in a communication link.
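The noise-equivalent-bandwidth recipe above is easy to exercise numerically. This sketch (our own toy values for B and N_0) applies it to a first-order lowpass |H(f)|^2 = 1/(1 + (f/B)^2), whose closed forms are B_{\text{eq}} = \pi B / 2 and hence P_n = N_0 B_{\text{eq}} = \pi N_0 B / 2.

```python
import numpy as np

# Numerical check (our own toy numbers) of the noise-equivalent-bandwidth
# definition above for a first-order lowpass |H(f)|^2 = 1/(1 + (f/B)^2).
# Closed forms: B_eq = pi*B/2, hence P_n = N0 * B_eq = pi*N0*B/2.
B, N0 = 1e6, 4e-21                           # Hz, W/Hz

f = np.linspace(-1e9, 1e9, 2_000_001)        # wide grid to capture the tails
df = f[1] - f[0]
H2 = 1.0 / (1.0 + (f / B)**2)                # Lorentzian power response

B_eq = np.sum(H2) * df / (2 * np.max(H2))    # definition from the text
P_n = (N0 / 2) * np.sum(H2) * df             # white noise through the filter

print(B_eq / (np.pi * B / 2), P_n / (np.pi * N0 * B / 2))   # both ~ 1
```

The same two-line computation works for any measured or simulated |H(f)|^2, which is how B_{\text{eq}} is obtained for filters without a closed-form integral.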

The bridge to later chapters:

  • Matched filtering (Chapter 3): the filter H(f) that maximises output SNR is derived by applying the LTI rule to signal-plus-noise.
  • Wiener filtering: the filter that minimises mean-square error between the desired and actual output is expressed via S_X(f) and the cross-PSD S_{XY}(f).
  • Channel capacity (Chapter 7): the water-filling power allocation across frequency sub-bands is an optimisation over the channel gain-to-noise ratio |H_c(f)|^2 / S_n(f).

Historical Note: Norbert Wiener and Aleksandr Khinchin

The relationship between autocorrelation and power spectrum was established independently by two mathematicians in the early 1930s.

Aleksandr Yakovlevich Khinchin (1894--1959), a Soviet mathematician, proved in 1934 that the autocorrelation function of a stationary process is the Fourier transform of a nonnegative measure (the spectral measure), as a consequence of Bochner's theorem on positive-definite functions.

Norbert Wiener (1894--1964), an American mathematician at MIT, arrived at the same result from a different direction: his 1930 work on "generalised harmonic analysis" defined the power spectrum for functions that are not square-integrable (and hence do not have ordinary Fourier transforms) by using time-averaged periodograms. Wiener later applied this theory to the optimal filtering problem during World War II, resulting in the Wiener filter --- the foundation of statistical signal processing.

The theorem bearing both their names is one of the most widely applied results in engineering. Every spectrum analyser, every noise-figure measurement, and every link-budget calculation implicitly relies on the Wiener--Khinchin correspondence between time-domain correlations and frequency-domain power distributions.
