Random (Stochastic) Processes — Fundamentals
Why Stochastic Processes for Communications?
A communication signal is not a single number drawn from a probability distribution --- it is a random function of time. When we write the received baseband waveform

$$r(t) = h(t)\,s(t) + n(t),$$

neither the noise $n(t)$ nor, in a fading channel, the faded signal $h(t)\,s(t)$ is deterministic. At each instant these are random variables, but they also possess temporal structure: the noise sample at time $t$ is correlated with the sample at $t+\tau$ if $\tau$ is small enough, and the channel gain $h(t)$ drifts according to the Doppler spread.
A single random variable captures the statistics of one observation. A random vector captures finitely many observations. A stochastic process captures the statistics of an entire time-indexed family of random variables and is therefore the natural mathematical language for signals, noise, interference, and channels.
This section introduces the foundational concepts:
- Stationarity (when does the statistical character of a process not change with time?).
- Autocorrelation and power spectral density (how is power distributed across frequency?).
- Ergodicity (when can we replace ensemble averages with time averages?).
- LTI filtering of random processes (the key link between signal processing and probability).
These tools are prerequisites for everything that follows: noise analysis, matched-filter detection, Wiener filtering, and the capacity of band-limited channels.
Definition: Stochastic Process
Stochastic Process
A stochastic (random) process is a family of random variables

$$\{X(t, \omega) : t \in \mathcal{T}\},$$

defined on a common probability space $(\Omega, \mathcal{F}, P)$ and indexed by a parameter $t$ belonging to an index set $\mathcal{T}$.
Interpretations and terminology:
- Fixed $t$, varying $\omega$: For a fixed time $t_0$, $X(t_0, \omega)$ is an ordinary random variable on $\Omega$.
- Fixed $\omega$, varying $t$: For a fixed outcome $\omega_0$, the function $t \mapsto X(t, \omega_0)$ is a deterministic function of time called a sample path (or realisation) of the process.
- Both varying: The full object $X(t, \omega)$ is a function of two variables --- time and randomness.
Classification by index set:
| Index set | Name | Notation |
|---|---|---|
| $\mathcal{T} = \mathbb{R}$ (or an interval) | Continuous-time process | $X(t)$ |
| $\mathcal{T} = \mathbb{Z}$ (or $\mathbb{N}$) | Discrete-time process | $X[n]$ |
Classification by state space:
- Continuous-valued: $X(t) \in \mathbb{R}$ or $\mathbb{C}$ (e.g., thermal noise voltage).
- Discrete-valued: $X(t)$ takes values in a finite or countable set (e.g., the state of a Markov chain modelling a fading channel).
In this text, unless stated otherwise, $X(t)$ denotes a continuous-time, complex-valued stochastic process.
In telecommunications, the most common stochastic processes are: (i) additive white Gaussian noise (AWGN), (ii) the time-varying channel gain in a fading environment, and (iii) the information-bearing signal itself, which is modelled as random to apply information-theoretic results. Each of these is a family of random variables indexed by continuous time.
Definition: Mean, Autocorrelation, and Autocovariance Functions
Mean, Autocorrelation, and Autocovariance Functions
Let $X(t)$ be a stochastic process.
Mean function: $\mu_X(t) = \mathbb{E}[X(t)]$.
Autocorrelation function: $R_X(t_1, t_2) = \mathbb{E}[X(t_1)\,X^*(t_2)]$, where $(\cdot)^*$ denotes complex conjugation.
Autocovariance function: $C_X(t_1, t_2) = R_X(t_1, t_2) - \mu_X(t_1)\,\mu_X^*(t_2)$.
Cross-correlation between two processes $X(t)$ and $Y(t)$: $R_{XY}(t_1, t_2) = \mathbb{E}[X(t_1)\,Y^*(t_2)]$.
The autocorrelation function is the fundamental second-order descriptor of a stochastic process. It captures how the process at time $t_1$ is statistically related to the process at time $t_2$.
The complex conjugation in $R_X(t_1, t_2)$ ensures that $R_X(t, t) = \mathbb{E}[|X(t)|^2] \geq 0$, which we interpret as the instantaneous power of the process at time $t$. For real-valued processes, the conjugation has no effect and may be dropped.
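The ensemble-average definitions above are easy to check numerically. The sketch below is a minimal numpy example; the toy process, its parameters, and all variable names are illustrative assumptions, not from the text. It estimates $\mu_X(t)$ and $R_X(t_1, t_2)$ by averaging over many independent realisations:

```python
import numpy as np

# Monte Carlo estimate of the mean and autocorrelation functions of the toy
# process X(t) = A*cos(2*pi*t) + N(t), where A ~ N(1, 0.25) is drawn once per
# realisation and N(t) is white Gaussian noise (illustrative choice).
rng = np.random.default_rng(0)
n_paths, n_samples = 20000, 64
t = np.linspace(0.0, 1.0, n_samples)

A = 1.0 + 0.5 * rng.standard_normal((n_paths, 1))
X = A * np.cos(2 * np.pi * t) + 0.1 * rng.standard_normal((n_paths, n_samples))

# Ensemble averages: mu_X(t) = E[X(t)], R_X(t1, t2) = E[X(t1) X*(t2)]
mu_hat = X.mean(axis=0)                 # shape (n_samples,)
R_hat = (X.T @ X.conj()) / n_paths      # shape (n_samples, n_samples)

# Theory: mu_X(t) = E[A] cos(2 pi t) = cos(2 pi t)
mu_theory = np.cos(2 * np.pi * t)
err_mu = np.max(np.abs(mu_hat - mu_theory))
print(f"max |mu_hat - mu_theory| = {err_mu:.3f}")
```

Note that both $\mu_X(t)$ and the diagonal $R_X(t, t) = \mathbb{E}[A^2]\cos^2(2\pi t) + 0.01$ vary with $t$: this process is not stationary, which motivates the definitions that follow.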
Definition: Strict-Sense Stationarity (SSS)
Strict-Sense Stationarity (SSS)
A stochastic process $X(t)$ is strict-sense stationary (SSS) if its complete statistical description is invariant under time shifts. Formally, for every positive integer $n$, every set of time instants $t_1, \ldots, t_n$, and every time shift $\tau$ such that $t_i + \tau \in \mathcal{T}$ for all $i$, the joint distribution of $\big(X(t_1+\tau), \ldots, X(t_n+\tau)\big)$ is identical to that of $\big(X(t_1), \ldots, X(t_n)\big)$:

$$F_{X(t_1+\tau), \ldots, X(t_n+\tau)}(x_1, \ldots, x_n) = F_{X(t_1), \ldots, X(t_n)}(x_1, \ldots, x_n)$$

for all $(x_1, \ldots, x_n) \in \mathbb{R}^n$ (or $\mathbb{C}^n$) and all valid $\tau$.
Consequences of SSS:
- Setting $n = 1$: the first-order distribution does not depend on $t$. In particular, $\mu_X(t) = \mu_X$ (constant mean) and $\mathbb{E}[|X(t)|^2]$ is constant.
- Setting $n = 2$: the joint distribution of $\big(X(t_1), X(t_2)\big)$ depends only on the difference $t_1 - t_2$, not on the absolute times. Hence $R_X(t_1, t_2) = R_X(t_1 - t_2)$.
SSS is a very strong condition: it requires all finite-dimensional distributions to be time-invariant. In practice it is rarely verified directly, and the weaker notion of wide-sense stationarity is used instead.
Strict-sense stationarity implies wide-sense stationarity (WSS) but the converse is false in general. The one important exception is the Gaussian process: because a Gaussian process is completely determined by its mean and autocorrelation, a Gaussian WSS process is automatically SSS.
Definition: Wide-Sense Stationarity (WSS)
Wide-Sense Stationarity (WSS)
A stochastic process $X(t)$ is wide-sense stationary (WSS) if it satisfies two conditions:
- Constant mean: $\mu_X(t) = \mathbb{E}[X(t)] = \mu_X$ for all $t$.
- Autocorrelation depends only on the time difference (lag): $R_X(t_1, t_2) = R_X(t_1 - t_2)$. Writing $\tau = t_1 - t_2$:

$$R_X(\tau) = \mathbb{E}[X(t+\tau)\,X^*(t)] \quad \text{for all } t.$$
Key properties of the WSS autocorrelation function $R_X(\tau)$:
- $R_X(0) = \mathbb{E}[|X(t)|^2] \geq 0$: $R_X(0)$ is the average power of the process.
- Maximum at the origin: $|R_X(\tau)| \leq R_X(0)$ for all $\tau$.
- Hermitian symmetry: $R_X(-\tau) = R_X^*(\tau)$. For real-valued processes this simplifies to $R_X(-\tau) = R_X(\tau)$ (even symmetry).
- Positive semidefiniteness: For any set of times $t_1, \ldots, t_n$ and complex coefficients $a_1, \ldots, a_n$:

$$\sum_{i=1}^{n}\sum_{j=1}^{n} a_i\, a_j^*\, R_X(t_i - t_j) \geq 0.$$
Wide-sense stationarity is the "working assumption" throughout signal processing and communications. It is much easier to verify than SSS (only the first two moments must be checked), and it suffices for all linear processing operations: matched filtering, Wiener filtering, linear MMSE estimation, and spectral analysis all require only the mean and autocorrelation function.
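The two WSS conditions can be checked empirically for the classic random-phase sinusoid $X(t) = \cos(2\pi f_0 t + \Theta)$ with $\Theta \sim \mathrm{Uniform}[0, 2\pi)$, for which $\mu_X = 0$ and $R_X(\tau) = \tfrac{1}{2}\cos(2\pi f_0 \tau)$. A minimal numpy sketch (the sample times and parameters are illustrative assumptions):

```python
import numpy as np

# Empirical WSS check for X(t) = cos(2*pi*f0*t + Theta), Theta ~ U[0, 2*pi).
# Classical results: mu_X(t) = 0 and R_X(tau) = 0.5*cos(2*pi*f0*tau),
# independent of the absolute time t.
rng = np.random.default_rng(1)
f0, n_paths = 1.0, 200000
theta = rng.uniform(0.0, 2.0 * np.pi, n_paths)

def X(t):
    return np.cos(2 * np.pi * f0 * t + theta)

tau = 0.3
# Evaluate mean and autocorrelation at two different absolute times
mu_a, mu_b = X(0.1).mean(), X(0.7).mean()
R_a = np.mean(X(0.1 + tau) * X(0.1))
R_b = np.mean(X(0.7 + tau) * X(0.7))
R_theory = 0.5 * np.cos(2 * np.pi * f0 * tau)
print(mu_a, mu_b, R_a, R_b, R_theory)
```

Both means are near zero and both correlation estimates agree with each other and with $\tfrac{1}{2}\cos(2\pi f_0 \tau)$, regardless of the absolute time chosen.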
Theorem: Properties of the WSS Autocorrelation Function
Let $X(t)$ be a WSS process with autocorrelation $R_X(\tau)$. Then:
- $R_X(0) \geq 0$.
- $R_X(-\tau) = R_X^*(\tau)$ for all $\tau$ (Hermitian symmetry).
- $|R_X(\tau)| \leq R_X(0)$ for all $\tau$ (maximum at the origin).
Property 1 says average power is nonnegative. Property 2 reflects the conjugate symmetry inherent in the inner product $\mathbb{E}[X(t+\tau)\,X^*(t)]$. Property 3 states that a process is most correlated with itself at zero lag --- intuitively, the best predictor of $X(t)$ is $X(t)$ itself.
For property 1, evaluate $R_X(\tau) = \mathbb{E}[X(t+\tau)\,X^*(t)]$ at $\tau = 0$.
For property 2, swap the roles of $X(t+\tau)$ and $X(t)$ in the definition.
For property 3, use the Cauchy--Schwarz inequality.
Proof of (1): $R_X(0) \geq 0$
$R_X(0) = \mathbb{E}[X(t)\,X^*(t)] = \mathbb{E}[|X(t)|^2] \geq 0$, since $|X(t)|^2 \geq 0$. $\square$
Proof of (2): Hermitian symmetry
By the WSS assumption, $R_X(\tau) = \mathbb{E}[X(t+\tau)\,X^*(t)]$ for all $t$. Replacing $\tau$ with $-\tau$: $R_X(-\tau) = \mathbb{E}[X(t-\tau)\,X^*(t)]$. Now set $s = t - \tau$ (a valid relabelling since WSS holds for all $t$): $R_X(-\tau) = \mathbb{E}[X(s)\,X^*(s+\tau)] = \big(\mathbb{E}[X(s+\tau)\,X^*(s)]\big)^* = R_X^*(\tau)$. The conjugation arises because $\mathbb{E}[U\,V^*] = \big(\mathbb{E}[V\,U^*]\big)^*$ applied with $U = X(s)$, $V = X(s+\tau)$. $\square$
Proof of (3): Maximum at the origin
Apply the Cauchy--Schwarz inequality for random variables, $|\mathbb{E}[U\,V^*]|^2 \leq \mathbb{E}[|U|^2]\,\mathbb{E}[|V|^2]$, with $U = X(t+\tau)$ and $V = X(t)$. By WSS, $\mathbb{E}[|X(t+\tau)|^2] = R_X(0)$ and $\mathbb{E}[|X(t)|^2] = R_X(0)$. Therefore $|R_X(\tau)|^2 \leq R_X(0)^2$, which gives $|R_X(\tau)| \leq R_X(0)$. $\square$
Definition: Power Spectral Density (PSD)
Power Spectral Density (PSD)
Let $X(t)$ be a WSS process with autocorrelation function $R_X(\tau)$. The power spectral density (PSD) of $X(t)$ is the Fourier transform of $R_X(\tau)$:

$$S_X(f) = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j2\pi f\tau}\, d\tau.$$

Conversely, the autocorrelation function is recovered by the inverse Fourier transform:

$$R_X(\tau) = \int_{-\infty}^{\infty} S_X(f)\, e^{+j2\pi f\tau}\, df.$$

Units: If $X(t)$ has units of volts (V), then $R_X(\tau)$ has units of $\mathrm{V}^2$ and $S_X(f)$ has units of $\mathrm{V}^2/\mathrm{Hz}$ (watts per hertz for a 1-ohm load).
Key properties of the PSD:
- $S_X(f) \geq 0$ for all $f$ (nonnegative).
- For real-valued processes: $S_X(-f) = S_X(f)$ (even symmetry).
- For complex-valued processes: $S_X(-f) \neq S_X(f)$ in general; nevertheless, $S_X(f)$ is real and nonnegative, a consequence of the Hermitian symmetry and positive semidefiniteness of $R_X(\tau)$.
The PSD tells us how the average power of the process is distributed across frequency. In communications, the PSD of the transmitted signal determines the occupied bandwidth and hence the spectral efficiency, while the PSD of the noise determines the noise power in any given frequency band.
Theorem: Wiener--Khinchin Theorem
Let $X(t)$ be a WSS process with autocorrelation function $R_X(\tau)$. Then the power spectral density and the autocorrelation form a Fourier transform pair:

$$S_X(f) = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j2\pi f\tau}\, d\tau \quad \Longleftrightarrow \quad R_X(\tau) = \int_{-\infty}^{\infty} S_X(f)\, e^{+j2\pi f\tau}\, df.$$

Moreover:
- Nonnegativity: $S_X(f) \geq 0$ for all $f$.
- Average power: Setting $\tau = 0$ in the inverse relation:

$$\mathbb{E}[|X(t)|^2] = R_X(0) = \int_{-\infty}^{\infty} S_X(f)\,df.$$

The total area under the PSD equals the average power of the process.
The Wiener--Khinchin theorem is the stochastic analog of Parseval's theorem for deterministic signals. Just as the energy of a deterministic signal $g(t)$ can be computed by integrating its energy spectral density $|G(f)|^2$, the average power of a WSS random process can be computed by integrating its PSD $S_X(f)$. The PSD replaces the (non-existent) Fourier transform of a random signal with a well-defined, deterministic spectral description.
The Fourier transform pair is essentially a definition (given in the definition of the PSD above); the nontrivial content is the nonnegativity of $S_X(f)$.
For nonnegativity, use the positive-semidefiniteness of $R_X(\tau)$.
Proof sketch: nonnegativity of $S_X(f)$
The autocorrelation of a WSS process is a positive-semidefinite function, meaning that for any $n$, any times $t_1, \ldots, t_n$, and any complex weights $a_1, \ldots, a_n$:

$$\sum_{i=1}^{n}\sum_{j=1}^{n} a_i\, a_j^*\, R_X(t_i - t_j) \geq 0.$$

This is verified directly:

$$\sum_{i,j} a_i\, a_j^*\, R_X(t_i - t_j) = \sum_{i,j} a_i\, a_j^*\, \mathbb{E}\big[X(t_i)\,X^*(t_j)\big] = \mathbb{E}\!\left[\Big|\sum_{i=1}^{n} a_i\, X(t_i)\Big|^2\right] \geq 0.$$

By Bochner's theorem (a classical result in harmonic analysis), a continuous positive-semidefinite function has a nonnegative Fourier transform. Therefore $S_X(f) \geq 0$ for all $f$.
Average power relation
Setting $\tau = 0$ in the inverse Fourier transform: $R_X(0) = \int_{-\infty}^{\infty} S_X(f)\,df$. Since $R_X(0) = \mathbb{E}[|X(t)|^2]$, we obtain

$$\mathbb{E}[|X(t)|^2] = \int_{-\infty}^{\infty} S_X(f)\,df. \;\square$$
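The transform pair and the average-power relation can be illustrated numerically. A minimal sketch, assuming the standard example pair $R_X(\tau) = e^{-a|\tau|} \leftrightarrow S_X(f) = \frac{2a}{a^2 + (2\pi f)^2}$ (an illustrative choice, not from the text):

```python
import numpy as np

# Numerical illustration of the Wiener--Khinchin pair
#   R_X(tau) = exp(-a|tau|)  <-->  S_X(f) = 2a / (a^2 + (2*pi*f)^2),
# and of the average-power relation R_X(0) = integral of S_X(f) df = 1.
a = 2.0
f = np.linspace(-200.0, 200.0, 400001)
S = 2 * a / (a**2 + (2 * np.pi * f) ** 2)
df = f[1] - f[0]

# Average power: area under the PSD should equal R_X(0) = 1
power = np.sum(S) * df

# Inverse transform at lag tau = 0.4 should recover R_X(tau) = exp(-a*tau);
# the imaginary part vanishes because S is even, so only cos() contributes.
tau = 0.4
R_tau = np.sum(S * np.cos(2 * np.pi * f * tau)) * df
print(power, R_tau, np.exp(-a * tau))
```

Both checks agree to within the truncation error of the finite frequency grid.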
Definition: Ergodicity
Ergodicity
A WSS stochastic process is ergodic if time averages computed from a single, infinitely long sample path converge to the corresponding ensemble (statistical) averages.
Mean-ergodic: The process is mean-ergodic if the time-averaged mean converges (in mean square) to the ensemble mean:

$$\langle X \rangle_T = \frac{1}{2T}\int_{-T}^{T} X(t)\,dt \;\xrightarrow[T \to \infty]{\text{m.s.}}\; \mu_X.$$

Autocorrelation-ergodic: The process is autocorrelation-ergodic if

$$\frac{1}{2T}\int_{-T}^{T} X(t+\tau)\,X^*(t)\,dt \;\xrightarrow[T \to \infty]{\text{m.s.}}\; R_X(\tau) \quad \text{for every lag } \tau.$$

A process that is both mean-ergodic and autocorrelation-ergodic is simply called ergodic (in the wide sense).
Sufficient condition for mean-ergodicity:

$$\lim_{T \to \infty} \frac{1}{2T}\int_{-2T}^{2T}\left(1 - \frac{|\tau|}{2T}\right) C_X(\tau)\,d\tau = 0,$$

where $C_X(\tau)$ is the autocovariance. In particular, if $C_X(\tau) \to 0$ as $|\tau| \to \infty$ (i.e., the process "forgets" its past), the process is mean-ergodic.
Ergodicity is the bridge between theory and measurement. In practice, we observe one realisation of a stochastic process (e.g., one received signal waveform) and estimate its statistics (mean, power, PSD) by time-averaging. This procedure is justified only if the process is ergodic. Most WSS processes encountered in communications (stationary noise, stationary fading channels observed over time scales much longer than the coherence time) are assumed ergodic, but the assumption must be verified --- see the Pitfall below.
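The convergence of a single-path time average can be seen in simulation. The sketch below uses a discrete-time AR(1) process as a stand-in for a process whose autocovariance decays to zero (an illustrative choice; the coefficient, length, and seed are assumptions):

```python
import numpy as np

# Time average vs ensemble average for a mean-ergodic process. The AR(1)
# recursion X[n] = rho*X[n-1] + W[n] (zero-mean white W) has autocovariance
# C_X(k) proportional to rho^|k| -> 0, so it is mean-ergodic: the time
# average of one long sample path approaches the ensemble mean, which is 0.
rng = np.random.default_rng(2)
rho, n = 0.9, 200000
w = rng.standard_normal(n)
x = np.empty(n)
x[0] = w[0]
for i in range(1, n):
    x[i] = rho * x[i - 1] + w[i]

time_avg = x.mean()          # single-path time average
print(f"time average = {time_avg:.4f}  (ensemble mean = 0)")
```

Contrast this with the process $X(t) = A$ discussed in the Pitfall below, whose autocovariance never decays and whose time average stays random.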
Theorem: WSS Process Through a Linear Time-Invariant (LTI) System
Let $X(t)$ be a WSS process with autocorrelation $R_X(\tau)$ and PSD $S_X(f)$. Let $h(t)$ be the impulse response of a stable LTI system with frequency response $H(f)$. The output process

$$Y(t) = (h * X)(t) = \int_{-\infty}^{\infty} h(u)\, X(t-u)\, du$$

is also WSS, and its statistics are:
- Mean: $\mu_Y = \mu_X\, H(0)$
- Autocorrelation: $R_Y(\tau) = R_X(\tau) * h(\tau) * h^*(-\tau)$
- Power spectral density: $S_Y(f) = |H(f)|^2\, S_X(f)$
- Cross-PSD (input--output): $S_{YX}(f) = H(f)\, S_X(f)$
- Output power: $\mathbb{E}[|Y(t)|^2] = R_Y(0) = \int_{-\infty}^{\infty} |H(f)|^2\, S_X(f)\, df$
The PSD filtering rule $S_Y(f) = |H(f)|^2 S_X(f)$ is arguably the single most important result in stochastic signal processing. It says that an LTI filter shapes the power spectrum of a random process by multiplying by the squared magnitude of its frequency response --- exactly as one would expect from the deterministic relation $Y(f) = H(f)\,X(f)$, but now applied to power densities rather than amplitude spectra. Phase information in $H(f)$ does not affect the output PSD.
Start from $Y(t) = \int h(u)\,X(t-u)\,du$ and compute $R_Y(t+\tau, t) = \mathbb{E}[Y(t+\tau)\,Y^*(t)]$.
Exchange expectation and integration (justified by stability of $h$).
Take the Fourier transform of $R_Y(\tau)$ and use the convolution theorem.
Step 1: Output autocorrelation

$$R_Y(t+\tau, t) = \mathbb{E}\!\left[\int h(u)\,X(t+\tau-u)\,du \left(\int h(v)\,X(t-v)\,dv\right)^{\!*}\right] = \int\!\!\int h(u)\,h^*(v)\,R_X(\tau - u + v)\,du\,dv.$$

The result depends only on the lag $\tau$, so $Y(t)$ is WSS.
Step 2: Fourier transform to obtain PSD
Taking the Fourier transform of $R_Y(\tau)$ with respect to $\tau$, substituting the double-integral expression for $R_Y(\tau)$, and exchanging orders of integration:

$$S_Y(f) = \int\!\!\int h(u)\,h^*(v) \left[\int R_X(\tau - u + v)\,e^{-j2\pi f\tau}\,d\tau\right] du\,dv,$$

where the inner integral, after the substitution $\lambda = \tau - u + v$, equals $S_X(f)\,e^{-j2\pi f(u-v)}$. Recognising the three integrals:

$$S_Y(f) = \underbrace{\int h(u)\,e^{-j2\pi fu}\,du}_{H(f)} \;\underbrace{\int h^*(v)\,e^{+j2\pi fv}\,dv}_{H^*(f)} \; S_X(f) = |H(f)|^2\, S_X(f).$$
Step 3: Mean of the output

$$\mu_Y = \mathbb{E}[Y(t)] = \int h(u)\,\mathbb{E}[X(t-u)]\,du = \mu_X \int h(u)\,du = \mu_X\, H(0). \;\square$$
Step 4: Output power
Setting $\tau = 0$ in the inverse Fourier transform of $S_Y(f)$:

$$\mathbb{E}[|Y(t)|^2] = R_Y(0) = \int_{-\infty}^{\infty} S_Y(f)\,df = \int_{-\infty}^{\infty} |H(f)|^2\, S_X(f)\,df.$$
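The filtering rule $S_Y(f) = |H(f)|^2 S_X(f)$ is easy to verify numerically in discrete time. A minimal sketch, assuming an arbitrary short FIR filter and unit-variance white noise (both illustrative choices, not from the text):

```python
import numpy as np

# Verify S_Y(f) = |H(f)|^2 S_X(f) for discrete-time white noise (S_X = 1)
# passed through a short FIR filter. The output PSD is estimated by
# averaging periodograms over many independent noise segments.
rng = np.random.default_rng(3)
h = np.array([0.5, 0.3, 0.2])      # example FIR impulse response (assumption)
nfft, n_seg = 256, 8000

psd = np.zeros(nfft)
for _ in range(n_seg):
    x = rng.standard_normal(nfft + len(h) - 1)   # white noise segment
    y = np.convolve(x, h, mode="valid")          # filtered output, length nfft
    psd += np.abs(np.fft.fft(y)) ** 2 / nfft     # periodogram of this segment
psd /= n_seg

H = np.fft.fft(h, nfft)            # frequency response on the FFT grid
theory = np.abs(H) ** 2            # |H(f)|^2 * S_X(f) with S_X(f) = 1
err = np.max(np.abs(psd - theory))
print(f"max PSD error = {err:.3f}")
```

The averaged periodogram matches $|H(f)|^2$ at every FFT bin to within the Monte Carlo fluctuation, and, as the theorem promises, the phase of $H(f)$ plays no role.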
PSD Filtering of a WSS Process Through an LTI System
[Interactive figure: panels show (1) the input PSD $S_X(f)$, (2) the filter power response $|H(f)|^2$, and (3) the output PSD $S_Y(f) = |H(f)|^2\,S_X(f)$, for selectable input spectra (white, band-limited, colored/shaped) and filter types (lowpass, bandpass, raised cosine) with adjustable bandwidth and roll-off. The shaded area under $S_Y(f)$ equals the output power $\mathbb{E}[|Y(t)|^2]$.]
Example: Bandpass Filtering of White Noise
White noise with two-sided PSD $S_X(f) = \frac{N_0}{2}$ (watts/Hz) is passed through an ideal bandpass filter centred at $f_c$ with one-sided bandwidth $B$ (i.e., the filter passes $f_c - \frac{B}{2} \leq f \leq f_c + \frac{B}{2}$ and the mirror band $-f_c - \frac{B}{2} \leq f \leq -f_c + \frac{B}{2}$, and rejects everything else).
(a) Find the output PSD $S_Y(f)$.
(b) Compute the output noise power $P_Y = \mathbb{E}[|Y(t)|^2]$.
(c) Define and compute the noise equivalent bandwidth $B_N$ of the filter.
Step 1: Characterise the filter
The ideal bandpass filter has frequency response magnitude:

$$|H(f)| = \begin{cases} 1, & f_c - \frac{B}{2} \leq |f| \leq f_c + \frac{B}{2} \\ 0, & \text{otherwise} \end{cases}$$

This is a rectangular window of total width $B$ centred at $\pm f_c$.
Step 2: Output PSD
By the LTI filtering theorem:

$$S_Y(f) = |H(f)|^2\, S_X(f) = \begin{cases} \frac{N_0}{2}, & f_c - \frac{B}{2} \leq |f| \leq f_c + \frac{B}{2} \\ 0, & \text{otherwise} \end{cases}$$

The output PSD is a flat spectrum confined to the passband of the filter: the noise is now band-limited to bandwidth $B$ around $\pm f_c$.
Step 3: Output noise power

$$P_Y = \int_{-\infty}^{\infty} S_Y(f)\,df = \frac{N_0}{2} \times (2B) = N_0 B,$$

since the passband occupies width $B$ at positive frequencies and width $B$ at negative frequencies.
Step 4: Noise equivalent bandwidth
For a general (non-ideal) filter with peak gain $|H(f_0)|$, the noise equivalent bandwidth is defined as

$$B_N = \frac{\int_{-\infty}^{\infty} |H(f)|^2\,df}{2\,|H(f_0)|^2}.$$

The factor of 2 in the denominator converts from two-sided to one-sided bandwidth. For the ideal bandpass filter above, $|H(f_0)|^2 = 1$ and $\int_{-\infty}^{\infty} |H(f)|^2\,df = 2B$, so $B_N = B$, confirming consistency. For realistic filters (e.g., Butterworth, Chebyshev), $B_N$ is slightly larger than the 3-dB bandwidth.
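As a numerical illustration of that last remark, the sketch below computes $B_N$ for a first-order RC lowpass, whose power response is the Lorentzian $|H(f)|^2 = 1/(1 + (f/f_0)^2)$; analytically $B_N = \pi f_0/2 \approx 1.57\,f_0$, which exceeds the 3-dB bandwidth $f_0$ (the value of $f_0$ is an arbitrary assumption):

```python
import numpy as np

# Noise equivalent bandwidth of a first-order RC lowpass with
# |H(f)|^2 = 1 / (1 + (f/f0)^2). Analytically B_N = pi*f0/2, which is
# larger than the 3-dB bandwidth f0 by a factor pi/2 ~ 1.57.
f0 = 1.0e3                                  # 3-dB bandwidth in Hz (example value)
f = np.linspace(-1.0e6, 1.0e6, 2000001)     # dense frequency grid, 1 Hz spacing
H2 = 1.0 / (1.0 + (f / f0) ** 2)

df = f[1] - f[0]
B_N = np.sum(H2) * df / (2.0 * H2.max())    # B_N = int |H|^2 df / (2 |H|^2_peak)
print(f"B_N = {B_N:.1f} Hz, pi*f0/2 = {np.pi * f0 / 2:.1f} Hz")
```

White noise of two-sided PSD $N_0/2$ through this filter therefore delivers output power $N_0 B_N = \pi N_0 f_0 / 2$, the same result derived in the Quick Check below via the Lorentzian integral.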
Strict-Sense Stationarity vs. Wide-Sense Stationarity vs. Ergodicity
| Property | SSS | WSS | Ergodic |
|---|---|---|---|
| Definition | All finite-dimensional distributions are shift-invariant | Constant mean; autocorrelation depends only on lag | Time averages converge to ensemble averages |
| What it constrains | All joint distributions of $(X(t_1), \ldots, X(t_n))$ | Only the first two moments (mean and autocorrelation) | Relationship between time and statistical domains |
| Implication hierarchy | SSS $\Rightarrow$ WSS (always) | WSS $\nRightarrow$ SSS in general | Ergodic $\Rightarrow$ WSS (assumed); WSS $\nRightarrow$ Ergodic |
| Gaussian exception | For Gaussian processes: WSS $\Rightarrow$ SSS | Same (second-order statistics fully determine all distributions) | Gaussian WSS with $C_X(\tau) \to 0$ is ergodic |
| Practical verification | Extremely difficult (requires all joint distributions) | Moderate (check constant mean and lag-dependent autocorrelation) | Requires checking decay of autocovariance |
| Wireless example | AWGN is SSS (Gaussian i.i.d. samples) | Stationary fading channel observed over many coherence times | Noise in a long observation window: time average $\approx$ ensemble average |
| Failure example | Non-stationary interference (e.g., bursty traffic) | Birth-death process with time-varying rate | $X(t) = A$ with random $A$: WSS but not mean-ergodic since $A$ is constant per realisation |
| Used for | Theoretical completeness; Gaussian process analysis | PSD, Wiener--Khinchin theorem, LTI filter analysis | Justifying estimation of $\mu_X$, $R_X(\tau)$, $S_X(f)$ from a single realisation |
The Wiener-Khinchin Theorem: From Autocorrelation to Power Spectrum
Why This Matters: Power Spectral Density of Digitally Modulated Signals
In a linearly modulated digital communication system, the transmitted baseband signal is

$$s(t) = \sum_{k=-\infty}^{\infty} a_k\, g(t - kT),$$

where $a_k$ are the (random) data symbols with $\mathbb{E}[a_k] = 0$, $\mathbb{E}[|a_k|^2] = \sigma_a^2$, and $\mathbb{E}[a_k\, a_m^*] = 0$ for $k \neq m$ (uncorrelated symbols), $g(t)$ is the pulse-shaping filter, and $T$ is the symbol period.
Treating $\{a_k\}$ as a WSS discrete-time process, the PSD of $s(t)$ is given by

$$S_s(f) = \frac{\sigma_a^2}{T}\, |G(f)|^2,$$

where $G(f)$ is the Fourier transform of the pulse shape $g(t)$.
Key implications for system design:
- The occupied bandwidth of $s(t)$ is determined entirely by $G(f)$. A rectangular pulse gives a $\mathrm{sinc}^2$ PSD with slow spectral roll-off; a raised-cosine pulse confines the spectrum to bandwidth $\frac{1+\alpha}{2T}$ with roll-off factor $\alpha$.
- Spectral efficiency (bits/s/Hz) is $\eta = R_b / B$, where $R_b$ is the bit rate and $B$ is the occupied bandwidth. For $M$-QAM with a raised-cosine pulse: $\eta = \frac{\log_2 M}{1 + \alpha}$ bits/s/Hz.
- When the modulated signal passes through a channel with transfer function $H_c(f)$, the received PSD is $S_r(f) = |H_c(f)|^2\, S_s(f)$, directly applying the LTI filtering theorem of this section.
- The noise power in the receiver bandwidth $B$ is $N = N_0 B$, so the received SNR is $\mathrm{SNR} = \frac{1}{N_0 B}\int |H_c(f)|^2\, S_s(f)\,df$, linking the PSD framework to link-budget analysis.
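The relation $S_s(f) = \frac{\sigma_a^2}{T}|G(f)|^2$ can be checked by simulation. The sketch below works in discrete time with $L$ samples per symbol (so $T$ corresponds to $L$) and a rectangular pulse; the symbol alphabet, segment length, and seed are illustrative assumptions:

```python
import numpy as np

# Monte Carlo check of S_s(f) = (sigma_a^2 / T) |G(f)|^2 for a linearly
# modulated signal, in discrete time with L samples per symbol. Symbols are
# i.i.d. BPSK (+/-1, sigma_a^2 = 1); g is a length-L rectangular pulse.
rng = np.random.default_rng(4)
L, n_sym, n_seg = 8, 128, 8000
nfft = n_sym * L
g = np.ones(L)                               # rectangular pulse shape

psd = np.zeros(nfft)
for _ in range(n_seg):
    a = rng.choice([-1.0, 1.0], n_sym)       # BPSK symbols
    s = np.zeros(nfft)
    s[::L] = a                               # impulse train of symbols
    s = np.convolve(s, g)[:nfft]             # pulse shaping (pulses fit inside)
    psd += np.abs(np.fft.fft(s)) ** 2 / nfft # periodogram of this segment
psd /= n_seg

G = np.fft.fft(g, nfft)                      # pulse spectrum on the FFT grid
theory = np.abs(G) ** 2 / L                  # sigma_a^2 |G|^2 / L, sigma_a^2 = 1
err = np.max(np.abs(psd - theory)) / theory.max()
print(f"max relative PSD error = {err:.3f}")
```

The averaged periodogram reproduces the $\mathrm{sinc}^2$-shaped $|G(f)|^2/L$, including the nulls at multiples of the symbol rate, confirming that the data randomness leaves only the pulse shape visible in the spectrum.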
Quick Check
Let $X(t)$ be a WSS process with autocorrelation $R_X(\tau)$. Which of the following is guaranteed to be true?
$|R_X(\tau)| \leq R_X(0)$ for all $\tau$
$R_X(\tau) \geq 0$ for all $\tau$
$R_X(\tau)$ is always a monotonically decreasing function of $|\tau|$
$S_X(f)$ can take negative values for some frequencies
This follows from the Cauchy--Schwarz inequality applied to the random variables $X(t+\tau)$ and $X(t)$. The autocorrelation achieves its maximum magnitude at zero lag, where it equals the average power $R_X(0) = \mathbb{E}[|X(t)|^2]$.
Quick Check
White noise with PSD $S_X(f) = \frac{N_0}{2}$ is passed through a filter with $|H(f)|^2 = \frac{1}{1 + (f/f_0)^2}$ (a Lorentzian/first-order RC filter). The total output noise power is:
The output power is $P_Y = \frac{N_0}{2}\int_{-\infty}^{\infty} \frac{df}{1 + (f/f_0)^2}$. The integral evaluates to $\pi f_0$ using the standard result $\int_{-\infty}^{\infty} \frac{dx}{1+x^2} = \pi$ with the substitution $x = f/f_0$, so $P_Y = \frac{\pi N_0 f_0}{2}$. The noise equivalent bandwidth of this filter is $B_N = \frac{\pi f_0}{2}$.
Quick Check
Consider the process $X(t) = A$ for all $t$, where $A$ is a random variable with $\mathbb{E}[A] = 0$ and $\mathrm{Var}(A) = \sigma_A^2 > 0$. This process is:
WSS and ergodic
WSS but not mean-ergodic
Not WSS
SSS but not WSS
The process is WSS: $\mu_X(t) = \mathbb{E}[A] = 0$ (constant mean), and $R_X(t_1, t_2) = \mathbb{E}[A^2] = \sigma_A^2$ depends only on $t_1 - t_2$ (it is actually constant). However, the process is not mean-ergodic. The time average $\frac{1}{2T}\int_{-T}^{T} X(t)\,dt$ equals $A$, which is random, so it does not converge to the ensemble mean. The autocovariance $C_X(\tau) = \sigma_A^2$ does not decay to zero, violating the sufficient condition for mean-ergodicity.
Stochastic Process
A family of random variables $\{X(t, \omega)\}$ defined on a common probability space, indexed by a parameter $t$ (usually time). Each fixed $t$ yields a random variable; each fixed outcome $\omega$ yields a deterministic sample path.
Wide-Sense Stationarity (WSS)
A stochastic process is WSS if its mean is constant and its autocorrelation function depends only on the time lag $\tau$: $\mu_X(t) = \mu_X$ and $R_X(t+\tau, t) = R_X(\tau)$ for all $t$ and $\tau$. WSS is the standard assumption for spectral analysis and LTI filter theory applied to random signals.
Related: Wide-Sense Stationarity (WSS), Strict-Sense Stationarity (SSS), Properties of the WSS Autocorrelation Function
Autocorrelation Function
For a WSS process, $R_X(\tau) = \mathbb{E}[X(t+\tau)\,X^*(t)]$. It measures the statistical similarity between the process at two time instants separated by lag $\tau$. The Fourier transform of $R_X(\tau)$ is the power spectral density $S_X(f)$.
Related: Mean, Autocorrelation, and Autocovariance Functions, Power Spectral Density (PSD), Wiener--Khinchin Theorem
Power Spectral Density (PSD)
The Fourier transform of the autocorrelation function of a WSS process: $S_X(f) = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j2\pi f\tau}\, d\tau$. It describes the distribution of average power across frequency, is always nonnegative, and satisfies $\int_{-\infty}^{\infty} S_X(f)\,df = R_X(0) = \mathbb{E}[|X(t)|^2]$.
Related: Power Spectral Density (PSD), Wiener--Khinchin Theorem, WSS Process Through a Linear Time-Invariant (LTI) System
Wiener--Khinchin Theorem
The theorem establishing that the PSD and autocorrelation function of a WSS process form a Fourier transform pair: $S_X(f) \leftrightarrow R_X(\tau)$. It guarantees $S_X(f) \geq 0$ and provides the average-power relation $\mathbb{E}[|X(t)|^2] = \int_{-\infty}^{\infty} S_X(f)\,df$.
Related: Wiener--Khinchin Theorem, Power Spectral Density (PSD), Mean, Autocorrelation, and Autocovariance Functions
Ergodicity
The property that time averages from a single sample path converge to ensemble averages. A mean-ergodic process satisfies $\frac{1}{2T}\int_{-T}^{T} X(t)\,dt \to \mu_X$ as $T \to \infty$. Ergodicity justifies estimating statistical quantities (mean, PSD) from a single long observation of the process.
Related: Ergodicity, Wide-Sense Stationarity (WSS)
Common Mistake: Assuming Ergodicity Without Checking
Mistake:
"The process is WSS, so we can estimate its mean and PSD from a single time record."
Correction:
WSS does not imply ergodicity. A classic counterexample is the constant-amplitude process $X(t) = A$ for all $t$, where $A$ is a zero-mean unit-variance random variable. This process is WSS:
- $\mu_X(t) = \mathbb{E}[A] = 0$ (constant mean).
- $R_X(t+\tau, t) = \mathbb{E}[A^2] = 1$ (depends only on $\tau$, trivially).
Yet the time average of every sample path is $A$ itself, which is a random variable --- it does not converge to $\mathbb{E}[A] = 0$. The process is WSS but not ergodic.
When is the assumption safe? A sufficient condition for mean-ergodicity is that the autocovariance $C_X(\tau)$ decays to zero as $|\tau| \to \infty$. Physically, this means the process "forgets" its past --- future samples are asymptotically uncorrelated with past samples. Most stationary noise and interference processes in communications satisfy this condition. However:
- Processes with a nonzero DC component (e.g., a random but time-invariant mean) may fail mean-ergodicity.
- Processes with persistent periodic components (e.g., $X(t) = A\cos(2\pi f_0 t + \Theta)$ with random $A$ and $\Theta$) require separate analysis for each component.
Always verify the decay of $C_X(\tau)$ before replacing ensemble averages with time averages in a measurement or simulation.
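The counterexample is easy to reproduce in simulation (a minimal numpy sketch; path counts and seed are arbitrary assumptions):

```python
import numpy as np

# The counterexample X(t) = A in simulation: each sample path is a constant,
# so its time average equals A itself and never converges to the ensemble
# mean E[A] = 0, no matter how long we observe.
rng = np.random.default_rng(5)
n_paths, n_time = 5000, 1000
A = rng.standard_normal(n_paths)             # zero-mean, unit-variance A
X = np.tile(A[:, None], (1, n_time))         # X(t) = A for all t

time_averages = X.mean(axis=1)               # one time average per sample path
ensemble_mean = X[:, 0].mean()               # ensemble average at a fixed t

spread = time_averages.std()                 # stays ~1, does not shrink with n_time
print(f"std of time averages = {spread:.3f}, ensemble mean = {ensemble_mean:.3f}")
```

The ensemble average is near zero, as theory predicts, but the spread of the per-path time averages stays at $\sigma_A = 1$ however large `n_time` is made: the time average never converges to the ensemble mean.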
Key Takeaway
Two results form the bedrock of noise and signal analysis in communication systems:
- Wiener--Khinchin theorem: The PSD and autocorrelation of a WSS process are a Fourier transform pair, $S_X(f) \leftrightarrow R_X(\tau)$, and $\mathbb{E}[|X(t)|^2] = R_X(0) = \int_{-\infty}^{\infty} S_X(f)\,df$. This lets us move freely between the time domain (autocorrelation, correlation times, fading memory) and the frequency domain (bandwidth, spectral shape, noise floors).
- LTI filtering rule: $S_Y(f) = |H(f)|^2\, S_X(f)$. This single equation answers the core question of receiver design: how much noise power appears at the output of a filter? It directly yields:
  - Output noise power: $P_Y = \frac{N_0}{2}\int_{-\infty}^{\infty} |H(f)|^2\,df$ for white noise.
  - Noise equivalent bandwidth: $B_N = \frac{\int_{-\infty}^{\infty} |H(f)|^2\,df}{2\,|H(f_0)|^2}$.
  - SNR at the filter output, and hence bit-error-rate performance.
Together, these results transform the abstract notion of a "random waveform" into concrete, computable quantities --- bandwidth, power, and signal-to-noise ratio --- that drive every design decision in a communication link.
The bridge to later chapters:
- Matched filtering (Chapter 3): the filter that maximises output SNR is derived by applying the LTI rule to signal-plus-noise.
- Wiener filtering: the filter that minimises the mean-square error between the desired and actual output is expressed via $S_X(f)$ and the cross-PSD $S_{XY}(f)$.
- Channel capacity (Chapter 7): the water-filling power allocation across frequency sub-bands is an optimisation over the channel's frequency response and noise PSD.
Historical Note: Norbert Wiener and Aleksandr Khinchin
The relationship between autocorrelation and power spectrum was established independently by two mathematicians in the early 1930s.
Aleksandr Yakovlevich Khinchin (1894--1959), a Soviet mathematician, proved in 1934 that the autocorrelation function of a stationary process is the Fourier transform of a nonnegative measure (the spectral measure), as a consequence of Bochner's theorem on positive-definite functions.
Norbert Wiener (1894--1964), an American mathematician at MIT, arrived at the same result from a different direction: his 1930 work on "generalised harmonic analysis" defined the power spectrum for functions that are not square-integrable (and hence do not have ordinary Fourier transforms) by using time-averaged periodograms. Wiener later applied this theory to the optimal filtering problem during World War II, resulting in the Wiener filter --- the foundation of statistical signal processing.
The theorem bearing both their names is one of the most widely applied results in engineering. Every spectrum analyser, every noise-figure measurement, and every link-budget calculation implicitly relies on the Wiener--Khinchin correspondence between time-domain correlations and frequency-domain power distributions.