Ergodicity

From Ensemble to Time Averages

All the statistical quantities we have defined (mean, autocorrelation, variance) are ensemble averages: expectations over the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. But in practice, we typically observe a single realization of the process (one received signal, one channel trace). How can we estimate $\mu = \mathbb{E}[X(t)]$ from a single sample path? The answer is ergodicity: under certain conditions, time averages computed from a single long realization converge to the ensemble averages. Ergodicity is the assumption that justifies virtually all practical estimation in communications.

Definition:

Time Average

For a continuous-time process $\{X(t)\}$, the time average over $[-T, T]$ is $\langle X \rangle_T = \frac{1}{2T}\int_{-T}^{T} X(t)\,dt.$

For a discrete-time process $\{X_n\}$, the time average over $\{-N, \ldots, N\}$ is $\langle X \rangle_N = \frac{1}{2N+1}\sum_{n=-N}^{N} X_n.$

Definition:

Mean-Ergodic Process

A WSS process $\{X(t)\}$ is mean-ergodic if its time average converges to the ensemble mean in mean square: $\lim_{T \to \infty} \mathbb{E}\left[\left|\langle X \rangle_T - \mu\right|^2\right] = 0,$ i.e., $\langle X \rangle_T \xrightarrow{\text{m.s.}} \mu$.

Mean-ergodicity says that a single long observation suffices to estimate the mean. This is weaker than full ergodicity (where time averages of all functions of the process converge to ensemble averages), but it is the most practically important case.

Theorem: Condition for Mean-Ergodicity

A WSS process $\{X(t)\}$ with autocovariance $c_{XX}(\tau)$ is mean-ergodic if and only if $\lim_{T \to \infty} \frac{1}{2T}\int_{-2T}^{2T}\left(1 - \frac{|\tau|}{2T}\right)c_{XX}(\tau)\,d\tau = 0.$

Sufficient condition: $\{X(t)\}$ is mean-ergodic if $\lim_{|\tau| \to \infty} c_{XX}(\tau) = 0.$

For discrete-time: $\{X_n\}$ is mean-ergodic if $\lim_{|k| \to \infty} c_{xx}[k] = 0$.

The condition $c_{XX}(\tau) \to 0$ as $|\tau| \to \infty$ means that the process "forgets" its past: distant samples are approximately uncorrelated. This ensures that the time average $\langle X \rangle_T$ averages over effectively independent observations and converges to $\mu$ by a law-of-large-numbers-type argument.
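The integral criterion in the theorem can be evaluated numerically for a given autocovariance. The sketch below is only an illustration: the function name `time_average_variance`, the midpoint-rule integration, and the parameter values are assumptions made here, not part of the original text. It contrasts a decaying autocovariance with a constant one.

```python
import math


def time_average_variance(autocov, T, steps=20000):
    """Midpoint-rule approximation of
    (1/2T) * integral_{-2T}^{2T} (1 - |tau|/(2T)) * autocov(tau) dtau.
    `steps` should be even so that the kink at tau = 0 falls on a cell boundary.
    """
    width = 4.0 * T / steps
    total = 0.0
    for i in range(steps):
        tau = -2.0 * T + (i + 0.5) * width
        total += (1.0 - abs(tau) / (2.0 * T)) * autocov(tau)
    return total * width / (2.0 * T)


# Illustrative (assumed) parameter values.
sigma2, alpha = 1.0, 0.5


def exponential_autocov(tau):
    """Decaying autocovariance sigma2 * exp(-alpha * |tau|)."""
    return sigma2 * math.exp(-alpha * abs(tau))


def constant_autocov(tau):
    """Autocovariance that never decays (e.g. X(t) = A for all t)."""
    return sigma2


for T in (10.0, 100.0, 1000.0):
    decaying = time_average_variance(exponential_autocov, T)
    persistent = time_average_variance(constant_autocov, T)
    print(f"T={T:7.1f}  decaying: {decaying:.6f}  constant: {persistent:.6f}")
```

For the decaying autocovariance the printed value shrinks as `T` grows, while for the constant autocovariance it stays at `sigma2`, so the criterion fails in the second case.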

Example: Mean-Ergodicity of an Exponentially Correlated Process

Let $\{X(t)\}$ be a zero-mean WSS process with $c_{XX}(\tau) = \sigma^2 e^{-\alpha|\tau|}$, $\alpha > 0$. Is $\{X(t)\}$ mean-ergodic?
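One way to answer this (a worked sketch, not necessarily the solution the author had in mind) is to note that $c_{XX}(\tau) \to 0$, so the sufficient condition already applies. The integral criterion can also be evaluated in closed form, using $\mu = 0$ and the symmetry of the integrand:

```latex
\begin{align*}
\mathbb{E}\left[\left|\langle X \rangle_T - \mu\right|^2\right]
  &= \frac{1}{2T}\int_{-2T}^{2T}\left(1 - \frac{|\tau|}{2T}\right)\sigma^2 e^{-\alpha|\tau|}\,d\tau \\
  &= \frac{\sigma^2}{T}\int_{0}^{2T}\left(1 - \frac{\tau}{2T}\right)e^{-\alpha\tau}\,d\tau \\
  &= \frac{\sigma^2}{\alpha T} - \frac{\sigma^2\left(1 - e^{-2\alpha T}\right)}{2\alpha^2 T^2}
  \;\xrightarrow[T \to \infty]{}\; 0 .
\end{align*}
```

The process is therefore mean-ergodic, and for large $T$ the variance of the time average behaves like $\sigma^2/(\alpha T)$.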

Example: A Non-Ergodic WSS Process

Let $X(t) = A$ for all $t$, where $A$ is a random variable with $\mathbb{E}[A] = 0$ and $\text{Var}(A) = \sigma^2$. Show that $\{X(t)\}$ is WSS but not mean-ergodic.
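A short worked argument, assuming $\sigma^2 > 0$ (a sketch rather than the author's own solution):

```latex
\begin{align*}
\mathbb{E}[X(t)] &= \mathbb{E}[A] = 0, \qquad
r_{XX}(\tau) = \mathbb{E}\left[X(t+\tau)X^*(t)\right] = \mathbb{E}\left[|A|^2\right] = \sigma^2
  \quad\text{(no dependence on } t\text{, so WSS)}, \\
\langle X \rangle_T &= \frac{1}{2T}\int_{-T}^{T} A\,dt = A
  \quad\text{for every } T, \\
\mathbb{E}\left[\left|\langle X \rangle_T - \mu\right|^2\right]
  &= \mathbb{E}\left[|A|^2\right] = \sigma^2 \neq 0 .
\end{align*}
```

The time average equals the random value $A$ no matter how long the observation is, so it cannot converge to $\mu = 0$ in mean square. Equivalently, $c_{XX}(\tau) = \sigma^2$ for all $\tau$, and the integral in the mean-ergodicity criterion equals $\sigma^2$ for every $T$.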

Ergodic Time-Average Convergence

Interactive figure: the running time average $\langle X \rangle_T$ converging (or not) to the ensemble mean as $T$ increases, comparing an ergodic process (exponential ACF) with a non-ergodic one (constant random variable).
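The comparison can be reproduced offline with a small simulation. The sketch below makes several assumptions of its own: it works in discrete time, uses a Gaussian AR(1) sequence as the process with exponentially decaying autocovariance $\rho^{|k|}$, averages one-sidedly over samples $0, \ldots, n$, and picks the seed, `rho`, and sample count arbitrarily. None of these choices come from the original figure.

```python
import math
import random


def running_average(samples):
    """Return the running averages (1/(n+1)) * sum_{k=0}^{n} samples[k]."""
    averages, total = [], 0.0
    for n, x in enumerate(samples):
        total += x
        averages.append(total / (n + 1))
    return averages


def ar1_path(num_samples, rho, rng):
    """Zero-mean, unit-variance Gaussian AR(1) path with autocovariance rho**|k|."""
    x = rng.gauss(0.0, 1.0)  # start in the stationary distribution
    path = [x]
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(num_samples - 1):
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def constant_path(num_samples, rng):
    """Non-ergodic path X_n = A with a single random draw A ~ N(0, 1)."""
    a = rng.gauss(0.0, 1.0)
    return [a] * num_samples


rng = random.Random(0)          # assumed seed, for repeatability only
num_samples, rho = 20000, 0.9   # assumed illustrative values

ergodic_avg = running_average(ar1_path(num_samples, rho, rng))
constant = constant_path(num_samples, rng)
nonergodic_avg = running_average(constant)

for n in (10, 100, 1000, num_samples):
    print(f"n={n:6d}  AR(1): {ergodic_avg[n - 1]:+.4f}  constant: {nonergodic_avg[n - 1]:+.4f}")
```

The AR(1) running average drifts toward the ensemble mean 0 as `n` grows, while the constant process stays at whatever value of `A` was drawn.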

Definition:

Correlation-Ergodic Process

A WSS process $\{X(t)\}$ is correlation-ergodic if the time-averaged product converges to the autocorrelation: $\frac{1}{2T}\int_{-T}^{T} X(t+\tau)X^*(t)\,dt \xrightarrow{\text{m.s.}} r_{XX}(\tau)$ as $T \to \infty$, for every $\tau$.

A sufficient condition (for Gaussian WSS processes) is $c_{XX}(\tau) \to 0$ as $|\tau| \to \infty$.
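In discrete time, the time-averaged product becomes a sample autocorrelation. The following sketch assumes a real-valued sequence, a one-sided window, and a Gaussian AR(1) test signal whose true autocorrelation is $\rho^{|k|}$. The helper name `sample_autocorrelation` and all parameter values are illustrative choices, not taken from the text.

```python
import math
import random


def sample_autocorrelation(x, lag):
    """Time-averaged lag product (1/(N - lag)) * sum_n x[n + lag] * x[n]
    for a real-valued sequence x of length N."""
    n_terms = len(x) - lag
    if lag < 0 or n_terms <= 0:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    return sum(x[n + lag] * x[n] for n in range(n_terms)) / n_terms


# Test signal: zero-mean, unit-variance Gaussian AR(1), so r[k] = rho**|k|.
rng = random.Random(1)           # assumed seed
rho, num_samples = 0.8, 50000    # assumed illustrative values
scale = math.sqrt(1.0 - rho * rho)
x = [rng.gauss(0.0, 1.0)]
for _ in range(num_samples - 1):
    x.append(rho * x[-1] + scale * rng.gauss(0.0, 1.0))

estimates = {lag: sample_autocorrelation(x, lag) for lag in range(4)}
for lag, estimate in estimates.items():
    print(f"lag {lag}: estimate {estimate:.4f}, true {rho ** lag:.4f}")
```

Because this test signal is Gaussian with a decaying autocovariance, the sufficient condition above applies and the estimates should approach $\rho^{|k|}$ as the record gets longer.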

Theorem: Ergodic Theorem for Stationary Processes

If $\{X(t)\}$ is strict-sense stationary and $\mathbb{E}[|X(t)|] < \infty$, then $\frac{1}{2T}\int_{-T}^{T} g(X(t))\,dt \xrightarrow{\text{a.s.}} \mathbb{E}[g(X(0))]$ as $T \to \infty$, for every measurable function $g$ with $\mathbb{E}[|g(X(0))|] < \infty$, provided the process is ergodic (the shift-invariant $\sigma$-algebra is trivial).

For discrete-time: $\frac{1}{2N+1}\sum_{n=-N}^{N} g(X_n) \xrightarrow{\text{a.s.}} \mathbb{E}[g(X_0)]$.

This is the Birkhoff ergodic theorem applied to stationary processes. It says that time averages of any integrable function of the process converge to ensemble averages, provided the process is ergodic in the measure-theoretic sense. The choice $g(x) = x$ gives convergence of the time-averaged mean, here almost surely rather than in mean square as in the definition of mean-ergodicity.
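As a numerical illustration of the discrete-time statement, the sketch below time-averages several functions $g$ along one sample path and compares each result with the known ensemble value. It assumes a stationary Gaussian AR(1) sequence with standard normal marginals, which is ergodic because it is Gaussian with a decaying autocovariance. The one-sided window, seed, and parameters are arbitrary choices made here.

```python
import math
import random


def time_average_of(g, samples):
    """Time average (1/N) * sum_n g(X_n) along a single sample path."""
    return sum(g(x) for x in samples) / len(samples)


# One path of a stationary Gaussian AR(1) process with N(0, 1) marginals.
rng = random.Random(2)            # assumed seed
rho, num_samples = 0.5, 100000    # assumed illustrative values
scale = math.sqrt(1.0 - rho * rho)
samples = [rng.gauss(0.0, 1.0)]
for _ in range(num_samples - 1):
    samples.append(rho * samples[-1] + scale * rng.gauss(0.0, 1.0))

# Each entry: name -> (g, E[g(X_0)] for X_0 ~ N(0, 1)).
checks = {
    "x^2": (lambda x: x * x, 1.0),
    "|x|": (abs, math.sqrt(2.0 / math.pi)),
    "1{x > 1}": (lambda x: 1.0 if x > 1.0 else 0.0,
                 0.5 * math.erfc(1.0 / math.sqrt(2.0))),
}

results = {name: (time_average_of(g, samples), expected)
           for name, (g, expected) in checks.items()}
for name, (average, expected) in results.items():
    print(f"g = {name:9s} time average {average:.4f}, ensemble value {expected:.4f}")
```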

Practical Implications of Ergodicity

In communications, ergodicity has far-reaching consequences:

  • Channel estimation: We can estimate the channel's mean power and autocorrelation from a single long observation, rather than requiring an ensemble of independent channel realizations.
  • Bit error rate: The long-run fraction of erroneous bits in a single transmission equals the probability of bit error (BER).
  • Capacity: The achievable rate on a stationary ergodic channel converges to the ergodic capacity $\mathbb{E}[\log(1 + \text{SNR})]$, averaged over the fading distribution.
  • Estimation: Sample mean and sample autocorrelation are consistent estimators of $\mu$ and $r_{XX}(\tau)$.

Quick Check

A WSS process has autocovariance $c_{XX}(\tau) = \sigma^2 \cos(2\pi f_0 \tau)$. Is it mean-ergodic?

Yes, because $c_{XX}(\tau)$ is bounded

No, because $c_{XX}(\tau)$ does not decay to zero

Yes, because the process is WSS
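Because this autocovariance does not decay, the sufficient condition is silent here and cannot settle the question by itself. Evaluating the necessary-and-sufficient criterion of the theorem directly, under the assumption $f_0 \neq 0$, gives the following worked check:

```latex
\begin{align*}
\frac{1}{2T}\int_{-2T}^{2T}\left(1 - \frac{|\tau|}{2T}\right)\sigma^2\cos(2\pi f_0 \tau)\,d\tau
  &= \frac{\sigma^2}{T}\int_{0}^{2T}\left(1 - \frac{\tau}{2T}\right)\cos(2\pi f_0 \tau)\,d\tau \\
  &= \sigma^2\,\frac{1 - \cos(4\pi f_0 T)}{2\,(2\pi f_0)^2\,T^2} \\
  &= \sigma^2\left(\frac{\sin(2\pi f_0 T)}{2\pi f_0 T}\right)^{2}
  \;\le\; \frac{\sigma^2}{(2\pi f_0 T)^2}
  \;\xrightarrow[T \to \infty]{}\; 0 .
\end{align*}
```

The limit is zero, so the criterion is met for any $f_0 \neq 0$ even though $c_{XX}(\tau)$ keeps oscillating. For $f_0 = 0$ the autocovariance reduces to the constant $\sigma^2$ and the limit is $\sigma^2$, as in the non-ergodic example $X(t) = A$.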

Common Mistake: Assuming Ergodicity Without Verification

Mistake:

Treating every stationary process as ergodic and equating time averages with ensemble averages without checking conditions.

Correction:

Ergodicity requires additional conditions beyond stationarity. For mean-ergodicity, it is sufficient that the autocovariance decays to zero. A stationary process with persistent correlations (e.g., $X(t) = A$ with random $A$) is not ergodic. Check whether the autocovariance decays or, failing that, whether the integral condition of the mean-ergodicity theorem holds, before assuming ergodicity.

Common Mistake: Confusing Ergodicity with Finite-Sample Accuracy

Mistake:

Believing that ergodicity guarantees accurate estimates from short observations.

Correction:

Ergodicity guarantees convergence as $T \to \infty$. For finite $T$, the time-average estimate has variance $\text{Var}(\langle X \rangle_T) \approx \frac{1}{2T}\int_{-\infty}^{\infty} c_{XX}(\tau)\,d\tau$ (valid once $T$ is much larger than the correlation time), which may be large if the correlation decays slowly. The "effective number of independent samples" in the window of length $2T$ is on the order of $2T / \tau_{\text{corr}}$, where $\tau_{\text{corr}} = \int_0^\infty |c_{XX}(\tau)|/c_{XX}(0)\,d\tau$ is the correlation time.
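To see how the large-$T$ approximation behaves for short windows, the sketch below compares it with the exact variance for the exponential autocovariance $c_{XX}(\tau) = \sigma^2 e^{-\alpha|\tau|}$, whose correlation time is $1/\alpha$. The closed form used for the exact variance comes from evaluating the triangular-window integral of the mean-ergodicity theorem for this particular autocovariance. The function names and parameter values are assumptions for illustration.

```python
import math


def exact_variance(sigma2, alpha, T):
    """Variance of the time average over [-T, T] when
    c(tau) = sigma2 * exp(-alpha * |tau|)."""
    return (sigma2 / (alpha * T)
            - sigma2 * (1.0 - math.exp(-2.0 * alpha * T)) / (2.0 * alpha ** 2 * T ** 2))


def large_T_approximation(sigma2, alpha, T):
    """(1/2T) * integral of c(tau) over all tau, which equals sigma2 / (alpha * T)."""
    return sigma2 / (alpha * T)


sigma2, alpha = 1.0, 1.0      # assumed illustrative values
tau_corr = 1.0 / alpha        # correlation time of the exponential autocovariance

for T in (0.5, 5.0, 50.0, 500.0):
    exact = exact_variance(sigma2, alpha, T)
    approx = large_T_approximation(sigma2, alpha, T)
    print(f"T/tau_corr={T / tau_corr:6.1f}  exact={exact:.6f}  approx={approx:.6f}")
```

For windows shorter than a few correlation times the approximation overshoots (it can even exceed $\sigma^2$), while the exact variance never does. The two agree closely once $T \gg \tau_{\text{corr}}$.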

⚠️Engineering Note

Correlation Time and Estimation Quality

The correlation time $\tau_c = \int_0^\infty |c_{XX}(\tau)| / c_{XX}(0)\,d\tau$ determines how many "effectively independent" samples a time interval of length $T$ contains: roughly $T / \tau_c$. For estimating the mean from a single realization of length $T$, the variance of the estimate scales as $\sigma^2 \tau_c / T$.

In wireless channel estimation, the coherence time $T_c$ plays the role of $\tau_c$. A pilot-based estimator using $N_p$ pilot symbols spaced at $T_s \ll T_c$ has effective degrees of freedom approximately $N_p T_s / T_c$, not $N_p$.
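The pilot rule of thumb is simple arithmetic, sketched below. The helper name `effective_pilot_dof`, the clamping of the result to the range $[1, N_p]$, and the example numbers are all assumptions added here for illustration. The source gives only the approximation $N_p T_s / T_c$ for $T_s \ll T_c$.

```python
def effective_pilot_dof(num_pilots, pilot_spacing_s, coherence_time_s):
    """Rule-of-thumb effective degrees of freedom N_p * T_s / T_c.

    Assumption made here: the result is clamped to [1, N_p], since a pilot
    record carries at least one and at most N_p independent observations.
    """
    if num_pilots <= 0 or pilot_spacing_s <= 0 or coherence_time_s <= 0:
        raise ValueError("all arguments must be positive")
    raw = num_pilots * pilot_spacing_s / coherence_time_s
    return min(float(num_pilots), max(1.0, raw))


# Hypothetical numbers: 1000 pilots, 10 microseconds apart, 1 ms coherence time.
dof = effective_pilot_dof(num_pilots=1000, pilot_spacing_s=10e-6, coherence_time_s=1e-3)
print(f"effective degrees of freedom: about {dof:.1f} (out of 1000 pilots)")
```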

Why This Matters: Ergodic Capacity of Fading Channels

For a stationary ergodic fading channel with gain $H(t)$ and noise power $N_0$, the ergodic capacity is $C = \mathbb{E}[\log_2(1 + |H|^2 \cdot \text{SNR})]$. Ergodicity guarantees that a single codeword spanning many fading realizations (i.e., codeword length $\gg$ coherence time) experiences all channel states, and the achievable rate equals this ensemble average. When the codeword length is comparable to the coherence time, we enter the outage regime, and ergodic capacity no longer applies: a fundamentally different analysis is needed.
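The ensemble average in the capacity formula can be approximated by Monte Carlo once a fading distribution is fixed. The sketch below assumes Rayleigh fading with unit average power, so that $|H|^2$ is exponentially distributed with mean 1. This distribution, the SNR value, the seed, and the function name are illustrative assumptions, not something specified in the text.

```python
import math
import random


def ergodic_capacity_rayleigh(snr_linear, num_draws, rng):
    """Monte Carlo estimate of E[log2(1 + |H|^2 * SNR)] in bits per channel use,
    assuming |H|^2 ~ Exponential(1) (Rayleigh fading, unit average power)."""
    total = 0.0
    for _ in range(num_draws):
        gain = rng.expovariate(1.0)  # one draw of |H|^2
        total += math.log2(1.0 + gain * snr_linear)
    return total / num_draws


rng = random.Random(3)   # assumed seed
snr_linear = 10.0        # assumed SNR of 10 dB
c_erg = ergodic_capacity_rayleigh(snr_linear, 200000, rng)
c_awgn = math.log2(1.0 + snr_linear)  # non-fading channel at the same average SNR

print(f"ergodic capacity (Rayleigh, assumed): {c_erg:.3f} bit/use")
print(f"AWGN capacity at the same SNR:        {c_awgn:.3f} bit/use")
```

By Jensen's inequality the fading average cannot exceed the AWGN value at the same average SNR, which the two printed numbers reflect.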

See full treatment in Chapter 14

Historical Note: Birkhoff's Ergodic Theorem

1931

George David Birkhoff proved his pointwise ergodic theorem in 1931, establishing that time averages of integrable functions converge almost surely for measure-preserving transformations. The term "ergodic" comes from the Greek ergon (work) and hodos (path), coined by Boltzmann in statistical mechanics to describe systems that visit all accessible states over time. Birkhoff's theorem provided the rigorous mathematical foundation for Boltzmann's physical intuition and, decades later, became the theoretical justification for estimating channel statistics from single observations in communications.

🎓 CommIT Contribution (1999)

Ergodic vs. Outage Capacity in Fading Channels

G. Caire, S. Shamai, IEEE Trans. Inform. Theory, vol. 45, no. 6, pp. 2007-2019

Caire and Shamai (1999) provided a unified framework for analyzing fading channels with various levels of channel state information at the transmitter and receiver. Their work clarified when ergodic capacity (the ensemble average $\mathbb{E}[\log(1 + \text{SNR}\cdot|H|^2)]$) applies versus when outage-based metrics are appropriate. The distinction hinges on whether the coding block length spans many independent fading realizations (ergodic regime) or few (quasi-static regime). This paper exemplifies how the abstract concept of ergodicity has direct, quantitative implications for system design.


Ergodic Process

A stationary process for which time averages converge to ensemble averages. A sufficient condition for mean-ergodicity is that the autocovariance decays to zero.

Related: Wide-Sense Stationary (WSS), Strict-Sense Stationary (SSS)

Time Average

$\langle X \rangle_T = \frac{1}{2T}\int_{-T}^{T} X(t)\,dt$. The average of a single realization over a time window. Converges to the ensemble mean for ergodic processes.

Related: Ergodic Process

Key Takeaway

Ergodicity bridges the gap between mathematical expectation (over the ensemble) and practical estimation (from a single time series). A WSS process is mean-ergodic if its autocovariance decays to zero, a condition satisfied by most physical processes with finite memory. Without ergodicity, time averages from a single observation need not converge to the ensemble quantities they are meant to estimate.