Modes of Convergence

Why Multiple Modes of Convergence?

In calculus, a sequence of numbers either converges or it does not. But a sequence of random variables can converge in several distinct senses, depending on how strictly we demand agreement between X_n and the limit X. The distinction is not pedantic: it determines which tools we can use and which conclusions we can draw. The Weak Law guarantees convergence in probability; the Strong Law upgrades this to almost sure convergence; the CLT delivers convergence in distribution. Each mode tells a different story about what happens as n \to \infty.

Definition:

Almost Sure Convergence

A sequence \{X_n\} converges to X almost surely (a.s.), written X_n \xrightarrow{\text{a.s.}} X, if

\mathbb{P}\!\left(\lim_{n \to \infty} X_n = X\right) = 1.

Equivalently, for every \epsilon > 0:

\mathbb{P}\!\left(\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} \{|X_k - X| \geq \epsilon\}\right) = 0.

The set of outcomes \omega where X_n(\omega) \not\to X(\omega) has probability zero.

Almost sure convergence is pathwise: for (almost) every realization of the random experiment, the sequence of numbers X_1(\omega), X_2(\omega), \ldots converges to X(\omega) in the ordinary calculus sense.
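To make the pathwise picture concrete, here is a minimal sketch (an illustration not from the text, assuming \Omega = [0,1] with Lebesgue measure and X_n(\omega) = \omega^n):

```python
import random

# Illustrative choice (not from the text): on Omega = [0, 1] with
# Lebesgue measure, take X_n(omega) = omega**n.  Every path with
# omega < 1 converges to 0 in the ordinary calculus sense; the only
# failure is at omega = 1, a single point of probability zero.
def X(n, omega):
    return omega ** n

random.seed(0)

# A typical path converges to the limit 0.
assert X(10_000, 0.99) < 1e-40

# The lone exceptional outcome omega = 1: the path never moves.
assert all(X(n, 1.0) == 1.0 for n in range(1, 100))

# Sampled outcomes land in [0, 1) with probability one.
assert all(random.random() < 1.0 for _ in range(1000))
```

The exceptional set is the single point \omega = 1, which has Lebesgue measure zero, so X_n \to 0 almost surely even though one path never converges.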

Definition:

Convergence in Probability

A sequence \{X_n\} converges to X in probability, written X_n \xrightarrow{P} X, if for every \epsilon > 0:

\lim_{n \to \infty} \mathbb{P}(|X_n - X| \geq \epsilon) = 0.

This says that the probability of a large deviation between X_n and X vanishes, but it does not preclude occasional excursions.
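As a sketch of the definition in action (fair coin flips are an assumed example, not from the text), one can estimate the deviation probability \mathbb{P}(|\bar{X}_n - 1/2| \geq \epsilon) by Monte Carlo and watch it shrink as n grows:

```python
import random

random.seed(1)

def deviation_prob(n, eps=0.1, trials=2000):
    """Monte Carlo estimate of P(|mean of n fair coin flips - 1/2| >= eps)."""
    count = 0
    for _ in range(trials):
        xbar = sum(random.random() < 0.5 for _ in range(n)) / n
        if abs(xbar - 0.5) >= eps:
            count += 1
    return count / trials

probs = [deviation_prob(n) for n in (10, 100, 1000)]

# The deviation probability shrinks toward 0 as n grows:
# that is exactly convergence in probability of the sample mean.
assert probs[0] > probs[2]
assert probs[2] < 0.01
```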

Definition:

Convergence in r-th Mean (L^r)

For r \geq 1, a sequence \{X_n\} converges to X in L^r, written X_n \xrightarrow{L^r} X, if

\lim_{n \to \infty} \mathbb{E}\!\left[|X_n - X|^r\right] = 0.

For r = 2, this is mean-square convergence: \mathbb{E}[(X_n - X)^2] \to 0.

L^r convergence controls the r-th moment of the deviation. Larger r gives a stronger mode (for s > r \geq 1, L^s convergence implies L^r convergence by Jensen's inequality), but it requires the relevant moments to exist.
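A deterministic sketch (assuming X_n(\omega) = \omega^n on [0,1] with Lebesgue measure, an illustrative choice not from the text): here the r-th moment of the deviation from 0 has the closed form \mathbb{E}|X_n|^r = 1/(rn+1), which tends to 0 for every r \geq 1.

```python
from fractions import Fraction

# For X_n(omega) = omega**n on [0, 1] with Lebesgue measure,
# E[|X_n - 0|**r] = integral_0^1 omega**(r*n) d(omega) = 1/(r*n + 1).
# This -> 0 for every fixed r >= 1, so X_n -> 0 in every L^r.
def lr_moment(n, r):
    """Exact E[X_n**r] for X_n(omega) = omega**n."""
    return Fraction(1, r * n + 1)

assert lr_moment(10, 2) == Fraction(1, 21)      # mean-square case r = 2
assert lr_moment(10**6, 2) < Fraction(1, 10**6)  # vanishes as n grows
```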

Definition:

Convergence in Distribution

A sequence \{X_n\} converges to X in distribution, written X_n \xrightarrow{d} X, if

\lim_{n \to \infty} F_{X_n}(x) = F_X(x)

at every point x where F_X is continuous.

Equivalently, by the Lévy continuity theorem: X_n \xrightarrow{d} X if and only if \phi_{X_n}(u) \to \phi_X(u) for all u \in \mathbb{R}, where \phi_Y denotes the characteristic function of Y.

Convergence in distribution is the weakest mode. It says nothing about individual realizations; it only requires that the CDFs align. The random variables need not even live on the same probability space.
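A sketch of convergence in distribution in action (using the CLT for Uniform(0,1) means, an assumed example not from the text): the empirical CDF of the standardized sample mean approaches the standard normal CDF \Phi pointwise.

```python
import bisect
import math
import random

random.seed(3)

def standardized_mean(n):
    """sqrt(n)*(Xbar_n - mu)/sigma for Uniform(0,1) draws (mu=1/2, sigma^2=1/12)."""
    xbar = sum(random.random() for _ in range(n)) / n
    return math.sqrt(n) * (xbar - 0.5) / math.sqrt(1 / 12)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

samples = sorted(standardized_mean(30) for _ in range(20_000))

def ecdf(x):
    """Empirical CDF: fraction of samples <= x."""
    return bisect.bisect_right(samples, x) / len(samples)

# Convergence in distribution (here via the CLT): the empirical CDF of
# the standardized mean tracks Phi at every point (Phi is continuous
# everywhere, so every x is a continuity point).
for x in (-1.0, 0.0, 1.0):
    assert abs(ecdf(x) - Phi(x)) < 0.02
```

Note that this checks only the alignment of CDFs; it says nothing about the distance between individual realizations.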

Definition:

Sample Mean

Given a sequence X_1, X_2, \ldots of random variables, the sample mean (empirical average) is

\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i.

When the X_i are i.i.d. with mean \mu and variance \sigma^2, we have \mathbb{E}[\bar{X}_n] = \mu and \text{Var}(\bar{X}_n) = \sigma^2/n. The law of large numbers describes the sense in which \bar{X}_n converges to \mu.
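These two identities can be checked exactly for a small case (n = 2 fair dice, an illustrative choice not from the text) by enumerating all 36 equally likely outcomes with exact fractions:

```python
from fractions import Fraction
from itertools import product

# Exact check of E[Xbar_n] = mu and Var(Xbar_n) = sigma^2 / n for
# n = 2 fair dice, enumerating all 36 equally likely outcomes.
faces = range(1, 7)
mu = Fraction(sum(faces), 6)                               # 7/2
sigma2 = Fraction(sum(f * f for f in faces), 6) - mu ** 2  # 35/12

pairs = list(product(faces, repeat=2))
xbars = [Fraction(a + b, 2) for a, b in pairs]

e_xbar = sum(xbars, Fraction(0)) / len(xbars)
var_xbar = sum((x - e_xbar) ** 2 for x in xbars) / len(xbars)

assert e_xbar == mu            # E[Xbar_2] = mu
assert var_xbar == sigma2 / 2  # Var(Xbar_2) = sigma^2 / 2
```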

Theorem: Relationships Between Convergence Modes

The four modes of convergence satisfy the following implications:

  1. X_n \xrightarrow{\text{a.s.}} X \;\Longrightarrow\; X_n \xrightarrow{P} X
  2. X_n \xrightarrow{L^r} X \;\Longrightarrow\; X_n \xrightarrow{P} X (for any r \geq 1)
  3. X_n \xrightarrow{P} X \;\Longrightarrow\; X_n \xrightarrow{d} X

No other general implications hold. In particular:

  • Convergence in probability does not imply a.s. convergence.
  • Convergence in distribution does not imply convergence in probability.
  • L^r convergence and a.s. convergence are not comparable in general.

Exception: If X is a constant c, then X_n \xrightarrow{d} c implies X_n \xrightarrow{P} c.

Almost sure convergence controls every sample path; L^r controls the average size of the deviation; convergence in probability allows rare large excursions; convergence in distribution only matches the histograms. Moving down this list the guarantees get weaker, except that a.s. and L^r convergence are not directly comparable.

Four Modes of Convergence

| Mode | Notation | Definition | Requires same space? | Strength |
| --- | --- | --- | --- | --- |
| Almost sure | X_n \xrightarrow{\text{a.s.}} X | \mathbb{P}(\lim X_n = X) = 1 | Yes | Strong |
| In probability | X_n \xrightarrow{P} X | \mathbb{P}(\lvert X_n - X \rvert \geq \epsilon) \to 0 | Yes | Medium |
| L^r mean | X_n \xrightarrow{L^r} X | \mathbb{E}[\lvert X_n - X \rvert^r] \to 0 | Yes | Medium |
| In distribution | X_n \xrightarrow{d} X | F_{X_n}(x) \to F_X(x) at continuity points | No | Weak |

Example: Convergence in Probability but Not Almost Surely

Construct a sequence \{X_n\} that converges to 0 in probability but not almost surely. This shows that the implication a.s. \Rightarrow in probability cannot be reversed.

Example: L^r Convergence Without Almost Sure Convergence

Consider the "typewriter" sequence on [0,1] with Lebesgue measure: for n = 2^k + j with 0 \leq j < 2^k, let X_n = \mathbf{1}_{[j/2^k, (j+1)/2^k]}. Show that X_n \xrightarrow{L^1} 0 (indeed \mathbb{E}|X_n| = 2^{-k} \to 0), yet X_n(\omega) does not converge for any \omega, since every \omega is covered once per level k. By contrast, show that X_n = n \cdot \mathbf{1}_{[0, 1/n^2]} converges to 0 both in L^1 and almost surely.
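The standard "typewriter" construction (X_n is the indicator of the dyadic interval [j/2^k, (j+1)/2^k) for n = 2^k + j, 0 \leq j < 2^k) can be checked numerically; a sketch:

```python
def typewriter(n):
    """The n-th 'typewriter' interval: write n = 2**k + j with
    0 <= j < 2**k; X_n is 1 on the dyadic interval [j/2**k, (j+1)/2**k)."""
    k = n.bit_length() - 1
    j = n - 2 ** k
    return (j / 2 ** k, (j + 1) / 2 ** k)

def X(n, omega):
    a, b = typewriter(n)
    return 1.0 if a <= omega < b else 0.0

# L^1 convergence: E|X_n| is the interval width 2**-k, which -> 0.
for n in (1, 10, 1000):
    k = n.bit_length() - 1
    a, b = typewriter(n)
    assert abs((b - a) - 2 ** -k) < 1e-12

# No a.s. convergence: every omega is covered once per dyadic level,
# so X_n(omega) = 1 infinitely often and the path keeps oscillating.
omega = 0.3
hits = [n for n in range(1, 2 ** 12) if X(n, omega) == 1.0]
assert len(hits) == 12  # one hit at each level k = 0, 1, ..., 11
```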

Convergence Modes: A Visual Comparison

Compare different sequences that illustrate convergence in probability vs. almost sure convergence. Each trajectory is a single realization.


Common Mistake: Convergence in Distribution \neq Convergence in Probability

Mistake:

Assuming that X_n \xrightarrow{d} X means X_n is "close to" X in some probabilistic sense, and using this to conclude statements about |X_n - X|.

Correction:

Convergence in distribution says only that the CDFs agree in the limit; the random variables need not even be defined on the same probability space. You cannot write \mathbb{P}(|X_n - X| > \epsilon) unless they share a common space.

Exception: If the limit X = c is a constant, then convergence in distribution does imply convergence in probability: F_{X_n}(x) \to \mathbf{1}_{x \geq c} forces all the probability mass to collapse to c.
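The collapse can be made precise in a short derivation using only the definition: both c - \epsilon and c + \epsilon/2 are continuity points of F_c(x) = \mathbf{1}_{x \geq c}, whose only jump is at x = c, so

```latex
\begin{align*}
\mathbb{P}(|X_n - c| \geq \epsilon)
  &\leq \mathbb{P}(X_n \leq c - \epsilon)
      + \mathbb{P}\!\left(X_n > c + \tfrac{\epsilon}{2}\right) \\
  &\leq F_{X_n}(c - \epsilon) + 1 - F_{X_n}\!\left(c + \tfrac{\epsilon}{2}\right)
  \longrightarrow 0 + 1 - 1 = 0.
\end{align*}
```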

Common Mistake: A.S. Convergence Is Not Pointwise Convergence Everywhere

Mistake:

Interpreting X_n \xrightarrow{\text{a.s.}} X as "X_n(\omega) \to X(\omega) for every \omega \in \Omega." This would be sure convergence, which is strictly stronger.

Correction:

Almost sure convergence allows a set N \subset \Omega with \mathbb{P}(N) = 0 where convergence fails. The word "almost" is doing essential work: there may be exceptional outcomes, but collectively they have zero probability.

Quick Check

If X_n \xrightarrow{L^2} X, which of the following is guaranteed?

  • X_n \xrightarrow{\text{a.s.}} X
  • X_n \xrightarrow{P} X
  • X_n \xrightarrow{L^3} X
  • None of the above

Historical Note: The Long Road to Clarifying Convergence

1909–1933

The distinction between convergence in probability and almost sure convergence was not immediately clear in the early development of probability theory. Émile Borel (1909) and Francesco Cantelli (1917) established the lemmas that connect these concepts. The full taxonomy of convergence modes was systematized by Andrei Kolmogorov in his Grundbegriffe der Wahrscheinlichkeitsrechnung (1933), which placed probability on a rigorous measure-theoretic foundation. It was only after Kolmogorov that the subtle differences between the modes, and the counterexamples showing they are genuinely distinct, became standard textbook material.

Almost Sure Convergence

X_n \xrightarrow{\text{a.s.}} X means \mathbb{P}(\lim_{n \to \infty} X_n = X) = 1. The sequence converges pathwise except on a set of probability zero.

Related: Convergence in Probability, Convergence in Distribution

Convergence in Probability

X_n \xrightarrow{P} X means \mathbb{P}(|X_n - X| \geq \epsilon) \to 0 for every \epsilon > 0. The probability of large deviations vanishes, but occasional excursions are allowed.

Related: Almost Sure Convergence, Convergence in Distribution

Convergence in Distribution

X_n \xrightarrow{d} X means F_{X_n}(x) \to F_X(x) at every continuity point of F_X. The weakest mode of convergence: only the shape of the distribution converges.

Related: Convergence in Probability, Almost Sure Convergence

Key Takeaway

The convergence hierarchy is: almost sure \Rightarrow in probability \Rightarrow in distribution, with L^r convergence also implying convergence in probability. The reverse implications fail in general, except that convergence in distribution to a constant upgrades to convergence in probability.