Modes of Convergence
Why Multiple Modes of Convergence?
In calculus, a sequence of numbers either converges or it does not. But a sequence of random variables can converge in several distinct senses, depending on how strictly we demand agreement between $X_n$ and the limit $X$. The distinction is not pedantic: it determines which tools we can use and which conclusions we can draw. The Weak Law guarantees convergence in probability; the Strong Law upgrades this to almost sure convergence; the CLT delivers convergence in distribution. Each mode tells a different story about what happens as $n \to \infty$.
Definition: Almost Sure Convergence
Almost Sure Convergence
A sequence $X_1, X_2, \ldots$ converges to $X$ almost surely (a.s.), written $X_n \xrightarrow{\text{a.s.}} X$, if
$$P\left(\lim_{n \to \infty} X_n = X\right) = 1.$$
Equivalently, for every $\epsilon > 0$:
$$P\left(|X_n - X| > \epsilon \text{ for infinitely many } n\right) = 0.$$
The set of outcomes $\omega$ where $X_n(\omega) \not\to X(\omega)$ has probability zero.
Almost sure convergence is pathwise: for (almost) every realization $\omega$ of the random experiment, the sequence of numbers $X_1(\omega), X_2(\omega), \ldots$ converges to $X(\omega)$ in the ordinary calculus sense.
Definition: Convergence in Probability
Convergence in Probability
A sequence $X_1, X_2, \ldots$ converges to $X$ in probability, written $X_n \xrightarrow{P} X$, if for every $\epsilon > 0$:
$$\lim_{n \to \infty} P\left(|X_n - X| > \epsilon\right) = 0.$$
This says that the probability of a large deviation between $X_n$ and $X$ vanishes, but it does not preclude occasional excursions.
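As a quick numerical illustration (a minimal sketch; the specific choice $X_n \sim \text{Bernoulli}(1/n)$ with limit $0$ is an assumption made here for simplicity, not taken from the definition above), Monte Carlo estimates of the deviation probability shrink as $n$ grows:

```python
import random

random.seed(0)

def deviation_prob(n, eps=0.5, trials=20_000):
    """Monte Carlo estimate of P(|X_n - 0| > eps) for X_n ~ Bernoulli(1/n)."""
    exceed = 0
    for _ in range(trials):
        x_n = 1.0 if random.random() < 1 / n else 0.0
        if abs(x_n - 0.0) > eps:
            exceed += 1
    return exceed / trials

# The true deviation probability is exactly 1/n, so the estimates shrink toward 0.
for n in (2, 10, 100):
    print(n, deviation_prob(n))
```

Note that even for large $n$ an individual draw can still equal $1$; convergence in probability only makes such excursions increasingly rare.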
Definition: Convergence in $r$-th Mean ($L^r$)
Convergence in $r$-th Mean ($L^r$)
For $r \geq 1$, a sequence converges to $X$ in $L^r$, written $X_n \xrightarrow{L^r} X$, if
$$\lim_{n \to \infty} E\left[|X_n - X|^r\right] = 0.$$
For $r = 2$, this is mean-square convergence: $E\left[(X_n - X)^2\right] \to 0$.
$L^r$ convergence controls the $r$-th moment of the deviation. It is the strongest mode when $r$ is large, but it requires the moments to exist.
Definition: Convergence in Distribution
Convergence in Distribution
A sequence $X_1, X_2, \ldots$ converges to $X$ in distribution, written $X_n \xrightarrow{d} X$, if
$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$
at every point $x$ where $F_X$ is continuous.
Equivalently, by the Lévy continuity theorem: $X_n \xrightarrow{d} X$ if and only if $\varphi_{X_n}(t) \to \varphi_X(t)$ for all $t \in \mathbb{R}$, where $\varphi$ denotes the characteristic function.
Convergence in distribution is the weakest mode. It says nothing about individual realizations; it only requires that the CDFs align. The random variables need not even live on the same probability space.
Definition: Sample Mean
Sample Mean
Given a sequence $X_1, X_2, \ldots$ of random variables, the sample mean (empirical average) is
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
When the $X_i$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, we have $E[\bar{X}_n] = \mu$ and $\operatorname{Var}(\bar{X}_n) = \sigma^2 / n$. The law of large numbers describes the sense in which $\bar{X}_n$ converges to $\mu$.
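A short simulation can check the variance formula empirically (a sketch: the choice of standard normal $X_i$, the sample size, and the number of repetitions are illustrative assumptions):

```python
import random
import statistics

random.seed(1)

def sample_mean(n):
    """One realization of the sample mean of n i.i.d. N(0, 1) draws."""
    return sum(random.gauss(0.0, 1.0) for _ in range(n)) / n

# Repeat the experiment many times and compare the empirical variance of the
# sample mean with the theoretical value sigma^2 / n = 1/100 = 0.01.
n = 100
means = [sample_mean(n) for _ in range(2000)]
print("empirical mean of X-bar:", statistics.fmean(means))       # close to mu = 0
print("empirical variance of X-bar:", statistics.pvariance(means))  # close to 0.01
```

The shrinking variance is what drives the law of large numbers: the distribution of $\bar{X}_n$ concentrates around $\mu$ as $n$ grows.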
Theorem: Relationships Between Convergence Modes
The four modes of convergence satisfy the following implications:
- $X_n \xrightarrow{\text{a.s.}} X \;\Rightarrow\; X_n \xrightarrow{P} X$
- $X_n \xrightarrow{L^r} X \;\Rightarrow\; X_n \xrightarrow{P} X$ (for any $r \geq 1$)
- $X_n \xrightarrow{P} X \;\Rightarrow\; X_n \xrightarrow{d} X$
No other general implications hold. In particular:
- Convergence in probability does not imply a.s. convergence.
- Convergence in distribution does not imply convergence in probability.
- $L^r$ convergence and a.s. convergence are not comparable in general.
Exception: If the limit $X$ is a constant $c$, then $X_n \xrightarrow{d} c$ implies $X_n \xrightarrow{P} c$.
Almost sure convergence controls every sample path; $L^r$ convergence controls the average size of the deviation; convergence in probability allows rare large deviations; convergence in distribution only matches the histograms. Each mode in this list is weaker than the one before it.
a.s. implies in probability
Fix $\epsilon > 0$. Define $A_n = \{|X_n - X| > \epsilon\}$. Almost sure convergence means $P(\limsup_n A_n) = 0$. Since $\limsup_n P(A_n) \leq P(\limsup_n A_n)$, we have $P(A_n) \to 0$ (if infinitely many $A_n$ had probability bounded away from zero, $\limsup_n A_n$ would have positive probability). Hence $X_n \xrightarrow{P} X$.
$L^r$ implies in probability (Markov)
By Markov's inequality applied to $|X_n - X|^r$:
$$P\left(|X_n - X| > \epsilon\right) = P\left(|X_n - X|^r > \epsilon^r\right) \leq \frac{E\left[|X_n - X|^r\right]}{\epsilon^r} \to 0.$$
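The inequality is easy to check numerically (a sketch; taking $X_n$ uniform on $[0, 1/n]$ with limit $X = 0$ is an assumption chosen because both sides then have closed forms):

```python
def deviation_prob(n, eps):
    """Exact P(|X_n - 0| > eps) for X_n ~ Uniform(0, 1/n)."""
    return max(0.0, 1.0 - n * eps)

def markov_bound(n, eps, r):
    """Markov bound E[|X_n|^r] / eps^r, with E[X_n^r] = 1 / ((r + 1) * n**r)."""
    return 1.0 / ((r + 1) * n**r * eps**r)

# The bound dominates the exact probability, and both vanish as n grows --
# which is the content of "L^r convergence implies convergence in probability".
for n in (1, 10, 100):
    for r in (1, 2):
        eps = 0.05
        assert deviation_prob(n, eps) <= markov_bound(n, eps, r) + 1e-12
```

The Markov bound is often loose for small $n$ (it can exceed $1$), but it still forces the deviation probability to zero whenever the $r$-th moment of the error vanishes.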
In probability implies in distribution
Fix $x$ where $F_X$ is continuous. For any $\epsilon > 0$:
$$F_{X_n}(x) = P(X_n \leq x) \leq P(X \leq x + \epsilon) + P(|X_n - X| > \epsilon),$$
$$F_X(x - \epsilon) = P(X \leq x - \epsilon) \leq P(X_n \leq x) + P(|X_n - X| > \epsilon).$$
Taking $n \to \infty$: $\limsup_n F_{X_n}(x) \leq F_X(x + \epsilon)$. Similarly $\liminf_n F_{X_n}(x) \geq F_X(x - \epsilon)$. Since $F_X$ is continuous at $x$, let $\epsilon \downarrow 0$: $F_{X_n}(x) \to F_X(x)$.
Four Modes of Convergence
| Mode | Notation | Definition | Requires same space? | Strength |
|---|---|---|---|---|
| Almost sure | $X_n \xrightarrow{\text{a.s.}} X$ | $P(\lim_{n} X_n = X) = 1$ | Yes | Strong |
| In probability | $X_n \xrightarrow{P} X$ | $P(\lvert X_n - X\rvert > \epsilon) \to 0$ | Yes | Medium |
| $r$-th mean | $X_n \xrightarrow{L^r} X$ | $E[\lvert X_n - X\rvert^r] \to 0$ | Yes | Medium |
| In distribution | $X_n \xrightarrow{d} X$ | $F_{X_n}(x) \to F_X(x)$ at cont. points | No | Weak |
Example: Convergence in Probability but Not Almost Surely
Construct a sequence that converges to $0$ in probability but not almost surely. This shows that the implication a.s. $\Rightarrow$ in probability cannot be reversed.
The typewriter sequence
On $([0,1], \mathcal{B}, \lambda)$, define $X_n = \mathbf{1}_{[j/2^k,\,(j+1)/2^k]}$, where $n$ corresponds to the pair $(k, j)$ with $0 \leq j < 2^k$, obtained by enumerating the dyadic intervals: $[0,1]$, $[0,\tfrac{1}{2}]$, $[\tfrac{1}{2},1]$, $[0,\tfrac{1}{4}]$, and so on.
Convergence in probability
For $0 < \epsilon < 1$: $P(|X_n| > \epsilon) = \lambda\left([j/2^k, (j+1)/2^k]\right) = 2^{-k} \to 0$ as $n \to \infty$ (since $k \to \infty$). So $X_n \xrightarrow{P} 0$.
Failure of a.s. convergence
For every $\omega \in [0,1]$, there are infinitely many $n$ with $X_n(\omega) = 1$ (since at every level $k$, $\omega$ falls in $[j/2^k, (j+1)/2^k]$ for some $j$). Thus $X_n(\omega) \not\to 0$ for any $\omega$, so a.s. convergence fails completely.
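The enumeration can be made concrete in code (a sketch assuming the indexing $n = 2^k + j$ with $0 \leq j < 2^k$, which matches the order of dyadic intervals described above):

```python
def interval(n):
    """Map n = 2**k + j (0 <= j < 2**k) to the dyadic interval [j/2**k, (j+1)/2**k]."""
    k = n.bit_length() - 1
    j = n - 2**k
    return k, j / 2**k, (j + 1) / 2**k

def X(n, w):
    """The n-th typewriter indicator evaluated at the outcome w in [0, 1]."""
    _, a, b = interval(n)
    return 1.0 if a <= w <= b else 0.0

# P(|X_n| > eps) = 2**(-k) -> 0, so X_n -> 0 in probability ...
assert interval(1) == (0, 0.0, 1.0)   # [0, 1]
assert interval(3) == (1, 0.5, 1.0)   # [1/2, 1]

# ... yet any fixed outcome w is covered at every level k, so X_n(w) = 1
# infinitely often and X_n(w) does not converge to 0.
w = 0.3
hits = [n for n in range(1, 2**10) if X(n, w) == 1.0]
print(len(hits))  # prints 10: exactly one hit at each level k = 0, ..., 9
```

Because $0.3$ is not a dyadic rational, the "typewriter carriage" passes over it exactly once per level, no matter how far out in the sequence we look.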
Example: Convergence Without Almost Sure Convergence
Let $X_n = n\,\mathbf{1}_{[0,\,1/n^2]}$ on $[0,1]$ with Lebesgue measure. Show that $X_n \xrightarrow{L^1} 0$ but does not converge to $0$ a.s. Then modify the example to get $L^1$ convergence without a.s. convergence.
$L^1$ convergence
$E[|X_n - 0|] = n \cdot \lambda([0, 1/n^2]) = n / n^2 = 1/n \to 0$. So $X_n \xrightarrow{L^1} 0$.
Almost sure convergence also holds here
Actually, $\sum_{n} P(X_n \neq 0) = \sum_{n} \frac{1}{n^2} < \infty$, so by Borel-Cantelli, $X_n \neq 0$ only finitely often a.s., hence $X_n \to 0$ a.s.
For a true counterexample, use the typewriter sequence from the example "Convergence in Probability but Not Almost Surely": with $X_n = \mathbf{1}_{[j/2^k,\,(j+1)/2^k]}$, we get $E[|X_n|] = 2^{-k} \to 0$ (so $L^1$ convergence holds), but a.s. convergence fails.
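Both computations in this example are easy to verify numerically (a sketch assuming the sequence $X_n = n\,\mathbf{1}_{[0,1/n^2]}$ used in the solution; the cutoff $N$ is an arbitrary choice):

```python
# E|X_n| = n * lambda([0, 1/n**2]) = 1/n -> 0, so X_n -> 0 in L^1,
# while sum_n P(X_n != 0) = sum_n 1/n**2 stays bounded, so Borel-Cantelli
# forces X_n != 0 only finitely often a.s.
N = 10_000
l1_norms = [n * (1 / n**2) for n in range(1, N + 1)]   # equals 1/n
bc_sum = sum(1 / n**2 for n in range(1, N + 1))        # partial sum, < pi^2/6

print(l1_norms[-1])  # 0.0001: the L^1 norm vanishes
print(bc_sum)        # about 1.6448, just below pi^2/6 ~ 1.6449
```

The bounded Borel-Cantelli sum is exactly what rules this sequence out as a counterexample: tall spikes alone are not enough; they must also recur on a non-summable set of indices, as the typewriter sequence's do.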
Convergence Modes: A Visual Comparison
Compare different sequences that illustrate convergence in probability vs. almost sure convergence. Each trajectory is a single realization.
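Even without a plotting library, the comparison can be sketched numerically (assumptions made here for illustration: i.i.d. fair coin flips for the sample-mean path, and the dyadic indexing $n = 2^k + j$ for the typewriter path):

```python
import random

random.seed(42)

# Path 1: running sample mean of fair coin flips -- one realization settles
# near 1/2 and stays there (the a.s.-convergent trajectory).
flips, total, mean_path = 2000, 0, []
for i in range(1, flips + 1):
    total += random.randint(0, 1)
    mean_path.append(total / i)

# Path 2: typewriter sequence at a fixed outcome w -- the trajectory keeps
# spiking back up to 1, however large n gets (in probability only).
def typewriter(n, w):
    k = n.bit_length() - 1          # n = 2**k + j with 0 <= j < 2**k
    j = n - 2**k
    return 1.0 if j / 2**k <= w <= (j + 1) / 2**k else 0.0

w = 0.3
type_path = [typewriter(n, w) for n in range(1, flips + 1)]

print("last sample means:", mean_path[-3:])              # all close to 0.5
print("late typewriter spikes:", sum(type_path[1000:]))  # still hits 1 past n = 1000
```

A plot of these two trajectories side by side would show exactly the contrast described above: one curve flattens out, the other never stops returning to $1$.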
Common Mistake: Convergence in Distribution $\not\Rightarrow$ Convergence in Probability
Mistake:
Assuming that $X_n \xrightarrow{d} X$ means $X_n$ is "close to" $X$ in some probabilistic sense, and using this to conclude statements about $X_n - X$.
Correction:
Convergence in distribution says only that the CDFs agree in the limit; the random variables need not even be defined on the same probability space. You cannot even write $P(|X_n - X| > \epsilon)$ unless they share a common space.
Exception: If the limit is a constant $c$, then convergence in distribution does imply convergence in probability: $X_n \xrightarrow{d} c$ forces all the probability mass to collapse to $c$.
Common Mistake: A.S. Convergence Is Not Pointwise Convergence Everywhere
Mistake:
Interpreting $X_n \xrightarrow{\text{a.s.}} X$ as "$X_n(\omega) \to X(\omega)$ for every $\omega \in \Omega$." This would be sure convergence, which is strictly stronger.
Correction:
Almost sure convergence allows a set $N$ with $P(N) = 0$ where convergence fails. The word "almost" is doing essential work: there may be exceptional outcomes, but collectively they have zero probability.
Quick Check
If $X_n \xrightarrow{L^2} X$, which of the following is guaranteed?
- $X_n \xrightarrow{\text{a.s.}} X$
- $X_n \xrightarrow{P} X$ (correct)
- None of the above
$L^2$ convergence implies convergence in probability by Markov's inequality: $P(|X_n - X| > \epsilon) \leq E[(X_n - X)^2] / \epsilon^2 \to 0$. It does not imply almost sure convergence.
Historical Note: The Long Road to Clarifying Convergence
1909–1933. The distinction between convergence in probability and almost sure convergence was not immediately clear in the early development of probability theory. Émile Borel (1909) and Francesco Cantelli (1917) established the lemmas that connect these concepts. The full taxonomy of convergence modes was systematized by Andrei Kolmogorov in his Grundbegriffe der Wahrscheinlichkeitsrechnung (1933), which placed probability on a rigorous measure-theoretic foundation. It was only after Kolmogorov that the subtle differences between the modes, and the counterexamples showing they are genuinely distinct, became standard textbook material.
Almost Sure Convergence
$X_n \xrightarrow{\text{a.s.}} X$ means $P(\lim_{n \to \infty} X_n = X) = 1$. The sequence converges pathwise except on a set of probability zero.
Related: Convergence in Probability, Convergence in Distribution
Convergence in Probability
$X_n \xrightarrow{P} X$ means $P(|X_n - X| > \epsilon) \to 0$ for every $\epsilon > 0$. The probability of large deviations vanishes, but occasional excursions are allowed.
Related: Almost Sure Convergence, Convergence in Distribution
Convergence in Distribution
$X_n \xrightarrow{d} X$ means $F_{X_n}(x) \to F_X(x)$ at every continuity point of $F_X$. The weakest mode of convergence: only the shape of the distribution converges.
Related: Convergence in Probability, Almost Sure Convergence
Key Takeaway
The convergence hierarchy is: almost sure $\Rightarrow$ in probability $\Rightarrow$ in distribution, with $L^r$ convergence also implying convergence in probability. The reverse implications fail in general, except that convergence in distribution to a constant upgrades to convergence in probability.