The Law of Large Numbers and Central Limit Theorem

The Two Great Limit Theorems

The law of large numbers (LLN) and the central limit theorem (CLT) are two of the most important results in probability theory. The LLN says that sample averages converge to the true mean; the CLT says how: the fluctuations of the sample average around the mean are asymptotically Gaussian, with standard deviation $\sigma/\sqrt{n}$.

Both theorems have elegant proofs via characteristic functions. The strategy is the same: compute the CF of the (normalized) partial sum, show it converges pointwise to a known CF, and invoke the Lévy continuity theorem.

Definition:

Convergence in Distribution

A sequence of random variables $X_n$ converges in distribution to $X$, written $X_n \xrightarrow{D} X$, if

$$\lim_{n \to \infty} F_{X_n}(x) = F(x)$$

at every continuity point $x$ of $F$, where $F$ is the CDF of $X$.

Convergence in distribution is the weakest of the standard modes of convergence. It does not require the $X_n$ to be defined on the same probability space. When the limit is a constant $c$, however, $X_n \xrightarrow{D} c$ is equivalent to $X_n \xrightarrow{P} c$ (convergence in probability).

Theorem: Lévy Continuity Theorem

Let $\{X_n\}$ be a sequence of random variables with CFs $\{\phi_n\}$.

(a) If $X_n \xrightarrow{D} X$, where $X$ has CF $\phi$, then $\phi_n(u) \to \phi(u)$ for all $u \in \mathbb{R}$.

(b) Conversely, if $\phi(u) = \lim_{n \to \infty}\phi_n(u)$ exists for all $u$ and $\phi$ is continuous at $u = 0$, then $\phi$ is the CF of some CDF $F$, and $X_n \xrightarrow{D} X$, where $X$ has CDF $F$.

Convergence in distribution is equivalent to pointwise convergence of CFs, provided the limit is continuous at the origin. This is the bridge that allows us to prove limit theorems by working entirely in the transform domain.


Theorem: The Law of Large Numbers (via CF)

Let $X_1, X_2, \ldots$ be i.i.d. with finite mean $\mu = \mathbb{E}[X_1]$ and partial sum $S_n = \sum_{i=1}^n X_i$. Then

$$\frac{S_n}{n} \xrightarrow{D} \mu.$$

The sample average $S_n/n$ concentrates around the true mean $\mu$. As $n$ grows, the CF of $S_n/n$ converges to $e^{j\mu u}$, which is the CF of the constant $\mu$. A degenerate (constant) limit in distribution implies convergence in probability.
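The CF convergence behind the LLN can be checked numerically. The sketch below assumes exponential summands with mean 2 (a choice made here for illustration, not one taken from the text), because their CF has the closed form $1/(1 - j\mu u)$. For i.i.d. summands the CF of $S_n/n$ is $[\phi_X(u/n)]^n$, and the code measures how far this is from $e^{j\mu u}$ on a grid of $u$ values. The function names are hypothetical.

```python
import cmath


def cf_exponential(u: float, mean: float) -> complex:
    """CF of an exponential random variable with the given mean: 1 / (1 - j*mean*u)."""
    return 1.0 / (1.0 - 1j * mean * u)


def cf_sample_mean(u: float, n: int, mean: float) -> complex:
    """CF of S_n/n for i.i.d. exponential summands: phi_X(u/n) ** n."""
    return cf_exponential(u / n, mean) ** n


def max_cf_error(n: int, mean: float, u_max: float = 3.0, points: int = 121) -> float:
    """Largest |phi_{S_n/n}(u) - e^{j*mean*u}| on an evenly spaced grid over [-u_max, u_max]."""
    grid = [-u_max + 2.0 * u_max * k / (points - 1) for k in range(points)]
    return max(
        abs(cf_sample_mean(u, n, mean) - cmath.exp(1j * mean * u)) for u in grid
    )


if __name__ == "__main__":
    # The gap to the CF of the constant mu should shrink as n grows.
    for n in (10, 100, 1000, 10000):
        print(n, max_cf_error(n, mean=2.0))
```

The printed gap should shrink roughly like $1/n$ on a fixed grid, which is consistent with the leading correction term $-\mu^2 u^2/(2n)$ in the exponent for this particular distribution.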

Theorem: The Central Limit Theorem (via CF)

Let $X_1, X_2, \ldots$ be i.i.d. with mean $\mu$ and variance $\sigma^2 > 0$. Then

$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{D} \mathcal{N}(0, 1).$$

After centering and scaling, the distribution of the partial sum approaches a Gaussian. The CF of the standardized sum converges to $e^{-u^2/2}$, the CF of the standard Gaussian. Intuitively, this is because the higher cumulants of the standardized sum ($\kappa_3, \kappa_4, \ldots$), when they exist, scale as $n^{-1/2}, n^{-1}, \ldots$ relative to $\kappa_2$, so they vanish in the limit.
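The CF argument can be written out in a few lines. The following is a sketch of the standard derivation, not a full proof; the standardized summand $Y_i$ is notation introduced here for convenience, and the Taylor step uses only the finiteness of the variance.

```latex
% Sketch of the CF argument; Y_i is notation introduced here, not in the text.
\begin{align*}
Y_i &= \frac{X_i - \mu}{\sigma}, \qquad \mathbb{E}[Y_i] = 0,\ \mathbb{E}[Y_i^2] = 1, \\
U_n &= \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i, \\
\phi_{U_n}(u) &= \left[\phi_Y\!\left(\frac{u}{\sqrt{n}}\right)\right]^n
  && \text{(i.i.d. summands)} \\
&= \left[1 - \frac{u^2}{2n} + o\!\left(\frac{1}{n}\right)\right]^n
  && \text{(second-order Taylor expansion of } \phi_Y \text{ at } 0\text{)} \\
&\longrightarrow e^{-u^2/2} \quad \text{as } n \to \infty .
\end{align*}
```

Since $e^{-u^2/2}$ is continuous at $u = 0$, part (b) of the continuity theorem turns this pointwise limit into convergence in distribution to $\mathcal{N}(0, 1)$.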

The CLT in Action: CF Convergence to Gaussian

Watch how the characteristic function of the standardized sum $U_n = (S_n - n\mu)/(\sigma\sqrt{n})$ converges to the Gaussian CF $e^{-u^2/2}$ as $n$ increases. The real part (top) and imaginary part (bottom) are shown.
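The same convergence can be reproduced without the interactive plot. The sketch below assumes Bernoulli summands with $p = 0.3$ (an illustrative choice, not taken from the text); the asymmetry makes the imaginary part of the CF nonzero for finite $n$, so both parts can be seen approaching the Gaussian CF, whose imaginary part is zero. The function names are hypothetical.

```python
import cmath
import math


def cf_standardized_sum(u: float, n: int, p: float) -> complex:
    """CF of U_n = (S_n - n*mu) / (sigma*sqrt(n)) for i.i.d. Bernoulli(p) summands."""
    sigma = math.sqrt(p * (1.0 - p))
    t = u / (sigma * math.sqrt(n))
    # CF of the centered summand X - p, evaluated at t.
    cf_centered = cmath.exp(-1j * t * p) * ((1.0 - p) + p * cmath.exp(1j * t))
    return cf_centered ** n


def gaussian_cf(u: float) -> float:
    """CF of the standard Gaussian, e^{-u^2/2}."""
    return math.exp(-0.5 * u * u)


def max_gap(n: int, p: float, u_max: float = 3.0, points: int = 121) -> float:
    """Largest |phi_{U_n}(u) - e^{-u^2/2}| on an evenly spaced grid over [-u_max, u_max]."""
    grid = [-u_max + 2.0 * u_max * k / (points - 1) for k in range(points)]
    return max(abs(cf_standardized_sum(u, n, p) - gaussian_cf(u)) for u in grid)


if __name__ == "__main__":
    for n in (10, 100, 10000):
        value = cf_standardized_sum(1.0, n, 0.3)
        # Real part should approach e^{-1/2}; imaginary part should approach 0.
        print(n, value.real, value.imag, max_gap(n, 0.3))
```

For this skewed example the gap is dominated by the third-cumulant term and so should decay roughly like $n^{-1/2}$; a symmetric distribution would typically converge faster.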


Visualizing the Central Limit Theorem

This animation shows the PDF of $U_n = (S_n - n\mu)/(\sigma\sqrt{n})$ for increasing $n$, overlaid with the standard Gaussian density. The convergence is visible even for small $n$ and becomes compelling by $n \approx 30$.
The standardized sum of $n$ i.i.d. random variables converges in distribution to $\mathcal{N}(0,1)$ as $n$ grows.

Historical Note: The Long Road to the Central Limit Theorem

18th-20th century

The CLT has a rich history spanning three centuries. De Moivre (1733) proved the first version for Bernoulli trials. Laplace (1810) extended it using generating functions. Chebyshev (1887) attempted a proof via moments, which Markov (1898) completed. The modern proof via characteristic functions, clean and general, is due to Lévy (1925) and Lindeberg (1922). Lindeberg's condition, the most general sufficient condition for the CLT, removes the identical distribution requirement, needing only that no single summand dominates.

The CLT is arguably the most practically important theorem in all of mathematics: it explains why the Gaussian distribution appears so ubiquitously in nature and engineering.

Common Mistake: The CLT Is About the Limit, Not the Rate

Mistake:

Assuming that the CLT implies a good Gaussian approximation for small $n$ (e.g., $n = 5$). The CLT says nothing about the quality of the approximation for finite $n$.

Correction:

The Berry-Esseen theorem quantifies the rate: the maximum CDF error of the standardized sum is bounded by $C\rho/(\sigma^3\sqrt{n})$, where $\rho = \mathbb{E}[|X-\mu|^3]$. The constant satisfies $C < 0.4748$ (Shevtsova, 2011). For heavy-tailed or asymmetric distributions, convergence can be slow. Always check with simulations or the Berry-Esseen bound before trusting the Gaussian approximation for moderate $n$.
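As a concrete check, the bound can be compared with the exact CDF error in a case where both are computable. The sketch below assumes Bernoulli summands with $p = 0.3$ (an illustrative choice), for which $S_n$ is binomial and $\rho = p(1-p)\,[(1-p)^2 + p^2]$. It uses the value 0.4748 quoted above as the constant; the function names are hypothetical.

```python
import math

C_BERRY_ESSEEN = 0.4748  # constant quoted in the text above


def normal_cdf(x: float) -> float:
    """Standard Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def berry_esseen_bound(n: int, p: float) -> float:
    """C * rho / (sigma^3 * sqrt(n)) for i.i.d. Bernoulli(p) summands."""
    sigma = math.sqrt(p * (1.0 - p))
    rho = p * (1.0 - p) * ((1.0 - p) ** 2 + p ** 2)  # E|X - p|^3
    return C_BERRY_ESSEEN * rho / (sigma ** 3 * math.sqrt(n))


def exact_cdf_error(n: int, p: float) -> float:
    """sup_x |P(U_n <= x) - Phi(x)| for the standardized Binomial(n, p) sum."""
    mean, std = n * p, math.sqrt(n * p * (1.0 - p))
    worst, cdf = 0.0, 0.0
    for k in range(n + 1):
        phi_x = normal_cdf((k - mean) / std)
        # Left limit of the step CDF just before the jump at k.
        worst = max(worst, abs(cdf - phi_x))
        cdf += math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        # Value of the step CDF at the jump itself.
        worst = max(worst, abs(cdf - phi_x))
    return worst


if __name__ == "__main__":
    for n in (5, 30, 200):
        print(n, exact_cdf_error(n, 0.3), berry_esseen_bound(n, 0.3))
```

Because the binomial CDF is a step function and the Gaussian CDF is increasing, the supremum is attained at a jump point or just before one, which is why only those points are checked. The bound is a worst case over all distributions with the same $\rho/\sigma^3$, so the exact error is typically smaller than the bound.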

Quick Check

In the CLT proof via CFs, what is the key property of the Gaussian CF $e^{-u^2/2}$ that ensures the limit is well-defined?

It is continuous at $u = 0$, satisfying the Lévy continuity theorem

It is bounded above by $1$

It is the only CF that is real-valued

It is analytic on the whole real line