The Central Limit Theorem

Why the Gaussian Appears Everywhere

The LLN tells us that $\bar{X}_n \approx \mu$ for large $n$. The CLT answers the follow-up question: how is $\bar{X}_n$ distributed around $\mu$? The answer — Gaussian, regardless of the underlying distribution — is one of the most remarkable facts in all of mathematics.

For telecommunications, this has a profound operational consequence. Thermal noise is the sum of tiny random contributions from many electrons. Aggregate interference in a dense network is the sum of many weak signals. Channel estimation errors accumulate over many pilot symbols. In all these cases, the CLT explains why Gaussian models work so well: the sum of many small independent effects is approximately Gaussian, no matter what the individual effects look like.

Theorem: Central Limit Theorem (CLT)

Let $X_1, X_2, \ldots$ be i.i.d. with mean $\mu$ and variance $\sigma^2 \in (0, \infty)$. Define the standardized sum:

$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}}.$$

Then:

$$Z_n \xrightarrow{d} \mathcal{N}(0,1),$$

that is, $\lim_{n \to \infty} \mathbb{P}(Z_n \leq z) = \Phi(z)$ for all $z \in \mathbb{R}$, where $\Phi$ is the standard normal CDF.

The LLN says $\bar{X}_n \approx \mu$ with fluctuations of order $\sigma/\sqrt{n}$. The CLT says the shape of those fluctuations is Gaussian, regardless of the shape of the original distribution. The characteristic function proof reveals why: the characteristic function of the standardized sum converges to $e^{-u^2/2}$ because every cumulant of $Z_n$ beyond the second scales as $n^{1-k/2}$ and vanishes as $n \to \infty$.


The Operational Content of the CLT

The CLT gives us a practical approximation: for large $n$,

$$\bar{X}_n \approx \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right).$$

This means:

  • $\mathbb{P}(\bar{X}_n > \mu + \delta) \approx Q(\delta\sqrt{n}/\sigma)$, where $Q(z) = 1 - \Phi(z)$ is the Gaussian tail function
  • The 95% confidence interval for $\mu$ is approximately $\bar{X}_n \pm 1.96\,\sigma/\sqrt{n}$
  • The "error" is of order $\sigma/\sqrt{n}$, the universal convergence rate for i.i.d. averaging
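These formulas are easy to check numerically. A minimal sketch, using made-up values $n = 400$, $\sigma = 2$, $\delta = 0.2$ (not from the text), with the $Q$-function written via the complementary error function:

```python
import math

def Q(z):
    """Gaussian tail probability: Q(z) = P(Z > z) for Z ~ N(0,1)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Illustrative values (assumed, not from the text)
n, sigma, delta = 400, 2.0, 0.2

# P(Xbar_n > mu + delta) ~ Q(delta * sqrt(n) / sigma); here the argument is 2
tail = Q(delta * math.sqrt(n) / sigma)

# 95% confidence half-width: 1.96 * sigma / sqrt(n)
half_width = 1.96 * sigma / math.sqrt(n)

print(f"tail probability ≈ {tail:.4f}, CI half-width = {half_width:.3f}")
```

With these numbers the tail argument is exactly $2$, so the sketch reproduces the familiar $Q(2) \approx 0.023$.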

The question "how large must $n$ be for this approximation to be good?" is answered by the Berry-Esseen theorem below.

Theorem: Berry-Esseen Theorem

Let $X_1, X_2, \ldots$ be i.i.d. with mean $\mu$, variance $\sigma^2 > 0$, and finite third absolute moment $\rho = \mathbb{E}[|X_1 - \mu|^3] < \infty$. Then for all $n \geq 1$ and all $z \in \mathbb{R}$:

$$\left|\mathbb{P}\!\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \leq z\right) - \Phi(z)\right| \leq \frac{C\rho}{\sigma^3 \sqrt{n}},$$

where $C$ is a universal constant. The best known value is $C \leq 0.4748$ (Shevtsova, 2011).

The CLT says the CDF converges to the Gaussian CDF, but how fast? Berry-Esseen says the convergence rate is $O(1/\sqrt{n})$, uniformly over all $z$. The constant depends on $\rho/\sigma^3$, which measures how "non-Gaussian" the original distribution is. For symmetric distributions, the bound is tighter.
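The bound can be checked by simulation. A sketch for $\text{Exp}(1)$ summands (my choice of distribution, with $\mu = \sigma = 1$ and $\rho = \mathbb{E}|X-1|^3 = 12/e - 2 \approx 2.41$ by direct integration), comparing a Monte Carlo estimate of the sup-norm CDF error with the Berry-Esseen bound:

```python
import math, random

random.seed(0)

def Phi(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def sup_cdf_error(n, trials=20000):
    """Monte Carlo estimate of sup_z |F_{Z_n}(z) - Phi(z)| for Exp(1) summands."""
    zs = sorted(
        (sum(random.expovariate(1.0) for _ in range(n)) - n) / math.sqrt(n)
        for _ in range(trials)
    )
    # Kolmogorov-style distance: compare Phi with the ECDF on both sides of each jump
    return max(
        max(abs((i + 1) / trials - Phi(z)), abs(i / trials - Phi(z)))
        for i, z in enumerate(zs)
    )

rho = 12 / math.e - 2  # E|X - 1|^3 for Exp(1), about 2.41
results = {}
for n in (10, 40, 160):
    err = sup_cdf_error(n)
    bound = 0.4748 * rho / math.sqrt(n)  # Berry-Esseen bound with sigma = 1
    results[n] = (err, bound)
    print(f"n={n:4d}  empirical sup error ≈ {err:.4f}  bound = {bound:.4f}")
```

The empirical error sits well below the bound (Berry-Esseen is worst-case over all distributions) while decaying at the same $1/\sqrt{n}$ rate.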


CLT Convergence: Histogram of $Z_n$ Approaching the Gaussian

For i.i.d. samples from a chosen distribution, observe how the histogram of the standardized sum $Z_n$ approaches the standard normal bell curve as $n$ grows.


Berry-Esseen Rate: $\sup_z |F_n(z) - \Phi(z)|$ vs. $n$

Observe how the maximum CDF error between the standardized sum and the Gaussian decreases as $O(1/\sqrt{n})$, matching the Berry-Esseen prediction.


Example: CLT for Coin Flips: The de Moivre-Laplace Theorem

Let $X_1, \ldots, X_n$ be i.i.d. $\text{Bernoulli}(1/2)$. Use the CLT to approximate $\mathbb{P}(S_n \geq 60)$ where $S_n = \sum_{i=1}^{100} X_i$ (the number of heads in 100 fair coin flips).
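A worked sketch of this example: $S_n$ has mean $np = 50$ and standard deviation $\sqrt{np(1-p)} = 5$, so the CLT gives $\mathbb{P}(S_n \geq 60) \approx Q(2)$. The code below compares this with the exact binomial tail, with and without the half-unit continuity correction:

```python
import math

def Q(z):
    """Gaussian tail probability Q(z) = 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, p, k = 100, 0.5, 60
mu, sigma = n * p, math.sqrt(n * p * (1 - p))  # mu = 50, sigma = 5

# Exact binomial tail P(S_n >= 60)
exact = sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n

clt_plain = Q((k - mu) / sigma)            # Q(2) ≈ 0.0228
clt_corrected = Q((k - 0.5 - mu) / sigma)  # Q(1.9), continuity-corrected

print(f"exact = {exact:.4f}, plain CLT = {clt_plain:.4f}, "
      f"corrected = {clt_corrected:.4f}")
```

The continuity correction accounts for the lattice of integer outcomes and lands much closer to the exact value than the plain CLT estimate.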

Example: CLT for Waiting Times

A call center receives calls with i.i.d. exponential inter-arrival times with rate $\lambda = 2$ calls/minute. Approximate the probability that the total time for 100 calls exceeds 55 minutes.
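A worked sketch: the total time $T$ is a sum of 100 i.i.d. $\text{Exp}(2)$ variables, so $\mathbb{E}[T] = 50$ min and $\text{sd}(T) = 5$ min, giving $\mathbb{P}(T > 55) \approx Q(1)$. Since $T$ is Erlang, the exact tail can be computed from the identity $\mathbb{P}(T > t) = \mathbb{P}(\text{Poisson}(\lambda t) \leq n - 1)$:

```python
import math

def Q(z):
    """Gaussian tail probability."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, lam, t = 100, 2.0, 55.0
mu, sd = n / lam, math.sqrt(n) / lam  # E[T] = 50 min, sd(T) = 5 min

clt = Q((t - mu) / sd)  # Q(1) ≈ 0.1587

# Exact: T ~ Erlang(n, lam), and P(T > t) = P(Poisson(lam*t) <= n-1),
# summed term by term to avoid overflow
term, exact = math.exp(-lam * t), 0.0
for k in range(n):
    exact += term
    term *= lam * t / (k + 1)

print(f"CLT approximation: {clt:.4f}, exact Erlang tail: {exact:.4f}")
```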

Common Mistake: The CLT Is an Asymptotic Statement — Small $n$ Requires Caution

Mistake:

Applying the CLT with $n = 5$ or $n = 10$ and trusting the Gaussian approximation for tail probabilities.

Correction:

The CLT guarantees convergence as $n \to \infty$, but the rate depends on the underlying distribution. For heavy-tailed or strongly skewed distributions (large $\rho/\sigma^3$), the Berry-Esseen bound shows convergence can be slow. In particular:

  • Bernoulli with $p = 0.5$: excellent by $n = 30$
  • Exponential: reasonable by $n = 50$
  • Chi-squared with 1 d.f. (very skewed): may need $n > 100$

For tail probabilities ($z > 2$, say), the approximation degrades faster than at the center.
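The tail degradation is easy to see by simulation. A sketch (sample sizes and trial counts are my choices) estimating $\mathbb{P}(Z_n > 2)$ for chi-squared(1) summands, the worst case in the list above, against the Gaussian prediction $Q(2) \approx 0.023$:

```python
import math, random

random.seed(2)

def Q(z):
    """Gaussian tail probability."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def tail_estimate(n, trials=50000):
    """Monte Carlo estimate of P(Z_n > 2) for chi-squared(1) summands.

    Chi-squared(1) has mu = 1 and sigma^2 = 2; a chi-squared(1) draw
    is simulated as the square of a standard normal.
    """
    mu, sigma = 1.0, math.sqrt(2.0)
    hits = 0
    for _ in range(trials):
        s = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
        if (s - n * mu) / (sigma * math.sqrt(n)) > 2.0:
            hits += 1
    return hits / trials

gauss_tail = Q(2.0)  # ≈ 0.0228
t10, t100 = tail_estimate(10), tail_estimate(100)
print(f"Gaussian: {gauss_tail:.4f}  chi2(1) sums, n=10: {t10:.4f}  n=100: {t100:.4f}")
```

At $n = 10$ the true tail is well over $1.5\times$ the Gaussian prediction; by $n = 100$ the gap has shrunk substantially, consistent with the $O(1/\sqrt{n})$ rate.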

Historical Note: The Central Limit Theorem: 200 Years of Refinement

1733–1942

Abraham de Moivre (1733) proved the CLT for fair coin flips. Pierre-Simon Laplace (1812) extended it to general distributions, though his proof lacked rigor by modern standards. The first rigorous proof using characteristic functions was given by Aleksandr Lyapunov (1901). Lindeberg (1922) and Feller (1935) established the definitive necessary and sufficient conditions for the CLT to hold for independent (not necessarily identically distributed) summands. The Berry-Esseen theorem (1941-42) finally quantified the rate of convergence.

The name "central" was coined by George Pólya in 1920, reflecting its central importance in probability theory — not any geometric meaning.

🔧 Engineering Note

Why Gaussian Noise Models Work in Communications

In a communication receiver, the thermal noise at the antenna is the aggregate effect of random electron motion across billions of charge carriers. Each contributes a tiny random voltage, and the CLT guarantees that their sum is approximately Gaussian. This justifies the $\mathcal{N}(0, \sigma^2)$ noise model used throughout signal processing and information theory.

More precisely, the noise in a bandwidth $W$ over a time interval $T$ is the sum of roughly $2WT$ independent noise "samples" (by the sampling theorem). For typical values ($W = 10$ MHz, $T = 1$ ms), this is $20{,}000$ independent contributions — more than enough for the CLT to provide an excellent approximation.
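This mechanism can be simulated directly. A sketch (with the number of contributions scaled down from $20{,}000$ to $2{,}000$ to keep it fast, and uniform voltages as a crude stand-in for the electron contributions) showing that the summed noise already has near-Gaussian skewness and kurtosis:

```python
import math, random, statistics

random.seed(3)

N = 2000  # contributions per noise sample (scaled-down stand-in for 2WT)
M = 2000  # number of noise samples to collect

# Each noise sample is the sum of N tiny uniform "electron" voltages
samples = [sum(random.uniform(-1.0, 1.0) for _ in range(N)) for _ in range(M)]

# Standardize, then estimate the third and fourth moments
mean = statistics.fmean(samples)
sd = statistics.stdev(samples)
z = [(s - mean) / sd for s in samples]
skew = statistics.fmean(v ** 3 for v in z)
excess_kurt = statistics.fmean(v ** 4 for v in z) - 3.0

print(f"skewness ≈ {skew:.3f}, excess kurtosis ≈ {excess_kurt:.3f}")
```

Both statistics come out near zero, the Gaussian values, even though each individual contribution is flat-topped and decidedly non-Gaussian.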

Practical Constraints
  • The Gaussian model breaks down for impulsive noise (e.g., lightning, power line interference)
  • Non-Gaussian interference arises in ultra-dense networks where a few strong interferers dominate

Quick Check

The CLT says that $Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0,1)$. What convergence mode is this?

Almost sure convergence

Convergence in probability

Convergence in distribution

$L^2$ convergence

Central Limit Theorem

The standardized sum of i.i.d. random variables with finite variance converges in distribution to $\mathcal{N}(0,1)$. The convergence rate is $O(1/\sqrt{n})$ by Berry-Esseen.

Related: Weak Law of Large Numbers, Convergence in Distribution

Berry-Esseen Theorem

Quantifies the CLT convergence rate: $\sup_z |F_{Z_n}(z) - \Phi(z)| \leq C\rho/(\sigma^3\sqrt{n})$, where $\rho = \mathbb{E}[|X - \mu|^3]$ and $C \leq 0.4748$.

Related: Central Limit Theorem

Key Takeaway

The CLT is the reason Gaussian models dominate communications engineering. Whenever a quantity is the sum of many small independent contributions — noise, interference, estimation errors — it is approximately Gaussian, regardless of the individual distributions. The Berry-Esseen theorem tells us the approximation error is $O(1/\sqrt{n})$.