The Weak Law of Large Numbers

Why the Law of Large Numbers Matters

The sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ is the most basic statistical estimator. Every Monte Carlo simulation, every sample average in a communication receiver, every training loss in machine learning relies on the principle that averaging many independent copies of a random quantity produces something close to the true mean. The Weak Law of Large Numbers makes this precise: $\bar{X}_n$ converges to $\mu$ in probability. The proof is a clean application of Chebyshev's inequality, and the simplicity of the argument is part of its beauty.

Theorem: Weak Law of Large Numbers (WLLN)

Let $X_1, X_2, \ldots$ be i.i.d. random variables with mean $\mu = \mathbb{E}[X_1]$ and finite variance $\sigma^2 = \text{Var}(X_1) < \infty$. Then the sample mean converges to $\mu$ in probability:

$$\bar{X}_n \xrightarrow{P} \mu,$$

that is, for every $\epsilon > 0$:

$$\lim_{n \to \infty} \mathbb{P}\!\left(|\bar{X}_n - \mu| \geq \epsilon\right) = 0.$$

The variance of $\bar{X}_n$ is $\sigma^2/n$, which shrinks to zero. Chebyshev's inequality translates vanishing variance into vanishing tail probability. The more samples we average, the tighter the distribution of $\bar{X}_n$ concentrates around $\mu$.
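Spelled out, the proof is a single application of Chebyshev's inequality to $\bar{X}_n$, using $\mathbb{E}[\bar{X}_n] = \mu$ and $\text{Var}(\bar{X}_n) = \sigma^2/n$:

$$\mathbb{P}\!\left(|\bar{X}_n - \mu| \geq \epsilon\right) \;\leq\; \frac{\text{Var}(\bar{X}_n)}{\epsilon^2} \;=\; \frac{\sigma^2}{n\epsilon^2} \;\longrightarrow\; 0 \quad \text{as } n \to \infty.$$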


Alternative Proof via Characteristic Functions

The WLLN can also be proved using characteristic functions, as Caire does in the course. The Ch.F. of $\bar{X}_n$ is $\phi_{\bar{X}_n}(u) = \left(\phi_X(u/n)\right)^n$. Taylor-expanding $\phi_X(u/n) = 1 + j\mu u/n + o(1/n)$ and using the limit $(1 + a/n + o(1/n))^n \to e^a$, we get $\phi_{\bar{X}_n}(u) \to e^{j\mu u}$, which is the Ch.F. of the constant $\mu$. By the Lévy continuity theorem, $\bar{X}_n \xrightarrow{d} \mu$, and since the limit is a constant, convergence in distribution upgrades to convergence in probability.
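In display form, with independence supplying the factorization into a product of identical Ch.F.s:

$$\phi_{\bar{X}_n}(u) = \mathbb{E}\!\left[e^{ju\bar{X}_n}\right] = \prod_{i=1}^{n} \mathbb{E}\!\left[e^{j(u/n)X_i}\right] = \left(\phi_X(u/n)\right)^n = \left(1 + \frac{j\mu u}{n} + o(1/n)\right)^{n} \longrightarrow e^{j\mu u}.$$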

This proof requires only $\mathbb{E}[|X_1|] < \infty$ (no finite variance needed), so it is strictly more general than the Chebyshev proof above.

Example: Empirical Frequency of a Biased Coin

Let $X_1, X_2, \ldots$ be i.i.d. $\text{Bernoulli}(p)$ with $p = 0.3$. How many coin tosses $n$ are needed so that the empirical frequency $\bar{X}_n$ is within $0.01$ of the true probability $p$ with probability at least $0.95$?
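One sufficient (and deliberately conservative) answer, using the Chebyshev bound from the proof above, with $\text{Var}(X_1) = p(1-p) = 0.21$, $\epsilon = 0.01$, and failure probability $\delta = 0.05$:

$$\mathbb{P}\!\left(|\bar{X}_n - p| \geq 0.01\right) \leq \frac{p(1-p)}{n(0.01)^2} \leq 0.05
\quad\Longleftrightarrow\quad
n \geq \frac{0.21}{(0.01)^2 (0.05)} = 42{,}000.$$

The CLT approximation of Section 11.4 would instead suggest roughly $n \approx (1.96)^2 (0.21)/(0.01)^2 \approx 8{,}100$, a preview of how loose the Chebyshev bound is.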

WLLN in Action: Sample Mean Trajectories

Watch multiple independent trajectories of $\bar{X}_n$ converge to $\mu$ as $n$ grows. Choose the underlying distribution and observe how the convergence rate depends on the variance.

[Interactive demo controls. Parameters: Bernoulli: p; Exponential: lambda; Uniform: b (on [0,b]); Gaussian: sigma.]
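A minimal offline sketch of the same experiment, for readers without the interactive demo (the Bernoulli parameter, sample size, and number of trajectories below are illustrative choices, not the demo's defaults):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mean_trajectories(draw, n_samples=2000, n_paths=10):
    """Return an (n_paths, n_samples) array whose [k, i] entry is the
    sample mean of the first i+1 i.i.d. draws along trajectory k."""
    x = draw(size=(n_paths, n_samples))
    return np.cumsum(x, axis=1) / np.arange(1, n_samples + 1)

# Bernoulli(p = 0.3): every trajectory should settle near mu = 0.3.
paths = sample_mean_trajectories(lambda size: rng.binomial(1, 0.3, size=size))
print(paths[:, [9, 99, 999, 1999]])  # sample means after 10, 100, 1000, 2000 tosses
```

Swapping in `rng.exponential`, `rng.uniform`, or `rng.normal` for the draw shows how a larger variance slows the visible concentration around $\mu$.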

Why This Matters: Monte Carlo Simulation in Communications

Every bit error rate (BER) simulation in wireless communications is a direct application of the WLLN. We transmit $n$ symbols through a simulated channel, count the errors $E$, and report $\hat{P}_e = E/n$ as the estimated error probability. The WLLN guarantees $\hat{P}_e \xrightarrow{P} P_e$ as $n \to \infty$.

The practical question is: how large must $n$ be? For $P_e = 10^{-5}$ (a typical target in 5G), the Chebyshev bound suggests $n \approx P_e(1-P_e)/(\epsilon^2 \delta)$, but the CLT (Section 11.4) gives the more useful rule of thumb: we need to observe roughly $100$ errors, so $n \approx 100/P_e = 10^7$ symbol transmissions.
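As a concrete illustration, here is a minimal Monte Carlo BER sketch, assuming BPSK over a real AWGN channel (the modulation, SNR value, and sample sizes are assumptions made for the example, not specified in the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_ber(snr_db: float, n_symbols: int) -> float:
    """Monte Carlo estimate of the bit error rate for BPSK over real AWGN."""
    bits = rng.integers(0, 2, n_symbols)
    symbols = 2.0 * bits - 1.0                         # map {0, 1} -> {-1, +1}, unit energy
    noise_std = np.sqrt(0.5 / 10 ** (snr_db / 10.0))   # variance N0/2 with SNR = Es/N0
    received = symbols + noise_std * rng.standard_normal(n_symbols)
    errors = np.count_nonzero((received > 0).astype(int) != bits)
    return errors / n_symbols                          # estimated P_e = E / n

# By the WLLN, the estimate stabilizes around the true P_e as n grows.
for n in (10**3, 10**5, 10**6):
    print(f"n = {n:>7d}   P_e_hat = {estimate_ber(6.0, n):.5f}")
```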

See full treatment in The Linear MMSE Estimator

⚠️Engineering Note

Confidence Intervals for Monte Carlo BER Estimates

The Chebyshev-based WLLN bound is overly conservative for practical Monte Carlo design. In practice, we use the CLT approximation:

$$\hat{P}_e \pm z_{\alpha/2} \sqrt{\frac{\hat{P}_e(1-\hat{P}_e)}{n}}$$

where $z_{\alpha/2} = 1.96$ for a 95% confidence interval. For $P_e = 10^{-5}$ and a relative accuracy of 10%, this requires $n \approx 3.84 \times 10^7$, which is feasible but not cheap. This is why importance sampling and other variance reduction techniques are essential in practice.
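Where the $3.84 \times 10^7$ comes from: requiring the CI half-width to be at most $0.1\,P_e$ and solving for $n$,

$$z_{\alpha/2}\sqrt{\frac{P_e(1-P_e)}{n}} \leq 0.1\,P_e
\quad\Longleftrightarrow\quad
n \geq \frac{z_{\alpha/2}^2\,(1-P_e)}{(0.1)^2\,P_e} \approx \frac{(1.96)^2}{10^{-2}\cdot 10^{-5}} \approx 3.84 \times 10^{7}.$$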

Practical Constraints
  • For $P_e < 10^{-6}$, direct Monte Carlo becomes impractical ($> 10^9$ samples)
  • Importance sampling can reduce the required sample count by orders of magnitude

Quick Check

The Chebyshev-based proof of the WLLN requires which condition on the i.i.d. sequence?

  • Finite mean only
  • Finite variance
  • Finite fourth moment
  • The distribution must be continuous

Weak Law of Large Numbers

States that $\bar{X}_n \xrightarrow{P} \mu$ for i.i.d. $\{X_i\}$ with finite mean $\mu$. "Weak" refers to convergence in probability, as opposed to the "strong" law, which gives almost sure convergence.

Related: Convergence in Probability, Strong Law of Large Numbers

Common Mistake: Chebyshev's Bound Is Loose; Do Not Use It for Design

Mistake:

Using the WLLN's Chebyshev bound $\mathbb{P}(|\bar{X}_n - \mu| \geq \epsilon) \leq \sigma^2/(n\epsilon^2)$ to determine the required sample size in a real system.

Correction:

The Chebyshev bound is distribution-free and therefore very conservative. For system design, use the CLT normal approximation (Section 11.4) or, for small error probabilities, the Chernoff/Hoeffding exponential bounds (FSP Ch. 9). The Chebyshev bound is a proof tool, not a design tool.
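For a concrete sense of the gap, take the biased-coin example above ($p = 0.3$, $\epsilon = 0.01$, $\delta = 0.05$); the Hoeffding figure assumes the standard bound $2e^{-2n\epsilon^2}$ for $[0,1]$-valued variables:

  • Chebyshev: $n \geq p(1-p)/(\epsilon^2\delta) = 42{,}000$
  • Hoeffding: $n \geq \ln(2/\delta)/(2\epsilon^2) \approx 18{,}445$
  • CLT approximation: $n \approx z_{\alpha/2}^2\, p(1-p)/\epsilon^2 \approx 8{,}100$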