Convergence Concepts and Limit Theorems

Why Convergence Concepts Matter in Telecommunications

Throughout wireless communications and information theory, we repeatedly encounter statements of the form "as the number of samples (or users, or antennas, or code length) grows, a random quantity approaches a deterministic limit." Making such statements precise requires a rigorous notion of convergence for random variables --- and it turns out that there are several inequivalent notions, each with different strengths and applications.

Two limit theorems dominate the field:

  • The Law of Large Numbers (LLN) guarantees that the sample mean \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i converges to the true mean \mu as n \to \infty. This underpins:

    • Channel estimation: averaging n noisy pilot measurements to recover the true channel coefficient h;
    • Ergodic capacity: the time-averaged mutual information converges to the ergodic capacity over many fading realisations;
    • Monte Carlo simulation: computing bit-error rates by averaging over many independent trials.
  • The Central Limit Theorem (CLT) asserts that the normalised sum of iid random variables converges in distribution to a Gaussian, regardless of the original distribution. This explains:

    • why thermal noise is well modelled as Gaussian (superposition of many independent microscopic contributions);
    • why aggregate interference in large cellular networks is approximately Gaussian;
    • why the Rayleigh fading model arises from many scattered paths (Section 2.3).

The Chernoff bound provides exponentially tight tail probabilities that are essential for analysing error exponents in coding theory and outage probabilities in fading channels.

This section formalises the four modes of convergence, establishes their hierarchy, proves the LLN and CLT, and derives the Chernoff bound --- equipping us with the limit-theorem toolkit needed for the remainder of this text.

Definition: Convergence in Probability

A sequence of random variables \{X_n\}_{n=1}^{\infty} defined on a probability space (\Omega, \mathcal{F}, P) is said to converge in probability to a random variable X, written

X_n \xrightarrow{P} X,

if for every \epsilon > 0,

\lim_{n \to \infty} P\!\bigl(|X_n - X| > \epsilon\bigr) = 0.

Equivalently, for every \epsilon > 0 and every \delta > 0, there exists N = N(\epsilon, \delta) such that for all n \geq N,

P\!\bigl(|X_n - X| > \epsilon\bigr) < \delta.

Interpretation: For large n, the probability that X_n deviates from X by more than any prescribed tolerance \epsilon becomes arbitrarily small. However, convergence in probability does not guarantee that X_n(\omega) \to X(\omega) for every (or even almost every) sample point \omega.

In channel estimation, if \hat{h}_n is an estimator of the channel coefficient h based on n pilot symbols, then \hat{h}_n \xrightarrow{P} h means the estimator is consistent: with enough pilots, the estimate is arbitrarily close to the true channel with high probability.
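
A minimal Monte Carlo sketch of this consistency, assuming the simplest model of pilot averaging in additive Gaussian noise (the channel value, noise level, tolerance, and sample sizes below are illustrative, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
h, noise_std, eps = 1.5, 1.0, 0.1        # illustrative channel, noise level, and tolerance
trials = 5_000                           # independent estimation experiments per n

for n in [10, 100, 1000]:
    # Each row holds n pilot observations y_i = h + w_i with w_i ~ N(0, noise_std^2).
    pilots = h + noise_std * rng.standard_normal((trials, n))
    h_hat = pilots.mean(axis=1)                       # sample-mean channel estimate
    p_dev = np.mean(np.abs(h_hat - h) > eps)          # empirical P(|h_hat_n - h| > eps)
    print(f"n = {n:4d} pilots:  P(|h_hat - h| > {eps}) ~ {p_dev:.4f}")
```

The printed deviation probability shrinks towards zero as n grows, which is exactly the statement \hat{h}_n \xrightarrow{P} h.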


Definition: Almost Sure Convergence

A sequence \{X_n\} converges almost surely (a.s.) to X, written

X_n \xrightarrow{a.s.} X,

if

P\!\left(\lim_{n \to \infty} X_n = X\right) = 1.

That is, the set of sample points \omega \in \Omega for which X_n(\omega) \to X(\omega) as n \to \infty has probability one.

Equivalent formulations (via liminf/limsup):

P\!\left(\bigcap_{\epsilon > 0} \liminf_{n \to \infty} \{|X_n - X| \leq \epsilon\}\right) = 1,

or equivalently, for every \epsilon > 0,

P\!\left(\limsup_{n \to \infty} \{|X_n - X| > \epsilon\}\right) = 0.

Comparison with convergence in probability: Almost sure convergence is pointwise convergence of X_n(\omega) \to X(\omega) outside a null set, whereas convergence in probability only controls the probability of deviation at each fixed n. Almost sure convergence is strictly stronger.

The distinction matters in practice. If an adaptive equaliser's tap weights converge almost surely to the optimal Wiener solution, then on (almost) every sample path the equaliser eventually "locks on" and stays near the optimum. Mere convergence in probability would allow the equaliser to occasionally wander far from the optimum, even at late times --- though with vanishing probability.


Definition: Convergence in Distribution

A sequence \{X_n\} converges in distribution to X, written

X_n \xrightarrow{d} X,

if

\lim_{n \to \infty} F_{X_n}(x) = F_X(x)

at every point x where F_X is continuous. Here F_{X_n} and F_X denote the cumulative distribution functions of X_n and X, respectively.

Key properties:

  • Convergence in distribution is the weakest of the four convergence modes.
  • It does not require X_n and X to be defined on the same probability space.
  • By the Portmanteau theorem, X_n \xrightarrow{d} X is equivalent to E[g(X_n)] \to E[g(X)] for every bounded continuous function g.
  • Lévy's continuity theorem: X_n \xrightarrow{d} X if and only if \Phi_{X_n}(\omega) \to \Phi_X(\omega) pointwise for all \omega, where \Phi denotes the characteristic function.

The CLT is a statement about convergence in distribution: the standardised sum converges in distribution to \mathcal{N}(0, 1). This does not mean the sum "becomes" Gaussian in any pathwise sense --- only that its CDF (and hence all tail probabilities) approaches the Gaussian CDF.


Definition: Convergence in Mean Square (L^2 Convergence)

A sequence \{X_n\} converges in mean square (or in L^2) to X, written

X_n \xrightarrow{L^2} X,

if

\lim_{n \to \infty} E\!\bigl[|X_n - X|^2\bigr] = 0.

This requires E[|X_n|^2] < \infty and E[|X|^2] < \infty (i.e., X_n and X must be square-integrable).

Immediate consequence: If X_n \xrightarrow{L^2} X, then

E[X_n] \to E[X] \quad \text{and} \quad \mathrm{Var}(X_n - X) \to 0.

More generally, convergence in L^p (i.e., E[|X_n - X|^p] \to 0) is defined analogously for p \geq 1.

In estimation theory, the mean-squared error \mathrm{MSE} = E[|\hat{h}_n - h|^2] is precisely the L^2 distance between the estimator and the true parameter. An estimator that converges in mean square is consistent in a particularly strong sense: both its bias and its variance vanish. The MMSE (minimum mean-squared error) estimator minimises this distance at every n.
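
A short numerical check of L^2 convergence for the same sample-mean estimator, assuming additive noise of known variance (all numbers illustrative); the estimator is unbiased, so its MSE equals its variance \sigma^2/n:

```python
import numpy as np

rng = np.random.default_rng(1)
h, noise_var, trials = 0.8, 2.0, 20_000   # illustrative channel, noise variance, experiments

for n in [5, 50, 500]:
    noise = np.sqrt(noise_var) * rng.standard_normal((trials, n))
    h_hat = h + noise.mean(axis=1)                       # sample-mean estimate from n pilots
    mse = np.mean(np.abs(h_hat - h) ** 2)                # empirical E[|h_hat_n - h|^2]
    print(f"n = {n:3d}:  empirical MSE = {mse:.5f},  sigma^2/n = {noise_var / n:.5f}")
```

The empirical MSE tracks \sigma^2/n and tends to zero, i.e., \hat{h}_n \xrightarrow{L^2} h.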


Theorem: Hierarchy of Convergence Modes

The four convergence modes are related by the following implications:

\text{a.s.} \;\Longrightarrow\; \text{in probability} \;\Longrightarrow\; \text{in distribution}.

\text{mean square} \;\Longrightarrow\; \text{in probability} \;\Longrightarrow\; \text{in distribution}.

In a diagram:

\begin{array}{ccc} \text{a.s.} & & \text{mean square } (L^2) \\ & \searrow \quad \swarrow & \\ & \text{in probability} & \\ & \downarrow & \\ & \text{in distribution} & \end{array}

No other implications hold in general:

  • Convergence in probability does not imply almost sure convergence.
  • Convergence in probability does not imply mean square convergence.
  • Mean square convergence does not imply almost sure convergence.
  • Convergence in distribution does not imply convergence in probability (unless X is a constant).

Special case: If X_n \xrightarrow{d} c where c is a deterministic constant, then X_n \xrightarrow{P} c.

The hierarchy reflects increasing "strength" of control over the random fluctuations of X_n around X:

  • In distribution controls only the shape of the distribution of X_n (the CDF approaches the target CDF).
  • In probability controls the probability of deviation at each fixed n (the tail P(|X_n - X| > \epsilon) \to 0).
  • Almost surely controls the sample paths (the sequence X_n(\omega) \to X(\omega) for a.e. \omega).
  • Mean square controls the average squared deviation (E[|X_n - X|^2] \to 0).

The key insight for "mean square \Rightarrow in probability" is Markov's inequality: if the expected squared deviation is small, then the probability of a large deviation must also be small. The implication "a.s. \Rightarrow in probability" uses the fact that pointwise convergence on a set of measure one is stronger than just having small deviation probabilities.
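
To make the first of these implications concrete, apply Markov's inequality to the nonnegative random variable |X_n - X|^2: for any \epsilon > 0,

P\bigl(|X_n - X| > \epsilon\bigr) = P\bigl(|X_n - X|^2 > \epsilon^2\bigr) \leq \frac{E\bigl[|X_n - X|^2\bigr]}{\epsilon^2} \;\longrightarrow\; 0,

so if the mean squared deviation vanishes, the deviation probability vanishes for every fixed \epsilon, which is precisely convergence in probability.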


Theorem: Weak Law of Large Numbers (WLLN)

Let X_1, X_2, \ldots be independent and identically distributed (iid) random variables with finite mean \mu = E[X_i] and finite variance \sigma^2 = \mathrm{Var}(X_i) < \infty. Define the sample mean

\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.

Then \bar{X}_n converges to \mu in probability:

\bar{X}_n \xrightarrow{P} \mu.

That is, for every \epsilon > 0,

\lim_{n \to \infty} P\!\bigl(|\bar{X}_n - \mu| > \epsilon\bigr) = 0.

Averaging n iid samples reduces the variance by a factor of 1/n: \mathrm{Var}(\bar{X}_n) = \sigma^2/n. By Chebyshev's inequality, the probability that \bar{X}_n deviates from \mu by more than \epsilon is at most \sigma^2/(n\epsilon^2), which vanishes as n \to \infty.

Physically: averaging many noisy channel measurements "averages out" the noise, leaving the true channel coefficient. The more pilots we transmit, the better the estimate --- this is the LLN in action.
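
A small sketch of how the Chebyshev bound above translates into a pilot-count rule of thumb: setting \sigma^2/(n\epsilon^2) \leq \delta and solving for n gives n \geq \sigma^2/(\epsilon^2 \delta). The numbers below are illustrative:

```python
import math

def pilots_needed(sigma2: float, eps: float, delta: float) -> int:
    """Smallest n for which Chebyshev guarantees P(|X_bar_n - mu| > eps) <= sigma2/(n eps^2) <= delta."""
    return math.ceil(sigma2 / (eps ** 2 * delta))

# Example: noise variance 1.0, tolerance eps = 0.05, confidence 99% (delta = 0.01)
print(pilots_needed(sigma2=1.0, eps=0.05, delta=0.01))   # prints 40000
```

Chebyshev is deliberately conservative; a CLT-based estimate of the same n is typically much smaller, at the price of being an approximation rather than a guarantee.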


Theorem: Strong Law of Large Numbers (SLLN)

Let X_1, X_2, \ldots be iid random variables with finite mean \mu = E[X_i]. Then the sample mean converges to \mu almost surely:

\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{a.s.} \mu.

That is,

P\!\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1.

Note that finite variance is not required; finite mean suffices.

The SLLN strengthens the WLLN from convergence in probability to almost sure convergence. It asserts that on (almost) every sample path \omega, the running average \bar{X}_n(\omega) eventually settles down to \mu and stays there. This is the rigorous justification for the "frequency interpretation" of probability: if we repeat an experiment indefinitely, the relative frequency of any event converges to its probability.

The proof requires more sophisticated tools from measure theory (e.g., the Borel--Cantelli lemma, truncation arguments, or Kolmogorov's inequality) and is beyond our scope here. The key point is that the SLLN provides a pathwise guarantee that the WLLN does not.


Theorem: Central Limit Theorem (CLT)

Let X_1, X_2, \ldots be iid random variables with finite mean \mu = E[X_i] and finite variance \sigma^2 = \mathrm{Var}(X_i) > 0. Define the standardised partial sum

Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}.

Then Z_n converges in distribution to a standard normal random variable:

Z_n \xrightarrow{d} \mathcal{N}(0, 1).

Equivalently, for every z \in \mathbb{R},

\lim_{n \to \infty} P(Z_n \leq z) = \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\,dt.

The CLT is arguably the most important theorem in probability. It says that the sum of many independent, identically distributed random variables --- regardless of their individual distribution --- is approximately Gaussian after proper centering and scaling. The only requirements are a finite mean and a finite, positive variance.

The proof strategy via characteristic functions is elegant: show that the characteristic function of Z_n converges pointwise to e^{-\omega^2/2}, which is the characteristic function of \mathcal{N}(0, 1). By Lévy's continuity theorem, pointwise convergence of characteristic functions implies convergence in distribution.
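
This pointwise convergence of characteristic functions can also be checked numerically. A sketch, using exponential summands as an arbitrary illustrative choice, compares the empirical characteristic function of Z_n against e^{-\omega^2/2} at a few frequencies:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.0, 1.0                      # mean and std of Exp(1) summands (illustrative)
omegas = np.array([0.5, 1.0, 2.0])

for n in [2, 10, 100]:
    x = rng.exponential(scale=1.0, size=(100_000, n))
    z = (x.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))            # standardised sum Z_n
    phi_emp = np.exp(1j * np.outer(omegas, z)).mean(axis=1)        # empirical E[exp(j*omega*Z_n)]
    gap = np.abs(phi_emp - np.exp(-omegas ** 2 / 2))               # distance to the N(0,1) CF
    print(f"n = {n:3d}:  |Phi_Zn(omega) - exp(-omega^2/2)| = {np.round(gap, 3)}")
```

The gap shrinks with n, in line with Lévy's continuity theorem.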


Central Limit Theorem in Action

(Interactive demonstration: the standardised sum of iid samples is simulated for a chosen number of summands and simulation trials, and compared against the Gaussian limit.)
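
In place of the interactive demonstration, a sketch along the same lines, assuming Uniform(0, 1) summands (any other distribution with finite variance would do); it measures the largest gap between the empirical CDF of Z_n and the standard normal CDF:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
trials = 100_000
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)      # mean and std of Uniform(0, 1)

for n in [1, 2, 5, 30]:
    z = (rng.random((trials, n)).sum(axis=1) - n * mu) / (sigma * np.sqrt(n))
    z.sort()
    ecdf = np.arange(1, trials + 1) / trials                 # empirical CDF of Z_n at the sorted samples
    ks_gap = np.max(np.abs(ecdf - norm.cdf(z)))              # ~ sup_x |F_Zn(x) - Phi(x)|
    print(f"n = {n:2d}:  max CDF gap ~ {ks_gap:.4f}")
```

Already by n = 30 the gap is small, the quantitative counterpart of the visual convergence shown in the histogram figure later in this section.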

Theorem: Chernoff Bound

Let X be a random variable with moment-generating function M_X(s) = E[e^{sX}]. Then for any a \in \mathbb{R},

P(X \geq a) \leq \inf_{s > 0}\; e^{-sa}\,M_X(s).

Similarly, for the left tail,

P(X \leq a) \leq \inf_{s < 0}\; e^{-sa}\,M_X(s).

The bound is obtained by optimising the free parameter s to get the tightest possible exponential bound.

The Chernoff bound starts from Markov's inequality applied to the exponential function e^{sX} (which is nonnegative and monotone increasing for s > 0). The extra parameter s is then optimised to yield the tightest bound. Because the bound has an exponential form e^{-sa}M_X(s), it typically decays exponentially in the "excess" a - E[X], making it far sharper than Chebyshev's inequality for large deviations.

The Chernoff bound is the foundation of large deviation theory and is directly related to the error exponent in channel coding: the probability of decoding error decays exponentially with block length n, and the exponent is characterised via a Chernoff-type optimisation.


Example: Chernoff Bound for the Sum of iid Bernoulli Random Variables

Let X_1, X_2, \ldots, X_n be iid \mathrm{Bernoulli}(p) random variables with 0 < p < 1, and let S_n = \sum_{i=1}^{n} X_i. Use the Chernoff bound to show that for any a > np (i.e., above the mean),

P(S_n \geq a) \leq \exp\!\bigl(-n\,D(a/n \,\|\, p)\bigr),

where D(q \| p) = q \ln\frac{q}{p} + (1 - q)\ln\frac{1-q}{1-p} is the Kullback--Leibler divergence (relative entropy) between \mathrm{Bernoulli}(q) and \mathrm{Bernoulli}(p).

Evaluate numerically for n = 100, p = 0.3, and a = 50 (i.e., P(S_{100} \geq 50)).
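
A sketch of the numerical evaluation requested above, plus a comparison with the exact binomial tail (the exact value is not asked for in the text and is included only for reference):

```python
import numpy as np
from scipy.stats import binom

def kl_bernoulli(q: float, p: float) -> float:
    """Kullback--Leibler divergence D(q || p) between Bernoulli(q) and Bernoulli(p)."""
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

n, p, a = 100, 0.3, 50
q = a / n                                          # empirical fraction corresponding to the threshold
chernoff_bound = np.exp(-n * kl_bernoulli(q, p))   # exp(-n D(0.5 || 0.3))
exact_tail = binom.sf(a - 1, n, p)                 # P(S_100 >= 50), exact

print(f"D(0.5 || 0.3)  = {kl_bernoulli(q, p):.6f}")
print(f"Chernoff bound = {chernoff_bound:.3e}")
print(f"Exact tail     = {exact_tail:.3e}")
```

As expected, the bound is valid (it exceeds the exact tail), and both quantities decay exponentially in n.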


Central Limit Theorem: Watching Convergence to Gaussian

Histograms of the standardised sum of n i.i.d. uniform random variables for n = 1, 2, 3, 5, 10, 30, with the theoretical Gaussian PDF overlaid. The convergence is visually striking even for moderate n.
As n grows, the sum of i.i.d. random variables --- regardless of the original distribution --- converges in distribution to a Gaussian. By n = 30 the histogram is nearly indistinguishable from the bell curve.

Why This Matters: CLT Justifies the Gaussian Interference Model

In a large cellular network with K co-channel interferers, the aggregate interference at a receiver is

I = \sum_{k=1}^{K} \sqrt{P_k}\,h_k\,s_k,

where P_k is the received power from the k-th interferer, h_k is its fading coefficient, and s_k is its data symbol. The terms \sqrt{P_k}\,h_k\,s_k are (approximately) independent and identically distributed.

When K is large, the Central Limit Theorem guarantees that I is approximately Gaussian, regardless of the distributions of h_k and s_k. This justifies the widespread modelling assumption:

I \approx \mathcal{CN}(0, \sigma_I^2),

where \sigma_I^2 = \sum_{k=1}^K P_k\,E[|h_k|^2]\,E[|s_k|^2].
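
A sketch of this Gaussianisation under an illustrative interference model. Since Rayleigh-faded terms are already complex Gaussian term by term, the sketch deliberately uses constant-amplitude interferer terms with uniformly random phases, whose per-term distribution is far from Gaussian; the excess kurtosis of the real part (zero for a Gaussian) serves as a crude Gaussianity measure:

```python
import numpy as np

rng = np.random.default_rng(4)
trials = 50_000

def aggregate_interference(K: int) -> np.ndarray:
    """I = sum_k sqrt(P_k) h_k s_k for K interferers (illustrative model, see comments)."""
    P = rng.uniform(0.1, 1.0, size=K)                          # assumed spread of received powers
    h = np.exp(1j * rng.uniform(0, 2 * np.pi, (trials, K)))    # unit-amplitude, random-phase channels
    bits = rng.integers(0, 2, size=(2, trials, K)) * 2 - 1     # +/-1 in-phase and quadrature bits
    s = (bits[0] + 1j * bits[1]) / np.sqrt(2)                  # unit-energy QPSK symbols
    return (np.sqrt(P) * h * s).sum(axis=1)

for K in [1, 3, 30]:
    i_re = aggregate_interference(K).real
    i_re = (i_re - i_re.mean()) / i_re.std()
    excess_kurtosis = np.mean(i_re ** 4) - 3.0                 # 0 for a Gaussian
    print(f"K = {K:2d} interferers:  excess kurtosis of Re(I) = {excess_kurtosis:+.3f}")
```

With a single interferer the real part is strongly non-Gaussian; by a few tens of interferers the excess kurtosis is close to zero, consistent with the CLT.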

Practical implications:

  1. SINR analysis: Under the Gaussian interference assumption, the signal-to-interference-plus-noise ratio (SINR) fully determines the achievable rate via C = \log_2(1 + \mathrm{SINR}), just as for AWGN channels.

  2. Stochastic geometry: In Poisson cellular network models (e.g., the PPP model of Andrews et al., 2011), the aggregate interference from infinitely many base stations is analysed using the CLT and its refinements.

  3. Massive MIMO: When a base station with M antennas serves K users, the effective interference after matched filtering involves sums of M iid terms. As M \to \infty, these sums "harden" (by the LLN) and their fluctuations are Gaussian (by the CLT), leading to channel hardening and favourable propagation --- the two pillars of massive MIMO theory.
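
A sketch of channel hardening, assuming an M-antenna channel with iid \mathcal{CN}(0, 1) entries (the canonical favourable-propagation model); the normalised gain \|h\|^2 / M concentrates at 1 with fluctuations of order 1/\sqrt{M}:

```python
import numpy as np

rng = np.random.default_rng(5)
trials = 5_000

for M in [4, 64, 1024]:
    # iid CN(0, 1) channel entries: |h_m|^2 are Exp(1), so ||h||^2 / M -> 1 by the LLN.
    h = (rng.standard_normal((trials, M)) + 1j * rng.standard_normal((trials, M))) / np.sqrt(2)
    gain = np.sum(np.abs(h) ** 2, axis=1) / M
    print(f"M = {M:4d}:  mean = {gain.mean():.4f},  std = {gain.std():.4f}  (1/sqrt(M) = {1/np.sqrt(M):.4f})")
```

The shrinking standard deviation is the "hardening"; the Gaussian shape of the fluctuations (after scaling by \sqrt{M}) is the CLT at work.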

See full treatment in Chapter 4, Section 5

Historical Note: De Moivre, Laplace, and Lyapunov: The Long Road to the CLT

The Central Limit Theorem has one of the longest gestation periods in the history of mathematics, spanning nearly two centuries:

  • Abraham de Moivre (1733) proved the earliest version: the binomial distribution \mathrm{Bin}(n, 1/2), properly standardised, converges to what we now call the normal distribution. He published this in The Doctrine of Chances as a computational tool for approximating binomial probabilities. De Moivre did not have the concept of a "distribution" --- he worked directly with the ratio of the middle binomial coefficient to 2^n.

  • Pierre-Simon Laplace (1812) extended de Moivre's result to arbitrary p \neq 1/2 in his monumental Théorie analytique des probabilités, obtaining what is now called the de Moivre--Laplace theorem. Laplace also recognised the broader principle: sums of "errors" tend to follow the "law of errors" (the Gaussian).

  • Pafnuty Chebyshev (1867) and his student Andrey Markov (1900) proved versions of the CLT under increasingly general conditions, using the method of moments.

  • Aleksandr Lyapunov (1901) gave the first rigorous proof of the CLT for independent (not necessarily identically distributed) random variables using characteristic functions, under a condition now known as the Lyapunov condition. This is essentially the modern proof strategy.

  • Jarl Waldemar Lindeberg (1922) and William Feller (1935) established the Lindeberg--Feller theorem, giving necessary and sufficient conditions for the CLT to hold for independent (non-identically distributed) summands.

The CLT that we prove in this section (for iid summands with finite variance) is the simplest and most commonly used version. The general Lindeberg--Feller form is needed in wireless when interferers have unequal powers or different fading statistics.


Quick Check

Consider the following statement: "If X_n \xrightarrow{P} X, then X_n \xrightarrow{a.s.} X." Is this statement true or false?

True

False

Quick Check

Let X_1, \ldots, X_n be iid with mean \mu = 5 and variance \sigma^2 = 4. For n = 100, what is the approximate distribution of the sample mean \bar{X}_{100} according to the CLT?

\mathcal{N}(5, 4)

\mathcal{N}(5, 0.04)

\mathcal{N}(0, 1)

\mathcal{N}(5, 0.4)

Quick Check

The Chernoff bound is obtained by applying Markov's inequality to which random variable?

X directly

|X - \mu|^2 (Chebyshev approach)

e^{sX} for optimised s > 0

X^2

Convergence in Probability

A sequence \{X_n\} converges in probability to X (X_n \xrightarrow{P} X) if P(|X_n - X| > \epsilon) \to 0 for every \epsilon > 0. This is weaker than almost sure convergence but stronger than convergence in distribution. It is the mode of convergence established by the Weak Law of Large Numbers.

Related: Convergence in Probability, Weak Law of Large Numbers (WLLN), Hierarchy of Convergence Modes

Almost Sure Convergence

A sequence \{X_n\} converges almost surely to X (X_n \xrightarrow{a.s.} X) if P(\lim_{n\to\infty} X_n = X) = 1. This is pathwise convergence outside a null set and is strictly stronger than convergence in probability. It is the mode of convergence established by the Strong Law of Large Numbers.

Related: Almost Sure Convergence, Strong Law of Large Numbers (SLLN), Hierarchy of Convergence Modes

Convergence in Distribution

A sequence \{X_n\} converges in distribution to X (X_n \xrightarrow{d} X) if F_{X_n}(x) \to F_X(x) at all continuity points of F_X. This is the weakest convergence mode and does not require the random variables to be defined on the same probability space. It is the mode of convergence in the Central Limit Theorem.

Related: Convergence in Distribution, Central Limit Theorem (CLT), Hierarchy of Convergence Modes

Law of Large Numbers (LLN)

The Law of Large Numbers states that the sample mean \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i of iid random variables converges to the population mean \mu. The Weak Law (WLLN) gives convergence in probability; the Strong Law (SLLN) gives almost sure convergence. The WLLN as proved here (via Chebyshev's inequality) assumes finite variance; the SLLN requires only a finite mean.

Related: Weak Law of Large Numbers (WLLN), Strong Law of Large Numbers (SLLN), Convergence in Probability, Almost Sure Convergence

Central Limit Theorem (CLT)

For iid random variables with mean \mu and finite variance \sigma^2 > 0, the standardised sum Z_n = (\bar{X}_n - \mu)/(\sigma/\sqrt{n}) converges in distribution to \mathcal{N}(0,1). This is the fundamental reason why the Gaussian distribution appears ubiquitously in communications: thermal noise, aggregate interference, and fading envelopes all arise from summing many independent contributions.

Related: Central Limit Theorem (CLT), Central Limit Theorem in Action, CLT Justifies the Gaussian Interference Model, De Moivre, Laplace, and Lyapunov: The Long Road to the CLT

Chernoff Bound

An exponential tail bound: P(X \geq a) \leq \inf_{s>0} e^{-sa} M_X(s), obtained by applying Markov's inequality to e^{sX} and optimising over the tilting parameter s. The Chernoff bound provides exponentially tight estimates and is the basis for error exponent analysis in coding theory and large deviations theory.

Related: Chernoff Bound, Chernoff Bound for the Sum of iid Bernoulli Random Variables

Common Mistake: Using the CLT for Small Sample Sizes

Mistake:

"The CLT says the sum is Gaussian, so even for n=3n = 3 or n=5n = 5 samples the Gaussian approximation should be accurate."

Correction:

The CLT is an asymptotic result: it guarantees convergence in distribution as n \to \infty, but says nothing about the rate of convergence or the accuracy at any finite n. The quality of the Gaussian approximation at finite n depends critically on the shape of the underlying distribution:

  • Symmetric, light-tailed distributions (e.g., uniform, symmetric triangular): the approximation is excellent even for n = 5{-}10.
  • Skewed distributions (e.g., exponential, chi-squared with few degrees of freedom): the Gaussian approximation can be poor for n < 20{-}30. The skewness of the sum decreases as 1/\sqrt{n} (by the Berry--Esseen theorem, the CDF error is bounded by C \cdot E[|X_1 - \mu|^3] / (\sigma^3 \sqrt{n}) where C \leq 0.4748), so highly skewed distributions need larger n.
  • Heavy-tailed distributions (e.g., Pareto with infinite variance): the CLT does not apply at all, and the properly normalised sum converges to a stable distribution instead of a Gaussian.

In wireless: The Gaussian interference model (justified by the CLT) is reliable in dense networks with K \geq 20{-}30 interferers. For sparse networks with K = 3{-}5 dominant interferers, the Gaussian assumption can significantly underestimate the tail of the interference distribution (and hence underestimate outage probability). In such cases, the exact interference distribution or a more refined approximation (e.g., using the Gamma distribution) should be used.
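
A sketch quantifying this warning for a strongly skewed case: the sum of n iid Exp(1) variables is exactly Gamma(n, 1), so its true tail can be compared with the CLT estimate at the same three-sigma threshold:

```python
import numpy as np
from scipy.stats import gamma, norm

# S_n = X_1 + ... + X_n with X_i ~ Exp(1) is Gamma(n, 1): mean n, variance n.
# Compare P(S_n > n + 3*sqrt(n)) with the Gaussian approximation P(Z > 3).
for n in [5, 30, 200]:
    threshold = n + 3 * np.sqrt(n)
    exact = gamma.sf(threshold, a=n, scale=1.0)     # exact tail of the sum
    approx = norm.sf(3.0)                           # CLT approximation, identical for every n
    print(f"n = {n:3d}:  exact = {exact:.3e},  Gaussian approx = {approx:.3e},  ratio = {exact / approx:.2f}")
```

For small n the Gaussian approximation understates the upper tail by a large factor; the discrepancy shrinks roughly like 1/\sqrt{n}, as the Berry--Esseen bound suggests.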

Key Takeaway

The core messages of this section:

  1. Four modes, one hierarchy. Almost sure and mean square convergence each imply convergence in probability, which in turn implies convergence in distribution. No other implications hold in general. Choosing the right convergence mode is a modelling decision: the WLLN uses convergence in probability, the SLLN uses almost sure convergence, and the CLT uses convergence in distribution.

  2. The LLN: averaging works. The sample mean of iid observations converges to the true mean. This is why pilot-based channel estimation, Monte Carlo simulation, and ergodic capacity arguments are all valid: with enough samples, the average faithfully represents the expectation.

  3. The CLT: sums become Gaussian. The standardised sum of iid random variables converges in distribution to \mathcal{N}(0, 1), regardless of the original distribution (as long as the variance is finite). This single theorem explains why:

    • Thermal noise is Gaussian (many microscopic contributions).
    • Aggregate interference in large networks is Gaussian.
    • The Rayleigh fading model arises from many scattered paths (the in-phase and quadrature components are Gaussian by the CLT, so the envelope is Rayleigh).
  4. The Chernoff bound: exponential tail control. For sums of iid random variables, the Chernoff bound provides exponentially decaying tail probabilities. The decay rate is governed by the KL divergence (for Bernoulli sums) or more generally by the Legendre--Fenchel transform of the log-MGF. This is the key tool for analysing error exponents in coding theory.

  5. Respect the limits. The CLT is asymptotic; for small n or heavy-tailed distributions, the Gaussian approximation can be dangerously inaccurate. Always check whether n is "large enough" for the specific distribution at hand.