Convergence of Random Vectors

Why We Need Multivariate Convergence

In many applications, we observe not a single average but a vector of averages. A MIMO receiver estimates a channel vector $\hat{\mathbf{h}}$ from pilot observations; a maximum likelihood estimator produces a parameter vector $\hat{\boldsymbol{\theta}}$. The multivariate CLT tells us these vector estimates are approximately Gaussian, and the delta method lets us propagate this to nonlinear functions of the estimate, for instance the estimated SNR $\|\hat{\mathbf{h}}\|^2$ or the estimated rate $\log(1 + \widehat{\text{SNR}})$.

Theorem: Multivariate Central Limit Theorem

Let $\mathbf{X}_1, \mathbf{X}_2, \ldots \in \mathbb{R}^d$ be i.i.d. random vectors with mean $\boldsymbol{\mu} = \mathbb{E}[\mathbf{X}_1]$ and covariance matrix $\boldsymbol{\Sigma} = \operatorname{Cov}(\mathbf{X}_1)$ (with all entries finite). Then:

$$\sqrt{n}\left(\bar{\mathbf{X}}_n - \boldsymbol{\mu}\right) \xrightarrow{d} \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),$$

where $\bar{\mathbf{X}}_n = \frac{1}{n}\sum_{i=1}^n \mathbf{X}_i$.

Each coordinate of $\bar{\mathbf{X}}_n$ satisfies a scalar CLT. The multivariate CLT additionally captures the correlations between coordinates in the limiting Gaussian distribution.
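
A small Monte Carlo sketch of the theorem, using an assumed non-Gaussian example $\mathbf{X} = (E_1, E_1 + E_2)$ with $E_1, E_2 \sim \text{Exp}(1)$ i.i.d., for which $\boldsymbol{\mu} = (1, 2)$ and $\boldsymbol{\Sigma} = \begin{psmallmatrix}1 & 1\\ 1 & 2\end{psmallmatrix}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: X = (E1, E1 + E2) with E1, E2 ~ Exp(1) i.i.d.,
# so mu = (1, 2) and Sigma = [[1, 1], [1, 2]].
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 1.0],
                  [1.0, 2.0]])

n, trials = 400, 5000
E = rng.exponential(1.0, size=(trials, n, 2))
X = np.stack([E[..., 0], E[..., 0] + E[..., 1]], axis=-1)

# sqrt(n) * (Xbar_n - mu), one row per independent experiment
Z = np.sqrt(n) * (X.mean(axis=1) - mu)

# Empirical covariance of the scaled error should be close to Sigma
print(np.cov(Z.T))
```

The off-diagonal entry of the printed matrix is the point of the exercise: the scalar CLT applied coordinate-wise would say nothing about it.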


Theorem: Delta Method

Let $\{T_n\}$ be a sequence of random variables satisfying $\sqrt{n}(T_n - \theta) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$. If $g : \mathbb{R} \to \mathbb{R}$ is differentiable at $\theta$ with $g'(\theta) \neq 0$, then:

$$\sqrt{n}\left(g(T_n) - g(\theta)\right) \xrightarrow{d} \mathcal{N}\left(0,\; [g'(\theta)]^2 \sigma^2\right).$$

Multivariate version: If $\sqrt{n}(\mathbf{T}_n - \boldsymbol{\theta}) \xrightarrow{d} \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ and $g : \mathbb{R}^d \to \mathbb{R}^k$ is differentiable at $\boldsymbol{\theta}$ with Jacobian $\mathbf{J} = \nabla g(\boldsymbol{\theta})^\mathsf{T}$, then:

$$\sqrt{n}\left(g(\mathbf{T}_n) - g(\boldsymbol{\theta})\right) \xrightarrow{d} \mathcal{N}\left(\mathbf{0},\; \mathbf{J}\boldsymbol{\Sigma}\mathbf{J}^\mathsf{T}\right).$$

If $T_n \approx \theta + \frac{\sigma}{\sqrt{n}} Z$ with $Z \sim \mathcal{N}(0,1)$, then by Taylor expansion $g(T_n) \approx g(\theta) + g'(\theta)(T_n - \theta) \approx g(\theta) + \frac{g'(\theta)\sigma}{\sqrt{n}} Z$. The variance of $g(T_n)$ is scaled by $[g'(\theta)]^2$.
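
A quick numerical check of the scalar statement, using an assumed example $g(x) = x^2$ with $X_i \sim \text{Exp}(1)$, so that $\theta = 1$, $\sigma^2 = 1$, and the predicted asymptotic variance is $[g'(1)]^2 \cdot 1 = 4$:

```python
import numpy as np

rng = np.random.default_rng(1)

# g(x) = x^2 with X_i ~ Exp(1): theta = 1, sigma^2 = 1, g'(theta) = 2,
# so the delta method predicts asymptotic variance [g'(1)]^2 * 1 = 4.
n, trials = 1000, 5000
Xbar = rng.exponential(1.0, size=(trials, n)).mean(axis=1)

# sqrt(n) * (g(T_n) - g(theta)) with T_n = Xbar_n
W = np.sqrt(n) * (Xbar**2 - 1.0)

print(W.var())   # should be close to 4
```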


Theorem: Slutsky's Theorem

Let $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{P} c$ where $c$ is a constant. Then:

  1. $X_n + Y_n \xrightarrow{d} X + c$
  2. $Y_n X_n \xrightarrow{d} c X$
  3. $X_n / Y_n \xrightarrow{d} X / c$ (provided $c \neq 0$)

Convergence in distribution is preserved when we add or multiply by a sequence that converges in probability to a constant. The key word is constant: if $Y_n \xrightarrow{d} Y$ (a non-degenerate limit), the conclusion fails in general because we cannot control the joint distribution of $(X_n, Y_n)$.
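
The failure mode is easy to see numerically. In this illustrative sketch, $X_n$ is taken exactly standard normal; one companion sequence converges in probability to the constant $1$, the other converges in distribution to $\mathcal{N}(0,1)$ but is fully dependent on $X_n$:

```python
import numpy as np

rng = np.random.default_rng(2)

n_samples = 100_000
X = rng.standard_normal(n_samples)   # stand-in for X_n ~ N(0, 1)

# (a) Y_n -> 1 in probability (constant limit): X_n * Y_n behaves like N(0, 1)
Y_const = 1.0 + rng.standard_normal(n_samples) / 100.0
print(np.var(X * Y_const))           # close to 1

# (b) Y_n = -X_n also tends to N(0, 1) in distribution, but X_n + Y_n == 0
# identically, not N(0, 2) as adding "independent limits" would suggest.
Y_dep = -X
print(np.var(X + Y_dep))             # exactly 0
```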

Example: Delta Method: Asymptotic Distribution of Estimated SNR

A receiver estimates the signal power $P_s$ by averaging $n$ i.i.d. power measurements: $\hat{P}_s = \bar{X}_n$ where $X_i = |Y_i|^2$ and $Y_i$ are received signal samples. The noise power $\sigma^2$ is known. The estimated SNR is $\widehat{\text{SNR}} = \hat{P}_s / \sigma^2$. Find the asymptotic distribution of $\widehat{\text{SNR}}$.
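
One way to work this example, under the common additional assumption that the $Y_i$ are i.i.d. circularly symmetric complex Gaussian with power $P_s$ (so $X_i = |Y_i|^2$ is exponential with mean $P_s$ and variance $P_s^2$): here $g(p) = p/\sigma^2$ is linear with derivative $1/\sigma^2$, so the delta method gives $\sqrt{n}(\widehat{\text{SNR}} - \text{SNR}) \xrightarrow{d} \mathcal{N}(0, P_s^2/\sigma^4) = \mathcal{N}(0, \text{SNR}^2)$. A Monte Carlo sketch with hypothetical power values:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical powers; under the complex Gaussian assumption,
# X_i = |Y_i|^2 ~ Exp(mean = P_s), so Var(X_i) = P_s^2.
P_s, sigma2 = 4.0, 2.0
snr = P_s / sigma2
n, trials = 500, 10_000

X = rng.exponential(P_s, size=(trials, n))
snr_hat = X.mean(axis=1) / sigma2

# Delta method prediction: sqrt(n)(SNRhat - SNR) -> N(0, SNR^2)
W = np.sqrt(n) * (snr_hat - snr)
print(W.var(), snr**2)   # empirical vs predicted variance
```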

[Interactive demo: Delta Method: Distribution of $g(\bar{X}_n)$. Compare the empirical distribution of $g(\bar{X}_n)$ with the CLT + delta method prediction for $g(x) = x^2$, $\ln(x)$, $\sqrt{x}$, and $1/x$.]

Example: Slutsky in Action: The $t$-Statistic

Let $X_1, \ldots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Define the $t$-statistic $T_n = \frac{\bar{X}_n - \mu}{S_n/\sqrt{n}}$ where $S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$ is the sample variance. Show that $T_n \xrightarrow{d} \mathcal{N}(0,1)$.
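
The standard argument writes $T_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \cdot \frac{\sigma}{S_n}$: the first factor tends in distribution to $\mathcal{N}(0,1)$ by the CLT, $S_n \xrightarrow{P} \sigma$ by the LLN and continuous mapping, and Slutsky's theorem combines the two. A simulation sketch with a deliberately skewed (exponential) population:

```python
import numpy as np

rng = np.random.default_rng(4)

# Skewed population: X_i ~ Exp(1), so mu = 1. Even though the data are
# far from Gaussian, T_n should be approximately N(0, 1) for moderate n.
mu = 1.0
n, trials = 200, 20_000
X = rng.exponential(1.0, size=(trials, n))

Xbar = X.mean(axis=1)
S = X.std(axis=1, ddof=1)            # sample standard deviation S_n
T = (Xbar - mu) / (S / np.sqrt(n))   # the t-statistic

print(T.mean(), T.var())             # approximately 0 and 1
```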

🎓 CommIT Contribution (2016)

CLT for Massive MIMO Channel Hardening

T. L. Marzetta, G. Caire – IEEE Trans. Information Theory (various)

In massive MIMO systems with $M$ antennas at the base station, the effective channel gain $\|\mathbf{h}\|^2/M$ for a single user concentrates around its mean as $M \to \infty$, a phenomenon called channel hardening. This is a direct application of the LLN: $\|\mathbf{h}\|^2/M = \frac{1}{M}\sum_{m=1}^M |h_m|^2$ is a sample mean of i.i.d. terms (under i.i.d. Rayleigh fading).

The CLT further characterizes the fluctuations: they are approximately $\mathcal{N}(0, \sigma^4/M)$ where $\sigma^2 = \mathbb{E}[|h_m|^2]$. For $M = 64$ antennas, the coefficient of variation is $1/\sqrt{M} = 12.5\%$, explaining why massive MIMO dramatically reduces small-scale fading.
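
A minimal simulation of channel hardening, assuming i.i.d. Rayleigh fading $h_m \sim \mathcal{CN}(0, \sigma^2)$ so that $|h_m|^2$ is exponential with mean $\sigma^2$ and variance $\sigma^4$:

```python
import numpy as np

rng = np.random.default_rng(5)

# i.i.d. Rayleigh fading: h_m ~ CN(0, sigma2), so |h_m|^2 ~ Exp(mean sigma2)
sigma2 = 1.0
M, trials = 64, 20_000
h = np.sqrt(sigma2 / 2) * (rng.standard_normal((trials, M))
                           + 1j * rng.standard_normal((trials, M)))

gain = (np.abs(h) ** 2).mean(axis=1)   # ||h||^2 / M, one value per channel draw

cv = gain.std() / gain.mean()          # coefficient of variation
print(cv, 1 / np.sqrt(M))              # both about 0.125
```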

Tags: massive-mimo, channel-hardening, clt-application

Delta Method

If $\sqrt{n}(T_n - \theta) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$ and $g$ is differentiable at $\theta$, then $\sqrt{n}(g(T_n) - g(\theta)) \xrightarrow{d} \mathcal{N}(0, [g'(\theta)]^2 \sigma^2)$. Propagates asymptotic normality through smooth transformations.

Related: Central Limit Theorem

Slutsky's Theorem

If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{P} c$ (constant), then $X_n + Y_n \xrightarrow{d} X + c$ and $Y_n X_n \xrightarrow{d} cX$. Essential for combining CLT results with consistent estimators.

Related: Central Limit Theorem, Delta Method

Key Takeaway

The multivariate CLT, delta method, and Slutsky's theorem form the asymptotic toolkit for estimation theory. Together, they let us derive the approximate distribution of any smooth function of a sample average: the foundation of confidence intervals, hypothesis tests, and performance analysis in communications.