Convergence of Random Vectors

Why We Need Multivariate Convergence

In many applications, we observe not a single average but a vector of averages. A MIMO receiver estimates a channel vector $\hat{\mathbf{h}}$ from pilot observations; a maximum likelihood estimator produces a parameter vector $\hat{\boldsymbol{\theta}}$. The multivariate CLT tells us these vector estimates are approximately Gaussian, and the delta method lets us propagate this to nonlinear functions of the estimate, for instance the estimated SNR $\|\hat{\mathbf{h}}\|^2$ or the estimated rate $\log(1 + \widehat{\text{SNR}})$.

Theorem: Multivariate Central Limit Theorem

Let $\mathbf{X}_1, \mathbf{X}_2, \ldots \in \mathbb{R}^d$ be i.i.d. random vectors with mean $\boldsymbol{\mu} = \mathbb{E}[\mathbf{X}_1]$ and covariance matrix $\boldsymbol{\Sigma} = \operatorname{Cov}(\mathbf{X}_1)$ (with all entries finite). Then:

$$\sqrt{n}\left(\bar{\mathbf{X}}_n - \boldsymbol{\mu}\right) \xrightarrow{d} \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),$$

where $\bar{\mathbf{X}}_n = \frac{1}{n}\sum_{i=1}^n \mathbf{X}_i$.

Each coordinate of $\bar{\mathbf{X}}_n$ satisfies a scalar CLT. The multivariate CLT additionally captures the correlations between coordinates in the limiting Gaussian distribution.
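
A small Monte Carlo sketch of the theorem, using an assumed non-Gaussian example $\mathbf{X} = (E_1, E_1 + E_2)$ with $E_1, E_2 \sim \text{Exp}(1)$ i.i.d., for which $\boldsymbol{\mu} = (1, 2)$ and $\boldsymbol{\Sigma} = \begin{psmallmatrix}1 & 1\\ 1 & 2\end{psmallmatrix}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: X = (E1, E1 + E2) with E1, E2 ~ Exp(1) i.i.d.,
# so mu = (1, 2) and Sigma = [[1, 1], [1, 2]].
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 1.0],
                  [1.0, 2.0]])

n, trials = 400, 5000
E = rng.exponential(1.0, size=(trials, n, 2))
X = np.stack([E[..., 0], E[..., 0] + E[..., 1]], axis=-1)

# sqrt(n) * (Xbar_n - mu), one row per independent experiment
Z = np.sqrt(n) * (X.mean(axis=1) - mu)

# Empirical covariance of the scaled error should be close to Sigma
print(np.cov(Z.T))
```

The off-diagonal entry of the printed matrix is the point of the exercise: the scalar CLT applied coordinate-wise would say nothing about it.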


Theorem: Delta Method

Let $\{T_n\}$ be a sequence of random variables satisfying $\sqrt{n}(T_n - \theta) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$. If $g : \mathbb{R} \to \mathbb{R}$ is differentiable at $\theta$ with $g'(\theta) \neq 0$, then:

$$\sqrt{n}\left(g(T_n) - g(\theta)\right) \xrightarrow{d} \mathcal{N}\left(0,\; [g'(\theta)]^2 \sigma^2\right).$$

Multivariate version: If $\sqrt{n}(\mathbf{T}_n - \boldsymbol{\theta}) \xrightarrow{d} \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ and $g : \mathbb{R}^d \to \mathbb{R}^k$ is differentiable at $\boldsymbol{\theta}$ with Jacobian $\mathbf{J} = \nabla g(\boldsymbol{\theta})^\mathsf{T}$, then:

$$\sqrt{n}\left(g(\mathbf{T}_n) - g(\boldsymbol{\theta})\right) \xrightarrow{d} \mathcal{N}\left(\mathbf{0},\; \mathbf{J}\boldsymbol{\Sigma}\mathbf{J}^\mathsf{T}\right).$$

If $T_n \approx \theta + \frac{\sigma}{\sqrt{n}} Z$ with $Z \sim \mathcal{N}(0,1)$, then by Taylor expansion $g(T_n) \approx g(\theta) + g'(\theta)(T_n - \theta) \approx g(\theta) + \frac{g'(\theta)\sigma}{\sqrt{n}} Z$. The variance of $g(T_n)$ is scaled by $[g'(\theta)]^2$.
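
A quick numerical check of the scalar statement, using an assumed example $g(x) = x^2$ with $X_i \sim \text{Exp}(1)$, so that $\theta = 1$, $\sigma^2 = 1$, and the predicted asymptotic variance is $[g'(1)]^2 \cdot 1 = 4$:

```python
import numpy as np

rng = np.random.default_rng(1)

# g(x) = x^2 with X_i ~ Exp(1): theta = 1, sigma^2 = 1, g'(theta) = 2,
# so the delta method predicts asymptotic variance [g'(1)]^2 * 1 = 4.
n, trials = 1000, 5000
Xbar = rng.exponential(1.0, size=(trials, n)).mean(axis=1)

# sqrt(n) * (g(T_n) - g(theta)) with T_n = Xbar_n
W = np.sqrt(n) * (Xbar**2 - 1.0)

print(W.var())   # should be close to 4
```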


Theorem: Slutsky's Theorem

Let $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{P} c$ where $c$ is a constant. Then:

  1. $X_n + Y_n \xrightarrow{d} X + c$
  2. $Y_n X_n \xrightarrow{d} c X$
  3. $X_n / Y_n \xrightarrow{d} X / c$ (provided $c \neq 0$)

Convergence in distribution is preserved when we add or multiply by a sequence that converges in probability to a constant. The key word is constant: if $Y_n \xrightarrow{d} Y$ (a non-degenerate limit), the conclusion fails in general because we cannot control the joint distribution of $(X_n, Y_n)$.
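
The failure mode is easy to see numerically. In this illustrative sketch, $X_n$ is taken exactly standard normal; one companion sequence converges in probability to the constant $1$, the other converges in distribution to $\mathcal{N}(0,1)$ but is fully dependent on $X_n$:

```python
import numpy as np

rng = np.random.default_rng(2)

n_samples = 100_000
X = rng.standard_normal(n_samples)   # stand-in for X_n ~ N(0, 1)

# (a) Y_n -> 1 in probability (constant limit): X_n * Y_n behaves like N(0, 1)
Y_const = 1.0 + rng.standard_normal(n_samples) / 100.0
print(np.var(X * Y_const))           # close to 1

# (b) Y_n = -X_n also tends to N(0, 1) in distribution, but X_n + Y_n == 0
# identically, not N(0, 2) as adding "independent limits" would suggest.
Y_dep = -X
print(np.var(X + Y_dep))             # exactly 0
```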

Example: Delta Method: Asymptotic Distribution of Estimated SNR

A receiver estimates the signal power $P_s$ by averaging $n$ i.i.d. power measurements: $\hat{P}_s = \bar{X}_n$ where $X_i = |Y_i|^2$ and $Y_i$ are received signal samples. The noise power $\sigma^2$ is known. The estimated SNR is $\widehat{\text{SNR}} = \hat{P}_s / \sigma^2$. Find the asymptotic distribution of $\widehat{\text{SNR}}$.
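
One way to work this example, under the common additional assumption that the $Y_i$ are i.i.d. circularly symmetric complex Gaussian with power $P_s$ (so $X_i = |Y_i|^2$ is exponential with mean $P_s$ and variance $P_s^2$): here $g(p) = p/\sigma^2$ is linear with derivative $1/\sigma^2$, so the delta method gives $\sqrt{n}(\widehat{\text{SNR}} - \text{SNR}) \xrightarrow{d} \mathcal{N}(0, P_s^2/\sigma^4) = \mathcal{N}(0, \text{SNR}^2)$. A Monte Carlo sketch with hypothetical power values:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical powers; under the complex Gaussian assumption,
# X_i = |Y_i|^2 ~ Exp(mean = P_s), so Var(X_i) = P_s^2.
P_s, sigma2 = 4.0, 2.0
snr = P_s / sigma2
n, trials = 500, 10_000

X = rng.exponential(P_s, size=(trials, n))
snr_hat = X.mean(axis=1) / sigma2

# Delta method prediction: sqrt(n)(SNRhat - SNR) -> N(0, SNR^2)
W = np.sqrt(n) * (snr_hat - snr)
print(W.var(), snr**2)   # empirical vs predicted variance
```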

[Interactive demo: Delta Method: Distribution of $g(\bar{X}_n)$. Compare the empirical distribution of $g(\bar{X}_n)$ with the CLT + delta method prediction for $g(x) = x^2$, $\ln(x)$, $\sqrt{x}$, and $1/x$.]

Example: Slutsky in Action: The $t$-Statistic

Let $X_1, \ldots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Define the $t$-statistic $T_n = \frac{\bar{X}_n - \mu}{S_n/\sqrt{n}}$ where $S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$ is the sample variance. Show that $T_n \xrightarrow{d} \mathcal{N}(0,1)$.
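
The standard argument writes $T_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \cdot \frac{\sigma}{S_n}$: the first factor tends in distribution to $\mathcal{N}(0,1)$ by the CLT, $S_n \xrightarrow{P} \sigma$ by the LLN and continuous mapping, and Slutsky's theorem combines the two. A simulation sketch with a deliberately skewed (exponential) population:

```python
import numpy as np

rng = np.random.default_rng(4)

# Skewed population: X_i ~ Exp(1), so mu = 1. Even though the data are
# far from Gaussian, T_n should be approximately N(0, 1) for moderate n.
mu = 1.0
n, trials = 200, 20_000
X = rng.exponential(1.0, size=(trials, n))

Xbar = X.mean(axis=1)
S = X.std(axis=1, ddof=1)            # sample standard deviation S_n
T = (Xbar - mu) / (S / np.sqrt(n))   # the t-statistic

print(T.mean(), T.var())             # approximately 0 and 1
```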

🎓 CommIT Contribution (2016)

CLT for Massive MIMO Channel Hardening

T. L. Marzetta, G. Caire – IEEE Trans. Information Theory (various)

In massive MIMO systems with $M$ antennas at the base station, the effective channel gain $\|\mathbf{h}\|^2/M$ for a single user concentrates around its mean as $M \to \infty$, a phenomenon called channel hardening. This is a direct application of the LLN: $\|\mathbf{h}\|^2/M = \frac{1}{M}\sum_{m=1}^M |h_m|^2$ is a sample mean of i.i.d. terms (under i.i.d. Rayleigh fading).

The CLT further characterizes the fluctuations: they are approximately $\mathcal{N}(0, \sigma^4/M)$ where $\sigma^2 = \mathbb{E}[|h_m|^2]$. For $M = 64$ antennas, the coefficient of variation is $1/\sqrt{M} = 12.5\%$, explaining why massive MIMO dramatically reduces small-scale fading.
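
A minimal simulation of channel hardening, assuming i.i.d. Rayleigh fading $h_m \sim \mathcal{CN}(0, \sigma^2)$ so that $|h_m|^2$ is exponential with mean $\sigma^2$ and variance $\sigma^4$:

```python
import numpy as np

rng = np.random.default_rng(5)

# i.i.d. Rayleigh fading: h_m ~ CN(0, sigma2), so |h_m|^2 ~ Exp(mean sigma2)
sigma2 = 1.0
M, trials = 64, 20_000
h = np.sqrt(sigma2 / 2) * (rng.standard_normal((trials, M))
                           + 1j * rng.standard_normal((trials, M)))

gain = (np.abs(h) ** 2).mean(axis=1)   # ||h||^2 / M, one value per channel draw

cv = gain.std() / gain.mean()          # coefficient of variation
print(cv, 1 / np.sqrt(M))              # both about 0.125
```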

Tags: massive-mimo, channel-hardening, clt-application

Delta Method

If $\sqrt{n}(T_n - \theta) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$ and $g$ is differentiable at $\theta$, then $\sqrt{n}(g(T_n) - g(\theta)) \xrightarrow{d} \mathcal{N}(0, [g'(\theta)]^2 \sigma^2)$. Propagates asymptotic normality through smooth transformations.

Related: Central Limit Theorem

Slutsky's Theorem

If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{P} c$ (constant), then $X_n + Y_n \xrightarrow{d} X + c$ and $Y_n X_n \xrightarrow{d} cX$. Essential for combining CLT results with consistent estimators.

Related: Central Limit Theorem, Delta Method

Key Takeaway

The multivariate CLT, delta method, and Slutsky's theorem form the asymptotic toolkit for estimation theory. Together, they let us derive the approximate distribution of any smooth function of a sample average: the foundation of confidence intervals, hypothesis tests, and performance analysis in communications.