Random Vectors and Joint Distributions

Why Random Vectors?

In a multiple-input multiple-output (MIMO) system with $n_r$ receive antennas and $n_t$ transmit antennas, the received signal is not a scalar but a vector:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$$

where $\mathbf{y} \in \mathbb{C}^{n_r}$ is the received vector, $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ is the channel matrix, $\mathbf{x} \in \mathbb{C}^{n_t}$ is the transmitted vector, and $\mathbf{n} \in \mathbb{C}^{n_r}$ is additive noise. Every entry of $\mathbf{y}$, $\mathbf{H}$, and $\mathbf{n}$ is a random variable, and they are generally correlated across antennas.

To design MIMO detectors, compute channel capacity, or derive optimal beamformers, we must handle joint distributions over multiple random variables simultaneously. The covariance matrix $\mathbf{C}_{\mathbf{n}} = E[\mathbf{n}\mathbf{n}^H]$ encodes the noise correlation structure; the channel covariance $\mathbf{R}_{\mathbf{H}}$ governs spatial diversity and multiplexing gains.

This section is where Chapter 1's linear algebra (inner products, eigendecompositions, positive semidefiniteness) meets Chapter 2's probability. The central result is the multivariate Gaussian distribution, whose conditional distributions are again Gaussian with parameters given by the Schur complement. This single fact underlies minimum mean-square error (MMSE) estimation, Kalman filtering, and the capacity formula for MIMO channels.

Definition: Random Vector

Let $(\Omega, \mathcal{F}, P)$ be a probability space. A random vector of dimension $n$ is a measurable function

$$\mathbf{x} : \Omega \to \mathbb{R}^n$$

(or $\mathbb{C}^n$ in the complex case), written as

$$\mathbf{x} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix},$$

where each component $X_i : \Omega \to \mathbb{R}$ (or $\mathbb{C}$) is itself a random variable. Measurability of $\mathbf{x}$ means that for every Borel set $B \subseteq \mathbb{R}^n$,

$$\mathbf{x}^{-1}(B) = \{\omega \in \Omega : \mathbf{x}(\omega) \in B\} \in \mathcal{F}.$$

This is equivalent to requiring that each component $X_i$ is a random variable in the sense of Definition def-rv.

We use boldface lowercase ($\mathbf{x}$) for random vectors and boldface uppercase ($\mathbf{A}$) for random matrices, following the convention established in Chapter 1. A concrete realisation is denoted by the same boldface symbol; context (or an explicit statement such as "let $\mathbf{x} = \mathbf{x}_0$") distinguishes the random object from its value.


Definition: Joint CDF and Joint PDF

Let $\mathbf{x} = (X_1, \ldots, X_n)^T$ be a real random vector.

Joint CDF. The joint cumulative distribution function is

$$F_{\mathbf{x}}(\mathbf{x}) = F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = P(X_1 \leq x_1, \ldots, X_n \leq x_n).$$

Properties of the joint CDF:

  1. $F_{\mathbf{x}}$ is non-decreasing in each argument.
  2. $F_{\mathbf{x}}$ is right-continuous in each argument.
  3. $\lim_{x_i \to +\infty \text{ for all } i} F_{\mathbf{x}}(\mathbf{x}) = 1$.
  4. $\lim_{x_i \to -\infty \text{ for any } i} F_{\mathbf{x}}(\mathbf{x}) = 0$.

Joint PDF. The random vector $\mathbf{x}$ is (absolutely) continuous if there exists a non-negative function $f_{\mathbf{x}} : \mathbb{R}^n \to [0, \infty)$ such that

$$F_{\mathbf{x}}(\mathbf{x}) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_{\mathbf{x}}(t_1, \ldots, t_n)\,dt_n \cdots dt_1.$$

Wherever $F_{\mathbf{x}}$ is sufficiently smooth,

$$f_{\mathbf{x}}(\mathbf{x}) = \frac{\partial^n F_{\mathbf{x}}}{\partial x_1 \cdots \partial x_n}(x_1, \ldots, x_n).$$

Properties of the joint PDF:

  1. $f_{\mathbf{x}}(\mathbf{x}) \geq 0$ for all $\mathbf{x}$.
  2. $\int_{\mathbb{R}^n} f_{\mathbf{x}}(\mathbf{x})\,d\mathbf{x} = 1$.
  3. $P(\mathbf{x} \in A) = \int_A f_{\mathbf{x}}(\mathbf{x})\,d\mathbf{x}$ for any Borel set $A$.

The joint CDF and joint PDF completely characterise the joint distribution of $(X_1, \ldots, X_n)$. In the complex case, we identify $\mathbb{C}^n$ with $\mathbb{R}^{2n}$ and define the joint density over the stacked real and imaginary parts.
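A minimal NumPy sketch of property 2 of the joint PDF: numerically integrate an illustrative bivariate Gaussian density (the mean and covariance below are assumptions for the example) over a truncated grid and confirm the total mass is approximately 1.

```python
import numpy as np

# Illustrative bivariate Gaussian; mu and C are arbitrary example values.
mu = np.array([0.0, 0.0])
C = np.array([[2.0, 0.6],
              [0.6, 1.0]])
Cinv = np.linalg.inv(C)
norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(C)))  # (2*pi)^{n/2} det^{1/2}, n = 2

# Grid over [-8, 8]^2; the tails beyond this range contribute negligibly.
t = np.linspace(-8, 8, 401)
dx = t[1] - t[0]
X1, X2 = np.meshgrid(t, t, indexing="ij")
D = np.stack([X1 - mu[0], X2 - mu[1]], axis=-1)        # centred coordinates
quad = np.einsum("...i,ij,...j->...", D, Cinv, D)      # quadratic form per grid point
f = norm * np.exp(-0.5 * quad)

print(f.sum() * dx * dx)   # ~= 1.0: the joint PDF integrates to one
```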


Definition: Marginal and Conditional Densities

Let $\mathbf{x} = (X_1, \ldots, X_n)^T$ have joint PDF $f_{\mathbf{x}}(\mathbf{x})$.

Marginal density. The marginal PDF of any subset of components is obtained by integrating out the remaining variables. For instance, the marginal PDF of $X_1$ is

$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{x}}(x_1, x_2, \ldots, x_n)\,dx_2 \cdots dx_n.$$

More generally, if we partition $\mathbf{x} = (\mathbf{x}_1^T, \mathbf{x}_2^T)^T$ with $\mathbf{x}_2 \in \mathbb{R}^{n_2}$, then

$$f_{\mathbf{x}_1}(\mathbf{x}_1) = \int_{\mathbb{R}^{n_2}} f_{\mathbf{x}}(\mathbf{x}_1, \mathbf{x}_2)\,d\mathbf{x}_2.$$

Conditional density. The conditional PDF of $\mathbf{x}_1$ given $\mathbf{x}_2$ is defined (wherever $f_{\mathbf{x}_2}(\mathbf{x}_2) > 0$) by

$$f_{\mathbf{x}_1 \mid \mathbf{x}_2}(\mathbf{x}_1 \mid \mathbf{x}_2) = \frac{f_{\mathbf{x}}(\mathbf{x}_1, \mathbf{x}_2)}{f_{\mathbf{x}_2}(\mathbf{x}_2)}.$$

This is the density analogue of the definition of conditional probability, $P(A \mid B) = P(A \cap B)/P(B)$.

Bayes' rule for densities. Rearranging:

$$f_{\mathbf{x}}(\mathbf{x}_1, \mathbf{x}_2) = f_{\mathbf{x}_1 \mid \mathbf{x}_2}(\mathbf{x}_1 \mid \mathbf{x}_2)\,f_{\mathbf{x}_2}(\mathbf{x}_2) = f_{\mathbf{x}_2 \mid \mathbf{x}_1}(\mathbf{x}_2 \mid \mathbf{x}_1)\,f_{\mathbf{x}_1}(\mathbf{x}_1).$$

Hence

$$f_{\mathbf{x}_1 \mid \mathbf{x}_2}(\mathbf{x}_1 \mid \mathbf{x}_2) = \frac{f_{\mathbf{x}_2 \mid \mathbf{x}_1}(\mathbf{x}_2 \mid \mathbf{x}_1)\,f_{\mathbf{x}_1}(\mathbf{x}_1)}{f_{\mathbf{x}_2}(\mathbf{x}_2)}.$$

Marginalisation and conditioning are the two fundamental operations on joint distributions. In communications, marginalisation computes the distribution of a single antenna's output from the joint MIMO distribution; conditioning computes the posterior distribution of the transmitted symbol given the received signal.
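A minimal sketch of both operations on a grid, reusing the illustrative bivariate Gaussian from the previous snippet: marginalising sums out one coordinate, and conditioning slices the joint density and renormalises by the marginal value, exactly the defining ratio above.

```python
import numpy as np

# Same illustrative bivariate Gaussian as before (zero mean).
t = np.linspace(-8, 8, 401)
dx = t[1] - t[0]
X1, X2 = np.meshgrid(t, t, indexing="ij")
C = np.array([[2.0, 0.6], [0.6, 1.0]])
Cinv = np.linalg.inv(C)
D = np.stack([X1, X2], axis=-1)
f = np.exp(-0.5 * np.einsum("...i,ij,...j->...", D, Cinv, D)) \
    / (2 * np.pi * np.sqrt(np.linalg.det(C)))

# Marginal of X1: integrate out x2 (axis 1 of the grid).
f_x1 = f.sum(axis=1) * dx

# Conditional of X1 given X2 = 1: slice the joint at x2 = 1 and divide
# by the marginal density f_{X2}(1).
f_x2 = f.sum(axis=0) * dx
j = np.argmin(np.abs(t - 1.0))
f_cond = f[:, j] / f_x2[j]

print(f_x1.sum() * dx, f_cond.sum() * dx)   # both ~= 1.0: valid densities
```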


Definition: Mean Vector

The mean vector (or expectation) of a random vector $\mathbf{x} = (X_1, \ldots, X_n)^T$ is

$$\boldsymbol{\mu}_{\mathbf{x}} = E[\mathbf{x}] = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{pmatrix} \in \mathbb{R}^n \ (\text{or } \mathbb{C}^n).$$

Expectation of a random vector (or matrix) is defined component-wise. Linearity carries over: $E[\mathbf{A}\mathbf{x} + \mathbf{b}] = \mathbf{A}\,E[\mathbf{x}] + \mathbf{b}$ for any deterministic matrix $\mathbf{A}$ and vector $\mathbf{b}$.

Definition: Correlation Matrix and Covariance Matrix

Let $\mathbf{x} \in \mathbb{C}^n$ be a random vector with mean $\boldsymbol{\mu} = E[\mathbf{x}]$.

Correlation matrix (autocorrelation matrix).

$$\mathbf{R}_{\mathbf{x}} = E[\mathbf{x}\mathbf{x}^H] \in \mathbb{C}^{n \times n}.$$

The $(i,k)$-th entry is $[\mathbf{R}_{\mathbf{x}}]_{ik} = E[X_i X_k^*]$.

Covariance matrix.

$$\mathbf{C}_{\mathbf{x}} = E\bigl[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H\bigr] = \mathbf{R}_{\mathbf{x}} - \boldsymbol{\mu}\boldsymbol{\mu}^H \in \mathbb{C}^{n \times n}.$$

The $(i,k)$-th entry is $[\mathbf{C}_{\mathbf{x}}]_{ik} = E[(X_i - \mu_i)(X_k - \mu_k)^*] = \mathrm{Cov}(X_i, X_k)$.

Properties of the covariance matrix:

  1. Hermitian: $\mathbf{C}_{\mathbf{x}} = \mathbf{C}_{\mathbf{x}}^H$.
  2. Positive semidefinite (PSD): $\mathbf{C}_{\mathbf{x}} \succeq \mathbf{0}$ (proved in Theorem thm-covariance-psd below).
  3. Diagonal entries are variances: $[\mathbf{C}_{\mathbf{x}}]_{ii} = \mathrm{Var}(X_i) \geq 0$.
  4. Off-diagonal entries are covariances: $|[\mathbf{C}_{\mathbf{x}}]_{ik}| \leq \sqrt{[\mathbf{C}_{\mathbf{x}}]_{ii}\,[\mathbf{C}_{\mathbf{x}}]_{kk}}$ (Cauchy--Schwarz).
  5. Affine transformation: If $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}$, then $\mathbf{C}_{\mathbf{y}} = \mathbf{A}\,\mathbf{C}_{\mathbf{x}}\,\mathbf{A}^H$.

For real random vectors, replace the Hermitian transpose $^H$ with the ordinary transpose $^T$ throughout.

The correlation matrix $\mathbf{R}_{\mathbf{x}}$ and covariance matrix $\mathbf{C}_{\mathbf{x}}$ coincide when $\mathbf{x}$ is zero-mean ($\boldsymbol{\mu} = \mathbf{0}$), which is the typical case for noise and for circularly symmetric channel vectors. The distinction matters when the mean is nonzero, e.g., in Ricean fading, where the line-of-sight (LOS) component contributes a nonzero mean.
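A minimal NumPy sketch (dimensions and sample counts are illustrative) that estimates a complex covariance matrix from zero-mean samples and spot-checks properties 1, 2, and 5 above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 200_000

# Correlated zero-mean complex samples: x = L w with E[w w^H] = I,
# so the true covariance is L L^H (L is an arbitrary example mixing matrix).
L = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
w = (rng.normal(size=(n, N)) + 1j * rng.normal(size=(n, N))) / np.sqrt(2)
x = L @ w

C_hat = (x @ x.conj().T) / N                          # sample covariance (zero mean)
print(np.allclose(C_hat, C_hat.conj().T))             # property 1: Hermitian
print(np.all(np.linalg.eigvalsh(C_hat) >= -1e-10))    # property 2: PSD up to roundoff

# Property 5: C_y = A C_x A^H for y = A x + b.
A = rng.normal(size=(2, n))
b = np.ones((2, 1))
y = A @ x + b
yc = y - y.mean(axis=1, keepdims=True)
C_y_hat = (yc @ yc.conj().T) / N
print(np.linalg.norm(C_y_hat - A @ C_hat @ A.conj().T))  # ~ 0 (Monte Carlo error)
```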


Definition: Cross-Covariance Matrix

Let $\mathbf{x} \in \mathbb{C}^m$ and $\mathbf{y} \in \mathbb{C}^n$ be random vectors with means $\boldsymbol{\mu}_{\mathbf{x}}$ and $\boldsymbol{\mu}_{\mathbf{y}}$. The cross-covariance matrix is

$$\mathbf{C}_{\mathbf{x}\mathbf{y}} = E\bigl[(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}})(\mathbf{y} - \boldsymbol{\mu}_{\mathbf{y}})^H\bigr] \in \mathbb{C}^{m \times n}.$$

Note that $\mathbf{C}_{\mathbf{y}\mathbf{x}} = \mathbf{C}_{\mathbf{x}\mathbf{y}}^H$.

The random vectors $\mathbf{x}$ and $\mathbf{y}$ are uncorrelated if $\mathbf{C}_{\mathbf{x}\mathbf{y}} = \mathbf{0}$.

When we partition a joint random vector $\mathbf{z} = (\mathbf{x}^T, \mathbf{y}^T)^T$, the full covariance matrix has block structure:

$$\mathbf{C}_{\mathbf{z}} = \begin{pmatrix} \mathbf{C}_{\mathbf{x}} & \mathbf{C}_{\mathbf{x}\mathbf{y}} \\ \mathbf{C}_{\mathbf{y}\mathbf{x}} & \mathbf{C}_{\mathbf{y}} \end{pmatrix}.$$

This block partitioning is essential for the conditional Gaussian theorem (Theorem thm-conditional-gaussian).

Theorem: The Covariance Matrix Is Positive Semidefinite

Let $\mathbf{x} \in \mathbb{C}^n$ be a random vector with covariance matrix $\mathbf{C}_{\mathbf{x}}$. Then $\mathbf{C}_{\mathbf{x}}$ is Hermitian positive semidefinite:

$$\mathbf{a}^H \mathbf{C}_{\mathbf{x}} \mathbf{a} \geq 0 \qquad \text{for all } \mathbf{a} \in \mathbb{C}^n.$$

Equality holds for a particular $\mathbf{a} \neq \mathbf{0}$ if and only if $\mathbf{a}^H(\mathbf{x} - \boldsymbol{\mu}) = 0$ almost surely, i.e., the centered random vector $\mathbf{x} - \boldsymbol{\mu}$ is almost surely orthogonal to $\mathbf{a}$.

The quadratic form $\mathbf{a}^H \mathbf{C}_{\mathbf{x}} \mathbf{a}$ is the variance of the scalar projection $\mathbf{a}^H \mathbf{x}$, and variances are nonnegative.
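The variance-of-projection identity behind the proof is easy to check by simulation. A minimal sketch with illustrative values, comparing $\mathbf{a}^T \mathbf{C}_{\mathbf{x}} \mathbf{a}$ against the sample variance of $\mathbf{a}^T \mathbf{x}$ for a real random vector:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 500_000
L = rng.normal(size=(n, n))
x = L @ rng.normal(size=(n, N))    # real, zero mean, true covariance C = L L^T
C = L @ L.T

a = rng.normal(size=n)             # arbitrary direction
proj = a @ x                       # scalar projections a^T x for every sample
print(a @ C @ a, proj.var())       # agree up to Monte Carlo error; both >= 0
```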


Definition: Multivariate Gaussian Distribution

A real random vector $\mathbf{x} \in \mathbb{R}^n$ has the multivariate Gaussian (or multivariate normal) distribution with mean $\boldsymbol{\mu} \in \mathbb{R}^n$ and covariance matrix $\mathbf{C} \in \mathbb{R}^{n \times n}$ (symmetric, positive definite), written

$$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C}),$$

if its joint PDF is

$$f_{\mathbf{x}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} \det(\mathbf{C})^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right).$$

Key properties:

  1. Marginals are Gaussian: Any sub-vector of $\mathbf{x}$ is also Gaussian, with mean and covariance given by the corresponding sub-vector and sub-matrix.

  2. Affine closure: If $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}$ where $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$, then $\mathbf{y} \sim \mathcal{N}(\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\ \mathbf{A}\mathbf{C}\mathbf{A}^T)$.

  3. Uncorrelated $\Leftrightarrow$ independent: For jointly Gaussian random variables, zero covariance implies independence (the converse always holds). This is a special property of the Gaussian; it fails for general distributions.

  4. Contours of constant density are ellipsoids centered at $\boldsymbol{\mu}$: $(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = c$, with axes aligned along the eigenvectors of $\mathbf{C}$ and semi-axis lengths proportional to $\sqrt{\lambda_i}$, where $\lambda_i$ are the eigenvalues of $\mathbf{C}$.

The exponent $(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{x} - \boldsymbol{\mu})$ is the squared Mahalanobis distance from $\mathbf{x}$ to $\boldsymbol{\mu}$. It generalises the familiar $(x - \mu)^2/\sigma^2$ from the scalar case by accounting for correlations through $\mathbf{C}^{-1}$. Points at equal Mahalanobis distance have equal density, forming the ellipsoidal contours.

When $\mathbf{C}$ is singular (positive semidefinite but not positive definite), the distribution is supported on a proper affine subspace of $\mathbb{R}^n$ and does not possess a density with respect to Lebesgue measure on $\mathbb{R}^n$. One can still define it via characteristic functions: $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$ iff $\Phi_{\mathbf{x}}(\boldsymbol{\omega}) = \exp(j\boldsymbol{\omega}^T\boldsymbol{\mu} - \tfrac{1}{2}\boldsymbol{\omega}^T\mathbf{C}\boldsymbol{\omega})$.
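Affine closure also gives the standard sampling recipe: if $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\mathbf{C} = \mathbf{L}\mathbf{L}^T$ is a Cholesky factorisation, then $\boldsymbol{\mu} + \mathbf{L}\mathbf{w} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$. A minimal sketch with illustrative parameters, which also checks the mean of the squared Mahalanobis distance (for $n = 2$ it is $\chi^2_2$ with mean 2):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
C = np.array([[4.0, 2.0],
              [2.0, 4.0]])
L = np.linalg.cholesky(C)          # C = L L^T

N = 200_000
w = rng.normal(size=(2, N))        # w ~ N(0, I)
x = mu[:, None] + L @ w            # x ~ N(mu, C) by affine closure

print(x.mean(axis=1))              # ~= mu
print(np.cov(x))                   # ~= C

# Squared Mahalanobis distance of each sample; mean ~= 2 for n = 2.
D = (x - mu[:, None]).T
d2 = np.einsum("ij,jk,ik->i", D, np.linalg.inv(C), D)
print(d2.mean())
```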


Theorem: Conditional Distribution of Jointly Gaussian Vectors

Let $\mathbf{x} = (\mathbf{x}_1^T, \mathbf{x}_2^T)^T$ be a jointly Gaussian vector with

$$\mathbf{x} \sim \mathcal{N}\!\left( \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix},\ \begin{pmatrix} \mathbf{C}_{11} & \mathbf{C}_{12} \\ \mathbf{C}_{21} & \mathbf{C}_{22} \end{pmatrix}\right),$$

where $\mathbf{x}_1 \in \mathbb{R}^{n_1}$, $\mathbf{x}_2 \in \mathbb{R}^{n_2}$, $\mathbf{C}_{21} = \mathbf{C}_{12}^T$, and $\mathbf{C}_{22}$ is invertible. Then the conditional distribution of $\mathbf{x}_1$ given $\mathbf{x}_2$ is Gaussian:

$$\mathbf{x}_1 \mid \mathbf{x}_2 \sim \mathcal{N}\!\bigl(\boldsymbol{\mu}_{1|2},\ \mathbf{C}_{1|2}\bigr),$$

with conditional mean (a linear function of $\mathbf{x}_2$):

$$\boldsymbol{\mu}_{1|2} = E[\mathbf{x}_1 \mid \mathbf{x}_2] = \boldsymbol{\mu}_1 + \mathbf{C}_{12}\mathbf{C}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),$$

and conditional covariance (independent of $\mathbf{x}_2$):

$$\mathbf{C}_{1|2} = \mathbf{C}_{11} - \mathbf{C}_{12}\mathbf{C}_{22}^{-1}\mathbf{C}_{21}.$$

The matrix $\mathbf{C}_{1|2}$ is the Schur complement of $\mathbf{C}_{22}$ in $\mathbf{C}_{\mathbf{x}}$.

Conditioning on $\mathbf{x}_2$ shifts the mean of $\mathbf{x}_1$ by a linear correction proportional to how far $\mathbf{x}_2$ deviates from its own mean, scaled by the "regression coefficient" $\mathbf{C}_{12}\mathbf{C}_{22}^{-1}$. The conditional covariance $\mathbf{C}_{1|2}$ is never larger (in the PSD sense) than $\mathbf{C}_{11}$: observing $\mathbf{x}_2$ can only reduce uncertainty about $\mathbf{x}_1$.
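A minimal NumPy sketch of the theorem on an illustrative three-dimensional example: compute $\boldsymbol{\mu}_{1|2}$ and $\mathbf{C}_{1|2}$ from the partitioned blocks, then check them against samples whose $\mathbf{x}_2$ component falls near the conditioning value.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.0, 0.0, 1.0])
C = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.5],
              [0.3, 0.5, 1.0]])     # symmetric positive definite (example values)

# Partition: x1 = first two components, x2 = last component.
C11, C12 = C[:2, :2], C[:2, 2:]
C21, C22 = C[2:, :2], C[2:, 2:]

x2_obs = np.array([2.0])
mu_1g2 = mu[:2] + C12 @ np.linalg.solve(C22, x2_obs - mu[2:])
C_1g2 = C11 - C12 @ np.linalg.solve(C22, C21)   # Schur complement of C22
print(mu_1g2, C_1g2)

# Monte Carlo check: approximate conditioning by keeping samples with
# x2 within a small window around the observed value.
x = rng.multivariate_normal(mu, C, size=1_000_000)
sel = np.abs(x[:, 2] - x2_obs[0]) < 0.05
print(x[sel, :2].mean(axis=0))     # ~= mu_1g2
print(np.cov(x[sel, :2].T))        # ~= C_1g2
```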


Definition: Circularly Symmetric Complex Gaussian Distribution

A complex random vector $\mathbf{x} \in \mathbb{C}^n$ has the circularly symmetric complex Gaussian distribution, written

$$\mathbf{x} \sim \mathcal{CN}(\boldsymbol{\mu}, \mathbf{R}),$$

if it satisfies two conditions:

  1. The augmented real vector $\tilde{\mathbf{x}} = (\mathrm{Re}(\mathbf{x})^T, \mathrm{Im}(\mathbf{x})^T)^T \in \mathbb{R}^{2n}$ is jointly Gaussian.
  2. Circular symmetry: $e^{j\theta}\mathbf{x} \stackrel{d}{=} \mathbf{x}$ for all $\theta \in [0, 2\pi)$ (rotation of the complex plane leaves the distribution invariant).

Here $\boldsymbol{\mu} = E[\mathbf{x}]$ and $\mathbf{R} = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H]$ is the covariance matrix.

Consequence of circular symmetry: pseudo-covariance vanishes. The pseudo-covariance matrix (or relation matrix) is

$$\tilde{\mathbf{C}}_{\mathbf{x}} = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T].$$

For a circularly symmetric distribution, $\tilde{\mathbf{C}}_{\mathbf{x}} = \mathbf{0}$. This means the real and imaginary parts of $\mathbf{x}$ have equal covariances and complementary cross-covariances: if $\mathbf{x} = \mathbf{u} + j\mathbf{v}$ (with $\mathbf{u}, \mathbf{v}$ real), then

$$\mathbf{C}_{\mathbf{u}\mathbf{u}} = \mathbf{C}_{\mathbf{v}\mathbf{v}} = \tfrac{1}{2}\,\mathrm{Re}(\mathbf{R}), \qquad \mathbf{C}_{\mathbf{u}\mathbf{v}} = -\mathbf{C}_{\mathbf{v}\mathbf{u}} = -\tfrac{1}{2}\,\mathrm{Im}(\mathbf{R}).$$

PDF. For $\boldsymbol{\mu} = \mathbf{0}$ and $\mathbf{R}$ positive definite, the PDF is

$$f_{\mathbf{x}}(\mathbf{x}) = \frac{1}{\pi^n \det(\mathbf{R})} \exp\!\bigl(-\mathbf{x}^H \mathbf{R}^{-1} \mathbf{x}\bigr).$$

Note the normalising constant $\pi^n$ (not $(2\pi)^n$) and the absence of the factor $1/2$ in the exponent, compared with the real Gaussian.

Conditional distribution. The complex version of the conditional Gaussian theorem holds with $^T$ replaced by $^H$: if $\mathbf{x} = (\mathbf{x}_1^T, \mathbf{x}_2^T)^T$ is jointly $\mathcal{CN}$, then

$$\mathbf{x}_1 \mid \mathbf{x}_2 \sim \mathcal{CN}\!\bigl(\boldsymbol{\mu}_1 + \mathbf{R}_{12}\mathbf{R}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),\ \mathbf{R}_{11} - \mathbf{R}_{12}\mathbf{R}_{22}^{-1}\mathbf{R}_{21}\bigr).$$

The $\mathcal{CN}$ distribution is the standard model for noise and channel vectors in wireless communications. Circular symmetry reflects the physical fact that the absolute carrier phase is uniformly distributed and unknown, so the joint statistics must be invariant to phase rotation.

A non-circularly-symmetric complex Gaussian (also called "improper" or "non-circular") has $\tilde{\mathbf{C}}_{\mathbf{x}} \neq \mathbf{0}$ and requires the full augmented description. Such signals arise in certain interference scenarios and in widely-linear processing.
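A minimal sketch (illustrative dimensions) that draws $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R})$ by colouring standard circularly symmetric noise, then verifies that the sample covariance matches $\mathbf{R}$ while the sample pseudo-covariance vanishes:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 3, 500_000
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
R = A @ A.conj().T                          # an example Hermitian PD covariance

# Standard CN(0, I): independent real/imag parts, each with variance 1/2.
w = (rng.normal(size=(n, N)) + 1j * rng.normal(size=(n, N))) / np.sqrt(2)
Lc = np.linalg.cholesky(R)                  # complex Cholesky factor: R = Lc Lc^H
x = Lc @ w                                  # x ~ CN(0, R)

print(np.linalg.norm((x @ x.conj().T) / N - R))   # small: covariance ~= R
print(np.linalg.norm((x @ x.T) / N))              # small: pseudo-covariance ~= 0
```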


Bivariate Gaussian Density Explorer

Visualise the joint PDF of a bivariate Gaussian $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$ as a 3D surface or contour plot. Adjust the correlation coefficient $\rho$ to see how the elliptical contours rotate and stretch. When $\rho = 0$ the contours are axis-aligned (independent components); as $|\rho| \to 1$ the ellipse collapses onto a line (perfect linear dependence).


Example: MMSE Estimation via Conditional Gaussian

Consider the standard linear observation model

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$$

where $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{C}_{\mathbf{x}})$ is the transmitted vector, $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I})$ is noise independent of $\mathbf{x}$, and $\mathbf{H}$ is a known (deterministic) channel matrix. Derive the minimum mean-square error (MMSE) estimate $\hat{\mathbf{x}}_{\mathrm{MMSE}} = E[\mathbf{x} \mid \mathbf{y}]$.
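For reference, the conditional Gaussian theorem applied to the jointly Gaussian pair $(\mathbf{x}, \mathbf{y})$, with $\mathbf{C}_{\mathbf{x}\mathbf{y}} = \mathbf{C}_{\mathbf{x}}\mathbf{H}^H$ and $\mathbf{C}_{\mathbf{y}} = \mathbf{H}\mathbf{C}_{\mathbf{x}}\mathbf{H}^H + \sigma^2\mathbf{I}$, yields $\hat{\mathbf{x}}_{\mathrm{MMSE}} = \mathbf{C}_{\mathbf{x}\mathbf{y}}\mathbf{C}_{\mathbf{y}}^{-1}\mathbf{y}$. A minimal sketch with illustrative sizes and noise level:

```python
import numpy as np

rng = np.random.default_rng(5)
nt, nr, sigma2 = 2, 4, 0.1
H = (rng.normal(size=(nr, nt)) + 1j * rng.normal(size=(nr, nt))) / np.sqrt(2)
Cx = np.eye(nt)                              # unit-power, uncorrelated symbols

# One realisation of the observation model y = H x + n.
x = (rng.normal(size=nt) + 1j * rng.normal(size=nt)) / np.sqrt(2)
n = np.sqrt(sigma2 / 2) * (rng.normal(size=nr) + 1j * rng.normal(size=nr))
y = H @ x + n

Cy = H @ Cx @ H.conj().T + sigma2 * np.eye(nr)
Cxy = Cx @ H.conj().T
x_hat = Cxy @ np.linalg.solve(Cy, y)         # MMSE estimate E[x | y]

# Error covariance: the Schur complement Cx - Cxy Cy^{-1} Cyx.
C_err = Cx - Cxy @ np.linalg.solve(Cy, Cxy.conj().T)
print(np.linalg.norm(x - x_hat), np.real(np.trace(C_err)))
```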


Why This Matters: Why Channel Vectors Are Circularly Symmetric Complex Gaussian

In a MIMO wireless channel with $n_t$ transmit and $n_r$ receive antennas, the channel matrix $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ has entries $H_{ij}$ representing the complex gain from transmit antenna $j$ to receive antenna $i$. In a rich-scattering environment with no line-of-sight component, each $H_{ij}$ is the superposition of a large number of independent scattered paths:

$$H_{ij} = \sum_{k=1}^{N} a_k\,e^{j\phi_k},$$

where $a_k$ is the amplitude and $\phi_k$ the phase of the $k$-th path.

Why circularly symmetric? The phases $\phi_k$ are uniformly distributed on $[0, 2\pi)$ because the propagation distances are much larger than the carrier wavelength. By the central limit theorem (applied to the real and imaginary parts separately), $H_{ij}$ converges in distribution to a complex Gaussian. The uniform phase distribution ensures that $e^{j\theta}H_{ij} \stackrel{d}{=} H_{ij}$ for any fixed $\theta$, which is precisely circular symmetry. The pseudo-covariance vanishes: $E[H_{ij}^2] = 0$.

Spatial correlation. The entries of $\mathbf{H}$ are generally correlated across antennas when the antenna spacing is small relative to the wavelength. The Kronecker model approximates the full channel covariance as

$$\mathrm{vec}(\mathbf{H}) \sim \mathcal{CN}\!\bigl(\mathbf{0},\ \mathbf{R}_t^T \otimes \mathbf{R}_r\bigr),$$

where $\mathbf{R}_r$ and $\mathbf{R}_t$ are the receive and transmit spatial correlation matrices, and $\otimes$ is the Kronecker product (Section 1.7).

i.i.d. Rayleigh fading. When the antennas are sufficiently spaced (typically $\geq \lambda/2$), $\mathbf{R}_r = \mathbf{I}$ and $\mathbf{R}_t = \mathbf{I}$, giving the i.i.d. model $H_{ij} \sim \mathcal{CN}(0, 1)$ independently. This is the standard benchmark for MIMO capacity analysis.
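A minimal sketch of the Kronecker model; the exponential correlation profiles below are an illustrative assumption, not from the text. Colouring an i.i.d. $\mathcal{CN}(0,1)$ matrix on both sides with Cholesky factors of $\mathbf{R}_r$ and $\mathbf{R}_t$ produces $\mathrm{vec}(\mathbf{H})$ with covariance $\mathbf{R}_t^T \otimes \mathbf{R}_r$, which the Monte Carlo check confirms.

```python
import numpy as np

rng = np.random.default_rng(6)
nr, nt, N = 4, 2, 200_000
# Exponential correlation profiles (illustrative): [R]_{ik} = rho^{|i-k|}.
Rr = 0.7 ** np.abs(np.subtract.outer(np.arange(nr), np.arange(nr)))
Rt = 0.5 ** np.abs(np.subtract.outer(np.arange(nt), np.arange(nt)))
Lr, Lt = np.linalg.cholesky(Rr), np.linalg.cholesky(Rt)

# N correlated channel draws: H = Lr Hw Lt^H with Hw i.i.d. CN(0, 1).
Hw = (rng.normal(size=(N, nr, nt)) + 1j * rng.normal(size=(N, nr, nt))) / np.sqrt(2)
H = Lr @ Hw @ Lt.conj().T                    # broadcast over the N draws

v = H.transpose(0, 2, 1).reshape(N, -1)      # vec(H): stack columns of each draw
C_hat = v.T @ v.conj() / N                   # sample covariance of vec(H)
print(np.linalg.norm(C_hat - np.kron(Rt.T, Rr)))   # small (Monte Carlo error)
```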

See the full treatment in Chapter 6, Section 3.

Quick Check

Let $\mathbf{x} \in \mathbb{R}^3$ be a random vector with covariance matrix $\mathbf{C}_{\mathbf{x}}$. Which of the following is guaranteed to hold?

  1. $\mathbf{C}_{\mathbf{x}}$ is positive definite
  2. $\mathbf{C}_{\mathbf{x}}$ is positive semidefinite
  3. $\mathbf{C}_{\mathbf{x}}$ is diagonal
  4. All eigenvalues of $\mathbf{C}_{\mathbf{x}}$ are strictly positive

Quick Check

Let $(X_1, X_2)^T \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$ with $\mathbf{C} = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}$. What is $E[X_1 \mid X_2 = 3]$?

  1. $0$
  2. $3/2$
  3. $3$
  4. $6$

Quick Check

Let $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_n)$. What is the pseudo-covariance $E[\mathbf{x}\mathbf{x}^T]$?

  1. $\mathbf{I}_n$
  2. $\mathbf{0}$
  3. $\frac{1}{2}\mathbf{I}_n$
  4. Not well-defined

Random Vector

A measurable function $\mathbf{x} : \Omega \to \mathbb{R}^n$ (or $\mathbb{C}^n$) whose components are random variables. Extends the scalar random variable concept to the multivariate setting needed for MIMO and multi-sensor processing.

Related: Random Vector, Random Variable

Covariance Matrix

For a random vector $\mathbf{x}$ with mean $\boldsymbol{\mu}$, the matrix $\mathbf{C}_{\mathbf{x}} = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H]$. It is Hermitian and positive semidefinite. Diagonal entries are variances; off-diagonal entries are covariances between components. Encodes the second-order correlation structure.

Related: Correlation Matrix and Covariance Matrix, The Covariance Matrix Is Positive Semidefinite

Multivariate Gaussian Distribution

A distribution on $\mathbb{R}^n$ parameterised by mean $\boldsymbol{\mu}$ and covariance $\mathbf{C}$, with PDF proportional to $\exp(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{C}^{-1}(\mathbf{x}-\boldsymbol{\mu}))$. Marginals, conditionals, and affine transformations of Gaussians are Gaussian. Uncorrelated components are independent. The workhorse distribution for MIMO signal processing.

Related: Multivariate Gaussian Distribution, Conditional Distribution of Jointly Gaussian Vectors

Circularly Symmetric (Complex Gaussian)

A complex random vector $\mathbf{x}$ is circularly symmetric if $e^{j\theta}\mathbf{x} \stackrel{d}{=} \mathbf{x}$ for all $\theta$. Equivalently, the pseudo-covariance $E[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T]$ vanishes. The standard model for wireless channel gains and additive noise, written $\mathcal{CN}(\boldsymbol{\mu}, \mathbf{R})$.

Related: Circularly Symmetric Complex Gaussian Distribution, Why Channel Vectors Are Circularly Symmetric Complex Gaussian

Schur Complement

Given a block matrix $\mathbf{M} = \bigl(\begin{smallmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{smallmatrix}\bigr)$ with $\mathbf{D}$ invertible, the Schur complement of $\mathbf{D}$ in $\mathbf{M}$ is $\mathbf{M}/\mathbf{D} = \mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{C}$. In the conditional Gaussian theorem, the Schur complement of $\mathbf{C}_{22}$ gives the conditional covariance $\mathbf{C}_{1|2}$. Also central to block matrix inversion and determinant identities.

Related: Conditional Distribution of Jointly Gaussian Vectors, MMSE Estimation via Conditional Gaussian

Common Mistake: Forgetting Hermitian Transpose ($^H$) vs Transpose ($^T$) in Complex Covariance

Mistake:

Writing the covariance matrix of a complex random vector as $\mathbf{C}_{\mathbf{x}} = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$ using the ordinary transpose $^T$ instead of the Hermitian (conjugate) transpose $^H$.

Correction:

For complex random vectors, the correct definition is

$$\mathbf{C}_{\mathbf{x}} = E\bigl[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H\bigr],$$

where $^H$ denotes the conjugate transpose. Using $^T$ instead of $^H$ gives the pseudo-covariance matrix $\tilde{\mathbf{C}}_{\mathbf{x}} = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$, which is a completely different object (and equals zero for circularly symmetric vectors).

Consequences of the error:

  • $(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T$ is not Hermitian; the resulting "covariance" would not be Hermitian PSD.
  • The Gaussian PDF formula uses $\mathbf{C}^{-1}$ with the $^H$-defined covariance. Substituting the $^T$-version gives a nonsensical density.
  • All downstream results (MMSE estimators, capacity formulas, beamformer designs) will be incorrect.

Mnemonic: In the real case, $^H = {}^T$, so the distinction vanishes. The moment you work with complex signals (which is always in baseband communications), switch to $^H$.

โš ๏ธEngineering Note

Numerical Stability When Inverting Covariance Matrices

The conditional Gaussian formula and the MMSE estimator both require inverting the covariance matrix $\mathbf{C}_{22}$. In practice, covariance matrices estimated from finite samples are often ill-conditioned, especially in massive MIMO systems where $n_t$ or $n_r$ can exceed 64.

Key issues:

  • Condition number: If $\kappa(\mathbf{C}) = \lambda_{\max}/\lambda_{\min}$ exceeds $10^6$--$10^8$ (common in correlated channels), direct inversion via LU or Cholesky can amplify roundoff errors by a factor of $\kappa$ in double precision (64-bit IEEE 754, $\sim 16$ significant digits).
  • Sample deficiency: When the number of samples $N < n$ (the dimension), the sample covariance matrix is rank-deficient and singular. This occurs routinely in pilot-limited systems.

Practical remedies:

  1. Diagonal loading (Tikhonov regularisation): Replace $\mathbf{C}$ with $\mathbf{C} + \epsilon\mathbf{I}$ where $\epsilon \sim 10^{-2}\,\mathrm{tr}(\mathbf{C})/n$. This bounds $\kappa \leq (\lambda_{\max} + \epsilon)/\epsilon$.
  2. Cholesky factorisation: Compute $\mathbf{C} = \mathbf{L}\mathbf{L}^H$ and solve via forward/back-substitution instead of forming $\mathbf{C}^{-1}$ explicitly. Cost: $n^3/3$ flops vs $n^3$ for a general inverse.
  3. Eigenvalue truncation: Discard eigenvalues below a threshold (e.g., $10^{-6}\lambda_{\max}$) and invert only the dominant subspace.
  4. Woodbury identity: For low-rank updates $\mathbf{C} = \sigma^2\mathbf{I} + \mathbf{U}\mathbf{U}^H$ (common in signal-plus-noise models), the Woodbury formula gives $\mathbf{C}^{-1}$ in $O(nr^2)$ instead of $O(n^3)$, where $r \ll n$ is the rank of $\mathbf{U}$.

In 5G NR, channel estimation uses LMMSE with regularised covariance inversion at every coherence interval ($\sim 1$ ms at 30 kHz SCS). At 64 antennas, this means inverting a $64 \times 64$ complex matrix every millisecond per user; numerical stability is not academic.
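A minimal sketch of remedies 1 and 2 above (sizes and loading level are illustrative): regularise a rank-deficient sample covariance by diagonal loading, then solve against it through its Cholesky factor rather than forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(7)
n, N = 64, 32                                # fewer snapshots than dimensions

# Rank-deficient sample covariance from N < n snapshots of CN(0, I) noise.
X = (rng.normal(size=(n, N)) + 1j * rng.normal(size=(n, N))) / np.sqrt(2)
C_hat = X @ X.conj().T / N                   # rank <= N < n, hence singular

eps = 1e-2 * np.trace(C_hat).real / n        # diagonal loading level
C_reg = C_hat + eps * np.eye(n)

# Solve C_reg z = y through the Cholesky factor L (C_reg = L L^H).
# np.linalg.solve does not exploit triangularity; scipy.linalg.solve_triangular
# would, but the result is identical.
y = rng.normal(size=n) + 1j * rng.normal(size=n)
L = np.linalg.cholesky(C_reg)
z = np.linalg.solve(L.conj().T, np.linalg.solve(L, y))

print(np.linalg.matrix_rank(C_hat))          # = N: singular before loading
print(np.linalg.cond(C_reg))                 # bounded by (lambda_max + eps)/eps
print(np.linalg.norm(C_reg @ z - y))         # ~ machine precision
```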

Practical Constraints

  • Double-precision floating point limits the usable condition number to roughly $10^{15}$.
  • The sample covariance requires $N \geq n$ samples for full rank.
  • The 5G NR coherence time constrains the computation budget to roughly 1 ms.

Key Takeaway

The central message of this section in three points:

  1. Random vectors extend scalar probability to MIMO. The covariance matrix $\mathbf{C}_{\mathbf{x}}$, Hermitian and positive semidefinite by construction, encodes all pairwise second-order statistics. The eigendecomposition of $\mathbf{C}_{\mathbf{x}}$ reveals the principal axes of randomness, directly connecting to Chapter 1's spectral theory.

  2. The conditional Gaussian theorem is the master tool. For jointly Gaussian vectors, conditioning produces another Gaussian whose parameters are given explicitly by the Schur complement: $E[\mathbf{x}_1 \mid \mathbf{x}_2] = \boldsymbol{\mu}_1 + \mathbf{C}_{12}\mathbf{C}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$ and $\mathbf{C}_{1|2} = \mathbf{C}_{11} - \mathbf{C}_{12}\mathbf{C}_{22}^{-1}\mathbf{C}_{21}$. This single result underlies the MMSE estimator, the Kalman filter, and MIMO capacity computation.

  3. Circular symmetry is the bridge to wireless. The $\mathcal{CN}(\boldsymbol{\mu}, \mathbf{R})$ distribution, with its vanishing pseudo-covariance and phase-rotation invariance, is the natural model for channel vectors in rich-scattering environments. All of the real Gaussian machinery (marginals, conditionals, affine closure) carries over, with $^T$ replaced by $^H$ throughout.