Random Vectors and Joint Distributions

Why Random Vectors?

In a multiple-input multiple-output (MIMO) system with $n_r$ receive antennas and $n_t$ transmit antennas, the received signal is not a scalar but a vector:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$$

where $\mathbf{y} \in \mathbb{C}^{n_r}$ is the received vector, $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ is the channel matrix, $\mathbf{x} \in \mathbb{C}^{n_t}$ is the transmitted vector, and $\mathbf{n} \in \mathbb{C}^{n_r}$ is additive noise. Every entry of $\mathbf{y}$, $\mathbf{H}$, and $\mathbf{n}$ is a random variable, and they are generally correlated across antennas.

To design MIMO detectors, compute channel capacity, or derive optimal beamformers, we must handle joint distributions over multiple random variables simultaneously. The covariance matrix $\mathbf{C}_{\mathbf{n}} = E[\mathbf{n}\mathbf{n}^H]$ encodes the noise correlation structure; the channel covariance $\mathbf{R}_{\mathbf{H}}$ governs spatial diversity and multiplexing gains.

This section is where Chapter 1's linear algebra (inner products, eigendecompositions, positive semidefiniteness) meets Chapter 2's probability. The central result is the multivariate Gaussian distribution, whose conditional distributions are again Gaussian with parameters given by the Schur complement. This single fact underlies minimum mean-square error (MMSE) estimation, Kalman filtering, and the capacity formula for MIMO channels.

Definition: Random Vector

Let $(\Omega, \mathcal{F}, P)$ be a probability space. A random vector of dimension $n$ is a measurable function

$$\mathbf{x} : \Omega \to \mathbb{R}^n$$

(or $\mathbb{C}^n$ in the complex case), written as

$$\mathbf{x} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix},$$

where each component $X_i : \Omega \to \mathbb{R}$ (or $\mathbb{C}$) is itself a random variable. Measurability of $\mathbf{x}$ means that for every Borel set $B \subseteq \mathbb{R}^n$,

$$\mathbf{x}^{-1}(B) = \{\omega \in \Omega : \mathbf{x}(\omega) \in B\} \in \mathcal{F}.$$

This is equivalent to requiring that each component $X_i$ is a random variable in the sense of Definition def-rv.

We use boldface lowercase ($\mathbf{x}$) for random vectors and boldface uppercase ($\mathbf{A}$) for random matrices, following the convention established in Chapter 1. A concrete realisation is denoted by the same boldface symbol; context (or an explicit statement such as "let $\mathbf{x} = \mathbf{x}_0$") distinguishes the random object from its value.


Definition: Joint CDF and Joint PDF

Let $\mathbf{x} = (X_1, \ldots, X_n)^T$ be a real random vector.

Joint CDF. The joint cumulative distribution function is

$$F_{\mathbf{x}}(\mathbf{x}) = F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = P(X_1 \leq x_1, \ldots, X_n \leq x_n).$$

Properties of the joint CDF:

  1. $F_{\mathbf{x}}$ is non-decreasing in each argument.
  2. $F_{\mathbf{x}}$ is right-continuous in each argument.
  3. $\lim_{x_i \to +\infty \text{ for all } i} F_{\mathbf{x}}(\mathbf{x}) = 1$.
  4. $\lim_{x_i \to -\infty \text{ for any } i} F_{\mathbf{x}}(\mathbf{x}) = 0$.

Joint PDF. The random vector $\mathbf{x}$ is (absolutely) continuous if there exists a non-negative function $f_{\mathbf{x}} : \mathbb{R}^n \to [0, \infty)$ such that

$$F_{\mathbf{x}}(\mathbf{x}) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_{\mathbf{x}}(t_1, \ldots, t_n)\,dt_n \cdots dt_1.$$

Wherever $F_{\mathbf{x}}$ is sufficiently smooth,

$$f_{\mathbf{x}}(\mathbf{x}) = \frac{\partial^n F_{\mathbf{x}}}{\partial x_1 \cdots \partial x_n}(x_1, \ldots, x_n).$$

Properties of the joint PDF:

  1. $f_{\mathbf{x}}(\mathbf{x}) \geq 0$ for all $\mathbf{x}$.
  2. $\int_{\mathbb{R}^n} f_{\mathbf{x}}(\mathbf{x})\,d\mathbf{x} = 1$.
  3. $P(\mathbf{x} \in A) = \int_A f_{\mathbf{x}}(\mathbf{x})\,d\mathbf{x}$ for any Borel set $A$.

The joint CDF and joint PDF completely characterise the joint distribution of $(X_1, \ldots, X_n)$. In the complex case, we identify $\mathbb{C}^n$ with $\mathbb{R}^{2n}$ and define the joint density over the stacked real and imaginary parts.
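A minimal NumPy sketch of property 2 of the joint PDF: numerically integrate an illustrative bivariate Gaussian density (the mean and covariance below are assumptions for the example) over a truncated grid and confirm the total mass is approximately 1.

```python
import numpy as np

# Illustrative bivariate Gaussian; mu and C are arbitrary example values.
mu = np.array([0.0, 0.0])
C = np.array([[2.0, 0.6],
              [0.6, 1.0]])
Cinv = np.linalg.inv(C)
norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(C)))  # (2*pi)^{n/2} det^{1/2}, n = 2

# Grid over [-8, 8]^2; the tails beyond this range contribute negligibly.
t = np.linspace(-8, 8, 401)
dx = t[1] - t[0]
X1, X2 = np.meshgrid(t, t, indexing="ij")
D = np.stack([X1 - mu[0], X2 - mu[1]], axis=-1)        # centred coordinates
quad = np.einsum("...i,ij,...j->...", D, Cinv, D)      # quadratic form per grid point
f = norm * np.exp(-0.5 * quad)

print(f.sum() * dx * dx)   # ~= 1.0: the joint PDF integrates to one
```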


Definition: Marginal and Conditional Densities

Let $\mathbf{x} = (X_1, \ldots, X_n)^T$ have joint PDF $f_{\mathbf{x}}(\mathbf{x})$.

Marginal density. The marginal PDF of any subset of components is obtained by integrating out the remaining variables. For instance, the marginal PDF of $X_1$ is

$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{x}}(x_1, x_2, \ldots, x_n)\,dx_2 \cdots dx_n.$$

More generally, if we partition $\mathbf{x} = (\mathbf{x}_1^T, \mathbf{x}_2^T)^T$ with $\mathbf{x}_2 \in \mathbb{R}^{n_2}$, then

$$f_{\mathbf{x}_1}(\mathbf{x}_1) = \int_{\mathbb{R}^{n_2}} f_{\mathbf{x}}(\mathbf{x}_1, \mathbf{x}_2)\,d\mathbf{x}_2.$$

Conditional density. The conditional PDF of $\mathbf{x}_1$ given $\mathbf{x}_2$ is defined (wherever $f_{\mathbf{x}_2}(\mathbf{x}_2) > 0$) by

$$f_{\mathbf{x}_1 \mid \mathbf{x}_2}(\mathbf{x}_1 \mid \mathbf{x}_2) = \frac{f_{\mathbf{x}}(\mathbf{x}_1, \mathbf{x}_2)}{f_{\mathbf{x}_2}(\mathbf{x}_2)}.$$

This is the density analogue of the definition of conditional probability, $P(A \mid B) = P(A \cap B)/P(B)$.

Bayes' rule for densities. Rearranging:

$$f_{\mathbf{x}}(\mathbf{x}_1, \mathbf{x}_2) = f_{\mathbf{x}_1 \mid \mathbf{x}_2}(\mathbf{x}_1 \mid \mathbf{x}_2)\,f_{\mathbf{x}_2}(\mathbf{x}_2) = f_{\mathbf{x}_2 \mid \mathbf{x}_1}(\mathbf{x}_2 \mid \mathbf{x}_1)\,f_{\mathbf{x}_1}(\mathbf{x}_1).$$

Hence

$$f_{\mathbf{x}_1 \mid \mathbf{x}_2}(\mathbf{x}_1 \mid \mathbf{x}_2) = \frac{f_{\mathbf{x}_2 \mid \mathbf{x}_1}(\mathbf{x}_2 \mid \mathbf{x}_1)\,f_{\mathbf{x}_1}(\mathbf{x}_1)}{f_{\mathbf{x}_2}(\mathbf{x}_2)}.$$

Marginalisation and conditioning are the two fundamental operations on joint distributions. In communications, marginalisation computes the distribution of a single antenna's output from the joint MIMO distribution; conditioning computes the posterior distribution of the transmitted symbol given the received signal.
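A minimal sketch of both operations on a grid, reusing the illustrative bivariate Gaussian from the previous snippet: marginalising sums out one coordinate, and conditioning slices the joint density and renormalises by the marginal value, exactly the defining ratio above.

```python
import numpy as np

# Same illustrative bivariate Gaussian as before (zero mean).
t = np.linspace(-8, 8, 401)
dx = t[1] - t[0]
X1, X2 = np.meshgrid(t, t, indexing="ij")
C = np.array([[2.0, 0.6], [0.6, 1.0]])
Cinv = np.linalg.inv(C)
D = np.stack([X1, X2], axis=-1)
f = np.exp(-0.5 * np.einsum("...i,ij,...j->...", D, Cinv, D)) \
    / (2 * np.pi * np.sqrt(np.linalg.det(C)))

# Marginal of X1: integrate out x2 (axis 1 of the grid).
f_x1 = f.sum(axis=1) * dx

# Conditional of X1 given X2 = 1: slice the joint at x2 = 1 and divide
# by the marginal density f_{X2}(1).
f_x2 = f.sum(axis=0) * dx
j = np.argmin(np.abs(t - 1.0))
f_cond = f[:, j] / f_x2[j]

print(f_x1.sum() * dx, f_cond.sum() * dx)   # both ~= 1.0: valid densities
```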


Definition: Mean Vector

The mean vector (or expectation) of a random vector $\mathbf{x} = (X_1, \ldots, X_n)^T$ is

$$\boldsymbol{\mu}_{\mathbf{x}} = E[\mathbf{x}] = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{pmatrix} \in \mathbb{R}^n \ (\text{or } \mathbb{C}^n).$$

Expectation of a random vector (or matrix) is defined component-wise. Linearity carries over: $E[\mathbf{A}\mathbf{x} + \mathbf{b}] = \mathbf{A}\,E[\mathbf{x}] + \mathbf{b}$ for any deterministic matrix $\mathbf{A}$ and vector $\mathbf{b}$.

Definition: Correlation Matrix and Covariance Matrix

Let $\mathbf{x} \in \mathbb{C}^n$ be a random vector with mean $\boldsymbol{\mu} = E[\mathbf{x}]$.

Correlation matrix (autocorrelation matrix).

$$\mathbf{R}_{\mathbf{x}} = E[\mathbf{x}\mathbf{x}^H] \in \mathbb{C}^{n \times n}.$$

The $(i,k)$-th entry is $[\mathbf{R}_{\mathbf{x}}]_{ik} = E[X_i X_k^*]$.

Covariance matrix.

$$\mathbf{C}_{\mathbf{x}} = E\bigl[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H\bigr] = \mathbf{R}_{\mathbf{x}} - \boldsymbol{\mu}\boldsymbol{\mu}^H \in \mathbb{C}^{n \times n}.$$

The $(i,k)$-th entry is $[\mathbf{C}_{\mathbf{x}}]_{ik} = E[(X_i - \mu_i)(X_k - \mu_k)^*] = \mathrm{Cov}(X_i, X_k)$.

Properties of the covariance matrix:

  1. Hermitian: $\mathbf{C}_{\mathbf{x}} = \mathbf{C}_{\mathbf{x}}^H$.
  2. Positive semidefinite (PSD): $\mathbf{C}_{\mathbf{x}} \succeq \mathbf{0}$ (proved in Theorem thm-covariance-psd below).
  3. Diagonal entries are variances: $[\mathbf{C}_{\mathbf{x}}]_{ii} = \mathrm{Var}(X_i) \geq 0$.
  4. Off-diagonal entries are covariances: $|[\mathbf{C}_{\mathbf{x}}]_{ik}| \leq \sqrt{[\mathbf{C}_{\mathbf{x}}]_{ii}\,[\mathbf{C}_{\mathbf{x}}]_{kk}}$ (Cauchy--Schwarz).
  5. Affine transformation: If $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}$, then $\mathbf{C}_{\mathbf{y}} = \mathbf{A}\,\mathbf{C}_{\mathbf{x}}\,\mathbf{A}^H$.

For real random vectors, replace the Hermitian transpose $^H$ with the ordinary transpose $^T$ throughout.

The correlation matrix $\mathbf{R}_{\mathbf{x}}$ and covariance matrix $\mathbf{C}_{\mathbf{x}}$ coincide when $\mathbf{x}$ is zero-mean ($\boldsymbol{\mu} = \mathbf{0}$), which is the typical case for noise and for circularly symmetric channel vectors. The distinction matters when the mean is nonzero, e.g., in Ricean fading, where the line-of-sight (LOS) component contributes a nonzero mean.
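A minimal NumPy sketch (dimensions and sample counts are illustrative) that estimates a complex covariance matrix from zero-mean samples and spot-checks properties 1, 2, and 5 above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 200_000

# Correlated zero-mean complex samples: x = L w with E[w w^H] = I,
# so the true covariance is L L^H (L is an arbitrary example mixing matrix).
L = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
w = (rng.normal(size=(n, N)) + 1j * rng.normal(size=(n, N))) / np.sqrt(2)
x = L @ w

C_hat = (x @ x.conj().T) / N                          # sample covariance (zero mean)
print(np.allclose(C_hat, C_hat.conj().T))             # property 1: Hermitian
print(np.all(np.linalg.eigvalsh(C_hat) >= -1e-10))    # property 2: PSD up to roundoff

# Property 5: C_y = A C_x A^H for y = A x + b.
A = rng.normal(size=(2, n))
b = np.ones((2, 1))
y = A @ x + b
yc = y - y.mean(axis=1, keepdims=True)
C_y_hat = (yc @ yc.conj().T) / N
print(np.linalg.norm(C_y_hat - A @ C_hat @ A.conj().T))  # ~ 0 (Monte Carlo error)
```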


Definition: Cross-Covariance Matrix

Let $\mathbf{x} \in \mathbb{C}^m$ and $\mathbf{y} \in \mathbb{C}^n$ be random vectors with means $\boldsymbol{\mu}_{\mathbf{x}}$ and $\boldsymbol{\mu}_{\mathbf{y}}$. The cross-covariance matrix is

$$\mathbf{C}_{\mathbf{x}\mathbf{y}} = E\bigl[(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}})(\mathbf{y} - \boldsymbol{\mu}_{\mathbf{y}})^H\bigr] \in \mathbb{C}^{m \times n}.$$

Note that $\mathbf{C}_{\mathbf{y}\mathbf{x}} = \mathbf{C}_{\mathbf{x}\mathbf{y}}^H$.

The random vectors $\mathbf{x}$ and $\mathbf{y}$ are uncorrelated if $\mathbf{C}_{\mathbf{x}\mathbf{y}} = \mathbf{0}$.

When we partition a joint random vector $\mathbf{z} = (\mathbf{x}^T, \mathbf{y}^T)^T$, the full covariance matrix has block structure:

$$\mathbf{C}_{\mathbf{z}} = \begin{pmatrix} \mathbf{C}_{\mathbf{x}} & \mathbf{C}_{\mathbf{x}\mathbf{y}} \\ \mathbf{C}_{\mathbf{y}\mathbf{x}} & \mathbf{C}_{\mathbf{y}} \end{pmatrix}.$$

This block partitioning is essential for the conditional Gaussian theorem (Theorem thm-conditional-gaussian).

Theorem: The Covariance Matrix Is Positive Semidefinite

Let $\mathbf{x} \in \mathbb{C}^n$ be a random vector with covariance matrix $\mathbf{C}_{\mathbf{x}}$. Then $\mathbf{C}_{\mathbf{x}}$ is Hermitian positive semidefinite:

$$\mathbf{a}^H \mathbf{C}_{\mathbf{x}} \mathbf{a} \geq 0 \qquad \text{for all } \mathbf{a} \in \mathbb{C}^n.$$

Equality holds for a particular $\mathbf{a} \neq \mathbf{0}$ if and only if $\mathbf{a}^H(\mathbf{x} - \boldsymbol{\mu}) = 0$ almost surely, i.e., the centered random vector $\mathbf{x} - \boldsymbol{\mu}$ is almost surely orthogonal to $\mathbf{a}$.

The quadratic form $\mathbf{a}^H \mathbf{C}_{\mathbf{x}} \mathbf{a}$ is the variance of the scalar projection $\mathbf{a}^H \mathbf{x}$, and variances are nonnegative.
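The variance-of-projection identity behind the proof is easy to check by simulation. A minimal sketch with illustrative values, comparing $\mathbf{a}^T \mathbf{C}_{\mathbf{x}} \mathbf{a}$ against the sample variance of $\mathbf{a}^T \mathbf{x}$ for a real random vector:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 500_000
L = rng.normal(size=(n, n))
x = L @ rng.normal(size=(n, N))    # real, zero mean, true covariance C = L L^T
C = L @ L.T

a = rng.normal(size=n)             # arbitrary direction
proj = a @ x                       # scalar projections a^T x for every sample
print(a @ C @ a, proj.var())       # agree up to Monte Carlo error; both >= 0
```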


Definition: Multivariate Gaussian Distribution

A real random vector $\mathbf{x} \in \mathbb{R}^n$ has the multivariate Gaussian (or multivariate normal) distribution with mean $\boldsymbol{\mu} \in \mathbb{R}^n$ and covariance matrix $\mathbf{C} \in \mathbb{R}^{n \times n}$ (symmetric, positive definite), written

$$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C}),$$

if its joint PDF is

$$f_{\mathbf{x}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} \det(\mathbf{C})^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right).$$

Key properties:

  1. Marginals are Gaussian: Any sub-vector of $\mathbf{x}$ is also Gaussian, with mean and covariance given by the corresponding sub-vector and sub-matrix.

  2. Affine closure: If $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}$ where $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$, then $\mathbf{y} \sim \mathcal{N}(\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\ \mathbf{A}\mathbf{C}\mathbf{A}^T)$.

  3. Uncorrelated $\Leftrightarrow$ independent: For jointly Gaussian random variables, zero covariance implies independence (the converse always holds). This is a special property of the Gaussian; it fails for general distributions.

  4. Contours of constant density are ellipsoids centered at $\boldsymbol{\mu}$: $(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = c$, with axes aligned along the eigenvectors of $\mathbf{C}$ and semi-axis lengths proportional to $\sqrt{\lambda_i}$, where $\lambda_i$ are the eigenvalues of $\mathbf{C}$.

The exponent $(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{x} - \boldsymbol{\mu})$ is the squared Mahalanobis distance from $\mathbf{x}$ to $\boldsymbol{\mu}$. It generalises the familiar $(x - \mu)^2/\sigma^2$ from the scalar case by accounting for correlations through $\mathbf{C}^{-1}$. Points at equal Mahalanobis distance have equal density, forming the ellipsoidal contours.

When $\mathbf{C}$ is singular (positive semidefinite but not positive definite), the distribution is supported on a proper affine subspace of $\mathbb{R}^n$ and does not possess a density with respect to Lebesgue measure on $\mathbb{R}^n$. One can still define it via characteristic functions: $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$ iff $\Phi_{\mathbf{x}}(\boldsymbol{\omega}) = \exp(j\boldsymbol{\omega}^T\boldsymbol{\mu} - \tfrac{1}{2}\boldsymbol{\omega}^T\mathbf{C}\boldsymbol{\omega})$.
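Affine closure also gives the standard sampling recipe: if $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\mathbf{C} = \mathbf{L}\mathbf{L}^T$ is a Cholesky factorisation, then $\boldsymbol{\mu} + \mathbf{L}\mathbf{w} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$. A minimal sketch with illustrative parameters, which also checks the mean of the squared Mahalanobis distance (for $n = 2$ it is $\chi^2_2$ with mean 2):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
C = np.array([[4.0, 2.0],
              [2.0, 4.0]])
L = np.linalg.cholesky(C)          # C = L L^T

N = 200_000
w = rng.normal(size=(2, N))        # w ~ N(0, I)
x = mu[:, None] + L @ w            # x ~ N(mu, C) by affine closure

print(x.mean(axis=1))              # ~= mu
print(np.cov(x))                   # ~= C

# Squared Mahalanobis distance of each sample; mean ~= 2 for n = 2.
D = (x - mu[:, None]).T
d2 = np.einsum("ij,jk,ik->i", D, np.linalg.inv(C), D)
print(d2.mean())
```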


Theorem: Conditional Distribution of Jointly Gaussian Vectors

Let $\mathbf{x} = (\mathbf{x}_1^T, \mathbf{x}_2^T)^T$ be a jointly Gaussian vector with

$$\mathbf{x} \sim \mathcal{N}\!\left( \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix},\ \begin{pmatrix} \mathbf{C}_{11} & \mathbf{C}_{12} \\ \mathbf{C}_{21} & \mathbf{C}_{22} \end{pmatrix}\right),$$

where $\mathbf{x}_1 \in \mathbb{R}^{n_1}$, $\mathbf{x}_2 \in \mathbb{R}^{n_2}$, $\mathbf{C}_{21} = \mathbf{C}_{12}^T$, and $\mathbf{C}_{22}$ is invertible. Then the conditional distribution of $\mathbf{x}_1$ given $\mathbf{x}_2$ is Gaussian:

$$\mathbf{x}_1 \mid \mathbf{x}_2 \sim \mathcal{N}\!\bigl(\boldsymbol{\mu}_{1|2},\ \mathbf{C}_{1|2}\bigr),$$

with conditional mean (a linear function of $\mathbf{x}_2$):

$$\boldsymbol{\mu}_{1|2} = E[\mathbf{x}_1 \mid \mathbf{x}_2] = \boldsymbol{\mu}_1 + \mathbf{C}_{12}\mathbf{C}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),$$

and conditional covariance (independent of $\mathbf{x}_2$):

$$\mathbf{C}_{1|2} = \mathbf{C}_{11} - \mathbf{C}_{12}\mathbf{C}_{22}^{-1}\mathbf{C}_{21}.$$

The matrix $\mathbf{C}_{1|2}$ is the Schur complement of $\mathbf{C}_{22}$ in $\mathbf{C}_{\mathbf{x}}$.

Conditioning on $\mathbf{x}_2$ shifts the mean of $\mathbf{x}_1$ by a linear correction proportional to how far $\mathbf{x}_2$ deviates from its own mean, scaled by the "regression coefficient" $\mathbf{C}_{12}\mathbf{C}_{22}^{-1}$. The conditional covariance $\mathbf{C}_{1|2}$ is never larger (in the PSD sense) than $\mathbf{C}_{11}$: observing $\mathbf{x}_2$ can only reduce uncertainty about $\mathbf{x}_1$.
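A minimal NumPy sketch of the theorem on an illustrative three-dimensional example: compute $\boldsymbol{\mu}_{1|2}$ and $\mathbf{C}_{1|2}$ from the partitioned blocks, then check them against samples whose $\mathbf{x}_2$ component falls near the conditioning value.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.0, 0.0, 1.0])
C = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.5],
              [0.3, 0.5, 1.0]])     # symmetric positive definite (example values)

# Partition: x1 = first two components, x2 = last component.
C11, C12 = C[:2, :2], C[:2, 2:]
C21, C22 = C[2:, :2], C[2:, 2:]

x2_obs = np.array([2.0])
mu_1g2 = mu[:2] + C12 @ np.linalg.solve(C22, x2_obs - mu[2:])
C_1g2 = C11 - C12 @ np.linalg.solve(C22, C21)   # Schur complement of C22
print(mu_1g2, C_1g2)

# Monte Carlo check: approximate conditioning by keeping samples with
# x2 within a small window around the observed value.
x = rng.multivariate_normal(mu, C, size=1_000_000)
sel = np.abs(x[:, 2] - x2_obs[0]) < 0.05
print(x[sel, :2].mean(axis=0))     # ~= mu_1g2
print(np.cov(x[sel, :2].T))        # ~= C_1g2
```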


Definition: Circularly Symmetric Complex Gaussian Distribution

A complex random vector $\mathbf{x} \in \mathbb{C}^n$ has the circularly symmetric complex Gaussian distribution, written

$$\mathbf{x} \sim \mathcal{CN}(\boldsymbol{\mu}, \mathbf{R}),$$

if it satisfies two conditions:

  1. The augmented real vector $\tilde{\mathbf{x}} = (\mathrm{Re}(\mathbf{x})^T, \mathrm{Im}(\mathbf{x})^T)^T \in \mathbb{R}^{2n}$ is jointly Gaussian.
  2. Circular symmetry: $e^{j\theta}\mathbf{x} \stackrel{d}{=} \mathbf{x}$ for all $\theta \in [0, 2\pi)$ (rotation of the complex plane leaves the distribution invariant).

Here $\boldsymbol{\mu} = E[\mathbf{x}]$ and $\mathbf{R} = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H]$ is the covariance matrix.

Consequence of circular symmetry: pseudo-covariance vanishes. The pseudo-covariance matrix (or relation matrix) is

$$\tilde{\mathbf{C}}_{\mathbf{x}} = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T].$$

For a circularly symmetric distribution, $\tilde{\mathbf{C}}_{\mathbf{x}} = \mathbf{0}$. This means the real and imaginary parts of $\mathbf{x}$ have equal covariances and complementary cross-covariances: if $\mathbf{x} = \mathbf{u} + j\mathbf{v}$ (with $\mathbf{u}, \mathbf{v}$ real), then

$$\mathbf{C}_{\mathbf{u}\mathbf{u}} = \mathbf{C}_{\mathbf{v}\mathbf{v}} = \tfrac{1}{2}\,\mathrm{Re}(\mathbf{R}), \qquad \mathbf{C}_{\mathbf{u}\mathbf{v}} = -\mathbf{C}_{\mathbf{v}\mathbf{u}} = -\tfrac{1}{2}\,\mathrm{Im}(\mathbf{R}).$$

PDF. For $\boldsymbol{\mu} = \mathbf{0}$ and $\mathbf{R}$ positive definite, the PDF is

$$f_{\mathbf{x}}(\mathbf{x}) = \frac{1}{\pi^n \det(\mathbf{R})} \exp\!\bigl(-\mathbf{x}^H \mathbf{R}^{-1} \mathbf{x}\bigr).$$

Note the normalising constant $\pi^n$ (not $(2\pi)^n$) and the absence of the factor $1/2$ in the exponent, compared with the real Gaussian.

Conditional distribution. The complex version of the conditional Gaussian theorem holds with $^T$ replaced by $^H$: if $\mathbf{x} = (\mathbf{x}_1^T, \mathbf{x}_2^T)^T$ is jointly $\mathcal{CN}$, then

$$\mathbf{x}_1 \mid \mathbf{x}_2 \sim \mathcal{CN}\!\bigl(\boldsymbol{\mu}_1 + \mathbf{R}_{12}\mathbf{R}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),\ \mathbf{R}_{11} - \mathbf{R}_{12}\mathbf{R}_{22}^{-1}\mathbf{R}_{21}\bigr).$$

The $\mathcal{CN}$ distribution is the standard model for noise and channel vectors in wireless communications. Circular symmetry reflects the physical fact that the absolute carrier phase is uniformly distributed and unknown, so the joint statistics must be invariant to phase rotation.

A non-circularly-symmetric complex Gaussian (also called "improper" or "non-circular") has $\tilde{\mathbf{C}}_{\mathbf{x}} \neq \mathbf{0}$ and requires the full augmented description. Such signals arise in certain interference scenarios and in widely-linear processing.
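A minimal sketch (illustrative dimensions) that draws $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R})$ by colouring standard circularly symmetric noise, then verifies that the sample covariance matches $\mathbf{R}$ while the sample pseudo-covariance vanishes:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 3, 500_000
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
R = A @ A.conj().T                          # an example Hermitian PD covariance

# Standard CN(0, I): independent real/imag parts, each with variance 1/2.
w = (rng.normal(size=(n, N)) + 1j * rng.normal(size=(n, N))) / np.sqrt(2)
Lc = np.linalg.cholesky(R)                  # complex Cholesky factor: R = Lc Lc^H
x = Lc @ w                                  # x ~ CN(0, R)

print(np.linalg.norm((x @ x.conj().T) / N - R))   # small: covariance ~= R
print(np.linalg.norm((x @ x.T) / N))              # small: pseudo-covariance ~= 0
```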


Bivariate Gaussian Density Explorer

Visualise the joint PDF of a bivariate Gaussian $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$ as a 3D surface or contour plot. Adjust the correlation coefficient $\rho$ to see how the elliptical contours rotate and stretch. When $\rho = 0$ the contours are axis-aligned (independent components); as $|\rho| \to 1$ the ellipse collapses onto a line (perfect linear dependence).


Example: MMSE Estimation via Conditional Gaussian

Consider the standard linear observation model

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$$

where $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{C}_{\mathbf{x}})$ is the transmitted vector, $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I})$ is noise independent of $\mathbf{x}$, and $\mathbf{H}$ is a known (deterministic) channel matrix. Derive the minimum mean-square error (MMSE) estimate $\hat{\mathbf{x}}_{\mathrm{MMSE}} = E[\mathbf{x} \mid \mathbf{y}]$.
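For reference, the conditional Gaussian theorem applied to the jointly Gaussian pair $(\mathbf{x}, \mathbf{y})$, with $\mathbf{C}_{\mathbf{x}\mathbf{y}} = \mathbf{C}_{\mathbf{x}}\mathbf{H}^H$ and $\mathbf{C}_{\mathbf{y}} = \mathbf{H}\mathbf{C}_{\mathbf{x}}\mathbf{H}^H + \sigma^2\mathbf{I}$, yields $\hat{\mathbf{x}}_{\mathrm{MMSE}} = \mathbf{C}_{\mathbf{x}\mathbf{y}}\mathbf{C}_{\mathbf{y}}^{-1}\mathbf{y}$. A minimal sketch with illustrative sizes and noise level:

```python
import numpy as np

rng = np.random.default_rng(5)
nt, nr, sigma2 = 2, 4, 0.1
H = (rng.normal(size=(nr, nt)) + 1j * rng.normal(size=(nr, nt))) / np.sqrt(2)
Cx = np.eye(nt)                              # unit-power, uncorrelated symbols

# One realisation of the observation model y = H x + n.
x = (rng.normal(size=nt) + 1j * rng.normal(size=nt)) / np.sqrt(2)
n = np.sqrt(sigma2 / 2) * (rng.normal(size=nr) + 1j * rng.normal(size=nr))
y = H @ x + n

Cy = H @ Cx @ H.conj().T + sigma2 * np.eye(nr)
Cxy = Cx @ H.conj().T
x_hat = Cxy @ np.linalg.solve(Cy, y)         # MMSE estimate E[x | y]

# Error covariance: the Schur complement Cx - Cxy Cy^{-1} Cyx.
C_err = Cx - Cxy @ np.linalg.solve(Cy, Cxy.conj().T)
print(np.linalg.norm(x - x_hat), np.real(np.trace(C_err)))
```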


Why This Matters: Why Channel Vectors Are Circularly Symmetric Complex Gaussian

In a MIMO wireless channel with $n_t$ transmit and $n_r$ receive antennas, the channel matrix $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ has entries $H_{ij}$ representing the complex gain from transmit antenna $j$ to receive antenna $i$. In a rich-scattering environment with no line-of-sight component, each $H_{ij}$ is the superposition of a large number of independent scattered paths:

$$H_{ij} = \sum_{k=1}^{N} a_k\,e^{j\phi_k},$$

where $a_k$ is the amplitude and $\phi_k$ the phase of the $k$-th path.

Why circularly symmetric? The phases $\phi_k$ are uniformly distributed on $[0, 2\pi)$ because the propagation distances are much larger than the carrier wavelength. By the central limit theorem (applied to the real and imaginary parts separately), $H_{ij}$ converges in distribution to a complex Gaussian. The uniform phase distribution ensures that $e^{j\theta}H_{ij} \stackrel{d}{=} H_{ij}$ for any fixed $\theta$, which is precisely circular symmetry. The pseudo-covariance vanishes: $E[H_{ij}^2] = 0$.

Spatial correlation. The entries of $\mathbf{H}$ are generally correlated across antennas when the antenna spacing is small relative to the wavelength. The Kronecker model approximates the full channel covariance as

$$\mathrm{vec}(\mathbf{H}) \sim \mathcal{CN}\!\bigl(\mathbf{0},\ \mathbf{R}_t^T \otimes \mathbf{R}_r\bigr),$$

where $\mathbf{R}_r$ and $\mathbf{R}_t$ are the receive and transmit spatial correlation matrices, and $\otimes$ is the Kronecker product (Section 1.7).

i.i.d. Rayleigh fading. When the antennas are sufficiently spaced (typically $\geq \lambda/2$), $\mathbf{R}_r = \mathbf{I}$ and $\mathbf{R}_t = \mathbf{I}$, giving the i.i.d. model $H_{ij} \sim \mathcal{CN}(0, 1)$ independently. This is the standard benchmark for MIMO capacity analysis.
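A minimal sketch of the Kronecker model; the exponential correlation profiles below are an illustrative assumption, not from the text. Colouring an i.i.d. $\mathcal{CN}(0,1)$ matrix on both sides with Cholesky factors of $\mathbf{R}_r$ and $\mathbf{R}_t$ produces $\mathrm{vec}(\mathbf{H})$ with covariance $\mathbf{R}_t^T \otimes \mathbf{R}_r$, which the Monte Carlo check confirms.

```python
import numpy as np

rng = np.random.default_rng(6)
nr, nt, N = 4, 2, 200_000
# Exponential correlation profiles (illustrative): [R]_{ik} = rho^{|i-k|}.
Rr = 0.7 ** np.abs(np.subtract.outer(np.arange(nr), np.arange(nr)))
Rt = 0.5 ** np.abs(np.subtract.outer(np.arange(nt), np.arange(nt)))
Lr, Lt = np.linalg.cholesky(Rr), np.linalg.cholesky(Rt)

# N correlated channel draws: H = Lr Hw Lt^H with Hw i.i.d. CN(0, 1).
Hw = (rng.normal(size=(N, nr, nt)) + 1j * rng.normal(size=(N, nr, nt))) / np.sqrt(2)
H = Lr @ Hw @ Lt.conj().T                    # broadcast over the N draws

v = H.transpose(0, 2, 1).reshape(N, -1)      # vec(H): stack columns of each draw
C_hat = v.T @ v.conj() / N                   # sample covariance of vec(H)
print(np.linalg.norm(C_hat - np.kron(Rt.T, Rr)))   # small (Monte Carlo error)
```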

See the full treatment in Chapter 6, Section 3.

Quick Check

Let $\mathbf{x} \in \mathbb{R}^3$ be a random vector with covariance matrix $\mathbf{C}_{\mathbf{x}}$. Which of the following is guaranteed to hold?

  1. $\mathbf{C}_{\mathbf{x}}$ is positive definite
  2. $\mathbf{C}_{\mathbf{x}}$ is positive semidefinite
  3. $\mathbf{C}_{\mathbf{x}}$ is diagonal
  4. All eigenvalues of $\mathbf{C}_{\mathbf{x}}$ are strictly positive

Quick Check

Let $(X_1, X_2)^T \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$ with $\mathbf{C} = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix}$. What is $E[X_1 \mid X_2 = 3]$?

  1. $0$
  2. $3/2$
  3. $3$
  4. $6$

Quick Check

Let $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_n)$. What is the pseudo-covariance $E[\mathbf{x}\mathbf{x}^T]$?

  1. $\mathbf{I}_n$
  2. $\mathbf{0}$
  3. $\frac{1}{2}\mathbf{I}_n$
  4. Not well-defined

Random Vector

A measurable function $\mathbf{x} : \Omega \to \mathbb{R}^n$ (or $\mathbb{C}^n$) whose components are random variables. Extends the scalar random variable concept to the multivariate setting needed for MIMO and multi-sensor processing.

Related: Random Vector, Random Variable

Covariance Matrix

For a random vector $\mathbf{x}$ with mean $\boldsymbol{\mu}$, the matrix $\mathbf{C}_{\mathbf{x}} = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H]$. It is Hermitian and positive semidefinite. Diagonal entries are variances; off-diagonal entries are covariances between components. Encodes the second-order correlation structure.

Related: Correlation Matrix and Covariance Matrix, The Covariance Matrix Is Positive Semidefinite

Multivariate Gaussian Distribution

A distribution on $\mathbb{R}^n$ parameterised by mean $\boldsymbol{\mu}$ and covariance $\mathbf{C}$, with PDF proportional to $\exp(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{C}^{-1}(\mathbf{x}-\boldsymbol{\mu}))$. Marginals, conditionals, and affine transformations of Gaussians are Gaussian. Uncorrelated components are independent. The workhorse distribution for MIMO signal processing.

Related: Multivariate Gaussian Distribution, Conditional Distribution of Jointly Gaussian Vectors

Circularly Symmetric (Complex Gaussian)

A complex random vector $\mathbf{x}$ is circularly symmetric if $e^{j\theta}\mathbf{x} \stackrel{d}{=} \mathbf{x}$ for all $\theta$. Equivalently, the pseudo-covariance $E[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T]$ vanishes. The standard model for wireless channel gains and additive noise, written $\mathcal{CN}(\boldsymbol{\mu}, \mathbf{R})$.

Related: Circularly Symmetric Complex Gaussian Distribution, Why Channel Vectors Are Circularly Symmetric Complex Gaussian

Schur Complement

Given a block matrix $\mathbf{M} = \bigl(\begin{smallmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{smallmatrix}\bigr)$ with $\mathbf{D}$ invertible, the Schur complement of $\mathbf{D}$ in $\mathbf{M}$ is $\mathbf{M}/\mathbf{D} = \mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{C}$. In the conditional Gaussian theorem, the Schur complement of $\mathbf{C}_{22}$ gives the conditional covariance $\mathbf{C}_{1|2}$. Also central to block matrix inversion and determinant identities.

Related: Conditional Distribution of Jointly Gaussian Vectors, MMSE Estimation via Conditional Gaussian

Common Mistake: Forgetting Hermitian Transpose ($^H$) vs Transpose ($^T$) in Complex Covariance

Mistake:

Writing the covariance matrix of a complex random vector as $\mathbf{C}_{\mathbf{x}} = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$ using the ordinary transpose $^T$ instead of the Hermitian (conjugate) transpose $^H$.

Correction:

For complex random vectors, the correct definition is

$$\mathbf{C}_{\mathbf{x}} = E\bigl[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H\bigr],$$

where $^H$ denotes the conjugate transpose. Using $^T$ instead of $^H$ gives the pseudo-covariance matrix $\tilde{\mathbf{C}}_{\mathbf{x}} = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$, which is a completely different object (and equals zero for circularly symmetric vectors).

Consequences of the error:

  • $(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T$ is not Hermitian; the resulting "covariance" would not be Hermitian PSD.
  • The Gaussian PDF formula uses $\mathbf{C}^{-1}$ with the $^H$-defined covariance. Substituting the $^T$-version gives a nonsensical density.
  • All downstream results (MMSE estimators, capacity formulas, beamformer designs) will be incorrect.

Mnemonic: In the real case, $^H = {}^T$, so the distinction vanishes. The moment you work with complex signals (which is always in baseband communications), switch to $^H$.

โš ๏ธEngineering Note

Numerical Stability When Inverting Covariance Matrices

The conditional Gaussian formula and the MMSE estimator both require inverting the covariance matrix $\mathbf{C}_{22}$. In practice, covariance matrices estimated from finite samples are often ill-conditioned, especially in massive MIMO systems where $n_t$ or $n_r$ can exceed 64.

Key issues:

  • Condition number: If $\kappa(\mathbf{C}) = \lambda_{\max}/\lambda_{\min}$ exceeds $10^6$--$10^8$ (common in correlated channels), direct inversion via LU or Cholesky can amplify roundoff errors by a factor of $\kappa$ in double precision (64-bit IEEE 754, $\sim 16$ significant digits).
  • Sample deficiency: When the number of samples $N < n$ (the dimension), the sample covariance matrix is rank-deficient and singular. This occurs routinely in pilot-limited systems.

Practical remedies:

  1. Diagonal loading (Tikhonov regularisation): Replace $\mathbf{C}$ with $\mathbf{C} + \epsilon\mathbf{I}$ where $\epsilon \sim 10^{-2}\,\mathrm{tr}(\mathbf{C})/n$. This bounds $\kappa \leq (\lambda_{\max} + \epsilon)/\epsilon$.
  2. Cholesky factorisation: Compute $\mathbf{C} = \mathbf{L}\mathbf{L}^H$ and solve via forward/back-substitution instead of forming $\mathbf{C}^{-1}$ explicitly. Cost: $n^3/3$ flops vs $n^3$ for a general inverse.
  3. Eigenvalue truncation: Discard eigenvalues below a threshold (e.g., $10^{-6}\lambda_{\max}$) and invert only the dominant subspace.
  4. Woodbury identity: For low-rank updates $\mathbf{C} = \sigma^2\mathbf{I} + \mathbf{U}\mathbf{U}^H$ (common in signal-plus-noise models), the Woodbury formula gives $\mathbf{C}^{-1}$ in $O(nr^2)$ instead of $O(n^3)$, where $r \ll n$ is the rank of $\mathbf{U}$.

In 5G NR, channel estimation uses LMMSE with regularised covariance inversion at every coherence interval ($\sim 1$ ms at 30 kHz SCS). At 64 antennas, this means inverting a $64 \times 64$ complex matrix every millisecond per user; numerical stability is not academic.
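A minimal sketch of remedies 1 and 2 above (sizes and loading level are illustrative): regularise a rank-deficient sample covariance by diagonal loading, then solve against it through its Cholesky factor rather than forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(7)
n, N = 64, 32                                # fewer snapshots than dimensions

# Rank-deficient sample covariance from N < n snapshots of CN(0, I) noise.
X = (rng.normal(size=(n, N)) + 1j * rng.normal(size=(n, N))) / np.sqrt(2)
C_hat = X @ X.conj().T / N                   # rank <= N < n, hence singular

eps = 1e-2 * np.trace(C_hat).real / n        # diagonal loading level
C_reg = C_hat + eps * np.eye(n)

# Solve C_reg z = y through the Cholesky factor L (C_reg = L L^H).
# np.linalg.solve does not exploit triangularity; scipy.linalg.solve_triangular
# would, but the result is identical.
y = rng.normal(size=n) + 1j * rng.normal(size=n)
L = np.linalg.cholesky(C_reg)
z = np.linalg.solve(L.conj().T, np.linalg.solve(L, y))

print(np.linalg.matrix_rank(C_hat))          # = N: singular before loading
print(np.linalg.cond(C_reg))                 # bounded by (lambda_max + eps)/eps
print(np.linalg.norm(C_reg @ z - y))         # ~ machine precision
```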

Practical Constraints

  • Double-precision floating point limits the usable condition number to roughly $10^{15}$.
  • The sample covariance requires $N \geq n$ samples for full rank.
  • The 5G NR coherence time constrains the computation budget to roughly 1 ms.

Key Takeaway

The central message of this section in three points:

  1. Random vectors extend scalar probability to MIMO. The covariance matrix $\mathbf{C}_{\mathbf{x}}$, Hermitian and positive semidefinite by construction, encodes all pairwise second-order statistics. The eigendecomposition of $\mathbf{C}_{\mathbf{x}}$ reveals the principal axes of randomness, directly connecting to Chapter 1's spectral theory.

  2. The conditional Gaussian theorem is the master tool. For jointly Gaussian vectors, conditioning produces another Gaussian whose parameters are given explicitly by the Schur complement: $E[\mathbf{x}_1 \mid \mathbf{x}_2] = \boldsymbol{\mu}_1 + \mathbf{C}_{12}\mathbf{C}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$ and $\mathbf{C}_{1|2} = \mathbf{C}_{11} - \mathbf{C}_{12}\mathbf{C}_{22}^{-1}\mathbf{C}_{21}$. This single result underlies the MMSE estimator, the Kalman filter, and MIMO capacity computation.

  3. Circular symmetry is the bridge to wireless. The $\mathcal{CN}(\boldsymbol{\mu}, \mathbf{R})$ distribution, with its vanishing pseudo-covariance and phase-rotation invariance, is the natural model for channel vectors in rich-scattering environments. All of the real Gaussian machinery (marginals, conditionals, affine closure) carries over, with $^T$ replaced by $^H$ throughout.