Random Vectors and Their Statistics

From Scalars to Vectors

In Chapters 5–7, we studied individual random variables and pairs $(X, Y)$. But in most engineering applications, we observe not a single measurement but a collection of measurements simultaneously. A MIMO receiver observes $n$ antenna outputs; an estimator processes a vector of samples; a stochastic process evaluated at $n$ time instants yields a random vector.

The natural mathematical object is the random vector $\mathbf{X} = (X_1, \ldots, X_n)^T$, and the natural summary statistics are the mean vector and covariance matrix. This section establishes the vocabulary and the key structural result: the covariance matrix is always positive semi-definite.

Definition:

Random Vector

A random vector is an ordered collection of $n$ random variables defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$:

$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}.$$

The joint PDF (when it exists) is $f_{\mathbf{X}}(\mathbf{x})$ such that $\mathbb{P}(\mathbf{X} \in A) = \int_A f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}$ for any measurable set $A \subseteq \mathbb{R}^n$.

Random vector

An ordered collection $\mathbf{X} = (X_1, \ldots, X_n)^T$ of random variables on the same probability space. Completely characterized by the family of finite-dimensional distributions.

Related: Covariance matrix

Definition:

Mean Vector and Covariance Matrix

Let X=(X1,,Xn)T\mathbf{X} = (X_1, \ldots, X_n)^T be a random vector with finite second moments. The mean vector is

$$\boldsymbol{\mu} = \mathbb{E}[\mathbf{X}] = \begin{pmatrix} \mathbb{E}[X_1] \\ \vdots \\ \mathbb{E}[X_n] \end{pmatrix}.$$

The covariance matrix is the $n \times n$ matrix

$$\boldsymbol{\Sigma} = \mathbb{E}\bigl[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T\bigr],$$

whose $(i,j)$-entry is $\text{Cov}(X_i, X_j)$. The correlation matrix is $\mathbf{R} = \mathbb{E}[\mathbf{X}\mathbf{X}^T] = \boldsymbol{\Sigma} + \boldsymbol{\mu}\boldsymbol{\mu}^T$.

The diagonal entries of $\boldsymbol{\Sigma}$ are the variances $\text{Var}(X_i)$, and the off-diagonal entries are the covariances $\text{Cov}(X_i, X_j)$.
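These definitions translate directly into sample estimators. A minimal NumPy sketch, using illustrative numbers not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ground truth (hypothetical values, not from the text).
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[4.0, -3.0],
                       [-3.0, 9.0]])

# Draw N samples of the random vector X; each row is one realization.
X = rng.multivariate_normal(mu_true, Sigma_true, size=100_000)

mu_hat = X.mean(axis=0)              # sample mean vector (n,)
Sigma_hat = np.cov(X, rowvar=False)  # sample covariance matrix (n x n)

# The diagonal holds the variances, the off-diagonal the covariances,
# and the matrix is symmetric by construction.
print(mu_hat)
print(Sigma_hat)
```

With enough samples, `mu_hat` and `Sigma_hat` approach the true moments up to sampling error.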

Covariance matrix

The matrix $\boldsymbol{\Sigma} = \mathbb{E}[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T]$ summarizing all pairwise covariances of a random vector. Always symmetric and positive semi-definite.

Related: Random vector, Positive semi-definite (PSD)

Theorem: Covariance Matrices Are Positive Semi-Definite

For any random vector $\mathbf{X}$ with finite second moments, the covariance matrix $\boldsymbol{\Sigma}$ is symmetric and positive semi-definite:

$$\mathbf{a}^T \boldsymbol{\Sigma}\,\mathbf{a} \geq 0 \quad \text{for all } \mathbf{a} \in \mathbb{R}^n.$$

Moreover, $\boldsymbol{\Sigma}$ is strictly positive definite if and only if no non-trivial linear combination $\mathbf{a}^T \mathbf{X}$ is constant (almost surely).

The quadratic form $\mathbf{a}^T \boldsymbol{\Sigma}\,\mathbf{a}$ equals $\text{Var}(\mathbf{a}^T \mathbf{X})$, and variance is always non-negative.
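The identity behind this one-line proof can be checked numerically: for any direction $\mathbf{a}$, the quadratic form matches the variance of the projection $\mathbf{a}^T\mathbf{X}$. A sketch with hypothetical numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical covariance matrix (any valid Sigma works).
Sigma = np.array([[4.0, -3.0],
                  [-3.0, 9.0]])
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)

a = np.array([2.0, -1.0])      # an arbitrary direction
quad_form = a @ Sigma @ a      # a^T Sigma a = 16 + 12 + 9 = 37 here
var_proj = np.var(X @ a)       # sample Var(a^T X)

# The two agree up to sampling error, and both are non-negative,
# as are all eigenvalues of a PSD matrix.
print(quad_form, var_proj)
print(np.linalg.eigvalsh(Sigma))
```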

Positive semi-definite (PSD)

A symmetric matrix $\mathbf{A}$ is PSD ($\mathbf{A} \succeq 0$) if $\mathbf{x}^T \mathbf{A}\mathbf{x} \geq 0$ for all $\mathbf{x}$. Equivalently, all eigenvalues of $\mathbf{A}$ are non-negative.

Related: Covariance matrix

Common Mistake: Covariance Matrix vs. Correlation Matrix

Mistake:

Confusing the covariance matrix $\boldsymbol{\Sigma}$ with the correlation matrix $\mathbf{R} = \mathbb{E}[\mathbf{X}\mathbf{X}^T]$.

Correction:

They differ by a rank-one term: $\mathbf{R} = \boldsymbol{\Sigma} + \boldsymbol{\mu}\boldsymbol{\mu}^T$. They coincide only when $\boldsymbol{\mu} = \mathbf{0}$. In signal processing, $\mathbf{R}$ includes the "DC component" while $\boldsymbol{\Sigma}$ does not.
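The rank-one gap is easy to verify directly; a sketch with illustrative numbers:

```python
import numpy as np

# Hypothetical mean vector and covariance matrix.
mu = np.array([1.0, -2.0])
Sigma = np.array([[4.0, -3.0],
                  [-3.0, 9.0]])

# Correlation matrix: R = Sigma + mu mu^T.
R = Sigma + np.outer(mu, mu)
print(R)

# The difference R - Sigma is the rank-one matrix mu mu^T
# (its rank is 1 unless mu = 0, in which case R = Sigma).
print(np.linalg.matrix_rank(R - Sigma))
```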

Example: Covariance Matrix of a Bivariate Distribution

Let X=(X1,X2)T\mathbf{X} = (X_1, X_2)^T with E[X1]=1\mathbb{E}[X_1] = 1, E[X2]=2\mathbb{E}[X_2] = -2, Var(X1)=4\text{Var}(X_1) = 4, Var(X2)=9\text{Var}(X_2) = 9, and Cov(X1,X2)=3\text{Cov}(X_1, X_2) = -3. Write the covariance matrix and verify that it is PSD.

Cross-Covariance Matrix

For two random vectors $\mathbf{X} \in \mathbb{R}^m$ and $\mathbf{Y} \in \mathbb{R}^n$, the cross-covariance matrix is the $m \times n$ matrix

$$\boldsymbol{\Sigma}_{xy} = \mathbb{E}\bigl[(\mathbf{X} - \boldsymbol{\mu}_x)(\mathbf{Y} - \boldsymbol{\mu}_y)^T\bigr].$$

Notice that $\boldsymbol{\Sigma}_{yx} = \boldsymbol{\Sigma}_{xy}^{T}$. The cross-covariance measures the linear dependence between $\mathbf{X}$ and $\mathbf{Y}$ and plays a central role in LMMSE estimation (Book FSI, Chapter 3).
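A quick numerical sketch of the transpose identity $\boldsymbol{\Sigma}_{yx} = \boldsymbol{\Sigma}_{xy}^T$, using a hypothetical joint construction:

```python
import numpy as np

rng = np.random.default_rng(2)

# Build correlated X in R^2 and Y in R^3 from shared noise (hypothetical model).
N = 50_000
Z = rng.standard_normal((N, 3))
X = Z[:, :2]                          # shape (N, 2)
Y = Z + rng.standard_normal((N, 3))   # shape (N, 3), correlated with X

Xc = X - X.mean(axis=0)               # center each vector
Yc = Y - Y.mean(axis=0)

Sigma_xy = Xc.T @ Yc / (N - 1)        # m x n cross-covariance (2 x 3)
Sigma_yx = Yc.T @ Xc / (N - 1)        # n x m cross-covariance (3 x 2)

# Sigma_yx equals Sigma_xy transposed.
print(np.allclose(Sigma_yx, Sigma_xy.T))
```

Note that $\boldsymbol{\Sigma}_{xy}$ is generally rectangular and carries no PSD guarantee of its own; that property belongs to the (auto-)covariance matrices.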

Why This Matters: Covariance Matrices in MIMO Channel Modeling

In a MIMO system with $N_t$ transmit and $N_r$ receive antennas, the received signal vector $\mathbf{y} \in \mathbb{C}^{N_r}$ has a covariance matrix that encodes the spatial correlation structure of the channel and noise. The transmit covariance $\boldsymbol{\Sigma}_t = \mathbb{E}[\mathbf{x}\mathbf{x}^H]$ is the design variable in capacity-achieving precoding (water-filling over the eigenmodes of $\mathbf{H}\mathbf{H}^H$). The receive spatial correlation $\boldsymbol{\Sigma}_r$ determines how much diversity the channel offers. Much of massive MIMO analysis rests on the covariance matrix structure developed in this chapter.
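As a concrete illustration of the transmit covariance as a design variable, here is a minimal water-filling sketch over the eigenmodes of $\mathbf{H}\mathbf{H}^H$, assuming a hypothetical 4×4 channel, unit noise power, and a total power budget; it is a sketch of the allocation step, not a full precoder design:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 4x4 i.i.d. Rayleigh channel.
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
gains = np.linalg.eigvalsh(H @ H.conj().T)   # eigenmode gains of H H^H
noise_var = 1.0
P_total = 10.0

# Water-filling: p_i = max(0, level - noise_var / gains_i), sum p_i = P_total.
# Solve for the water level by bisection.
lo, hi = 0.0, P_total + noise_var / gains.min()
for _ in range(200):
    level = 0.5 * (lo + hi)
    p = np.maximum(0.0, level - noise_var / gains)
    if p.sum() > P_total:
        hi = level
    else:
        lo = level

# Capacity of the parallel eigenmode channels under this allocation.
capacity = np.sum(np.log2(1.0 + p * gains / noise_var))  # bits/s/Hz
print(p, capacity)
```

Strong eigenmodes receive more power, and weak ones may receive none; the resulting per-mode powers define the eigenvalues of the optimal $\boldsymbol{\Sigma}_t$.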