Random Vectors and Joint Distributions
Why Random Vectors?
In a multiple-input multiple-output (MIMO) system with $N_r$ receive antennas and $N_t$ transmit antennas, the received signal is not a scalar but a vector:
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$$
where $\mathbf{y} \in \mathbb{C}^{N_r}$ is the received vector, $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ is the channel matrix, $\mathbf{x} \in \mathbb{C}^{N_t}$ is the transmitted vector, and $\mathbf{n} \in \mathbb{C}^{N_r}$ is additive noise. Every entry of $\mathbf{y}$, $\mathbf{H}$, and $\mathbf{n}$ is a random variable, and they are generally correlated across antennas.
To design MIMO detectors, compute channel capacity, or derive optimal beamformers, we must handle joint distributions over multiple random variables simultaneously. The noise covariance matrix $\mathbf{C}_{\mathbf{n}} = \mathbb{E}[\mathbf{n}\mathbf{n}^H]$ encodes the noise correlation structure; the channel covariance governs spatial diversity and multiplexing gains.
This section is where Chapter 1's linear algebra (inner products, eigendecompositions, positive semidefiniteness) meets Chapter 2's probability. The central result is the multivariate Gaussian distribution, whose conditional distributions are again Gaussian with parameters given by the Schur complement. This single fact underlies minimum mean-square error (MMSE) estimation, Kalman filtering, and the capacity formula for MIMO channels.
Definition: Random Vector
Random Vector
Let $(\Omega, \mathcal{F}, P)$ be a probability space. A random vector of dimension $n$ is a measurable function
$$\mathbf{x}: \Omega \to \mathbb{R}^n$$
(or $\mathbf{x}: \Omega \to \mathbb{C}^n$ in the complex case), written as
$$\mathbf{x} = (x_1, x_2, \ldots, x_n)^T,$$
where each component $x_i: \Omega \to \mathbb{R}$ (or $\mathbb{C}$) is itself a random variable. Measurability of $\mathbf{x}$ means that for every Borel set $B \subseteq \mathbb{R}^n$,
$$\mathbf{x}^{-1}(B) = \{\omega \in \Omega : \mathbf{x}(\omega) \in B\} \in \mathcal{F}.$$
This is equivalent to requiring that each component $x_i$ is a random variable in the sense of Definition def-rv.
We use boldface lowercase ($\mathbf{x}$) for random vectors and boldface uppercase ($\mathbf{H}$) for random matrices, following the convention established in Chapter 1. A concrete realisation is denoted by the same boldface symbol; context (or an explicit statement such as "let $\mathbf{x}$ be random") distinguishes the random object from its value.
Definition: Joint CDF and Joint PDF
Joint CDF and Joint PDF
Let $\mathbf{x} = (x_1, \ldots, x_n)^T$ be a real random vector.
Joint CDF. The joint cumulative distribution function is
$$F_{\mathbf{x}}(a_1, \ldots, a_n) = P(x_1 \le a_1, \ldots, x_n \le a_n).$$
Properties of the joint CDF:
- $F_{\mathbf{x}}$ is non-decreasing in each argument.
- $F_{\mathbf{x}}$ is right-continuous in each argument.
- $\lim_{a_i \to -\infty} F_{\mathbf{x}}(a_1, \ldots, a_n) = 0$ for each $i$.
- $\lim_{a_1, \ldots, a_n \to +\infty} F_{\mathbf{x}}(a_1, \ldots, a_n) = 1$.
Joint PDF. The random vector $\mathbf{x}$ is (absolutely) continuous if there exists a non-negative function $f_{\mathbf{x}}: \mathbb{R}^n \to [0, \infty)$ such that
$$F_{\mathbf{x}}(a_1, \ldots, a_n) = \int_{-\infty}^{a_1} \cdots \int_{-\infty}^{a_n} f_{\mathbf{x}}(u_1, \ldots, u_n)\, du_n \cdots du_1.$$
Wherever $F_{\mathbf{x}}$ is sufficiently smooth,
$$f_{\mathbf{x}}(a_1, \ldots, a_n) = \frac{\partial^n F_{\mathbf{x}}(a_1, \ldots, a_n)}{\partial a_1 \cdots \partial a_n}.$$
Properties of the joint PDF:
- $f_{\mathbf{x}}(\mathbf{u}) \ge 0$ for all $\mathbf{u} \in \mathbb{R}^n$.
- $\int_{\mathbb{R}^n} f_{\mathbf{x}}(\mathbf{u})\, d\mathbf{u} = 1$.
- $P(\mathbf{x} \in B) = \int_B f_{\mathbf{x}}(\mathbf{u})\, d\mathbf{u}$ for any Borel set $B \subseteq \mathbb{R}^n$.
The joint CDF and joint PDF completely characterise the joint distribution of $\mathbf{x}$. In the complex case, we identify $\mathbb{C}^n$ with $\mathbb{R}^{2n}$ and define the joint density over the real and imaginary parts separately.
Definition: Marginal and Conditional Densities
Marginal and Conditional Densities
Let $\mathbf{x} = (x_1, \ldots, x_n)^T$ have joint PDF $f_{\mathbf{x}}$.
Marginal density. The marginal PDF of any subset of components is obtained by integrating out the remaining variables. For instance, the marginal PDF of $x_1$ is
$$f_{x_1}(u_1) = \int_{\mathbb{R}^{n-1}} f_{\mathbf{x}}(u_1, u_2, \ldots, u_n)\, du_2 \cdots du_n.$$
More generally, if we partition $\mathbf{x} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}$, then
$$f_{\mathbf{x}_1}(\mathbf{u}_1) = \int f_{\mathbf{x}}(\mathbf{u}_1, \mathbf{u}_2)\, d\mathbf{u}_2.$$
Conditional density. The conditional PDF of $\mathbf{x}_1$ given $\mathbf{x}_2 = \mathbf{u}_2$ is defined (where $f_{\mathbf{x}_2}(\mathbf{u}_2) > 0$) by
$$f_{\mathbf{x}_1 \mid \mathbf{x}_2}(\mathbf{u}_1 \mid \mathbf{u}_2) = \frac{f_{\mathbf{x}}(\mathbf{u}_1, \mathbf{u}_2)}{f_{\mathbf{x}_2}(\mathbf{u}_2)}.$$
This is the multivariate analog of Bayes' formula for densities.
Bayes' rule for densities. Rearranging:
$$f_{\mathbf{x}}(\mathbf{u}_1, \mathbf{u}_2) = f_{\mathbf{x}_1 \mid \mathbf{x}_2}(\mathbf{u}_1 \mid \mathbf{u}_2)\, f_{\mathbf{x}_2}(\mathbf{u}_2) = f_{\mathbf{x}_2 \mid \mathbf{x}_1}(\mathbf{u}_2 \mid \mathbf{u}_1)\, f_{\mathbf{x}_1}(\mathbf{u}_1).$$
Hence
$$f_{\mathbf{x}_1 \mid \mathbf{x}_2}(\mathbf{u}_1 \mid \mathbf{u}_2) = \frac{f_{\mathbf{x}_2 \mid \mathbf{x}_1}(\mathbf{u}_2 \mid \mathbf{u}_1)\, f_{\mathbf{x}_1}(\mathbf{u}_1)}{f_{\mathbf{x}_2}(\mathbf{u}_2)}.$$
Marginalisation and conditioning are the two fundamental operations on joint distributions. In communications, marginalisation computes the distribution of a single antenna's output from the joint MIMO distribution; conditioning computes the posterior distribution of the transmitted symbol given the received signal.
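To make the two operations concrete, here is a minimal numerical sketch (assuming NumPy; the bivariate Gaussian with correlation $\rho = 0.7$ is an arbitrary illustration, not from the text): the joint density is discretised on a grid, the marginal is obtained by summing out one axis, and the conditional by slicing and renormalising.

```python
import numpy as np

# Discretise a bivariate standard Gaussian with correlation rho on a grid.
rho = 0.7
u1 = np.linspace(-5.0, 5.0, 401)
u2 = np.linspace(-5.0, 5.0, 401)
du = u1[1] - u1[0]
U1, U2 = np.meshgrid(u1, u2, indexing="ij")
f_joint = np.exp(-(U1**2 - 2 * rho * U1 * U2 + U2**2) / (2 * (1 - rho**2)))
f_joint /= 2 * np.pi * np.sqrt(1 - rho**2)

# Marginalisation: integrate out u2 (axis 1) to get f_{x1}(u1).
f_x1 = f_joint.sum(axis=1) * du

# Conditioning: slice at u2 = 1.0 and divide by the marginal f_{x2}(1.0).
f_x2 = f_joint.sum(axis=0) * du
i2 = np.argmin(np.abs(u2 - 1.0))
f_cond = f_joint[:, i2] / f_x2[i2]

print(f_x1.sum() * du)        # ~1: the marginal is a valid density
print(f_cond.sum() * du)      # ~1: so is the conditional
print(u1[np.argmax(f_cond)])  # ~0.7 = rho * 1.0, the conditional mean
```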
Definition: Mean Vector
Mean Vector
The mean vector (or expectation) of a random vector $\mathbf{x} = (x_1, \ldots, x_n)^T$ is
$$\boldsymbol{\mu}_{\mathbf{x}} = \mathbb{E}[\mathbf{x}] = \big(\mathbb{E}[x_1], \ldots, \mathbb{E}[x_n]\big)^T.$$
Expectation of a random vector (or matrix) is defined component-wise. Linearity carries over: $\mathbb{E}[\mathbf{A}\mathbf{x} + \mathbf{b}] = \mathbf{A}\,\mathbb{E}[\mathbf{x}] + \mathbf{b}$ for any deterministic matrix $\mathbf{A}$ and vector $\mathbf{b}$.
Definition: Correlation Matrix and Covariance Matrix
Correlation Matrix and Covariance Matrix
Let $\mathbf{x} \in \mathbb{C}^n$ be a random vector with mean $\boldsymbol{\mu} = \mathbb{E}[\mathbf{x}]$.
Correlation matrix (autocorrelation matrix).
$$\mathbf{R}_{\mathbf{x}} = \mathbb{E}[\mathbf{x}\mathbf{x}^H].$$
The $(i,j)$-th entry is $[\mathbf{R}_{\mathbf{x}}]_{ij} = \mathbb{E}[x_i x_j^*]$.
Covariance matrix.
$$\mathbf{C}_{\mathbf{x}} = \mathbb{E}\big[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H\big] = \mathbf{R}_{\mathbf{x}} - \boldsymbol{\mu}\boldsymbol{\mu}^H.$$
The $(i,j)$-th entry is $[\mathbf{C}_{\mathbf{x}}]_{ij} = \mathbb{E}[(x_i - \mu_i)(x_j - \mu_j)^*] = \operatorname{Cov}(x_i, x_j)$.
Properties of the covariance matrix:
- Hermitian: $\mathbf{C}_{\mathbf{x}}^H = \mathbf{C}_{\mathbf{x}}$.
- Positive semidefinite (PSD): $\mathbf{v}^H \mathbf{C}_{\mathbf{x}} \mathbf{v} \ge 0$ for all $\mathbf{v} \in \mathbb{C}^n$ (proved in Theorem thm-covariance-psd below).
- Diagonal entries are variances: $[\mathbf{C}_{\mathbf{x}}]_{ii} = \operatorname{Var}(x_i) \ge 0$.
- Off-diagonal entries are covariances: $|[\mathbf{C}_{\mathbf{x}}]_{ij}| \le \sqrt{[\mathbf{C}_{\mathbf{x}}]_{ii}\,[\mathbf{C}_{\mathbf{x}}]_{jj}}$ (Cauchy--Schwarz).
- Affine transformation: If $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}$, then $\mathbf{C}_{\mathbf{y}} = \mathbf{A}\mathbf{C}_{\mathbf{x}}\mathbf{A}^H$.
For real random vectors, replace the Hermitian transpose with the ordinary transpose throughout.
The correlation matrix and covariance matrix coincide when $\mathbf{x}$ is zero-mean ($\boldsymbol{\mu} = \mathbf{0}$), which is the typical case for noise and for circularly symmetric channel vectors. The distinction matters when the mean is nonzero, e.g., in Ricean fading where the LOS component contributes a nonzero mean.
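As a quick numerical sketch (NumPy, with arbitrary illustrative dimensions), the following estimates a covariance matrix from samples of $\mathbf{x} = \mathbf{A}\mathbf{w}$ with $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$ and checks the Hermitian, PSD, and affine-transformation properties:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 100_000

# Colour white complex noise: x = A w has covariance A A^H (affine property).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
w = (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))) / np.sqrt(2)
x = A @ w

xc = x - x.mean(axis=1, keepdims=True)   # centre the samples
C_hat = xc @ xc.conj().T / N             # sample covariance (note the ^H)

print(np.allclose(C_hat, C_hat.conj().T))            # Hermitian
print(np.linalg.eigvalsh(C_hat).min() >= -1e-10)     # PSD up to roundoff
print(np.allclose(C_hat, A @ A.conj().T, atol=0.1))  # C_x = A C_w A^H = A A^H
```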
Definition: Cross-Covariance Matrix
Cross-Covariance Matrix
Let $\mathbf{x} \in \mathbb{C}^n$ and $\mathbf{y} \in \mathbb{C}^m$ be random vectors with means $\boldsymbol{\mu}_{\mathbf{x}}$ and $\boldsymbol{\mu}_{\mathbf{y}}$. The cross-covariance matrix is
$$\mathbf{C}_{\mathbf{x}\mathbf{y}} = \mathbb{E}\big[(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}})(\mathbf{y} - \boldsymbol{\mu}_{\mathbf{y}})^H\big] \in \mathbb{C}^{n \times m}.$$
Note that $\mathbf{C}_{\mathbf{y}\mathbf{x}} = \mathbf{C}_{\mathbf{x}\mathbf{y}}^H$.
The random vectors $\mathbf{x}$ and $\mathbf{y}$ are uncorrelated if $\mathbf{C}_{\mathbf{x}\mathbf{y}} = \mathbf{0}$.
When we partition a joint random vector $\mathbf{z} = \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix}$, the full covariance matrix has block structure:
$$\mathbf{C}_{\mathbf{z}} = \begin{pmatrix} \mathbf{C}_{\mathbf{x}\mathbf{x}} & \mathbf{C}_{\mathbf{x}\mathbf{y}} \\ \mathbf{C}_{\mathbf{y}\mathbf{x}} & \mathbf{C}_{\mathbf{y}\mathbf{y}} \end{pmatrix}.$$
This block partitioning is essential for the conditional Gaussian theorem (Theorem thm-conditional-gaussian).
Theorem: The Covariance Matrix Is Positive Semidefinite
Let $\mathbf{x} \in \mathbb{C}^n$ be a random vector with covariance matrix $\mathbf{C}_{\mathbf{x}}$. Then $\mathbf{C}_{\mathbf{x}}$ is Hermitian positive semidefinite:
$$\mathbf{v}^H \mathbf{C}_{\mathbf{x}} \mathbf{v} \ge 0 \quad \text{for all } \mathbf{v} \in \mathbb{C}^n.$$
Equality holds for a particular $\mathbf{v}$ if and only if $\mathbf{v}^H(\mathbf{x} - \boldsymbol{\mu}) = 0$ almost surely, i.e., $\mathbf{v}$ lies in the null space of the "centered" random vector.
The quadratic form $\mathbf{v}^H \mathbf{C}_{\mathbf{x}} \mathbf{v}$ is the variance of the scalar projection $\mathbf{v}^H \mathbf{x}$, and variances are nonnegative.
Write out $\mathbf{v}^H \mathbf{C}_{\mathbf{x}} \mathbf{v}$ using the definition of $\mathbf{C}_{\mathbf{x}}$.
Move $\mathbf{v}$ inside the expectation and recognise the result as the expected squared magnitude of a scalar.
Proof
Let $\mathbf{v} \in \mathbb{C}^n$ be arbitrary. Define the scalar random variable $s = \mathbf{v}^H(\mathbf{x} - \boldsymbol{\mu})$. Then
$$\mathbf{v}^H \mathbf{C}_{\mathbf{x}} \mathbf{v} = \mathbf{v}^H\, \mathbb{E}\big[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H\big]\, \mathbf{v} = \mathbb{E}\big[\mathbf{v}^H(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H \mathbf{v}\big] = \mathbb{E}\big[|s|^2\big].$$
Since $|s|^2 \ge 0$ pointwise, its expectation is nonnegative:
$$\mathbf{v}^H \mathbf{C}_{\mathbf{x}} \mathbf{v} = \mathbb{E}\big[|s|^2\big] \ge 0.$$
Equality holds if and only if $|s|^2 = 0$ almost surely, i.e., $\mathbf{v}^H(\mathbf{x} - \boldsymbol{\mu}) = 0$ a.s.
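The identity at the heart of the proof, $\mathbf{v}^H \mathbf{C}_{\mathbf{x}} \mathbf{v} = \mathbb{E}[|s|^2]$, can be checked in a few lines (a sketch with arbitrary real data; using the sample covariance, the two sides agree exactly up to floating point):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 10_000
x = rng.standard_normal((n, n)) @ rng.standard_normal((n, N))  # correlated data

xc = x - x.mean(axis=1, keepdims=True)
C = xc @ xc.T / N               # sample covariance (real case: ^T)

v = rng.standard_normal(n)
s = v @ xc                      # scalar projections v^T (x - mu)
print(v @ C @ v, (s**2).mean()) # two identical nonnegative numbers
```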
Definition: Multivariate Gaussian Distribution
Multivariate Gaussian Distribution
A real random vector $\mathbf{x} \in \mathbb{R}^n$ has the multivariate Gaussian (or multivariate normal) distribution with mean $\boldsymbol{\mu} \in \mathbb{R}^n$ and covariance matrix $\mathbf{C} \in \mathbb{R}^{n \times n}$ (symmetric, positive definite), written
$$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C}),$$
if its joint PDF is
$$f_{\mathbf{x}}(\mathbf{u}) = \frac{1}{(2\pi)^{n/2} (\det \mathbf{C})^{1/2}} \exp\!\Big(-\tfrac{1}{2}(\mathbf{u} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{u} - \boldsymbol{\mu})\Big).$$
Key properties:
- Marginals are Gaussian: Any sub-vector of $\mathbf{x}$ is also Gaussian, with mean and covariance given by the corresponding sub-vector and sub-matrix.
- Affine closure: If $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}$ where $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$, then $\mathbf{y} \sim \mathcal{N}(\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\, \mathbf{A}\mathbf{C}\mathbf{A}^T)$.
- Uncorrelated $\Rightarrow$ independent: For jointly Gaussian random variables, zero covariance implies independence (the converse always holds). This is a special property of the Gaussian; it fails for general distributions.
- Contours of constant density are ellipsoids centered at $\boldsymbol{\mu}$: $\{\mathbf{u} : (\mathbf{u} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{u} - \boldsymbol{\mu}) = c\}$, with axes aligned along the eigenvectors of $\mathbf{C}$ and semi-axis lengths proportional to $\sqrt{\lambda_i}$.
The exponent $(\mathbf{u} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{u} - \boldsymbol{\mu})$ is the squared Mahalanobis distance from $\mathbf{u}$ to $\boldsymbol{\mu}$. It generalises the familiar $(u - \mu)^2/\sigma^2$ from the scalar case by accounting for correlations through $\mathbf{C}^{-1}$. Points at equal Mahalanobis distance have equal density, forming the ellipsoidal contours.
When $\mathbf{C}$ is singular (positive semidefinite but not positive definite), the distribution is supported on a proper affine subspace of $\mathbb{R}^n$ and does not possess a density with respect to Lebesgue measure on $\mathbb{R}^n$. One can still define it via characteristic functions: $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$ iff $\mathbb{E}\big[e^{j\mathbf{t}^T \mathbf{x}}\big] = e^{\,j\mathbf{t}^T \boldsymbol{\mu} - \frac{1}{2}\mathbf{t}^T \mathbf{C} \mathbf{t}}$ for all $\mathbf{t} \in \mathbb{R}^n$.
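A short sketch (NumPy; the mean and covariance are arbitrary illustrations) showing both directions: generating $\mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$ samples via the Cholesky factor (affine closure) and evaluating the density through the squared Mahalanobis distance:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0])
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])

# Sampling via affine closure: x = mu + L z with z ~ N(0, I) and C = L L^T.
L = np.linalg.cholesky(C)
x = mu[:, None] + L @ rng.standard_normal((2, 50_000))
print(x.mean(axis=1))   # ~mu
print(np.cov(x))        # ~C

def gauss_pdf(u, mu, C):
    """Density via the squared Mahalanobis distance (u-mu)^T C^{-1} (u-mu)."""
    d = u - mu
    maha2 = d @ np.linalg.solve(C, d)
    norm = (2 * np.pi) ** (len(mu) / 2) * np.sqrt(np.linalg.det(C))
    return np.exp(-0.5 * maha2) / norm

print(gauss_pdf(mu, mu, C))   # peak value 1 / (2 pi sqrt(det C))
```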
Theorem: Conditional Distribution of Jointly Gaussian Vectors
Let $\mathbf{z} = \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$ be a jointly Gaussian vector with
$$\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_{\mathbf{x}} \\ \boldsymbol{\mu}_{\mathbf{y}} \end{pmatrix}, \qquad \mathbf{C} = \begin{pmatrix} \mathbf{C}_{\mathbf{x}\mathbf{x}} & \mathbf{C}_{\mathbf{x}\mathbf{y}} \\ \mathbf{C}_{\mathbf{y}\mathbf{x}} & \mathbf{C}_{\mathbf{y}\mathbf{y}} \end{pmatrix},$$
where $\mathbf{x} \in \mathbb{R}^{n_1}$, $\mathbf{y} \in \mathbb{R}^{n_2}$, $\mathbf{C}_{\mathbf{y}\mathbf{x}} = \mathbf{C}_{\mathbf{x}\mathbf{y}}^T$, and $\mathbf{C}_{\mathbf{y}\mathbf{y}}$ is invertible. Then the conditional distribution of $\mathbf{x}$ given $\mathbf{y} = \mathbf{y}_0$ is Gaussian:
$$\mathbf{x} \mid \mathbf{y} = \mathbf{y}_0 \;\sim\; \mathcal{N}\big(\boldsymbol{\mu}_{\mathbf{x}|\mathbf{y}},\, \mathbf{C}_{\mathbf{x}|\mathbf{y}}\big),$$
with conditional mean (a linear function of $\mathbf{y}_0$):
$$\boldsymbol{\mu}_{\mathbf{x}|\mathbf{y}} = \boldsymbol{\mu}_{\mathbf{x}} + \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} (\mathbf{y}_0 - \boldsymbol{\mu}_{\mathbf{y}}),$$
and conditional covariance (independent of $\mathbf{y}_0$):
$$\mathbf{C}_{\mathbf{x}|\mathbf{y}} = \mathbf{C}_{\mathbf{x}\mathbf{x}} - \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} \mathbf{C}_{\mathbf{y}\mathbf{x}}.$$
The matrix $\mathbf{C}_{\mathbf{x}|\mathbf{y}}$ is the Schur complement of $\mathbf{C}_{\mathbf{y}\mathbf{y}}$ in $\mathbf{C}$.
Conditioning on $\mathbf{y} = \mathbf{y}_0$ shifts the mean of $\mathbf{x}$ by a linear correction proportional to how far $\mathbf{y}_0$ deviates from its own mean, scaled by the "regression coefficient" $\mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1}$. The conditional covariance is always smaller (in the PSD sense) than $\mathbf{C}_{\mathbf{x}\mathbf{x}}$: observing $\mathbf{y}$ can only reduce uncertainty about $\mathbf{x}$.
Complete the square in the joint exponent after substituting the block partition.
Use the Schur complement identity for the inverse of a block matrix.
Step 1: Write the joint exponent in block form
The joint PDF is proportional to $\exp(-\tfrac{1}{2} q)$ where $q = (\mathbf{z} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{z} - \boldsymbol{\mu})$. Partition $\mathbf{z} - \boldsymbol{\mu} = \begin{pmatrix} \tilde{\mathbf{x}} \\ \tilde{\mathbf{y}} \end{pmatrix}$ with $\tilde{\mathbf{x}} = \mathbf{x} - \boldsymbol{\mu}_{\mathbf{x}}$ and $\tilde{\mathbf{y}} = \mathbf{y} - \boldsymbol{\mu}_{\mathbf{y}}$. Using the block inversion formula (Schur complement), the inverse of the covariance has blocks:
$$\mathbf{C}^{-1} = \begin{pmatrix} \mathbf{S}^{-1} & -\mathbf{S}^{-1} \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} \\ -\mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} \mathbf{C}_{\mathbf{y}\mathbf{x}} \mathbf{S}^{-1} & \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} + \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} \mathbf{C}_{\mathbf{y}\mathbf{x}} \mathbf{S}^{-1} \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} \end{pmatrix},$$
where $\mathbf{S} = \mathbf{C}_{\mathbf{x}\mathbf{x}} - \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} \mathbf{C}_{\mathbf{y}\mathbf{x}}$.
Step 2: Complete the square
Expanding the quadratic form and grouping terms involving $\tilde{\mathbf{x}}$, we obtain
$$q = \big(\tilde{\mathbf{x}} - \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} \tilde{\mathbf{y}}\big)^T \mathbf{S}^{-1} \big(\tilde{\mathbf{x}} - \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} \tilde{\mathbf{y}}\big) + \tilde{\mathbf{y}}^T \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} \tilde{\mathbf{y}}.$$
The second term depends only on $\tilde{\mathbf{y}}$ and will be absorbed into the marginal density of $\mathbf{y}$ upon conditioning.
Step 3: Read off the conditional distribution
Using $f_{\mathbf{x}|\mathbf{y}} = f_{\mathbf{x},\mathbf{y}} / f_{\mathbf{y}}$, the second quadratic term cancels with the marginal PDF of $\mathbf{y}$. What remains is Gaussian in $\mathbf{x}$ with mean
$$\boldsymbol{\mu}_{\mathbf{x}|\mathbf{y}} = \boldsymbol{\mu}_{\mathbf{x}} + \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} (\mathbf{y}_0 - \boldsymbol{\mu}_{\mathbf{y}})$$
and covariance $\mathbf{C}_{\mathbf{x}|\mathbf{y}} = \mathbf{S}$.
Crucially, $\mathbf{C}_{\mathbf{x}|\mathbf{y}}$ does not depend on $\mathbf{y}_0$: conditioning changes the mean but not the covariance structure.
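The theorem can be sanity-checked by Monte Carlo (a sketch with arbitrary parameters): draw joint samples, keep those whose $\mathbf{y}$-component falls in a narrow window around $\mathbf{y}_0$, and compare the empirical conditional moments with the formulas.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.0, 0.0, 1.0])          # x is 2-dim, y is the last component
C = np.array([[2.0, 0.5, 0.7],
              [0.5, 1.5, 0.3],
              [0.7, 0.3, 1.0]])
Cxx, Cxy = C[:2, :2], C[:2, 2:]
Cyx, Cyy = C[2:, :2], C[2:, 2:]

# Theoretical conditional moments given y = y0.
y0 = 2.0
mu_cond = mu[:2] + (Cxy @ np.linalg.solve(Cyy, [[y0 - mu[2]]])).ravel()
C_cond = Cxx - Cxy @ np.linalg.solve(Cyy, Cyx)   # Schur complement

# Monte Carlo: condition by keeping samples with y close to y0.
z = mu[:, None] + np.linalg.cholesky(C) @ rng.standard_normal((3, 2_000_000))
sel = np.abs(z[2] - y0) < 0.02
print(z[:2, sel].mean(axis=1), mu_cond)   # empirical vs theoretical mean
print(np.cov(z[:2, sel]), C_cond)         # empirical vs theoretical covariance
```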
Definition: Circularly Symmetric Complex Gaussian Distribution
Circularly Symmetric Complex Gaussian Distribution
A complex random vector $\mathbf{x} \in \mathbb{C}^n$ has the circularly symmetric complex Gaussian distribution, written
$$\mathbf{x} \sim \mathcal{CN}(\boldsymbol{\mu}, \mathbf{C}),$$
if it satisfies two conditions:
- The augmented real vector $\begin{pmatrix} \operatorname{Re}(\mathbf{x}) \\ \operatorname{Im}(\mathbf{x}) \end{pmatrix} \in \mathbb{R}^{2n}$ is jointly Gaussian.
- Circular symmetry: $e^{j\theta}(\mathbf{x} - \boldsymbol{\mu}) \stackrel{d}{=} \mathbf{x} - \boldsymbol{\mu}$ for all $\theta \in [0, 2\pi)$ (rotation of the complex plane leaves the distribution invariant).
Here $\boldsymbol{\mu} = \mathbb{E}[\mathbf{x}]$ and $\mathbf{C} = \mathbb{E}[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H]$ is the covariance matrix.
Consequence of circular symmetry: pseudo-covariance vanishes. The pseudo-covariance matrix (or relation matrix) is
$$\widetilde{\mathbf{C}} = \mathbb{E}\big[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\big].$$
For a circularly symmetric distribution, $\widetilde{\mathbf{C}} = \mathbf{0}$. This means the real and imaginary parts of $\mathbf{x}$ have equal covariances and complementary cross-covariances: if $\mathbf{x} - \boldsymbol{\mu} = \mathbf{x}_R + j\mathbf{x}_I$ (with $\mathbf{x}_R, \mathbf{x}_I$ real), then
$$\operatorname{Cov}(\mathbf{x}_R) = \operatorname{Cov}(\mathbf{x}_I) = \tfrac{1}{2}\operatorname{Re}(\mathbf{C}), \qquad \operatorname{Cov}(\mathbf{x}_I, \mathbf{x}_R) = -\operatorname{Cov}(\mathbf{x}_R, \mathbf{x}_I) = \tfrac{1}{2}\operatorname{Im}(\mathbf{C}).$$
PDF. For $\mathbf{x} \sim \mathcal{CN}(\boldsymbol{\mu}, \mathbf{C})$ with $\mathbf{C}$ positive definite, the PDF is
$$f_{\mathbf{x}}(\mathbf{u}) = \frac{1}{\pi^n \det \mathbf{C}} \exp\!\big(-(\mathbf{u} - \boldsymbol{\mu})^H \mathbf{C}^{-1} (\mathbf{u} - \boldsymbol{\mu})\big).$$
Note the normalising constant $\pi^n \det \mathbf{C}$ (not $(2\pi)^{n/2} (\det \mathbf{C})^{1/2}$) and the absence of the factor $\tfrac{1}{2}$ in the exponent, compared with the real Gaussian.
Conditional distribution. The complex version of the conditional Gaussian theorem holds with $(\cdot)^T$ replaced by $(\cdot)^H$: if $\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix}$ is jointly $\mathcal{CN}$, then
$$\mathbf{x} \mid \mathbf{y} = \mathbf{y}_0 \;\sim\; \mathcal{CN}\big(\boldsymbol{\mu}_{\mathbf{x}} + \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} (\mathbf{y}_0 - \boldsymbol{\mu}_{\mathbf{y}}),\; \mathbf{C}_{\mathbf{x}\mathbf{x}} - \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} \mathbf{C}_{\mathbf{y}\mathbf{x}}\big).$$
The $\mathcal{CN}$ distribution is the standard model for noise and channel vectors in wireless communications. Circular symmetry reflects the physical fact that the absolute carrier phase is uniformly distributed and unknown, so the joint statistics must be invariant to phase rotation.
A non-circularly-symmetric complex Gaussian (also called "improper" or "non-circular") has $\widetilde{\mathbf{C}} \ne \mathbf{0}$ and requires the full augmented description. Such signals arise in certain interference scenarios and in widely-linear processing.
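A sketch of how $\mathcal{CN}(\mathbf{0}, \mathbf{C})$ samples are typically generated (i.i.d. real and imaginary parts of variance $1/2$, coloured by a Cholesky factor; the covariance here is an arbitrary Hermitian positive definite example), with numerical checks of both covariances:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 3, 500_000

# An arbitrary Hermitian positive definite covariance.
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
C = B @ B.conj().T + n * np.eye(n)
L = np.linalg.cholesky(C)

# w ~ CN(0, I): real and imaginary parts i.i.d. with variance 1/2 each.
w = (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))) / np.sqrt(2)
x = L @ w                                            # x ~ CN(0, C)

print(np.allclose(x @ x.conj().T / N, C, atol=0.1))  # H-covariance ~ C
print(np.abs(x @ x.T / N).max())                     # pseudo-covariance ~ 0
```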
Bivariate Gaussian Density Explorer
Visualise the joint PDF of a bivariate Gaussian as a 3D surface or contour plot. Adjust the correlation coefficient $\rho$ to see how the elliptical contours rotate and stretch. When $\rho = 0$ the contours are axis-aligned (independent components); as $|\rho| \to 1$ the ellipse collapses onto a line (perfect linear dependence).
Example: MMSE Estimation via Conditional Gaussian
Consider the standard linear observation model
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$$
where $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{C}_{\mathbf{x}})$ is the transmitted vector, $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \mathbf{C}_{\mathbf{n}})$ is noise independent of $\mathbf{x}$, and $\mathbf{H}$ is a known (deterministic) channel matrix. Derive the minimum mean-square error (MMSE) estimate $\hat{\mathbf{x}} = \mathbb{E}[\mathbf{x} \mid \mathbf{y}]$.
Step 1: Establish joint Gaussianity
Since $\mathbf{x}$ and $\mathbf{n}$ are independent complex Gaussian vectors, and $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$ is an affine function of them, the stacked vector $\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix}$ is jointly $\mathcal{CN}$.
Step 2: Compute the required covariance blocks
$$\mathbf{C}_{\mathbf{x}\mathbf{y}} = \mathbb{E}[\mathbf{x}\mathbf{y}^H] = \mathbf{C}_{\mathbf{x}} \mathbf{H}^H, \qquad \mathbf{C}_{\mathbf{y}\mathbf{y}} = \mathbb{E}[\mathbf{y}\mathbf{y}^H] = \mathbf{H} \mathbf{C}_{\mathbf{x}} \mathbf{H}^H + \mathbf{C}_{\mathbf{n}}.$$
Step 3: Apply the conditional Gaussian formula
By the conditional Gaussian theorem (complex version):
$$\hat{\mathbf{x}} = \mathbb{E}[\mathbf{x} \mid \mathbf{y}] = \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} \mathbf{y} = \mathbf{C}_{\mathbf{x}} \mathbf{H}^H \big(\mathbf{H} \mathbf{C}_{\mathbf{x}} \mathbf{H}^H + \mathbf{C}_{\mathbf{n}}\big)^{-1} \mathbf{y}.$$
This is the MMSE estimator (also called the Wiener filter or Bayesian LMMSE estimator).
Step 4: MMSE error covariance
The estimation error covariance is the conditional covariance (Schur complement):
$$\mathbf{C}_e = \mathbf{C}_{\mathbf{x}} - \mathbf{C}_{\mathbf{x}} \mathbf{H}^H \big(\mathbf{H} \mathbf{C}_{\mathbf{x}} \mathbf{H}^H + \mathbf{C}_{\mathbf{n}}\big)^{-1} \mathbf{H} \mathbf{C}_{\mathbf{x}}.$$
Using the matrix inversion lemma, this can be rewritten as
$$\mathbf{C}_e = \big(\mathbf{C}_{\mathbf{x}}^{-1} + \mathbf{H}^H \mathbf{C}_{\mathbf{n}}^{-1} \mathbf{H}\big)^{-1},$$
which is often more convenient when $\mathbf{C}_{\mathbf{x}}$ is diagonal (as in i.i.d. signalling).
Key insight: The MMSE estimator and its error covariance both follow directly from the conditional Gaussian theorem; no calculus of variations or Lagrange multipliers is needed. The Gaussian distribution makes estimation a purely algebraic exercise.
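A sketch of the whole example in code (NumPy; the dimensions, noise variance $\sigma^2 = 0.1$, and unit-power i.i.d. symbols are illustrative assumptions), verifying that the two error-covariance forms agree and that the Monte Carlo MSE matches $\operatorname{tr}(\mathbf{C}_e)$:

```python
import numpy as np

rng = np.random.default_rng(5)
Nt, Nr, N = 2, 4, 200_000
sigma2 = 0.1                                   # noise variance (assumed)

H = rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))
Cx = np.eye(Nt)                                # i.i.d. unit-power symbols
Cn = sigma2 * np.eye(Nr)

x = (rng.standard_normal((Nt, N)) + 1j * rng.standard_normal((Nt, N))) / np.sqrt(2)
v = (rng.standard_normal((Nr, N)) + 1j * rng.standard_normal((Nr, N))) / np.sqrt(2)
y = H @ x + np.sqrt(sigma2) * v

# MMSE filter W = Cx H^H (H Cx H^H + Cn)^{-1}.
W = Cx @ H.conj().T @ np.linalg.inv(H @ Cx @ H.conj().T + Cn)
x_hat = W @ y

# Error covariance: Schur-complement form vs matrix-inversion-lemma form.
Ce1 = Cx - Cx @ H.conj().T @ np.linalg.inv(H @ Cx @ H.conj().T + Cn) @ H @ Cx
Ce2 = np.linalg.inv(np.linalg.inv(Cx) + H.conj().T @ np.linalg.inv(Cn) @ H)
print(np.allclose(Ce1, Ce2))                   # the two forms agree
print(np.real(np.trace(Ce1)),
      np.sum(np.abs(x - x_hat) ** 2) / N)      # MC mean-square error ~ trace
```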
Why This Matters: Why Channel Vectors Are Circularly Symmetric Complex Gaussian
In a MIMO wireless channel with $N_t$ transmit and $N_r$ receive antennas, the channel matrix $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ has entries $h_{ij}$ representing the complex gain from transmit antenna $j$ to receive antenna $i$. In a rich-scattering environment with no line-of-sight component, each $h_{ij}$ is the superposition of a large number of independent scattered paths:
$$h_{ij} = \sum_{k=1}^{K} a_k e^{j\phi_k},$$
where $a_k$ is the amplitude and $\phi_k$ the phase of the $k$-th path.
Why circularly symmetric? The phases $\phi_k$ are uniformly distributed on $[0, 2\pi)$ because the propagation distances are much larger than the carrier wavelength. By the central limit theorem (applied to the real and imaginary parts separately), $h_{ij}$ converges in distribution to a complex Gaussian. The uniform phase distribution ensures that $e^{j\theta} h_{ij} \stackrel{d}{=} h_{ij}$ for any fixed $\theta$, which is precisely circular symmetry. The pseudo-covariance vanishes: $\mathbb{E}[h_{ij}^2] = 0$.
Spatial correlation. The entries of $\mathbf{H}$ are generally correlated across antennas when the antenna spacing is small relative to the wavelength. The Kronecker model approximates the full channel covariance as
$$\mathbb{E}\big[\operatorname{vec}(\mathbf{H}) \operatorname{vec}(\mathbf{H})^H\big] = \mathbf{R}_{\mathrm{tx}}^T \otimes \mathbf{R}_{\mathrm{rx}},$$
where $\mathbf{R}_{\mathrm{rx}}$ and $\mathbf{R}_{\mathrm{tx}}$ are the receive and transmit spatial correlation matrices, and $\otimes$ is the Kronecker product (Section 1.7). Equivalently, $\mathbf{H} = \mathbf{R}_{\mathrm{rx}}^{1/2} \mathbf{H}_w \mathbf{R}_{\mathrm{tx}}^{1/2}$ with $\mathbf{H}_w$ having i.i.d. $\mathcal{CN}(0, 1)$ entries.
i.i.d. Rayleigh fading. When the antennas are sufficiently spaced (typically half a wavelength or more), $\mathbf{R}_{\mathrm{rx}} \approx \mathbf{I}$ and $\mathbf{R}_{\mathrm{tx}} \approx \mathbf{I}$, giving the i.i.d. model $h_{ij} \sim \mathcal{CN}(0, 1)$ independently. This is the standard benchmark for MIMO capacity analysis.
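A sketch of the Kronecker model in code (the exponential correlation profiles $[\mathbf{R}]_{ik} = \rho^{|i-k|}$ are a common modelling assumption, used here purely for illustration), checking the covariance of $\operatorname{vec}(\mathbf{H})$ empirically:

```python
import numpy as np

rng = np.random.default_rng(6)
Nr, Nt, rho_r, rho_t = 4, 2, 0.7, 0.5

# Exponential correlation profiles (a common modelling assumption).
Rrx = rho_r ** np.abs(np.subtract.outer(np.arange(Nr), np.arange(Nr)))
Rtx = rho_t ** np.abs(np.subtract.outer(np.arange(Nt), np.arange(Nt)))

def sqrtm_psd(R):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    lam, U = np.linalg.eigh(R)
    return U @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ U.T

# One correlated realisation: H = Rrx^{1/2} Hw Rtx^{1/2}.
Hw = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
H = sqrtm_psd(Rrx) @ Hw @ sqrtm_psd(Rtx)

# Empirical check: cov(vec(H)) -> Rtx^T kron Rrx over many draws,
# using vec(A B C) = (C^T kron A) vec(B).
N = 100_000
A = np.kron(sqrtm_psd(Rtx).T, sqrtm_psd(Rrx))
w = (rng.standard_normal((Nr * Nt, N)) + 1j * rng.standard_normal((Nr * Nt, N))) / np.sqrt(2)
h = A @ w
print(np.allclose(h @ h.conj().T / N, np.kron(Rtx.T, Rrx), atol=0.05))
```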
See full treatment in Chapter 6, Section 3.
Quick Check
Let $\mathbf{x}$ be a random vector with covariance matrix $\mathbf{C}_{\mathbf{x}}$. Which of the following is guaranteed to hold?
$\mathbf{C}_{\mathbf{x}}$ is positive definite
$\mathbf{C}_{\mathbf{x}}$ is positive semidefinite
$\mathbf{C}_{\mathbf{x}}$ is diagonal
All eigenvalues of $\mathbf{C}_{\mathbf{x}}$ are strictly positive
By Theorem thm-covariance-psd, the covariance matrix is always positive semidefinite: $\mathbf{v}^H \mathbf{C}_{\mathbf{x}} \mathbf{v} \ge 0$ for all $\mathbf{v}$. It is positive definite only if no nontrivial linear combination of the components is degenerate (deterministic). For instance, if $x_2 = x_1$ deterministically, then $\mathbf{C}_{\mathbf{x}}$ is PSD but singular (not PD).
Quick Check
Let $\begin{pmatrix} x \\ y \end{pmatrix} \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$ with $\mathbf{C} = \begin{pmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}$. What is $\mathbb{E}[x \mid y]$?
By the conditional Gaussian formula, $\mathbb{E}[x \mid y] = \rho \frac{\sigma_x}{\sigma_y}\, y$. The conditional mean shifts linearly with $y$, with regression coefficient $C_{xy} C_{yy}^{-1} = \rho \sigma_x / \sigma_y$.
Quick Check
Let $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{C})$. What is the pseudo-covariance $\widetilde{\mathbf{C}} = \mathbb{E}[\mathbf{x}\mathbf{x}^T]$?
$\mathbf{C}$
$\mathbf{C}^T$
$\mathbf{0}$
Not well-defined
Circular symmetry implies $e^{j\theta}\mathbf{x} \stackrel{d}{=} \mathbf{x}$ for all $\theta$, hence $\widetilde{\mathbf{C}} = \mathbb{E}[(e^{j\theta}\mathbf{x})(e^{j\theta}\mathbf{x})^T] = e^{j2\theta}\,\widetilde{\mathbf{C}}$. The only matrix satisfying $\widetilde{\mathbf{C}} = e^{j2\theta}\,\widetilde{\mathbf{C}}$ for all $\theta$ is $\widetilde{\mathbf{C}} = \mathbf{0}$. This vanishing pseudo-covariance is the hallmark of circular symmetry.
Random Vector
A measurable function $\mathbf{x}: \Omega \to \mathbb{R}^n$ (or $\mathbb{C}^n$) whose components are random variables. Extends the scalar random variable concept to the multivariate setting needed for MIMO and multi-sensor processing.
Related: Random Vector, Random Variable
Covariance Matrix
For a random vector $\mathbf{x}$ with mean $\boldsymbol{\mu}$, the matrix $\mathbf{C}_{\mathbf{x}} = \mathbb{E}[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H]$. It is Hermitian and positive semidefinite. Diagonal entries are variances; off-diagonal entries are covariances between components. Encodes the second-order correlation structure.
Related: Correlation Matrix and Covariance Matrix, The Covariance Matrix Is Positive Semidefinite
Multivariate Gaussian Distribution
A distribution on $\mathbb{R}^n$ parameterised by mean $\boldsymbol{\mu}$ and covariance $\mathbf{C}$, with PDF proportional to $\exp\!\big(-\tfrac{1}{2}(\mathbf{u} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{u} - \boldsymbol{\mu})\big)$. Marginals, conditionals, and affine transformations of Gaussians are Gaussian. Uncorrelated jointly Gaussian components are independent. The workhorse distribution for MIMO signal processing.
Related: Multivariate Gaussian Distribution, Conditional Distribution of Jointly Gaussian Vectors
Circularly Symmetric (Complex Gaussian)
A complex random vector is circularly symmetric if $e^{j\theta}\mathbf{x} \stackrel{d}{=} \mathbf{x}$ for all $\theta$. Equivalently (for zero-mean Gaussian vectors), the pseudo-covariance vanishes. The standard model for wireless channel gains and additive noise, written $\mathbf{x} \sim \mathcal{CN}(\mathbf{0}, \mathbf{C})$.
Related: Circularly Symmetric Complex Gaussian Distribution, Why Channel Vectors Are Circularly Symmetric Complex Gaussian
Schur Complement
Given a block matrix $\mathbf{M} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix}$ with $\mathbf{D}$ invertible, the Schur complement of $\mathbf{D}$ in $\mathbf{M}$ is $\mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{C}$. In the conditional Gaussian theorem, the Schur complement of $\mathbf{C}_{\mathbf{y}\mathbf{y}}$ gives the conditional covariance $\mathbf{C}_{\mathbf{x}|\mathbf{y}}$. Also central to block matrix inversion and determinant identities.
Related: Conditional Distribution of Jointly Gaussian Vectors, MMSE Estimation via Conditional Gaussian
Common Mistake: Forgetting Hermitian Transpose ($(\cdot)^H$) vs Transpose ($(\cdot)^T$) in Complex Covariance
Mistake:
Writing the covariance matrix of a complex random vector as $\mathbb{E}[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$, using the ordinary transpose $(\cdot)^T$ instead of the Hermitian (conjugate) transpose $(\cdot)^H$.
Correction:
For complex random vectors, the correct definition is
$$\mathbf{C}_{\mathbf{x}} = \mathbb{E}\big[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^H\big],$$
where $(\cdot)^H$ denotes the conjugate transpose. Using $(\cdot)^T$ instead of $(\cdot)^H$ gives the pseudo-covariance matrix $\widetilde{\mathbf{C}}$, which is a completely different object (and equals zero for circularly symmetric vectors).
Consequences of the error:
- $\mathbb{E}[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$ is not Hermitian in general; the resulting "covariance" would not be Hermitian PSD.
- The complex Gaussian PDF formula uses $\mathbf{C}^{-1}$ with the $H$-defined covariance. Substituting the $T$-version gives a nonsensical density.
- All downstream results (MMSE estimators, capacity formulas, beamformer designs) will be incorrect.
Mnemonic: In the real case, $\mathbf{x}^H = \mathbf{x}^T$, so the distinction vanishes. The moment you work with complex signals (which is always in baseband communications), switch to $(\cdot)^H$.
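A three-line sketch of the failure mode on circularly symmetric data: the $T$-version collapses to (approximately) zero, while the $H$-version recovers the true covariance.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 200_000
# x ~ CN(0, I): the identity is the true covariance.
x = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) / np.sqrt(2)

C_good = x @ x.conj().T / N   # Hermitian covariance -> ~identity
C_bad = x @ x.T / N           # pseudo-covariance -> ~zero matrix

print(np.round(C_good, 2))    # ~[[1, 0], [0, 1]]
print(np.round(C_bad, 2))     # ~[[0, 0], [0, 0]]
```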
Numerical Stability When Inverting Covariance Matrices
The conditional Gaussian formula and the MMSE estimator both require inverting the covariance matrix $\mathbf{C}_{\mathbf{y}\mathbf{y}}$. In practice, covariance matrices estimated from finite samples are often ill-conditioned, especially in massive MIMO systems where $N_r$ or $N_t$ can exceed 64.
Key issues:
- Condition number: If the condition number $\kappa(\mathbf{C}) = \lambda_{\max}/\lambda_{\min}$ is large (common in correlated channels), direct inversion via LU or Cholesky can amplify roundoff errors by a factor of order $\kappa(\mathbf{C})$ in double precision (64-bit IEEE 754, about 16 significant digits).
- Sample deficiency: When the number of samples $N$ is smaller than $n$ (the dimension), the sample covariance matrix is rank-deficient and singular. This occurs routinely in pilot-limited systems.
Practical remedies:
- Diagonal loading (Tikhonov regularisation): Replace $\widehat{\mathbf{C}}$ with $\widehat{\mathbf{C}} + \epsilon \mathbf{I}$ where $\epsilon > 0$. This bounds $\kappa(\widehat{\mathbf{C}} + \epsilon \mathbf{I}) \le (\lambda_{\max} + \epsilon)/\epsilon$ (see the code sketch below).
- Cholesky factorisation: Compute $\mathbf{C} = \mathbf{L}\mathbf{L}^H$ and solve linear systems via forward/back-substitution instead of forming $\mathbf{C}^{-1}$ explicitly. Cost: about $n^3/3$ flops vs roughly $2n^3$ for an explicit inverse.
- Eigenvalue truncation: Discard eigenvalues below a threshold (e.g., a small fraction of $\lambda_{\max}$) and invert only the dominant subspace.
- Woodbury identity: For low-rank updates (common in signal + noise models), the Woodbury formula gives $\mathbf{C}^{-1}$ in $O(n^2 r)$ operations instead of $O(n^3)$, where $r$ is the rank of the update.
In 5G NR, channel estimation uses LMMSE with regularised covariance inversion at every coherence interval (on the order of 1 ms at 30 kHz SCS). At 64 antennas, this means inverting a $64 \times 64$ complex matrix every millisecond per user; numerical stability is not academic.
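A sketch of diagonal loading combined with a Cholesky solve (using SciPy's `cho_factor`/`cho_solve`; the loading level and the ill-conditioned test matrix are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(8)
n = 64

# An ill-conditioned Hermitian PSD matrix: low-rank signal + tiny noise floor.
U = rng.standard_normal((n, 4)) + 1j * rng.standard_normal((n, 4))
C = U @ U.conj().T + 1e-12 * np.eye(n)
print(f"cond before loading: {np.linalg.cond(C):.2e}")

# Diagonal loading, scaled relative to the average power on the diagonal.
eps = 1e-3 * np.real(np.trace(C)) / n
C_loaded = C + eps * np.eye(n)
print(f"cond after loading:  {np.linalg.cond(C_loaded):.2e}")

# Solve C_loaded w = y by Cholesky instead of forming the inverse.
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
w = cho_solve(cho_factor(C_loaded), y)
print(np.allclose(C_loaded @ w, y))
```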
- Double-precision floating point limits usable condition numbers to roughly $10^{15}$.
- The sample covariance requires $N \ge n$ samples for full rank.
- The 5G NR coherence time constrains the computation budget to roughly 1 ms.
Key Takeaway
The central message of this section in three points:
- Random vectors extend scalar probability to MIMO. The covariance matrix $\mathbf{C}_{\mathbf{x}}$, Hermitian and positive semidefinite by construction, encodes all pairwise second-order statistics. The eigendecomposition of $\mathbf{C}_{\mathbf{x}}$ reveals the principal axes of randomness, directly connecting to Chapter 1's spectral theory.
- The conditional Gaussian theorem is the master tool. For jointly Gaussian vectors, conditioning produces another Gaussian whose parameters are given explicitly by the Schur complement: $\boldsymbol{\mu}_{\mathbf{x}|\mathbf{y}} = \boldsymbol{\mu}_{\mathbf{x}} + \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} (\mathbf{y}_0 - \boldsymbol{\mu}_{\mathbf{y}})$ and $\mathbf{C}_{\mathbf{x}|\mathbf{y}} = \mathbf{C}_{\mathbf{x}\mathbf{x}} - \mathbf{C}_{\mathbf{x}\mathbf{y}} \mathbf{C}_{\mathbf{y}\mathbf{y}}^{-1} \mathbf{C}_{\mathbf{y}\mathbf{x}}$. This single result underlies the MMSE estimator, the Kalman filter, and MIMO capacity computation.
- Circular symmetry is the bridge to wireless. The $\mathcal{CN}$ distribution, with its vanishing pseudo-covariance and phase-rotation invariance, is the natural model for channel vectors in rich-scattering environments. All of the real Gaussian machinery (marginals, conditionals, affine closure) carries over, with $(\cdot)^T$ replaced by $(\cdot)^H$ throughout.