Joint PMFs and PDFs

Why Joint Distributions?

In Chapters 5 and 6 we studied a single random variable at a time. But in virtually every engineering problem, multiple quantities interact: the signal and the noise, the channel gain and the interference, the transmit power and the received SNR. To reason about how two or more random variables relate to each other (whether they are dependent, how one conditions the other, what happens when we add or transform them), we need the joint distribution.

The marginal distributions $f_X$ and $f_Y$ alone do not determine the joint distribution $f_{X,Y}$. The joint distribution is a strictly richer object: it encodes all marginals, all conditionals, and all dependence structure.

Definition: Joint Cumulative Distribution Function

Let $X$ and $Y$ be random variables defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The joint CDF of $(X, Y)$ is the function $F_{X,Y} : \mathbb{R}^2 \to [0,1]$ defined by

$$F_{X,Y}(x, y) = \mathbb{P}(X \le x,\; Y \le y).$$

The joint CDF is the fundamental object from which all other joint distributional quantities are derived. It always exists, regardless of whether the RVs are discrete, continuous, or mixed.
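Numerically, the joint CDF at a point is just the long-run fraction of sampled pairs that land in the lower-left quadrant anchored at $(x, y)$. A minimal sketch (the choice of two independent uniform random variables and the evaluation point $(0.3, 0.7)$ are arbitrary, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
x_samp = rng.uniform(0, 1, size=n)  # X ~ Uniform(0, 1)
y_samp = rng.uniform(0, 1, size=n)  # Y ~ Uniform(0, 1), independent of X

x, y = 0.3, 0.7
F_hat = np.mean((x_samp <= x) & (y_samp <= y))  # empirical P(X <= x, Y <= y)
print(F_hat, x * y)  # both ~0.21: for independent uniforms, F(x, y) = x * y
```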

Theorem: Properties of the Joint CDF

Let $F_{X,Y}(x,y)$ be the joint CDF of $(X,Y)$. Then:

  1. Limits: $\lim_{x \to -\infty} F_{X,Y}(x,y) = 0$ for all $y$, $\lim_{y \to -\infty} F_{X,Y}(x,y) = 0$ for all $x$, and $\lim_{x,y \to +\infty} F_{X,Y}(x,y) = 1$.

  2. Monotonicity: $F_{X,Y}$ is non-decreasing in each argument.

  3. Right-continuity: $F_{X,Y}$ is right-continuous in each argument.

  4. Marginals: $F_X(x) = \lim_{y \to \infty} F_{X,Y}(x,y)$ and $F_Y(y) = \lim_{x \to \infty} F_{X,Y}(x,y)$.

  5. Rectangle probability: For $a < b$ and $c < d$, $\mathbb{P}(a < X \le b,\; c < Y \le d) = F_{X,Y}(b,d) - F_{X,Y}(a,d) - F_{X,Y}(b,c) + F_{X,Y}(a,c) \ge 0$.
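Property 5 doubles as a computational recipe: any joint-interval probability can be read off from four CDF evaluations. A minimal numerical check (a sketch assuming SciPy is available; the bivariate Gaussian, its correlation, and the rectangle corners are arbitrary illustrative choices):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative joint distribution: zero-mean bivariate Gaussian, correlation 0.6.
mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.6], [0.6, 1.0]])
F = lambda x, y: mvn.cdf([x, y])  # joint CDF F_{X,Y}(x, y)

a, b, c, d = -0.5, 1.0, 0.0, 1.5  # rectangle (a, b] x (c, d]
rect = F(b, d) - F(a, d) - F(b, c) + F(a, c)

# Monte Carlo estimate of P(a < X <= b, c < Y <= d) for comparison.
xy = mvn.rvs(size=200_000, random_state=0)
mc = np.mean((xy[:, 0] > a) & (xy[:, 0] <= b) & (xy[:, 1] > c) & (xy[:, 1] <= d))

print(f"CDF identity: {rect:.4f}, Monte Carlo: {mc:.4f}")  # should agree closely
```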

Definition: Joint Probability Mass Function

Let $X$ and $Y$ be discrete random variables with supports $\mathcal{X} = \{x_1, x_2, \ldots\}$ and $\mathcal{Y} = \{y_1, y_2, \ldots\}$. The joint PMF is

$$P_{X,Y}(x_i, y_j) = \mathbb{P}(X = x_i,\; Y = y_j),$$

satisfying $P_{X,Y}(x_i, y_j) \ge 0$ for all $i, j$ and $\sum_i \sum_j P_{X,Y}(x_i, y_j) = 1$.

The joint PMF can be displayed as a table (or matrix) indexed by the values of $X$ and $Y$. Row sums give the marginal $P_X$; column sums give the marginal $P_Y$.

Definition: Marginal PMF from Joint PMF

Given the joint PMF $P_{X,Y}$, the marginal PMFs are obtained by summing over the other variable:

$$P_X(x_i) = \sum_j P_{X,Y}(x_i, y_j), \qquad P_Y(y_j) = \sum_i P_{X,Y}(x_i, y_j).$$

The marginal PMFs are proper PMFs: each is non-negative and sums to 1.
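In matrix form this bookkeeping reduces to row and column sums. A minimal sketch (the $2 \times 3$ joint PMF below is made up for illustration):

```python
import numpy as np

# Rows index the values of X, columns index the values of Y.
P = np.array([[0.10, 0.20, 0.05],
              [0.30, 0.25, 0.10]])
assert np.isclose(P.sum(), 1.0)  # a valid joint PMF sums to 1

P_X = P.sum(axis=1)  # marginal of X: sum each row over the y_j
P_Y = P.sum(axis=0)  # marginal of Y: sum each column over the x_i

print(P_X, P_X.sum())  # [0.35 0.65] 1.0
print(P_Y, P_Y.sum())  # [0.4 0.45 0.15] 1.0
```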

Example: Joint PMF for the Weather in Two Cities

Let $X$ and $Y$ denote the weather in Los Angeles and San Francisco, respectively, where $0$ = sunny and $1$ = cloudy. The joint PMF is given by the table:

X \ Y    0      1
  0     0.2    0.5
  1     0.1    0.2

Find the marginal PMFs and compute $\mathbb{P}(\text{at least one city is sunny})$.
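Solution sketch: row sums of the table give the marginal of $X$ and column sums give the marginal of $Y$:

$$P_X(0) = 0.2 + 0.5 = 0.7, \quad P_X(1) = 0.1 + 0.2 = 0.3, \qquad P_Y(0) = 0.2 + 0.1 = 0.3, \quad P_Y(1) = 0.5 + 0.2 = 0.7.$$

The event "at least one city is sunny" is the complement of "both cities are cloudy", so

$$\mathbb{P}(\text{at least one city is sunny}) = 1 - P_{X,Y}(1,1) = 1 - 0.2 = 0.8.$$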

Definition: Joint Probability Density Function

Two random variables $X$ and $Y$ are jointly continuous if their joint CDF can be expressed as

$$F_{X,Y}(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u,v)\,dv\,du$$

for some non-negative function $f_{X,Y}$ called the joint probability density function. Equivalently,

$$f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\,\partial y}$$

wherever the mixed partial derivative exists.

The joint PDF satisfies $f_{X,Y}(x,y) \ge 0$ and $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx\,dy = 1$.

The value $f_{X,Y}(x,y)$ is not a probability; it is a density. The probability of $(X,Y)$ falling in a region $A$ is $\mathbb{P}((X,Y) \in A) = \iint_A f_{X,Y}(x,y)\,dx\,dy$.
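When the region or the density makes hand integration awkward, numerical integration works. A sketch (assuming SciPy; the joint PDF $f_{X,Y}(x,y) = e^{-x-y}$ on the positive quadrant, i.e. two independent $\mathrm{Exp}(1)$ variables, and the region $\{x + y \le 1\}$ are illustrative choices):

```python
import numpy as np
from scipy.integrate import dblquad

# Illustrative joint PDF: f(x, y) = exp(-x - y) for x, y >= 0.
f = lambda y, x: np.exp(-x - y)  # dblquad expects func(y, x): y is the inner variable

# Normalization check: the integral over the whole support should be 1.
total, _ = dblquad(f, 0, np.inf, 0, np.inf)

# P(X + Y <= 1): x runs over [0, 1]; for each x, y runs over [0, 1 - x].
p, _ = dblquad(f, 0, 1, 0, lambda x: 1 - x)

print(f"normalization = {total:.4f}")  # ~1.0000
print(f"P(X + Y <= 1) = {p:.4f}")      # ~0.2642, i.e. 1 - 2/e
```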

Definition: Marginal PDF from Joint PDF

Given the joint PDF $f_{X,Y}$, the marginal PDFs are

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx.$$

Geometrically, the marginal $f_X(x)$ is obtained by integrating ("projecting") the joint density along the $y$-axis.
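The same projection can be done numerically. A sketch reusing the illustrative exponential density from above (the evaluation point $x = 0.5$ is arbitrary):

```python
import numpy as np
from scipy.integrate import quad

f = lambda x, y: np.exp(-x - y)  # illustrative joint PDF on x, y >= 0

def f_X(x):
    # Marginal of X at x: integrate the joint density over all y in the support.
    val, _ = quad(lambda y: f(x, y), 0, np.inf)
    return val

print(f_X(0.5), np.exp(-0.5))  # both ~0.6065: here the marginal is Exp(1)
```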

Example: Bivariate Uniform on a Triangle

Let $(X, Y)$ be uniformly distributed on the triangle $\{(x,y) : 0 \le x \le 1,\; 0 \le y \le x\}$, which has area $1/2$. Find the joint PDF, the marginal PDFs, and $\mathbb{P}(Y > X/2)$.
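Solution sketch: a uniform density must be constant on its support and integrate to $1$, so $f_{X,Y}(x,y) = 1/(1/2) = 2$ on the triangle and $0$ elsewhere. Integrating out the other variable gives the marginals

$$f_X(x) = \int_0^x 2\,dy = 2x, \quad x \in [0,1], \qquad f_Y(y) = \int_y^1 2\,dx = 2(1-y), \quad y \in [0,1].$$

Within the triangle, the event $\{Y > X/2\}$ is the strip $x/2 < y \le x$, so

$$\mathbb{P}(Y > X/2) = \int_0^1 \int_{x/2}^{x} 2\,dy\,dx = \int_0^1 x\,dx = \frac{1}{2}.$$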

Key Terms

Joint probability density function: a non-negative function $f_{X,Y}(x,y)$ whose double integral over any region $A \subseteq \mathbb{R}^2$ gives $\mathbb{P}((X,Y) \in A)$.

Marginal distribution: the distribution of a single random variable obtained from a joint distribution by integrating (or summing) over all other variables.

[Animation] Joint PDF and Marginal Projections: how the marginal densities $f_X(x)$ and $f_Y(y)$ arise as projections (integrals) of the joint density $f_{X,Y}(x,y)$ along each axis.

[Interactive] Joint PDF Contour Plot with Marginals: the joint density of a bivariate Gaussian with adjustable means, variances, and correlation coefficient $\rho$; the marginal densities are displayed on the side panels.


Common Mistake: Marginals Do Not Determine the Joint Distribution

Mistake:

Assuming that knowing $f_X$ and $f_Y$ is enough to determine $f_{X,Y}$.

Correction:

Infinitely many joint distributions share the same marginals. The joint distribution encodes the dependence structure between $X$ and $Y$, which the marginals alone cannot capture. For instance, two standard Gaussian marginals can be paired with any correlation $\rho \in [-1, 1]$ to produce a different bivariate Gaussian distribution.
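The Gaussian case is easy to see numerically: both samples below have $\mathcal{N}(0,1)$ marginals, yet they assign very different probabilities to joint events. A sketch (the correlations $0$ and $0.9$ and the event $\{X > 1, Y > 1\}$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

for rho in (0.0, 0.9):  # two different dependence structures, same marginals
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    p11 = np.mean((x > 1) & (y > 1))  # a joint event: its probability depends on rho
    print(f"rho={rho}: std(X)={x.std():.3f}, std(Y)={y.std():.3f}, "
          f"P(X>1, Y>1)={p11:.3f}")
```

With $\rho = 0$ the joint tail probability is about $0.159^2 \approx 0.025$, while with $\rho = 0.9$ it is several times larger, even though the marginals are identical.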

Quick Check

If $f_{X,Y}(x,y) = 6(1-y)$ for $0 \le x \le y \le 1$ and zero otherwise, what is $f_X(x)$ for $x \in [0,1]$?

$3(1-x)^2$

$6(1-x)$

$3x(1-x)$

$6x$

Historical Note: The Origins of Multivariate Distributions

1880s–1933

The study of joint distributions began in earnest with Francis Galton's work on regression and correlation in the 1880s. Galton noticed that the heights of fathers and sons formed an elliptical scatter pattern, the hallmark of a bivariate Gaussian. Karl Pearson formalized this observation into the multivariate normal distribution and introduced the correlation coefficient $\rho$ that we still use today. The generalization to arbitrary joint distributions, via the joint CDF, came later with Kolmogorov's axiomatization of probability in 1933.

Key Takeaway

The joint distribution $f_{X,Y}$ determines the marginals $f_X$ and $f_Y$ (by integration), but the converse is false. The joint distribution is a strictly richer object that encodes the full dependence structure between the random variables.