The Moment Generating Function

Why Transform Methods?

We have spent several chapters learning to compute with distributions directly: PDFs, CDFs, convolutions. This works well for one or two random variables, but the moment you need the distribution of a sum $S_n = X_1 + \cdots + X_n$ of independent random variables, you face an $n$-fold convolution that becomes unwieldy even for $n = 3$.

Transform methods offer an elegant alternative: encode the distribution into a single function, and the convolution becomes a product. This chapter develops three transforms (the MGF, the characteristic function, and the PGF), each suited to different contexts. Together, they form the analytical backbone that powers the great limit theorems: the law of large numbers and the central limit theorem.

Definition:

Moment Generating Function (MGF)

Let $X$ be a random variable with CDF $F(x)$. The moment generating function (MGF) of $X$ is

$$M_X(t) = \mathbb{E}[e^{tX}] = \int_{-\infty}^{\infty} e^{tx}\,dF(x), \qquad t \in \mathbb{R}.$$

The MGF may be finite only on a subset of $\mathbb{R}$. We say that the MGF exists if there is an open interval $(-a, a)$ with $a > 0$ such that $M_X(t) < \infty$ for all $|t| < a$.

The Riemann-Stieltjes formulation $\int e^{tx}\,dF(x)$ unifies the discrete case ($\sum_x e^{tx} p_X(x)$) and the continuous case ($\int e^{tx} f(x)\,dx$).
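As a quick numerical illustration of the discrete case, the truncated series $\sum_k e^{tk}\,p_X(k)$ for a Poisson pmf matches the closed form $e^{\lambda(e^t - 1)}$. A small sketch (the function name is ours):

```python
import math

# Sketch: evaluate the discrete-case MGF sum for X ~ Poisson(lam) by
# truncating sum_k e^{t k} * e^{-lam} * lam^k / k!  (terms decay fast,
# so a modest kmax suffices; each term is built from the previous one).
def poisson_mgf_series(lam, t, kmax=100):
    term = math.exp(-lam)          # k = 0 term: e^{-lam}
    total = term
    ratio = lam * math.exp(t)      # term_k = term_{k-1} * ratio / k
    for k in range(1, kmax + 1):
        term *= ratio / k
        total += term
    return total

lam, t = 2.0, 0.5
closed_form = math.exp(lam * (math.exp(t) - 1.0))
print(poisson_mgf_series(lam, t), closed_form)  # agree to many digits
```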

Moment Generating Function (MGF)

The function $M_X(t) = \mathbb{E}[e^{tX}]$, which encodes all moments of $X$ through its Taylor coefficients: $\mathbb{E}[X^k] = M_X^{(k)}(0)$.

Related: Characteristic Function, Probability Generating Function (PGF)

MGF as Bilateral Laplace Transform

For a continuous random variable with PDF $f(x)$, the MGF is the bilateral Laplace transform of the density evaluated at $s = -t$:

$$M_X(t) = \mathcal{L}[f](s)\big|_{s = -t}, \qquad \mathcal{L}[f](s) = \int_{-\infty}^{\infty} e^{-sx}f(x)\,dx.$$

This connection to the Laplace transform is why the MGF inherits all the algebraic machinery of transform calculus, in particular the conversion of convolution to multiplication.

Theorem: Moments from Derivatives of the MGF

If $M_X(t) < \infty$ for $|t| < a$ with $a > 0$, then all moments of $X$ exist and

$$\mathbb{E}[X^k] = \frac{d^k}{dt^k}M_X(t)\bigg|_{t=0} = M_X^{(k)}(0), \qquad k = 1, 2, 3, \ldots$$

Moreover, $M_X(t)$ admits the Taylor expansion

$$M_X(t) = \sum_{k=0}^{\infty} \frac{\mathbb{E}[X^k]}{k!}\,t^k$$

for $|t| < a$.

Differentiate under the integral: $\frac{d}{dt}\mathbb{E}[e^{tX}] = \mathbb{E}[X e^{tX}]$, and evaluate at $t = 0$ to extract moments one at a time.
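The theorem can be checked numerically by finite-differencing a known MGF at $t = 0$. A minimal sketch using the exponential MGF $\lambda/(\lambda - t)$ (the step size $h$ is an ad hoc accuracy/roundoff trade-off):

```python
import math

lam = 3.0

def M(t):
    # MGF of Exp(lam); finite only for t < lam
    return lam / (lam - t)

h = 1e-4
# central differences approximate the first two derivatives at t = 0
m1 = (M(h) - M(-h)) / (2 * h)              # ~ M'(0)  = E[X]   = 1/lam
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2    # ~ M''(0) = E[X^2] = 2/lam^2
print(m1, m2)  # ~0.3333, ~0.2222
```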

Theorem: MGF of a Sum of Independent Random Variables

If $X$ and $Y$ are independent random variables whose MGFs exist, then

$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t).$$

More generally, if $X_1, \ldots, X_n$ are independent, then

$$M_{S_n}(t) = \prod_{i=1}^n M_{X_i}(t), \qquad S_n = \sum_{i=1}^n X_i.$$

The exponential converts a sum into a product: $e^{t(X+Y)} = e^{tX} \cdot e^{tY}$. Independence then factors the expectation: $\mathbb{E}[e^{tX} e^{tY}] = \mathbb{E}[e^{tX}]\,\mathbb{E}[e^{tY}]$.

Key Takeaway

The MGF converts the convolution of densities into a product of functions. This is the single most important algebraic property of transforms: it reduces the problem of finding the distribution of a sum to multiplying known functions and inverting.
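The correspondence is easy to verify numerically: convolve two Poisson pmfs directly, then compare the MGF of the convolution against the product of the individual MGFs. A sketch (truncation at `K` terms is safe here because the Poisson tails are negligible):

```python
import math

def poisson_pmf(lam, kmax):
    # pmf values p(0..kmax) for Poisson(lam), built term by term
    term = math.exp(-lam)
    pmf = [term]
    for k in range(1, kmax + 1):
        term *= lam / k
        pmf.append(term)
    return pmf

lam, mu, K = 1.5, 2.5, 80
px, py = poisson_pmf(lam, K), poisson_pmf(mu, K)

# pmf of Z = X + Y by direct convolution
pz = [sum(px[j] * py[k - j] for j in range(k + 1)) for k in range(K + 1)]

def mgf(pmf, t):
    return sum(math.exp(t * k) * p for k, p in enumerate(pmf))

t = 0.3
print(mgf(pz, t), mgf(px, t) * mgf(py, t))  # agree: e^{(lam+mu)(e^t - 1)}
```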

Example: MGF of the Gaussian Distribution

Let $X \sim \mathcal{N}(\mu, \sigma^2)$. Compute $M_X(t)$.
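A worked sketch of the standard complete-the-square computation: combining the exponents gives $tx - \frac{(x-\mu)^2}{2\sigma^2} = \mu t + \frac{\sigma^2 t^2}{2} - \frac{(x - (\mu + \sigma^2 t))^2}{2\sigma^2}$, and the remaining integral is a normal density that integrates to $1$:

```latex
\begin{aligned}
M_X(t) &= \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sqrt{2\pi\sigma^2}}\,
          e^{-(x-\mu)^2/(2\sigma^2)}\,dx \\
       &= e^{\mu t + \sigma^2 t^2/2}
          \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\,
          e^{-\left(x-(\mu+\sigma^2 t)\right)^2/(2\sigma^2)}\,dx
        = e^{\mu t + \sigma^2 t^2/2}.
\end{aligned}
```

The integral converges for every $t \in \mathbb{R}$, so the Gaussian MGF exists on all of $\mathbb{R}$.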

Example: MGF of the Exponential Distribution

Let $X \sim \text{Exp}(\lambda)$ with PDF $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$. Find $M_X(t)$ and identify its domain.
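A worked sketch: the integrand decays only when the combined exponent $-(\lambda - t)x$ is negative, i.e. for $t < \lambda$:

```latex
M_X(t) = \int_0^{\infty} e^{tx}\,\lambda e^{-\lambda x}\,dx
       = \lambda \int_0^{\infty} e^{-(\lambda - t)x}\,dx
       = \frac{\lambda}{\lambda - t}, \qquad t < \lambda.
```

For $t \geq \lambda$ the integral diverges, so the domain of finiteness is $(-\infty, \lambda)$.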

MGFs of Common Distributions

| Distribution | Parameters | $M_X(t)$ | Domain |
| --- | --- | --- | --- |
| $\text{Bernoulli}(p)$ | $p \in (0,1)$ | $1 - p + pe^t$ | $\mathbb{R}$ |
| $\text{Bin}(n, p)$ | $n \in \mathbb{N},\; p \in (0,1)$ | $(1 - p + pe^t)^n$ | $\mathbb{R}$ |
| $\text{Poi}(\lambda)$ | $\lambda > 0$ | $e^{\lambda(e^t - 1)}$ | $\mathbb{R}$ |
| $\text{Exp}(\lambda)$ | $\lambda > 0$ | $\frac{\lambda}{\lambda - t}$ | $t < \lambda$ |
| $\text{Gamma}(\alpha, \beta)$ | $\alpha, \beta > 0$ | $\left(\frac{\beta}{\beta - t}\right)^\alpha$ | $t < \beta$ |
| $\mathcal{N}(\mu, \sigma^2)$ | $\mu \in \mathbb{R},\; \sigma^2 > 0$ | $e^{\mu t + \sigma^2 t^2/2}$ | $\mathbb{R}$ |
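An entry like the exponential row can be sanity-checked by numerically integrating $\int_0^\infty e^{tx}\,\lambda e^{-\lambda x}\,dx$. A sketch using a plain trapezoidal rule (the cutoff and step count are ad hoc choices):

```python
import math

lam, t = 2.0, 0.7            # need t < lam for convergence
upper, n = 40.0, 200000      # e^{-lam x} is negligible beyond x = 40
dx = upper / n

# tabulate the integrand e^{t x} * lam * e^{-lam x} on a uniform grid
vals = [math.exp(t * (i * dx)) * lam * math.exp(-lam * (i * dx))
        for i in range(n + 1)]
numeric = dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoid rule
print(numeric, lam / (lam - t))  # both ~1.5385
```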

Common Mistake: The MGF Does Not Always Exist

Mistake:

Assuming that $M_X(t)$ is finite for all $t$ (or even for any $t \neq 0$) without checking. For example, the Cauchy distribution has $M_X(t) = \infty$ for all $t \neq 0$.

Correction:

Always verify the domain of finiteness before using the MGF. Heavy-tailed distributions (Cauchy, Pareto with small shape parameter, log-normal) have MGFs that diverge everywhere except at $t = 0$. For such distributions, use the characteristic function instead; it always exists.
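To make the divergence concrete: the truncated MGF integral for the standard Cauchy density grows without bound as the cutoff increases, while the (real part of the) characteristic function integral settles at $e^{-|t|}$. A rough numerical sketch (NumPy; cutoffs and grid sizes are ad hoc):

```python
import numpy as np

def trap(vals, dx):
    # composite trapezoidal rule on a uniform grid
    return dx * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

cauchy = lambda x: 1.0 / (np.pi * (1.0 + x ** 2))  # standard Cauchy PDF
t = 1.0

# 1) Truncated MGF integral blows up as the cutoff L grows: MGF = infinity.
mgf_trunc = []
for L in (10.0, 20.0, 40.0):
    x = np.linspace(-L, L, 400001)
    mgf_trunc.append(trap(np.exp(t * x) * cauchy(x), x[1] - x[0]))
print(mgf_trunc)  # rapidly increasing with L

# 2) The characteristic function integral converges: E[cos(tX)] -> e^{-|t|}.
x = np.linspace(-2000.0, 2000.0, 2000001)
phi = trap(np.cos(t * x) * cauchy(x), x[1] - x[0])
print(phi)  # close to e^{-1} ~ 0.3679
```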

MGF Explorer for Common Distributions

(Interactive widget: plots $M_X(t)$ for a chosen distribution and parameter values. The slope at $t = 0$ equals the mean, and the curvature at $t = 0$ relates to the variance.)

Historical Note: Laplace and the Birth of Transform Methods

18th-19th century

The idea of encoding a function through an integral transform dates to Pierre-Simon Laplace (1749–1827), who used what we now call the Laplace transform to solve differential equations. The connection to probability was recognized early: the MGF $M_X(t) = \mathbb{E}[e^{tX}]$ is precisely the bilateral Laplace transform of the density evaluated on the negative real axis. Laplace himself used generating functions (the discrete analogue) to study random walks and gambler's ruin problems in his Théorie analytique des probabilités (1812).

Laplace Transform

The integral transform $\mathcal{L}[f](s) = \int_0^{\infty} e^{-sx} f(x)\,dx$. For probability, the bilateral version $\int_{-\infty}^{\infty} e^{-sx} f(x)\,dx$ connects to the MGF via $M_X(t) = \mathcal{L}[f](-t)$.

Related: Moment Generating Function (MGF)

Quick Check

If $X \sim \text{Poi}(\lambda)$ and $Y \sim \text{Poi}(\mu)$ are independent, what is the MGF of $Z = X + Y$?

$e^{(\lambda + \mu)(e^t - 1)}$

$e^{\lambda\mu(e^t - 1)}$

$e^{(\lambda + \mu)t}$

$\frac{\lambda\mu}{(\lambda-t)(\mu-t)}$

Engineering Note

MGF Approach to BER Analysis over Fading Channels

In digital communications over fading channels, the bit error rate (BER) conditioned on the instantaneous SNR $\gamma$ is typically $P_e(\gamma) = Q(\sqrt{2\gamma})$. The average BER requires integrating $P_e(\gamma)$ over the fading distribution. Using the alternative form $Q(x) = \frac{1}{\pi}\int_0^{\pi/2} \exp\!\bigl(-\frac{x^2}{2\sin^2\theta}\bigr)\,d\theta$ (Craig's representation), the inner integral becomes the MGF of $\gamma$ evaluated at $-1/\sin^2\theta$. This converts the BER averaging problem into an MGF evaluation, a technique used extensively in wireless communications.

Practical Constraints

- Requires that the MGF of the fading distribution exists
- Craig's representation applies to the Q-function specifically

Why This Matters: MGF and Fading Channel Analysis

The MGF of the SNR distribution plays a central role in wireless communications. For Rayleigh fading ($\gamma \sim \text{Exp}(1/\bar{\gamma})$), the MGF is $M_\gamma(t) = (1 - \bar{\gamma}t)^{-1}$, which directly yields closed-form BER expressions for most modulation schemes. The MGF approach extends naturally to diversity combining (MRC, EGC) and MIMO systems where the SNR is a sum or function of channel gains.
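As an illustration of this pipeline, the Craig-form average BER for BPSK over Rayleigh fading reduces to integrating $M_\gamma(-1/\sin^2\theta) = \sin^2\theta/(\sin^2\theta + \bar{\gamma})$ over $\theta$, which can be checked against the standard closed form $\frac{1}{2}\bigl(1 - \sqrt{\bar{\gamma}/(1+\bar{\gamma})}\bigr)$. A sketch (function name is ours):

```python
import math

def rayleigh_bpsk_ber(gbar, n=100000):
    """Average BPSK BER over Rayleigh fading via Craig's form.

    P = (1/pi) * int_0^{pi/2} M_gamma(-1/sin^2 th) d th,
    with M_gamma(s) = 1/(1 - gbar*s), so the integrand is
    sin^2 th / (sin^2 th + gbar).  Trapezoidal rule over theta.
    """
    dth = (math.pi / 2.0) / n
    total = 0.0
    for i in range(n + 1):
        s2 = math.sin(i * dth) ** 2
        w = 0.5 if i in (0, n) else 1.0   # trapezoid endpoint weights
        total += w * s2 / (s2 + gbar)
    return (dth / math.pi) * total

gbar = 10.0  # average SNR, linear scale
closed_form = 0.5 * (1.0 - math.sqrt(gbar / (1.0 + gbar)))
print(rayleigh_bpsk_ber(gbar), closed_form)  # both ~0.02327
```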