Expectation
What Is the Average?
Knowing the full PMF of a random variable is ideal, but we often need a single number that summarizes its "typical" value. The expectation does exactly this: it is the probability-weighted average of all possible values. The expectation is the single most important summary of a random variable, not because it tells the whole story, but because it enjoys a remarkable property (linearity) that makes it computable even when the full distribution is out of reach.
Definition: Expectation of a Discrete Random Variable
The expectation (or mean) of a discrete random variable $X$ with PMF $p_X(x)$ and support $\mathcal{X}$ is
$$\mathbb{E}[X] = \sum_{x \in \mathcal{X}} x\, p_X(x),$$
provided the sum converges absolutely: $\sum_{x \in \mathcal{X}} |x|\, p_X(x) < \infty$.
If the absolute convergence condition fails, we say the expectation does not exist. The Cauchy distribution is the classical example in the continuous case; among discrete distributions, a random variable with PMF $p(k) \propto 1/k^3$ on the positive integers has finite mean, but one with $p(k) \propto 1/k^2$ does not.
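As a quick numerical companion (not from the original text), here is a minimal Python sketch that evaluates the weighted sum for a fair six-sided die; the PMF and the helper name `expectation` are illustrative choices:

```python
# Minimal sketch: E[X] as a probability-weighted sum over the support.
# The fair-die PMF and the helper name `expectation` are illustrative.

def expectation(pmf):
    """E[X] = sum of x * p_X(x) over the support of X."""
    return sum(x * p for x, p in pmf.items())

die = {x: 1 / 6 for x in range(1, 7)}  # fair die: p_X(x) = 1/6 for x = 1..6
print(expectation(die))                # 3.5
```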
Theorem: Linearity of Expectation
For any random variables $X_1, \dots, X_n$ (not necessarily independent) and constants $a_1, \dots, a_n$:
$$\mathbb{E}\left[\sum_{i=1}^n a_i X_i\right] = \sum_{i=1}^n a_i\, \mathbb{E}[X_i].$$
Linearity holds unconditionally; it does not require independence. This is arguably the single most useful property in all of probability. It allows us to compute the expected number of successes in $n$ trials without knowing the joint distribution, simply by summing the individual success probabilities.
Scalar case
For a single RV, $\mathbb{E}[aX + b] = a\,\mathbb{E}[X] + b$.
Sum of two RVs
For $X$ and $Y$ with joint PMF $p_{X,Y}(x,y)$:
$$\mathbb{E}[X + Y] = \sum_x \sum_y (x + y)\, p_{X,Y}(x,y) = \sum_x x \sum_y p_{X,Y}(x,y) + \sum_y y \sum_x p_{X,Y}(x,y).$$
The inner sums are the marginal PMFs $p_X(x)$ and $p_Y(y)$, so this equals $\sum_x x\, p_X(x) + \sum_y y\, p_Y(y) = \mathbb{E}[X] + \mathbb{E}[Y]$.
General case
The result for $n$ terms follows by induction on the two-variable case.
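To see the no-independence claim numerically, the following hedged Python sketch uses $Y = X^2$, which is completely determined by $X$; the coefficients and die distribution are illustrative:

```python
# Simulation sketch: linearity with *dependent* variables. Here Y = X**2 is
# completely determined by X, yet E[2X + 3Y] = 2 E[X] + 3 E[Y] still holds.
import random

random.seed(0)
trials = 200_000
samples = []
for _ in range(trials):
    x = random.randint(1, 6)   # X: fair die roll
    y = x ** 2                 # Y = X^2, maximally dependent on X
    samples.append(2 * x + 3 * y)

empirical = sum(samples) / trials  # Monte Carlo estimate of E[2X + 3Y]
exact = 2 * 3.5 + 3 * (91 / 6)     # 2 E[X] + 3 E[X^2], no joint law needed
print(empirical, exact)            # both close to 52.5
```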
Example: Expected Number of Fixed Points (Matching Problem)
Find $\mathbb{E}[X]$, where $X$ is the number of fixed points in a random permutation of $\{1, 2, \dots, n\}$.
Indicator decomposition
Write $X = I_1 + I_2 + \cdots + I_n$, where $I_j$ is the indicator that element $j$ is a fixed point. By linearity, $\mathbb{E}[X] = \sum_{j=1}^n \mathbb{E}[I_j] = \sum_{j=1}^n P(I_j = 1)$.
Compute individual probabilities
For each $j$, $P(I_j = 1) = \frac{(n-1)!}{n!} = \frac{1}{n}$. (There are $(n-1)!$ permutations fixing element $j$, out of $n!$ total.)
Conclude
$\mathbb{E}[X] = \sum_{j=1}^n \frac{1}{n} = n \cdot \frac{1}{n} = 1$.
Remarkably, the expected number of fixed points is exactly 1, regardless of $n$. This is the power of linearity: we never needed the (complicated) distribution of $X$.
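A short simulation sketch (illustrative parameters, assuming a uniform random permutation via `random.shuffle`) confirms that the average number of fixed points hovers near 1 for any $n$:

```python
# Simulation sketch of the matching problem: the average number of fixed
# points of a uniform random permutation stays near 1 regardless of n.
import random

random.seed(1)

def fixed_points(n):
    perm = list(range(n))
    random.shuffle(perm)                        # uniform random permutation
    return sum(perm[i] == i for i in range(n))  # count positions with sigma(i) = i

for n in (5, 50, 500):
    trials = 20_000
    avg = sum(fixed_points(n) for _ in range(trials)) / trials
    print(n, round(avg, 3))  # each average is close to 1, independent of n
```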
Theorem: LOTUS (Law of the Unconscious Statistician)
If $X$ is a discrete random variable with PMF $p_X(x)$ and $g$ is any function, then
$$\mathbb{E}[g(X)] = \sum_{x \in \mathcal{X}} g(x)\, p_X(x).$$
In particular, we do not need to first find the PMF of $Y = g(X)$.
The name is tongue-in-cheek: the formula is so natural that beginners use it "unconsciously" before proving it. The point is that to compute $\mathbb{E}[g(X)]$, it suffices to know the distribution of $X$, not of $g(X)$.
Group by values of $X$
Let $Y = g(X)$. The support of $Y$ is $g(\mathcal{X}) = \{g(x) : x \in \mathcal{X}\}$. For each value $y$ of $Y$:
$$P(Y = y) = \sum_{x :\, g(x) = y} p_X(x).$$
Expand $\mathbb{E}[Y]$
$$\mathbb{E}[Y] = \sum_y y\, P(Y = y) = \sum_y \sum_{x :\, g(x) = y} g(x)\, p_X(x) = \sum_{x \in \mathcal{X}} g(x)\, p_X(x). \quad \blacksquare$$
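In code, LOTUS says we can loop over the support of $X$ directly. This minimal Python sketch (the die PMF and choice of $g$ are illustrative) never constructs the PMF of $Y = g(X)$:

```python
# LOTUS in code: E[g(X)] computed directly from the PMF of X, without ever
# constructing the PMF of Y = g(X). The die PMF and g are illustrative.

def lotus(pmf, g):
    """E[g(X)] = sum of g(x) * p_X(x) over the support of X."""
    return sum(g(x) * p for x, p in pmf.items())

die = {x: 1 / 6 for x in range(1, 7)}
print(lotus(die, lambda x: x ** 2))  # E[X^2] = 91/6 ≈ 15.167
```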
Example: Computing $\mathbb{E}[X^2]$ via LOTUS
Let $X$ be a Bernoulli($p$) random variable. Compute $\mathbb{E}[X^2]$ using LOTUS.
Apply LOTUS with $g(x) = x^2$
$$\mathbb{E}[X^2] = \sum_x x^2\, p_X(x) = 0^2 \cdot (1 - p) + 1^2 \cdot p = p.$$
Observation
For a Bernoulli RV, $\mathbb{E}[X^2] = \mathbb{E}[X] = p$. This is because $X^2 = X$ when $X \in \{0, 1\}$. We will use this fact when computing the variance of the Bernoulli distribution.
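A short numerical check of this identity, with an illustrative value $p = 0.3$:

```python
# Numerical check that E[X^2] = E[X] = p for X ~ Bernoulli(p); p is illustrative.
p = 0.3
pmf = {0: 1 - p, 1: p}
e_x = sum(x * q for x, q in pmf.items())        # E[X]   = p
e_x2 = sum(x ** 2 * q for x, q in pmf.items())  # E[X^2] = 0^2 (1-p) + 1^2 p = p
print(e_x, e_x2)  # 0.3 0.3
```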
Common Mistake: $\mathbb{E}[g(X)] \neq g(\mathbb{E}[X])$ in General
Mistake:
Assuming that $\mathbb{E}[X^2] = (\mathbb{E}[X])^2$, or more generally that the expectation "passes through" nonlinear functions.
Correction:
Linearity holds only for linear functions of $X$. For a nonlinear $g$, Jensen's inequality tells us the direction of the inequality: if $g$ is convex, $\mathbb{E}[g(X)] \geq g(\mathbb{E}[X])$; if $g$ is concave, $\mathbb{E}[g(X)] \leq g(\mathbb{E}[X])$.
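The following sketch illustrates Jensen's direction for the convex $g(x) = x^2$ and a fair die (illustrative choices). The gap $\mathbb{E}[X^2] - (\mathbb{E}[X])^2$ is exactly the variance, so it is strictly positive for any non-degenerate $X$:

```python
# Sketch of E[g(X)] != g(E[X]) for the convex g(x) = x**2 and a fair die.
die = {x: 1 / 6 for x in range(1, 7)}
e_x = sum(x * p for x, p in die.items())        # E[X] = 3.5
e_x2 = sum(x ** 2 * p for x, p in die.items())  # E[X^2] = 91/6 ≈ 15.167
print(e_x2, e_x ** 2)  # 15.167 > 12.25, the direction Jensen predicts
```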
Quick Check
If $X$ and $Y$ are random variables (possibly dependent) with $\mathbb{E}[X] = 1$ and $\mathbb{E}[Y] = 2$, what is $\mathbb{E}[3X + 2Y]$?
1
2
7
Cannot determine without knowing the joint distribution
$\mathbb{E}[3X + 2Y] = 3\,\mathbb{E}[X] + 2\,\mathbb{E}[Y] = 3(1) + 2(2) = 7$. Linearity holds regardless of dependence.
Why This Matters: Expected Bit Error Rate
In a digital communication system, the bit error rate (BER) is the expected fraction of bits received incorrectly. If we transmit $n$ bits and define $X_i = 1$ if bit $i$ is received in error (and $0$ otherwise), then the total number of errors is $N = \sum_{i=1}^n X_i$ and $\mathbb{E}[N] = \sum_{i=1}^n P(X_i = 1)$. For i.i.d. errors with probability $p$ per bit, $\mathbb{E}[N] = np$. The BER is $\mathbb{E}[N]/n = p$; linearity of expectation makes this trivial.
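A minimal simulation sketch of this setup (the values of $n$ and $p$ below are illustrative, not from the text):

```python
# BER sketch: n i.i.d. bit errors with per-bit error probability p.
# The parameter values n and p are illustrative, not from the text.
import random

random.seed(2)
n, p = 100_000, 0.01
errors = sum(random.random() < p for _ in range(n))  # N = sum of indicators X_i
print(errors, errors / n)  # E[N] = n*p = 1000; empirical BER ≈ p = 0.01
```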
Expectation
The probability-weighted average of all possible values of a random variable: $\mathbb{E}[X] = \sum_x x\, p_X(x)$ for discrete RVs.
Related: Probability Mass Function (PMF)
LOTUS (Law of the Unconscious Statistician)
$\mathbb{E}[g(X)] = \sum_x g(x)\, p_X(x)$. Allows computing expectations of functions of $X$ directly from the PMF of $X$.
Related: Expectation
Historical Note: Huygens and the Origins of Expected Value
The concept of expected value dates to Christiaan Huygens' 1657 treatise De Ratiociniis in Ludo Aleae (On Reasoning in Games of Chance), the first published work on probability. Huygens defined the "value of a game" as the price a rational player should pay to participate: exactly what we now call the expected payoff. Pierre-Simon Laplace later placed the concept on a firmer mathematical footing and used it extensively in his 1812 Théorie analytique des probabilités.
Key Takeaway
Linearity of expectation, $\mathbb{E}\left[\sum_i a_i X_i\right] = \sum_i a_i\, \mathbb{E}[X_i]$, holds without any assumption of independence. This is the most useful single property in discrete probability: whenever you can decompose a complicated count into a sum of indicators, linearity hands you the answer.