Combinatorial Probability
Counting as the Engine of Combinatorial Probability
When all outcomes in a finite sample space are equally likely, computing probabilities reduces to pure counting. This sounds simple, but counting is surprisingly subtle: the answer depends critically on whether order matters and whether repetition is allowed.
The four sampling paradigms developed in this section generate almost every counting problem in discrete probability. They also appear repeatedly in coding theory: the number of codewords in a codebook is an ordered sample without replacement; the number of distinct error patterns is an unordered sample; the multinomial coefficient counts the ways to assign symbols across bins. Getting the counting right is, in Feller's words, "a matter of utmost importance."
Definition: The Four Sampling Paradigms
Given a set with $n$ elements, we extract a sample of size $k$. The number of distinct samples depends on two choices: whether the sample is ordered and whether elements may repeat.
| | With replacement | Without replacement |
|---|---|---|
| Ordered | $n^k$ | $\frac{n!}{(n-k)!}$ |
| Unordered | $\binom{n+k-1}{k}$ | $\binom{n}{k}$ |
- Ordered with replacement (sequences): $n^k$, since each of the $k$ draws has $n$ choices independently.
- Ordered without replacement (permutations): $(n)_k = n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!}$, since the number of choices decreases by one with each draw.
- Unordered without replacement (combinations): $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$, obtained by dividing the permutation count by $k!$ since order within the sample does not matter.
- Unordered with replacement (multisets): $\binom{n+k-1}{k}$, equivalently the number of ways to place $k$ indistinguishable balls into $n$ bins.
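The four counts can be checked numerically. A minimal Python sketch (the function name `sample_counts` is mine) using the standard library:

```python
from math import comb, perm

def sample_counts(n: int, k: int) -> dict:
    """Counts of size-k samples from an n-element set, one per paradigm."""
    return {
        "ordered, with replacement": n ** k,                # sequences
        "ordered, without replacement": perm(n, k),         # permutations: n!/(n-k)!
        "unordered, without replacement": comb(n, k),       # combinations
        "unordered, with replacement": comb(n + k - 1, k),  # multisets (balls into bins)
    }

counts = sample_counts(5, 3)
print(counts)  # 125 sequences, 60 permutations, 10 combinations, 35 multisets
```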
Theorem: Properties of Binomial Coefficients
For non-negative integers $n$ and $k$ with $0 \le k \le n$:
- Symmetry: $\binom{n}{k} = \binom{n}{n-k}$.
- Pascal's rule: $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$.
- Binomial theorem: $(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$.
- Sum: $\sum_{k=0}^{n} \binom{n}{k} = 2^n$.
- Vandermonde's identity: $\binom{m+n}{k} = \sum_{j=0}^{k} \binom{m}{j}\binom{n}{k-j}$.
Pascal's rule is the recursion underlying Pascal's triangle and enables efficient computation. The binomial theorem says the $\binom{n}{k}$ are exactly the coefficients in the expansion of $(x+y)^n$, explaining the name "binomial coefficients." Property 4 follows by setting $x = y = 1$.
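All five identities can be verified for concrete values with Python's `math.comb`; a small sketch (the choices $n = 10$, $m = 7$, $k = 5$ are arbitrary):

```python
from math import comb

n = 10
# Symmetry: C(n,k) == C(n, n-k)
assert all(comb(n, k) == comb(n, n - k) for k in range(n + 1))
# Pascal's rule: C(n,k) == C(n-1,k-1) + C(n-1,k)
assert all(comb(n, k) == comb(n - 1, k - 1) + comb(n - 1, k) for k in range(1, n))
# Sum: the row of Pascal's triangle adds to 2^n (binomial theorem at x = y = 1)
assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n
# Vandermonde: C(m+n,k) == sum_j C(m,j) C(n,k-j)
m, k = 7, 5
assert comb(m + n, k) == sum(comb(m, j) * comb(n, k - j) for j in range(k + 1))
print("all identities hold for n =", n)
```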
Symmetry
$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n!}{(n-k)!\,k!} = \binom{n}{n-k}$. Combinatorially: choosing $k$ elements to include is the same as choosing the $n-k$ elements to exclude.
Pascal's rule
Combinatorial proof: fix an element $x$. Every $k$-subset of the $n$-element set either includes $x$ (choose the remaining $k-1$ from the other $n-1$ elements: $\binom{n-1}{k-1}$ ways) or excludes $x$ (choose all $k$ from the other $n-1$: $\binom{n-1}{k}$ ways). These cases are disjoint and exhaustive.
Binomial theorem (induction)
Base case $n = 0$: $(x+y)^0 = 1 = \binom{0}{0} x^0 y^0$.
Inductive step: $(x+y)^{n+1} = (x+y)\sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$. Distributing and collecting the coefficient of $x^k y^{n+1-k}$: $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$ by Pascal's rule.
Example: The Birthday Problem
In a group of $n$ people, what is the probability that at least two share a birthday? Assume 365 equally likely birthdays and that birthdays are independent.
Define the complementary event
Let $A$ = "at least two people share a birthday." It is easier to compute the complement $A^c$ = "all birthdays are distinct."
Count outcomes
The sample space of birthday assignments (ordered, with replacement) has $365^n$ outcomes.
The event $A^c$ (all distinct, ordered, without replacement): $|A^c| = 365 \cdot 364 \cdots (365 - n + 1) = \frac{365!}{(365-n)!}$.
Compute the probability
$$P(A) = 1 - P(A^c) = 1 - \frac{365 \cdot 364 \cdots (365 - n + 1)}{365^n}.$$
Key threshold
For $n = 23$: $P(A) \approx 0.507$. With only 23 people, it is already more likely than not that two share a birthday. For $n = 57$: $P(A) \approx 0.99$. This counterintuitive result, the birthday paradox, arises because the number of pairs grows as $\binom{n}{2} \approx n^2/2$, which reaches 365 far sooner than $n$ itself does.
The birthday paradox has direct applications in hashing and collision analysis for error-correcting codes and MAC protocols.
Birthday Problem: Collision Probability
Plot the probability that at least two people in a group of share a birthday, as a function of . Also compare with the Poisson approximation.
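The curve described above can be computed directly. A minimal sketch (function names are mine); the exact product form is compared against the Poisson approximation $1 - e^{-\binom{n}{2}/365}$:

```python
from math import exp, comb

def p_collision_exact(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday), exact product form."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (days - i) / days
    return 1.0 - p_distinct

def p_collision_poisson(n: int, days: int = 365) -> float:
    """Poisson approximation: 1 - exp(-C(n,2)/days)."""
    return 1.0 - exp(-comb(n, 2) / days)

for n in (10, 23, 40, 57):
    print(n, round(p_collision_exact(n), 4), round(p_collision_poisson(n), 4))
```

The approximation is already accurate to about two decimal places at $n = 23$, where the exact probability first crosses 1/2.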
Definition: Multinomial Coefficient
Let $k_1, \dots, k_m$ be non-negative integers with $k_1 + \cdots + k_m = n$. The multinomial coefficient $$\binom{n}{k_1, \dots, k_m} = \frac{n!}{k_1!\,k_2!\cdots k_m!}$$ counts the number of ways to partition a set of $n$ distinct objects into $m$ ordered groups of sizes $k_1, \dots, k_m$.
The multinomial theorem generalizes the binomial theorem: $$(x_1 + x_2 + \cdots + x_m)^n = \sum_{k_1 + \cdots + k_m = n} \binom{n}{k_1, \dots, k_m} x_1^{k_1} x_2^{k_2} \cdots x_m^{k_m}.$$
The multinomial coefficient is fundamental in coding theory and information theory. The number of binary strings of length $n$ with exactly $k$ ones is $\binom{n}{k}$ (the $m = 2$ case). The number of sequences of $n$ symbols from an alphabet of size $m$ with a prescribed type (empirical distribution) is a multinomial coefficient, and bounding these counts is the starting point for the method of types in information theory (Book ITA, Chapter 3).
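The standard library has `math.comb` but no multinomial function; a small sketch of one (the helper name `multinomial` is mine):

```python
from math import factorial

def multinomial(*ks: int) -> int:
    """n! / (k1! k2! ... km!) with n = k1 + ... + km."""
    out = factorial(sum(ks))
    for k in ks:
        out //= factorial(k)  # exact integer division: each k! divides evenly
    return out

# Binary strings of length 5 with exactly 2 ones: the m = 2 case, C(5,2)
print(multinomial(2, 3))     # 10
# Ways to partition 10 distinct symbols into groups of sizes 5, 3, 2
print(multinomial(5, 3, 2))  # 2520
```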
Example: Combinatorics in Binary Codes
(a) How many binary strings of length $n$ have Hamming weight exactly $k$? (b) How many binary strings of length $n$ have Hamming weight at most $t$? (c) In a random codebook of $M$ codewords, each chosen uniformly from $\{0,1\}^n$, what is the expected number of codeword pairs at Hamming distance exactly $d$?
(a) Weight-$k$ binary strings
The positions of the $k$ ones can be chosen in $\binom{n}{k}$ ways. Each choice gives a distinct binary string of weight $k$. Answer: $\binom{n}{k}$.
(b) Hamming ball of radius $t$
Strings of weight at most $t$ form the Hamming ball of radius $t$ around the all-zeros codeword. Their count is $$|B(\mathbf{0}, t)| = \sum_{i=0}^{t} \binom{n}{i}.$$ This quantity appears in the Hamming (sphere-packing) bound for error-correcting codes: a code with minimum distance $d \ge 2t+1$ can correct all $t$-error patterns, and the radius-$t$ Hamming balls around the codewords must be disjoint.
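The ball size and the resulting sphere-packing limit are easy to compute; a short sketch (the example checks the classical fact that the Hamming(7,4) code, with 16 codewords, meets the bound with equality):

```python
from math import comb

def hamming_ball(n: int, t: int) -> int:
    """Number of binary strings within Hamming distance t of a fixed center."""
    return sum(comb(n, i) for i in range(t + 1))

# Sphere-packing bound: a t-error-correcting code has at most 2^n / |B(t)| codewords.
n, t = 7, 1
print(hamming_ball(n, t))             # 1 + 7 = 8
print(2 ** n // hamming_ball(n, t))   # 16, achieved by the Hamming(7,4) code
```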
(c) Expected number of colliding pairs
Fix two codewords $C_i, C_j$ with $i \ne j$. Since each is chosen uniformly from $\{0,1\}^n$, the probability that $d_H(C_i, C_j) = d$ is $\binom{n}{d}/2^n$ (there are $\binom{n}{d}$ strings at distance $d$ from $C_i$). There are $\binom{M}{2}$ pairs, so by linearity of expectation: $$E[N_d] = \binom{M}{2}\,\frac{\binom{n}{d}}{2^n}.$$ For close pairs to be rare (a good codebook), we need this expectation to be small, which requires $\binom{M}{2}\binom{n}{d} \ll 2^n$.
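The linearity-of-expectation formula can be sanity-checked by Monte Carlo. A sketch under the stated model (uniform i.i.d. codewords; function names are mine), with codewords stored as integers so Hamming distance is a popcount of an XOR:

```python
import random
from math import comb

def expected_pairs_at_distance(M: int, n: int, d: int) -> float:
    """E[# codeword pairs at Hamming distance exactly d], analytic formula."""
    return comb(M, 2) * comb(n, d) / 2 ** n

def simulate_pairs_at_distance(M: int, n: int, d: int, trials: int = 2000) -> float:
    total = 0
    for _ in range(trials):
        book = [random.getrandbits(n) for _ in range(M)]  # uniform random codebook
        for i in range(M):
            for j in range(i + 1, M):
                # Hamming distance = number of set bits in the XOR
                if bin(book[i] ^ book[j]).count("1") == d:
                    total += 1
    return total / trials

M, n, d = 20, 10, 3
print("analytic: ", expected_pairs_at_distance(M, n, d))
print("simulated:", simulate_pairs_at_distance(M, n, d))
```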
Historical Note: The Birthday Paradox in Cryptography and Hashing
1939–present. The birthday problem was apparently first solved by von Mises in 1939, though Feller popularized it in his 1950 classic. The "paradox" is psychological: humans drastically underestimate the number of pairs in a group. With $n = 23$ people, there are $\binom{23}{2} = 253$ pairs, and any single pair has a 1/365 chance of a collision. The expected number of colliding pairs is approximately $\binom{n}{2}/365$, which equals 1 when $n \approx 28$.
In cryptography, the birthday attack exploits exactly this calculation: a hash function with $n$-bit outputs has a collision probability of 50% after approximately $2^{n/2}$ evaluations, not $2^n$. This is why SHA-256 (256-bit output) is designed to resist collision attacks up to $2^{128}$ operations, not $2^{256}$.
Birthday (Collision) Attack
Complexity: $O(2^{n/2})$ evaluations expected, by the birthday bound. The birthday bound is tight: by the Poisson approximation to the birthday problem, after $q$ evaluations a collision exists with probability approximately $1 - e^{-\binom{q}{2}/2^n}$.
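The attack can be demonstrated on a deliberately weakened toy hash: SHA-256 truncated to 24 bits, so a collision is expected after roughly $2^{12} \approx 4096$ evaluations rather than $2^{24}$. A sketch (the truncation and function names are mine, chosen only to make the demo fast):

```python
import hashlib
import os

BITS = 24  # toy output size; real hashes use 256+ bits

def tag(msg: bytes) -> int:
    """SHA-256 truncated to BITS bits -- a deliberately weak toy hash."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") & ((1 << BITS) - 1)

def find_collision():
    """Hash random inputs until two distinct ones share a tag.
    Expected work is on the order of 2^(BITS/2) evaluations."""
    seen = {}  # tag -> first message seen with that tag
    evals = 0
    while True:
        msg = os.urandom(8)
        t = tag(msg)
        evals += 1
        if t in seen and seen[t] != msg:
            return seen[t], msg, evals
        seen[t] = msg

m1, m2, evals = find_collision()
print(f"collision after {evals} evaluations (expected scale: 2^{BITS // 2} = {2 ** (BITS // 2)})")
```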
Common Mistake: Ordered vs. Unordered: The Most Common Counting Mistake
Mistake:
In a lottery where 6 numbers are drawn without replacement from a pool of $n$ numbers, a student counts the total number of outcomes as the ordered count $\frac{n!}{(n-6)!}$. This gives a much lower probability of winning than the correct answer.
Correction:
The lottery outcome is an unordered sample without replacement: the winner need not match the order of drawing, only the set of numbers. The correct count is $\binom{n}{6} = \frac{n!}{6!\,(n-6)!}$. The ordered count overcounts each winning outcome by $6!$. The two counts differ by a factor of $6! = 720$.
Rule: If the problem asks for a set of outcomes (the drawn numbers, a hand of cards, a subset of antennas), use combinations. If the problem asks for a sequence (the order of names drawn, a ranked list), use permutations.
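The factor of 720 is easy to exhibit numerically. A sketch assuming a 6-of-49 pool for concreteness (the pool size is my illustrative choice, not from the text):

```python
from math import comb, perm, factorial

n_pool, n_draw = 49, 6            # assumed 6-of-49 lottery, for illustration
ordered = perm(n_pool, n_draw)    # sequences: treats the draw order as meaningful
unordered = comb(n_pool, n_draw)  # sets: the actual lottery outcome

print(unordered)                            # 13983816 distinct tickets
print(ordered // unordered)                 # 720
print(ordered // unordered == factorial(n_draw))  # True: overcount factor is 6!
```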
Monty Hall Simulation
Simulate the Monty Hall problem. A car is hidden behind one of three doors; the host opens a losing door. Compare the win probability of the 'stay' vs 'switch' strategies across many simulations.
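A minimal simulation sketch (function name is mine). Switching wins exactly when the initial pick misses the car, so the rates should approach 1/3 and 2/3:

```python
import random

def monty_hall(trials: int = 100_000) -> tuple:
    """Estimate win rates of the 'stay' and 'switch' strategies."""
    stay_wins = switch_wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # Host opens a losing door: neither the pick nor the car.
        # (Deterministic choice when pick == car; win rates are unaffected.)
        host = next(d for d in range(3) if d != pick and d != car)
        switch = next(d for d in range(3) if d != pick and d != host)
        stay_wins += (pick == car)
        switch_wins += (switch == car)
    return stay_wins / trials, switch_wins / trials

stay, switch = monty_hall()
print(f"stay ~ {stay:.3f}, switch ~ {switch:.3f}")  # approaches 1/3 vs 2/3
```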
Quick Check
A wireless base station has 8 antennas and must select 3 to activate for a transmission. How many distinct antenna subsets can be chosen?
An antenna subset is an unordered selection without replacement: $\binom{8}{3} = \frac{8!}{3!\,5!} = 56$.
Combination
An unordered selection of $k$ elements from a set of $n$, without replacement. Count: $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$.
Related: Permutation, Binomial Coefficient
Permutation
An ordered selection of $k$ elements from a set of $n$, without replacement. Count: $(n)_k = \frac{n!}{(n-k)!}$. For $k = n$: $n!$ ordered arrangements of all $n$ elements.
Related: Combination