Independence

Why Independence Matters

Independence is the structural assumption that makes most of information theory work. The i.i.d. (independent and identically distributed) model for sources and channels allows us to factor joint probabilities, write channel capacities as single-letter expressions, and prove coding theorems via the law of large numbers. When independence fails (correlated fading, bursty interference, memory in the channel), the analysis becomes markedly harder and often requires the Markov and mixing tools developed in Chapter 13.

Independence is also the most commonly over-assumed property in engineering. Verifying that a model truly has independent components, rather than merely treating them as independent for mathematical convenience, is a critical modelling skill.

Definition: Independence of Events

A collection of events $\{A_i : i \in I\}$ is mutually independent (or simply independent) if for every finite subset $J \subseteq I$: $\mathbb{P}\!\left(\bigcap_{i \in J} A_i\right) = \prod_{i \in J} \mathbb{P}(A_i)$. Two events $A$ and $B$ are independent if $\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B)$.

The collection is pairwise independent if every pair satisfies $\mathbb{P}(A_i \cap A_j) = \mathbb{P}(A_i)\mathbb{P}(A_j)$, but the higher-order product conditions are not required. Mutual independence implies pairwise independence, but not conversely.


Theorem: Equivalent Characterization of Independence

When $\mathbb{P}(B) > 0$, two events $A$ and $B$ are independent if and only if $\mathbb{P}(A \mid B) = \mathbb{P}(A)$. That is, knowing $B$ occurred provides no information about $A$.
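The equivalence is immediate from the definition of conditional probability: when $\mathbb{P}(B) > 0$,
$$\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} = \frac{\mathbb{P}(A)\,\mathbb{P}(B)}{\mathbb{P}(B)} = \mathbb{P}(A),$$
and conversely $\mathbb{P}(A \cap B) = \mathbb{P}(A \mid B)\,\mathbb{P}(B) = \mathbb{P}(A)\,\mathbb{P}(B)$.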

Theorem: Independence Is Preserved Under Complementation

If $A$ and $B$ are independent, then so are $A^c$ and $B$, $A$ and $B^c$, and $A^c$ and $B^c$.
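The first case is a one-line check, and the others follow by symmetry:
$$\mathbb{P}(A^c \cap B) = \mathbb{P}(B) - \mathbb{P}(A \cap B) = \mathbb{P}(B) - \mathbb{P}(A)\,\mathbb{P}(B) = \bigl(1 - \mathbb{P}(A)\bigr)\mathbb{P}(B) = \mathbb{P}(A^c)\,\mathbb{P}(B).$$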

Example: Pairwise Independence Does Not Imply Mutual Independence

Toss two fair coins. Let $A = \{\text{first coin heads}\}$, $B = \{\text{second coin heads}\}$, $C = \{\text{both coins show the same face}\}$. Show that $A$, $B$, $C$ are pairwise independent but not mutually independent.
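Sketch: each event has probability $1/2$, and each pairwise intersection ($A \cap B$, $A \cap C$, $B \cap C$) contains exactly one of the four equally likely outcomes, so $\mathbb{P}(A_i \cap A_j) = 1/4 = \mathbb{P}(A_i)\mathbb{P}(A_j)$ for every pair. But $A \cap B \subseteq C$, so $\mathbb{P}(A \cap B \cap C) = 1/4 \neq 1/8 = \mathbb{P}(A)\mathbb{P}(B)\mathbb{P}(C)$, and mutual independence fails.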

Common Mistake: Pairwise Independence Is NOT Mutual Independence

Mistake:

In simulations and modelling, it is tempting to verify independence only for pairs of events and conclude that all events in the collection are independent. The example above (three events defined on two coin tosses) shows that this inference fails already with three events; analogous constructions exist for any number of events.

Correction:

Mutual independence requires the product rule to hold for every finite subset, not just pairs. For $n$ events there are $2^n - n - 1$ conditions in total (one for each subset of size at least two), of which only $\binom{n}{2}$ are pairwise; for $n = 3$ that is four conditions, the three pairs plus the triple. All must be checked.

Independence: Pairwise vs. Mutual

| Property | Pairwise Independent | Mutually Independent |
| --- | --- | --- |
| Definition | $\mathbb{P}(A_i \cap A_j) = \mathbb{P}(A_i)\mathbb{P}(A_j)$ for all $i \neq j$ | $\mathbb{P}(\bigcap_{i \in J} A_i) = \prod_{i \in J}\mathbb{P}(A_i)$ for all finite $J$ |
| Implications | Does NOT imply mutual independence | Implies pairwise independence |
| Number of conditions | $\binom{n}{2}$ equations | $2^n - n - 1$ equations (all subsets of size $\geq 2$) |
| Used in practice | Weaker, easier to verify | Required for most probabilistic analysis |
| Counterexample | Two fair coins + same-face event | N/A (no gap in the reverse direction) |

Independence Checker: $\mathbb{P}(A \cap B)$ vs. $\mathbb{P}(A)\mathbb{P}(B)$

Set the probabilities of three events $A$, $B$, $C$ defined on a two-coin experiment and verify whether each pair satisfies the product rule for independence.

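The same verification can be scripted by enumerating the four equally likely outcomes directly. The sketch below is a minimal checker in plain Python, using the event definitions from the example above; it tests every pairwise product rule and the triple condition.

```python
from itertools import product

# Enumerate the four equally likely outcomes of two fair coin tosses.
outcomes = list(product("HT", repeat=2))      # [('H','H'), ('H','T'), ('T','H'), ('T','T')]
prob = {w: 0.25 for w in outcomes}            # uniform probability 1/4 each

# Events from the worked example.
A = {w for w in outcomes if w[0] == "H"}      # first coin heads
B = {w for w in outcomes if w[1] == "H"}      # second coin heads
C = {w for w in outcomes if w[0] == w[1]}     # both coins show the same face

def P(event):
    """Probability of an event given as a set of outcomes."""
    return sum(prob[w] for w in event)

# Pairwise product rule: each pair satisfies P(X ∩ Y) = P(X) P(Y).
for name, (X, Y) in [("A,B", (A, B)), ("A,C", (A, C)), ("B,C", (B, C))]:
    print(f"{name}: P(intersection) = {P(X & Y):.2f}, product = {P(X) * P(Y):.2f}")

# Triple condition fails, so the events are not mutually independent.
print(f"P(A∩B∩C) = {P(A & B & C):.3f}, P(A)P(B)P(C) = {P(A) * P(B) * P(C):.3f}")
```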
⚠️ Engineering Note

The i.i.d. Assumption in Shannon Theory

Shannon's channel coding theorem assumes the channel is memoryless: consecutive uses of the channel are statistically independent. Under this assumption the capacity per channel use is a single-letter expression $C = \max_{p_X} I(X;Y)$. The i.i.d. source coding theorem similarly requires independent symbols. These independence assumptions are the reason capacity results look so clean.

In practice, wireless channels are NOT memoryless: multipath creates frequency-selective fading (correlated across subcarriers) and Doppler creates time-selective fading (correlated across symbols). Engineers work around this via interleaving (reordering symbols to break correlation before decoding) and OFDM (converting a frequency-selective channel into many parallel flat-fading sub-channels, each approximately memoryless).
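As a rough illustration of the interleaving idea, the sketch below implements one common form, a block interleaver (write symbols row by row, read them out column by column). A burst of consecutive channel errors is then spread across distant positions after de-interleaving, so each codeword sees errors that look approximately independent. The dimensions and symbols are arbitrary choices for the example.

```python
def block_interleave(symbols, rows, cols):
    """Write `symbols` into a rows x cols array row by row, read out column by column."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def block_deinterleave(symbols, rows, cols):
    """Inverse permutation: undo block_interleave with the same dimensions."""
    assert len(symbols) == rows * cols
    out = [None] * (rows * cols)
    for i, s in enumerate(symbols):
        c, r = divmod(i, rows)
        out[r * cols + c] = s
    return out

data = list(range(24))                       # 24 illustrative symbols
tx = block_interleave(data, rows=4, cols=6)
rx = tx.copy()
rx[5:9] = ["X"] * 4                          # a burst hitting 4 consecutive symbols
# After de-interleaving, the four errors land far apart in the original order.
print(block_deinterleave(rx, rows=4, cols=6))
```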

Practical Constraints
  • LTE/5G NR use OFDM with a cyclic prefix to create an approximately i.i.d. sub-channel model
  • Interleaver depth must exceed the coherence time to achieve near-independence
  • When coherence bandwidth $\ll$ channel bandwidth, frequency diversity approaches the i.i.d. bound

Independent Events

Events $\{A_i\}$ are mutually independent if $\mathbb{P}(\bigcap_{i \in J} A_i) = \prod_{i \in J}\mathbb{P}(A_i)$ for every finite subset $J$. Intuitively, knowledge of any subset of events provides no information about the remaining events.

Related: Pairwise Independence, Conditional Independence, Discrete-Time i.i.d. Gaussian Noise

Quick Check

Two events $A$ and $B$ both have positive probability and are disjoint ($A \cap B = \emptyset$). Are they independent?

Yes, because they have no overlap.

No, because $\mathbb{P}(A \cap B) = 0 \neq \mathbb{P}(A)\mathbb{P}(B)$.

Only if $\mathbb{P}(A) = \mathbb{P}(B)$.

Impossible to determine without more information.

Key Takeaway

Independence means no information flows between events. $A \perp B$ iff $\mathbb{P}(A \mid B) = \mathbb{P}(A)$: observing $B$ leaves the probability of $A$ unchanged. Disjointness is the opposite extreme, the most dependent possible relationship. Mutual independence is strictly stronger than pairwise independence and requires $2^n - n - 1$ conditions for $n$ events. In information theory, independence is the assumption that makes entropy additive: $H(X_1, \ldots, X_n) = \sum_i H(X_i)$.
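A quick numerical sanity check of that additivity, sketched in Python with a pair of independent marginal distributions chosen arbitrarily for the example:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution given as an array of probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Two arbitrary marginals (values chosen only for illustration).
pX = np.array([0.5, 0.25, 0.25])
pY = np.array([0.7, 0.3])

# Under independence the joint distribution is the outer product of the marginals,
# and the joint entropy equals the sum of the marginal entropies.
pXY = np.outer(pX, pY)
print(entropy(pXY.ravel()))          # H(X, Y)
print(entropy(pX) + entropy(pY))     # H(X) + H(Y), matches the joint entropy
```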