Conditional Independence
When Independence Emerges from Conditioning
In complex systems, events that are correlated marginally (without any conditioning) may become independent once we condition on some intermediate variable. Conversely, events that are marginally independent may become dependent once we condition on a common effect. Conditional independence is the language for expressing these relationships precisely.
Graphical models (Bayesian networks, factor graphs, Markov random fields) are nothing but compact representations of conditional independence structures. The belief propagation algorithm passes messages along graph edges by repeatedly applying Bayes' theorem to locally conditionally independent components. Understanding conditional independence is a prerequisite for everything in probabilistic inference.
Definition: Conditional Independence
Events $A$ and $B$ are conditionally independent given $C$, written $A \perp B \mid C$, if $P(C) > 0$ and
$$P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C).$$
Equivalently (when $P(B \cap C) > 0$):
$$P(A \mid B \cap C) = P(A \mid C),$$
i.e., given $C$, knowledge of $B$ adds no further information about $A$.
Conditional independence given $C$ does NOT imply marginal independence, and marginal independence does NOT imply conditional independence. The two notions are logically unrelated and can hold in either combination.
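The definition can be checked mechanically on any finite joint distribution. The sketch below (all numbers assumed for illustration) tests the factorisation $P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C)$ by brute-force summation over a joint pmf on $\{0,1\}^3$:

```python
from itertools import product

def cond_indep(joint, tol=1e-12):
    """Check P(A∩B|C) = P(A|C) P(B|C) for a joint pmf over (a, b, c) in {0,1}^3."""
    for c0 in (0, 1):
        pc  = sum(p for (a, b, c), p in joint.items() if c == c0)
        pa  = sum(p for (a, b, c), p in joint.items() if a and c == c0) / pc
        pb  = sum(p for (a, b, c), p in joint.items() if b and c == c0) / pc
        pab = sum(p for (a, b, c), p in joint.items() if a and b and c == c0) / pc
        if abs(pab - pa * pb) > tol:
            return False
    return True

# Assumed example: given C = c, A and B are i.i.d. Bernoulli(q_c),
# so A ⊥ B | C holds by construction (a common-cause structure).
q = {0: 0.2, 1: 0.9}
joint = {(a, b, c): 0.5 * (q[c] if a else 1 - q[c]) * (q[c] if b else 1 - q[c])
         for a, b, c in product((0, 1), repeat=3)}
print(cond_indep(joint))   # True
```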
Example: Conditionally Independent but Marginally Dependent
A student passes an exam ($A$) or fails, and the exam difficulty is high ($H$) or low ($H^c$). Suppose:
- $P(H) = q$, with $0 < q < 1$
- $P(A \mid H) = p_1$ (hard exam: student more likely fails)
- $P(A \mid H^c) = p_2 > p_1$ (easy exam: student more likely passes)
- $B$ = "neighbour passes the same exam", with the same conditional probabilities as $A$ and conditionally independent of $A$ given $H$.
Show that $A$ and $B$ are marginally dependent (correlated) but conditionally independent given $H$.
Marginal probability of passing
By total probability: $P(A) = P(A \mid H)\,P(H) + P(A \mid H^c)\,P(H^c) = q\,p_1 + (1-q)\,p_2$. Similarly $P(B) = q\,p_1 + (1-q)\,p_2$.
Joint marginal probability
By total probability over $H$:
$$P(A \cap B) = q\,P(A \cap B \mid H) + (1-q)\,P(A \cap B \mid H^c) = q\,p_1^2 + (1-q)\,p_2^2.$$
But
$$q\,p_1^2 + (1-q)\,p_2^2 - \bigl(q\,p_1 + (1-q)\,p_2\bigr)^2 = q(1-q)(p_1 - p_2)^2 > 0$$
whenever $p_1 \neq p_2$, so $P(A \cap B) > P(A)\,P(B)$. So $A$ and $B$ are marginally dependent.
Conditional independence
Given $H$ (and likewise given $H^c$), by assumption $P(A \cap B \mid H) = P(A \mid H)\,P(B \mid H)$. So $A \perp B \mid H$.
Lesson
The exam difficulty is a common cause of both $A$ and $B$. Observing both pass together makes it more likely the exam was easy, hence the positive correlation. But once we know the exam difficulty, $A$ and $B$ provide no further information about each other.
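The covariance identity behind this computation, $P(A \cap B) - P(A)\,P(B) = q(1-q)(p_1 - p_2)^2$, can be confirmed symbolically. A minimal sketch using sympy (the symbols $q$, $p_1$, $p_2$ follow the setup above):

```python
import sympy as sp

# q = P(exam hard), p1 = P(pass | hard), p2 = P(pass | easy)
q, p1, p2 = sp.symbols('q p1 p2')

P_A  = q*p1 + (1 - q)*p2           # total probability; P(B) is the same
P_AB = q*p1**2 + (1 - q)*p2**2     # A, B conditionally independent given difficulty

cov = P_AB - P_A**2                # P(A∩B) - P(A)P(B)
print(sp.simplify(cov - q*(1 - q)*(p1 - p2)**2))   # 0, confirming the identity
```

The covariance vanishes exactly when $p_1 = p_2$, i.e., when difficulty has no effect on passing — the only case in which $A$ and $B$ are also marginally independent.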
Example: Marginally Independent but Conditionally Dependent
Toss two fair coins independently: let $A$ = "first coin heads" and $B$ = "second coin heads". They are independent. Now condition on $C$ = "exactly one head". Show that $A$ and $B$ become dependent given $C$.
Marginal independence
$P(A) = P(B) = \tfrac{1}{2}$ and $P(A \cap B) = \tfrac{1}{4} = P(A)\,P(B)$.
Conditional probabilities
$P(C) = P(\{HT, TH\}) = \tfrac{1}{2}$, and $P(A \cap C) = P(\{HT\}) = \tfrac{1}{4}$, so $P(A \mid C) = \tfrac{1/4}{1/2} = \tfrac{1}{2}$. $P(B \mid C) = \tfrac{1}{2}$ by symmetry. $P(A \cap B \mid C) = 0$, since two heads is incompatible with exactly one head. But $P(A \mid C)\,P(B \mid C) = \tfrac{1}{4} \neq 0$. So $A$ and $B$ are conditionally dependent given $C$.
Lesson
$C$ is a common effect (collider): observing "exactly one head" creates dependence between the two coins. Conditioning on a common effect induces correlation between otherwise independent causes. This "explaining-away" phenomenon is fundamental in Bayesian networks.
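A brute-force enumeration over the four equally likely outcomes makes the collider effect concrete; a minimal sketch in exact arithmetic:

```python
from itertools import product
from fractions import Fraction

# Two fair coins: each outcome (a, b) in {0,1}^2 has probability 1/4.
outcomes = {(a, b): Fraction(1, 4) for a, b in product((0, 1), repeat=2)}

# Restrict to the event C = "exactly one head".
C = {o: p for o, p in outcomes.items() if sum(o) == 1}
pC = sum(C.values())                                        # 1/2

pA_C  = sum(p for (a, b), p in C.items() if a) / pC         # P(A | C) = 1/2
pB_C  = sum(p for (a, b), p in C.items() if b) / pC         # P(B | C) = 1/2
pAB_C = sum(p for (a, b), p in C.items() if a and b) / pC   # P(A ∩ B | C) = 0

print(pA_C, pB_C, pAB_C)        # 1/2 1/2 0
print(pAB_C == pA_C * pB_C)     # False: A and B are dependent given C
```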
Definition: Markov Chain (Three Events)
Three events $A$, $B$, $C$ with positive probability form a Markov chain, written $A \to B \to C$, if
$$P(C \mid A \cap B) = P(C \mid B).$$
Equivalently, $A$ and $C$ are conditionally independent given $B$:
$$P(A \cap C \mid B) = P(A \mid B)\,P(C \mid B),$$
i.e., given $B$, knowledge of $A$ provides no additional information about $C$.
The notation $A \to B \to C$ does NOT imply a temporal or causal ordering; it is a statement about conditional independence. Importantly, $A \to B \to C$ is equivalent to $C \to B \to A$: the chain is symmetric in $A$ and $C$ given $B$.
Theorem: Markov Chain Implies Conditional Independence
If $A \to B \to C$ is a Markov chain, then:
- $A$ and $C$ are conditionally independent given $B$: $P(A \cap C \mid B) = P(A \mid B)\,P(C \mid B)$.
- The chain is symmetric: $C \to B \to A$ is also a Markov chain.
Prove $A \perp C \mid B$
Assume $P(A \cap B) > 0$. By the chain rule and the Markov condition:
$$P(A \cap B \cap C) = P(A \cap B)\,P(C \mid A \cap B) = P(A \cap B)\,P(C \mid B).$$
Dividing by $P(B)$:
$$P(A \cap C \mid B) = \frac{P(A \cap B)}{P(B)}\,P(C \mid B) = P(A \mid B)\,P(C \mid B).$$
This is the definition of $A \perp C \mid B$.
Symmetry
The factorisation $P(A \cap C \mid B) = P(A \mid B)\,P(C \mid B)$ can equally be written as $P(C \cap A \mid B) = P(C \mid B)\,P(A \mid B)$ (since both equal $P(A \cap B \cap C)/P(B)$ and $\cap$ is symmetric). Hence $C \perp A \mid B$, which gives $P(A \mid B \cap C) = P(A \mid B)$, i.e., $C \to B \to A$.
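Both claims can be spot-checked numerically: build a joint pmf in which $P(C \mid A \cap B) = P(C \mid B)$ holds by construction, then verify the factorisation and the reversed chain. A minimal sketch with assumed toy conditional tables:

```python
from itertools import product

# Assumed toy tables: P(A = a), P(B = b | A = a), P(C = c | B = b).
pA   = {0: 0.3, 1: 0.7}
pB_A = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.4, (1, 1): 0.6}     # key (b, a)
pC_B = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.25, (1, 1): 0.75}   # key (c, b)

# Chain-rule joint: P(c | a, b) = P(c | b) by construction, so A -> B -> C.
joint = {(a, b, c): pA[a] * pB_A[b, a] * pC_B[c, b]
         for a, b, c in product((0, 1), repeat=3)}

def P(pred):
    """Probability of the event defined by a predicate on (a, b, c)."""
    return sum(p for abc, p in joint.items() if pred(*abc))

for b0 in (0, 1):
    pb    = P(lambda a, b, c: b == b0)
    pa_b  = P(lambda a, b, c: a == 1 and b == b0) / pb
    pc_b  = P(lambda a, b, c: c == 1 and b == b0) / pb
    pac_b = P(lambda a, b, c: a == 1 and c == 1 and b == b0) / pb
    assert abs(pac_b - pa_b * pc_b) < 1e-12          # P(A∩C|B) = P(A|B)P(C|B)
    for c0 in (0, 1):                                # reversed chain C -> B -> A:
        pbc   = P(lambda a, b, c: b == b0 and c == c0)
        pa_bc = P(lambda a, b, c: a == 1 and b == b0 and c == c0) / pbc
        assert abs(pa_bc - pa_b) < 1e-12             # P(A | B∩C) = P(A | B)

print("A -> B -> C verified, and C -> B -> A holds as well.")
```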
Why This Matters: Markov Chains and Factor Graphs
In channel decoding, the code graph imposes a Markov structure on the encoded bits. For a convolutional code, consecutive encoder states form a Markov chain: $S_1 \to S_2 \to S_3 \to \cdots$. The Viterbi algorithm exploits this structure: it propagates messages along the trellis using exactly the chain rule of conditional probability.
More generally, factor graphs (used in LDPC and turbo code decoding) are graphical representations of conditional independence structures. The belief propagation algorithm on a tree-structured factor graph is an efficient algorithm for computing marginal posteriors, and its correctness follows directly from the Markov properties encoded in the graph.
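To make the trellis recursion concrete, here is a toy Viterbi sketch: a generic max-product recursion over an assumed two-state chain with assumed transition and emission probabilities, not the decoder of any particular code. The Markov property is what allows each stage to consult only the previous state:

```python
import math

# Assumed toy model: two hidden states, observations from {'x', 'y'}.
states    = (0, 1)
log_init  = {s: math.log(0.5) for s in states}
log_trans = {(0, 0): math.log(0.9), (0, 1): math.log(0.1),   # key (from, to)
             (1, 0): math.log(0.2), (1, 1): math.log(0.8)}
log_emit  = {(0, 'x'): math.log(0.7), (0, 'y'): math.log(0.3),
             (1, 'x'): math.log(0.1), (1, 'y'): math.log(0.9)}

def viterbi(obs):
    """Most likely state sequence. The Markov chain S1 -> S2 -> ... lets the
    maximisation over all paths factor into a stage-by-stage trellis recursion."""
    score = {s: log_init[s] + log_emit[s, obs[0]] for s in states}
    back = []                      # back-pointers, one dict per trellis stage
    for o in obs[1:]:
        prev, new = {}, {}
        for s in states:
            # Only the previous state matters: conditional independence at work.
            r = max(states, key=lambda r: score[r] + log_trans[r, s])
            prev[s] = r
            new[s] = score[r] + log_trans[r, s] + log_emit[s, o]
        back.append(prev)
        score = new
    path = [max(states, key=score.get)]
    for prev in reversed(back):    # trace the best path backwards
        path.append(prev[path[-1]])
    return path[::-1]

print(viterbi(['x', 'x', 'y', 'y']))   # most likely hidden-state path
```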
Quick Check
$A \to B \to C$ is a Markov chain. Which of the following is guaranteed to hold: that $A$ and $C$ are marginally independent, or that $A \perp C \mid B$?
Answer: $A \perp C \mid B$. Marginal independence of $A$ and $C$ is NOT guaranteed.
This is the definition of the Markov property: given $B$, event $A$ is independent of $C$. Equivalently $P(A \cap C \mid B) = P(A \mid B)\,P(C \mid B)$.
Historical Note: Andrei Markov and the Chain That Bears His Name
Andrei Andreyevich Markov (1856–1922) introduced what we now call Markov chains in 1906 as part of a dispute with Pavel Nekrasov. Nekrasov had claimed that the law of large numbers required independence. Markov constructed a counterexample: a dependent sequence (alternating letters from Pushkin's poem "Eugene Onegin") for which the law of large numbers still held. The chain Markov used was exactly what bears his name: a sequence where each element depends only on the immediately preceding one.
Today, Markov chains are the foundation of stochastic processes (Chapters 13–15), Monte Carlo methods, hidden Markov models in speech recognition, and the PageRank algorithm. The concept of the Markov property, "the future is independent of the past given the present", is arguably the single most important simplifying assumption in applied probability.
Key Takeaway
Conditional independence is the language of graphical models. $A \perp B \mid C$ means: once you know $C$, $A$ and $B$ carry no mutual information. Conditioning on a common cause removes correlation between its effects; conditioning on a common effect (collider) creates correlation between otherwise independent causes. The Markov chain $A \to B \to C$ is the simplest conditional independence structure: $A$ and $C$ are screened off by $B$.