Conditional Independence

When Independence Emerges from Conditioning

In complex systems, events that are correlated marginally (without any conditioning) may become independent once we condition on some intermediate variable. Conversely, events that are marginally independent may become dependent once we condition on a common effect. Conditional independence is the language for expressing these relationships precisely.

Graphical models (Bayesian networks, factor graphs, Markov random fields) are nothing but compact representations of conditional independence structures. The belief propagation algorithm passes messages along graph edges by repeatedly applying Bayes' theorem to locally conditionally independent components. Understanding conditional independence is a prerequisite for everything in probabilistic inference.

Definition:

Conditional Independence

Events $A$ and $B$ are conditionally independent given $C$, written $A \perp B \mid C$, if $\mathbb{P}(C) > 0$ and
$$\mathbb{P}(A \cap B \mid C) = \mathbb{P}(A \mid C)\,\mathbb{P}(B \mid C).$$
Equivalently (when $\mathbb{P}(B \cap C) > 0$):
$$\mathbb{P}(A \mid B \cap C) = \mathbb{P}(A \mid C),$$
i.e., given $C$, knowledge of $B$ adds no further information about $A$.
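To make the definition operational, here is a minimal Python sketch (an illustration added here, not part of the text) that checks the factorisation above for binary events represented by a joint table `p[(a, b, c)]`:

```python
from itertools import product

def cond_independent(p, tol=1e-12):
    """Check A ⊥ B | C for a joint table p[(a, b, c)] over binary A, B, C."""
    for c in (0, 1):
        p_c = sum(p[(a, b, c)] for a, b in product((0, 1), repeat=2))
        if p_c == 0:                     # only condition on events of positive probability
            continue
        for a, b in product((0, 1), repeat=2):
            p_ab = p[(a, b, c)] / p_c                           # P(A=a, B=b | C=c)
            p_a  = sum(p[(a, y, c)] for y in (0, 1)) / p_c      # P(A=a | C=c)
            p_b  = sum(p[(x, b, c)] for x in (0, 1)) / p_c      # P(B=b | C=c)
            if abs(p_ab - p_a * p_b) > tol:
                return False
    return True
```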

Conditional independence given $C$ does NOT imply marginal independence, and marginal independence does NOT imply conditional independence. The two notions are logically unrelated and can hold in either combination.

Example: Conditionally Independent but Marginally Dependent

A student passes an exam ($A$) or fails, and the exam difficulty is high ($C = 1$) or low ($C = 0$). Suppose:

  • $\mathbb{P}(C=1) = 0.5$
  • $\mathbb{P}(A=1 \mid C=1) = 0.3$ (hard exam: the student is more likely to fail)
  • $\mathbb{P}(A=1 \mid C=0) = 0.8$ (easy exam: the student is more likely to pass)
  • $B$ = "neighbour passes the same exam", with the same conditional probabilities as $A$ and conditionally independent of $A$ given $C$.

Show that $A$ and $B$ are marginally dependent (correlated) but conditionally independent given $C$.
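One way to carry out the check is numerically. The following Python sketch uses only the probabilities listed above; the printed values ($0.365$ versus $0.3025$) follow from them:

```python
# Probabilities from the example; A ⊥ B | C is built into the construction.
p_c    = {0: 0.5, 1: 0.5}          # P(C = c)
p_pass = {0: 0.8, 1: 0.3}          # P(pass | C = c), same for A and B

def joint_ab(a, b):
    """P(A = a, B = b), marginalising over C and using A ⊥ B | C."""
    return sum(p_c[c]
               * (p_pass[c] if a else 1 - p_pass[c])
               * (p_pass[c] if b else 1 - p_pass[c])
               for c in (0, 1))

p_a1 = joint_ab(1, 0) + joint_ab(1, 1)   # P(A = 1) = 0.55
p_b1 = joint_ab(0, 1) + joint_ab(1, 1)   # P(B = 1) = 0.55
print(joint_ab(1, 1), p_a1 * p_b1)       # 0.365 vs 0.3025 -> marginally dependent
```

Given $C$, the factorisation $\mathbb{P}(A, B \mid C) = \mathbb{P}(A \mid C)\,\mathbb{P}(B \mid C)$ holds by construction, so the pair is conditionally independent yet marginally dependent.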

Example: Marginally Independent but Conditionally Dependent

Toss two fair coins independently: let $A$ = "first coin heads" and $B$ = "second coin heads". They are independent. Now condition on $C$ = "exactly one head". Show that $A$ and $B$ become dependent given $C$.
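A small Python sketch, enumerating the four equally likely outcomes, makes the collider effect visible (exact arithmetic via fractions is an implementation choice, not from the text):

```python
from itertools import product
from fractions import Fraction

# Two fair coins; A = first coin heads, B = second coin heads, C = exactly one head.
outcomes = list(product([0, 1], repeat=2))          # each outcome has probability 1/4
p = Fraction(1, 4)

def prob(pred):
    return sum(p for o in outcomes if pred(o))

A = lambda o: o[0] == 1
B = lambda o: o[1] == 1
C = lambda o: o[0] + o[1] == 1

p_c = prob(C)                                                   # 1/2
p_a_given_c  = prob(lambda o: A(o) and C(o)) / p_c              # 1/2
p_b_given_c  = prob(lambda o: B(o) and C(o)) / p_c              # 1/2
p_ab_given_c = prob(lambda o: A(o) and B(o) and C(o)) / p_c     # 0
print(p_ab_given_c, p_a_given_c * p_b_given_c)                  # 0 vs 1/4: dependent given C
```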

Definition:

Markov Chain (Three Events)

Three events $A$, $B$, $C$ with positive probability form a Markov chain $A \multimap B \multimap C$ (or $A \to B \to C$) if
$$\mathbb{P}(A \cap B \cap C) = \mathbb{P}(A)\,\mathbb{P}(B \mid A)\,\mathbb{P}(C \mid B).$$
Equivalently, $A$ and $C$ are conditionally independent given $B$:
$$\mathbb{P}(C \mid A \cap B) = \mathbb{P}(C \mid B),$$
i.e., given $B$, knowledge of $A$ provides no additional information about $C$.

The Markov chain notation $A \multimap B \multimap C$ does NOT imply a temporal or causal ordering; it is a statement about conditional independence. Importantly, $A \multimap B \multimap C$ is equivalent to $C \multimap B \multimap A$: the chain is symmetric in $A$ and $C$ given $B$.

Theorem: Markov Chain Implies Conditional Independence

If $A \multimap B \multimap C$ is a Markov chain, then:

  1. $A$ and $C$ are conditionally independent given $B$: $A \perp C \mid B$.
  2. The chain is symmetric: $C \multimap B \multimap A$.
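A short proof sketch, assuming all conditioning events have positive probability, written in the notation of the definition:

```latex
\begin{align*}
\mathbb{P}(C \mid A \cap B)
  &= \frac{\mathbb{P}(A \cap B \cap C)}{\mathbb{P}(A \cap B)}
   = \frac{\mathbb{P}(A)\,\mathbb{P}(B \mid A)\,\mathbb{P}(C \mid B)}{\mathbb{P}(A)\,\mathbb{P}(B \mid A)}
   = \mathbb{P}(C \mid B), \\[4pt]
\mathbb{P}(A \cap C \mid B)
  &= \frac{\mathbb{P}(A \cap B \cap C)}{\mathbb{P}(B)}
   = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}\,\mathbb{P}(C \mid B)
   = \mathbb{P}(A \mid B)\,\mathbb{P}(C \mid B).
\end{align*}
```

The second line is exactly $A \perp C \mid B$, and since the final product is symmetric in $A$ and $C$, the reversed chain $C \multimap B \multimap A$ follows.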

Why This Matters: Markov Chains and Factor Graphs

In channel decoding, the code graph imposes a Markov structure on the encoded bits. For a convolutional code, consecutive encoder states form a Markov chain: $S_0 \multimap S_1 \multimap \cdots \multimap S_n$. The Viterbi algorithm exploits this structure: it propagates messages along the trellis using exactly the chain rule of conditional probability.
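For illustration only, here is a minimal max-product (Viterbi-style) recursion on a toy two-state chain; the transition table, emission table, and observation sequence below are invented for the sketch and are not from the text:

```python
import numpy as np

# Toy two-state Markov chain S_0 -> S_1 -> ... -> S_n with noisy observations.
trans = np.array([[0.9, 0.1],     # trans[i, j] = P(S_t = j | S_{t-1} = i)
                  [0.2, 0.8]])
emit  = np.array([[0.7, 0.3],     # emit[i, y]  = P(obs_t = y | S_t = i)
                  [0.4, 0.6]])
prior = np.array([0.5, 0.5])      # P(S_0 = i)
obs   = [0, 1, 1, 0]              # received symbols

# Max-product recursion: delta[j] = probability of the best path ending in state j.
delta = prior * emit[:, obs[0]]
back = []
for y in obs[1:]:
    scores = delta[:, None] * trans          # scores[i, j] = delta[i] * P(j | i)
    back.append(scores.argmax(axis=0))       # best predecessor for each state j
    delta = scores.max(axis=0) * emit[:, y]

# Backtrack the most likely state sequence.
path = [int(delta.argmax())]
for bp in reversed(back):
    path.append(int(bp[path[-1]]))
path.reverse()
print(path)
```

The recursion is valid only because, given the current state, the past states carry no further information about the future: precisely the Markov property above.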

More generally, factor graphs (used in LDPC and turbo code decoding) are graphical representations of conditional independence structures. The belief propagation algorithm on a tree-structured factor graph is an efficient algorithm for computing marginal posteriors, and its correctness follows directly from the Markov properties encoded in the graph.

Quick Check

$A \multimap B \multimap C$ is a Markov chain. Which of the following is guaranteed to hold?

  • $\mathbb{P}(A \cap B \cap C) = \mathbb{P}(A)\,\mathbb{P}(B)\,\mathbb{P}(C)$
  • $\mathbb{P}(A \mid B, C) = \mathbb{P}(A \mid B)$
  • $A$ and $C$ are marginally independent
  • $\mathbb{P}(C \mid A, B) = \mathbb{P}(C \mid A)$

Historical Note: Andrei Markov and the Chain That Bears His Name


Andrei Andreyevich Markov (1856–1922) introduced what we now call Markov chains in 1906 as part of a dispute with Pavel Nekrasov. Nekrasov had claimed that the law of large numbers required independence. Markov constructed a counterexample: a dependent sequence (alternating letters from Pushkin's poem "Eugene Onegin") for which the law of large numbers still held. The chain Markov used was exactly what bears his name: a sequence where each element depends only on the immediately preceding one.

Today, Markov chains are the foundation of stochastic processes (Chapters 13–15), Monte Carlo methods, hidden Markov models in speech recognition, and the PageRank algorithm. The concept of the Markov property ("the future is independent of the past given the present") is arguably the single most important simplifying assumption in applied probability.

Key Takeaway

Conditional independence is the language of graphical models. $A \perp B \mid C$ means: once you know $C$, learning $B$ tells you nothing further about $A$. Conditioning on a common cause removes correlation between its effects; conditioning on a common effect (a collider) creates correlation between otherwise independent causes. The Markov chain $A \multimap B \multimap C$ is the simplest conditional independence structure: $A$ and $C$ are screened off by $B$.