Definition and Transition Probabilities

Why Markov Chains?

Many systems we encounter in engineering evolve randomly over time, but with a crucial simplification: the future depends on the past only through the present state. A packet retransmission protocol remembers only whether the last packet was acknowledged or lost; a wireless channel model tracks only the current fading state. This "memoryless" property, the Markov property, transforms the intractable problem of tracking full histories into an elegant matrix algebra.

Discrete-time Markov chains (DTMCs) are the simplest and most widely used class of Markov models. They appear throughout communications: channel models, queueing, random access protocols, iterative decoding, and network analysis. The theory we develop in this chapter is the foundation for all of these.

Definition:

Markov Property and Discrete-Time Markov Chain

Let {Xn:nβ‰₯0}\{X_n : n \geq 0\} be a sequence of random variables taking values in a countable set S\mathcal{S} (the state space). The process is a discrete-time Markov chain (DTMC) if it satisfies the Markov property: for all nβ‰₯0n \geq 0 and all states i0,i1,…,inβˆ’1,i,j∈Si_0, i_1, \ldots, i_{n-1}, i, j \in \mathcal{S},

P(Xn+1=j∣Xn=i,Xnβˆ’1=inβˆ’1,…,X0=i0)=P(Xn+1=j∣Xn=i).\mathbb{P}(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = \mathbb{P}(X_{n+1} = j \mid X_n = i).

The chain is time-homogeneous if the right-hand side does not depend on nn:

pijβ‰œP(Xn+1=j∣Xn=i),forΒ allΒ nβ‰₯0.p_{ij} \triangleq \mathbb{P}(X_{n+1} = j \mid X_n = i), \quad \text{for all } n \geq 0.

Unless stated otherwise, all Markov chains in this chapter are time-homogeneous.

The Markov property says: given the present, the future is independent of the past. This is sometimes called the "memoryless" property, though it should not be confused with the memoryless property of the exponential or geometric distributions.
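The Markov property also tells us exactly how to simulate a chain: to draw the next state we need only the current state and its row of transition probabilities, never the earlier history. A minimal NumPy sketch (the two-state matrix at the bottom is an illustrative assumption, not one of the chapter's examples):

```python
import numpy as np

def simulate_dtmc(P, x0, n_steps, rng=None):
    """Simulate a time-homogeneous DTMC with row-stochastic transition
    matrix P, starting from state x0 (0-indexed)."""
    rng = rng if rng is not None else np.random.default_rng()
    P = np.asarray(P)
    path = [x0]
    for _ in range(n_steps):
        # Markov property: the next state is drawn from row P[current];
        # nothing earlier in the path is needed.
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

# Illustrative two-state chain.
P = [[0.9, 0.1],
     [0.2, 0.8]]
print(simulate_dtmc(P, x0=0, n_steps=10, rng=np.random.default_rng(0)))
```

Time-homogeneity is what lets the same matrix `P` be reused at every step of the loop.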


Markov property

The property that the conditional distribution of Xn+1X_{n+1} given the entire past X0,…,XnX_0, \ldots, X_n depends only on XnX_n. Formally: P(Xn+1=j∣Xn=i,Xnβˆ’1,…,X0)=P(Xn+1=j∣Xn=i)\mathbb{P}(X_{n+1} = j \mid X_n = i, X_{n-1}, \ldots, X_0) = \mathbb{P}(X_{n+1} = j \mid X_n = i).

Related: Markov Property and Discrete-Time Markov Chain

Historical Note: Andrey Markov and Chains of Linked Events

Early 20th century

Andrey Andreyevich Markov (1856–1922) introduced dependent sequences in 1906 to prove that the law of large numbers does not require independence. His original application was a statistical analysis of the alternation of vowels and consonants in Pushkin's poem Eugene Onegin. This seemingly literary exercise established that convergence theorems extend far beyond i.i.d. sequences, opening the door to the modern theory of stochastic processes. The term "Markov chain" was coined by Sergei Bernstein in the 1920s.

Definition:

Transition Matrix

The transition matrix of a time-homogeneous DTMC on S={1,2,…,M}\mathcal{S} = \{1, 2, \ldots, M\} (or a countable set) is the matrix P\mathbf{P} whose (i,j)(i,j)-entry is

[P]ij=pij=P(Xn+1=j∣Xn=i).[\mathbf{P}]_{ij} = p_{ij} = \mathbb{P}(X_{n+1} = j \mid X_n = i).

The matrix P\mathbf{P} is a stochastic matrix: every entry is non-negative and each row sums to one:

pijβ‰₯0andβˆ‘j∈Spij=1,forΒ allΒ i∈S.p_{ij} \geq 0 \quad \text{and} \quad \sum_{j \in \mathcal{S}} p_{ij} = 1, \quad \text{for all } i \in \mathcal{S}.

Equivalently, P1=1\mathbf{P} \mathbf{1} = \mathbf{1}, where 1\mathbf{1} is the all-ones column vector.

We adopt the convention that rows of P\mathbf{P} correspond to the current state and columns to the next state. Some references transpose this convention; be careful when consulting different sources.
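The two defining properties of a stochastic matrix are easy to verify numerically, which is a useful sanity check before any further analysis. A short sketch (the example matrix is illustrative):

```python
import numpy as np

def is_row_stochastic(P, tol=1e-9):
    """Check the two defining properties: non-negative entries,
    and every row summing to one."""
    P = np.asarray(P, dtype=float)
    return bool(np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

P = np.array([[0.7, 0.3],
              [0.5, 0.5]])
print(is_row_stochastic(P))            # rows sum to 1 -> True
print(np.allclose(P @ np.ones(2), 1))  # equivalently, P 1 = 1 -> True
```

The second check is the matrix form P1 = 1 stated above.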


Stochastic matrix

A square matrix with non-negative entries whose rows each sum to one. Also called a row-stochastic matrix. If both rows and columns sum to one, it is doubly stochastic.

Related: Transition Matrix

Example: Two-State Markov Chain (Sunny/Rainy Weather)

A simplified weather model has two states: S={S,R}\mathcal{S} = \{S, R\} (Sunny, Rainy). On a sunny day, the probability of rain tomorrow is α=0.3\alpha = 0.3. On a rainy day, the probability of sun tomorrow is β=0.5\beta = 0.5. Find the transition matrix and compute P(X2=S∣X0=R)\mathbb{P}(X_2 = S \mid X_0 = R).
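One way to work this example numerically, assuming the state order (Sunny, Rainy): build the transition matrix from the given α and β, square it, and read off the (R, S) entry. A NumPy sketch:

```python
import numpy as np

# State order assumed here: index 0 = Sunny (S), index 1 = Rainy (R).
alpha, beta = 0.3, 0.5
P = np.array([[1 - alpha, alpha],    # from S: stay sunny 0.7, rain 0.3
              [beta, 1 - beta]])     # from R: sun 0.5, stay rainy 0.5

P2 = P @ P                  # two-step transition probabilities
print(round(P2[1, 0], 6))   # P(X2 = S | X0 = R) -> 0.6
```

By hand this is the same sum over the intermediate day: 0.5 · 0.7 + 0.5 · 0.5 = 0.6.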

Transition Diagrams

A DTMC is often visualized as a directed graph (transition diagram): each state is a node, and a directed edge from ii to jj with label pijp_{ij} is drawn whenever pij>0p_{ij} > 0. Self-loops represent pii>0p_{ii} > 0. The transition diagram provides immediate insight into the structure: which states can reach which, whether the chain has absorbing states, and whether it decomposes into independent sub-chains.

Theorem: Chapman-Kolmogorov Equation

Let pij(n)=P(Xn=j∣X0=i)p_{ij}^{(n)} = \mathbb{P}(X_n = j \mid X_0 = i) denote the nn-step transition probability. Then for all m,nβ‰₯0m, n \geq 0 and all states i,j∈Si, j \in \mathcal{S}:

pij(m+n)=βˆ‘k∈Spik(m) pkj(n).p_{ij}^{(m+n)} = \sum_{k \in \mathcal{S}} p_{ik}^{(m)} \, p_{kj}^{(n)}.

In matrix form: if P(n)\mathbf{P}^{(n)} denotes the matrix with entries pij(n)p_{ij}^{(n)}, then P(m+n)=P(m)P(n)\mathbf{P}^{(m+n)} = \mathbf{P}^{(m)} \mathbf{P}^{(n)}. In particular, P(n)=Pn\mathbf{P}^{(n)} = \mathbf{P}^{n} (the ordinary nn-th matrix power).

To go from ii to jj in m+nm + n steps, the chain must be at some intermediate state kk after mm steps. The Chapman-Kolmogorov equation sums over all possible intermediaries, weighting by the probability of each two-leg journey.
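The matrix identity is easy to confirm numerically for any stochastic matrix; the random 4×4 matrix below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.random((4, 4))
P = A / A.sum(axis=1, keepdims=True)   # random 4x4 row-stochastic matrix

m, n = 3, 5
lhs = np.linalg.matrix_power(P, m + n)                             # P^(m+n)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)  # P^m P^n
print(np.allclose(lhs, rhs))   # True, as Chapman-Kolmogorov predicts
```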


Example: Three-State Chain: Computing P2\mathbf{P}^{2}

Consider a DTMC on S={1,2,3}\mathcal{S} = \{1, 2, 3\} with transition matrix

P=(0100.500.5010).\mathbf{P} = \begin{pmatrix} 0 & 1 & 0 \\ 0.5 & 0 & 0.5 \\ 0 & 1 & 0 \end{pmatrix}.

Compute P2\mathbf{P}^{2} and find the probability of returning to state 2 in exactly 2 steps.
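A quick numerical check of this example (note that Python indices are 0-based, so state 2 of the chain is index 1):

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

P2 = P @ P
print(P2)
print(P2[1, 1])   # p_22^(2): return to state 2 in exactly 2 steps -> 1.0
```

The return probability is 1: from state 2 the chain must jump to state 1 or 3, and both jump straight back, so the chain oscillates with period 2.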

Transition Matrix Powers: Visualizing Convergence

Enter a 3Γ—33 \times 3 transition matrix and observe how Pn\mathbf{P}^{n} evolves as nn increases. For irreducible aperiodic chains, each row converges to the stationary distribution Ο€\boldsymbol{\pi}.
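The same experiment can be run in a few lines of NumPy; the 3×3 matrix below is an illustrative irreducible, aperiodic example (assumed values, not a chapter example):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

for n in (1, 5, 20, 50):
    Pn = np.linalg.matrix_power(P, n)
    print(n, Pn[0])   # the first row approaches the stationary distribution

# By n = 50 all three rows agree to numerical precision:
P50 = np.linalg.matrix_power(P, 50)
print(np.allclose(P50, P50[0]))   # True
```

When every row of a power of P is (numerically) the same vector, the chain has forgotten its initial state.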


Quick Check

Which of the following are valid (row-)stochastic matrices? (More than one may qualify.)

(0.50.50.30.8)\begin{pmatrix} 0.5 & 0.5 \\ 0.3 & 0.8 \end{pmatrix}

(0.50.50.30.7)\begin{pmatrix} 0.5 & 0.5 \\ 0.3 & 0.7 \end{pmatrix}

(0.6βˆ’0.10.50.20.30.5)\begin{pmatrix} 0.6 & -0.1 & 0.5 \\ 0.2 & 0.3 & 0.5 \end{pmatrix}

(100.50.5)\begin{pmatrix} 1 & 0 \\ 0.5 & 0.5 \end{pmatrix}

Common Mistake: Row vs Column Convention

Mistake:

Confusing row-stochastic matrices (our convention, where rows sum to 1) with column-stochastic matrices (where columns sum to 1). Some references, especially in applied mathematics and PageRank, use the column convention.

Correction:

Always check which convention a source uses. In our convention, the state distribution is a row vector Ο€\boldsymbol{\pi} satisfying Ο€P=Ο€\boldsymbol{\pi} \mathbf{P} = \boldsymbol{\pi} (left eigenvector). In the column convention it would be PΟ€=Ο€\mathbf{P} \boldsymbol{\pi} = \boldsymbol{\pi} (right eigenvector).
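The convention matters in computation too. Under our row convention, the stationary distribution is a left eigenvector of P for eigenvalue 1, which NumPy finds as a right eigenvector of the transpose. A sketch using the two-state weather matrix (α = 0.3, β = 0.5, state order Sunny then Rainy):

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.5, 0.5]])

# Row convention: pi is a LEFT eigenvector of P for eigenvalue 1,
# i.e. a right eigenvector of P.T.
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1.0))   # locate the eigenvalue-1 eigenvector
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                  # normalize to a probability vector

print(pi)                           # [0.625 0.375] for this matrix
print(np.allclose(pi @ P, pi))      # pi P = pi -> True
```

Had we used `np.linalg.eig(P)` without transposing, we would have found the right eigenvector (the all-ones vector, up to scale) instead of the stationary distribution.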

Key Takeaway

A discrete-time Markov chain is fully characterized by its transition matrix P\mathbf{P} and initial distribution. The Chapman-Kolmogorov equation shows that nn-step transition probabilities are given by Pn\mathbf{P}^{n}, reducing multi-step analysis to matrix powers.