Definition and Transition Probabilities

Why Markov Chains?

Many systems we encounter in engineering evolve randomly over time, but with a crucial simplification: the future depends on the past only through the present state. A packet retransmission protocol remembers only whether the last packet was acknowledged or lost; a wireless channel model tracks only the current fading state. This "memoryless" property, the Markov property, transforms the intractable problem of tracking full histories into an elegant matrix algebra.

Discrete-time Markov chains (DTMCs) are the simplest and most widely used class of Markov models. They appear throughout communications: channel models, queueing, random access protocols, iterative decoding, and network analysis. The theory we develop in this chapter is the foundation for all of these.

Definition:

Markov Property and Discrete-Time Markov Chain

Let {Xn:nβ‰₯0}\{X_n : n \geq 0\} be a sequence of random variables taking values in a countable set S\mathcal{S} (the state space). The process is a discrete-time Markov chain (DTMC) if it satisfies the Markov property: for all nβ‰₯0n \geq 0 and all states i0,i1,…,inβˆ’1,i,j∈Si_0, i_1, \ldots, i_{n-1}, i, j \in \mathcal{S},

P(Xn+1=j∣Xn=i,Xnβˆ’1=inβˆ’1,…,X0=i0)=P(Xn+1=j∣Xn=i).\mathbb{P}(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = \mathbb{P}(X_{n+1} = j \mid X_n = i).

The chain is time-homogeneous if the right-hand side does not depend on nn:

pijβ‰œP(Xn+1=j∣Xn=i),forΒ allΒ nβ‰₯0.p_{ij} \triangleq \mathbb{P}(X_{n+1} = j \mid X_n = i), \quad \text{for all } n \geq 0.

Unless stated otherwise, all Markov chains in this chapter are time-homogeneous.

The Markov property says: given the present, the future is independent of the past. This is sometimes called the "memoryless" property, though it should not be confused with the memoryless property of the exponential or geometric distributions.
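The Markov property also tells us exactly how to simulate a chain: to draw the next state we need only the current state and its row of transition probabilities, never the earlier history. A minimal NumPy sketch (the two-state matrix at the bottom is an illustrative assumption, not one of the chapter's examples):

```python
import numpy as np

def simulate_dtmc(P, x0, n_steps, rng=None):
    """Simulate a time-homogeneous DTMC with row-stochastic transition
    matrix P, starting from state x0 (0-indexed)."""
    rng = rng if rng is not None else np.random.default_rng()
    P = np.asarray(P)
    path = [x0]
    for _ in range(n_steps):
        # Markov property: the next state is drawn from row P[current];
        # nothing earlier in the path is needed.
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

# Illustrative two-state chain.
P = [[0.9, 0.1],
     [0.2, 0.8]]
print(simulate_dtmc(P, x0=0, n_steps=10, rng=np.random.default_rng(0)))
```

Time-homogeneity is what lets the same matrix `P` be reused at every step of the loop.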


Markov property

The property that the conditional distribution of Xn+1X_{n+1} given the entire past X0,…,XnX_0, \ldots, X_n depends only on XnX_n. Formally: P(Xn+1=j∣Xn=i,Xnβˆ’1,…,X0)=P(Xn+1=j∣Xn=i)\mathbb{P}(X_{n+1} = j \mid X_n = i, X_{n-1}, \ldots, X_0) = \mathbb{P}(X_{n+1} = j \mid X_n = i).

Related: Markov Property and Discrete-Time Markov Chain

Historical Note: Andrey Markov and Chains of Linked Events

Early 20th century

Andrey Andreyevich Markov (1856–1922) introduced dependent sequences in 1906 to prove that the law of large numbers does not require independence. His original application was a statistical analysis of the alternation of vowels and consonants in Pushkin's poem Eugene Onegin. This seemingly literary exercise established that convergence theorems extend far beyond i.i.d. sequences, opening the door to the modern theory of stochastic processes. The term "Markov chain" was coined by Sergei Bernstein in the 1920s.

Definition:

Transition Matrix

The transition matrix of a time-homogeneous DTMC on S={1,2,…,M}\mathcal{S} = \{1, 2, \ldots, M\} (or a countable set) is the matrix P\mathbf{P} whose (i,j)(i,j)-entry is

[P]ij=pij=P(Xn+1=j∣Xn=i).[\mathbf{P}]_{ij} = p_{ij} = \mathbb{P}(X_{n+1} = j \mid X_n = i).

The matrix P\mathbf{P} is a stochastic matrix: every entry is non-negative and each row sums to one:

pijβ‰₯0andβˆ‘j∈Spij=1,forΒ allΒ i∈S.p_{ij} \geq 0 \quad \text{and} \quad \sum_{j \in \mathcal{S}} p_{ij} = 1, \quad \text{for all } i \in \mathcal{S}.

Equivalently, P1=1\mathbf{P} \mathbf{1} = \mathbf{1}, where 1\mathbf{1} is the all-ones column vector.

We adopt the convention that rows of P\mathbf{P} correspond to the current state and columns to the next state. Some references transpose this convention; be careful when consulting different sources.
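The two defining properties of a stochastic matrix are easy to verify numerically, which is a useful sanity check before any further analysis. A short sketch (the example matrix is illustrative):

```python
import numpy as np

def is_row_stochastic(P, tol=1e-9):
    """Check the two defining properties: non-negative entries,
    and every row summing to one."""
    P = np.asarray(P, dtype=float)
    return bool(np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

P = np.array([[0.7, 0.3],
              [0.5, 0.5]])
print(is_row_stochastic(P))            # rows sum to 1 -> True
print(np.allclose(P @ np.ones(2), 1))  # equivalently, P 1 = 1 -> True
```

The second check is the matrix form P1 = 1 stated above.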


Stochastic matrix

A square matrix with non-negative entries whose rows each sum to one. Also called a row-stochastic matrix. If both rows and columns sum to one, it is doubly stochastic.

Related: Transition Matrix

Example: Two-State Markov Chain (Sunny/Rainy Weather)

A simplified weather model has two states: S={S,R}\mathcal{S} = \{S, R\} (Sunny, Rainy). On a sunny day, the probability of rain tomorrow is α=0.3\alpha = 0.3. On a rainy day, the probability of sun tomorrow is β=0.5\beta = 0.5. Find the transition matrix and compute P(X2=S∣X0=R)\mathbb{P}(X_2 = S \mid X_0 = R).
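One way to work this example numerically, assuming the state order (Sunny, Rainy): build the transition matrix from the given α and β, square it, and read off the (R, S) entry. A NumPy sketch:

```python
import numpy as np

# State order assumed here: index 0 = Sunny (S), index 1 = Rainy (R).
alpha, beta = 0.3, 0.5
P = np.array([[1 - alpha, alpha],    # from S: stay sunny 0.7, rain 0.3
              [beta, 1 - beta]])     # from R: sun 0.5, stay rainy 0.5

P2 = P @ P                  # two-step transition probabilities
print(round(P2[1, 0], 6))   # P(X2 = S | X0 = R) -> 0.6
```

By hand this is the same sum over the intermediate day: 0.5 · 0.7 + 0.5 · 0.5 = 0.6.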

Transition Diagrams

A DTMC is often visualized as a directed graph (transition diagram): each state is a node, and a directed edge from ii to jj with label pijp_{ij} is drawn whenever pij>0p_{ij} > 0. Self-loops represent pii>0p_{ii} > 0. The transition diagram provides immediate insight into the structure: which states can reach which, whether the chain has absorbing states, and whether it decomposes into independent sub-chains.

Theorem: Chapman-Kolmogorov Equation

Let pij(n)=P(Xn=j∣X0=i)p_{ij}^{(n)} = \mathbb{P}(X_n = j \mid X_0 = i) denote the nn-step transition probability. Then for all m,nβ‰₯0m, n \geq 0 and all states i,j∈Si, j \in \mathcal{S}:

pij(m+n)=βˆ‘k∈Spik(m) pkj(n).p_{ij}^{(m+n)} = \sum_{k \in \mathcal{S}} p_{ik}^{(m)} \, p_{kj}^{(n)}.

In matrix form: if P(n)\mathbf{P}^{(n)} denotes the matrix with entries pij(n)p_{ij}^{(n)}, then P(m+n)=P(m)P(n)\mathbf{P}^{(m+n)} = \mathbf{P}^{(m)} \mathbf{P}^{(n)}. In particular, P(n)=Pn\mathbf{P}^{(n)} = \mathbf{P}^{n} (the ordinary nn-th matrix power).

To go from ii to jj in m+nm + n steps, the chain must be at some intermediate state kk after mm steps. The Chapman-Kolmogorov equation sums over all possible intermediaries, weighting by the probability of each two-leg journey.
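The matrix identity is easy to confirm numerically for any stochastic matrix; the random 4×4 matrix below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.random((4, 4))
P = A / A.sum(axis=1, keepdims=True)   # random 4x4 row-stochastic matrix

m, n = 3, 5
lhs = np.linalg.matrix_power(P, m + n)                             # P^(m+n)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)  # P^m P^n
print(np.allclose(lhs, rhs))   # True, as Chapman-Kolmogorov predicts
```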


Example: Three-State Chain: Computing P2\mathbf{P}^{2}

Consider a DTMC on S={1,2,3}\mathcal{S} = \{1, 2, 3\} with transition matrix

P=(0100.500.5010).\mathbf{P} = \begin{pmatrix} 0 & 1 & 0 \\ 0.5 & 0 & 0.5 \\ 0 & 1 & 0 \end{pmatrix}.

Compute P2\mathbf{P}^{2} and find the probability of returning to state 2 in exactly 2 steps.
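A quick numerical check of this example (note that Python indices are 0-based, so state 2 of the chain is index 1):

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

P2 = P @ P
print(P2)
print(P2[1, 1])   # p_22^(2): return to state 2 in exactly 2 steps -> 1.0
```

The return probability is 1: from state 2 the chain must jump to state 1 or 3, and both jump straight back, so the chain oscillates with period 2.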

Transition Matrix Powers: Visualizing Convergence

Enter a 3Γ—33 \times 3 transition matrix and observe how Pn\mathbf{P}^{n} evolves as nn increases. For irreducible aperiodic chains, each row converges to the stationary distribution Ο€\boldsymbol{\pi}.
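The same experiment can be run in a few lines of NumPy; the 3×3 matrix below is an illustrative irreducible, aperiodic example (assumed values, not a chapter example):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

for n in (1, 5, 20, 50):
    Pn = np.linalg.matrix_power(P, n)
    print(n, Pn[0])   # the first row approaches the stationary distribution

# By n = 50 all three rows agree to numerical precision:
P50 = np.linalg.matrix_power(P, 50)
print(np.allclose(P50, P50[0]))   # True
```

When every row of a power of P is (numerically) the same vector, the chain has forgotten its initial state.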


Quick Check

Which of the following are valid (row-)stochastic matrices? (More than one may qualify.)

(0.50.50.30.8)\begin{pmatrix} 0.5 & 0.5 \\ 0.3 & 0.8 \end{pmatrix}

(0.50.50.30.7)\begin{pmatrix} 0.5 & 0.5 \\ 0.3 & 0.7 \end{pmatrix}

(0.6βˆ’0.10.50.20.30.5)\begin{pmatrix} 0.6 & -0.1 & 0.5 \\ 0.2 & 0.3 & 0.5 \end{pmatrix}

(100.50.5)\begin{pmatrix} 1 & 0 \\ 0.5 & 0.5 \end{pmatrix}

Common Mistake: Row vs Column Convention

Mistake:

Confusing row-stochastic matrices (our convention, where rows sum to 1) with column-stochastic matrices (where columns sum to 1). Some references, especially in applied mathematics and PageRank, use the column convention.

Correction:

Always check which convention a source uses. In our convention, the state distribution is a row vector Ο€\boldsymbol{\pi} satisfying Ο€P=Ο€\boldsymbol{\pi} \mathbf{P} = \boldsymbol{\pi} (left eigenvector). In the column convention it would be PΟ€=Ο€\mathbf{P} \boldsymbol{\pi} = \boldsymbol{\pi} (right eigenvector).
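The convention matters in computation too. Under our row convention, the stationary distribution is a left eigenvector of P for eigenvalue 1, which NumPy finds as a right eigenvector of the transpose. A sketch using the two-state weather matrix (α = 0.3, β = 0.5, state order Sunny then Rainy):

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.5, 0.5]])

# Row convention: pi is a LEFT eigenvector of P for eigenvalue 1,
# i.e. a right eigenvector of P.T.
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1.0))   # locate the eigenvalue-1 eigenvector
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                  # normalize to a probability vector

print(pi)                           # [0.625 0.375] for this matrix
print(np.allclose(pi @ P, pi))      # pi P = pi -> True
```

Had we used `np.linalg.eig(P)` without transposing, we would have found the right eigenvector (the all-ones vector, up to scale) instead of the stationary distribution.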

Key Takeaway

A discrete-time Markov chain is fully characterized by its transition matrix P\mathbf{P} and initial distribution. The Chapman-Kolmogorov equation shows that nn-step transition probabilities are given by Pn\mathbf{P}^{n}, reducing multi-step analysis to matrix powers.