Definition and Transition Probabilities
Why Markov Chains?
Many systems we encounter in engineering evolve randomly over time, but with a crucial simplification: the future depends on the past only through the present state. A packet retransmission protocol remembers only whether the last packet was acknowledged or lost; a wireless channel model tracks only the current fading state. This "memoryless" property (the Markov property) transforms the intractable problem of tracking full histories into an elegant matrix algebra.
Discrete-time Markov chains (DTMCs) are the simplest and most widely-used class of Markov models. They appear throughout communications: channel models, queueing, random access protocols, iterative decoding, and network analysis. The theory we develop in this chapter is the foundation for all of these.
Definition: Markov Property and Discrete-Time Markov Chain
Markov Property and Discrete-Time Markov Chain
Let $\{X_n\}_{n \ge 0}$ be a sequence of random variables taking values in a countable set $\mathcal{S}$ (the state space). The process is a discrete-time Markov chain (DTMC) if it satisfies the Markov property: for all $n \ge 0$ and all states $i_0, \dots, i_{n-1}, i, j \in \mathcal{S}$,
$$\Pr(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = \Pr(X_{n+1} = j \mid X_n = i).$$
The chain is time-homogeneous if the right-hand side does not depend on $n$:
$$\Pr(X_{n+1} = j \mid X_n = i) = p_{ij} \quad \text{for all } n \ge 0.$$
Unless stated otherwise, all Markov chains in this chapter are time-homogeneous.
The Markov property says: given the present, the future is independent of the past. This is sometimes called the "memoryless" property, though it should not be confused with the memoryless property of the exponential or geometric distributions.
Markov property
The property that the conditional distribution of $X_{n+1}$ given the entire past $X_0, \dots, X_n$ depends only on $X_n$. Formally: $\Pr(X_{n+1} = j \mid X_n, X_{n-1}, \dots, X_0) = \Pr(X_{n+1} = j \mid X_n)$.
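As a concrete illustration, here is a minimal Python sketch (the function name `simulate_dtmc` and the two-state channel matrix are ours, chosen for illustration) that simulates such a chain. Note that each step draws the next state from the current state's row alone — exactly the Markov property.

```python
import numpy as np

def simulate_dtmc(P, start, n_steps, rng=None):
    """Simulate a time-homogeneous DTMC with row-stochastic matrix P.

    Each step samples the next state using only the current state's
    row of P: the past beyond X_n is never consulted.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    P = np.asarray(P, dtype=float)
    states = [start]
    for _ in range(n_steps):
        states.append(rng.choice(P.shape[0], p=P[states[-1]]))
    return states

# Hypothetical two-state channel model: 0 = "good", 1 = "bad"
P = [[0.9, 0.1],
     [0.4, 0.6]]
path = simulate_dtmc(P, start=0, n_steps=10)
print(path)
```

Conditioning the sampling distribution on `states[-1]` alone is what makes the simulated process Markov by construction.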
Historical Note: Andrey Markov and Chains of Linked Events
Early 20th century: Andrey Andreyevich Markov (1856--1922) introduced dependent sequences in 1906 to prove that the law of large numbers does not require independence. His original application was a statistical analysis of the alternation of vowels and consonants in Pushkin's poem Eugene Onegin. This seemingly literary exercise established that convergence theorems extend far beyond i.i.d. sequences, opening the door to the modern theory of stochastic processes. The term "Markov chain" was coined by Sergei Bernstein in the 1920s.
Definition: Transition Matrix
Transition Matrix
The transition matrix $P$ of a time-homogeneous DTMC on $\mathcal{S} = \{1, 2, \dots, N\}$ (or a countable set) is the matrix whose $(i,j)$-entry is
$$p_{ij} = \Pr(X_{n+1} = j \mid X_n = i).$$
The matrix $P$ is a stochastic matrix: every entry is non-negative and each row sums to one:
$$p_{ij} \ge 0 \quad \text{and} \quad \sum_{j \in \mathcal{S}} p_{ij} = 1 \quad \text{for every } i.$$
Equivalently, $P\mathbf{1} = \mathbf{1}$, where $\mathbf{1}$ is the all-ones column vector.
We adopt the convention that rows of $P$ correspond to the current state and columns to the next state. Some references transpose this convention; be careful when consulting different sources.
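The stochasticity conditions are cheap to check numerically. A small sketch (the helper `is_row_stochastic` is ours) that also shows how easily the row/column conventions get mixed up:

```python
import numpy as np

def is_row_stochastic(P, tol=1e-12):
    """True iff all entries are non-negative and every row sums to 1."""
    P = np.asarray(P, dtype=float)
    return bool(np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

P = np.array([[0.7, 0.3],
              [0.5, 0.5]])
print(is_row_stochastic(P))    # True: rows sum to 1
print(is_row_stochastic(P.T))  # False: row sums become 1.2 and 0.8
```

The transpose check is exactly the pitfall flagged later in this section: a matrix that is column-stochastic is generally not row-stochastic.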
Stochastic matrix
A square matrix with non-negative entries whose rows each sum to one. Also called a row-stochastic matrix. If both rows and columns sum to one, it is doubly stochastic.
Related: Transition Matrix
Example: Two-State Markov Chain (Sunny/Rainy Weather)
A simplified weather model has two states: $S$ (Sunny) and $R$ (Rainy). On a sunny day, the probability of rain tomorrow is $0.3$. On a rainy day, the probability of sun tomorrow is $0.5$. Find the transition matrix $P$ and compute the two-step probability $p^{(2)}_{SR}$.
Identify the transition probabilities
From the problem statement: $p_{SS} = 0.7$, $p_{SR} = 0.3$, $p_{RS} = 0.5$, $p_{RR} = 0.5$.
Write the transition matrix
$$P = \begin{pmatrix} 0.7 & 0.3 \\ 0.5 & 0.5 \end{pmatrix}$$
Row sums: $0.7 + 0.3 = 1$ and $0.5 + 0.5 = 1$. This is indeed a stochastic matrix.
Compute the two-step probability
$$p^{(2)}_{SR} = p_{SS}\,p_{SR} + p_{SR}\,p_{RR} = (0.7)(0.3) + (0.3)(0.5) = 0.21 + 0.15 = 0.36.$$
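The same two-step probability falls out of a single matrix multiplication, which is a useful sanity check on the hand computation:

```python
import numpy as np

P = np.array([[0.7, 0.3],   # row S: stay sunny 0.7, turn rainy 0.3
              [0.5, 0.5]])  # row R: turn sunny 0.5, stay rainy 0.5
P2 = P @ P
print(P2)        # approximately [[0.64, 0.36], [0.60, 0.40]]
print(P2[0, 1])  # the (S, R) entry: rain two days after a sunny day
```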
Transition Diagrams
A DTMC is often visualized as a directed graph (transition diagram): each state is a node, and a directed edge from $i$ to $j$ with label $p_{ij}$ is drawn whenever $p_{ij} > 0$. Self-loops represent $p_{ii} > 0$. The transition diagram provides immediate insight into the structure: which states can reach which, whether the chain has absorbing states, and whether it decomposes into independent sub-chains.
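Extracting the diagram's edge list from a transition matrix is a one-liner in spirit; a small sketch (the helper `diagram_edges` is ours, not from the text):

```python
import numpy as np

def diagram_edges(P, labels):
    """Return the directed edges (i, j, p_ij) of the transition diagram,
    i.e. all pairs with p_ij > 0. Self-loops appear as (i, i, p_ii)."""
    P = np.asarray(P, dtype=float)
    return [(labels[i], labels[j], P[i, j])
            for i in range(P.shape[0])
            for j in range(P.shape[1])
            if P[i, j] > 0]

# The two-state weather chain: note the self-loops S->S and R->R
P = [[0.7, 0.3],
     [0.5, 0.5]]
edges = diagram_edges(P, labels=["S", "R"])
for src, dst, p in edges:
    print(f"{src} -> {dst}: {p}")
```

The resulting edge list is exactly what a graph-drawing tool (or a reachability analysis) consumes.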
Theorem: Chapman-Kolmogorov Equation
Let $p^{(n)}_{ij} = \Pr(X_n = j \mid X_0 = i)$ denote the $n$-step transition probability. Then for all $m, n \ge 0$ and all states $i, j$:
$$p^{(m+n)}_{ij} = \sum_{k \in \mathcal{S}} p^{(m)}_{ik}\, p^{(n)}_{kj}.$$
In matrix form: $P^{(m+n)} = P^{(m)} P^{(n)}$, where $P^{(n)}$ is the matrix of $n$-step transition probabilities. In particular, $P^{(n)} = P^n$ (the ordinary $n$-th matrix power).
To go from $i$ to $j$ in $m+n$ steps, the chain must be at some intermediate state $k$ after $m$ steps. The Chapman-Kolmogorov equation sums over all possible intermediaries $k$, weighting by the probability of each two-leg journey.
Condition on the intermediate state
By the law of total probability, conditioning on the state $X_m$:
$$p^{(m+n)}_{ij} = \Pr(X_{m+n} = j \mid X_0 = i) = \sum_{k} \Pr(X_{m+n} = j \mid X_m = k, X_0 = i)\,\Pr(X_m = k \mid X_0 = i).$$
Apply the Markov property
Since the chain is Markov, $\Pr(X_{m+n} = j \mid X_m = k, X_0 = i) = \Pr(X_{m+n} = j \mid X_m = k)$, and by time-homogeneity this equals $p^{(n)}_{kj}$. Thus:
$$p^{(m+n)}_{ij} = \sum_{k} p^{(m)}_{ik}\, p^{(n)}_{kj}.$$
Matrix interpretation
The summation is precisely the $(i,j)$-entry of the matrix product $P^{(m)} P^{(n)}$. Hence $P^{(m+n)} = P^{(m)} P^{(n)}$, and by induction $P^{(n)}$ is the $n$-th power of $P$.
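The matrix identity is easy to confirm numerically; a quick check on a randomly generated stochastic matrix (the size and seed here are arbitrary):

```python
import numpy as np

# Build a random 4x4 row-stochastic matrix by normalizing positive rows
rng = np.random.default_rng(42)
A = rng.random((4, 4))
P = A / A.sum(axis=1, keepdims=True)

m, n = 3, 5
lhs = np.linalg.matrix_power(P, m + n)                          # P^(m+n)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)  # P^m P^n
print(np.allclose(lhs, rhs))  # the two agree, as Chapman-Kolmogorov asserts
```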
Example: Three-State Chain: Computing $P^2$
Consider a DTMC on $\{1, 2, 3\}$ with transition matrix
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \end{pmatrix}.$$
Compute $P^2$ and find the probability of returning to state 2 in exactly 2 steps.
Matrix multiplication
$$P^2 = \begin{pmatrix} 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \end{pmatrix}$$
Read off the answer
$p^{(2)}_{22} = 1$. State 2 returns to itself with certainty in 2 steps. This occurs because from state 2 the chain goes to state 1 or 3 with equal probability, and both states 1 and 3 return to state 2 with probability 1.
Verify row sums
Each row of $P^2$ sums to 1, confirming that $P^2$ is also stochastic. This must hold: the product of stochastic matrices is stochastic.
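A NumPy check of this example. The matrix below encodes the verbal description (state 2 goes to states 1 and 3 with probability 1/2 each; states 1 and 3 return to state 2 with probability 1), with Python indices 0–2 standing for states 1–3:

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0],   # state 1 -> state 2 surely
              [0.5, 0.0, 0.5],   # state 2 -> state 1 or 3, equally likely
              [0.0, 1.0, 0.0]])  # state 3 -> state 2 surely
P2 = P @ P
print(P2[1, 1])        # 1.0: certain return to state 2 in two steps
print(P2.sum(axis=1))  # [1. 1. 1.]: P^2 is again stochastic
```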
Transition Matrix Powers: Visualizing Convergence
Enter a transition matrix $P$ and observe how $P^n$ evolves as $n$ increases. For irreducible aperiodic chains, each row converges to the stationary distribution $\pi$.
Parameters
Compute P^1 through P^n
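The convergence this widget illustrates can also be observed offline. Using the two-state weather chain (for which the stationary distribution works out to $\pi = (0.625, 0.375)$, our computation):

```python
import numpy as np

P = np.array([[0.7, 0.3],   # the two-state weather chain
              [0.5, 0.5]])
Pn = np.eye(2)
for _ in range(20):         # P^20 is effectively converged for this chain
    Pn = Pn @ P
print(Pn)  # both rows are (nearly) identical: the stationary distribution
```

Because the second eigenvalue of this $P$ is $0.2$, the rows approach $\pi$ geometrically fast; periodic chains (like the three-state example above) would instead oscillate.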
Quick Check
Which of the following is a valid (row-)stochastic matrix?
Both rows sum to 1 and all entries are non-negative. This is a valid stochastic matrix.
Common Mistake: Row vs Column Convention
Mistake:
Confusing row-stochastic matrices (our convention, where rows sum to 1) with column-stochastic matrices (where columns sum to 1). Some references, especially in applied mathematics and PageRank, use the column convention.
Correction:
Always check which convention a source uses. In our convention, the state distribution is a row vector $\pi$ satisfying $\pi P = \pi$ (a left eigenvector). In the column convention it would satisfy $P \pi = \pi$ (a right eigenvector).
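The left-eigenvector reading of the row convention translates directly into code: since `np.linalg.eig` returns right eigenvectors, the stationary row vector is obtained from the transpose.

```python
import numpy as np

P = np.array([[0.7, 0.3],   # the two-state weather chain, for illustration
              [0.5, 0.5]])

# Row convention: pi is a LEFT eigenvector of P for eigenvalue 1,
# i.e. a right eigenvector of P.T (which is what eig computes).
vals, vecs = np.linalg.eig(P.T)
idx = np.argmin(np.abs(vals - 1.0))   # pick the eigenvalue closest to 1
pi = np.real(vecs[:, idx])
pi = pi / pi.sum()                    # normalize (also fixes overall sign)
print(pi)                             # the stationary distribution
print(np.allclose(pi @ P, pi))        # pi P = pi, as a row vector
```

Forgetting the transpose here is precisely the row-vs-column mistake described above: `eig(P)` would yield a right eigenvector, which solves the column-convention equation instead.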
Key Takeaway
A discrete-time Markov chain is fully characterized by its transition matrix $P$ and initial distribution. The Chapman-Kolmogorov equation shows that $n$-step transition probabilities are given by $P^n$, reducing multi-step analysis to matrix powers.