Markov Chains and Queuing Basics

Why Markov Chains in Telecommunications?

Consider a wireless channel that alternates between a "good" state (low error rate) and a "bad" state (high error rate). The channel does not flip randomly at every symbol --- it stays in one state for a burst of symbols, then transitions. This bursty behaviour is not captured by a single random variable; it requires a model that tracks the state of the system over time.

Markov chains provide exactly this model. Their defining property --- the memoryless property --- states that the future evolution of the system depends only on the present state, not on the history of how we arrived there. This makes analysis tractable: instead of tracking an ever-growing history, we need only the current state and a matrix of transition probabilities.

Three major application areas in telecommunications are:

  • Fading channel models. The Gilbert--Elliott model (two-state Markov chain) captures bursty error patterns on wireless links. Extensions to multi-state Markov models approximate Rayleigh and Rician fading with arbitrary accuracy.
  • Protocol state machines. ARQ retransmission protocols, TCP congestion control, and random access protocols (ALOHA, CSMA) are naturally described as Markov chains whose stationary distributions yield throughput and delay metrics.
  • Queuing systems. The M/M/1 queue --- Poisson arrivals, exponential service --- is a birth-death Markov chain. Its analysis gives closed-form expressions for delay, buffer occupancy, and packet loss, which are essential for network dimensioning.

Remarkably, computing the long-run behaviour of a Markov chain reduces to finding a left eigenvector of the transition matrix with eigenvalue 1. This connects directly to the eigenvalue theory of Chapter 1 and demonstrates how linear algebra underpins the analysis of dynamic stochastic systems.

Definition:

Discrete-Time Markov Chain (DTMC)

A discrete-time Markov chain is a sequence of random variables $\{X_n\}_{n \geq 0}$ taking values in a countable state space $\mathcal{S}$ and satisfying the Markov property: for all $n \geq 0$ and all states $i, j, i_{n-1}, \ldots, i_0 \in \mathcal{S}$,

$$P(X_{n+1} = j \mid X_n = i,\, X_{n-1} = i_{n-1},\, \ldots,\, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i).$$

In words: given the present state $X_n = i$, the conditional distribution of the future $X_{n+1}$ is independent of the past $X_0, X_1, \ldots, X_{n-1}$.

The chain is time-homogeneous if the transition probabilities do not depend on the time index $n$:

$$P(X_{n+1} = j \mid X_n = i) = p_{ij} \quad \text{for all } n.$$

Throughout this section, we consider only time-homogeneous chains unless stated otherwise.


Definition:

Transition Probability Matrix

For a time-homogeneous DTMC on a finite state space $\mathcal{S} = \{1, 2, \ldots, M\}$, the transition probability matrix is the $M \times M$ matrix $\mathbf{P}$ with entries

$$[\mathbf{P}]_{ij} = p_{ij} = P(X_{n+1} = j \mid X_n = i), \quad i, j \in \mathcal{S}.$$

The matrix $\mathbf{P}$ has two fundamental properties:

  1. Non-negativity: $p_{ij} \geq 0$ for all $i, j$.
  2. Row stochasticity: each row sums to one, $\sum_{j \in \mathcal{S}} p_{ij} = 1$ for all $i$.

A matrix satisfying both properties is called a (row) stochastic matrix.

The $n$-step transition probability $p_{ij}^{(n)} = P(X_{m+n} = j \mid X_m = i)$ is given by the $(i,j)$-entry of $\mathbf{P}^n$:

$$\mathbf{P}^{(n)} = \mathbf{P}^n.$$

Thus, the probability distribution of the chain after $n$ steps, starting from initial distribution $\boldsymbol{\pi}_0^T$, is

$$\boldsymbol{\pi}_n^T = \boldsymbol{\pi}_0^T \mathbf{P}^n.$$
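This evolution is easy to check numerically. A minimal NumPy sketch, where the two-state (Good/Bad) transition matrix is an illustrative assumption rather than a value from the text:

```python
import numpy as np

# Illustrative two-state transition matrix (Good/Bad channel states).
# Row i holds P(X_{n+1} = j | X_n = i); each row sums to one.
P = np.array([[0.9, 0.1],   # Good -> Good, Good -> Bad
              [0.5, 0.5]])  # Bad  -> Good, Bad  -> Bad

pi = np.array([1.0, 0.0])   # start deterministically in the Good state

# pi_n^T = pi_0^T P^n, computed one left-multiplication at a time
for n in range(20):
    pi = pi @ P

print(pi)        # converges towards the stationary distribution [5/6, 1/6]
print(pi.sum())  # remains a probability vector (sums to 1)
```

Note that the sum of the entries is preserved at every step, since each row of $\mathbf{P}$ sums to one.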

Definition:

Chapman--Kolmogorov Equations

For any non-negative integers $m$ and $n$, the $n$-step and $m$-step transition probabilities satisfy the Chapman--Kolmogorov equations:

$$p_{ij}^{(m+n)} = \sum_{k \in \mathcal{S}} p_{ik}^{(m)}\, p_{kj}^{(n)}.$$

In matrix form, this becomes

$$\mathbf{P}^{m+n} = \mathbf{P}^m \, \mathbf{P}^n.$$

The interpretation is intuitive: to go from state $i$ to state $j$ in $m + n$ steps, the chain must pass through some intermediate state $k$ after $m$ steps. Summing over all possible intermediate states $k$ gives the total probability.

This equation is the discrete-time analogue of the semigroup property of matrix exponentials in continuous-time systems.
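A quick numerical sanity check of the matrix identity; the 3-state row-stochastic matrix below is an arbitrary illustrative choice:

```python
import numpy as np

# Arbitrary 3-state row-stochastic matrix for illustration.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

m, n = 3, 4
lhs = np.linalg.matrix_power(P, m + n)                             # P^(m+n)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)  # P^m P^n

print(np.allclose(lhs, rhs))  # Chapman-Kolmogorov in matrix form holds
```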


Definition:

Irreducibility and Aperiodicity

Communication. State $j$ is accessible from state $i$ (written $i \to j$) if $p_{ij}^{(n)} > 0$ for some $n \geq 1$. States $i$ and $j$ communicate (written $i \leftrightarrow j$) if $i \to j$ and $j \to i$. Communication is an equivalence relation and partitions $\mathcal{S}$ into communicating classes.

Irreducibility. A Markov chain is irreducible if $\mathcal{S}$ consists of a single communicating class, i.e., every state is accessible from every other state. Equivalently, for every pair $(i, j)$ there exists $n \geq 1$ such that $p_{ij}^{(n)} > 0$.

Period. The period of state $i$ is

$$d(i) = \gcd\{n \geq 1 : p_{ii}^{(n)} > 0\}.$$

If $d(i) = 1$, the state is aperiodic; if $d(i) > 1$, the state is periodic with period $d(i)$.

Aperiodic chain. An irreducible chain is aperiodic if every state is aperiodic. Since all states in an irreducible chain share the same period, it suffices to check a single state.

A Markov chain that is both irreducible and aperiodic is sometimes called ergodic (in the Markov chain sense). For such chains, the long-run fraction of time spent in each state converges to a unique stationary distribution.
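Both conditions are mechanical to verify on a finite chain. A sketch, using two standard facts: a finite chain is irreducible iff $(\mathbf{I} + \mathbf{P})^{M-1}$ has no zero entry, and an irreducible aperiodic (primitive) matrix satisfies $\mathbf{P}^k > 0$ for $k = (M-1)^2 + 1$ (Wielandt's bound):

```python
import numpy as np

def is_irreducible(P):
    """Finite chain is irreducible iff (I + P)^(M-1) has no zero entry."""
    M = P.shape[0]
    A = np.linalg.matrix_power(np.eye(M) + P, M - 1)
    return bool(np.all(A > 0))

def is_primitive(P):
    """Irreducible + aperiodic iff P^k > 0 for k = (M-1)^2 + 1 (Wielandt)."""
    M = P.shape[0]
    A = np.linalg.matrix_power(P, (M - 1) ** 2 + 1)
    return bool(np.all(A > 0))

# Deterministic 2-cycle: irreducible but periodic with period 2.
P_cycle = np.array([[0.0, 1.0], [1.0, 0.0]])
# "Lazy" version of the same chain: the self-loops make it aperiodic.
P_lazy = 0.5 * np.eye(2) + 0.5 * P_cycle

print(is_irreducible(P_cycle), is_primitive(P_cycle))  # True False
print(is_irreducible(P_lazy), is_primitive(P_lazy))    # True True
```

The lazy chain illustrates the standard trick of adding self-loops to destroy periodicity without changing the stationary distribution's existence.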


Definition:

Stationary Distribution

A probability vector $\boldsymbol{\pi} = [\pi_1, \pi_2, \ldots, \pi_M]^T$ is a stationary distribution (or steady-state distribution) of the Markov chain with transition matrix $\mathbf{P}$ if

$$\boldsymbol{\pi}^T \mathbf{P} = \boldsymbol{\pi}^T, \qquad \sum_{i=1}^{M} \pi_i = 1, \qquad \pi_i \geq 0 \;\;\forall\, i.$$

Component-wise, this reads

$$\pi_j = \sum_{i=1}^{M} \pi_i \, p_{ij}, \quad j = 1, \ldots, M.$$

Eigenvector interpretation. The stationarity condition $\boldsymbol{\pi}^T \mathbf{P} = \boldsymbol{\pi}^T$ says that $\boldsymbol{\pi}$ is a left eigenvector of $\mathbf{P}$ corresponding to eigenvalue $\lambda = 1$. Equivalently, $\mathbf{P}^T \boldsymbol{\pi} = \boldsymbol{\pi}$, so $\boldsymbol{\pi}$ is a right eigenvector of $\mathbf{P}^T$ with eigenvalue $1$.

By the Perron--Frobenius theorem, every row-stochastic matrix has $\lambda = 1$ as its largest eigenvalue (in magnitude), so the stationary distribution is associated with the dominant eigenvalue --- the very eigenvector that the power method of Chapter 1 converges to.
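The eigenvector view suggests computing $\boldsymbol{\pi}$ directly from an eigendecomposition of $\mathbf{P}^T$. A sketch with an illustrative 3-state matrix:

```python
import numpy as np

# Illustrative 3-state row-stochastic matrix.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# pi is the eigenvector of P^T associated with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))   # index of the eigenvalue nearest 1
v = np.real(eigvecs[:, k])
pi = v / v.sum()                       # normalise to a probability vector

print(pi)   # stationary distribution: satisfies pi @ P == pi
```

The normalisation `v / v.sum()` also fixes the arbitrary sign of the eigenvector returned by the solver.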


Theorem: Existence and Uniqueness of the Stationary Distribution

Let $\{X_n\}$ be an irreducible, aperiodic, positive-recurrent Markov chain on a finite state space $\mathcal{S}$ with transition matrix $\mathbf{P}$. Then:

  1. There exists a unique stationary distribution $\boldsymbol{\pi}$ satisfying $\boldsymbol{\pi}^T \mathbf{P} = \boldsymbol{\pi}^T$ with $\pi_i > 0$ for all $i \in \mathcal{S}$.

  2. For every initial distribution, the $n$-step transition matrix converges:

    $$\lim_{n \to \infty} \mathbf{P}^n = \mathbf{1}\,\boldsymbol{\pi}^T,$$

    where $\mathbf{1} = [1, 1, \ldots, 1]^T$. Equivalently, $\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j$ for all $i, j$.

  3. The stationary probability $\pi_i$ equals the long-run fraction of time the chain spends in state $i$:

    $$\pi_i = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mathbf{1}_{[X_n = i]} \quad \text{a.s.}$$

Every finite irreducible chain is positive-recurrent (the chain returns to every state in finite expected time). Aperiodicity ensures that $\mathbf{P}^n$ converges to a rank-one matrix rather than cycling. The Perron--Frobenius theorem guarantees that eigenvalue $1$ is simple and all other eigenvalues satisfy $|\lambda| < 1$, so $\mathbf{P}^n \to \mathbf{1}\boldsymbol{\pi}^T$ geometrically fast.

Note: For countable (infinite) state spaces, irreducibility alone does not guarantee positive recurrence --- the chain may be null recurrent or transient. The finite-state-space assumption eliminates these subtleties.


Example: Gilbert--Elliott Channel Model

[Interactive figure: Markov chain state distribution evolution. Parameter values shown: 0.3, 0.5, 20.]

Definition:

Birth--Death Process

A birth--death process is a Markov chain on the non-negative integers $\mathcal{S} = \{0, 1, 2, \ldots\}$ in which transitions occur only to adjacent states:

$$p_{i,\,i+1} = \lambda_i \quad (\text{birth rate}), \qquad p_{i,\,i-1} = \mu_i \quad (\text{death rate}), \qquad p_{i,i} = 1 - \lambda_i - \mu_i,$$

with $\mu_0 = 0$ (no deaths in state $0$) and all other transition probabilities equal to zero.

Balance equations. For a stationary distribution $\boldsymbol{\pi}$, the detailed balance equations (also called local balance) equate the probability flow across each boundary:

$$\pi_i \, \lambda_i = \pi_{i+1} \, \mu_{i+1}, \quad i = 0, 1, 2, \ldots$$

These are stronger than the global balance equations ($\boldsymbol{\pi}^T \mathbf{P} = \boldsymbol{\pi}^T$) but are automatically satisfied for birth--death chains because transitions skip no states.

Solving recursively:

$$\pi_i = \pi_0 \prod_{k=0}^{i-1} \frac{\lambda_k}{\mu_{k+1}}, \quad i \geq 1,$$

where $\pi_0$ is determined by the normalisation $\sum_{i=0}^{\infty} \pi_i = 1$.

Birth--death processes are the backbone of queuing theory: the "births" are customer arrivals and the "deaths" are service completions.
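The recursion can be evaluated numerically on a truncated chain. A sketch in which the truncation level and the constant rates are illustrative assumptions:

```python
import numpy as np

def birth_death_stationary(lam, mu, N):
    """Stationary distribution of a birth-death chain truncated at state N.

    lam[i] is the birth rate out of state i (i = 0..N-1);
    mu[i]  is the death rate out of state i (i = 1..N).
    Implements pi_i = pi_0 * prod_{k<i} lam[k]/mu[k+1], then normalises.
    """
    w = np.ones(N + 1)
    for i in range(1, N + 1):
        w[i] = w[i - 1] * lam[i - 1] / mu[i]
    return w / w.sum()

# Constant rates lam = 0.9, mu = 1.0 reproduce a (truncated) M/M/1 queue.
N = 200
lam = np.full(N, 0.9)
mu = np.full(N + 1, 1.0)   # mu[0] is unused (no deaths in state 0)
pi = birth_death_stationary(lam, mu, N)

print(pi[0])   # close to 1 - rho = 0.1 for rho = 0.9
```

With the truncation level large enough that the neglected tail mass is negligible, the result matches the M/M/1 formulas derived next.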

Definition:

The M/M/1 Queue

The M/M/1 queue is the simplest and most important single-server queuing model:

  • M (Markovian arrivals): packets arrive according to a Poisson process with rate $\lambda$ (hence exponential inter-arrival times with mean $1/\lambda$).
  • M (Markovian service): service times are i.i.d. exponential with rate $\mu$ (mean service time $1/\mu$).
  • 1: a single server.

The queue length $\{X(t)\}_{t \geq 0}$ forms a continuous-time birth--death process with constant rates $\lambda_i = \lambda$ and $\mu_i = \mu$ for all $i$. The discrete-time embedded chain at transition epochs is also a birth--death Markov chain.

Traffic intensity:

$$\rho = \frac{\lambda}{\mu}.$$

Stability condition: the queue has a stationary distribution if and only if $\rho < 1$ (on average, the service rate must exceed the arrival rate).

Stationary distribution. From the birth--death recursion with $\lambda_i = \lambda$, $\mu_{i+1} = \mu$:

$$\pi_k = (1 - \rho)\,\rho^k, \quad k = 0, 1, 2, \ldots$$

This is a geometric distribution with parameter $\rho$. The probability that the queue is empty is $\pi_0 = 1 - \rho$, and the probability that the queue has $k$ or more customers is $P(X \geq k) = \rho^k$.
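These closed-form quantities are one-liners in code. A sketch; the rate values $\lambda = 0.75$ and $\mu = 1$ are illustrative:

```python
# M/M/1 stationary quantities from rho = lam/mu (valid only for rho < 1).
lam, mu = 0.75, 1.0          # illustrative arrival and service rates
rho = lam / mu
assert rho < 1, "unstable: no stationary distribution"

pi0 = 1 - rho                # P(system empty) = 1 - rho
p_tail = rho ** 5            # P(X >= 5) = rho^5, geometric tail
L = rho / (1 - rho)          # mean number in system
W = 1 / (mu - lam)           # mean time in system (Little's law: L = lam*W)

print(pi0, L, W)             # 0.25, 3.0, 4.0
```

Note the consistency check built into the comments: $L = \lambda W$ (Little's law) holds, since $0.75 \times 4 = 3$.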


Example: M/M/1 Queue Performance Metrics


Markov Chain State Evolution and Convergence to Stationarity

A 3-state Markov chain (extended Gilbert--Elliott model) with animated probability flow along edges. Watch the state distribution converge from a deterministic initial state to the stationary distribution over successive time steps.
Starting from state 0 (Good), the distribution over the three channel states evolves according to the transition matrix. After approximately 10--15 steps, the distribution stabilises at the stationary vector $\boldsymbol{\pi}$, regardless of the initial state.

Why This Matters: Markov Models for Fading Channels and Protocols

Fading channel models. The Gilbert--Elliott model (Example above) is the simplest finite-state Markov channel (FSMC). More accurate models partition the received SNR range into $M$ regions and define transition probabilities from the level-crossing rates of the fading process:

$$p_{ij} \approx \frac{N(R_j)}{\pi_i \, f_s},$$

where $N(R_j)$ is the level-crossing rate at threshold $R_j$, $\pi_i$ is the steady-state probability of state $i$, and $f_s$ is the symbol rate. The stationary distribution of the FSMC gives the fraction of time the channel SNR falls in each region, enabling computation of average throughput under adaptive modulation and coding.

Random access protocols. Slotted ALOHA can be modeled as a Markov chain where the state is the number of backlogged users. The analysis reveals a bistable system: one stable point near zero backlog (high throughput) and another at high backlog (near zero throughput). This insight motivated the development of exponential backoff algorithms and CSMA/CA.

ARQ protocols. Stop-and-wait, Go-Back-$N$, and selective repeat ARQ protocols are naturally modeled as Markov chains. The state tracks the sender's window position, outstanding acknowledgements, and retransmission status. The stationary distribution yields the throughput efficiency $\eta$, i.e., the fraction of transmitted frames that carry new data.

TCP congestion control. The AIMD (additive increase, multiplicative decrease) mechanism of TCP can be modeled as a Markov chain, leading to the well-known TCP throughput formula:

$$\text{Throughput} \approx \frac{C}{\text{RTT}\,\sqrt{p}},$$

where $p$ is the packet loss probability, RTT is the round-trip time, and $C$ is a constant.

Power Method for Computing the Stationary Distribution

Complexity: $O(M^2)$ per iteration (vector-matrix product); typically converges in $O(\log(1/\epsilon) / \log(1/|\lambda_2|))$ iterations
Input: Row-stochastic matrix $\mathbf{P} \in \mathbb{R}^{M \times M}$, tolerance $\epsilon > 0$, maximum iterations $N_{\max}$
Output: Stationary distribution $\boldsymbol{\pi}$
1. Initialize $\boldsymbol{\pi}^{(0)} \leftarrow [1/M, \, 1/M, \, \ldots, \, 1/M]^T$ (uniform)
2. for $k = 0, 1, 2, \ldots, N_{\max} - 1$ do
3. $\quad \boldsymbol{\pi}^{(k+1)T} \leftarrow \boldsymbol{\pi}^{(k)T} \mathbf{P}$
4. $\quad$ if $\|\boldsymbol{\pi}^{(k+1)} - \boldsymbol{\pi}^{(k)}\|_1 < \epsilon$ then return $\boldsymbol{\pi}^{(k+1)}$
5. end for
6. return $\boldsymbol{\pi}^{(N_{\max})}$
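The pseudocode translates line-for-line into NumPy. A sketch; the 3-state test matrix is an illustrative assumption:

```python
import numpy as np

def stationary_power_method(P, eps=1e-10, n_max=10_000):
    """Power iteration pi^(k+1)T = pi^(k)T P with an L1 stopping rule."""
    M = P.shape[0]
    pi = np.full(M, 1.0 / M)                   # step 1: uniform start
    for _ in range(n_max):                     # step 2
        pi_next = pi @ P                       # step 3: vector-matrix product
        if np.abs(pi_next - pi).sum() < eps:   # step 4: L1 convergence test
            return pi_next
        pi = pi_next
    return pi                                  # step 6: hit the iteration cap

P = np.array([[0.5, 0.3, 0.2],   # illustrative row-stochastic matrix
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
pi = stationary_power_method(P)
print(pi)   # satisfies pi @ P == pi up to the tolerance
```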

This is precisely the power method from Section 1.7, applied to $\mathbf{P}^T$. Recall that the power method computes the dominant eigenvector of a matrix by repeated multiplication. Since $\mathbf{P}$ is row-stochastic, its largest eigenvalue is $1$ (by Perron--Frobenius), and the corresponding left eigenvector is the stationary distribution $\boldsymbol{\pi}$.

Convergence rate. The convergence is geometric with ratio $|\lambda_2(\mathbf{P})|$, the second-largest eigenvalue magnitude. The quantity $1 - |\lambda_2|$ is called the spectral gap and measures how fast the chain mixes.

Connection to Chapter 1. In Chapter 1, we used the power method to find the dominant eigenvector of symmetric matrices (e.g., for PCA or dominant mode analysis). Here the matrix $\mathbf{P}$ is generally not symmetric --- it is stochastic --- but the same iterative principle applies. The key insight is that the stationary distribution of a Markov chain is fundamentally an eigenvalue problem.

Alternatives. For very large sparse chains, direct methods (Gaussian elimination on $(\mathbf{P}^T - \mathbf{I})\boldsymbol{\pi} = \mathbf{0}$ plus the normalisation constraint) or iterative solvers (GMRES, Gauss--Seidel) may be more efficient.

Quick Check

Consider the two-state Markov chain with transition matrix

$$\mathbf{P} = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}.$$

What is the stationary probability $\pi_1$ (the long-run fraction of time spent in state 1)?

$\pi_1 = 3/7$

$\pi_1 = 4/7$

$\pi_1 = 1/2$

$\pi_1 = 0.3$

Quick Check

An M/M/1 queue has arrival rate $\lambda = 900$ packets/s and service rate $\mu = 1000$ packets/s. What is the mean number of packets in the system $L$?

$L = 9$

$L = 0.9$

$L = 10$

$L = 90$

Quick Check

Which of the following is required for a finite Markov chain to have a unique stationary distribution?

The chain must be irreducible

The chain must be aperiodic

The transition matrix must be symmetric

All entries of $\mathbf{P}$ must be strictly positive

Markov Chain

A stochastic process $\{X_n\}$ (discrete-time) or $\{X(t)\}$ (continuous-time) satisfying the Markov (memoryless) property: conditioned on the present state, the future is independent of the past. Markov chains are the fundamental building block of queuing theory, protocol analysis, and finite-state channel models.

Transition Matrix

A row-stochastic matrix $\mathbf{P}$ whose $(i,j)$-entry $p_{ij} = P(X_{n+1} = j \mid X_n = i)$ gives the one-step transition probability from state $i$ to state $j$. The $n$-step transition probabilities are given by $\mathbf{P}^n$.

Stationary Distribution

A probability vector $\boldsymbol{\pi}$ satisfying $\boldsymbol{\pi}^T \mathbf{P} = \boldsymbol{\pi}^T$. It is the left eigenvector of $\mathbf{P}$ associated with eigenvalue $1$. For an irreducible chain, the stationary distribution is unique and represents the long-run fraction of time spent in each state.

Irreducible

A Markov chain is irreducible if every state can be reached from every other state in a finite number of steps. Equivalently, the state space consists of a single communicating class. Irreducibility guarantees uniqueness of the stationary distribution (for finite chains).

M/M/1 Queue

A single-server queuing model with Poisson arrivals (rate $\lambda$) and exponential service times (rate $\mu$). The stationary queue length distribution is geometric: $\pi_k = (1-\rho)\rho^k$ where $\rho = \lambda/\mu < 1$. Key metrics: mean queue length $L = \rho/(1-\rho)$, mean delay $W = 1/(\mu - \lambda)$.

Common Mistake: Forgetting to Check $\rho < 1$ for M/M/1 Stability

Mistake:

Applying the M/M/1 formulas $L = \rho/(1-\rho)$ and $W = 1/(\mu - \lambda)$ without first verifying that $\rho < 1$.

When $\rho \geq 1$ (i.e., $\lambda \geq \mu$), the queue has no stationary distribution: the queue length grows without bound, the mean delay is infinite, and the formulas produce negative or nonsensical values.

This error is common in homework and can also arise in system design when traffic spikes push the offered load above the service capacity.

Correction:

Always check stability first. Before computing any M/M/1 metric, verify:

$$\rho = \frac{\lambda}{\mu} < 1.$$

If $\rho \geq 1$, the system is unstable and no steady-state analysis applies. In practice, if $\rho$ is close to $1$ (e.g., $\rho > 0.9$), the delay and queue length are extremely sensitive to small changes in $\lambda$ or $\mu$ --- the system operates near the "knee" of the delay curve.

For $\rho \geq 1$, one must either increase the service rate (faster link, more efficient coding), decrease the arrival rate (admission control, traffic shaping), or add more servers (M/M/$c$ queue).
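A guard clause makes the stability check impossible to forget. A sketch; the function name and the rate values are illustrative:

```python
def mm1_metrics(lam, mu):
    """Return (rho, L, W) for an M/M/1 queue, refusing unstable inputs."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError(f"unstable: rho = {rho:.3f} >= 1, no steady state")
    L = rho / (1 - rho)          # mean number in system
    W = 1.0 / (mu - lam)         # mean time in system
    return rho, L, W

print(mm1_metrics(750.0, 1000.0))   # stable: rho = 0.75, L = 3.0, W = 0.004
# mm1_metrics(1100.0, 1000.0) raises ValueError instead of returning W < 0
```

Raising an exception on $\rho \geq 1$ turns a silent modelling error (a negative $L$ or $W$) into an immediate, visible failure.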

Key Takeaway

The central message of this section in three points:

  1. Markov chains reduce dynamics to matrices. A complex system with many interacting states --- fading channels, protocol machines, queuing networks --- is fully characterised by its transition matrix $\mathbf{P}$. All temporal behaviour is encoded in the powers $\mathbf{P}^n$.

  2. The stationary distribution is an eigenvector. The long-run behaviour $\boldsymbol{\pi}^T \mathbf{P} = \boldsymbol{\pi}^T$ is a left eigenvector equation with eigenvalue $1$. Computing $\boldsymbol{\pi}$ is exactly the eigenvalue problem from Chapter 1 applied to a stochastic matrix. The power method, QR algorithm, and spectral decomposition all carry over directly.

  3. Queuing theory = birth--death Markov chains. The M/M/1 queue produces closed-form delay and throughput formulas that every network engineer uses daily. The key parameter is the traffic intensity $\rho = \lambda/\mu$: as $\rho \to 1$, delay blows up as $1/(1-\rho)$ --- a fundamental limit that no clever protocol can circumvent.

Markov chains are thus a bridge between the linear algebra of Chapter 1 and the probability theory of Chapter 2, unified by the observation that steady-state analysis of dynamic systems is an eigenvalue problem.