Markov Chains and Queuing Basics

Why Markov Chains in Telecommunications?

Consider a wireless channel that alternates between a "good" state (low error rate) and a "bad" state (high error rate). The channel does not flip randomly at every symbol --- it stays in one state for a burst of symbols, then transitions. This bursty behaviour is not captured by a single random variable; it requires a model that tracks the state of the system over time.

Markov chains provide exactly this model. Their defining property --- the memoryless property --- states that the future evolution of the system depends only on the present state, not on the history of how we arrived there. This makes analysis tractable: instead of tracking an ever-growing history, we need only the current state and a matrix of transition probabilities.

Three major application areas in telecommunications are:

  • Fading channel models. The Gilbert--Elliott model (two-state Markov chain) captures bursty error patterns on wireless links. Extensions to multi-state Markov models approximate Rayleigh and Rician fading with arbitrary accuracy.
  • Protocol state machines. ARQ retransmission protocols, TCP congestion control, and random access protocols (ALOHA, CSMA) are naturally described as Markov chains whose stationary distributions yield throughput and delay metrics.
  • Queuing systems. The M/M/1 queue --- Poisson arrivals, exponential service --- is a birth-death Markov chain. Its analysis gives closed-form expressions for delay, buffer occupancy, and packet loss, which are essential for network dimensioning.

Remarkably, computing the long-run behaviour of a Markov chain reduces to finding a left eigenvector of the transition matrix with eigenvalue 1. This connects directly to the eigenvalue theory of Chapter 1 and demonstrates how linear algebra underpins the analysis of dynamic stochastic systems.

Definition:

Discrete-Time Markov Chain (DTMC)

A discrete-time Markov chain is a sequence of random variables $\{X_n\}_{n \geq 0}$ taking values in a countable state space $\mathcal{S}$ and satisfying the Markov property: for all $n \geq 0$ and all states $i, j, i_{n-1}, \ldots, i_0 \in \mathcal{S}$,

$$P(X_{n+1} = j \mid X_n = i,\, X_{n-1} = i_{n-1},\, \ldots,\, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i).$$

In words: given the present state $X_n = i$, the conditional distribution of the future $X_{n+1}$ is independent of the past $X_0, X_1, \ldots, X_{n-1}$.

The chain is time-homogeneous if the transition probabilities do not depend on the time index $n$:

$$P(X_{n+1} = j \mid X_n = i) = p_{ij} \quad \text{for all } n.$$

Throughout this section, we consider only time-homogeneous chains unless stated otherwise.


Definition:

Transition Probability Matrix

For a time-homogeneous DTMC on a finite state space $\mathcal{S} = \{1, 2, \ldots, M\}$, the transition probability matrix is the $M \times M$ matrix $\mathbf{P}$ with entries

$$[\mathbf{P}]_{ij} = p_{ij} = P(X_{n+1} = j \mid X_n = i), \quad i, j \in \mathcal{S}.$$

The matrix $\mathbf{P}$ has two fundamental properties:

  1. Non-negativity: $p_{ij} \geq 0$ for all $i, j$.
  2. Row stochasticity: each row sums to one, $\sum_{j \in \mathcal{S}} p_{ij} = 1$ for all $i$.

A matrix satisfying both properties is called a (row) stochastic matrix.

The $n$-step transition probability $p_{ij}^{(n)} = P(X_{m+n} = j \mid X_m = i)$ is given by the $(i,j)$-entry of $\mathbf{P}^n$:

$$\mathbf{P}^{(n)} = \mathbf{P}^n.$$

Thus, the probability distribution of the chain after $n$ steps, starting from initial distribution $\boldsymbol{\pi}_0^T$, is

$$\boldsymbol{\pi}_n^T = \boldsymbol{\pi}_0^T \mathbf{P}^n.$$
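This evolution is easy to check numerically. A minimal NumPy sketch, where the two-state (Good/Bad) transition matrix is an illustrative assumption rather than a value from the text:

```python
import numpy as np

# Illustrative two-state transition matrix (Good/Bad channel states).
# Row i holds P(X_{n+1} = j | X_n = i); each row sums to one.
P = np.array([[0.9, 0.1],   # Good -> Good, Good -> Bad
              [0.5, 0.5]])  # Bad  -> Good, Bad  -> Bad

pi = np.array([1.0, 0.0])   # start deterministically in the Good state

# pi_n^T = pi_0^T P^n, computed one left-multiplication at a time
for n in range(20):
    pi = pi @ P

print(pi)        # converges towards the stationary distribution [5/6, 1/6]
print(pi.sum())  # remains a probability vector (sums to 1)
```

Note that the sum of the entries is preserved at every step, since each row of $\mathbf{P}$ sums to one.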

Definition:

Chapman--Kolmogorov Equations

For any non-negative integers $m$ and $n$, the $n$-step and $m$-step transition probabilities satisfy the Chapman--Kolmogorov equations:

$$p_{ij}^{(m+n)} = \sum_{k \in \mathcal{S}} p_{ik}^{(m)}\, p_{kj}^{(n)}.$$

In matrix form, this becomes

$$\mathbf{P}^{m+n} = \mathbf{P}^m \, \mathbf{P}^n.$$

The interpretation is intuitive: to go from state $i$ to state $j$ in $m + n$ steps, the chain must pass through some intermediate state $k$ after $m$ steps. Summing over all possible intermediate states $k$ gives the total probability.

This equation is the discrete-time analogue of the semigroup property of matrix exponentials in continuous-time systems.
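A quick numerical sanity check of the matrix identity; the 3-state row-stochastic matrix below is an arbitrary illustrative choice:

```python
import numpy as np

# Arbitrary 3-state row-stochastic matrix for illustration.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

m, n = 3, 4
lhs = np.linalg.matrix_power(P, m + n)                             # P^(m+n)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)  # P^m P^n

print(np.allclose(lhs, rhs))  # Chapman-Kolmogorov in matrix form holds
```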


Definition:

Irreducibility and Aperiodicity

Communication. State $j$ is accessible from state $i$ (written $i \to j$) if $p_{ij}^{(n)} > 0$ for some $n \geq 1$. States $i$ and $j$ communicate (written $i \leftrightarrow j$) if $i \to j$ and $j \to i$. Communication is an equivalence relation and partitions $\mathcal{S}$ into communicating classes.

Irreducibility. A Markov chain is irreducible if $\mathcal{S}$ consists of a single communicating class, i.e., every state is accessible from every other state. Equivalently, for every pair $(i, j)$ there exists $n \geq 1$ such that $p_{ij}^{(n)} > 0$.

Period. The period of state $i$ is

$$d(i) = \gcd\{n \geq 1 : p_{ii}^{(n)} > 0\}.$$

If $d(i) = 1$, the state is aperiodic; if $d(i) > 1$, the state is periodic with period $d(i)$.

Aperiodic chain. An irreducible chain is aperiodic if every state is aperiodic. Since all states in an irreducible chain share the same period, it suffices to check a single state.

A Markov chain that is both irreducible and aperiodic is sometimes called ergodic (in the Markov chain sense). For such chains, the long-run fraction of time spent in each state converges to a unique stationary distribution.
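Both conditions are mechanical to verify on a finite chain. A sketch, using two standard facts: a finite chain is irreducible iff $(\mathbf{I} + \mathbf{P})^{M-1}$ has no zero entry, and an irreducible aperiodic (primitive) matrix satisfies $\mathbf{P}^k > 0$ for $k = (M-1)^2 + 1$ (Wielandt's bound):

```python
import numpy as np

def is_irreducible(P):
    """Finite chain is irreducible iff (I + P)^(M-1) has no zero entry."""
    M = P.shape[0]
    A = np.linalg.matrix_power(np.eye(M) + P, M - 1)
    return bool(np.all(A > 0))

def is_primitive(P):
    """Irreducible + aperiodic iff P^k > 0 for k = (M-1)^2 + 1 (Wielandt)."""
    M = P.shape[0]
    A = np.linalg.matrix_power(P, (M - 1) ** 2 + 1)
    return bool(np.all(A > 0))

# Deterministic 2-cycle: irreducible but periodic with period 2.
P_cycle = np.array([[0.0, 1.0], [1.0, 0.0]])
# "Lazy" version of the same chain: the self-loops make it aperiodic.
P_lazy = 0.5 * np.eye(2) + 0.5 * P_cycle

print(is_irreducible(P_cycle), is_primitive(P_cycle))  # True False
print(is_irreducible(P_lazy), is_primitive(P_lazy))    # True True
```

The lazy chain illustrates the standard trick of adding self-loops to destroy periodicity without changing the stationary distribution's existence.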


Definition:

Stationary Distribution

A probability vector $\boldsymbol{\pi} = [\pi_1, \pi_2, \ldots, \pi_M]^T$ is a stationary distribution (or steady-state distribution) of the Markov chain with transition matrix $\mathbf{P}$ if

$$\boldsymbol{\pi}^T \mathbf{P} = \boldsymbol{\pi}^T, \qquad \sum_{i=1}^{M} \pi_i = 1, \qquad \pi_i \geq 0 \;\;\forall\, i.$$

Component-wise, this reads

$$\pi_j = \sum_{i=1}^{M} \pi_i \, p_{ij}, \quad j = 1, \ldots, M.$$

Eigenvector interpretation. The stationarity condition $\boldsymbol{\pi}^T \mathbf{P} = \boldsymbol{\pi}^T$ says that $\boldsymbol{\pi}$ is a left eigenvector of $\mathbf{P}$ corresponding to eigenvalue $\lambda = 1$. Equivalently, $\mathbf{P}^T \boldsymbol{\pi} = \boldsymbol{\pi}$, so $\boldsymbol{\pi}$ is a right eigenvector of $\mathbf{P}^T$ with eigenvalue $1$.

By the Perron--Frobenius theorem, every row-stochastic matrix has $\lambda = 1$ as its largest eigenvalue (in magnitude), so the stationary distribution is associated with the dominant eigenvalue --- the very eigenvector that the power method of Chapter 1 converges to.
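The eigenvector view suggests computing $\boldsymbol{\pi}$ directly from an eigendecomposition of $\mathbf{P}^T$. A sketch with an illustrative 3-state matrix:

```python
import numpy as np

# Illustrative 3-state row-stochastic matrix.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# pi is the eigenvector of P^T associated with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))   # index of the eigenvalue nearest 1
v = np.real(eigvecs[:, k])
pi = v / v.sum()                       # normalise to a probability vector

print(pi)   # stationary distribution: satisfies pi @ P == pi
```

The normalisation `v / v.sum()` also fixes the arbitrary sign of the eigenvector returned by the solver.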


Theorem: Existence and Uniqueness of the Stationary Distribution

Let $\{X_n\}$ be an irreducible, aperiodic, positive-recurrent Markov chain on a finite state space $\mathcal{S}$ with transition matrix $\mathbf{P}$. Then:

  1. There exists a unique stationary distribution $\boldsymbol{\pi}$ satisfying $\boldsymbol{\pi}^T \mathbf{P} = \boldsymbol{\pi}^T$ with $\pi_i > 0$ for all $i \in \mathcal{S}$.

  2. For every initial distribution, the $n$-step transition matrix converges:

    $$\lim_{n \to \infty} \mathbf{P}^n = \mathbf{1}\,\boldsymbol{\pi}^T,$$

    where $\mathbf{1} = [1, 1, \ldots, 1]^T$. Equivalently, $\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j$ for all $i, j$.

  3. The stationary probability $\pi_i$ equals the long-run fraction of time the chain spends in state $i$:

    $$\pi_i = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mathbf{1}_{[X_n = i]} \quad \text{a.s.}$$

Every finite irreducible chain is positive-recurrent (the chain returns to every state in finite expected time). Aperiodicity ensures that $\mathbf{P}^n$ converges to a rank-one matrix rather than cycling. The Perron--Frobenius theorem guarantees that eigenvalue $1$ is simple and all other eigenvalues satisfy $|\lambda| < 1$, so $\mathbf{P}^n \to \mathbf{1}\boldsymbol{\pi}^T$ geometrically fast.

Note: For countable (infinite) state spaces, irreducibility alone does not guarantee positive recurrence --- the chain may be null recurrent or transient. The finite-state-space assumption eliminates these subtleties.


Example: Gilbert--Elliott Channel Model

[Interactive figure: Markov chain state distribution evolution. Parameter values shown: 0.3, 0.5, 20.]

Definition:

Birth--Death Process

A birth--death process is a Markov chain on the non-negative integers $\mathcal{S} = \{0, 1, 2, \ldots\}$ in which transitions occur only to adjacent states:

$$p_{i,\,i+1} = \lambda_i \quad (\text{birth rate}), \qquad p_{i,\,i-1} = \mu_i \quad (\text{death rate}), \qquad p_{i,i} = 1 - \lambda_i - \mu_i,$$

with $\mu_0 = 0$ (no deaths in state $0$) and all other transition probabilities equal to zero.

Balance equations. For a stationary distribution $\boldsymbol{\pi}$, the detailed balance equations (also called local balance) equate the probability flow across each boundary:

$$\pi_i \, \lambda_i = \pi_{i+1} \, \mu_{i+1}, \quad i = 0, 1, 2, \ldots$$

These are stronger than the global balance equations ($\boldsymbol{\pi}^T \mathbf{P} = \boldsymbol{\pi}^T$) but are automatically satisfied for birth--death chains because transitions skip no states.

Solving recursively:

$$\pi_i = \pi_0 \prod_{k=0}^{i-1} \frac{\lambda_k}{\mu_{k+1}}, \quad i \geq 1,$$

where $\pi_0$ is determined by the normalisation $\sum_{i=0}^{\infty} \pi_i = 1$.

Birth--death processes are the backbone of queuing theory: the "births" are customer arrivals and the "deaths" are service completions.
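The recursion can be evaluated numerically on a truncated chain. A sketch in which the truncation level and the constant rates are illustrative assumptions:

```python
import numpy as np

def birth_death_stationary(lam, mu, N):
    """Stationary distribution of a birth-death chain truncated at state N.

    lam[i] is the birth rate out of state i (i = 0..N-1);
    mu[i]  is the death rate out of state i (i = 1..N).
    Implements pi_i = pi_0 * prod_{k<i} lam[k]/mu[k+1], then normalises.
    """
    w = np.ones(N + 1)
    for i in range(1, N + 1):
        w[i] = w[i - 1] * lam[i - 1] / mu[i]
    return w / w.sum()

# Constant rates lam = 0.9, mu = 1.0 reproduce a (truncated) M/M/1 queue.
N = 200
lam = np.full(N, 0.9)
mu = np.full(N + 1, 1.0)   # mu[0] is unused (no deaths in state 0)
pi = birth_death_stationary(lam, mu, N)

print(pi[0])   # close to 1 - rho = 0.1 for rho = 0.9
```

With the truncation level large enough that the neglected tail mass is negligible, the result matches the M/M/1 formulas derived next.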

Definition:

The M/M/1 Queue

The M/M/1 queue is the simplest and most important single-server queuing model:

  • M (Markovian arrivals): packets arrive according to a Poisson process with rate $\lambda$ (hence exponential inter-arrival times with mean $1/\lambda$).
  • M (Markovian service): service times are i.i.d. exponential with rate $\mu$ (mean service time $1/\mu$).
  • 1: a single server.

The queue length $\{X(t)\}_{t \geq 0}$ forms a continuous-time birth--death process with constant rates $\lambda_i = \lambda$ and $\mu_i = \mu$ for all $i$. The discrete-time embedded chain at transition epochs is also a birth--death Markov chain.

Traffic intensity:

$$\rho = \frac{\lambda}{\mu}.$$

Stability condition: the queue has a stationary distribution if and only if $\rho < 1$ (on average, the service rate must exceed the arrival rate).

Stationary distribution. From the birth--death recursion with $\lambda_i = \lambda$, $\mu_{i+1} = \mu$:

$$\pi_k = (1 - \rho)\,\rho^k, \quad k = 0, 1, 2, \ldots$$

This is a geometric distribution with parameter $\rho$. The probability that the queue is empty is $\pi_0 = 1 - \rho$, and the probability that the queue has $k$ or more customers is $P(X \geq k) = \rho^k$.
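These closed-form quantities are one-liners in code. A sketch; the rate values $\lambda = 0.75$ and $\mu = 1$ are illustrative:

```python
# M/M/1 stationary quantities from rho = lam/mu (valid only for rho < 1).
lam, mu = 0.75, 1.0          # illustrative arrival and service rates
rho = lam / mu
assert rho < 1, "unstable: no stationary distribution"

pi0 = 1 - rho                # P(system empty) = 1 - rho
p_tail = rho ** 5            # P(X >= 5) = rho^5, geometric tail
L = rho / (1 - rho)          # mean number in system
W = 1 / (mu - lam)           # mean time in system (Little's law: L = lam*W)

print(pi0, L, W)             # 0.25, 3.0, 4.0
```

Note the consistency check built into the comments: $L = \lambda W$ (Little's law) holds, since $0.75 \times 4 = 3$.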


Example: M/M/1 Queue Performance Metrics


Markov Chain State Evolution and Convergence to Stationarity

A 3-state Markov chain (extended Gilbert--Elliott model) with animated probability flow along edges. Watch the state distribution converge from a deterministic initial state to the stationary distribution over successive time steps.
Starting from state 0 (Good), the distribution over the three channel states evolves according to the transition matrix. After approximately 10--15 steps, the distribution stabilises at the stationary vector $\boldsymbol{\pi}$, regardless of the initial state.

Why This Matters: Markov Models for Fading Channels and Protocols

Fading channel models. The Gilbert--Elliott model (Example above) is the simplest finite-state Markov channel (FSMC). More accurate models partition the received SNR range into $M$ regions and define transition probabilities from the level-crossing rates of the fading process:

$$p_{ij} \approx \frac{N(R_j)}{\pi_i \, f_s},$$

where $N(R_j)$ is the level-crossing rate at threshold $R_j$, $\pi_i$ is the steady-state probability of state $i$, and $f_s$ is the symbol rate. The stationary distribution of the FSMC gives the fraction of time the channel SNR falls in each region, enabling computation of average throughput under adaptive modulation and coding.

Random access protocols. Slotted ALOHA can be modeled as a Markov chain where the state is the number of backlogged users. The analysis reveals a bistable system: one stable point near zero backlog (high throughput) and another at high backlog (near zero throughput). This insight motivated the development of exponential backoff algorithms and CSMA/CA.

ARQ protocols. Stop-and-wait, Go-Back-$N$, and selective repeat ARQ protocols are naturally modeled as Markov chains. The state tracks the sender's window position, outstanding acknowledgements, and retransmission status. The stationary distribution yields the throughput efficiency $\eta$, i.e., the fraction of transmitted frames that carry new data.

TCP congestion control. The AIMD (additive increase, multiplicative decrease) mechanism of TCP can be modeled as a Markov chain, leading to the well-known TCP throughput formula:

$$\text{Throughput} \approx \frac{C}{\text{RTT}\,\sqrt{p}},$$

where $p$ is the packet loss probability, RTT is the round-trip time, and $C$ is a constant.

Power Method for Computing the Stationary Distribution

Complexity: $O(M^2)$ per iteration (vector-matrix product); typically converges in $O(\log(1/\epsilon) / \log(1/|\lambda_2|))$ iterations
Input: Row-stochastic matrix $\mathbf{P} \in \mathbb{R}^{M \times M}$, tolerance $\epsilon > 0$, maximum iterations $N_{\max}$
Output: Stationary distribution $\boldsymbol{\pi}$
1. Initialize $\boldsymbol{\pi}^{(0)} \leftarrow [1/M, \, 1/M, \, \ldots, \, 1/M]^T$ (uniform)
2. for $k = 0, 1, 2, \ldots, N_{\max} - 1$ do
3. $\quad \boldsymbol{\pi}^{(k+1)T} \leftarrow \boldsymbol{\pi}^{(k)T} \mathbf{P}$
4. $\quad$ if $\|\boldsymbol{\pi}^{(k+1)} - \boldsymbol{\pi}^{(k)}\|_1 < \epsilon$ then return $\boldsymbol{\pi}^{(k+1)}$
5. end for
6. return $\boldsymbol{\pi}^{(N_{\max})}$
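The pseudocode translates line-for-line into NumPy. A sketch; the 3-state test matrix is an illustrative assumption:

```python
import numpy as np

def stationary_power_method(P, eps=1e-10, n_max=10_000):
    """Power iteration pi^(k+1)T = pi^(k)T P with an L1 stopping rule."""
    M = P.shape[0]
    pi = np.full(M, 1.0 / M)                   # step 1: uniform start
    for _ in range(n_max):                     # step 2
        pi_next = pi @ P                       # step 3: vector-matrix product
        if np.abs(pi_next - pi).sum() < eps:   # step 4: L1 convergence test
            return pi_next
        pi = pi_next
    return pi                                  # step 6: hit the iteration cap

P = np.array([[0.5, 0.3, 0.2],   # illustrative row-stochastic matrix
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
pi = stationary_power_method(P)
print(pi)   # satisfies pi @ P == pi up to the tolerance
```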

This is precisely the power method from Section 1.7, applied to $\mathbf{P}^T$. Recall that the power method computes the dominant eigenvector of a matrix by repeated multiplication. Since $\mathbf{P}$ is row-stochastic, its largest eigenvalue is $1$ (by Perron--Frobenius), and the corresponding left eigenvector is the stationary distribution $\boldsymbol{\pi}$.

Convergence rate. The convergence is geometric with ratio $|\lambda_2(\mathbf{P})|$, the second-largest eigenvalue magnitude. The quantity $1 - |\lambda_2|$ is called the spectral gap and measures how fast the chain mixes.

Connection to Chapter 1. In Chapter 1, we used the power method to find the dominant eigenvector of symmetric matrices (e.g., for PCA or dominant mode analysis). Here the matrix $\mathbf{P}$ is generally not symmetric --- it is stochastic --- but the same iterative principle applies. The key insight is that the stationary distribution of a Markov chain is fundamentally an eigenvalue problem.

Alternatives. For very large sparse chains, direct methods (Gaussian elimination on $(\mathbf{P}^T - \mathbf{I})\boldsymbol{\pi} = \mathbf{0}$ plus the normalisation constraint) or iterative solvers (GMRES, Gauss--Seidel) may be more efficient.

Quick Check

Consider the two-state Markov chain with transition matrix

$$\mathbf{P} = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}.$$

What is the stationary probability $\pi_1$ (the long-run fraction of time spent in state 1)?

$\pi_1 = 3/7$

$\pi_1 = 4/7$

$\pi_1 = 1/2$

$\pi_1 = 0.3$

Quick Check

An M/M/1 queue has arrival rate $\lambda = 900$ packets/s and service rate $\mu = 1000$ packets/s. What is the mean number of packets in the system $L$?

$L = 9$

$L = 0.9$

$L = 10$

$L = 90$

Quick Check

Which of the following is required for a finite Markov chain to have a unique stationary distribution?

The chain must be irreducible

The chain must be aperiodic

The transition matrix must be symmetric

All entries of $\mathbf{P}$ must be strictly positive

Markov Chain

A stochastic process $\{X_n\}$ (discrete-time) or $\{X(t)\}$ (continuous-time) satisfying the Markov (memoryless) property: conditioned on the present state, the future is independent of the past. Markov chains are the fundamental building block of queuing theory, protocol analysis, and finite-state channel models.

Transition Matrix

A row-stochastic matrix $\mathbf{P}$ whose $(i,j)$-entry $p_{ij} = P(X_{n+1} = j \mid X_n = i)$ gives the one-step transition probability from state $i$ to state $j$. The $n$-step transition probabilities are given by $\mathbf{P}^n$.

Stationary Distribution

A probability vector $\boldsymbol{\pi}$ satisfying $\boldsymbol{\pi}^T \mathbf{P} = \boldsymbol{\pi}^T$. It is the left eigenvector of $\mathbf{P}$ associated with eigenvalue $1$. For an irreducible chain, the stationary distribution is unique and represents the long-run fraction of time spent in each state.

Irreducible

A Markov chain is irreducible if every state can be reached from every other state in a finite number of steps. Equivalently, the state space consists of a single communicating class. Irreducibility guarantees uniqueness of the stationary distribution (for finite chains).

M/M/1 Queue

A single-server queuing model with Poisson arrivals (rate $\lambda$) and exponential service times (rate $\mu$). The stationary queue length distribution is geometric: $\pi_k = (1-\rho)\rho^k$ where $\rho = \lambda/\mu < 1$. Key metrics: mean queue length $L = \rho/(1-\rho)$, mean delay $W = 1/(\mu - \lambda)$.

Common Mistake: Forgetting to Check $\rho < 1$ for M/M/1 Stability

Mistake:

Applying the M/M/1 formulas $L = \rho/(1-\rho)$ and $W = 1/(\mu - \lambda)$ without first verifying that $\rho < 1$.

When $\rho \geq 1$ (i.e., $\lambda \geq \mu$), the queue has no stationary distribution: the queue length grows without bound, the mean delay is infinite, and the formulas produce negative or nonsensical values.

This error is common in homework and can also arise in system design when traffic spikes push the offered load above the service capacity.

Correction:

Always check stability first. Before computing any M/M/1 metric, verify:

$$\rho = \frac{\lambda}{\mu} < 1.$$

If $\rho \geq 1$, the system is unstable and no steady-state analysis applies. In practice, if $\rho$ is close to $1$ (e.g., $\rho > 0.9$), the delay and queue length are extremely sensitive to small changes in $\lambda$ or $\mu$ --- the system operates near the "knee" of the delay curve.

For $\rho \geq 1$, one must either increase the service rate (faster link, more efficient coding), decrease the arrival rate (admission control, traffic shaping), or add more servers (M/M/$c$ queue).
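A guard clause makes the stability check impossible to forget. A sketch; the function name and the rate values are illustrative:

```python
def mm1_metrics(lam, mu):
    """Return (rho, L, W) for an M/M/1 queue, refusing unstable inputs."""
    rho = lam / mu
    if rho >= 1:
        raise ValueError(f"unstable: rho = {rho:.3f} >= 1, no steady state")
    L = rho / (1 - rho)          # mean number in system
    W = 1.0 / (mu - lam)         # mean time in system
    return rho, L, W

print(mm1_metrics(750.0, 1000.0))   # stable: rho = 0.75, L = 3.0, W = 0.004
# mm1_metrics(1100.0, 1000.0) raises ValueError instead of returning W < 0
```

Raising an exception on $\rho \geq 1$ turns a silent modelling error (a negative $L$ or $W$) into an immediate, visible failure.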

Key Takeaway

The central message of this section in three points:

  1. Markov chains reduce dynamics to matrices. A complex system with many interacting states --- fading channels, protocol machines, queuing networks --- is fully characterised by its transition matrix $\mathbf{P}$. All temporal behaviour is encoded in the powers $\mathbf{P}^n$.

  2. The stationary distribution is an eigenvector. The long-run behaviour $\boldsymbol{\pi}^T \mathbf{P} = \boldsymbol{\pi}^T$ is a left eigenvector equation with eigenvalue $1$. Computing $\boldsymbol{\pi}$ is exactly the eigenvalue problem from Chapter 1 applied to a stochastic matrix. The power method, QR algorithm, and spectral decomposition all carry over directly.

  3. Queuing theory = birth--death Markov chains. The M/M/1 queue produces closed-form delay and throughput formulas that every network engineer uses daily. The key parameter is the traffic intensity $\rho = \lambda/\mu$: as $\rho \to 1$, delay blows up as $1/(1-\rho)$ --- a fundamental limit that no clever protocol can circumvent.

Markov chains are thus a bridge between the linear algebra of Chapter 1 and the probability theory of Chapter 2, unified by the observation that steady-state analysis of dynamic systems is an eigenvalue problem.