Exercises
ex-fsp-ch02-01
Easy: A fair six-sided die is rolled. Let $A$ = "even" and $B$ = "greater than 2". Compute $\mathbb{P}(A \mid B)$ and $\mathbb{P}(B \mid A)$.
Identify $A \cap B$ and apply the definition of conditional probability directly.
Identify the intersection
$A = \{2, 4, 6\}$ and $B = \{3, 4, 5, 6\}$, so $A \cap B = \{4, 6\}$. $\mathbb{P}(A \cap B) = 2/6 = 1/3$. $\mathbb{P}(A) = 3/6 = 1/2$, $\mathbb{P}(B) = 4/6 = 2/3$.
Apply the definition
$\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} = \frac{1/3}{2/3} = \frac{1}{2}$ and $\mathbb{P}(B \mid A) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(A)} = \frac{1/3}{1/2} = \frac{2}{3}$. Note that $\mathbb{P}(A \mid B) = \mathbb{P}(A)$, so $A$ and $B$ are independent ($1/2 \times 2/3 = 1/3 = \mathbb{P}(A \cap B)$).
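As a sanity check, the conditional probabilities can be computed by brute-force enumeration of the six outcomes (a minimal Python sketch; the set names mirror the exercise):

```python
from fractions import Fraction

# Uniform sample space of a fair six-sided die.
omega = set(range(1, 7))
A = {w for w in omega if w % 2 == 0}   # "even"
B = {w for w in omega if w > 2}        # "greater than 2"

def prob(event):
    # Each outcome has probability 1/6.
    return Fraction(len(event), len(omega))

p_A_given_B = prob(A & B) / prob(B)
p_B_given_A = prob(A & B) / prob(A)
print(p_A_given_B, p_B_given_A)   # 1/2 2/3
```

Using `Fraction` keeps the arithmetic exact, so the independence identity $\mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B)$ can be tested with `==` rather than a floating-point tolerance.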
ex-fsp-ch02-02
Easy: An urn contains 4 red balls and 6 blue balls. Two balls are drawn without replacement. Compute the probability that the second ball is red, using the law of total probability (condition on the first draw).
Let $R_1$ = "first ball is red" and $R_2$ = "second ball is red".
$\{R_1, R_1^c\}$ forms a partition of $\Omega$.
Set up the partition
$\mathbb{P}(R_1) = 4/10 = 2/5$, $\mathbb{P}(R_1^c) = 6/10 = 3/5$. $\mathbb{P}(R_2 \mid R_1) = 3/9 = 1/3$, $\mathbb{P}(R_2 \mid R_1^c) = 4/9$.
Apply total probability
$\mathbb{P}(R_2) = \mathbb{P}(R_2 \mid R_1)\mathbb{P}(R_1) + \mathbb{P}(R_2 \mid R_1^c)\mathbb{P}(R_1^c) = \frac{1}{3} \cdot \frac{2}{5} + \frac{4}{9} \cdot \frac{3}{5} = \frac{2}{15} + \frac{4}{15} = \frac{6}{15} = \frac{2}{5}$. Note that $2/5 = 4/10$: by symmetry, each ball is equally likely to be red regardless of draw position.
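The total-probability computation can be verified by enumerating all equally likely ordered pairs of distinct balls (a quick sketch; the labels are illustrative):

```python
from fractions import Fraction
from itertools import permutations

# 4 red and 6 blue balls, two drawn in order without replacement.
balls = ['R'] * 4 + ['B'] * 6
pairs = list(permutations(range(len(balls)), 2))  # 90 equally likely ordered draws

second_red = sum(1 for i, j in pairs if balls[j] == 'R')
p_second_red = Fraction(second_red, len(pairs))
print(p_second_red)   # 2/5
```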
ex-fsp-ch02-03
Easy: Verify that the conditional probability mapping $A \mapsto \mathbb{P}(A \mid B)$, for a fixed $B$ with $\mathbb{P}(B) > 0$, satisfies the three Kolmogorov axioms.
Check: (1) $\mathbb{P}(A \mid B) \geq 0$, (2) $\mathbb{P}(\Omega \mid B) = 1$, (3) countable additivity.
Non-negativity
$\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B) \geq 0$ since $\mathbb{P}(A \cap B) \geq 0$ and $\mathbb{P}(B) > 0$.
Normalization
$\mathbb{P}(\Omega \mid B) = \mathbb{P}(\Omega \cap B)/\mathbb{P}(B) = \mathbb{P}(B)/\mathbb{P}(B) = 1$.
Countable additivity
For pairwise disjoint events $A_1, A_2, \dots$: $\mathbb{P}\big(\bigcup_i A_i \mid B\big) = \frac{\mathbb{P}\big(\bigcup_i (A_i \cap B)\big)}{\mathbb{P}(B)} = \frac{\sum_i \mathbb{P}(A_i \cap B)}{\mathbb{P}(B)} = \sum_i \mathbb{P}(A_i \mid B)$, where the middle equality holds because the sets $A_i \cap B$ are pairwise disjoint.
ex-fsp-ch02-04
Easy: A fair coin is tossed 4 times. Compute the probability of exactly 3 heads using the binomial formula. Then verify by listing all outcomes.
There are $2^4 = 16$ equally likely outcomes.
$\binom{4}{3} = 4$ of them have exactly 3 heads.
Binomial formula
$\mathbb{P}(\text{exactly 3 heads}) = \binom{4}{3} \left(\tfrac{1}{2}\right)^3 \left(\tfrac{1}{2}\right)^{1} = \frac{4}{16} = \frac{1}{4}$.
Direct enumeration
Outcomes with exactly 3 heads: HHHT, HHTH, HTHH, THHH. That is 4 outcomes out of 16 equally likely outcomes. $4/16 = 1/4$. ✓
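The enumeration step is easy to automate, assuming nothing beyond the exercise itself:

```python
from itertools import product
from math import comb

# All 2^4 equally likely outcomes of four fair-coin tosses.
outcomes = list(product('HT', repeat=4))
favorable = [o for o in outcomes if o.count('H') == 3]

p = len(favorable) / len(outcomes)
print(len(favorable), p)   # 4 0.25
```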
ex-fsp-ch02-05
Easy: In a Bernoulli sequence with success probability $p$, compute the probability that the first success occurs on trial 5. Also compute the mean waiting time.
Use the geometric distribution PMF.
Geometric PMF
The first success occurs on trial 5 exactly when the first four trials fail and the fifth succeeds: $\mathbb{P}(T = 5) = (1 - p)^4 p$.
Mean
$\mathbb{E}[T] = 1/p$ trials.
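A quick numeric check of both answers; `p = 0.2` is an arbitrary illustrative value, not part of the exercise:

```python
# Geometric waiting-time PMF: P(T = n) = (1 - p)^(n - 1) * p.
p = 0.2  # illustrative value
pmf = lambda n: (1 - p) ** (n - 1) * p

p_first_success_on_5 = pmf(5)                      # (1 - p)^4 * p
mean = sum(n * pmf(n) for n in range(1, 10_000))   # truncated series, ~ 1/p
print(p_first_success_on_5, round(mean, 4))
```

The truncated series agrees with the closed form $\mathbb{E}[T] = 1/p = 5$ to floating-point accuracy, since the geometric tail beyond $n = 10^4$ is negligible.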
ex-fsp-ch02-06
Easy: Prove that if $A$ and $B$ are independent, then $A$ and $B^c$ are also independent.
Write $\mathbb{P}(A) = \mathbb{P}(A \cap B) + \mathbb{P}(A \cap B^c)$.
Decompose
$A = (A \cap B) \cup (A \cap B^c)$, with the two pieces disjoint. So $\mathbb{P}(A \cap B^c) = \mathbb{P}(A) - \mathbb{P}(A \cap B)$.
Substitute
Since $\mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B)$: $\mathbb{P}(A \cap B^c) = \mathbb{P}(A) - \mathbb{P}(A)\mathbb{P}(B) = \mathbb{P}(A)\big(1 - \mathbb{P}(B)\big) = \mathbb{P}(A)\mathbb{P}(B^c)$.
ex-fsp-ch02-07
Medium: A binary symmetric channel (BSC) with crossover probability $\epsilon < 1/2$ is used to transmit $X \in \{0, 1\}$ with prior probabilities $\pi_0 = \mathbb{P}(X = 0)$ and $\pi_1 = \mathbb{P}(X = 1)$. (a) Derive the MAP decision rule. (b) Show that for equal priors ($\pi_0 = \pi_1 = 1/2$), MAP coincides with ML. (c) Compute the posterior $\mathbb{P}(X = 0 \mid Y = 0)$ for general $\pi_0$.
MAP rule: $\hat{x}(y) = \arg\max_x \mathbb{P}(X = x \mid Y = y)$.
Use Bayes: $\mathbb{P}(X = x \mid Y = y) \propto \mathbb{P}(Y = y \mid X = x)\,\mathbb{P}(X = x)$.
BSC channel model
$\mathbb{P}(Y = 0 \mid X = 0) = 1 - \epsilon$, $\mathbb{P}(Y = 1 \mid X = 0) = \epsilon$. $\mathbb{P}(Y = 1 \mid X = 1) = 1 - \epsilon$, $\mathbb{P}(Y = 0 \mid X = 1) = \epsilon$.
(a) MAP rule given $Y=0$
Decide $\hat{x} = 0$ if $\mathbb{P}(X = 0 \mid Y = 0) \geq \mathbb{P}(X = 1 \mid Y = 0)$, i.e., $\pi_0 (1 - \epsilon) \geq \pi_1 \epsilon$, which gives $\frac{\pi_0}{\pi_1} \geq \frac{\epsilon}{1 - \epsilon}$.
More cleanly: on receiving $y$, decide $\hat{x} = y$ iff $\pi_y (1 - \epsilon) \geq \pi_{1-y}\,\epsilon$, i.e., iff $\frac{\pi_y}{\pi_{1-y}} \geq \frac{\epsilon}{1 - \epsilon}$. For $\epsilon < 1/2$ and equal priors, the MAP detector decides $\hat{x} = y$ (output equals received bit).
(b) Equal priors
When $\pi_0 = \pi_1 = 1/2$, Bayes' theorem gives $\mathbb{P}(X = x \mid Y = y) \propto \mathbb{P}(Y = y \mid X = x)$ since the prior is constant. MAP $\equiv$ ML: choose the $x$ maximizing the likelihood $\mathbb{P}(Y = y \mid X = x)$.
(c) Posterior
$\mathbb{P}(X = 0 \mid Y = 0) = \frac{\pi_0 (1 - \epsilon)}{\pi_0 (1 - \epsilon) + (1 - \pi_0)\,\epsilon}$. For $\pi_0 = 1/2$: $\mathbb{P}(X = 0 \mid Y = 0) = \frac{(1 - \epsilon)/2}{(1 - \epsilon)/2 + \epsilon/2} = 1 - \epsilon$.
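The posterior formula in (c) is easy to evaluate numerically; the sketch below assumes an illustrative value `eps = 0.1` and checks the equal-prior case against $1 - \epsilon$:

```python
# Posterior P(X = 0 | Y = 0) for a BSC with crossover eps and prior pi0.
def posterior_x0_given_y0(pi0, eps):
    num = pi0 * (1 - eps)           # P(Y=0 | X=0) * P(X=0)
    den = num + (1 - pi0) * eps     # total probability of Y=0
    return num / den

eps = 0.1  # illustrative crossover probability
print(posterior_x0_given_y0(0.5, eps))   # equal priors -> 1 - eps = 0.9
```

A larger prior $\pi_0$ pulls the posterior further toward 1, consistent with the MAP threshold in (a).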
ex-fsp-ch02-08
Medium: Let $A$, $B$, $C$ be events with $\mathbb{P}(C) > 0$. Show that if $A$ and $B$ are conditionally independent given $C$, then $A$ and $B^c$ are conditionally independent given $C$.
Conditional probability given $C$ is a probability measure: apply the same argument as for marginal independence.
Use that $\mathbb{P}(\cdot \mid C)$ is a probability measure
Let $Q(\cdot) = \mathbb{P}(\cdot \mid C)$. By assumption $Q(A \cap B) = Q(A)Q(B)$. We want to show $Q(A \cap B^c) = Q(A)Q(B^c)$.
Apply the marginal result
$Q(A) = Q(A \cap B) + Q(A \cap B^c)$, so $Q(A \cap B^c) = Q(A) - Q(A)Q(B) = Q(A)\big(1 - Q(B)\big) = Q(A)Q(B^c)$.
ex-fsp-ch02-09
Medium: A manufacturing plant has two machines, A and B. Machine A runs on 40% of production days; machine B runs on all production days. Machine A produces a defective item with probability 0.01; machine B produces a defective item with probability 0.05. On a day when machine A is running, both machines produce items.
(a) Compute the probability that a randomly selected item is defective. (b) Given that an item is defective, compute the probability it came from machine A.
Let $D$ = "defective", $M_A$ = "item from machine A", $M_B$ = "item from machine B". Think of the population of all items.
Machine B produces an item on 100% of days; machine A produces an additional item on 40% of days.
Model the population
On days machine A runs (40% of days): two items produced, one by each machine. On days only B runs (60% of days): one item produced. In a population of 140 items per 100 days: 40 from A, 100 from B. So $\mathbb{P}(M_A) = 40/140 = 2/7$ and $\mathbb{P}(M_B) = 100/140 = 5/7$.
(a) Total probability
$\mathbb{P}(D) = \mathbb{P}(D \mid M_A)\mathbb{P}(M_A) + \mathbb{P}(D \mid M_B)\mathbb{P}(M_B) = 0.01 \cdot \tfrac{2}{7} + 0.05 \cdot \tfrac{5}{7} = \tfrac{0.27}{7} \approx 0.0386.$
(b) Posterior via Bayes
$\mathbb{P}(M_A \mid D) = \frac{\mathbb{P}(D \mid M_A)\mathbb{P}(M_A)}{\mathbb{P}(D)} = \frac{0.01 \cdot 2/7}{0.27/7} = \frac{0.02}{0.27} \approx 0.074.$ Machine A produces $2/7 \approx 28.6\%$ of all items but only about 7.4% of defective items, because its defect rate is five times lower than machine B's.
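Exact fractions make the arithmetic transparent; this sketch simply re-runs the two formulas above:

```python
from fractions import Fraction

# Per 100 production days: 40 items from machine A, 100 from machine B.
p_MA, p_MB = Fraction(40, 140), Fraction(100, 140)
d_A, d_B = Fraction(1, 100), Fraction(5, 100)   # defect rates

p_D = d_A * p_MA + d_B * p_MB        # law of total probability
p_MA_given_D = d_A * p_MA / p_D      # Bayes' theorem
print(p_D, p_MA_given_D)   # 27/700 2/27
```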
ex-fsp-ch02-10
Medium: In a Bernoulli sequence with parameter $p$, let $T$ be the waiting time until the first success. Prove the memoryless property $\mathbb{P}(T > m + n \mid T > m) = \mathbb{P}(T > n)$ using the definition of conditional probability and the product rule.
The event $\{T > n\}$ means the first $n$ trials are all failures.
$\mathbb{P}(T > n) = (1 - p)^n$.
Compute tail probabilities
$\{T > n\}$ = "first $n$ trials are all failures" $= F_1 \cap F_2 \cap \cdots \cap F_n$. By independence: $\mathbb{P}(T > n) = (1 - p)^n$.
Apply conditional probability
$\{T > m + n\} \subseteq \{T > m\}$, so $\mathbb{P}(T > m + n \mid T > m) = \frac{\mathbb{P}(T > m + n)}{\mathbb{P}(T > m)} = \frac{(1 - p)^{m + n}}{(1 - p)^m} = (1 - p)^n = \mathbb{P}(T > n)$.
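The identity can be spot-checked numerically for arbitrary $p$, $m$, $n$ (illustrative values below):

```python
# Memoryless check: P(T > m+n | T > m) == P(T > n) for geometric T.
p = 0.3                                 # illustrative success probability
tail = lambda n: (1 - p) ** n           # P(T > n): first n trials all fail

for m, n in [(1, 1), (4, 7), (10, 3)]:
    lhs = tail(m + n) / tail(m)         # conditional probability
    assert abs(lhs - tail(n)) < 1e-12
print("memoryless property holds on all test pairs")
```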
ex-fsp-ch02-11
Medium: Construct three events $A$, $B$, $C$ on a finite sample space such that (a) $A$ and $B$ are independent, (b) $A$ and $C$ are independent, (c) $B$ and $C$ are independent, but (d) $A$, $B$, $C$ are NOT mutually independent. (This is a concrete example showing pairwise independence does not imply mutual independence.)
Try $\Omega$ with uniform probability and define the events to mirror the two-coin example from Section 2.3.
Define the events
$\Omega = \{HH, HT, TH, TT\}$ with $\mathbb{P}(\{\omega\}) = 1/4$ for each $\omega$. $A$ = "first toss heads" $= \{HH, HT\}$, $B$ = "second toss heads" $= \{HH, TH\}$, $C$ = "both tosses agree" $= \{HH, TT\}$. $\mathbb{P}(A) = \mathbb{P}(B) = \mathbb{P}(C) = 1/2$.
Pairwise independence
$A \cap B = \{HH\}$, $\mathbb{P}(A \cap B) = 1/4 = \mathbb{P}(A)\mathbb{P}(B)$. ✓ $A \cap C = \{HH\}$, $\mathbb{P}(A \cap C) = 1/4 = \mathbb{P}(A)\mathbb{P}(C)$. ✓ $B \cap C = \{HH\}$, $\mathbb{P}(B \cap C) = 1/4 = \mathbb{P}(B)\mathbb{P}(C)$. ✓
Failure of mutual independence
$A \cap B \cap C = \{HH\}$, $\mathbb{P}(A \cap B \cap C) = 1/4$. But $\mathbb{P}(A)\mathbb{P}(B)\mathbb{P}(C) = 1/8 \neq 1/4$. So $A$, $B$, $C$ are not mutually independent.
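The whole construction fits in a few lines of exhaustive checking:

```python
from fractions import Fraction
from itertools import product

# Two fair coin tosses, uniform on {HH, HT, TH, TT}.
omega = set(product('HT', repeat=2))
A = {w for w in omega if w[0] == 'H'}      # first toss heads
B = {w for w in omega if w[1] == 'H'}      # second toss heads
C = {w for w in omega if w[0] == w[1]}     # both tosses agree

P = lambda E: Fraction(len(E), len(omega))

pairwise = all(P(X & Y) == P(X) * P(Y) for X, Y in [(A, B), (A, C), (B, C)])
mutual = P(A & B & C) == P(A) * P(B) * P(C)
print(pairwise, mutual)   # True False
```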
ex-fsp-ch02-12
Medium: Let $A \to B \to C$ be a Markov chain: $\mathbb{P}(C \mid A \cap B) = \mathbb{P}(C \mid B)$. Prove that the chain is reversible: $C \to B \to A$ also holds, i.e., $\mathbb{P}(A \mid B \cap C) = \mathbb{P}(A \mid B)$.
Show $\mathbb{P}(A \cap C \mid B) = \mathbb{P}(A \mid B)\mathbb{P}(C \mid B)$: this is equivalent to the Markov condition in either direction.
The relation is symmetric in $A$ and $C$.
Markov condition implies $A \perp C \mid B$
From the proof in Theorem "Markov Chain Implies Conditional Independence": $\mathbb{P}(A \cap C \mid B) = \mathbb{P}(A \mid B)\,\mathbb{P}(C \mid B)$.
Symmetry of conditional independence
The condition $\mathbb{P}(A \cap C \mid B) = \mathbb{P}(A \mid B)\mathbb{P}(C \mid B)$ is symmetric in $A$ and $C$: dividing both sides by $\mathbb{P}(C \mid B)$ (when positive) gives $\mathbb{P}(A \mid B \cap C) = \mathbb{P}(A \mid B)$. This is the Markov condition for $C \to B \to A$.
ex-fsp-ch02-13
Medium: In a BPSK system, transmit symbols $x \in \{+1, -1\}$ with equal priors. The received signal is $Y = x + N$ where $N \sim \mathcal{N}(0, \sigma^2)$. Using Bayes' theorem, derive the MAP decision rule and show it equals the minimum distance decoder $\hat{x}(y) = \arg\min_{x \in \{+1, -1\}} |y - x|$.
The likelihood is $f(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - x)^2}{2\sigma^2}\right)$.
With equal priors, MAP $\equiv$ ML.
Posterior via Bayes
$\mathbb{P}(X = +1 \mid Y = y) \propto f(y \mid +1)\,\mathbb{P}(X = +1)$. Similarly for $x = -1$. With equal priors, comparing posteriors reduces to comparing likelihoods.
MAP rule
Decide $\hat{x} = +1$ if $f(y \mid +1) \geq f(y \mid -1)$: $\exp\left(-\frac{(y - 1)^2}{2\sigma^2}\right) \geq \exp\left(-\frac{(y + 1)^2}{2\sigma^2}\right) \iff (y - 1)^2 \leq (y + 1)^2$. So $\hat{x}(y) = \arg\min_{x \in \{+1, -1\}} (y - x)^2$, the minimum distance decoder.
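A numeric sweep confirms the two decoders agree everywhere; `sigma` and the test grid are illustrative choices, not part of the exercise:

```python
import math

# MAP (equal priors) vs. minimum-distance decoding for BPSK.
sigma = 0.8   # illustrative noise standard deviation

def likelihood(y, x):
    # Gaussian likelihood up to the constant factor, which cancels.
    return math.exp(-(y - x) ** 2 / (2 * sigma ** 2))

map_decide = lambda y: max((+1, -1), key=lambda x: likelihood(y, x))
min_dist_decide = lambda y: min((+1, -1), key=lambda x: (y - x) ** 2)

ys = [i / 10 for i in range(-30, 31)]
print(all(map_decide(y) == min_dist_decide(y) for y in ys))   # True
```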
ex-fsp-ch02-14
Medium: Derive the PMF of the negative binomial distribution from first principles: $T_r$, the waiting time until the $r$-th success in a Bernoulli sequence with parameter $p$.
The event $\{T_r = n\}$ means: exactly $r - 1$ successes in the first $n - 1$ trials, AND the $n$-th trial is a success.
Decompose the event
$\{T_r = n\}$ = {exactly $r - 1$ successes in trials $1, \dots, n - 1$} $\cap$ {trial $n$ is a success}.
Compute probabilities
By the binomial formula, the first event has probability $\binom{n-1}{r-1} p^{r-1} (1 - p)^{n - r}$. The second event (trial $n$ is a success) has probability $p$, and is independent of all earlier trials.
Multiply
$\mathbb{P}(T_r = n) = \binom{n-1}{r-1} p^{r} (1 - p)^{n - r}, \qquad n = r, r + 1, r + 2, \dots$
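As a sanity check, the derived PMF should sum to 1 over $n = r, r+1, \dots$ and have mean $r/p$; `r` and `p` below are arbitrary illustrative values:

```python
from math import comb

# Negative binomial PMF: P(T_r = n) = C(n-1, r-1) p^r (1-p)^(n-r).
r, p = 3, 0.4   # illustrative values
pmf = lambda n: comb(n - 1, r - 1) * p ** r * (1 - p) ** (n - r)

total = sum(pmf(n) for n in range(r, 2000))       # truncated series, ~ 1
mean = sum(n * pmf(n) for n in range(r, 2000))    # ~ r / p
print(round(total, 6), round(mean, 4))
```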
ex-fsp-ch02-15
Hard: Let $A_1, A_2, \dots$ be independent events. Show that: (a) If $\sum_n \mathbb{P}(A_n) < \infty$, then $\mathbb{P}(A_n \text{ i.o.}) = 0$. (b) If $\sum_n \mathbb{P}(A_n) = \infty$, then $\mathbb{P}(A_n \text{ i.o.}) = 1$. (This is the Borel-Cantelli lemma completed with the independence condition.)
Part (a) is the first Borel-Cantelli lemma from Chapter 1: it does not need independence.
For part (b): compute $\mathbb{P}\big(\bigcap_{n \geq m} A_n^c\big)$.
Use the bound $1 - x \leq e^{-x}$ for $x \geq 0$.
(a) First Borel-Cantelli (recap)
By the first Borel-Cantelli lemma (Chapter 1): if $\sum_n \mathbb{P}(A_n) < \infty$, then $\mathbb{P}(\limsup_n A_n) = 0$. This holds without the independence assumption.
(b) Show complement has probability 0
$\{A_n \text{ i.o.}\} = \limsup_n A_n = \bigcap_m \bigcup_{n \geq m} A_n$. Its complement is $\bigcup_m \bigcap_{n \geq m} A_n^c$. By continuity of probability, it suffices to show $\mathbb{P}\big(\bigcap_{n \geq m} A_n^c\big) = 0$ for each $m$.
Use independence and the exponential bound
By independence and $1 - x \leq e^{-x}$: $\mathbb{P}\big(\bigcap_{n=m}^{N} A_n^c\big) = \prod_{n=m}^{N} \big(1 - \mathbb{P}(A_n)\big) \leq \exp\big(-\sum_{n=m}^{N} \mathbb{P}(A_n)\big).$ Taking $N \to \infty$: since $\sum_{n \geq m} \mathbb{P}(A_n) = \infty$ (the tail of a divergent series is still infinite), the exponential bound $\to 0$. So $\mathbb{P}\big(\bigcap_{n \geq m} A_n^c\big) = 0$.
ex-fsp-ch02-16
Hard: Three radar sensors independently detect a target, each with detection probability $p_d$ and false alarm probability $p_f < p_d$. The target is present with prior probability $\pi_1 = \mathbb{P}(H_1)$. A fusion center receives the binary decisions $d_1, d_2, d_3 \in \{0, 1\}$ from the three sensors.
(a) Using Bayes' theorem, derive the posterior $\mathbb{P}(H_1 \mid d_1, d_2, d_3)$, assuming sensors report independently given the target state.
(b) Show that the fusion rule reduces to: "decide target present if $d_1 + d_2 + d_3 \geq t$" for some threshold $t$ (a simple vote). What is $t$?
Let $H_1$ = "target present", $H_0$ = "no target". Apply Bayes with the product likelihood.
Take the log of the posterior ratio and group terms.
(a) Posterior
Let $k = d_1 + d_2 + d_3$ (number of "yes" votes) and $\pi_0 = 1 - \pi_1$. By conditional independence of sensors given the target state: $\mathbb{P}(H_1 \mid d_1, d_2, d_3) = \frac{\pi_1\, p_d^{k} (1 - p_d)^{3 - k}}{\pi_1\, p_d^{k} (1 - p_d)^{3 - k} + \pi_0\, p_f^{k} (1 - p_f)^{3 - k}}.$
(b) Threshold rule
Decide $H_1$ iff $\mathbb{P}(H_1 \mid d_1, d_2, d_3) \geq 1/2$, i.e., iff $\pi_1\, p_d^{k} (1 - p_d)^{3 - k} \geq \pi_0\, p_f^{k} (1 - p_f)^{3 - k}$. Taking $\log$ and rearranging (using $p_f < p_d$, so $\log\frac{p_d (1 - p_f)}{p_f (1 - p_d)} > 0$ and the inequality direction is preserved): the rule is $k \geq t$ for some threshold $t$ determined by the priors and channel parameters. For equal priors ($\pi_0 = \pi_1 = 1/2$) and symmetric sensors ($p_f = 1 - p_d$), $t = 3/2$, giving the majority vote rule $k \geq 2$: decide target present if 2 or more sensors say yes.
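The posterior and the resulting vote threshold can be checked numerically; `pd = 0.9`, `pf = 0.1` (so $p_f = 1 - p_d$) and equal priors are illustrative values:

```python
# Posterior P(H1 | k yes-votes) for three conditionally independent sensors.
pd, pf, pi1 = 0.9, 0.1, 0.5   # illustrative: symmetric sensors, equal priors

def posterior_h1(k):
    num = pi1 * pd ** k * (1 - pd) ** (3 - k)
    den = num + (1 - pi1) * pf ** k * (1 - pf) ** (3 - k)
    return num / den

decisions = [posterior_h1(k) > 0.5 for k in range(4)]
print(decisions)   # [False, False, True, True] -> majority vote, t = 2
```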
ex-fsp-ch02-17
Hard: A memoryless binary source emits i.i.d. bits $X_1, X_2, \dots$ with $\mathbb{P}(X_i = 1) = p$. (a) Compute $\mathbb{P}(X_1 = x_1, \dots, X_n = x_n)$ for any sequence $(x_1, \dots, x_n) \in \{0, 1\}^n$. (b) Show that for long sequences, almost all probability mass is concentrated on sequences with empirical frequency of 1's close to $p$. (Hint: use the law of large numbers.) (c) How many such "typical" sequences are there, approximately?
(a) Independence gives a product formula.
(b) By LLN, $\frac{1}{n}\sum_{i=1}^{n} X_i \to p$ a.s.
(c) Count sequences with $k \approx pn$ ones using Stirling's approximation.
(a) Probability of a sequence
By independence: $\mathbb{P}(X_1 = x_1, \dots, X_n = x_n) = \prod_{i=1}^{n} \mathbb{P}(X_i = x_i) = p^{k} (1 - p)^{n - k}$, where $k = \sum_i x_i$ is the number of ones.
(b) LLN concentration
By the weak law of large numbers (Chapter 11), $\frac{1}{n}\sum_{i=1}^{n} X_i \to p$ in probability. For any $\delta > 0$: $\mathbb{P}\big(\big|\frac{1}{n}\sum_i X_i - p\big| > \delta\big) \to 0$. Most probability mass concentrates on sequences with $k/n \approx p$.
(c) Number of typical sequences
Sequences with exactly $k = pn$ ones each have probability $p^{pn} (1 - p)^{(1-p)n} = 2^{-n H(p)}$. There are $\binom{n}{pn}$ such sequences. By Stirling: $\binom{n}{pn} \approx 2^{n H(p)}$, where $H(p) = -p \log_2 p - (1 - p)\log_2(1 - p)$ is the binary entropy. This is the seed of the source coding theorem.
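The Stirling estimate can be tested directly: for large $n$, $\log_2 \binom{n}{pn}$ is close to $nH(p)$. Here `n = 1000` and `p = 0.3` are illustrative values:

```python
from math import comb, log2

# Compare log2 of the exact count of sequences with k = pn ones to n*H(p).
n, p = 1000, 0.3   # illustrative values
k = int(n * p)

H = -p * log2(p) - (1 - p) * log2(1 - p)   # binary entropy
exact_bits = log2(comb(n, k))              # log2 of the exact count
print(round(exact_bits / (n * H), 4))      # close to 1
```

The ratio is slightly below 1 because $\binom{n}{pn} \leq 2^{nH(p)}$ exactly, with a polynomial (sub-exponential) correction factor.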
ex-fsp-ch02-18
Hard: Let $A$, $B$, $C$ be events with $\mathbb{P}(C) > 0$. Prove that marginal independence of $A$ and $B$ does NOT imply conditional independence given $C$. Give an explicit counterexample, then explain the "explaining away" phenomenon.
Try a Bayesian network with $A$ and $B$ as causes, $C$ as their common effect.
Counterexample
Let $\Omega = \{0, 1\}^2$ with uniform probability $1/4$ on each point. $A$ = "first coordinate is 1" $= \{(1,0), (1,1)\}$, $B$ = "second coordinate is 1" $= \{(0,1), (1,1)\}$, $C$ = "exactly one coordinate is 1" $= \{(1,0), (0,1)\}$.
Marginal independence
$\mathbb{P}(A) = \mathbb{P}(B) = 1/2$, $\mathbb{P}(A \cap B) = \mathbb{P}(\{(1,1)\}) = 1/4 = \mathbb{P}(A)\mathbb{P}(B)$. So $A \perp B$. ✓
Conditional dependence given $C$
$\mathbb{P}(C) = 1/2$. $\mathbb{P}(A \mid C) = \frac{\mathbb{P}(\{(1,0)\})}{\mathbb{P}(C)} = \frac{1/4}{1/2} = 1/2$, and likewise $\mathbb{P}(B \mid C) = 1/2$. But $A \cap B \cap C = \emptyset$, so $\mathbb{P}(A \cap B \mid C) = 0 \neq 1/4 = \mathbb{P}(A \mid C)\mathbb{P}(B \mid C)$. So $A$ and $B$ are NOT conditionally independent given $C$.
Explaining away
$C$ is a common effect of $A$ and $B$. Once we observe $C$ (exactly one of the two possible causes is active), learning that one cause is active ($A$) tells us the other is not ($B^c$). This "explaining away" makes $A$ and $B$ negatively correlated given $C$, even though they are independent marginally.
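The counterexample can be verified exhaustively in a few lines:

```python
from fractions import Fraction
from itertools import product

# Two independent fair bits; C = "exactly one coordinate is 1".
omega = set(product((0, 1), repeat=2))
A = {w for w in omega if w[0] == 1}
B = {w for w in omega if w[1] == 1}
C = {w for w in omega if w[0] + w[1] == 1}

P = lambda E: Fraction(len(E), len(omega))
cond = lambda E, F: P(E & F) / P(F)

print(P(A & B) == P(A) * P(B))                     # True: marginally independent
print(cond(A & B, C) == cond(A, C) * cond(B, C))   # False: dependent given C
```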
ex-fsp-ch02-19
Challenge: A sequence of events $A_1, A_2, \dots$ satisfies $\sum_n \mathbb{P}(A_n) = \infty$. Without assuming independence, can we conclude $\mathbb{P}(\limsup_n A_n) > 0$?
(a) Give a simple counterexample showing the answer is no without independence. (b) State the additional assumption (beyond $\sum_n \mathbb{P}(A_n) = \infty$) under which a positive-probability conclusion CAN be drawn. (Hint: look up the Kochen-Stone lemma.)
(a) Consider events with strong positive correlation: e.g., all $A_n = A$ for a fixed event $A$ with $\mathbb{P}(A) > 0$.
(b) The Kochen-Stone lemma uses the Cauchy-Schwarz inequality on $\sum_{k=1}^{n} \mathbf{1}_{A_k}$.
(a) Counterexample
Let all $A_n = A$ where $\mathbb{P}(A) = 1/2$. Then $\sum_n \mathbb{P}(A_n) = \infty$ and $\limsup_n A_n = A$, so $\mathbb{P}(\limsup_n A_n) = 1/2$. But this example is not a counterexample: it still gives positive probability. Let us try instead: $\Omega = (0, 1)$ with the uniform distribution, $A_n = (0, 1/n)$ for $n \geq 1$. Then $\sum_n \mathbb{P}(A_n) = \sum_n 1/n = \infty$.
This works: the $A_n$ are nested and decreasing, so $\limsup_n A_n = \bigcap_n A_n$, which can have probability 0 because the sets shrink to $\emptyset$: here $\bigcap_n (0, 1/n) = \emptyset$, so $\mathbb{P}(\limsup_n A_n) = 0$.
(b) Kochen-Stone lemma
Kochen-Stone (1964): if $\sum_n \mathbb{P}(A_n) = \infty$ and $\limsup_{n \to \infty} \frac{\big(\sum_{k=1}^{n} \mathbb{P}(A_k)\big)^2}{\sum_{i,k=1}^{n} \mathbb{P}(A_i \cap A_k)} = c > 0$, then $\mathbb{P}(\limsup_n A_n) \geq c$. The extra condition controls the level of correlation between the events. The proof uses Cauchy-Schwarz applied to $\sum_{k=1}^{n} \mathbf{1}_{A_k}$.
ex-fsp-ch02-20
Challenge: A secret key generation protocol proceeds as follows. Alice and Bob each observe $n$ i.i.d. binary symbols: Alice observes $X_1, \dots, X_n$ with $\mathbb{P}(X_i = 1) = 1/2$, and Bob observes $Y_i = X_i \oplus Z_i$, where the $Z_i$ are i.i.d. with $\mathbb{P}(Z_i = 1) = \epsilon$ and independent of the $X_i$ (a BSC with crossover $\epsilon$).
(a) Using the chain rule and independence, compute $\mathbb{P}(X_1 = x_1, \dots, X_n = x_n)$. (b) Compute the probability of complete agreement, $\mathbb{P}(X_i = Y_i \text{ for all } i)$. (c) Using the binomial distribution, show that the number of positions where $X_i \neq Y_i$ concentrates near $n\epsilon$ for large $n$.
(a) Use the i.i.d. product factorization.
(b) The number of disagreements is $\sum_{i=1}^{n} \mathbf{1}\{X_i \neq Y_i\} = \sum_{i=1}^{n} Z_i$.
(a) Joint probability of sequences
By independence: $\mathbb{P}(X_1 = x_1, \dots, X_n = x_n) = \prod_{i=1}^{n} \mathbb{P}(X_i = x_i)$. Each factor equals $1/2$, so every sequence has probability $2^{-n}$.
(b) Probability of complete agreement
$\mathbb{P}(Y_i = X_i) = \mathbb{P}(Z_i = 0) = 1 - \epsilon$ for each $i$. By independence: $\mathbb{P}(Y_i = X_i \text{ for all } i) = (1 - \epsilon)^n$. For $\epsilon > 0$ this decays exponentially in $n$: complete agreement becomes vanishingly rare for long sequences.
(c) Concentration of disagreements
Let $D_i = \mathbf{1}\{X_i \neq Y_i\} = Z_i$. The $D_i$ are i.i.d. Bernoulli($\epsilon$), so $K = \sum_{i=1}^{n} D_i \sim \mathrm{Bin}(n, \epsilon)$ with mean $n\epsilon$ and variance $n\epsilon(1 - \epsilon)$. By Chebyshev (Chapter 10): $\mathbb{P}\big(\big|\tfrac{K}{n} - \epsilon\big| > \delta\big) \leq \frac{\epsilon(1 - \epsilon)}{n \delta^2} \to 0$. The disagreement fraction concentrates near $\epsilon$.
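The concentration in (c) can be checked exactly against the Chebyshev bound using the $\mathrm{Bin}(n, \epsilon)$ PMF; the values of `n`, `eps`, and `delta` below are illustrative:

```python
from math import comb

# Exact P(|K/n - eps| > delta) for K ~ Bin(n, eps) vs. the Chebyshev bound.
n, eps, delta = 500, 0.1, 0.05   # illustrative values
pmf = lambda k: comb(n, k) * eps ** k * (1 - eps) ** (n - k)

exact = sum(pmf(k) for k in range(n + 1) if abs(k / n - eps) > delta)
chebyshev = eps * (1 - eps) / (n * delta ** 2)
print(exact <= chebyshev)   # True: the bound holds (and is loose)
```

The exact tail probability is far smaller than the Chebyshev bound here, which is typical: Chebyshev guarantees the $O(1/n)$ decay used in the proof, while the true binomial tail decays exponentially.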