Exercises
ex-sp-ch32-01
Easy: Create a Gymnasium environment and run 100 episodes with random actions. Plot cumulative rewards.
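A minimal starter sketch for the episode loop, assuming CartPole-v1 as the environment and matplotlib for the plot:

```python
# Sketch: 100 random-action episodes in Gymnasium (CartPole-v1 is an assumed choice).
import gymnasium as gym
import matplotlib.pyplot as plt

env = gym.make("CartPole-v1")
episode_returns = []

for episode in range(100):
    obs, info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()  # random policy
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    episode_returns.append(total_reward)

env.close()
plt.plot(episode_returns)
plt.xlabel("Episode")
plt.ylabel("Cumulative reward")
plt.show()
```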
ex-sp-ch32-02
Easy: Implement tabular Q-learning for a 4x4 gridworld. Visualise learned Q-values.
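A sketch of the core update loop, using FrozenLake-v1 as a stand-in 4x4 gridworld; the learning rate, discount, and episode count are placeholder choices:

```python
# Sketch: tabular Q-learning on a 4x4 grid (FrozenLake-v1 used as the example environment).
import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(5000):
    state, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection over the tabular Q-values
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Q-learning update: bootstrap from the greedy value of the next state
        target = reward + gamma * (0.0 if terminated else np.max(Q[next_state]))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print(Q.reshape(4, 4, -1).max(axis=-1))  # greedy state values laid out on the 4x4 grid
```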
ex-sp-ch32-03
Easy: Implement epsilon-greedy exploration with epsilon decaying from 1.0 to 0.01.
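One possible sketch, assuming an exponential decay schedule (the decay rate is an arbitrary example value):

```python
# Sketch: epsilon-greedy action selection with epsilon decaying from 1.0 to a floor of 0.01.
import numpy as np

def make_epsilon_schedule(eps_start=1.0, eps_end=0.01, decay_rate=0.995):
    def epsilon(step):
        return max(eps_end, eps_start * decay_rate ** step)
    return epsilon

def epsilon_greedy(q_values, epsilon, rng=None):
    rng = rng or np.random.default_rng()
    # With probability epsilon pick a random action, otherwise the greedy one.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

schedule = make_epsilon_schedule()
print([round(schedule(t), 3) for t in (0, 100, 500, 1000)])
```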
ex-sp-ch32-04
Easy: Implement a replay buffer with fixed capacity and uniform sampling.
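A minimal sketch backed by a deque, which handles the fixed-capacity eviction automatically:

```python
# Sketch: fixed-capacity replay buffer with uniform sampling.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)  # uniform, without replacement
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```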
ex-sp-ch32-05
Easy: Compute discounted returns for a trajectory with rewards [1, 0, -1, 10] and gamma=0.99.
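A sketch of the backward recursion G_t = r_t + gamma * G_{t+1}, applied to the rewards given in the exercise:

```python
# Sketch: discounted returns computed backwards over the trajectory.
def discounted_returns(rewards, gamma=0.99):
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

print(discounted_returns([1, 0, -1, 10], gamma=0.99))
# G_3 = 10, G_2 = -1 + 0.99*10 = 8.9, G_1 = 0.99*8.9 = 8.811, G_0 = 1 + 0.99*8.811 ≈ 9.723
```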
ex-sp-ch32-06
Medium: Implement DQN with experience replay and target network for CartPole-v1.
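A sketch of the two pieces specific to DQN, the target-network bootstrap and the TD loss, assuming PyTorch, batched tensors sampled from a replay buffer like the one above, and a placeholder MLP size:

```python
# Sketch: DQN target computation and loss (network sizes and hyperparameters are placeholders).
import torch
import torch.nn as nn

def make_q_network(obs_dim, n_actions):
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))

q_net = make_q_network(4, 2)       # CartPole-v1: 4 observations, 2 actions
target_net = make_q_network(4, 2)
target_net.load_state_dict(q_net.state_dict())
gamma = 0.99

def dqn_loss(states, actions, rewards, next_states, dones):
    # Q(s, a) from the online network for the actions actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # bootstrapped target uses the frozen target network
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * next_q * (1.0 - dones)
    return nn.functional.mse_loss(q_sa, target)

# Every N gradient steps, copy the online weights into the target network:
# target_net.load_state_dict(q_net.state_dict())
```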
ex-sp-ch32-07
Medium: Implement Double DQN and compare Q-value estimates to standard DQN.
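A sketch of the Double DQN target, which is the only part that differs from standard DQN; q_net and target_net are assumed to be defined as in the DQN sketch above:

```python
# Sketch: Double DQN target - the online net selects the action, the target net evaluates it.
import torch

def double_dqn_target(q_net, target_net, rewards, next_states, dones, gamma=0.99):
    with torch.no_grad():
        # online network chooses the argmax action...
        best_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        # ...but the target network evaluates it, reducing overestimation bias
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
    return rewards + gamma * next_q * (1.0 - dones)
```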
ex-sp-ch32-08
Medium: Implement REINFORCE for CartPole-v1. Plot the learning curve over 500 episodes.
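A sketch of the REINFORCE loss for one episode, assuming a PyTorch policy network and the discounted-returns helper from the earlier exercise; the return normalisation is an optional variance-reduction choice:

```python
# Sketch: REINFORCE (vanilla policy gradient) loss for a single episode.
import torch

def reinforce_loss(log_probs, returns):
    # log_probs: tensor of log pi(a_t | s_t) for the actions taken in the episode
    # returns:   tensor of discounted returns G_t, normalised here to reduce variance
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return -(log_probs * returns).sum()
```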
ex-sp-ch32-09
Medium: Add a learned baseline (value function) to REINFORCE and compare variance.
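One way to sketch the change: subtract a learned value estimate V(s_t) from the return and fit the value network by regression, assuming the same PyTorch setup as the REINFORCE sketch:

```python
# Sketch: REINFORCE with a learned baseline; the advantage G_t - V(s_t) weights the gradient.
import torch
import torch.nn as nn

def reinforce_with_baseline_losses(log_probs, returns, values):
    advantages = returns - values.detach()                 # baseline reduces variance, not bias
    policy_loss = -(log_probs * advantages).sum()
    value_loss = nn.functional.mse_loss(values, returns)   # fit V(s_t) to the observed G_t
    return policy_loss, value_loss
```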
ex-sp-ch32-10
Medium: Implement a custom Gymnasium environment for multi-user power control.
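A skeleton of the Gymnasium Env subclass; the channel model, reward (a sum-rate placeholder that ignores interference), and horizon are all assumptions to be replaced with the actual system model:

```python
# Sketch: custom Gymnasium environment skeleton for K-user power control (models are placeholders).
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class PowerControlEnv(gym.Env):
    def __init__(self, n_users=4, horizon=50):
        super().__init__()
        self.n_users, self.horizon = n_users, horizon
        # continuous transmit powers in [0, 1] for each user
        self.action_space = spaces.Box(low=0.0, high=1.0, shape=(n_users,), dtype=np.float32)
        # observation: current channel gains (placeholder definition)
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(n_users,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.gains = self.np_random.exponential(1.0, size=self.n_users).astype(np.float32)
        return self.gains, {}

    def step(self, action):
        self.t += 1
        # placeholder reward: sum rate without interference terms
        reward = float(np.sum(np.log2(1.0 + self.gains * action)))
        self.gains = self.np_random.exponential(1.0, size=self.n_users).astype(np.float32)
        terminated = False
        truncated = self.t >= self.horizon
        return self.gains, reward, terminated, truncated, {}
```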
ex-sp-ch32-11
Hard: Implement PPO with clipped surrogate objective and GAE advantage estimation.
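A sketch of the two building blocks named in the exercise, GAE advantages and the clipped surrogate loss; the clip epsilon and lambda values are common defaults, not prescriptions:

```python
# Sketch: GAE advantage estimation and the PPO clipped surrogate loss.
import torch

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    # values carries one extra bootstrap entry V(s_T); all inputs are 1-D tensors
    advantages = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] * (1.0 - dones[t]) - values[t]
        gae = delta + gamma * lam * (1.0 - dones[t]) * gae
        advantages[t] = gae
    return advantages

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```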
ex-sp-ch32-12
Hard: Train DQN for power control in a 4-user interference channel. Compare to a max-SINR heuristic.
ex-sp-ch32-13
Hard: Implement prioritised experience replay and compare to uniform replay.
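A sketch of proportional prioritisation with importance-sampling weights, using a plain array instead of a sum-tree for clarity (O(N) sampling, adequate for small buffers); alpha and beta are the usual exponents:

```python
# Sketch: proportional prioritised replay with importance-sampling weights.
import numpy as np

class PrioritisedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities, self.pos = [], np.zeros(capacity), 0

    def push(self, transition):
        max_p = self.priorities.max() if self.data else 1.0  # new samples get max priority
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:len(self.data)] ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        weights = (len(self.data) * p[idx]) ** (-beta)       # importance-sampling correction
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        self.priorities[idx] = np.abs(td_errors) + eps
```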
ex-sp-ch32-14
Challenge: Train a PPO agent for multi-user scheduling that optimises proportional fairness.
ex-sp-ch32-15
Challenge: Implement multi-agent RL for distributed power control where each base station is an independent agent.