References & Further Reading

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018
The definitive RL textbook. Covers everything from bandits to policy gradients.
V. Mnih et al., Human-Level Control Through Deep Reinforcement Learning, Nature, 2015
The DQN paper: deep RL playing Atari from pixels.
J. Schulman et al., Proximal Policy Optimization Algorithms, arXiv:1707.06347, 2017
PPO: a simple, stable policy-gradient method built on a clipped surrogate objective.
N. C. Luong et al., Applications of Deep Reinforcement Learning in Communications and Networking: A Survey, IEEE Communications Surveys & Tutorials, 2019
Survey of DRL for wireless: power control, scheduling, resource allocation.