Value-Based Methods

Definition:

DQN Loss Function

$$L = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}\left[\left(r + \gamma \max_{a'} Q_{\bar{\theta}}(s', a') - Q_\theta(s, a)\right)^2\right]$$

where $\mathcal{D}$ is the replay buffer and $Q_{\bar{\theta}}$ is the target network.
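The expectation above is estimated over a minibatch sampled from the replay buffer. A minimal numpy sketch of the loss (assuming `q_online` and `q_target` are functions mapping a state to a vector of Q-values; names are illustrative, not from a specific library):

```python
import numpy as np

def dqn_loss(q_online, q_target, batch, gamma=0.99):
    """Mean squared TD error over a batch of transitions (illustrative sketch).

    q_online, q_target: state -> array of Q-values, one per action.
    batch: iterable of (s, a, r, s_next) transitions from the replay buffer.
    """
    errors = []
    for s, a, r, s_next in batch:
        # Bootstrap target uses the *target* network and a max over next actions
        y = r + gamma * np.max(q_target(s_next))
        errors.append((y - q_online(s)[a]) ** 2)
    return float(np.mean(errors))
```

In practice the target `y` is treated as a constant (no gradient flows through $Q_{\bar{\theta}}$), and the target network's weights are periodically copied from the online network.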

Definition:

Double DQN

Standard DQN tends to overestimate Q-values because the same (target) network both selects and evaluates the maximizing next action, so estimation noise is amplified by the max. Double DQN decouples action selection from evaluation:

$$y = r + \gamma Q_{\bar{\theta}}(s', \arg\max_{a'} Q_\theta(s', a'))$$
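The decoupling amounts to one line of difference from the standard target. A sketch under the same assumptions as before (`q_online`/`q_target` map a state to a Q-value vector; names are illustrative):

```python
import numpy as np

def double_dqn_target(r, s_next, q_online, q_target, gamma=0.99):
    """Double DQN bootstrap target (illustrative sketch).

    The online network selects the next action; the target
    network evaluates it, reducing the max-induced upward bias.
    """
    a_star = int(np.argmax(q_online(s_next)))   # selection: online network
    return r + gamma * q_target(s_next)[a_star]  # evaluation: target network
```

Compare with the standard DQN target, `r + gamma * np.max(q_target(s_next))`, where the target network performs both roles.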