Policy Gradient Methods
Definition: REINFORCE Algorithm
REINFORCE Algorithm
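REINFORCE estimates the policy gradient from $N$ sampled trajectories:
$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t} \nabla_\theta \log \pi_\theta(a_t^i \mid s_t^i)\, G_t^i$,
where $G_t^i$ is the return of trajectory $i$ from time step $t$.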
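As a minimal sketch of how the return $G_t^i$ enters a REINFORCE update, the snippet below computes returns-to-go for each trajectory and forms the usual surrogate loss (negative mean of log-probability times return). The function names `returns_to_go` and `reinforce_surrogate` and the discount factor `gamma` are assumptions made for illustration; the text above does not specify them.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Return G_t = sum_{k >= t} gamma^(k - t) * r_k for every step t.

    gamma is an assumed discount factor; gamma = 1.0 gives undiscounted returns.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_surrogate(
    log_probs: Sequence[Sequence[float]],
    rewards: Sequence[Sequence[float]],
    gamma: float = 1.0,
) -> float:
    """Negative mean over trajectories of sum_t log pi(a_t | s_t) * G_t.

    In an autodiff framework, minimizing this quantity follows the
    REINFORCE gradient estimate; here it is only evaluated numerically.
    """
    total = 0.0
    for lp_traj, r_traj in zip(log_probs, rewards):
        g = returns_to_go(r_traj, gamma)
        total += sum(lp * g_t for lp, g_t in zip(lp_traj, g))
    return -total / len(log_probs)
```

With plain floats this only evaluates the loss. In practice the log-probabilities would come from a differentiable policy, so that the gradient of the loss flows back to its parameters.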
Definition: Proximal Policy Optimization (PPO)
Proximal Policy Optimization (PPO)
PPO clips the policy ratio to prevent large updates:
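$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]$,
where $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ is the policy ratio, $\hat{A}_t$ is an advantage estimate, and $\epsilon$ is the clip range.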
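The clipping step can be illustrated with a small sketch of the commonly stated PPO surrogate, the mean of min(r·A, clip(r, 1-ε, 1+ε)·A) over samples, where r is the ratio of new to old action probabilities and A is an advantage estimate. The function name `ppo_clip_objective`, its arguments, and the default `epsilon=0.2` are assumptions made for illustration, not values taken from the text.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,  # assumed clip range, chosen for illustration
) -> float:
    """Mean clipped surrogate over a batch of (state, action) samples."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Policy ratio r = pi_new(a | s) / pi_old(a | s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - epsilon, 1 + epsilon].
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Taking the minimum removes any gain from pushing the ratio
        # beyond the clip range in the direction the advantage favors.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For example, with a ratio of 1.5 and a positive advantage the objective is capped at `1.2 * adv`, so there is no incentive to increase the ratio further. With a negative advantage the unclipped term `1.5 * adv` is the smaller one and is kept.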