Proximal Policy Optimization Implementation: 8 Details for Continuous Actions (3/3)

Proximal Policy Optimization (PPO) is one of the most popular reinforcement learning algorithms and works across a variety of domains, from robotics control to Atari games to chip design.

In this video, we dive deep into 8 implementation details for continuous action spaces, building on the PPO implementation from our first video (Part 1 of 3 — Proximal Policy Optimiz...). A code sketch of the eight details follows the chapter list below.

---

Source code: https://github.com/vwxyzjn/ppo-implem...
Related blog post: https://iclr-blog-track.github.io/202...
Background music: Flutes Will Chill — https://artlist.io/song/48722/flutes-...
Homework solution: https://wandb.ai/cleanrl/cleanrl.benc...

---

0:00 Introduction
0:41 Setup
1:30 1. Continuous actions via normal distributions
2:46 2. State-independent log standard deviation
3:50 3. Independent action components
4:37 Note on MultiDiscrete action space
5:36 Match hyperparameters
6:14 Environment preprocessing
6:33 4. Action clipped to the valid range
7:02 5. Observation normalization
7:54 6. Observation clipping
8:10 7. Reward normalization
9:00 8. Reward clipping
9:29 Experiment results
10:49 Related work
11:10 Summary of code change
11:58 Homework
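
---

For reference, here is a minimal sketch of how details 1-3 might look in code. It is not the exact listing from the source code linked above; it assumes PyTorch, and the Agent class, layer sizes, and method names are illustrative:

import torch
import torch.nn as nn
from torch.distributions import Normal

class Agent(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.actor_mean = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim),
        )
        # 2. State-independent log standard deviation: a learned
        #    parameter rather than an output of the network.
        self.actor_logstd = nn.Parameter(torch.zeros(1, act_dim))

    def get_action(self, obs):  # obs: (batch, obs_dim)
        mean = self.actor_mean(obs)
        std = self.actor_logstd.expand_as(mean).exp()
        # 1. Continuous actions sampled from a normal distribution.
        dist = Normal(mean, std)
        action = dist.sample()
        # 3. Independent action components: the joint log-probability
        #    is the sum of the per-dimension log-probabilities.
        logprob = dist.log_prob(action).sum(1)
        return action, logprob

Details 4-8 are environment preprocessing and are typically applied as wrappers. Here is a sketch assuming a gym release that ships these wrappers (roughly 0.23 or later); the clipping bound of 10 is a common default, used here for illustration:

import gym
import numpy as np

def make_env(env_id, gamma):
    env = gym.make(env_id)
    # 4. Action clipped to the valid range before stepping the env.
    env = gym.wrappers.ClipAction(env)
    # 5. Observation normalization with a running mean and std.
    env = gym.wrappers.NormalizeObservation(env)
    # 6. Observation clipping.
    env = gym.wrappers.TransformObservation(env, lambda o: np.clip(o, -10, 10))
    # 7. Reward scaling by a running std of the discounted return.
    env = gym.wrappers.NormalizeReward(env, gamma=gamma)
    # 8. Reward clipping.
    env = gym.wrappers.TransformReward(env, lambda r: np.clip(r, -10, 10))
    return env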
