Reinforcement Learning via Q-Learning: Learning the Values of the Best Actions

** Apologies for the low volume. Just turn it up **
This video introduces Temporal Difference learning, by which an agent can learn state/action utilities from scratch. It then covers the Q-learning algorithm specifically, showing the rule it uses to update Q values and demoing its behavior in a grid world.
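
For readers who want to see the update rule written out, here is a minimal sketch of the standard tabular Q-learning update. It is not the code from the Pac-Man project used in the video. The function name `q_update`, the learning rate `alpha`, the discount `gamma`, and the grid coordinates in the example are all assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """Apply one temporal-difference update to q[(state, action)] in place.

    alpha (learning rate) and gamma (discount) are illustrative defaults.
    """
    # Value of the best action available in the next state.
    # An empty action list is treated as a terminal state worth 0.0.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    # The sample is the observed reward plus the discounted best next value.
    sample = reward + gamma * best_next
    # Move the old estimate a fraction alpha toward the sample.
    q[(state, action)] += alpha * (sample - q[(state, action)])
    return q[(state, action)]


# Every Q value starts at 0.0, so the agent learns "from scratch".
q = defaultdict(float)

# Hypothetical transitions in a small grid world:
# moving east from (2, 2) reaches a terminal exit and pays a reward of 1.0.
q_update(q, (2, 2), "east", 1.0, "exit", [])
# Moving east from (1, 2) pays nothing, but lands next to the exit.
q_update(q, (1, 2), "east", 0.0, (2, 2), ["north", "east"])
```

After the two updates, the value learned at the exit has started to propagate one square back, which is the behavior the grid world demo illustrates over many episodes.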

The program used in this video is part of the Pac-Man projects at: http://ai.berkeley.edu/project_overvi...
The specific project from which this program comes is available at this link: http://ai.berkeley.edu/reinforcement....

The grid world problem is from Artificial Intelligence: A Modern Approach, by Russell and Norvig.
