Naoshige Uchida - Dissociating reward prediction error and value in dopamine signals - ViDA 2020

Naoshige Uchida
Harvard University
05-20-2020
Dissociating reward prediction error and value in dopamine signals
Previous studies have revealed an exceptional correspondence between the activity of midbrain dopamine neurons and a ‘teaching signal’ in reinforcement learning algorithms. In particular, the reward prediction error (RPE) used in the temporal difference (TD) learning algorithm captures aspects of phasic dopamine responses. However, this idea has been challenged by recent observations that dopamine signals ramp up gradually over the timescale of seconds as animals approach a reward location. It has been argued that these slow fluctuations of dopamine are inconsistent with the RPE model, and instead represent the state value, which gradually increases toward a reward location. Whether these slowly fluctuating dopamine signals represent value or RPE, and under what conditions a dopamine ramp occurs, remain elusive. As originally formulated, the TD RPE approximates the derivative of the value function. Based on this core property, we developed a set of experimental paradigms that dissociate RPE from value. We employed visual virtual reality in mice to manipulate the location of the animal and the speed of scene movement independent of the animal’s locomotion. We found that the manipulation of scene movement – teleport and speed manipulations – caused dopamine responses in the ventral striatum that were consistent with TD RPEs but inconsistent with state values. Furthermore, we found that a more abstract, non-navigational stimulus that indicates temporal proximity to reward is sufficient to cause a dopamine ramp. These results indicate that the RPE account of dopamine responses can be extended to slowly fluctuating dopamine signals in addition to phasic dopamine responses, and support the previously untested central tenet of TD RPEs that dopamine neurons signal RPEs through a derivative-like computation over value on a moment-by-moment basis.
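The derivative-like property mentioned in the abstract can be illustrated with a small numerical sketch. It is not the authors' analysis code. The track length, the convex value curve `value = positions ** 2`, the discount `gamma = 1.0`, the absence of reward en route, and the specific teleport and speed paths are all assumptions chosen for illustration. The TD error is computed as delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

```python
import numpy as np

def td_rpe(values, rewards, gamma=1.0):
    """TD error along a trajectory: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    values = np.asarray(values, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    return rewards + gamma * values[1:] - values[:-1]

# Hypothetical linear track with 11 positions; the reward site is at the end.
positions = np.linspace(0.0, 1.0, 11)

# Assumed convex state-value curve (illustrative only, not learned or fit to data).
value = positions ** 2

def rpe_along(path, gamma=1.0):
    """RPE for a sequence of visited position indices, with no reward en route."""
    visited_values = value[np.asarray(path)]
    rewards = np.zeros(len(path) - 1)
    return td_rpe(visited_values, rewards, gamma)

# Normal traversal: every position is visited in order.
normal_path = np.arange(11)
# Teleport: the scene jumps forward from index 3 to index 7.
teleport_path = np.array([0, 1, 2, 3, 7, 8, 9, 10])
# Speed manipulation: the scene moves twice as fast (every second position).
fast_path = np.arange(0, 11, 2)

rpe_normal = rpe_along(normal_path)
rpe_teleport = rpe_along(teleport_path)
rpe_fast = rpe_along(fast_path)

print("normal RPE:  ", np.round(rpe_normal, 2))
print("teleport RPE:", np.round(rpe_teleport, 2))
print("fast RPE:    ", np.round(rpe_fast, 2))
print("value along teleport path:", np.round(value[teleport_path], 2))
```

Under these assumptions, the RPE ramps up during normal traversal because the value curve is convex. At the teleport step the RPE rises transiently and then falls back to the level expected at the new position, whereas the value itself steps up and stays elevated. Faster scene movement scales the RPE up at every step, which a pure value signal would not do at a given position.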
