Lecture 11 - Function Approximation Methods|Reinforcement Learning Phase|Reasoning LLMs from Scratch

So far, in the Reinforcement Learning Phase, we have looked at tabular methods for computing value functions. In these methods, each state and its value are stored as an entry in a table.

In most practical problems these methods do not scale, because the number of states is very large. For example, the number of states in a game of chess is roughly 10^46.

From this lecture onwards, we will look at function approximation methods, which are used to estimate the values of some states and then generalize to other states.

It is quite similar to supervised learning except:

(1) The Target is not known beforehand
(2) The Target is non-stationary
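
The two differences above can be made concrete with a pair of update rules. The notation here is an assumption borrowed from common textbook convention, not taken from the lecture: \hat{v}(s, \mathbf{w}) is the approximate value function with weights \mathbf{w}, \alpha is the step size, and \gamma is the discount factor. A supervised regression step uses a fixed label y, whereas a temporal-difference step builds its target from the current estimate itself:

```latex
% Supervised regression: the label y is given and does not change.
\mathbf{w} \leftarrow \mathbf{w} + \alpha \,\bigl[\, y - \hat{v}(s, \mathbf{w}) \,\bigr]\, \nabla_{\mathbf{w}} \hat{v}(s, \mathbf{w})

% TD(0) with function approximation: the target U_t is built from the
% current weights, so it is not known in advance and shifts as w changes.
U_t = R_{t+1} + \gamma \, \hat{v}(S_{t+1}, \mathbf{w}_t)
\qquad
\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha \,\bigl[\, U_t - \hat{v}(S_t, \mathbf{w}_t) \,\bigr]\, \nabla_{\mathbf{w}} \hat{v}(S_t, \mathbf{w}_t)
```

Because U_t depends on \mathbf{w}_t, every update to the weights also moves the target, which is one way to read "non-stationary" here.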

We will learn how to represent the value function with a parameterized function, and how to use gradient descent to optimize its parameters.

We are now getting closer to understanding how reinforcement learning is used in language models. This lecture marks the beginning of this transition.
