Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained


Direct Preference Optimization (DPO) finetunes LLMs on human preference data without reinforcement learning. The DPO paper was one of the two Outstanding Main Track Runner-Up papers at NeurIPS 2023.
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring....

📜 Rafailov, Rafael, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. "Direct preference optimization: Your language model is secretly a reward model." arXiv preprint arXiv:2305.18290 (2023). https://arxiv.org/abs/2305.18290
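
For context, the core idea of the paper is that the RLHF objective can be optimized directly with a simple classification-style loss on preference pairs. Below is a minimal PyTorch-style sketch of that loss, assuming you already have the summed log-probabilities of the chosen and rejected responses under the policy being finetuned and under a frozen reference model (function and variable names are illustrative, not from the video or paper code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """All inputs: tensors of shape (batch,) with log pi(y|x) summed over response tokens."""
    # Log-ratios of policy vs. frozen reference model for each response
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin): pushes the policy to prefer the chosen response,
    # with beta controlling how far it may drift from the reference model
    losses = -F.logsigmoid(beta * (chosen_logratios - rejected_logratios))
    return losses.mean()
```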

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, @Mutual_Information, Kshitij

Outline:
00:00 DPO motivation
00:53 Finetuning with human feedback
01:39 RLHF explained
03:05 DPO explained
04:24 Why Reinforcement Learning in the first place?
05:58 Shortcomings
06:50 Results

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon:   / aicoffeebreak  
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🔗 Links:
AICoffeeBreakQuiz:    / aicoffeebreak  
Twitter:   / aicoffeebreak  
Reddit:   / aicoffeebreak  
YouTube:    / aicoffeebreak  

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Video editing: Nils Trost
Music 🎵 : Ice & Fire - King Canyon
