LLM Training & Reinforcement Learning from Google Engineer | SFT + RLHF | PPO vs GRPO vs DPO

As an ordinary software engineer, I want to share the most typical LLM training process today (Pre-Training + SFT + RLHF), along with the basics of Reinforcement Learning and some of the most popular policy optimization algorithms (PPO, GRPO, and DPO).

This video is also a prequel to the DeepSeek R1 intro, so stay tuned!

Related content:
What is AI Alignment: What is AI Alignment | Why is Alignme...
Advanced Prompting: LLM Prompt Intro | Advanced Prompting...

#llm
#openai
#google
#ai
#reinforcementlearning
#machinelearning

0:00 Intro
0:25 Modern LLM Training Flow
1:00 Pre-Training
1:47 Post-Training
4:46 SFT
6:09 Reinforcement Learning
10:31 Policy Gradient
12:08 PPO
15:30 GRPO
16:36 DPO
20:01 Post-Training Example Flow
