Direct Preference Optimization (DPO)

Get the Dataset: https://huggingface.co/datasets/Treli...
Get the DPO Script + Dataset: https://buy.stripe.com/cN2cNyg8t0zp2g...
Get the full Advanced Fine Tuning Repo: https://trelis.com/advanced-fine-tuni...

Resources:
- Google Slides Presentation: https://tinyurl.com/mtd2ehnp
- Anthropic Helpful and Harmless Dataset: https://huggingface.co/datasets/Anthr...
- Ultrachat dataset: https://huggingface.co/datasets/Huggi...
- DPO Trainer: https://huggingface.co/docs/trl/dpo_t... (a minimal usage sketch follows this list)
- Runpod Affiliate link (helps support the channel): https://runpod.io?ref=jmfkcdio
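
The DPO Trainer linked above is Hugging Face TRL's implementation. As a rough sketch of what a minimal DPO run looks like in Python (the model name, dataset name, and hyperparameters below are placeholder assumptions, not necessarily what the video uses, and exact TRL keyword names vary between library versions):

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumption: start from a supervised fine-tuned (SFT) checkpoint, as the video recommends.
model_name = "your-sft-model"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO expects preference pairs: each row has "prompt", "chosen", and "rejected" columns.
# Placeholder dataset; the video works with UltraChat-derived and Helpful/Harmless preference data.
dataset = load_dataset("your-preference-dataset", split="train")

config = DPOConfig(
    output_dir="dpo-output",
    beta=0.1,  # weight of the implicit KL penalty against the reference model
    per_device_train_batch_size=2,
    num_train_epochs=1,  # the video's Runpod run also trains for one epoch
)

trainer = DPOTrainer(
    model=model,  # TRL builds a frozen reference copy when no ref_model is passed
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()

Because the frozen reference copy is taken from the starting checkpoint, beginning from an SFT model matters: the KL term anchors the DPO policy to that starting behavior.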

Chapters:
0:00 Direct Preference Optimization
0:37 Video Overview
1:37 How does “normal” fine-tuning work?
3:41 How does DPO work? (objective sketched below the chapter list)
8:31 DPO Datasets: UltraChat
10:59 DPO Datasets: Helpful and Harmless
14:00 DPO vs RLHF
15:25 Required datasets and SFT models
18:26 DPO Notebook Run through
28:22 DPO Evaluation Results
31:15 Weights and Biases Results Interpretation
35:16 Runpod Setup for 1 epoch Training Run
41:58 Resources
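
For reference alongside the "How does DPO work?" chapter, the DPO objective from Rafailov et al. (2023) is, in LaTeX (y_w is the chosen completion, y_l the rejected one, and \pi_{\text{ref}} the frozen SFT reference model):

\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]

Here \beta controls how far the trained policy may drift from the reference model: it is the same beta hyperparameter passed to the trainer in the sketch above.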
