Multi-GPU Fine-tuning with DDP and FSDP


➡️ Get Lifetime Access to the complete scripts (and future improvements): https://trelis.com/advanced-fine-tuni...
➡️ Multi-GPU test scripts (so you can follow along): https://github.com/TrelisResearch/ins...
➡️ Runpod one-click fine-tuning template (affiliate link, supports Trelis' channel): https://runpod.io/console/deploy?temp...
➡️ Newsletter: https://blog.Trelis.com
➡️ Resources/Support/Discord: https://Trelis.com/About

VIDEO RESOURCES:
- Slides: https://docs.google.com/presentation/...
- Pytorch DDP: https://pytorch.org/tutorials/interme...
- Pytorch FSDP: https://pytorch.org/tutorials/interme...
- HuggingFace PEFT w/ FSDP: https://github.com/huggingface/peft/t...

TIMESTAMPS:
0:00 Multi-GPU Distributed Training
0:24 Video Overview
1:18 Choosing a GPU setup
1:59 Understanding VRAM requirements in detail (sketch below)
4:40 Understanding Optimisation and Gradient Descent
7:25 How does the Adam optimiser work? (sketch below)
11:16 How the Adam optimiser affects VRAM requirements
13:43 Effect of activations, model context and batch size on VRAM
14:40 Tip for GPU setup - start with a small batch size
15:35 Reducing VRAM with LoRA and quantisation (sketch below)
19:27 Quality trade-offs with quantisation and LoRA
20:36 Choosing between MP, DDP or FSDP
21:12 Distributed Data Parallel (sketch below)
24:40 Model Parallel and Fully Sharded Data Parallel (FSDP)
29:50 Trade-offs with DDP and FSDP
31:27 How does DeepSpeed compare to FSDP?
33:33 Using FSDP and DeepSpeed with Accelerate
36:59 Code examples for MP, DDP and FSDP
38:31 Using SSH with rented GPUs (Runpod)
42:35 Installation
44:11 (slight detour) Setting a username and email for GitHub
44:41 Basic Model Parallel (MP) fine-tuning script
48:28 Fine-tuning script with Distributed Data Parallel (DDP)
52:26 Fine-tuning script with Fully Sharded Data Parallel (FSDP)
55:12 Running ‘accelerate config’ for FSDP (example config below)
59:55 Saving a model after FSDP fine-tuning (sketch below)
1:00:58 Quick demo of a complete FSDP LoRA training script
1:05:15 Quick demo of an inference script after training (sketch below)
1:07:06 Wrap up
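
CODE SKETCHES:
The snippets below are minimal, hedged illustrations of topics from the chapters above. They are not the Trelis scripts; model IDs, paths and hyperparameters are placeholders.

For the VRAM chapters (1:59, 11:16): a rough back-of-envelope, assuming full fine-tuning with Adam in mixed precision (a common ~16 bytes/param accounting, before activations):

params = 7e9                         # e.g. a 7B-parameter model
bytes_per_param = 2 + 2 + 4 + 4 + 4  # bf16 weights + bf16 grads + fp32 master weights + Adam m + Adam v
print(f"{params * bytes_per_param / 1e9:.0f} GB before activations")  # ~112 GB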
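
For the Adam chapter (7:25): a scalar sketch of a single Adam update, showing the two moment estimates (m and v) that are the extra optimiser state held in VRAM:

# One Adam step for a single parameter (illustrative numbers, not from the video)
beta1, beta2, lr, eps, t = 0.9, 0.999, 1e-4, 1e-8, 1
m, v, grad, param = 0.0, 0.0, 0.5, 1.0
m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
v = beta2 * v + (1 - beta2) * grad**2     # running mean of squared gradients
m_hat = m / (1 - beta1**t)                # bias correction for early steps
v_hat = v / (1 - beta2**t)
param -= lr * m_hat / (v_hat**0.5 + eps)  # scaled, smoothed update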
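
For the LoRA and quantisation chapter (15:35): a sketch of loading a base model in 4-bit and attaching LoRA adapters with PEFT; the model ID and target modules are assumptions:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("your-base-model", quantization_config=bnb)

lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter matrices are trained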
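
For the DDP chapters (21:12, 48:28): a self-contained sketch of the core pattern, with a stand-in linear layer in place of a real model; launch one process per GPU with torchrun:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch: torchrun --nproc_per_node=<num_gpus> train_ddp.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(1024, 1024).to(local_rank), device_ids=[local_rank])
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024, device=local_rank)  # each rank would see a different data shard
loss = model(x).square().mean()
loss.backward()                              # DDP all-reduces gradients across GPUs here
opt.step()
dist.destroy_process_group()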
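
For ‘accelerate config’ with FSDP (55:12): an abridged example of the YAML the interactive prompts write out (exact keys vary by Accelerate version; this assumes one machine, 2 GPUs, bf16 and full sharding):

compute_environment: LOCAL_MACHINE
distributed_type: FSDP
mixed_precision: bf16
num_machines: 1
num_processes: 2
fsdp_config:
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_state_dict_type: FULL_STATE_DICT

The training script is then started with ‘accelerate launch train.py’.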
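
For saving after FSDP fine-tuning (59:55): a sketch of gathering the full, unsharded state dict onto rank 0 before writing it to disk, again with a stand-in layer in place of the fine-tuned model:

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import StateDictType, FullStateDictConfig

# Launch: torchrun --nproc_per_node=<num_gpus> save_fsdp.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = FSDP(torch.nn.Linear(1024, 1024).to(local_rank))  # stand-in for the trained model

cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
    state = model.state_dict()                 # gathers the shards from all ranks
if dist.get_rank() == 0:
    torch.save(state, "model_checkpoint.pt")   # only rank 0 writes to disk
dist.destroy_process_group()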
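
For the inference demo (1:05:15): a sketch of loading the base model plus a saved LoRA adapter and generating; the model ID and adapter path are placeholders:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("your-base-model", torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, "path/to/saved/adapter")
tok = AutoTokenizer.from_pretrained("your-base-model")

inputs = tok("Question: what does DDP do?\nAnswer:", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))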
