GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Large language models (LLMs) demand substantial GPU memory during training: a 7-billion-parameter model requires roughly 58 GB, which makes training impractical on a single consumer GPU. The GaLore paper introduces a strategy that projects gradients into a low-rank space, shrinking the optimizer state enough for training to fit on a single GPU. Remarkably, this approach not only addresses the memory challenge but also outperforms parameter-efficient tuning methods such as LoRA.

paper link: https://arxiv.org/abs/2403.03507
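
To make the core idea concrete, here is a minimal PyTorch sketch of gradient low-rank projection: the gradient of a weight matrix is compressed through an SVD-derived projector, so an optimizer could keep its statistics in the small projected space and map the update back afterwards. The function names (`lowrank_project_grad`, `project_back`) and the toy shapes are illustrative assumptions, not the authors' implementation.

```python
import torch

def lowrank_project_grad(grad: torch.Tensor, rank: int):
    """Project a 2-D gradient onto a low-rank subspace via truncated SVD."""
    # U[:, :rank] spans the dominant subspace of the gradient.
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]      # m x rank projection matrix
    R = P.T @ grad       # rank x n compact gradient
    return R, P

def project_back(R: torch.Tensor, P: torch.Tensor):
    """Map the compact (rank x n) gradient back to the full m x n shape."""
    return P @ R

# Toy usage: an optimizer would keep its moment statistics in the small
# rank x n space (R) instead of the full m x n space, saving memory.
grad = torch.randn(4096, 4096)
R, P = lowrank_project_grad(grad, rank=128)
full_update = project_back(R, P)
print(R.shape, full_update.shape)  # [128, 4096], [4096, 4096]
```
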

Table of Contents:
00:00 Intro
02:17 LoRA
03:18 Limitations of LoRA
05:58 GaLore
18:18 Adam with GaLore
21:01 8-Bit Optimizers
22:50 LOMO
24:48 GaLore vs LoRA
26:20 Rank vs Perplexity
27:07 Results

Icon made by Freepik from flaticon.com
