Boost Fine-Tuning Performance of LLM: Optimal Architecture w/ PEFT LoRA Adapter-Tuning on Your GPU


Not enough memory to fine-tune your language model (T5, GPT, OPT, BLOOM, Llama, ...)? Optimize your model setup for adapter-tuning and fine-tune faster, cheaper, and with minimal memory on your GPU!

LLM Fine-Tuning on a Budget: Supercharge Your Language Model on a Normal GPU with PEFT, LoRA, and Adapter-Tuning

Real-time code demo with Hugging Face PEFT and LoRA, plus INT8 quantization of the LLM for adapter-tuning (possible since the main weight tensors of the PLM are frozen).
PLM = pre-trained language model.

A 30-minute code tutorial on Parameter-Efficient Fine-Tuning (PEFT) of your LLM on a consumer GPU (well under 80 GB of memory).

Low-rank adaptation (LoRA) for LLM adapter-tuning, applied to INT8-quantized models: the weights of the pre-trained language model (PLM) stay frozen while only a tiny set of layer-specific trainable adapter (LoRA) parameters is updated — a complete code tutorial in PyTorch 2.
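The LoRA idea above can be sketched in a few lines of NumPy (dimensions, rank, and scaling factor are illustrative, not the exact values from the video): the frozen weight W is left untouched, and a low-rank update B·A with only 2·r·d trainable parameters is added on top.

```python
import numpy as np

d, r = 1024, 8          # hidden size and LoRA rank (illustrative values)
alpha = 32              # LoRA scaling factor (illustrative)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen pre-trained weight (not updated)
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, initialized to zero

def lora_forward(x):
    # frozen path plus low-rank adapter path, scaled by alpha / r
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d))
y = lora_forward(x)

full_params = W.size           # what full fine-tuning would have to update
lora_params = A.size + B.size  # what adapter-tuning updates instead
print(y.shape, lora_params / full_params)
```

Because B starts at zero, the adapted model initially reproduces the frozen model exactly, and training only ever touches the small A and B matrices (here about 1.6% of the layer's parameters).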

The LLM was pretrained with a causal language modeling (CAUSAL_LM) objective and is now adapter-tuned for a specific downstream task on a public dataset available on Hugging Face.
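A minimal sketch of the PEFT API flow for this setup (model name and hyperparameters are illustrative, not necessarily those used on screen; requires `transformers`, `peft`, and `bitsandbytes` installed, plus a GPU):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

# Load the PLM in 8-bit; the base weights will stay frozen.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b", load_in_8bit=True, device_map="auto"
)

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # matches the CAUSAL_LM pretraining objective
    r=8,                           # LoRA rank (illustrative)
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
)

# Wrap the frozen model with trainable LoRA adapters.
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a small fraction is trainable
```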

PEFT is an adapter-tuning approach: unlike classical full fine-tuning (more expensive, more time-consuming), not all model weights are updated — only the adapter parameters are.

Your Jupyter notebook (all rights w/ the authors) to follow along:
https://colab.research.google.com/dri...

Your HuggingFace blog post to read (all rights w/ the authors):
https://huggingface.co/blog/peft

00:00 PEFT source code (LoRA, prefix tuning, ...)
01:53 Llama - LoRA fine-tuning code
04:39 Create PEFT - LoRA Model (Seq2Seq)
08:29 LoRA configuration
10:05 Trainable parameters of PEFT - LoRA model
13:09 get_peft_model
14:21 PEFT - LoRA - 8bit model of OPT 6.7B LLM
15:25 load_in_8bit
16:30 INT8 Quantization explained
18:08 Fine-tune a quantized model
22:56 bfloat16 and XLA compiler PyTorch 2.0
25:20 Freeze all pre-trained layer weight tensors
27:52 Adapter-tuning of PEFT - LoRA model
30:50 Save tuned PEFT - LoRA Adapter weights
31:30 Run inference of new PEFT - LoRA adapter - tuned LLM
32:57 Load your Adapter-tuned PEFT - LoRA model
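The INT8 quantization step covered at 16:30 can be sketched as absmax (symmetric) quantization: each weight tensor is rescaled so that its largest absolute value maps to 127, stored as int8, and multiplied back by the scale on use. A minimal NumPy illustration (not the bitsandbytes implementation itself):

```python
import numpy as np

def quantize_int8(w):
    # absmax quantization: scale so the largest |w| maps to 127
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # recover an approximation of the original float weights
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()
print(q.dtype, err)
```

Each weight now takes 1 byte instead of 4 (float32), a 4x memory saving, at the cost of a small rounding error bounded by half the quantization step.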



#ai
#PEFT
#LoRA
#datascience
#finetuning
#finetune
#machinelearning
#naturallanguageprocessing
