LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

A full explanation of the LLaMA 1 and LLaMA 2 models from Meta, including Rotary Positional Embeddings, RMS Normalization, Multi-Query Attention, the KV-Cache, Grouped Multi-Query Attention (GQA), the SwiGLU activation function, and more!
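To give a taste of one of the topics covered, here is a minimal PyTorch sketch of RMSNorm, which rescales each vector by the root mean square of its features with a learnable gain and no mean-centering or bias. The class name, eps default, and layout are my own illustrative choices, not code taken from the video or the linked repository.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization with a learnable per-feature gain."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain g

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # rms(x) = sqrt(mean(x_i^2) + eps), computed over the feature dimension
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)
```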

I also review the Transformer concepts needed to understand LLaMA, and everything is explained visually!

As always, the PDF slides are freely available on GitHub: https://github.com/hkproj/pytorch-lla...

Chapters
00:00:00 - Introduction
00:02:20 - Transformer vs LLaMA
00:05:20 - LLaMA 1
00:06:22 - LLaMA 2
00:06:59 - Input Embeddings
00:08:52 - Normalization & RMSNorm
00:24:31 - Rotary Positional Embeddings
00:37:19 - Review of Self-Attention
00:40:22 - KV Cache
00:54:00 - Grouped Multi-Query Attention
01:04:07 - SwiGLU Activation function
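For reference, a minimal sketch of the SwiGLU feed-forward block covered in the final chapter: a SiLU-gated projection multiplied elementwise by a second linear projection, then projected back to the model dimension. The class and parameter names are illustrative assumptions, not code from the linked slides.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Feed-forward block using the SwiGLU gate: (SiLU(x W1) * x W3) W2."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate the value projection with SiLU, then project back down.
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```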
