RoPE (Rotary positional embeddings) explained: The positional workhorse of modern LLMs


Unlike sinusoidal embeddings, RoPE is well behaved and more resilient when predictions exceed the training sequence length. Modern LLMs have already steered away from sinusoidal embeddings toward better alternatives like RoPE. Stay with me in the video to learn what's wrong with sinusoidal embeddings, the intuition behind RoPE, and how RoPE works.
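For reference, here is a minimal NumPy sketch (not from the video; names are illustrative) of the core idea: each consecutive pair of query/key features is rotated by an angle proportional to the token position, so the query-key dot product depends only on the relative distance between tokens.

```python
# A minimal RoPE sketch, assuming an even head dimension and the base of 10000 from the paper.
import numpy as np

def rope(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive pairs of features of x by position-dependent angles."""
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE pairs up dimensions, so the head size must be even"
    # One frequency per 2D pair, decreasing geometrically as in sinusoidal embeddings.
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = position * freqs                      # shape (d/2,)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]            # split into 2D pairs
    # 2D rotation of each pair: (x1, x2) -> (x1*cos - x2*sin, x1*sin + x2*cos)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Because each pair is rotated by position * freq, the score between a rotated
# query at position m and a rotated key at position n depends only on (m - n).
q = rope(np.random.randn(64), position=5)
k = rope(np.random.randn(64), position=2)
score = q @ k   # relative-position-aware attention score
```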

Original Transformer paper: https://arxiv.org/pdf/1706.03762.pdf
RoPE paper: https://arxiv.org/pdf/2104.09864.pdf
Using interpolation for RoPE: https://arxiv.org/pdf/2306.15595.pdf

0:00 - Introduction
1:06 - Attention computation
1:51 - Token and positional similarity
2:52 - Vector view of query and key
4:52 - Sinusoidal embeddings
5:53 - Problem with sinusoidal embeddings
6:34 - Conversational view
8:50 - RoPE embeddings
10:20 - RoPE beyond 2D
12:36 - Changes to the equations
13:00 - Conclusion
