Why Do LLMs Have Context Limits? How Can We Increase the Context? ALiBi and Landmark Attention!


In this video, we discuss large language models and why they have context length limits. We begin by explaining why increasing the context length matters and why it is hard. We then review Big O notation with examples, and explain how the time and space complexity of the attention layers in large language models scale with sequence length. We also cover token embeddings and positional encoding, and show how positional encoding constrains the context length of large language models.
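To make the quadratic cost concrete, here is a minimal NumPy sketch (not code from the video): a single-head self-attention step whose score matrix has one entry per pair of tokens, plus the fixed sinusoidal positional encoding from the original Transformer paper. The function names and sizes are illustrative assumptions.

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head self-attention over x of shape (n_tokens, d_model)."""
    n, d = x.shape
    # For simplicity, x is used directly as queries, keys, and values
    # (a real layer would apply learned projections W_q, W_k, W_v).
    scores = x @ x.T / np.sqrt(d)                          # shape (n, n): quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over each row
    return weights @ x                                     # shape (n, d)

def sinusoidal_positional_encoding(n_tokens: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal positional encodings (assumes d_model is even)."""
    pos = np.arange(n_tokens)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    enc = np.zeros((n_tokens, d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

n, d = 1024, 64
tokens = np.random.randn(n, d)
out = self_attention(tokens + sinusoidal_positional_encoding(n, d))
print(out.shape)   # (1024, 64); the intermediate score matrix was 1024 x 1024
```

Doubling the number of tokens quadruples the size of that score matrix, which is exactly why context length is expensive to grow.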

We then introduce ALiBi, or Attention with Linear Biases, a solution to the positional encoding problem. We explore how ALiBi works and how it extends the usable token context length. We also walk through how computational and memory complexity can be reduced with sparse attention techniques such as Extended Transformer Construction (ETC) and BigBird, which bring the cost of attention down from quadratic to nearly linear in sequence length, allowing us to increase the context length.
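As an illustration of the ALiBi idea (a sketch, not the video's or the paper's code): each head adds a bias to the attention scores that grows linearly with how far back the key is, with per-head slopes from the geometric sequence the ALiBi paper uses when the number of heads is a power of two. The function name alibi_biases is made up for this example.

```python
import numpy as np

def alibi_biases(n_tokens: int, n_heads: int) -> np.ndarray:
    """Distance-proportional biases added to attention scores before the softmax."""
    # Head h (1-indexed) gets slope 2**(-8*h/n_heads); farther keys are penalized more,
    # so no learned positional embedding is needed and longer contexts extrapolate better.
    slopes = 2.0 ** (-8.0 * np.arange(1, n_heads + 1) / n_heads)   # (n_heads,)
    q_pos = np.arange(n_tokens)[:, None]        # query positions i
    k_pos = np.arange(n_tokens)[None, :]        # key positions j
    distance = q_pos - k_pos                    # i - j: how far back the key is
    bias = -slopes[:, None, None] * distance[None, :, :]
    bias[:, distance < 0] = -np.inf             # causal mask: no attending to future keys
    return bias   # shape (n_heads, n_tokens, n_tokens); add to scores, then softmax

print(alibi_biases(8, 4)[0])   # head 0: zeros on the diagonal, more negative further back
```

And a sketch of a sparse attention pattern in the spirit of ETC and BigBird: a local sliding window plus a few global tokens (BigBird's random connections are omitted), so the number of attended pairs grows roughly linearly with sequence length. Again, the function name and defaults are illustrative assumptions.

```python
import numpy as np

def sparse_attention_mask(n_tokens: int, window: int = 2, n_global: int = 1) -> np.ndarray:
    """Boolean mask, True where attention is allowed."""
    mask = np.zeros((n_tokens, n_tokens), dtype=bool)
    for i in range(n_tokens):
        lo, hi = max(0, i - window), min(n_tokens, i + window + 1)
        mask[i, lo:hi] = True                   # local sliding window around each token
    mask[:n_global, :] = True                   # global tokens attend to everything
    mask[:, :n_global] = True                   # everything attends to global tokens
    return mask

m = sparse_attention_mask(10)
print(m.sum(), "allowed pairs out of", m.size)  # far fewer than n*n as n grows
```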

We then introduce Landmark attention, a recent technique for increasing context length, and walk through a practical example of how it works, highlighting why it lets large language models handle longer contexts efficiently.
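Here is a toy sketch of the retrieval step only, using block means as a stand-in for the trained landmark tokens (an assumption for illustration, not the actual method's implementation): the context is split into fixed-size blocks, the query scores one landmark per block, and full attention runs only over the tokens of the top-k retrieved blocks.

```python
import numpy as np

def landmark_attention_sketch(query: np.ndarray, context: np.ndarray,
                              block_size: int = 64, top_k: int = 2) -> np.ndarray:
    """Toy illustration of block retrieval via landmarks (block means used as stand-ins)."""
    n, d = context.shape
    blocks = [context[i:i + block_size] for i in range(0, n, block_size)]
    landmarks = np.stack([b.mean(axis=0) for b in blocks])    # one summary vector per block

    block_scores = landmarks @ query                           # score each block's landmark
    chosen = np.argsort(block_scores)[-top_k:]                 # keep the top-k blocks
    keys = np.concatenate([blocks[i] for i in chosen])         # tokens from those blocks only

    scores = keys @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                   # softmax over retrieved tokens
    return weights @ keys                                      # attention output, shape (d,)

# A 4096-token context, but attention only ever touches top_k blocks of 64 tokens.
ctx = np.random.randn(4096, 32)
q = np.random.randn(32)
print(landmark_attention_sketch(q, ctx).shape)   # (32,)
```

The point of the sketch is the scaling: the expensive pairwise attention is confined to a handful of retrieved blocks, so the full context never has to fit in one quadratic attention window.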

0:00 Intro
0:20 Review of Complexity, Embeddings, Encodings, and Graphs
4:50 Why Context Limits Exist
8:14 ALiBi
10:33 Sparse Attention
13:22 Landmark Attention
19:30 Outro

#ContextLengthLimit #SparseAttention #ExtendedTransformerConstruction #BigBird #LandmarkAttention
