Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math


Explanation of the paper Mamba: Linear-Time Sequence Modeling with Selective State Spaces

In this video I will be explaining Mamba, a new sequence modeling architecture that can compete with the Transformer. I will first introduce the main sequence modeling architectures (RNN, CNN and Transformer) and then dive deep into State Space Models. To fully understand State Space Models, we need some background in differential equations, so I will provide a brief introduction to them (in 5 minutes!) and then derive the recurrent formula and the convolutional formula from first principles. I will also prove mathematically (with the help of visual diagrams) why State Space Models can be run as a convolution, and I will explain what the HiPPO matrix is and how it helps the model "memorize" the input history in a finite state.
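For reference, here are the core formulas derived in this part, written in LaTeX (standard S4/Mamba notation; the zero-order-hold discretization is the one stated in the Mamba paper):

Continuous SSM:       h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t) + D\,x(t)
                      (the D\,x(t) term is the skip connection)
Discretization (ZOH): \bar{A} = \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1}\left(\exp(\Delta A) - I\right)\Delta B
Recurrent form:       h_k = \bar{A}\,h_{k-1} + \bar{B}\,x_k, \qquad y_k = C\,h_k
Convolutional form:   \bar{K} = \left(C\bar{B},\; C\bar{A}\bar{B},\; \dots,\; C\bar{A}^{L-1}\bar{B}\right), \qquad y = x \ast \bar{K}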

In the second part of the video, I will explore Mamba, and in particular the Selective Scan algorithm: I will first explain what the scan operation is and how it can be parallelized, and then show how the authors further improved the algorithm with Kernel Fusion and activation recomputation. I will also give a brief lesson on the memory hierarchy of the GPU and why some operations may be IO-bound.
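As a taste of the parallel scan section, here is a minimal Python sketch (mine, not the authors' fused CUDA kernel) of the key idea: the linear recurrence h_k = a_k * h_{k-1} + b_k can be rewritten as an associative scan, which is what makes it parallelizable.

# One step of the recurrence h_k = a_k * h_{k-1} + b_k is the pair (a_k, b_k).
# Composing two consecutive steps yields another step of the same form:
#   h = a2 * (a1 * h0 + b1) + b2 = (a1 * a2) * h0 + (a2 * b1 + b2)
# Because this composition is associative, the whole scan can run in
# O(log L) parallel rounds instead of L sequential steps.
def combine(s1, s2):
    a1, b1 = s1
    a2, b2 = s2
    return (a1 * a2, a2 * b1 + b2)

def parallel_scan(steps):
    # Inclusive scan under `combine` (Hillis-Steele style): each round doubles
    # the reach; on a GPU every position would update in parallel.
    out = list(steps)
    shift = 1
    while shift < len(out):
        out = [combine(out[i - shift], out[i]) if i >= shift else out[i]
               for i in range(len(out))]
        shift *= 2
    return out

# Example with h_0 = 0: each scanned pair (A, B) represents h_k = A*h_0 + B.
a = [0.5, 0.5, 0.5, 0.5]   # decay factors (playing the role of A-bar)
b = [1.0, 1.0, 1.0, 1.0]   # inputs        (playing the role of B-bar * x_k)
print([B for _, B in parallel_scan(list(zip(a, b)))])
# [1.0, 1.5, 1.75, 1.875] -- identical to evaluating the recurrence sequentially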

In the last part of the video, we will explore the architecture of Mamba and some performance results that compare it with the Transformer.
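For readers who want a concrete picture of the block discussed in this part, here is a simplified PyTorch sketch of one Mamba block (my paraphrase of the paper's architecture figure; the layer sizes and the sequential selective_ssm stub are illustrative, not the official implementation, which also adds a D skip term, uses a per-channel Δ, and runs a hardware-aware fused scan):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MambaBlock(nn.Module):
    # Simplified single Mamba block (gated SSM branch), for illustration only.
    def __init__(self, d_model=64, d_state=16, expand=2, d_conv=4):
        super().__init__()
        d_inner = expand * d_model
        self.in_proj = nn.Linear(d_model, 2 * d_inner)           # x branch and gate z
        self.conv1d = nn.Conv1d(d_inner, d_inner, d_conv,
                                groups=d_inner, padding=d_conv - 1)  # depthwise, causal
        # Input-dependent SSM parameters (the "selective" part): delta, B, C
        self.x_proj = nn.Linear(d_inner, 1 + 2 * d_state)
        self.A_log = nn.Parameter(torch.zeros(d_inner, d_state))
        self.out_proj = nn.Linear(d_inner, d_model)

    def forward(self, u):                                        # u: (batch, len, d_model)
        x, z = self.in_proj(u).chunk(2, dim=-1)
        x = self.conv1d(x.transpose(1, 2))[..., : u.size(1)].transpose(1, 2)
        x = F.silu(x)
        y = self.selective_ssm(x)
        y = y * F.silu(z)                                        # gating branch
        return self.out_proj(y)

    def selective_ssm(self, x):
        batch, L, D = x.shape
        dbc = self.x_proj(x)
        delta = F.softplus(dbc[..., :1])                         # step size, (batch, L, 1)
        Bm, Cm = dbc[..., 1:].chunk(2, dim=-1)                   # (batch, L, N) each
        A = -torch.exp(self.A_log)                               # (D, N), kept negative
        h = x.new_zeros(batch, D, A.size(-1))
        ys = []
        for t in range(L):                                       # plain sequential scan
            dA = torch.exp(delta[:, t, :, None] * A)             # discretized A-bar
            dB = delta[:, t, :, None] * Bm[:, t, None, :]        # discretized B-bar
            h = dA * h + dB * x[:, t, :, None]
            ys.append((h * Cm[:, t, None, :]).sum(-1))
        return torch.stack(ys, dim=1)

A forward pass with torch.randn(2, 16, 64) returns a tensor of the same shape; the real implementation replaces the Python loop with the fused parallel scan sketched above.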

Slides PDF and Parallel Scan (Excel file): https://github.com/hkproj/mamba-notes

Chapters
00:00:00 - Introduction
00:01:46 - Sequence modeling
00:07:12 - Differential equations (basics)
00:11:38 - State Space Models
00:13:53 - Discretization
00:23:08 - Recurrent computation
00:26:32 - Convolutional computation
00:34:18 - Skip connection term
00:35:21 - Multidimensional SSM
00:37:44 - The HiPPO theory
00:43:30 - The motivation behind Mamba
00:46:56 - Selective Scan algorithm
00:51:34 - The Scan operation
00:54:24 - Parallel Scan
00:57:20 - Innovations in Selective Scan
00:58:00 - GPU Memory Hierarchy
01:01:23 - Kernel Fusion
01:01:48 - Activation recomputation
01:06:48 - Mamba architecture
01:10:18 - Performance considerations
01:12:54 - Conclusion
