Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - 693

Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multimodal foundation models, with a focus on state-space models in general and Albert's recent Mamba (https://arxiv.org/abs/2312.00752) and Mamba-2 (https://arxiv.org/abs/2405.21060) papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, as well as the strengths and weaknesses of transformer architectures relative to alternatives across various tasks. We also explore the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin a model's effectiveness, and how this relates to the debate between handcrafted pipelines and end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models that incorporate elements of both attention and state, the significance of state update mechanisms for model adaptability and learning efficiency, and the contributions and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications.
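For a concrete picture of the "state update mechanisms" discussed in the episode, here is a minimal, illustrative Python sketch of a selective state-space recurrence in the spirit of Mamba's selective SSM layer. It is a simplification for intuition only: the function name, parameter names, and shapes are our own assumptions, and the real implementation fuses this loop into a hardware-aware parallel scan rather than iterating step by step.

```python
# Minimal, illustrative sketch of a selective state-space (Mamba-style) recurrence.
# Simplified for intuition; not the paper's optimized implementation.
import numpy as np

def selective_ssm(x, A, W_B, W_C, W_dt, dt_bias):
    """
    x:       (T, D) input sequence
    A:       (D, N) diagonal state matrix per channel (typically negative)
    W_B:     (D, N) projection producing the input-dependent B_t
    W_C:     (D, N) projection producing the input-dependent C_t
    W_dt:    (D,)   projection producing the input-dependent step size delta_t
    dt_bias: (D,)   bias for the step size
    """
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))   # one N-dimensional hidden state per channel
    y = np.zeros((T, D))
    for t in range(T):
        xt = x[t]                                    # (D,)
        dt = np.logaddexp(0.0, xt * W_dt + dt_bias)  # softplus -> positive step size, (D,)
        Bt = xt @ W_B                                # input-dependent B_t, (N,)
        Ct = xt @ W_C                                # input-dependent C_t, (N,)
        A_bar = np.exp(dt[:, None] * A)              # discretized state transition, (D, N)
        h = A_bar * h + dt[:, None] * Bt[None, :] * xt[:, None]  # selective state update
        y[t] = h @ Ct                                # per-channel readout, (D,)
    return y

# Tiny usage example with random weights
rng = np.random.default_rng(0)
T, D, N = 16, 4, 8
y = selective_ssm(
    x=rng.standard_normal((T, D)),
    A=-np.exp(rng.standard_normal((D, N))),   # negative A keeps the state stable
    W_B=rng.standard_normal((D, N)) * 0.1,
    W_C=rng.standard_normal((D, N)) * 0.1,
    W_dt=rng.standard_normal(D) * 0.1,
    dt_bias=np.full(D, -1.0),
)
print(y.shape)  # (16, 4)
```

The point of the sketch is the "selective" part: A_bar, B_t, C_t, and the step size all depend on the current input, so the model can choose what to write into and read out of its fixed-size state at each step.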

🎧 / 🎥 Listen or watch the full episode on our page: https://twimlai.com/go/693.

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confi...


🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: @twimlai
Follow us on LinkedIn: twimlai
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/


📖 CHAPTERS
===============================
00:00 - Introduction
05:36 - Post-transformer approaches
07:46 - Attention
10:54 - Tokens
14:25 - Transformers
19:00 - Convolutions
22:04 - Recurrent models
24:36 - Mamba and state-space models
42:35 - Performance on multimodal data
46:24 - Handcrafted pipelines vs. end-to-end architectures
51:52 - Future directions


🔗 LINKS & RESOURCES
===============================
Mamba: Linear-Time Sequence Modeling with Selective State Spaces - https://arxiv.org/abs/2312.00752
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality - https://arxiv.org/abs/2405.21060
Efficiently Modeling Long Sequences with Structured State Spaces - https://arxiv.org/abs/2111.00396
Improving the Gating Mechanism of Recurrent Neural Networks - https://arxiv.org/abs/1910.09890
CKConv: Continuous Kernel Convolution For Sequential Data - https://arxiv.org/abs/2102.02611
On the Parameterization and Initialization of Diagonal State Space Models - https://arxiv.org/abs/2206.11893
Long Context Language Models and their Biological Applications with Eric Nguyen - 690 - https://twimlai.com/podcast/twimlai/l...
Language Modeling With State Space Models with Dan Fu - 630 - https://twimlai.com/podcast/twimlai/l...


📸 Camera: https://amzn.to/3TQ3zsg
🎙️Microphone: https://amzn.to/3t5zXeV
🚦Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5
