Hardware-aware Algorithms for Sequence Modeling - Tri Dao | Stanford MLSys #87


Episode 87 of the Stanford MLSys Seminar Series!

Hardware-aware Algorithms for Sequence Modeling
Speaker: Tri Dao

Abstract:
Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length.
In the first half, we describe attention approximation algorithms using sparsity and low-rank structures, as well as algorithms (e.g., FlashAttention) that achieve fast and memory-efficient exact attention. By making attention algorithms IO-aware (accounting for reads and writes between levels of GPU memory), one can speed up attention by 4-8x, enabling 4-16x longer context in Transformers and yielding higher-quality models. We will also describe optimizations for long-context LLM inference, leading to 2-8x faster end-to-end inference time.
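
To make the IO-aware idea concrete, here is a minimal sketch of exact attention computed tile by tile with an online softmax, so the full sequence-length-squared score matrix is never materialized. This is an illustrative PyTorch loop, not the fused FlashAttention GPU kernel; the block size and function name are arbitrary choices for the example.

# Sketch of tiled exact attention with an online softmax (the idea behind
# IO-aware attention). FlashAttention does this inside a fused GPU kernel;
# here the tiles are iterated in plain Python for clarity.
import torch

def tiled_attention(q, k, v, block_size=64):
    """Exact softmax attention over (seq_len, d) tensors, processing the
    key/value sequence one block at a time and keeping running softmax
    statistics instead of the full (seq_len x seq_len) score matrix."""
    seq_len, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((seq_len, 1), float("-inf"))
    row_sum = torch.zeros(seq_len, 1)

    for start in range(0, seq_len, block_size):
        k_blk = k[start:start + block_size]           # load one K/V tile
        v_blk = v[start:start + block_size]
        scores = (q @ k_blk.T) * scale                # (seq_len, block)
        blk_max = scores.max(dim=-1, keepdim=True).values
        new_max = torch.maximum(row_max, blk_max)
        # rescale previously accumulated sums/outputs to the new running max
        correction = torch.exp(row_max - new_max)
        p = torch.exp(scores - new_max)
        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ v_blk
        row_max = new_max

    return out / row_sum

q, k, v = (torch.randn(256, 64) for _ in range(3))
ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
assert torch.allclose(tiled_attention(q, k, v), ref, atol=1e-4)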
In the second half, we describe recent progress on subquadratic-time architectures such as RNNs, gated convolution, and structured state space models (SSMs). We identify that a key weakness of such models is their inability to perform content-based reasoning, and propose a selection mechanism to address this shortcoming. Though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture (Mamba) without attention or even MLP blocks. Mamba matches or exceeds the performance of strong modern Transformers on language modeling.
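
The sketch below illustrates the selection idea: the SSM parameters B, C, and the step size dt become functions of the current input, so the model can no longer be evaluated as a fixed convolution and is instead computed as a scan (sequentially here; Mamba fuses it into a hardware-aware parallel-scan kernel). The single-channel setup and the names d_state, W_B, W_C, w_dt are assumptions for illustration, not the paper's exact parameterization.

# Minimal sketch of a selective state-space recurrence (the idea behind
# Mamba's selection mechanism). Shapes and names are illustrative only.
import torch
import torch.nn.functional as F

def selective_ssm(u, A, W_B, W_C, w_dt):
    """u: (seq_len,) one input channel.
    A: (d_state,) diagonal state matrix (negative entries for stability).
    W_B, W_C: (d_state,) projections making B and C input-dependent.
    w_dt: scalar projection making the step size input-dependent."""
    d_state = A.shape[0]
    h = torch.zeros(d_state)
    ys = []
    for x_t in u:
        dt = F.softplus(w_dt * x_t)       # selection: step size depends on input
        B_t = W_B * x_t                   # selection: B depends on input
        C_t = W_C * x_t                   # selection: C depends on input
        A_bar = torch.exp(dt * A)         # zero-order-hold discretization of A
        B_bar = dt * B_t                  # simplified (Euler) discretization of B
        h = A_bar * h + B_bar * x_t       # recurrent state update
        ys.append(torch.dot(C_t, h))      # readout
    return torch.stack(ys)

seq_len, d_state = 128, 16
u = torch.randn(seq_len)
A = -torch.rand(d_state)                  # negative real diagonal
y = selective_ssm(u, A, torch.randn(d_state), torch.randn(d_state), torch.randn(()))
print(y.shape)  # torch.Size([128])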

Bio:
Tri Dao is an incoming Assistant Professor at Princeton University and is currently chief scientist of Together AI. He completed his PhD in Computer Science at Stanford, co-advised by Christopher Ré and Stefano Ermon. He works at the intersection of machine learning and systems, and his research interests include sequence models with long-range memory and structured matrices for compact deep learning models. His work has received the ICML 2022 Outstanding paper runner-up award.

--

Stanford MLSys Seminar hosts: Avanika Narayan, Benjamin Spector, Michael Zhang

Twitter:
  @avanika15
  @bfspector
  @mzhangio

--

Check out our website for the schedule: http://mlsys.stanford.edu
Join our mailing list to get weekly updates: https://groups.google.com/forum/#!for...

#machinelearning #ai #artificialintelligence #systems #mlsys #computerscience #stanford
