Monarch Mixer: Making Foundation Models More Efficient - Dan Fu | Stanford MLSys #86

Episode 86 of the Stanford MLSys Seminar Series!

Monarch Mixer: Making Foundation Models More Efficient
Speaker: Dan Fu

Abstract:
Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures like Transformers scale quadratically along both of these axes. In this talk, I'll discuss Monarch Mixer (M2), a new architecture that uses the same sub-quadratic primitive along both the sequence length and the model dimension. M2 mixes information along the sequence and model dimensions using Monarch matrices, a simple class of expressive structured matrices that captures many linear transforms, achieves high hardware efficiency on GPUs, and scales sub-quadratically.
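
For readers curious about the primitive the abstract mentions, below is a minimal PyTorch sketch of a Monarch-style matrix multiply: two block-diagonal factors interleaved with a fixed reshape-transpose permutation, costing O(n^1.5) instead of the O(n^2) of a dense matrix. The function name and the exact parameterization here are illustrative assumptions, not code from the paper.

import torch

def monarch_multiply(x, L_blocks, R_blocks):
    # Multiply a length-n vector x by a Monarch-style matrix:
    # two block-diagonal factors (L, R) interleaved with a fixed
    # reshape-transpose permutation. With n = b * b and b blocks of
    # size b x b per factor, the cost is O(n^1.5) multiply-adds.
    # (Illustrative sketch under stated assumptions, not the paper's code.)
    b = R_blocks.shape[0]                         # b = sqrt(n) blocks
    x = x.reshape(b, b)                           # view x as a b x b grid
    x = torch.einsum('ijk,ik->ij', R_blocks, x)   # block-diagonal R, block by block
    x = x.T.contiguous()                          # permutation: transpose the grid
    x = torch.einsum('ijk,ik->ij', L_blocks, x)   # block-diagonal L, block by block
    return x.T.reshape(-1)                        # undo the permutation, flatten

# Example: n = 16 -> b = 4 blocks of size 4 x 4 in each factor
n, b = 16, 4
x = torch.randn(n)
L_blocks = torch.randn(b, b, b)                   # b blocks, each b x b
R_blocks = torch.randn(b, b, b)
y = monarch_multiply(x, L_blocks, R_blocks)       # mixes all n coordinates sub-quadratically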

Bio:
Dan Fu is a PhD student in the Computer Science Department at Stanford University, where he is co-advised by Christopher Ré and Kayvon Fatahalian. His research is at the intersection of systems and machine learning and focuses on developing algorithms and architectures to make machine learning more efficient.

Monarch Mixer arXiv: https://arxiv.org/abs/2310.12109
FlashFFTConv arXiv: https://arxiv.org/abs/2311.05908

--

Stanford MLSys Seminar hosts: Simran Arora, Dan Fu

Twitter:
  @simran_s_arora
  @realdanfu

--

Check out our website for the schedule: http://mlsys.stanford.edu
Join our mailing list to get weekly updates: https://groups.google.com/forum/#!for...

#machinelearning #ai #artificialintelligence #systems #mlsys #computerscience #stanford
