GPU optimization workshop with OpenAI, NVIDIA, PyTorch, and Voltron Data

00:30 Workshop overview by @ChipHuyen
03:51 Crash course to GPU optimization (Mark Saroufim, Meta)
39:18 High-performance LLM serving on NVIDIA GPUs (Sharan Chetlur, NVIDIA)
1:19:18 Block-based GPU Programming with Triton (Philippe Tillet, OpenAI)
1:59:00 Scaling data processing from CPU to distributed GPUs (William Malpica, Voltron Data)

Join the discussion on Discord: / discord
Shared note (during the event): https://docs.google.com/document/d/1T...
GitHub repo with schedule: https://github.com/mlops-discord/gpu-...
For more events hosted by Chip in the future: https://lu.ma/chiphuyen

Philippe Tillet leads the Triton team at OpenAI. He previously worked at nearly every major chip maker, including NVIDIA, AMD, Intel, and Nervana.

Sharan Chetlur is a Principal Engineer working on TensorRT-LLM at NVIDIA. He has been working on CUDA since 2012, optimizing the performance of deep learning models from single GPUs to full data center scale. Previously, he was Director of Engineering on the Kernels team at Cerebras.

William Malpica is a co-founder of Voltron Data and the creator of BlazingSQL. He helped scale Voltron Data's GPU-native query engine to handle 100 TB queries!

Mark Saroufim is a PyTorch core developer and co-founder of CUDA MODE. He also ran last year's really fun NeurIPS LLM Efficiency Challenge. Previously, he was at Graphcore and Microsoft.
