Exploring the Latency/Throughput & Cost Space for LLM Inference // Timothée Lacroix // CTO Mistral

Join us at our first in-person conference on June 25, all about AI Quality: https://www.aiqualityconference.com/

// Abstract
Getting the right LLM inference stack means choosing the right model for your task and running it on the right hardware, with proper inference code. This talk goes through popular inference stacks and setups, detailing what makes inference costly. We'll talk about the current generation of open-source models and how to make the best use of them, and we'll also touch on features currently missing from the open-source serving stack, as well as what future generations of models will unlock.
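// A back-of-envelope example
To make the latency/throughput/cost trade-off concrete, here is a rough sketch of decode-time economics in the memory-bandwidth-bound regime. All numbers (a 7B-parameter model in fp16, ~2 TB/s of GPU memory bandwidth, a $4/hour GPU) are illustrative assumptions for the sketch, not figures from the talk.

```python
# Back-of-envelope latency/throughput/cost estimate for LLM decoding.
# All constants below are illustrative assumptions, not measurements.

MODEL_PARAMS = 7e9          # assumed 7B-parameter model
BYTES_PER_PARAM = 2         # fp16 weights
MEM_BANDWIDTH = 2.0e12      # assumed ~2 TB/s GPU memory bandwidth
GPU_PRICE_PER_HOUR = 4.0    # assumed cloud price, USD

weight_bytes = MODEL_PARAMS * BYTES_PER_PARAM

# In the memory-bound decode regime, each generation step must read
# all the weights from memory once, so per-step latency is roughly:
latency_per_step_s = weight_bytes / MEM_BANDWIDTH

for batch_size in (1, 16, 64):
    # Batching amortizes the weight reads: one step now yields
    # batch_size tokens (ignoring KV-cache traffic for simplicity).
    tokens_per_s = batch_size / latency_per_step_s
    cost_per_1m_tokens = GPU_PRICE_PER_HOUR / (tokens_per_s * 3600) * 1e6
    print(f"batch={batch_size:3d}  {tokens_per_s:9.0f} tok/s  "
          f"${cost_per_1m_tokens:.3f} / 1M tokens")
```

The sketch shows why batching matters: weight reads are shared across the batch, so throughput rises (and cost per token falls) nearly linearly with batch size, until compute or KV-cache memory becomes the bottleneck.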

// Bio
Timothée Lacroix, 31, is Chief Technical Officer, in charge of technical issues relating to product efficacy and research. He started as an engineer at Facebook AI Research in New York in 2015, where he completed his thesis between 2016 and 2019, in collaboration with École des Ponts, on tensor factorization for recommender systems. He continued his career at Meta until 2023, when he co-founded @Mistral-AI.

// Sign up for our Newsletter to never miss an event:
https://mlops.community/join/

// Watch all the conference videos here:
https://home.mlops.community/home/col...

// Check out the MLOps Community podcast: https://open.spotify.com/show/7wZygk3...

// Read our blog:
https://mlops.community/blog

// Join an in-person local meetup near you:
https://mlops.community/meetups/

// MLOps Swag/Merch:
https://mlops-community.myshopify.com/

// Follow us on Twitter:
https://twitter.com/mlopscommunity

// Follow us on LinkedIn:
https://www.linkedin.com/company/mlopscommunity
