Accelerate Your GenAI Model Inference with Ray and Kubernetes - Richard Liu, Google Cloud

Generative AI has become increasingly prevalent in recent years and is reaching a critical point as models demonstrate human-level capabilities. However, serving these massive models has presented new technical challenges: they contain hundreds of billions of parameters and require massive computational resources. In this talk, we will discuss how to serve GenAI models using KubeRay on Kubernetes with hardware accelerators like GPUs and TPUs. Practitioners will learn how to get these large models into production on a performant and cost-effective Kubernetes platform. Ray is an open-source framework for distributed machine learning that enables ML practitioners to scale their workloads out to large clusters of machines. Ray Serve offers a scalable, framework-agnostic library for online inference that is suitable for large and complex models. The audience will learn how integrating Ray with accelerators can create a powerful platform for serving GenAI models.
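To make the KubeRay-on-Kubernetes approach concrete, below is a minimal sketch of what a RayService manifest with a GPU worker group might look like. This is an illustrative fragment, not material from the talk: the names (`genai-inference`, `text_generator`, `app:deployment`), the image tags, and the replica counts are all placeholder assumptions.

```yaml
# Illustrative KubeRay RayService sketch (all names and images are placeholders).
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: genai-inference          # hypothetical service name
spec:
  serveConfigV2: |
    applications:
      - name: text_generator     # hypothetical Ray Serve application
        import_path: app:deployment   # hypothetical module:variable path
  rayClusterConfig:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0
    workerGroupSpecs:
      - groupName: gpu-workers
        replicas: 1
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.9.0-gpu
                resources:
                  limits:
                    nvidia.com/gpu: 1   # schedule one GPU per worker pod
```

With a manifest along these lines, the KubeRay operator provisions the Ray cluster and deploys the Serve application onto it, and Kubernetes handles scheduling the GPU-backed worker pods.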
