How to Make Machine Learning Models Run Faster in Production

Speed and efficiency are the name of the game when it comes to production ML, but it can be difficult to optimize model performance for different environments. In this talk, we dive into techniques you can use to make your ML models run faster on any type of infrastructure. We cover batch processing and streaming, unique considerations with running models on edge devices, and practical tips for selecting the right hardware and software configurations that make ML workloads successful. We will also explore the use of model compilation and other optimization techniques to improve performance and discuss the trade-offs involved in different design choices.
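To make the batching idea concrete: running one request at a time does a matrix–vector product per input, while batching turns the same work into a single matrix–matrix product that hardware executes far more efficiently. A minimal sketch, using a random NumPy weight matrix as a stand-in for a real model:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64))       # hypothetical model weights: 64 features -> 128 outputs
inputs = rng.standard_normal((32, 64))   # 32 incoming requests

# One-at-a-time inference: one matrix-vector product per request.
single = np.stack([W @ x for x in inputs])

# Batched inference: one matrix-matrix product covering all 32 requests.
batched = inputs @ W.T

# Same results, but the batched form lets BLAS/GPU kernels exploit parallelism.
assert np.allclose(single, batched)
```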

#ml #ai #mlops #edgeai

00:53 Why is fast inference important for production ML?
02:51 4 techniques to make your ML models run faster
03:02 CPUs vs. GPUs
03:50 Batch Processing
04:58 Split pre/postprocessing & inference
06:17 Model optimization and compilation
07:40 Demo - testing the 4 techniques
08:40 Test on CPU - baseline
09:04 Comparing optimized mode using Intel's OpenVINO
09:48 Running in GPU mode
10:04 Running in batch mode on GPU
10:30 Running in split mode
11:06 Wrap up
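One way to picture the "split mode" from the chapters above: move pre/postprocessing into its own thread so CPU-side preparation of the next input overlaps with inference on the current one. A minimal sketch, where `preprocess` and `infer` are hypothetical stand-ins for the real pipeline stages:

```python
import queue
import threading

def preprocess(raw):
    # Hypothetical CPU-side step (decode, resize, normalize, ...).
    return raw * 2

def infer(x):
    # Stand-in for the model's forward pass (would run on the accelerator).
    return x + 1

SENTINEL = object()
pre_q = queue.Queue(maxsize=8)  # bounded queue applies back-pressure

def preprocessor(raw_items):
    # Runs in its own thread: prepares inputs while inference proceeds.
    for raw in raw_items:
        pre_q.put(preprocess(raw))
    pre_q.put(SENTINEL)

threading.Thread(target=preprocessor, args=(range(5),)).start()

results = []
while (x := pre_q.get()) is not SENTINEL:
    results.append(infer(x))

print(results)  # [1, 3, 5, 7, 9]
```

With a real model the gain comes from the inference device never idling while the CPU decodes or resizes the next input.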

For more resources, visit https://www.modzy.com/modzy-blog.
