Leaner, Greener and Faster Pytorch Inference with Quantization


Speaker:
Suraj Subramanian, Developer Advocate, PyTorch
Suraj is a developer advocate and ML engineer at Meta AI. In a previous life, he was a data engineer and data scientist in personal finance. After being bitten by the ML bug, he worked in healthcare research (predicting patient risk factors) and behavioral finance (preventing overly risky trading). Outside of work, you can find him hiking barefoot in the Catskills or being thrown on the Aikido mat.


Abstract:
Quantization refers to the practice of taking a neural network's painstakingly tuned FP32 parameters and rounding them to integers - without destroying accuracy, while actually making the model leaner, greener and faster. In this session, we'll learn more about this sorcery from first principles and see how it is implemented in PyTorch. We'll break down all of the available approaches to quantizing your model, their benefits and pitfalls, and most importantly how you can make an informed decision for your use case. Finally, we put our learnings to the test on a large non-academic model to see how this works in the real world.
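As a taste of what the session covers, here is a minimal sketch of one of PyTorch's quantization approaches, dynamic quantization, applied to a toy model. The model architecture and tensor sizes below are illustrative assumptions, not taken from the talk:

```python
import torch
import torch.nn as nn

# Toy FP32 model standing in for a trained network.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
model.eval()

# Dynamic quantization: Linear weights are converted to int8 ahead of
# time; activations are quantized on the fly at inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
with torch.no_grad():
    out = quantized(x)
```

Dynamic quantization is the lowest-effort entry point because it needs no calibration data or retraining; the talk also contrasts it with static post-training quantization and quantization-aware training.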
