Llama 1-bit quantization - why NVIDIA should be scared


New research shows how the Llama model can be drastically shrunk without reducing output quality. Because the quantized model can take advantage of specialized hardware, it runs so much faster than before that Nvidia should be scared.

This video is based on this paper: https://arxiv.org/pdf/2402.17764.pdf
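
The paper (BitNet b1.58, arXiv:2402.17764) replaces full-precision weights with the ternary values {-1, 0, +1}, scaled by the mean absolute weight ("absmean" quantization), which is why the model shrinks so much and why matrix multiplies reduce to cheap additions. A minimal NumPy sketch of that idea; the function name, shapes, and per-tensor scaling are illustrative, not the paper's exact implementation:

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight tensor to ternary values {-1, 0, +1}.

    Sketch of the absmean scheme from arXiv:2402.17764: scale by the
    mean absolute weight, then round and clip to {-1, 0, +1}.
    """
    gamma = np.abs(w).mean()                  # per-tensor scale (assumption: one scale per tensor)
    w_scaled = w / (gamma + eps)              # normalize by the absmean
    w_ternary = np.clip(np.round(w_scaled), -1, 1)
    return w_ternary.astype(np.int8), gamma   # 2-bit-storable weights + fp scale

# Tiny demo with random weights (illustrative only):
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, gamma = absmean_ternary_quantize(w)
print(q)      # every entry is -1, 0, or 1
print(gamma)  # single full-precision scale factor
```

With weights restricted to {-1, 0, +1}, a matrix-vector product needs no weight multiplications at all, only additions, subtractions, and skips, which is the property that makes simple, multiplier-free accelerators competitive with GPUs for inference.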
