LocalAI LLM Testing: Distributed Inference on a network? Llama 3.1 70B on Multi GPUs/Multiple Nodes


This week in the RoboTF lab:
A blown power supply
Saying goodbye to some of the 4060s
Most importantly, hitting the topic of distributed inference! It's a long video...

This week we are taking Llama 3.1 70B at a Q5 quant with 56k of context through the gauntlet, spread across several GPUs and nodes in a distributed swarm of llama.cpp workers! The whole lab is getting involved in this one to run a single model:
Both GPU Kubernetes nodes
3x 4060Ti 16GB
6x A4500 20GB
1x 3090 24GB
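To see why a single GPU can't host this run, here is a rough back-of-the-envelope VRAM estimate. The architecture numbers (80 layers, 8 KV heads via grouped-query attention, head dim 128) come from the public Llama 3.1 model card; the flat 5.0 bits/weight figure is an approximation (real Q5_K quants land closer to ~5.5 bits/weight), so treat the totals as a sketch, not a measurement:

```python
# Rough VRAM estimate for Llama 3.1 70B at a ~5-bit quant with 56k context.
# 5.0 bits/weight is an approximation; Q5_K_M is closer to ~5.5 bits/weight.
N_PARAMS = 70e9          # parameter count
BITS_PER_WEIGHT = 5.0    # approximate for a Q5 quant
N_LAYERS = 80            # Llama 3.1 70B transformer layers
N_KV_HEADS = 8           # grouped-query attention KV heads
HEAD_DIM = 128
KV_BYTES = 2             # fp16 KV cache entries
CONTEXT = 56_000         # tokens of context

weights_gb = N_PARAMS * BITS_PER_WEIGHT / 8 / 1e9
kv_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES  # K and V
kv_gb = kv_per_token * CONTEXT / 1e9

print(f"weights ~ {weights_gb:.1f} GB, KV cache ~ {kv_gb:.1f} GB, "
      f"total ~ {weights_gb + kv_gb:.1f} GB")
```

That lands around 60+ GB before any compute buffers, which is why the model has to be sharded across the lab's GPUs.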

LocalAI docs on distributed inference: https://localai.io/features/distribute/
Llama.cpp docs: https://github.com/ggerganov/llama.cp...
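For reference, the basic shape of the setup from those docs looks like this. The hostnames, ports, and model filename below are placeholders for your own nodes, not the lab's actual config:

```shell
# On each GPU worker node, start a llama.cpp RPC server
# (llama.cpp built with the RPC backend enabled):
rpc-server --host 0.0.0.0 --port 50052

# On the head node, point LocalAI's llama.cpp backend at the workers
# via the LLAMACPP_GRPC_SERVERS environment variable:
export LLAMACPP_GRPC_SERVERS="192.168.1.10:50052,192.168.1.11:50052"
local-ai run

# Or drive the workers directly with llama.cpp:
llama-cli -m llama-3.1-70b-q5.gguf -ngl 99 \
  --rpc 192.168.1.10:50052,192.168.1.11:50052 -p "Hello"
```

The model's layers get split across whatever workers are listed, so total VRAM is pooled across the network at the cost of inter-node latency.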

Link to blog on Llama 3.1 and memory requirements https://huggingface.co/blog/llama31

Just a fun day in the lab, grab your favorite relaxation method and join in.

Recorded and best viewed in 4K
