How fast? NVIDIA DGX Spark and DGX Station token/s speed prediction

NVIDIA announced DGX Spark and DGX Station at GTC 2025.

In this video I predict the speed, in token/s, that you can expect from these NVIDIA machines.

I also explain the calculation method I use, so you can apply it yourself to other models of your choice (memory bandwidth / model size = theoretical limit). To get from that limit to a predicted range, I make assumptions about real-world factors and real-world performance.
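As a minimal sketch of the rule of thumb above (memory bandwidth / model size = theoretical limit), the calculation could look like the following. The function names, the efficiency factors of 0.5 and 0.9, and the example figures (200 GB/s, 40 GB) are illustrative assumptions, not specifications of either machine and not necessarily the factors used in the video.

```python
def theoretical_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed.

    Assumes every generated token requires reading all model weights
    from memory once, so speed is limited by memory bandwidth.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


def predicted_range(bandwidth_gb_s: float, model_size_gb: float,
                    low_efficiency: float = 0.5,
                    high_efficiency: float = 0.9) -> tuple[float, float]:
    """Scale the theoretical limit by assumed real-world efficiency factors.

    The default factors are placeholders chosen for illustration only.
    """
    limit = theoretical_tokens_per_second(bandwidth_gb_s, model_size_gb)
    return limit * low_efficiency, limit * high_efficiency


# Illustrative numbers only: a hypothetical 200 GB/s machine and a 40 GB model.
limit = theoretical_tokens_per_second(200.0, 40.0)
low, high = predicted_range(200.0, 40.0)
print(f"theoretical limit: {limit:.1f} token/s")
print(f"predicted range:   {low:.1f} - {high:.1f} token/s")
```

To try another model, replace the two example figures with the machine's published memory bandwidth and the size of the quantized model file.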

For DGX Station, I discuss the GPU memory and the CPU memory, and how they need to be treated differently in this calculation.

As always, I'm curious about your thoughts on this.
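This description does not spell out how the two memory pools enter the calculation. One common way to model it, shown here purely as an assumption, is to add up the time needed to read each part of the model from its own pool, so the slower pool dominates. The function name and all figures below are hypothetical and are not DGX Station specifications.

```python
def split_tokens_per_second(gpu_part_gb: float, gpu_bandwidth_gb_s: float,
                            cpu_part_gb: float, cpu_bandwidth_gb_s: float) -> float:
    """Theoretical limit when model weights are split across two memory pools.

    Assumes each token reads every weight once and that the two reads
    happen one after the other, so the per-token times add up.
    """
    if gpu_bandwidth_gb_s <= 0 or cpu_bandwidth_gb_s <= 0:
        raise ValueError("bandwidths must be positive")
    if gpu_part_gb < 0 or cpu_part_gb < 0 or gpu_part_gb + cpu_part_gb == 0:
        raise ValueError("model parts must be non-negative and not both zero")
    seconds_per_token = (gpu_part_gb / gpu_bandwidth_gb_s
                         + cpu_part_gb / cpu_bandwidth_gb_s)
    return 1.0 / seconds_per_token


# Hypothetical figures: 100 GB in a 1000 GB/s pool, 50 GB in a 250 GB/s pool.
speed = split_tokens_per_second(100.0, 1000.0, 50.0, 250.0)
print(f"theoretical limit with split weights: {speed:.2f} token/s")
```

In this made-up example, the 50 GB in the slower pool takes twice as long to read as the 100 GB in the faster pool, which is why a model that spills out of GPU memory slows down sharply.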

Highlights DGX Spark:

DGX Spark will run the DeepSeek R1 Distill Qwen 32B Q8 at 3.5 - 6.7 token/s.

DGX Spark will run the 70B LLAMA (Q4) at about 1.3 token/s - 3.6 token/s.
That is probably a little slower than spoken text.

DGX Spark will run the 70B LLAMA (Q8) at about 1 token/s - 3 token/s.
(DeepSeek R1 Distill Llama 70B Q8 will achieve the same speed.)

Highlights DGX Station:

DGX Station will run DeepSeek R1 Distill Llama 70B Q8 at 32 token/s - 85 token/s

DGX Station will run the 70B LLAMA FP16 at about 17 token/s - 45 token/s

DGX Station will run DeepSeek R1 Q2_K_XS at about 11 token/s - 29 token/s.
This is the largest DeepSeek R1 quant that I predict will fit comfortably into the GPU RAM.

For bigger quants:

DeepSeek R1 Q4_K_M - I predict 1 token/s - 2.7 token/s on DGX Station, which is quite slow.

If you want better performance, or want to run a less quantized version of DeepSeek R1 on DGX Station, you will need several of them (maybe 3-4).

I predict that DGX Station will cost between $150k and $250k - stay tuned and subscribe to find out more in an upcoming video.

The data is based on this fantastic article:
https://www.hardware-corner.net/guide...
Thank you to Allan Witt of HardwareCorner.

Subscribe to my channel for more tips on AI for managers, entrepreneurs and business people, and for upcoming AI tools that will save you time and make you money.
