Inference Time Scaling for Enterprises | No Math AI

Red Hat

Video description: Inference Time Scaling for Enterprises | No Math AI

In this episode of "No Math AI," Akash and Isha visit Red Hat Summit to talk with Red Hat CEO Matt Hicks and CTO Chris Wright about what it takes in practice to bring inference-time scaling (also called test-time scaling or test-time compute) to enterprise users worldwide.

Matt Hicks explores the role of an AI platform in abstracting complexity and absorbing costs as AI shifts from static models to dynamic, agentic applications. These applications rely heavily on inference-time scaling techniques, such as reasoning and particle filtering, which generate many more tokens per request in exchange for greater accuracy. Hicks emphasizes that platforms need to lower the unit price of these capabilities, make the techniques easy for enterprises to adopt, and provide cost transparency, so that teams can overcome the "fear response" triggered by unpredictable expenses as they run more inference.
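
To make the cost concern concrete, here is a minimal Python sketch of how token volume and spend might grow when a request is answered with several sampled candidates instead of one. Everything in it is an illustrative assumption rather than something stated in the episode: the `ScalingConfig` fields, the per-1,000-token prices, and the simplification that the prompt is billed once and shared across samples.

```python
from dataclasses import dataclass


@dataclass
class ScalingConfig:
    """Hypothetical knobs for one inference-time-scaled request."""
    num_samples: int               # candidates generated (e.g. best-of-N or particles)
    prompt_tokens: int             # tokens in the shared prompt
    output_tokens_per_sample: int  # tokens generated per candidate
    price_per_1k_input: float      # assumed price per 1,000 input tokens
    price_per_1k_output: float     # assumed price per 1,000 output tokens


def total_tokens(cfg: ScalingConfig) -> int:
    """Tokens processed for one request, counting the prompt once."""
    return cfg.prompt_tokens + cfg.num_samples * cfg.output_tokens_per_sample


def estimate_cost(cfg: ScalingConfig) -> float:
    """Estimated spend for one request under the assumed prices."""
    input_cost = cfg.prompt_tokens / 1000 * cfg.price_per_1k_input
    output_tokens = cfg.num_samples * cfg.output_tokens_per_sample
    output_cost = output_tokens / 1000 * cfg.price_per_1k_output
    return input_cost + output_cost


if __name__ == "__main__":
    # Compare a single-sample request with an 8-sample request
    # using made-up prices and token counts.
    for n in (1, 8):
        cfg = ScalingConfig(
            num_samples=n,
            prompt_tokens=1000,
            output_tokens_per_sample=1000,
            price_per_1k_input=1.0,
            price_per_1k_output=2.0,
        )
        print(n, total_tokens(cfg), estimate_cost(cfg))
```

Under these assumptions the output cost grows linearly with the number of samples, which is why a per-request estimate of this kind is one simple way a platform could surface spend before a team turns the sample count up.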

Chris Wright outlines the open-source AI roadmap for reliably deploying these new, inference-heavy technologies in production. He discusses the challenges of moving beyond single-instance inference to a distributed infrastructure capable of accommodating concurrent users and efficiently handling the massive token generation required by these scaled inference processes. Wright introduces LLM-d, a new Red Hat project focused on creating a standard for distributed inference platforms. LLM-d aims to optimize hardware utilization, manage distributed KV caches, and intelligently route requests based on hardware requirements, integrating with Kubernetes. The goal is to build repeatable blueprints for a common architecture to handle these inference-time-scaling workloads through collaborative open-source efforts.

Together, Hicks and Wright highlight that effectively scaling the underlying inference infrastructure from single-server instances to a robust, distributed, and transparent platform is a critical bottleneck. Addressing this bottleneck through community efforts is essential for the future of enterprise AI and the widespread adoption of inference time scaling.

RSS feed: https://feeds.simplecast.com/c1PFREqr
Spotify: https://open.spotify.com/show/7Cpcy42...

For more episodes of No Math AI, subscribe to @redhat
