The Art of Scaling Reinforcement Learning Compute for LLMs

Video description: The Art of Scaling Reinforcement Learning Compute for LLMs

Reinforcement learning (RL) is central to training advanced large language models (LLMs), but the field has lacked a systematic way to predict how performance improves as compute increases, so researchers have had to rely on costly large-scale experimentation. To address this gap, the authors conducted the first large-scale systematic study of RL scaling, using over 400,000 GPU-hours, and defined a principled framework from it. The framework models the relationship between training compute and performance (pass rate) with a *sigmoidal compute-performance curve* (Equation 1), which lets researchers extrapolate from smaller runs to larger compute budgets.

The key findings are that different training methods reach different performance ceilings (asymptotic performance, A), while specific design details, such as normalization or data curriculum, mainly modulate compute efficiency (B) without significantly shifting that ceiling. Based on these insights, the authors propose a best-practice recipe named **ScaleRL**, which combines components such as the asynchronous PipelineRL setup, FP32 precision for logits, and the CISPO loss function. In a single large-scale run of 100,000 GPU-hours, ScaleRL trained stably, and its measured validation performance closely matched the curve extrapolated by the scaling methodology. The authors present this as a step toward making RL training as predictable as LLM pre-training.

https://arxiv.org/pdf/2510.13786
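
To make the extrapolation idea concrete, here is a minimal sketch of fitting a sigmoidal compute-performance curve to small runs and reading off a prediction at a larger budget. The description above does not spell out Equation 1, so the functional form used here, `r0 + (A - r0) / (1 + (c_mid / compute) ** B)`, is an assumption and should be checked against the paper. The names `pass_rate`, `fit_curve`, `c_mid`, and `r0`, the grid-search fit, and all numbers are hypothetical and purely synthetic; they are not results from the study.

```python
import math


def pass_rate(compute, A, B, c_mid, r0):
    """Assumed sigmoidal curve (not copied from the paper).

    A     : asymptotic pass rate (the ceiling)
    B     : exponent controlling compute efficiency
    c_mid : compute at which half of the total gain (A - r0) is reached
    r0    : pass rate before RL training starts
    """
    return r0 + (A - r0) / (1.0 + (c_mid / compute) ** B)


def fit_curve(points, r0, A_grid, B_grid, c_mid_grid):
    """Brute-force least-squares fit over small candidate grids.

    A real fit would likely use a nonlinear optimizer; a grid search
    keeps this sketch dependency-free.
    """
    best, best_err = None, math.inf
    for A in A_grid:
        for B in B_grid:
            for c_mid in c_mid_grid:
                err = sum(
                    (pass_rate(c, A, B, c_mid, r0) - r) ** 2 for c, r in points
                )
                if err < best_err:
                    best, best_err = (A, B, c_mid), err
    return best


# Synthetic "small runs": (GPU-hours, pass rate) pairs generated from
# made-up parameters, standing in for measured validation results.
true_params = dict(A=0.60, B=1.5, c_mid=2000.0, r0=0.10)
small_runs = [(c, pass_rate(c, **true_params)) for c in (250, 500, 1000, 2000, 4000)]

A_hat, B_hat, c_mid_hat = fit_curve(
    small_runs,
    r0=0.10,
    A_grid=[0.50, 0.55, 0.60, 0.65, 0.70],
    B_grid=[1.0, 1.5, 2.0],
    c_mid_grid=[1000.0, 2000.0, 4000.0],
)

# Extrapolate the fitted curve to a much larger (hypothetical) budget.
predicted = pass_rate(100_000, A_hat, B_hat, c_mid_hat, r0=0.10)
print(f"A={A_hat}, B={B_hat}, c_mid={c_mid_hat}, predicted={predicted:.3f}")
```

Under this assumed form, two recipes with the same A but different B approach the same ceiling at different speeds, which mirrors the distinction the description draws between asymptotic performance and compute efficiency.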
