Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use - SWiRL

  • Srikanth Bhakthan
  • 2025-04-21
  • 29

Video description

arXiv: https://arxiv.org/pdf/2504.04736

Brief: Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use (SWiRL)

1. Key Findings

Significant Performance Improvement: The proposed Step-Wise Reinforcement Learning (SWiRL) method outperforms baseline approaches on complex multi-step reasoning and tool use tasks. The reported relative accuracy improvements are:

GSM8K (Math): 21.5%

HotPotQA (QA): 12.3%

CofCA (QA): 14.8%

MuSiQue (QA): 11.1%

BeerQA (QA): 15.3%
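
The figures above are relative gains, not absolute percentage-point gains. A minimal sketch of the difference follows; the baseline and improved accuracies are hypothetical values chosen only to illustrate the arithmetic, not numbers from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical accuracies (assumptions for illustration only).
baseline_acc = 0.40
improved_acc = 0.50

# Absolute gain is measured in percentage points.
absolute_gain_points = (improved_acc - baseline_acc) * 100.0

# Relative gain is measured against the baseline itself.
relative_gain = relative_improvement(baseline_acc, improved_acc)

print(f"absolute: {absolute_gain_points:.1f} points, relative: {relative_gain:.1f}%")
```

With these made-up inputs, a 10-point absolute gain corresponds to a relative gain of about 25%, which is why relative figures look larger when the baseline is low.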

Cross-Dataset Generalization: SWiRL demonstrates strong generalization across different datasets within the same task type (e.g., training on HotPotQA improves performance on MuSiQue, CofCA, BeerQA).

Cross-Task Generalization: The method also generalizes across disparate tasks, not only across datasets of the same task type.

Quote: "...training only on HotPotQA (text question-answering) improves zero-shot performance on GSM8K (a math dataset) by a relative 16.9%."

Conversely, training only on GSM8K (math) improves HotPotQA (QA) performance by 9.2%.

Effectiveness of Process Filtering: Models trained with SWiRL perform best when using synthetic data filtered step-wise for reasoning quality ("process filtering"), even if the final answer is incorrect. This contrasts with Supervised Fine-Tuning (SFT), which benefits more from outcome-filtered data (correct final answer).

Quote: "...achieve our best results by including process-filtered data, regardless of the correctness of the outcome."

Quote: "SWiRL, unlike SFT, can learn even from trajectories that end in incorrect final answers."
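
The contrast between process filtering and outcome filtering can be sketched as two predicates over the same trajectories. This is an illustrative sketch, not the paper's implementation: the `Step` and `Trajectory` classes, the `judged_sound` flag (standing in for whatever per-step judgment is used), and the rule that every step must pass are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    text: str
    judged_sound: bool  # hypothetical per-step verdict on reasoning quality


@dataclass
class Trajectory:
    steps: List[Step]
    final_answer: str
    gold_answer: str


def passes_process_filter(t: Trajectory) -> bool:
    """Keep a trajectory if every step was judged sound (assumed rule)."""
    return all(s.judged_sound for s in t.steps)


def passes_outcome_filter(t: Trajectory) -> bool:
    """Keep a trajectory if its final answer matches the gold answer."""
    return t.final_answer.strip() == t.gold_answer.strip()


trajectories = [
    # Sound steps, correct answer: kept by both filters.
    Trajectory([Step("look up fact", True), Step("combine", True)], "Paris", "Paris"),
    # Sound steps, wrong answer: kept only by the process filter.
    Trajectory([Step("look up fact", True), Step("combine", True)], "Lyon", "Paris"),
    # Unsound step, lucky answer: kept only by the outcome filter.
    Trajectory([Step("guess", False)], "Paris", "Paris"),
]

process_filtered = [t for t in trajectories if passes_process_filter(t)]
outcome_filtered = [t for t in trajectories if passes_outcome_filter(t)]
```

The second trajectory is the case the quotes describe: it ends in an incorrect answer but still survives process filtering.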

SWiRL Outperforms SFT: SWiRL generally leads to better performance and generalization compared to SFT on these tasks, particularly when using process-filtered data. SFT tends to memorize rather than generalize.

Improved Core Reasoning: SWiRL training enhances the model's intrinsic ability to break down problems, showing improvements even without tool access at inference time. The performance gains are linked to improved step-wise reasoning quality.

Quote: "...suggesting that the downstream performance gains are driven by improved multi-step reasoning."

Scalability: Performance scales positively with the size of the synthetic training dataset (significant gains observed moving from 100 to 1000 trajectories, and further gains up to 10,000). Larger models (Gemma-2-27b) exhibit better generalization compared to smaller models (2b, 9b).

2. Key Concepts

Step-Wise Reinforcement Learning (SWiRL): An offline, multi-step optimization technique combining synthetic data generation and step-wise RL.

Multi-Step Reasoning & Tool Use: Tasks requiring a sequence of actions (text generation, reasoning, calculations, information retrieval via tools like search engines or calculators) before reaching a final solution.

Synthetic Data Generation: Using an LLM (Gemma 2) augmented with tools to iteratively create multi-step problem-solving trajectories (sequences of thought, tool calls, and environment responses).
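
The iterative loop of thought, tool call, and environment response can be sketched as follows. This is a toy illustration under stated assumptions: `scripted_policy` is a hard-coded stand-in for the LLM, `calculator` is a minimal stand-in tool, and all class, function, and field names are hypothetical rather than taken from the paper.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    thought: str
    tool_query: Optional[str] = None     # None when the step gives the final answer
    tool_response: Optional[str] = None  # environment response to the tool call


@dataclass
class Trajectory:
    question: str
    steps: List[Step] = field(default_factory=list)
    final_answer: Optional[str] = None


def calculator(query: str) -> str:
    """Toy tool: evaluates 'a op b' where op is one of + - * /."""
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    left, op, right = query.split()
    return str(ops[op](float(left), float(right)))


def scripted_policy(question: str, steps: List[Step]) -> Tuple[str, Optional[str], Optional[str]]:
    """Stand-in for the LLM: returns (thought, tool_query, final_answer)."""
    if not steps:
        return ("I need the product first.", "6 * 7", None)
    return ("The tool returned the product.", None, steps[-1].tool_response)


def generate_trajectory(question: str,
                        policy: Callable[[str, List[Step]], Tuple[str, Optional[str], Optional[str]]],
                        tool: Callable[[str], str],
                        max_steps: int = 5) -> Trajectory:
    """Alternate between policy actions and tool responses until an answer is given."""
    traj = Trajectory(question)
    for _ in range(max_steps):
        thought, tool_query, final_answer = policy(question, traj.steps)
        if tool_query is None:
            # No tool call: the policy is committing to a final answer.
            traj.steps.append(Step(thought))
            traj.final_answer = final_answer
            break
        # Tool call: record the query together with the environment response.
        traj.steps.append(Step(thought, tool_query, tool(tool_query)))
    return traj


traj = generate_trajectory("What is 6 times 7?", scripted_policy, calculator)
```

Each recorded `Step` keeps the thought, the tool call, and the environment response together, which is the shape a step-wise filter or step-wise training signal would need to inspect.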

more: https://open.substack.com/pub/bhaktha...

Created with AI
