Can We Get Asymptotic Safety Guarantees Based On Scalable Oversight?

  • Simons Institute for the Theory of Computing
  • 2025-04-15
  • 436

Tags: Simons Institute, theoretical computer science, UC Berkeley, Computer Science, Theory of Computing, foundations of computing, Safety-Guaranteed LLMs, Geoffrey Irving


Video description: Can We Get Asymptotic Safety Guarantees Based On Scalable Oversight?

Geoffrey Irving (UK AI Safety Institute)
https://simons.berkeley.edu/talks/geo...
Safety-Guaranteed LLMs

Scalable oversight attempts to align AI systems to human values by training AI models based on human feedback and using AI assistance to strengthen that human feedback signal. This talk will cover:

1. Recent theoretical work applying tools from computational complexity, multi-agent training dynamics, and learning theory to design improved scalable oversight methods which achieve theoretical guarantees given simplified assumptions about human feedback.
2. Prospects for extending such methods to weaker (and thus more realistic) assumptions about human feedback, and stronger requirements on solutions.
3. Prospects for integrating these developments into practical ML training.

For (1), we have a new "prover-predictor game" variant of debate which (in a theoretical setting with sufficiently strong assumptions) avoids the "obfuscated arguments" problem discovered during human-participant scalable oversight experiments in 2020. Previous versions of debate either assumed infinitely powerful agents or required computational complexity proportional to the length of a human-checkable argument. The new method allows ML systems to spend time related to the length of an ML-checkable argument, which can be much shorter if superhuman heuristics are involved.
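
To make the appeal of debate-style protocols concrete, here is a minimal toy sketch in Python of recursive debate over a claim decomposed into subclaims. This is not the prover-predictor game from the talk; the class and function names are illustrative assumptions. The point it illustrates is that the opponent challenges the step it believes is weakest, so the judge only ever verifies one small step rather than reading the full argument.

# Toy sketch of recursive debate over a claim tree (NOT the prover-predictor
# game from the talk; the structure and names here are illustrative only).
# A claim is either a leaf the judge can check directly, or a conjunction of
# subclaims supplied by the prover. The opponent challenges one subclaim, so
# the judge's work scales with the tree depth rather than the argument size.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    statement: str
    truth: bool                      # ground truth, hidden from the judge
    subclaims: List["Claim"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.subclaims


def judge_checks_leaf(claim: Claim) -> bool:
    # Stand-in for the (human or ML) judge verifying a single small step.
    return claim.truth


def debate(claim: Claim) -> bool:
    """Resolve a claim by checking only one leaf along a challenged path."""
    if claim.is_leaf():
        return judge_checks_leaf(claim)
    # The opponent challenges the subclaim it thinks is false; in this toy it
    # has access to ground truth, so it finds a false subclaim if one exists.
    false_subclaims = [c for c in claim.subclaims if not c.truth]
    challenged = false_subclaims[0] if false_subclaims else claim.subclaims[0]
    return debate(challenged)


if __name__ == "__main__":
    # An honest argument: every leaf holds, so any challenge fails.
    honest = Claim("A and B", True, [Claim("A", True), Claim("B", True)])
    # A dishonest argument hides one false step; the opponent locates it.
    dishonest = Claim("A and B'", False, [Claim("A", True), Claim("B'", False)])
    print(debate(honest))     # True
    print(debate(dishonest))  # False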

For (2), the talk will lay out some sources of optimism in the hopes of encouraging more work in this area. There are concrete theoretical limitations in the current methods which may be addressable using tools from theory. It is not clear that this work will succeed, but it is importantly orthogonal to much of the safety research occurring at AI labs today, and I believe there are strong prospects for bringing new ideas from other areas of theoretical computer science which have not yet been applied to AI safety.

For (3), the new method has the structure of a zero-sum, adversarial team game, and both theoretical and practical evidence shows that such games admit practical, convergent training methods. Importantly, while the asymptotic guarantees provided by this type of theory are weaker than full verification, they may also be more likely to translate into practice.
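
As a toy illustration of why the zero-sum structure helps, the sketch below runs plain regret matching in self-play on rock-paper-scissors; the game and algorithm are my own assumptions for illustration, not the training method from the talk. With both players using a no-regret learner, their time-averaged strategies converge to an approximate Nash equilibrium of the zero-sum game, which is the kind of convergence guarantee that makes adversarial-game training tractable in practice.

# Minimal self-play sketch: two regret-matching (no-regret) learners in a
# zero-sum matrix game. Their time-averaged strategies converge to an
# approximate Nash equilibrium. The game below is rock-paper-scissors,
# chosen only for illustration.

import numpy as np

# Payoff matrix for the row player; the column player receives the negation.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)


def regret_matching_strategy(regrets: np.ndarray) -> np.ndarray:
    """Play proportionally to positive cumulative regret (uniform if none)."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))


def self_play(iterations: int = 20000, seed: int = 0):
    rng = np.random.default_rng(seed)
    n = PAYOFF.shape[0]
    row_regret = np.zeros(n)
    col_regret = np.zeros(n)
    row_avg = np.zeros(n)
    col_avg = np.zeros(n)

    for _ in range(iterations):
        row_strat = regret_matching_strategy(row_regret)
        col_strat = regret_matching_strategy(col_regret)
        row_avg += row_strat
        col_avg += col_strat

        row_action = rng.choice(n, p=row_strat)
        col_action = rng.choice(n, p=col_strat)

        # Accumulate regret: how much better each pure action would have done
        # against the opponent's realized action.
        row_regret += PAYOFF[:, col_action] - PAYOFF[row_action, col_action]
        col_regret += -PAYOFF[row_action, :] - (-PAYOFF[row_action, col_action])

    return row_avg / iterations, col_avg / iterations


if __name__ == "__main__":
    row, col = self_play()
    print("row average strategy:", np.round(row, 3))  # approx [1/3, 1/3, 1/3]
    print("col average strategy:", np.round(col, 3))  # approx [1/3, 1/3, 1/3]

The guarantee here is asymptotic in the same spirit the talk describes: the averaged play approaches equilibrium over training rather than certifying any single model output.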
