Logo video2dn
  • Сохранить видео с ютуба
  • Категории
    • Музыка
    • Кино и Анимация
    • Автомобили
    • Животные
    • Спорт
    • Путешествия
    • Игры
    • Люди и Блоги
    • Юмор
    • Развлечения
    • Новости и Политика
    • Howto и Стиль
    • Diy своими руками
    • Образование
    • Наука и Технологии
    • Некоммерческие Организации
  • О сайте

Скачать или смотреть AI Sleeper Agents: How Anthropic Trains and Catches Them

  • Rational Animations
  • 2025-08-30
  • 221612
AI Sleeper Agents: How Anthropic Trains and Catches Them
AnthropicAI SafetyAIAlignmentSleeper AgentsAI alignment
  • ok logo

Скачать AI Sleeper Agents: How Anthropic Trains and Catches Them бесплатно в качестве 4к (2к / 1080p)

У нас вы можете скачать бесплатно AI Sleeper Agents: How Anthropic Trains and Catches Them или посмотреть видео с ютуба в максимальном доступном качестве.

Для скачивания выберите вариант из формы ниже:

  • Информация по загрузке:

Cкачать музыку AI Sleeper Agents: How Anthropic Trains and Catches Them бесплатно в формате MP3:

Если иконки загрузки не отобразились, ПОЖАЛУЙСТА, НАЖМИТЕ ЗДЕСЬ или обновите страницу
Если у вас возникли трудности с загрузкой, пожалуйста, свяжитесь с нами по контактам, указанным в нижней части страницы.
Спасибо за использование сервиса video2dn.com

Описание к видео AI Sleeper Agents: How Anthropic Trains and Catches Them

In this video, we explain how Anthropic trained "sleeper agent" AIs to study deception. A "sleeper agent" is an AI model that behaves normally until it encounters a specific trigger in the prompt, at which point it awakens and executes a harmful behavior. Anthropic found that they couldn't undo the sleeper agent training using standard safety training, but they could detect sleeper agents through a simple interpretability technique.

▀▀▀▀▀▀▀▀▀SOURCES & READINGS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

Sleeper agents: training deceptive LLMs that persist through safety training:
https://www.anthropic.com/research/sl...
https://www.alignmentforum.org/posts/...

Simple probes can catch sleeper agents: https://www.anthropic.com/research/pr...

Alignment Faking in Large Language Models (mentioned in passing as a more natural demonstration of deceptive alignment): https://www.anthropic.com/research/al...

▀▀▀▀▀▀▀▀▀PATREON, MEMBERSHIP, MERCH▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🟠 Patreon:   / rationalanimations  

🔵 Channel membership:    / @rationalanimations  

🟢 Merch: https://rational-animations-shop.four...

🟤 Ko-fi, for one-time and recurring donations: https://ko-fi.com/rationalanimations

▀▀▀▀▀▀▀▀▀SOCIAL & DISCORD▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

Rational Animations Discord:   / discord  

Reddit:   / rationalanimations  

X/Twitter:   / rationalanimat1  

Instagram:   / rationalanimations  

▀▀▀▀▀▀▀▀▀PATRONS & MEMBERS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
A
Alcher Black
Alex Hall
Amir Saboury
Apuis Retsam
blasted0glass
Bleys
BlueNotesBlues
bparro
Chad M Jones
Chris Painter
Christian Loomis
Colin Ricardo
Craig Falls
Danealor
Danilo Stefani - Alessandra Erba
David Piepgrass
Dawson
Ducky
Edward Yu
Ellis Jones
Felix Akkermans
Forodriac Origamius
Fraser Cain
Gabriel Ledung
Glenn Tarigan
Honyopenyoko
Ingvi Gautsson
Ivan Bachcin
Jackson Emanuel
James Babcock
Jana
JanJan
Jasper L
Jeroen De Dauw
joe39504589
John
John Everett-Slape
Joshua Adrian Cahyono
Juan Benet
Klemen Slavic
Kristin Lindquist
loopuleasa
Luke Freeman
Martin Skalstad Steen
Matthew Shinkle
Michael Andregg
Michael Hewitt
Nathan Fish
Nathan Metzger
Neal Strobl
NMS
noggieB
Odet Abadia
rictic
Robert Paul Schwin
Scott Alexander
SQRT42Pi
steven michaels
Stuart Alldritt
Superslowmojoe
Terberlo.dog
Tomas Campos
Tor Barstad
ttw
Vladimir Silyaev
Fede Mathieu
ronvil
Michael Suazo
rx
Laissez Scholar
BestProGaming
7ic7ac
Devin King
RED
Rinthean
Thomas Grip
Boris Bend
J H
Richard Stambaugh
Teo Val
Ken Mc
Alcher Black
AWyattLife
Torstein Haldorsen
Michał Zieliński

▀▀▀▀▀▀▀CREDITS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

Directed by:
Hannah Levingstone | @hannah_luloo

Writers:
John Burden

Producer:
Emanuele Ascani

Art Director:
Hané Harnett | @Peony_Vibes / @peonyvibes (insta)

Line Producer:
Kristy Steffens | https://linktr.ee/kstearb

Production Managers:
Jay McMichen | @Jay_TheJester
Kristy Steffens | https://linktr.ee/kstearb
Grey Colson | https://linktr.ee/earl.gravy

Quality Assurance Lead:
Lara Robinowitz | @CelestialShibe

Storyboard Artists:
Emmalaine Wright | @emmalainearts (insta)
Hannah Levingstone | @hannah_luloo
Ira Klages | @dux

Lead Animators & Q/A:
Ethan DeBoer | https://linktr.ee/deboer_art
Lara Robinowitz | @CelestialShibe
Owen Peurois | @owenpeurois

Animators:
Colors Giraldo | @colorsofdoom
Ethan DeBoer https://linktr.ee/deboer_art
Ira Klages | @dux
Jay McMichen | @Jay_TheJester
Jodi Kuchenbecker | @viral_genesis (insta)
Jordan Gilbert | @Twin_Knight (twitter) Twin Knight Studios (YT)
Keith Kavanagh | @johnnycigarettex
Lara Robinowitz | @CelestialShibe
Michela Biancini
Owen Peurois | @owenpeurois
Patrick O' Callaghan | @patrick.h264
Patrick Sholar | @Sholarscribbles
Renan Kogut | @kogut_r
Skylar O'Brien | @mutodaes
Vaughn Oeth | @gravy_navy
Zack Gilbert | @Twin_Knight (twitter) Twin Knight Studios (YT)

Background Lead:
Pierre Broissand | @pierrebrsnd (insta) / artstation.com/brsnd

Asset/Background Artists:
Emmalaine Wright | @emmalainearts (insta)
Hané Harnett | @peonyvibes (insta) @peony_vibes (twitter)
Olivia Wang | @whalesharkollie
Pierre Broissand | @pierrebrsnd (insta) / artstation.com/brsnd
Zoe Martin-Parkinson | @zoemar_son

Compositing Lead:
Renan Kogut | @kogut_r

Compositing:
Grey Colson | https://linktr.ee/earl.gravy
Ira Klages | @dux
Patrick O' Callaghan | @patrick.h264
Renan Kogut | @kogut_r

Narrator:
Rob Miles |    / robertmilesai  

VO Editor:
Tony Dipiazza

Original Soundtrack & Sound Design:
Epic Mountain

Комментарии

Информация по комментариям в разработке

Похожие видео

  • О нас
  • Контакты
  • Отказ от ответственности - Disclaimer
  • Условия использования сайта - TOS
  • Политика конфиденциальности

video2dn Copyright © 2023 - 2025

Контакты для правообладателей [email protected]