The Inside View
AI Alignment video explainers and podcasts
The Battle For The Future Of AI — Full Documentary
2024: The Year Of Artificial General Intelligence
Evan Hubinger (Anthropic)—Deception, Sleeper Agents, Responsible Scaling
Holly Elmore—Pausing Frontier AI Development
GPT-2 Teaches GPT-4: Weak-to-Strong Generalization
How to Catch an AI Liar
Anthropic Solved Interpretability?
We Beat The Strongest Go AI
Paul Christiano's Views on AI Doom (ft. Robert Miles)
Neel Nanda–Mechanistic Interpretability, Superposition, Grokking
Joscha Bach—Is AI Risk Real?
Erik Jones—Automatically Auditing Large Language Models
Dylan Patel—GPU Shortage, Nvidia, Semiconductor Supply Chain
Andi Peng—A Human-in-the-Loop Framework for Test-Time Policy Adaptation
Hailey Schoelkopf—Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Tomek Korbak—Pretraining Language Models with Human Preferences
Tim Dettmers—k-bit Inference Scaling Laws
Eric Wallace—Poisoning Language Models During Instruction Tuning
Tony Wang—Beating Superhuman Go AIs
David Bau—Editing Facts in GPT, Interpretability
Alexander Pan–Are AIs Machiavellian?
Vincent Weisser–Funding Alignment Research
Aran Komatsuzaki–Scaling, GPT-J
Curtis Huebner—AGI by 2028, 90% Doom
Eric Michaud—Scaling, Grokking, Quantum Interpretability
Daniel Filan–AXRP, LLMs, Interpretability
Existential Risk From AI Is Higher Than 10%—Change My Mind
Jesse Hoogland–AI Risk, Interpretability
Clarifying and predicting AGI
Alan Chan and Max Kaufmann–Model Evaluations, Timelines, Coordination