Goodfire AI’s Bet: Interpretability as the Next Frontier of Model Design — Myra Deng & Mark Bissell

  • Latent Space
  • 2026-02-05

Episode description

From Palantir and Two Sigma to building Goodfire into the poster-child for actionable mechanistic interpretability, Mark Bissell (Member of Technical Staff) and Myra Deng (Head of Product) are trying to turn “peeking inside the model” into a repeatable production workflow by shipping APIs, landing real enterprise deployments, and now scaling the bet with a recent $150M Series B funding round at a $1.25B valuation. (https://www.goodfire.ai/blog/our-seri...)

In this episode, we go far beyond the usual “SAEs are cool” take. We talk about Goodfire’s core bet: that the AI lifecycle is still fundamentally broken because the only reliable control we have is data, and we post-train, RLHF, and fine-tune by “slurping supervision through a straw,” hoping the model picks up the right behaviors while quietly absorbing the wrong ones. Goodfire’s answer is to build a bi-directional interface between humans and models: read what’s happening inside, edit it surgically, and eventually use interpretability during training so customization isn’t just brute-force guesswork. (https://www.goodfire.ai/blog/on-optim...)
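
To make “edit it surgically” concrete, here is a minimal sketch of activation steering, the kind of intervention demoed later in the episode. Everything in it is a stand-in: GPT-2 instead of a frontier model, a random unit vector instead of an SAE feature or probe direction, and an arbitrary layer and scale; it is not Goodfire’s actual API or method.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model; any decoder-only transformer with hookable blocks works the same way.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

layer_idx = 6                       # which residual stream to intervene on (assumption)
d_model = model.config.n_embd
feature_dir = torch.randn(d_model)  # placeholder for an SAE feature / probe direction
feature_dir = feature_dir / feature_dir.norm()
alpha = 8.0                         # steering strength ("how hard the feature is toggled")

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states are the first element.
    hidden = output[0] + alpha * feature_dir.to(output[0].dtype)
    return (hidden,) + output[1:]

# Register the edit, generate with it active, then remove it to restore normal behavior.
handle = model.transformer.h[layer_idx].register_forward_hook(steer)
ids = tok("The weather today is", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=20, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```

The frontier-scale version discussed below (the Kimi K2 demo) follows the same pattern, except the directions come from SAE pipelines, are auto-labeled by LLMs, and are applied across multiple layers at once.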

We discuss:
• Myra + Mark’s paths: Mark via Palantir (health systems, forward-deployed engineering) → Goodfire early team; Myra via Two Sigma → Head of Product, translating frontier interpretability research into a platform and real-world deployments
• What “interpretability” actually means in practice: not just post-hoc poking, but a broader “science of deep learning” approach across the full AI lifecycle (data curation → post-training → internal representations → model design)
• Why post-training is the first big wedge: “surgical edits” for unintended behaviors like reward hacking, sycophancy, and noise learned during customization, plus the dream of targeted unlearning and bias removal without wrecking capabilities
• SAEs vs probes in the real world: why SAE feature spaces sometimes underperform classifiers trained on raw activations for downstream detection tasks (hallucination, harmful intent, PII), and what that implies about “clean concept spaces” (a minimal version of this comparison is sketched after this list)
• Rakuten in production (https://www.goodfire.ai/research/raku...): deploying interpretability-based token-level PII detection at inference time to prevent routing private data to downstream providers, plus the gnarly constraints: no training on real customer PII, synthetic→real transfer, English + Japanese, and tokenization quirks
• Real-time steering at frontier scale: a live demo of steering Kimi K2 (~1T params), finding features via SAE pipelines, auto-labeling them via LLMs, and toggling a “Gen-Z slang” feature across multiple layers without breaking tool use
• Hallucinations as an internal signal: the case that models have latent uncertainty / “user-pleasing” circuitry you can detect and potentially mitigate more directly than with black-box methods
• Steering vs prompting (https://www.goodfire.ai/blog/feature-...): the emerging view that activation steering and in-context learning are more closely connected than people think, including work mapping between the two (even for jailbreak-style behaviors)
• Interpretability for science: using the same tooling across domains (genomics, medical imaging, materials) to debug spurious correlations and extract new knowledge up to and including early biomarker discovery work with major partners
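
The SAEs-vs-probes question above reduces to a concrete experiment: fit the same classifier once on raw residual-stream activations and once on SAE feature activations, then compare detection accuracy on a held-out set. The sketch below only shows the shape of that evaluation; the data is synthetic, the SAE encoder is a hypothetical random one, and the numbers it prints say nothing about the real results discussed in the episode.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d_model, d_sae = 2000, 256, 1024

# Synthetic stand-in for residual-stream activations with a weak "hallucination" direction.
labels = rng.integers(0, 2, size=n)
direction = rng.normal(size=d_model)
acts = rng.normal(size=(n, d_model)) + 0.5 * labels[:, None] * direction

# Hypothetical frozen SAE encoder: features = ReLU(acts @ W_enc + b_enc).
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
b_enc = 0.1 * rng.normal(size=d_sae)
sae_feats = np.maximum(acts @ W_enc + b_enc, 0.0)

# Same probe, two feature spaces.
for name, X in [("raw activations", acts), ("SAE features", sae_feats)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"{name}: held-out accuracy {probe.score(X_te, y_te):.3f}")
```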

—

Goodfire AI
• Website: https://goodfire.ai
• LinkedIn:   / goodfire-ai  
• X: https://x.com/GoodfireAI

Myra Deng
• Website: https://myradeng.com/
• LinkedIn:   / myra-deng  
• X: https://x.com/myra_deng

Mark Bissell
• LinkedIn:   / mark-bissell  
• X: https://x.com/MarkMBissell

00:00 Introduction
00:45 Welcome + episode setup + intro to Goodfire
02:16 Fundraise news + what’s changed recently
02:44 Guest backgrounds + what they do day-to-day
05:52 “What is interpretability?” (SAEs, probing, steering and quick map of the space)
08:29 Post-training failures (sycophancy/reward hacking) + using interp to guide learning
10:26 Surgical edits: bias vectors + grokking/double descent + subliminal learning
14:04 How Goodfire decides what to work on (customers → research agenda)
16:58 SAEs vs probes: what works better for real-world detection tasks
19:04 Rakuten case study: production PII monitoring + multilingual + token-level scrubbing
22:06 Live steering demo on a 1T-parameter model (and scaling challenges)
25:29 Feature labeling + auto-interpretation + can we “turn down” hallucinations?
31:03 Steering vs prompting equivalence + jailbreak math + customization implications
38:36 Open problems + how to get started in mech interp
46:29 Applications: healthcare + scientific discovery (biomarkers, Mayo Clinic, etc.)
57:10 Induction + sci-fi intuition (Ted Chiang)
