CITP Seminar Stephan Rabanser - Towards a Science of AI Agent Reliability

Video description:

AI agents are increasingly performing consequential tasks autonomously: writing code, making purchases, and providing advice. But how do we know when to trust them? Current evaluation focuses predominantly on success rates: how often does the agent complete the task? This misses critical questions about how agents behave: Do they give the same answer twice? Do they fail gracefully when conditions change? Can they tell us when they’re likely to be wrong? Drawing on decades of practice from aviation, nuclear power, and other safety-critical domains, we propose a framework that decomposes reliability into four dimensions: consistency, robustness, predictability, and safety. Evaluating 12 frontier AI models, we find a striking result: despite rapid capability improvements over 18 months, reliability has barely budged. Agents that are substantially more accurate remain inconsistent across runs and poorly calibrated about their own uncertainty. The implication is clear: building capable AI is not the same as building dependable AI. As agents take on higher-stakes tasks, we need evaluation practices that ask not just “does it work?” but “can we count on it?”

Bio:

Stephan Rabanser works on trustworthy machine learning, with a particular focus on uncertainty quantification, selective prediction, and out-of-distribution generalization/robustness. At a high level, his research aims to improve the reliability of machine learning systems under uncertainty and distribution shift. Rabanser develops principled yet practical methods that help models understand what they know—and crucially, when they should abstain—whether by quantifying predictive uncertainty, deferring to expert models, or rejecting unfamiliar inputs. He also studies how models can generalize reliably under distribution shift, with applications ranging from out-of-distribution detection and time series anomaly detection to robustness in federated learning. A recurring theme of his research is to design intelligent systems that remain trustworthy even under imperfect or adversarial conditions, such as privacy constraints, limited data, or non-stationary environments. His current research explores how uncertainty can be designed and leveraged in large generative models to support more reliable decision-making and safer deployment.

Rabanser holds a Ph.D. in computer science from the University of Toronto, an M.Sc. and a B.Sc. in informatics from the Technical University of Munich (TUM), and an Honours Degree in technology management from the Center for Digital Technology and Management (CDTM). In recent years, he has held engineering and research positions at Amazon / AWS AI Labs and Google. He has also been a research visitor at the Massachusetts Institute of Technology (MIT), Carnegie Mellon University (CMU), and the University of Cambridge.

Rabanser’s Google Scholar webpage

In-person attendance is open to Princeton University faculty, staff and students.

If you need an accommodation for a disability, please contact Jean Butcher at [email protected] at least one week prior to the event.

Sponsorship of an event does not constitute institutional endorsement of external speakers or views presented.
