Lecture 3 | AI Safety, Ethics, & Society: Single Agent Safety


You can find more information, including the corresponding section of the AI Safety, Ethics, & Society textbook, at https://www.aisafetybook.com/textbook....

Topics: Single Agent Safety
Dan Hendrycks, Director, Center for AI Safety; PhD Computer Science, UC Berkeley
https://www.safe.ai/
Section Synopsis: ML systems have grown more competent and general as the field of deep learning has matured. Reasoning about the behavior and internal structure of such systems can be challenging, especially since some failure modes arise only once an AI system is sufficiently sophisticated. We discuss some of the fundamental technical challenges around monitoring, robustness, and control of AI systems. Current AI systems lack transparency and can exhibit surprising emergent capabilities. They are vulnerable to adversarial examples, Trojans, and other attacks. These challenges in turn may make it hard to control AI systems and to prevent unintended behavior such as deception. When conducting research to advance AI safety, it is important to consider the risk of inadvertently accelerating AI capabilities in a way that undermines the overall goal of better understanding and controlling AI systems.
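To make the vulnerability to adversarial examples mentioned above concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM) applied to a toy logistic classifier. The weights, input, and perturbation size are made-up values for illustration only, not anything from the lecture; a small signed-gradient step per coordinate is enough to flip the model's prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear classifier: weights and bias chosen only for illustration.
w = np.array([2.0, -1.0])
b = 0.0

def predict_proba(x):
    return sigmoid(w @ x + b)

def fgsm(x, y, eps):
    """Fast Gradient Sign Method: nudge x in the direction that
    increases the logistic loss for the true label y."""
    p = predict_proba(x)
    grad_x = (p - y) * w          # d(loss)/dx for logistic loss
    return x + eps * np.sign(grad_x)

x = np.array([1.0, 1.0])          # clean input, true label 1
x_adv = fgsm(x, y=1.0, eps=0.6)   # bounded perturbation per coordinate

print(predict_proba(x) > 0.5)     # True: clean input classified as 1
print(predict_proba(x_adv) > 0.5) # False: prediction flipped
```

The same idea scales to deep networks, where the gradient with respect to the input is obtained by backpropagation and the perturbation can be imperceptibly small.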

0:00 Introduction
0:51 Robustness
8:45 Trojans
11:42 Monitoring
23:40 Anomaly Detection
27:38 Representation Control
29:33 Systemic Safety
32:18 Recap
