AI Safety and Theoretical Computer Science - Scott Aaronson (UT Austin and OpenAI)

Progress on AI safety and alignment, like the current AI revolution more generally, has been almost entirely empirical. In this talk, however, I'll survey a few areas where I think theoretical computer science can contribute to AI safety, including:

- How can we robustly watermark the outputs of Large Language Models and other generative AI systems, to help identify academic cheating, deepfakes, and AI-enabled fraud? I'll explain my proposal and its basic mathematical properties, as well as what remains to be done. (A code sketch of the sampling rule appears after this list.)

- Can one insert undetectable cryptographic backdoors into neural nets, for good or ill? In what senses can those backdoors also be unremovable? How robust are they against fine-tuning? (A toy code sketch of such a backdoor also appears after this list.)

- Should we expect neural nets to be "generically" interpretable? I'll discuss a beautiful formalization of that question due to Paul Christiano, along with some initial progress on it, and an unexpected connection to quantum computing.
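For concreteness, here is a minimal Python sketch of the watermarking idea as Aaronson has publicly described it: pseudorandom values r_t, keyed on the preceding tokens, are used to pick the token maximizing r_t^(1/p_t), which leaves the model's output distribution unchanged while biasing the r-value of the chosen token toward 1. The key, the context-window length k, and the HMAC-based PRF below are illustrative assumptions, not details of any actual deployment.

```python
import hashlib
import hmac
import math

# Hypothetical provider-held key; in a real deployment this stays secret.
SECRET_KEY = b"demo-watermark-key"

def prf(key: bytes, context: tuple, token: int) -> float:
    """Keyed pseudorandom value in (0, 1), derived via HMAC-SHA256
    from the recent context and a candidate token."""
    msg = repr((context, token)).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    x = int.from_bytes(digest[:8], "big")
    return (x + 0.5) / 2.0**64          # strictly inside (0, 1)

def watermarked_sample(probs: dict[int, float], context: tuple) -> int:
    """Pick the token t maximizing r_t ** (1 / p_t), where context is
    the tuple of the preceding k tokens.  By the Gumbel-max argument
    this selects t with probability exactly p_t, so the model's output
    distribution is unchanged -- but the r-value of the chosen token
    is biased toward 1, which a key-holder can detect."""
    return max(probs, key=lambda t: prf(SECRET_KEY, context, t) ** (1.0 / probs[t]))

def detection_score(tokens: list[int], k: int = 4) -> float:
    """Mean of ln(1 / (1 - r)) over the text.  Each term has
    expectation 1 for ordinary text; watermarked text scores higher."""
    terms = [
        -math.log(1.0 - prf(SECRET_KEY, tuple(tokens[i - k:i]), tokens[i]))
        for i in range(k, len(tokens))
    ]
    return sum(terms) / max(1, len(terms))
```

Note that the detector needs only the key and the token sequence, not the model itself, which is one of the scheme's attractive properties.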
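The backdoor question also admits a simple caricature in code. The sketch below is loosely inspired by the signature-based construction of Goldwasser, Kim, Vaikuntanathan, and Zamir, but drastically simplified: it is an input-level wrapper rather than their weight-level construction, and an HMAC stands in for a digital signature. It illustrates why such a backdoor can be undetectable from input-output behavior alone: triggering, or even finding, a backdoored input requires forging the tag.

```python
import hashlib
import hmac

TAG_BYTES = 16  # illustrative tag length

class BackdooredClassifier:
    """Toy input-level backdoor: inputs ending in a valid MAC tag under
    a planted secret key receive an attacker-chosen label; all other
    inputs are classified honestly.  Without the key, the security of
    the MAC makes trigger inputs infeasible to construct, so black-box
    queries cannot distinguish this model from the honest one."""

    def __init__(self, honest_model, secret_key: bytes, target_label: int):
        self.model = honest_model      # any callable bytes -> int
        self.key = secret_key
        self.target = target_label

    def _tag_valid(self, x: bytes) -> bool:
        if len(x) <= TAG_BYTES:
            return False
        payload, tag = x[:-TAG_BYTES], x[-TAG_BYTES:]
        expected = hmac.new(self.key, payload, hashlib.sha256).digest()[:TAG_BYTES]
        return hmac.compare_digest(tag, expected)

    def predict(self, x: bytes) -> int:
        return self.target if self._tag_valid(x) else self.model(x)
```

The genuinely hard questions the talk raises, unremovability and robustness to fine-tuning, arise when the hidden check is planted inside the network's weights rather than in a wrapper like this.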
