Alignment and Jailbreaking of Large Language Models - Christos Malliopoulos | codeweek April 2024

Описание к видео Alignment and Jailbreaking of Large Language Models - Christos Malliopoulos | codeweek April 2024

💡 Webinar: Alignment and Jailbreaking of Large Language Models
Speaker: Christos Malliopoulos

Large Language Model (LLM) learning objectives (e.g. next token prediction and semantic similarity between texts) are not always aligned with human expectations. Such discrepancies include the generation of undesirable content (profanity, ethical bias, disclosure of private and sensitive info) and misinformation (also known as hallucination). In the talk we'll touch upon the major LLM alignment methods but most importantly, we will discuss ways to evaluate LLM alignment with a process known as "jailbreaking", in particular the most useful form of it, "black-box jailbreaking". Along the way we will briefly describe "chain-of-thought" reasoning, an increasingly used prompting technique that forms the basis of successful LLM jailbreaking.

#karieragr #codeweek #coding #devcommunity

Комментарии

Информация по комментариям в разработке