Preventing Threats to LLMs: Detecting Prompt Injections & Jailbreak Attacks

Описание к видео Preventing Threats to LLMs: Detecting Prompt Injections & Jailbreak Attacks

Join this hands-on workshop to learn how to identify and mitigate malicious issues in your LLMs, focusing on two categories of attacks, prompt injections and jailbreaks.

Language models have fallen victim to numerous attacks that pose serious risks - specifically jailbreak attacks and prompt injections, which have the power to generate hate speech, create misinformation, cause private data leakage, and other restricted behaviors.

This workshop will cover:
- What jailbreaks and prompt injections are and how to identify them in your LLMs
- Using privilege control, robust system prompts, human in the loop and monitoring to prevent the attacks
- Using similarity to known attacks to compare a set of known jailbreak/prompt injection attacks to incoming user prompts
- Proactive prompt injection detection to devise a preflight instruction prompt combined with the target prompt to analyze the response


What you’ll need:
- A free WhyLabs account (https://whylabs.ai/free)
- A Google account (for saving a Google Colab)


Who should attend:
Anyone interested in building applications with LLMs, AI Observability, Model monitoring, MLOps, and DataOps! This workshop is designed to be approachable for most skill levels. Familiarity with machine learning and Python will be useful, but it's not required to attend.

Комментарии

Информация по комментариям в разработке