Navigating LLM Threats: Detecting Prompt Injections and Jailbreaks

Описание к видео Navigating LLM Threats: Detecting Prompt Injections and Jailbreaks

*Every time a new technology emerges, some people will inevitably attempt to use it maliciously – language models are no exception. Recently, we have seen numerous examples of attacks on large language models (LLMs). These attacks elicit restricted behavior from language models, such as producing hate speech, creating misinformation, leaking private information, or misleading the model into performing a different task than intended.

In this hands-on workshop, we will examine differences in navigating natural and algorithmic adversarial attacks, concentrating on prompt injections and jailbreaks. We first explore a few examples of how such attacks are generated via state-of-the-art cipher and language suffix approaches. We then focus on adaptive strategies for detecting these attacks in LLM-based applications using LangKit, our open-source package for feature extraction for LLM and NLP applications, with practical examples and limitation considerations. Namely, semantic similarity against a set of known attacks and LLM-based proactive detection techniques.two approaches for detecting the attacks with LangKit.

By the end of this workshop, attendees will understand:

*What LLM prompt injections and jailbreaks are and measures to mitigate those attacks
*How to use semantic similarity techniques to verify incoming prompts against a set of known jailbreak and prompt injection attacks
*How to use LLM-based proactive detection techniques to preemptively detect prompt injection attacks


Register for our DeepLearning.AI short courses:
https://bit.ly/4aL0O4t

Join our community to learn more:
https://bit.ly/3tCI2f0

Slides:
https://docs.google.com/presentation/...

Notebook:
https://colab.research.google.com/dri...

WhyLabs:

As teams across industries adopt AI, WhyLabs enables them to operate with certainty by providing model monitoring, preventing costly model failures, and facilitating cross-functional collaboration. Incubated at the Allen Institute for AI, WhyLabs is a privately-held, venture-funded company based in Seattle.

Speakers

Felipe Adachi, Applied Scientist, WhyLabs
LinkedIn Profile:   / en  

Bernease Herman, Sr. Data Scientist, WhyLabs
LinkedIn Profile:   / bernease  

Комментарии

Информация по комментариям в разработке