[Webinar] Safeguarding AI Models: Exploring Prompt Injection Variants

Описание к видео [Webinar] Safeguarding AI Models: Exploring Prompt Injection Variants

Prompt injections refer to a large category of attacks on large language models (LLMs) and multimodal models that are meant to elicit unintended behavior. But what are these types of “unintended behaviors”? And what techniques are used to accomplish this?

In this session, ML Research Scientist Teresa Datta introduced a taxonomy to organize different types of prompt injection attacks—both direct and indirect—and break them down into subtypes. By clearly defining these subtypes, we will be able to better understand and thus better detect and mitigate their harmful effects.

She also covered:
The broad swath of techniques referred to as "prompt injections"
Direct vs. indirect prompt injections
5 main types of direct prompt injections (to be aware of)
Actions defenders can take to respond to the constantly evolving prompt injection attack surface

__

About Arthur: Arthur is the AI performance company. Our platform monitors, measures, and improves machine learning models to deliver better results. We help data scientists, product owners, and business leaders accelerate model operations and optimize for accuracy, explainability, and fairness.

Arthur’s research-led approach to product development drives exclusive capabilities in LLMs, computer vision, NLP, bias mitigation, and other critical areas. We’re on a mission to make AI work for everyone, and we are deeply passionate about building ML technology to drive responsible business results.

Learn more about Arthur → http://bit.ly/3KA31Vh
Follow us on X → https://x.com/itsarthurai
Follow us on LinkedIn →   / arthurai  
Sign up for our newsletter → https://www.arthur.ai/newsletter

Комментарии

Информация по комментариям в разработке