Deep Dive into LLM Evaluation with Weights & Biases

In the dynamic world of Large Language Models (LLMs), we can now build intelligent systems on top of our own data. As with any other piece of automation software, it's essential to take the time to evaluate these LLM systems. In this webinar, we'll dive into how to evaluate these systems effectively, with a particular focus on Retrieval Augmented Generation (RAG) systems.

We'll start by discussing the 'eyeballing' technique (manually inspecting model outputs) and why Weights & Biases Prompts is a great first tool for it. Then we'll move on to supervised evaluation, explaining why it's worth considering and pointing out some of its limitations.

To wrap things up, we'll look at how LLMs can be used to evaluate themselves: from generating their own evaluation datasets, to using standard metrics like SQuAD or BLEU, to evaluating retrieval systems.
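
SQuAD-style metrics compare a generated answer against a reference answer. The sketch below is a minimal illustration, assuming the normalization used by the original SQuAD v1.1 evaluation script (lowercasing, dropping punctuation and the articles "a", "an", "the", collapsing whitespace). The `hit_rate` helper is a hypothetical, simplified retrieval check added for illustration; it is not confirmed to be what the workshop uses.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def hit_rate(retrieved_ids: list[list[str]], relevant_ids: list[str], k: int) -> float:
    """Hypothetical helper: share of queries whose relevant doc is in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(rel in docs[:k] for docs, rel in zip(retrieved_ids, relevant_ids))
    return hits / len(relevant_ids)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
    print(round(token_f1("Paris, France", "Paris"), 3))      # 0.667
    print(hit_rate([["d1", "d3"], ["d2", "d5"]], ["d3", "d4"], k=2))  # 0.5
```

Token F1 gives partial credit when an LLM answer is correct but wordier than the reference, which exact match alone would score as a miss.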

On top of all that, we'll also touch on how W&B Sweeps, an excellent tool for hyperparameter optimization, can be used to find the right balance between maximizing accuracy and minimizing cost. The session will end with a Q&A with the presenters.
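
W&B Sweeps optimizes a single logged metric, so one way to trade accuracy against cost is to log a combined score. The sketch below assumes a RAG pipeline with a tunable chunk size and retrieval depth; those parameter names, the `eval/score` metric, the `cost_weight` penalty, and the `run_eval` function are hypothetical illustrations, not the workshop's actual setup.

```python
# Hypothetical W&B Sweeps configuration for a RAG pipeline.
# Parameter names, values, and the metric name are illustrative assumptions.
sweep_config = {
    "method": "bayes",  # Sweeps also supports "grid" and "random"
    "metric": {"name": "eval/score", "goal": "maximize"},
    "parameters": {
        "chunk_size": {"values": [256, 512, 1024]},  # hypothetical
        "top_k": {"values": [2, 4, 8]},              # hypothetical
        "cost_weight": {"value": 0.1},               # fixed, hypothetical
    },
}


def combined_score(accuracy: float, cost_usd: float, cost_weight: float) -> float:
    """Hypothetical single objective: reward accuracy, penalize dollar cost."""
    return accuracy - cost_weight * cost_usd


# Inside each sweep run you would evaluate the pipeline using wandb.config's
# values and log the objective, e.g.:
#   wandb.log({"eval/score": combined_score(acc, cost, wandb.config.cost_weight)})
# The sweep itself is launched with (requires a W&B account):
#   sweep_id = wandb.sweep(sweep=sweep_config, project="rag-eval")
#   wandb.agent(sweep_id, function=run_eval, count=20)  # run_eval is hypothetical
```

Raising `cost_weight` pushes the search toward cheaper configurations (for example, a smaller `top_k` means fewer tokens sent to the LLM), at some expense in accuracy.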

This workshop builds on the foundational material of DeepLearning.AI’s course Evaluating & Debugging Generative AI, created in collaboration with the Weights & Biases team. Everything covered in the workshop is presented as continuing education from the course.

Event Agenda
40-minute Workshop
10-minute Q&A: Answering questions from the audience.

About the Speakers:
Morgan McGuire - Growth Director at Weights & Biases
Morgan leads the Growth ML team and is an ML Engineer at Weights & Biases. He has a background in NLP and previously worked on the Safety team at Facebook, where he helped classify and flag potentially high-severity content for removal.

Ayush Thakur - Machine Learning Engineer at Weights & Biases
Ayush is an MLE at Weights & Biases and a Google Developer Expert in Machine Learning (TensorFlow). He is interested in all things computer vision and representation learning. For the past 7 months he has been working with LLMs and has covered RLHF as well as the how and what of building LLM-based systems.

Carey Phelps - Founding Product Manager at Weights & Biases
Carey is the founding Product Manager at Weights & Biases. She studied computer science at Stanford and went on to found Carta Healthcare before joining Weights & Biases.
