From Eyeballing to Excellence: 7 Ways to Evaluate & Monitor LLM Performance

Workshop Links:
WhyLabs Sign-up: https://whylabs.ai/free
LangKit GitHub: https://github.com/whylabs/langkit
Colab Notebook: https://bit.ly/Intro_to_LangKit
Slack Group: http://join.slack.whylabs.ai/

It’s easy to get started with Large Language Models (LLMs), but it’s hard to move beyond the proof of concept, especially when you don’t know how to evaluate the quality of the LLM-powered experience. Unfortunately, the two most popular evaluation approaches, eyeballing the outputs and asking the LLM to evaluate itself, are both flawed.

During this workshop, we will explore 7 approaches to calculating metrics that evaluate the quality of LLMs for your specific use case, so you never have to eyeball again.

With our expert guidance and hands-on exercises, you'll learn how to measure the effectiveness of your LLMs in a way that's accurate, consistent, and meaningful!

What you’ll need:
A free WhyLabs account (https://whylabs.ai/free)
A Google account (for saving a copy of the Google Colab notebook)

About the Speaker:
Alessya Visnjic is the co-founder & CEO of WhyLabs, the AI Observability company. Prior to WhyLabs, Alessya was a CTO-in-residence at the Allen Institute for AI, where she evaluated the commercial potential of the latest AI research. Earlier, Alessya spent 9 years at Amazon leading AI initiatives, including the development of the internal AI platform and work on early versions of SageMaker. Alessya is the founder of Rsqrd AI, a global community of AI practitioners committed to making enterprise AI technology robust and responsible.

About WhyLabs:
WhyLabs is an AI observability platform that prevents data & model performance degradation by allowing you to monitor your data and machine learning models in production.

Do you want to connect with the community, learn about WhyLabs, or get project support? Join the WhyLabs + Robust & Responsible AI community Slack: https://bit.ly/rsqrd-slack
