Building a LLM Judge with Weights & Biases

Evaluating LLM outputs accurately is critical to iterating quickly on an LLM system. Human annotation is slow and expensive, and using LLMs as judges instead promises to solve this. However, aligning an LLM judge with human judgements is often hard, with many implementation details to consider. In this workshop we will explore:
Evaluating specialized LLMs using Weave
Productionizing the latest LLM-as-a-judge research
Improving on your existing judge
Building annotation UIs
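To make the first item above concrete, here is a minimal sketch of what an LLM-as-a-judge evaluation can look like in Weave. The project name, dataset, model under test, and judge prompt are all illustrative placeholders, not material from the workshop; the scorer's `output` parameter name follows recent Weave releases and may differ in older ones.

```python
# Minimal LLM-judge evaluation sketch using W&B Weave.
# Assumes OPENAI_API_KEY is set; all names below are illustrative.
import asyncio

import weave
from openai import OpenAI

client = OpenAI()
weave.init("llm-judge-demo")  # hypothetical Weave project name


@weave.op()
def my_model(question: str) -> str:
    # The system under evaluation; swap in your own LLM call.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content


@weave.op()
def llm_judge(question: str, output: str) -> dict:
    # LLM-as-a-judge scorer: a second model grades the answer.
    # Weave matches `question` to the dataset row and passes the
    # model's answer as `output` (in recent Weave versions).
    prompt = (
        f"Question: {question}\nAnswer: {output}\n"
        "Reply with only 'correct' or 'incorrect'."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return {"correct": "correct" in resp.choices[0].message.content.lower()}


examples = [
    {"question": "What is 2 + 2?"},
    {"question": "Name the capital of France."},
]

evaluation = weave.Evaluation(dataset=examples, scorers=[llm_judge])
asyncio.run(evaluation.evaluate(my_model))
```

In practice, aligning such a judge means comparing its verdicts against a held-out set of human labels and iterating on the judge prompt until agreement is acceptable.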

#MicrosoftReactor

[eventID:23760]
