Day 8 - J. Cheung: Benchmarking and Evaluation in NLP: How Do We Know What LLMs Can Do?

Conflicting claims that large language models (LLMs) “can do X”, “have property Y”, or even “know Z” have been made in recent literature in natural language processing (NLP) and related fields, as well as in popular media. However, unclear and often inconsistent standards for inferring these conclusions from experimental results bring the validity of such claims into question. In this lecture, I focus on the crucial role that benchmarking and evaluation methodology in NLP plays in assessing LLMs’ capabilities. I review common practices in the evaluation of NLP systems, including types of evaluation metrics, the assumptions behind these evaluations, and the contexts in which they are applied. I then present case studies showing how less-than-careful application of current practices may result in invalid claims about model capabilities. Finally, I present our current efforts to encourage more structured reflection during benchmark design and creation by introducing a novel framework, Evidence-Centred Benchmark Design, inspired by work in educational assessment.

References
Porada, I., Zou, X., & Cheung, J. C. K. (2024). A Controlled Reevaluation of Coreference Resolution Models. arXiv preprint arXiv:2404.00727.

Liu, Y. L., Cao, M., Blodgett, S. L., Cheung, J. C. K., Olteanu, A., & Trischler, A. (2023). Responsible AI Considerations in Text Summarization Research: A Review of Current Practices. arXiv preprint arXiv:2311.11103.
