The Power of Open Internet Data in Training Large Language Models

In this interview with Thom Vaughan (linkedin.com/in/tevaughan/) and Pedro Ortiz Suarez (linkedin.com/in/pjox/), we talk about the role of open internet data in training AI models such as Large Language Models (LLMs). We discuss how Common Crawl maintains a free, open repository of data crawled from the internet and makes it easy to access through snapshots. We also cover data annotation and how following standard web development practices can be crucial for the crawling process. #ai-PULSE #CommonCrawl #LargeLanguageModels #OpenInternetData #OscarProject
