Databricks LLM, DBRX: Model design and challenges. A lecture for the ‪@BuzzRobot‬ community.

In this talk, Shashank Rajput, a research scientist at Databricks, presents three key aspects of training DBRX to the ‪@BuzzRobot‬ community: the model's Mixture of Experts architecture, how its components and hyperparameters were selected, and the challenges he and his collaborators encountered during large-scale training, along with the solutions they implemented.

Timestamps:
0:00 - Comparing DBRX with other open models.
0:16 - Architecture of the model.
4:14 - Training overview (a good-quality model requires high-quality data, an optimal model architecture, and sufficient compute).
10:17 - What the Mixture of Experts architecture is and how it works (a minimal routing sketch follows the timestamps).
24:35 - Advantages of the Mixture of Experts architecture: faster training and inference.
26:45 - What Expert parallelism is and why Databricks chose this approach for building the model.
31:05 - Challenges that Expert parallelism presents.
38:32 - The challenges of large-scale training.
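
For readers who want a concrete picture before watching, here is a minimal sketch of top-k Mixture of Experts routing, the core idea discussed at 10:17. It is written in PyTorch purely for illustration: the class name, expert count, dimensions, and the dense loop over experts are assumptions, not DBRX's implementation. Real large-scale systems instead dispatch each token to the devices holding its selected experts (the Expert parallelism covered at 26:45).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoELayer(nn.Module):
    """Toy Mixture of Experts feed-forward layer with top-k routing.

    Illustrative only: the expert count, top_k, and dimensions are made up
    and far smaller than anything used in DBRX.
    """

    def __init__(self, d_model=64, d_hidden=256, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x)                              # (batch, seq, num_experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)               # normalize only the chosen experts
        out = torch.zeros_like(x)
        # Dense loop over experts for clarity; real systems dispatch tokens
        # to the devices holding their experts (expert parallelism) instead.
        for e, expert in enumerate(self.experts):
            mask = (top_idx == e)                             # (batch, seq, top_k)
            if mask.any():
                # Routing weight for expert e, zero for tokens that did not pick it.
                gate = (weights * mask).sum(dim=-1, keepdim=True)
                out = out + gate * expert(x)
        return out


if __name__ == "__main__":
    layer = SimpleMoELayer()
    tokens = torch.randn(2, 8, 64)   # a batch of 2 sequences, 8 tokens each
    print(layer(tokens).shape)       # torch.Size([2, 8, 64])
```

The key point of the design is that each token activates only top_k of the num_experts feed-forward blocks, so total parameter count can grow without a proportional increase in per-token compute, which is the advantage discussed at 24:35.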
