Fast, Flexible, and Scalable Data Loading for ML Training with Ray Data

Data loading and preprocessing can easily become the performance bottleneck in ML training pipelines. Preprocessing requirements are also growing more complicated as the types of data being processed become more diverse. With Ray Data, data loading can be fast, flexible, and scalable. In this talk, we’ll dive into the performance of different open-source data loader solutions. We’ll show how Ray Data can match PyTorch DataLoader and tf.data in performance on a single node, while also providing advanced features necessary for scale, such as in-memory streaming, automatic recovery from out-of-memory failures, and support for heterogeneous clusters.
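
To give a feel for what "in-memory streaming" means in a data loader, here is a minimal standard-library sketch of the general idea: a background thread preprocesses records and hands batches to the training loop through a bounded buffer, so loading overlaps with training and memory use stays capped. This is an illustration only and not Ray Data's API or implementation; the names `stream_batches`, `preprocess`, and `max_buffered` are hypothetical.

```python
import queue
import threading


def stream_batches(source, preprocess, batch_size, max_buffered=2):
    """Yield preprocessed batches while a background thread loads ahead.

    At most `max_buffered` finished batches wait in the buffer at once,
    so memory use stays bounded no matter how large `source` is.
    (Hypothetical helper for illustration; not part of any library.)
    """
    buffer = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            batch = []
            for record in source:
                batch.append(preprocess(record))
                if len(batch) == batch_size:
                    buffer.put(batch)  # blocks while the buffer is full
                    batch = []
            if batch:
                buffer.put(batch)  # final partial batch
            buffer.put(done)
        except Exception as exc:
            buffer.put(exc)  # forward failures to the consumer

    # Daemon thread: it will not keep the process alive if the
    # consumer stops iterating early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is done:
            return
        if isinstance(item, Exception):
            raise item
        yield item


if __name__ == "__main__":
    # Toy "dataset" of integers with a doubling "preprocessing" step.
    for batch in stream_batches(range(10), lambda x: x * 2, batch_size=4):
        print(batch)
```

The bounded queue is the key design choice in this sketch: the producer pauses whenever the consumer falls behind, which is what keeps a streaming loader from materializing the whole dataset in memory.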

Takeaway:

• Ray Data provides a combination of speed, scale, and flexibility unmatched by other open-source data loaders.

Find the slide deck here: https://drive.google.com/file/d/19mzr...


About Anyscale
---
Anyscale is the AI Application Platform for developing, running, and scaling AI.

https://www.anyscale.com/

If you're interested in a managed Ray service, check out:
https://www.anyscale.com/signup/

About Ray
---
Ray is the most popular open source framework for scaling and productionizing AI workloads. From Generative AI and LLMs to computer vision, Ray powers the world’s most ambitious AI workloads.
https://docs.ray.io/en/latest/


#llm #machinelearning #ray #deeplearning #distributedsystems #python #genai
