Transforming a Jupyter Notebook into a Reproducible Pipeline for ML Experiments


Rob De Wit's latest talk on transforming a Jupyter Notebook into a reproducible pipeline with DVC was presented at PyCon USA 2023. In this project, he creates Pokemon sprites with Stable Diffusion and LoRA in a Jupyter Notebook, then moves the stages from the notebook into a DVC pipeline to run experiments. Finally, he gives a quick overview of the DVC Extension for VS Code.
-------
Session Abstract:
Jupyter Notebooks are part of every data scientist's arsenal, and for good reason. But while they're great for prototyping in data science projects, they are not ideal for experimenting with different configurations. I have been guilty of running experiments with changing parameters while keeping track of them on a notepad, and the result has always been messy.

In this session, we will explore how we can transform our notebook prototype into a reproducible pipeline. We will discuss what goes wrong without proper experiment tracking, why reproducibility is the key to solving this, and how we can achieve that with Git and DVC.

I will discuss this topic using a text2image project with Stable Diffusion. I'll show how to break up a notebook into modules, create a pipeline from them, run experiments through the pipeline, and compare their results to find the best possible outcomes.
The target audience is data scientists who don't have a strong engineering background but would like to move beyond messing about in notebooks, much like myself a year or two ago.
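
As a rough illustration of the idea in the abstract (notebook cells become separate stages, and a pipeline re-runs only what a parameter change actually affects), here is a minimal standard-library sketch. It is not DVC's implementation and does not use the DVC API; the stage names (`prepare`, `train`, `generate`), the parameter names, and the `run_pipeline` helper are all hypothetical stand-ins, and the stage functions return placeholder strings instead of doing real work.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Stage:
    """One pipeline step: a function plus the parameters and stages it depends on."""
    name: str
    func: Callable[[dict, dict], object]  # (params, upstream outputs) -> output
    param_keys: List[str]
    deps: List[str] = field(default_factory=list)


def fingerprint(stage: Stage, params: dict, dep_fingerprints: List[str]) -> str:
    """Hash the stage name, the parameters it reads, and its upstream fingerprints."""
    payload = {
        "name": stage.name,
        "params": {key: params[key] for key in stage.param_keys},
        "deps": dep_fingerprints,
    }
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def run_pipeline(stages: List[Stage], params: dict, cache: Dict[str, object]):
    """Run stages in order; reuse a cached output when the fingerprint is unchanged.

    Returns the names of the stages that actually executed and all outputs.
    """
    executed, outputs, prints = [], {}, {}
    for stage in stages:
        key = fingerprint(stage, params, [prints[d] for d in stage.deps])
        prints[stage.name] = key
        if key not in cache:
            upstream = {d: outputs[d] for d in stage.deps}
            cache[key] = stage.func(params, upstream)
            executed.append(stage.name)
        outputs[stage.name] = cache[key]
    return executed, outputs


# Hypothetical stages standing in for former notebook cells.
def prepare(params, inputs):
    return f"dataset@{params['image_size']}px"


def train(params, inputs):
    return f"model({inputs['prepare']}, lr={params['learning_rate']}, steps={params['steps']})"


def generate(params, inputs):
    return f"image of '{params['prompt']}' from {inputs['train']}"


STAGES = [
    Stage("prepare", prepare, ["image_size"]),
    Stage("train", train, ["learning_rate", "steps"], deps=["prepare"]),
    Stage("generate", generate, ["prompt"], deps=["train"]),
]

if __name__ == "__main__":
    cache = {}
    params = {"image_size": 64, "learning_rate": 1e-4, "steps": 100, "prompt": "a fire-type creature"}
    print(run_pipeline(STAGES, params, cache)[0])
    # Changing only the prompt should leave the earlier stages cached.
    print(run_pipeline(STAGES, dict(params, prompt="a water-type creature"), cache)[0])
```

The sketch fingerprints only parameters and upstream stages. A real pipeline tool also needs to track the code and data files each stage depends on, which is the part that Git and DVC handle in the talk.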

-------
Try out the DVC Extension for VS Code here: https://marketplace.visualstudio.com/...

To learn more about Iterative's open-source and SaaS tools please visit:
🧑🏽‍💻 Our free online course: https://learn.iterative.ai
✍🏼 Our docs: https://dvc.org/doc (Data Version Control, Pipelines, Experiments)
https://cml.dev/doc (CI/CD for Machine Learning)
https://mlem.ai/doc (Package and Serve your models)
https://studio.iterative.ai (Team Collaboration, Experiments, Model Registry)

Join the Community on our Discord server: / discord

For more information on the HighLoad Conference: https://highload.rs/2023/

#dvc #machinelearning #datascience #generativeai
