How to perform End-to-End ETL from Kaggle to Snowflake on Databricks

#data #etl #pyspark #python
In this tutorial, let's explore how to perform a full-fledged Extract-Transform-Load (ETL) job on Databricks using PySpark.

1. We will extract data from Kaggle datasets using Kaggle's public API and the Kaggle CLI.
2. We will handle files and move data from cluster driver storage to the Databricks FileStore using bash and dbutils.
3. With the data in the Bronze layer, we will apply filter transformations and see how data partitioning complements them.
4. We will connect to Snowflake and push the transformed/filtered data to Snowflake as a table for BI/data analytics.

Illustrative code sketches for each of these four steps follow below.
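For step 1, a minimal extraction sketch. It assumes the kaggle package is installed on the cluster (e.g. via %pip install kaggle); the credentials and dataset slug are placeholders, since the actual dataset link is truncated below:

import os

# Kaggle authenticates via ~/.kaggle/kaggle.json or these env vars.
# Placeholder credentials -- substitute your own Kaggle API token.
os.environ["KAGGLE_USERNAME"] = "your_kaggle_username"
os.environ["KAGGLE_KEY"] = "your_kaggle_api_key"

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()

# Download the dataset archive to the cluster driver's local disk.
# CLI equivalent: kaggle datasets download -d owner/dataset-name -p /tmp/kaggle_raw
api.dataset_download_files("owner/dataset-name", path="/tmp/kaggle_raw", unzip=False)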
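For step 2, a sketch of unzipping on the driver and copying into DBFS with dbutils (paths are placeholders; dbutils is available implicitly in Databricks notebooks):

import zipfile

# Unzip the downloaded archive on the driver's local filesystem.
with zipfile.ZipFile("/tmp/kaggle_raw/dataset-name.zip", "r") as zf:
    zf.extractall("/tmp/kaggle_raw/extracted")

# Copy from driver-local storage (file:/) into DBFS so the files
# outlive the driver and are readable by Spark across the cluster.
dbutils.fs.cp("file:/tmp/kaggle_raw/extracted", "dbfs:/FileStore/bronze/kaggle_dataset", recurse=True)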
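For step 3, a sketch of the filter transformation and partitioned write. The column names are placeholders, since the dataset's schema isn't shown here; spark is the notebook's built-in SparkSession:

from pyspark.sql import functions as F

# Read the raw CSV from the Bronze location (schema inference for brevity).
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("dbfs:/FileStore/bronze/kaggle_dataset"))

# Example filter transformation -- adjust the predicate to your dataset.
filtered = df.filter(F.col("country") == "US")

# Partitioning the output lets downstream readers prune partitions
# instead of scanning the whole dataset.
(filtered.write
 .mode("overwrite")
 .partitionBy("year")
 .parquet("dbfs:/FileStore/silver/kaggle_dataset_partitioned"))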
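For step 4, a sketch of the Snowflake load using the Snowflake Spark connector bundled with Databricks. All connection values are placeholders; in practice, read them from dbutils.secrets.get() rather than hard-coding:

# Connection options for the Snowflake Spark connector.
sf_options = {
    "sfUrl": "your_account.snowflakecomputing.com",
    "sfUser": "your_user",
    "sfPassword": "your_password",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",
}

# Push the transformed DataFrame to Snowflake as a table.
(filtered.write
 .format("snowflake")
 .options(**sf_options)
 .option("dbtable", "KAGGLE_FILTERED")
 .mode("overwrite")
 .save())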

Connect with me and explore more of the educational content I offer - https://jayachandra27.github.io/datab...

00:00 - Introduction
00:40 - Kaggle setup and Data Extraction
05:30 - Data unzipping and movement
07:44 - Data Exploration
08:31 - Data partitioning and exploration
12:56 - Data Transformation
14:10 - Data Load
18:06 - Conclusion

Kaggle API Usage - https://www.kaggle.com/docs/api
Dataset we used - https://www.kaggle.com/datasets/aemyj...
Code and notebook URL - https://databricks-prod-cloudfront.cl...

LET'S CONNECT!
📰 LinkedIn ➔ / jayachandra-sekhar-reddy
✉️Substack ➔ https://databracket.substack.com/
❤️Kofi ➔ https://ko-fi.com/databracket

#dataengineering #bigdata #databricks #snowflake #kaggle #analytics #dataanalytics
