Building a Batch Data Pipeline using Airflow, Spark, EMR & Snowflake


In this project we will demonstrate the use of:
✅Airflow to orchestrate and manage the data pipeline
✅AWS EMR for the heavy data processing
✅Airflow to create the EMR cluster and terminate it once processing is complete, to save on cost
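In a real DAG this create-run-terminate flow is typically wired up with the Amazon provider's EmrCreateJobFlowOperator, EmrAddStepsOperator, EmrStepSensor, and EmrTerminateJobFlowOperator. The sketch below shows the same transient-cluster lifecycle in plain Python, with a stub standing in for a boto3 EMR client: the method names (run_job_flow, add_job_flow_steps, describe_step, terminate_job_flows) match boto3's EMR API, but everything is simulated locally, and the bucket path and job name are hypothetical.

```python
# Sketch of the transient-EMR pattern: create a cluster, submit a Spark
# step, check its state, then terminate the cluster so you only pay
# while the job runs. FakeEmrClient is a local stub -- no AWS calls.

class FakeEmrClient:
    def __init__(self):
        self.clusters = {}
        self.steps = {}

    def run_job_flow(self, Name, **kwargs):
        # boto3 returns {"JobFlowId": "j-..."} for a new cluster
        cluster_id = f"j-{len(self.clusters) + 1:04d}"
        self.clusters[cluster_id] = "RUNNING"
        return {"JobFlowId": cluster_id}

    def add_job_flow_steps(self, JobFlowId, Steps):
        step_ids = []
        for i, _ in enumerate(Steps):
            step_id = f"s-{JobFlowId}-{i}"
            self.steps[step_id] = "COMPLETED"  # pretend the Spark job succeeds
            step_ids.append(step_id)
        return {"StepIds": step_ids}

    def describe_step(self, ClusterId, StepId):
        return {"Step": {"Status": {"State": self.steps[StepId]}}}

    def terminate_job_flows(self, JobFlowIds):
        for cid in JobFlowIds:
            self.clusters[cid] = "TERMINATED"


def run_transient_pipeline(emr):
    """Create cluster -> submit Spark step -> poll state -> terminate."""
    cluster_id = emr.run_job_flow(Name="batch-etl-demo")["JobFlowId"]
    spark_step = {
        "Name": "spark-transform",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # hypothetical script location
            "Args": ["spark-submit", "s3://my-bucket/job.py"],
        },
    }
    step_id = emr.add_job_flow_steps(
        JobFlowId=cluster_id, Steps=[spark_step]
    )["StepIds"][0]
    state = emr.describe_step(
        ClusterId=cluster_id, StepId=step_id
    )["Step"]["Status"]["State"]
    # Terminate regardless of outcome so the cluster never lingers.
    emr.terminate_job_flows(JobFlowIds=[cluster_id])
    return cluster_id, state


emr = FakeEmrClient()
cluster_id, final_state = run_transient_pipeline(emr)
print(cluster_id, final_state, emr.clusters[cluster_id])
```

With the real operators, the same shape appears as DAG task dependencies (create >> add_steps >> step_sensor >> terminate), and the sensor handles the polling that describe_step stands in for here.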

Prerequisite:
---------------------
Transient Cluster on AWS from Scratch using boto3 | Trigger Spark job from AWS Lambda
   • Transient Cluster on AWS from Scratch...  
Connect Apache Airflow to Snowflake Data Warehouse
   • Connect Apache Airflow to Snowflake D...  
Reference Architecture for Batch ETL Workloads using AWS EMR & Best Practices
  / reference-architecture-for-batch-etl-workl...  


Code:
---------------
https://github.com/SatadruMukherjee/D...
https://github.com/SatadruMukherjee/D...
https://github.com/SatadruMukherjee/D...
https://github.com/SatadruMukherjee/D...


Check this playlist for more Data Engineering related videos:
   • Demystifying Data Engineering with Cl...  

Apache Kafka from scratch
   • Apache Kafka for Python Developers  

Snowflake Complete Course from scratch, with an end-to-end project and in-depth explanation:
https://doc.clickup.com/37466271/d/h/...

🙏🙏🙏🙏🙏🙏🙏🙏
YOU JUST NEED TO DO
3 THINGS to support my channel
LIKE
SHARE
&
SUBSCRIBE
TO MY YOUTUBE CHANNEL
