5.2 - Airflow-Dataproc Integration | Apache Spark on Dataproc | Google Cloud Series


Apache Airflow is a popular choice among data engineers for orchestrating large-scale data pipelines, and it integrates with many tools, such as Apache Pig, Apache Hive, Apache Pinot, Google Kubernetes Engine, and Google Dataproc, to name a few.

In this video, we'll discuss Airflow's integration with Dataproc and see how to set up a simple workflow that creates a transient cluster, submits a job, and then deletes the cluster.
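The create, submit, delete workflow can be sketched as an Airflow DAG. This is a minimal sketch, not the author's sample DAG: it assumes Airflow 2.4+ with the `apache-airflow-providers-google` package installed, and the project ID, region, cluster name, GCS path, machine type, and the helper functions `make_cluster_config` / `make_pyspark_job` are hypothetical placeholders for illustration.

```python
from datetime import datetime

# Hypothetical placeholders: replace with your own project, region, cluster and script.
PROJECT_ID = "my-gcp-project"
REGION = "us-central1"
CLUSTER_NAME = "transient-spark-cluster"
PYSPARK_URI = "gs://my-bucket/jobs/wordcount.py"


def make_cluster_config(machine_type: str = "n1-standard-2", num_workers: int = 2) -> dict:
    """Hypothetical helper: minimal Dataproc ClusterConfig with one master and a small worker pool."""
    disk = {"boot_disk_type": "pd-standard", "boot_disk_size_gb": 50}
    return {
        "master_config": {"num_instances": 1, "machine_type_uri": machine_type, "disk_config": disk},
        "worker_config": {"num_instances": num_workers, "machine_type_uri": machine_type, "disk_config": disk},
    }


def make_pyspark_job(project_id: str, cluster_name: str, main_uri: str) -> dict:
    """Hypothetical helper: Dataproc Job resource that runs a PySpark script on the given cluster."""
    return {
        "reference": {"project_id": project_id},
        "placement": {"cluster_name": cluster_name},
        "pyspark_job": {"main_python_file_uri": main_uri},
    }


try:
    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateClusterOperator,
        DataprocDeleteClusterOperator,
        DataprocSubmitJobOperator,
    )
    from airflow.utils.trigger_rule import TriggerRule
except ImportError:  # lets the helpers above be imported without Airflow installed
    DAG = None

if DAG is not None:
    with DAG(
        dag_id="dataproc_transient_cluster",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # manual trigger; Airflow < 2.4 uses schedule_interval instead
        catchup=False,
    ) as dag:
        # 1. Spin up a short-lived cluster for this run.
        create_cluster = DataprocCreateClusterOperator(
            task_id="create_cluster",
            project_id=PROJECT_ID,
            region=REGION,
            cluster_name=CLUSTER_NAME,
            cluster_config=make_cluster_config(),
        )
        # 2. Submit the PySpark job to that cluster and wait for it to finish.
        submit_job = DataprocSubmitJobOperator(
            task_id="submit_pyspark_job",
            project_id=PROJECT_ID,
            region=REGION,
            job=make_pyspark_job(PROJECT_ID, CLUSTER_NAME, PYSPARK_URI),
        )
        # 3. Delete the cluster whether the job succeeded or failed.
        delete_cluster = DataprocDeleteClusterOperator(
            task_id="delete_cluster",
            project_id=PROJECT_ID,
            region=REGION,
            cluster_name=CLUSTER_NAME,
            trigger_rule=TriggerRule.ALL_DONE,
        )
        create_cluster >> submit_job >> delete_cluster
```

Setting `trigger_rule=TriggerRule.ALL_DONE` on the delete task is the design choice that makes the cluster truly transient: without it, a failed Spark job would skip the teardown and leave the cluster running (and billing).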

This video is part of the course Apache Spark on Dataproc. You can find all the videos for this course in the following playlist.
   • Apache Spark on Dataproc | Google Clo...  

I regularly blog and post on my other social media channels, so make sure to follow me there as well.

Sample DAG: https://gist.github.com/kaysush/ade06...
PySpark Code: https://gist.github.com/kaysush/65fdd...

Medium: / sushil_kumar
Github: https://github.com/kaysush
Linkedin: / sushilkumar93
