Airflow Sensors : Get started in 10 mins

👍 Smash the like button to become an Airflow Super Hero!
❤️ Subscribe to my channel to become a master of Airflow
🏆 BECOME A PRO: https://www.udemy.com/course/the-comp...

🚨 My Patreon:   / marclamberti  

Airflow Sensors are one of the most commonly used types of operators. Why? Because they wait until a criterion is met before completing. Do you need to wait for a file? Check whether a SQL entry exists? Delay the execution of your DAG? Those are just a few of the possibilities that Airflow Sensors offer. If you want to build complex and powerful data pipelines, you have to truly understand how Sensors work. Well, guess what, that’s exactly what you are going to discover now.
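To make the idea of "checking until a criterion is met" concrete, here is a minimal plain-Python sketch of the polling loop a sensor performs conceptually. This is not Airflow's implementation; `wait_until`, `file_has_arrived` and the interval/timeout values are hypothetical names and numbers picked for illustration.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], poke_interval: float, timeout: float) -> bool:
    """Return True once condition() is truthy, or False if the timeout would be exceeded."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        # Give up if the next check would happen after the deadline.
        if time.monotonic() + poke_interval > deadline:
            return False
        time.sleep(poke_interval)


# Hypothetical criterion: pretend the file shows up on the third check.
attempts = {"count": 0}


def file_has_arrived() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


result = wait_until(file_has_arrived, poke_interval=0.01, timeout=1.0)
```

In a real DAG you would not write this loop yourself; a built-in sensor takes care of the checking, the interval and the timeout for you.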

A really common use case is when you have multiple partners (A, B and C in this example) and you wait for the data coming from each of them every day at a more or less specific time. For example, Partner A sends you data at 9:00 AM, B at 9:30 AM and C at 10:00 AM. Hopefully without delay, but we will come back to this later. So, your goal is to wait for all files to be available before moving on to the task Process. Ok, that being said, what are the tasks Partner A, B and C exactly?

Well, when people are not aware of Sensors, they tend to use the PythonOperator. As they need to wait for a file, they create a Python function, put their waiting logic in it, and call that function with the PythonOperator. This is the worst way to do it. Don’t do this, forget about it. You will not leverage the benefits of Airflow and it will be a nightmare to maintain. Don’t do it. Therefore, what’s the solution?

Airflow Sensors! 😎

Airflow ships with many different sensors. Here is a non-exhaustive list of the most commonly used:

The FileSensor: Waits for a file or folder to land in a filesystem.
The S3KeySensor: Waits for a key to be present in an S3 bucket.
The SqlSensor: Runs a SQL statement repeatedly until a criterion is met.
The HivePartitionSensor: Waits for a partition to show up in Hive.
The ExternalTaskSensor: Waits for a different DAG or a task in a different DAG to complete for a specific execution date. (Pretty useful that one 🤓 )
