Reddit Data Pipeline Engineering | AWS End to End Data Engineering

🚀 In this video, we walk you through the integration of Reddit, Airflow, Celery, Postgres, S3, AWS Glue, Athena, and Redshift to create a seamless ETL process. 📊🔍

MORE FREE COURSES: https://datamasterylab.com

What You Will Learn 📝:
🌐 How to extract data from Reddit using its API.
🔄 Setting up and orchestrating ETL processes with Apache Airflow and Celery.
📦 Storing data efficiently in Amazon S3 using Airflow.
🧠 Leveraging AWS Glue for data cataloging and ETL jobs.
📜 Querying and transforming data with Amazon Athena.
🏢 Setting up a Redshift cluster and best practices for loading data into Amazon Redshift for analytics.
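
To give a feel for the cleaning step between extraction and storage, here is a minimal sketch in plain Python. It is not the code from the video: the field list `POST_FIELDS` and the helpers `transform_post` and `posts_to_csv` are hypothetical names chosen for illustration, and it assumes each post arrives as a dict with `created_utc` in epoch seconds and a missing author represented as `None`.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical subset of fields kept from each Reddit post; the field list
# used in the video may differ.
POST_FIELDS = ["id", "title", "score", "num_comments", "author", "created_utc", "over_18"]


def transform_post(raw: dict) -> dict:
    """Normalize one raw post dict into a flat, typed row."""
    created = datetime.fromtimestamp(float(raw["created_utc"]), tz=timezone.utc)
    return {
        "id": str(raw["id"]),
        "title": str(raw.get("title", "")).strip(),
        "score": int(raw.get("score", 0)),
        "num_comments": int(raw.get("num_comments", 0)),
        # Assumption: a deleted author shows up as None in the raw data.
        "author": str(raw.get("author") or "[deleted]"),
        # Convert epoch seconds to an ISO 8601 UTC timestamp.
        "created_utc": created.isoformat(),
        "over_18": bool(raw.get("over_18", False)),
    }


def posts_to_csv(raw_posts: list[dict]) -> str:
    """Transform all posts and serialize them as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=POST_FIELDS, lineterminator="\n")
    writer.writeheader()
    for raw in raw_posts:
        writer.writerow(transform_post(raw))
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"id": "abc", "title": "  Hello ", "score": "5",
               "num_comments": 2, "author": None, "created_utc": 0}]
    print(posts_to_csv(sample))
```

In a pipeline like the one in this video, an Airflow task could write the resulting CSV text to a file and a later task could upload it to S3; those steps are left out of this sketch.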

⏰ Timestamps:
0:00 Introduction
1:27 Setting up Apache Airflow with Celery Backend and Postgres
9:20 Reddit Data Pipeline with Airflow
41:00 Cleaning and Transforming Reddit Data
50:00 Connecting to AWS from Airflow
1:11:17 AWS Glue data transformation
1:22:13 Querying Data with Athena
1:24:47 Setting up Redshift Data Warehouse
1:27:26 Redshift Data Warehouse Query Tool
1:29:00 Loading Data into Data Warehouse
1:32:25 Charting with Redshift Data Warehouse


🔗 Useful Links:
Source Code: https://github.com/airscholar/RedditD...
Starting with Reddit:   / api  
Creating Reddit App:   / apps  
Apache Airflow Official Site: https://airflow.apache.org/docs/
AWS Glue Documentation: https://docs.aws.amazon.com/glue/late...

💬 Let us know in the comments if you have any questions or if there's another topic you'd like us to cover next!

🌟 Don't forget to like, share, and subscribe for more data tutorials! 🌟

My LinkedIn:   / yusuf-ganiyu-b90140107  
Medium:   / yusuf.ganiyu  
X: https://x.com/YusufOGaniyu
