How To Perform First ETL On Your Spark Cluster: Planning in Jupyter & Executing with Spark-Submit

Video description

In production, an ETL process needs multiple nodes, depending on the size of the file and the number of different transformations you apply to it. This video walks through the following steps:

1) Use a Kaggle notebook to explore the data
2) Write a Python script based on the findings in the notebook
3) Test and perfect every part of the execution
4) Execute the Python script on the cluster
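The workflow ends with running the notebook-derived script on the cluster with spark-submit, as the title says. Below is a minimal sketch of one way such a script might be structured. It reads its input and output locations from the command line, builds its own SparkSession (a notebook usually provides one already), and stops the session at the end.

Several pieces are assumptions and are not taken from the linked repository:
- the file name `etl_job.py`
- the `--input`/`--output` argument names
- the use of `enableHiveSupport()`, which requires Hive classes and a metastore to be available on the cluster

```python
import argparse
import sys


def parse_args(argv=None):
    """Read the (hypothetical) input and output locations from the command line."""
    parser = argparse.ArgumentParser(description="Customer ETL job (sketch)")
    parser.add_argument("--input", required=True, help="Path or URI of customer.csv")
    parser.add_argument("--output", required=True, help="Directory for the CSV output")
    return parser.parse_args(argv)


def main(argv=None) -> int:
    args = parse_args(argv)
    # Import lazily so the argument handling can be checked without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("customer-etl")
        .enableHiveSupport()  # assumption: the cluster provides a Hive metastore
        .getOrCreate()
    )
    try:
        df = spark.read.csv(args.input, header=True, inferSchema=True)
        # Transformation and write logic validated in the notebook goes here,
        # writing its CSV results to args.output.
        print(f"Read {df.count()} rows from {args.input}")
    finally:
        spark.stop()
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The script could then be launched with something like `spark-submit --master yarn etl_job.py --input <path>/customer.csv --output <dir>`. Everything after the script name is passed to the script's own arguments rather than to spark-submit.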

The notebook code and data used in this video are at:
https://www.kaggle.com/code/kamaljp/p...

The Python pipeline script is at:
https://github.com/Kamalabot/emr_inst...

The required transformations are:
1) Modify the column names in the Spark DataFrame
2) Create a new table named customer_spark_table in the Spark metastore
3) Apply a simple filter transformation: select the rows with income above 15000 and spending power above 50
4) Write a new table to the Spark metastore
5) Write the new table out as a CSV file
6) Convert the Jupyter notebook cells into a PySpark script that runs on the given CSV file (it works with the customer.csv file only)

Supporting playlists:
Python Data Engineering Playlist
   • Learn to Data Engineer and Problem Solve: ...
Python Ecosystem of Libraries
   • Mastering the Python Ecosystem: Must-Know ...
ChatGPT and AI Playlist
   • Learn about AI Language Models and Reinfor...
AWS and Python AWS Wrangler
   • Building a Powerful Data Pipeline with AWS...

PS: Got a question or feedback on my content? Get in touch by leaving a comment on the video, or via:
@mail [email protected]
@twitter Handle is @KQrios
@medium   / about
@github https://github.com/Kamalabot
