Data Engineer PySpark Data Bricks Session Day 4

Video description

📍 𝐒𝐭𝐚𝐫𝐭 𝐚 𝐒𝐩𝐚𝐫𝐤 𝐒𝐞𝐬𝐬𝐢𝐨𝐧 : Set up the PySpark environment.
🧣 𝐂𝐫𝐞𝐚𝐭𝐞 𝐚 𝐋𝐢𝐬𝐭 : Define the list with three elements.
📢 𝐏𝐚𝐫𝐚𝐥𝐥𝐞𝐥𝐢𝐳𝐞 𝐭𝐡𝐞 𝐋𝐢𝐬𝐭 : Distribute the list across the cluster nodes.
🔔 𝐂𝐨𝐧𝐯𝐞𝐫𝐭 𝐭𝐨 𝐃𝐚𝐭𝐚𝐅𝐫𝐚𝐦𝐞 : Convert the distributed RDD to a DataFrame.
🔋 𝐏𝐞𝐫𝐟𝐨𝐫𝐦 𝐏𝐫𝐨𝐜𝐞𝐬𝐬𝐢𝐧𝐠 : Show the contents and perform any desired operations.

📍 This video explains how to write a first program in PySpark.

📢 Video Link: https://lnkd.in/gmE_dAcG

LinkedIn Profile of author:
  / sachin-saxena-graphic-designer  

Code Source Link:
https://lnkd.in/g67a4kY3

𝐄𝐱𝐩𝐥𝐚𝐧𝐚𝐭𝐢𝐨𝐧 𝐨𝐟 𝐭𝐡𝐞 𝐂𝐨𝐝𝐞 :

𝟏. 𝐒𝐩𝐚𝐫𝐤 𝐒𝐞𝐬𝐬𝐢𝐨𝐧 : The SparkSession is created to provide an entry point for Spark functionality.
𝟐. 𝐋𝐢𝐬𝐭 𝐂𝐫𝐞𝐚𝐭𝐢𝐨𝐧 : A list of three elements is defined.
𝟑. 𝐏𝐚𝐫𝐚𝐥𝐥𝐞𝐥𝐢𝐳𝐞 : The list is parallelized with numSlices=3, so each of the three elements lands in its own RDD partition. Spark can then schedule those partitions on different executors, although the partition count alone does not guarantee three separate nodes.
𝟒. 𝐂𝐨𝐧𝐯𝐞𝐫𝐭 𝐭𝐨 𝐃𝐚𝐭𝐚𝐅𝐫𝐚𝐦𝐞 : Each RDD element is wrapped in a one-element tuple so that the RDD can be converted into a DataFrame with a single column named "element".
𝟓. 𝐃𝐢𝐬𝐩𝐥𝐚𝐲 𝐃𝐚𝐭𝐚𝐅𝐫𝐚𝐦𝐞 : The contents of the DataFrame are printed using df.show(), which will display each element as a separate row.
𝟔. 𝐂𝐨𝐮𝐧𝐭 : The total number of elements is counted and printed.
𝟕. 𝐅𝐮𝐫𝐭𝐡𝐞𝐫 𝐏𝐫𝐨𝐜𝐞𝐬𝐬𝐢𝐧𝐠 : An optional step is included to filter the DataFrame for elements containing "1" and display the result.
𝟖. 𝐒𝐭𝐨𝐩 𝐒𝐩𝐚𝐫𝐤 𝐒𝐞𝐬𝐬𝐢𝐨𝐧 : Finally, the Spark session is stopped to release resources.

3:54 Databricks source
6:00 Show the number of students in the file
16:00 Map and Flatmap in PySpark
29:00 GroupBy in PySpark
30:00 Show the total marks achieved by Female and Male students
32:00 Show the total number of students that have passed and failed
33:10 Filter the data: 50+ marks are required to pass the course
40:00 Show the total number of students enrolled per course
51:00 Show the total marks that students have achieved per course
52:00 Show the average marks that students have achieved per course
55:00 Show the minimum and maximum marks achieved per course
57:00 Show the average age of male and female students
