3.05 Mastering Bronze Layer Transformations with PySpark in Microsoft Fabric Lakehouse

   • Microsoft Fabric For Beginners  
This video explores common transformation techniques for Microsoft Fabric Lakehouse's Bronze layer. I explain common data cleansing and refinement tasks and demonstrate how these tasks can be implemented using PySpark methods.
I also demonstrate common parsing techniques for JSON data.
You can download related material from here:
Demo notebook: https://github.com/fazizov/youtube/bl...
Sample files: https://github.com/fazizov/youtube/bl...
Chapters:
00:00- Introduction
02:05- Preview
03:37- Demo start- upload source files
05:26- PySpark notebook demo - reading CSV files using read APIs
06:10- Exploring data irregularities
08:14- Renaming columns using withColumnRenamed method
10:04- Eliminating unnecessary columns by using drop and select methods
11:33- Filtering out dirty data using filter and where methods
12:40- Adding source metadata- filename and current timestamp
14:32- Deduplication using dropDuplicates method
16:15- Writing into Delta Lake table using the write method. Selecting correct write mode- append vs overwrite
19:25- Parsing JSON data using read APIs
22:53- Referring to JSON node names
23:30- Flattening array type data by using the explode method
24:47- Simple technique to extract all subfields of JSON node at once

Please subscribe: @fazizov

Official Documentation:
https://learn.microsoft.com/en-us/fab...
https://learn.microsoft.com/en-us/fab...
https://spark.apache.org/docs/3.1.2/a...
https://spark.apache.org/docs/3.1.3/a...
https://sparkbyexamples.com/pyspark/p...
https://sparkbyexamples.com/pyspark/p...

