3.06 Mastering Common Silver and Gold zone transformations with PySpark in Microsoft Fabric

Video description

Microsoft Fabric For Beginners
This video explores common transformation techniques in the Silver and Gold zones. I explain data enrichment and type conversion transformations and demonstrate how to use PySpark APIs and methods to address these tasks.
I also demonstrate how to process historical data from the Bronze layer using Window functions. Next, I explain core Kimball dimensional modelling concepts and demonstrate how they can be implemented using PySpark methods.
Finally, I demonstrate creating aggregates.
You can download the related demo notebook from here: https://github.com/fazizov/youtube/bl...

Chapters:
00:00- Introduction
02:21- Preview
06:19- Lakehouse historical data storage strategy
09:00- Demo start- preparing data
10:24- Creating shortcuts to Bronze tables
11:24- Notebook demo- reading data from shortcuts
12:30- Inspecting data frame schema
13:48- Data Type conversion transformations
16:05- Ordering data
20:00- Handling historical data using Window functions
24:25- Data enrichment transformations
25:45- Using regular expressions to parse text data
26:40- Generating time dimension
30:45- Dimensional modelling concepts
32:12- Slowly changing dimensions (SCD)
33:05- SCD Type-2 dimensions
34:54- Surrogate keys
35:32- Relationships between facts and dimensions
37:00- Generating surrogate keys using monotonically_increasing_id function
38:00- Distributed computing and Spark partitions
41:31- Reducing data frame partition count
43:02- How to link Fact and Dimension tables
47:14- Incremental write into destination tables
49:02- Using MERGE INTO query for destination write
50:50- Aggregation transformations

Please subscribe: @fazizov

Official Documentation:
https://learn.microsoft.com/en-us/fab...
https://learn.microsoft.com/en-us/fab...
https://sparkbyexamples.com/pyspark/p...
https://www.kimballgroup.com/data-war...
https://spark.apache.org/docs/latest/...

Hashtags:
#datafactory, #microsoft, #microsoftfabric, #azure, #dataengineering, #cloudcomputing, #dataanalytics, #lakehouse, #azuretutorial, #azuretraining, #datapipeline, #dataextraction, #dataintegration, #datatransfer, #dataflow, #spark, #deltalake, #synapse, #synapsedataenginering, #demo, #datalake, #transformation, #ingested, #datawarehouse, #azuredatabricks, #databricks, #bigdata, #bigdatatechnologies, #pyspark, #sparksql, #notebook, #transformationvideo, #bronze, #medallion, #kimball, #dimensions, #modeling, #facts, #silver, #gold, #historical data, #dimensional
