10x Spark performance improvement in Microsoft Fabric

Video description

Boosting Apache Spark Performance with Small JSON Files in Microsoft Fabric. Learn how to achieve a 10x performance improvement when ingesting small JSON files in Apache Spark hosted on Microsoft Fabric. Ian Griffiths, Technical Fellow at endjin, shares insights and techniques to overcome Spark's challenges with numerous small files, including parallelizing file discovery and optimizing data loading. Follow along for detailed steps and tips to significantly enhance your Spark data processing workflows using Apache Spark in Microsoft Fabric.

00:00 Introduction to Performance Improvement in Apache Spark
00:20 Understanding the Problem with Small Files in Spark
00:38 Our Scenario: Performance Telemetry Collection
01:20 Initial Approach and Disappointment
01:40 Exploring the Root Cause
05:27 Parallelization: The Key to Performance Boost
08:51 Implementing the Solution in Spark
12:43 Conclusion: Balancing Complexity and Performance
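
The description names parallelizing file discovery as one of the techniques. As a rough illustration only, and not the code shown in the video, the sketch below lists several folders concurrently with a thread pool instead of walking them one at a time. The function names `list_json_files` and `discover_json_files`, the `max_workers` default, and the use of a local filesystem are all assumptions made for this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_json_files(folder):
    """Return paths of the .json files directly inside one folder."""
    with os.scandir(folder) as entries:
        return [e.path for e in entries if e.is_file() and e.name.endswith(".json")]


def discover_json_files(folders, max_workers=8):
    """List several folders concurrently and return one sorted list of paths.

    Hypothetical helper: listing is I/O-bound, so threads can overlap the
    waiting time of each directory listing.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_folder = pool.map(list_json_files, folders)
        return sorted(path for paths in per_folder for path in paths)
```

In PySpark, a list of paths like this can be handed to a single `spark.read.json(paths)` call, so the files are loaded in one read rather than one read per file. Whether that matches the approach in the video is not stated in the description.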

#microsoftfabric #apachespark #performance #spark #python #benchmark
