Working with Skewed Data: The Iterative Broadcast - Rob Keevil & Fokko Driesprong

#ApacheSpark #DataScience

Video description: Working with Skewed Data: The Iterative Broadcast - Rob Keevil & Fokko Driesprong

"Skewed data is the enemy when joining tables using Spark. It shuffles a large proportion of the data onto a few overloaded nodes, bottlenecking Spark's parallelism and resulting in out-of-memory errors. The go-to answer is to use broadcast joins: leaving the large, skewed dataset in place and transmitting a smaller table to every machine in the cluster for joining. But what happens when your second table is too large to broadcast and does not fit into memory? Or, even worse, when a single key is bigger than the total size of your executor? First, we will give an introduction to the problem. Second, the current ways of fighting the problem will be explained, including why these solutions are limited. Finally, we will demonstrate a new technique - the iterative broadcast join - developed while processing ING Bank's global transaction data. This technique, implemented on top of the Spark SQL API, allows multiple large and highly skewed datasets to be joined successfully while retaining a high level of parallelism, which is not possible with existing Spark join types.

Session hashtag: #EUde11"
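
The abstract does not spell out how the iterative broadcast join works. One plausible reading is that the table that is too large to broadcast is cut into slices, and each slice is broadcast and joined in its own pass while the skewed table stays where it is. The plain-Python sketch below simulates that idea without Spark. The function name, the round-robin slicing, and the sample data are all illustrative assumptions, not the speakers' implementation.

```python
from collections import defaultdict


def iterative_broadcast_join(large_partitions, medium_rows, num_passes):
    """Inner-join on key, broadcasting one slice of medium_rows per pass.

    large_partitions: list of partitions, each a list of (key, value) pairs.
                      These rows are never moved between partitions.
    medium_rows:      list of (key, value) pairs, assumed too big to
                      broadcast in one go.
    """
    # Cut the medium table into num_passes slices. Round-robin slicing is
    # an assumption here; a random pass id per row would work as well.
    slices = [medium_rows[i::num_passes] for i in range(num_passes)]

    # One output partition per input partition, so skewed keys stay put.
    joined = [[] for _ in large_partitions]

    for chunk in slices:
        # "Broadcast": a small lookup table every partition can read.
        lookup = defaultdict(list)
        for key, right in chunk:
            lookup[key].append(right)

        # Each partition joins its own rows against the current slice only.
        for idx, partition in enumerate(large_partitions):
            for key, left in partition:
                for right in lookup.get(key, ()):
                    joined[idx].append((key, left, right))
    return joined


# Hypothetical data: the key "NL" is heavily over-represented.
large = [
    [("NL", 1), ("NL", 2), ("NL", 3)],
    [("NL", 4), ("DE", 5)],
]
medium = [("NL", "Netherlands"), ("DE", "Germany"), ("FR", "France")]

result = iterative_broadcast_join(large, medium, num_passes=2)
```

In this simulation each pass only needs to hold one slice of the smaller table in memory, and no row of the skewed table ever changes partition. The cost is that the large table is scanned once per pass.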

About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unifie...
