Building an open source data lake at scale in the cloud

Video description

by Adrian Woodhead

At: FOSDEM 2020
https://video.fosdem.org/2020/UB5.132...

This presentation will give an overview of the tools, software, patterns and approaches that Expedia Group uses to operate a number of large-scale data lakes, both in the cloud and on premises. Expedia Group's data journey is probably similar to that of many others who have worked in this space over the past two decades: scaling out from relational databases, to on-premises Hadoop clusters, to a much wider ecosystem in the cloud. This talk will give an overview of that journey and then describe the open source components that Expedia Group has used and built to create multi-petabyte data lakes. These include existing open source projects such as Hive, Hadoop, Terraform, Docker and Kubernetes, as well as open source tools that we built to overcome some of the unexpected challenges we faced.

The first of these is Circus Train, a dataset replication tool that copies Hive tables between clusters and clouds. We will also discuss other options for dataset replication and the features that are unique to Circus Train. The second tool is Waggle Dance, a federated Hive metadata service that enables querying of data stored across multiple Hive metastores. We will then look at Apiary, a means of simplifying the deployment of the various components of an open source data lake at scale, including the Hive metastore, Waggle Dance, S3 bucket access, metadata change notifications and much more. We focus on actual problems and solutions that have arisen in a huge, organically grown corporation, rather than on idealised architectures.

Room: UB5.132
Scheduled start: 2020-02-02 08:30:00
