Run Apache Spark jobs on serverless Dataproc
Today, I'm excited to share a hands-on example of using a custom container to bundle all of a Spark job's dependencies and run the job on serverless Dataproc. This powerful feature provides a streamlined way to run Spark jobs without managing any infrastructure, while still offering advanced capabilities like fine-tuning autoscaling—all without the cost of a constantly running cluster. #ApacheSpark #GoogleCloud #Serverless #Dataproc #BigData
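As a rough sketch of the workflow covered in the video (the image path, project, region, and bucket names below are placeholders, not taken from the demo):

```shell
# Build and push the custom container that bundles the Spark job's
# dependencies (placeholder Artifact Registry path).
docker build -t europe-west2-docker.pkg.dev/my-project/spark/my-spark-job:latest .
docker push europe-west2-docker.pkg.dev/my-project/spark/my-spark-job:latest

# Submit the PySpark job to serverless Dataproc (Dataproc Serverless batches)
# using the custom image — no cluster to provision or tear down.
gcloud dataproc batches submit pyspark main.py \
  --region=europe-west2 \
  --container-image=europe-west2-docker.pkg.dev/my-project/spark/my-spark-job:latest \
  --deps-bucket=gs://my-staging-bucket
```

See the repo linked below for the actual code used in the walkthrough.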

00:17 - Table of Contents
01:19 - What is Dataproc?
01:53 - Dataproc vs serverless Dataproc
03:52 - Custom containers on Dataproc
08:14 - A real-world use case
11:33 - Code walkthrough
20:43 - See it in Action!
25:55 - Summary

Useful links
code: https://github.com/rocketechgroup/spa...
slides: https://docs.google.com/presentation/...
custom container: https://cloud.google.com/dataproc-ser...
serverless vs compute engine: https://cloud.google.com/dataproc-ser...
spark submit via REST: https://cloud.google.com/dataproc-ser...
service to service communication: https://cloud.google.com/run/docs/aut...