Unlocking Near Real Time Data Replication with CDC, Apache Spark™ Streaming, and Delta Lake

Tune in to DoorDash's journey of migrating from a flaky ETL system with 24-hour data delays to a standardized CDC streaming pattern across more than 150 databases that produces near real-time data in a scalable, configurable, and reliable manner.

Along the way, learn how we use Delta Lake to build a self-serve, read-optimized data lake with data latencies of 15 minutes while reducing operational overhead. You will also see how certain tradeoffs, such as accepting a system that is not fully real-time, allow for multiple optimizations while still supporting OLTP query use cases, and what benefits this provides.

Talk by: Ivan Peng and Phani Nalluri

Here’s more to explore:
Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV
The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us:
Website: https://databricks.com
Twitter: / databricks
LinkedIn: / databricks
Instagram: / databricksinc
Facebook: / databricksinc
