Advancing Spark - Bloom Filter Indexes in Databricks Delta

Data lakes are notoriously bad at single-record lookups, the kind of query where you are looking for a specific ID among millions of records. We have ways of organising our data, through partitions and Z-ordering, which help a little... but wouldn't it be great if we could just pop an index over the top?

Turns out we can - in this video Simon runs through a quick introduction to using Bloom Filter Indexes with Databricks Delta. We look briefly at the available documentation, before digging into a notebook example and how the files are managed underneath!
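
If you have not met Bloom filters before, here is a rough Python sketch of the general idea: keep a small probabilistic filter per data file, and only open the files whose filter says the ID might be in there. This is a toy illustration, not how Databricks implements it. The file names, record IDs, bit-array size and hash scheme are all made up for the example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k salted hash functions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value: str):
        # Derive k positions by salting a SHA-256 hash with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


# Hypothetical data files, each holding a handful of record IDs.
files = {
    "part-0000.parquet": ["id-001", "id-002", "id-003"],
    "part-0001.parquet": ["id-101", "id-102", "id-103"],
}

# Build one filter per file, roughly as an index would at write time.
index = {}
for name, ids in files.items():
    bf = BloomFilter()
    for record_id in ids:
        bf.add(record_id)
    index[name] = bf


def files_to_scan(record_id: str) -> list:
    """Return only the files whose filter says the ID might be present."""
    return [name for name, bf in index.items() if bf.might_contain(record_id)]
```

Calling `files_to_scan("id-102")` always includes `part-0001.parquet`, because a Bloom filter never gives a false negative. Other files are skipped unless their filter happens to return a false positive, which is the trade-off you tune with the filter size and number of hashes.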

To get started with Bloom indexes, see the documentation here: https://docs.databricks.com/delta/opt...

And the demo notebook is available from: https://docs.databricks.com/_static/n...

As always, don't forget to like & subscribe, and let us know what you think in the comments!
