Efficient Data Analysis on Larger-than-Memory Data with DuckDB and Arrow

Speaker: Thomas Mock, Customer Enablement Lead at RStudio

This lightning talk gives an overview of how dplyr, arrow, and duckdb can be used together to quickly explore and analyze larger-than-memory datasets. arrow makes it possible to query on-disk files or multi-file datasets with dplyr syntax even when they do not fit into memory. duckdb adds a zero-copy interface on top of arrow, which expands the set of available functions and makes it practical to query the data interactively without first collecting it into memory. Combined, arrow, duckdb, and dplyr make data analysis on larger-than-memory datasets a breeze!
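
The workflow described above might look roughly like the sketch below. It is not taken from the talk: the built-in mtcars data frame stands in for a dataset that is too large for memory, and the names ds_path, by_cyl, and top2 are illustrative only. It assumes the dplyr, dbplyr, arrow, and duckdb packages are installed.

```r
library(dplyr)
library(arrow)
library(duckdb)

# Assumption: a small built-in data frame written out as a partitioned
# Parquet dataset, standing in for data that does not fit into memory.
ds_path <- file.path(tempdir(), "mtcars_ds")
write_dataset(mtcars, ds_path, format = "parquet", partitioning = "cyl")

# open_dataset() reads only file metadata; no rows are loaded yet.
ds <- open_dataset(ds_path)

# dplyr verbs on the dataset are evaluated lazily by arrow.
# collect() brings only the small aggregated result into memory.
by_cyl <- ds %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n()) %>%
  collect()

# to_duckdb() hands the same dataset to DuckDB without copying it.
# From here dbplyr translates the pipeline to SQL, so a window
# function such as min_rank() can be used before collecting.
top2 <- ds %>%
  to_duckdb() %>%
  group_by(cyl) %>%
  mutate(mpg_rank = min_rank(desc(mpg))) %>%
  filter(mpg_rank <= 2) %>%
  select(cyl, mpg, hp, mpg_rank) %>%
  collect()
```

In both pipelines the heavy lifting happens before collect(), so only the final, much smaller result is materialized as an in-memory data frame.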
