Comparing duckdb and duckplyr to tibbles, data.tables, and data.frames (CC279)

duckdb has quickly grown in popularity as a database platform that is super fast with large datasets. Watch as Pat shows how to generate a duckdb database and access values from it. He'll also compare the performance of using duckdb directly against using duckplyr, tibbles, data.tables, and data.frames. Pat will discuss how performance changes with the number of distinct key values and the size of the database. You'll likely be surprised by the results! This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.

If you want to get a physical copy of R Packages: https://amzn.to/43pMR8L
If you want a free, online version of R Packages: https://r-pkgs.org/

You can find my blog post for this episode at https://www.riffomonas.org/code_club/....

Check out the GitHub repository at the:
* Beginning of the episode: https://github.com/riffomonas/phyloty...
* End of the episode: https://github.com/riffomonas/phyloty...


#rstats #microbenchmark #vectors #rdp #16S #classification #classifier #microbialecology #microbiome

Support Riffomonas by becoming a Patreon member! (Patreon: riffomonas)

Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at https://shop.riffomonas.org/youtube to get practice problems, tips, and insights.

If you're interested in purchasing a video workshop, be sure to check out https://riffomonas.org/workshops/

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: https://www.riffomonas.org/minimalR/
General data: https://www.riffomonas.org/generalR/


0:00 Introduction
6:07 Improve construction of data.table objects
11:12 Performance of which vs. logical
16:04 Improved access to values in data.table objects
20:31 Using duckdb() to store and access data
27:11 Using duckplyr() to store and access data
30:05 Evaluating sensitivity to number of rows and sparsity
32:11 Improving performance of sparse matrix construction
