Benchmarking R functions for joining data frames (CC292)

We often need to join two or more data frames to link different pieces of data together. What's the most efficient way to do this in R? In this Code Club, Pat uses base R's merge function, as well as functions from the dplyr and data.table packages, to show how to perform an inner join, full join, left join, and right join. He then benchmarks their performance to see which inner join option is the fastest. This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
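
As a rough illustration of the four join types named above, here is a minimal sketch using base R's merge function. The taxa and counts data frames and their columns are hypothetical and are not taken from the episode. The dplyr counterparts are inner_join, left_join, right_join, and full_join.

```r
# Hypothetical example data (not from the episode)
taxa <- data.frame(
  id = c(1, 2, 3),
  genus = c("Bacteroides", "Escherichia", "Blautia")
)
counts <- data.frame(
  id = c(2, 3, 4),
  n = c(10, 20, 30)
)

# Inner join: keep only ids present in both data frames
inner <- merge(taxa, counts, by = "id")

# Left join: keep every row of taxa, filling missing counts with NA
left <- merge(taxa, counts, by = "id", all.x = TRUE)

# Right join: keep every row of counts, filling missing genera with NA
right <- merge(taxa, counts, by = "id", all.y = TRUE)

# Full join: keep every id found in either data frame
full <- merge(taxa, counts, by = "id", all = TRUE)
```

With this toy data the inner join keeps ids 2 and 3, the left and right joins each keep three rows, and the full join keeps all four ids.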

If you want to get a physical copy of R Packages: https://amzn.to/43pMR8L
If you want a free, online version of R Packages: https://r-pkgs.org/

You can find my blog post for this episode at https://www.riffomonas.org/code_club/....

Check out the GitHub repository at the:
* Beginning of the episode: https://github.com/riffomonas/phyloty...
* End of the episode: https://github.com/riffomonas/phyloty...

#rstats #readr #vroom #data.table #read.delim #rdp #16S #classification #classifier #microbialecology #microbiome

Support Riffomonas by becoming a Patreon member!
  / riffomonas  

Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at https://shop.riffomonas.org/youtube to get practice problems, tips, and insights.

If you're interested in purchasing a video workshop, be sure to check out https://riffomonas.org/workshops/

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: https://www.riffomonas.org/minimalR/
General data: https://www.riffomonas.org/generalR/



0:00 Introduction
7:07 base R
10:06 {dplyr}
13:16 {data.table}
21:09 Joining data frames with different column names
26:37 Joining three data frames together
28:30 Benchmarking join methods
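
The last three chapters can be sketched in base R as follows. This is only an illustration under stated assumptions: the samples, reads, and runs data frames are hypothetical, and system.time is used here as a simple stand-in for timing, since the description does not say which benchmarking tool the episode uses.

```r
# Hypothetical example data (not from the episode)
samples <- data.frame(sample_id = c("a", "b", "c"),
                      site = c("gut", "skin", "oral"))
reads <- data.frame(sample = c("a", "b", "c"),
                    n_reads = c(100, 250, 75))
runs <- data.frame(sample_id = c("a", "b", "c"),
                   run = c("r1", "r1", "r2"))

# Join on key columns that have different names in each data frame;
# the result keeps the name given in by.x ("sample_id")
two <- merge(samples, reads, by.x = "sample_id", by.y = "sample")

# Join a third data frame onto the result of the first join
three <- merge(two, runs, by = "sample_id")

# Crude timing of repeated inner joins with base R's system.time
# (an assumption; a dedicated benchmarking package would give finer detail)
timing <- system.time(
  for (i in 1:100) {
    merge(samples, reads, by.x = "sample_id", by.y = "sample")
  }
)
elapsed_seconds <- timing[["elapsed"]]
```

Toy data this small says little about real performance; meaningful comparisons need data frames of realistic size and many repetitions.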
