Creating an executable R script that you can parallelize with GNU Make (CC128)

Writing a standalone executable program as an R script can be a powerful way to speed up an analysis and make it more reproducible. In this episode of Code Club, Pat modifies a script he made earlier so that it performs a single training/testing split, then uses GNU Make to control the creation of 100 splits. Wow! Finally, he pools the data from those splits to generate some summary figures. This episode is part of a wider effort to demonstrate the utility of the #mikropml package his lab created to facilitate machine learning analyses. The data come from a microbiome study his lab published looking for biomarkers associated with colorectal cancer.
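To give a rough idea of what "an executable R script that does a single split" can look like, here is a minimal sketch. It is not the script from the episode: the file names (`run_split.R`, `results_<seed>.Rds`) are hypothetical, and it uses `otu_mini_bin`, the small example dataset bundled with mikropml, rather than the colorectal cancer data. The seed is read from the command line so that each invocation produces one split.

```r
#!/usr/bin/env Rscript
# Sketch only: the file names and example dataset are assumptions,
# not the code written in the episode.

library(mikropml)

# Read the seed from the command line, e.g. `./run_split.R 1`
args <- commandArgs(trailingOnly = TRUE)
seed <- suppressWarnings(as.integer(args[1]))
if (is.na(seed)) {
  stop("Usage: run_split.R <integer seed>")
}

# otu_mini_bin ships with mikropml; its outcome column is named dx.
# The seed controls the training/testing split, so each seed gives
# one reproducible split.
result <- run_ml(
  otu_mini_bin,
  method = "glmnet",
  outcome_colname = "dx",
  seed = seed
)

# Write one output file per seed (hypothetical naming scheme) so that
# a build tool can treat each split as a separate target.
saveRDS(result, file = paste0("results_", seed, ".Rds"))
```

After `chmod +x run_split.R`, running `./run_split.R 1` would write `results_1.Rds`. Because each seed maps to its own output file, a Make pattern rule can build one file per seed, and `make -j` with a job count can then run several seeds at the same time.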

In this episode, Pat will use functions from the #mikropml R package and data handling functions from dplyr in RStudio while running everything as an #Rscript with #Make. The accompanying blog post can be found at https://www.riffomonas.org/code_club/....

Previous episodes talking about GNU Make and R scripts:
   • Automating data analyses with make: W...  
   • Using Make and git to refactor a data...  
   • How to automate data analysis with R ...  

If you're interested in taking an upcoming 3-day R workshop, email me at [email protected]!

R: https://r-project.org
RStudio: https://rstudio.com
Raw data: https://github.com/riffomonas/raw_dat...
Workshops: https://www.mothur.org/wiki/workshops

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: https://www.riffomonas.org/minimalR/
General data: https://www.riffomonas.org/generalR/

0:00 Introduction
3:08 Executable R script to train model
9:36 Make rule to build individual seeds
12:11 R script to pool data for seeds
16:18 Make rule to pool data
19:07 Running Make in parallel to build all seeds
20:23 Analyzing data from 100 splits
23:56 Recap
