Base conversion in R to represent DNA sequences in base 4 (CC270)

Because DNA sequences are written with four characters (A, C, G, and T), they are often thought of as base 4 strings. Pat will show you how to carry out base conversion in R from base 4 to base 10. He explains the benefits of storing kmers and other DNA fragments in this format in bioinformatics approaches. Of course, he'll do it all with test-driven development using tools from base R, including strtoi. This is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
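
As a rough illustration of the idea, here is a minimal sketch in base R. The helper names seq_to_base4 and base4_to_index are hypothetical and are not taken from the episode, and the mapping A=0, C=1, G=2, T=3 is an assumption (alphabetical order); the code in the video may differ.

```r
# Hypothetical helpers for illustration only; not the functions from the video.

# Assumed mapping: A = 0, C = 1, G = 2, T = 3.
seq_to_base4 <- function(dna) {
  # chartr() replaces each character in "ACGT" with the digit
  # at the same position in "0123".
  chartr("ACGT", "0123", toupper(dna))
}

base4_to_index <- function(base4) {
  # strtoi() parses a string of digits in the given base and
  # returns NA for strings that are not valid in that base.
  strtoi(base4, base = 4L)
}

kmers <- c("AAAA", "ACGT", "TTTT")
base4 <- seq_to_base4(kmers)    # "0000" "0123" "3333"
index <- base4_to_index(base4)  # 0 27 255

# A kmer with an ambiguous base call (N) is not a valid base 4 string.
ambiguous <- base4_to_index(seq_to_base4("ACNT"))  # NA
```

Here "ACGT" becomes "0123", which is 0*64 + 1*16 + 2*4 + 3 = 27 in base 10. Because strtoi returns ordinary R integers, which are 32-bit, kmers of up to 15 bases always fit; longer ones can overflow and come back as NA.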


If you want to get a physical copy of R Packages: https://amzn.to/43pMR8L
If you want a free, online version of R Packages: https://r-pkgs.org/

You can find my blog post for this episode at https://www.riffomonas.org/code_club/....

#rdp #16S #classification #classifier #microbialecology #microbiome

Support Riffomonas by becoming a Patreon member!
  / riffomonas  

Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at https://shop.riffomonas.org/youtube to get practice problems, tips, and insights.

If you're interested in purchasing a video workshop be sure to check out https://riffomonas.org/workshops/

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: https://www.riffomonas.org/minimalR/
General data: https://www.riffomonas.org/generalR/

0:00 Introduction
4:30 Quaternary numeral system: base 4
5:40 Converting DNA sequence to base 4
13:55 Converting base 4 strings to integers
21:33 Using strtoi to accelerate conversion
25:49 Handling kmers with ambiguous base calls
