High performance programming in R

Video description: High performance programming in R

In this webinar, Benedikt discussed why comma-separated values (CSV) files are so common for storing data: they are easy for humans to read and are supported by tools such as Excel and R. However, CSV files are slow to read and write and take up more disk space than necessary. To address this, the industry developed more efficient binary formats. Benedikt focused on the arrow R package and the Parquet file format, and showed how they can save both time and disk space.
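As a rough illustration of the CSV-to-Parquet idea described above, here is a minimal sketch (not code from the webinar) that writes the same data frame once as CSV and once as Parquet with the arrow package's `write_parquet()`, compares the file sizes, and reads the Parquet file back with `read_parquet()`. The simulated exposure-style data, its column names and the row count are hypothetical and chosen only for illustration; real size and speed savings depend on your data and compression settings.

```r
# Minimal sketch: compare CSV and Parquet storage for the same data.
# The data below is simulated and purely hypothetical.
library(arrow)

set.seed(1)
n <- 100000
df <- data.frame(
  subject_id = sprintf("SUBJ-%06d", seq_len(n)),        # hypothetical IDs
  dose_mg    = sample(c(10, 20, 40), n, replace = TRUE), # hypothetical doses
  visit      = sample(1:12, n, replace = TRUE)           # integer visit number
)

csv_path     <- tempfile(fileext = ".csv")
parquet_path <- tempfile(fileext = ".parquet")

# Write the same data once as CSV (base R) and once as Parquet (arrow)
write.csv(df, csv_path, row.names = FALSE)
write_parquet(df, parquet_path)

# Compare on-disk sizes in bytes
sizes <- c(csv = file.size(csv_path), parquet = file.size(parquet_path))
print(sizes)

# Read the Parquet file back; column types (e.g. integer `visit`) are stored
# in the file itself, so no type guessing is needed as with CSV
df2 <- read_parquet(parquet_path)
str(df2)
```

Because Parquet stores the schema alongside the data, the round trip keeps `visit` as an integer column, whereas reading a CSV back requires the reader to infer column types again.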

Main Sections

00:00 Introduction
03:45 Data sizes increasing
05:26 Parquet file format
09:46 Moving from CSV to Parquet with arrow
11:52 Case study: Aggregating exposure information
14:21 Why switching to Parquet is worthwhile
17:28 Parquet with arrow is a valuable tool
20:10 Join the R Consortium
21:24 Q&A

More Resources

R Validation Hub Site: https://www.pharmar.org/
Main Site: https://www.r-consortium.org/
News: https://www.r-consortium.org/news
Blog: https://www.r-consortium.org/news/blog
Join: https://www.r-consortium.org/about/join
Twitter: https://twitter.com/rconsortium?lang=en
LinkedIn: / r-consortium
Mastodon: https://fosstodon.org/@RConsortium
