Efficient CSV Parsing - On the Complexity of Simple Things - Pedro Holanda


DSDSD - THE DUTCH SEMINAR ON DATA SYSTEMS DESIGN:
We hold bi-weekly talks on Fridays from 3:30 PM to 5 PM CET, for and by researchers and practitioners designing (and implementing) data systems. The objective is to establish a new forum for the Dutch Data Systems community to unite, foster collaborations between its members, and bring in high-quality international speakers. We would like to invite all researchers working on related topics, especially PhD students, to join the events. It is an excellent opportunity to receive early feedback from researchers in your field.

Website: https://dsdsd.da.cwi.nl/
X: https://x.com/dsdsdnl

Speaker: Pedro Holanda

Title: Efficient CSV Parsing: On the Complexity of Simple Things

Abstract: In this talk, we will revisit different CSV parsing
implementations in DuckDB and compare them with the current
implementation. The bulk of the talk discusses the design and
implementation decisions behind DuckDB's current CSV parser. In particular,
we will examine the parallel algorithm, the CSV buffer manager, and the
transitions of the CSV state machine. Disclaimer: This talk is not for
the faint of heart; some very exotically built CSV files will be depicted.
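The abstract mentions the transitions of a CSV state machine. The following is a rough, generic illustration only and is not DuckDB's parser, which also deals with buffering and parallel scanning. It assumes RFC 4180-style quoting (a doubled quote `""` inside a quoted field is an escaped quote), a comma delimiter and `\n` line endings. The names `State` and `parse_csv` are hypothetical. Each input character moves the parser between a few states, and quoted fields are exactly where delimiters and newlines stop acting as separators.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()      # at the start of a field
    UNQUOTED = auto()         # inside an unquoted field
    QUOTED = auto()           # inside a quoted field
    QUOTE_IN_QUOTED = auto()  # just saw a quote inside a quoted field


def parse_csv(text, delim=",", quote='"'):
    """Parse CSV text into a list of rows, one character at a time."""
    rows, row, field = [], [], []
    state = State.FIELD_START
    for ch in text:
        if state is State.QUOTED:
            if ch == quote:
                state = State.QUOTE_IN_QUOTED
            else:
                field.append(ch)  # delimiters and newlines are literal here
            continue
        if state is State.QUOTE_IN_QUOTED:
            if ch == quote:
                # A doubled quote ("") is an escaped literal quote.
                field.append(quote)
                state = State.QUOTED
                continue
            # The previous quote closed the field; handle ch as unquoted.
            state = State.UNQUOTED
        if ch == delim:
            row.append("".join(field))
            field = []
            state = State.FIELD_START
        elif ch == "\n":
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
            state = State.FIELD_START
        elif ch == quote and state is State.FIELD_START:
            state = State.QUOTED
        else:
            # Ordinary character; stray quotes mid-field are kept literally.
            field.append(ch)
            state = State.UNQUOTED
    if state is State.QUOTED:
        raise ValueError("unterminated quoted field")
    # Flush the last record when the input does not end with a newline.
    if row or field or state is not State.FIELD_START:
        row.append("".join(field))
        rows.append(row)
    return rows
```

For example, `parse_csv('a,b\n"x,y","he said ""hi"""\n')` returns `[['a', 'b'], ['x,y', 'he said "hi"']]`. The quoted comma does not split the field, and the doubled quotes become literal ones. The same property hints at why parallel parsing is hard: a worker that starts reading in the middle of a file cannot tell from one byte alone whether it is inside a quoted field.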

Bio: Pedro is an early contributor to DuckDB and currently works as a
software engineer at DuckDB Labs, focusing on core and integration
aspects of DBMS technology. He completed his PhD at the Database
Architectures group at CWI, researching Indexes for Interactive Data
Analysis.
