Producing Arrow C Data with nanoarrow

The ability to produce Arrow C Data is vitally important: it can serve as the foundation for a dataframe library that wants to create low-level results, or it may be the means by which a database exchanges data with downstream tools and drivers.

Regardless of the use case, nanoarrow is a great tool for exchanging Arrow data. In our previous two videos we focused on consuming data created by other tools; in this video we turn our attention to producing data that other tools can consume. Along the way, we look at some of the memory pitfalls that can occur, particularly when exchanging data via Python capsules, and highlight ways to move data out of your extensions with confidence.
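To make the ownership rules behind those memory pitfalls concrete, here is a minimal sketch of a producer written against the raw Arrow C Data Interface, without nanoarrow. The two struct layouts follow the Arrow C Data Interface specification; the demo_* function names are hypothetical and are not taken from the video or from nanoarrow, whose builders take care of this bookkeeping for you. The sketch assumes a non-nullable int64 column whose values are copied into memory owned by the exported array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Struct layouts as given in the Arrow C Data Interface specification. */
#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif

/* Hypothetical helper: the schema only points at string literals, so the
   callback has nothing to free; it just marks the struct as released. */
static void demo_schema_release(struct ArrowSchema* schema) {
  assert(schema->release != NULL);
  schema->release = NULL;
}

/* Hypothetical helper: free what the producer allocated, then mark the
   struct as released by setting the callback to NULL. */
static void demo_array_release(struct ArrowArray* array) {
  assert(array->release != NULL);
  free((void*)array->buffers[1]); /* data buffer */
  free((void*)array->buffers);    /* array of buffer pointers */
  array->release = NULL;
}

/* Describe a non-nullable int64 column ("l" is the int64 format string). */
static void demo_export_int64_schema(struct ArrowSchema* out) {
  memset(out, 0, sizeof(*out));
  out->format = "l";
  out->name = "";
  out->flags = 0; /* no nulls are produced, so not flagged as nullable */
  out->release = &demo_schema_release;
}

/* Copy n values into a newly allocated, producer-owned ArrowArray.
   Returns 0 on success and -1 if an allocation fails. */
static int demo_export_int64_array(const int64_t* values, int64_t n,
                                   struct ArrowArray* out) {
  memset(out, 0, sizeof(*out));

  const void** buffers = malloc(2 * sizeof(*buffers));
  if (buffers == NULL) return -1;

  /* Allocate at least one element so the data pointer is never NULL. */
  int64_t* data = malloc((size_t)(n > 0 ? n : 1) * sizeof(*data));
  if (data == NULL) {
    free(buffers);
    return -1;
  }
  if (n > 0) memcpy(data, values, (size_t)n * sizeof(*data));

  buffers[0] = NULL; /* validity bitmap may be NULL when null_count is 0 */
  buffers[1] = data;

  out->length = n;
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 2;
  out->buffers = buffers;
  out->release = &demo_array_release;
  return 0;
}
```

Whoever ends up holding the struct must call release exactly once and must treat a NULL release pointer as "already released or moved". When the structs are handed to Python through capsules (named "arrow_schema" and "arrow_array" in the Arrow PyCapsule Interface), the capsule destructor should call release only if the pointer is still non-NULL; skipping that check is a common source of double frees, and omitting the destructor is a common source of leaks.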

00:00 - Introduction and Solution Overview
01:41 - Test-driven development for Array production
03:07 - Building the Schema for Array export
05:40 - Building the Array for Array export
08:37 - Building Python capsules for our Array and Schema
15:32 - Validating and checking for memory leaks
18:00 - Test-driven development for Stream production
20:33 - Building the Schema for Stream export
22:39 - Building the Array for Stream export
27:48 - Building a Python capsule for our Stream
32:30 - Test-driven development for a dataframe "sum" computation
35:45 - Determining sum-able columns
38:50 - Building the result Schema for summation
43:22 - Refactoring our summation inner loop
45:41 - Forming our output Array of summation
48:37 - Building output Python capsule for summation result
50:48 - Final compilation, debug, and issue cleanup