How Fast can Python Parse 1 Billion Rows of Data?

To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/DougMercer .
You’ll also get 20% off an annual premium subscription.

———————————————————————————————
Sign up for 1-on-1 coaching at https://dougmercer.dev
———————————————————————————————

The 1 billion row challenge is a fun challenge exploring how quickly we can parse a large text file of weather station measurements and compute the minimum, mean, and maximum temperature for each station. The coding community came up with some amazingly clever solutions.
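To make the task concrete, here is a minimal single-pass baseline in the spirit of "start simple". It assumes the challenge's `station;temperature` line format (e.g. `Hamburg;12.0`). The function names `summarize` and `summarize_file` and the sample lines are hypothetical illustrations, not code from the video's repository.

```python
def summarize(lines):
    """Return {station: (min, mean, max)} from 'station;value' lines.

    Naive single-pass baseline: one dict entry per station holding
    [min, max, sum, count], updated as each line is read.
    """
    stats = {}
    for line in lines:
        station, _, raw = line.rstrip("\n").partition(";")
        value = float(raw)
        entry = stats.get(station)
        if entry is None:
            stats[station] = [value, value, value, 1]
        else:
            entry[0] = min(entry[0], value)
            entry[1] = max(entry[1], value)
            entry[2] += value
            entry[3] += 1
    # Report stations alphabetically, with the mean rounded to one decimal.
    return {
        station: (mn, round(total / count, 1), mx)
        for station, (mn, mx, total, count) in sorted(stats.items())
    }


def summarize_file(path):
    # Iterate over the file object so lines are streamed, not loaded at once.
    with open(path, encoding="utf-8") as f:
        return summarize(f)


if __name__ == "__main__":
    sample = ["Hamburg;12.0\n", "Bulawayo;8.9\n", "Hamburg;34.2\n", "Palembang;38.8\n"]
    print(summarize(sample))
```

Keeping a running sum and count per station, rather than a list of every reading, keeps memory usage flat no matter how many rows the file has.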

In this video, I walk through some of the top strategies for writing highly performant code in Python. I start with the simplest possible approach, and work my way through JIT compilation, multiprocessing, and memory mapping. By the end, I have a pure Python implementation that is only one order of magnitude slower than the highly optimized Java challenge winner.
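A rough sketch of how multiprocessing and memory mapping can fit together, assuming the same `station;temperature` format: the file is split into byte ranges that end on newline boundaries, each worker process maps the file read-only with `mmap` and aggregates only its own range, and the parent merges the partial results. All function names here are hypothetical; this is not the code from the video's repository, and it leaves out the JIT step (for example, running under PyPy).

```python
import mmap
import multiprocessing as mp
import os
import tempfile


def chunk_boundaries(path, n_chunks):
    """Split the file into (start, end) byte ranges that end on newlines."""
    size = os.path.getsize(path)
    step = max(size // n_chunks, 1)
    boundaries = []
    with open(path, "rb") as f:
        start = 0
        while start < size:
            end = min(start + step, size)
            if end < size:
                f.seek(end)
                f.readline()  # move forward to the end of the current line
                end = f.tell()
            boundaries.append((start, end))
            start = end
    return boundaries


def process_chunk(task):
    """Worker: aggregate one byte range of the file via a read-only mmap."""
    path, start, end = task
    stats = {}  # station (bytes) -> [min, max, sum, count]
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        mm.seek(start)
        while mm.tell() < end:
            station, _, raw = mm.readline().rstrip(b"\n").partition(b";")
            value = float(raw)  # float() accepts ASCII bytes such as b"12.0"
            entry = stats.get(station)
            if entry is None:
                stats[station] = [value, value, value, 1]
            else:
                entry[0] = min(entry[0], value)
                entry[1] = max(entry[1], value)
                entry[2] += value
                entry[3] += 1
    return stats


def merge(results):
    """Combine per-chunk stats and decode station names once at the end."""
    total = {}
    for stats in results:
        for station, (mn, mx, s, c) in stats.items():
            entry = total.get(station)
            if entry is None:
                total[station] = [mn, mx, s, c]
            else:
                entry[0] = min(entry[0], mn)
                entry[1] = max(entry[1], mx)
                entry[2] += s
                entry[3] += c
    # UTF-8 byte order matches code point order, so sorting bytes is safe.
    return {
        station.decode("utf-8"): (mn, round(s / c, 1), mx)
        for station, (mn, mx, s, c) in sorted(total.items())
    }


def summarize_parallel(path, workers=None):
    workers = workers or os.cpu_count() or 1
    tasks = [(path, start, end) for start, end in chunk_boundaries(path, workers)]
    with mp.Pool(workers) as pool:
        return merge(pool.map(process_chunk, tasks))


if __name__ == "__main__":
    # Tiny demo file with made-up readings.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("Hamburg;12.0\nBulawayo;8.9\nHamburg;34.2\nPalembang;38.8\n")
    try:
        print(summarize_parallel(tmp.name, workers=2))
    finally:
        os.remove(tmp.name)
```

Keying the per-worker dictionaries by raw bytes and decoding only once during the merge avoids decoding every line; whether that actually pays off on a given machine is something to measure rather than assume.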

On top of that, I show two much simpler but just as fast solutions that use the polars dataframe library and duckdb (an in-memory SQL database). In practice, you should use these, because they are incredibly fast and easy to use.

If you want to take a stab at speeding things up further, you can find the code here https://github.com/dougmercer-yt/1brc.

References
------------------
Main challenge - https://github.com/gunnarmorling/1brc
Ifnesi - https://github.com/ifnesi/1brc/tree/main
Booty - https://github.com/booty/ruby-1-billion/
Danny van Kooten C solution blog post - https://www.dannyvankooten.com/blog/2...
Awesome duckdb blog post - https://rmoff.net/2024/01/03/1%EF%B8%...
PyPy vs CPython duel blog post - https://jszafran.dev/posts/how-pypy-i...

Chapters
----------------
0:00 Intro
1:09 Let's start simple
2:55 Let's make it fast
10:48 Third party libraries
13:17 But what about Java or C?
14:17 Sponsor
16:04 Outro

Music
----------
"4" by HOME, released under CC BY 3.0 DEED, https://home96.bandcamp.com/album/res...

Go buy their music!

Disclosure
-----------------
This video was sponsored by Brilliant.

#python #datascience #pypy #polars #duckdb #1brc