Pandas + Dask DataFrame 2.0 - Comparison to Spark, DuckDB and Polars [PyCon DE & PyData Berlin 2024]

🔊 Recorded at PyCon DE & PyData Berlin 2024, 23.04.2024
https://2024.pycon.de/program/N9DEVW/

🎓 Watch how Dask DataFrame 2.0's improved performance and new features compare to Spark, DuckDB, and Polars, offering a faster and more robust system for big data processing.

Speakers:
Florian Jetter, Patrick Hoefler

Description:
Florian Jetter and Patrick Hoefler discussed the significant enhancements to Dask, a Python library for distributed computing that integrates well with pandas. Historically, Dask was user-friendly but lacked robust performance. The re-implementation of the DataFrame API has addressed these concerns, making Dask faster and more efficient.

Patrick Hoefler, a pandas core team member and Dask maintainer at Coiled, highlighted the improvements in Dask, including a new shuffle algorithm, a logical query planning layer, and a reduced memory footprint. These changes have led to a better user experience and a more robust system overall, especially when compared to tools like Spark, DuckDB, and Polars.
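To give a rough idea of what a logical query planning layer does, here is a toy sketch in plain Python. It is not Dask's implementation: the `ReadTable` and `Select` classes and the `optimize` function are hypothetical names made up for this illustration, and the table and column names are borrowed from the TPC-H schema only as sample values. The sketch shows one common optimization, pushing a column selection down into the read step so that unused columns are never loaded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReadTable:
    """Leaf node: read a table, optionally restricted to some columns."""
    name: str
    columns: Optional[Tuple[str, ...]] = None  # None means "read every column"


@dataclass(frozen=True)
class Select:
    """Keep only the given columns of the child expression."""
    child: object
    columns: Tuple[str, ...]


def optimize(expr):
    """Push column selections down into the read they sit on (toy rule)."""
    if isinstance(expr, Select):
        child = optimize(expr.child)
        if isinstance(child, ReadTable):
            if child.columns is None:
                keep = expr.columns
            else:
                # Only columns that the read already provides can be kept.
                keep = tuple(c for c in expr.columns if c in child.columns)
            return ReadTable(child.name, keep)
        return Select(child, expr.columns)
    return expr


# The user writes "read everything, then pick two columns" ...
plan = Select(ReadTable("lineitem"), ("l_quantity", "l_extendedprice"))

# ... and the planner rewrites it to "read only those two columns".
optimized = optimize(plan)
print(optimized)
```

The point of the design is that the user's code describes what to compute, and the planner is free to rewrite that description before any data is touched.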

The speakers emphasized the seamless integration of Dask with pandas and other PyData stack libraries, making it a compelling option for big data applications. They compared Dask's performance against other tools using TPC-H benchmarks. They also discussed future developments, including extending the logical query planning layer to frameworks like Dask Array and Xarray.

⭐️ About PyCon DE & PyData Berlin:
The PyCon DE & PyData conference unites the Python, AI, and data science communities, offering a unique platform for collaboration and innovation. The PyCon DE & PyData Berlin 2024 conference, hosted in partnership with the local Berlin PyData chapter, provided an exceptional experience, fostering deeper connections within the Python community while showcasing advancements in AI and data science. Attendees enjoyed a diverse and engaging program, solidifying the event as a highlight for Python and AI enthusiasts nationwide.

Follow us:
• LinkedIn:   / 28908640  
• X: https://www.x.com/pyconde
• X: https://www.x.com/pydataberlin

Links:
• Conference website: http://pycon.de
• Related sessions: http://2024.pycon.de/program/categori...

The conference is organized by
• Python Softwareverband e.V.: http://pysv.org
• NumFOCUS Inc.: http://numfocus.org
• Pioneers Hub gemeinnützige GmbH: http://pioneershub.org


If you enjoyed this session, please like, comment, and subscribe to our channel for more insightful talks and discussions.
Share this video with your network to spread the knowledge!

Hashtags:
#Python #PyConDE #PyData #OpenSource #AI #DataScience #MachineLearning #SoftwareDevelopment #LLMs #Community

Acknowledgements:
Special thanks to all the volunteers and sponsors who made this event possible.

About:
Python Softwareverband e.V.:
PySV is a non-profit that promotes the use and development of Python in Germany through events, education, and advocacy, fostering an open Python community.

NumFOCUS Inc.:
NumFOCUS supports open-source scientific computing by providing financial and logistical support to key projects like NumPy and Jupyter, promoting sustainable development and collaboration.

Pioneers Hub gemeinnützige GmbH:
Pioneers Hub is a non-profit fostering innovation in AI and tech by connecting experts and promoting knowledge exchange through events and collaborative initiatives.
www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
