🔊 Recorded at PyCon DE & PyData 2025, April 23, 2025
https://2025.pycon.de/program/PRRPQ3/
🎓 A practical guide to building production-ready data platforms using pure Python tools, demonstrated through Star Wars franchise analytics.
Speakers:
Eric Thanenthiran
Description:
This presentation examines how to build a data platform from pure Python open source tools, focusing on practical architecture and tooling choices for small to medium-scale data operations. Thanenthiran builds a complete data stack using Star Wars datasets as an example, covering five key components: data sources, pipelines, data storage, transformation, and orchestration. The stack uses dlt for data pipelines, DuckDB for storage, dbt for transformations, Dagster for orchestration, and Streamlit for visualization. The presentation addresses critical considerations in data platform development, including schema evolution, data quality testing, and lineage tracking. Particular attention is given to the medallion architecture approach, in which data progresses through raw, staging, domain, and curated layers. The implementation shows how to handle both API and file-based data sources, manage transformation logic, and present data visually to non-technical stakeholders. The demonstrated stack is optimized for batch processing of datasets up to tens of gigabytes, and Thanenthiran discusses scaling considerations and alternatives for larger implementations. Throughout, the presentation emphasizes practical implementation patterns and real-world considerations, providing insights for engineers building initial data infrastructure.
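To make the layered approach concrete, here is a minimal sketch of the raw, staging, domain, and curated progression. It is not the code from the talk: it assumes Python's built-in sqlite3 as a stand-in for DuckDB, plain SQL views in place of dbt models, and hypothetical table names (raw_films, stg_films, dom_films, cur_films_per_decade) with a three-row illustrative dataset.

```python
import sqlite3

# Illustrative rows only: untyped strings, the way a raw extract might arrive.
RAW_FILMS = [
    (" A New Hope ", "1977"),
    ("The Empire Strikes Back", "1980"),
    ("Return of the Jedi", "1983"),
]


def build_layers(conn):
    """Create one table or view per layer (all names are hypothetical)."""
    # Raw layer: land the data unchanged.
    conn.execute("CREATE TABLE raw_films (title TEXT, release_year TEXT)")
    conn.executemany("INSERT INTO raw_films VALUES (?, ?)", RAW_FILMS)

    # Staging layer: trim whitespace and cast to proper types.
    conn.execute(
        "CREATE VIEW stg_films AS "
        "SELECT TRIM(title) AS title, "
        "CAST(release_year AS INTEGER) AS release_year "
        "FROM raw_films"
    )

    # Domain layer: add a derived attribute with business meaning.
    conn.execute(
        "CREATE VIEW dom_films AS "
        "SELECT title, release_year, (release_year / 10) * 10 AS decade "
        "FROM stg_films"
    )

    # Curated layer: aggregate into a shape a dashboard could read directly.
    conn.execute(
        "CREATE VIEW cur_films_per_decade AS "
        "SELECT decade, COUNT(*) AS film_count FROM dom_films GROUP BY decade"
    )


def films_per_decade():
    """Build all layers in an in-memory database and query the curated one."""
    conn = sqlite3.connect(":memory:")
    try:
        build_layers(conn)
        query = (
            "SELECT decade, film_count FROM cur_films_per_decade "
            "ORDER BY decade"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(films_per_decade())
```

Because each layer reads only from the layer before it, a change in the raw data flows through to the curated result without touching the downstream definitions. That dependency chain is also what lineage tracking records.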
⭐️ About PyCon DE & PyData:
The PyCon DE & PyData conference unites the Python, AI, and data science communities, offering a unique platform for collaboration and innovation. The PyCon DE & PyData 2025 conference provided an exceptional experience, fostering deeper connections within the Python community while showcasing advancements in AI and data science. Attendees enjoyed a diverse and engaging program, solidifying the event as a highlight for Python and AI enthusiasts nationwide.
Follow us:
• LinkedIn: / 28908640
• X: https://www.x.com/pyconde
Links:
• Conference website: http://pycon.de
• Other sessions: https://2025.pycon.de/talks/
The conference is organized by
• Python Softwareverband e.V.: http://pysv.org
• NumFOCUS Inc.: http://numfocus.org
• Pioneers Hub gemeinnützige GmbH: http://pioneershub.org
If you enjoyed this session, please like, comment, and subscribe to our channel for more insightful talks and discussions.
Share this video with your network to spread the knowledge!
Hashtags:
#Python #PyConDE #PyData #OpenSource #AI #DataScience #MachineLearning #SoftwareDevelopment #LLMs #Community
Acknowledgements:
Special thanks to all the volunteers and sponsors who made this event possible.
About:
Python Softwareverband e.V.:
PySV is a non-profit that promotes the use and development of Python in Germany through events, education, and advocacy, fostering an open Python community.
NumFOCUS Inc.:
NumFOCUS supports open-source scientific computing by providing financial and logistical support to key projects like NumPy and Jupyter, promoting sustainable development and collaboration.
Pioneers Hub gemeinnützige GmbH:
Pioneers Hub is a non-profit fostering innovation in AI and tech by connecting experts and promoting knowledge exchange through events and collaborative initiatives.
www.pydata.org
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.