Matthew Rocklin: Dask, A Pythonic Distributed Data Science Framework (PyCon 2017)

Video description

"Speaker: Matthew Rocklin

Dask is a general-purpose parallel computing system capable of Celery-like task
scheduling, Spark-like big data computing, and NumPy/Pandas/scikit-learn-level
complex algorithms, written in pure Python. Dask has been adopted by the
PyData community as a Big Data solution.

This talk focuses on the distributed task scheduler that powers Dask when
running on a cluster. We'll look at how we built a Big Data computing system
using the Python networking stack (Tornado/AsyncIO) in service of its data
science stack (NumPy/Pandas/scikit-learn). Additionally, we'll talk about the
challenges of effective task scheduling in a data science context (data
locality, resilience, load balancing) and how we manage these dynamically with
aggressive measurement and dynamic scheduling heuristics.


Slides can be found at: https://speakerdeck.com/pycon2017 and https://github.com/PyCon/2017-slides"
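
As a rough illustration of the task-scheduling idea mentioned in the description above, here is a toy sketch in plain Python. It is not Dask's scheduler or API: the graph layout, `run_graph`, and the helper names are assumptions made up for this example. It only shows the general pattern of running tasks once their inputs are ready, and it ignores data locality, resilience, and load balancing entirely.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import add, mul

# Hypothetical toy graph: each key maps either to a literal value or to a
# tuple of (function, *arguments), where string arguments that match a key
# in the graph are treated as dependencies on that key.
graph = {
    "x": 1,
    "y": 2,
    "sum": (add, "x", "y"),
    "product": (mul, "sum", 10),
}


def _is_task(value):
    """A task is a tuple whose first element is callable."""
    return isinstance(value, tuple) and len(value) > 0 and callable(value[0])


def _is_key(arg, graph):
    """An argument is a dependency if it is a string naming a graph key."""
    return isinstance(arg, str) and arg in graph


def run_graph(graph, max_workers=4):
    """Run every entry in `graph`, wave by wave, and return all results."""
    results = {}
    pending = dict(graph)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Collect everything whose dependencies are already computed.
            ready = {}
            for key, value in pending.items():
                if not _is_task(value):
                    ready[key] = value
                    continue
                deps = [a for a in value[1:] if _is_key(a, graph)]
                if all(d in results for d in deps):
                    ready[key] = value
            if not ready:
                raise ValueError("cycle or unsatisfiable dependency in graph")

            # Submit the whole wave so independent tasks can run in parallel.
            futures = {}
            for key, value in ready.items():
                del pending[key]
                if _is_task(value):
                    func, *args = value
                    args = [results[a] if _is_key(a, graph) else a for a in args]
                    futures[key] = pool.submit(func, *args)
                else:
                    results[key] = value
            for key, future in futures.items():
                results[key] = future.result()
    return results


if __name__ == "__main__":
    print(run_graph(graph)["product"])
```

Running tasks in waves keeps the sketch short and avoids worker threads blocking on each other; a real dynamic scheduler would instead react to each task as it finishes.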
