Dask in 8 Minutes: An Introduction


This video gives a general overview of the Dask project.

What is Dask?

Dask is a flexible library for parallel computing in Python.

Dask is composed of two parts:

1. Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.

2. “Big Data” collections like parallel arrays, DataFrames, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of dynamic task schedulers.
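To make the two parts concrete, here is a minimal sketch using only the Python standard library. It encodes a computation as a dictionary of tasks, where each task is a tuple of a function followed by its arguments, similar in spirit to the graph format described in Dask's documentation. The `get` function, the key names, and the chunked-sum graph are hypothetical illustrations, not Dask's scheduler or API; the real schedulers also run independent tasks in parallel, which this toy version does not.

```python
from operator import add


def inc(x):
    return x + 1


# A task graph: keys name results, values are either literal data
# or a tuple (function, arg1, arg2, ...) whose string args name other keys.
graph = {
    "a": 1,
    "b": (inc, "a"),
    "c": (inc, "a"),
    "d": (add, "b", "c"),
}


def get(graph, key, cache=None):
    """Toy sequential executor (hypothetical, not Dask's scheduler)."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        # Resolve string arguments that name other keys; pass the rest as-is.
        resolved = [
            get(graph, arg, cache) if isinstance(arg, str) and arg in graph else arg
            for arg in args
        ]
        value = func(*resolved)
    else:
        value = task  # literal data
    cache[key] = value
    return value


# A "collection" on top of the same executor: a list split into chunks,
# one partial sum per chunk, then a final task combining the partials.
data = list(range(10))
chunks = [data[i:i + 4] for i in range(0, len(data), 4)]

chunk_graph = {f"chunk-{i}": chunk for i, chunk in enumerate(chunks)}
for i in range(len(chunks)):
    chunk_graph[f"partial-{i}"] = (sum, f"chunk-{i}")


def combine(*parts):
    return sum(parts)


chunk_graph["total"] = (combine, *[f"partial-{i}" for i in range(len(chunks))])

small_result = get(graph, "d")
chunked_total = get(chunk_graph, "total")
```

The point of the second graph is that each partial sum only needs its own chunk, so no single task has to hold the whole dataset, and the partial tasks are independent of one another.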

Dask emphasizes the following virtues:

Familiar: Provides parallelized NumPy array and Pandas DataFrame objects.

Flexible: Provides a task scheduling interface for more custom workloads and integration with other projects.

Native: Enables distributed computing in pure Python with access to the PyData stack.

Fast: Operates with low overhead, low latency, and the minimal serialization necessary for fast numerical algorithms.

Scales up: Runs resiliently on clusters with thousands of cores.

Scales down: Trivial to set up and run on a laptop in a single process.

Responsive: Designed with interactive computing in mind, it provides rapid feedback and diagnostics to aid humans.
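As a rough illustration of the "Familiar", "Flexible", and "Scales down" points, the sketch below uses `dask.array` and `dask.delayed` on a single machine. It assumes Dask and NumPy are installed; the array shape, the chunk size, and the helper functions `inc` and `add` are arbitrary choices made for this example.

```python
import dask
import dask.array as da

# Familiar: a NumPy-like array, split into 250 x 250 chunks.
x = da.ones((1000, 1000), chunks=(250, 250))

# Operations are lazy: this builds a task graph, it does not compute yet.
total = (x + x.T).sum()

# Scales down: run the graph with the local threaded scheduler,
# inside the current Python process.
result = total.compute(scheduler="threads")


# Flexible: wrap ordinary functions as lazy tasks for custom workloads.
@dask.delayed
def inc(i):
    return i + 1


@dask.delayed
def add(a, b):
    return a + b


z = add(inc(1), inc(2))   # still lazy, nothing has run
z_value = z.compute()     # executes the three tasks
```

Nothing is evaluated until `compute()` is called, which is what lets the same code run on a laptop or be handed to a distributed scheduler.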

Share your feedback with us in the comments and let us know:

Did you find the video helpful?
Have you used Dask before?

Learn more at dask.org

KEY MOMENTS
00:00 - Intro
00:08 - What does Dask do?
01:08 - Dask Array
01:43 - Where is Dask used?
02:58 - Examples of application
05:46 - How Does Dask Work?
06:15 - Where is Dask run?
06:48 - Dask Open Source Community
