Introduction to AmpLab Spark Internals


Matei Zaharia of UC Berkeley's AmpLab presents an introduction to Spark internals, recorded on 2012-12-18 at Yahoo in Sunnyvale, CA. The presentation is 1 hour 14 minutes long.
Summary
2:32 Spark Project Goals
4:48 Spark Codebase Size
5:59 Codebase Breakdown by Module
8:45 Components
10:41 Example Job
12:03 RDD Graph
14:43 Data Locality
15:48 In More Detail: Life of a Job
16:15 Scheduling Process
27:11 RDD Abstraction
27:52 RDD Interface
29:34 Example: HadoopRDD
30:28 Example: FilteredRDD
31:32 Example: JoinedRDD
32:47 Discussion of source code
38:25 Dependency Types, Narrow and Wide
39:49 DAG Scheduler
40:43 Discussion of source code
42:05 Scheduler Optimizations
45:39 Task Details
51:07 Worker
52:00 Other Components: BlockManager
52:16 Other Components: CommunicationsManager
52:24 Other Components: MapOutputTracker
52:42 Extending Spark
52:53 Extension points: RDD, SchedulerBackend, spark.serializer
53:38 What People Have Done
53:39 Possible Future Extensions
54:15 As an Exercise to prepare for extending Spark
54:50 How to contribute
54:52 Development Process: issue tracking, developer list, master branch on GitHub; follow the code style and add tests
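Several segments of the talk (RDD Interface, the HadoopRDD and FilteredRDD examples, and narrow vs. wide dependencies) cover how an RDD is defined. An RDD is commonly described through a small interface: its partitions, its dependencies on parent RDDs, a function that computes one partition, and optional preferred locations (plus an optional partitioner, omitted here). The Scala sketch below is a hypothetical, heavily simplified model of that idea and is not Spark's source code. `MiniRDD`, `Split`, `NarrowDep`, `ShuffleDep`, `ParallelMiniRDD`, `FilteredMiniRDD` and `needsShuffle` are illustrative names, and an in-memory sequence stands in for a Hadoop input.

```scala
// Hypothetical, simplified model of the RDD interface; NOT Spark's actual classes.

// A partition ("split") is identified by its index within its RDD.
case class Split(index: Int)

// Narrow: each child partition depends on a small, fixed set of parent partitions.
// Wide (shuffle): a child partition may depend on every parent partition.
sealed trait Dependency { def rdd: MiniRDD[_] }
case class NarrowDep(rdd: MiniRDD[_]) extends Dependency
case class ShuffleDep(rdd: MiniRDD[_]) extends Dependency

abstract class MiniRDD[T] {
  def splits: Seq[Split]                      // the partitions of this RDD
  def dependencies: Seq[Dependency]           // links to parent RDDs (the lineage)
  def compute(split: Split): Iterator[T]      // how to produce one partition
  def preferredLocations(split: Split): Seq[String] = Nil // placement hints

  def filter(f: T => Boolean): MiniRDD[T] = new FilteredMiniRDD(this, f)
}

// Source RDD over an in-memory Seq (assumption: stands in for a Hadoop-backed RDD).
class ParallelMiniRDD[T](data: Seq[T], numSplits: Int) extends MiniRDD[T] {
  def splits: Seq[Split] = (0 until numSplits).map(Split(_))
  def dependencies: Seq[Dependency] = Nil
  // Element i goes to split (i % numSplits).
  def compute(split: Split): Iterator[T] =
    data.zipWithIndex.iterator.filter(_._2 % numSplits == split.index).map(_._1)
}

// Filtered RDD: same partitions as its parent and a one-to-one narrow dependency,
// so computing a partition just filters the parent's matching partition.
class FilteredMiniRDD[T](parent: MiniRDD[T], f: T => Boolean) extends MiniRDD[T] {
  def splits: Seq[Split] = parent.splits
  def dependencies: Seq[Dependency] = Seq(NarrowDep(parent))
  def compute(split: Split): Iterator[T] = parent.compute(split).filter(f)
  override def preferredLocations(split: Split): Seq[String] =
    parent.preferredLocations(split)
}

object MiniRDDDemo {
  // True if any dependency in the lineage is a shuffle, i.e. a DAG scheduler
  // could not pipeline the whole lineage into a single stage.
  def needsShuffle(rdd: MiniRDD[_]): Boolean = rdd.dependencies.exists {
    case ShuffleDep(_) => true
    case NarrowDep(p)  => needsShuffle(p)
  }

  def main(args: Array[String]): Unit = {
    val nums  = new ParallelMiniRDD(1 to 10, numSplits = 3)
    val evens = nums.filter(_ % 2 == 0)
    // Compute each partition independently, as separate tasks would.
    val result = evens.splits.flatMap(evens.compute).toList
    println(result.sorted)        // List(2, 4, 6, 8, 10)
    println(needsShuffle(evens))  // false: only narrow dependencies
  }
}
```

Because the filtered RDD has only a narrow dependency, each of its partitions can be computed from exactly one parent partition in the same task; a shuffle dependency is where such pipelining stops.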
