Modern Spark DataFrame & Dataset | Apache Spark 2.0 Tutorial

Adam Breindel, lead Spark instructor at NewCircle, explains which APIs to use for modern Spark through a series of brief technical explanations and demos that highlight best practices, the latest APIs, and new features. (Topics indexed below.)

We'll look at how Dataset and DataFrame behave in Spark 2.0, cover Whole-Stage Code Generation, and walk through a simple example of Spark 2.0 Structured Streaming (streaming with DataFrames) that you can run in your own free instance of Databricks.

00:00:40 - Intro: What is "Modern Spark"
00:01:26 - DataFrame
00:05:07 - Why not use RDD?
00:09:15 - Intro to DataFrame and Dataset
00:10:13 - DataFrame versus Dataset
00:14:42 - Dataset Queries and Dataset with Scala classes
00:19:07 - Spark Query Optimizer
00:23:26 - Whole-Stage Codegen
00:27:21 - Hive integration
00:29:28 - Wrapping Up DataFrame/Dataset Benefits
00:30:54 - One More Thing - Structured Streaming
00:36:47 - Conclusion

Try the Examples:
+ Databricks Community Edition: https://databricks.com/try
+ Get this Notebook: https://bit.ly/get-notebook

----------------------------------------------------------------------------------------------
SPARK 2.0 TRAINING | NewCircle | Onsite & Public Classes
----------------------------------------------------------------------------------------------
+ Programming for Spark 2.0 (3 days):
http://bit.ly/spark-prog-newcircle

+ Spark 2.0 for Machine Learning & Data Science (3 days):
http://bit.ly/spark-ml-newcircle
