Oracle Machine Learning for Spark

We saw how Oracle Machine Learning for Spark (OML4Spark) offers interfaces for running machine learning algorithms on top of data lakes, using Spark to distribute computation across nodes. It also integrates with the big data ecosystem, allowing manipulation of tables in Hive and Impala, and with HDFS and Oracle Database, using the R language as the front end.

It makes the open-source R language and environment ready for the enterprise and for big data. Designed for problems involving both large and small volumes of data, Oracle Machine Learning for Spark integrates R with data lakes, allowing users to execute R commands and scripts for data processing, statistical analysis, and machine learning on Hive and Impala tables and views and on Spark DataFrames, using R and Spark SQL syntax. Many familiar R functions are overloaded so that they are translated into SQL and executed inside the data lake rather than on the client.
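To make the "overloaded functions translated into SQL" idea concrete, here is a minimal sketch in Python rather than R, and it assumes nothing about OML4Spark's actual API. Comparison and arithmetic operators on a hypothetical `TableProxy` return SQL text instead of computing values, so a filter written in ordinary host-language syntax can be shipped to an engine such as Hive, Impala, or Spark SQL and executed where the data lives. `Expr`, `TableProxy`, and the `bike_rentals` table with its columns are illustrative names only.

```python
class Expr:
    """Hypothetical deferred expression: holds SQL text, never computes values."""

    def __init__(self, sql):
        self.sql = sql

    @staticmethod
    def _lit(value):
        # Pass other expressions through; quote strings; render numbers as-is.
        if isinstance(value, Expr):
            return value.sql
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return repr(value)

    def _binop(self, op, other):
        return Expr(f"({self.sql} {op} {Expr._lit(other)})")

    # Overloaded operators build SQL fragments instead of evaluating.
    def __add__(self, other): return self._binop("+", other)
    def __mul__(self, other): return self._binop("*", other)
    def __gt__(self, other): return self._binop(">", other)
    def __eq__(self, other): return self._binop("=", other)
    def __and__(self, other): return self._binop("AND", other)


class TableProxy:
    """Hypothetical stand-in for a remote table (e.g. a Hive table)."""

    def __init__(self, name, columns, where=None):
        self.name = name
        self.columns = list(columns)
        self.where = where

    def __getitem__(self, key):
        # proxy["col"] -> column expression; proxy[condition] -> filtered proxy.
        if isinstance(key, str):
            if key not in self.columns:
                raise KeyError(key)
            return Expr(key)
        if isinstance(key, Expr):
            cond = key if self.where is None else (self.where & key)
            return TableProxy(self.name, self.columns, cond)
        raise TypeError("expected a column name or a boolean Expr")

    def to_sql(self, select=None):
        items = select or self.columns
        cols = ", ".join(e.sql if isinstance(e, Expr) else e for e in items)
        sql = f"SELECT {cols} FROM {self.name}"
        if self.where is not None:
            sql += f" WHERE {self.where.sql}"
        return sql


bikes = TableProxy("bike_rentals", ["season", "temp", "rentals"])
hot = bikes[bikes["temp"] > 25]
print(hot.to_sql())
# SELECT season, temp, rentals FROM bike_rentals WHERE (temp > 25)
```

The design choice being illustrated is deferral: nothing is read from the table on the client; each operator only grows a query string, which a real system would then send to the cluster for execution.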

Oracle Machine Learning consists of complementary components supporting scalable machine learning algorithms for in-database and big data environments (both cloud and on-premises), notebook technology, SQL, Python, and R APIs, and Hadoop/Spark environments.

The slides used in the presentation can be found in the Resources section below.

Video highlights:
04:50 Introduction to Oracle Machine Learning for Spark
07:10 Oracle Machine Learning for Spark integration
09:56 OML4Spark R language API
11:40 OML4Spark performance benchmark
13:55 OML4Spark benefits for R users of Spark MLlib
17:20 Demo - Manipulating HDFS data
22:00 Demo - Manipulating Hive, Impala and Spark DataFrames
36:48 Demo - Using OML4Spark ML models to predict Bike Demand
43:45 Demo - OML4Spark Cross-Validation and Classification Model Selection
47:54 Demo - Benchmark of OML4Spark GLM logistic regression on 100 million records
49:26 OML4Spark Roadmap
51:09 Q&A
