Joining Datasets: How to join 2 datasets

https://bigdataelearning.com/course/a...
https://bigdataelearning.com/courses
https://bigdataelearning.com
In this video, I am going to show you two ways to join datasets.

We have a product dataset with two columns: productId and productName.

We also have a customer dataset with two columns: productId and customerName.

We need to join the two datasets on productId. First, let's do a DataFrame join: val resultDs = productDs.join(customerDs, productDs("productId") === customerDs("productId")). The second argument is the ON condition: productId of productDs equals productId of customerDs.

This gives us the result dataset, as expected. Here we have productId, productName, and customerName.
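
The DataFrame join described above could look roughly like the following sketch. The case classes, the sample rows, and the local SparkSession setup are assumptions made for illustration; they are not shown in the video. Note that a join on a column expression keeps the productId column from both sides, so the sketch drops the customer-side copy to end up with the three columns mentioned.

```scala
import org.apache.spark.sql.SparkSession

object DataFrameJoinSketch {
  // Hypothetical case classes matching the columns named in the video.
  case class Product(productId: Int, productName: String)
  case class Customer(productId: Int, customerName: String)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimenting; the video's setup is not shown.
    val spark = SparkSession.builder()
      .appName("DataFrameJoinSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, only for illustration.
    val productDs = Seq(Product(1, "Laptop"), Product(2, "Phone")).toDS()
    val customerDs = Seq(Customer(1, "Alice"), Customer(2, "Bob")).toDS()

    // Inner join with an explicit ON condition on productId.
    val resultDs = productDs
      .join(customerDs, productDs("productId") === customerDs("productId"))
      // The expression join keeps both productId columns; drop the customer-side one.
      .drop(customerDs("productId"))

    // Columns: productId, productName, customerName
    resultDs.show()

    spark.stop()
  }
}
```

If the duplicate key column is not wanted in the first place, productDs.join(customerDs, Seq("productId")) is an alternative that joins on the shared column name and keeps a single productId column.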

The second way to join two datasets is to create Spark SQL temporary tables.
I am creating a product table for the product dataset. Similarly, I am creating a customer table for the customer dataset.

Then I call sqlContext.sql and assign the result to val joinedDataset. Inside the parentheses, I pass the SQL query that joins the two tables. The ON condition is productTable.productId = customerTable.productId.

This gives us the joined dataset. Here we have product id, product name, and customer name columns as expected.
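
A minimal sketch of the temporary-table approach, under some assumptions: the video does not show how the tables are registered, so this uses createOrReplaceTempView (available since Spark 2.0), and the view names productTable and customerTable, the case classes, and the sample rows are chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SqlJoinSketch {
  // Hypothetical case classes matching the columns named in the video.
  case class Product(productId: Int, productName: String)
  case class Customer(productId: Int, customerName: String)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimenting; the video's setup is not shown.
    val spark = SparkSession.builder()
      .appName("SqlJoinSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, only for illustration.
    val productDs = Seq(Product(1, "Laptop"), Product(2, "Phone")).toDS()
    val customerDs = Seq(Customer(1, "Alice"), Customer(2, "Bob")).toDS()

    // Register each dataset as a temporary view so SQL can refer to it by name.
    productDs.createOrReplaceTempView("productTable")
    customerDs.createOrReplaceTempView("customerTable")

    // The video calls sqlContext.sql; spark.sql is the equivalent on SparkSession.
    val joinedDataset = spark.sqlContext.sql(
      """SELECT productTable.productId,
        |       productTable.productName,
        |       customerTable.customerName
        |FROM productTable
        |JOIN customerTable
        |  ON productTable.productId = customerTable.productId""".stripMargin
    )

    // Columns: productId, productName, customerName
    joinedDataset.show()

    spark.stop()
  }
}
```

Because the SELECT list names the columns explicitly, the result has a single productId column without any extra drop step.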
