Recommendation Engines Using ALS in PySpark (MovieLens Dataset)

Video description: Recommendation Engines Using ALS in PySpark (MovieLens Dataset)

This tutorial gives an overview of how the Alternating Least Squares (ALS) algorithm works and, using the MovieLens dataset, walks through a code-level example of building a collaborative-filtering recommendation engine in PySpark.
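As a rough illustration of the alternating idea behind ALS, here is a minimal NumPy sketch. It is not Spark's implementation: the function name `als_sketch`, the toy ratings matrix, and the hyperparameter values are all assumptions made for illustration, and zeros are treated as missing ratings.

```python
import numpy as np

def als_sketch(ratings, mask, rank=2, reg=0.1, iterations=50, seed=0):
    """Approximate ratings ~ U @ V.T using only entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    V = rng.normal(scale=0.1, size=(n_items, rank))  # item factors
    ridge = reg * np.eye(rank)
    for _ in range(iterations):
        # Step 1: hold item factors fixed, solve a small ridge regression per user.
        for u in range(n_users):
            Vo = V[mask[u]]
            U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ ratings[u, mask[u]])
        # Step 2: hold user factors fixed, solve a small ridge regression per item.
        for i in range(n_items):
            Uo = U[mask[:, i]]
            V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ ratings[mask[:, i], i])
    return U, V

# Toy user-by-item matrix (hypothetical data); 0 means "not rated".
ratings = np.array([[5, 4, 0, 1],
                    [4, 0, 1, 1],
                    [1, 1, 5, 0],
                    [0, 1, 4, 5]], dtype=float)
mask = ratings > 0

U, V = als_sketch(ratings, mask)
pred = U @ V.T  # predicted ratings, including the missing cells
rmse = float(np.sqrt(np.mean((pred[mask] - ratings[mask]) ** 2)))
print("RMSE on observed ratings:", rmse)
```

Each half-step is an ordinary least-squares problem because one set of factors is held fixed, which is where the "alternating" in the name comes from.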

One note on the TrainValidationSplit mentioned at 6:01: a more appropriate solution, one that incorporates cross-validation, is to use CrossValidator. It lets you evaluate each parameter combination across several "folds" rather than on the single train/validation split that TrainValidationSplit uses. To incorporate this into the code, replace the "tvs = TrainValidationSplit(…" code chunk with this:

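from pyspark.ml.tuning import CrossValidator

# k-fold cross-validation over the existing estimator, parameter grid, and evaluator
cv = CrossValidator(estimator=als,
                    estimatorParamMaps=param_grid,
                    evaluator=evaluator,
                    numFolds=3)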

numFolds can be set to any integer of 2 or more. A typical choice is around 5; I used 3 here to save time.
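To make the idea of "folds" concrete, here is a small plain-Python sketch of how k-fold splitting differs from a single train/validation split. The helper `k_fold_splits` is hypothetical and only illustrates the concept; it is not how Spark assigns rows to folds internally.

```python
import random

def k_fold_splits(n_rows, num_folds, seed=42):
    """Yield (train_indices, validation_indices) once per fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled rows into num_folds roughly equal groups.
    folds = [indices[i::num_folds] for i in range(num_folds)]
    for k in range(num_folds):
        validation = folds[k]
        # Every other fold is used for training in this round.
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield train, validation

splits = list(k_fold_splits(n_rows=10, num_folds=3))
for train, validation in splits:
    print(len(train), "training rows,", len(validation), "validation rows")
```

Every row is used for validation exactly once, so each parameter combination is scored num_folds times and the scores are averaged. That also means roughly num_folds times as many model fits, which is why a smaller value saves time.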

Also, be sure to replace "tvs" with "cv" in the model-fitting code chunk so that it reads "model = cv.fit(training)".
