# A parallel recommendation engine in Julia.

Datetime:2016-08-23 01:12:03         Topic: Recommender System          Share        Original >>

## Introduction

Recommender systems play a pivotal role in various business settings like e-commerce sites, social media platforms, and other platforms involving user interaction with other users or products. Recommender systems provide valuable insights to gain actionable intelligence on these users.

Large Scale Recommender systems help in unraveling the latent information in the complex relational data between users and items.However mapping the users space to the items space to predict the interaction is a challenge. Inferring actionable information from a variety of data sources collected either implicitly like click patterns, browser history etc, or explicitly like ratings of books and movies, is what well-designed recommender systems do consistently well.

## Matrix Factorizations

Depending on the source of information on the users and the items, there are a variety of techniques to build recommender systems, each with a unique mathematical approach. Linear algebra and matrix factorisations are important to certain types of recommenders where user ratings are available and it is most ideal to apply methods like svd in such cases.

In matrix factorization the users and items are mapped onto a joint latent factor space of reduced dimension f, and the inner product of the user vector with the item vector gives the corresponding interaction. Dimensionality reduction is mainly about a more compact representation of the large training data which is obtained by matrix factorization. We want to quantify the nature or the characteristics of the movies defined by a certain number of aspects (factors), i.e., we are trying to generalize the information (independent and unrelated ratings matrix) in a concise and descriptive way.

Example :Let us consider a simple example to figure out how matrix factorization helps in predicting the likelihood of a user liking a movie or not. For sake of brevity, we have couple of users, Joe and Jane and couple of movies, Titanic and Troll 2. The users and the movies are characterized based on certain number of factors as show in the below tables.

Factors/Movies Titanic Troll 2
Romance 4 1
Comedy 2 4
Box Office success 5 2
Drama 3 2
Horror 1 4
Factors/Movies Joe Jane
Romance 4 1
Comedy 2 4
Box Office success 5 2
Drama 3 2
Horror 1 4

Consider Joe to be characterized by vector [4 2 5 3 1] , which suggests that Joe likes Romance and big hit movies and not so much horror or comedy. Similarly Jane likes comedy horror and she is not very particular about box office success of the movies, neither is she a big fan of romance movies.

The movies Titanic, is a popular romance movie, where as the movie Troll 2, is not so popular and horror comedy. It is intuitively obvious that Joe will end up liking Titanic and Jane will like Troll 2. This is based on how the users and movies score on the 5 factors. Using Cosine distance as shown in the below table, confirms this.

Factors/Movies Joe Jane
Titanic 0.94 0.67
Troll 2 0.50 0.97

With large rating data matrix, like in the NETFLIX dataset which had around 20 thousand movies and 0.5 million users, mapping all the users and the movies in the above way is impossible. This is where matrix factorization helps in factoring the Rating matrix into user matrix and movie matrix.

## Alternating Least Squares

Let

be the user feature matrix whereand, and letbe the item or movie feature matrix, whereand. Hereis the number of factors, i.e., the reduced dimension or the lower rank, which is determined by cross validation. The predictions can be calculated for any user-movie combination,, as

.

Here we minimize the loss function of

and

as the condition in the iterative process of obtaining these matrices. Let us start by considering the loss due to a single prediction in terms of squared error: $$\mathcal{L}^2(r,{u},{m})=(r-<{u},{m}>)^2.$$

Based on the above equation generalizing it for the whole data set, the empirical total loss as: $$\mathcal{L}^{emp}(R,U,M)=\frac{1}{n} \sum_{(i,j) \in I}\mathcal{L}^2(r_{ij},{u_i},{m_j}),$$ where

is the known ratings dataset having

ratings.

## Julia recommender system

The package RecSys.jl is a package for recommender systems in Julia, it can currently work with explicit ratings data. For preparing the input create an object of ALSWR type. This takes two input parameters, firstly input file location, and second optional input is the variable par which specifies the type of parallelism. The parallelism is about how the data is shared/distributed across the processing units. When par=ParShemm the data is present at one location and is shared across the processing units, when par=ParChunk the data is distributed across the processing units as chunks. For this report only sequential timings were captured, i.e., with nprocs =1.

rec=ALSWR(&quot;/location/to/input/file/File.delim&quot;, par=ParShemm)

The file can be any tabular structured data, delimited by any character, which needs to be specified,

The call to the function to create a model is train(rec, 10, 10) where 10 is the number of iterations to run and 10 is the number of factors.

## Performance Analysis

### Sequential version

The sequential performance of the ALS algorithm is tested on Apache Spark and Julia. The scala example code shown in the mentioned link was run with rank = 10 and iterations = 10 . The timing of the ALS.train() function is recorded in order to analyse the core computational part only. For the same parameters in Julia, the timings for the computationally intensive train() function is captured.

The algorithm took around 500 seconds to train on the NETFLIX dataset on a single processor, which is good for data as large as 1 billion ratings.

The below table also summarises the performance(single processor) on various other datasets like the Movielens and lastFM.

Parameters/Datasets Size (No. of interactions) Factorization time (in secs)
Movielens 20 Million 119
Last.fm 0.5 Billion 2913

The NETFLIX dataset is not available publicly anymore, however datasets for movielens and lastfm can be downloaded. Please refer the dataset specific julia example scripts in the examples/ directory for more details on how to model the recommender system for the respective datasets.

### Parallel version

Parallelism is made possible in Julia mainly 2 ways, a). Multiprocessing and b). Multithreading. The multithreading development is onging. However the multiprocessing based parallel processing in Julia is mature and mainly based around Tasks which are concurrent function calls. The implementation details are not covered here, the following graph summarises the performance of parallel ALS implementation in Julia and Spark,

In the above graph, Julia Distr breaks up the problem and uses Julia’s distributed computing capabilities. Julia Shared uses shared memory through mmap arrays. Julia MT is the multithreading version of the ALS. While multi-threading in Julia is nascent, it already gives parallel speedups. There are several planned improvements to Julia’s multi-threading that we expect will make the multi-threaded ALS faster than the other parallel implementations.

The experiments were conducted by invoking spark with flags --master local[1] . The experiments were conducted on a 30 core Intel Xeon machine with 132 GB memory and 2 hyperthreads per core.

Credits to Tanmay KM for contributing towards the parallel implementation of the package.

Apart from methods to model the data and check for accuracy, there are also abilities to make recommendations for users who have not interacted with items, by picking the most likely items the user would interact with. Hence in RecSys.jl we have a fast, scalable and accurate recommender system which can be used to for end to end system. Currently we are working on a demo of such a recommender system with a UI interface too implemented in Julia.