Neural Matrix Factorization from scratch in PyTorch

Posted on Tue, Mar 2, 2021

A tutorial on building a Neural Matrix Factorization model from scratch in PyTorch, using the MovieLens-1M dataset.

Dataset

After downloading and extracting the MovieLens-1M dataset, the first step is to create the dataset class:
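A minimal sketch of what this class can look like (the constructor arguments are my assumption; the type handling follows the description below):

```python
import torch
from torch.utils.data import Dataset


class Rating_Dataset(Dataset):
    """Wraps parallel lists of user ids, item ids and ratings as tensors."""

    def __init__(self, user_list, item_list, rating_list):
        super().__init__()
        self.user_list = user_list
        self.item_list = item_list
        self.rating_list = rating_list

    def __len__(self):
        return len(self.user_list)

    def __getitem__(self, idx):
        user = self.user_list[idx]
        item = self.item_list[idx]
        rating = self.rating_list[idx]
        # Enforce the [long, long, float] types and return tensors
        return (
            torch.tensor(user, dtype=torch.long),
            torch.tensor(item, dtype=torch.long),
            torch.tensor(rating, dtype=torch.float),
        )
```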

Our class is named Rating_Dataset and inherits from the PyTorch Dataset base class. The __getitem__ method helps us in two ways: 1) it enforces the types [long, long, float], and 2) it returns the tensor version of the tuple for the given index.

We also create a helper dataset class to put all the data processing functions under a single umbrella. This class contains five methods, as sketched below:
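The method names and split of responsibilities here are assumptions on my part; a typical pipeline for MovieLens-style implicit feedback covers id reindexing, a leave-one-out test split, negative sampling, and DataLoader construction, reusing the Rating_Dataset class from above:

```python
import random

import pandas as pd
from torch.utils.data import DataLoader


class NCF_Data:
    """Hypothetical helper bundling the data-processing steps."""

    def __init__(self, ratings, num_negatives=4, batch_size=256):
        self.ratings = self._reindex(ratings)
        self.num_users = self.ratings["user_id"].nunique()
        self.num_items = self.ratings["item_id"].nunique()
        self.num_negatives = num_negatives
        self.batch_size = batch_size
        self.train, self.test = self._leave_one_out(self.ratings)

    def _reindex(self, ratings):
        # Map raw MovieLens ids to contiguous 0-based indices
        ratings["user_id"] = ratings["user_id"].astype("category").cat.codes
        ratings["item_id"] = ratings["item_id"].astype("category").cat.codes
        return ratings

    def _leave_one_out(self, ratings):
        # Hold out each user's most recent interaction for testing
        ratings["rank"] = ratings.groupby("user_id")["timestamp"].rank(
            method="first", ascending=False
        )
        test = ratings[ratings["rank"] == 1]
        train = ratings[ratings["rank"] > 1]
        return train, test

    def _negative_sampling(self, train):
        # Pair every positive interaction with unseen items labelled 0
        interacted = train.groupby("user_id")["item_id"].apply(set).to_dict()
        users, items, labels = [], [], []
        for row in train.itertuples():
            users.append(row.user_id)
            items.append(row.item_id)
            labels.append(1.0)
            for _ in range(self.num_negatives):
                negative = random.randrange(self.num_items)
                while negative in interacted[row.user_id]:
                    negative = random.randrange(self.num_items)
                users.append(row.user_id)
                items.append(negative)
                labels.append(0.0)
        return users, items, labels

    def get_train_loader(self):
        users, items, labels = self._negative_sampling(self.train)
        dataset = Rating_Dataset(users, items, labels)
        return DataLoader(dataset, batch_size=self.batch_size, shuffle=True)
```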

Evaluation criteria

Next, we define the evaluation metrics.

The metrics function first moves the user and item variables to the right device (e.g. the GPU, if enabled), then gets predictions from the model, and finally calculates and returns the hit_rate_at_k and ndcg_at_k values.
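A sketch of such a function, under the common assumption that each test batch holds one user's held-out positive item at index 0 followed by sampled negatives; the hit_rate_at_k and ndcg_at_k helpers are written out here for completeness:

```python
import math

import torch


def hit_rate_at_k(ranked_items, true_item):
    # Hit if the held-out item appears in the top-k list
    return 1.0 if true_item in ranked_items else 0.0


def ndcg_at_k(ranked_items, true_item):
    # DCG of a single relevant item is 1 / log2(position + 2)
    if true_item in ranked_items:
        position = ranked_items.index(true_item)
        return 1.0 / math.log2(position + 2)
    return 0.0


def metrics(model, test_loader, top_k, device):
    hits, ndcgs = [], []
    model.eval()
    with torch.no_grad():
        for user, item, _ in test_loader:
            # Move inputs to the same device as the model
            user, item = user.to(device), item.to(device)
            predictions = model(user, item)
            # Rank the candidate items by predicted score
            _, indices = torch.topk(predictions, top_k)
            ranked_items = item[indices].tolist()
            true_item = item[0].item()  # first entry is the positive item
            hits.append(hit_rate_at_k(ranked_items, true_item))
            ndcgs.append(ndcg_at_k(ranked_items, true_item))
    return sum(hits) / len(hits), sum(ndcgs) / len(ndcgs)
```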

Neural Collaborative Filtering for Personalized Ranking

This model leverages the flexibility and non-linearity of neural networks to replace the dot products of matrix factorization, aiming to enhance model expressiveness. Specifically, it is structured with two subnetworks, generalized matrix factorization (GMF) and an MLP, and models the interactions through two pathways instead of simple inner products. The outputs of these two networks are concatenated to compute the final prediction scores.
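Concretely, following the original NCF paper, and writing $\mathbf{p}_u$ and $\mathbf{q}_i$ for the user and item embeddings (a separate pair for each subnetwork), the prediction is:

$$
\phi^{\text{GMF}} = \mathbf{p}_u^{\text{GMF}} \odot \mathbf{q}_i^{\text{GMF}}, \qquad
\phi^{\text{MLP}} = \text{MLP}\!\left(\left[\mathbf{p}_u^{\text{MLP}};\, \mathbf{q}_i^{\text{MLP}}\right]\right), \qquad
\hat{y}_{ui} = \sigma\!\left(\mathbf{h}^{\top}\left[\phi^{\text{GMF}};\, \phi^{\text{MLP}}\right]\right)
$$

where $\odot$ is the element-wise product, $[\,\cdot\,;\,\cdot\,]$ denotes concatenation, and $\sigma$ is the logistic function.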

After defining the dataset class and evaluation function, it is time to define the model architecture.

In this architecture, we first create user and item embedding layers for both the MLP and MF pathways, and build the MLP with the help of PyTorch's ModuleList. Then, in the forward method, we pass the user and item indices through the embedding layers, concatenating the MLP embeddings and element-wise multiplying the MF embeddings. Finally, we concatenate the MLP and MF feature vectors and apply a logistic activation at the end.
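A sketch of this architecture; the tower layout and layer sizes are assumptions, but the two pathways follow the description above:

```python
import torch
import torch.nn as nn


class NeuMF(nn.Module):
    def __init__(self, num_users, num_items, factor_num=32, num_layers=3,
                 dropout=0.0):
        super().__init__()
        # Separate embedding tables for the MF and MLP pathways
        self.embed_user_GMF = nn.Embedding(num_users, factor_num)
        self.embed_item_GMF = nn.Embedding(num_items, factor_num)
        mlp_dim = factor_num * (2 ** (num_layers - 1))
        self.embed_user_MLP = nn.Embedding(num_users, mlp_dim)
        self.embed_item_MLP = nn.Embedding(num_items, mlp_dim)

        # Tower-shaped MLP built with ModuleList, halving width per layer
        layers = nn.ModuleList()
        input_size = mlp_dim * 2
        for _ in range(num_layers):
            layers.append(nn.Dropout(p=dropout))
            layers.append(nn.Linear(input_size, input_size // 2))
            layers.append(nn.ReLU())
            input_size //= 2
        self.MLP_layers = layers

        # Final prediction over the concatenated GMF and MLP features
        self.predict_layer = nn.Linear(factor_num * 2, 1)

    def forward(self, user, item):
        # GMF pathway: element-wise product of embeddings
        output_GMF = self.embed_user_GMF(user) * self.embed_item_GMF(item)
        # MLP pathway: concatenate embeddings and pass through the tower
        interaction = torch.cat(
            (self.embed_user_MLP(user), self.embed_item_MLP(item)), dim=-1
        )
        for layer in self.MLP_layers:
            interaction = layer(interaction)
        # Concatenate both pathways and apply a logistic activation
        concat = torch.cat((output_GMF, interaction), dim=-1)
        return torch.sigmoid(self.predict_layer(concat)).view(-1)
```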

Training and evaluation

We use the following hyperparameters to train the model:
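The exact values below are illustrative assumptions rather than the post's settings; the training loop wires together the pieces defined earlier (num_users, num_items, train_loader and test_loader come from the data helper):

```python
# Illustrative hyperparameters (assumed values)
config = {
    "lr": 1e-3,          # Adam learning rate
    "batch_size": 256,
    "epochs": 10,
    "factor_num": 32,    # embedding size of the GMF pathway
    "num_layers": 3,     # depth of the MLP tower
    "num_negatives": 4,  # negatives sampled per positive
    "top_k": 10,         # cutoff for hit_rate and ndcg
}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = NeuMF(num_users, num_items,
              config["factor_num"], config["num_layers"]).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
criterion = nn.BCELoss()  # implicit-feedback labels are 0/1

for epoch in range(config["epochs"]):
    model.train()
    for user, item, label in train_loader:
        user, item, label = user.to(device), item.to(device), label.to(device)
        optimizer.zero_grad()
        loss = criterion(model(user, item), label)
        loss.backward()
        optimizer.step()
    hr, ndcg = metrics(model, test_loader, config["top_k"], device)
    print(f"epoch {epoch}: HR@{config['top_k']}={hr:.3f}, "
          f"NDCG@{config['top_k']}={ndcg:.3f}")
```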

Average epoch time is 90 seconds on an Nvidia T4 GPU. Both the hit_rate and ndcg values improve over the first 4 epochs and then converge to a local (or global, I hope) minimum.

The complete tutorial (+code) can be found here.