Music30 Recommendation System

Google Colaboratory

https://colab.research.google.com/drive/1mfijMYGZV-Ve59H13ezWh_w_K0WCTra0#scrollTo=ej80XDzTYvHc

Weights & Biases

Weights & Biases, developer tools for machine learning

https://wandb.ai/sparsh121/SARS%20Music30%20x1/overview?workspace=user-sparsh121

Introduction

Due to complex user behavior dynamics (listener's intent, drift in interest, repeated listening behavior, music reminding, playing order) and lack of music ratings (very few explicit ratings given by the user), the music recommendation systems face challenges in providing optimal music recommendations to the listener.

The objective is to predict the subsequent user action(s), given the last N actions by the user and other types of information.

Data exploration

30music dataset downloaded from RecSys Lab official website.
This dataset contains 31M play events from 45K users over 5.6M tracks collected via Last-FM public API in 1 year, from 20 Jan 2014 to 20 Jan 2015 (about 4GB of uncompressed CSV file).
Play events are grouped into user play sessions, where each session is an ordered list of play events that are assumed to have consequently listened without interruption. Two consecutive play events belong to the same session if the gap is not larger than 800 seconds. The dataset contains 2.7M user sessions, with an average of 11 sessions per user, each one, in turn, composed of 11 tracks, on average.
The 30Music dataset also provides 4.1M user love preferences and 57K user-created playlists. Despite being of undeniable interest for recommender systems, we have not considered explicit feedback from users in our experiments. We focused our attention on implicit feedbacks only - i.e., the user play events and play sessions - since they are the most challenging factors in terms of size.

Methods

Sequence-aware model

Use sequence-aware algorithms that will take into account the user behaviour dynamics.

Popularity based Recommender

Here we fit the recommedation algorithm over the sessions in the training set. Popularity Recommender simply recommends items ordered by their popularity in the training set. It doesn't have any hyper-parameter.

Frequent Sequential Mining

Here we fit the recommedation algorithm over the sessions in the training set.

This algorithm extract Frequent Sequential Patterns from all the training sequences. Patterns are having support lower than minsup are discarded (support = # occurrences of a pattern in the traning data).Recommendations are then generated by looking for patterns having a prefix corresponding to the last [max_context, min_context] elements in the user profile, taken in order. Matches are then sorted by decreasing confidence score (ratio between the support of the matched rule and the support of the context). Matches having confidence below minconf are discarded.

The class FSMRecommender has the following initialization hyper-parameters:

minsup: the minimum support threshold. It is interpreted as relative count if in [0-1], otherwise as an absolute count. NOTE: Relative count required for training with SPFM (faster).
minconf: the minimum confidence threshold. Use to filter irrelevent recommendations.
max_context: the maximum number of items in the user profile (starting from the last) that will be used for lookup in the database of frequent sequences.
min_context: the minimum number of items in the user profile (starting from the last) that will be used for lookup in the database of frequent sequences.
spmf_path: path to SPMF jar file. If provided, SPFM library will be used for pattern extraction (algorithm: Prefix Span). Otherwise, use pymining, which can be significantly slower depending on the sequence database size.
db_path: path to the sequence database file

Markov Chain Recommender

Here we fit the recommedation algorithm over the sessions in the training set.This recommender is based on the MarkovChainRecommender implemented from:

Shani, Guy, David Heckerman, and Ronen I. Brafman. "An MDP-based recommender system." Journal of Machine Learning Research 6, no. Sep (2005): 1265-1295. Chapter 3-4

This recommender computes the item transition matrices for any Markov Chain having order in [min_order, max_order]. Each individual Markov Chain model employs some heuristics like skipping or clustering to deal better with data sparsity. Recommendations are generated by sorting items by their transition probability to being next, given the user profile. The scores coming from different MC are weighted inversely wrt their order.

The class MixedMarkovChainRecommender has the following initialization hyper-parameters:

min_order: the minimum order of the Mixed Markov Chain
max_order: the maximum order of the Mixed Markov Chain

Factorized Personalized Markov Chain (FPMC) Recommender

Here we fit the recommedation algorithm over the sessions in the training set.This recommender is based on the following paper:

Rendle, S., Freudenthaler, C., & Schmidt-Thieme, L. (2010). Factorizing personalized Markov chains for next-basket recommendation. Proceedings of the 19th International Conference on World Wide Web - WWW ’10, 811

In short, FPMC factorizes a personalized order-1 transition tensor using Tensor Factorization with pairwise loss function akin to BPR (Bayesian Pairwise Ranking).

TF allows to impute values for the missing transitions between items for each user. For this reason, FPMC can be used for generating personalized recommendations in session-aware recommenders as well.

In this notebook, you will be able to change the number of latent factors and a few other learning hyper-parameters and see the impact on the recommendation quality.

The class FPMCRecommender has the following initialization hyper-parameters:

n_factor: (optional) the number of latent factors
learn_rate: (optional) the learning rate
regular: (optional) the L2 regularization coefficient
n_epoch: (optional) the number of training epochs
n_neg: (optional) the number of negative samples used in BPR learning

Prod2Vec Recommender

Here we fit the recommedation algorithm over the sessions in the training set.

This is simplified implementation of the following:

Grbovic, Mihajlo, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Jaikit Savla, Varun Bhagwan, and Doug Sharp. "E-commerce in your inbox: Product recommendations at scale." In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1809-1818. ACM, 2015.

This implementation uses the gensim implementation of Word2Vec to compute item embeddings using the skip-gram model

Recommendations are generated by returning the k-nearest neighbors of the last items in the user profile, whose relevance is weighted using a simple exponential decay (the last item in the user profile is the most relevant one, and the first item the least relevant).

The original paper contains other variants of this algorithm (namely bagged-prod2vec and prod2vec-cluster) which are not subject of this tutorial.

The class Prod2VecRecommender has the following initialization hyper-parameters:

min_count: the minimum item frequency. Items less frequent that min_count will be pruned
size: the size of the embeddings
window: the size of the context window
decay_alpha: the exponential decay factor used to discount the similarity scores for items back in the user profile. Lower values mean higher discounting of past user interactions. Allows values in [0-1]
workers: the number of threads used for training

Session-based RNN

Here we fit the recommedation algorithm over the sessions in the training set.

This is a simplified interface to Recurrent Neural Network models for Session-based recommendation. Based on the following two papers:

Recurrent Neural Networks with Top-k Gains for Session-based Recommendations, Hidasi and Karatzoglou, CIKM 2018
Personalizing Session-based Recommendation with Hierarchical Recurrent Neural Networks, Quadrana et al, Recsys 2017

we will consider the session-based (non-personalized) version of the algorithm.

Each item in the current user session is first encoded either using 1-hot encoding or a dense embedding vector. The item representation is then forwarded to one or more Gated Reucurrent Unit (GRU) layers, which "mix" the information coming from the past steps of the sequence with the representation of the current item. The last hidden state of the network is finally use to compute the likelihood scores for the next items by using one out of several loss functions (e.g. cross-entropy, BPR, TOP1, BPR-max, TOP1-max, etc.).

For simplicity, we only support 1-hot encoded inputs and the BPR-max loss function here.

The hyper-parameters of the model are:

session_layers: number of units per layer used at session level. It has to be a list of integers for multi-layer networks, or a integer value for single-layer networks.
user_layers: number of units per layer used at user level. Required only by personalized models. (None in this case)
batch_size: the mini-batch size used in training
learning_rate: the learning rate used in training (Adagrad optimized)
momentum: the momentum coefficient used in training
dropout: it's a float value for the hidden-layer(s) dropout.
epochs: number of training epochs
personalized: whether to train a personalized model using the HRNN model (False in this case).

NOTE: GRU4Rec originally has many more hyper-parameters. Going through all of them is out, but we suggest to check-out the original source code here if you are interested.

Personalized RNN-based Recommender

Here we fit the recommedation algorithm over the sessions in the training set.

This is a simplified interface to Recurrent Neural Network models for Session-based recommendation. Based on the following two papers:

Recurrent Neural Networks with Top-k Gains for Session-based Recommendations, Hidasi and Karatzoglou, CIKM 2018
Personalizing Session-based Recommendation with Hierarchical Recurrent Neural Networks, Quadrana et al, Recsys 2017

we will consider the session-aware (personalized) version of the algorithm.

Each user session goes through a session RNN, which models short-term user preferences. At the end of each session, the state of the session RNN is used to update a user RNN, which models more long-term user preferences. It's state is passed forward to the next session RNN, which can now personalize recommendations depending on both short-term and long-term user interests.

The hyper-parameters of the model are:

session_layers: number of units per layer used at session level. It has to be a list of integers for multi-layer networks, or a integer value for single-layer networks.
user_layers: number of units per layer used at user level. Required only by personalized models. It has to be a list of integers for multi-layer networks, or a integer value for single-layer networks.
batch_size: the mini-batch size used in training
learning_rate: the learning rate used in training (Adagrad optimized)
momentum: the momentum coefficient used in training
dropout: it's a 3-tuple with the values for the dropout of (user hidden, session hidden, user-to-session hidden) layers.
epochs: number of training epochs
personalized: whether to train a personalized model using the HRNN model (True in this case).

NOTE: HGRU4Rec originally has more hyper-parameters. Going through all of them is out from the scope, but we suggest to check-out the original source code here in case you are interested.

KNN-based Recommender

Here we fit the recommedation algorithm over the sessions in the training set.

The class KNNRecommender takes the following initialization hyper-parameters:

model: One among the following KNN models:
- iknn: ItemKNN, item-to-item KNN based on the last item in the session to determine the items to be recommended.
- sknn: SessionKNN, compares the entire current session with the past sessions in the training data to determine the items to be recommended.
- v-sknn: VMSessionKNN, use linearly decayed real-valued vectors to encode the current session, then compares the current session with the past sessions in the training data using the dot-product to determine the items to be recommended.
- s-sknn: SeqSessionKNN, this variant also puts more weight on elements that appear later in the session by using a custom scoring function (see the paper by Ludewng and Jannach).
- sf-sknn: SeqFilterSessionKNN, this variant also puts more weight on elements that appear later in the session in a more restrictive way by using a custom scoring function (see the paper by Ludewng and Jannach).
param init_args: The model initialization arguments. See the following initializations or check util.knn for more details on each model:
- iknn: ItemKNN(n_sims=100, lmbd=20, alpha=0.5)
- sknn: SessionKNN(k, sample_size=500, sampling='recent', similarity='jaccard', remind=False, pop_boost=0)
- v-sknn: VMSessionKNN(k, sample_size=1000, sampling='recent', similarity='cosine', weighting='div', dwelling_time=False, last_n_days=None, last_n_clicks=None, extend=False, weighting_score='div_score', weighting_time=False, normalize=True)
- s-knn: SeqSessionKNN(k, sample_size=1000, sampling='recent', similarity='jaccard', weighting='div', remind=False, pop_boost=0, extend=False, normalize=True)
- sf-sknn: SeqFilterSessionKNN(k, sample_size=1000, sampling='recent', similarity='jaccard', remind=False, pop_boost=0,extend=False, normalize=True)

Evaluation

In the evaluation of sequence-aware recommenders, each sequence in the test set is split into:

the user profile, used to compute recommendations, is composed by the first k events in the sequence
the ground truth, used for performance evaluation, is composed by the remainder of the sequence.

We can control the dimension of the user profile by assigning a positive value to GIVEN_K, which correspond to the number of events from the beginning of the sequence that will be assigned to the initial user profile. This ensures that each user profile in the test set will have exactly the same initial size, but the size of the ground truth will change for every sequence.

Alternatively, by assigning a negative value to GIVEN_K, we will set the initial size of the ground truth. In this way the ground truth will have the same size for all sequences, but the dimension of the user profile will differ.

Evaluation with sequentially revealed user-profiles

Here we evaluate the quality of the recommendations in a setting in which user profiles are revealed sequentially.

The user profile starts from the first GIVEN_K events (or, alternatively, from the last -GIVEN_K events if GIVEN_K<0).The recommendations are evaluated against the next LOOK_AHEAD events (the ground truth).The user profile is next expanded to the next STEP events, the ground truth is scrolled forward accordingly, and the evaluation continues until the sequence ends.

In typical next-item recommendation, we start with GIVEN_K=1, generate a set of alternatives that will evaluated against the next event in the sequence (LOOK_AHEAD=1), move forward of one step (STEP=1) and repeat until the sequence ends. You can set the LOOK_AHEAD='all' to see what happens if you had to recommend a whole sequence instead of a set of a set of alternatives to a user.

TOPN=100
GIVEN_K = 1
LOOK_AHEAD = 1
STEP=1

test_sequences = get_test_sequences(test_data, GIVEN_K)
print('{} sequences available for evaluation'.format(len(test_sequences)))

results = evaluation.sequential_evaluation(recommender,
                                          test_sequences=test_sequences,
                                          given_k=GIVEN_K,
                                          look_ahead=LOOK_AHEAD,
                                          evaluation_functions=METRICS.values(),
                                          top_n=TOPN,
                                          scroll=True,  # scrolling averages metrics over all profile lengths
                                          step=STEP)

print('Sequential evaluation (GIVEN_K={}, LOOK_AHEAD={}, STEP={})'.format(GIVEN_K, LOOK_AHEAD, STEP))
for mname, mvalue in zip(METRICS.keys(), results):
  print('\t{}@{}: {:.4f}'.format(mname, TOPN, mvalue))

Popularity-based Recommender

precision@100: 0.0018 recall@100: 0.1782 mrr@100: 0.0150

Evaluation with "static" user-profiles

Here we evaluate the quality of the recommendations in a setting in which user profiles are instead static.

The user profile starts from the first GIVEN_K events (or, alternatively, from the last -GIVEN_K events if GIVEN_K<0).The recommendations are evaluated against the next LOOK_AHEAD events (the ground truth).

The user profile is not extended and the ground truth doesn't move forward. This allows to obtain "snapshots" of the recommendation performance for different user profile and ground truth lenghts.

Also here you can set the LOOK_AHEAD='all' to see what happens if you had to recommend a whole sequence instead of a set of a set of alternatives to a user.

TOPN=100
GIVEN_K = 1
LOOK_AHEAD = 'all'
STEP=1

test_sequences = get_test_sequences(test_data, GIVEN_K)
print('{} sequences available for evaluation'.format(len(test_sequences)))

results = evaluation.sequential_evaluation(recommender,
                                          test_sequences=test_sequences,
                                          given_k=GIVEN_K,
                                          look_ahead=LOOK_AHEAD,
                                          evaluation_functions=METRICS.values(),
                                          top_n=TOPN,
                                          scroll=False) 

print('Sequential evaluation (GIVEN_K={}, LOOK_AHEAD={}, STEP={})'.format(GIVEN_K, LOOK_AHEAD, STEP))
for mname, mvalue in zip(METRICS.keys(), results):
  print('\t{}@{}: {:.4f}'.format(mname, TOPN, mvalue))

Popularity-based Recommender

precision@100: 0.0086 recall@100: 0.1777 mrr@100: 0.0491

Next-item Recommendation evaluation

We propose to analyse the performance of the recommender system in the scenario of next-item recommendation over the following dimensions:

the length of the recommendation list

GIVEN_K = 1
LOOK_AHEAD = 1
STEP = 1
topn_list = [1, 5, 10, 20, 50, 100]

test_sequences = get_test_sequences(test_data, GIVEN_K)
print('{} sequences available for evaluation'.format(len(test_sequences)))

res_list = []

for topn in topn_list:
    print('Evaluating recommendation lists with length: {}'.format(topn))
    res_tmp = evaluation.sequential_evaluation(recommender,
                                              test_sequences=test_sequences,
                                              given_k=GIVEN_K,
                                              look_ahead=LOOK_AHEAD,
                                              evaluation_functions=METRICS.values(),
                                              top_n=topn,
                                              scroll=True,  # here we average over all profile lengths
                                              step=STEP)
    mvalues = list(zip(METRICS.keys(), res_tmp))
    res_list.append((topn, mvalues))

# show separate plots per metric
fig, axes = plt.subplots(nrows=1, ncols=len(METRICS), figsize=(15,5))
res_list_t = list(zip(*res_list))
for midx, metric in enumerate(METRICS):
    mvalues = [res_list_t[1][j][midx][1] for j in range(len(res_list_t[1]))]
    ax = axes[midx]
    ax.plot(topn_list, mvalues)
    ax.set_title(metric)
    ax.set_xticks(topn_list)
    ax.set_xlabel('List length')

the length of the user profile.

given_k_list = [1, 2, 3, 4]
LOOK_AHEAD = 1
STEP = 1
TOPN = 20

# ensure that all sequences have the same minimum length 
test_sequences = get_test_sequences(test_data, max(given_k_list))
print('{} sequences available for evaluation'.format(len(test_sequences)))

res_list = []

for gk in given_k_list:
    print('Evaluating profiles having length: {}'.format(gk))
    res_tmp = evaluation.sequential_evaluation(recommender,
                                              test_sequences=test_sequences,
                                              given_k=gk,
                                              look_ahead=LOOK_AHEAD,
                                              evaluation_functions=METRICS.values(),
                                              top_n=TOPN,
                                              scroll=False,  # here we stop at each profile length
                                              step=STEP)
    mvalues = list(zip(METRICS.keys(), res_tmp))
    res_list.append((gk, mvalues))

# show separate plots per metric
fig, axes = plt.subplots(nrows=1, ncols=len(METRICS), figsize=(15,5))
res_list_t = list(zip(*res_list))
for midx, metric in enumerate(METRICS):
    mvalues = [res_list_t[1][j][midx][1] for j in range(len(res_list_t[1]))]
    ax = axes[midx]
    ax.plot(given_k_list, mvalues)
    ax.set_title(metric)
    ax.set_xticks(given_k_list)
    ax.set_xlabel('Profile length')

NOTE: This evaluation is by no means exhaustive, as different the hyper-parameters of the recommendation algorithm should be carefully tuned before drawing any conclusions.

References

https://github.com/mquad/sars_tutorial
https://arxiv.org/pdf/1802.08452.pdf
https://recsys.acm.org/wp-content/uploads/2019/09/recsys-19-material-predictability.pdf
https://mquad.github.io/static/papers/2015-large-scale-music-rec.pdf
http://ceur-ws.org/Vol-1441/recsys2015_poster13.pdf
Evaluation of Session-based Recommendation Algorithms [arXiv:1803.09587]

FSM