What is Bandit-based Recommendation?

Traditionally, recommendation was treated as a simple classification or prediction problem. However, the problem is sequential in nature: each recommendation affects the user's subsequent behavior and the feedback the system receives. Accordingly, it can be formulated as a Markov decision process (MDP) and solved with reinforcement learning (RL) methods. A multi-armed bandit is the simplest such formulation, with a single state and no state transitions. Recent advances in combining deep learning with traditional RL methods, i.e., deep reinforcement learning (DRL), have made it possible to apply RL to recommendation problems with massive state and action spaces.

Use case 1: Personalized recommendations

Goal: Quickly help users find products they would like to buy

In e-commerce and other digital domains, companies frequently want to offer personalized product recommendations to users. This is hard when you don’t yet know much about the customer or which features of a product are pertinent. With limited information about which actions to take and what their payoffs will be, and with limited resources to explore the competing actions, it is hard to know what to do. This trade-off between exploring uncertain options and exploiting the best-known one is what bandit algorithms are designed to handle.
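
To make the explore/exploit idea concrete, here is a minimal sketch of an epsilon-greedy bandit, one of several possible bandit strategies and not necessarily the one a production recommender would use. Each product is treated as an arm and a purchase as a reward of 1. The names EpsilonGreedyRecommender, simulate, and true_buy_rates are hypothetical, and the purchase probabilities are invented purely for the simulation.

```python
import random


class EpsilonGreedyRecommender:
    """Epsilon-greedy bandit over a fixed set of items (hypothetical helper)."""

    def __init__(self, items, epsilon=0.1, seed=0):
        self.items = list(items)
        self.epsilon = epsilon              # probability of exploring
        self.rng = random.Random(seed)      # private RNG for reproducibility
        self.counts = {item: 0 for item in self.items}
        self.values = {item: 0.0 for item in self.items}  # mean reward so far

    def recommend(self):
        # Explore: with probability epsilon, pick an item uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)
        # Exploit: otherwise pick the item with the highest estimated reward.
        return max(self.items, key=lambda item: self.values[item])

    def update(self, item, reward):
        # Incremental update of the running mean reward for this item.
        self.counts[item] += 1
        self.values[item] += (reward - self.values[item]) / self.counts[item]


def simulate(true_buy_rates, rounds=5000, epsilon=0.1, seed=0):
    """Run the bandit against simulated users (assumed Bernoulli purchases)."""
    env_rng = random.Random(seed + 1)       # separate RNG for the environment
    bandit = EpsilonGreedyRecommender(true_buy_rates, epsilon, seed)
    for _ in range(rounds):
        item = bandit.recommend()
        bought = env_rng.random() < true_buy_rates[item]
        bandit.update(item, 1.0 if bought else 0.0)
    return bandit


if __name__ == "__main__":
    # Made-up purchase probabilities, for illustration only.
    bandit = simulate({"shoes": 0.05, "hat": 0.10, "jacket": 0.30})
    for item in bandit.items:
        print(item, bandit.counts[item], round(bandit.values[item], 3))
```

Under these assumptions the learner needs no prior knowledge of the customer or the products: it starts with zero estimates and refines them from observed purchases. A larger epsilon spends more traffic on exploration, while a smaller one commits sooner to the current best guess.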

Use case 2: Online model evaluation

Goal: Compare recommender models and find the best-performing one

Use case 3: Personalized re-ranking

Goal: Bring the most relevant option to the top

Use case 4: Personalized feeds

Goal: Recommend a never-ending feed of items (news, products, images, music)