Contextual Bandit Recommenders

Posted on Fri, Apr 9, 2021 · Reinforcement Learning · Dash App

Code is available on GitHub.

The main purpose of these apps is to address the cold-start problem and the challenge of online, real-time learning.

App 1

The objective of this app is to apply bandit algorithms to a recommendation problem in a simulated environment. Although in practice we would also use real data, even this simple setting reveals the complexity of the recommendation problem and the associated algorithmic challenges.

Inspired by these works: Simulation, Colab notebook, Blog post, and Blog post.
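To make the simulated setting concrete, here is a minimal sketch of one classic bandit algorithm, epsilon-greedy, applied to item recommendation. The environment, item click-through rates, and all parameter values are illustrative assumptions, not taken from the app itself:

```python
import random

def simulate_epsilon_greedy(true_ctr, n_steps=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a toy recommendation problem: each arm is an
    item with a fixed but unknown click-through rate (assumed values)."""
    rng = random.Random(seed)
    n_arms = len(true_ctr)
    counts = [0] * n_arms        # impressions per item
    values = [0.0] * n_arms      # estimated CTR per item
    clicks = 0
    for _ in range(n_steps):
        if rng.random() < epsilon:                        # explore
            arm = rng.randrange(n_arms)
        else:                                             # exploit best estimate
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1 if rng.random() < true_ctr[arm] else 0  # simulated click
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        clicks += reward
    return values, clicks

est, clicks = simulate_epsilon_greedy([0.02, 0.05, 0.10])
```

Even this tiny simulation exhibits the exploration/exploitation trade-off the app demonstrates: with too little exploration the agent can lock onto a mediocre item, and with too much it wastes traffic on known-bad ones.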

App 2

The objective of this app is to apply contextual bandit algorithms to a recommendation problem in a simulated environment. The recommender agent quickly adapts to changing user behavior and adjusts its recommendation strategy accordingly.
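As a sketch of what "contextual" adds, here is a minimal disjoint LinUCB agent (one ridge-regression model per item) run against an assumed toy environment; the dimensions, reward function, and `alpha` value are illustrative choices, not the app's actual configuration:

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: a separate ridge-regression reward model per arm."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # X^T X + I per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # X^T y per arm

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # mean estimate + exploration bonus (upper confidence bound)
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

rng = np.random.default_rng(0)
true_theta = rng.normal(size=(3, 5))   # hidden per-item preference (toy)
agent = LinUCB(n_arms=3, dim=5)
total_reward = 0.0
for t in range(2000):
    x = rng.normal(size=5)                       # user/context features
    arm = agent.select(x)
    reward = float(true_theta[arm] @ x > 0)      # simulated binary feedback
    agent.update(arm, x, reward)
    total_reward += reward
```

Because the choice depends on the context vector `x`, the same agent can recommend different items to different users, which is exactly what lets it track shifting user behavior.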

App 3

The objective is to recommend products and adapt the model in real time from user feedback using an actor-critic algorithm. Suppose we have observed a user's behavior and collected the products they clicked on. This click history is fed into the actor network, which decides what the user might like next by producing an "ideal" product embedding. Comparing this embedding against the catalogue's product embeddings yields similarity scores, and the closest match is recommended to the user. The critic scores the actor's choices, helping it learn from its mistakes.

Inspired by this project.
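A heavily simplified sketch of that pipeline, using linear stand-ins for the actor and critic networks (the real project uses neural networks; the embeddings, learning rate, and update rule here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_products, dim = 50, 8
product_emb = rng.normal(size=(n_products, dim))   # toy catalogue embeddings
W_actor = rng.normal(scale=0.1, size=(dim, dim))   # linear actor (toy stand-in)
w_critic = rng.normal(scale=0.1, size=2 * dim)     # linear critic Q(s, a)

def recommend(clicked_ids):
    """Actor turns the click history into an 'ideal' embedding; we then
    recommend the most cosine-similar product in the catalogue."""
    state = product_emb[clicked_ids].mean(axis=0)  # summarise the history
    ideal = W_actor @ state                        # actor's ideal embedding
    sims = product_emb @ ideal / (
        np.linalg.norm(product_emb, axis=1) * np.linalg.norm(ideal) + 1e-9)
    return int(np.argmax(sims)), state, ideal

def train_step(clicked_ids, reward, lr=0.01):
    """One DDPG-style update: the critic regresses toward the observed
    reward, and the actor follows the critic's gradient w.r.t. the action."""
    global w_critic, W_actor
    _, state, ideal = recommend(clicked_ids)
    sa = np.concatenate([state, ideal])
    q = w_critic @ sa
    w_critic += lr * (reward - q) * sa             # critic: squared-error step
    dq_da = w_critic[dim:]                         # ∂Q/∂action
    W_actor += lr * np.outer(dq_da, state)         # actor: ascend the critic

rec, _, _ = recommend([3, 7, 11])
```

The point of the sketch is the data flow: history → actor → ideal embedding → nearest product, with the critic supplying the training signal from observed feedback.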

App 4

The core intuition is that we cannot blindly apply RL algorithms in a production system out of the box: the learning period would be too costly. Instead, we need to leverage the vast amounts of offline training examples to make the algorithm perform as well as the current system before releasing it into the online production environment. The agent is first given access to many offline training examples produced by a fixed policy; only then does it get access to the online system, where it chooses its own actions.

Inspired by these works: Blog post, Blog post, and RecoGym.
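The two-phase recipe can be sketched as follows. The environment, the uniform-random logging policy, and the ridge-regression reward model are all illustrative assumptions chosen to keep the example self-contained:

```python
import numpy as np

rng = np.random.default_rng(1)
n_arms, dim = 4, 6
true_theta = rng.normal(size=(n_arms, dim))   # hidden environment (toy)

def reward(arm, x):
    return float(true_theta[arm] @ x + rng.normal(scale=0.1))

# Phase 1: offline. A fixed (here, uniform-random) logging policy
# produced these (context, action, reward) records.
logs = []
for _ in range(5000):
    x = rng.normal(size=dim)
    a = int(rng.integers(n_arms))             # fixed logging policy
    logs.append((x, a, reward(a, x)))

# Warm-start one ridge-regression reward model per arm from the logs.
A = [np.eye(dim) for _ in range(n_arms)]
b = [np.zeros(dim) for _ in range(n_arms)]
for x, a, r in logs:
    A[a] += np.outer(x, x)
    b[a] += r * x
theta_hat = np.stack([np.linalg.solve(A[a], b[a]) for a in range(n_arms)])

# Phase 2: online. The warm-started agent now chooses its own actions,
# so it performs well from the very first online interaction.
online_reward = 0.0
for _ in range(1000):
    x = rng.normal(size=dim)
    a = int(np.argmax(theta_hat @ x))         # greedy on the pretrained model
    online_reward += reward(a, x)
```

The offline phase does the expensive learning on logged data, so the agent never has to pay the cost of random exploration in front of live users.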

References

  1. LinUCB Contextual News Recommendation
  2. Experiment with Bandits
  3. n-armed Bandit Recommender
  4. Bandit Algorithms for Website Optimization [eBook O’Reilly] [GitHub] [Colab]
  5. MAB Ranking PyPI
  6. RecSim GitHub, Video, Medium
  7. https://vowpalwabbit.org/tutorials/contextual_bandits
  8. https://github.com/sadighian/recommendation-gym
  9. https://learning.oreilly.com/library/view/reinforcement-learning-pocket/9781098101527/ch02
  10. https://github.com/awarebayes/RecNN/
  11. https://vowpalwabbit.org/neurips2019/
  12. https://github.com/criteo-research/reco-gym
  13. https://pypi.org/project/SMPyBandits/
  14. https://github.com/bgalbraith/bandits
  15. https://pypi.org/project/mab-ranking/
  16. https://www.optimizely.com/optimization-glossary/multi-armed-bandit/
  17. https://abhishek-maheshwarappa.medium.com/multi-arm-bandits-for-recommendations-and-a-b-testing-on-amazon-ratings-data-set-9f802f2c4073