Azure Personalizer is a cloud-based API service that helps developers create rich, personalized experiences for each user of an app. It learns from users' real-time behavior and uses reinforcement learning to select the best item (action) based on collective behavior and reward scores across all users. Actions are the content items, such as news articles, specific movies, or products. The service takes a list of items (e.g., drop-down choices) and their context (e.g., report name, user name, time zone) as input and returns the items ranked for the given context. It also accepts feedback on the relevance and effectiveness of the ranking results it returns. This feedback (the reward score) can be calculated automatically and submitted to the service based on the given personalization use case.
You can use the Personalizer service to determine which product to suggest to shoppers or to figure out the optimal position for an advertisement. After the content is shown to the user, your application monitors the user's reaction and reports a reward score back to the Personalizer service. This ensures continuous improvement of the machine learning model and strengthens Personalizer's ability to select the best content item based on the contextual information it receives.
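To make this Rank/Reward loop concrete, here is a minimal sketch against the Personalizer REST API; the endpoint, key, feature names, and reward logic below are placeholder assumptions for illustration, not values from this article:

```python
import uuid
import requests

# Placeholder values; substitute your own resource endpoint and key.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
API_KEY = "<your-personalizer-key>"
HEADERS = {"Ocp-Apim-Subscription-Key": API_KEY, "Content-Type": "application/json"}

# 1. Ask Personalizer to rank the candidate actions for the current context.
event_id = str(uuid.uuid4())
rank_request = {
    "eventId": event_id,
    "contextFeatures": [{"timeOfDay": "morning"}, {"device": "mobile"}],
    "actions": [
        {"id": "article-a", "features": [{"topic": "politics"}]},
        {"id": "article-b", "features": [{"topic": "sports"}]},
    ],
}
rank_response = requests.post(
    f"{ENDPOINT}/personalizer/v1.0/rank", headers=HEADERS, json=rank_request
).json()
best_action = rank_response["rewardActionId"]  # show this item to the user

# 2. After observing the user's reaction, report a reward for the same event.
reward = 1.0  # e.g., 1.0 if the user engaged with the suggested item, 0.0 otherwise
requests.post(
    f"{ENDPOINT}/personalizer/v1.0/events/{event_id}/reward",
    headers=HEADERS,
    json={"value": reward},
)
```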
Content is any unit of information, such as text, images, URLs, or emails, that you want to select from and show to your users. It is essential to design well-thought-out features that represent the items and their context as effectively as possible for the objective of the personalization use case. Exploration ensures that Personalizer continues to deliver good results even as user behavior changes, and it avoids model stagnation, drift, and ultimately lower performance. The learning policy determines the specific hyperparameters for model training. Policies can be optimized offline (using offline evaluation) and then used online, and they can be imported and exported for future reference, reuse, and audit. Personalizer starts with a default learning policy, which can yield moderate performance. As part of optimization, you can run evaluations that allow Personalizer to create new learning policies optimized specifically for a given use case; the optimized policies generated during evaluation perform significantly better for each specific loop. Don't use Personalizer where the personalized behavior isn't something that can be discovered across all users, but rather something that should be remembered for specific users or that comes from a user-specific list of alternatives.
The Personalizer service can return a rank very rapidly, and Azure will auto-scale as needed to maintain the rapid generation of ranking results. Throughput is calculated by adding the sizes of the action and context JSON documents and factoring in a rate of 20 MB/sec. Context and actions (items) are expressed as JSON objects sent with the Rank API call. JSON objects can include nested JSON objects and simple property/value pairs, and arrays can be included if the items are numbers. Once embedded, Personalizer can skip the learning curve by using apprentice mode. When this feature is switched on, the service learns alongside your existing solution without being exposed to users until it meets your performance threshold.
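As a concrete illustration of these shape rules, here is a minimal sketch of an action and a set of context features; the feature names are invented for illustration and are not part of this article:

```python
# Illustrative sketch only; feature names below are invented.
action = {
    "id": "product-a",
    "features": [
        {
            "weightKg": 3,                             # simple property/value
            "shipping": {"days": 2, "express": True},  # nested JSON object
            "monthlySales": [120, 95, 143],            # arrays are fine when items are numbers
        }
    ],
}

context_features = [
    {"device": "mobile", "timeZone": "UTC+1"},  # simple property/values
    {"user": {"spendingTier": 2}},              # nested JSON object
]
```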
The following are some interesting use cases for Azure Personalizer:
- Blog Recommender [Video tutorial, GitHub]
- Food Personalizer [Video tutorial, Slideshare, Code Blog]
- Coffee Personalizer [GitHub, Video tutorial]
- News Recommendation
- Content Type - News list
- Actions (with features) - The president... (national, politics, [text]), Hurricane in the ... (regional, weather, [text, image]), Premier League ... (global, sports, [text, image, video])
- Context Features - Device news is read from, Month, Season
- Returned Reward Action ID (display this content) - The president...
- Movie Recommendation
- Content Type - Movie list
- Actions (with features) - Star Wars (1977, [action, adventure, fantasy], George Lucas), Hoop Dreams (1994, [documentary, sports], Steve James), Casablanca (1942, [romance, drama, war], Michael Curtiz)
- Context Features - Device movie is watched from, Screen size, Type of user
- Returned Reward Action ID (display this content) - Casablanca
- Product Recommendation
- Content Type - Products list
- Actions (with features) - Product A (3 kg, $$$$, deliver in 24 hours), Product B (20 kg, $$, 2 week shipping with customs), Product C (3 kg, $$$, delivery in 48 hours)
- Context Features - Device shopping is done from, Spending tier of user, Month or season
- Returned Reward Action ID (display this content) - Product B
- Intent clarification & disambiguation: help your users have a better experience when their intent is not clear by providing an option that is personalized.
- Default suggestions for menus & options: have the bot suggest the most likely item in a personalized way as a first step, instead of presenting an impersonal menu or list of alternatives.
- Bot traits & tone: for bots that can vary tone, verbosity, and writing style, consider varying these traits.
- Notification & alert content: decide what text to use for alerts in order to engage users more.
- Notification & alert timing: have personalized learning of when to send notifications to users to engage them more.
- Dropdown Options
- Different users of an application with manager privileges would see a list of reports that they can run. Before Personalizer was implemented, the list of dozens of reports was displayed in alphabetical order, requiring most managers to scroll through the lengthy list to find the report they needed. This created a poor user experience for daily users of the reporting system, making it a good use case for Personalizer. The tooling learned from user behavior and began to rank frequently run reports at the top of the dropdown list. Frequently run reports would differ from user to user and would change over time for each manager as they were assigned to different projects. This is exactly the situation where Personalizer's reward score-based learning models come into play.
- Reward Score Calculation
- The reward score was calculated based on which report the user actually selected from the ranked dropdown list, as follows:
- If the user selected the 1st report from the ranked list, then reward score of 1
- If the user selected the 2nd report from the ranked list, then reward score of 0.5
- If the user selected the 3rd report from the ranked list, then reward score of 0
- If the user selected the 4th report from the ranked list, then reward score of -0.5
- If the user selected the 5th report or beyond from the ranked list, then reward score of -1
- Results
- Projects in Timesheet
- Every employee in the company logs a daily timesheet listing all of the projects they are assigned to, along with other items such as overhead. Depending on the employee's project allocations, the timesheet table could list anywhere from a few to a couple dozen active projects. Even though an employee may be assigned to several projects, particularly at the lead and manager levels, they typically log time against only 2 to 3 of them over any stretch of a few weeks to months.
- Before personalization, the projects in the timesheet table were listed in alphabetical order, again resulting in a poor user experience. Even more troublesome, frequent user errors led to time being accidentally logged in the wrong row. Personalizer was a good fit for this use case as well, allowing the system to rank projects in the timesheet table based on the time-logging patterns of each user.
- Reward Score Calculation
- The reward score for this use case was calculated based on the proximity between the ranking of projects returned by Personalizer and the rows in which the user actually logged time, as follows:
- Time logged in the 1st row of the ranked timesheet table, then reward score of 1
- Time logged in the 2nd row of the ranked timesheet table, then reward score of 0.6
- Time logged in the 3rd row of the ranked timesheet table, then reward score of 0.4
- Time logged in the 4th row of the ranked timesheet table, then reward score of 0.2
- Time logged in the 5th row of the ranked timesheet table, then reward score of 0
- Time logged in the 6th row of the ranked timesheet table, then reward score of -0.5
- Time logged in the 7th row or beyond of the ranked timesheet table, then reward score of -1
- The above approach to reward score calculation assumes that, most of the time, users will not need to fill out their timesheet for more than 5 projects at a given time. Hence, when a user logs time against multiple projects, the per-row scores can be summed and then capped to the range -1 to 1 before calling the Personalizer Reward API (see the sketch below).
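A minimal sketch of this calculation (function names are illustrative; the per-row scores come from the list above):

```python
def row_score(row_position: int) -> float:
    """Per-row reward from the table above: row 1 -> 1.0, ..., row 7 or beyond -> -1.0."""
    scores = {1: 1.0, 2: 0.6, 3: 0.4, 4: 0.2, 5: 0.0, 6: -0.5}
    return scores.get(row_position, -1.0)

def timesheet_reward(logged_rows: list[int]) -> float:
    """Sum the per-row scores, then cap to [-1, 1] before calling the Reward API."""
    total = sum(row_score(r) for r in logged_rows)
    return max(-1.0, min(1.0, total))

# Example: time logged in ranked rows 1, 2, and 6 -> 1.0 + 0.6 - 0.5 = 1.1, capped to 1.0
print(timesheet_reward([1, 2, 6]))  # 1.0
```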
- Results