Wehkamp

Democratizing data for better shopping experiences

Dutch retailer Wehkamp offers its shoppers a wide range of quality products, carrying the latest fashion trends, home goods, electronics and everything in between. As a leading e-commerce company in fashion in the Netherlands, it dedicates itself to providing a better shopping experience for its customers, and continually looks for ways not only to engage shoppers on its site but also to create opportunities for its brand partners to clearly demonstrate their value.

Their main marketing focus is relevance: ensuring that shoppers can find what they need in the most efficient way. This puts shoppers in a purchasing frame of mind when they visit Wehkamp’s website. Using Spark, the data science team develops various machine-learning projects for this purpose, based on large-scale product and customer data. A major topic for the team is ranking products. If a visitor enters a search phrase, which products best fit that phrase, and in what order should they be shown? Ranking is equally important when a visitor opens a product overview page, where hundreds or even thousands of products of a certain article type are displayed.

For instance, when a user searches for 'jeans' on the Wehkamp website, it returns 4400+ products. When the user navigates to the 'ladies jeans' overview page, the result page still lists 2176 products. The goal is therefore to order the returned products by relevance to the user's query, with the most relevant products shown first.
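To make "ordering by relevance" measurable, a ranking is usually scored against known relevance labels. The sketch below uses NDCG (normalized discounted cumulative gain), a common ranking metric; the source does not say which metric Wehkamp used, so the metric choice, the function names and the example labels are all assumptions made for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: relevance discounted by log2 of the rank."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """NDCG of a ranked list of relevance labels, optionally cut off at rank k.

    `relevances` holds the labels in the order the products were shown.
    """
    shown = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)
    ideal = ideal[:k] if k else ideal
    ideal_dcg = dcg(ideal)
    return dcg(shown) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical relevance labels for three products, in displayed order.
perfect_order = ndcg([3, 2, 1])   # most relevant product first
reversed_order = ndcg([1, 2, 3])  # most relevant product last
```

A ranking that already shows the most relevant products first scores 1.0, and any other ordering of the same products scores lower.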

System Design

  1. Data collection
  2. Click model - for relevancy scores
  3. Feature generation - for explaining relevancy
  4. Ranking model - for estimating weights to features
  5. Serve model (Elasticsearch LTR) - for productionising
  6. Evaluation (Tableau)

Tech stack

Data collection

Raw Google Analytics feed (daily) → Google BigQuery → S3 bucket → Spark

Click model

Objective: predict the relevance of each product from its impressions and clicks, given the position at which it was shown. Two candidate models were considered: a DBN (dynamic Bayesian network) click model and COEC (clicks over expected clicks). COEC gave better results and was easier to train and explain.
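The idea behind COEC can be sketched in a few lines. A product's observed clicks are divided by the clicks it would be expected to get if it behaved like an average product at the positions where it was shown. The code below is a minimal illustration of that general definition, not Wehkamp's implementation; the log layout `(product_id, position, impressions, clicks)` and the sample numbers are assumptions.

```python
from collections import defaultdict


def coec_scores(logs):
    """Compute clicks over expected clicks per product.

    `logs` is a list of (product_id, position, impressions, clicks) rows.
    """
    # Step 1: average click-through rate per position, over all products.
    pos_impressions = defaultdict(int)
    pos_clicks = defaultdict(int)
    for _, position, impressions, clicks in logs:
        pos_impressions[position] += impressions
        pos_clicks[position] += clicks
    pos_ctr = {p: pos_clicks[p] / pos_impressions[p]
               for p in pos_impressions if pos_impressions[p] > 0}

    # Step 2: per product, observed clicks versus position-expected clicks.
    observed = defaultdict(int)
    expected = defaultdict(float)
    for product_id, position, impressions, clicks in logs:
        observed[product_id] += clicks
        expected[product_id] += impressions * pos_ctr.get(position, 0.0)

    # Products with no expected clicks are left out rather than divided by zero.
    return {pid: observed[pid] / expected[pid]
            for pid in observed if expected[pid] > 0}


# Hypothetical aggregated log rows for two products shown at two positions.
logs = [
    ("A", 1, 100, 10),
    ("B", 1, 100, 30),
    ("A", 2, 100, 10),
    ("B", 2, 100, 10),
]
scores = coec_scores(logs)
```

A score above 1 means the product attracts more clicks than its positions alone would explain, and a score below 1 means fewer. Because position bias is divided out, the score can serve as a relevance label for a ranking model.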

Feature generation

Ranking model

Notebook jobs were used to process the raw data and generate features. An XGBoost model was trained on these features. HyperOpt was used for hyperparameter optimization and MLflow for experiment tracking. SHAP was used to identify feature importances and explain them.

Serve model

The model was saved in an Elasticsearch index.
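Once a model is stored, a search request can ask Elasticsearch to re-rank the top results with it. The sketch below builds such a request body, assuming the open-source Elasticsearch Learning to Rank plugin and its `sltr` rescore query; the source only says "elasticsearch LTR", so the plugin choice is an assumption. The field name `title`, the parameter name `keywords` and the model name `product_ranker` are hypothetical.

```python
import json


def build_ltr_search(keywords, model_name, window_size=1000):
    """Build a search body that rescores the top hits with a stored LTR model."""
    return {
        # First pass: a cheap text match selects candidate products.
        "query": {"match": {"title": keywords}},
        # Second pass: the stored model re-ranks the top `window_size` hits.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": keywords},
                        "model": model_name,
                    }
                }
            },
        },
    }


# Hypothetical query and model name, for illustration only.
body = build_ltr_search("jeans", "product_ranker")
request_json = json.dumps(body, indent=2)
```

Rescoring only a window of top hits keeps model evaluation cheap, since the model runs on at most `window_size` candidates per shard instead of on every matching product.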

Evaluation

References

Applied Machine Learning for Ranking Products in an Ecommerce Setting Arnoud de Munnik Wehkamp Jerry

Wehkamp | Criteo Success Story

How Wehkamp monetizes their website with seamless ad experiences

Customer Story: Wehkamp - Databricks

Learn how with the Databricks unified data analytics platform, Wehkamp is now able to deliver a more personalized online shopping experience.

Applied Machine Learning for Ranking Products in an Ecommerce Setting - Databricks

As a leading e-commerce company in fashion in the Netherlands, Wehkamp dedicates itself to provide a better shopping experience for the customers. Using Spark, the data science team is able to develop various machine-learning projects for this purpose based on the large scale data of products and customers.

wehkamp - Free delivery and returns

Unique collection of fashion, furniture, toys and electronics for women, men and children | free delivery from 20.- * 30-day cooling-off period * free returns!

AWS Case Study: Wehkamp

Wehkamp is the largest e-commerce retailer in the Netherlands, selling more than 17 million products a year and attracting about 350,000 unique customers daily. Wehkamp planned to expand into Belgium, and wanted to employ a new cloud-based platform to do so.