A Tesla store wants to predict the preferences of potential buyers so that the right Tesla variant can be recommended to maximize the probability of conversion. The data would come from three social media profiles of those potential buyers.
Note: Social media data is collected with the users' consent and in compliance with data privacy regulations.
Rule engine
The system will then be fine-tuned and used to recommend Tesla models based on user queries. A business rule engine will be integrated with it to improve accuracy.
Derived rules would look like this:
- A mention of a car or truck, or no mention of a vehicle type, means Cybertruck
- A mention of an SUV means Model X
- Mentions of a large family or many people mean Model X
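The rules above can be sketched as a small keyword matcher. This is only an illustration: the keyword lists, the rule order (specific rules checked before the generic car/truck rule), and the names `RULES`, `DEFAULT_MODEL` and `recommend` are assumptions, not part of the original system.

```python
import re

# Ordered rules mirroring the bullets above. Keyword lists and the
# precedence (specific rules first) are illustrative assumptions.
RULES = [
    (("suv",), "Model X"),
    (("large family", "many people"), "Model X"),
    (("car", "truck"), "Cybertruck"),
]
DEFAULT_MODEL = "Cybertruck"  # used when no vehicle type is mentioned


def recommend(text: str) -> str:
    """Return the model name of the first rule whose keyword appears in text."""
    lowered = text.lower()
    for keywords, model in RULES:
        for keyword in keywords:
            # Whole-word match so that "car" does not fire on "careful".
            if re.search(rf"\b{re.escape(keyword)}\b", lowered):
                return model
    return DEFAULT_MODEL
```

For example, `recommend("We are a large family")` returns `"Model X"`, while a text with no vehicle mention falls through to the default.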
Input and Output
Public datasets
Primary (available for academic use only, need university affiliation for access)
Secondary (low-quality data; may not be usable at all)
- Hacker News Posts
- TechCrunch Posts Compilation
- Instagram image data HowTo
- Flickr Large with likes and comments
- The Images of Groups Dataset
- http://www.multimediaeval.org/datasets/
- The InstaCities1M Dataset
- Multimodal Meme Classification: Identifying Offensive Content in Image and Text
- Understanding Police Social Media Usage Through Posts and Tweets
Scope
- media content categories: text and images
- platforms: Facebook, Twitter and Instagram
- implicit rating categories: like, comment, share
- columns: userid, timestamp, platform, type, content, rating
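One way to pin down the scope above is a typed record with the six listed columns. The sketch below assumes lowercase string codes for platform, type and rating; the class name `Interaction` and the validation logic are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

# Allowed values taken from the scope list; the lowercase codes are an assumption.
PLATFORMS = {"facebook", "twitter", "instagram"}
CONTENT_TYPES = {"text", "image"}
RATINGS = {"like", "comment", "share"}


@dataclass(frozen=True)
class Interaction:
    """One row of the interaction table: userid, timestamp, platform, type, content, rating."""
    userid: str
    timestamp: datetime
    platform: str
    type: str
    content: str  # raw text, or a path/URL for an image
    rating: str

    def __post_init__(self):
        # Reject values that fall outside the declared scope.
        if self.platform not in PLATFORMS:
            raise ValueError(f"unknown platform: {self.platform}")
        if self.type not in CONTENT_TYPES:
            raise ValueError(f"unknown content type: {self.type}")
        if self.rating not in RATINGS:
            raise ValueError(f"unknown rating: {self.rating}")
```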
Model Framework
Model framework 1
- Convert the user's natural language query into a vector using the Universal Sentence Encoder model
- Create a binary product-specs matrix based on different categories
- Find the TopK similar query vectors using cosine distance
- For each TopK vector, find the TopM product specs using interaction table weights
- For each TopM specification, find TopN similar specs using binary matrix
- Show all the qualified product specifications
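The retrieval steps above can be sketched with toy data. Everything here is an assumption made for illustration: the three-dimensional vectors stand in for real sentence embeddings, the spec names and weights are invented, and ranking by highest cosine similarity is used as the equivalent of lowest cosine distance.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, candidates, k):
    """Return the k keys of candidates most similar to query."""
    ranked = sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)
    return ranked[:k]


# Toy stand-ins (hypothetical): stored query embeddings, interaction weights,
# and a binary spec matrix with columns [family, performance, utility].
past_queries = {"q_family": [0.9, 0.1, 0.0], "q_speed": [0.1, 0.9, 0.2], "q_offroad": [0.0, 0.2, 0.9]}
interaction = {
    "q_family": {"seven_seats": 3.0, "awd": 1.0},
    "q_speed": {"fast_accel": 4.0, "awd": 0.5},
    "q_offroad": {"awd": 2.0, "high_clearance": 2.5},
}
spec_matrix = {
    "seven_seats": [1, 0, 0],
    "rear_entertainment": [1, 0, 0],
    "fast_accel": [0, 1, 0],
    "awd": [0, 1, 1],
    "high_clearance": [0, 0, 1],
}


def recommend_specs(query_vec, k=1, m=1, n=1):
    """TopK similar queries -> TopM specs by weight -> TopN similar specs each."""
    results = []
    for q in top_k(query_vec, past_queries, k):
        weights = interaction[q]
        for spec in sorted(weights, key=weights.get, reverse=True)[:m]:
            others = {s: v for s, v in spec_matrix.items() if s != spec}
            for name in [spec] + top_k(spec_matrix[spec], others, n):
                if name not in results:
                    results.append(name)
    return results
```

With a family-leaning query vector such as `[0.8, 0.2, 0.0]`, the nearest stored query is `q_family`, so the family-related specs surface first.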
Model framework 2
- Seed data: 10 users with ground-truth persona, media content and implicit ratings
- Inflated data: 10 users with media content and implicit ratings
- media content → Implicit rating (A)
- media content → feature vector (B) + (A) → weighted pooling → similar users (C)
- media content → QA model → slot filling → global pooling → item associations (D)
- (C) → content-based filtering → item recommendations → (D) → top-k recommendations
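The weighted pooling step could look roughly like the sketch below, where each content feature vector (B) is weighted by its implicit rating (A). The numeric weights for like, comment and share are purely an assumption, as are the names `RATING_WEIGHTS` and `weighted_pool`.

```python
# Assumed weights: a share signals more interest than a comment, a comment more than a like.
RATING_WEIGHTS = {"like": 1.0, "comment": 2.0, "share": 3.0}


def weighted_pool(items):
    """Pool (feature_vector, rating) pairs into one user vector.

    Each vector is scaled by its rating weight and the sum is divided
    by the total weight, i.e. a weighted average.
    """
    total = sum(RATING_WEIGHTS[rating] for _, rating in items)
    pooled = [0.0] * len(items[0][0])
    for vector, rating in items:
        weight = RATING_WEIGHTS[rating]
        for i, value in enumerate(vector):
            pooled[i] += weight * value
    return [value / total for value in pooled]
```

The resulting user vectors can then be compared with cosine similarity to find similar users (C).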
User selection
- People who are connected to social media community of electric vehicles
- Seed users are those who already own an electric vehicle
- Inflated users are those who do not own an EV but are inclined to purchase one
- Users with a presence on all three sites, or at least two
Model framework 3
User-User Similarity (clustering)
- User → Media content → Embedding → Average pooling
- Cosine similarity of the user's social vector with other users' social vectors
User-Item Similarity (reranking)
- User → Implicit Rating on media content M → M's correlation with item features
- Item features: familySize
- Cosine similarity of the user's social vector with the item's feature vector
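A minimal sketch of both similarity steps, assuming each piece of media content has already been embedded. The two-dimensional vectors, the user names, and the idea that item feature vectors live in the same space as user vectors are all assumptions made for illustration.

```python
import math


def average_pool(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy content embeddings per user (hypothetical); dimensions are
# [performance interest, family interest].
user_posts = {
    "alice": [[1.0, 0.0], [0.8, 0.2]],
    "bob": [[0.9, 0.1]],
    "carol": [[0.0, 1.0], [0.2, 0.8]],
}
user_vectors = {user: average_pool(posts) for user, posts in user_posts.items()}

# Toy item feature vectors in the same space (an assumption).
item_vectors = {"Model X": [0.1, 0.9], "Cybertruck": [0.9, 0.1]}


def most_similar_user(user):
    """User-user step: the other user whose social vector is closest."""
    others = [u for u in user_vectors if u != user]
    return max(others, key=lambda u: cosine(user_vectors[user], user_vectors[u]))


def rank_items(user):
    """User-item step: items ordered by similarity to the user's social vector."""
    return sorted(item_vectors, key=lambda i: cosine(user_vectors[user], item_vectors[i]), reverse=True)
```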
Model framework 4
- Text → Prepare → Vectorize → Average → Similar Users
- Text → Prepare → QA → Slot filling
- Image → Prepare → Vectorize → Average → Similar Users
- Image → Prepare → VQA → Slot filling
- Image → Similar Image from users → Detailed enquiry
Model framework 5
- Topic Clusters Text
- Topic Clusters Image
- Fetch raw text and images
- Combine, Clean and Store text in text dataframe
- Vectorize Texts
- Cosine similarities of texts with topic clusters
- Vectorize Images
- Cosine similarities of images with topic clusters
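The topic-scoring step can be sketched as follows, assuming text and image vectors share a space with the topic cluster centroids. The topic names, the one-hot centroids, and averaging the similarities per topic are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical topic cluster centroids; real ones would come from clustering.
topic_centroids = {
    "family": [1.0, 0.0, 0.0],
    "performance": [0.0, 1.0, 0.0],
    "outdoors": [0.0, 0.0, 1.0],
}


def topic_scores(content_vectors):
    """Average cosine similarity of a user's content vectors with each topic centroid."""
    scores = {}
    for topic, centroid in topic_centroids.items():
        similarities = [cosine(vector, centroid) for vector in content_vectors]
        scores[topic] = sum(similarities) / len(similarities)
    return scores
```

The per-topic scores could then feed the rule engine, for example by mapping the highest-scoring topic to a product.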
Experiment 1
Experiment 2
Facebook Scraping
Twitter Scraping
Dataframe
Insta Image Grid
User Text NER
Experiment 3
Topic scores
JSON rules
Results and Discussion
- API with three input fields: Facebook username, Twitter handle and Instagram username
- The system automatically scrapes the user's publicly available text and images from these three social media platforms and returns a list of recommended products, ordered from most to least preferred
References
- Content-based Recommender System using Social Networks for Cold-start Users
- Social Media-based User Embedding: A Literature Review
- Understanding Consumer Preferences from Social Media Data
- Word Embeddings for User Profiling in Online Social Networks
- Social Network Embeddings
- Learning Invariant Representations of Social Media Users, git
- Every Document Owns Its Structure: Inductive Text Classification via Graph Neural Networks
- https://arxiv.org/pdf/1908.07738.pdf