A Tesla store wants to predict the preferences of potential buyers so that the right Tesla variant can be recommended, maximizing the probability of conversion. The data would come from three social media profiles of those potential buyers.

Note: Social media data is collected only with users' consent and in compliance with data privacy regulations.

Rule engine

This system will then be fine-tuned and used to recommend Tesla variants based on queries. A business rule engine will be integrated with it to increase accuracy.

The derived rules would look like this:

Car or truck or no mention of vehicle type means Cybertruck
SUV mention means Model X
Mentions of large family or many people means Model X
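
The rules above can be read as keyword-to-variant mappings. The following is a minimal sketch, assuming plain case-insensitive keyword matching. The function name `recommend_variant`, the vocabulary in `VEHICLE_TYPE_WORDS`, and the order in which the rules are checked are all illustrative assumptions, not part of the original rule engine.

```python
import re

# Hypothetical vocabulary used to decide whether any vehicle type was mentioned.
VEHICLE_TYPE_WORDS = {"car", "truck", "suv", "sedan", "hatchback", "van"}
# Phrases taken from the rule "large family or many people".
FAMILY_PHRASES = ("large family", "many people")


def recommend_variant(text):
    """Return a Tesla variant for a free-text query, or None if no rule fires."""
    lowered = text.lower()
    tokens = set(re.findall(r"[a-z]+", lowered))

    # Rule: SUV mention -> Model X
    if "suv" in tokens:
        return "Model X"
    # Rule: large family or many people -> Model X
    if any(phrase in lowered for phrase in FAMILY_PHRASES):
        return "Model X"
    # Rule: car or truck -> Cybertruck
    if tokens & {"car", "truck"}:
        return "Cybertruck"
    # Rule: no vehicle type mentioned at all -> Cybertruck
    if not tokens & VEHICLE_TYPE_WORDS:
        return "Cybertruck"
    # Some other vehicle type was mentioned; none of the listed rules covers it.
    return None
```

Checking the Model X rules first is a design choice made here so that a query such as "a car for a large family" resolves to Model X; the source does not state a precedence.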

Input and Output

Public datasets

Instagram: 16539 images from 972 Instagram influencers (link)
TechCrunchPosts: (link)
Tweets: (link)

Primary (available for academic use only; access requires a university affiliation)

A Dataset and Benchmarks for Multimedia Social Analysis

Secondary (low-quality data; unclear whether it can be used at all)

Hacker News Posts
TechCrunch Posts Compilation
Instagram image data HowTo
Flickr Large with likes and comments
The Images of Groups Dataset
http://www.multimediaeval.org/datasets/
The InstaCities1M Dataset
Multimodal Meme Classification: Identifying Offensive Content in Image and Text
Understanding Police Social Media Usage Through Posts and Tweets

Scope

media content categories: text and images
platforms: Facebook, Twitter and Instagram
implicit rating categories: like, comment, share
columns: userid, timestamp, platform, type, content, rating
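
The scope above amounts to one record per user interaction. Below is a minimal sketch of such a record with basic validation, assuming lowercase identifiers for platforms, content types and ratings, and assuming that `content` holds either the raw text or a path/URL to an image. The class name `Interaction` is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

# Allowed values, taken from the scope list (lowercased by assumption).
PLATFORMS = {"facebook", "twitter", "instagram"}
CONTENT_TYPES = {"text", "image"}
RATINGS = {"like", "comment", "share"}


@dataclass(frozen=True)
class Interaction:
    userid: str
    timestamp: datetime
    platform: str
    type: str      # "text" or "image"
    content: str   # raw text, or a path/URL to the image (assumption)
    rating: str    # implicit rating: "like", "comment" or "share"

    def __post_init__(self):
        # Reject values that fall outside the stated scope.
        if self.platform not in PLATFORMS:
            raise ValueError(f"unknown platform: {self.platform}")
        if self.type not in CONTENT_TYPES:
            raise ValueError(f"unknown content type: {self.type}")
        if self.rating not in RATINGS:
            raise ValueError(f"unknown rating: {self.rating}")
```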

Model Framework

Model framework 1

Convert the user's natural language query into a vector using the Universal Sentence Encoder model
Create a binary product specs matrix based on different categories
Find the TopK similar query vectors using cosine similarity
For each TopK vector, find the TopM product specs using interaction table weights
For each TopM specification, find the TopN similar specs using the binary matrix
Show all the qualified product specifications
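
The retrieval steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: query embeddings are taken as precomputed lists of floats (no embedding model is loaded), `interaction[q][s]` is assumed to be the weight linking past query `q` to spec `s`, and `spec_matrix[s]` is the binary feature row for spec `s`. All function and parameter names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_indices(scores, k):
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def recommend_specs(query_vec, past_query_vecs, interaction, spec_matrix, k=2, m=1, n=1):
    """Return the indices of qualified specs, in the order they were found."""
    qualified = []

    def add(spec):
        if spec not in qualified:
            qualified.append(spec)

    # TopK past queries most similar to the new query.
    query_sims = [cosine(query_vec, v) for v in past_query_vecs]
    for q in top_indices(query_sims, k):
        # TopM specs for that query by interaction weight.
        for s in top_indices(interaction[q], m):
            add(s)
            # TopN specs most similar to spec s in the binary matrix (excluding s itself).
            spec_sims = [
                cosine(spec_matrix[s], row) if i != s else -1.0
                for i, row in enumerate(spec_matrix)
            ]
            for t in top_indices(spec_sims, n):
                add(t)
    return qualified
```

With three past queries and three specs, a call such as `recommend_specs([1, 0.1], past_query_vecs, interaction, spec_matrix)` returns spec indices that can then be mapped back to product specifications.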

Model framework 2

Seed data: 10 users with ground-truth persona, media content and implicit ratings
Inflated data: 10 users with media content and implicit ratings
media content → Implicit rating (A)
media content → feature vector (B) + (A) → weighted pooling → similar users (C)
media content → QA model → slot filling → global pooling → item associations (D)
(C) → content-based filtering → item recommendations → (D) → top-k recommendations
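
The weighted pooling step in this framework can be illustrated with a short sketch. It assumes each piece of media content already has a feature vector and one implicit rating, and that ratings map to numeric weights. The weights in `RATING_WEIGHTS` are hypothetical placeholders; the source does not specify them.

```python
# Hypothetical weights: a share is assumed to signal more interest than a like.
RATING_WEIGHTS = {"like": 1.0, "comment": 2.0, "share": 3.0}


def weighted_pool(items):
    """Pool (feature_vector, rating) pairs into one user vector.

    Each vector is weighted by its implicit rating, and the result is
    normalized by the total weight.
    """
    if not items:
        raise ValueError("at least one item is required")
    total = sum(RATING_WEIGHTS[rating] for _, rating in items)
    pooled = [0.0] * len(items[0][0])
    for vector, rating in items:
        weight = RATING_WEIGHTS[rating]
        for i, value in enumerate(vector):
            pooled[i] += weight * value
    return [value / total for value in pooled]
```

The pooled vectors can then be compared across users to find similar users, which corresponds to step (C).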

User selection

People who are connected to the social media community around electric vehicles
Seed users are those who already have an electric vehicle
Inflated users are those who don't own an EV but are inclined to purchase one
Users with a presence on all three sites, or at least two

Model framework 3

User-User Similarity (clustering)

User → Media content → Embedding → Average pooling
Cosine similarity of the user's social vector with other users' social vectors

User-Item Similarity (reranking)

User → Implicit Rating on media content M → M's correlation with item features
Item features: familySize
Cosine similarity of the user's social vector with the item's feature vector
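
A minimal sketch of the user-user similarity step, assuming every piece of a user's media content has already been embedded into a fixed-length vector. The function names and the dictionary layout (`user -> list of embeddings`) are hypothetical.

```python
import math


def average_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_users(target, user_embeddings, top_k=1):
    """Rank the other users by cosine similarity of their pooled social vectors."""
    target_vec = average_pool(user_embeddings[target])
    scored = [
        (cosine(target_vec, average_pool(vectors)), user)
        for user, vectors in user_embeddings.items()
        if user != target
    ]
    scored.sort(reverse=True)
    return [user for _, user in scored[:top_k]]
```

The same `cosine` function could be applied to the reranking step by comparing a user's social vector with an item's feature vector, provided both live in the same vector space.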

Model framework 4

Text → Prepare → Vectorize → Average → Similar Users
Text → Prepare → QA → Slot filling
Image → Prepare → Vectorize → Average → Similar Users
Image → Prepare → VQA → Slot filling
Image → Similar Image from users → Detailed enquiry

Model framework 5

Topic Clusters Text
Topic Clusters Image
Fetch raw text and images
Combine, Clean and Store text in text dataframe
Vectorize Texts
Cosine similarities of texts with topic clusters
Vectorize Images
Cosine similarities of images with topic clusters
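
The text half of this framework can be sketched with a simple bag-of-words vectorizer standing in for whatever embedding the project actually uses. The topic clusters and their seed words below are hypothetical examples, and the function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical topic clusters, each described by a few seed words.
TOPIC_CLUSTERS = {
    "family": ["family", "kids", "school", "vacation"],
    "outdoors": ["camping", "trail", "towing", "offroad"],
}


def vectorize(text):
    """Bag-of-words term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def topic_scores(texts):
    """Combine a user's texts and score them against every topic cluster."""
    combined = vectorize(" ".join(texts))
    return {
        topic: cosine(combined, Counter(words))
        for topic, words in TOPIC_CLUSTERS.items()
    }
```

Images would follow the same pattern once an image vectorizer and image topic clusters are available; that part is not sketched here.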

Experiment 1

Experiment 2

Facebook Scraping

Twitter Scraping

Dataframe

Insta Image Grid

User Text NER

Experiment 3

Topic scores

JSON rules

Results and Discussion

API with 3 input fields - Facebook username, Twitter handle & Instagram username
The system will automatically scrape the user's publicly available text and images from these three social media platforms and return a list of recommendations ordered from most to least preferred product.
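
The shape of such an API can be sketched as below. Scraping and scoring are deliberately left out: `score_products` is a hypothetical callable supplied by the caller that returns a score per product, and the class and function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecommendationRequest:
    """The three input fields of the API."""
    facebook_username: str
    twitter_handle: str
    instagram_username: str


def recommend(request, score_products):
    """Return product names ordered from most to least preferred.

    score_products is a hypothetical callable that takes the request and
    returns a dict mapping product name to a preference score.
    """
    scores = score_products(request)
    return sorted(scores, key=scores.get, reverse=True)
```

Keeping the scoring step behind a callable lets any of the model frameworks above be plugged in without changing the API surface.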

Vehicle Recommendation using Social media data