Use of this document

This is a study note on the matching (candidate retrieval) stage of recommender systems.

1. Introduction

1.1 Evaluation of Matching

| Metric | Recommending focus | Example industry |
| --- | --- | --- |
| novelty | different items | news, video, games |
| personalization | similar items | music, search |
| diversity (novelty + personalization) | different + similar items | e-commerce, apps |
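The note does not define these metrics formally; a common operationalization is mean self-information for novelty and mean pairwise embedding distance for intra-list diversity. Below is a minimal sketch under that assumption (the function names and the `popularity` input are illustrative, not from the note):

```python
import numpy as np

def novelty(recommended_ids, popularity):
    # popularity[i]: fraction of users who have interacted with item i (0 < p <= 1);
    # rarer items carry more self-information, hence higher novelty
    return float(np.mean([-np.log2(popularity[i]) for i in recommended_ids]))

def intra_list_diversity(item_vectors):
    # item_vectors: (n, d) embeddings of the items in one recommendation list
    v = item_vectors / np.linalg.norm(item_vectors, axis=1, keepdims=True)
    sim = v @ v.T
    n = len(v)
    mean_pairwise_sim = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    return float(1.0 - mean_pairwise_sim)  # 1 - mean pairwise cosine similarity
```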

1.2 Matching algorithms by pathway

Recommendation pathways for personalization. ALS_MF, NCF, and DSSM stand for Alternating Least Squares Matrix Factorization, Neural Collaborative Filtering, and Deep Semantic Similarity Model.

| Pathway | Output contents | Matching algorithm | Problems addressed |
| --- | --- | --- | --- |
| u2i | personalized items | query from log database | gray sheep |
| i2i | similar items | embedding (item2vec/DSSM) + KNN_TopN | synonymy |
| u2u | similar users | embedding (user2vec/DSSM) + KNN_TopN | cold start |
| u2u2i | similar users, personalized items | embedding (ALS_MF/NCF/DSSM) + KNN_TopN | cold start, gray sheep, data sparsity |
| u2i2i | personalized items, similar items | embedding (ALS_MF/NCF/DSSM) + KNN_TopN | gray sheep, synonymy, data sparsity |
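For the ALS_MF entries above, here is a minimal ALS sketch on a dense interaction matrix, treating unobserved entries as zeros (a simplification; production implementations use sparse matrices and confidence weighting):

```python
import numpy as np

def als_mf(R, k=16, reg=0.1, iters=10, seed=0):
    # R: (n_users, n_items) interaction matrix; returns user/item latent factors
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    I = np.eye(k)
    for _ in range(iters):
        # alternate: fix V and solve the ridge-regularized least squares for U,
        # then fix U and solve for V
        U = R @ V @ np.linalg.inv(V.T @ V + reg * I)
        V = R.T @ U @ np.linalg.inv(U.T @ U + reg * I)
    return U, V  # predicted scores: U @ V.T
```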
Recommendation pathways for novelty. ALS_MF, NCF, and DSSM are as above. For massive datasets, Locality-Sensitive Hashing can improve retrieval efficiency with a minor loss in accuracy.

| Pathway | Output contents | Matching algorithm | Problems addressed |
| --- | --- | --- | --- |
| u2tag | same tags | query from profile database | gray sheep |
| tag2tag | similar tags | embedding (tag2vec) + KNN_TopN | novelty, data sparsity |
| tag2i | similar items | query from content database | cold start |
| u2tag2tag2i | similar tags, similar items | embedding (ALS_MF/NCF/DSSM) + KNN_TopN | cold start, gray sheep, novelty, data sparsity |
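A minimal sketch of the embedding + KNN_TopN step shared by the i2i/u2u/tag2tag rows, using brute-force cosine search (at scale, this exact scan is what LSH or an approximate-nearest-neighbor index would replace):

```python
import numpy as np

def knn_topn(query_vec, item_vecs, n=10):
    # query_vec: (d,) embedding of a user/item/tag; item_vecs: (m, d) candidates
    q = query_vec / np.linalg.norm(query_vec)
    m = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    scores = m @ q                    # cosine similarity against all candidates
    top = np.argsort(-scores)[:n]     # indices of the n most similar candidates
    return top, scores[top]
```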

1.3 Hybridization techniques for matching components

Hybridization techniques, given matching strategies \(s\), each of which produces a score \(S_{s,c}\) for content \(c\):

| Hybridization technique | Description |
| --- | --- |
| ordering | given an ordering of the strategies, present only the best-ranked strategy's results |
| average | \(S_c = \frac{\sum_s S_{s,c}}{\#s}\) |
| weighted average | given a weight \(W_s\) for each matching component, \(S_c = \frac{\sum_s W_s \, S_{s,c}}{\sum_s W_s}\) |
| dynamic weighting | continuously measure a KPI, such as click-through rate (CTR) or average transaction value (ATV), for each matching component, update the weights accordingly, then apply the weighted average |
| algorithm-based weighting | use a model (LR, FM, Embedding+MLP, AFM, IAFM, Wide&Deep, FNN, NFM, DeepFM, DCN, xDeepFM, PNN, OENN, OANN, FGCNN, FiBiNET) to predict the click-through rate and assign weights, then apply the weighted average |
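A minimal sketch of the weighted-average rule above (dict-based, with illustrative names; contents missing from a strategy are implicitly scored 0 by that strategy):

```python
def weighted_average(scores_by_strategy, weights):
    # scores_by_strategy: {strategy: {content_id: S_sc}}; weights: {strategy: W_s}
    combined, total_w = {}, sum(weights.values())
    for s, scores in scores_by_strategy.items():
        for c, s_sc in scores.items():
            combined[c] = combined.get(c, 0.0) + weights[s] * s_sc
    return {c: v / total_w for c, v in combined.items()}
```

Dynamic weighting then amounts to refreshing `weights` from recent CTR/ATV measurements before each call.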

2. Collaborative filtering

2.1 Memory-based collaborative filtering

Common similarity measures between user or item rating vectors (a sketch follows the list):

  • Pearson Correlation Coefficient
  • Cosine-based Similarity
  • Adjusted Cosine Similarity
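A minimal sketch of the three measures on rating vectors; for simplicity it operates on full vectors, whereas real implementations restrict to co-rated positions:

```python
import numpy as np

def pearson(a, b):
    # center each vector by its own mean, then take the cosine of the residuals
    a_c, b_c = a - a.mean(), b - b.mean()
    return float(a_c @ b_c / (np.linalg.norm(a_c) * np.linalg.norm(b_c)))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def adjusted_cosine(item_i, item_j, user_means):
    # for item-item similarity: subtract each *user's* mean rating first,
    # which removes per-user rating-scale bias
    return cosine(item_i - user_means, item_j - user_means)
```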

2.2 Model-based collaborative filtering

| Model | Advantages | Disadvantages | Data |
| --- | --- | --- | --- |
| traditional matrix factorization | latent semantic indexing | no crossing of either user or item features; high memory complexity | x: user-item interaction matrix; y: user latent vectors, item latent vectors |
| YouTube's user-embedding DNN | crosses user features | does not cross item features | x: user behavior; y: user-item interaction matrix |
| NCF (Neural Collaborative Filtering) | embeds users and items separately to enable feature crossing | the MLP applied after the embeddings contributes little to fitting the user-item interaction matrix | x: user features, item features; y: user-item interaction matrix |
| NMF (Neural Matrix Factorization) | crosses user and item features | crossing user and item features increases computation latency during serving | x: user features, item features; y: user-item interaction matrix |
| DSSM (Deep Semantic Similarity Model) | improves on NCF by removing the MLP after the embeddings: 1) embeds users and items separately to enable feature crossing; 2) during serving, the separation of the user and item towers reduces computation cost because item embeddings can be pre-computed | asynchrony problems when model versions diverge under high-frequency incremental learning | x: user features, item features; y: user-item interaction matrix |
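A minimal two-tower sketch in the DSSM spirit described above, using PyTorch with ID features only (an assumption for brevity; real systems feed richer user/item features into each tower). The key serving property is that `item_vec` can be pre-computed offline for the whole catalog, while `user_vec` is computed online per request:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)

    def user_vec(self, u):    # computed online at serving time
        return F.normalize(self.user_emb(u), dim=-1)

    def item_vec(self, i):    # pre-computable offline, indexed for KNN_TopN
        return F.normalize(self.item_emb(i), dim=-1)

    def forward(self, u, i):  # cosine score; train with BCEWithLogitsLoss
        return (self.user_vec(u) * self.item_vec(i)).sum(-1)

# usage: score one (user, item) pair
model = TwoTower(n_users=1000, n_items=5000)
print(model(torch.tensor([3]), torch.tensor([42])))
```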