Recommendation Systems
Feedback Types
- Explicit
- Ratings, likes, or dislikes provided directly by users
- Sparse: most values missing, and not missing at random (users choose what to rate)
- Higher quality but harder to collect
- Implicit
- Inferred from user behavior - clicks, watches, purchases, etc.
- Sparse, positive-only feedback (absence doesn't mean dislike)
- Easier to collect but noisier
Collaborative Filtering
Users collaborate indirectly to help recommend items
User-based: Recommend items that similar users liked
- $\hat{y}_{ui} = \frac{\sum_{u'} \text{sim}(u, u')\, y_{u'i}}{\sum_{u'} \text{sim}(u, u')}$
Item-based: Recommend items that are similar to what the user already likes
Similarity calculations typically use common ratings between users/items
Data sparsity makes similarity calculation challenging
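To make the user-based rule above concrete, here is a minimal NumPy sketch; the function name, the neighborhood size k, and cosine similarity computed over co-rated items are illustrative choices, not prescribed by the notes:

```python
import numpy as np

def predict_user_based(Y, mask, u, i, k=10):
    """Predict user u's rating of item i from the k most similar users.

    Y    : (n_users, n_items) ratings, 0 where missing
    mask : boolean array, True where a rating is observed
    """
    n_users = Y.shape[0]
    sims = np.zeros(n_users)
    for v in range(n_users):
        if v == u:
            continue
        common = mask[u] & mask[v]          # items rated by both users
        if not common.any():
            continue                        # data sparsity: no overlap
        a, b = Y[u, common], Y[v, common]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:
            sims[v] = a @ b / denom         # cosine over co-rated items
    sims = np.where(mask[:, i], sims, 0.0)  # only neighbors who rated i count
    top = np.argsort(sims)[-k:]
    den = np.abs(sims[top]).sum()
    if den == 0:
        return Y[mask].mean()               # fall back to the global mean
    return (sims[top] * Y[top, i]).sum() / den
```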
Matrix Factorization
- View the problem as matrix completion
- Predict all missing entries in the ratings matrix
- Optimization: $L = \sum_{(u,i) \in \text{observed}} (z_{ui} - y_{ui})^2$, i.e., squared error over observed entries only
- Break up Z into low-rank matrices: $Z = U^TV$
- U represents user embeddings, V represents item embeddings
- Can't use SVD directly due to missing values
- Use ALS (Alternating Least Squares): fix V and solve for U in closed form, then fix U and solve for V; repeat until convergence (sketched after this list)
- Add biases to account for user/item rating tendencies:
- $\hat{y}_{ui} = \mu + b_u + c_i + u_u^T v_i$
- $\mu$ is global average rating
- $b_u$ is user bias (rating tendency)
- $c_i$ is item bias (intrinsic quality)
- $u_u^T v_i$ is user-item interaction
- $L = \sum_{(u,i) \in \text{observed}} (y_{ui} - \hat{y}_{ui})^2 + \lambda(\sum ||u_u||^2 + \sum ||v_i||^2)$
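A minimal ALS sketch for the plain model $Z = U^T V$; bias terms are omitted here (they can be handled by first centering $Y$, or folded in as extra coordinates). All names and hyperparameters are illustrative:

```python
import numpy as np

def als(Y, mask, k=20, lam=0.1, n_iters=10):
    """ALS for matrix completion: fit Y ~ U.T @ V on observed entries.

    Y    : (n_users, n_items) ratings, 0 where missing
    mask : boolean array of observed entries
    """
    n_users, n_items = Y.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(k, n_users))
    V = rng.normal(scale=0.1, size=(k, n_items))
    I = np.eye(k)
    for _ in range(n_iters):
        # Fix V: each user vector is a ridge regression on the items they rated.
        for u in range(n_users):
            obs = mask[u]
            Vo = V[:, obs]
            U[:, u] = np.linalg.solve(Vo @ Vo.T + lam * I, Vo @ Y[u, obs])
        # Fix U: solve for each item vector symmetrically.
        for i in range(n_items):
            obs = mask[:, i]
            Uo = U[:, obs]
            V[:, i] = np.linalg.solve(Uo @ Uo.T + lam * I, Uo @ Y[obs, i])
    return U, V  # predictions: U.T @ V
```

Each subproblem is a ridge regression with a closed-form solution, which is why alternating works even though the joint problem is non-convex.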
Probabilistic Matrix Factorization
- Bayesian approach to matrix factorization
- $p(y_{ui} \mid \theta) = \mathcal{N}(y_{ui} \mid \mu + b_u + c_i + u_u^T v_i, \sigma^2)$
- Can add zero-mean Gaussian priors on user/item vectors; MAP estimation then recovers the $\ell_2$-regularized loss above, with $\lambda$ set by the ratio of noise variance to prior variance
Bayesian Personalized Ranking (BPR)
- For implicit feedback data
- Ranking approach: Model ranks items user interacted with (positive set) ahead of others (negative set)
- $p(y = (u,i,j) | \theta) = \sigma(f(u,i;\theta) - f(u,j;\theta))$
- $f(u,i;\theta)$ is predicted score for user u and item i
- Estimate parameters by maximizing $\log \sigma(\cdot)$ over sampled triplets (logistic loss); hinge loss is a common pairwise alternative
- Samples triplets (user, positive item, negative item) for training
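A sketch of one BPR SGD step using matrix-factorization scores $f(u,i) = u_u^T v_i$; the sampling scheme, function name, and hyperparameters are illustrative:

```python
import numpy as np

def bpr_step(U, V, user_pos, n_items, lr=0.05, lam=0.01, rng=None):
    """One SGD step of BPR with scores f(u, i) = U[u] @ V[i].

    user_pos : dict mapping each user id to the set of item ids they
               interacted with (the positive set)
    """
    rng = rng or np.random.default_rng()
    # Sample a (user, positive item, negative item) triplet.
    u = int(rng.choice(list(user_pos)))
    i = int(rng.choice(list(user_pos[u])))
    j = int(rng.integers(n_items))
    while j in user_pos[u]:                  # rejection-sample a negative item
        j = int(rng.integers(n_items))

    x_uij = U[u] @ (V[i] - V[j])             # f(u, i) - f(u, j)
    g = 1.0 / (1.0 + np.exp(x_uij))          # d/dx log sigma(x) = sigma(-x)

    # Gradient ascent on log sigma(x_uij) with L2 regularization.
    uu = U[u].copy()
    U[u] += lr * (g * (V[i] - V[j]) - lam * U[u])
    V[i] += lr * (g * uu - lam * V[i])
    V[j] += lr * (-g * uu - lam * V[j])
```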
Factorization Machines (FM)
- Generalization of matrix factorization that can incorporate side features
- Represent user-item pairs as feature vectors:
- $x = \text{concat}[\text{user\_features}, \text{item\_features}]$
- Model:
- $f(x) = \mu + \sum_i w_i x_i + \sum_{i<j} \langle v_i, v_j \rangle x_i x_j$
- The dot product term captures pairwise interactions between features
- Factorizing the pairwise weights as $\langle v_i, v_j \rangle$ (rank $k$) cuts interaction parameters from $O(n^2)$ to $O(nk)$
- Can easily incorporate context (time, location, etc.) and user/item metadata
- Loss functions:
- Explicit feedback: MSE or MAE
- Implicit feedback: Ranking loss (BPR)
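A sketch of FM prediction. It uses the standard reformulation of the pairwise term, $\sum_{i<j} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2}\sum_f \big[(\sum_i v_{if} x_i)^2 - \sum_i v_{if}^2 x_i^2\big]$, which brings the cost down to $O(nk)$; names are illustrative:

```python
import numpy as np

def fm_predict(x, mu, w, V):
    """FM score for one feature vector x, in O(n k).

    x  : (n,) features (e.g. concatenated user/item one-hots plus context)
    mu : global bias
    w  : (n,) linear weights
    V  : (n, k) factor matrix; row i is v_i
    """
    s = V.T @ x                                        # (k,) per-factor sums
    pairwise = 0.5 * ((s ** 2).sum() - ((V ** 2).T @ (x ** 2)).sum())
    return mu + w @ x + pairwise
```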
Cold-Start Problem
- Difficult to generate predictions for new users or items with no history
- Solutions:
- Content-based approaches using item/user metadata
- Hybrid models combining collaborative and content-based methods
- Active learning to efficiently collect initial preferences
- Transfer learning from related domains
Exploration-Exploitation Tradeoff
- Counterfactual challenge: Users might like items they never see
- Recommendation systems must balance:
- Exploitation: Recommend items likely to be relevant based on history
- Exploration: Recommend items with uncertain but potentially high value
- Approaches:
- Multi-armed bandits (Thompson Sampling, UCB)
- Contextual bandits for personalized exploration
- Diversity promotion in recommendations
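A minimal Thompson Sampling sketch for the Bernoulli (click / no-click) case with a Beta(1, 1) prior per item; the function name and bandit formulation are illustrative:

```python
import numpy as np

def thompson_recommend(successes, failures, rng=None):
    """Pick an item by Thompson Sampling over Bernoulli click rates.

    successes, failures : (n_items,) interaction counts; together with a
    Beta(1, 1) prior they define a Beta posterior per item.
    """
    rng = rng or np.random.default_rng()
    theta = rng.beta(successes + 1, failures + 1)  # one posterior draw per item
    return int(np.argmax(theta))                   # uncertainty drives exploration

# After showing item a and observing click (1) or no click (0):
#   successes[a] += click; failures[a] += 1 - click
```

Items with few observations have wide posteriors and occasionally win the draw (exploration); items with well-estimated high click rates usually win (exploitation).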