Part 3 Practical Deep Learning for Coders (Collaborative Filtering)
Collaborative Filtering: Learning Latent Factors
The Core Problem
Collaborative filtering solves a fundamental problem: predicting user preferences by learning latent factors: hidden characteristics that explain why users like certain items. Rather than relying on explicit metadata, the model discovers these patterns through training on historical interactions.
The Core Mechanism
The simplest approach uses a dot product between user and movie embeddings. Each user and item gets a vector of learned factors. When multiplied element-wise and summed, this predicts a rating. The elegance lies in efficiency: embeddings replace one-hot-encoded vectors (where only one element is 1) with a learned lookup table, avoiding sparse matrix multiplications while maintaining mathematical equivalence.
# A 1000-user, 50-factor embedding is far more efficient than
# 1000 one-hot vectors multiplied by a 1000x50 matrix
embed = Embedding(1000, 50)
Refinements That Matter
Bias Terms: Capture movie or user-specific effects independent of latent factors. Some movies are universally poor; some users are universally generous. Bias absorbs this, letting embeddings focus on preference patterns.
Weight Decay (L2 Regularization): Prevents the model from learning overly complex, sharp features. By penalizing large weights, it encourages smoother decision boundaries that generalize better to unseen users and items.
loss_with_wd = loss + wd * (parameters**2).sum()The regularization term wd * (parameters**2).sum() adds a penalty proportional to the magnitude of weights. Higher wd values force weights toward zero, reducing model complexity and overfitting.
The Cold Start Problem
New users or items lack sufficient data, making recommendations impossible. Solutions range from questionnaires to using average embeddings or tabular models incorporating metadata and temporal signals.
Key Insight
Collaborative filtering succeeds because user preferences follow patterns that generalize. Latent factors capture these patterns, learned through simple gradient descent, without ever knowing why a user likes something just that similar users do.
