matrix factorization techniques for recommender systems

Strengths of MF (conclusion):

  • combining good scalability with predictive accuracy.
  • offering much flexibility for modeling various real-life situations.
  • allowing incorporation of additional information.
  • lower space complexity.

recommender system strategies

  1. content-based filtering: creates a profile for each user or product to characterize its nature. However, it requires gathering external information that might not be available or easy to collect.
  2. collaborative filtering: relies only on past user behavior (previous transactions or ratings), without requiring the creation of explicit profiles. Although it is domain-free and generally more accurate than content-based filtering, it suffers from the cold start problem, due to its inability to address the system's new products and users.

CF

  1. the neighborhood methods: compute the relationships between items or, alternatively, between users.
  • item-based CF: evaluates a user's preference for an item based on that user's ratings of "neighboring" items, i.e., items that tend to get similar ratings when rated by the same user (a small sketch follows this list).
  • user-based CF: evaluates a user's preference for an item based on ratings of that item by like-minded users with similar rating histories.
  2. the latent factor models: try to explain the ratings by characterizing both items and users on, say, 20 to 100 factors inferred from the rating patterns.
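
A minimal sketch of the item-based neighborhood idea, assuming a small dense matrix with 0 marking unrated cells; the cosine similarity and top-$k$ weighting here are one common instantiation, not the paper's exact method:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity over entries rated in both vectors (0 = unrated).
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    return float(a[mask] @ b[mask] /
                 (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask]) + 1e-9))

def predict_item_based(R, u, i, k=5):
    # R: (n_users, n_items) rating matrix with 0 for missing ratings.
    sims = np.array([cosine_sim(R[:, i], R[:, j]) if j != i else -1.0
                     for j in range(R.shape[1])])
    rated = np.flatnonzero(R[u])                      # items user u has rated
    neighbors = rated[np.argsort(-sims[rated])][:k]   # k most similar of them
    w = sims[neighbors]
    if w.sum() <= 0:
        return R[R > 0].mean()                        # fall back: global mean
    return float(w @ R[u, neighbors] / w.sum())       # similarity-weighted avg
```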

MF

Some of the most successful realizations of latent factor models are based on MF, which characterizes both items and users by vectors of factors inferred from item rating patterns (combining good scalability with predictive accuracy while offering flexibility).

Explicit feedback (e.g., star ratings) comprises a sparse matrix, since any single user is likely to have rated only a small fraction of the items. Implicit feedback (e.g., purchase or browsing history) usually represents a densely filled matrix.

A basic MF model

Matrix factorization models map both users and items to a joint latent factor space of dimensionality $f$, such that user-item interactions are modeled as inner products in that space. Accordingly, each item $i$ is associated with a vector $q_i \in \mathbb{R}^f$, and each user $u$ is associated with a vector $p_u \in \mathbb{R}^f$. For a given item $i$, the elements of $q_i$ measure the extent to which the item possesses those factors, positive or negative. For a given user $u$, the elements of $p_u$ measure the extent of interest the user has in items that are high on the corresponding factors, again, positive or negative. The resulting dot product, $q_i^\top p_u$, captures the interaction between user $u$ and item $i$—the user’s overall interest in the item’s characteristics.

loss: $\min_{q^*,p^*} \sum_{(u,i) \in K}(r_{ui}-q_i^\top p_u)^2+\lambda(\|q_i\|^2+\|p_u\|^2)$, where $K$ is the set of $(u,i)$ pairs for which $r_{ui}$ is known (the training set).
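
A minimal sketch of the model and this objective; the sizes, initialization scale, and $\lambda$ below are illustrative:

```python
import numpy as np

n_users, n_items, f = 1000, 500, 20          # illustrative sizes; f = latent dim
rng = np.random.default_rng(0)
P = 0.1 * rng.standard_normal((n_users, f))  # row u is the user factor p_u
Q = 0.1 * rng.standard_normal((n_items, f))  # row i is the item factor q_i

def objective(K, P, Q, lam=0.02):
    # K: iterable of observed (u, i, r_ui) triples; lam is the λ above.
    total = 0.0
    for u, i, r in K:
        e = r - Q[i] @ P[u]                  # r_ui - q_i^T p_u
        total += e * e + lam * (Q[i] @ Q[i] + P[u] @ P[u])
    return total
```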

learning algorithms

  • stochastic gradient descent (SGD): loop through all ratings in the training set; for each, compute the prediction error and update $q_i$ and $p_u$ by a step of size $\gamma$ in the opposite direction of the gradient (see the sketch below).

  • alternating least squares (ALS): the objective is not convex in $q_i$ and $p_u$ jointly, but becomes a quadratic when one set is fixed; ALS alternates between fixing the $p_u$'s to recompute the $q_i$'s by least squares and vice versa, and each per-user/per-item subproblem can be solved in parallel.
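
The per-rating SGD updates given in the paper, as a minimal sketch continuing the P, Q arrays above ($\gamma$ and $\lambda$ values are illustrative):

```python
def sgd_epoch(K, P, Q, gamma=0.005, lam=0.02):
    # One SGD pass over the observed ratings (u, i, r_ui).
    for u, i, r in K:
        e = r - Q[i] @ P[u]                   # prediction error e_ui
        qi, pu = Q[i].copy(), P[u].copy()     # update both from the old values
        Q[i] += gamma * (e * pu - lam * qi)   # q_i += γ(e_ui·p_u − λ·q_i)
        P[u] += gamma * (e * qi - lam * pu)   # p_u += γ(e_ui·q_i − λ·p_u)
```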

Adding biases

$q_i^\top p_u$ tries to capture the interactions between users and items that produce the different rating values. However, much of the observed variation in rating values is due to effects associated with either users or items, known as biases or intercepts, independent of any interactions.

For example, CF data exhibits large systematic tendencies for some users to give higher ratings than others, and for some items to receive higher ratings than others. Thus, it is unwise to explain the full rating value by an interaction of the form $q_i^\top p_u$. Instead, the system tries to identify the portion of these values that individual user or item biases can explain. A first-order approximation of the bias involved in rating $r_{ui}$ is $b_{ui}=\mu+b_i+b_u$, where $\mu$ denotes the overall average rating and $b_u$, $b_i$ indicate the observed deviations of user $u$ and item $i$ from the average.

loss: $\min_{p^*,q^*,b^*} \sum_{(u,i) \in K}(r_{ui}-\mu-b_u-b_i-q_i^\top p_u)^2+\lambda(\|q_i\|^2+\|p_u\|^2+b_u^2+b_i^2)$
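
A sketch of SGD for this biased objective, extending the previous one; the bias arrays and the global mean value here are illustrative:

```python
b_user = np.zeros(n_users)          # b_u, one per user
b_item = np.zeros(n_items)          # b_i, one per item
mu = 3.6                            # global mean rating (placeholder value)

def sgd_epoch_biased(K, P, Q, b_user, b_item, mu, gamma=0.005, lam=0.02):
    for u, i, r in K:
        e = r - (mu + b_user[u] + b_item[i] + Q[i] @ P[u])
        b_user[u] += gamma * (e - lam * b_user[u])
        b_item[i] += gamma * (e - lam * b_item[i])
        qi, pu = Q[i].copy(), P[u].copy()
        Q[i] += gamma * (e * pu - lam * qi)
        P[u] += gamma * (e * qi - lam * pu)
```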

Additional input sources

A way to relieve the cold start problem is to incorporate additional sources of information about the users: $N(u)$ denotes the set of items for which user $u$ expressed an implicit preference (each such item gets a factor vector $x_i \in \mathbb{R}^f$), and $A(u)$ denotes the set of attributes of user $u$, e.g., demographics (each attribute gets a factor vector $y_a \in \mathbb{R}^f$).

$\hat{r}_{ui}=\mu+b_i+b_u+q_i^\top\left[p_u+|N(u)|^{-0.5}\sum_{i \in N(u)}x_i+\sum_{a \in A(u)}y_a\right]$
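
A sketch of this enhanced prediction, assuming N and A map each user to their implicit-feedback items and attributes, and X, Y hold the factor vectors $x_i$ and $y_a$ (all container names are illustrative):

```python
def predict_enhanced(u, i, mu, b_user, b_item, P, Q, X, Y, N, A):
    # N[u]: items with implicit feedback from u; X[j] is x_j.
    # A[u]: attributes of u (e.g., demographics); Y[a] is y_a.
    p_eff = P[u].copy()
    if N[u]:
        p_eff += sum(X[j] for j in N[u]) / np.sqrt(len(N[u]))  # |N(u)|^-0.5 Σ x_j
    for a in A[u]:
        p_eff += Y[a]                                          # Σ y_a
    return mu + b_item[i] + b_user[u] + Q[i] @ p_eff
```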

temporal dynamics

User preferences, item popularity, and rating scales all drift over time, so the biases and the user factors are modeled as functions of time $t$ (the item factors $q_i$ stay static, since items, unlike humans, do not change their nature):

$\hat{r}_{ui}(t)=\mu+b_i(t)+b_u(t)+q_i^\top p_u(t)$
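
One concrete way to realize the time-varying item bias $b_i(t)$, borrowed from the temporal-dynamics work this section summarizes, is a static bias plus a per-time-bin deviation, $b_i(t)=b_i+b_{i,\mathrm{Bin}(t)}$; the binning below is a sketch, not the paper's exact parameterization:

```python
n_bins = 30                          # bin count is an assumption

def time_bin(t, t_min, t_max, n_bins=n_bins):
    # Map timestamp t (with t_max > t_min) onto one of n_bins equal-width bins.
    return min(int((t - t_min) / (t_max - t_min) * n_bins), n_bins - 1)

# b_i(t) = b_i + b_{i,Bin(t)}: static item bias plus a per-bin offset.
# b_item_bin is an (n_items, n_bins) array of learned deviations.
def item_bias_at(i, t, b_item, b_item_bin, t_min, t_max):
    return b_item[i] + b_item_bin[i, time_bin(t, t_min, t_max)]
```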

inputs with varying confidence levels

In several setups, not all observed ratings deserve the same weight or confidence. For example, with implicit feedback, $r_{ui}$ roughly indicates whether an action occurred, and the confidence $c_{ui}$ can grow with how often the user performed it (e.g., how long they watched a show or how frequently they bought an item).

$\min_{p^*,q^*,b^*} \sum_{(u,i) \in K}c_{ui}(r_{ui}-\mu-b_u-b_i-q_i^\top p_u)^2+\lambda(\|q_i\|^2+\|p_u\|^2+b_u^2+b_i^2)$
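
The confidence weight scales each example's squared error, so every gradient picks up a factor of $c_{ui}$. A sketch extending the biased SGD above; the common implicit-feedback choice $c_{ui}=1+\alpha\cdot(\text{observation frequency})$ is an assumption here, not something this section specifies:

```python
def sgd_epoch_weighted(K, P, Q, b_user, b_item, mu, C, gamma=0.005, lam=0.02):
    # C[(u, i)]: confidence c_ui in the observation r_ui.
    for u, i, r in K:
        e = r - (mu + b_user[u] + b_item[i] + Q[i] @ P[u])
        c = C[(u, i)]
        b_user[u] += gamma * (c * e - lam * b_user[u])
        b_item[i] += gamma * (c * e - lam * b_item[i])
        qi, pu = Q[i].copy(), P[u].copy()
        Q[i] += gamma * (c * e * pu - lam * qi)
        P[u] += gamma * (c * e * qi - lam * pu)
```

For the dense matrices that arise with implicit feedback, the paper favors ALS over SGD, since looping over every single training case stops being practical.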