Home ML Papers Greg Linden - Amazon.com Recommendations - Item-to-Item Collaborative Filtering (2003)

Greg Linden - Amazon.com Recommendations - Item-to-Item Collaborative Filtering (2003)

History / Edit / PDF / EPUB / BIB /
Created: December 9, 2017 / Updated: December 21, 2025 / Status: finished / Readability: technical / 4 min read (~626 words)
machine-learning

There are three common approaches to solving the recommendation problem:
- traditional collaborative filtering
- cluster models
- search-based methods
Our algorithm's online computation scales independently of the number of customers and number of items in the product catalog

Most recommendation algorithms start by finding a set of customers whose purchased and rated items overlap the user's purchased and rated items. The algorithm aggregates items from these similar customers, eliminates items the user has already purchased or rated, and recommends the remaining items to the user

Represents a customer as an N-dimensional vector of items, where $N$ is the number of distinct catalog items
The components of the vector are positive for purchased or positively rated items and negative for negatively rated items
To compensate for best-selling items, the algorithm typically multiples the vector components by the inverse frequency, making less well-known items much more relevant
Using collaborative filtering to generate recommendations is computationally expensive. It is $O(MN)$ in the worst case, where $M$ is the number of customers and $N$ is the number of product catalog items
Because the average customer vector is extremely sparse, the algorithm's performance tends to be closer to $O(M + N)$
It is possible to partially address these scaling issues by reducing the data size. We can reduce $M$ by randomly sampling the customers or discarding customers with fewer purchases, and reduce $N$ by discarding very popular or unpopular items
Unfortunately, all these methods also reduce recommendation quality in several ways
- If the algorithm examines only a small customer sample, the selected customers will be less similar to the user
- Item-space partitioning restricts recommendations to a specific product or subject area
- If the algorithm discards the most popular or unpopular items, they will never appear as recommendations, and customers who have purchased only those items will not get recommendations

Clustering models divide the customer base into many segments and treat the task as a classification problem
The algorithm's goal is to assign the user to the segment containing the most similar customers
Using a similarity metric, a clustering algorithm groups the most similar customers together to form clusters or segments
Because optimal clustering over large data sets is impractical, most applications use various forms of greedy clustering generation
Clustering models have better online scalability and performance than collaborative filtering because they compare the user to a controlled number of segments rather than the entire customer base
- However, recommendation quality is low

Given the user's purchased and rated items, the algorithm constructs a search query to find other popular items by the same author, artist, or director, or with similar keywords or subjects
For users with thousands of purchases, it's impractical to base a query on all the items
- The algorithm must use a subset or summary of the data, reducing quality
In all cases, recommendation quality is relatively poor

Rather than matching the user to similar customers, item-to-item collaborative filtering matches each of the user's purchased and rated items to similar items, then combines those similar items into a recommendation list
To determine the most-similar match for a given item, the algorithm builds a similar-items table by finding items that customers tend to purchase together
This offline computation of the similar-items table is extremely time intensive, with $O(N^2M)$ as worst case. In practice, however, it's closer to $O(NM)$, as most customers have very few purchases

For very large data sets, a scalable recommendation algorithm must perform the most expensive calculations offline

Linden, Greg, Brent Smith, and Jeremy York. "Amazon. com recommendations: Item-to-item collaborative filtering." IEEE Internet computing 7.1 (2003): 76-80.