Recommendation System

A widely used approach is “Collaborative Filtering”, which rests on the following premise:
If Person A has the same opinion as Person B on an issue, A is more likely to share B’s opinion on a different issue ‘x’ than the opinion of a randomly chosen person.

There are two types of collaborative filtering:
1. Traditional (user-based) Collaborative Filtering
2. Item-to-Item Collaborative Filtering

Traditional Collaborative Filtering:

• Represent each customer as a p-dimensional vector of items
  ◦ p: the number of distinct catalog items
  ◦ Possible vector components:
    ◦ Bought (1) / Not bought (0)
    ◦ Ratings
    ◦ Rated (1) / Not rated (0)
    ◦ Number of products purchased
• Find the similarity between customers A and B
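To make the vector representation concrete, here is a minimal Python sketch that turns purchase lists into Bought (1) / Not bought (0) vectors. The purchase data and the `customer_vector` helper are hypothetical; the other encodings (ratings, counts) could be used instead by changing what the helper returns.

```python
# Hypothetical purchase data: customer -> set of items bought.
purchases = {
    "A": {"book", "pen"},
    "B": {"book", "lamp"},
    "C": {"pen", "mug"},
}

# The catalog fixes the p dimensions: one per distinct item, in a stable order.
catalog = sorted(set().union(*purchases.values()))

def customer_vector(items, catalog):
    """Encode one customer as Bought (1) / Not bought (0) over the catalog."""
    return [1 if item in items else 0 for item in catalog]

vectors = {c: customer_vector(items, catalog) for c, items in purchases.items()}
print(catalog)       # ['book', 'lamp', 'mug', 'pen']
print(vectors["A"])  # [1, 0, 0, 1]
```

Here p = 4, so every customer becomes a 4-component vector.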

Similarity Measures:

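$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$
  ◦ $A = (a_1, a_2, \dots, a_N)$
  ◦ $B = (b_1, b_2, \dots, b_N)$
  ◦ $A \cdot B = a_1 b_1 + a_2 b_2 + \dots + a_N b_N$
  ◦ $\|A\| = (a_1^2 + a_2^2 + \dots + a_N^2)^{1/2}$
  ◦ $\|B\| = (b_1^2 + b_2^2 + \dots + b_N^2)^{1/2}$
$$\mathrm{Corr}(A, B) = \frac{\mathrm{Cov}(A, B)}{\mathrm{Stdev}(A) \cdot \mathrm{Stdev}(B)}$$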
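Both similarity measures fit in a few lines of plain Python. This is a sketch, not a library API: `cosine` and `correlation` are hypothetical helper names, and returning 0.0 for an all-zero or constant vector is an assumption, since both formulas are undefined there.

```python
import math

def cosine(a, b):
    """cos(A, B) = A.B / (|A| * |B|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)

def correlation(a, b):
    """Corr(A, B) = Cov(A, B) / (Stdev(A) * Stdev(B)).

    Uses population (1/n) formulas; the 1/n factors cancel in the ratio.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    if std_a == 0 or std_b == 0:
        return 0.0  # assumption: correlation is undefined for constant vectors
    return cov / (std_a * std_b)

A = [1, 0, 0, 1]  # bought / not bought over a hypothetical 4-item catalog
B = [1, 1, 0, 0]
print(cosine(A, B))       # approximately 0.5
print(correlation(A, B))  # 0.0
```

Because correlation subtracts each vector's mean first, it is unaffected by a customer who rates everything higher or lower than others; cosine similarity does not make that adjustment.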

Dissimilarity measures:

• Multiply the vector components by the item's inverse frequency, so that very popular items count less when comparing customers
• Inverse frequency: the inverse of the number of customers who have purchased or rated the item
• Find the nearest neighbour(s) based on distance
• Other distance measures can also be used to identify neighbours:
  ◦ Euclidean distance: $d(A, B) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \dots + (a_N - b_N)^2}$
  ◦ Manhattan distance: $d(A, B) = |a_1 - b_1| + |a_2 - b_2| + \dots + |a_N - b_N|$
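The steps above can be sketched as follows: weight each component by the inverse frequency, then rank the other customers by their distance to the target customer. The data and the helper names (`inverse_frequency_weights`, `nearest_neighbours`, and so on) are illustrative assumptions, not an established API.

```python
import math

# Hypothetical bought (1) / not bought (0) vectors over a 4-item catalog.
vectors = {
    "A": [1, 0, 0, 1],
    "B": [1, 1, 0, 0],
    "C": [0, 0, 1, 1],
    "D": [1, 0, 0, 1],
}

def inverse_frequency_weights(vectors):
    """1 / (number of customers who bought the item); 0.0 if nobody did."""
    p = len(next(iter(vectors.values())))
    counts = [sum(1 for v in vectors.values() if v[i]) for i in range(p)]
    return [1 / c if c else 0.0 for c in counts]

def weight(vector, weights):
    """Multiply each component by its item's inverse frequency."""
    return [x * w for x, w in zip(vector, weights)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_neighbours(target, vectors, k=1, distance=euclidean):
    """Return the k customers closest to `target` after weighting."""
    w = inverse_frequency_weights(vectors)
    t = weight(vectors[target], w)
    others = [c for c in vectors if c != target]
    return sorted(others, key=lambda c: distance(t, weight(vectors[c], w)))[:k]

print(nearest_neighbours("A", vectors, k=1))  # ['D']
```

Passing `distance=manhattan` switches to the Manhattan distance without changing anything else.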

What items to recommend?

• Consider only items the user has not bought yet
• Build a list of candidate items and recommend the one(s) the user is MOST LIKELY to buy:
  ◦ Rank each candidate by how many similar customers purchased it
  ◦ Or by how many similar customers rated it
  ◦ Or by highest rating
  ◦ Or by some other popularity criterion
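A minimal sketch of the first ranking criterion, counting how many similar customers purchased each candidate. The purchase data, the neighbour list, and the `recommend` helper are made up for illustration; in practice the neighbours would come from a similarity or distance computation.

```python
from collections import Counter

# Hypothetical data: what each customer bought.
purchases = {
    "A": {"book", "pen"},
    "B": {"book", "lamp", "mug"},
    "C": {"pen", "mug"},
    "D": {"book", "pen", "mug"},
}
neighbours = ["B", "C", "D"]  # hypothetical nearest neighbours of customer "A"

def recommend(user, neighbours, purchases, n=2):
    """Rank items the user hasn't bought by how many neighbours bought them."""
    counts = Counter(
        item
        for nb in neighbours
        for item in purchases[nb]
        if item not in purchases[user]  # only items not bought yet
    )
    return [item for item, _ in counts.most_common(n)]

print(recommend("A", neighbours, purchases))  # ['mug', 'lamp']
```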

Long Tail
Supply-side drivers:
• Centralized warehousing with more offerings
• Lower inventory cost of electronic products
Demand-side drivers:
• Search engines
• Recommender systems

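• Traditional collaborative filtering is memory-based (lazy learning): no model is built in advance
• So when does the recommendation engine compute the “recommendation”? At request time, by comparing the active user against the stored customer vectors, which becomes expensive as the number of customers and items grows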

How to reduce computation?

1. Randomly sample customers
2. Discard infrequent buyers
3. Discard items that are very popular or very unpopular
4. Use clustering to reduce the number of rows (customers)
5. Use PCA to reduce the number of columns (items)
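Steps 1 to 3 amount to simple filtering before any similarities are computed. In this sketch the function name `reduce_data` and its thresholds (`sample_size`, `min_purchases`, `min_buyers`, `max_share`) are hypothetical and would need tuning on real data; steps 4 and 5 (clustering and PCA) are not shown.

```python
import random
from collections import Counter

def reduce_data(purchases, sample_size, min_purchases, min_buyers, max_share, seed=0):
    """Shrink customer -> set-of-items data before computing similarities."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # 1. Randomly sample customers.
    customers = sorted(purchases)
    sampled = rng.sample(customers, min(sample_size, len(customers)))

    # 2. Discard infrequent buyers (fewer than min_purchases items).
    kept = {c: purchases[c] for c in sampled if len(purchases[c]) >= min_purchases}
    if not kept:
        return {}

    # 3. Discard very unpopular items (fewer than min_buyers buyers) and very
    #    popular items (bought by more than max_share of the kept customers).
    buyers = Counter(item for items in kept.values() for item in items)
    n = len(kept)
    keep_items = {i for i, k in buyers.items() if k >= min_buyers and k / n <= max_share}
    return {c: items & keep_items for c, items in kept.items()}

purchases = {
    "A": {"book", "pen", "mug"},
    "B": {"book", "lamp"},
    "C": {"book"},  # infrequent buyer
    "D": {"book", "pen", "vase"},
}
reduced = reduce_data(purchases, sample_size=10, min_purchases=2, min_buyers=2, max_share=0.9)
print(reduced)
```

With these settings, customer C is dropped as an infrequent buyer, "book" is dropped as too popular, and "mug", "lamp", and "vase" are dropped as too rare, leaving only "pen".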
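
Search-based Methods
Use the user's previous purchases to search for related items (for example, other items by the same author or artist), which reduces computation compared with comparing the user against all other customers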

Item-to-Item Collaborative Filtering

• Cosine similarity between items
  ◦ Each item is a vector
  ◦ Customers are the components of the vector
• Correlation similarity between items
  ◦ Correlation of the ratings of items I and J, computed over the users who rated both I and J
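The two item-to-item similarity measures can be sketched on a small, made-up ratings table. Treating an unrated item as 0 in the cosine vector is an assumption; the correlation uses only the users who rated both items, as described above. The helper names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "u1": {"I": 5, "J": 4, "K": 1},
    "u2": {"I": 3, "J": 2},
    "u3": {"I": 4, "K": 5},
    "u4": {"J": 5, "K": 2},
}

def item_vector(item, ratings):
    """Item as a vector whose components are customers (0 = not rated)."""
    return [r.get(item, 0) for _, r in sorted(ratings.items())]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def item_correlation(i, j, ratings):
    """Pearson correlation of ratings of i and j over users who rated both."""
    pairs = [(r[i], r[j]) for r in ratings.values() if i in r and j in r]
    if len(pairs) < 2:
        return 0.0  # assumption: too little overlap to estimate a correlation
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

print(cosine(item_vector("I", ratings), item_vector("J", ratings)))
print(item_correlation("I", "J", ratings))
```

With only two co-raters, the correlation is always +1 or -1 whenever both items' ratings vary, so in practice a minimum overlap between raters is usually required.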

Scalability & Performance

• Building the similar-items table is computationally expensive, but it is done offline
• The online cost depends only on how many titles the user has purchased or rated, not on the total number of customers or catalog items
• Online component: look up the similar items for each of the user’s purchases and ratings
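The split between offline and online work can be sketched as follows: `build_similar_items` runs offline over all customers, while `recommend_online` touches only the user's own items. Both names are hypothetical, and scoring candidates by how many of the user's items point to them is a simplifying assumption.

```python
import math
from collections import defaultdict

# Hypothetical bought / not bought data: customer -> set of items.
purchases = {
    "A": {"book", "pen"},
    "B": {"book", "pen", "mug"},
    "C": {"pen", "mug"},
    "D": {"lamp", "mug"},
}

def build_similar_items(purchases, top_n=2):
    """OFFLINE: for every item, its top_n most similar items by cosine.

    With binary vectors, cos(I, J) = |buyers(I) & buyers(J)| / sqrt(|buyers(I)| * |buyers(J)|).
    """
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    table = {}
    for i in buyers:
        scores = []
        for j in buyers:
            common = len(buyers[i] & buyers[j])
            if i != j and common:
                scores.append((common / math.sqrt(len(buyers[i]) * len(buyers[j])), j))
        scores.sort(key=lambda s: (-s[0], s[1]))  # most similar first, ties by name
        table[i] = [j for _, j in scores[:top_n]]
    return table

def recommend_online(user_items, table, n=3):
    """ONLINE: look up similar items for each of the user's items only."""
    scores = defaultdict(int)
    for item in user_items:
        for similar in table.get(item, []):
            if similar not in user_items:
                scores[similar] += 1
    return sorted(scores, key=lambda j: (-scores[j], j))[:n]

table = build_similar_items(purchases)
print(recommend_online({"pen", "lamp"}, table))  # ['mug', 'book']
```

The online step loops only over the user's items and their short similar-item lists, which is why its cost does not grow with the number of customers.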


Limitations of Item-to-Item Collaborative Filtering:

• Similar items offer less diversity than users' tastes do, so the recommendations are often obvious
• When deciding what to recommend to a user who purchased a popular item, association rules and item-based collaborative filtering may yield the same recommendation, whereas user-based collaborative filtering will likely produce a different one