Recommendation System

Recommendation systems are commonly built with “Collaborative Filtering”, which rests on this premise:
If Person A has the same opinion as Person B on one issue, A is more likely to share B’s opinion on a different issue ‘x’ than to share the opinion of a randomly chosen person.

Collaborative Filtering is of two types:
1. User to User Collaborative Filtering
2. Item to Item Collaborative Filtering

User-to-User collaborative filtering

• Represent each customer as a p-dimensional vector of items
  - p: the number of distinct catalog items
  - Possible components:
    - Bought (1) / Not bought (0)
    - Ratings
    - Rated (1) / Not rated (0)
    - Number of units purchased
• Find the similarity between Customers A & B
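A minimal Python sketch of turning a purchase log into customer vectors. The purchase log, the customer and item names, the `customer_vectors` helper and its `mode` parameter are all hypothetical; only the binary (bought / not bought) and count (number of units purchased) components are shown, and rating-based components would follow the same pattern.

```python
def customer_vectors(log, mode="binary"):
    """Build one p-dimensional vector per customer from (customer, item) pairs.

    mode="binary": component is 1 if the customer bought the item, else 0.
    mode="count":  component is the number of units of the item purchased.
    """
    if mode not in ("binary", "count"):
        raise ValueError(f"unknown mode: {mode}")
    items = sorted({item for _, item in log})          # the p distinct catalog items
    index = {item: i for i, item in enumerate(items)}  # item -> vector position
    vectors = {}
    for customer, item in log:
        vec = vectors.setdefault(customer, [0] * len(items))
        if mode == "binary":
            vec[index[item]] = 1
        else:
            vec[index[item]] += 1
    return items, vectors


# Hypothetical purchase log: one (customer, item) pair per unit sold.
purchases = [
    ("alice", "book"), ("alice", "pen"), ("alice", "book"),
    ("bob", "pen"), ("bob", "lamp"),
    ("carol", "book"),
]

items, vectors = customer_vectors(purchases)
print(items)              # ['book', 'lamp', 'pen']
print(vectors["alice"])   # [1, 0, 1]
```

With `mode="count"`, Alice's vector becomes `[2, 0, 1]` because she bought two books.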

Similarity Measures:

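Cosine similarity:
$$\cos(A, B) = \frac{A \cdot B}{|A|\,|B|}$$
  - $A = (a_1, a_2, \ldots, a_N)$ and $B = (b_1, b_2, \ldots, b_N)$, with N = p, the number of catalog items
  - $A \cdot B = a_1 b_1 + a_2 b_2 + \cdots + a_N b_N$
  - $|A| = (a_1^2 + a_2^2 + \cdots + a_N^2)^{1/2}$
  - $|B| = (b_1^2 + b_2^2 + \cdots + b_N^2)^{1/2}$

Correlation similarity:
$$\mathrm{Corr}(A, B) = \frac{\mathrm{Cov}(A, B)}{\mathrm{Stdev}(A)\,\mathrm{Stdev}(B)}$$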
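Both formulas translate directly into a few lines of Python. This is a sketch; the function names are illustrative, and returning 0.0 for an all-zero or constant vector (where the formulas divide by zero) is an assumed convention, not part of the definitions.

```python
import math


def cosine(a, b):
    """Cosine similarity: A.B / (|A| * |B|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_a * norm_b)


def correlation(a, b):
    """Pearson correlation: Cov(A, B) / (Stdev(A) * Stdev(B))."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    if sd_a == 0 or sd_b == 0:
        return 0.0  # assumed convention for a constant vector
    return cov / (sd_a * sd_b)


A = [1, 0, 1, 1]
B = [1, 1, 0, 1]
print(round(cosine(A, B), 4))       # 0.6667
print(round(correlation(A, B), 4))  # -0.3333
```

The 1/n factors cancel in the correlation, so using the sample instead of the population standard deviation (consistently in numerator and denominator) gives the same value.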

Dissimilarity measures:

• Multiply the vector components by the inverse frequency, so that items bought or rated by many customers (best-sellers) carry less weight
• Inverse frequency: the inverse of the number of customers who have purchased or rated the item
• Find the nearest neighbour(s) based on distance
• Other distance measures can also be used to identify neighbours, e.g.:
$$\text{Euclidean}(A, B) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_N - b_N)^2}$$
$$\text{Manhattan}(A, B) = |a_1 - b_1| + |a_2 - b_2| + \cdots + |a_N - b_N|$$
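A sketch of the steps above: weight each component by its inverse frequency, then find a customer's nearest neighbours with either distance. The customer vectors, the helper names and the choice to give never-bought items a weight of 0 are assumptions for illustration.

```python
import math


def inverse_frequency_weights(vectors):
    """Weight for item j = 1 / (number of customers with a non-zero entry for j)."""
    p = len(next(iter(vectors.values())))
    counts = [sum(1 for v in vectors.values() if v[j] != 0) for j in range(p)]
    # Items nobody bought get weight 0 (assumption: avoids division by zero).
    return [1.0 / c if c else 0.0 for c in counts]


def apply_weights(vec, weights):
    """Multiply each vector component by its item's inverse frequency."""
    return [x * w for x, w in zip(vec, weights)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbours(target, vectors, k=1, distance=euclidean):
    """Return the k customers closest to `target`, excluding `target` itself."""
    others = [c for c in vectors if c != target]
    others.sort(key=lambda c: distance(vectors[target], vectors[c]))
    return others[:k]


# Hypothetical binary customer vectors over four items.
vectors = {
    "alice": [1, 1, 0, 1],
    "bob":   [1, 1, 0, 0],
    "carol": [1, 0, 1, 0],
    "dave":  [1, 0, 1, 1],
}
weights = inverse_frequency_weights(vectors)       # [0.25, 0.5, 0.5, 0.5]
weighted = {c: apply_weights(v, weights) for c, v in vectors.items()}
print(nearest_neighbours("alice", weighted, k=2))  # ['bob', 'dave']
```

The first item is bought by everyone, so its weight (0.25) is the smallest and it contributes least to the distances.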

What items to recommend?

• Only items the user has not bought yet
• Create a list of candidate items & recommend the item(s) the person is MOST LIKELY to buy, ranking each item by:
  - how many similar customers purchased it
  - or how many similar customers rated it
  - or its rating (highest rated first)
  - or some other popularity criterion
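A minimal sketch of the first ranking criterion: count how many similar customers bought each item the target has not bought, then sort by that count. The purchase sets, the neighbour list and the alphabetical tie-break are illustrative assumptions.

```python
from collections import Counter


def recommend(target, neighbours, purchased, top_n=3):
    """Rank items the target has not bought by how many neighbours bought them.

    purchased:  {customer: set of items}
    neighbours: customers already judged similar to `target`
    """
    already = purchased[target]
    votes = Counter()
    for n in neighbours:
        for item in purchased[n] - already:  # skip items the target already owns
            votes[item] += 1
    # Most votes first; ties broken alphabetically for a deterministic order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase history.
purchased = {
    "alice": {"book", "pen"},
    "bob":   {"book", "pen", "lamp"},
    "carol": {"book", "lamp", "mug"},
    "dave":  {"pen", "mug", "lamp"},
}
print(recommend("alice", ["bob", "carol", "dave"], purchased))  # ['lamp', 'mug']
```

Swapping the vote count for a rating count or an average rating would implement the other criteria listed above.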

Long Tail
Supply-side drivers:
• Centralized warehousing with more offerings
• Lower inventory cost of electronic products
Demand-side drivers:
• Search engines
• Recommender systems

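• User-to-user collaborative filtering is memory-based (lazy learning): it stores all the customer vectors and builds no model in advance
• When does the recommendation engine compute the “recommendation”? At request time, by comparing the user with the stored customers, which becomes expensive as the numbers of customers and items grow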

How to reduce computation?

1. Randomly sample customers
2. Discard customers with few purchases
3. Discard items that are very popular or very unpopular
4. Clustering can reduce the number of rows (customers)
5. PCA can reduce the number of columns (items)
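Steps 1 to 3 above can be sketched as a filter over a purchase table; clustering and PCA (steps 4 and 5) are not shown. The `reduce_purchases` helper, its parameters, the thresholds and the data are hypothetical, and "very popular / very unpopular" is assumed here to mean the share of customers who bought the item.

```python
import random
from collections import Counter


def reduce_purchases(purchased, sample_size=None, min_purchases=2,
                     min_item_share=0.0, max_item_share=1.0, seed=0):
    """Shrink a {customer: set(items)} table before the neighbour search.

    All threshold values are illustrative assumptions, not recommended settings.
    """
    customers = sorted(purchased)
    # 1. Randomly sample customers (rows).
    if sample_size is not None and sample_size < len(customers):
        customers = random.Random(seed).sample(customers, sample_size)
    # 2. Discard customers with few purchases.
    customers = [c for c in customers if len(purchased[c]) >= min_purchases]
    if not customers:
        return {}
    # 3. Discard very popular / very unpopular items (columns), measured by the
    #    share of the remaining customers who bought each item.
    n = len(customers)
    counts = Counter(item for c in customers for item in purchased[c])
    keep = {item for item, k in counts.items()
            if min_item_share <= k / n <= max_item_share}
    return {c: purchased[c] & keep for c in customers}


# Hypothetical purchase table.
purchased = {
    "alice": {"book", "pen"},
    "bob":   {"book", "pen", "lamp"},
    "carol": {"book"},
    "dave":  {"book", "mug", "lamp"},
}
reduced = reduce_purchases(purchased, min_purchases=2,
                           min_item_share=0.5, max_item_share=0.9)
print({c: sorted(s) for c, s in reduced.items()})
# {'alice': ['pen'], 'bob': ['lamp', 'pen'], 'dave': ['lamp']}
```

Carol is dropped for having a single purchase; "book" (bought by everyone left) and "mug" (bought by one in three) fall outside the assumed 50% to 90% band.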
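
Search-based Methods
Treat recommendation as a search for related items: build a query from the user's previous purchases (e.g., other items by the same author, or with similar keywords), which avoids comparing the user with every other customer and so reduces computation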
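A toy sketch of a search-based method, assuming the catalog stores an author for each item and a popularity score used to order the results. The catalog, the popularity figures and `search_based` are hypothetical; a real system would query a search index over richer metadata.

```python
# Hypothetical catalog metadata: item -> author.
catalog = {
    "Dune": "Herbert", "Children of Dune": "Herbert", "Dune Messiah": "Herbert",
    "Emma": "Austen", "Persuasion": "Austen", "Neuromancer": "Gibson",
}
# Hypothetical popularity scores (e.g., units sold) used to order results.
popularity = {"Dune": 100, "Neuromancer": 90, "Dune Messiah": 80,
              "Emma": 70, "Children of Dune": 50, "Persuasion": 40}


def search_based(user_items, catalog, popularity, top_n=3):
    """Recommend unbought items that share an author with the user's purchases."""
    authors = {catalog[i] for i in user_items}  # the "search query"
    hits = [i for i, a in catalog.items()
            if a in authors and i not in user_items]
    hits.sort(key=lambda i: -popularity.get(i, 0))  # most popular first
    return hits[:top_n]


print(search_based({"Dune"}, catalog, popularity))
# ['Dune Messiah', 'Children of Dune']
```

Only the user's own purchases drive the query, so no customer-to-customer comparison is needed.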

Item-to-Item collaborative filtering

• Cosine similarity among items
  - Each item is the vector
  - Customers are the components of the vector
• Correlation similarity among items
  - Correlation of the ratings of Items I & J, computed over the users who rated both I & J
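A sketch of both item-to-item measures on a small, hypothetical ratings table. Treating an unrated item as 0 in the cosine vector, and returning 0.0 when fewer than two users rated both items, are assumptions made for illustration.

```python
import math

# Hypothetical ratings: {user: {item: rating}}.
ratings = {
    "u1": {"I": 5, "J": 4, "K": 1},
    "u2": {"I": 3, "J": 2},
    "u3": {"I": 4, "K": 5},
    "u4": {"J": 5, "K": 2},
}


def item_cosine(i, j, ratings):
    """Cosine similarity with items as vectors and customers as components."""
    users = sorted(ratings)
    a = [ratings[u].get(i, 0) for u in users]  # unrated counted as 0 (assumption)
    b = [ratings[u].get(j, 0) for u in users]
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def item_correlation(i, j, ratings):
    """Pearson correlation of ratings over users who rated both I and J."""
    both = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    if len(both) < 2:
        return 0.0  # assumed convention: too few co-raters
    x = [ratings[u][i] for u in both]
    y = [ratings[u][j] for u in both]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


print(round(item_correlation("I", "J", ratings), 4))  # 1.0
print(round(item_correlation("I", "K", ratings), 4))  # -1.0
```

Users u1 and u2 both rated I above J by one point, so I and J correlate perfectly; u1 and u3 disagree about I and K, giving -1.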

Scalability & Performance

• Building the similar-items table is computationally expensive, but it is done offline
• Online component: look up the similar items for each of the user’s purchases & ratings
• The online cost depends only on how many titles the user has purchased or rated, not on the size of the catalog or the number of customers
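A sketch of the offline/online split, assuming binary purchase data and cosine similarity between items. For binary vectors the cosine reduces to (customers who bought both) / sqrt(buyers of I × buyers of J). The data and function names are hypothetical; the offline step only scores item pairs that share at least one customer.

```python
import math
from collections import defaultdict


def build_similar_items(purchased):
    """Offline: cosine similarity for every pair of co-purchased items."""
    buyers = defaultdict(set)
    for customer, items in purchased.items():
        for i in items:
            buyers[i].add(customer)
    table = defaultdict(dict)
    for items in purchased.values():
        for i in items:
            for j in items:
                if i != j and j not in table[i]:
                    common = len(buyers[i] & buyers[j])
                    table[i][j] = common / math.sqrt(len(buyers[i]) * len(buyers[j]))
    return table


def recommend_online(user_items, table, top_n=3):
    """Online: reads only the table rows for the user's own items."""
    scores = defaultdict(float)
    for i in user_items:
        for j, sim in table.get(i, {}).items():
            if j not in user_items:
                scores[j] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [j for j, _ in ranked[:top_n]]


# Hypothetical purchase history.
purchased = {
    "c1": {"A", "B"},
    "c2": {"A", "B", "C"},
    "c3": {"B", "C"},
    "c4": {"C", "D"},
}
table = build_similar_items(purchased)         # computed once, offline
print(recommend_online({"A"}, table))          # ['B', 'C']
```

The online loop touches one table row per item the user owns, which is why its cost tracks the user's purchase count rather than the catalog size.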