Recommender systems are now setting the pace in the digital marketplace – not just for products and services, but also in networking sites which present users with a choice of people that they could connect with. In this blog post, we will explore how recommender systems use collaborative filtering.
The internet’s vast troves of data have a problem – extracting information from these big data sets is becoming increasingly difficult. Collaborative filtering is one of the mechanisms that filters information to help systems make predictions and serve their users better.
Collaborative filtering makes predictions about a user’s interests based on the collected preferences of many other users. The premise is that if person A likes and buys the same product as person B, then person A is more likely to buy other products that person B has bought than products bought by a randomly chosen person.
Collaborative filtering requires that users actively participate in the system by indicating their preferences and interests. It also requires machine learning algorithms that can match users with similar interests. Collaborative filtering can be user-based or item-based.
User-based filtering typically uses the Nearest Neighbor algorithm to process the information and make predictions. It matches the user with others who share similar rating patterns and uses their ratings to calculate a prediction for the current user.
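To make the nearest-neighbor idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production recommender: the `ratings` data, the user and item names, the choice of cosine similarity and the value of `k` are all hypothetical.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"book": 5, "lamp": 3, "desk": 4},
    "bob": {"book": 5, "lamp": 3, "desk": 4, "chair": 5},
    "carol": {"book": 1, "lamp": 5, "chair": 2},
    "dave": {"book": 4, "lamp": 2, "chair": 4},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_rating(user, item, k=2):
    """Similarity-weighted mean of the ratings given by the k most similar users."""
    # Only users who actually rated the item can contribute a rating.
    candidates = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    nearest = candidates[:k]
    total_similarity = sum(sim for sim, _ in nearest)
    if total_similarity == 0:
        return None  # no usable neighbors, so no prediction
    return sum(sim * rating for sim, rating in nearest) / total_similarity

# Alice has not rated the chair; her closest neighbors here are bob and dave.
print(predict_rating("alice", "chair"))
```

Real systems usually normalize ratings per user and use approximate neighbor search, since comparing every pair of users does not scale to large data sets.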
How Facebook uses Collaborative Filtering
At Facebook, collaborative filtering is used to help users discover the friends, pages, groups and events that are most relevant to them, based on historical data about like-minded users and their preferences.
One of the challenges Facebook faces in producing relevant recommendations with collaborative filtering is the sheer size of its data sets. Facebook’s average data set has 100 billion ratings, with more than a billion users and millions of items. Existing solutions were simply not enough to handle the scale and complexity of the data. Facebook’s software engineers worked on several customized solutions, along with scaling Apache Giraph, to solve the problem.
Facebook’s Maja Kabiljo and Aleksandar Ilic have written about how they use both Alternating Least Squares (ALS) and Stochastic Gradient Descent (SGD) optimization in a customized rotational hybrid method to achieve better recommendations.
They found that using the standard method led to problems such as heavy network traffic and a skewed item degree distribution. Their solution extends the Giraph framework with worker-to-worker messaging. This rotational hybrid method is also 10x faster than the more conventional Spark MLlib implementations in this domain. Further, the standard approach could only handle a data set capped at 3.5 billion ratings, whereas with the rotational hybrid method the system could handle more than 100 billion ratings.
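For readers unfamiliar with the optimization underneath, the following is a minimal single-machine sketch of matrix factorization trained with plain SGD. It is not Facebook’s rotational hybrid method and does not use Giraph or ALS; the toy ratings, the function names and the hyperparameters (`lr`, `reg`, `epochs`, two latent factors) are all assumptions chosen for illustration.

```python
import random
from math import sqrt

# Hypothetical toy data: (user index, item index, rating)
toy_ratings = [
    (0, 0, 5), (0, 1, 3), (0, 2, 4),
    (1, 0, 5), (1, 1, 3), (1, 3, 5),
    (2, 0, 1), (2, 1, 5), (2, 3, 2),
    (3, 1, 2), (3, 2, 4), (3, 3, 4),
]

def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    """Root mean squared error over the observed ratings."""
    squared = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return sqrt(squared / len(ratings))

def init_factors(n_rows, n_factors, rng):
    """Small random starting values for a factor matrix."""
    return [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_rows)]

def train_sgd(P, Q, ratings, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Update P and Q in place, one observed rating at a time."""
    rng = random.Random(seed)
    order = list(ratings)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - predict(P, Q, u, i)
            for f in range(len(P[u])):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)

rng = random.Random(42)
P = init_factors(4, 2, rng)  # 4 users, 2 latent factors
Q = init_factors(4, 2, rng)  # 4 items, 2 latent factors
before = rmse(P, Q, toy_ratings)
train_sgd(P, Q, toy_ratings)
after = rmse(P, Q, toy_ratings)
print(before, after)
```

ALS optimizes the same kind of objective by alternately fixing one factor matrix and solving for the other. At the scale described above, the hard part is distributing these updates across machines, which is the problem the Giraph-based approach addresses.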
How LinkedIn uses Collaborative Filtering
LinkedIn uses item-based collaborative filtering to connect users. At LinkedIn, item-to-item collaborative filtering is used to generate recommendations for people, jobs, companies and groups, and is one of the principal drivers of user engagement.
On each user’s profile page, there is a navigational aid that lets the user browse and discover content, for example ‘people who viewed this profile also viewed’. These navigational aids are called browsemaps, and each one is a collaborative filtering dataset. While the system was initially designed to show co-occurrences on member profiles, it evolved into an item-to-item collaborative filtering platform, where member browsing histories are used to build a latent graph of co-occurrences between entities.
The Browsemaps platform, which is a hybrid offline/online system, enables rapid development, deployment and computation of collaborative filtering recommendations. The platform can support all entities on LinkedIn, such as member profiles, job postings and company pages, and is flexible enough to support each entity’s requirements. For example, a job posting expires after a certain date, so the browsemap will remove the job listing while retaining the company page as is.
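The co-occurrence idea behind a browsemap, together with the job-expiry rule, can be sketched in a few lines of Python. This is a toy illustration rather than LinkedIn’s implementation: the `sessions` data, the entity identifiers and the `is_active` filter are hypothetical names introduced for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical browsing histories: each list holds the entities one member viewed.
sessions = [
    ["profile:ann", "profile:ben", "job:123"],
    ["profile:ann", "profile:ben"],
    ["profile:ann", "company:acme"],
    ["profile:ben", "job:123"],
]

def build_cooccurrence(sessions):
    """Count how often each pair of entities appears in the same browsing history."""
    co = defaultdict(Counter)
    for viewed in sessions:
        # Deduplicate within a history so repeat views are counted once.
        for a, b in combinations(sorted(set(viewed)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def also_viewed(co, entity, n=3, is_active=lambda e: True):
    """Top-n co-viewed entities, skipping any that the filter rejects (e.g. expired jobs)."""
    ranked = [e for e, _ in co.get(entity, Counter()).most_common() if is_active(e)]
    return ranked[:n]

co = build_cooccurrence(sessions)

# "People who viewed this profile also viewed"
print(also_viewed(co, "profile:ann"))

# Treat every job posting as expired, to show the filtering step.
print(also_viewed(co, "profile:ben", is_active=lambda e: not e.startswith("job:")))
```

Keeping the counting step separate from the serving-time filter loosely mirrors the offline/online split: counts can be computed in batch, while per-entity rules such as expiry are applied when a recommendation is served.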
With its online/offline architecture, Browsemaps is well suited to LinkedIn’s extensive traffic and powers many of LinkedIn’s navigational aids. These collaborative filtering datasets also enable many hybrid recommender systems.
The search for more personalized recommendation systems will continue to drive research and development. As collaborative filtering methods become more sophisticated, their application will extend to different kinds of data, including financial data and environmental sensing and monitoring data.