Collective Intelligence – Part 3: Gathering Intelligence from User Interaction
This is the third in a series of posts on the topic of programming Collective Intelligence in web applications. This series of posts will draw heavily from Satnam Alag’s excellent book Collective Intelligence in Action.
These posts will present a conceptual overview of key strategies for programming CI, and will not delve into code examples. For that, I recommend picking up Alag’s book. You won’t be disappointed!
Click on the following links to access previous posts in this series:
Introduction – Applying CI in your Application
Alag states that there are three things that need to happen to apply collective intelligence in your application.
You need to:
- Allow users to interact with your site and with each other, learning about each user through their interactions and contributions.
- Aggregate what you learn about your users and their contributions using some useful models.
- Leverage those models to recommend relevant content to your users.
This post will focus on the first of these steps: specifically the different forms of user interaction that capture the raw data used to derive collective intelligence in social web applications.
In his book, Alag provides persistence models for capturing this user interaction data. I will not be discussing those persistence models in this post. Please pick up a copy of Alag’s book if you are interested in the details of how the data collected from these user interactions is captured in underlying persistence models.
Gathering Intelligence from User Interaction
To extract intelligence from a user’s interactions in your application, it isn’t enough to know what content the user looked at or visited. You need to quantify the quality of the interaction. A user may like an article or may dislike it, these being the two extremes. What you need is a quantification of how much the user liked the item relative to other items.
Remember, we’re trying to ascertain what kind of information is of interest to the user. The user may provide this directly by rating or voting for an article, or it may need to be derived, for example, by looking at the content the user has consumed. We can also learn about the item that the user is interacting with in the process.
In this section, we look at how users provide quantifiable information through their interactions. Some interactions, such as rating and voting, are explicit in the user’s intent, while other interactions, such as clicks, are noisy – the intent of the user isn’t perfectly known and is implicit.
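The explicit/implicit distinction can be sketched as a simple weighting scheme, where noisier signals contribute less to an item’s score. The specific weight values below are illustrative assumptions for this sketch, not figures from Alag’s book:

```python
# Illustrative confidence weights per interaction type. Explicit signals
# (ratings, purchases) carry more weight than noisy implicit ones (clicks).
# These exact values are assumptions chosen for illustration only.
INTERACTION_WEIGHTS = {
    "rating":   1.0,   # explicit: the user stated a preference directly
    "purchase": 0.9,   # explicit: a strong commitment to the item
    "bookmark": 0.7,   # explicit: stated interest, but no cost incurred
    "share":    0.6,   # a positive vote for the item
    "click":    0.2,   # implicit and noisy: intent isn't perfectly known
}

def interaction_score(events):
    """Aggregate a user's (item_id, interaction_type) events into per-item scores."""
    scores = {}
    for item_id, kind in events:
        scores[item_id] = scores.get(item_id, 0.0) + INTERACTION_WEIGHTS[kind]
    return scores

events = [("article-1", "click"), ("article-1", "bookmark"), ("article-2", "click")]
print(interaction_score(events))  # article-1 outweighs article-2
```

A real system would tune these weights empirically; the point is only that different interaction types warrant different confidence.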
Alag discusses six examples of user interaction from which collective intelligence data might be extracted. These are:
- Rating and Voting
- E-mailing or Forwarding a Link
- Bookmarking and Saving
- Purchasing Items
- Click Analysis
- Reviews
I would generalize “e-mailing and forwarding a link” to “forwarding and sharing content” generally, of which “e-mailing and forwarding a link” is a variation.
This post will provide a very light treatment of some of the forms of user interaction from which collective intelligence is derived. As mentioned above, I will not be exploring the persistence models that capture the user data from these interactions.
So, first up, rating and voting.
Rating and Voting
Asking the user to rate an item of interest is an explicit way of getting feedback on how well the user liked the item. The advantage of having users rate content is that the information provided is quantifiable and can be used directly.
Alag has a very nice section on the specific data and persistence models that underlie the rating and voting data captured from user interaction. Please refer to his book for this additional detail.
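To make the idea concrete, here is a minimal in-memory sketch of rating storage – one rating per user per item, with an average computed per item. The class and method names are my own illustrative choices; Alag’s book covers real persistence models:

```python
from collections import defaultdict

# Minimal in-memory rating store: one rating per user per item.
# Illustrative only; a real application would persist this to a database.
class RatingStore:
    def __init__(self):
        self._ratings = defaultdict(dict)  # item_id -> {user_id: rating}

    def rate(self, user_id, item_id, rating):
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        # Re-rating overwrites the user's previous rating for the item.
        self._ratings[item_id][user_id] = rating

    def average(self, item_id):
        votes = self._ratings[item_id]
        return sum(votes.values()) / len(votes) if votes else None

store = RatingStore()
store.rate("alice", "article-1", 5)
store.rate("bob", "article-1", 3)
print(store.average("article-1"))  # 4.0
```

The one-rating-per-user constraint is what makes the data directly usable: each user contributes a single quantified opinion per item.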
Forwarding and Sharing Content
Forwarding and sharing is another activity that can be considered a positive vote for an item. Alag briefly discusses a variation of this activity in the form of a user e-mailing or forwarding a link.
Bookmarking and Saving
A few quick comments from Alag:
Online bookmarking services such as del.icio.us allow users to store and retrieve URLs, also known as bookmarks. Users can discover interesting links that other users have bookmarked through recommendations, hot lists, and other such features. By bookmarking URLs, a user is explicitly expressing interest in the material associated with the bookmark. URLs that are commonly bookmarked bubble up higher in the site.
The process of saving an item or adding it to a list is similar to bookmarking and provides similar information.
Bookmarking and saving is another user interaction activity for which Alag explores the underlying persistence model.
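The “commonly bookmarked URLs bubble up higher” behavior from the quoted passage reduces to counting bookmarks per URL. A toy sketch, with illustrative data (a real service would persist bookmarks, not keep them in memory):

```python
from collections import Counter

# Each bookmark is a (user_id, url) pair; commonly bookmarked URLs
# "bubble up" by appearing first in the hot list.
bookmarks = [
    ("alice", "http://example.com/a"),
    ("bob",   "http://example.com/a"),
    ("carol", "http://example.com/b"),
]

def hot_list(bookmarks, top_n=10):
    """Return the most-bookmarked URLs with their counts, most popular first."""
    counts = Counter(url for _user, url in bookmarks)
    return counts.most_common(top_n)

print(hot_list(bookmarks))  # example.com/a ranks first with 2 bookmarks
```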
Purchasing Items
In an e-commerce site, when users purchase items, they’re casting an explicit vote of confidence in the item – unless the item is returned after purchase, in which case it’s a negative vote. Recommendation engines, such as the one used by Amazon, can be built by analyzing the purchase histories of users. Users who buy similar items can be correlated, and items bought by those similar users can be recommended to a user.
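The “users who bought this also bought” intuition can be sketched by counting how often pairs of items appear together in purchase histories. This is a deliberate simplification – Amazon’s actual item-to-item approach is more sophisticated – and the data here is made up for illustration:

```python
from collections import defaultdict
from itertools import combinations

def co_purchase_counts(purchase_history):
    """purchase_history: {user_id: set of item_ids}. Counts item co-occurrences."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in purchase_history.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(counts, item_id, top_n=3):
    """Items most often bought alongside item_id, most frequent first."""
    related = counts.get(item_id, {})
    return sorted(related, key=related.get, reverse=True)[:top_n]

history = {
    "alice": {"book", "lamp"},
    "bob":   {"book", "lamp", "mug"},
    "carol": {"book", "mug"},
}
counts = co_purchase_counts(history)
print(recommend(counts, "book"))  # lamp and mug both co-occur with book
```

Note that this counts only co-occurrence; it does not handle returns (the negative vote mentioned above), which a fuller model would subtract.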
Click Analysis
So far we’ve looked at fairly explicit ways of determining whether a user liked or disliked a particular item: ratings, voting, forwarding, and purchasing. When a list of items is presented to a user, there’s a good chance that the user will click on one of them based on its title and description. But after quickly scanning the item, the user may find it irrelevant and browse back or search for other items.
A simple way to quantify an article’s relevance is to record a positive vote for any item clicked. This approach is used by Google News to personalize the site. To further filter out the noise, such as items the user didn’t really like, you could look at the amount of time the user spent on the article. Of course, this isn’t foolproof. For example, the user could have left the room to get some coffee or been interrupted while looking at the article. But on average, simply looking at whether an item was visited and the time spent on it provides useful information that can be mined later.
You can also gather useful statistics from this data:
- What is the average time a user spends on a particular item?
- For a user, what is the average time spent on any given article?
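Both statistics fall out of a simple visit log. A sketch, assuming visit records of the form (user_id, item_id, seconds_spent) – the record shape is my own illustrative choice:

```python
from statistics import mean

# A log of visits: who viewed what, and for how long (in seconds).
visits = [
    ("alice", "article-1", 120),
    ("bob",   "article-1", 60),
    ("alice", "article-2", 30),
]

def avg_time_on_item(visits, item_id):
    """Average time any user spends on a particular item."""
    times = [secs for _user, item, secs in visits if item == item_id]
    return mean(times) if times else None

def avg_time_for_user(visits, user_id):
    """Average time a given user spends on any article."""
    times = [secs for user, _item, secs in visits if user == user_id]
    return mean(times) if times else None

print(avg_time_on_item(visits, "article-1"))  # 90
print(avg_time_for_user(visits, "alice"))     # 75
```

Comparing a user’s time on one item against their own average is one way to separate genuine interest from the coffee-break noise mentioned above.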
Reviews and Recommendations
Web 2.0 is all about connecting people with similar people. This similarity may be based on similar tastes, positions, opinions, or geographic location. Tastes and opinions are often expressed through reviews and recommendations. These have the greatest impact on other users when:
- They’re unbiased
- The reviews are from similar users
- They’re from a person of influence
Depending on the application, the information provided by a user may be available to the entire population of users, or may be privately available only to a select group of users.
Perhaps the biggest reasons why people review items and share their experiences are to be discovered by others and for bragging rights. Reviewers enjoy the recognition, and typically like the site and want to contribute to it. Most of them enjoy doing it. A number of applications highlight the contributions made by users with a Top Reviewers list. Reviews from top reviewers are also typically placed toward the top and featured more prominently. Sites may also feature one of their top reviewers on the site as an incentive to contribute.
Here again, Alag provides additional commentary around the persistence model underlying Reviews. See the book for details.
In this post, we (very) briefly explored forms of user interaction that provide the raw data that applications use to derive collective intelligence and provide useful and relevant content to their users. In future posts in this series, we’ll explore how collective intelligence algorithms are used to aggregate this content and provide useful insight and information to the users of a social web application.
Also in this series
- Collective Intelligence – Part 1: Introduction
- Collective Intelligence – Part 2: Basic Algorithms
- Collective Intelligence – Part 4: Calculating Similarity
- Collective Intelligence – Part 5: Extracting Intelligence from Tags