This talk by Kevin Marks from Lift ’08 is over two years old now, but I really like the story of how the “younger generation” views the Web (or the Cloud) – it’s just there, it’s like oxygen:
As I dig deeper into the technologies of the open, social web it’s nice to be reminded that the whole point of these technologies is in many ways to make the infrastructure invisible.
A nice presentation at the Information Architecture Institute’s IDEA 2009 conference by Christian Crumlish, the curator of Yahoo!’s pattern library, and Erin Malone. A link to the presentation can be found here – scroll down to the Social Design Patterns Mini-Workshop presentation. The slide deck that accompanies the presentation is shown below.
Crumlish and Malone are also the authors of the book Designing Social Interfaces – which I would list as one of my top 3 books on social web design, along with Josh Porter’s Designing for the Social Web, and Gavin Bell’s Building Social Web Applications.
This is the fourth in a series of posts on key dimensions of Hyperlocal. Other posts in this series are:
- HyperLocal – a Framework
- Hyperlocal – Core Dimensions (Part 1)
- Hyperlocal – Core Dimensions (Part 2)
In this post we consider key enabling technologies that many of the hyperlocal platforms mentioned in previous posts will leverage.
Key Enabling Technologies
The initial post in this series identified the following key enabling technologies for Hyperlocal solutions:
- Identity and Personalization
- Social Media/Social Web
- Real-time Web
- Machine Learning
- Structured Data/Semantic Web
Let’s explore each in turn.
*** Update January 5 2010 ***
It looks like ReadWriteWeb concurs with my identification of key enabling technologies for emerging web-based applications. See ReadWriteWeb’s Top 5 Web Trends of 2009. I think leaving out Geolocation is a fairly important omission on RWW’s part. I didn’t make reference to the Internet of Things in my list, but have referred to Web Meets World (another name for the same thing), and its impact on HyperLocal, in previous posts.
*** End of Update ***
Identity and Personalization
Identity is a key part of any online platform these days. Not only does Identity represent one’s online presence, but it’s the basis for relating to others in the context of one’s social graph.
Chris Messina has some great insights into the emergence of Identity as a platform – here’s video of his Identity is the Platform presentation from October 2009, and the slideshow accompanying his talk.
The two key players positioned to dominate the Identity Platform space are:
- OpenID
- Facebook Connect
Identity forms the foundation by which to deliver and manage personalized content for a user. I’m not going to discuss Personalization strategies in detail here, but ReadWriteWeb has an excellent piece on the topic.
Social Media and Social Web
I’m not sure too much needs to be said here. Obviously, Social Media and Social Networks, or what’s often referred to as the Social Graph, is a key feature of the Web today. If you’re going to host and service a Community on your website, you won’t get very far if you don’t design your website for the social web.
Interestingly, the Identity Platforms mentioned in the previous section – OpenID and Facebook Connect – allow you to import the Social Graph from external platforms into your Community site. Alternatively, you may also want to promote your content on other sites on the Social Web – including Twitter and Facebook.
Another important concept to be aware of in the context of the Web and HyperLocal is that of the Social Object. The Social Object is any piece of Content or information that a community might potentially socialize around. So for example, Twitter posts, news articles, photos, business listings, videos, URLs, movies … all are potential social objects that a community might share and discuss.
Social Media is any form of publishing that facilitates social collaboration and sharing of information, content, and conversation. Social Networking sites, Blogs, Wikis, Microblogging platforms etc. all fall under this category.
The following are just a few of the more popular platforms on the social web:
It’s important on your website to enable key forms of social behavior, including sharing and bookmarking content, commenting, rating and reviewing, and so on. These are features that any social website should support, and that the key community platform players, such as Jive, Pluck, and Lithium, all support.
Real-time Web
With the viral adoption of Twitter, the real-time web has really taken off of late. To understand the state of the Real-time Web heading into 2010, see the following:
- For an excellent overview of the real-time Web, please see RWW’s Top 5 Web Trends of 2009: The Real-Time Web, from September 2009.
- For a series of fabulous videos from TechCrunch’s recent Real-time Web CrunchUp event in November 2009 in San Fran, please see Real-time Web – CrunchUp Event in November.
- And finally, here’s Mashup’s view of the real-time Web heading into 2010: 5 Big Real-Time Web Trends of 2009
The Real-time Web can be viewed from a number of different angles. Three are:
This is the core of the Real-time Web – the underlying real-time feed protocol. Please see:
- Rest in Peace, RSS – TechCrunch, May 2009
- PubSubHubbub: Real-Time Feeds and Real-Time Feedback Too? – louisgray.com, July 2009
- Twitter to Open Firehose to Developers – Mashable, December 2009
- You say you want a revolution – Steve Gilmore, December 2009
- RSSCloud Vs. PubSubHubbub: Why The Fat Pings Win – TechCrunch, September 2009
- Twitter Search, Google launches Real-time Search – Mashable, December 2009
- Real-time Search-off – TechCrunch, May 2009
Real-time Geo, or Geo-streams
- Twitter API Adds Location Data – Tweets Get Realtime Geo – ProgrammableWeb, August 2009
For more on real-time geo and geolocation trends, see the Geolocation section that follows.
Managing the Real-time Firehose of Information
With the Real-time Web, information bursts forth as a massive stream – or firehose – of information, which is then filtered or consumed according to one’s particular social filters and interests. It can be overwhelming at first, as Nova Spivack discusses here.
… This post is a work-in-progress. Please return later to view the completed post.
Media “futurist” Ross Dawson has been talking a lot about Influence lately – and Influence’s role in Media, Marketing, Social Organization, etc. He had a nice blog post titled Top blog posts of 2009: 8 Perspectives on Influence in December 2009 that listed his top blog posts on Influence during 2009.
There’s some interesting stuff here. Nothing, really, that I haven’t encountered before. But Dawson aggregates various Social Media related trends – across Media, the Economy, Social Media, Organizational structure and processes – under the broad category of Influence, and for me anyway, it produced some unique insights.
Here’s a list of some of Dawson’s Influence-related posts over 2009 that caught my eye:
- “Influence is the future of media” – July 2009
- Will Influencism supplant Capitalism? The emergence of the influence economy – September 2009
- Five key trends in how influence is transforming society – August 2009
- What are the business models for influence and reputation – today and in the future? – August 2009
- Influence research: Duncan Watts and the debate on whether “influentials” really matter – August 2009
- Influence research: what are the real influence networks within Twitter and social media? – August 2009
Food for thought here.
Always like to come across a concept that serves as a “hook” around which to describe a shared concept. In the Social Web space, Social Object (coined by Jyri Engestrom) is such a concept (see also here).
Another concept that’s been around for a while, but which I just recently stumbled across, is that of ambient intimacy, coined by Leisa Reichelt back in 2007. Here’s how Leisa describes the notion of ambient intimacy:
Ambient intimacy is about being able to keep in touch with people with a level of regularity and intimacy that you wouldn’t usually have access to, because time and space conspire to make it impossible. Flickr lets me see what friends are eating for lunch, how they’ve redecorated their bedroom, their latest haircut. Twitter tells me when they’re hungry, what technology is currently frustrating them, who they’re having drinks with tonight.
Who cares? Who wants this level of detail? Isn’t this all just annoying noise? There are certainly many people who think this, but they tend to be not so noisy themselves. It seems to me that there are lots of people for who being social is very much a ‘real life’ activity and technology is about getting stuff done.
There are a lot of us, though, who find great value in this ongoing noise. It helps us get to know people who would otherwise be just acquaintances. It makes us feel closer to people we care for but in whose lives we’re not able to participate as closely as we’d like.
Knowing these details creates intimacy. (It also saves a lot of time when you finally do get to catchup with these people in real life!) It’s not so much about meaning, it’s just about being in touch.
Google is moving into the Social Search space as well. First there was the launch in Google Labs of Google Social Search:
Then there was the recent announcement at the Google Search Event that Google will be partnering with Facebook and MySpace (in addition to Twitter) to accept real-time feeds from these applications.
Google is certainly making rapid moves of late to integrate both the real-time web and the social web into its products. It will be interesting to watch moving forward.
This is the fifth in a series of posts on the topic of programming Collective Intelligence in web applications. This series of posts will draw heavily from Satnam Alag’s excellent book Collective Intelligence in Action.
These posts will present a conceptual overview of key strategies for programming CI, and will not delve into code examples. For that, I recommend picking up Alag’s book. You won’t be disappointed!
Click on the following links to access previous posts in this series:
- Part 1: Introduction
- Part 2: Basic Algorithms
- Part 3: Gathering Intelligence from User Interaction
- Part 4: Calculating Similarity
So far in this series of posts, we’ve been introduced to some basic algorithms in CI, looked at various forms of user interaction, and explored how we use term vectors and similarity matrices to calculate the similarity between users, items, and items and users. In this post, we’ll explore how to gather intelligence from tags.
Alag introduces the topic of gathering intelligence from tags as follows:
Users tagging items—adding keywords or phrases to items—is now ubiquitous on the web. This simple process of a user adding labels or tags to items, bookmarking items, sharing items, or simply viewing items provides a rich dataset that can translate into intelligence, for both the user and the items. This intelligence can be in the form of finding items related to the one tagged; connecting with other users who have similarly tagged items; or drawing the user to discover alternate tags that have been associated with an item of interest and through that finding other related items.
With that introduction, let’s begin.
Introduction to Tagging
Tagging is the process of adding freeform text, either words or small phrases, to items. These keywords or tags can be attached to anything in your application—users, photos, articles, bookmarks, products, blog entries, podcasts, videos, and more.
[Previously] we looked at using term vectors to associate metadata with text. Each term or tag in the term vector represents a dimension. The collective set of terms or tags in your application defines the vocabulary for your application. When this same vocabulary is used to describe both the user and the items, we can compute the similarity of items with other items and the similarity of the item to the user’s metadata to find content that’s relevant to the user.
In this case, tags can be used to represent metadata. Using the context in which they appear and to whom they appear, they can serve as dynamic navigation links.
In essence, tags enable us to:
- Build a metadata model (term vector) for our users and items. The common terminology between users and items enables us to compute the similarity of an item to another item or to a user.
- Build dynamic navigation links in our application, for example, a tag cloud or hyperlinked phrases in the text displayed to the user.
- Use metadata to personalize and connect users with other users.
- Build a vocabulary for our application.
- Bookmark items, which can be shared with other users.
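To make the similarity idea a little more concrete, here’s a minimal Python sketch of my own (it is not taken from Alag’s book). It assumes each user and item is described by a term vector stored as a dictionary of tag to weight, and compares two such vectors with cosine similarity, one common choice of measure. The function name, the tags, and the weights are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term vectors (dict: tag -> weight)."""
    # Only tags that appear in both vectors contribute to the dot product.
    dot = sum(weight * b[tag] for tag, weight in a.items() if tag in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical term vectors for one user and two items.
user = {"hyperlocal": 0.8, "geolocation": 0.6}
article_a = {"hyperlocal": 0.6, "geolocation": 0.8}
article_b = {"recipes": 1.0}

print(cosine_similarity(user, article_a))  # shares both tags with the user
print(cosine_similarity(user, article_b))  # shares no tags with the user
```

Because users and items draw on the same vocabulary, the same function can compare an item to an item or an item to a user.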
Content-based vs. Collaborative-based Metadata
Alag emphasizes the distinction between content-based and collaborative-based sources of metadata. Quoting Alag:
In the content-based approach, metadata associated with the item is developed by analyzing the item’s content. This is represented by a term vector, a set of tags with their relative weights. Similarly, metadata can be associated with the user by aggregating the metadata of all the items visited by the user within a window of time.
In the collaborative approach, user actions are used for deriving metadata. User tagging is an example of such an approach. Basically, the metadata associated with the item can be computed by computing the term vector from the tags—taking the relative frequency of the tags associated with the item and normalizing the counts.
When you think about metadata for a user and item using tags, think about a term vector with tags and their related weights.
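As a rough illustration of the collaborative approach described above, the following Python sketch turns the tags users have applied to a single item into a term vector. It takes the relative frequency of each tag and then scales the vector to unit length. Scaling to unit length is my assumption about what “normalizing” means here, and the function name and sample tags are made up.

```python
import math
from collections import Counter


def tag_term_vector(tags):
    """Build a unit-length term vector (dict: tag -> weight) from applied tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Relative frequency of each tag among all tags applied to the item.
    freqs = {tag: count / total for tag, count in counts.items()}
    # Scale so the vector has length 1 (assumed normalization step).
    norm = math.sqrt(sum(f * f for f in freqs.values()))
    return {tag: f / norm for tag, f in freqs.items()}


# Hypothetical tags applied to one item by different users.
applied = ["social", "web", "social", "web", "social", "web", "web"]
print(tag_term_vector(applied))
```

The more often a tag is applied to the item, the larger its weight in the resulting vector.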
Categorizing Tags based on how they are generated
We can categorize tags based on who generated them. There are three main types of tags: professionally generated, user-generated, and machine-generated.
Professionally generated Tags
Again quoting Alag:
There are a number of applications that are content rich and provide different kinds of content—articles, videos, photos, blogs—to their users. Vertical-centric medical sites, news sites, topic-focused group sites, or any site that has a professional editor generating content are examples of such sites.
In these kinds of sites, the professional editors are typically domain experts, familiar with content domain, and are usually paid for their services. The first type of tags we cover is tags generated by such domain experts, which we call professionally generated tags.
Tags that are generated by domain experts have the following characteristics:
- They bring out the concepts related to the text.
- They capture the associated semantic value, using words that may not be found in the text.
- They can be authored to be displayed on the user interface.
- They can provide a view that isn’t centered around just the content of interest, but provides a more global overview.
- They can leverage synonyms—similar words.
- They can be multi-term phrases.
- The set of words used can be controlled, with a controlled vocabulary.
Professionally generated tags require a lot of manpower and can be expensive, especially if a large amount of new content is being generated, perhaps by the users. The characteristics listed above can also be challenging for an automated algorithm to reproduce.
User-generated Tags
Back to Alag:
It’s now common to allow users to tag items. Tags generated by the users fall into the category of user-generated tags, and the process of adding tags to items is commonly known as tagging.
Tagging enables a user to associate freeform text to an item, in a way that makes sense to him, rather than using a fixed terminology that may have been developed by the content owner or created professionally.
[For example, considering the tagging processes] at del.icio.us. Here, a user can associate any tag or keyword with a URL. The system displays a list of recommended and popular tags to guide the user.
The use of users to create tags in your application is a great example of leveraging the collective power of your users. Items that are popular will tend to be frequently tagged. From an intelligence point of view, for a user, what matters most is which items people similar to the user are tagging.
User-generated tags have the following characteristics:
- They use terms that are familiar to the user.
- They bring out the concepts related to the text.
- They capture the associated semantic value, using words that may not be found in the text.
- They can be multi-term phrases.
- They provide valuable collaborative information about the user and the item.
- They may include a wide variety of terms that are close in meaning.
User-generated tags will need to be stemmed to take care of plurals and filtered for obscenity. Since tags are freeform, variants of the same tag may appear. For example, collective intelligence and collectiveintelligence may appear as two tags.
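Here’s a deliberately naive Python sketch of that clean-up step. It lowercases each tag, strips spaces and punctuation so that variants collapse into one form, trims simple plurals, and drops anything on a blocklist. The plural rule is a crude stand-in for a real stemmer (it would wrongly turn “news” into “new”), and the blocklist contents and function names are placeholders of my own.

```python
import re

# Placeholder blocklist; a real application would maintain its own list.
BLOCKED = {"darn"}


def normalize_tag(raw):
    """Collapse case, spacing, punctuation, and simple plurals in a tag."""
    tag = re.sub(r"[^a-z0-9]+", "", raw.lower())
    # Naive plural handling, standing in for a proper stemming algorithm.
    if tag.endswith("ies") and len(tag) > 4:
        tag = tag[:-3] + "y"
    elif tag.endswith("s") and not tag.endswith("ss") and len(tag) > 3:
        tag = tag[:-1]
    return tag


def clean_tags(raw_tags):
    """Normalize tags, then drop blocked tags and duplicates (order kept)."""
    cleaned = []
    for raw in raw_tags:
        tag = normalize_tag(raw)
        if tag and tag not in BLOCKED and tag not in cleaned:
            cleaned.append(tag)
    return cleaned


print(clean_tags(["Collective Intelligence", "collectiveintelligence",
                  "Photos", "photo", "darn"]))
```

With this approach, collective intelligence and collectiveintelligence end up as a single tag.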
[Additionally,] you may want to offer recommended tags to the user based on the dictionary of tags created in your application and the first few characters typed by the user.
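A simple way to picture this is a prefix lookup against the tags already used in the application, ranked by popularity. The Python sketch below is only an illustration under that assumption. The function name, the `limit` parameter, and the tag counts are hypothetical, and a large tag dictionary would likely call for a trie or a sorted index rather than a linear scan.

```python
from collections import Counter


def suggest_tags(prefix, tag_counts, limit=5):
    """Return up to `limit` known tags starting with `prefix`, most used first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [tag for tag in tag_counts if tag.startswith(prefix)]
    # Most frequently used tags first; ties broken alphabetically.
    matches.sort(key=lambda tag: (-tag_counts[tag], tag))
    return matches[:limit]


# Hypothetical usage counts for tags already in the application.
tag_counts = Counter({"social": 42, "socialweb": 17,
                      "search": 30, "semanticweb": 8})

print(suggest_tags("so", tag_counts))
print(suggest_tags("se", tag_counts))
```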
Machine-generated Tags
Tags or terms generated through an automated algorithm are known as machine-generated tags. Alag provides several examples in his book of extracting tags using an automated algorithm – for example, generating tags by analyzing the textual content of a document.
Again from Alag:
An algorithm generates tags by parsing through text and detecting terms and phrases.
Machine-generated tags have the following characteristics:
- They use terms that are contained in the text, with the exception of injected synonyms.
- They’re usually single terms—Multi-term phrases are more difficult to extract and are usually done using a set of predefined phrases. These predefined phrases can be built using either professional or user-generated tags.
- They can generate a lot of noisy tags: tags that can have multiple meanings based on the context, including polysemy and homonyms. For example, the word gain can have a number of meanings: height gain, weight gain, stock price gain, capital gain, amplifier gain, and so on. Again, detecting multiple-term phrases, which are a lot more specific than single terms, can help solve this problem.
In the absence of user-generated and professionally generated tags, machine-generated tags are the only alternative. This is especially true for analyzing user-generated content.
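To give a feel for what such an algorithm might look like, here’s a small Python sketch of my own. It is not Alag’s implementation. It counts single terms in a piece of text, skips a few common stop words, and recognizes two-word phrases only when they appear in a predefined phrase list, as described above. The stop words, the phrase list, and the function name are all illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative lists; a real application would use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "for"}
KNOWN_PHRASES = {"collective intelligence", "weight gain"}


def extract_tags(text, top_n=5):
    """Return the most frequent terms and known two-word phrases in `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    i = 0
    while i < len(words):
        # Prefer a predefined two-word phrase over its individual words.
        phrase = " ".join(words[i:i + 2])
        if phrase in KNOWN_PHRASES:
            counts[phrase] += 1
            i += 2
            continue
        if words[i] not in STOP_WORDS:
            counts[words[i]] += 1
        i += 1
    return [tag for tag, _ in counts.most_common(top_n)]


sample = "Collective intelligence in tagging: tagging surfaces collective intelligence."
print(extract_tags(sample, top_n=2))
```

Matching the phrase first means collective intelligence is counted as one specific tag rather than as two noisier single-term tags.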
How to leverage Tags in your application
Alag leads off this section of his book with the following:
It’s useful to build metadata by analyzing the tags associated with an item and placed by a user. This metadata can then be used to find items and users of interest for the user. In addition to this, tagging can be useful to build dynamic navigation in your application, to target search, and to build folksonomies. In this section, we briefly review these three use cases.
I’m not going to explore the specific use cases that Alag covers in his book. Again, you know where to find the details. :)
Alag concludes his chapter on extracting intelligence from tagging with:
- An example that illustrates the process of extracting intelligence from user tagging, and
- Thoughts on building a scalable persistence architecture for tagging
Exploring the tagging example and Alag’s thoughts on a persistence architecture for tagging is beyond the introductory scope of this post. Please see Alag’s book for more information.
Hopefully this post has given you a bit of a flavor of how Tags are used to surface collective intelligence in a social web application. In the final post in this series, I’ll be exploring extracting intelligence from textual content.