metadata as social browsing.
Applying and adapting existing technologies from the field of information retrieval that make use of traditional domain metadata as well as social metadata has the potential to enable new ways of interacting with digital resources and to support new services based on these metadata. Tags in particular, as introduced in Section 2.6.2, play an important role: they have proven to be an effective means of describing the content of a resource and can thus significantly improve resource discovery [LGZ08]. This is especially important for videos and images. Several approaches to automatically analysing their content are being investigated (see [UKBB09] or [DUBqW09]), but it is still difficult to automatically extract appropriate textual representations. Furthermore, automatic trend detection [HJSS06] can be applied to identify emerging and relevant topics.
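To make the idea of trend detection on social metadata more concrete, the following Python sketch flags tags whose relative frequency grows between two time windows. It is not the method of [HJSS06]; it is a deliberately simple frequency-ratio heuristic, and the function name `trending_tags`, its input format (plain lists of tag strings per time window) and the `min_count` and `smoothing` parameters are hypothetical choices made for illustration.

```python
from collections import Counter


def trending_tags(old_tags, new_tags, min_count=2, smoothing=1.0):
    """Rank tags by how much their relative frequency grew between two periods.

    Hypothetical heuristic, not the approach of [HJSS06].
    old_tags, new_tags: lists of tag strings assigned in an earlier and a later window.
    """
    old, new = Counter(old_tags), Counter(new_tags)
    old_total, new_total = sum(old.values()), sum(new.values())
    vocab = len(set(old) | set(new))
    scores = {}
    for tag, count in new.items():
        if count < min_count:
            continue  # ignore tags that are too rare to indicate a trend
        # Laplace-smoothed relative frequencies, so tags unseen in the old window
        # do not cause a division by zero
        p_new = (count + smoothing) / (new_total + smoothing * vocab)
        p_old = (old[tag] + smoothing) / (old_total + smoothing * vocab)
        scores[tag] = p_new / p_old
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical tag assignments from last month and this month
last_month = ["python"] * 5 + ["java"] * 5
this_month = ["python"] * 5 + ["java"] * 1 + ["rust"] * 4
print(trending_tags(last_month, this_month))
```

Here "rust", which was never used in the earlier window, is ranked first, while "python" keeps a ratio of about 1. A real system would additionally have to cope with tag spam and with the noise of very short time windows.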
When aiming to support access to digital resources by means of the discussed interaction possibilities, the challenges will be:
• to provide them in an adequate way so that users are encouraged to make use of them,
• to offer them in existing, traditional environments in scenarios whose characteristics differ from those of the World Wide Web (e.g., intranets), and
• to develop new services that meet the users’ needs and exploit the full potential of all information at hand.
3.6.2 The Long Tail Phenomenon
The term Long Tail was coined by Chris Anderson in an article in Wired magazine²⁹ [And04]. It describes a phenomenon that can be observed in many markets (especially in eCommerce) and on online platforms: the declining importance of very popular “mainstream resources” or “hits” in favour of niches, or so-called micromarkets, that are attractive to only a few users. Figure 3.4³⁰ depicts this Long Tail, which follows a power law: every point on the x-axis stands for a single entity, ordered by decreasing popularity, whereas the y-axis represents the popularity of that entity (i.e., the number of times it was bought or accessed).
Figure 3.4: The Long Tail, a power law function describing the distribution of resources and their popularity
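The shape in Figure 3.4 can be reproduced with a simple model. The Python sketch below assumes a Zipf-like power law in which the popularity of the item at rank r is proportional to r^(-a), and computes the share of total demand that falls outside the head of the distribution. The exponent a = 1 and the catalogue size are illustrative assumptions, not figures taken from [And04] or [And06].

```python
def zipf_popularity(n_items, exponent=1.0):
    """Popularity of the items at ranks 1..n_items, proportional to rank ** -exponent."""
    return [rank ** -exponent for rank in range(1, n_items + 1)]


def tail_share(popularity, head_size):
    """Fraction of total demand that falls outside the `head_size` most popular items.

    Assumes `popularity` is sorted from most to least popular.
    """
    total = sum(popularity)
    return sum(popularity[head_size:]) / total


# Illustrative assumption: 10,000 items whose popularity follows a power law with exponent 1
catalogue = zipf_popularity(10_000)
print(f"share of demand outside the top 100: {tail_share(catalogue, 100):.2f}")
```

Under these assumptions, almost half of the demand (about 47%) is generated by the 9,900 items outside the top 100, even though each of them is individually rarely requested. This is the kind of effect that makes the tail commercially relevant.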
The online store Amazon³¹ is a very good example of such a Long Tail distribution. In a traditional store, every good that can be sold also requires physical space, and only a few goods can be presented prominently to a very restricted number of customers. Amazon, in contrast, can offer basically any kind of product to any user with access to the World Wide Web.
Although the majority of books are sold only a few times, about a quarter of Amazon’s book sales come from outside the top 100,000 bestselling titles [And06].
29 See www.wired.com
30 Source: http://en.wikipedia.org/wiki/The_Long_Tail
31 See http://www.amazon.com
3.6 New Phenomena, Potentials, and Challenges
The reasons for the emergence of Long Tail distributions are manifold. In [And06], Anderson identifies the following forces:
• Democratisation of production (lengthens the tail)
• Democratisation of distribution (fattens the tail)
• Connections between supply and demand (drives business from hits to niches)
Although Long Tail phenomena could be observed long before the World Wide Web (e.g., with the emergence of big supermarkets offering many more products than small stores), the tools and infrastructure of Web 2.0 have significantly accelerated this development. First, production has been democratised by tools that allow almost anyone to create digital resources such as images or videos. Second, via the World Wide Web and corresponding platforms, resources or information about them can easily be published and shared (i.e., distributed). This also holds true for physical goods such as books. Last but not least, each resource can be offered to potentially any user or customer through the World Wide Web, which further connects supply and demand.
The Long Tail phenomenon can also be observed in the production of resources. In [OD08], Ochoa and Duval provide a quantitative analysis of several Web sites based on different types of user-generated content. They show that the production of user-generated content follows a Long Tail distribution, a pattern often referred to as participation inequality.
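Participation inequality can be quantified in several ways. The Python sketch below uses two simple measures: the share of all contributions made by the most active fraction of users, and the Gini coefficient of the contribution counts. Neither measure is claimed to be the one used in [OD08], and the contribution counts in the example are made up for illustration.

```python
def top_contributor_share(contributions, top_fraction=0.1):
    """Share of all contributions made by the most active `top_fraction` of users."""
    ranked = sorted(contributions, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


def gini(contributions):
    """Gini coefficient: 0 means all users contribute equally, values close to 1
    mean that a few users produce almost all of the content."""
    values = sorted(contributions)
    n, total = len(values), sum(values)
    weighted = sum((i + 1) * v for i, v in enumerate(values))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical numbers of uploads per user on a user-generated content site
uploads = [50, 20, 10, 5, 5, 3, 3, 2, 1, 1]
print(top_contributor_share(uploads))  # 0.5: one user out of ten made half of all uploads
print(round(gini(uploads), 2))  # 0.62
```

A perfectly even distribution such as `[1, 1, 1, 1]` yields a Gini coefficient of 0. The further the value rises towards 1, the more pronounced the Long Tail on the production side is.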
Potentials and Challenges
In the context of this thesis, the Long Tail phenomenon first of all provides an opportunity. In contrast to environments where only mainstream resources could be accessed, the growing number of niches with very specialised digital resources gives users more choice. It also raises the chance that accessible resources exist that exactly meet a user’s information needs (although niche resources are, of course, not necessarily of better quality).
Yet, when it comes to finding and accessing content, traditional mechanisms based on broadcasting (e.g., advertisements in the mass media) or on the popularity of a resource (e.g., Google’s PageRank) will not work, or are at least not sufficient, in such a situation.
Besides the means described above for creating and providing digital resources, which support the democratisation of production and distribution, the challenge is to lead and guide users to resources that are interesting to them. This especially concerns means of selecting subsets of resources, where simple mechanisms based only on a user’s query are not sufficient. Alternative methods to navigate and filter the content are needed. For example, social browsing can be applied to offer intuitive access to resources in the Long Tail. Automatic approaches to estimating the relevance of a resource have to take a user’s personal interests into account (e.g., by applying information filtering techniques [HSS01]).
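As one example of such an information filtering technique, the following Python sketch implements simple content-based filtering. A user’s interest profile is built from the tags of the resources the user has accessed, and candidate resources are ranked by the cosine similarity between their tags and that profile. This is a generic textbook approach chosen for illustration, not necessarily one of the techniques discussed in [HSS01]; the function names and the tag-list data format are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse tag-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profile(history):
    """Interest profile: how often each tag occurs among the resources a user accessed."""
    profile = Counter()
    for tags in history:
        profile.update(tags)
    return profile


def rank_resources(profile, candidates):
    """Rank candidate resources (id -> list of tags) by similarity to the profile."""
    scored = [(rid, cosine(profile, Counter(tags))) for rid, tags in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: tags of resources a user has accessed, and niche resources to filter
profile = build_profile([["jazz", "vinyl"], ["jazz", "live"]])
candidates = {"a": ["jazz", "live"], "b": ["opera"], "c": ["vinyl"]}
for rid, score in rank_resources(profile, candidates):
    print(rid, round(score, 2))
```

Because the ranking depends only on how well a resource matches the user’s own interests, not on its overall popularity, rarely accessed niche resources can still be ranked highly for the right user.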
3.6.3 Collective Intelligence
The harnessing of Collective Intelligence was already mentioned in Section 3.3 as one of the key principles of Web 2.0. Collective Intelligence is not a phenomenon that originated in computer science. It has its roots in different disciplines such as biology (where it is sometimes referred to as swarm intelligence), sociology, and economics. Generally, it refers to a phenomenon where the collective behaviour of many entities (such entities can be humans, but also animals, bacteria or even particles) leads to a result that is considered intelligent. Such phenomena can be found, e.g., on the World Wide Web, in science, in politics or in business [TW06]. A helpful characterisation emphasising the idea that Collective Intelligence is “more than the sum of its parts” emerged during the “Collective Intelligence FOO Camp” organised by O’Reilly at Google in 2008:
“The network knows what the nodes do not.” [Tor08]
To illustrate the concept of Collective Intelligence, we will now provide some examples where Collective Intelligence occurs or where it is harvested to provide additional value:
Ant societies: Sometimes referred to as “superorganisms”, ant societies are highly organised and consist of millions of ants that communicate with each other and perform very different tasks (division of labour). Ant societies are able to solve complex problems such as constructing nests and cultivating food.
Google PageRank: The PageRank algorithm developed by Page et al. in 1998 [PBMW98] is still the basis for how Google³² estimates the importance of search results and ranks them accordingly. The PageRank of a web page depends on its position in the Web’s graph structure. In simple terms, the underlying assumption is that the more pages link to a web page, and the more important those linking pages are, the more important the page is.
32 See http://www.google.com
Wikipedia: This online encyclopaedia is built entirely from contributions of volunteers. Basically any user with access to the World Wide Web can create and modify entries in Wikipedia. The English Wikipedia contains a remarkable number of more than 3.2 million articles.³³ In 2005, the journal Nature carried out a study comparing Wikipedia and the Encyclopaedia Britannica with respect to their coverage of science. In the study, entries on numerous scientific fields from both sources were sent to experts for peer review. The difference in accuracy turned out to be surprisingly small: science entries from Wikipedia contained 3.86 inaccuracies on average, entries from the Encyclopaedia Britannica 2.92 [Gil05].
Digg: Digg³⁴ is a news service that allows users to provide feedback (so-called “diggs”) on news items they consider important. Based on this feedback, news items are ranked accordingly, which allows users to discover relevant news.
Delicious: The social bookmarking tool Delicious³⁵ allows users to store and tag bookmarks online and to share this information with others. Users’ bookmark lists and the collective annotation of resources with tags and descriptions allow for social browsing and efficient retrieval of relevant content. As of November 2008, Delicious had more than 5.3 million users and over 180 million unique URLs saved.³⁶
Akismet: Akismet³⁷ was initially a plugin for the blogging and publishing software WordPress³⁸. It allows users to mark comments in their blogs as spam, and it classifies new comments based on the feedback of all users. Today, Akismet can also be used to classify arbitrary strings. According to Akismet, the system had identified more than 15 billion spam comments as of April 2010.³⁹
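Akismet’s actual classification method is not described in this section. The following Python sketch therefore substitutes a different, well-known technique to illustrate the underlying principle: a naive Bayes classifier whose training data is pooled from the spam and non-spam reports of all users, so that every single report improves the classification for everyone. The class `CollectiveSpamFilter` and its methods are hypothetical and do not reflect Akismet’s API.

```python
import math
import re
from collections import Counter


class CollectiveSpamFilter:
    """Naive Bayes spam classifier trained on feedback pooled from all users.

    Hypothetical illustration of the principle behind services like Akismet;
    not Akismet's actual method or API.
    """

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def report(self, text, label):
        """Feedback from any user ("spam" or "ham") updates the shared model."""
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in self.LABELS:
            # Laplace smoothing for the class prior and the word likelihoods;
            # the extra +1 in the denominator reserves probability mass for unseen words
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[label].values()) + vocab_size + 1
            for word in self.tokenize(text):
                log_p += math.log((self.word_counts[label][word] + 1) / denom)
            log_score[label] = log_p
        # P(spam | text) = 1 / (1 + exp(log P(ham) - log P(spam))), computed without overflow
        diff = log_score["ham"] - log_score["spam"]
        if diff >= 0:
            e = math.exp(-diff)
            return e / (1 + e)
        return 1 / (1 + math.exp(diff))


spam_filter = CollectiveSpamFilter()
# Reports from different (hypothetical) blog owners feed the same shared model
spam_filter.report("buy cheap pills now", "spam")
spam_filter.report("cheap watches buy now", "spam")
spam_filter.report("great post, thanks", "ham")
spam_filter.report("thanks for the detailed post", "ham")
print(spam_filter.spam_probability("buy cheap pills") > 0.5)  # True
print(spam_filter.spam_probability("thanks, great post") > 0.5)  # False
```

The point of the design is that no single blog owner needs to see many spam comments. Because the counts are shared, a pattern reported by a few users is recognised for all others.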
Collective Intelligence can be considered an emergent property that sometimes can simply be observed (e.g., in an ant society) and sometimes requires more sophisticated methods (e.g., Google’s PageRank algorithm) to collect and aggregate the results of the individual entities’ behaviour.
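The aggregation step behind PageRank can be sketched in a few lines. The Python code below implements the common power-iteration formulation with a damping factor of 0.85, a frequently used value. It illustrates the principle only and is not Google’s production ranking. The small link graph is hypothetical, and letting pages without outgoing links spread their rank evenly over all pages is one common convention among several.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page receives a base share from the "random jump"
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its rank on, split evenly among its outgoing links
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # dangling page: spread its rank evenly over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical link graph of four pages
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

Page c, which is linked from three pages, ends up with the highest score. Page d, to which no page links, keeps only the minimum of (1 - 0.85) / 4 = 0.0375. No single link decides the outcome; the importance of a page emerges from the aggregated linking behaviour of many page authors.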
33 According to http://wikipedia.org, accessed April 29, 2010
34 See http://digg.com
Several definitions and characterisations of Collective Intelligence can be found in the literature, and there is no consensus about the exact definition of the term. For example, the MIT Center for Collective Intelligence⁴⁰ provides this definition:
“Collective Intelligence is groups of individuals doing things collectively that seem intelligent.”
Obviously, this definition is very fuzzy, and it excludes cases in which the entities are not individuals. It is also not necessarily true that Collective Intelligence can only be harvested when the things that groups of individuals do “seem intelligent”. Actions might also seem unintelligent until a proper and possibly complex aggregation method is applied.
Wikipedia defines Collective Intelligence as follows:
“Collective intelligence is a shared or group intelligence that emerges from the collaboration and competition of many individuals.” [Wik08a]
This definition restricts Collective Intelligence to cases where collaboration and competition are required. Yet, such a definition would exclude scenarios where individuals act independently and are driven by motivations other than collaboration or competition. For example, Google PageRank would be excluded in this case.
As we want to include all scenarios in which Collective Intelligence shows up, we will thus follow the definition provided by Tom Atlee, a pioneer in the field of Collective Intelligence [Atl04]: