Social networks of the Web 2.0 style have received major attention in the recent literature, with a focus on applying data mining methods to social relations and, most prominently, relations among tags. [56] provides an empirical study of the tagging behavior and tag usage in online communities. [69, 114] discuss methods for generating taxonomy-like relations among tags, so-called “folksonomies”, based on statistical measures. Similar approaches have been applied to query-and-click logs, e.g. in [17], but none of this work considers social relations between different users. Identifying important and emergent tags and visualising them in so-called “tag clouds” (and corresponding time series) has been extensively explored; recent work along these lines includes [52, 64, 115]. The dynamics of social relations among users (e.g. the rate of making friends) have been studied, for example, in [12, 88, 123].
As for the exploitation of social tags for information retrieval, [14] discusses the challenges of searching and ranking in social communities. Various forms of community-aware ranking methods have been developed, mostly inspired by the well-known PageRank method [30] for web link analysis. [70] proposes FolkPage-Rank for identifying important users, data items, and tags. [133] compares different methods for identifying authoritative users with high expertise. [19] introduces SocialPageRank to measure page authority based on its annotations, and SocialSimRank for the similarity of tags. [132] further extends this work by augmenting language models with tag similarities. [49] shows that explicit user tagging can help to improve the precision of queries for Intranet search. The work of [68] provides an empirical analysis of how social bookmarking can influence web search, with both positive and negative insights. None of this prior work considers the impact of user-friendship strengths on the scoring of search results, or the problem of efficient query processing in the presence of such “social wisdom”.
The work of [13] discusses efficient top-k processing of network-aware search queries in collaborative tagging sites by clustering users into groups of seekers and taggers with similar behaviour and defining cluster-related upper score bounds for items. However, this work does not consider the impact of social relations or tagging activities on the result quality.
Aspects of user communities have also been considered for peer-to-peer search, most notably, for establishing “social ties” between peers and routing queries based on corresponding similarity measures (e.g. similarities of queries issued by different peers). [25] has studied “social” query routing strategies based on explicit friendship relationships and behavioral affinity. [103] has developed an architecture and methods for “social” overlay networks that connect “taste buddies” with each other. [97] has proposed a community-enhanced web search engine that takes into account prior clicks by community members. [42] has proposed the notion of Peer-Sensitive ObjectRank, where peers receive resources from their friends and rank them using peer-specific trust values.
There is ample literature on collaborative filtering for recommender systems (e.g. [11, 66, 108, 110]), for example, to predict movies or e-commerce items that customers are likely to buy or to identify news that news-feed subscribers are likely to be interested in. In a nutshell, these methods aim to learn user preferences from the collective behaviour (like purchases or tagging) of an entire community. Typically, statistical analysis and machine learning techniques are used offline for precomputations, and the actual run-time recommendations have limited flexibility and cannot easily cope with high dynamics and ad-hoc interests of individual users (as expressed by an ad-hoc query).
One of the notable exceptions is the work by [43], which addresses scalability issues when the number of users and items in a recommender system grows to many millions and both undergo fast changes. However, in contrast to our “social search” theme, this prior work considers only the space of item pairs and there is no notion of user-specific tags or annotations on items. Thus, our setting requires search over a three-dimensional user-tag-item space, as opposed to the two-dimensional user-item space of the previous work on collaborative filtering.
3 SENSE Framework
In this chapter, we discuss the data model used in our SENSE (Socially ENhanced Search and Exploration) framework for social tagging networks, introduce the basic design decisions made for its implementation, and present three real-world social tagging networks that have been mapped to our data model and are the basis for the experimental evaluations given in Sections 5.4 and 6.4.
3.1 Data Model
This section defines a unified set of abstractions, modelling the user-provided data and activities in social tagging networks. For this, entities occurring in social networks are cast into the model, representing different types and their mutual relationships.
3.1.1 Types of Entities
We identify three major types of entities in social tagging networks which are represented in our data model in the following way:
• User U: A user U in social tagging networks produces content either by creating or publishing their own documents or by tagging existing content.
• Document d: A document d is a content item that is published by a user U, e.g. a blog entry, a bookmark or a photo.
• Tag t: A tag t is a keyword used by a user U to annotate documents d and usually describes or categorises the respective document.
Additionally, social tagging networks exhibit various relationships, both within entities of the same type (intra-entity relations) and between entities of different types (inter-entity relations). Each relationship can be cast into a relational scheme and is given in detail in the following sections.
3.1.2 Intra-Entity Relations
Each of the three entity types exhibits some sort of relation between the entities of the same type.
User-User Relation: Friendship(U1,U2,type,sf)
Friendship is a user-user relation between two users U1 and U2 with a weight sf equal to the friendship strength of U2 with respect to U1. In friendship relations, we treat sf as a pluggable building block; it may also be completely absent (or constant/equal for each pair of users). Quantitative measures for different friendship relations are given in Sections 5.1 and 6.2.
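To make the idea of a pluggable strength concrete, the following is a minimal sketch of the Friendship(U1,U2,type,sf) relation in Python. The names `Friendship`, `build_friendships` and `strength_fn` are hypothetical and are not part of the framework as described here; the constant default of 1.0 merely stands in for the case where sf is absent or equal for all pairs.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional, Tuple


class Friendship(NamedTuple):
    """One tuple of the Friendship(U1, U2, type, sf) relation."""
    u1: str
    u2: str
    type: str
    sf: float


# Assumed signature of a pluggable strength function: (u1, u2) -> sf.
StrengthFn = Callable[[str, str], float]


def build_friendships(
    pairs: Iterable[Tuple[str, str]],
    ftype: str,
    strength_fn: Optional[StrengthFn] = None,
    default_sf: float = 1.0,
) -> List[Friendship]:
    """Materialise friendship tuples; without strength_fn, sf is constant."""
    rows = []
    for u1, u2 in pairs:
        sf = strength_fn(u1, u2) if strength_fn is not None else default_sf
        rows.append(Friendship(u1, u2, ftype, sf))
    return rows


pairs = [("alice", "bob"), ("alice", "carol")]

# Case 1: no strength measure plugged in, every pair gets the same weight.
constant = build_friendships(pairs, "social")

# Case 2: a toy lookup table is plugged in as the strength measure.
toy_strengths = {("alice", "bob"): 0.8, ("alice", "carol"): 0.3}
weighted = build_friendships(
    pairs, "social", strength_fn=lambda a, b: toy_strengths[(a, b)]
)
```

Swapping the strength measure only changes the function that is passed in; the shape of the relation stays the same.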
The friendship relation can be defined in different forms; we therefore allow multiple types of friendship relations captured by the type attribute. A variety of friendship type definitions are possible, and in the following, we describe three intriguing types: social, spiritual and global friendship.
Definition 3.1 (Social Friendship). A friendship of type social is defined by a user-provided relation which can be either symmetric or asymmetric; it assumes that such a relation exists only if the users know each other by some social interaction in real life or within the social network.
A key feature of social networks is to allow users to maintain an explicit list of friends. Hence, a straightforward way to create a social friendship relation, with instances stating that user U1 considers user U2 a friend, is the explicit act of U1 adding U2 to her friendship list.
Additional means of establishing social friendships include, for example, a user subscribing to another user’s content (e.g. in LibraryThing.com, by adding other users’ books to one’s own set of interesting libraries), writing comments on a user’s profile page, or addressing others in messages (e.g. in Twitter.com, by mentioning a user in one’s own tweets). Regardless of how direct friends are defined, we may consider the transitive closure of the friendship relation or a bounded set of transitive connections (up to some distance). The union over all users and social friendship relations defines the user or friendship graph of a social tagging network.
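The bounded set of transitive connections can be sketched as a breadth-first search that stops at a maximum hop distance. This is an illustrative sketch only: the adjacency-list representation and the function name `friends_within` are assumptions, and passing `None` as the bound yields all users reachable through the transitive closure.

```python
from collections import deque
from typing import Dict, List, Optional


def friends_within(
    graph: Dict[str, List[str]], start: str, max_distance: Optional[int]
) -> Dict[str, int]:
    """Map each user reachable from `start` to its hop distance.

    Only users at most `max_distance` hops away are returned;
    max_distance=None means no bound (full transitive closure).
    """
    distances = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        # Do not expand users that already sit on the distance boundary.
        if max_distance is not None and distances[user] >= max_distance:
            continue
        for friend in graph.get(user, ()):
            if friend not in distances:
                distances[friend] = distances[user] + 1
                queue.append(friend)
    del distances[start]  # a user is not listed as their own friend
    return distances


# Toy directed friendship lists: alice -> bob -> carol -> dave.
graph = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"], "dave": []}

direct_friends = friends_within(graph, "alice", 1)
two_hops = friends_within(graph, "alice", 2)
closure = friends_within(graph, "alice", None)
```

Because the edges are directed, the sketch covers asymmetric friendship lists; a symmetric relation would simply list each edge in both directions.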
Definition 3.2 ((Social) Friendship Graph, User Graph). The graph representing all users in a social network with edges being defined between users according to their social friendship relations is called the friendship graph or user graph of the social network.
The friendship strength sf of type = social, also denoted as social friendship strength, between two users could be derived from their distance in the underlying social friendship graph of the network, i.e. the length of the shortest path from one user to the other, and can function as a measure of the trust of one user in another user. In addition, the friendship strength could be weighted by some semantic measure, such as the overlap in the usage of tags, so that it becomes a combination of social friendship and mutual interests.
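As a hedged illustration of a distance-based strength, the sketch below assumes sf = 1/distance for reachable users and 0 otherwise, and mixes it linearly with the Jaccard overlap of the users' tag sets. Both the decay function and the mixing parameter `mix` are assumptions made for this example; the concrete measures used in this work are those of Sections 5.1 and 6.2.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def shortest_path_length(
    graph: Dict[str, List[str]], source: str, target: str
) -> Optional[int]:
    """Hop count of the shortest path from source to target, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, dist = queue.popleft()
        for friend in graph.get(user, ()):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None


def social_strength(graph: Dict[str, List[str]], u1: str, u2: str) -> float:
    """Assumed decay: 1/distance; unreachable users get strength 0."""
    dist = shortest_path_length(graph, u1, u2)
    if dist is None:
        return 0.0
    return 1.0 / max(dist, 1)


def combined_strength(
    graph: Dict[str, List[str]],
    tags_by_user: Dict[str, Set[str]],
    u1: str,
    u2: str,
    mix: float = 0.5,
) -> float:
    """Linear mix of graph-based strength and Jaccard tag overlap."""
    tags1, tags2 = tags_by_user.get(u1, set()), tags_by_user.get(u2, set())
    union = tags1 | tags2
    overlap = len(tags1 & tags2) / len(union) if union else 0.0
    return (1.0 - mix) * social_strength(graph, u1, u2) + mix * overlap


graph = {"alice": ["bob"], "bob": ["carol"], "carol": []}
tags_by_user = {"alice": {"web", "search"}, "carol": {"search", "ir"}}

sf_graph_only = social_strength(graph, "alice", "carol")
sf_combined = combined_strength(graph, tags_by_user, "alice", "carol")
```

Note that the graph-based part is directional: with the toy edges above, carol cannot reach alice, so the strength of alice with respect to carol is 0 under this assumption.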
Definition 3.3 (Spiritual Friendship). A friendship of type spiritual considers similar behaviour or activities of users. This type of friendship is a symmetric relation and does not assume that the users know each other; rather, they are “Brothers in Spirit” (hence the chosen name for this type of friendship), expressing the overlap in thematic interests and being an indicator of users sharing common interests.
The spiritual friendship could, for example, be based on users’ participation in thematically similar groups, or it could be based on similar tags being issued to documents or personal content items receiving similar tags from third parties. It could also be derived from mutual comments and ratings.
The friendship strength sf of type = spiritual, also denoted as spiritual friendship strength, between two users represents the degree of mutual interest overlap by taking the mentioned behaviour and activities of users into account.
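One possible instantiation of a spiritual friendship strength is the cosine similarity between the tag-usage profiles of two users. The sketch below is an assumption chosen for illustration (the function name `spiritual_strength` is hypothetical), not the measure defined later in this work; it does have the symmetry required by Definition 3.3.

```python
import math
from collections import Counter


def spiritual_strength(tags_u1: Counter, tags_u2: Counter) -> float:
    """Cosine similarity of two tag-usage count vectors (symmetric)."""
    # A Counter returns 0 for tags the other user never applied.
    dot = sum(count * tags_u2[tag] for tag, count in tags_u1.items())
    norm1 = math.sqrt(sum(c * c for c in tags_u1.values()))
    norm2 = math.sqrt(sum(c * c for c in tags_u2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # a user without tags shares no measurable interests
    return dot / (norm1 * norm2)


# Toy profiles: how often each user applied each tag.
alice = Counter({"python": 3, "search": 1})
bob = Counter({"python": 1, "search": 3})
carol = Counter({"cooking": 5})

sf_alice_bob = spiritual_strength(alice, bob)
sf_alice_carol = spiritual_strength(alice, carol)
```

Users who never interacted still obtain a non-zero strength if their tag usage overlaps, which matches the intent that spiritual friends need not know each other.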
Definition 3.4 (Global Friendship). A friendship of type global is defined by neglecting all kinds of distinctions between users and treating all of them as one global community of like-minded users, i.e. each user is a friend of each other user, with an equal weight for the friendship strength sf for all users.
Concrete definitions of social, spiritual and global friendship relations are given in Sections 5.1 and 6.2.
Tag-Tag Relation: TagSimilarity(t1,t2,tsim)
In social tagging networks, users frequently use more than one tag to describe a particular document, documents can be tagged by more than one user, and the same tag can be used on different documents. There is no restriction on which terms can be used as tags. In fact, tags are freely selected annotations, and, given the natural diversity of users’ opinions in social networks, it is often the case that different tags describe the same content item and may express (near-)synonyms (e.g. “feline” and “cat”, “Web_2.0” and “Social_Web”, etc.) or other kinds of semantically related concepts (e.g. hyponyms such as “dogs” and “German_shepherd”, “search_engine” and “Google”, etc.).
Determining the similarity of tags is a way of clustering tags with respect to their meaning. To this end, an ontology, a light-weight knowledge base that captures different types of “semantic” relations among tags (e.g. synonymy or specialisation / generalisation), could be exploited. An ontology may be provided by domain experts or imported from real ontologies, or it may be built by applying data-mining techniques to the tagging data of the social network. The latter case is more realistic for today’s types of social tagging networks, and the result is often referred to as a “folksonomy” (folklore taxonomy) [70, 127]. Hence, the tag-usage statistics in the social network are harnessed to derive a weight corresponding to the tag similarity tsim for two tags t1 and t2. Quantitative measures are given again in Sections 5.1 and 6.2.
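To illustrate how tag-usage statistics can yield TagSimilarity(t1,t2,tsim) tuples, the following sketch scores two tags by the Dice coefficient of the document sets they annotate. The choice of the Dice coefficient and the function name `tag_similarities` are assumptions for this example; taggings are given as (user, tag, document) triples with the score omitted.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, List, Tuple


def tag_similarities(
    taggings: Iterable[Tuple[str, str, str]]
) -> List[Tuple[str, str, float]]:
    """Derive (t1, t2, tsim) rows from (user, tag, document) triples."""
    docs_by_tag = defaultdict(set)
    for _user, tag, doc in taggings:
        docs_by_tag[tag].add(doc)

    rows = []
    for t1, t2 in combinations(sorted(docs_by_tag), 2):
        shared = len(docs_by_tag[t1] & docs_by_tag[t2])
        if shared:  # tags that never co-occur get no tuple at all
            tsim = 2 * shared / (len(docs_by_tag[t1]) + len(docs_by_tag[t2]))
            rows.append((t1, t2, tsim))
    return rows


taggings = [
    ("u1", "cat", "d1"),
    ("u2", "feline", "d1"),
    ("u1", "cat", "d2"),
    ("u3", "feline", "d2"),
    ("u2", "dog", "d3"),
    ("u3", "cat", "d3"),
]

similarities = tag_similarities(taggings)
```

In this toy data, “cat” and “feline” share two of their documents and therefore score highest, while “dog” and “feline” never co-occur and produce no tuple.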
Document-Document Relation: Linkage(d1,d2,w)
In some applications, like in web search and PageRank computations [31, 99], documents also exhibit relations among themselves. In the case of web pages, this linkage is obvious and given by the hyperlink graph, with weights w often chosen inversely proportional to the outdegree of the source page. For other types of documents, different notions of links between two documents d1 and d2 and their weight w need to be defined; conceivable options include, for example, the geographic proximity of different photos when GPS information is available, or a similarity based on associated feature vectors representing the documents.
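For the photo example, a linkage weight based on geographic proximity could be sketched as follows. The great-circle distance uses the standard haversine formula; turning the distance into a weight via 1 / (1 + distance / scale) is an assumption made here for illustration, as are the names `linkage_weight` and `scale_km`.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two GPS positions."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def linkage_weight(
    photo1: Tuple[float, float],
    photo2: Tuple[float, float],
    scale_km: float = 1.0,
) -> float:
    """Assumed weight w in (0, 1]: 1 at distance 0, decaying with distance."""
    distance = haversine_km(*photo1, *photo2)
    return 1.0 / (1.0 + distance / scale_km)


# (latitude, longitude) pairs of three hypothetical photos.
photo_a = (0.0, 0.0)
photo_b = (0.0, 0.01)   # roughly one kilometre east of photo_a
photo_c = (1.0, 0.0)    # roughly 111 kilometres north of photo_a

w_near = linkage_weight(photo_a, photo_b)
w_far = linkage_weight(photo_a, photo_c)
```

The resulting tuples Linkage(d1,d2,w) are symmetric under this weight, since the distance does not depend on the order of the two photos.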
3.1.3 Inter-Entity Relations
For our unified data model, we observe the following relations between entities of different types:
Document-Tag Relation: Content(d,t,score)
By annotating a document d with a tag t, users strongly associate the tag with the document so that the tag should be viewed as a strong indication about the document’s content. We consider the value score in the content relation as a weight associated with a document-tag pair to reflect how well that tag describes the document.
User-Document Relation: Rating(U,d,rating)
In many social communities, a user U can explicitly rate a document d, which is captured by a rating score rating. Another naïve instantiation of Rating is authorship of a content item, which (e.g. in the case of bookmarks) can be seen as an endorsement for the document. Alternatively, we can also derive a weight as rating score based on the tag usage of the user.
User-Tag-Document Relation: Tagging(U,t,d,score)
A tag t is naturally associated with the document d and the user U who associates it with that document. Hence, Tagging is a ternary relation between users U, documents d, and tags t. In full generality, it cannot be decomposed into three binary relations (users-docs, docs-tags, users-tags) without losing information. Nevertheless, binary-relation (or, equivalently, graph or matrix) representations for tagging are very popular in the literature on social networks for convenience.
Our approach preserves the full information and feeds it into a scoring model (see Sections 5.1 and 6.2).
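The loss of information under decomposition can be checked on a small constructed example: two different sets of taggings that yield exactly the same three binary projections. The data and the helper name `binary_projections` are made up for this illustration; scores are omitted.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (user, tag, document)


def binary_projections(taggings: Set[Triple]):
    """Project the ternary relation onto its three binary relations."""
    user_doc = {(u, d) for u, _t, d in taggings}
    doc_tag = {(d, t) for _u, t, d in taggings}
    user_tag = {(u, t) for u, t, _d in taggings}
    return user_doc, doc_tag, user_tag


# Scenario A: u1 tags d1 with t1 and d2 with t2; u2 does the opposite.
scenario_a = {
    ("u1", "t1", "d1"), ("u1", "t2", "d2"),
    ("u2", "t1", "d2"), ("u2", "t2", "d1"),
}

# Scenario B: the roles of the two users are swapped.
scenario_b = {
    ("u1", "t1", "d2"), ("u1", "t2", "d1"),
    ("u2", "t1", "d1"), ("u2", "t2", "d2"),
}

same_projections = binary_projections(scenario_a) == binary_projections(scenario_b)
different_taggings = scenario_a != scenario_b
```

Both flags evaluate to True: the binary relations cannot tell which user applied which tag to which document, whereas the ternary relation can.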
3.1.4 Remarks
With the ingredients given above, our data model eventually allows for well-founded scoring and ranking models that go far beyond ad-hoc retrieval models for social networks which often include many hard-to-tune parameters.
Furthermore, it is important to note that the weights for all the relations introduced in our data model can be defined in many different ways. The examples we have provided present some of the alternatives, but are not meant to be exhaustive. Our data model and the upcoming scoring and ranking models presented in Sections 5.1 and 6.2 are independent of changes to those functions.
Also note that this model is much richer than, e.g. the datasets in traditional recommender systems. In addition to the shown relations, we can easily add various kinds of aggregation views, for example, document-tag frequencies aggregated over all users.
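Such an aggregation view can be sketched in a few lines over Tagging(U,t,d,score) tuples. The function name `document_tag_frequencies` and the toy data are assumptions for illustration; the view simply counts, per document-tag pair, how many tagging tuples exist across all users.

```python
from collections import Counter
from typing import Iterable, Tuple

TaggingRow = Tuple[str, str, str, float]  # (user, tag, document, score)


def document_tag_frequencies(taggings: Iterable[TaggingRow]) -> Counter:
    """Count tagging tuples per (document, tag) pair over all users."""
    return Counter((doc, tag) for _user, tag, doc, _score in taggings)


taggings = [
    ("u1", "cat", "d1", 1.0),
    ("u2", "cat", "d1", 1.0),
    ("u3", "feline", "d1", 1.0),
    ("u1", "dog", "d2", 1.0),
]

frequencies = document_tag_frequencies(taggings)
```

Since the view is derived from the Tagging relation, it can be recomputed or maintained incrementally without changing the underlying model.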
Also note that, while our data model captures all relationships that might occur in a social network, some of the introduced relationships might not exist for certain social networks. In LibraryThing.com, for instance, user interactions are mainly through bookmarking and tagging, whereas in Flickr.com, the vast majority of users have “authored” content, which in this case consists of the photos that they have published. Moreover, only a few platforms for social networks would show the users’ home locations and some might not facilitate any cross-references among individual items. A detailed analysis of three real-world social tagging networks is given in Section 3.2.
Figure 2: Example illustration of our data model
In any case, our data model is flexible enough to be applicable to most social tagging platforms. In fact, as shown in Sections 5.4 and 6.4, our experimental studies on three real-world social networks utilise only a subset of our data model.
An example illustration for our data model with 4 users, their tagged documents and the relations between entities of the same and of different types, is given in Figure 2.