communities are of key importance. Structural holes are such empty dyads; creating a link (possibly a bridge) would bring power and control to its endpoints.

2.3 Centrality

This idea is one of the main concepts in social network analysis, but it remains rather vaguely defined. We can see it as a general intuition:

Centrality quantifies how important vertices (or edges) are in a networked system, [. . . ]⁷. There are a wide variety of mathematical measures of vertex centrality that focus on different concepts and definitions of what it means to be central in a network. (Newman, 2010)

The use and understanding of centrality are fairly immediate and intuitive, even though the concept relies on abstract mathematical expressions and on properties of both the nodes and the whole network.

At the node level of analysis, the most widely studied concept is centrality—a family of node-level properties relating to the structural importance or prominence of a node in the network. (Borgatti et al., 2009, p. 894)

Since the origins of centrality at the end of the 1940s, a substantial body of research has accumulated on the subject, and new variations of centrality measures are continually being defined. In fact, the concept itself relies on an intuitive definition, translated into mathematical expressions, but without a unique mathematical definition. The few attempts to build a mathematical theory of centrality have failed, and the situation seems inextricable: the consistent axiomatic systems proposed so far do not admit most of the centrality measures in use (Sabidussi, 1966; Boldi and Vigna, 2013).

Some of the centrality indices find their origins in mathematics, like degree and eccentricity (Berge, 1958). In 1869, the mathematician Camille Jordan defined the center of a graph as the set of vertices minimizing the eccentricity value⁸, eccentricity being the maximal distance from a given node to any other node in the network.
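The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming a connected, unweighted, undirected graph stored as an adjacency dictionary; the function names and the example path graph are hypothetical and do not come from the cited works.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path distances (in number of edges) from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def eccentricity(adjacency, node):
    """Maximal distance from node to any other node (the graph is assumed connected)."""
    return max(bfs_distances(adjacency, node).values())


def jordan_center(adjacency):
    """Set of nodes whose eccentricity is minimal."""
    ecc = {node: eccentricity(adjacency, node) for node in adjacency}
    smallest = min(ecc.values())
    return {node for node, value in ecc.items() if value == smallest}


# Hypothetical example: the path a - b - c - d - e.
path = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
print(jordan_center(path))
```

On this path, the end nodes have eccentricity 4 and the middle node has eccentricity 2, so the center reduces to the single middle node.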

In 1950, Alex Bavelas, a psychologist at the Massachusetts Institute of Technology, was studying diffusion in small groups of people. In particular, he was interested in which structural properties of the groups helped to spread information and how they affected the quality of the transmission. For this purpose, he modelled the groups as networks. In a seminal article, Bavelas wrote:

Do some patterns have structural properties that limit group performance? It may be that among several communication patterns, all logically adequate for the successful completion of a specified task, one gives significantly better performance than another. What effects can pattern, as such, have upon the emergence of leadership, the development of organization, the degree of resistance to group disruption, the ability to adapt successfully to sudden changes in the working environment? (Bavelas, 1950)

⁷ ". . . and social network analysts in particular have expended considerable effort studying it."

This is the origin of the question of centrality, which consists of explaining some dimensions of an actor’s role on the basis of their position in the network and of the network’s structure. In this article, Bavelas calls the measure he proposes to address these questions "relative centrality". For a given vertex, he computes it by summing the distances between all pairs of vertices and dividing this value by the sum of the distances from the vertex to all others in the network. Mathematically, if d is the distance function, G the network and V the set of its vertices, the relative centrality c_R of node x is⁹:

$$c_R(x) = \frac{\sum_{x_i, x_j \in V} d(x_i, x_j)}{\sum_{x_k \in V} d(x, x_k)}. \tag{2.1}$$
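As an illustration of formula 2.1, the following Python sketch computes the relative centrality of every node in three five-node patterns (a star, a chain and a circle). It assumes connected, unweighted graphs with at least two nodes, and it assumes that the numerator runs over ordered pairs, so each unordered pair is counted twice; the function names and node labels are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path distances (in number of edges) from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def relative_centrality(adjacency):
    """Relative centrality of each node, following formula 2.1."""
    # Denominator: sum of the distances from each node to all the others.
    row_sums = {v: sum(bfs_distances(adjacency, v).values()) for v in adjacency}
    # Numerator (assumption: ordered pairs): sum of all pairwise distances.
    total = sum(row_sums.values())
    return {v: total / row_sums[v] for v in adjacency}


# Hypothetical five-node patterns.
star = {"hub": ["p1", "p2", "p3", "p4"],
        "p1": ["hub"], "p2": ["hub"], "p3": ["hub"], "p4": ["hub"]}
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
circle = {"a": ["b", "e"], "b": ["a", "c"], "c": ["b", "d"],
          "d": ["c", "e"], "e": ["d", "a"]}

for name, graph in [("star", star), ("chain", chain), ("circle", circle)]:
    values = relative_centrality(graph)
    print(name, {v: round(c, 2) for v, c in values.items()})
```

Under these assumptions, the hub of the star obtains the highest value (32/4 = 8.0), every node of the circle obtains the same value (30/6 = 5.0), and in the chain the value decreases from the middle node (40/6) to the ends (40/10 = 4.0).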

The experiments had the participants distributed in predefined configurations, as in figure 2.2¹⁰ (Bavelas, 1948, 1950; Leavitt, 1951). They had to share a piece of information among them. In the end, they would indicate who had been the group leader according to their perception. The results obtained by the index of relative centrality regarding which were the best configurations and positions (see figure 2.3, left) were correlated with the observation (see figure 2.3, right).

Figure 2.2: Figures from (Bavelas, 1950, p. 726). The author chose to study (A) a circle graph, (B) a line graph, (C) a mixed-case graph, and (D) a star graph.

According to Bavelas:

[. . . ]¹¹ the findings suggested that the individual occupying the most central position in a pattern was most likely to be recognized as the leader.

Figure 2.3: Figures from (Bavelas, 1950, pp. 726, 728). The relative centrality is computed as in formula 2.1. (Left.) Theoretical values. (Right.) Experimental values. The original caption reads: "Frequency of occurrence of recognized leaders at the different positions in patterns A, B, C, and D" and cites a work by one of Bavelas’ Ph.D. students (Leavitt, 1949).

⁹ Although it expresses Bavelas’ calculation correctly, this mathematical formula is anachronistic.

¹⁰ There is a mistake with figure (C). It shows the same network as (D); this is not consistent with figure 2.3, which

¹¹ The beginning of the sentence reads "No good theory has been formulated for the differences in number of errors, but [. . . ]". What Bavelas calls "errors" are the mistakes accumulated in the process of transmitting the information among the group participants. This part of Bavelas’ work is outside the scope of this discussion.

Closeness, one of the now-classic measures of centrality, is based on Bavelas’ idea of "relative centrality". To our knowledge, Gert Sabidussi gave the first impulse to a mathematical definition of centrality by defining axioms that centrality indices had to satisfy (Sabidussi, 1966). Sadly, his work is remembered today primarily for its clear definition of closeness centrality, while the rest of the article remained ahead of its time. In 1971, Anthonisse defined betweenness, a way to measure the control a node has over flows across the network, on the assumption that these flows follow shortest paths (Anthonisse, 1971). In 1972, Philip Bonacich defined power centrality (also known as eigenvector centrality), an index based on the eigenvectors of the adjacency matrix (Bonacich, 1972). At the time, centrality had chaotically turned into a catch-all concept housing families of measures based on mathematical expressions (Koschützki et al., 2005a). Centrality measures ranked importance by highlighting specific network features among a wide range of criteria. Later, Linton C. Freeman formulated a clearer definition of betweenness (Freeman, 1977), the one still in use today, and soon after published a review article on the subject of centrality (Freeman, 1978). He chose three measures (degree, closeness and betweenness) that became widely accepted:

The time has come, it would seem, to stop, take stock and try to make some sense of the concept of centrality and the range and limits of its potential for application. (Freeman, 1978, p. 217)

He characterised centrality by the property that, among all possible networks, an actor attains maximum centrality at the center of a star graph (see figure 1.1). He also proposed to normalise centrality, because "it might be desirable to have a measure that is independent of network size". Finally, he proposed a global network measure adaptable to any centrality index, called centralisation, which shows "the tendency of a single point to be more central than all other points in the network" (Freeman, 1978, p. 227). This was an important step for the then still burgeoning field of social network analysis, but in the following years it also imposed the use of these centrality indices on most of the community. Freeman recalls the almost immediate acceptance of these centrality measures, including eigenvector centrality:

Beginning in about 1980 then, the measures based on closeness, degree, betweenness and the first eigenvector became standard in social network analysis. All four were widely used in the field. (Freeman, 2008, p. 3)
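Freeman’s proposals of normalisation and centralisation can be illustrated with degree, the simplest of his three measures. The sketch below is only an illustration under stated assumptions: simple undirected graphs with at least three nodes, stored as adjacency dictionaries, with hypothetical function names and example graphs. Degree is normalised by its largest possible value, n - 1, and centralisation is the sum of the differences between the largest degree and every other degree, divided by (n - 1)(n - 2), the value that this sum reaches in a star graph on n nodes.

```python
def normalised_degree(adjacency):
    """Degree of each node divided by n - 1, the largest degree possible in a simple graph."""
    n = len(adjacency)
    return {node: len(neighbours) / (n - 1) for node, neighbours in adjacency.items()}


def degree_centralisation(adjacency):
    """Freeman-style degree centralisation: 1.0 for a star, 0.0 when all degrees are equal."""
    n = len(adjacency)
    degrees = [len(neighbours) for neighbours in adjacency.values()]
    largest = max(degrees)
    observed = sum(largest - degree for degree in degrees)
    maximum = (n - 1) * (n - 2)  # sum of differences attained by a star on n nodes
    return observed / maximum


# Hypothetical five-node examples.
star = {"hub": ["p1", "p2", "p3", "p4"],
        "p1": ["hub"], "p2": ["hub"], "p3": ["hub"], "p4": ["hub"]}
circle = {"a": ["b", "e"], "b": ["a", "c"], "c": ["b", "d"],
          "d": ["c", "e"], "e": ["d", "a"]}

print(normalised_degree(star))
print(degree_centralisation(star), degree_centralisation(circle))
```

The star is maximally centralised, with its hub at a normalised degree of 1.0, whereas the circle, in which no point is more central than another, has a centralisation of 0.0.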

Subsequently, over a number of years, research on the concept of centrality focused mostly on statistical properties (Moxley and Moxley, 1974; Donninger, 1986; Mizruchi et al., 1986; Bolland, 1988; Valente et al., 2008; Frantz et al., 2009), while some work followed the suggestion of Gert Sabidussi to define a clear graph-theoretical framework consisting of definitions and a system of axioms. In 1980, G. Kishi followed in his footsteps and obtained some new results for the case of directed graphs (Kishi, 1980).

There are many mathematical definitions of centrality measures, which are usually grouped into families of measures. In fact, there are infinitely many possibilities, since in most cases the concept of centrality is neither clearly defined mathematically¹² nor unique to each experimental situation. For example, in a recent textbook (Hennig et al., 2012), the authors warn the reader against a blind use of centrality measures, since:

The relations between radial¹³, medial¹⁴, and feedback¹⁵ centralities are not fully understood. (Hennig et al., 2012)

¹² Just as in social sciences, to some extent. In (Borgatti and Everett, 2006), the authors ask "What do centrality

Measures of centrality have been designed to quantify social or psychological phenomena and to provide insights into experiments in these contexts. Sometimes, a new measure must be defined to answer specific research questions. This can be done by modifying an existing measure, as with Google’s PageRank algorithm (Page et al., 1999), which is a variation on eigenvector centrality, or by creating a new centrality measure, as in (Tutzauer, 2007), which is based on information theory and is "appropriate for traffic that propagates by transfer and flows along paths".
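To make the link between PageRank and eigenvector centrality more concrete, here is a simplified power-iteration sketch. It is not the implementation described by Page et al.; the damping factor of 0.85, the fixed number of iterations, the uniform redistribution of the score of nodes without outgoing links, the function name and the example graph are all assumptions made for illustration. The resulting vector approximates the dominant eigenvector of the damped, normalised link matrix, which is the sense in which PageRank varies on eigenvector centrality.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Approximate PageRank scores by power iteration.

    out_links maps every node to the list of nodes it links to;
    every node of the graph must appear as a key.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node first receives the same share of the "teleportation" mass.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                # A node passes its damped score on, split evenly among its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a node without out-links spreads its score over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical directed graph: a, b and d all link to c, and c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
print(sorted(scores, key=scores.get, reverse=True))
```

In this example, the node that receives the most links obtains the highest score, and the node it links to inherits a large part of that score, which illustrates the feedback between a node’s score and those of its neighbours.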

In spite of the large number of new definitions, sketches of typologies of centrality have been proposed (Ruhnau, 2000; Borgatti, 2005; Koschützki et al., 2005a; Borgatti and Everett, 2006; Koschützki et al., 2005b), and the efficiency of these indices has been analysed (Friedkin, 1991; Valente et al., 2008; Boldi and Vigna, 2013).

However, there is no unified framework that allows us to compare the results of one measure with another. There is no axiomatic system that would explain centrality on the basis of axioms or constraints, and thereby facilitate the comparison of multiple networks. Finally, there is simply no universal theory of centrality¹⁶. We have to work with the pieces of theory existing here and there when bringing together the state of the art and some of our ideas. It is also important to note that Noah E. Friedkin (Friedkin, 1991), Mark E. J. Newman (Newman, 2005) and Frank Tutzauer (Tutzauer, 2007), among many others, have all defined new centrality indices that widen the field of possible applications dealing with theoretical questions, and sometimes fill boxes in the typology proposed by (Borgatti, 2005).

¹³ "All of the measures [that] assess walks that emanate from or terminate with a given node." (Borgatti and Everett, 2006)

¹⁴ "[. . . ] centrality measures [. . . ] which are based on the number of walks that pass through given node." (Borgatti and Everett, 2006)

¹⁵ For a feedback centrality, the centrality of a node depends on that of its neighbours, and vice versa.

¹⁶ Ulrik Brandes et al. are currently working on a consistent theory of centrality that may provide the answer Gert

3 Indexing

A good index is a work of art and science, order and chance, delight and usefulness.

A. S. Byatt, foreword to (Bell, 2001)

The index of a book is a table containing numeric references, or occurrences, to words appearing in the text. These words generally belong to a single category, like concepts, places, names, etc.¹ In the case of an elaborate index, subheadings indicating context are associated with the entries. The index is found in the pages at the back of a book.

In this work, we use an index to extract information from the text without having to consult the text itself. Many indexes of literary works are available. This fact makes our framework reproducible for both stories and sets of stories, such as the Sagas of Icelanders or the previously cited Les Rougon-Macquart, which share common characters from one story to another. We perform no extraction of occurrences beyond the one already carried out by editors and indexers, which resulted in the published index. In this chapter, we pay particular attention to indexes of characters. Characters from Les Confessions are part of an autobiography, which is at the intersection of biography and fiction². Here, we explore both indexing for biography and fiction (3.2), and indexing for autobiography (3.3).

3.1 Some elements of history

We find proto-indexes as early as Antiquity, and even before that (Maniez and Maniez, 2009). Their current form dates back to the Middle Ages.

The first subject indexes were "distinctions," collections of "various figurative or symbolic meanings of a noun found in the scriptures" that "are the earliest of all alphabetical tools aside from dictionaries. [. . . ]" Distinctions were biblical tools designed to assist preachers in writing sermons. (Kilgour, 1998, p. 76)

¹ E.g. (Lemoine, 1985) gives indexes of animal and boat names in the books of Georges Simenon.

These tables were used in religious contexts, and a whole century was needed for them to evolve to their current form:

By the end of the thirteenth century the practical utility of the subject index is taken for granted by the literate West, no longer solely as an aid for preachers, but also in the disciplines of theology, philosophy, and both kinds of law. [(Goodman, 1990) as cited by (Kilgour, 1998, p. 77)]

The oldest known printed indexes date from the early 1460s (Wellisch, 1986) (see figure 3.1). From that point, they spread to larger audiences³. The history of indexes is fascinating because it is closely related to the history of both the book and literature; however, a deeper discussion of this specific point would be outside the scope of this work. Detailed accounts can be found in (Bell, 2001; Maniez and Maniez, 2009). The important point here is that indexes have been present for a very long time: they cover all kinds of works in all kinds of genres and eras.

Figure 3.1: First lines of the oldest known printed index, for St Augustine’s De arte praedicandi, circa 1466. Figure taken from (Wellisch, 1986).
