The models presented so far are designed to construct networks with a small average shortest path length $L$ and high local clustering $C$ or, equivalently, high $E_{\mathrm{glob}}$ and $E_{\mathrm{loc}}$ in the framework of Latora and Marchiori [67].
However, in many real-world networks the degree distribution does not follow a bell curve (such as the one that characterizes the distribution of human heights), but instead follows a power law, i.e. $P(k) \sim c \cdot k^{-\gamma}$, where $c$ is a constant and $\gamma$ is a positive exponent that empirically varies between two and three. Why the exponent falls in that range is still unknown to network scientists, and it has remained an open question since the first discoveries in network science. A $P(k)$ with a slowly decaying power-law tail means that the vast majority of nodes have low degree, while a few nodes, the so-called hubs, have extremely high connectivity.
Even though one might expect hubs to have a limited influence on the overall development and life of a network because they are so few, they play a fundamental role in the evolution, robustness and connectivity of the entire network. For instance, in biology, hubs can represent genes that identify functional modules and whose removal can be the cause of specific diseases. Hubs are also very important in brain networks, where they represent functional areas that are anatomically highly connected with their neighborhood. Their importance stems from the possibility of using the degree rank order in the normal brain as an indicator of the first symptoms of disease [3]. These special nodes are captured by neither random nor small-world models [86][95] (see Figure 2.4).
Power laws are not new in the literature. Around the turn of the twentieth century, Pareto found that people's income is well approximated by a function with a long, slowly decaying tail. In other words, power laws assign a non-negligible probability to rare events, such as people with very high income, people with many friends, web pages far more popular than the rest, the most cited papers, or earthquakes of high magnitude [74][13].
Figure 2.4: Example of random (a) and scale-free (b) degree distributions, $P(k)$ versus $k$, on linear and log-log scales respectively. One of the networks whose distribution of node linkages follows a bell-shaped curve is the U.S. highway system [9].
Such networks have been named scale-free [7], because power laws have the property of having the same functional form at all scales. In fact, power laws are the only functional form $f(x)$ that remains unchanged, apart from a multiplicative factor, under a rescaling of the independent variable $x$, being the only solution to the equation $f(\alpha x) = \beta f(x)$. Power laws play a particular role not only in the field of complex systems but also in statistical physics, because of their connections to phase transitions [99] and fractals [45].
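As a brief check of the scaling property, the following worked step shows that a power law satisfies this functional equation. It is only a sketch: the form $f(x) = c\,x^{-\gamma}$ is borrowed from the definition of $P(k)$ given earlier, and the value of $\beta$ is derived here rather than stated in the text.

```latex
f(x) = c\,x^{-\gamma}
\quad\Longrightarrow\quad
f(\alpha x) = c\,(\alpha x)^{-\gamma}
            = \alpha^{-\gamma}\,c\,x^{-\gamma}
            = \beta f(x),
\qquad \beta = \alpha^{-\gamma}.
```

The factor $\beta$ depends only on the rescaling $\alpha$ and not on $x$, which is the sense in which the functional form is the same at all scales.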
When working with real networks, the data may have rather strong intrinsic noise due to the finiteness of the sample. Therefore, when the system size is small and the degree distribution $P(k)$ is heavy-tailed, it is sometimes advisable [86] to measure the cumulative degree distribution $P_{\mathrm{cum}}(k) = \sum_{k'=k}^{\infty} P(k')$.
Indeed, summing up the original distribution $P(k)$ smooths the statistical fluctuations generally present in the tail of the distribution. Since $P_{\mathrm{cum}}(k) \sim k^{-\gamma_{\mathrm{cum}}}$ with $\gamma_{\mathrm{cum}} = \gamma - 1$, the exponent $\gamma$ of $P(k) \sim k^{-\gamma}$ can be obtained from $P_{\mathrm{cum}}(k)$ as one plus the absolute value of its slope in a log-log plot, i.e., $\gamma = 1 + \gamma_{\mathrm{cum}}$.
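The procedure can be illustrated with a minimal Python sketch. It assumes continuous power-law samples generated by inverse transform sampling and a plain least-squares fit of the log-log cumulative curve. The function names, the lower cutoff `x_min` and the fitting method are illustrative choices, not taken from the text or from [86].

```python
import math
import random


def sample_power_law(n, gamma, x_min=1.0, seed=0):
    """Draw n continuous samples with density p(x) ~ x^(-gamma), x >= x_min,
    by inverse transform sampling (requires gamma > 1)."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which avoids raising 0 to a negative power.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


def cumulative_distribution(samples):
    """Return (x, P_cum(x)) pairs, where P_cum(x) is the fraction of samples >= x."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (n - i) / n) for i, x in enumerate(xs)]


def loglog_slope(points):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x, _ in points]
    ly = [math.log(y) for _, y in points]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def estimate_gamma(samples):
    """Estimate gamma as one plus the absolute log-log slope of P_cum."""
    gamma_cum = -loglog_slope(cumulative_distribution(samples))
    return 1.0 + gamma_cum
```

For instance, `estimate_gamma(sample_power_law(100000, 2.5))` should return a value close to 2.5, although a least-squares fit on a log-log plot is only a rough estimator of the exponent.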
There are two types of scale-free models in the literature: those that create static scale-free networks and those that create evolving scale-free networks. The former are simply generated as a special case of random graphs with a given degree distribution. A model that belongs to this category is, for instance, the so-called fitness model [27]: it starts from $n$ isolated nodes and associates with every node $i$ a fitness $\eta_i$, a real number drawn from a fitness distribution $\rho(\eta)$. For each pair of nodes $i$ and $j$, a link is drawn with probability $f(\eta_i, \eta_j)$, with $f$ being a symmetric function of its arguments. The model generates a power-law $P(k)$ for various fitness distributions and attaching rules, while it yields an ER random graph if $f(\eta_i, \eta_j) = p$ for each $i, j$.
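The construction of the fitness model can be sketched in a few lines of Python. The function names, the uniform fitness distribution and the product rule below are hypothetical choices made only to exercise the procedure. They are not prescribed by [27] and are not claimed to produce a power-law $P(k)$.

```python
import random
from itertools import combinations


def fitness_graph(n, draw_fitness, link_probability, seed=0):
    """Static fitness model: draw one fitness per node, then link each pair
    (i, j) independently with probability link_probability(eta_i, eta_j)."""
    rng = random.Random(seed)
    eta = [draw_fitness(rng) for _ in range(n)]
    edges = [
        (i, j)
        for i, j in combinations(range(n), 2)
        if rng.random() < link_probability(eta[i], eta[j])
    ]
    return eta, edges


def degrees(n, edges):
    """Degree of every node in an undirected edge list."""
    k = [0] * n
    for i, j in edges:
        k[i] += 1
        k[j] += 1
    return k


# Illustrative assumption: fitness drawn uniformly from (0, 1].
def uniform_fitness(rng):
    return 1.0 - rng.random()


# Illustrative assumption: a symmetric product rule f = eta_i * eta_j.
def product_rule(eta_i, eta_j):
    return eta_i * eta_j


def constant_rule(p):
    """f = p for every pair, which reduces the model to an ER random graph."""
    return lambda eta_i, eta_j: p
```

Calling `fitness_graph(n, uniform_fitness, constant_rule(p))` corresponds to the ER special case mentioned above, since every pair is then linked with the same probability $p$ regardless of fitness.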
Figure 2.5: Example of noise reduction in the data obtained by calculating the cumulative degree distribution $P_{\mathrm{cum}}(x)$ [86]. A set of one million power-law distributed random numbers with scaling exponent $\gamma = 2.5$ is considered. (a) Plot of the original data. (b) Same histogram on logarithmic scale. Note the noise in the tail of the curve. (c) A histogram with logarithmic binning. (d) A cumulative histogram of the same data. This curve follows a power law, but the scaling exponent of the original curve can be obtained as the exponent of $P_{\mathrm{cum}}(x)$ minus
Conversely, in the evolving scale-free category, the growth process that determines the structural properties of the network is taken into account. The Barabási-Albert (BA) [15] network growth model was inspired by the formation of the World Wide Web and is based on two basic ingredients: growth and preferential attachment. The basic idea is that, in the WWW, web sites with high popularity (high degree) acquire new hyperlinks at a higher rate than unpopular web sites (low degree). More precisely, an undirected graph is constructed as follows: starting with $m_0$ isolated nodes, at each time step $t = 1, 2, 3, \ldots, n - m_0$ a new node $j$ with $h \le m_0$ links is added to the network. The probability that a link will connect $j$ to an existing node $i$ is linearly proportional to the current degree of $i$:
$$\Pi_{j \to i} = \frac{k_i}{\sum_l k_l}.$$
As every new node has $h$ links, the network at time $t$ will have $n = m_0 + t$ nodes and $m = h \cdot t$ links, corresponding to an average degree $\langle k \rangle = 2m/n \to 2h$ for large times. The Barabási-Albert model has been solved in the mean-field approximation and, exactly, by means of rate equation [64] and master equation approaches. In the limit $t \to \infty$, the model produces a degree distribution $P(k) \sim k^{-\gamma}$ with exponent $\gamma = 3$.
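The growth rule can be sketched in Python as follows. Two details are assumptions of this sketch rather than part of the model as stated: at the first step all degrees are zero, so the attachment probability is undefined and the targets are picked uniformly among the initial nodes; and each new node is forced to connect to $h$ distinct targets. The names `barabasi_albert` and `degree_sequence` are illustrative.

```python
import random


def barabasi_albert(n, m0, h, seed=0):
    """Grow an undirected graph from m0 isolated nodes to n nodes; each new
    node attaches h <= m0 links to distinct existing nodes chosen with
    probability proportional to their current degree."""
    if not 1 <= h <= m0 <= n:
        raise ValueError("need 1 <= h <= m0 <= n")
    rng = random.Random(seed)
    edges = []
    # Each node appears in `stubs` once per incident link, so a uniform draw
    # from `stubs` selects node i with probability k_i / sum_l k_l.
    stubs = []
    for new in range(m0, n):
        if stubs:
            targets = set()
            while len(targets) < h:  # redraw until h distinct targets are found
                targets.add(rng.choice(stubs))
        else:
            # Assumption: all degrees are zero at the first step, so the
            # attachment rule is undefined; pick h initial nodes uniformly.
            targets = set(rng.sample(range(m0), h))
        for old in sorted(targets):
            edges.append((old, new))
            stubs.extend((old, new))
    return edges


def degree_sequence(n, edges):
    """Degree of every node in an undirected edge list."""
    k = [0] * n
    for i, j in edges:
        k[i] += 1
        k[j] += 1
    return k
```

The edge list has $h \cdot t$ entries with $t = n - m_0$, so the mean of `degree_sequence(n, edges)` approaches $2h$ as $n$ grows, while the oldest nodes accumulate degrees far above that mean.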
On the contrary, a growing network with a constant attachment probability $\Pi_{j \to i} = 1/(m_0 + t - 1)$ produces a degree distribution $P(k) = (e/h) \cdot \exp(-k/h)$. This implies that preferential attachment is an essential ingredient of the model. The BA model shares many similarities with the model developed by Price [38] in 1976, which explains the power laws in citation networks that he had found a decade earlier. In Price's model, the probability that a newly published paper cites a previous one is taken to be proportional to $k_{\mathrm{in}} + 1$, where $k_{\mathrm{in}}$ is the number of times that the paper has already been cited. Price's model is a reformulation, in terms of network growth, of a model developed by Simon in 1955 to explain the power laws appearing in a wide range of empirical data, such as the distribution of words in prose samples by their frequency of occurrence or the distribution of cities by population. Here, we simply mention that the model differs from the BA model in two main aspects: on the one hand, it builds a directed graph and, on the other hand, the number of edges added with each new node is not a fixed quantity.