CHAPTER II Description of the study area
2.1. REFERENCE AREA
2.1.1. General aspects of the tourism industry in the area.
Seeking a vertex partition that optimizes, or approximately optimizes, an appropriate score function is a standard approach to single-layer community detection (e.g., Newman, 2006b; Wang and Wong, 1987; Chung, 1997; see Section 1.2.3 for a complete discussion). Rather than scoring a partition of the available network, Multilayer Extraction uses a significance-based score that applies to individual vertex-layer sets. In the following sections, we describe the multilayer null model and then the proposed score. First, we define some notation that will be used exclusively in this chapter and Appendix C. Let $G(m, n) = ([n], [m], (E_1, \ldots, E_m))$ be an observed $(m, n)$-multilayer network, with $[n] = \{1, \ldots, n\}$ the node set, $[m] = \{1, \ldots, m\}$ the layer set, and $E_\ell$ the edge set of layer $\ell \in [m]$. For each layer $\ell \in [m]$ and pair of vertices $u, v \in [n]$, let
\[
x_\ell(u, v) :=
\begin{cases}
1 & \text{if } \{u, v\} \in E_\ell, \\
0 & \text{otherwise}
\end{cases}
\]
indicate the presence or absence of an edge between $u$ and $v$ in layer $\ell$ of $G(m, n)$. The degree of a vertex $u \in [n]$ in layer $\ell$, denoted by $d_\ell(u)$, is the number of edges incident on $u$ in $G_\ell$. Formally,
\[
d_\ell(u) = \sum_{v \in [n]} x_\ell(u, v).
\]
The degree sequence of layer $\ell$ is the vector $\mathbf{d}_\ell = (d_\ell(1), \ldots, d_\ell(n))$ of degrees in that layer; the full degree set of $G(m, n)$ is the list $\mathbf{d} = (\mathbf{d}_1, \ldots, \mathbf{d}_m)$ containing the degree sequence of each layer in the network. Define $d_{T,\ell} := \sum_{u \in [n]} d_\ell(u)$ as the total degree in layer $\ell$. In this section, a partition vector $\mathbf{c}$ retains its meaning from Chapters 1-3, but we denote its elements by $c_u$ instead of $c(u)$.
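The notation above can be illustrated with a minimal Python sketch. It assumes each layer is stored as a set of undirected edges and that vertices are indexed from 0 rather than 1; the function names and the toy network are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # undirected edge {u, v}, stored as a pair of vertex indices


def layer_degrees(n: int, edges: Set[Edge]) -> List[int]:
    """Degree sequence of one layer: entry u counts the edges incident on u."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_set(n: int, layers: List[Set[Edge]]) -> List[List[int]]:
    """Full degree set: one degree sequence per layer."""
    return [layer_degrees(n, edges) for edges in layers]


# Hypothetical toy network with n = 4 vertices and m = 2 layers.
toy_layers = [{(0, 1), (1, 2), (0, 2)}, {(0, 1), (2, 3)}]
d = degree_set(4, toy_layers)
d_total = [sum(d_layer) for d_layer in d]  # total degree of each layer
print(d)        # [[2, 2, 2, 0], [1, 1, 1, 1]]
print(d_total)  # [6, 4]
```

Each total degree equals twice the number of edges in the corresponding layer, since every edge contributes to the degree of both endpoints.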
4.1.1 The Null Model
Our significance-based score for vertex-layer sets in multilayer networks is based on comparing observed graph statistics with a null distribution, as in the NST framework (Chapter 2). The null model we use for this algorithm is best described as a multilayer configuration model, in which each layer of a random network is generated independently according to the single-layer configuration model. The single-layer configuration model was described in detail in Section 1.2.2. We give a brief reminder of the model, using the notation of this setting. Let $\mathcal{G}(m, n)$ denote the family of all $(m, n)$-multilayer networks. Given the degree set $\mathbf{d}$ of the observed network $G(m, n)$, we define a multilayer configuration model and an associated probability measure $P_{\mathbf{d}}$ on $\mathcal{G}(m, n)$ as follows. In layer $G_1$, each node $u$ is given $d_1(u)$ half-edges. Pairs of these half-edges are then chosen uniformly at random to form edges until all half-edges are exhausted (disallowing self-loops and multiple edges). This process is repeated independently for every subsequent layer $G_2, \ldots, G_m$, using the corresponding degree sequence of each layer.
Under the above null model, each layer is generated according to the Molloy and Reed (1995) algorithm for the single-layer configuration model (Bollobás, 1980; Bender, 1974). The probability of an edge between nodes $u$ and $v$ in layer $\ell$ depends only on the degree sequence $\mathbf{d}_\ell$ of the observed graph $G_\ell$. The distribution $P_{\mathbf{d}}$ has two complementary properties that make it useful for identifying communities in an observed multilayer network: (i) it preserves the degree structure of the observed network; and (ii) subject to this restriction, edges are assigned at random. As discussed in Section 1.2.2, these characteristics have long made the configuration model the standard null model against which to judge the quality of a proposed community partition.
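The half-edge pairing procedure described above can be sketched in Python as follows. The text does not say how self-loops and multiple edges are disallowed, so this sketch assumes one common choice: discard the whole pairing and restart whenever one occurs. The function names, the `max_tries` parameter, and the 0-based vertex indexing are hypothetical.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def sample_layer(degrees: List[int], rng: random.Random,
                 max_tries: int = 1000) -> Set[Edge]:
    """Sample one simple graph with the given degree sequence.

    Half-edges are paired uniformly at random; a pairing that creates a
    self-loop or a repeated edge is rejected and the pairing restarts
    (an assumption of this sketch, not a detail given in the text).
    """
    # One entry per half-edge: vertex u appears degrees[u] times.
    stubs = [u for u, deg in enumerate(degrees) for _ in range(deg)]
    if len(stubs) % 2 != 0:
        raise ValueError("total degree must be even")
    for _ in range(max_tries):
        rng.shuffle(stubs)  # a uniform shuffle induces a uniform pairing
        edges: Set[Edge] = set()
        simple = True
        for i in range(0, len(stubs), 2):
            u, v = stubs[i], stubs[i + 1]
            edge = (min(u, v), max(u, v))
            if u == v or edge in edges:
                simple = False
                break
            edges.add(edge)
        if simple:
            return edges
    raise RuntimeError("no simple pairing found within max_tries")


def sample_multilayer(d: List[List[int]], seed: int = 0) -> List[Set[Edge]]:
    """Sample every layer independently from its own degree sequence."""
    rng = random.Random(seed)
    return [sample_layer(d_layer, rng) for d_layer in d]
```

Restarting on failure keeps every sampled layer simple while matching the prescribed degrees exactly, but it can be slow for heavy-tailed degree sequences, where self-loops and repeated edges are frequent.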
The configuration model is the null model which motivates the modularity score of a partition in a network (Newman, 2004b, 2006b). Recall the modularity score, discussed in Section 1.2.3:
\[
Q(\mathbf{c}) := \frac{1}{d_T} \sum_{u, v \in [n]} \left( x(u, v) - \frac{d(u) d(v)}{d_T} \right) \mathbb{1}\big(c(u) = c(v)\big) \tag{4.1}
\]
A brief reminder of the motivation for this score is as follows. In (4.1), the ratio $d(u)d(v)/d_T$ is the approximate expected number of edges between $u$ and $v$ under the configuration model. If the partition $\mathbf{c}$ represents communities whose observed intra-community edge count is large relative to what is expected under the configuration model, it receives a high modularity score. Finding a partition that (approximately) maximizes modularity is among the most common techniques for community detection in networks.
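As a small illustration of the modularity score (4.1), the following Python sketch evaluates it directly from the double sum over ordered vertex pairs. It assumes a simple undirected graph given as an edge set with 0-based vertex indices; the function name and the toy graph are hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def modularity(n: int, edges: Set[Edge], c: List[int]) -> float:
    """Modularity of the partition c, where c[u] is the community label of u."""
    deg = [0] * n
    adjacent = set()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    d_total = sum(deg)  # total degree, equal to twice the number of edges
    score = 0.0
    for u in range(n):
        for v in range(n):  # ordered pairs, including u == v
            if c[u] == c[v]:
                x_uv = 1.0 if (u, v) in adjacent else 0.0
                score += x_uv - deg[u] * deg[v] / d_total
    return score / d_total


# Hypothetical example: two triangles joined by the single edge (2, 3).
toy_edges = {(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)}
q_split = modularity(6, toy_edges, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_whole = modularity(6, toy_edges, [0, 0, 0, 0, 0, 0])  # a single community
```

In this example the split into two triangles scores 5/14, while the trivial one-community partition scores 0, because the observed and expected edge counts cancel when summed over all vertex pairs.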
4.1.2 Multilayer Extraction Score
Rather than scoring a partition, the Multilayer Extraction method scores individual vertex-layer sets. We define a multilayer node score that is based on the single-layer modularity score (4.1) and amenable to iterative maximization. We first define a local set modularity for a collection of vertices $B \subseteq [n]$ in the layer $\ell \in [m]$:
\[
Q_\ell(B) := \frac{1}{n \binom{|B|}{2}^{1/2}} \sum_{u, v \in B : u < v} \left( x_\ell(u, v) - \frac{d_\ell(u) d_\ell(v)}{d_{T,\ell}} \right) \tag{4.2}
\]
The scaling term in the equation above is related to the total number of vertices in the network and the total number of possible edges between the vertices in B. This score is one version of the various set-modularities considered in Fasino and Tudisco (2016), and is reminiscent of the local modularity score introduced in Clauset et al. (2004).
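The local set modularity (4.2) for a single layer can be sketched in Python as below. The sketch assumes the layer is a simple undirected graph given as an edge set with 0-based vertex indices, and it requires at least two vertices in $B$ so that the scaling term is nonzero; the function name and the toy layer are hypothetical.

```python
import math
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def set_modularity(n: int, edges: Set[Edge], B: Iterable[int]) -> float:
    """Local set modularity of the vertex set B within one layer."""
    deg = [0] * n
    adjacent = set()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        adjacent.add((min(u, v), max(u, v)))
    d_total = sum(deg)  # total degree of the layer
    members = sorted(set(B))
    if len(members) < 2:
        raise ValueError("B must contain at least two vertices")
    excess = 0.0
    for i, u in enumerate(members):
        for v in members[i + 1:]:  # unordered pairs with u < v
            x_uv = 1.0 if (u, v) in adjacent else 0.0
            excess += x_uv - deg[u] * deg[v] / d_total
    # Scale by n times the square root of the number of vertex pairs in B.
    return excess / (n * math.sqrt(math.comb(len(members), 2)))


# Hypothetical layer: two triangles joined by the single edge (2, 3).
toy_edges = {(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)}
q_triangle = set_modularity(6, toy_edges, {0, 1, 2})  # a densely connected set
q_mixed = set_modularity(6, toy_edges, {0, 4})        # two non-adjacent vertices
```

For the triangle, the three observed edges exceed the expected count of 16/14, so the score is positive; for the non-adjacent pair the score is negative.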
The Multilayer Extraction procedure seeks communities that are assortative across layers, in the sense that $Q_\ell(B)$ is large and positive for each $\ell \in L$. In light of this, we define the multilayer set score as
\[
H(B, L) := \frac{1}{|L|} \left( \sum_{\ell \in L} Q_\ell(B)_+ \right)^2, \tag{4.3}
\]
where $Q_+$ denotes the positive part of $Q$. Generally speaking, the score acts as a yardstick with which one can measure the connection strength of a vertex-layer set. Large values of the score signify densely connected communities.
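To make the aggregation in (4.3) concrete, here is a minimal Python sketch. It assumes the per-layer set modularities of $B$ for the layers in $L$ have already been computed and are passed in as a list; the function name and the numeric values are hypothetical.

```python
from typing import Sequence


def multilayer_score(q_values: Sequence[float]) -> float:
    """Multilayer set score from the set modularities of the layers in L."""
    if len(q_values) == 0:
        raise ValueError("the layer set L must be non-empty")
    # Layers with negative set modularity contribute nothing to the sum.
    positive_sum = sum(max(q, 0.0) for q in q_values)
    return positive_sum ** 2 / len(q_values)


# Hypothetical set modularities of one vertex set B in three layers.
h_all = multilayer_score([0.3, -0.1, 0.2])  # (0.3 + 0.2)**2 / 3
h_positive = multilayer_score([0.3, 0.2])   # (0.3 + 0.2)**2 / 2
```

Dropping the layer with negative set modularity leaves the sum unchanged but shrinks $|L|$, so the score rises. This shows how the score favors layer sets in which the vertex set is assortative in every layer.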
We note that the multilayer score $H(B, L)$ is reminiscent of a chi-squared test statistic computed from $|L|$ samples. That is, under appropriate regularity assumptions on $Q_\ell(B)$, the score in (4.3) will be approximately chi-squared with one degree of freedom.