

The Graphlet Correlation Matrix is a new network statistic that encodes the topology of a network using Spearman’s correlation coefficients among the node properties captured by graphlet degrees, computed over all nodes. Given a network G(V, E), we first compute the graphlet degree vectors of all nodes v ∈ V and construct a matrix in which each row is the graphlet degree vector of a node, GDV(v). We exploit the dependencies between orbits by computing Spearman’s correlation coefficient for all pairs of orbits (i.e., for all pairs of columns of the matrix of graphlet degree vectors) and present the coefficients in an n × n symmetric matrix that we call the Graphlet Correlation Matrix of network G, GCM_G. Graphlet correlation matrices can be defined using different sets of orbits. We focus on two particular orbit sets in our experiments: (1) the 11 non-redundant orbits of 2- to 4-node graphlets (illustrated in Figure 2.3), and (2) the complete set of 73 orbits of 2- to 5-node graphlets (illustrated in Figure 1.4). In this way, we can encode the topology of a network of any size into an n × n symmetric matrix with values in the interval [-1, 1], where n is the number of orbits used for computing the GCM. The computation is illustrated in Figure 2.5 on a random geometric graph with 500 nodes and 1% edge density.
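The construction can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used in this work: the function names (`average_ranks`, `spearman`, `graphlet_correlation_matrix`) are hypothetical, and the small matrix `gdv` contains made-up graphlet degrees for 5 nodes and 3 orbits. Spearman’s coefficient is computed as Pearson’s coefficient of the rank vectors, with tied values sharing their mean rank.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's coefficient: Pearson's coefficient of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("Spearman's coefficient is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

def graphlet_correlation_matrix(gdv_rows):
    """gdv_rows[v][i] is the graphlet degree of node v for orbit i."""
    columns = list(zip(*gdv_rows))  # one tuple of graphlet degrees per orbit
    n = len(columns)
    return [[spearman(columns[i], columns[j]) for j in range(n)] for i in range(n)]

# Hypothetical graphlet degree vectors: 5 nodes (rows), 3 orbits (columns).
gdv = [
    [1, 0, 3],
    [2, 1, 2],
    [3, 3, 2],
    [4, 6, 1],
    [5, 10, 0],
]
gcm = graphlet_correlation_matrix(gdv)
```

Here `gcm` is a 3 × 3 symmetric matrix with ones on the diagonal. The first two columns increase together, so their entry is 1, while the third column decreases as the first increases, giving a negative entry. A column of identical values raises an error in this sketch because its rank variance is zero.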

Networks that have different topologies are expected to have different graphlet correlation matrices. For example, Figure 2.6 illustrates the graphlet correlation matrices of four different networks: a scale-free network generated by the preferential attachment (i.e., Barabási-Albert) model, a network generated by the geometric random network model, the world trade network of 2010, and the human metabolic network. In agreement with known properties of scale-free Barabási-Albert (SF-BA) networks, orbits 0, 2, 5, and 7, which are characteristic of the existence of hubs, form a cluster of dependent orbits with correlation coefficients close to 1 (Figure 2.6-A). Orbits 10 and 11, which are characteristic of clustering “near” hubs, also form a cluster of correlated orbits. Finally, orbits 1, 4, 6, and 9, which are characteristic of a large number of degree 1 nodes, are dependent as well. The picture is quite different for geometric random graphs (GEO) of the same size, which have Poisson degree distributions, so their structure is not dominated by a large fraction of degree 1 nodes and a small number of hubs (Figure 2.6-B).
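To see why some orbits are tied to hubs and others to degree 1 nodes, it helps to count the simplest orbits by hand. The sketch below is an illustration rather than the counting method used in this work. It covers only the orbits of 2- and 3-node graphlets, and it assumes the usual orbit numbering: orbit 0 is an endpoint of an edge, orbit 1 is an end of a 3-node path, orbit 2 is the middle of a 3-node path, and orbit 3 is a node of a triangle. The function name `gdv_orbits_0_to_3` and the example graph `star` are hypothetical.

```python
from itertools import combinations

def gdv_orbits_0_to_3(adj):
    """Return {node: [orbit0, orbit1, orbit2, orbit3]} for a simple undirected graph.

    adj maps each node to the set of its neighbours.
    """
    gdv = {}
    for v, nbrs in adj.items():
        # Orbit 3: triangles through v, i.e. pairs of neighbours that are adjacent.
        triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Orbit 2: v is the middle of a 3-node path, i.e. non-adjacent neighbour pairs.
        middle = len(nbrs) * (len(nbrs) - 1) // 2 - triangles
        # Orbit 1: v is an end of a path v-u-w where w is not adjacent to v.
        end = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
        # Orbit 0: the degree of v.
        gdv[v] = [len(nbrs), end, middle, triangles]
    return gdv

# A star: node 0 is a hub connected to four degree 1 nodes.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
star_gdv = gdv_orbits_0_to_3(star)
```

In this star, the hub has graphlet degrees `[4, 0, 6, 0]` and every leaf has `[1, 3, 0, 0]`: the hub scores on orbits 0 and 2, while the degree 1 nodes score on orbit 1, matching the grouping of orbits described above.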

Uncovering orbit dependencies in real-world networks is much more interesting, since they can reveal currently unknown organizational principles of these networks. Indeed, the world trade network of 2010 [34] contains two large clusters of dependent orbits, {0, 2, 5, 7, 8, 10, 11} and {6, 9, 4, 1}, while orbits {4, 6, 9} are anti-correlated with orbits {0, 2, 5, 7, 8, 10, 11} (Figure 2.6-C). Investigating the implications of this, we notice that orbits 4, 6 and 9 correspond to peripheral, degree 1 nodes that are “hanging” from graphlets G3, G4 and G6 (Figure 2.3), while members of the large cluster of correlated orbits, {0, 2, 5, 7, 8, 10, 11}, correspond to higher-degree orbits that are either clustered (in a dense neighbourhood) or broker-type (mediators between nodes that are not directly interacting). Since these two clusters are anti-correlated, we can conclude that countries are either clustered/brokers, or on the periphery of world trade [44], but not both. Hence, the GCM unveils a hidden structure of this network that can be further interpreted qualitatively: through further analysis presented below, we interpret this observation on 49 world trade networks corresponding to trade data from 1962 to 2010. In contrast, the topology of the human metabolic network [98] is very different from the topology of world trade networks: the correlations between all orbits are high, indicating that constituent bio-molecules can be both peripheral and clustered/broker at the same time (Figure 2.6-D).

Figure 2.5: Graphlet Correlation Matrix computation, illustrated on a geometric network G with 500 nodes and 1% edge density (the network on the left). In the matrix of graphlet degree vectors (shown on the left), each row represents the graphlet degree vector of a node, and each column contains the graphlet degrees of all nodes for orbit i, d^i_G. The graphlet degrees of orbits 0 and 1, d^0_G and d^1_G, are highlighted in red. The graphlet correlation between orbits i and j, GCM_G[i, j], is Spearman’s correlation coefficient between d^i_G and d^j_G. Computing GCM_G[i, j] for all pairs of orbits, we obtain the symmetric graphlet correlation matrix of G, GCM_G. The rows and columns of GCM_G are ordered based on the correlation similarities of orbits to better visualise the orbit clustering patterns.

Figure 2.6: Graphlet Correlation Matrices (GCMs) of different types of networks. Panel A: a scale-free Barabási-Albert (SF-BA) network with 500 nodes and 1% edge density. Panel B: a geometric random network (GEO) with 500 nodes and 1% edge density. Panel C: the world trade network of 2010. Panel D: the human metabolic network. The rows and columns of the GCMs are ordered based on the correlation similarities of orbits to better visualise the orbit clustering patterns.

It is possible that a graphlet does not appear in a network. When this is the case, the graphlet degrees of all nodes are equal to 0 for the corresponding orbits. Since these graphlet degrees are constant over all nodes, Spearman’s correlation coefficient cannot be computed for these orbits. To overcome this problem, we add a dummy graphlet degree vector, [1, 1, ..., 1], to the matrix of graphlet degree vectors. This small amount of noise makes every column non-constant, so the coefficient can always be computed. As a result, the problematic orbits correlate perfectly with each other (Spearman’s correlation coefficients of 1), while they do not correlate with the remaining, non-zero orbits (Spearman’s correlation coefficients close to 0).
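The effect of the dummy vector can be checked on a toy example. The sketch below uses only the Python standard library. The helper names and the three orbit columns are hypothetical, with made-up graphlet degrees for 6 nodes, and the dummy vector is assumed to be appended as one extra row, so each column gains a trailing 1.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(x, y):
    """Return Spearman's coefficient, or None when a column is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return None  # zero variance: the coefficient is undefined
    return cov / math.sqrt(var_x * var_y)

# Hypothetical orbit columns for 6 nodes.
present = [3, 1, 4, 1, 5, 2]    # the graphlet occurs in the network
absent_1 = [0, 0, 0, 0, 0, 0]   # the graphlet never occurs
absent_2 = [0, 0, 0, 0, 0, 0]   # another graphlet that never occurs

before = spearman(present, absent_1)  # undefined without the dummy row

# Append the dummy row [1, 1, 1]: every column gains a trailing 1.
present_d, absent_1_d, absent_2_d = (col + [1] for col in (present, absent_1, absent_2))
absent_vs_absent = spearman(absent_1_d, absent_2_d)
present_vs_absent = spearman(present_d, absent_1_d)
```

In this sketch `before` is `None`, `absent_vs_absent` is 1, and `present_vs_absent` is a defined value strictly between -1 and 1. With only seven rows that value need not be near 0; how close it gets depends on where the dummy’s value falls among the ranks of the non-zero orbit.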

The graphlet degrees of different orbits do not scale within the same intervals, because the search spaces of the orbits differ. For example, the graphlet degree of orbit 15 depends on nodes up to the 4th neighbourhood of a node, while the graphlet degree of orbit 7 depends only on the 1st neighbourhood, which causes the graphlet degrees of orbit 15 to span a wider range. The graphlet degree ranges might differ even for orbits that search neighbourhoods of the same distance, since the chances of each graphlet’s appearance are not distributed evenly and depend on the density of the network. Because of these differences in scale, a rank-based correlation coefficient that measures monotonic correlations between orbits (i.e., Spearman’s correlation coefficient) is preferable to one that measures linear correlations among graphlet degrees (i.e., Pearson’s correlation coefficient). This is why we define Graphlet Correlation Matrices based on Spearman’s correlation coefficients rather than any other correlation coefficient.
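A small numerical check illustrates the difference between the two coefficients. The sketch below uses two hypothetical orbit columns for 6 nodes: one with a narrow range and one that grows as its fourth power, standing in for an orbit with a much larger search space. The data and function names are made up for illustration, and the rank helper assumes no tied values.

```python
import math

def pearson(x, y):
    """Pearson's coefficient: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; assumes no ties, which holds for the data below."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman's coefficient: Pearson's coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical graphlet degrees of 6 nodes for two orbits on very different scales.
narrow_orbit = [1, 2, 3, 4, 5, 6]
wide_orbit = [d ** 4 for d in narrow_orbit]  # 1, 16, 81, 256, 625, 1296

linear = pearson(narrow_orbit, wide_orbit)
monotonic = spearman(narrow_orbit, wide_orbit)
```

The two columns order the nodes identically, so `monotonic` is 1, whereas `linear` is noticeably below 1 because the relationship is far from a straight line.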
