2. Theoretical foundation of the research
2.2. Conceptual framework
2.2.3. Characteristics of the collected studies
The optimisations to the BASIC2−DYNAMIC−k−CORE algorithm of Subsection 4.4.3 can insulate higher k-cores from any edge additions between nodes whose degree is lower than the higher k-core. However, those optimisations cannot insulate higher k-cores from edge additions between a lower and a higher k-core, even when that edge addition would not ultimately change the kmax of any node in the higher k-core.
For graphs with scale-free, power-law decay degree distributions, such as the Internet, edges are strongly biased toward connecting together a small number of
4.4 Distributed k-cores on Dynamic Graphs 83
very high-degree nodes with the low-degree nodes of the rest of the graph. This can occur where growth is such that low-degree, lower k-core vertices connect preferentially to the high-degree vertices in the higher k-cores. A high proportion of the edges in such graphs will therefore be incident on vertices in the higher k-cores. The optimisations to the dynamic, distributed k-core algorithm of Subsection 4.4.3 would not apply where the lower degree of the two endpoints is higher than the higher k-core, and so the BASIC2−DYNAMIC−k−CORE algorithm of Algorithm 4.9 on page 79 generally would not be much more efficient than the static algorithm on such graphs.
For example, the nodes in the 201306 UCLA IRL Topology AS graph data-set have a maximum kmax of 85; however, Figure 2.8 on page 24 shows that 63% of edges in this graph are incident on nodes with a degree at least as high as this. The optimisations in the previous section could have no benefit whenever such edges were added. Further, there will still be many additional edges where the lower degree of the 2 ASes is higher than the kmax of the other AS of that edge, even if that kmax is much lower than the maximum kmax in the graph.
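The proportion quoted above is the kind of figure that can be measured directly from an edge list. The following is a minimal sketch only, assuming a simple undirected graph given as a list of node pairs with no duplicate edges; the function name and the toy graph are hypothetical and are not taken from the UCLA IRL data-set.

```python
def fraction_incident_on_high_degree(edges, threshold):
    """Fraction of edges with at least one endpoint of degree >= threshold.

    Assumes `edges` is a list of (u, v) pairs describing a simple
    undirected graph (no duplicate edges, no self-loops).
    """
    if not edges:
        return 0.0
    # Count the degree of every node from the edge list.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # An edge counts if either endpoint reaches the threshold.
    hits = sum(1 for u, v in edges
               if degree[u] >= threshold or degree[v] >= threshold)
    return hits / len(edges)


# Hypothetical toy graph: a hub "h" joined to four leaves, plus one leaf-leaf edge.
toy_edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
share = fraction_incident_on_high_degree(toy_edges, threshold=4)
```

On the toy graph only the hub has degree 4, so four of the five edges count. For the AS graph, the threshold would be the maximum kmax of 85 mentioned above.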
Figure 4.3: Example graph.

As an example, consider again the toy example graph of Figure 4.3 and what would happen if an edge is added from E, a low-degree node, to C, a higher-k-core node. Assume BASIC2−DYNAMIC−k−CORE has been running and has converged prior to the addition of the edge, and so has state identical to that in Table 4.1. C must respond to the edge addition by updating its generation count, resetting its kbound to its degree and broadcasting this, as must E. As C is in the highest k-core, the increase of its generation will cause all other nodes to have to send updates at least once, if only to update their own generation.
Round A B C D E F G
Edge is added between C and E
3 <2,5> <2,2>
4 <2,3> <2,3> <2,3> <2,2> <2,1>
5 <2,3> <2,1>
Table 4.4: BASIC2−DYNAMIC−k−CORE on the example graph of Figure 4.2 on page 76, once an edge is added between E and C.
In total, 9 messages are sent over 3 rounds, as a result of the edge addition. The static algorithm, if run from scratch with this new edge, would have sent only 8 messages over 2 rounds (F no longer has to send a 2nd message, as its degree is now
its kmax). Yet E is the only node whose kmax ultimately changes, and E’s degree is such that it could never have affected C’s membership. However, C does not initially
have sufficient information about E to allow C to safely ignore the edge addition. This may be optimised by adding a message type to exchange generation and
degree information between vertices when an edge is added, as in
DYNAMIC−k−CORE of Algorithm 4.13. Increasing the local generation count, recalculating the GENKBOUND and broadcasting the new (g, k) can then be deferred to the point when the degree information message is received from a new neighbour. Further, it means a vertex with a GENKBOUND higher than the received neighbour’s degree can confidently suppress the regeneration of the algorithm that would otherwise be carried out.
Neighbours still need a mechanism to synchronise their generation counts. In the BASIC−DYNAMIC−k−CORE algorithm, generation synchronisation is guaranteed because neighbours always latch onto higher generation counts in received messages and rebroadcast, ensuring the entire network must converge on the highest generation. In the BASIC2−DYNAMIC−k−CORE algorithm, nodes always latch onto higher generation counts in messages received from equal or higher GENKBOUND neighbours, and it is guaranteed that lower k-cores, and only lower k-cores, will have generation counts at least as high as those of higher k-cores, thus allowing nodes in higher k-cores to include higher-generation / lower-kbound messages in their consideration. This latter condition must still be preserved. Generation synchronisation is guaranteed in DYNAMIC−k−CORE by ensuring that at least one recipient of the “DEGREE” message will latch onto the generation of the other side. Further, if only one side latches, then DYNAMIC−k−CORE guarantees it must be the side in a lower k-core, and that the other side can safely ignore the message at that point in the execution of the distributed algorithm. Where only one side latches, the side that ignored the “DEGREE” message will still later receive “KBOUND” messages, and the algorithm will then proceed as in BASIC2−DYNAMIC−k−CORE.
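The two latching rules described above can be contrasted in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the second rule simply transcribes the condition used by the “KBOUND” handler of Algorithm 4.13.

```python
def latch_basic(g, g_recv):
    """Rule described for BASIC-DYNAMIC-k-CORE: always adopt a higher generation."""
    return max(g, g_recv)


def latch_kbound(g, k, g_recv, k_recv):
    """Rule of the "KBOUND" handler in Algorithm 4.13.

    A higher generation is adopted only when the sender's bound is at
    least as high as the local bound, so a node in a higher k-core does
    not latch onto generations raised in lower k-cores.
    """
    if g_recv > g and k_recv >= k:
        return g_recv
    return g


# A node with k = 3 hears generation 3 from a neighbour with bound 1.
ignored = latch_kbound(g=2, k=3, g_recv=3, k_recv=1)
# A node with k = 1 hears generation 3 from a neighbour with bound 3.
adopted = latch_kbound(g=2, k=1, g_recv=3, k_recv=3)
```

In the first call the node keeps generation 2, and in the second it moves to generation 3. This is the asymmetry the text relies on: lower k-cores follow higher ones, but not the reverse.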
These modifications are shown in DYNAMIC−k−CORE in Algorithm 4.13 on page 86. When an edge is added, a new ⟨“DEGREE”, g, d⟩ message is sent directly to the new neighbour, containing the local generation and degree. This allows the remaining work of handling the new edge to be deferred until the neighbour’s own “DEGREE” message is received. Knowing the degree of the new neighbour allows the local node to ignore it, and suppress a disruptive generation update, if it is clear the new neighbour’s degree cannot affect the local node.
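The decision taken on receipt of a “DEGREE” message can be sketched as a small pure function. This is a hedged illustration of the “DEGREE” handler of Algorithm 4.13, not a full implementation: the function name and return convention are hypothetical, and the recomputation of the bound via the update process is left out.

```python
def on_degree_message(g, k, g_recv, d_recv):
    """Decide how to react to a received <"DEGREE", g_recv, d_recv>.

    Returns (new_generation, must_update). When the new neighbour's
    degree is below the local bound k, the message is ignored and the
    local generation is left untouched.
    """
    if d_recv >= k:
        # The neighbour could matter: move to a fresh generation.
        return max(g_recv, g + 1), True
    return g, False


# Values taken from Table 4.5: before the edge is added both C and E are
# at generation 2, C has bound 3 and E has bound 1.
c_result = on_degree_message(g=2, k=3, g_recv=2, d_recv=2)  # C hears E's degree 2
e_result = on_degree_message(g=2, k=1, g_recv=2, d_recv=5)  # E hears C's degree 5
```

With these inputs C ignores the message and stays at generation 2, while E moves to generation 3 and must recompute its bound. This matches the single “KBOUND” message with generation 3 in round 4 of Table 4.5.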
Note that as Sv stores the last message from v, in DYNAMIC−k−CORE this now includes the message type information. The GENPICK process must be modified to use this type information, to always use the neighbour’s degree information when the stored message is a “DEGREE” message. A neighbour’s kmax can never exceed its degree, and so it is always safe to use this, regardless of the generation. This is shown in OPT_GENPICK in Algorithm 4.14. The UPDATE process must also be updated, to send “KBOUND” as the message type, as shown in OPT_UPDATE in Algorithm 4.15.
Algorithm 4.13 DYNAMIC−k−CORE: Distributed k-core optimised for dynamic
graphs, with higher k-cores fully insulated from irrelevant changes in lower k-cores. Changes from the BASIC2−DYNAMIC−k−CORE of Algorithm 4.9 are highlighted.
procedure DYNAMIC−k−CORE
    select ← OPT_GENPICK
    kbound ← OPT_GENKBOUND
    update ← OPT_UPDATE
    for all v ∈ N do
        Sv ← (1, |N|)
    end for
    g ← 1
    k ← kbound(g, S, select)
    loop
        dispatch event
            handle event: edge to neighbour v has been added
                send(v, ⟨“DEGREE”, g, |N|⟩)
            end handle
            handle event: edge to neighbour v is removed
                remove Sv from S
                (g, k) ← update(g, g, k, S, kbound, select)
            end handle
            handle event: message of type “DEGREE” from any v ∈ N
                gprev ← g
                (t, g′, d) ← Sv ← receive(v)
                if d ≥ k then
                    g ← max(g′, g + 1)
                    (g, k) ← update(g, gprev, k, S, kbound, select)
                end if
            end handle
            handle event: message of type “KBOUND” from any v ∈ N
                gprev ← g
                (t, g′, k′) ← Sv ← receive(v)
                if g′ > g and k′ ≥ k then
                    g ← g′
                end if
                (g, k) ← update(g, gprev, k, S, kbound, select)
            end handle
        end dispatch
    end loop
end procedure
Algorithm 4.14 OPT_GENPICK: Pick the correct value to use for the given
neighbour message. Modified to consider the message type. The changes from the GENPICK2 function of Algorithm 4.10 are highlighted.
procedure OPT_GENPICK(g, d, s)
    (t, g′, v) ← s
    if t = “DEGREE” then
        return v
    end if
    // t must be “KBOUND” now
    if g′ ≥ g then
        return v
    end if
    return d
end procedure
Algorithm 4.15 OPT_UPDATE: Update process, modified to include the required
message type. The change from UPDATE2 of Algorithm 4.12 is highlighted.
function OPT_UPDATE(gcur, gprev, kprev, S, kbound, select)
    (g, k) ← kbound(gcur, S, select)
    if kprev ≠ k or gprev ≠ g then
        broadcast(⟨“KBOUND”, g, k⟩)
    end if
    return (g, k)
end function
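The selection step of Algorithm 4.14 can be rendered in Python as a rough sketch. Several things here are assumptions rather than the thesis's definitions: the stored message is read as a (type, generation, value) triple, the parameter d is taken to be the local node's degree, and `assumed_kbound` is a hypothetical stand-in for OPT_GENKBOUND (which is not given in this section), computing the largest k such that at least k picked values are at least k.

```python
def opt_genpick(g, d, s):
    """Pick the value to use for one neighbour's stored message s."""
    msg_type, g_recv, value = s
    if msg_type == "DEGREE":
        # A degree is used regardless of generation.
        return value
    # Otherwise the message is a "KBOUND" message.
    if g_recv >= g:
        return value
    # Stale bound from an older generation: fall back to d
    # (assumed here to be the local degree).
    return d


def assumed_kbound(g, stored, d):
    """Hypothetical stand-in for OPT_GENKBOUND (an assumption, see lead-in)."""
    picked = sorted((opt_genpick(g, d, s) for s in stored.values()),
                    reverse=True)
    k = 0
    for i, value in enumerate(picked, start=1):
        if value >= i:
            k = i
        else:
            break
    return k


# E after the C-E edge is added (values from Table 4.5): generation 3,
# degree 2. Its earlier neighbour is labelled "D" here as an assumption.
stored_e = {"C": ("DEGREE", 2, 5), "D": ("KBOUND", 2, 3)}
k_e = assumed_kbound(3, stored_e, d=2)
```

Under these assumptions E picks 5 for C and falls back to its own degree of 2 for the stale message, giving a bound of 2. That agrees with the ⟨K,3,2⟩ message in round 4 of Table 4.5.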
The effect of this optimisation is to eliminate the full generation update and effective reset of the distributed algorithm, when an edge is added between a high
k-core node and a lower-degree node. To return to the example graph of Figure 4.2
on page 76, the optimised DYNAMIC−k−CORE algorithm would behave as shown in Table 4.5. There is an initial overhead in that every node must send a “DEGREE” message to each of its neighbours, after which the algorithm can behave no worse than the static algorithm on a static graph.
When the edge between C and E is added, with the DYNAMIC−k−CORE algorithm they exchange “DEGREE” messages. Note that these are sent unicast, directly to each other, unlike the other messages, which are broadcast to all neighbours. This allows C to ignore E, and so avoids messages having to traverse the entire graph. The DYNAMIC−k−CORE algorithm responds to the new C-E edge with just 3 messages, between only 2 nodes, over 2 rounds. In contrast to the BASIC2−DYNAMIC−k−CORE algorithm, with the DYNAMIC−k−CORE algorithm the remainder of the network is not disturbed at all by this edge addition, which is a significant saving.
Round  A        B        C        D        E        F        G
1      <D,1,3>  <D,1,3>  <D,1,4>  <D,1,4>  <D,1,1>  <D,1,2>  <D,1,1>
2      <K,2,3>  <K,2,3>  <K,2,3>  <K,2,3>  <K,2,1>  <K,2,1>  <K,2,1>
       Edge is added between C and E
3                        <D,2,5>           <D,2,2>
4                                          <K,3,2>
Table 4.5: DYNAMIC−k−CORE on the example graph of Figure 4.2 on page 76, where an edge is added between E and C. The 2 highlighted messages are sent only to a single neighbour, rather than broadcast to all neighbours.
for an efficient, distributed k-core algorithm, even on dynamic graphs with power-law degree distributions, such as the Internet. This hypothesis will be tested in the next subsection.