
Configuration This experiment evaluated the effectiveness of Sargon using BFS and SSSP on the com-orkut dataset, partitioned across three 20-core compute nodes (one partition per core). Note that Sargon partitioned the graph along with a trace of a single BFS/SSSP execution on the graph from one randomly selected source vertex. Given the long execution time of BFS and SSSP on the dataset, we grouped multiple messages sent by a single MPI rank (process) to the same destination into a single message.
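The message grouping described above can be illustrated with a minimal sketch. It assumes a per-destination buffer that is flushed once it holds a fixed number of messages; the class `GroupingSender`, the callback `send_fn`, and the helper `count_transfers` are hypothetical names introduced for illustration and are not taken from the actual implementation.

```python
from collections import defaultdict


class GroupingSender:
    """Buffers outgoing messages per destination rank and sends them in groups.

    `send_fn` is a hypothetical transport callback (for example, a thin
    wrapper around an MPI send). It receives a destination rank and the
    list of messages grouped for that rank.
    """

    def __init__(self, send_fn, group_size):
        self.send_fn = send_fn
        self.group_size = group_size
        self.buffers = defaultdict(list)

    def send(self, dest, message):
        # Append to the buffer of this destination; transmit once it is full.
        buf = self.buffers[dest]
        buf.append(message)
        if len(buf) >= self.group_size:
            self.flush(dest)

    def flush(self, dest):
        # Transmit whatever is buffered for `dest` as one grouped message.
        buf = self.buffers.pop(dest, [])
        if buf:
            self.send_fn(dest, buf)

    def flush_all(self):
        # Called at the end of a superstep so no message is left behind.
        for dest in list(self.buffers):
            self.flush(dest)


def count_transfers(num_messages, group_size, dest=0):
    """Counts how many grouped transfers are needed for `num_messages`."""
    transfers = []
    sender = GroupingSender(lambda d, batch: transfers.append((d, len(batch))),
                            group_size)
    for i in range(num_messages):
        sender.send(dest, i)
    sender.flush_all()
    return len(transfers)
```

Under this sketch, a larger grouping size means fewer transfers for the same number of logical messages: 1,000 messages to one destination take 16 transfers at a grouping size of 64 but only 4 at a grouping size of 256.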

Table 5.2: BFS and SSSP execution time in seconds on com-orkut dataset with varying message grouping size

| Partitioner | BFS, size 64 | BFS, size 128 | BFS, size 256 | SSSP, size 64 | SSSP, size 128 | SSSP, size 256 |
|---|---|---|---|---|---|---|
| Metis | 1459 | 260 | 73.24 | 30,784 | 6,152 | 727 |
| LDG | 2027 | 396 | 116 | 37,418 | 5,293 | 870 |
| reLDG | 1114 | 317 | 83.26 | 27,099 | 2,921 | 677 |
| Sargon | 857 | 238 | 63.28 | 21,643 | 2,426 | 431 |

Results (Table 5.2) Table 5.2 presents the BFS and SSSP execution time on the dataset with 100 randomly selected source vertices and different message grouping sizes. As shown, even though the partitionings computed by Sargon had higher edgecut than those of Metis and reLDG (Section 5.4.2), Sargon consistently outperformed LDG and reLDG thanks to its capability of avoiding algorithmic and structural skewness. In comparison to Metis, LDG, and reLDG, Sargon sped up the execution of BFS and SSSP by up to 2.36 and 2.53 times, respectively.
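The speedup figures quoted above can be recomputed from the execution times in Table 5.2. The following sketch does so; the dictionary layout and the helper name `max_speedup` are our own illustrative choices, while the numbers are copied from the table.

```python
# Execution times in seconds, copied from Table 5.2.
# Inner keys are message grouping sizes.
BFS = {
    "Metis": {64: 1459, 128: 260, 256: 73.24},
    "LDG": {64: 2027, 128: 396, 256: 116},
    "reLDG": {64: 1114, 128: 317, 256: 83.26},
    "Sargon": {64: 857, 128: 238, 256: 63.28},
}
SSSP = {
    "Metis": {64: 30784, 128: 6152, 256: 727},
    "LDG": {64: 37418, 128: 5293, 256: 870},
    "reLDG": {64: 27099, 128: 2921, 256: 677},
    "Sargon": {64: 21643, 128: 2426, 256: 431},
}


def max_speedup(times, target="Sargon"):
    """Returns (speedup, baseline, grouping_size) for the largest ratio
    baseline_time / target_time over all baselines and grouping sizes."""
    best = (0.0, None, None)
    for name, row in times.items():
        if name == target:
            continue
        for size, baseline_time in row.items():
            speedup = baseline_time / times[target][size]
            if speedup > best[0]:
                best = (speedup, name, size)
    return best


if __name__ == "__main__":
    for label, table in (("BFS", BFS), ("SSSP", SSSP)):
        speedup, baseline, size = max_speedup(table)
        print(f"{label}: {speedup:.3f}x over {baseline} at grouping size {size}")
```

With these inputs, the largest BFS speedup is over LDG at a grouping size of 64 and the largest SSSP speedup is over Metis at a grouping size of 128; truncated to two decimals, the ratios match the 2.36 and 2.53 reported in the text.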

We also noticed both Metis and reLDG performed better than LDG in most cases. This was probably because Metis and reLDG produced decompositions of lower edgecut than LDG. What we did not expect was that Metis was outperformed by reLDG in many cases even though its decompositions had lower edgecut. We attributed this to the fact that the decompositions computed by Metis had highly skewed active (high-degree) vertex distribution across partitions (Section 5.1.1).
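To make the distinction between edgecut and skewness concrete, the sketch below computes both quantities for a toy graph. The imbalance measure (largest per-partition count of high-degree vertices divided by the average count) and the degree threshold are assumptions made for illustration; they are not necessarily the definitions used in Section 5.1.1.

```python
from collections import Counter


def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def high_degree_imbalance(edges, part, num_parts, degree_threshold):
    """Max-over-mean count of high-degree vertices per partition.

    A value of 1.0 means high-degree vertices are spread evenly;
    larger values mean they are concentrated in few partitions.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    per_part = [0] * num_parts
    for vertex, d in degree.items():
        if d >= degree_threshold:
            per_part[part[vertex]] += 1
    total = sum(per_part)
    if total == 0:
        return 1.0
    return max(per_part) / (total / num_parts)


# Toy graph: vertices 0 and 1 are the only vertices of degree 3.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (4, 5), (5, 6), (6, 7)]

# Partitioning A keeps the dense cluster together (low cut, skewed hubs).
PART_A = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
# Partitioning B separates the two hubs (higher cut, balanced hubs).
PART_B = {0: 0, 2: 0, 4: 0, 5: 0, 1: 1, 3: 1, 6: 1, 7: 1}
```

In this toy example, partitioning A has an edgecut of 0 but places both high-degree vertices in one partition (imbalance 2.0), whereas partitioning B has an edgecut of 4 with the high-degree vertices evenly spread (imbalance 1.0). A lower edgecut therefore does not by itself imply a more balanced distribution of active vertices.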

Interestingly, for the BFS execution, reLDG outperformed Metis only when the message grouping size was small enough (that is, when it equaled 64). This was because the smaller the message grouping size was, the more messages were communicated, which in turn put more contention on the network interface and memory subsystems and therefore exacerbated the performance impact of skewness. This was further confirmed by the observation that the smaller the message grouping size was, the longer the execution of BFS/SSSP took.

reLDG was always better than Metis for the execution of SSSP in the experiment because SSSP required more data communication than BFS. Consequently, in spite of the increasing message grouping size, there would still be a large number of message exchanges, calling for skew-resistant graph partitioners to avoid both network and memory contention. This also indicates that Sargon is more suitable for workloads with a large number of small message exchanges and for larger graphs. The latter was attributed to the fact that as the size of the graph increased, the amount of data communication would also increase regardless of the message grouping size.

5.4.4 Scalability Study

5.4.4.1 Scalability in terms of Graph Size

Configuration This experiment investigated the scalability of Sargon as the size of the graph increased. To this end, we first generated six additional datasets by sampling the edge sets of the Friendster and Twitter datasets. Then, we examined the BFS execution time on the datasets when they were partitioned across three 20-core machines (with 10 randomly selected source vertices and a message grouping size of 512). Note that Metis failed to partition the datasets.
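One way such smaller datasets could be derived is sketched below. The text does not state the sampling method, so the sketch assumes uniform sampling of edges without replacement; the function name `sample_edges` and the fixed seed are illustrative choices.

```python
import random


def sample_edges(edges, fraction, seed=0):
    """Returns a uniform random subset containing `fraction` of the edges.

    `edges` is a list of (source, destination) pairs. Sampling is done
    without replacement, so no edge appears twice in the result.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    k = round(len(edges) * fraction)
    return rng.sample(edges, k)


if __name__ == "__main__":
    # Toy edge list standing in for a real dataset.
    toy_edges = [(i, i + 1) for i in range(1000)]
    for fraction in (0.25, 0.5, 0.75):
        subset = sample_edges(toy_edges, fraction)
        print(fraction, len(subset))
```

The edge counts listed in Table 5.3 appear consistent with fractions of roughly 25%, 50%, and 75% of each original dataset. For graphs with billions of edges, a streaming variant that keeps each edge with a fixed probability would avoid holding the full edge list in memory.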

Table 5.3: BFS execution time in seconds with 10 randomly selected source vertices on varying sized graphs

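| Dataset | Friendster | Friendster | Friendster | Friendster | Twitter | Twitter | Twitter | Twitter |
|---|---|---|---|---|---|---|---|---|
| # of Edges (Billion) | 0.9 | 1.8 | 2.7 | 3.6 | 0.98 | 1.96 | 2.94 | 3.92 |
| LDG | 34.01 | 158 | 623 | 1,239 | 45.65 | 460 | 1,092 | 2,219 |
| reLDG | 34.24 | 132 | 480 | 1,171 | 54.91 | 403 | 1,217 | 2,499 |
| Sargon | 26.96 | 137 | 392 | 933 | 38.53 | 275 | 924 | 1,982 |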

Results (Table 5.3) Table 5.3 shows the corresponding BFS execution time on graphs of varying size. As can be seen, Sargon outperformed LDG and reLDG in almost all cases. In comparison to LDG and reLDG, Sargon sped up the execution of BFS by up to 1.67 and 1.46 times, respectively. The speedup remained quite stable regardless of the increasing graph size.

Interestingly, we noticed that reLDG was outperformed by LDG in many cases, especially on the execution of BFS on the Twitter dataset, even though the decompositions computed by reLDG had lower edgecut. This was probably because reLDG tended to produce decompositions of higher skewness than those of LDG (Section 5.1.1). The fact that the Twitter dataset had a higher average vertex degree and higher variation in its vertex degree distribution than the Friendster dataset further aggravated the performance impact of the skewness.

5.4.4.2 Scalability in terms of # of Partitions

Configuration This experiment inspected the effectiveness of Sargon as the number of partitions increased. To this end, we first partitioned the original Friendster and Twitter datasets across three to ten 20-core machines (one partition per core, that is, 60 to 200 partitions) and then examined the BFS execution time on the partitionings (with 10 randomly selected source vertices and a message grouping size of 512).

Table 5.4: BFS execution time in seconds with 10 randomly selected source vertices on varying number of partitions

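| # of Partitions | Friendster, LDG | Friendster, reLDG | Friendster, Sargon | Twitter, LDG | Twitter, reLDG | Twitter, Sargon |
|---|---|---|---|---|---|---|
| 60 | 1,239 | 1,171 | 933 | 2,219 | 2,499 | 1,982 |
| 80 | 444 | 318 | 285 | 973 | 771 | 706 |
| 100 | 148 | 189 | 126 | 264 | 258 | 230 |
| 120 | 103 | 103 | 71.48 | 133 | 172 | 127 |
| 140 | 85.27 | 127 | 69.36 | 150 | 147 | 117 |
| 160 | 58.39 | 59.32 | 57.72 | 70.64 | 83.30 | 91.27 |
| 180 | 48.53 | 54.24 | 40.00 | 50.75 | 54.69 | 48.84 |
| 200 | 40.35 | 32.95 | 34.21 | 56.48 | 61.24 | 44.21 |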

Results (Table 5.4) Table 5.4 shows the corresponding results as the number of partitions increased. As shown, Sargon performed better than LDG and reLDG in almost all cases in spite of the increasing number of partitions. When compared with LDG and reLDG, Sargon sped up the execution of BFS by up to 1.55 and 1.49 times, respectively. Consistent with our previous observations, reLDG was better than LDG in many cases. However, it was beaten by LDG in some cases, further highlighting the importance of skew-awareness. The improvement achieved by Sargon gradually became smaller because, as the number of partitions increased, the impact of skewness was mitigated by the reduced work per core (partition). However, the improvement was still non-negligible, since it reduced the execution time of all the computing elements (60 up to 200 cores) by this much.

5.5 CHAPTER SUMMARY

In this chapter, we introduced the multi-label graph partitioning problem and an application of this idea to avoid the skewness of traversal-style graph workloads by being aware of the characteristics of the target workload and the structure of the graph. We also demonstrated the effectiveness and scalability of our proposed solution, Sargon, on many real-world graphs of varying sizes (up to 3.9 billion edges) and varying numbers of partitions.