METODO Participantes
RESULTADOS ANALISIS DESCRIPTIVO
Pregel[97] is a runtime developed for processing large graphs in which the programs are expressed as a sequence of iterations. A user defined function is evaluated at each vertex of the graph during an iteration, and between iterations, the vertices send messages to each other. The authors describe the programming model as follows:
“Pregel computations consist of a sequence of iterations, called supersteps. During a superstep the framework invokes a user defined function for each vertex, conceptually in parallel. The function specifies behavior at a single vertex V and a single superstep S. It can read messages sent to V in superstep S + 1, send messages to other vertices that will be received at superstep S + 1, and modify the state of V and its outgoing edges. Messages are typically sent along outgoing edges, but a message may be sent to any vertex whose identifier is known.”
Although the programming model of Pregel is different than Twister’s MapReduce based programming model, there exist some similarities between the two runtimes. Most notably, unlike other MapReduce runtimes such as Hadoop and Dryad, both Twister and Pregel use long running tasks. A Vertex in Pregel holds a user defined value corresponding to the node of the graph that it represents, and it keeps changing this value depending on the computation performed by the user defined function, which is executed at each vertex. Twister also uses long running map/reduce tasks, and it supports configuring them with any static data once per computation; this possibility allows for the elimination of the need to re-load static data in each iteration. Although the functional view of MapReduce does not encourage the use of stateful
map/reduce tasks, as we have shown in the Fox matrix multiplication described in section 6.9, one can use stateful map/reduce tasks in Twister to implement complex applications.
121
The possibility in supporting fault tolerance easily is one of the key benefits of the MapReduce programming model. However, with the use of stateful tasks, this possibility will no longer be in effect, because the tasks cannot be re-executed without losing their current state. The runtime needs to be able to preserve the state of every task in order to recover from failures. Furthermore, the runtime cannot simply save the current state of tasks to the local disks of the computers where they are executed, because a disk failure could result in a complete re-execution of the entire program. In typical MapReduce, a disk failure could results in the re-execution of the failed tasks in order to produce the missing intermediate data, however with stateful, tasks this proves impossible. Therefore, the task state must be preserved in a fault tolerant distributed file system such as GFS or HDFS. From the Pregel paper, it is not clear which mechanism the system uses to save the state of the vertices in every super-step. However, it could most likely be stored in the Google File System so as to support fault tolerance. Serializing the entire graph to a distributed file system in each iteration is a costly checkpointing mechanism; therefore, we believe that a checkpoint at every few iterations will be a more practical approach. Currently, we do not support fault tolerance for stateful map/reduce tasks in Twister, as it is not coupled with a distributed file system such as HDFS or GFS. The development of this type of failure handling mechanism should emerge in interesting future research.
Under the MapReduce model, there is no direct communication path from the reduce stage back to the map stage of the computation. However, such a communication can be simulated by writing the output of the reduce stage to a distributed file system, and then reading the output back in map tasks during the next iteration. To illustrate this approach, let’s consider a MapReduce implementation of a PageRank algorithm. For this analysis, we assume that the link graph is presented as an adjacency matrix in the format <<page_1, <link_1, .. ,link_m1>>, <<page_2, <link_1 ,.. ,link_m2>, …, <<page_n, <link_1 ,.. ,link_mn>>. The following algorithm shows a possible
122
Pagerank Algorithm for MapReduce 1 2 3 4 5 6 7 8 9 10 11 12 do
[Perform in parallel] –the map() operation
for each page Pi
PR(Pi)=ReadPageRankFromFileSystem(Pi) r=PR(Pi) / num_out_links
for each link Lj Emit(Lj,Pi, r)
[Perform Sequentially] –the reduce() operation
Collect all (Lj,Pi, r) for eachLj
for each page Pi PR(Li) =PR(Li) + r
WriteOutPutToFileSystem(Li,r)
while (num_iterations<MAX_ITERATIONS)
As can be seen in the above algorithm, steps 3 and 12 use a distributed file system to share the current PageRank values between the reduce and the map stages of the computations. In Twister, we used the combine operation to collect these current PageRank vectord to the main program. Then we broadcasted it to all map tasks again. However, in both these implementations, the above steps are responsible for the majority of the running time of the PageRank computation. In Pregel, the above steps are represented by direct messages transferred between super steps. Further, the communication between vertices does not introduce additional overheads. Therefore, the messaging-based approach adopted by Pregel provides a natural programming model for graph based algorithms.
123