Índices o Razones Financieras - CONTENIDO DE LA PROPUESTA

CAPÍTULO IV: MARCO PROPOSITIVO

4.2 CONTENIDO DE LA PROPUESTA

4.2.6 Índices o Razones Financieras

In this section, we present the experimental results to validate whether ppBLAST in a small scale peer-to-peer overlay can reduce the search time of BLAST. First we randomly extracted 10 sets of queries, each with 1MB size of sequences, from the nt database and searched (using blastn) against the nt database in PC1, PC2, and PC3 locally. For each PC, the average values of the time spent searching the 10 queries are calculated and recorded. We then split the nt

database into segments, each with 3MB size. The segment distributor was set up in peer2 and began the segment distribution. At each time when the overlay has a new complete copy of all segments (a maximum of five complete copies when each peer has one), we submit a random query (1MB) to ppBLAST via the service broker residing on peer3. A copy of the complete segment set in the overlay means that all segments that peers own can cover the segment set (but some peers may lack certain segments). For the time spent on a job in ppBLAST, we calculate

1 2 3 4 5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 (b) S p ee d u p

Number of copies of complete set of segments

Comparing to PC3 local search Comparing to PC2 local search Comparing to PC1 local search

Figure 5-4. Speedup of ppBLAST comparing to the local searches on PC1, PC3, PC5

97 the time period between the point of submitting a job and the point at which all the results are retrieved from the DAST-DHT.

Fig. 5-3 plots the BLAST search time comparisons between ppBLAST and the local searches of PC1, PC2, and PC3. We can see that the execution time of BLAST jobs on ppBLAST are much less than those obtained from local runs on a single PC. The speedups are depicted in Fig. 5-4. As described in Section 5.1, the performance of a local BLAST search significantly decreases if the size of the database is bigger than the memory on the PC. The nt database used in the ex- periments has the size of 8.76GB and none of the three PCs have memories larger than this. Hence, the execution of the jobs run locally takes considerable time.

peer1 exhibits the worst case where it averages almost 10 hours to complete a BLAST search of a 1MB query against the nt database. ppBLAST improves the BLAST search jobs significantly because (1) the size of database segment is much smaller than the local memory; (2) five peers cooperate to finish one job. Note that there are several additional operations, such as peers retrieving queries and putting the results files into a DAST-DHT, which cost time in ppBLAST. How- ever, the size of the query is small enough to be obtained quickly (larger queries can be split) and storing result files can be managed asynchronously to the BLAST searching, i.e., if a peer finishes a task, it can store the result file in the background while it is doing the next task. Compared to the original BLAST search time, the overhead for the retrieving and storing operations are insignifi- cant.

Fig. 5-4 also shows the relationship between the ppBLAST execution time and the number of copies of the complete segment set in the overlay. It is obvious that the performance of ppBLAST is better if there are more copies of the complete segment set in the overlay. ppBLAST does not adopt a segment-on-demand mechanism [24] where a worker peer obtains a missing segment for a task from a central server. Instead, it assigns tasks based on what segments a peer current has. We design ppBLAST in this way because a central segment providing server may be the single point of failure, but without the server, segment-on-demand can force peers to download the lacking segments from others and the BLAST tasks

98 will have to wait, which itself is not time-efficient. Therefore, when there is only one copy of the complete set in the overlay, a job can be completed but a number of peers may not obtain tasks and be idle because they do not have the necessary segments. When more copies of segments are delivered among peers, the performance increases. This is shown in Fig. 5-3 and 5-4; when there are four copies of the complete segment set in the overlay, the job execution performance reaches near-optimal.

Fig. 5-5 presents the workload distributions of five peers. The workloads were calculated from the number of tasks that a peer took in a BLAST job. We can see that when there is only one copy of complete segment set in the overlay,

peer1 and peer2 take the majority of the workload. As listed in Table I, peer1 and

peer2 have high speed Internet access and they can download segments from others (including the segment distributor) quickly. Inside the first copy of the complete set in the overlay, they owned most of the segments. Hence, when the service broker dispatched the tasks, peer1 and peer2 were assigned more than 70%

1 2 3 4 5 0 5 10 15 20 25 30 35 40 W or k loa d pe rc ent age s (% )

Number of copies of complete set of segments

peer1 peer2 peer3 peer4 peer5

Figure 5-5. The work loads of each PC (the percentage of the total number of tasks in a BLAST job)

99 of the tasks. The remaining peers only obtain 26% of the tasks due to lacking certain segments. The workload tends to balance while more copies of segments are distributed among peers. When every peer in the overlay owns a complete segment set, peer3, peer4 and peer5 obtain more tasks than peer1 and peer2 because of their higher computing abilities.

In document Estudio de factibilidad para la creación de un centro de difusión y formación de arte dancístico, Riobamba 2017 (página 114-123)