The model presented above assumes that computation, network transfer, and disk access can be completely overlapped. This ignores Amdahl’s Law, which can be expressed using the parameters of the model as:

$$
\text{speedup} = \frac{(1 - p) \cdot \dfrac{s_{cpu}}{w} + p \cdot d \cdot \dfrac{s_{cpu'}}{w}}{\dfrac{s_{cpu}}{w}}
$$

where p is the parallel fraction of the computation, the portion that can be performed in parallel at the Active Disks. This equation also assumes that CPU processing is the bottleneck, although a similar calculation would apply for an interconnect bottleneck as well.
We see that even if there is no parallel fraction (p = 0), the system with Active Disks is never slower than the system without. On the other hand, for applications such as the example Data Mining application shown for the AlphaServer 8400 system in the previous section, the parallel fraction is close to 100% (p = 0.98), meaning a speedup of 1.75, close to twice as fast even with the low processing power of today’s disks. With next generation Active Disks at 200 MHz, the speedup would be 13.9. Even if the parallel fraction were only 50% (p = 0.50), the speedup would still be 7.6.

System                 Disks         SCSI                  PCI                    Actual (Q6)
AlphaServer 8400       5,210 MB/s    120x20=2,400 MB/s     266x3=798 MB/s         446 MB/s
AlphaServer GS140      5,640 MB/s    96x40=3,840 MB/s      12x266=3,192 MB/s      684 MB/s
Sun Enterprise 4500    990 MB/s      6x100=600 MB/s        2x1000=2,000 MB/s      180 MB/s

Table 3-3 Interconnect limitations in today’s database systems. The table shows the theoretical and achieved bandwidth of a number of large database systems executing the TPC-D decision support benchmark. Query 6 is a simple aggregation, so the primary cost should be the reading of the data from disk. The query is a scan based on the shipdate attribute in the table. All of the systems use a layout that range-partitions the table based on shipdate. This optimizes performance for this particular scan on shipdate, but would be useless for another query that used orderdate, for example. It does provide a surrogate for determining the raw disk performance possible with the system.
We see that the non-parallel fraction of a computation will definitely affect an application’s performance in an Active Disk system, but if we view the Active Disks as simply an “accelerator” on the host system, overall system performance will never be worse with Active Disks than without, while in many cases it will be many times better.
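The speedup expression above can be checked numerically with a minimal sketch. The function name, argument names, and the configuration used here are assumptions chosen for illustration: 71 disks at 200 MHz against 1,000 MHz of host CPU gives an aggregate disk-to-host ratio of 14.2, which happens to reproduce the 7.6 and 13.9 figures quoted in the text. It is not presented as the configuration of any system measured in this chapter.

```python
def speedup(p, d, s_disk, s_host, w=1.0):
    """Speedup of a system with Active Disks over the host alone.

    p      -- parallel fraction of the computation (0.0 to 1.0)
    d      -- number of disks
    s_disk -- CPU speed of one disk (s_cpu' in the text)
    s_host -- CPU speed of the host (s_cpu in the text)
    w      -- cycles of work per byte; it cancels out of the ratio,
              so the default value is arbitrary
    """
    host_rate = s_host / w          # throughput of the host alone
    disk_rate = d * s_disk / w      # aggregate throughput of all disks
    return ((1 - p) * host_rate + p * disk_rate) / host_rate


# Hypothetical configuration (an assumption, see the paragraph above).
for p in (0.0, 0.50, 0.98):
    s = speedup(p, d=71, s_disk=200, s_host=1000)
    print(f"p = {p:.2f}: speedup = {s:.1f}")
```

With p = 0 the expression reduces to 1, which is the "never slower than the system without" case. The speedup grows linearly in p because the expression treats the disks as additional throughput on top of the host.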
3.3.1 Startup Overhead
The serial fraction of the computation (1 - p) can be an inherent property of the computation, but it may be due simply to the overhead of starting up the parallel computation at the disks. The Validation section in Chapter 5 discusses the startup overheads seen in the prototype system and their impact on performance. In general, this will be the time to send the necessary code to the drives, initialize the execution environment on each disk, and begin execution on a particular data object or set of objects. In applications that operate on very small data sets spread across a large number of disks, this could become a significant fraction of the overall execution time. However, given the applications and data sizes discussed in the previous chapter and the prototype applications illustrated in the next chapter, this overhead should easily be overcome by the amount of data being processed, resulting in a very low serial fraction and good speedups. In addition, many of the factors that contribute to the startup overhead will be static properties of the application or the Active Disk system, meaning that a query optimizer or runtime system could take this overhead into account and not initiate an Active Disk computation if it would be overhead-dominated. It could then proceed simply with the host processor and not take advantage of the extra power available at the drives.
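The decision described above can be sketched as a simple cost comparison. This is an illustration under stated assumptions, not the optimizer of the prototype system: the function names and all parameter values are hypothetical, the startup cost is treated as a single fixed time, the remaining work is assumed fully parallel, and CPU processing is assumed to be the bottleneck.

```python
def estimate_times(n_bytes, w, s_host, d, s_disk, t_startup):
    """Estimated run time in seconds on the host alone and on the disks.

    n_bytes   -- amount of data to process
    w         -- cycles of work per byte
    s_host    -- host CPU speed in cycles per second
    d         -- number of disks
    s_disk    -- per-disk CPU speed in cycles per second
    t_startup -- fixed cost of shipping code and initializing the disks
    """
    host_time = n_bytes * w / s_host
    disk_time = t_startup + n_bytes * w / (d * s_disk)
    return host_time, disk_time


def choose_placement(n_bytes, w, s_host, d, s_disk, t_startup):
    """Return "disks" only when the startup overhead is paid back."""
    host_time, disk_time = estimate_times(n_bytes, w, s_host, d, s_disk, t_startup)
    return "disks" if disk_time < host_time else "host"


# Hypothetical parameters: 10 cycles/byte, a 1 GHz host, ten 200 MHz disks,
# and half a second of startup overhead.
params = dict(w=10, s_host=1e9, d=10, s_disk=2e8, t_startup=0.5)
print(choose_placement(1e6, **params))  # small data set
print(choose_placement(1e9, **params))  # large data set
```

Under these assumed numbers the small data set stays on the host, because the half second of startup dominates, while the large data set is sent to the disks.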
3.3.2 Phases of Computation
The final property of an application that will work against the fully overlapped assumption of the model is synchronization between different phases of a computation. For example, the frequent sets application discussed in the next chapter proceeds in several stages and requires synchronization among all the disks and the host at the end of each stage, as illustrated in Figure 3-4. In this computation, the host sets the initial parameters for the computation and starts parallel execution at the disks. The disks then perform their computation locally and determine the results for their own data. These results are passed to the host and combined for the start of the second phase. This process is repeated through several more phases, until the host determines that the results obtained are complete and computation ends.
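The phase structure described above can be sketched as a loop in which the host hands out parameters, each disk counts over its own partition, and the host combines the partial results before starting the next phase. The code is a simplified, Apriori-style illustration and an assumption on our part: the function names, the candidate generation, and the `max_size` cutoff are hypothetical and are not taken from the frequent sets application of the next chapter. The per-disk calls run sequentially here; in a real system they would execute concurrently at the drives.

```python
from collections import Counter
from itertools import combinations


def disk_count(transactions, candidates):
    """Runs at one disk: count local transactions containing each candidate."""
    counts = Counter()
    for t in transactions:
        items = set(t)
        for c in candidates:
            if c <= items:          # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def host_frequent_sets(disks, items, min_support, max_size=3):
    """Host-side driver: one loop iteration per phase."""
    candidates = [frozenset([i]) for i in sorted(items)]  # initial parameters
    frequent = {}
    size = 1
    while candidates and size <= max_size:
        # Parallel step: every disk works only on its own partition.
        partials = [disk_count(part, candidates) for part in disks]
        # Synchronization point: the host needs all partial results
        # before it can combine them.
        total = Counter()
        for partial in partials:
            total.update(partial)
        survivors = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(survivors)
        # Prepare the next phase: candidates one item larger, built from
        # items that still appear in some surviving set (a simplification).
        size += 1
        keep = sorted({i for c in survivors for i in c})
        candidates = [frozenset(c) for c in combinations(keep, size)]
    return frequent


disks = [[("a", "b"), ("a", "c")], [("a", "b", "c"), ("b",)]]
result = host_frequent_sets(disks, items={"a", "b", "c"}, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Only the candidate lists and the per-disk counts cross the network in this sketch. The transactions themselves never leave the partition where they are stored.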
This type of synchronization among processors operating in parallel is the bane of all parallel programmers and system designers. There are several reasons to believe that this effect will be less severe in the case of Active Disk computations than in general parallel programs. For one, the types of computations performed at the Active Disks will usually be data parallel, since the basic point of executing function at the disks is to move function and processing power where the data is - and distribute it in the same way that data is distributed. In addition, the disks will be largely homogeneous, eliminating some of the imbalances seen in general parallel systems.
In a sense, one of the degrees of freedom available in a general parallel programming system - the ability to move data to the place where there are available computing resources - is removed with Active Disks. The most successful Active Disk applications will operate on the data at the disk where it already resides. By computing on the data before it is placed on the network, Active Disks eliminate one of the phases of a parallel computation, which typically proceeds in three steps:
1) read data into the memories of the processing elements (whether into distributed memories or into a single, shared memory)
2) rearrange the data to the most appropriate node for processing, and
3) perform the processing
With perhaps a fourth phase:
4) rebalance the data (and work) among the processors
This process is simplified for Active Disks because the basic tenet is to compute on the data where it is stored, and then send it onto the network. The most effective Active Disk applications will perform the largest portion of their processing on the disks, before data is ever put onto the network. This does not mean that it is not possible to move data among computation elements, but it does lead to a different cost/benefit tradeoff for doing such a
[Figure 3-4 diagram. Phase I: Initialize Computation, Parallel Computation, Combine Results. Phase II: Initialize Phase II, Parallel Computation.]
Figure 3-4 Synchronization in a multiple phase computation. The diagram shows several stages of the frequent sets application introduced in the next chapter. The host initiates the computation at all the disks, and the disks proceed in parallel, computing their local results. The results are then gathered at the host, which combines the individual disk results and prepares the parameters for the second phase. This continues through several phases, requiring synchronization among all the drives and the host at the end of each.
and a further discussion of the differences between Active Disks and general parallel programming are provided in Chapter 7.