In this chapter, we presented a novel approach to system-wide monitoring that achieves several orders of magnitude of data reduction and sublinear merge times, regardless of system size. We introduced a model for high-level load semantics in SPMD applications. Using aggressive compression techniques from signal processing and image analysis, our approach can reduce and aggregate distributed load data to accommodate significant I/O bottlenecks. Additionally, our approach achieves very low error rates and high speed, even at the highest levels of compression.
full applications with dynamic behavior: Raptor and ParaDiS. Our framework handles both applications efficiently and captures information that has yielded insight into the evolution of load-balance problems, as demonstrated in our qualitative study of ParaDiS. Our evaluation also showed that, even with timing and rank information, the size of the data files grows slowly with the number of processors, which allows detailed measurement even at large scales. Finally, we demonstrated that our framework preserves significant qualitative features of the data, even at very small compressed file sizes.
Chapter 4
Trace Sampling
4.1 Introduction
The previous chapter detailed an approach for lossy compression of load-balance data using techniques adapted from signal processing and imaging to transform and reduce performance data. In this chapter, we introduce a second technique for scalable, system-wide data collection that uses statistical sampling to reduce data volume.
Sampling has long been used to estimate properties of large populations in surveys and opinion polls (U.S. Census Bureau, 2009; Gallup Organization, 2009; Cochran, 1977; Schaeffer et al., 2006). Unlike wavelet compression, which analyzes a signal to reduce a data set to a set of approximation coefficients, sampling randomly selects representative values from a data set, with the number of values chosen according to statistical parameters. We show here that sampling can likewise be applied to performance traces to reduce data volume.
Recall that in large systems, full-application event traces can grow to unmanageable sizes. Peak I/O throughput of the BlueGene/L system at Lawrence Livermore National Laboratory is around 42 GB/s (Ross et al., 2006).¹ A full trace from all of its 212,992 processors could easily saturate this pathway, perturbing measurements and making the recorded trace useless.

¹ Ross puts the throughput at 25 GB/s, but this was measured before Blue Gene/L was upgraded from
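A back-of-envelope calculation makes the saturation argument concrete. The sketch below assumes that the 42 GB/s peak is shared evenly among all 212,992 processors and that 1 GB = 10^9 bytes. The 100 MB-per-process trace volume is a hypothetical figure chosen for illustration, not a measurement.

```python
# Back-of-envelope I/O budget for full tracing on BlueGene/L.
# Assumptions (not measurements): the 42 GB/s peak is split evenly across
# all processors, and 1 GB = 10**9 bytes.

PEAK_IO_BYTES_PER_S = 42e9   # peak I/O throughput cited in the text
NUM_PROCESSORS = 212_992     # processors in the full system


def per_processor_bandwidth(peak_bytes_per_s: float, nprocs: int) -> float:
    """Bytes per second available to each process if all write at once."""
    return peak_bytes_per_s / nprocs


def seconds_to_drain(bytes_per_proc: float, peak_bytes_per_s: float, nprocs: int) -> float:
    """Time to write bytes_per_proc from every process at peak throughput."""
    return bytes_per_proc * nprocs / peak_bytes_per_s


if __name__ == "__main__":
    bw = per_processor_bandwidth(PEAK_IO_BYTES_PER_S, NUM_PROCESSORS)
    print(f"{bw / 1e3:.0f} KB/s per processor")  # prints "197 KB/s per processor"
    # Hypothetical trace volume of 100 MB per process:
    t = seconds_to_drain(100e6, PEAK_IO_BYTES_PER_S, NUM_PROCESSORS)
    print(f"{t:.0f} s to write 100 MB from every process")  # prints "507 s ..."
```

Under these assumptions, each processor gets under 200 KB/s, and writing even 100 MB per process would occupy the entire I/O system for over eight minutes.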
Fortunately, Amdahl’s law dictates that scalable applications exhibit extremely regular behavior: because any serialized work limits the achievable speedup, applications that scale well must spend nearly all of their time in regular, parallel work. A scalable performance-monitoring system could exploit this regularity to remove redundancy from the collected data, so that the volume of its output would not depend on total system size. An analyst using such a system could collect just enough performance data to assess application performance, and no more.
The difficulty of such an approach lies in deciding how much data is enough for performance analysis. In wavelet compression, we control the threshold by truncating an EZW stream, but we must still collect values from all processes at the first level of the transform. With sampling, we instead pick a random subset of processes from the population and measure only these processes to estimate properties of the system as a whole.
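The core sampling step can be illustrated with a minimal sketch: draw a simple random sample of process ranks without replacement, then estimate a population mean from those ranks alone. The per-rank metric below is hypothetical, and the function names `choose_sample` and `estimate_mean` are illustrative, not part of any library described in this chapter.

```python
import random
import statistics
from typing import Callable, List, Optional


def choose_sample(num_procs: int, sample_size: int, seed: Optional[int] = None) -> List[int]:
    """Simple random sample, without replacement, of ranks 0..num_procs-1."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_procs), sample_size))


def estimate_mean(metric: Callable[[int], float], sampled_ranks: List[int]) -> float:
    """Estimate the population mean of a per-rank metric from sampled ranks only."""
    return statistics.fmean(metric(r) for r in sampled_ranks)


if __name__ == "__main__":
    N, n = 100_000, 400
    # Hypothetical per-rank metric (e.g. seconds spent in one routine).
    # Only the sampled ranks are evaluated when estimating the mean.
    metric = lambda rank: 10.0 + (rank % 7)
    ranks = choose_sample(N, n, seed=42)
    print(f"sampled {len(ranks)} of {N} ranks, estimate = {estimate_mean(metric, ranks):.2f}")
    print(f"true mean = {statistics.fmean(metric(r) for r in range(N)):.2f}")
```

Only 400 of the 100,000 ranks contribute data, yet the estimate typically lands within a few tenths of the true mean, because the metric's spread, not the population size, governs the error.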
It has been shown through simulation and ex post facto experiments (Mendes and Reed, 2004) that statistical sampling is a promising approach to the data-reduction problem: it can accurately estimate the global properties of a population of processes without collecting data from all of them. Sampling is particularly well suited to large systems, because the sample size needed to measure a set of processes grows sublinearly with the size of the set. For data with fixed variance, the required sample size approaches a constant as the population grows. Sampling a very large population of processes is therefore proportionally much less costly than measuring a small one.
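The constant-in-the-limit behavior follows from a standard sample-size result (Cochran, 1977). To estimate a mean within ±d at a confidence level whose normal quantile is z, an infinite population needs n₀ = (z·S/d)² samples, where S is the standard deviation. One common form of the finite-population correction then gives n = n₀ / (1 + n₀/N). The sketch below uses illustrative values (S = 5.0, d = 0.5, z = 2.0, so n₀ = 400) chosen only to show how n levels off as N grows.

```python
import math


def sample_size(pop_size: int, std_dev: float, error: float, z: float = 1.96) -> int:
    """Samples needed to estimate a population mean within +/- error.

    z is the normal quantile for the desired confidence (1.96 ~ 95%).
    Uses n0 = (z * S / d)^2 with the correction n = n0 / (1 + n0 / N).
    """
    n0 = (z * std_dev / error) ** 2   # size needed for an infinite population
    n = n0 / (1 + n0 / pop_size)      # finite population correction
    return min(pop_size, math.ceil(n))


if __name__ == "__main__":
    # Illustrative parameters: S = 5.0, d = 0.5, z = 2.0, so n0 = 400.
    for n_procs in (100, 1_000, 10_000, 212_992, 1_000_000):
        print(f"{n_procs:>9} processes -> sample {sample_size(n_procs, 5.0, 0.5, z=2.0)}")
```

The required sample grows from 80 to 286 as the population grows tenfold from 100 to 1,000 processes, but it stays at 400 from roughly 200,000 processes upward. This is why measuring a full BlueGene/L-sized machine this way costs little more than measuring a mid-sized cluster.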
We extend existing work with techniques for on-line, adaptively sampled event tracing of arbitrary performance metrics gathered using on-node instrumentation. We dynamically collect summary data and use it to tune the sample size as a run progresses. We also present techniques for subdividing, or stratifying, a population into independently sampled behavioral equivalence classes. Stratification can provide insight into the workings of an application, as it gives the analyst a rough classification of the behavior of running processes. If the behavior within each stratum is homogeneous, the overall cost of monitoring is reduced, because low-variance strata require only small samples. These techniques are implemented in the Adaptive Monitoring and Profiling Library (AMPL), a library for Libra that can be linked with instrumented scientific applications.

The remainder of this chapter is organized as follows. In §4.2, we detail statistical sampling theory, emphasizing its fitness for performance monitoring. We describe the architecture and implementation of AMPL in §4.3. An experimental validation of AMPL is given in §4.4. We summarize our research contributions in §4.5.
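To illustrate why stratifying processes into homogeneous behavioral classes can cut monitoring cost, consider a minimal sketch. It compares the sample needed for one unstratified population with the samples needed when each stratum is sampled independently to the same error bound. The two strata, their statistics, and the helper names are hypothetical. Requiring every stratum to meet the full bound is a conservative simplification rather than an optimal allocation scheme, and this is not a description of AMPL's internals.

```python
import math
from typing import Dict, Tuple

# Each stratum: (number of processes, mean, standard deviation) of a metric,
# e.g. taken from summary data gathered so far in a run (hypothetical).
Stratum = Tuple[int, float, float]


def sample_size(pop_size: int, std_dev: float, error: float, z: float = 1.96) -> int:
    """Cochran-style sample size with a finite population correction."""
    n0 = (z * std_dev / error) ** 2
    return min(pop_size, math.ceil(n0 / (1 + n0 / pop_size)))


def pooled_std(strata: Dict[str, Stratum]) -> float:
    """Population std. dev. of all processes combined, from per-stratum stats."""
    total = sum(size for size, _, _ in strata.values())
    mean = sum(size * m for size, m, _ in strata.values()) / total
    var = sum(size * (sd ** 2 + (m - mean) ** 2) for size, m, sd in strata.values()) / total
    return math.sqrt(var)


def stratified_sizes(strata: Dict[str, Stratum], error: float, z: float = 1.96) -> Dict[str, int]:
    """Sample each stratum independently, each to the same error bound."""
    return {name: sample_size(size, sd, error, z) for name, (size, _, sd) in strata.items()}


if __name__ == "__main__":
    # Hypothetical run: two behavioral classes with very different means
    # but little variation inside each class.
    strata = {"compute": (8_000, 100.0, 2.0), "io": (2_000, 200.0, 2.0)}
    total = sum(size for size, _, _ in strata.values())
    flat = sample_size(total, pooled_std(strata), error=1.0, z=2.0)
    per_stratum = stratified_sizes(strata, error=1.0, z=2.0)
    print(f"unstratified: {flat} samples")
    print(f"stratified:   {per_stratum} -> {sum(per_stratum.values())} samples")
```

In this hypothetical case, the difference between the two classes inflates the pooled standard deviation to about 40. The unstratified estimate therefore needs 3,909 of the 10,000 processes, while sampling each homogeneous stratum separately needs only 16 per stratum. Recomputing these sizes whenever new summary statistics arrive is one simple way to picture tuning the sample size as a run progresses.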