4. The OOSH C++ Library
4.7 Application Implementation Issues
To implement a completely new application, one way to start would be to write a trivial application in which no derived classes are defined, and then build on it by defining classes specific to the application.
A minimal main program is illustrated in Figure 4.4. This is a complete program (when linked with OOSH) that launches the number of processes given on the command line; each process terminates after discovering that there are no precincts in its processor's queue.
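The following C++ sketch mimics the behaviour just described. It does not reproduce Figure 4.4 and does not use the real OOSH interface. The names Precinct, Proc_queue and run_minimal_app are hypothetical, and standard threads stand in for the processes OOSH launches. Each worker drains its own processor's queue and returns as soon as it finds the queue empty. Because the trivial application enqueues no precincts, every worker terminates immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an OOSH precinct; a trivial application
// defines no derived classes, so compute() does nothing.
struct Precinct {
    virtual ~Precinct() = default;
    virtual void compute() {}
};

// Hypothetical per-processor queue of precincts, guarded by a mutex.
struct Proc_queue {
    std::mutex lock;
    std::deque<Precinct*> precincts;

    // Removes and returns the next precinct, or nullptr if the queue is empty.
    Precinct* pop() {
        std::lock_guard<std::mutex> guard(lock);
        if (precincts.empty()) return nullptr;
        Precinct* p = precincts.front();
        precincts.pop_front();
        return p;
    }
};

// Launches n_procs workers (threads standing in for processes). Each one
// drains its own queue and terminates as soon as it finds the queue empty.
// Returns the total number of precincts processed.
std::size_t run_minimal_app(std::size_t n_procs) {
    assert(n_procs > 0);
    std::vector<Proc_queue> queues(n_procs);  // nothing is ever enqueued
    std::vector<std::size_t> processed(n_procs, 0);
    std::vector<std::thread> workers;
    for (std::size_t id = 0; id < n_procs; ++id) {
        workers.emplace_back([&queues, &processed, id] {
            while (Precinct* p = queues[id].pop()) {
                p->compute();
                ++processed[id];  // each worker writes only its own slot
            }
        });
    }
    for (auto& w : workers) w.join();
    std::size_t total = 0;
    for (std::size_t n : processed) total += n;
    return total;
}
```

A main function would read the process count from the command line, for example with `std::stoul(argv[1])`, and pass it to run_minimal_app. With no precincts enqueued, the call returns 0.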
AN OBJECT-ORIENTED LIBRARY FOR SHARED-MEMORY PARALLEL SIMULATIONS
University of Cape Town
To turn OOSH classes into a full application, classes would have to be defined for application-specific precincts, space units, simulation environment and per-processor data, along with any other classes relevant to the application.
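The following sketch shows the kinds of derived classes this calls for. The base classes are hypothetical stand-ins: only Proc_data is named in this section, and its real interface is not shown. The application, a one-dimensional heat-diffusion toy, is chosen purely for illustration. The cell is the space unit, the precinct owns a contiguous run of cells, the environment holds shared parameters, and the per-processor data holds a private counter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical base classes standing in for OOSH's own.
struct Space_unit { virtual ~Space_unit() = default; };
struct Precinct {
    virtual ~Precinct() = default;
    virtual void compute() = 0;
};
struct Sim_env { virtual ~Sim_env() = default; };
struct Proc_data { virtual ~Proc_data() = default; };

// Application-specific classes (all hypothetical).
struct Heat_cell : Space_unit {       // space unit
    double temp = 0.0;
};

struct Heat_env : Sim_env {           // simulation environment (shared)
    double alpha = 0.1;               // diffusion coefficient
};

struct Heat_proc_data : Proc_data {   // per-processor data (private)
    std::size_t cells_updated = 0;
};

// A precinct owns a contiguous run of cells and updates them in place.
struct Heat_precinct : Precinct {
    std::vector<Heat_cell> cells;
    const Heat_env& env;
    Heat_proc_data& pd;

    Heat_precinct(std::size_t n, const Heat_env& e, Heat_proc_data& p)
        : cells(n), env(e), pd(p) {}

    // One explicit diffusion step on interior cells; end cells are held fixed.
    void compute() override {
        assert(cells.size() >= 2);
        std::vector<double> old(cells.size());
        for (std::size_t i = 0; i < cells.size(); ++i) old[i] = cells[i].temp;
        for (std::size_t i = 1; i + 1 < cells.size(); ++i) {
            cells[i].temp = old[i] + env.alpha * (old[i - 1] - 2.0 * old[i] + old[i + 1]);
            ++pd.cells_updated;
        }
    }
};
```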
The steps required to port an existing application to OOSH depend on how closely the application's structure corresponds to the library's flow of control. One approach is to start with the real-world problem being solved and identify objects using an object-oriented design methodology [Booch 1991]. This would work to the extent that the design observed the goal of spatial decomposition; otherwise, fitting the resulting design to OOSH could be difficult. Rather than following a classic object-oriented design strategy, it might be better to start by attempting to identify space units and precincts in the real-world problem. One possible benefit of starting with OOSH rather than programming from scratch is that OOSH already provides a start towards an object-oriented design.
Once the major objects have been identified, a key factor is to identify shared and non-shared data and to ensure that data is grouped into padded and aligned objects to avoid false sharing. False sharing occurs when data written by different processors lies in the same cache line, so the line moves between caches even though no data is actually shared. Fitting real-world objects to OOSH classes would be a helpful strategy, since it is a short cut both to producing an object-oriented design and to matching that design to OOSH. Another important aspect of the design is decomposing the real-world space so that interactions are as local as possible. This fits the OOSH emphasis on spatial decomposition with minimal communication. Again, relating real-world objects to OOSH classes as early as possible would probably make this process easier.
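The following is a minimal sketch of separating shared from non-shared data. It assumes a 64-byte cache line, which is common but is an assumption here. Shared read-only parameters live in one ordinary object. Each processor's writable data is wrapped in an alignas-padded object, so no two processors write to the same line. The names Shared_params, Per_proc_counter and parallel_count are hypothetical, and threads stand in for OOSH processes. The code relies on C++17 for correctly aligned allocation inside std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache-line size (64 bytes is common; check the target machine).
constexpr std::size_t kCacheLine = 64;

// Shared, read-mostly data: every processor reads it and nobody writes it
// during the parallel phase, so it can safely share cache lines.
struct Shared_params {
    std::uint64_t increments_per_proc;
};

// Non-shared data, one instance per processor. alignas rounds both the
// alignment and sizeof up to a whole cache line, so neighbouring
// instances in an array never occupy the same line.
struct alignas(kCacheLine) Per_proc_counter {
    std::uint64_t count = 0;
};
static_assert(sizeof(Per_proc_counter) == kCacheLine, "one line per counter");

// Each thread writes only its own padded slot; slots are summed after join.
std::uint64_t parallel_count(std::size_t n_procs, const Shared_params& sp) {
    assert(n_procs > 0);
    std::vector<Per_proc_counter> slots(n_procs);
    std::vector<std::thread> workers;
    for (std::size_t id = 0; id < n_procs; ++id) {
        workers.emplace_back([&slots, &sp, id] {
            for (std::uint64_t i = 0; i < sp.increments_per_proc; ++i)
                ++slots[id].count;
        });
    }
    for (auto& w : workers) w.join();
    std::uint64_t total = 0;
    for (const auto& s : slots) total += s.count;
    return total;
}
```

Without the alignas, eight 8-byte counters would pack into one line. Every increment by one thread would then invalidate that line in the other threads' caches. The result would still be correct, because padding affects only performance.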
Identifying the spatial decomposition should also lead to defining the outer-level control flow of the code. Once a decision has been made as to how the application's outer level of control maps on to the OOSH library, earlier decisions about which objects are best represented as precincts, space units, etc. can be validated, and possibly altered.
If a spatial decomposition can be found, units of work can be grouped into precincts. Their size should be kept flexible, to allow blocking for different cache sizes. Even if spatial decomposition cannot be carried as far as identifying units of work with purely local interactions, precincts can still be used as the unit of dispatching work, and should be defined in any implementation.
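One way to keep precinct size flexible is to compute it from the cache size rather than hard-coding it. The sketch below is hypothetical. The function names are illustrative, and the choice of filling half the cache (leaving room for other data) is an assumption, not an OOSH parameter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many space units go into one precinct so that
// the precinct's data fits in fill_fraction of the cache. The cache size is
// a parameter, so the same code can be re-blocked for another machine.
std::size_t units_per_precinct(std::size_t cache_bytes,
                               std::size_t unit_bytes,
                               double fill_fraction = 0.5) {
    assert(unit_bytes > 0 && fill_fraction > 0.0 && fill_fraction <= 1.0);
    std::size_t budget = static_cast<std::size_t>(cache_bytes * fill_fraction);
    std::size_t n = budget / unit_bytes;
    return n > 0 ? n : 1;  // a precinct always holds at least one unit
}

// Number of precincts needed to cover total_units (the last may be partial).
std::size_t precinct_count(std::size_t total_units, std::size_t per_precinct) {
    assert(per_precinct > 0);
    return (total_units + per_precinct - 1) / per_precinct;
}
```

For example, with 64-byte space units, a 256 KiB cache gives 2048 units per precinct and a 1 MiB cache gives 8192. At 2048 units per precinct, 10000 units need 5 precincts, the last one partial.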
As part of the process of identifying precincts, any opportunities for blocking should be sought. Blocking may replace many references to data at widely spaced points within a time step with repeated use of data that is still in cache. Where it does, it is likely to result in a major improvement.
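As a generic illustration of blocking, the sketch below compares two versions of a time step. The blocking takes the form of fusing two sweeps block by block, which is not an OOSH-specific mechanism. The unblocked version sweeps the whole array twice. The blocked version applies both phases to one block before moving on. The phases and function names are hypothetical. The transformation is valid only because phase B on an element depends only on that element's phase-A result, which is the purely local interaction the section asks for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Unblocked time step: two full sweeps. For arrays much larger than the
// cache, the second sweep must fetch every x[i] from memory again.
void step_unblocked(std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t i = 0; i < x.size(); ++i) x[i] = 0.5 * x[i] + 1.0;   // phase A
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = y[i] + x[i] * x[i]; // phase B
}

// Blocked time step: both phases are applied to one block (precinct) before
// moving on, so phase B rereads x values that are likely still in cache.
void step_blocked(std::vector<double>& x, std::vector<double>& y,
                  std::size_t block) {
    assert(block > 0 && x.size() == y.size());
    for (std::size_t lo = 0; lo < x.size(); lo += block) {
        std::size_t hi = std::min(lo + block, x.size());
        for (std::size_t i = lo; i < hi; ++i) x[i] = 0.5 * x[i] + 1.0;
        for (std::size_t i = lo; i < hi; ++i) y[i] = y[i] + x[i] * x[i];
    }
}
```

Both versions compute identical results. The only difference is when each block's x values are reread: immediately after phase A in the blocked version, rather than after the whole array has been swept.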
If the code is being ported to OOSH from existing shared-memory multiprocessor code, attention also has to be paid to whether it relies on fork semantics. Under fork semantics, each child process receives its own copy of the parent's global variables when it is created. If the code relies on this, global variables that are assumed to be copied to each process have to be moved to the per-process data (derived from the OOSH class Proc_data).
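The change can be sketched as follows. A variable that was global, and implicitly private to each forked process, becomes a member of a class derived from Proc_data, with one instance per processor. Proc_data here is a hypothetical stand-in for the OOSH class, whose real members are not shown in this section. App_proc_data, worker and run_workers are illustrative names. Threads replace forked processes to show why the move is needed: threads share globals, so without it every worker would update the same variable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Before porting, the code might have had a global such as
//     long steps_done = 0;
// relying on fork() to give every child process a private copy. Threads
// (like any shared-address-space model) share globals, so it moves here.

// Hypothetical stand-in for the OOSH base class Proc_data.
struct Proc_data {
    virtual ~Proc_data() = default;
    std::size_t proc_id = 0;
};

// Application-specific per-processor data holding the former global.
// (A real port would also pad and align it to avoid false sharing.)
struct App_proc_data : Proc_data {
    long steps_done = 0;
};

// Each worker updates only the instance it is given.
void worker(App_proc_data& pd, long n_steps) {
    for (long s = 0; s < n_steps; ++s) ++pd.steps_done;
}

// Creates one App_proc_data per processor and runs one thread per instance.
std::vector<App_proc_data> run_workers(std::size_t n_procs, long n_steps) {
    assert(n_procs > 0);
    std::vector<App_proc_data> data(n_procs);
    for (std::size_t id = 0; id < n_procs; ++id) data[id].proc_id = id;
    std::vector<std::thread> threads;
    for (std::size_t id = 0; id < n_procs; ++id)
        threads.emplace_back(worker, std::ref(data[id]), n_steps);
    for (auto& t : threads) t.join();
    return data;
}
```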
Once all these decisions have been taken, the final step before coding is to decide on the trade-off between maximizing the benefit of using OOSH and minimizing changes to the original code. The trade-off can be quantified with the aid of profiling tools of the kind typically found on UNIX systems, or performance visualization tools such as Chiron [Goosen et al. 1993]. These tools can indicate the potential for improving the original code.
If an existing, well-optimized application is being ported to OOSH, major restructuring will sometimes not be worth the effort. Where the original application has poor cache behaviour, major restructuring will be worthwhile. When an application is being coded from scratch, this trade-off does not apply. In some cases a spatial decomposition may be easy to achieve. In other cases, it may not be worth the effort to find a spatial decomposition, because the amount of computation relative to cache misses is high enough for cache misses to be insignificant. What counts as a "high" amount of computation relative to cache misses depends on the characteristics of available machines. As the memory-processor speed gap widens, applications previously considered high in computation relative to misses may become candidates for implementation using OOSH.
Chapter 5, which explains the choice of applications used to measure performance, takes points raised in this section into account.
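The earlier point about the widening memory-processor speed gap can be illustrated with a deliberately simple model. The model is an assumption, not something from this section. Suppose an application performs C cycles of computation per cache miss and each miss costs P cycles, with no overlap between the two. The fraction of time stalled on misses is then P/(C+P). The function name and the numbers in the usage note are hypothetical, not measurements.

```cpp
#include <cassert>

// Simplified model (an assumption; it ignores any overlap of computation
// with memory access): fraction of run time spent stalled on cache misses.
//   compute_cycles_per_miss : cycles of computation performed per miss
//   miss_penalty            : cycles lost on each miss
double stall_fraction(double compute_cycles_per_miss, double miss_penalty) {
    assert(compute_cycles_per_miss >= 0.0 && miss_penalty >= 0.0);
    double total = compute_cycles_per_miss + miss_penalty;
    return total > 0.0 ? miss_penalty / total : 0.0;
}
```

With 1000 cycles of computation per miss, a 10-cycle miss penalty gives a stall fraction of about 1%. A 100-cycle penalty gives about 9%. The same application thus moves from one where misses are insignificant to one where restructuring for cache behaviour may pay off.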