3. Related Work
3.5 Synchronization Strategies
This section presents a few approaches to synchronization, to illustrate possibilities for future work. Strategies which are relevant to shared-memory programming could be incorporated into the library. However, since the focus of the research is reducing cache misses, relatively simple approaches to synchronization were generally considered sufficient for OOSH. Where the need for synchronization can be reduced by restructuring, such restructuring is nonetheless worth considering; see, for example, the distributed synchronization strategy adopted for MP3D in 6.2.2.
Most work on novel strategies for synchronization has been done for distributed memory systems, since communication costs on these systems are high. Also, on a distributed memory system, it is not possible for a processor to look into another's address space, so problems like deadlock detection are more of an issue than for shared-memory systems.
University
of Cape
Town
As latencies for memory access increase, it is possible that shared memory systems will require some of the strategies used in distributed memory systems. Some of the work by others aimed at distributed memory systems includes virtual time and distributed mutual exclusion.
Virtual time is based on the idea that a process may optimistically assume that it has not moved ahead of other parts of a simulation. If, however, it receives a message from another part of the simulation that is older than work the process has already completed, the process must roll back work to the time of the newly received message, and broadcast antimessages to inform other processes that they must roll back work resulting from any now invalidated messages originating from the rolling back process [Fujimoto 1990].
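The rollback step described above can be illustrated with a minimal, single-process sketch. It is not taken from [Fujimoto 1990]; the names `Event` and `OptimisticProcess` are hypothetical, the state is a single integer, and it is assumed that every executed event sent exactly one message, so that one antimessage is needed per undone event. A real implementation would also have to deliver those antimessages and reclaim old checkpoints.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical event: a virtual timestamp plus an illustrative payload.
struct Event {
    std::uint64_t timestamp;
    int delta;
};

// Hypothetical logical process that executes events optimistically.
class OptimisticProcess {
public:
    // Handles one incoming event and returns the number of antimessages a
    // full implementation would have to broadcast because of it.
    std::size_t receive(const Event& ev) {
        std::vector<Event> redo;
        // A straggler is older than work already completed: undo every
        // event with a later timestamp, restoring the saved state.
        while (!log_.empty() && log_.back().event.timestamp > ev.timestamp) {
            state_ = log_.back().state_before;
            redo.push_back(log_.back().event);
            log_.pop_back();
        }
        const std::size_t antimessages = redo.size();
        execute(ev);
        // Re-execute the undone events, oldest first.
        for (auto it = redo.rbegin(); it != redo.rend(); ++it) execute(*it);
        return antimessages;
    }

    int state() const { return state_; }

    std::uint64_t local_virtual_time() const {
        return log_.empty() ? 0 : log_.back().event.timestamp;
    }

private:
    struct Record {
        Event event;
        int state_before;  // checkpoint taken before the event ran
    };

    void execute(const Event& ev) {
        log_.push_back({ev, state_});
        // The update is deliberately order-dependent, so that executing
        // events out of timestamp order would give a different result.
        state_ = state_ * 2 + ev.delta;
    }

    int state_ = 0;
    std::vector<Record> log_;
};
```

In this sketch, receiving events stamped 10 and 20 and then a straggler stamped 15 undoes the event at 20, executes the straggler, and re-executes the event at 20, leaving the same state as in-order execution. The per-event checkpoints show why the approach is sensitive to the amount of memory a simulation uses.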
Virtual time, or the optimistic simulation model, is best suited to cases where the amount of memory used by a simulation is small in relation to the amount of processing required. Also, the communication cost of checking the clocks of other parts of a simulation must be high in relation to the potential losses from broadcasting antimessages and performing a rollback. In current shared-memory systems, these limitations restrict the class of suitable applications more than on distributed memory systems or distributed systems (i.e., optimistic simulation is less attractive on a shared-memory system).
Distributed mutual exclusion is intended mainly for distributed systems, but is applicable to any system connected by a network, including a distributed shared memory system. Mutual exclusion is necessary for managing any shared resource, including atomic file transactions. Fault tolerance is a major requirement of the general solution to distributed mutual exclusion, since a networked distributed system can include networked nodes which may go down, and parts of the network itself may fail [Agrawal and El Abbadi 1991].
While fault tolerance is not as important to programmers of shared memory systems, some of the issues which arise in achieving efficient implementation of distributed mutual exclusion may become relevant as memory latencies rise in relation to CPU speeds.
40 AN OBJECT-ORIENTED LIBRARY FOR SHARED-MEMORY PARALLEL SIMULATIONS
Research on synchronization for shared-memory systems has focused more on improving the speed of existing mechanisms than on finding novel mechanisms.
For example, tree-based barriers solve essentially the same problem as the distributed synchronization strategy introduced by this research, but without requiring any change to program structure [Mellor-Crummey and Scott 1991].
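As an illustration of the general idea, the following is a minimal sketch of a combining-tree barrier with a single sense-reversing release flag. It is not the algorithm of the cited paper, whose tree barriers differ in detail. The class name `TreeBarrier` and its parameters are hypothetical, and the sketch assumes at least one thread, a fan-in of at least 2, and thread identifiers numbered from 0.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Illustrative combining-tree barrier (hypothetical names).
// Assumes threads >= 1, fan_in >= 2, and thread ids 0 .. threads-1.
class TreeBarrier {
public:
    TreeBarrier(std::size_t threads, std::size_t fan_in)
        : fan_in_(fan_in), nodes_(node_count(threads, fan_in)) {
        std::size_t offset = 0;         // index of the first node on this level
        std::size_t members = threads;  // arrivals expected across this level
        for (;;) {
            const std::size_t width = (members + fan_in - 1) / fan_in;
            for (std::size_t i = 0; i < width; ++i) {
                Node& n = nodes_[offset + i];
                n.expected = std::min(fan_in, members - i * fan_in);
                n.pending.store(n.expected);
                n.parent = (width == 1) ? nullptr
                                        : &nodes_[offset + width + i / fan_in];
            }
            if (width == 1) break;  // the root has been laid out
            offset += width;
            members = width;
        }
    }

    void wait(std::size_t thread_id) {
        // The flag cannot flip before this thread arrives, so its current
        // value is the sense of the previous episode.
        const bool my_sense = !sense_.load();
        Node* n = &nodes_[thread_id / fan_in_];
        for (;;) {
            if (n->pending.fetch_sub(1) != 1) {
                // Not the last arrival at this node: wait for the release.
                while (sense_.load() != my_sense) std::this_thread::yield();
                return;
            }
            n->pending.store(n->expected);  // reset for the next episode
            if (n->parent == nullptr) {
                sense_.store(my_sense);     // last arrival at the root releases all
                return;
            }
            n = n->parent;                  // last arrival climbs one level
        }
    }

private:
    struct Node {
        std::atomic<std::size_t> pending{0};
        std::size_t expected = 0;
        Node* parent = nullptr;
    };

    static std::size_t node_count(std::size_t members, std::size_t fan_in) {
        std::size_t total = 0;
        do {
            members = (members + fan_in - 1) / fan_in;
            total += members;
        } while (members > 1);
        return total;
    }

    std::size_t fan_in_;
    std::vector<Node> nodes_;
    std::atomic<bool> sense_{false};
};
```

Each arrival counter is shared by at most `fan_in` participants, so arrivals are spread over several memory locations rather than concentrated on one. In this simplified version all waiting threads still spin on one release flag, which a fuller implementation would avoid by also propagating the wake-up down a tree.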
Cache coherency-based locks are implemented as part of the cache mechanism. Instead of using lock variables and relying on the usual cache coherency protocol to ensure atomicity of an attempt at acquiring a lock, the lock mechanism is implemented by the cache controller. Cache coherency-based locks address the problem of hot spots caused by locks [Cheriton et al. 1991a].
Contention-free locks attempt to solve the lock hot spot problem without special hardware. For example, a ticket lock gives each process attempting to acquire the lock a number. When the global lock value reaches the value of the process's ticket, the process has acquired the lock. The process only sets the global value on releasing the lock. The number of invalidations can be minimized by using two different counters: one for picking up a ticket and one for the global count which is tested. This is an improvement on a test-and-set lock, where every processor attempting to acquire the lock spins on it and constantly attempts to set it (conditional on its value), and also a little more efficient than a test-and-test-and-set lock, where every process waiting on the lock only tries to set it when its value changes [Mellor-Crummey and Scott 1991].
Distributed shared memory systems are making increased use of relaxed consistency models, such as release consistency, in which it is assumed that shared variables will only be written when protected by a lock. In the release consistency model, dirty blocks need only be written back when a lock is released. To ensure atomicity of the write, the release is blocked until the write back is complete [Dwarkadas et al. 1993]. Release consistency is also used in the DASH architecture, which is not a distributed shared memory system but does have high latency for misses far down the hierarchy [Lenoski et al. 1992].
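The interaction between writes and lock release can be illustrated with a toy, single-threaded model. It is not the protocol of [Dwarkadas et al. 1993] or of DASH. The names `Memory` and `RcNode` are hypothetical, blocks are reduced to single integers, unwritten blocks are assumed to read as 0, and the lock itself is omitted so that only the deferred write back is shown.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical home memory: block address -> value.
using Memory = std::unordered_map<int, int>;

// Toy model of one node under release consistency.
class RcNode {
public:
    explicit RcNode(Memory& home) : home_(home) {}

    // Writes made while holding the lock stay local as dirty blocks.
    void write(int addr, int value) { dirty_[addr] = value; }

    // A node sees its own dirty blocks first, then home memory.
    int read(int addr) const {
        const auto d = dirty_.find(addr);
        if (d != dirty_.end()) return d->second;
        const auto h = home_.find(addr);
        return h != home_.end() ? h->second : 0;
    }

    // Releasing the lock writes back every dirty block and returns only
    // once the write back is complete. Returns the number of blocks written.
    std::size_t release() {
        const std::size_t written = dirty_.size();
        for (const auto& block : dirty_) home_[block.first] = block.second;
        dirty_.clear();
        return written;
    }

private:
    Memory& home_;
    Memory dirty_;
};
```

In this model, a value written by one node is not visible to a second node sharing the same home memory until the first node calls `release()`, which is the point at which a real system would pay the communication cost.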
Adopting such relaxed consistency models, while a memory system implementation technique, cannot be considered in isolation from strategies for synchronization since it relies on the interaction between memory referencing and locking.