A system obeys the PL3 memory model if and only if for any PL3 program (defined above), all modified executions of the program on this system are sequentially consistent.
Possible Optimizations for PL3
The distinction of sync operations into loop and non-loop categories enables the system to relax the program order among competing operations as compared to the previous two models we described. Specifically, given a competing write followed by a competing read, the program order between the write-read pair need not be maintained if either operation is identified with the loop label. In addition, competing writes that are identified with the loop label can be non-atomic with respect to multiple copies. It is possible to map PL3 programs to RCpc in order to exploit the above relaxations. Chapter 5 describes the mapping to RCpc, in addition to presenting a more aggressive set of constraints that still satisfy the PL3 model.

(a)
    P1                                   P2
    a1: A = 1;                           a2: B = 1;
    b1: Flag1 = 1;                       b2: Flag2 = 1;
    c1: while (Flag2 == 0);              c2: while (Flag1 == 0);
    d1: u = B;                           d2: v = A;

    Labels
    b1,b2: loop_L (rel)
    c1,c2: loop_L (acq)
    a1,a2,d1,d2: non-competing_L

(b)
    P1                                   P2
    a1: A = 1;                           a2: B = 1;
    b1: unset(L1);                       b2: unset(L2);
    c1: while (test&set(L2) == 0);       c2: while (test&set(L1) == 0);
    d1: u = B;                           d2: v = A;

    Labels
    b1,b2: loop_L (rel)
    c1,c2 (test): loop_L (acq)
    c1,c2 (set): non-competing_L
    a1,a2,d1,d2: non-competing_L

Figure 3.11: Example program segments with loop read and write operations.
Figure 3.11 shows a couple of program segments to provide intuition for the program reordering optimization described above. The first example in Figure 3.11(a) shows two processors communicating data. Each processor produces a value, sets a flag, waits for the other processor's flag to be set, and consumes the value produced by the other processor. The operations to Flag1 and Flag2 are competing and are shown in bold. These operations all qualify as loop operations. The optimization discussed above allows the write of one flag and the read of the other flag to be overlapped on each processor, e.g., the read of Flag2 can be reordered with respect to the write of Flag1 on P1. As long as we ignore unsuccessful reads of the flag locations on each processor, the above optimization yields sequentially consistent executions of the program.6 Furthermore, the writes to the flag locations need not be atomic with respect to multiple copies; therefore, even in scalable systems (where it is difficult to make an update write appear atomic), it is possible to use a simple update protocol to more efficiently communicate the modification to each flag.
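The role of the unsuccessful reads can be checked by brute force. The following Python sketch enumerates every interleaving of the flag write and the first flag read on each processor of Figure 3.11(a), once with program order kept and once with the write-read pair allowed to swap. The tuple encoding and the function names are assumptions of this sketch, and only the first iteration of each wait loop is modeled.

```python
# Illustrative model of Figure 3.11(a): only the flag write and the FIRST
# flag read of each wait loop are modeled (an assumption of this sketch).
P1 = [("P1", "W", "Flag1"), ("P1", "R", "Flag2")]
P2 = [("P2", "W", "Flag2"), ("P2", "R", "Flag1")]


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def first_reads(order):
    """Run one total order against memory; return (P1's read, P2's read)."""
    memory = {"Flag1": 0, "Flag2": 0}
    seen = {}
    for proc, kind, loc in order:
        if kind == "W":
            memory[loc] = 1
        else:
            seen[proc] = memory[loc]
    return seen["P1"], seen["P2"]


# Sequential consistency: program order is kept on both processors.
sc_outcomes = {first_reads(o) for o in interleavings(P1, P2)}

# Relaxed case: the loop write and the loop read may also be swapped.
relaxed_outcomes = {
    first_reads(o)
    for p1 in (P1, P1[::-1])
    for p2 in (P2, P2[::-1])
    for o in interleavings(p1, p2)
}

print(sorted(sc_outcomes))
print((0, 0) in relaxed_outcomes)
```

The pair (0, 0) is absent from `sc_outcomes` but present in `relaxed_outcomes`. Both reads returning 0 are unsuccessful loop reads, which is why discarding them leaves an execution that can still be treated as sequentially consistent.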
Figure 3.11(b) shows a program segment that is similar to the one in Figure 3.11(a), except we use locks and unlocks (implemented by test-and-set and write to unset the lock) instead of flags. As shown, the test of the test-and-set and the write to unset the lock on each processor are loop reads and writes, respectively, and the set is a non-competing operation. Thus, the acquisition of the lock can occur fully before the release of the previous lock on a given processor (i.e., if the lock being acquired is already free).
Figure 3.12 provides another example to illustrate the reordering and overlap that is enabled by PL3. The sequence of operations shown in Figure 3.12(a) is the same as those in Figure 3.5. As before, the competing operations are shown in bold. Assume all competing operations that are shown are identified with the loop label, and that blocks 1 to 3 and blocks 5 to 7 correspond to two critical sections. Compared to the overlap shown in Figure 3.5(b), the categorization of competing operations into loop and non-loop enables further overlap by allowing the write in block 3 (end of first critical section) to be reordered with respect to the read in block 5 (beginning of the second critical section). As a result, the two critical sections on the same processor can be almost fully overlapped.

Figure 3.12: Possible reordering and overlap for PL3 programs: (a) program order, (b) sufficient order, each drawn over blocks 1 to 7 of READ, WRITE, and READ/WRITE operations.

6 In a sequentially consistent execution, it is impossible for both P1 and P2 to read the value of 0 for Flag2 and Flag1, respectively. However, this can occur with the optimization discussed above. Nevertheless, these unsuccessful reads do not need to be considered as part of the outcome for the (modified) execution, thus allowing us to consider executions with the optimization as being sequentially consistent.
3.2.4 Relationship among the Properly-Labeled Models
We have presented three programmer-centric models that successively exploit more information about memory operations to increase opportunities for overlap and reordering. The first model (PL1) requires information about operations that are competing (i.e., involved in a race). The second model (PL2) requires a further distinction among competing operations based on whether they are used to synchronize other memory operations. Finally, the third model (PL3) further distinguishes a common type of waiting synchronization construct. We expect that the majority of programmers will opt for the PL1 model due to its simplicity and the fact that the information required by PL1 enables by far the most important set of optimizations. The PL2 and PL3 models target a much smaller group of programmers. For example, the PL3 model may be used by system programmers who write the code for synchronization primitives such as locks and barriers.
Since the three properly-labeled models form a hierarchy, programs written for one model can be easily and automatically ported to another model in the group. We first consider porting a program in the direction of a more aggressive model. The categorization tree for memory operations (e.g., Figure 3.9) can be used to determine how to correctly transform operation labels in one model to labels in another model. For example, to port a program written for PL1 to PL2, any operations labeled as competing in PL1 can be trivially treated as sync in PL2. Of course, with some extra reasoning, the programmer may be able to more aggressively label some of the competing operations in PL1 as non-sync in PL2. Porting a program to a less aggressive model is similar. For example, a sync or non-sync label under PL2 must be treated as a competing label under PL1. A subtle issue arises when porting a program from PL3 to either PL1 or PL2 because PL3 excludes unsuccessful operations (in synchronization loop constructs) from executions when deciding whether an operation is competing. As a result, some operations that are labeled as non-competing in PL3 may be considered as competing in PL1 and PL2 (e.g., the set in a test-and-set used within a lock primitive). Therefore, a simple transformation such as treating non-loop, loop, and non-sync operations in PL3 as competing operations in PL1 does not necessarily lead to a PL1 program. This means that the PL1 model does not theoretically guarantee that such a program will behave correctly. One possible remedy is to extend the PL1 and PL2 definitions to also exclude unsuccessful operations in a synchronization loop construct. This is not necessary in practice, however, since the memory ordering constraints enforced by systems that support the PL1 or PL2 models are typically a strict superset of the sufficient constraints required for supporting the PL3 model.7
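The label translations described above can be written down as lookup tables. The Python sketch below is illustrative only: the table and function names are hypothetical, and the choice of non-loop as the conservative PL3 label for a PL2 sync operation is an assumption read off the category tree rather than something stated in this section.

```python
# Hypothetical lookup tables for conservative label porting between PL models.
STEP = {
    # Towards a more aggressive model: pick the most conservative new label.
    ("PL1", "PL2"): {"competing": "sync", "non-competing": "non-competing"},
    ("PL2", "PL3"): {
        "sync": "non-loop",  # assumption: non-loop is the conservative choice
        "non-sync": "non-sync",
        "non-competing": "non-competing",
    },
    # Towards a less aggressive model: collapse labels into their category.
    ("PL3", "PL2"): {
        "loop": "sync",
        "non-loop": "sync",
        "non-sync": "non-sync",
        # Caveat from the text: a PL3 non-competing operation (e.g., the set
        # of a test-and-set) may in fact be competing under PL1 or PL2.
        "non-competing": "non-competing",
    },
    ("PL2", "PL1"): {
        "sync": "competing",
        "non-sync": "competing",
        "non-competing": "non-competing",
    },
}

ORDER = ["PL1", "PL2", "PL3"]


def port(labels, src, dst):
    """Translate a {operation: label} mapping from model src to model dst."""
    i, j = ORDER.index(src), ORDER.index(dst)
    step = 1 if j > i else -1
    out = dict(labels)
    while i != j:
        table = STEP[(ORDER[i], ORDER[i + step])]
        out = {op: table[label] for op, label in out.items()}
        i += step
    return out


print(port({"a1": "non-competing", "b1": "competing"}, "PL1", "PL3"))
print(port({"b1": "loop", "c1": "non-sync"}, "PL3", "PL1"))
```

The downward tables are purely syntactic, so they inherit the subtlety noted above: the result of porting from PL3 is not guaranteed to be a properly labeled PL1 or PL2 program.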
Section 3.5 describes practical ways for programmers to convey information about memory operations to the system based on the proper labeling framework discussed in this section. As we will discuss, most application programmers deal with information at the level of the PL1 model only. Therefore, only a few system programmers or sophisticated application programmers may deal with the extra information required by the PL2 and PL3 models.
3.3 Relating Programmer-Centric and System-Centric Models
This section summarizes the memory optimizations enabled by properly-labeled models as compared to the system-centric models described in Chapter 2. The more formal sets of sufficient conditions for supporting the three properly-labeled models are presented in Chapter 4.
Table 3.1 summarizes the set of sufficient constraints for satisfying each of the three properly-labeled models described in the previous section. For each model, we show the labels used by the model and the sufficient program order and atomicity constraints that would satisfy the model. The program order constraints apply to operations to different locations. For simplicity, we no longer carefully distinguish an operation’s label from the operation’s intrinsic category. Furthermore, we use the names of categories (i.e., the non-leaf nodes in the category trees shown in Figures 3.3, 3.6, and 3.9) in addition to label names (i.e., the leaf nodes). For example, the competing category covers the sync and non-sync labels in PL2 and the non-loop, loop, and non-sync labels in PL3. We want to emphasize that the constraints we describe are only sufficient constraints; they are not necessary constraints for either supporting the given PL model or for guaranteeing sequentially consistent results for a given program.
Consider the PL1 model. Table 3.1 shows that it is sufficient to maintain program order between a non-competing operation followed by a competing write, a competing read followed by a non-competing operation, and two competing operations. Similarly, multiple-copy atomicity should be maintained for competing writes. Table 3.2 provides the complementary information to Table 3.1 by showing the operations for which program order and multiple-copy atomicity do not need to be maintained. As before, the program order relaxations apply to operations to different locations. For PL1, program order need not be maintained between two non-competing operations, a non-competing operation followed by a competing read, and a competing write followed by a non-competing operation. Similarly, multiple-copy atomicity need not be maintained for non-competing writes.

Table 3.1: Sufficient program order and atomicity conditions for the PL models.

                        Program Order (sufficient)                              Multiple-Copy Atomicity
Model  Labels           first op                    second op                   (sufficient)
-----  ---------------  --------------------------  --------------------------  -----------------------
PL1    competing        non-competing               competing write             competing write
       non-competing    competing read              non-competing
                        competing                   competing

PL2    sync             non-competing               sync write                  competing write
       non-sync         sync read                   non-competing
       non-competing    competing                   competing

PL3    non-loop         non-competing               sync write                  non-loop write
       loop             sync read                   non-competing               non-sync write
       non-sync         competing read              competing read
       non-competing    competing read              competing write
                        competing write             competing write
                        non-loop or non-sync write  non-loop or non-sync read
As shown in Tables 3.1 and 3.2, each PL model successively relaxes the program order and atomicity constraints of the previous level. Consider how program order is relaxed. The distinction between competing and non-competing operations in PL1 enables the most important class of optimizations by allowing for non-competing operations to be overlapped with respect to one another. Since non-competing operations constitute the large majority of operations in most programs, relaxing the ordering constraints among them can provide substantial performance gains. The further distinction of competing operations into sync and non-sync in PL2 can improve performance by relaxing the program order between non-competing operations and competing operations that are categorized as non-sync. Finally, the distinction of sync operations into loop and non-loop in PL3 allows the system to relax the program order between a competing write and a competing read if either one is a loop operation. Relative to PL1, the extra optimizations enabled by PL2 and PL3 are important only if the program has a frequent occurrence of competing operations.
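The successive relaxation can be made concrete by encoding the program order columns of Table 3.1 as a predicate and comparing the three models over the same set of operations. The Python sketch below is one possible encoding; the names `must_order`, `relabel`, and `required_pairs` are hypothetical, and the relabeling is used only to compare constraint sets, not as a sound way to port programs.

```python
from itertools import product

# Leaf labels that fall under the competing category in each model.
COMPETING = {
    "PL1": {"competing"},
    "PL2": {"sync", "non-sync"},
    "PL3": {"non-loop", "loop", "non-sync"},
}
# Competing labels that must be ordered against non-competing operations.
ORDERING = {
    "PL1": {"competing"},
    "PL2": {"sync"},
    "PL3": {"non-loop", "loop"},
}


def must_order(model, first, second):
    """Sufficient program order for (label, kind) ops to different locations."""
    (f_label, f_kind), (s_label, s_kind) = first, second
    f_comp = f_label in COMPETING[model]
    s_comp = s_label in COMPETING[model]
    if f_comp and s_comp:
        # PL3 relaxes a competing write followed by a competing read
        # when either operation carries the loop label.
        if model == "PL3" and f_kind == "W" and s_kind == "R":
            return "loop" not in (f_label, s_label)
        return True
    if s_comp:  # non-competing followed by competing
        return s_label in ORDERING[model] and s_kind == "W"
    if f_comp:  # competing followed by non-competing
        return f_label in ORDERING[model] and f_kind == "R"
    return False  # two non-competing operations


PL3_TO_PL2 = {"loop": "sync", "non-loop": "sync",
              "non-sync": "non-sync", "non-competing": "non-competing"}
PL2_TO_PL1 = {"sync": "competing", "non-sync": "competing",
              "non-competing": "non-competing"}


def relabel(op, model):
    """View a PL3-labeled operation through the labels of a coarser model."""
    label, kind = op
    if model in ("PL1", "PL2"):
        label = PL3_TO_PL2[label]
    if model == "PL1":
        label = PL2_TO_PL1[label]
    return label, kind


PL3_OPS = list(product(["non-competing", "non-sync", "loop", "non-loop"], "RW"))


def required_pairs(model):
    """All ordered pairs of PL3_OPS whose program order the model keeps."""
    return {
        (a, b)
        for a, b in product(PL3_OPS, repeat=2)
        if must_order(model, relabel(a, model), relabel(b, model))
    }


for m in ("PL1", "PL2", "PL3"):
    print(m, len(required_pairs(m)))
```

Under this encoding the set of ordered pairs shrinks strictly from PL1 to PL2 to PL3, which mirrors the statement that each model relaxes the constraints of the previous level.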
The above discussion shows that the information conveyed through labels can be used to exploit the same type of optimizations that are enabled by aggressive system-centric models. Below, we provide some intuition for how this information may be used to efficiently execute PL programs on system-centric models while still maintaining sequential consistency. The next chapter provides a more formal set of conditions for porting PL programs to system-centric models.
We begin by considering the first set of system-centric models introduced in the previous chapter (i.e., IBM-370, TSO, PC) that allow reordering of a write followed by a read in program order. Given the information conveyed by a PL1 program, this reordering is safe if either the write or the read is non-competing. The additional information provided by PL2 does not provide any additional cases. Finally, with PL3 information, the reordering is also safe between a competing write and a competing read as long as at least one is a loop operation. The second set of system-centric models that includes PSO can further exploit the information conveyed by labels by allowing the reordering of two writes as well. For example, the PL1 information makes the reordering of two writes safe as long as the second write is non-competing.
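The safety conditions in this paragraph reduce to two small checks on operation labels. The Python sketch below restates them; the function names are hypothetical, and labels are assumed to be the leaf label names of the given PL model.

```python
def write_read_reorder_safe(model, write_label, read_label):
    """May a write be reordered with a later read to a different location?"""
    either_non_competing = "non-competing" in (write_label, read_label)
    if model in ("PL1", "PL2"):
        # PL2 adds no cases beyond PL1 for write-read reordering.
        return either_non_competing
    if model == "PL3":
        # Two competing operations may also be reordered if one is a loop op.
        return either_non_competing or "loop" in (write_label, read_label)
    raise ValueError(f"unknown model: {model}")


def write_write_reorder_safe_pl1(first_label, second_label):
    """PL1 condition from the text: the second write is non-competing."""
    return second_label == "non-competing"


print(write_read_reorder_safe("PL1", "competing", "non-competing"))
print(write_read_reorder_safe("PL2", "sync", "sync"))
print(write_read_reorder_safe("PL3", "loop", "non-loop"))
print(write_write_reorder_safe_pl1("competing", "non-competing"))
```

A system that implements TSO-style or PSO-style reordering could consult checks of this kind to decide where ordering must be enforced explicitly.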
Table 3.2: Unnecessary program order and atomicity conditions for the PL models.

                        Program Order (unnecessary)                             Multiple-Copy Atomicity
Model  Labels           first op                    second op                   (unnecessary)
-----  ---------------  --------------------------  --------------------------  -----------------------
PL1    competing        non-competing               non-competing               non-competing write
       non-competing    non-competing               competing read
                        competing write             non-competing

PL2    sync             non-competing               non-competing               non-competing write
       non-sync         non-competing               sync read
       non-competing    sync write                  non-competing
                        non-competing               non-sync
                        non-sync                    non-competing

PL3    non-loop         non-competing               non-competing               non-competing write
       loop             non-competing               sync read                   loop write
       non-sync         sync write                  non-competing
       non-competing    non-competing               non-sync
                        non-sync                    non-competing
                        competing write             loop read
                        loop write                  competing read
The third and last category of system-centric models consists of models such as WO, RCsc, RCpc, Alpha, RMO, and PowerPC, that allow program reordering among all types of operations. Given WO, we can exploit the information conveyed by PL1 to allow non-competing operations to be reordered with respect to one another. The extent to which each model exploits the label information varies, however. For example, while WO cannot exploit the distinction between competing read and competing write operations, the other models (i.e., RCsc, RCpc, Alpha, RMO, and PowerPC) can use this distinction to safely relax the program order between certain competing and non-competing operations (e.g., between a competing write followed by a non-competing read).
Among the system-centric models, the RCsc model best exploits the label information provided by PL1 and PL2 programs, and the RCpc model best exploits the information provided by PL3 programs. This is partly because the definition of PL programs was closely linked to the design of release consistency as the implementation conditions for a system [GLL+90]. The next chapter provides an even more aggressive set of conditions, as compared with RCsc and RCpc, that still lead to sequentially consistent executions of PL programs. While the release consistency conditions are sufficiently aggressive for most practical hardware designs, the more aggressive set of conditions provides an opportunity for higher performance in designs that support shared memory in software (see Chapter 5).
3.4 Benefits of Using Properly-Labeled Models
The benefits of using properly-labeled models mainly arise from programming simplicity and easy and efficient portability among different systems. We briefly summarize some of these advantages below.
The primary advantage of properly-labeled models is that they are simpler to program with and reason with than the system-centric models. While system-centric models require the programmer to reason with low-level reordering optimizations, programmer-centric models maintain the intuitive sequential consistency model as the base model and simply require extra information to be provided about the behavior of memory operations in sequentially consistent executions of the program. By far the most important piece of information is whether a memory operation is involved in a race (i.e., competing or non-competing). Since most parallel programs are written with the intent of disallowing races on shared data, the programmer already has to analyze the program for any races. Therefore, requiring this information to be made explicit leverages the fact that such information is already naturally known by the programmer. The other two types of information required by PL2 and PL3 (i.e., sync/non-sync and loop/non-loop, respectively) are also relatively intuitive, even though the formal definitions for these categories may seem complex.
An important attribute of the properly-labeled models is that they allow the programmer to provide conservative information about memory operations. This greatly simplifies the task of providing correct information about memory operations since if the exact information about some operation is not known, the programmer can simply provide the conservative information. For example, an operation may be safely labeled as competing if the programmer is not sure whether the operation is involved in a race. At the extreme, a program can be trivially labeled by providing conservative labels for all operations; for example, labeling all operations as competing trivially yields a PL1 program. Of course, conservative labels reduce the opportunity for doing optimizations, and in the extreme case where all labels are conservative, system performance degrades to the level of a typical sequentially consistent implementation. Therefore, from a performance perspective, it is best to limit the use of conservative labels.
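The asymmetry behind conservative labeling can be illustrated with a small check. The Python sketch below assumes, as a simplification of the formal PL1 definition, that a labeling is proper when every intrinsically competing operation carries the competing label; the function names are hypothetical and the operation names follow Figure 3.11(a).

```python
def is_properly_labeled_pl1(intrinsic, labels):
    """Simplified PL1 check: no competing operation is labeled non-competing."""
    return all(
        labels[op] == "competing"
        for op, category in intrinsic.items()
        if category == "competing"
    )


def conservative_ops(intrinsic, labels):
    """Operations labeled competing although they are not involved in a race."""
    return sorted(
        op for op, category in intrinsic.items()
        if category == "non-competing" and labels[op] == "competing"
    )


# Intrinsic categories for P1 and P2 in Figure 3.11(a).
intrinsic = {
    "a1": "non-competing", "b1": "competing",
    "c1": "competing", "d1": "non-competing",
    "a2": "non-competing", "b2": "competing",
    "c2": "competing", "d2": "non-competing",
}

exact = dict(intrinsic)
all_competing = {op: "competing" for op in intrinsic}
unsafe = dict(intrinsic, b1="non-competing")

print(is_properly_labeled_pl1(intrinsic, exact))
print(is_properly_labeled_pl1(intrinsic, all_competing))
print(conservative_ops(intrinsic, all_competing))
print(is_properly_labeled_pl1(intrinsic, unsafe))
```

Labeling everything as competing passes the check but marks four data operations conservatively, which is exactly the case where the optimizations enabled by PL1 are lost. Labeling the flag write b1 as non-competing fails the check.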
Another benefit of allowing conservative labels is that programmers can focus on providing accurate information for the performance-critical regions in a program. For example, if most of the program's execution time is spent in a handful of procedures, the programmer can concentrate on analyzing the memory operations in those procedures, and provide more conservative labels for the remaining memory operations. Similarly, the more detailed information required by PL2 or PL3 may be provided for critical synchronization algorithms used in a program, while the remaining parts may provide the less detailed information required by the PL1 model. Overall, the ability to provide conservative labels allows programmers to observe performance benefits that are proportional to the amount of effort spent by the programmer to properly label a program.
The information provided by properly-labeled programs also allows automatic and efficient portability of such programs across a wide range of systems. Contrast this to system-centric models. For example, consider porting a program that is originally written for the Alpha model to the RMO model. Recall that the Alpha model inherently maintains program order among read operations to the same location while the RMO model does not maintain this order. This subtle difference makes it difficult to efficiently port a program from Alpha to RMO because we must conservatively assume that the program orders between all pairs of reads to the same location are pertinent for correctness. In contrast, the labels in a PL program convey a lot more