This chapter concerns the first step of the DeNovo hardware project to rethink multi-core memory hierarchies driven by disciplined software models. The key observation is that disciplined programming models will be essential for software programmability and clearly specifiable hardware/software semantics, and can drive a holistic co-design of hardware.
DeNovoD shows that race-freedom, structured parallel control, and the knowledge of regions and effects in deterministic codes enable much simpler, more extensible, and more efficient cache coherence protocols than the state of the art. The resulting protocol has no transient states, no invalidation message traffic, no sharer lists in directories, and no false sharing. A holistic co-design of software and hardware also allows new ideas (e.g., flexible cache partitions based on software-specified regions), simpler and more efficient incarnations of previous ideas (e.g., use of bulk transfer, but with flexible software-driven granularity and with no directory serialization), and a synergistic collection of previously proposed optimizations.
Overall, compared to state-of-the-art MESI protocols, DeNovoD is much simpler and easier to verify and extend, performs comparably or better, and is more energy-efficient (since it reduces cache misses and network traffic) for a range of deterministic codes.
Chapter 3
DeNovoND: Efficient Hardware Support for Disciplined Non-Determinism
As explained in the previous chapter, DeNovoD presents a complexity-, performance-, and power-efficient hardware coherence solution for deterministic codes, driven by disciplined programming models. Although determinism is considered desirable for many application classes, many common algorithms take any of multiple possible outputs as legitimate for a given input. Such potential non-determinism in output often allows the algorithms to be more flexible and simpler than deterministic versions. For industry to exploit the benefits of DeNovo, it is imperative that we develop techniques to support non-deterministic codes that perform at least as well as conventional systems, without losing the benefits of DeNovoD.
In this chapter, we propose DeNovoND, a significant step toward achieving the DeNovo vision, by providing support for programs with disciplined non-determinism. We continue to apply our hardware-software co-design approach for DeNovoND; we exploit disciplined programming models with support for safe non-determinism to extend DeNovoD for programs that contain non-determinism. We aim to show that such programs can be supported by simple additions to DeNovoD, without sacrificing DeNovoD’s advantages.
3.1 Software Assumptions
In this thesis, we define non-determinism as potential “output non-determinism” through different schedule-dependent interleavings of shared data accesses. To include a larger range of programs in the non-determinism category, we define output as either intermediate or final output. (A program with non-deterministic intermediate output but with deterministic final output is also non-deterministic.) What is important is that the non-determinism should not simply come from uncontrolled and unexpected behavior due to data races. Such behavior is not only potentially erroneous but can also make program executions difficult to maintain and reason about. Therefore, in this chapter, we assume “disciplined non-determinism” for DeNovoND where disciplined languages provide safer and more structured mechanisms to express non-determinism.
Figure 3.1 summarizes the software assumptions and constraints for disciplined non-deterministic codes.
1. Parallelism patterns: DeNovoND assumes the same nested fork-join parallelism that DeNovoD assumes for disciplined determinism (Section 2.1). Unlike programs with only deterministic data accesses, however, non-deterministic programs can include critical sections with locks in parallel forked tasks. Parallel forks are labeled non-deterministic if their tasks have such critical sections.
2. Conflicting accesses: In addition to non-conflicting deterministic accesses as assumed and supported by DeNovoD, conflicting accesses with potential non-determinism are supported by DeNovoND with the following assumptions: (1) non-deterministic accesses are distinguished from deterministic accesses at compile time, and the distinction is conveyed to the hardware. (2) Disciplined non-deterministic accesses to a given memory location are protected by the same lock and enclosed in its critical sections. Non-determinism is allowed only when it is explicitly requested through the aforementioned language constructs. As a result, the program is guaranteed to be data-race-free.
3. Execution semantics: Disciplined languages ensure that a potentially non-deterministic program obeys the above properties (structured parallelism and data-race-freedom). Such programs produce sequentially consistent results on DeNovoND with safety guarantees such as strong isolation between tasks within a deterministic fork and non-deterministic tasks with critical sections (refer to Section 3.1.1 for more details).
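The assumptions above can be illustrated with a minimal Python sketch. It is not DPJ code and does not come from the thesis; the names `run_nd_fork`, `shared_log`, and `log_lock` are hypothetical. A fork-join of tasks updates one shared list, and every conflicting access sits inside a critical section protected by the same lock. The order of entries in the list depends on the schedule (non-deterministic intermediate output), while its sorted contents are the same on every run.

```python
import threading

def run_nd_fork(num_tasks):
    """Fork num_tasks tasks, join them all, and return the shared log."""
    shared_log = []               # shared data with conflicting accesses
    log_lock = threading.Lock()   # the single lock that protects shared_log

    def task(task_id):
        local = task_id * task_id     # task-private, deterministic work
        with log_lock:                # critical section: conflicting access
            shared_log.append(local)

    threads = [threading.Thread(target=task, args=(i,))
               for i in range(num_tasks)]
    for t in threads:
        t.start()                     # fork
    for t in threads:
        t.join()                      # join
    return shared_log

log = run_nd_fork(8)
# The order of `log` may differ between runs; its sorted contents do not.
```

Because all accesses to the list go through one lock, the sketch has no data race even though the append order is not fixed.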
3.1.1 DPJ for Disciplined Non-Determinism
DeNovoND uses DPJ for non-deterministic codes [24] as an exemplar disciplined programming model to drive its detailed design. To enable the programmer to express non-determinism, DPJ provides parallel constructs that are potentially non-deterministic; i.e., foreach_nd and cobegin_nd [24]. These constructs allow conflicting accesses between their tasks, but require that such accesses be enclosed within atomic sections, that their read and write effect declarations also include the atomic keyword, and that their region types be declared as atomic. Note that there continue to be no conflicts allowed between a task from a deterministic parallel construct and any other concurrent (non-deterministic or deterministic) task. The compiler checks that all of the above constraints are satisfied by any type-checked program, again using a simple, modular type checking algorithm.
[Figure 3.1 content. DeNovoND for Non-deterministic Accesses. Parallelism patterns: nested fork-join parallelism with critical sections. Conflicting accesses: conflicting accesses from concurrent tasks must be distinguished from non-conflicting accesses in critical sections protected by the same lock. Execution semantics: disciplined non-determinism.]
Figure 3.1: Software assumptions and execution semantics for DeNovoND.
With the above constraints, DPJ can provide the following guarantees: (1) Data-race freedom. (2) Strong isolation of accesses in atomic section constructs and all deterministic parallel constructs; i.e., these constructs appear to execute atomically. (3) Sequential composition for deterministic constructs; i.e., tasks of a deterministic construct appear to occur in the sequential order implied by the program (even if they contain or are contained within non-deterministic constructs). (4) Determinism-by-default; i.e., any parallel construct that does not contain an explicit non-deterministic construct provides deterministic heap output for a given heap input. The above guarantees not only ensure sequential consistency but also allow programmers to reason with very high-level, strongly isolated, and composable components such as complete foreach constructs and all atomic sections.
For data accesses, we assume that the ISA provides a mechanism by which loads and stores can be tagged as accessing atomic regions with atomic effects (e.g., with a bit in the op-code). The DPJ compiler has this information and can generate code with the bit set for such accesses. We refer to such accesses as atomic accesses and to others as non-atomic accesses. Note that the former are regular data accesses from atomic sections and are not to be confused with atomic read-modify-writes or the C++ atomic keyword used for synchronization races. Support for atomic read-modify-writes in synchronization races is introduced in Chapter 4.
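As a rough illustration of tagging accesses with an op-code bit, the sketch below uses a purely hypothetical 32-bit instruction word in which the top bit marks an atomic access. The thesis only states that such a bit could exist; the bit position, the `LOAD` value, and the helper names are assumptions made for this example.

```python
ATOMIC_BIT = 1 << 31   # hypothetical position of the "atomic access" flag
LOAD = 0x03            # hypothetical op-code value for a load

def encode_access(opcode, is_atomic):
    """Return a hypothetical instruction word with the atomic flag set or clear."""
    if not 0 <= opcode < ATOMIC_BIT:
        raise ValueError("opcode must fit in the low 31 bits")
    return (opcode | ATOMIC_BIT) if is_atomic else opcode

def is_atomic_access(word):
    """Check the flag as the hardware would when it decodes the access."""
    return bool(word & ATOMIC_BIT)

atomic_load = encode_access(LOAD, is_atomic=True)   # access to an atomic region
plain_load = encode_access(LOAD, is_atomic=False)   # non-atomic access
```

In this sketch the compiler's role corresponds to calling `encode_access` with the right flag, and the hardware's role corresponds to `is_atomic_access`.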
Although DPJ supports atomic sections, DeNovoND assumes we can convert them to locks. This is possible because by default we can associate each atomic region with its own lock. For each atomic section, we can acquire locks for each atomic region that it accesses in a predefined order. This can be optimized in several ways; e.g., by coarsening the locks. An implementation of this algorithm is outside the scope of the thesis. The benchmarks evaluated for DeNovoND in this chapter are either originally written with lock synchronization or manually analyzed and converted to critical sections (from transactions).
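The conversion described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the thesis's implementation (which the text places out of scope): the class `RegionLocks`, its methods, and the use of sorted region names as the predefined order are all hypothetical. Each atomic region gets its own lock, and an atomic section acquires the locks of every region it accesses in one global order, so two sections that name the same regions in different orders cannot deadlock.

```python
import threading
from contextlib import contextmanager

class RegionLocks:
    """One lock per atomic region, acquired in a fixed global order."""

    def __init__(self, region_names):
        self._locks = {name: threading.Lock() for name in region_names}

    def acquisition_order(self, regions):
        # Assumed predefined order: sorted, de-duplicated region names.
        return sorted(set(regions))

    @contextmanager
    def atomic_section(self, regions):
        acquired = []
        try:
            for name in self.acquisition_order(regions):
                self._locks[name].acquire()
                acquired.append(name)
            yield
        finally:
            # Release only what was acquired, in reverse order.
            for name in reversed(acquired):
                self._locks[name].release()

locks = RegionLocks(["A", "B"])
balance = {"A": 100, "B": 100}

def transfer(src, dst, amount):
    # The section names its regions in caller order; locking is still ordered.
    with locks.atomic_section([src, dst]):
        balance[src] -= amount
        balance[dst] += amount

threads = ([threading.Thread(target=transfer, args=("A", "B", 1)) for _ in range(50)]
           + [threading.Thread(target=transfer, args=("B", "A", 1)) for _ in range(50)])
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Coarsening, mentioned in the text as one optimization, would correspond here to mapping several region names to a single shared lock.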