In this section we discuss works that are not strictly categorized as cache- or scratchpad-oriented solutions. However, they propose engineered hardware or software solutions that provide better task isolation, thus improving predictability.
Hardware Solutions
Huangfu et al. [75] introduced the Performance Enhancement Guaranteed Cache (PEG-C), a hardware addition to a regular I-cache in the form of a benefit counter that tracks hits and misses. This design addresses the unpredictability of cache accesses while delivering average performance comparable to that of a regular cache. The benefit counter keeps track of the number of hits and misses at runtime and grants access to the cache only when its value is positive; otherwise, the access is served from memory.
Allard et al. [19] proposed a hardware component named hardware context switch (HwCS), which replaces the standard L1 cache controller of a processor. It divides the cache into two interchangeable layers, similar to our approach. The CPU executes from one layer, which acts as a regular cache, while the content of the other layer is saved or loaded simultaneously. HwCS keeps the preemption overhead small relative to the task WCET, as the cache content is saved after the task is preempted and restored before the task resumes. Since both layers can access main memory at the same time, memory bandwidth is divided between the two layers in the worst case.
The PRET machine [56, 96, 61, 135] is another line of work that targets real-time systems from a different angle. PRET stands for PREcision Timed machine. The philosophy of PRET is to provide a computing environment that is as timely and precise as the underlying synchronous digital system that implements the computer hardware. PRET demands major changes in processor design, memory subsystem, Instruction Set Architecture (ISA), programming language, and RTOS. The main reason for these changes is to propagate the notion of timing from the circuit level to the software application level. Only then can the programmer express the temporal behavior as easily as the functional behavior of the application. As a first step, [96] proposed a multi-threaded single-core processor with an extended version of the SPARC ISA that delivers predictable timing. The processor pipeline is interleaved between six hardware threads. Each thread has private instruction and data scratchpad memories. Access to the shared main memory is arbitrated by a mechanism called the memory wheel, which uses a round-robin scheduling policy to guarantee each thread exclusive access to main memory in its time window. In addition, a deadline instruction has been added to the ISA to help synchronize threads precisely. [61] presented a plug-in for LabVIEW Embedded that maps the LabVIEW G graphical programming language and its timing specifications to PRET. [135] proposed a tool that statically allocates instructions from multiple threads to a shared SPM for the PRET architecture. Similarly, Multi-Core Execution of Hard Real-Time Applications Supporting Analysability (MERASA) is a project that develops hardware specifically for hard real-time systems [2, 4, 125]. Similar to PRET, MERASA [167] involves major modifications in processor design, cache memory, and the bus and other interconnects for single- and multi-core systems. The P-SOCRATES project [133] focuses on designing predictable many-core systems. Specifically, the purpose of P-SOCRATES is to develop an entirely new design framework, from the conceptual design of the system functionality to its physical implementation, to facilitate the deployment of standardized parallel architectures in all kinds of systems.
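The round-robin arbitration of the PRET memory wheel described above can be illustrated with a small sketch. It is not taken from [96]: the function names and the window length of 4 cycles are hypothetical, and the model ignores the case where a thread arrives too late in its own window to complete an access. It only shows why the waiting time for main memory is bounded independently of what the other threads do.

```python
# Toy model of a round-robin "memory wheel" (illustrative only).
# Assumptions: six hardware threads as in the text; the window length
# of 4 cycles is a made-up value, not a figure from the PRET papers.

def wheel_owner(cycle, n_threads=6, window=4):
    """Return the thread that owns main memory at the given cycle."""
    return (cycle // window) % n_threads


def cycles_until_window(thread, cycle, n_threads=6, window=4):
    """Return how many cycles `thread` waits until its window is open."""
    period = n_threads * window      # one full rotation of the wheel
    start = thread * window          # offset of the thread's window
    pos = cycle % period             # position inside the rotation
    if start <= pos < start + window:
        return 0                     # already inside its own window
    return (start - pos) % period


# The worst-case wait is the same for every thread and depends only on
# the wheel geometry: (n_threads - 1) * window cycles.
worst_case = max(cycles_until_window(0, c) for c in range(24))
```

Under these assumed parameters, `worst_case` evaluates to 20 cycles, which is the time taken by the windows of the other five threads.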
Software Solutions
Pellizzoni et al. [131] highlighted the impact of the multicore architecture on WCET based on contention for access to main memory. This work shows a linear increase in the WCET with the number of cores. Pellizzoni et al. [128] also introduced the Predictable Execution Model (PREM). PREM provides isolation in a multi-tasking system by scheduling access to main memory. PREM is software-only and does not require hardware arbiters. It relies on compiler and OS techniques to divide a task into a memory phase and an execution phase; while it is in the execution phase, a task runs predictably from cache with no access to main memory. This allows other masters in the system, such as I/O, to be scheduled to access main memory while a task is in the execution phase. The work was extended to multicores in [178], adopting a TDMA arbitration scheme with a partitioned system. [26] studied different scheduling policies for PREM in multicore systems and compared them with the previous TDMA approach, EDF, and contention-based scheduling. The best schedule was based on a least-laxity-first policy. After that, a global schedulability study was introduced in [15]. A parallel task model for PREM was also introduced in [16]. [178] and its derivatives partition the cache space among all tasks.
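The split into memory and execution phases can be made concrete with a minimal sketch. It is an illustration under simplifying assumptions, not the PREM implementation: tasks run back to back on one core, each task is a hypothetical `(memory_phase, execution_phase)` pair of durations, and the function name `io_windows` is invented here. It computes the intervals in which the core does not touch main memory, so another master such as an I/O device could be scheduled there.

```python
# Toy PREM-style timeline for a single core (illustrative only).
# Assumption: each task is a (memory_phase, execution_phase) pair of
# durations in arbitrary time units, executed in the given order.

def io_windows(tasks, start=0):
    """Return the (begin, end) intervals where main memory is free.

    During a memory phase the core loads the task into its cache and
    owns main memory. During the execution phase it runs from cache
    only, so main memory is available to other masters.
    """
    windows = []
    t = start
    for mem, exe in tasks:
        t += mem                      # memory phase: core uses memory
        windows.append((t, t + exe))  # execution phase: memory is free
        t += exe
    return windows


# Two tasks: (load 2, run 5) followed by (load 1, run 3).
free = io_windows([(2, 5), (1, 3)])
```

For this input, `free` is `[(2, 7), (8, 11)]`: main memory is idle from the core's point of view during both execution phases.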
Note that our proposed solution differs from PREM in two ways: 1) while PREM uses the CPU to prefetch the task into the local cache, effectively wasting CPU time, our solution pipelines CPU execution and DMA transfers; 2) PREM partitions the cache space among all tasks, while our solution limits the local SPM to two partitions only.
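The benefit of overlapping DMA transfers with CPU execution can be seen in a back-of-the-envelope model. The sketch below is a simplified illustration, not the analysis used in this work: it assumes a fixed task order, one DMA engine, one CPU, no unload or write-back time, and hypothetical function names. A task can be loaded only when the DMA engine is idle and one of the two partitions has been released by an earlier task.

```python
# Toy comparison of CPU-driven prefetch vs. DMA/CPU pipelining
# (illustrative only; all names and the timing model are assumptions).

def sequential_makespan(loads, execs):
    """CPU loads each task itself, then executes it: nothing overlaps."""
    return sum(loads) + sum(execs)


def pipelined_makespan(loads, execs, partitions=2):
    """DMA loads the next task while the CPU executes the current one."""
    load_done = []
    exec_done = []
    for i, (load, exe) in enumerate(zip(loads, execs)):
        # The DMA engine is busy until the previous load has finished.
        dma_free = load_done[i - 1] if i >= 1 else 0
        # A partition is reused once task i - partitions has finished.
        part_free = exec_done[i - partitions] if i >= partitions else 0
        load_done.append(max(dma_free, part_free) + load)
        # The CPU starts when the task is loaded and the CPU is idle.
        cpu_free = exec_done[i - 1] if i >= 1 else 0
        exec_done.append(max(load_done[i], cpu_free) + exe)
    return exec_done[-1] if exec_done else 0


loads = [2, 2, 2]   # load times of three tasks
execs = [3, 3, 3]   # execution times of the same tasks
```

With these example values, `sequential_makespan(loads, execs)` gives 15 time units, while `pipelined_makespan(loads, execs)` gives 11, because every load after the first is hidden behind the execution of the preceding task.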
MemGuard [180] is another work that provides memory performance isolation while still maximizing memory bandwidth utilization, based on resource reservation and reclaiming techniques. MemGuard dynamically reserves and regulates per-core memory bandwidth (number of accesses) using hardware performance counters. If a core exceeds its predefined maximum number of accesses, an interrupt causes the core to jump back to the OS-level bandwidth regulator. As a result, the DRAM bandwidth is partitioned among cores, guaranteeing a minimum bandwidth for each core. MemGuard dynamically adjusts the resource provision based on actual usage. For example, when a task is highly demanding, it can try to reclaim spare resources from other tasks; conversely, when it consumes less than its reserved amount, it can share the surplus with others. Another way to achieve predictability is DRAM bank-aware allocation (PALLOC), proposed by H. Yun [179]. However, on some platforms (such as the NVIDIA TX1), controlling DRAM bank allocation is problematic due to address randomization aimed at improving average performance.
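The per-core budget mechanism behind this kind of regulation can be sketched as follows. This is a minimal illustration, not MemGuard's code or API: the class `BandwidthRegulator` and its methods are hypothetical, the budgets are arbitrary numbers, and the reclaiming of spare bandwidth described above is deliberately left out. Each core has a budget of memory accesses per regulation period; once the budget is used up, further accesses are denied until the next period starts.

```python
# Toy per-core memory budget regulator (illustrative only).
# Assumptions: budgets are counted in memory accesses per period, and
# a software counter stands in for the hardware performance counter.

class BandwidthRegulator:
    def __init__(self, budgets):
        # budgets: mapping core id -> allowed accesses per period
        self.budgets = dict(budgets)
        self.used = {core: 0 for core in budgets}

    def start_period(self):
        """Replenish every core's budget at a period boundary."""
        for core in self.used:
            self.used[core] = 0

    def request(self, core, accesses):
        """Grant as many accesses as the remaining budget allows."""
        remaining = self.budgets[core] - self.used[core]
        granted = max(0, min(accesses, remaining))
        self.used[core] += granted
        return granted

    def is_throttled(self, core):
        """A core with no budget left waits for the next period."""
        return self.used[core] >= self.budgets[core]


reg = BandwidthRegulator({0: 100, 1: 50})
first = reg.request(0, 80)    # fits in the budget of core 0
second = reg.request(0, 40)   # only 20 accesses are left
```

Here `first` is 80 and `second` is 20, after which core 0 is throttled while core 1 still has its full budget of 50 accesses, regardless of how aggressively core 0 behaves.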
At a higher level, Mancuso et al. [106] proposed OS-level techniques for COTS multicore architectures that partition the system into isolated single-core virtual machines. Usually, modular per-core certification cannot be performed in COTS multicore systems due to shared resource interference. However, this work allows per-core schedulability results to be calculated in isolation and to hold when multiple cores run in parallel. Thus, existing software and schedulability analysis developed for single-core systems can be used as is in a multicore environment by utilizing the proposed single-core-equivalent virtual machines.