
Several well-developed cache analysis techniques have been proposed for single-core processors. These techniques analyze the interference due to intra-task and intra-core cache conflicts. The latter is known as cache-related preemption delay (CRPD). CRPD analysis focuses on the cache reload overhead caused by preemptions, while intra-task analysis focuses on cache conflicts within a single task, assuming non-preemptive execution.

In existing multicore processors, the last-level cache is typically shared by multiple cores. This design has several merits, such as increasing cache utilization, reducing the complexity of cache coherency, and providing a fast communication medium between cores. However, it is extremely difficult to accurately determine the cache miss rate, because the cache content depends on the size, organization, and replacement strategy of the cache in addition to the order of accesses. Shared caches in multicore processors are similar to caches in single-core processors in that both suffer from inter-task and intra-task interference. In addition, when multiple cores share a cache, they can evict each other’s cache lines, resulting in a problem known as inter-core interference.
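To make inter-core interference concrete, the following is a minimal simulation sketch, not a model of any real processor. It assumes a tiny fully associative LRU cache shared by two cores; the class `SharedLRUCache`, the function `count_misses`, the capacity of four lines, and both access traces are hypothetical choices made for illustration.

```python
from collections import OrderedDict


class SharedLRUCache:
    """Fully associative LRU cache (a simplifying assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first

    def access(self, core, addr):
        """Return True on a hit, False on a miss."""
        key = (core, addr)  # cores never share data in this sketch
        if key in self.lines:
            self.lines.move_to_end(key)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[key] = True
        return False


def count_misses(trace, capacity):
    """trace is a list of (core, addr) pairs; returns misses per core."""
    cache = SharedLRUCache(capacity)
    misses = {}
    for core, addr in trace:
        misses.setdefault(core, 0)
        if not cache.access(core, addr):
            misses[core] += 1
    return misses


# Hypothetical traces: core 0 loops over 3 lines, core 1 streams over 12 lines.
core0 = [(0, a) for _ in range(4) for a in range(3)]
core1 = [(1, a) for a in range(12)]

alone = count_misses(core0, capacity=4)
interleaved = [access for pair in zip(core0, core1) for access in pair]
shared = count_misses(interleaved, capacity=4)

print("core 0 misses alone: ", alone[0])
print("core 0 misses shared:", shared[0])
```

With these traces, core 0 misses only on its three cold accesses when it runs alone, but misses on all twelve accesses once the streaming accesses of core 1 are interleaved, because each of its lines is evicted before it is reused. The miss count of core 0 therefore depends on what the other core does, which is what makes the analysis require knowledge of all tasks in the system.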

Unfortunately, single-core cache timing analysis techniques are not applicable to multicore systems with shared caches. Inter-core interference is caused by tasks that can run in parallel, so analyzing it requires considering all tasks in the system. The analysis of non-shared caches is already considered a complex process, and extending it to shared caches is even harder. In fact, researchers in the WCET analysis community [155] seem to agree that “it will be extremely difficult, if not impossible, to develop analysis methods that can accurately capture the contention between multiple cores in a shared cache”.

A timing analysis technique for software running concurrently on multicores with shared caches was proposed by Liang et al. [95], extending the work in [177]. The analysis targets inter-core cache evictions: the lifetimes of all tasks that run concurrently on multiple cores are determined, and the anticipated conflicts in the last-level cache (LLC) are then computed. The analysis accounts for cache accesses and uses static analysis to estimate task WCETs. Another work, by Hardy et al. [70], aims to tighten WCET estimates and builds on Hardy’s previous work [71]. The authors proposed a compile-time method to reduce shared cache interference among cores for instructions. The method supports WCET estimation across multiple cache levels in the presence of inter-core interference. In addition, it statically identifies code blocks that are used only once during execution, enabling these blocks to bypass the cache. Consequently, a tighter WCET can be computed.

Probabilistic approaches to cache analysis have also been introduced in the literature. Quinones et al. [139] explored the effect of using a random cache replacement policy in hard real-time systems. They showed that, by applying Probabilistic Timing Analysis (PTA), they were able to avoid the risk of the aforementioned unpredictable cache access times. PTA has emerged as a solution that reduces the amount of information needed to provide tight WCET estimates. Nevertheless, it imposes new requirements on hardware design. For instance, before [84], only fully-associative random-replacement caches had been proven to fulfill the needs of PTA, but they are expensive in size and energy. As a solution, Kosmidis et al. [84] proposed a hardware design implementing a random-replacement cache that allows set-associative and direct-mapped caches to be analyzed with PTA. There are, however, other opinions in the real-time community regarding the use of PTA in real-time systems. For example, Reineke [140] showed that the hit probabilities computed for PTA are not independent. Consequently, the convolution of Execution Time Profiles (ETPs) is not possible with randomized caches, and the approach is therefore not recommended for real-time systems.
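The convolution step mentioned above can be sketched as follows. This is an illustrative sketch only, assuming that an ETP is a discrete distribution mapping a latency in cycles to its probability. The function names, the hit and miss latencies, and the hit probability are hypothetical. The convolution is valid only if the two profiles are statistically independent, which is exactly the assumption that the objection in [140] calls into question.

```python
from collections import defaultdict


def convolve(etp_a, etp_b):
    """Combine two ETPs, assuming the two latencies are independent."""
    out = defaultdict(float)
    for latency_a, prob_a in etp_a.items():
        for latency_b, prob_b in etp_b.items():
            out[latency_a + latency_b] += prob_a * prob_b
    return dict(out)


def exceedance(etp, threshold):
    """Probability that the total latency exceeds the threshold."""
    return sum(p for latency, p in etp.items() if latency > threshold)


# Hypothetical numbers: a hit costs 1 cycle, a miss costs 100 cycles,
# and each access hits with probability 0.75.
HIT, MISS = 1, 100
access = {HIT: 0.75, MISS: 0.25}

two_accesses = convolve(access, access)
print(two_accesses)                   # distribution of the summed latency
print(exceedance(two_accesses, 101))  # probability that both accesses miss
```

For these numbers, the combined profile assigns probability 0.5625 to 2 cycles, 0.375 to 101 cycles, and 0.0625 to 200 cycles. If the hit probability of the second access depends on the outcome of the first, the product `prob_a * prob_b` no longer gives the joint probability and the result is not a sound profile.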

In contrast to timing analysis techniques, where caches are used without restrictions, the managed-cache approach has the advantage of avoiding complex analysis methods for estimating cache behavior.

Cache Locking and Partitioning

Cache locking and partitioning are techniques for gaining predictable access to the shared cache. When a cache line is locked, its content cannot be replaced or evicted until it is unlocked. Cache locking requires hardware support and is available only on some COTS platforms. Different platforms support different styles of locking, for example lockdown by cache line or by way. In addition, some multicore platforms support lockdown by master (core), in which a cache way locked by one master cannot be altered by other masters. Platforms that support this feature include the Nvidia Tegra-2 and Tegra-3¹, Xilinx Zynq-7000², and Samsung Exynos 4412³.

¹ http://www.nvidia.ca/object/tegra-superchip.html
² https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html
³ https://www.samsung.com/semiconductor/minisite/exynos/products/mobileprocessor/exynos-4-quad-4412/

Similarly, cache partitioning provides exclusive access to a portion (partition) of the cache. Partitioning can be core-based or task-based. Unlike locking, cache partitioning can be done in hardware or in software. The most common software-based cache partitioning technique is page coloring [97, 164, 67]. Page coloring exploits the virtual-to-physical page address translation present in virtual memory systems at the OS level. Cache partitioning can also be done by the compiler [117].
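The idea behind page coloring can be sketched with a few lines of address arithmetic. This is a minimal sketch, assuming a physically indexed cache whose set index is a plain modulo of the address (no index hashing) and power-of-two sizes. The cache geometry, the helper names, and the split of colors between two cores are hypothetical and are not taken from the cited works.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def num_colors(cache_size, ways, page_size=PAGE_SIZE):
    """Number of page colors: the size of one cache way divided by the page size."""
    way_size = cache_size // ways
    return max(1, way_size // page_size)


def page_color(phys_addr, cache_size, ways, page_size=PAGE_SIZE):
    """Color of the physical page containing phys_addr."""
    frame = phys_addr // page_size
    return frame % num_colors(cache_size, ways, page_size)


def frames_for_partition(frames, allowed_colors, cache_size, ways,
                         page_size=PAGE_SIZE):
    """Keep only the physical frames whose color belongs to the partition."""
    return [f for f in frames
            if page_color(f * page_size, cache_size, ways, page_size)
            in allowed_colors]


# Hypothetical last-level cache: 2 MiB, 16-way set associative.
LLC_SIZE = 2 * 1024 * 1024
LLC_WAYS = 16

# Core-based partitioning: give each of two cores half of the colors.
core0_frames = frames_for_partition(range(64), set(range(0, 16)),
                                    LLC_SIZE, LLC_WAYS)
core1_frames = frames_for_partition(range(64), set(range(16, 32)),
                                    LLC_SIZE, LLC_WAYS)
print(num_colors(LLC_SIZE, LLC_WAYS))
print(core0_frames[:4], core1_frames[:4])
```

Pages of different colors map to disjoint groups of cache sets, so an OS that allocates a core (or task) only frames of its own colors keeps that core's data out of the sets used by the others. The cost is that each partition can use only a fraction of physical memory as well as a fraction of the cache.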

Suhendra et al. [155] explored the effect of locking and partitioning a shared last-level cache on predictability in multicore systems. They evaluated different schemes, combining static or dynamic locking with task-based or core-based partitioning. The results showed that the choice of configuration clearly affects the trade-off between predictability and performance, and the authors concluded that no single configuration suits all types of applications. The best configuration for predictability was static locking with task-based partitioning, since each task has its own cache partition that is not affected by preemption. However, each task gets a smaller partition as the number of tasks increases, which hurts performance.
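The shrinking-partition effect is simple arithmetic, sketched below under stated assumptions: the cache is split equally at the granularity of whole ways, and the cache geometry and task counts are hypothetical. The cited study may partition the cache differently; the function `partition_size` is introduced here only for illustration.

```python
def partition_size(cache_size, ways, n_partitions):
    """Bytes per partition when whole ways are split equally (an assumption)."""
    ways_each = ways // n_partitions
    if ways_each == 0:
        raise ValueError("more partitions than cache ways")
    return ways_each * (cache_size // ways)


# Hypothetical last-level cache: 1 MiB, 16-way set associative.
LLC_SIZE = 1024 * 1024
LLC_WAYS = 16

# Core-based partitioning on 4 cores: the partition size is fixed.
per_core = partition_size(LLC_SIZE, LLC_WAYS, 4)

# Task-based partitioning: the partition shrinks as tasks are added.
per_task = {n: partition_size(LLC_SIZE, LLC_WAYS, n) for n in (4, 8, 16)}

print(per_core // 1024, "KiB per core")
for n_tasks, size in per_task.items():
    print(n_tasks, "tasks:", size // 1024, "KiB per task")
```

Under these assumptions, a core-based split over four cores yields 256 KiB per core regardless of the task count, whereas a task-based split drops from 256 KiB to 64 KiB per task as the task set grows from 4 to 16 tasks.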

Shekhar et al. [147] also use cache locking, in this case to improve overall system utilization. Their work targets many-core architectures in which each core has a lockable private cache and there is no shared L2 cache. They proposed a semi-partitioned scheduling scheme in which as many tasks as possible are statically assigned to cores, and the remaining tasks are allowed to migrate from one core to another. On each core, tasks are scheduled locally according to Earliest-Deadline-First (EDF). Tasks are allowed to lock some data in the private local cache. When a task migrates, its locked lines are unlocked, migrated, and re-locked on the target core.
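The static-assignment step of a semi-partitioned scheme can be sketched as follows. This is not the algorithm of [147]: it assumes implicit-deadline periodic tasks, uses first-fit in decreasing order of utilization as a stand-in heuristic, and relies on the uniprocessor EDF condition that the total utilization on a core must not exceed 1. The task names and utilizations are hypothetical, and cache locking is not modeled.

```python
def semi_partition(utilizations, n_cores):
    """Assign tasks to cores first-fit; tasks that fit nowhere must migrate."""
    load = [0.0] * n_cores
    assigned = {}
    migrating = []
    # Consider the heaviest tasks first (first-fit decreasing).
    for task, u in sorted(utilizations.items(), key=lambda kv: -kv[1]):
        for core in range(n_cores):
            if load[core] + u <= 1.0:  # EDF bound for one core
                load[core] += u
                assigned[task] = core
                break
        else:
            migrating.append(task)  # no single core can host this task
    return assigned, migrating, load


# Hypothetical task set on two cores (utilization = execution time / period).
tasks = {"t1": 0.75, "t2": 0.75, "t3": 0.375, "t4": 0.125}
assigned, migrating, load = semi_partition(tasks, n_cores=2)

print("static:   ", assigned)
print("migrating:", migrating)
print("core load:", load)
```

In this example, `t3` fits on neither core as a whole, although the spare capacity of the two cores together (0.125 and 0.25) equals its utilization. A semi-partitioned scheduler would therefore let `t3` execute partly on each core, which is where the unlock, migrate, and re-lock sequence for its locked cache lines comes into play.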

Mancuso et al. [104] worked at the kernel level to improve performance while guaranteeing predictable timing, targeting the shared last-level cache. Real-time tasks are first profiled to identify the most frequently accessed memory pages (hot pages) of each task. The profiling information is then used at runtime in a cache coloring and locking mechanism that helps tighten the WCET of real-time tasks. In this work, the cache space is partitioned among all tasks. Another work, by Ward et al. [169], proposed using a page coloring mechanism along with cache scheduling instead of statically partitioning the cache. However, this work exhibited some overheads due to the dynamic locking approach. None of the studies reviewed above considers parallel tasks and the associated need for intra-task communication.