of the prediction.
Composability and predictability are also greatly affected by timing anomalies. Timing anomalies in WCET analyses were first described by (LUNDQVIST; STENSTROM, 1999). A timing anomaly is a situation where the local worst case does not contribute to the global worst case; e.g., a cache miss, though it increases the local execution time, results in a shorter global execution time. The first condition for avoiding timing anomalies is the use of in-order resources (LUNDQVIST; STENSTROM, 1999), which is not common in today's hardware architectures. Timing anomalies jeopardize composability because the WCET calculation can no longer be divided into subproblems. Predictability is also affected because simplifications in the analyses will not produce results with a reasonable accuracy margin.
In order to guarantee a composable timing analysis, state-of-the-art processor technologies such as dynamic branch prediction and cache memories with out-of-order pipelines should be avoided. Yet, as stated by (PUSCHNER; KIRNER; PETTIT, 2009), strategies adopted in real-time architectures should not lead to significant performance losses when compared to state-of-the-art technologies.
1.2 THESIS OBJECTIVE
The objective of this thesis is to investigate processor architecture features that lead to a predictable design with reasonable WCET performance. The thesis to be demonstrated is that it is possible to assemble hardware elements that increase performance but are predictable enough to ensure efficient and precise analyses. As described in the previous section, one of the first steps to demonstrate predictability is obtaining the WCET. The construction of a WCET analysis tool and the related hardware aspects are also subjects of this work.
Among the architectural elements that are covered, we have:
∙ Pipelining:
Pipelining is a technique where complex operations are organized into simpler sequential ones to increase throughput. In a predictable processor design, pipelines are necessary but instructions should be executed in order. Pipelines with out-of-order execution allow high average-case performance but jeopardize WCET analysis due to timing anomalies (LUNDQVIST; STENSTROM, 1999).
∙ Instruction parallelism:
Modern processor designs make extensive use of superscalar execution, where more than one instruction is executed in each pipeline stage. This design overcomes the limitation of standard pipelines, whose maximum throughput is one instruction per cycle. In real-time processors, multiple instructions can also be executed in each pipeline stage using the Very Long Instruction Word (VLIW) design philosophy (FISHER; FARABOSHI; YOUNG, 2005). VLIW machines are better suited to real-time systems because instruction scheduling is fixed and defined offline at compile time, which enhances analyzability. No instruction scheduling hardware has to be modeled in a VLIW design.
∙ First level of the memory subsystem:
Memory subsystems designed for real-time systems are the subject of various recent works (REINEKE et al., 2011). There are several approaches and some of them require complex modifications in the compiler and/or overload the hardware (SCHOEBERL et al., 2011). In this work, we will address the memory predictability issues using a direct-mapped instruction cache and a scratchpad memory for data. Scratchpad memories are similar to caches but their contents must be managed explicitly by software.
∙ Branch prediction:
Branches are instructions that perform conditional control-flow modifications. They are used for if, for and while structures and they usually decrease pipeline performance by adding stall cycles. One way to overcome this limitation is the use of branch prediction, which can be dynamic or static. Dynamic branch prediction jeopardizes predictability, whereas static branch prediction provides interesting WCET performance (BURGUIERE; ROCHANGE; SAINRAT, 2005). We support static branch prediction, demonstrating its importance in terms of performance, and we provide methods for correct WCET analyzability.
∙ Predication:
Predication is a technique where instructions are conditionally executed based on a Boolean register. It differs from branching because no control-flow modification is needed to execute or ignore instructions. There are two types of predication: partial and full. Full predication allows instructions to be executed or ignored directly based on a Boolean register (this type of predication is common in ARM architectures). With partial predication, instructions cannot be ignored based on a Boolean operand, but one of two values can be selected using special select instructions. Predication is an important technique for reducing the number of program paths, leading toward the single-path programming paradigm (PUSCHNER, 2005). We support both partial and full predication, but the latter is simplified to improve WCET analysis without jeopardizing performance.
∙ Complex arithmetic instructions:
Complex arithmetic instructions such as division and multiplication impose considerable overhead on the processor pipeline. Frequently, those instructions, mainly division, are only supported in software. In this work we support both division and multiplication in hardware, and they are implemented to have constant timing independent of their input parameters.
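As an illustration of the partial predication item above, the following minimal C++ sketch contrasts a branching routine with a single-path version in which a Boolean predicate selects between two already-computed values. The function names and the `select` helper are hypothetical and only stand in for a hardware select instruction; they are not taken from the processor described here, and whether a given compiler emits branch-free code for them depends on the target.

```cpp
#include <cstdint>

// Branching version: which instructions execute depends on the input data,
// so the analysis has two paths to consider.
std::uint32_t clamp_branch(std::uint32_t x, std::uint32_t limit) {
    if (x > limit) {
        return limit;
    }
    return x;
}

// Hypothetical stand-in for a select instruction (assumption, not the
// thesis ISA): returns a when pred is true, b otherwise, without a branch
// in the source code.
inline std::uint32_t select(bool pred, std::uint32_t a, std::uint32_t b) {
    // mask is all ones when pred is true and all zeros otherwise.
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(pred);
    return (a & mask) | (b & ~mask);
}

// Single-path version: the predicate is computed once and both candidate
// values flow into the select, so there is only one path through the code.
std::uint32_t clamp_select(std::uint32_t x, std::uint32_t limit) {
    const bool pred = x > limit;
    return select(pred, limit, x);
}
```

Both routines return the same value for every input; the second trades a conditional jump for a few extra data operations, which is the trade-off behind single-path code.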
We study the impact on WCET performance of processor features that are typically disabled or not fully explored in real-time applications. Such features include a statically scheduled VLIW processor with wide fetch, static branch prediction, complex instructions (memory, multiplication, division) and predication.
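To make the idea of input-independent timing for complex instructions more concrete, the sketch below models a division that always performs the same number of steps regardless of its operands. It is a software illustration using classic restoring division with a fixed 32-iteration loop; the names `DivResult` and `divide_fixed_steps` are hypothetical, and the algorithm is an assumption chosen for illustration, not a description of the hardware divider in the prototype.

```cpp
#include <cstdint>

// Hypothetical result type for this sketch.
struct DivResult {
    std::uint32_t quotient;
    std::uint32_t remainder;
};

// Restoring division that always runs exactly 32 iterations, one per
// dividend bit, whatever the operand values are.
// Precondition (assumption of this sketch): divisor != 0.
DivResult divide_fixed_steps(std::uint32_t dividend, std::uint32_t divisor) {
    std::uint64_t remainder = 0;  // 64 bits so the shift cannot overflow
    std::uint32_t quotient = 0;
    for (int bit = 31; bit >= 0; --bit) {
        // Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> bit) & 1u);
        // Decide whether the divisor fits into the partial remainder.
        const bool fits = remainder >= divisor;
        // Subtract the divisor through a mask instead of a branch.
        const std::uint64_t mask = 0ull - static_cast<std::uint64_t>(fits);
        remainder -= divisor & mask;
        quotient |= static_cast<std::uint32_t>(fits) << bit;
    }
    return {quotient, static_cast<std::uint32_t>(remainder)};
}
```

Because the loop bound does not depend on the inputs, a timing model of such an operation needs only a single latency value, in contrast to early-terminating dividers whose latency varies with the operand magnitudes.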
Increasing the performance of real-time processors while preserving analyzability is a relevant subject. For that purpose, we analyze the WCET performance of a deterministic four-issue Very Long Instruction Word (VLIW) processor prototype, describing its features and its timing characteristics. This prototype is implemented in VHDL using an Altera Cyclone IV GX (EP4CGX150DF31C7) on a DE2i-150 development board.
Besides the VHDL prototype, this thesis produced other artifacts: the hardware model used by the WCET analysis tool and a cycle-accurate software simulator. Both the WCET analysis tool and the simulator are written in C++. In order to have a compiler for the architecture and to provide a customizable environment for research on real-time compiler capabilities, a new code generator back-end for LLVM (LATTNER; ADVE, 2004) was implemented, but it is outside the scope of this thesis.
As test cases, we used representative examples from the Mälardalen WCET benchmarks (GUSTAFSSON et al., 2010), which are commonly used for WCET evaluations.
1.3 CONTRIBUTIONS
Regarding our main contributions to real-time processor architectures, we can highlight: