With a few slight exceptions, static branch prediction can be broadly defined as a set of compile-time techniques which may use profile statistics, path information or general heuristics to determine likely branch outcomes. An exception to this fully software-based definition is the backward-taken/forward-not-taken scheme. Program analysis by Smith [105] found that, for loop-intensive code, branches that jump backward in the code will typically be taken, while forward-pointing branches are less likely to be taken. While the compiler can reorganise the application code to exploit this behaviour, the micro-architecture must also be designed to support it, with the prediction logic comparing the current program counter to the branch target address. As a coarse-grained prediction mechanism, the backward-taken/forward-not-taken scheme mispredicts a large number of branches. For example, the last iteration of every loop is predicted incorrectly, while a conditional jump to a subroutine requires extensive code reordering. An extension of such hybrid static techniques is to encode additional information in the instruction word at compile time. The hardware is then redesigned so that a branch prediction decision takes into account not only the direction of the branch but also whether the branch was hinted at compile time as likely to be taken [106].
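To make the backward-taken/forward-not-taken rule and the compile-time hint concrete, the following Python sketch predicts a single branch from its address and target, and counts how often a loop-closing branch is mispredicted. It is illustrative only: the function names, the example addresses and the choice that a hint bit simply overrides the direction rule are assumptions, not the encoding used in [106].

```python
from typing import Optional


def predict_taken(pc: int, target: int, hint: Optional[bool] = None) -> bool:
    """Static prediction for one conditional branch.

    Assumption: a compile-time hint bit, when present, overrides the
    direction heuristic. Otherwise apply backward-taken/forward-not-taken:
    a target below the branch address (a loop back-edge) is predicted taken.
    """
    if hint is not None:
        return hint
    return target < pc


def loop_mispredictions(iterations: int, pc: int = 0x40, target: int = 0x10) -> int:
    """Count mispredictions of a loop-closing backward branch.

    The back-edge is taken on every iteration except the last, when the
    loop exits and execution falls through.
    """
    mispredicts = 0
    for i in range(iterations):
        actually_taken = i < iterations - 1
        if predict_taken(pc, target) != actually_taken:
            mispredicts += 1
    return mispredicts


print(loop_mispredictions(100))  # 1
```

For a loop of N iterations the back-edge is mispredicted exactly once, on exit, which is the last-iteration error noted above.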
Referring back to the DLX pipeline, it can be seen that the instruction fetched immediately after a branch is purely speculative, since it is unknown whether the branch will be evaluated true or false and therefore whether the pipeline must be flushed. A simple solution to this wasted slot is to move into it an instruction that is required later but is unrelated to either branch outcome. An example of such a case might be where a variable is set after a conditional call to a subroutine. Although efficient, in that no additional hardware is required and no pipeline stages are wasted, filling every post-branch delay slot with an independent instruction raises a number of difficulties. Firstly, there must always be an independent instruction available to place in the delay slot; since most applications are sequential in nature, this is not always possible. Secondly, as the number of pipeline stages increases, the number of delay slots also increases, meaning multiple independent instructions must be relocated. Finally, the compiler must take these relocations into account when allocating registers to variables. When no suitable instruction is available, the compiler is forced to insert a No-Operation (NOP) into the delay slot. In [107] it was found that branch instructions comprised 10.96% of all instructions executed for the SPEC benchmark, while the injected NOP operations comprised 8% of the total instruction workload.
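The delay-slot filling described above can be sketched as a small scheduling pass. The Python below is a minimal sketch under stated assumptions: instructions are modelled only by the registers they read and write (memory dependences and exceptions are ignored), there is a single delay slot as in the basic DLX pipeline, and the `Instr` type and `fill_delay_slot` function are hypothetical names, not part of any DLX toolchain.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction model: opcode plus register operands."""
    op: str
    dest: Optional[str] = None      # register written, if any
    srcs: Tuple[str, ...] = ()      # registers read


NOP = Instr("nop")


def conflicts(cand: Instr, later: Instr) -> bool:
    """True if `later` may not be reordered ahead of `cand`.

    Only register dependences (RAW, WAW, WAR) are checked in this sketch.
    """
    raw = cand.dest is not None and cand.dest in later.srcs
    waw = cand.dest is not None and cand.dest == later.dest
    war = later.dest is not None and later.dest in cand.srcs
    return raw or waw or war


def fill_delay_slot(block: List[Instr]) -> List[Instr]:
    """Fill the single delay slot after the branch that ends `block`."""
    *body, branch = block
    # Scan backwards so the nearest independent instruction is chosen.
    for i in range(len(body) - 1, -1, -1):
        cand = body[i]
        # Moving cand below the branch makes every instruction that
        # currently follows it (including the branch) execute before it.
        followers = body[i + 1:] + [branch]
        if not any(conflicts(cand, f) for f in followers):
            return body[:i] + body[i + 1:] + [branch, cand]
    # No independent instruction: the slot is wasted on a NOP.
    return body + [branch, NOP]


example = [
    Instr("add", "r1", ("r2", "r3")),   # independent of the branch
    Instr("sub", "r4", ("r5", "r6")),   # computes the branch condition
    Instr("beqz", None, ("r4",)),       # conditional branch on r4
]
print([ins.op for ins in fill_delay_slot(example)])  # ['sub', 'beqz', 'add']
```

When every candidate conflicts, for example because it computes the branch condition, the pass emits a NOP, which corresponds to the injected NOPs counted in [107].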
While both of the solutions outlined above are simple to implement, more complex static analysis methodologies are possible. In general, static techniques fall into one of two categories: profile-based static prediction, which attempts to extract prediction information from sample runs of the target program, and program-based prediction, which maps heuristic rules onto the target program. Fisher and Freudenberger found that by subjecting a target program to a number of previous runs, it is possible to deduce the likely direction of branches regardless of the input dataset [108]. Such profile (or path) based techniques can be extended to include path information based on logical correlation [109]. The difficulty with such techniques is the time required to generate the input traces, profile the target application and finally recompile the modified target program to include the new branch information. The work in [110] highlighted some of the performance issues related to the original static correlation method outlined by Young et al. [109]. To reduce the need for recompilation, other static methodologies encode the profile information in a likely bit within the branch instruction; this bit is then used at run time by the hardware to guide branch prediction. At a more fundamental level, the problem with profile-based static prediction is how to generate the trace dataset used to profile the target application. In [111] it was found that real profiles provide significantly better coverage than either estimated or random traces. From an NP perspective, the limitation of the profile-based technique is that variations in network traffic may affect its predictions. Consider an NP router running packet metering and IPv4 forwarding. Over a short period of time, the metering algorithm must handle short bursts of traffic [112]. Over a longer period, say 24 hours, the profile of the metering algorithm would shift to reflect periods of under-utilisation. At the same time, the execution trace of the forwarding application would change in response to routing table updates. On the other hand, the fact that the majority of packets must pass verification for the network to remain viable should make profile-based techniques applicable to NP systems.
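A minimal sketch of profile-based static prediction, in the spirit of [108], follows: several profiling runs are merged into one likely bit per branch, and the resulting static predictions are scored against a separate run. The trace format, the majority-vote rule and the not-taken fallback for unprofiled branches are assumptions made for illustration; the branch addresses in the usage lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One profiling run: a sequence of (branch address, taken?) events.
Trace = List[Tuple[int, bool]]


def build_likely_bits(training_runs: List[Trace]) -> Dict[int, bool]:
    """Merge several profiling runs into one 'likely taken' bit per branch."""
    taken: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for run in training_runs:
        for pc, was_taken in run:
            total[pc] += 1
            taken[pc] += int(was_taken)
    # Majority vote over all runs; ties resolve to not-taken.
    return {pc: 2 * taken[pc] > total[pc] for pc in total}


def static_accuracy(likely: Dict[int, bool], trace: Trace, default: bool = False) -> float:
    """Fraction of dynamic branches in `trace` predicted correctly.

    Branches never seen while profiling fall back to `default`.
    """
    if not trace:
        return 0.0
    hits = sum(likely.get(pc, default) == was_taken for pc, was_taken in trace)
    return hits / len(trace)


# Hypothetical NP branches: 0x100 = "packet passed verification",
# 0x200 = "TTL expired" (rarely true).
training = [[(0x100, True)] * 9 + [(0x100, False)] + [(0x200, False)] * 10]
likely = build_likely_bits(training)
test = ([(0x100, True)] * 8 + [(0x100, False)] * 2
        + [(0x200, False)] * 9 + [(0x200, True)])
print(likely)                          # {256: True, 512: False}
print(static_accuracy(likely, test))   # 0.85
```

Scoring the same likely bits against traces captured at different times of day is one way to gauge how sensitive they are to the traffic variation discussed above.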
Figure 2.12: Sample 2-Bit State Transitions for Branch Predictor
Another common method of providing static branch prediction is via program-based correlation techniques. As noted previously, work by Smith [105] and McFarling [106] found that common programming idioms can be used to detect likely conditional outcomes. In [113] and [114] a number of more structured heuristic rules were defined which could be used for static branch correlation. The basic methodology employed within a program-based technique is to detect, from the program structure, whether a branch operation is likely to be taken. For example, routines required for error processing are rarely called, so it can be assumed that any conditional call to an error subroutine will rarely be taken. Calder et al. proposed a more general framework derived from the large body of programs currently deployed [115]. Compared with profile (path) based techniques, the primary advantage of this method is that it removes the testing and simulation steps needed to extract profile traces. Instead, the program heuristics can be incorporated at compile time [116].
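Program-based prediction can be sketched as an ordered list of heuristic rules applied at compile time, with the first applicable rule deciding the prediction. The Python below encodes only the two heuristics named in this section, the loop back-edge rule and the error-handler rule; the `BranchInfo` fields, the rule order and the not-taken default are hypothetical choices for illustration, not the rule sets of [113] or [114].

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class BranchInfo:
    """Hypothetical compile-time facts about one conditional branch."""
    pc: int
    target: int
    target_calls_error_handler: bool = False


# Each heuristic returns a prediction, or None if it does not apply.
Heuristic = Callable[[BranchInfo], Optional[bool]]


def error_path(b: BranchInfo) -> Optional[bool]:
    # Error-processing routines are rarely executed: predict not taken.
    return False if b.target_calls_error_handler else None


def loop_back_edge(b: BranchInfo) -> Optional[bool]:
    # Backward branches usually close loops: predict taken.
    return True if b.target < b.pc else None


# Priority order (assumed): the error rule outranks the loop rule.
RULES: List[Tuple[str, Heuristic]] = [
    ("error-path", error_path),
    ("loop-back-edge", loop_back_edge),
]


def predict(b: BranchInfo, default: bool = False) -> Tuple[bool, str]:
    """Apply heuristics in priority order; the first applicable rule wins."""
    for name, rule in RULES:
        guess = rule(b)
        if guess is not None:
            return guess, name
    return default, "default"


print(predict(BranchInfo(pc=0x40, target=0x10)))  # (True, 'loop-back-edge')
```

Placing the error rule first means a backward branch into an error handler is still predicted not taken; reordering `RULES` is how a sketch like this would express a different priority.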
Within general-purpose processing, static prediction techniques have not provided the prediction rates required by modern microprocessors. Neither profile nor program frameworks obtained prediction rates in excess of 90%, but both methods do provide a means of optimising object code during compilation; e.g. static prediction provides a method of removing redundant paths [117], and is commonly implemented at compile time.