
RDP is an initial runahead distance predictor that guides runahead thread execution based on the last observed useful runahead distance. As a consequence, the next runahead thread for a given load never exceeds the useful runahead distance computed in the previous runahead execution. The useful runahead distance associated with a load instruction therefore either stays the same or decreases monotonically over the successive executions of its runahead thread. However, we observe that the useful runahead distance can vary unexpectedly during the execution of the thread. According to our analysis of useful runahead distance behavior, the runahead distance changes (up or down) 58% of the time on average. As a result, if the runahead threads initiated by a load would later benefit from a larger useful runahead distance, this cannot be detected, because this first approach has no way to learn increasing runahead distances.

Another important case arises when the useful runahead distance of a runahead thread for a load is zero (no long-latency load has been detected). After this point, a runahead thread is never again initiated by that load, even though runahead periods might become useful in the future. The worst case occurs when the runahead distance is zero the first time, since such loads are then banned from initiating a runahead thread on all their remaining misses during the execution. There are several cases in which the useful runahead distance becomes larger after having been zero (16% of cases for MEM workloads on average). If the runahead distance predictor cannot take these cases into account, it loses performance improvement opportunities by eliminating a large number of overlapped runahead prefetches.
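The two limitations above can be illustrated with a small sketch. It is not the thesis implementation: the function name is hypothetical, and it assumes a simplified model in which a runahead thread capped at a given distance observes the smaller of the true useful distance and the cap.

```python
from typing import List, Optional


def rdp_stored_distances(true_distances: List[int]) -> List[int]:
    """Return the distance RDP stores for one load after each of its L2 misses.

    Simplified model (an assumption): a runahead thread capped at `stored`
    instructions observes min(true distance, stored) as its useful distance.
    """
    stored: Optional[int] = None  # None: no distance recorded for this load yet
    history = []
    for true_distance in true_distances:
        if stored is None:
            # First miss: full-length runahead, so the true distance is observed.
            stored = true_distance
        elif stored > 0:
            # Later misses: the thread never runs past the stored distance.
            stored = min(true_distance, stored)
        # stored == 0: no runahead thread starts, so the entry is never updated.
        history.append(stored)
    return history


print(rdp_stored_distances([100, 60, 120, 0, 90]))  # prints [100, 60, 60, 0, 0]
```

In this example the third miss could have used a distance of 120 and the fifth a distance of 90, but the stored value can only shrink and stays at zero once it gets there.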

To avoid these unwanted behaviors, we refine our initial approach with several modifications that make it more accurate and flexible in predicting the useful runahead distance. The following subsections describe the refinements applied in this second approach to useful runahead distance prediction.

Improving the reliability of the useful distance prediction

This second approach maintains the philosophy of the useful runahead distance concept, but we introduce a new level of decision to enhance the reliability (i.e., confidence) of the predicted runahead distance. Instead of recording only one (the last) useful runahead distance in the RDIT, this mechanism works with two runahead distances per load: the useful runahead distance obtained in full-length runahead execution periods and the last useful runahead distance. The former (full runahead distance) is updated each time a runahead thread is fully executed, whereas the latter (last runahead distance) is updated after every runahead thread execution. Because of this feature, we call this second approach Runahead Two Distance Predictor (R2DP).
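As a minimal sketch of the update rule just described, an RDIT entry could hold the two distances as follows. The class and field names are hypothetical, and real entries would also carry whatever tag and state bits the hardware table needs.

```python
from dataclasses import dataclass


@dataclass
class RDITEntry:
    """Per-load entry holding the two distances (names are illustrative)."""
    full_distance: int = 0  # distance seen in the last full-length runahead
    last_distance: int = 0  # distance seen in the most recent runahead

    def update(self, observed_distance: int, full_execution: bool) -> None:
        # The last distance is refreshed after every runahead execution.
        self.last_distance = observed_distance
        # The full distance is refreshed only after a full-length execution.
        if full_execution:
            self.full_distance = observed_distance


entry = RDITEntry()
entry.update(120, full_execution=True)   # full=120, last=120
entry.update(40, full_execution=False)   # full=120, last=40
```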

R2DP uses these two distance values as an indication of how reliable the runahead distance information stored for a particular load is. When a load misses in the L2 cache, R2DP computes the absolute difference between the last and full distance values and compares the result to a threshold value called the Distance Difference Threshold (DDT). If the distance difference is larger than DDT, R2DP treats the stored distance information as unreliable. In this case, a runahead thread is executed until the L2 miss is fully serviced in order to update the unreliable stored distance values (allowing the mechanism to store a possibly higher distance value in the RDIT). Both the last and full useful distance values are updated at the end of this new full runahead thread execution. On the other hand, if the difference is smaller than DDT, the stored distance values are considered reliable, and the last useful runahead distance value is used to decide whether to execute the runahead thread and for how long. With this enhancement, R2DP can correct the useful runahead distance value if the last distance becomes significantly different from the previous ones.
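The reliability test reduces to a single comparison, sketched below. The function name is hypothetical, and the default of 64 is the DDT value the text reports as the best tradeoff. The text does not say what happens when the difference equals DDT exactly; this sketch treats that case as unreliable, which is an assumption.

```python
DDT = 64  # Distance Difference Threshold value reported in the text


def distances_reliable(full_distance: int, last_distance: int,
                       ddt: int = DDT) -> bool:
    """True when the two stored distances agree closely enough to be trusted.

    Assumption: a difference exactly equal to `ddt` counts as unreliable.
    """
    return abs(full_distance - last_distance) < ddt


print(distances_reliable(120, 100))  # prints True  (difference of 20)
print(distances_reliable(200, 40))   # prints False (difference of 160)
```

When the check fails, the runahead thread would run until the L2 miss is serviced and both stored distances would be refreshed.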

We examine several different threshold values for DDT. Although all of these thresholds resulted in better performance and efficiency than the initial RDP proposal, we found that a threshold value of 64 obtains the best tradeoff between performance and extra work reduction. We show the results of this analysis in Section 5.2.1.

Deciding when to start a runahead thread

As described above, RDP decides to start a runahead thread if the value of the useful runahead distance is not zero. If it is zero, the thread is not turned into runahead mode. However, there are also useful runahead distances whose values are so small that starting a runahead thread is not necessary, for instance, executing 10 runahead instructions to issue only one long-latency load. The whole process of creating the checkpoint, starting runahead mode and restoring the context state would not be worthwhile in this case. Generally, the advance of normal execution in the reorder buffer is enough to capture these nearby loads with less complexity.

For this second approach, we propose to start a runahead thread when the predicted useful runahead distance is above a particular activation threshold. We select this threshold based on a study of the number of instructions that can be executed from the time a candidate runahead load is detected to the time the load reaches the head of the ROB. In this study, the values range between 0 and 113 instructions on average for the different threads in our SMT model. We therefore set the threshold to 32 instructions, based on the global median of the study results. If the useful distance is below this value, the possible (near) MLP can be exploited without entering runahead execution, since other possible loads would be issued in the short period before the long-latency load reaches the head of the ROB.
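A minimal sketch of this activation check follows. The function name is hypothetical, and 32 is the threshold chosen in the text. The text describes distances "above" and "below" the threshold without settling the equality case; this sketch treats a distance equal to the threshold as large enough, which is an assumption.

```python
ACTIVATION_THRESHOLD = 32  # instructions, the value chosen in the text


def should_start_runahead(predicted_distance: int,
                          threshold: int = ACTIVATION_THRESHOLD) -> bool:
    """Decide whether a predicted useful distance justifies entering runahead.

    Assumption: a distance equal to the threshold starts a runahead thread.
    """
    return predicted_distance >= threshold


print(should_start_runahead(10))   # prints False: the ROB window covers it
print(should_start_runahead(96))   # prints True
```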

Avoiding incorrect zero distances

Finally, we address the limitation of runahead threads predicted with an unsettled useful runahead distance value of zero. Certainly, many runahead periods caused by a load do not provide any prefetching benefit, i.e., their useful runahead distance values are zero. However, there is also a certain variability in the usefulness of some runahead threads, with useful runahead periods (non-zero useful distance values) interleaved with useless runahead periods (zero useful distance values). Our study shows that, on average, 38% of runahead threads have a zero distance value in at least one runahead execution, and that 26% of them increase their distance value later. According to this study, the RDP approach can set the useful runahead distance value to zero incorrectly, making it impossible to find a larger useful runahead distance even though one might exist later.

To avoid this problem, we propose an additional modification based on a simple heuristic built on the previous improvement. The heuristic works as follows: while the computed useful runahead distance for a load instruction has been lower than the runahead activation condition described above (e.g., a distance less than 32) in the last N times, R2DP discards that prediction and the processor initiates full runahead execution for that load instruction. At the end of this full runahead execution, the runahead distances (full and last) are updated with the newly obtained distance. R2DP can therefore update the RDIT information with a larger useful runahead distance should one occur. Note, however, that we enforce a limit of N attempts for this process: if full runahead execution were always enabled, we would ignore the cases in which the runahead periods are truly useless, diminishing the ability of this approach to reduce useless extra work.

An exploration of the design space, presented in Section 5.2.1, shows that a value of N=3 performs well in terms of performance and efficiency. To implement this heuristic, we simply add a 2-bit counter to each RDIT entry to count down from 3. This complementary heuristic essentially provides an additional confidence mechanism that determines whether or not a small distance is reliable.
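The counter can be sketched as below, under one possible reading of the heuristic: each time the predicted distance is small, a full runahead execution is forced and the counter is decremented, until the N attempts are used up. The class name is hypothetical. Resetting the counter when a large distance is seen is an assumption of this sketch, since the text does not describe when the counter is reloaded.

```python
class SmallDistanceGuard:
    """Down-counter that limits forced full runahead executions per load.

    Sketch only: the reset policy below is an assumption, not from the text.
    """

    def __init__(self, attempts: int = 3) -> None:
        self.attempts = attempts  # N = 3 fits in a 2-bit counter
        self.counter = attempts

    def force_full_runahead(self, predicted_distance: int,
                            threshold: int = 32) -> bool:
        """Return True if a small prediction should be overridden."""
        if predicted_distance >= threshold:
            # Assumption: a large distance reloads the counter.
            self.counter = self.attempts
            return False
        if self.counter > 0:
            # Small distance and attempts remain: distrust it, run in full.
            self.counter -= 1
            return True
        # Attempts exhausted: accept that runahead is useless for this load.
        return False


guard = SmallDistanceGuard()
print([guard.force_full_runahead(10) for _ in range(4)])
# prints [True, True, True, False]
```

After three forced full executions that still yield a small distance, the small prediction is trusted and no further runahead thread is started for that load.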

Operation of R2DP

Figure 5.3 shows a flow diagram that summarizes the R2DP mechanism of this second approach. The flowchart shows the steps from the RDIT access to the runahead distance update once runahead execution ends. As before, the first time a static load misses in the L2 cache, the RDIT has no useful information. For this reason, the corresponding runahead thread is executed until the L2 miss is fully serviced. In this case, the RDIT updates both the full and last useful distance fields with the observed useful runahead distance.

Later, when that load misses in the L2 cache again during normal thread execution, the RDIT is accessed to decide whether to execute a runahead thread, based on the information associated with the load (1). First, R2DP checks whether the recorded distances are reliable (2). To do this, R2DP computes the difference between the full and the last runahead distance values. If the difference is larger than the DDT (the two values are too far apart), the distances do not fulfill the two-distance reliability condition (|full − last| < DDT), and a runahead thread is executed until this L2-miss load is fully serviced. Both the last and full useful distance values are updated again at the end of this full-length runahead thread execution.

On the other hand, if the difference between the last and full distances is small, the distance values are considered reliable. In this case, the last runahead distance value is used as the current useful runahead distance to decide whether or not to start the runahead thread and to control how long the runahead thread should be executed (3). R2DP decides to start a runahead thread if the value of the last useful runahead distance is above the activation threshold. If so, R2DP turns the normal thread into a runahead thread that executes as many instructions as the last useful runahead distance indicates.

Figure 5.3: Flowchart of Runahead Two Distance Prediction (R2DP)

Finally, once the runahead thread has executed as many instructions as R2DP predicted, the mechanism updates the corresponding runahead distance fields in the RDIT (4) for the runahead-causing load, and the thread is restored to resume normal execution. With all the described refinements to the runahead distance predictor scheme, this new approach is able, first, to handle possible variation of the runahead distances and, second, to enhance the reliability (i.e., confidence) of the computed runahead distance.
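Steps (1) to (4) can be put together in a compact sketch. This is one possible software rendering, not the hardware design: all names are hypothetical, the order of the checks is inferred from the prose rather than from Figure 5.3, and the handling of equality cases and of the attempt counter reset are assumptions. A decision is returned as a pair (start, limit), where a limit of None means "run until the L2 miss is serviced".

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DDT = 64                   # distance difference threshold from the text
ACTIVATION_THRESHOLD = 32  # activation threshold from the text
MAX_ATTEMPTS = 3           # N from the text

Decision = Tuple[bool, Optional[int]]  # (start runahead?, instruction limit)


@dataclass
class Entry:
    full: int
    last: int
    attempts_left: int = MAX_ATTEMPTS


def decide(entry: Optional[Entry]) -> Decision:
    """Steps (1)-(3): consult the entry and choose how to run ahead."""
    if entry is None:
        return (True, None)            # first miss: no information, run in full
    if abs(entry.full - entry.last) >= DDT:
        return (True, None)            # (2) unreliable distances: run in full
    if entry.last >= ACTIVATION_THRESHOLD:
        return (True, entry.last)      # (3) bounded runahead thread
    if entry.attempts_left > 0:
        entry.attempts_left -= 1       # small distance: retry in full, N times
        return (True, None)
    return (False, None)               # small distance accepted: no runahead


def update(entry: Optional[Entry], observed: int, was_full: bool) -> Entry:
    """Step (4): record the observed useful distance after runahead ends."""
    if entry is None:
        return Entry(full=observed, last=observed)
    entry.last = observed
    if was_full:
        entry.full = observed
    if observed >= ACTIVATION_THRESHOLD:
        entry.attempts_left = MAX_ATTEMPTS  # assumption: reload the counter
    return entry


entry = update(None, 120, was_full=True)
print(decide(entry))  # prints (True, 120)
```

Keeping the decision and the update as separate functions mirrors the flow in the text, where the RDIT is read when the load misses and written only once the runahead thread finishes.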

5.1.4

Implementation issues related to runahead distance tech-