

We are motivated to implement adaptive multicore CPU protection by the varying nature of single event upsets (SEUs) in orbit, where the error rate increases near the poles and over the South Atlantic Anomaly (SAA). The adaptive implementation is intended to keep the system reliable without sacrificing performance. In this work, the independent variables are the protection mode and the duration for which a particular mode stays active; the dependent variables are reliability and performance.

A trade-off between reliability and performance can be achieved by controlling the protection mode and how long it stays active before switching to another mode. The switching takes place in a dynamic environment where the error rate changes with time, because the radiation flux varies along the orbit.

If the operation mode can be switched in real time, more redundancy, and hence more reliability, can be applied to the system when the error rate is high. When the error rate drops, a higher-performance mode (one with lighter protection techniques, or with none at all) is switched on. Switching between modes is automated and driven by the error rate.

6.1.1 Implementation of Adaptive Multicore Protection

The adaptive software works in four different modes. The first is the simple mode, in which no protection is applied and the software runs at its highest performance in terms of execution time (there is no overhead, since no protection code is added). The second mode is Instructions-TMR (ITMR), which uses compiler passes to add protection code automatically by inserting redundant instructions. The third mode is Threads-TMR (TTMR), which runs each function in three independent threads and compares their outcomes to decide the correct one. The fourth mode combines the protection of ITMR with that of TTMR.

The running mode alternates periodically depending on the error rate (Figure 6-1). During T1 the error rate is minimal, so the unprotected mode is sufficient and the code runs at its best performance. During T2 the error rate reaches high levels, requiring the combined TTMR and ITMR mode; this adds a benchmark-dependent overhead, but reliability is at its highest level in this mode. During T3 the error rate decreases, so the ITMR mode is switched on, with lower overhead than the combined protection mode. At T4 the error rate has dropped again, so the unprotected mode can be switched back on for higher performance.
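The mapping from error rate to protection mode can be sketched as a simple threshold function. This is only an illustration: the `Thresholds` struct, its field names and any numeric values are hypothetical, since the text gives no thresholds, and placing TTMR between ITMR and the combined mode is an assumption about the ordering.

```cpp
#include <cassert>

// The four operating modes named in the text.
enum class Mode { Unprotected, ITMR, TTMR, Combined };

// Hypothetical thresholds on the error rate (the source gives no values).
struct Thresholds {
    double low;     // below this rate: run unprotected
    double medium;  // below this rate: ITMR
    double high;    // below this rate: TTMR; at or above: ITMR + TTMR
};

// Picks the lightest mode whose threshold has not yet been reached.
Mode selectMode(double errorRate, const Thresholds& t) {
    assert(errorRate >= 0.0 && t.low <= t.medium && t.medium <= t.high);
    if (errorRate < t.low)    return Mode::Unprotected;
    if (errorRate < t.medium) return Mode::ITMR;
    if (errorRate < t.high)   return Mode::TTMR;  // assumed ordering
    return Mode::Combined;
}
```

Calling `selectMode` once per switching period with the current error rate would reproduce the T1 to T4 sequence described above, provided the thresholds are tuned to the mission profile.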


Figure 6-1 Alternating Protection Mode in Real Time

At the start, a periodic function of time is created to represent the change of the error rate in orbit. Depending on the value of this error function, the corresponding protection mode is called and executed. Calling the TTMR protection mode creates three threads and runs them. If the function running in the threads returns values, the values are compared to determine the correct ones. If the function is continuous (runs indefinitely), the compare function is called periodically in a separate thread in order to detect and correct errors. The threads are created with the pthread library from C++, in a manner that avoids data races when the shared variables are modified to detect and recover from errors. The whole process is shown in the flowchart in Figure 6-1.
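As an illustration of the periodic error function, the sketch below models the error rate as a raised cosine that oscillates between a background rate and a peak once per period. The shape of the curve, the `OrbitModel` struct and all of its parameters are assumptions made for this example; the text only states that the function is periodic in time.

```cpp
#include <cmath>

// Hypothetical parameters of the periodic error-rate model.
struct OrbitModel {
    double baseRate;   // background error rate away from high-flux regions
    double peakExtra;  // additional rate at the worst point of the period
    double periodSec;  // length of one period in seconds
};

// Raised-cosine model: equals baseRate at t = 0 and
// baseRate + peakExtra at t = periodSec / 2, repeating every periodSec.
double errorRate(const OrbitModel& m, double tSec) {
    const double kPi = 3.14159265358979323846;
    const double phase = 2.0 * kPi * tSec / m.periodSec;
    return m.baseRate + m.peakExtra * 0.5 * (1.0 - std::cos(phase));
}
```

A measured flux profile could replace this model without changing the code that consumes the returned rate.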

6.1.2 Adaptive Protection Modes

6.1.2.1 Instructions-TMR Mode

The LLVM compiler is chosen as the basis of this protection mode: LLVM analysis and transformation passes have been created to automatically add protection code to the intermediate representation of any code supported by LLVM [12]. Users of our software protection method do not have to write a single line of protection code; all they have to do is compile their unprotected code through our passes, and the code will be protected. There are two passes, an analysis pass and a transformation pass. The analysis pass goes through the code line by line to determine the types of all instructions and provides statistical information about them. The transformation pass uses the information provided by the analysis to call the appropriate protection technique.
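The effect of instruction triplication can be pictured at source level, although the actual passes operate on LLVM intermediate representation. The sketch below is not the code the passes emit: `majority` and `tmrAdd` are hypothetical names, and the bitwise majority voter is one common choice, assumed here because the text does not say which voter is inserted.

```cpp
#include <cstdint>

// Bitwise majority of three copies: each output bit takes the value
// held by at least two of the three inputs.
inline std::uint32_t majority(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return (a & b) | (a & c) | (b & c);
}

// Source-level analogue of a triplicated add: the operation is performed
// three times and the results are voted on. `volatile` keeps the compiler
// from merging the three copies in this illustration.
std::uint32_t tmrAdd(std::uint32_t x, std::uint32_t y) {
    volatile std::uint32_t r0 = x + y;
    volatile std::uint32_t r1 = x + y;
    volatile std::uint32_t r2 = x + y;
    return majority(r0, r1, r2);
}
```

A single upset that corrupts one of the three copies is masked by the vote, which is the property the redundant instructions are meant to provide.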

6.1.2.2 Threads-TMR Mode

This technique uses multithreading to run code on a multicore platform, where three redundant threads run on three different processing cores. At the end of their execution, the outcomes of the threads are compared for error detection and recovery.
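A minimal sketch of this mode, for a function that returns a value, is shown below. It uses `std::thread` from the C++ standard library as a stand-in for direct pthread calls, does not pin threads to cores (placement is left to the operating system scheduler), and the names `majorityVote` and `runTTMR` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <thread>

// Two-out-of-three vote. Returns false when all three results differ.
bool majorityVote(const std::array<int, 3>& r, int& out) {
    if (r[0] == r[1] || r[0] == r[2]) { out = r[0]; return true; }
    if (r[1] == r[2])                 { out = r[1]; return true; }
    return false;
}

// Runs `work` in three redundant threads and votes on their results.
bool runTTMR(const std::function<int()>& work, int& out) {
    std::array<int, 3> results{};
    std::array<std::thread, 3> workers;
    for (std::size_t i = 0; i < 3; ++i) {
        // Each thread writes only its own slot, so no lock is needed here.
        workers[i] = std::thread([&results, &work, i] { results[i] = work(); });
    }
    for (auto& w : workers) w.join();
    return majorityVote(results, out);
}
```

When one replica returns a corrupted value, the vote still yields the value shared by the other two; if all three disagree, the caller has to fall back to some other recovery step.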

6.1.2.3 Combined Protection Mode

TTMR creates three parallel threads. The threads execute functions whose instructions have been triplicated using the ITMR protection technique explained previously. The functions either execute a finite number of times and then update or return values, or run indefinitely and keep updating their variables.


creation; if so, they are used in the threads, otherwise the threads use the initial values of these variables (Figure 6-3).

The function inside the threads is checked. If it is infinite, the threads are paused every n seconds and wait for the voter to detect and correct their values. There is also a periodic check for thread timeouts: if there is no timeout, the thread variables are updated and execution continues; if there is a timeout, the threads are restarted using their last correct values.
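The periodic vote over the variables of a continuously running function can be sketched as a small shared-state class. The class name `ReplicaState` and its interface are hypothetical, and falling back to the last correct value when no two replicas agree is an assumption; the text prescribes that fallback only for the timeout case.

```cpp
#include <array>
#include <mutex>

// Holds one copy of a variable per redundant thread plus the last voted value.
class ReplicaState {
public:
    explicit ReplicaState(long initial) : lastGood_(initial) { copies_.fill(initial); }

    // Called by worker thread `id` (0, 1 or 2) to publish its current value.
    void update(int id, long value) {
        std::lock_guard<std::mutex> lock(m_);
        copies_[id] = value;
    }

    // Called by the voter while the workers are paused. Overwrites a
    // disagreeing copy with the majority value and records it as last good.
    // Returns false, and restores the last good value, if all copies differ.
    bool voteAndRepair() {
        std::lock_guard<std::mutex> lock(m_);
        long winner = 0;
        if (copies_[0] == copies_[1] || copies_[0] == copies_[2]) {
            winner = copies_[0];
        } else if (copies_[1] == copies_[2]) {
            winner = copies_[1];
        } else {
            copies_.fill(lastGood_);  // assumed recovery policy
            return false;
        }
        copies_.fill(winner);
        lastGood_ = winner;
        return true;
    }

    long read(int id) {
        std::lock_guard<std::mutex> lock(m_);
        return copies_[id];
    }

    long lastGood() {
        std::lock_guard<std::mutex> lock(m_);
        return lastGood_;
    }

private:
    std::mutex m_;
    std::array<long, 3> copies_;
    long lastGood_;
};
```

The mutex serialises the workers' updates and the voter's repair, which is one way to avoid the data races on shared variables mentioned earlier.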

If the function inside the threads executes a finite number of times and no timeout occurs, the threads are joined and their values are returned for error detection and correction. If a timeout occurs, the threads are terminated and restarted using their last updated variables, if they exist (Figure 6-2).
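The timeout check for a finite function can be sketched with standard C++ futures. This is an approximation under stated assumptions: `runWithTimeout` is a hypothetical helper, and because standard C++ offers no way to terminate a thread, the late worker is detached and abandoned here, whereas the text describes the threads as being terminated.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Runs `work` in its own thread and waits at most `timeout` for its result.
// On success stores the result in `value` and returns true. On timeout
// leaves `value` (the last correct value) untouched and returns false,
// so the caller can restart the computation from it.
bool runWithTimeout(const std::function<int()>& work,
                    std::chrono::milliseconds timeout, int& value) {
    std::packaged_task<int()> task(work);
    std::future<int> result = task.get_future();
    std::thread(std::move(task)).detach();  // worker is not killed, only abandoned
    if (result.wait_for(timeout) != std::future_status::ready) {
        return false;
    }
    value = result.get();
    return true;
}
```

In the TTMR setting this check would wrap each of the three redundant threads, with the vote applied only to the results that arrived in time.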


Figure 6-3 TTMR Mode of Operation
