The run-time behaviour of ODE can be divided into the following stages:
1. Thread start: As in double execution, the leading and trailing threads are started when a PE becomes available.
2. Thread execution: While the threads execute, the leading thread’s PE buffers all twrite/twritep/tdecrease instructions in the write buffer and the OWM buffer. Simultaneously, the core probe creates the CRC-32 signature, as in double execution. The core probe of the trailing thread’s core also creates the signature of its output.
3. Thread end: When the leading thread has finished execution, indicated by tdestroy, the PE’s core probe sends the CRC-32 signature to the FDU. The leading thread then commits immediately, without waiting for the trailing thread, and the synchronisation counts of the succeeding threads are decremented at once. The TSU can start new threads as soon as their synchronisation counts have reached zero.
4. Output comparison: The FDU waits for the signatures of both the leading and the trailing thread and compares them.
5. Thread commit or recovery: If the signatures match, the TSU can continue execution. If a fault is detected, the FDU triggers the global recovery mechanism described in Section 6.2.
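The ordering of stages 3 to 5 can be sketched in Python. This is only an illustration under stated assumptions, not a model of the hardware: `signature`, `run_ode`, the dictionary standing in for memory, and the representation of a thread's output as a list of (address, value) pairs are all hypothetical. Only the CRC-32 signature and the commit-before-compare ordering are taken from the text.

```python
import zlib

def signature(writes):
    """CRC-32 over a thread's buffered writes, given as (address, value) pairs."""
    crc = 0
    for addr, value in writes:
        crc = zlib.crc32(addr.to_bytes(8, "little"), crc)
        crc = zlib.crc32(value.to_bytes(8, "little"), crc)
    return crc

def run_ode(leading_writes, trailing_writes, memory):
    """Return 'ok' or 'recover'; memory is updated optimistically."""
    # Stage 3: the leading thread ends, sends its signature and commits at once.
    sig_leading = signature(leading_writes)
    for addr, value in leading_writes:
        memory[addr] = value  # unverified commit
    # Stage 4: the comparison happens once the trailing signature is available.
    sig_trailing = signature(trailing_writes)
    # Stage 5: continue on a match, otherwise trigger global recovery.
    return "ok" if sig_leading == sig_trailing else "recover"
```

Note that in this sketch the memory already contains the leading thread's output when a mismatch is reported, which is why a global recovery mechanism is needed rather than simply discarding a buffer.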
In contrast to double execution, ODE always schedules the trailing threads with lower priority than the leading threads, because only the leading threads can commit their (unverified) results and thereby guarantee global progress of the system.
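One way to picture this priority rule is a ready queue ordered by thread role. The sketch below is an assumption made for illustration: the text does not specify how the TSU implements priorities, and `ReadyQueue`, `LEADING`, and `TRAILING` are hypothetical names.

```python
import heapq
from itertools import count

LEADING, TRAILING = 0, 1  # lower number means higher priority

class ReadyQueue:
    """Ready queue that always hands out leading threads before trailing ones."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker keeps FIFO order within one role

    def push(self, thread_id, role):
        heapq.heappush(self._heap, (role, next(self._order), thread_id))

    def pop(self):
        role, _, thread_id = heapq.heappop(self._heap)
        return thread_id, role
```

With this ordering, a trailing thread is only dispatched when no leading thread is waiting, so verification work never delays a thread that can make global progress.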
6.1.3 Input Replication
As in double execution, the input data must be replicated for the leading and trailing threads. Since the TF of a data-flow thread is immutable and cannot be modified after the synchronisation count has reached zero, explicit duplication of the TF is not necessary, even when the leading thread commits.
Figure 6.2: Execution stages of ODE.

Under double execution, stores to the OWM region do not require explicit input replication, because data dependencies in the data-flow graph prevent concurrent OWM stores to the same memory location. Data-flow threads that can execute in parallel therefore cannot write to the same OWM addresses, while the OWM writes of redundant threads are isolated from each other in the OWM buffer. ODE, however, allows the leading thread to write (unverified and possibly erroneous) output to the OWM region, and the trailing thread may later require values that have been overwritten there. This can lead to input incoherence between the trailing and leading threads.

To prevent input incoherence for OWM reads of redundant threads, the OWM section accessed by a thread must be copied before the leading thread’s commit can change it. To keep the copied section as small as possible, an additional instruction (owm mem) is executed at the beginning of the thread to inform the TSU of the start address and the size of the OWM section. The owm mem instruction can be added by the compiler or the programmer and must be executed in both redundant threads. The TSU then initiates a replication of this section in the OWM region to prevent input incoherence between the trailing and the leading threads.
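The incoherence problem can be made concrete with a small Python example. Everything here is hypothetical and chosen only for illustration: `thread_body` is an invented thread that reads one OWM word and produces its increment, and dictionaries stand in for the OWM region and its copy.

```python
def thread_body(owm, addr):
    """Hypothetical thread: reads an OWM word and produces its increment."""
    return owm[addr] + 1

# Without replication: the leading thread commits before the trailing thread reads.
owm = {0x100: 41}
leading_out = thread_body(owm, 0x100)
owm[0x100] = leading_out                  # optimistic commit overwrites the input
trailing_out = thread_body(owm, 0x100)    # reads the overwritten value
incoherent = leading_out != trailing_out  # mismatch although no fault occurred

# With replication: the accessed section is copied before the commit.
owm = {0x100: 41}
owm_copy = dict(owm)                      # snapshot taken before the commit
leading_out = thread_body(owm, 0x100)
owm[0x100] = leading_out
trailing_out = thread_body(owm_copy, 0x100)
coherent = leading_out == trailing_out
```

In the first half, the redundant threads disagree even though neither was faulty, which would trigger an unnecessary recovery. In the second half, the trailing thread reads from the snapshot and reproduces the leading thread's result.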
The TSU manages the mapping between the original OWM section and the copied section in an OWM mapping table, depicted in Figure 6.3. The mapping table stores the ID of the trailing thread, the pointer to the original section, the pointer to the copied section, and the length of the section.
Figure 6.3: OWM mapping table and pointers to the original and the copied OWM sections.

When owm mem is executed for the first time, the TSU searches the OWM mapping table for an entry with the trailing thread’s ID. If no such entry exists, the TSU copies the section specified by the owm mem instruction using a DMA transfer and allocates a new entry in the OWM mapping table. When the trailing thread executes the same owm mem instruction, the TSU searches the OWM mapping table again for the entry of the corresponding thread and forwards that entry to the core probe. The core probe then takes care of the mapping and prevents the PE from writing to the original OWM region.
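The lookup-or-copy behaviour of the mapping table can be sketched as follows. This is a simplified illustration under several assumptions: `OwmMappingTable`, `OwmEntry`, the method name `owm_mem`, the flat list standing in for the OWM region, and the bump allocator for copies are all hypothetical. The list slice copy merely stands in for the DMA transfer.

```python
from dataclasses import dataclass

@dataclass
class OwmEntry:
    trailing_id: int  # ID of the trailing thread
    original: int     # start address of the original OWM section
    copy: int         # start address of the copied section
    length: int       # length of the section in words

class OwmMappingTable:
    def __init__(self, memory, copy_base):
        self.memory = memory        # flat list standing in for the OWM region
        self.next_free = copy_base  # hypothetical bump allocator for copies
        self.entries = {}           # trailing thread ID -> OwmEntry

    def owm_mem(self, trailing_id, start, length):
        """Handle an owm mem call issued by either redundant thread."""
        entry = self.entries.get(trailing_id)
        if entry is None:
            # First call: copy the section (stands in for the DMA transfer).
            dst = self.next_free
            self.memory[dst:dst + length] = self.memory[start:start + length]
            self.next_free += length
            entry = OwmEntry(trailing_id, start, dst, length)
            self.entries[trailing_id] = entry
        # Later call: the existing entry is returned, i.e. forwarded to the core probe.
        return entry
```

A second call with the same trailing thread ID returns the existing entry without copying again, so the snapshot is unaffected by any commit that happened in between.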
The core probe monitors all OWM read addresses of a PE and translates them to the appropriate physical addresses of the leading or trailing thread. As a consequence, the leading and trailing threads have different physical OWM addresses in the system. This prevents the leading thread from serving as a prefetcher for the trailing thread, since redundant threads read OWM input from different physical addresses.
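The address translation performed by the core probe might look like the sketch below. It assumes that the trailing thread is redirected to the copied section while the leading thread keeps reading the original one; the text only states that the two threads end up with different physical addresses. `Mapping` and `translate_read` are hypothetical names.

```python
from typing import NamedTuple

class Mapping(NamedTuple):
    original: int  # start address of the original OWM section
    copy: int      # start address of the copied section
    length: int    # section length in addressable units

def translate_read(addr, mapping, is_trailing):
    """Map an OWM read address to the physical address used by this thread."""
    inside = mapping.original <= addr < mapping.original + mapping.length
    if is_trailing and inside:
        # Assumption: the trailing thread reads from the copy at the same offset.
        return mapping.copy + (addr - mapping.original)
    return addr  # leading thread, or address outside the mapped section
```

Because the two threads resolve the same logical address to different physical locations, a cache line fetched by the leading thread cannot be reused by the trailing thread.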
Reducing the OWM Copy Overhead

Copying the trailing thread’s shared memory section can induce execution overhead, since the OWM section specified in the owm mem instruction must be copied before the leading thread can commit. To avoid delaying the start of the trailing thread, the copying of the OWM section is started as soon as the redundant threads become ready for execution and are moved to the RQ.