
Pipeline interlocking is a mechanism that detects a hazard and resolves it. This mechanism is necessary to preserve the original data dependencies specified by a sequence of instructions. Moreover, an interlock prevents instructions from being executed in the wrong order.

Pipeline interlocking is usually implemented by the interlock unit of a processor. This unit constantly monitors the processor’s internal resources, such as the pipeline registers and computational units required by the instructions processed in each pipeline stage, and performs several operations. For in-order dual-issue processors, these include the following:

Dual-issue detection: the interlock unit detects whether two decoded instructions can be dual-issued, issuing a nop instruction together with the first instruction in the negative case.

Multi-cycle instructions: the interlock unit ensures in-order execution of instructions by stalling the pipeline, when necessary, while instructions requiring more than one clock cycle are being processed.

Structural dependency check: the interlock unit checks the functional units required by decoded instructions and, if necessary, stalls the pipeline until the needed functional unit is available.

Data dependency check: the interlock unit checks the data required by decoded instructions and, if necessary:

1. activates appropriate selection of forwarding paths, if the data is available in a register of the pipeline;

2. stalls the pipeline until the needed data is available, in case the data is being processed.

Dual-issue detection is typically implemented, at a higher level, as a condition composed of several clauses, each one corresponding to a specific case where the dual-issue of instructions is possible (or not possible). All possible combinations of instructions that cannot be dual-issued are reported in the processor’s documentation.

When a condition precluding dual-issue is detected, the control logic activates suitable control signals that propagate a nop instruction in the pipeline and prevent one of the instructions from being issued (it will be issued in the following time slot).
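The clause-based condition described above can be illustrated with a minimal Python sketch. It is not taken from any processor manual: the opcode-to-unit mapping `UNIT_OF`, the set `SINGLE_INSTANCE_UNITS`, and the two clauses are assumptions chosen only to mirror the cases discussed in this section (a single multiplier, and a read-after-write dependency inside the pair).

```python
from dataclasses import dataclass

# Hypothetical mapping from opcode to the functional unit it needs.
UNIT_OF = {"mul": "multiplier", "div": "divider", "add": "alu", "xor": "alu"}
# Hypothetical: functional units that exist only once in the processor.
SINGLE_INSTANCE_UNITS = {"multiplier", "divider"}


@dataclass(frozen=True)
class Instr:
    opcode: str
    dest: str = ""
    srcs: tuple = ()


NOP = Instr("nop")


def can_dual_issue(first: Instr, second: Instr) -> bool:
    """Return True when no clause precluding dual-issue matches."""
    # Clause 1 (assumed): both instructions need the same single-instance unit.
    unit_first = UNIT_OF.get(first.opcode)
    unit_second = UNIT_OF.get(second.opcode)
    if unit_first == unit_second and unit_first in SINGLE_INSTANCE_UNITS:
        return False
    # Clause 2 (assumed): the second instruction reads the result of the first.
    if first.dest and first.dest in second.srcs:
        return False
    return True


def issue_slot(first: Instr, second: Instr):
    """Return (pipe A instruction, pipe B instruction, deferred instruction)."""
    if can_dual_issue(first, second):
        return first, second, None
    # Negative case: a nop is issued together with the first instruction,
    # and the second instruction waits for the following time slot.
    return first, NOP, second
```

With two `mul` instructions, `issue_slot` pairs the first one with `NOP` and defers the second, which is the behavior a fault in the detection logic would disturb.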

1  mul r1, r2, r3   ; Pipeline A
2  mul r4, r5, r6   ; Pipeline A (no dual-issue)
   ...
3  add r7, r1, r4

Fig. 2.20 Example of instruction schedule on a dual-issue processor

In case of faults affecting the dual-issue detection, this behavior can be disturbed, resulting in the following unexpected effects:

1. instructions that can be dual-issued are actually delayed and unnecessary stalls are introduced;

2. instructions, that are normally delayed, are issued.

In the first case, the performance of the processor is affected, and faults causing this are classified as “performance faults”. Such faults can be detected by using performance counters, available in many processors, that can for example count the number of stalls.

In the second case, the faults may change the data flow of the instructions. In some situations, an instruction is skipped, so the following instructions that read data produced by the skipped instruction compute wrong results. For example, let us consider two consecutive mul instructions executed by a dual-issue processor equipped with a single multiplication unit, as in the code snippet of Fig. 2.20. The mul instructions (1, 2) cannot be dual-issued; however, in case of fault, instruction 2 may be skipped. In this specific case, the following add instruction (3) needs the result of 2 and will use an old and most likely wrong value in its computation. The same happens if instruction 1 is skipped.

In other situations, the internal registers of the pipeline may be changed, thus corrupting the results of other instructions in the pipeline that use data from forwarding paths. This cascade effect may eventually lead to unexpected run-time exceptions.
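The effect of a skipped instruction on the sequence of Fig. 2.20 can be reproduced with a small functional model. The following Python sketch is only illustrative: the initial register values in `INITIAL` are hypothetical, and the fault is modeled simply as an instruction that never writes its result.

```python
import operator

OPS = {"mul": operator.mul, "add": operator.add}

# Instructions 1-3 of Fig. 2.20 as (opcode, destination, source 1, source 2).
FIG_2_20 = [
    ("mul", "r1", "r2", "r3"),
    ("mul", "r4", "r5", "r6"),
    ("add", "r7", "r1", "r4"),
]

# Hypothetical operand values; the stale contents of r1 and r4 (zero here)
# differ from the products, so a skipped mul becomes visible in r7.
INITIAL = {"r1": 0, "r2": 3, "r3": 5, "r4": 0, "r5": 7, "r6": 11, "r7": 0}


def run(program, regs, skipped=()):
    """Execute the program in order, dropping the instruction numbers in `skipped`."""
    regs = dict(regs)
    for number, (op, dest, src_a, src_b) in enumerate(program, start=1):
        if number in skipped:
            continue  # modeled fault: the instruction never writes its result
        regs[dest] = OPS[op](regs[src_a], regs[src_b])
    return regs


golden = run(FIG_2_20, INITIAL)
faulty_skip_2 = run(FIG_2_20, INITIAL, skipped={2})
faulty_skip_1 = run(FIG_2_20, INITIAL, skipped={1})
print(golden["r7"], faulty_skip_2["r7"], faulty_skip_1["r7"])
```

In both faulty runs the add reads a stale register, so `r7` differs from the fault-free value. This only holds when the operands are chosen so that the stale value differs from the expected product.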

According to these considerations, the proposed functional test strategy aimed at detecting faults affecting the dual-issue detection feature is the following:

1. Preliminary setup operations:

(a) to initialize (or deactivate) modules affecting the performance, such as the Branch Prediction Unit (BPU) and the caches, in order to create a deterministic environment with respect to the execution time;

1  div r1, r2, r3   ; Pipeline A
2  xor r4, r5, r6   ; Pipeline B
3  xor r2, r3, r5   ; Pipeline A (delayed)
   ...
4  add r7, r1, r4
5  add r7, r7, r2

Fig. 2.21 Example of instruction schedule including a multi-cycle instruction

(b) to initialize performance counters when available in the processor;

2. Test each pair of instructions that cannot be dual-issued:

(a) to prepare the operands with values suitable to observe the result in the signature computation;

(b) to perform a reset of the pipelines (refer to Section 2.3.2);

(d) to use the data produced by the two instructions in the signature computation.

3. Read the value of the timer or performance counters and update the signature.

In case of in-order multi-issue processors, each pair of instructions is executed on all combinations of pipelines.
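As an illustration of steps 2 and 3, the Python sketch below enumerates the pipeline placements for each instruction pair and folds both the data results and a stall count into a signature. The rotate-and-XOR compaction in `update_signature` is an assumption made for this example, since the signature function is not specified here; including same-pipeline placements such as (A, A) in `pipeline_placements` is likewise an assumption.

```python
from itertools import product


def pipeline_placements(pairs, pipelines=("A", "B")):
    """List every (first, pipe, second, pipe) placement for each instruction pair."""
    return [
        (first, pipe_first, second, pipe_second)
        for first, second in pairs
        for pipe_first, pipe_second in product(pipelines, repeat=2)
    ]


def update_signature(signature, value, width=32):
    """Hypothetical compaction: rotate left by one bit, then XOR the new value."""
    mask = (1 << width) - 1
    rotated = ((signature << 1) | (signature >> (width - 1))) & mask
    return rotated ^ (value & mask)


def compute_signature(results, stall_count):
    """Fold the data results first, then the performance counter value."""
    signature = 0
    for value in results:
        signature = update_signature(signature, value)
    return update_signature(signature, stall_count)


# Same data results, different stall counts: the signatures differ,
# so a pure performance fault is still observable.
expected = compute_signature([15, 77, 92], stall_count=1)
with_extra_stall = compute_signature([15, 77, 92], stall_count=2)
```

Folding the counter value last means that two runs with identical data results but different stall counts always yield different signatures, as long as both counts fit in the signature width.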

The proposed strategy can be extended to other interlocking features, such as the management of multi-cycle instructions and structural dependency check (or hazards).

In many processors, specific execution hardware is not fully pipelined, thus executing particular instructions may require additional clock cycles. This is the case, for example, of multiplications and especially divisions. In case a multi-cycle instruction is being processed, the interlock unit stalls the pipeline until the instruction (and the other one dual-issued) completes. We can consider a div instruction followed by other single-cycle instructions as a significant scenario, as reported in Fig. 2.21.

In the example, let us consider the div instruction 1 executed on the A pipeline and the instruction 2 that is dual-issued. Normally, the following instruction (3) is delayed to wait for instruction 1. In case of fault, the result of 1, 2, or 3 may change, thus corrupting the results of the following instructions (4,5).
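The expected timing of such a sequence, which is what a performance counter would be compared against, can be estimated with a simple issue model. The Python sketch below assumes that each issue slot occupies the pipeline for as long as its slowest instruction; the latency values in `LATENCY` are hypothetical, and the instructions elided by "..." in Fig. 2.21 are ignored.

```python
# Hypothetical latencies in clock cycles; unlisted opcodes take one cycle.
LATENCY = {"div": 8, "mul": 3}


def issue_cycles(slots):
    """Return the issue cycle of each slot and the total cycle count.

    Each slot is a tuple of opcodes issued together. The pipeline is assumed
    to stall until the slowest instruction of the slot completes.
    """
    cycle = 0
    cycles = []
    for slot in slots:
        cycles.append(cycle)
        cycle += max(LATENCY.get(opcode, 1) for opcode in slot)
    return cycles, cycle


# Fig. 2.21: instructions 1 and 2 dual-issued, then 3, 4, and 5 one per slot.
slots = [("div", "xor"), ("xor",), ("add",), ("add",)]
cycles, total = issue_cycles(slots)
stall_cycles = total - len(slots)
```

Under these assumptions, instruction 3 is issued only after the div completes, and the number of stall cycles equals the div latency minus one. A fault that removes or lengthens the stall would change this count.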

1  load r4, patternA   ; Pipeline A
2  add r3, r0, r0      ; Pipeline B (r0 != r4)

Fig. 2.22 Test sequence that applies the first pattern of Fig. 2.10 to two CMPs involved in data-dependency check

Structural hazards may exist in a dual-issue processor with asymmetric pipelines, where certain functional units are present in only one of the pipelines. As an example, with a single multiplier, a multiply instruction can only be issued in one of the pipelines; thus, it is delayed when the instruction order does not place it on that pipeline. In case of fault, the instructions are corrupted similarly to the previous examples.

Step 2 of the presented algorithm can be extended by adding:

1. a test for each multi-cycle instruction;

2. a test for each possible structural dependency.

The last feature worth analyzing is the data dependency check. The test of this feature follows the strategy described in Section 2.2.2, extended for dual-issue processors. To detect whether a data dependency is present in the pipeline, the register identifiers, which are encoded in the instruction operating code, are compared between different pipeline stages by means of comparators (CMPs). In detail, the number of CMPs is equal to the number of input operands for each pipeline stage involved in the forwarding mechanism. For example, the EX_A OP1 (analyzed in Section 2.3.5) includes a series of CMPs that check the operand against the possible sources in every pipeline stage where these values may be produced (see Fig. 2.17). Accordingly, one input of each of these CMPs is connected to the operand identifier (i.e., a register) encoded in the instruction and used by the instruction in one stage, while the other input is the identifier of the output register in one of the possible source stages, where the other instruction of the pair assumed to have a data dependency is placed.
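A behavioral view of these comparators may help. The Python sketch below models one operand whose identifier is compared against the destination identifiers held in the candidate source stages, with an optional stuck-at fault on a comparator output. The stage names and the priority order are hypothetical and are not taken from Fig. 2.17.

```python
def cmp_equal(id_a, id_b, stuck_at=None):
    """Comparator output; `stuck_at` (0 or 1) models a fault on that output."""
    if stuck_at is not None:
        return bool(stuck_at)
    return id_a == id_b


def select_forwarding(operand_id, producers, faults=None):
    """Pick the source of an operand.

    `producers` is a list of (stage name, destination register id) pairs,
    ordered from the youngest to the oldest candidate source stage.
    `faults` optionally maps a stage name to a stuck-at value for its CMP.
    """
    faults = faults or {}
    for stage, dest_id in producers:
        if dest_id is None:
            continue  # no valid destination in this stage (e.g. a nop)
        if cmp_equal(operand_id, dest_id, faults.get(stage)):
            return stage
    return "register_file"


# Hypothetical stage names: operand r0 (000) against a producer of r4 (100).
producers = [("MEM_A", 0b100)]
fault_free = select_forwarding(0b000, producers)
stuck_at_1 = select_forwarding(0b000, producers, faults={"MEM_A": 1})
```

In the fault-free case the operand is read from the register file. With the comparator output stuck at 1, a false dependency is signaled and the wrong forwarding path is selected.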

In Section 2.3.5, systematic patterns for CMPs (see Fig. 2.10) are transformed into a sequence of assembly instructions. In the following, the strategy is adapted to dual-issue processors.

As an example, it is possible to apply the first pattern of Fig. 2.10 to the two CMPs that verify data-dependency between two consecutive instructions, which are supposed to be dual-issued, with the instructions reported in Fig. 2.22.

Setup phase
  Load all registers with different values
  Reset of pipelines (see Section 2.3.2)

Test phase
                     ; Pipe   CMP input 1   CMP input 2
  load r4, value     ; A      --            --
  nop                ; B      --            --
  add r2, r0, r4     ; A      r4 (100)      r0 (000)
  nop                ; B      --            --
  add r1, r0, r2     ; A      r2 (010)      r0 (000)
  nop                ; B      --            --
  add r0, r0, r1     ; A      r1 (001)      r0 (000)
  nop                ; B      --            --
  add r0, r4, r0     ; A      r0 (000)      r4 (100)
  nop                ; B      --            --
  add r0, r2, r0     ; A      r0 (000)      r2 (010)
  nop                ; B      --            --
  add r7, r1, r0     ; A      r0 (000)      r1 (001)
  nop                ; B      --            --
  add r0, r7, r0     ; A      r7 (111)      r7 (111)
  nop                ; B      --            --
  add r0, r0, r0     ; A      r0 (000)      r0 (000)
  nop                ; B      --            --

  Compact registers and compute test signature.

Fig. 2.23 Implementation of the test algorithm for the EX_A OP1 CMP of an example in-order dual-issue processor with 8 registers

In the example, the first input of the CMP is r4 (100₂), while the second input is r0 (000₂). Normally, the two instructions do not present any data dependency, thus they are dual-issued. In case of fault, the CMP may signal a data dependency and a stall may be inserted in the pipeline. Again, the effect of the fault may be observed by means of performance counters.

A complete application of the test patterns to the EX_A OP1 CMP can be performed with a suitable sequence of instructions interleaved with nop instructions, aimed at scheduling the test instructions only on pipeline A, as depicted in Fig. 2.23.
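The CMP input columns of Fig. 2.23 can be cross-checked against the instruction listing. The Python sketch below assumes, as an inference from the figure rather than a stated rule, that CMP input 1 is the destination register of the previous instruction on pipeline A and CMP input 2 is the first source operand of the current one. It also shows the interleaving with nop instructions.

```python
# Pipeline A instructions of Fig. 2.23 as (opcode, destination, source 1, source 2).
LISTING = [
    ("load", "r4", None, None),
    ("add", "r2", "r0", "r4"),
    ("add", "r1", "r0", "r2"),
    ("add", "r0", "r0", "r1"),
    ("add", "r0", "r4", "r0"),
    ("add", "r0", "r2", "r0"),
    ("add", "r7", "r1", "r0"),
    ("add", "r0", "r7", "r0"),
    ("add", "r0", "r0", "r0"),
]


def reg_bits(name):
    """Encode a register name such as 'r4' as a 3-bit identifier string."""
    return format(int(name[1:]), "03b")


def cmp_inputs(listing):
    """Derive (CMP input 1, CMP input 2) for each instruction after the first.

    Assumption: input 1 is the previous instruction's destination and
    input 2 is the current instruction's first source operand.
    """
    return [
        (reg_bits(previous[1]), reg_bits(current[2]))
        for previous, current in zip(listing, listing[1:])
    ]


def interleave_with_nops(listing):
    """Pair every test instruction (pipeline A) with a nop (pipeline B)."""
    schedule = []
    for instruction in listing:
        schedule.append(("A", instruction))
        schedule.append(("B", ("nop", None, None, None)))
    return schedule


for input_1, input_2 in cmp_inputs(LISTING):
    print(input_1, input_2)
```

Under this assumption, the derived pairs match the two CMP input columns of the figure row by row: three one-hot values on the first input, three on the second, then the all-ones and all-zeros pairs.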
