INDIRECT “QUASI-SEARCH” CONTROL - CLASSIFICATION OF MPP SEARCH ALGORITHMS


one cycle earlier (one less stage to go through). If there were data hazards from loads to other instructions, the change would help eliminate some stall cycles.

     Instructions Executed   Cycles with 5 Stages   Cycles with 4 Stages   Speedup
a.   5                       4 + 5 = 9              3 + 5 = 8              9/8 = 1.13
b.   4                       4 + 4 = 8              3 + 4 = 7              8/7 = 1.14

4.14.3 Stall-on-branch delays the fetch of the next instruction until the branch is executed. When branches execute in the EXE stage, each branch causes two stall cycles. When branches execute in the ID stage, each branch only causes one stall cycle. Without branch stalls (e.g., with perfect branch prediction) there are no stalls, and the execution time is 4 plus the number of executed instructions. We have:

     Instructions Executed   Branches Executed   Cycles with Branch in EXE   Cycles with Branch in ID   Speedup
a.   5                       1                   4 + 5 + 1 × 2 = 11          4 + 5 + 1 × 1 = 10         11/10 = 1.10
b.   4                       1                   4 + 4 + 1 × 2 = 10          4 + 4 + 1 × 1 = 9          10/9 = 1.11
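The cycle counts in this table all follow one formula: 4 cycles to fill the 5-stage pipeline, one cycle per executed instruction, plus the stall cycles charged to each branch. A minimal Python sketch of that arithmetic follows; the function name `cycles_with_branch_stalls` and its parameters are illustrative and not part of the exercise.

```python
def cycles_with_branch_stalls(instructions, branches, stalls_per_branch, stages=5):
    """Total cycles: pipeline fill + one cycle per instruction + branch stalls."""
    fill = stages - 1  # extra cycles needed to fill the pipeline
    return fill + instructions + branches * stalls_per_branch


# Row a: 5 instructions, 1 branch
exe_a = cycles_with_branch_stalls(5, 1, stalls_per_branch=2)  # branch executes in EXE
id_a = cycles_with_branch_stalls(5, 1, stalls_per_branch=1)   # branch executes in ID
print(exe_a, id_a, round(exe_a / id_a, 2))

# Row b: 4 instructions, 1 branch
exe_b = cycles_with_branch_stalls(4, 1, stalls_per_branch=2)
id_b = cycles_with_branch_stalls(4, 1, stalls_per_branch=1)
print(exe_b, id_b, round(exe_b / id_b, 2))
```

Passing `stalls_per_branch=0` gives the no-stall case described in the text (4 plus the number of executed instructions).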

4.14.4 The number of cycles for the (normal) 5-stage and the (combined EX/MEM) 4-stage pipeline is already computed in 4.14.2. The clock cycle time is equal to the latency of the longest-latency stage. Combining EX and MEM stages affects clock time only if the combined EX/MEM stage becomes the longest-latency stage:

     Cycle Time with 5 Stages   Cycle Time with 4 Stages   Speedup
a.   200ps (IF)                 210ps (MEM + 20ps)         (9 × 200)/(8 × 210) = 1.07
b.   200ps (ID, EX, MEM)        220ps (MEM + 20ps)         (8 × 200)/(7 × 220) = 1.04
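Execution time is the cycle count multiplied by the clock cycle time, so each speedup compares two such products. The sketch below redoes that arithmetic in Python; `speedup` and its argument names are illustrative, and the cycle counts and cycle times are the ones listed in the table.

```python
def speedup(old_cycles, old_cycle_ps, new_cycles, new_cycle_ps):
    """Ratio of old execution time to new execution time (both in ps)."""
    return (old_cycles * old_cycle_ps) / (new_cycles * new_cycle_ps)


# Row a: 9 cycles at 200ps (5 stages) vs. 8 cycles at 210ps (4 stages)
speedup_a = speedup(9, 200, 8, 210)
# Row b: 8 cycles at 200ps (5 stages) vs. 7 cycles at 220ps (4 stages)
speedup_b = speedup(8, 200, 7, 220)
print(f"{speedup_a:.2f} {speedup_b:.2f}")
```

A result above 1 means the shorter pipeline still wins: the saved cycle outweighs the longer clock period in both rows.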

4.14.5

     New ID Latency   New EX Latency   New Cycle Time   Old Cycle Time        Speedup
a.   180ps            140ps            200ps (IF)       200ps (IF)            (11 × 200)/(10 × 200) = 1.10
b.   300ps            190ps            300ps (ID)       200ps (ID, EX, MEM)   (10 × 200)/(9 × 300) = 0.74

4.14.6 The cycle time remains unchanged: a 20ps reduction in EX latency has no effect on clock cycle time because EX is not the longest-latency stage. The change does affect execution time because it adds one additional stall cycle to each branch. Because the clock cycle time does not improve but the number of cycles increases, the speedup from this change will be below 1 (a slowdown). In 4.14.3 we already computed the number of cycles when the branch is in the EX stage. We have:

     Cycles with Branch in EX   Execution Time (Branch in EX)   Cycles with Branch in MEM   Execution Time (Branch in MEM)   Speedup
a.   4 + 5 + 1 × 2 = 11         11 × 200ps = 2200ps             4 + 5 + 1 × 3 = 12          12 × 200ps = 2400ps              0.92
b.   4 + 4 + 1 × 2 = 10         10 × 200ps = 2000ps             4 + 4 + 1 × 3 = 11          11 × 200ps = 2200ps              0.91
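The slowdown comes entirely from the extra stall cycle per branch, since the clock stays at 200ps. A rough Python sketch, assuming (as the cycle counts in 4.14.3 and 4.14.6 do) that a branch executed in ID, EX, or MEM costs 1, 2, or 3 stall cycles; the table `STALLS_BY_STAGE` and the function name are hypothetical helpers introduced for illustration.

```python
# Stall cycles per branch, by the stage in which the branch executes.
STALLS_BY_STAGE = {"ID": 1, "EX": 2, "MEM": 3}


def execution_time_ps(instructions, branches, branch_stage, cycle_ps=200):
    """Execution time of a 5-stage pipeline with stall-on-branch, in ps."""
    cycles = 4 + instructions + branches * STALLS_BY_STAGE[branch_stage]
    return cycles * cycle_ps


# Row a: 5 instructions, 1 branch
t_ex_a = execution_time_ps(5, 1, "EX")
t_mem_a = execution_time_ps(5, 1, "MEM")
print(t_ex_a, t_mem_a, round(t_ex_a / t_mem_a, 2))

# Row b: 4 instructions, 1 branch
t_ex_b = execution_time_ps(4, 1, "EX")
t_mem_b = execution_time_ps(4, 1, "MEM")
print(t_ex_b, t_mem_b, round(t_ex_b / t_mem_b, 2))
```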

Solution 4.15

4.15.1

a. This instruction behaves like a normal load until the end of the MEM stage. After that, it behaves like an ADD, so we need another stage after MEM to compute the result, and we need additional wiring to get the value of Rt to this stage.

b. This instruction behaves like a load until the end of the MEM stage. After that, we need another stage to compare the value against Rt. We also need to add an input to the PC Mux that takes the value of Rd, and the Mux select signal must now include the result of the new comparison. We also need an extra read port in Registers because the instruction needs three registers to be read.

4.15.2

a. We need to add a control signal that selects what the new stage does (just pass the value from memory through, or add the register value to it).

b. We need a control signal similar to the existing “Branch” signal to control whether or not the new comparison is allowed to affect the PC. We also need to add one bit to the control signal that selects whether the target address is PC + 4 + Offs or the register value.

4.15.3

a. The addition of a new stage either adds new forwarding paths (from the new stage to EX) or (if there is no forwarding) makes a stall due to a data hazard one cycle longer. Additionally, this instruction produces its result only at the end of the new stage, so even with forwarding it introduces a data hazard that requires a two-cycle stall if the ADDM instruction is immediately followed by a data-dependent instruction.

b. The addition of a new stage either adds new forwarding paths (from the new stage to EX) or (if there is no forwarding) makes a stall due to a data hazard one cycle longer. The instruction itself creates a control hazard that leaves the next PC unknown until the BEQM instruction leaves the new stage, which is two cycles longer than for a normal BEQ.

4.15.4

a. LW Rd,Offs(Rs)
   ADD Rd,Rt,Rd

E.g., ADDM can be used when trying to compute a sum of array elements.

b. LW Rtmp,Offs(Rs)
   BNE Rtmp,Rt,Skip
   JR Rd
   Skip:

E.g., BEQM can be used when trying to determine if an array has an element with a specific value.
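To make the translation in (a) concrete, the following Python sketch models registers and memory as dictionaries and executes ADDM as the LW/ADD pair, then uses it to sum an array. Everything here is an assumption made for illustration: the `lw`, `add`, and `addm` helpers, the operand order of `addm`, the register names used as dictionary keys, and the memory keyed by byte address are not part of the exercise's datapath.

```python
def lw(regs, mem, rd, offs, rs):
    """LW Rd,Offs(Rs): load the word at address Rs + Offs into Rd."""
    regs[rd] = mem[regs[rs] + offs]


def add(regs, rd, rs, rt):
    """ADD Rd,Rs,Rt: Rd = Rs + Rt."""
    regs[rd] = regs[rs] + regs[rt]


def addm(regs, mem, rd, rt, offs, rs):
    """ADDM modeled as the two micro-operations from 4.15.4(a)."""
    lw(regs, mem, rd, offs, rs)  # LW  Rd,Offs(Rs)
    add(regs, rd, rt, rd)        # ADD Rd,Rt,Rd


# Sum a 4-element array stored at byte addresses 100, 104, 108, 112.
mem = {100: 3, 104: 5, 108: 7, 112: 11}
regs = {"Rs": 100, "Rt": 0, "Rd": 0}
for _ in range(4):
    addm(regs, mem, "Rd", "Rt", 0, "Rs")
    regs["Rt"] = regs["Rd"]  # keep the running sum in Rt
    regs["Rs"] += 4          # advance to the next word
total = regs["Rt"]
print(total)
```

BEQM could be modeled the same way from the LW/BNE/JR sequence in (b), with the program counter kept as one more entry in `regs`.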

4.15.5 The instruction can be translated into simple MIPS-like micro-operations (see 4.15.4 for a possible translation). These micro-operations can then be executed by the processor with a “normal” pipeline.

4.15.6 We will compute the execution time for every replacement interval. The old execution time is simply the number of instructions in the replacement interval (CPI of 1). The new execution time is the number of instructions after we made the replacement, plus the number of added stall cycles. The new number of instructions is the number of instructions in the original replacement interval, plus the new instruction, minus the number of instructions it replaces:

     New Execution Time      Old Execution Time   Speedup
a.   30 − (2 − 1) + 2 = 31   30                   0.97
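The row above applies the rule from the text: the new instruction count is the interval length plus the one new instruction minus the instructions it replaces, and the added stall cycles go on top. A small Python sketch with hypothetical parameter names:

```python
def replacement_speedup(interval, replaced, added_stalls):
    """Speedup (old time / new time) at CPI 1 for one replacement interval."""
    old_time = interval                         # one cycle per original instruction
    new_instructions = interval + 1 - replaced  # add the new one, drop the replaced ones
    new_time = new_instructions + added_stalls
    return old_time / new_time


# Row a: interval of 30 instructions, 2 instructions replaced, 2 stall cycles added
result = replacement_speedup(30, replaced=2, added_stalls=2)
print(round(result, 2))
```

With these numbers the two added stall cycles cost more than the one instruction saved, so the result falls below 1.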

Solution 4.16

4.16.1 For every instruction, the IF/ID register keeps the PC + 4 and the instruction word itself. The ID/EX register keeps all control signals for the EX, MEM, and WB stages, PC + 4, the two values read from Registers, the sign-extended lowermost 16 bits of the instruction word, and the Rd and Rt fields of the instruction word (even for instructions whose format does not use these fields). The EX/MEM register keeps control signals for the MEM and WB stages, the PC + 4 + Offset (where Offset is the sign-extended lowermost 16 bits of the instruction, even for instructions that have no offset field), the ALU result and the value of its Zero output, the value that was read from the second register in the ID stage (even for instructions that never need this value), and the number of the destination register (even for instructions that need no register writes; for these instructions the number of the destination register is simply a “random” choice between Rd and Rt). The MEM/WB register keeps the WB control signals, the value read from memory (or a “random” value if there was no memory read), the ALU result, and the number of the destination register.

4.16.2

     Need to be Read   Actually Read
a.   R6, R16           R6, R16
b.   R1, R0            R1, R0

4.16.3

     EX            MEM
a.   −100 + R6     Write value to memory
b.   R1 OR R0      Nothing

4.16.4

a. 2: LW R2,16(R2)     WB
   2: SLT R1,R2,R4     EX   MEM  WB
   2: BEQ R1,R9,Loop   ID   EX   MEM  WB
   3: ADD R1,R2,R1     IF   ID   EX   MEM  WB
   3: LW R2,0(R1)           IF   ID   EX   MEM  WB
   3: LW R2,16(R2)               IF   ID   ***  EX   MEM
   3: SLT R1,R2,R4                    IF   ***  ID   ***
   3: BEQ R1,R9,Loop                            IF   ***

b. LW R1,0(R1)      WB
   LW R1,0(R1)      EX   MEM  WB
   BEQ R1,R0,Loop   ID   ***  EX   MEM  WB
   LW R1,0(R1)      IF   ***  ID   EX   MEM  WB
   AND R1,R1,R2               IF   ID   ***  EX   MEM  WB
   LW R1,0(R1)                     IF   ***  ID   EX   MEM
   LW R1,0(R1)                               IF   ID   ***
   BEQ R1,R0,Loop                                 IF   ***

4.16.5 In a particular clock cycle, a pipeline stage is not doing useful work if it is stalled or if the instruction going through that stage is not doing any useful work there. In the pipeline execution diagram from 4.16.4, a stage is stalled if its name is not shown for a particular cycle, and stages in which the particular instruction is not doing useful work are marked in red. Note that a BEQ instruction is doing useful work in the MEM stage, because it is determining the correct value of the next instruction’s PC in that stage. We have:

     Cycles per Loop Iteration   Cycles in Which All Stages Do Useful Work   % of Cycles in Which All Stages Do Useful Work
a.   7                           1                                           14%
b.   8                           2                                           25%

4.16.6 The address of that first instruction of the third iteration (PC + 4 for the BEQ from the previous iteration) and the instruction word of the BEQ from the

In document JUAN PABLO MONTENEGRO ROSERO WILMAR JOHNNY SÁNCHEZ (pages 34-38)