
The lfetch instruction requests that lines be moved between different levels of the memory hierarchy. Like all hint instructions in IA-64, lfetch has no effect on program correctness, and any microarchitecture implementation of IA-64 may choose to ignore it.

9.3 Instruction Dependencies

Data and control dependencies are fundamental factors in optimization and instruction scheduling. Such dependencies can prevent a compiler from scheduling instructions in an order that would yield shorter critical paths and better resource usage since they restrict the placement of instructions relative to other instructions on which they are dependent.

In general, memory references are the major source of control and data dependencies that cannot be broken: breaking a data dependency can produce a wrong answer, and breaking a control dependency can raise a fault that should not be raised. This section describes:

• Background material on memory reference dependencies.

• Descriptions of how dependencies constrain code scheduling on traditional architectures.

Section 9.4 describes IA-64 memory reference features that increase the number of dependencies that can be removed by a compiler.

9.3.1 Control Dependencies

An instruction is control dependent on a branch if the direction taken by the branch affects whether the instruction is executed. In the code below, the load instruction is control dependent on the branch:

    (p1) br.cond some_label
         ld8 r4=[r5]

The following sections provide overviews of control dependencies and their effects on optimization.

9.3.1.1 Instruction Scheduling and Control Dependencies

The code below contains a control dependency at the branch instruction:

         add r7=r6,1              // Cycle 0
         add r13=r25,r27
         cmp.eq p1,p2=r12,r23
    (p1) br.cond some_label ;;
         ld4 r2=[r3] ;;           // Cycle 1
         sub r4=r2,r11            // Cycle 3

A compiler cannot safely move the load instruction before the branch unless it can guarantee that the moved load will not cause a fatal program fault or otherwise corrupt program state. Since the load cannot be moved upward, the schedule cannot be improved using normal code motion. Thus, the branch creates a barrier to instructions whose execution depends upon it. In Figure 9-1, the load in block B cannot be moved up because of a conditional branch at the end of block A.
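The hazard behind this barrier can be illustrated with a C-level analogue. This is a minimal sketch, not code from the original text: the function name guarded_load and its fallback parameter are hypothetical. The load from p is control dependent on the null test, so hoisting it above the branch would dereference a null pointer on a path where the original program performs no load at all.

```c
#include <stddef.h>

/* Hypothetical helper: the load of *p is control dependent on the branch. */
long guarded_load(const long *p, long fallback)
{
    if (p == NULL) {
        /* Branch taken: the load below must not execute on this path. */
        return fallback;
    }
    /* Moving this load above the test would fault whenever p is NULL. */
    return *p;
}
```

Calling guarded_load(NULL, 7) returns the fallback without touching memory, which is exactly the behavior an unsafe upward code motion would break.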

9.3.2 Data Dependencies

A data dependency exists between an instruction that accesses a register or memory location and another instruction that alters the same register or location.

9.3.2.1 Basics of Data Dependency

The following basic terms describe data dependencies between instructions:

Write-after-write (WAW)

A dependency between two instructions that write to the same register or memory location.

Write-after-read (WAR)

A dependency between two instructions in which an instruction reads a register or memory location that a subsequent instruction writes.

Read-after-write (RAW)

A dependency between two instructions in which an instruction writes to a register or memory location that is read by a subsequent instruction.

Ambiguous memory dependencies

Dependencies between a load and a store, or between two stores, where it cannot be determined if the involved instructions access overlapping memory locations. Ambiguous memory references include possible WAW, WAR, or RAW dependencies.

Independent memory references

References by two or more memory instructions that are known not to have conflicting memory accesses.
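The three named dependency kinds follow mechanically from whether each of two ordered accesses reads or writes. The sketch below encodes that rule in C; the type dep_kind and the function classify_dependency are hypothetical names introduced for illustration. It assumes both accesses are already known to touch the same register or memory location, so the ambiguous case is deliberately not modeled.

```c
#include <stdbool.h>

typedef enum { DEP_NONE, DEP_RAW, DEP_WAR, DEP_WAW } dep_kind;

/* Classify the dependency between an earlier and a later access to the
 * same location, given whether each access writes (true) or reads (false). */
dep_kind classify_dependency(bool first_writes, bool second_writes)
{
    if (first_writes && second_writes)  return DEP_WAW; /* write, then write */
    if (first_writes && !second_writes) return DEP_RAW; /* write, then read  */
    if (!first_writes && second_writes) return DEP_WAR; /* read, then write  */
    return DEP_NONE;                                    /* two reads never conflict */
}
```

For example, a store followed by a load of the same location gives classify_dependency(true, false), which yields DEP_RAW.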

Figure 9-1. Control Dependency Preventing Code Motion (diagram: block A ends with a conditional branch, br; the load sits in block B)

9.3.2.2 Data Dependency in IA-64

The IA-64 architecture requires the programmer to insert stops between RAW and WAW register dependencies to ensure correct code results. For example, in the code below, the add instruction computes a value in r4 needed by the sub instruction:

    add r4=r5,r6 ;;    // Instruction group 1
    sub r7=r4,r9       // Instruction group 2

The stop after the add instruction terminates one instruction group so that the sub instruction can legally read r4.

On the other hand, IA-64 implementations are architecturally required to observe memory-based dependencies within an instruction group. In a single instruction group, a program can contain memory-based data dependent instructions and hardware will produce the same results as if the instructions were executed sequentially and in program order. The pseudo-code below demonstrates a memory dependency that will be observed by hardware:

    mov r16=1
    mov r17=2 ;;
    st8 [r15]=r16
    st8 [r14]=r17 ;;

If the address in r14 is equal to the address in r15, uni-processor hardware guarantees that the memory location will contain the value in r17 (2). The following RAW dependency is also legal in the same instruction group even if software is unable to determine whether the memory locations addressed by r1 and r2 overlap:

    st8 [r1]=x
    ld4 y=[r2]

9.3.2.3 Instruction Scheduling and Data Dependencies

The dependency rules are sufficient to generate correct code, but to generate efficient code, the compiler must take into account the latencies of instructions. For example, the generic implementation has a two-cycle latency to the first level data cache. In the code below, the stop maintains correct ordering, but a use of r2 is scheduled only one cycle after its load:

    add r7=r6,1              // Cycle 0
    add r13=r25,r27
    cmp.eq p1,p2=r12,r23 ;;
    add r11=r13,r29          // Cycle 1
    ld4 r2=[r3] ;;
    sub r4=r2,r11            // Cycle 3

Since the latency of a load is two cycles, the sub instruction will stall until cycle three. To avoid a stall, the compiler can move the load earlier in the schedule so that the machine can perform useful work each cycle:

    ld4 r2=[r3]              // Cycle 0
    add r7=r6,1
    add r13=r25,r27
    cmp.eq p1,p2=r12,r23 ;;
    add r11=r13,r29 ;;       // Cycle 1
    sub r4=r2,r11            // Cycle 2

In this code, there are enough independent instructions to move the load earlier in the schedule to make better use of the functional units and reduce execution time by one cycle.
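The cycle counts in the two schedules can be checked with simple arithmetic: a consumer issues at the later of its scheduled cycle and the cycle in which the loaded value becomes available. The C sketch below is a simplified model under that single assumption; the function name consumer_issue_cycle is hypothetical, and the model ignores every constraint other than the load-to-use latency.

```c
/* Earliest cycle at which the consumer of a load can issue: it waits
 * until the load result is ready, but never issues before the cycle
 * the compiler scheduled it in. */
int consumer_issue_cycle(int load_cycle, int load_latency, int scheduled_cycle)
{
    int ready_cycle = load_cycle + load_latency;
    return ready_cycle > scheduled_cycle ? ready_cycle : scheduled_cycle;
}
```

With the two-cycle load latency of the generic implementation, a load in cycle 1 whose use is scheduled for cycle 2 gives consumer_issue_cycle(1, 2, 2), so the sub stalls until cycle 3. Moving the load to cycle 0 gives consumer_issue_cycle(0, 2, 2), so the sub issues in cycle 2 with no stall.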

Now suppose that the original code sequence contained an ambiguous memory dependency between a store instruction and the load instruction:

    add r7=r6,1              // Cycle 0
    add r13=r25,r27
    cmp.ne p1,p2=r12,r23 ;;
    st4 [r29]=r13            // Cycle 1
    ld4 r2=[r3] ;;
    sub r4=r2,r11            // Cycle 3

In this case, the load cannot be moved past the store due to the memory dependency. Stores will cause data dependencies if they cannot be disambiguated from loads or other stores.

In the absence of other architectural support, stores can prevent moving loads and their dependent instructions. The following C language statements could not be reordered unless ptr1 and ptr2 were statically known to point to independent memory locations:

    *ptr1 = 6;
    x = *ptr2;
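To see why the reordering is unsafe, the two orders can be written out as separate functions and compared. This is an illustrative sketch: the function names store_then_load and load_then_store are hypothetical, and the second function is a hand-reordered variant standing in for what an overly aggressive scheduler would produce.

```c
/* Program order: store through ptr1, then load through ptr2. */
int store_then_load(int *ptr1, int *ptr2)
{
    *ptr1 = 6;
    return *ptr2;
}

/* Reordered variant: the load is hoisted above the store.
 * Equivalent to the function above only when the pointers do not alias. */
int load_then_store(int *ptr1, int *ptr2)
{
    int x = *ptr2;
    *ptr1 = 6;
    return x;
}
```

When ptr1 and ptr2 point to different objects, both functions return the same value. When they point to the same int, store_then_load returns 6 while load_then_store returns the old contents, so the reordering changes the program's result.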

9.4 Using IA-64 Speculation to Overcome Dependencies

Both data and control dependencies constrain optimization of program code. IA-64 provides support for two basic techniques used to overcome dependencies:

Data speculation

Allows a load and possibly its uses to be moved across ambiguous memory writes.

Control speculation

Allows a load and possibly its uses to be moved across a branch on which the load is control dependent.
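The idea behind data speculation can be approximated in plain C: perform the load early, then check after the store whether the store touched the loaded location, and redo the load if it did. The sketch below is only a software analogy under stated assumptions, not a description of how IA-64 implements the feature; the function name speculative_store_then_load is hypothetical, and the address comparison assumes both pointers refer to whole, aligned int objects.

```c
/* Software analogy of data speculation for the sequence
 *     *ptr1 = 6;  x = *ptr2;
 * The load is hoisted above the ambiguous store and verified afterwards. */
int speculative_store_then_load(int *ptr1, int *ptr2)
{
    int x = *ptr2;       /* speculative load, moved above the store     */
    *ptr1 = 6;           /* the store from the original program order   */
    if (ptr1 == ptr2) {  /* check: did the store hit the loaded address? */
        x = *ptr2;       /* recovery: repeat the load to get the new value */
    }
    return x;
}
```

The function returns the same value as the original program order whether or not the pointers alias, while letting the load start before the store in the common non-aliasing case.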