
Figure 5.4: The simulator function unit pipeline has three stages. The Fetch and Write stages are one cycle each. The Execute stage shown is one cycle, but its length can be specified by the user. Results are bypassed from one Execute stage to the next so that one thread can issue an operation on every cycle.

the same cycle may complete on different cycles. For any given thread, all operations in an instruction are fetched simultaneously, and only after all operations from the previous instruction have issued. There are no branch delay slots, and a branch operation stalls the branching thread until the target address is available. Other threads may continue to execute.

5.1.4 Function Unit Models

As shown in Figure 5.5, the simulator's representation of a function unit contains an ALU pipeline, a result register, an operation cache, and the functions to select and sequence operations from active threads. The thread control functions access all of the active thread data structures to determine which operation to issue to the pipeline. ALU results are stored in the Result Register while waiting to be sent to the interconnection module.

During the fetch stage, a function unit fetches operations from each thread that has successfully issued all operations from its previous instruction. Operations are retrieved from the operation cache and placed in the operation buffer.

During the execute phase, a function unit examines each active thread's operation buffer. If an operation that has its register requirements satisfied is found, it is issued to the execution pipeline. If no valid operation is found, a nop is sent instead. The pipeline then advances one stage. Threads are prioritized by time of creation so that the first thread always has the best chance of issuing an operation.
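The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the simulator's code: the names `Operation`, `Thread`, `select_operation`, and the representation of register readiness as a set are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

NOP = "nop"  # placeholder sent down the pipeline when nothing can issue


@dataclass
class Operation:
    name: str
    source_regs: tuple  # registers that must hold valid data before issue


@dataclass
class Thread:
    created_at: int                                # lower value = older thread
    ready_regs: set = field(default_factory=set)   # registers with valid data (assumed model)
    op_buffer: Optional[Operation] = None          # operation waiting at this function unit


def select_operation(threads):
    """Pick the operation one function unit issues this cycle.

    Threads are scanned oldest first, so the first thread created always
    gets the first chance to issue. Returns (thread, operation), or
    (None, NOP) when no thread has a ready operation.
    """
    for thread in sorted(threads, key=lambda t: t.created_at):
        op = thread.op_buffer
        if op is not None and all(r in thread.ready_regs for r in op.source_regs):
            thread.op_buffer = None  # the operation leaves the buffer once issued
            return thread, op
    return None, NOP
```

With two threads where the older one is still waiting on a register, the younger thread's operation is issued instead, which is the behavior that lets other threads make progress while one is blocked.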

[Figure 5.5 diagram labels: ALU Pipeline, Result Register, Cluster Interconnection Network Module]

Figure 5.5: Simulator function unit model. Each function unit has an operation pointer and an operation buffer for each thread. The register file for a thread is shared with other function units in the cluster. The operation cache is shared by all active threads.

In the register write phase, the function unit attempts to send its result to the appropriate register file. If the register write fails, then the function unit stalls until the resources to perform the write can be obtained on a later cycle. Register writes that succeed can also bypass the register file so that operations requiring the data can issue immediately.

The compiler guarantees that each function unit will execute only appropriate operations. Function unit types are int, float, mem, mov, and branch. Each of these units is described below.

* Arithmetic Units: The integer and floating point units (int and float) execute arithmetic and logical operations. In addition, the imov (integer unit move) and fmov (floating point unit move) operations allow an operand to pass through the ALU unchanged so that it can be transferred to another cluster.

[Figure 5.5 diagram labels: Unit Control; Shared Operation Cache; Operation Pointer, Operation Buffer, and Register File for each of Threads 1 through N]

64 CHAPTER 5. PROCESSOR COUPLING SIMULATION

* Memory Unit: The memory unit (mem) issues requests to the memory system and sends the results to the Cluster Interconnection Network. Only simple addressing modes that can be expressed by adding two operands are allowed. Operands may be immediate constants or they may reside in registers.

* Move Unit: The move unit (mov) is responsible for transferring data between different clusters' register files. The compiler can be directed to use imov and fmov operations in the arithmetic units instead, rendering the move unit unnecessary.

* Branch Unit: The branch unit (branch) executes flow of control operations such as conditional and unconditional branches, as well as thread control operations such as fork and exit. Branch operations cause the other function units to stall until the target address is available. Optimizations such as branch prediction or delay slots are not used.
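As a small illustration of the memory unit's addressing restriction, the sketch below computes an effective address as the sum of exactly two operands, each either an immediate constant or a register. The names `Reg`, `Imm`, and `effective_address`, and the list used as a register file, are hypothetical and not taken from the simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reg:
    index: int  # position in the (assumed) list-based register file


@dataclass(frozen=True)
class Imm:
    value: int  # immediate constant encoded in the operation


def operand_value(operand, register_file):
    """Resolve an operand to an integer value."""
    if isinstance(operand, Imm):
        return operand.value
    return register_file[operand.index]


def effective_address(a, b, register_file):
    """The only addressing mode modeled here: the sum of two operands."""
    return operand_value(a, register_file) + operand_value(b, register_file)
```

Base-plus-offset, register-plus-register, and absolute addressing (an immediate plus an immediate zero) all fit this two-operand form.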

5.1.5 Cluster Communication

The function units request register file ports and bus access from the Cluster Interconnection Network (CIN) module, which arbitrates when conflicts occur. Different configurations can be selected to explore the effects of restricting interconnection bandwidth and the number of register ports between clusters. After arbitration, the CIN module sends acknowledgments to those units that are granted use of the buses and ports; it then routes the data to the appropriate register files. In some network configurations all of the function units may write results simultaneously, while more restrictive schemes require blocked units to stall until the ports or buses become available.

The communication specifier in the PCS configuration file consists of four fields: interconnection type, number of global buses, number of total register ports, and number of local register ports. The number of remote transfers that may take place simultaneously is limited by the number of buses. The number of register ports specifies the total number of write ports for a register file. The number of local register ports indicates how many of the register file write ports are reserved for use within the cluster. Figure 5.6 shows a configuration with two local buses, four register ports, and two locally reserved register ports.

[Figure 5.6 diagram labels: Cluster, Function Units, Local Register Write Ports, Register Write Ports, Register File, Global Buses]

Figure 5.6: Interconnection network buses and ports. Writes to the local register file can use any of the register file ports. Remote writes may not use the local register write ports.
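The four fields of the communication specifier can be pictured as a small record. The sketch below is an assumption-laden illustration: the class name `CommSpec`, its field names, and the validation rules are hypothetical, and the actual syntax of the PCS configuration file is not reproduced here. The three type names are the interconnection types the text goes on to describe.

```python
from dataclasses import dataclass

# Interconnection type names used in the text.
INTERCONNECT_TYPES = ("full", "input", "sharedbus")


@dataclass(frozen=True)
class CommSpec:
    interconnect: str   # interconnection type
    global_buses: int   # limits simultaneous remote transfers
    total_ports: int    # write ports per register file
    local_ports: int    # write ports reserved for writes from within the cluster

    def __post_init__(self):
        # Sanity checks chosen for this sketch, not taken from the simulator.
        if self.interconnect not in INTERCONNECT_TYPES:
            raise ValueError(f"unknown interconnection type: {self.interconnect!r}")
        if self.global_buses < 0:
            raise ValueError("number of global buses cannot be negative")
        if not 0 <= self.local_ports <= self.total_ports:
            raise ValueError("local ports must be between 0 and total ports")

    @property
    def remote_ports(self):
        """Write ports a remote cluster may use: those not reserved locally."""
        return self.total_ports - self.local_ports


# Two buses, four register ports, two of them reserved for local writes.
# The interconnection type chosen here is an arbitrary example value.
spec = CommSpec("sharedbus", global_buses=2, total_ports=4, local_ports=2)
```

Under this reading, `spec.remote_ports` is 2: local writes may use any of the four ports, while remote writes are confined to the two that are not reserved.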

The three main interconnection types each have their own arbitration functions. The type full specifies that all function units are fully connected and there are no shared resources. The type input indicates that register writes between function units must compete for register file ports but not for global buses. The type sharedbus is similar to input except that writes must compete for access to shared buses and register file ports. Function units have identification numbers generated from the configuration file and request resources from the CIN in that order.
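One way to picture the difference between the three types is a single-cycle arbiter that serves requests in function unit ID order. The sketch below is a simplified reading of the description, not the simulator's arbitration code. `WriteRequest`, `arbitrate`, and the greedy policy in which a local write takes a reserved port before a shared one are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteRequest:
    unit_id: int      # function unit identification number
    src_cluster: int  # cluster containing the requesting unit
    dst_cluster: int  # cluster whose register file is written


def arbitrate(requests, interconnect, global_buses, total_ports, local_ports):
    """Return the set of unit IDs granted this cycle; all others stall."""
    if interconnect == "full":
        return {r.unit_id for r in requests}  # no shared resources to fight over
    if interconnect not in ("input", "sharedbus"):
        raise ValueError(f"unknown interconnection type: {interconnect!r}")

    buses_free = global_buses
    reserved_free = {}  # per destination cluster: locally reserved write ports
    shared_free = {}    # per destination cluster: ports usable by any cluster
    granted = set()

    for req in sorted(requests, key=lambda r: r.unit_id):
        dst = req.dst_cluster
        reserved_free.setdefault(dst, local_ports)
        shared_free.setdefault(dst, total_ports - local_ports)

        if req.src_cluster == dst:
            # Local write: any port will do (assumed: reserved ports first).
            if reserved_free[dst] > 0:
                reserved_free[dst] -= 1
            elif shared_free[dst] > 0:
                shared_free[dst] -= 1
            else:
                continue  # no port left, the unit stalls
        else:
            # Remote write: may not use the locally reserved ports.
            if shared_free[dst] == 0:
                continue
            if interconnect == "sharedbus":
                if buses_free == 0:
                    continue  # no global bus left, the unit stalls
                buses_free -= 1
            shared_free[dst] -= 1
        granted.add(req.unit_id)
    return granted
```

For example, with three remote writers and one local writer targeting the same register file (four ports, two reserved locally), the input type grants two remote writes plus the local one, while sharedbus with a single bus grants only one remote write plus the local one.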

5.1.6 Memory System Model

The memory system module interface consists of a request and reply port for each memory unit. On a memory read, a memory unit sends an address to the request port. Some number of cycles later the result is returned on the reply port of the memory unit in the destination cluster. The destination cluster may be local or remote. On a memory write, a memory unit sends both address and data to the memory system via the request port;
