The writeback buffer (WBB) is an eight-entry buffer that stores the 64-byte dirty data lines evicted from the L2-cache. On a miss, the replacement algorithm picks a line for eviction. The evicted lines are streamed out to the DRAM opportunistically. An instruction whose cache line address matches the address of an entry in the WBB is inserted into the miss buffer. This instruction must wait for the entry in the WBB to be written to the DRAM before it can enter the L2-cache pipe.

The WBB is divided into a RAM portion, which stores the evicted data until it can be written to the DRAM, and a CAM portion, which contains the address.

The WBB has a 64-byte read interface with the scdata array and a 64-bit write interface with the DRAM controller. The WBB reads from the scdata array faster than it can flush data out to the DRAM controller.
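The split between the CAM portion (addresses) and the RAM portion (data) can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class and method names (`WritebackBuffer`, `insert`, `cam`, `drain_one`) are hypothetical, the drain order is arbitrary, and no timing is modeled. Only the entry count and the line size come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE_BYTES = 64    # size of an evicted line, from the text
WBB_ENTRIES = 8    # eight-entry buffer, from the text


@dataclass
class WbbEntry:
    line_addr: int  # CAM portion: cache line address
    data: bytes     # RAM portion: the 64-byte dirty line


class WritebackBuffer:
    """Toy model of the WBB; all names here are hypothetical."""

    def __init__(self) -> None:
        self.entries: List[Optional[WbbEntry]] = [None] * WBB_ENTRIES

    def insert(self, addr: int, data: bytes) -> bool:
        """Store an evicted dirty line; return False if the buffer is full."""
        if len(data) != LINE_BYTES:
            raise ValueError("an evicted line must be 64 bytes")
        for i, entry in enumerate(self.entries):
            if entry is None:
                self.entries[i] = WbbEntry(addr // LINE_BYTES, data)
                return True
        return False

    def cam(self, addr: int) -> bool:
        """Return True if addr falls in a line still waiting in the WBB.

        A hit means the instruction is dependent and would be placed in
        the miss buffer until the entry has been written to the DRAM.
        """
        line = addr // LINE_BYTES
        return any(e is not None and e.line_addr == line for e in self.entries)

    def drain_one(self) -> Optional[WbbEntry]:
        """Model one write to the DRAM by freeing an occupied entry."""
        for i, entry in enumerate(self.entries):
            if entry is not None:
                self.entries[i] = None
                return entry
        return None
```

In this model, an address lookup that hits in `cam` stays blocked until `drain_one` has released the matching entry, which mirrors the dependency rule described for the WBB.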

4.1.2.11 Remote DMA Write Buffer

The remote DMA (RDMA) write buffer is a four-entry buffer that accommodates the cache line for a 64-byte DMA write. The output interface is with the DRAM controller that it shares with the WBB. The WBB has a direct input interface with the JBI.

4.1.2.12 L2-Cache Directory

Each L2-cache directory has 2048 entries, with one entry per L1 tag that maps to a particular L2-cache bank. Half of the entries correspond to the L1 instruction-cache (icache) and the other half of the entries correspond to the L1 data-cache (dcache). The L2 directory participates in coherency management and it also maintains the inclusive property of the L2-cache.

The L2-cache directory also ensures that the same line is not resident in both the icache and the dcache (across all CPUs). The L2-cache directory is written in the C5 cycle of a load or an I-miss that hits the L2-cache, and is cammed in the C5 cycle of a store/streaming store operation that hits the L2-cache. The lookup operation is performed in order to invalidate all the SPARC L1-caches that own the line other than the SPARC core that performed the store.

The L2-cache directory is split into an icache directory (icdir) and a dcache directory (dcdir), which are both similar in size and functionality.

The L2-cache directory is written only when a load is performed. On certain data accesses (loads, stores and evictions), the directory is cammed to determine whether the data is resident in the L1-caches. The result of this CAM operation is a set of match bits, which are encoded to create an invalidation vector that is sent back to the SPARC CPU cores to invalidate the L1-cache lines. Descriptions of these data accesses are as follows:

■ Loads – The icdir is cammed to maintain I/D exclusivity. The dcdir is updated to reflect the load data that fills the L1-cache.

■ IFetch – The dcdir is cammed to maintain the I/D exclusivity. The icdir is updated to reflect the instruction data that fills the L1-cache.

■ Stores – Both directories are cammed, which ensures that (1) if the store is to instruction space, the L1 icache invalidates the line and does not pick up stale data; (2) if a line is shared across SPARC CPUs, the L1 dcaches of the other CPUs invalidate the line and do not pick up stale data; and (3) the issuing CPU has the most current information on the validity of its line.

■ Evictions from the L2-cache – Both directories are cammed to invalidate any line that is no longer resident in the L2-cache.
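The four cases above can be summarized in a small behavioral model. This is a sketch only: the class `L2Directory`, its methods, and the representation of each directory as a set of `(cpu, line)` pairs are hypothetical, and the result is returned as sets of CPU numbers rather than an encoded invalidation vector. It also assumes that a store leaves the issuing CPU's dcache copy valid while invalidating every icache copy.

```python
class L2Directory:
    """Toy model of the icdir/dcdir pair; all names are hypothetical."""

    def __init__(self):
        self.icdir = set()  # (cpu, line) pairs resident in L1 icaches
        self.dcdir = set()  # (cpu, line) pairs resident in L1 dcaches

    def _cam(self, directory, line, keep=None):
        """Remove matching entries (except `keep`) and return their CPUs."""
        hits = {e for e in directory if e[1] == line and e != keep}
        directory.difference_update(hits)
        return {cpu for cpu, _ in hits}

    def access(self, kind, line, cpu=None):
        """Apply one access and return the L1 copies to invalidate."""
        inval = {"icache": set(), "dcache": set()}
        if kind == "load":
            # CAM the icdir for I/D exclusivity, then record the dcache fill.
            inval["icache"] = self._cam(self.icdir, line)
            self.dcdir.add((cpu, line))
        elif kind == "ifetch":
            # CAM the dcdir for I/D exclusivity, then record the icache fill.
            inval["dcache"] = self._cam(self.dcdir, line)
            self.icdir.add((cpu, line))
        elif kind == "store":
            # CAM both; assumption: the issuer keeps its own dcache copy.
            inval["icache"] = self._cam(self.icdir, line)
            inval["dcache"] = self._cam(self.dcdir, line, keep=(cpu, line))
        elif kind == "eviction":
            # The line leaves the L2-cache, so no L1 copy may remain.
            inval["icache"] = self._cam(self.icdir, line)
            inval["dcache"] = self._cam(self.dcdir, line)
        else:
            raise ValueError(f"unknown access kind: {kind}")
        return inval
```

For example, after CPUs 0 and 1 have both loaded the same line, a store from CPU 0 returns `{"icache": set(), "dcache": {1}}` in this model, so only the other CPU's copy is invalidated.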

The dcache directory is organized as sixteen panels with sixty-four entries in each panel. Each entry number is formed using the cpu ID, way number, and bit 8 from the physical address. Each panel is organized in four rows and four columns. The icache directory is organized similarly. For an eviction, all four rows are cammed.
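The sizes quoted above are mutually consistent, as the following sketch checks. It relies on two values that this section does not state and that are labeled as assumptions here: eight CPUs and four ways per L1 cache. The order in which the three fields are packed into the entry number is also hypothetical; the text only names the fields.

```python
NUM_CPUS = 8             # assumption: not stated in this section
NUM_WAYS = 4             # assumption: L1 associativity, not stated here
PANELS = 16              # from the text
ENTRIES_PER_PANEL = 64   # from the text


def entry_number(cpu_id: int, way: int, phys_addr: int) -> int:
    """Pack cpu ID, way number, and PA bit 8 into a 6-bit entry number.

    The field order (cpu ID high, bit 8 low) is a hypothetical choice.
    """
    if not 0 <= cpu_id < NUM_CPUS:
        raise ValueError("cpu_id out of range")
    if not 0 <= way < NUM_WAYS:
        raise ValueError("way out of range")
    pa_bit8 = (phys_addr >> 8) & 1
    return (cpu_id << 3) | (way << 1) | pa_bit8


# 8 CPUs x 4 ways x 2 values of bit 8 gives the 64 entries of one panel.
entries_per_panel = NUM_CPUS * NUM_WAYS * 2
# 16 panels x 64 entries gives the size of one directory (icdir or dcdir).
entries_per_directory = PANELS * ENTRIES_PER_PANEL
# The icdir and dcdir together give the 2048 entries of the L2 directory.
total_entries = 2 * entries_per_directory
```

Under these assumptions each of the two directories holds 1024 entries, which matches the statement that half of the 2048 entries belong to the icache and half to the dcache.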

4.1.3 L2-Cache Pipeline

This section describes the L2-cache transaction types and the stages of the L2-cache pipeline.

4.1.3.1 L2-Cache Transaction Types

The L2-cache processes three main types of instructions:

■ Requests from a CPU by way of the PCX

■ Requests from the I/O by way of the JBI

■ Requests from the IOB by way of the PCX

The requests from a CPU include the following instructions – load, streaming load, Ifetch, prefetch, store, streaming store, block store, block init store, atomics, interrupt, and flush.

The requests from the I/O include the following instructions – block read (RD64), write invalidate (WRI), and partial line write (WR8).

performs diagnostic reads from the JTAG or the L2-cache, and it sends a request to a CPU by way of the CPX. The CPU bounces the request to the L2-cache by way of the PCX.

4.1.3.2 L2-Cache Pipeline Stages

The L2-cache access pipeline has eight stages (C1 to C8), and the following sections describe the logic executed during each stage of the pipeline.

C1

■ All buffers (WBB, FB and MB) are cammed. The instruction is a dependent instruction if the instruction address is found in any of the buffers.

■ Generate ECC for store data.

■ Access VUAD and TAG array to establish a miss or a hit.

C2

■ Pipeline stall conditions are evaluated. The following conditions require that the pipeline be stalled:

■ 32-byte access requires two cycles in the pipeline.

■ An I-miss instruction stalls the pipeline for one cycle. When an I-miss instruction is encountered in the C2 stage, it stalls the instruction in the C1 stage so that it stays there for two cycles. The instruction in the C1 stage is replayed.

■ For instructions that hit the cache, the way-select generation is completed.

■ Pseudo least recently used (LRU) is used for selecting a way for replacement in case of a miss.

■ VUAD is updated in the C5 stage. However, VUAD is accessed in the C1 stage. The bypass logic for VUAD generation is completed in the C2 stage. This process ensures that the correct data is available to the current instruction from the previous instructions because the C2 stage of the current instruction completes before the C5 stage of the last instruction.

■ The miss buffer is cammed in the C1 stage. However, the MB is written in the C3 stage. The bypass logic for a miss buffer entry generation is completed in the C2 stage. This ensures that the correct data is available to the current instruction from previous instructions, because the C2 stage of the current instruction starts before the C3 stage of the last instruction completes.
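The bypass described in the last two items can be sketched as a simple forwarding rule: a value read early from an array is overridden by any update to the same index that an older instruction has computed but not yet written. This is a minimal illustration, not the actual logic; the function name `read_with_bypass` and the representation of in-flight updates as a list ordered from oldest to youngest are assumptions.

```python
def read_with_bypass(array, index, in_flight):
    """Return the state an instruction should see for `index`.

    array     -- mapping from index to the state currently stored
    in_flight -- (index, new_state) updates from older instructions that
                 have not yet reached their write stage, ordered from
                 oldest to youngest
    """
    value = array[index]  # possibly stale value read in the C1 stage
    for pending_index, new_state in in_flight:
        if pending_index == index:
            # A younger matching update overrides an older one.
            value = new_state
    return value
```

With no matching update in flight, the function returns the stored value unchanged; otherwise the most recent pending update for that index wins.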

C3

■ The set and way select is transmitted to scdata.

■ An entry is created in miss buffer for instructions that miss the cache.

C4

■ The first cycle of read or write to the scdata array for load/store instructions that hit the cache.

C5

■ The second cycle of read or write to the scdata array for load/store instructions that hit the cache.

■ Write into the L2-cache directory for loads, and CAM the L2-cache directory for stores.

■ Write the new state of line into the VUAD array (by now the new state of line has been computed).

■ Fill buffer bypass – If the data to service the load that missed the cache is available in the FB, then do not wait for the data to be available in the data array. The FB provides the data directly to the pipeline.

C6

■ 128 bits of data and 28 bits of ECC are transmitted from the scdata (data array) to the sctag (tag array).
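The 28-bit ECC width in the C6 stage can be checked with a short calculation. The sketch below assumes a single-error-correct, double-error-detect (SEC-DED) Hamming code applied separately to each 32-bit word; neither the code type nor the word granularity is stated in this section, and the function name is hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a SEC-DED Hamming code over `data_bits` bits."""
    r = 1
    # Single-error correction needs 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


per_word = secded_check_bits(32)       # check bits for one 32-bit word
per_transfer = 4 * per_word            # four 32-bit words in 128 bits
single_code = secded_check_bits(128)   # one code over all 128 bits
```

Under these assumptions, four 32-bit words account for exactly 28 check bits, whereas a single code over all 128 bits would need fewer, so the 28-bit figure is consistent with per-word protection.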

C7

■ Error correction is done by the sctag (tag array).

■ The sctag sends the request packet to the CPX, and the sctag is the only interface the L2-cache has with the CPX.

C8

■ A data packet is sent to the CPX. This stage corresponds with the CQ stage of the CPX pipeline.
