3.2 The official release version of the gem5 simulator

The gem5 simulator has two main components: the CPU and the memory. Each of them can be subdivided into many other blocks, and this modular design lets the user choose from several options, shown in Figure 3.1. At the top is the CPU module, composed of the ISA and the CPU model. Below it is the memory system, which can be configured either as the Classic memory system or as the detailed Ruby memory system. The NoC options can only be selected with the Ruby memory system, because in gem5 the NoC is considered part of the memory system, and specifically part of Ruby. Further information on each of these modules is given in the following subsections.
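The dependency between the options in Figure 3.1 is easiest to see as a validity check. In the rule that matters most, a NoC model is only meaningful when Ruby is selected. The sketch below is a minimal model of that option tree. The class, field, and string values are hypothetical labels chosen for illustration. They are not gem5's configuration scripts or command-line flags. It also assumes that every Ruby configuration picks one of the two network models.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the choices in Figure 3.1 (not gem5 command-line flags).
MEMORY_SYSTEMS = {"classic", "ruby"}
NETWORKS = {"simple", "garnet"}
DETAIL_LEVELS = {"simple", "detailed"}


@dataclass(frozen=True)
class SimConfig:
    isa: str
    cpu_detail: str                 # "simple" or "detailed"
    memory_system: str              # "classic" or "ruby"
    network: Optional[str] = None   # NoC model; only meaningful with Ruby


def validate(cfg: SimConfig) -> None:
    """Raise ValueError if `cfg` does not fit the option tree of Figure 3.1."""
    if cfg.cpu_detail not in DETAIL_LEVELS:
        raise ValueError(f"unknown CPU detail level: {cfg.cpu_detail!r}")
    if cfg.memory_system not in MEMORY_SYSTEMS:
        raise ValueError(f"unknown memory system: {cfg.memory_system!r}")
    if cfg.memory_system == "classic" and cfg.network is not None:
        # The NoC belongs to Ruby, so Classic has no network to configure.
        raise ValueError("a NoC can only be selected with the Ruby memory system")
    if cfg.memory_system == "ruby" and cfg.network not in NETWORKS:
        raise ValueError("with Ruby, choose the 'simple' or 'garnet' network")
```

For example, `SimConfig("ARM", "detailed", "ruby", "garnet")` and `SimConfig("X86", "simple", "classic")` both pass. `SimConfig("X86", "simple", "classic", "garnet")` raises `ValueError`, because Classic has no network to configure.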

It is important to mention that gem5 is a relatively new simulator [7] that is still under development. As a result, some of the combinations that the simulator will allow in the future are not yet available. The gem5 website [27] contains a status matrix displaying the current state of development of the simulator.

3.2.1 CPU module

The CPU module contains two main areas: (1) the ISA and (2) the CPU model. gem5 supports six ISAs: ARM, Alpha, x86, SPARC, PowerPC, and MIPS. The user can also choose the level of detail with which the micro-architectural aspects of the CPU are modeled. As shown in Figure 3.1, there are two levels of detail, simple and detailed, and two CPU models for each level. Note that a higher level of detail entails a longer simulation time.

3.2.2 The memory system

The memory module is organized like the CPU module, with options at different levels of detail. The simulator allows users to choose between two memory systems: Classic (simple) and Ruby (more detailed). Ruby takes much more simulation time than Classic, but it also offers much more detail and versatility. It is important to note that Classic already includes a prefetching module with several prefetching engines, but this memory system cannot simulate the NoC. Therefore, all our modifications have been carried out in Ruby, the detailed memory system.

The internal structure of Ruby is shown in the memory model box in Figure 3.1. Ruby is a complete memory simulator: it includes the cache modules, the coherence controllers, and the interconnection network between the different tiles of the system, which is discussed in Section 3.2.3. Ruby can model inclusive/exclusive cache hierarchies with various replacement policies, coherence protocol implementations, interconnection networks, DMA, memory controllers, and various sequencers (which initiate memory requests and handle responses). Ruby is implemented as a combination of different modules, which makes it flexible and configurable. There are three main modules: the implemented C/C++ classes, the network, and the protocol. Every aspect of the memory hierarchy's functionality can be configured. In addition, the memory controllers and the memory protocol can be modified by means of SLICC code, without changing anything in the implemented C++ classes.

Fig. 3.1 Internal structure of gem5, composed of two main modules: CPU and Memory.

Fig. 3.2 High level view of the connections between the SLICC state machine and network in/out ports.

SLICC (Specification Language for Implementing Cache Coherence) is a domain-specific language used for specifying cache coherence protocols. In essence, a cache coherence protocol behaves like a state machine, and SLICC is used to specify the behavior of that state machine. Since the aim is to model the hardware as closely as possible, SLICC allows the user to impose constraints on the state machines. For example, SLICC can limit the number of transitions that can take place in a single cycle. Apart from protocol specification, SLICC also ties together some of the components of the memory model. As can be seen in Figure 3.2, the state machine takes its input from the input ports of the interconnection network and queues its output at the network's output ports. In this way, the cache/memory controllers are connected to the interconnection network itself. The official release version of gem5 already includes six fully functional protocols implemented in SLICC (MI example, MESI CMP directory, MOESI CMP directory, MOESI CMP token, MOESI hammer, and Network test). We have focused on the MOESI CMP directory protocol, which is probably the most complete, and this is the protocol we have modified to add prefetching support. Full details on this protocol and the other implemented protocols can be found in [25].
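The idea that a protocol "behaves like a state machine" can be illustrated with a transition table driven by a network in-port. The sketch below uses a hypothetical toy MI protocol for one cache controller. It is not SLICC syntax and not the MOESI CMP directory protocol. The state, event, and message names are illustrative assumptions. It shows three things from the paragraph above:

- transitions looked up by (state, event);
- a per-cycle limit on the number of transitions;
- outgoing messages queued on an out-port that feeds the network.

Real protocols stall events that arrive in transient states. Here such events simply raise an error.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# Hypothetical toy MI protocol for one L1 controller, written as a table:
# (current state, event) -> (next state, message to send or None).
# "IM" is a transient state: request sent, data not yet received.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("I", "Load"): ("IM", "GETX"),
    ("I", "Store"): ("IM", "GETX"),
    ("IM", "Data"): ("M", None),
    ("M", "Load"): ("M", None),            # hit
    ("M", "Store"): ("M", None),           # hit
    ("M", "Fwd_GETX"): ("I", "DATA"),      # another cache wants the block
    ("M", "Replacement"): ("I", "PUTX"),   # eviction: write the block back
}


class Controller:
    """State machine fed from a network in-port, emitting to a network out-port."""

    def __init__(self, transitions_per_cycle: int = 1):
        self.state: Dict[int, str] = {}                   # block address -> state
        self.in_port: Deque[Tuple[int, str]] = deque()    # (addr, event) from the network
        self.out_port: Deque[Tuple[int, str]] = deque()   # (addr, message) to the network
        self.transitions_per_cycle = transitions_per_cycle

    def cycle(self) -> int:
        """Handle at most `transitions_per_cycle` events; return how many were handled."""
        handled = 0
        while self.in_port and handled < self.transitions_per_cycle:
            addr, event = self.in_port.popleft()
            key = (self.state.get(addr, "I"), event)   # absent blocks are Invalid
            if key not in TRANSITIONS:
                raise RuntimeError(f"invalid transition {key} for block {addr:#x}")
            next_state, message = TRANSITIONS[key]
            self.state[addr] = next_state
            if message is not None:
                self.out_port.append((addr, message))
            handled += 1
        return handled
```

With `transitions_per_cycle=1`, a `Load` followed by `Data` for the same block needs two cycles. The first cycle moves the block from `I` to `IM` and queues a `GETX`. The second cycle completes the transition to `M`. The limit is what makes the model cycle-aware rather than a purely functional protocol checker.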

Within Ruby's implemented C++ classes, four independent components can be distinguished: the Sequencer, the Cache Memory structure, the Cache Replacement Policies, and the Memory Controller. The Sequencer class is responsible for feeding the memory subsystem (including the caches and the off-chip memory) with load/store/atomic memory requests from the processor. When the memory subsystem completes a request, the response is sent back to the processor via the Sequencer. There is one Sequencer for each hardware thread (or core) simulated in the system. The Cache Memory can model set-associative cache structures of variable size, associativity, and replacement policy. The L1, L2, and L3 caches in the system (where applicable) are instances of Cache Memory. The Cache Replacement Policies are kept modular, separate from the Cache Memory, so different instances of Cache Memory may use different replacement policies. Currently, the release version provides two replacement policies: LRU and Pseudo-LRU.
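Keeping the replacement policy separate from the cache structure means the cache only needs to know two things from the policy: which way to evict, and when a way was used. The sketch below illustrates that split with a set-associative tag store and two interchangeable policies: true LRU and a binary-tree pseudo-LRU. The class and method names (`CacheMemory`, `touch`, `victim`) are illustrative assumptions, not gem5's C++ interfaces. The tree pseudo-LRU is one common variant and may differ from the one in the release.

```python
from typing import List, Optional


class LRUPolicy:
    """True LRU: remember when each way was last used and evict the oldest."""

    def __init__(self, num_sets: int, assoc: int):
        self.last_use = [[0] * assoc for _ in range(num_sets)]
        self.clock = 0

    def touch(self, s: int, way: int) -> None:
        self.clock += 1
        self.last_use[s][way] = self.clock

    def victim(self, s: int) -> int:
        row = self.last_use[s]
        return row.index(min(row))


class TreePLRUPolicy:
    """Tree pseudo-LRU: assoc - 1 bits per set; assoc must be a power of two."""

    def __init__(self, num_sets: int, assoc: int):
        assert assoc & (assoc - 1) == 0, "associativity must be a power of two"
        self.assoc = assoc
        self.bits = [[0] * (assoc - 1) for _ in range(num_sets)]

    def touch(self, s: int, way: int) -> None:
        # Walk root -> leaf, pointing every node on the path away from `way`.
        node, lo, hi = 0, 0, self.assoc
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[s][node] = 1          # next victim is in the right half
                node, hi = 2 * node + 1, mid
            else:
                self.bits[s][node] = 0          # next victim is in the left half
                node, lo = 2 * node + 2, mid

    def victim(self, s: int) -> int:
        # Follow the bits from the root to the (approximately) least recently used way.
        node, lo, hi = 0, 0, self.assoc
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[s][node]:
                node, lo = 2 * node + 2, mid
            else:
                node, hi = 2 * node + 1, mid
        return lo


class CacheMemory:
    """Set-associative tag store; the replacement policy is a separate object."""

    def __init__(self, num_sets: int, assoc: int, block_size: int, policy_cls):
        self.num_sets, self.assoc, self.block_size = num_sets, assoc, block_size
        self.tags: List[List[Optional[int]]] = [[None] * assoc for _ in range(num_sets)]
        self.policy = policy_cls(num_sets, assoc)

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill the block (evicting if needed)."""
        block = addr // self.block_size
        s, tag = block % self.num_sets, block // self.num_sets
        ways = self.tags[s]
        if tag in ways:
            self.policy.touch(s, ways.index(tag))
            return True
        way = ways.index(None) if None in ways else self.policy.victim(s)
        ways[way] = tag
        self.policy.touch(s, way)
        return False
```

Because the policy is passed in as a parameter, an L1 built as `CacheMemory(64, 4, 64, LRUPolicy)` and an L2 built as `CacheMemory(512, 8, 64, TreePLRUPolicy)` can coexist. This mirrors the modularity described above. The pseudo-LRU variant trades exactness for `assoc - 1` bits per set instead of a full timestamp per way.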

The Memory Controller is responsible for simulating and servicing any request that misses in the on-chip caches of the simulated system. The Memory Controller is currently simple, but it faithfully models DRAM bank contention and DRAM refresh. It also models a close-page policy for the DRAM buffers.
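The three behaviors named above combine in a simple way:

- bank contention: a request waits until its bank is free;
- refresh: a request also waits if a refresh is in progress;
- close-page: every access pays the full activate, read, and precharge sequence.

The sketch below is a toy timing model under those assumptions. It is not gem5's Memory Controller. The timing values are illustrative, not gem5 defaults. The bank mapping (block address modulo the number of banks) and the placement of refresh at the end of each interval are also assumptions. Other timing constraints, such as the minimum delay before a precharge, are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DramTiming:
    # Illustrative values in memory-controller cycles (assumptions, not gem5 defaults).
    t_rcd: int = 14      # activate -> read command
    t_cl: int = 14       # read command -> first data
    t_burst: int = 4     # data transfer
    t_rp: int = 14       # precharge (close the row)
    t_refi: int = 7800   # refresh interval
    t_rfc: int = 260     # refresh duration; all banks are blocked


class ClosePageController:
    """Toy memory controller: per-bank contention, periodic refresh, close-page."""

    def __init__(self, num_banks: int, block_size: int = 64,
                 timing: Optional[DramTiming] = None):
        self.t = timing or DramTiming()
        self.num_banks = num_banks
        self.block_size = block_size
        self.bank_free_at = [0] * num_banks   # cycle at which each bank is idle again

    def _refresh_delay(self, cycle: int) -> int:
        """Refresh occupies the last t_rfc cycles of every t_refi window."""
        offset = cycle % self.t.t_refi
        refresh_start = self.t.t_refi - self.t.t_rfc
        return self.t.t_refi - offset if offset >= refresh_start else 0

    def access(self, addr: int, arrival: int) -> int:
        """Return the cycle at which the data for `addr` is available."""
        bank = (addr // self.block_size) % self.num_banks
        start = max(arrival, self.bank_free_at[bank])   # bank contention
        start += self._refresh_delay(start)             # wait out an ongoing refresh
        # Close-page: every access activates the row, reads it, then precharges,
        # so the next access to this bank always pays t_rcd again.
        data_ready = start + self.t.t_rcd + self.t.t_cl + self.t.t_burst
        self.bank_free_at[bank] = data_ready + self.t.t_rp
        return data_ready
```

With 8 banks and 64-byte blocks, addresses 0 and 512 map to the same bank. If both arrive at cycle 0, the second one waits for the first access and its precharge. Address 64 maps to a different bank and proceeds in parallel. A close-page policy never benefits from row-buffer hits, but it keeps latency predictable when consecutive accesses to a bank target different rows.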

3.2.3 Interconnection network

As with the previous modules, several configurations can be chosen for the network module. As can be seen in Figure 3.1, two network options can be selected: (1) a simple network or (2) GARNET (a detailed network). Both allow the user to model any network topology. Some of the most common topologies are already implemented, and the user can easily model new ones using the network files. The main difference between the two options is, again, the level of detail. In addition to the capabilities of the simple network, GARNET accurately models contention and router resource utilization and can generate statistics on network power consumption, among other things. In the detailed GARNET configuration, each router in the network works as a 5-stage pipelined virtual channel router, fully modifiable by the user.
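Any topology can be described as a list of router-to-router links. Combined with the 5-stage router pipeline, that list gives a first-order estimate of packet latency. The sketch below is a hypothetical, language-neutral description, not gem5's topology file format. It builds mesh and ring topologies, computes minimum hop counts with a breadth-first search, and estimates zero-load latency. The latency estimate assumes:

- one cycle per pipeline stage at every router, including link traversal;
- no contention;
- the flits of a packet following one another one cycle apart.

Network-interface delays are ignored.

```python
from collections import deque
from typing import Dict, List, Tuple

Link = Tuple[int, int]   # bidirectional link between two router ids


def mesh(cols: int, rows: int) -> List[Link]:
    """Links of a cols x rows 2D mesh; router id = row * cols + col."""
    links = []
    for row in range(rows):
        for col in range(cols):
            r = row * cols + col
            if col + 1 < cols:
                links.append((r, r + 1))       # neighbour in the same row
            if row + 1 < rows:
                links.append((r, r + cols))    # neighbour in the next row
    return links


def ring(n: int) -> List[Link]:
    """Links of an n-router ring (n >= 3)."""
    return [(i, (i + 1) % n) for i in range(n)]


def hop_counts(num_routers: int, links: List[Link], src: int) -> Dict[int, int]:
    """Breadth-first search: minimum number of router-to-router hops from `src`."""
    adjacency: Dict[int, List[int]] = {r: [] for r in range(num_routers)}
    for a, b in links:
        adjacency[a].append(b)
        adjacency[b].append(a)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        r = queue.popleft()
        for nb in adjacency[r]:
            if nb not in dist:
                dist[nb] = dist[r] + 1
                queue.append(nb)
    return dist


def zero_load_latency(routers: int, flits: int, stages: int = 5) -> int:
    """Cycles for a packet to cross `routers` routers with no contention.

    Assumes one cycle per pipeline stage, so the head flit spends `stages`
    cycles in each router, and the remaining flits follow one cycle apart.
    """
    return routers * stages + (flits - 1)
```

In a 4x4 mesh, router 0 is 6 hops from router 15, so a packet crosses 7 routers. Under these assumptions, a 5-flit packet needs 7 × 5 + 4 = 39 cycles. The same functions work unchanged for the ring or any other link list, which is the point of describing topologies as data.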

Figure 3.3 shows the elements of this typical virtual channel router, and Figure 3.4 shows the five pipeline stages into which the routing process is divided. The main logic blocks of the router are the following:

Fig. 3.3 Design of the 5-stage virtual channel router implemented in gem5.

• The input units: The most important elements of these units are the input port and the Virtual Channels (VCs). The input port is connected to the output ports of other routers or to the Network Interface (NI) of the cache controllers, and the VCs buffer the flits that are waiting to be processed.

• Route computation logic: Whenever a flit enters the router, this logic uses its routing tables to calculate the flit's output port.

• VC and switch allocator logic: These logic modules use the prioritization policies to select the flits that will traverse the router each cycle.

• Crossbar Switch: Each cycle, this switch connects the input units selected by the VC and switch allocator logic to the output ports.

• Output ports: The output ports are the links that connect this router to the input units of the neighboring routers.
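Route computation reduces to a table lookup: for each destination, the table stores the output port to take. The sketch below fills such a table for one router of a 2D mesh using dimension-order (XY) routing. A flit first moves along X until its column matches the destination, then along Y. The section does not state which routing algorithm populates the tables, so XY routing and the port names are assumptions chosen for illustration.

```python
from typing import Dict

# Output-port names of a mesh router (hypothetical labels).
LOCAL, EAST, WEST, NORTH, SOUTH = "Local", "East", "West", "North", "South"


def build_routing_table(router: int, cols: int, rows: int) -> Dict[int, str]:
    """Routing table of `router` in a cols x rows mesh using XY routing.

    Router ids are row * cols + col; "North" means increasing row.
    """
    x, y = router % cols, router // cols
    table = {}
    for dest in range(cols * rows):
        dx, dy = dest % cols, dest // cols
        if dx > x:
            table[dest] = EAST
        elif dx < x:
            table[dest] = WEST
        elif dy > y:
            table[dest] = NORTH
        elif dy < y:
            table[dest] = SOUTH
        else:
            table[dest] = LOCAL      # arrived: eject towards the NI
    return table


def route_compute(table: Dict[int, str], dest_router: int) -> str:
    """Route computation for one incoming flit: a single table lookup."""
    return table[dest_router]
```

For router 5 of a 4x4 mesh (column 1, row 1), destination 15 maps to `East` rather than `North`, because X is resolved first. Precomputing the table keeps the per-flit work in the route computation logic to a single lookup, which helps it fit in one pipeline stage.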

Each of these units is directly related to one of the pipeline stages. The behavior of each stage is explained below:

• Buffer Write and Route Compute (BW/RC): In this stage, the incoming flit is written into its VC buffer and the route computation logic computes its output port.

• VC Allocation (VA): In this stage, the flits buffered in the VCs are allocated VCs in the next routers. This process is done in two steps:

1. Each input VC is associated with an input VC in the next router. To build this association, the VC in the next router must have enough credits to fit the flit that is going to be sent. If more than one VC is available for this flit, the prioritization policy chooses among them.

2. If a VC in the next router has been associated with more than one input VC, the prioritization policy breaks the conflict.

• Switch Allocation (SA): In this stage, the router decides, for each output port, which input unit is going to use it. As in the previous stage, this process is done in two steps:

1. First, the prioritization policy chooses, for each input unit, which of its VCs will be the candidate to traverse the crossbar switch. To be a candidate, a VC must have at least one flit to send, and this flit must have been allocated a VC in the input unit of the next router (resolved in the previous stage).

2. As a result of the previous step, an output port can be requested by one or more candidates. In this step, the prioritization policy breaks the conflicts for each output port.

• Switch Traversal (ST): In this stage, the selected flits traverse the crossbar switch.

• Link Traversal (LT): In this stage, the flits coming from the crossbar traverse the links to reach the next routers.
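The two-step switch allocation described above is a separable, input-first allocator. First, each input unit nominates one eligible VC. Then each output port picks one of the input units whose nominee targets it. The sketch below assumes the prioritization policy is round-robin, with priorities updated only on a final grant. The section does not specify the policy, and GARNET's actual allocator code may differ. The `InputVC` fields stand in for the results of the BW/RC and VA stages.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


class RoundRobinArbiter:
    """Picks one of n requesters, starting the search at a rotating pointer."""

    def __init__(self, n: int):
        self.n = n
        self.pointer = 0   # requester with the highest priority

    def pick(self, requests: List[bool]) -> Optional[int]:
        for i in range(self.n):
            idx = (self.pointer + i) % self.n
            if requests[idx]:
                return idx
        return None

    def advance(self, winner: int) -> None:
        # The winner gets the lowest priority next time (only updated on a final grant).
        self.pointer = (winner + 1) % self.n


@dataclass
class InputVC:
    has_flit: bool            # at least one flit buffered
    out_port: int             # computed in BW/RC
    has_downstream_vc: bool   # granted in VA


def switch_allocate(inputs: List[List[InputVC]], num_outputs: int,
                    vc_arbs: List[RoundRobinArbiter],
                    port_arbs: List[RoundRobinArbiter]) -> Dict[int, Tuple[int, int]]:
    """Return {output port: (input unit, VC)} for the flits that win this cycle."""
    # Step 1: each input unit nominates one candidate VC.
    nominee: Dict[int, int] = {}
    for i, vcs in enumerate(inputs):
        eligible = [vc.has_flit and vc.has_downstream_vc for vc in vcs]
        v = vc_arbs[i].pick(eligible)
        if v is not None:
            nominee[i] = v
    # Step 2: each output port chooses among the input units whose nominee targets it.
    grants: Dict[int, Tuple[int, int]] = {}
    for o in range(num_outputs):
        requests = [i in nominee and inputs[i][nominee[i]].out_port == o
                    for i in range(len(inputs))]
        winner = port_arbs[o].pick(requests)
        if winner is not None:
            grants[o] = (winner, nominee[winner])
            port_arbs[o].advance(winner)
            vc_arbs[winner].advance(nominee[winner])
    return grants
```

Consider two input units that both nominate a VC for output 0 while output 1 stays idle. Input 0's second VC also targets output 1 but was not nominated, so only one flit crosses the switch that cycle. This loss of matching efficiency is the usual price of separable allocators, which split the decision into small, fast arbiters that fit in a single pipeline stage.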
