Estado Actual de las Construcciones - UNIVERSIDAD DE SEVILLA

In [20], with the emergence of Networks-on-Chip (NoC) as a new approach to the high-throughput, scalable design of Multi-Processor Systems-on-Chip (MPSoC), FIFO buffers are presented as an important component of asynchronous design, namely in GALS (Globally Asynchronous, Locally Synchronous) systems.

A Muller pipeline is structurally simple: each stage consists of a C-element and an inverter. However, once the pipeline is full only every other stage stores data, and each stage must interact tightly with its neighbouring stages.
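The behaviour described above can be sketched in a few lines of Python. This is a minimal behavioural model, not the circuit from [20]: the function names, the eager producer on the input, the sink that never acknowledges, and the left-to-right update order are all assumptions made for illustration.

```python
def c_element(a, b, prev):
    """Muller C-element: the output follows the inputs when they agree, else holds."""
    return a if a == b else prev


def fill_pipeline(n_stages, max_sweeps=100):
    """Fill an n-stage Muller pipeline whose output side never acknowledges.

    Returns the C-element outputs once the pipeline has stalled (assumed model).
    """
    c = [0] * n_stages
    for _ in range(max_sweeps):
        before = list(c)
        for i in range(n_stages):
            # Request: an eager producer for stage 0, else the previous stage's output.
            req = 1 - c[0] if i == 0 else c[i - 1]
            # Acknowledge: the next stage's output, or a sink stuck at 0.
            ack = c[i + 1] if i < n_stages - 1 else 0
            # The inverter sits on the acknowledge input of the C-element.
            c[i] = c_element(req, 1 - ack, c[i])
        if c == before:  # no stage changed during a full sweep: the pipeline is stalled
            break
    return c


stalled = fill_pipeline(5)
print(stalled)
```

Under these assumptions, `fill_pipeline(5)` settles at `[1, 0, 1, 0, 1]`: the C-element outputs alternate, which is the sense in which only every other stage of a full Muller pipeline holds data.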

A fully-decoupled latch controller can be introduced so that all stages store data when the pipeline is full and the handshake on each stage's input channel completes without any interaction with the output channel. However, the more complicated controller reduces throughput and slows down the handshaking.

Such pipelines require combinational logic between stages. In a FIFO, however, data is stored without being processed, and each data word is written to and read from a single stage. The result is lower power consumption than in a pipeline.

One option for a FIFO implementation is a modular FIFO that follows the asP* handshaking protocol and can be synthesized from standard-cell logic without any special asynchronous cells. However, it requires some timing considerations because it does not follow the conventional 4-phase handshake protocol.

Another option is a FIFO based on a fully-decoupled pipeline. This FIFO exhibits the throughput of a fully-decoupled pipeline, subject to some multiplexing and demultiplexing penalties.

Note that all of the presented FIFOs are self-timed.

Furthermore, [21] proposes a FIFO architecture consisting of a RAM, read and write pointers, full and empty detectors, and input and output handshake controllers. This design fully follows the 4-phase bundled-data handshake protocol.
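For reference, a 4-phase (return-to-zero) handshake steps through four (req, ack) states in a fixed order: request raised, acknowledge raised, request lowered, acknowledge lowered. The checker below is only an illustrative sketch of that ordering; the function name and the trace format are assumptions and do not come from [21].

```python
# (req, ack) states of one 4-phase handshake cycle, in order.
FOUR_PHASE = [(0, 0), (1, 0), (1, 1), (0, 1)]


def follows_four_phase(trace):
    """Return True if a list of (req, ack) samples steps through the cycle in order.

    Repeated samples of the same state are allowed; skipping a state is not.
    """
    if not trace or trace[0] != FOUR_PHASE[0]:
        return False
    idx = 0
    for state in trace[1:]:
        if state == FOUR_PHASE[idx]:
            continue  # nothing changed since the last sample
        idx = (idx + 1) % len(FOUR_PHASE)
        if state != FOUR_PHASE[idx]:
            return False  # a phase was skipped or happened out of order
    return True


good = follows_four_phase([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])
bad = follows_four_phase([(0, 0), (1, 0), (0, 0)])  # req dropped before ack rose
print(good, bad)
```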

Data is written into the RAM slot indicated by the write pointer as long as the RAM is not full. An ack signal is sent when the RAM is not full and the request is asserted; the full and empty detectors are then updated. When the RAM is not empty, a request is sent to the output port to indicate that new data is available.

Memory is implemented with latches and pass-transistors, although it could be replaced by a memory plane to increase performance. Addresses are one-hot, which removes the need for an address decoder.

Handshake controllers consist of an asymmetric C-element and n AND gates for the input channel. Once a request is asserted, the signal is blocked until the FIFO has free space. When free space is available and ack is zero, the controller enables the load signal regardless of the request signal. The FIFO requests to send out its data when it is not empty, once the ack signal has been lowered.

Upon reset, the read and write pointers should both be zero so that they point to the first memory slot. Equal pointers indicate that the FIFO is either empty or full. To distinguish between the two cases, one extra bit is added to each pointer and toggles whenever that pointer wraps around to zero. Since the write pointer is triggered on the rising edge of ack, the not-full flag can be evaluated before the ack signal is lowered. However, if the FIFO is full, the detector should not change state until the read operation ends, in order to avoid read/write races.
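The pointer scheme can be illustrated with a small software model. This is a sketch under stated assumptions, not the circuit from [21]: the class and method names are hypothetical, the handshake signals are reduced to return values, and timing (including the read/write race mentioned above) is not modelled.

```python
class PointerFifo:
    """Circular FIFO with one-hot pointers and one extra wrap bit per pointer."""

    def __init__(self, depth):
        self.depth = depth
        self.ram = [None] * depth
        self.wr = self.rd = 1            # one-hot: bit 0 set selects slot 0 after reset
        self.wr_wrap = self.rd_wrap = 0  # extra bits used to tell full from empty

    def _advance(self, ptr, wrap):
        ptr <<= 1
        if ptr == 1 << self.depth:       # rotated past the last slot
            return 1, wrap ^ 1           # back to slot 0, toggle the wrap bit
        return ptr, wrap

    def is_empty(self):
        return self.wr == self.rd and self.wr_wrap == self.rd_wrap

    def is_full(self):
        return self.wr == self.rd and self.wr_wrap != self.rd_wrap

    def write(self, word):
        """Store a word; False models a request left unacknowledged while full."""
        if self.is_full():
            return False
        # bit_length() stands in for the one-hot select line (no decoder in hardware).
        self.ram[self.wr.bit_length() - 1] = word
        self.wr, self.wr_wrap = self._advance(self.wr, self.wr_wrap)
        return True

    def read(self):
        """Return the oldest word, or None when the FIFO is empty."""
        if self.is_empty():
            return None
        word = self.ram[self.rd.bit_length() - 1]
        self.rd, self.rd_wrap = self._advance(self.rd, self.rd_wrap)
        return word


fifo = PointerFifo(5)
accepted = [fifo.write(w) for w in range(6)]  # the sixth write finds the FIFO full
first = fifo.read()
```

After five writes both pointers select slot 0 again, and only the differing wrap bits mark the FIFO as full rather than empty.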

Using accurate SPICE simulation, the proposed FIFO was evaluated against several other designs: the aforementioned Muller, fully-decoupled, asP*-based and Domino-controlled FIFOs. Some of the available FIFOs were described in gate-level Verilog and then used as buffers.

All of the FIFOs require elaborate, carefully engineered timing assumptions for reliable operation. Wire delays were not taken into account. A pseudo-random sequence of data words from a Galois LFSR is used so that the results do not depend on particular inputs.
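A Galois LFSR of the kind mentioned above can be sketched as follows. The source does not give the register width, seed or polynomial, so the 16-bit width, the seed 0xACE1 and the tap mask 0xB400 (a commonly used maximal-length configuration) are assumptions chosen purely for illustration.

```python
import itertools


def galois_lfsr(seed=0xACE1, taps=0xB400, width=16):
    """Yield successive states of a right-shifting Galois LFSR.

    The default seed, taps and width are illustrative assumptions.
    """
    state = seed & ((1 << width) - 1)
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR at zero")
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps  # apply the feedback taps when a 1 is shifted out
        yield state


# Take a few pseudo-random data words to drive a FIFO model.
words = list(itertools.islice(galois_lfsr(), 8))
print([hex(w) for w in words])
```

A wider data word would need a different tap mask; none is suggested here because the source does not specify one.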

All FIFOs were built from the same library and simulated under identical conditions, so the comparisons should be reasonable. Different depths were also tested.

The resulting throughputs appear to be ordered by complexity, with the highest throughput achieved by the simplest design, that is, the proposed FIFO. Only the Muller and fully-decoupled pipelines maintain their throughput as depth increases. However, both show a sharp increase in energy per word as depth increases. The Domino-controlled FIFO's energy consumption is much lower than that of the others due to its lack of flip-flops.

The proposed FIFO also appears to have lower latency than all the others under most conditions; when the FIFO is empty, however, the Domino-controlled FIFO has slightly lower latency.

For further analysis, the proposed FIFO and the Domino-controlled FIFO are employed as buffers in the routers of a 4x4 mesh network. The NoC used is fully asynchronous, based on QNoC and ASPIN, and uses two cascaded flip-flops to connect synchronous IP cores to the asynchronous network.

Each router, addressed by a two-dimensional number, contains five ports connecting it to its neighbours and to the local IP cores. To route packets between ports, the asynchronous NoC uses a distributed X-First algorithm, which guarantees in-order delivery. Furthermore, each packet is divided into flits, and when the header flit of a packet is received the packet is forwarded to the corresponding output port.
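The X-First decision made at each router can be sketched as below. The port names, the coordinate convention (x grows eastwards, y grows northwards) and the helper `route` are assumptions for illustration; the source only states that routing is distributed and X-First.

```python
# Assumed coordinate step for each output port.
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def x_first_port(cur, dst):
    """Pick the output port at router `cur` for a packet addressed to `dst`."""
    (cx, cy), (dx, dy) = cur, dst
    if dx != cx:                       # correct the X coordinate first
        return "EAST" if dx > cx else "WEST"
    if dy != cy:                       # only then correct the Y coordinate
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"                     # arrived: deliver to the local port


def route(src, dst):
    """List the routers a packet visits, applying the local decision hop by hop."""
    path, cur = [src], src
    while True:
        port = x_first_port(cur, dst)
        if port == "LOCAL":
            return path
        sx, sy = STEP[port]
        cur = (cur[0] + sx, cur[1] + sy)
        path.append(cur)


path = route((0, 0), (2, 3))
print(path)
```

Because every packet between a given pair of routers takes the same deterministic path, packets cannot overtake one another on different routes, which is consistent with the in-order delivery property mentioned above.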

Parameters are extracted from the SPICE library and back-annotated into a Verilog HDL library.

The router is modelled at gate level using this library, and the 4x4 mesh network is built with behavioural models of the local IP cores.

Both the proposed FIFO and the Domino-controlled FIFO are modelled and used as buffers at the routers' input ports. The FIFOs are 5 stages deep with a 34-bit data word. The results show that the proposed FIFO exhibits lower packet latency and a higher network saturation threshold than its Domino-controlled counterpart.
