2.1 History of PLDs

Apart from the large variety of ‘common’ processor alternatives in today’s computers, such as x86 or ARM multi-core CPUs with 32- or 64-bit characteristics, GPUs or (for pure computing purposes) so-called GPGPUs, FPGAs are becoming more and more popular in different fields of computing.

In the early 1970s, programmable logic devices (PLDs) entered the market and extended chip designs (and consequently circuit boards) towards flexibility with dynamically configurable elements instead of using solely combinations of fixed logic gates. Unlike the usual application-specific integrated circuits (ASICs), or integrated circuits (ICs) in general, which contain a predefined variety of logical functions, PLDs provide the possibility (and necessity) to define (and redefine) a chip’s behavior within the system after its fabrication. A simple PLD can be seen as a programmable box that creates a user-defined output for every input combination in a specified and reconfigurable way. The ‘size’ of the box, along with the number of connections into and out of the chip, constrains the number of possible functions. As an alternative to logic-based implementations, such functionality can be (and has been) realized statically by much slower ROM-based approaches. Because the final definition of the chip takes place outside of the factory in a productive environment, such devices introduced the so-called ‘fabless’ semiconductor industry. Thus, it is possible, for example, to use a PLD within a system for different demands at different times instead of integrating all necessary logic in separate chips at fabrication time.

Computing systems (or logic gate systems in general) of today are still, in simple terms, machines that produce binary outputs based on binary inputs. Hence, they are physical devices implementing and combining Boolean functions. The most basic logic elements are gates with one or two binary inputs and one binary output. The ‘standard’ logic gates are the NOT, AND, OR and XOR gates (as well as their logical combinations NAND, NOR and XNOR). By connecting these gates to form a system of gates, more complex logical functions can be implemented in hardware. The process of combining hundreds, thousands or even hundreds of thousands of logic gates is covered by the terms large-, very-large- or ultra-large-scale integration (LSI, VLSI, ULSI). Analogously, transistor counts of thousands, hundreds of thousands or millions/billions per chip refer to the same terms.
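The way elementary gates combine into more complex logical functions can be illustrated with a small software model. The following Python sketch is an illustration only and not part of the original work: the function names and the choice of a one-bit full adder as the example circuit are assumptions made here.

```python
# Elementary gates modeled as functions on the bits 0 and 1.
def NOT(a: int) -> int:
    return 1 - a

def AND(a: int, b: int) -> int:
    return a & b

def OR(a: int, b: int) -> int:
    return a | b

# The combined gates are built only from the elementary ones above.
def NAND(a: int, b: int) -> int:
    return NOT(AND(a, b))

def NOR(a: int, b: int) -> int:
    return NOT(OR(a, b))

def XOR(a: int, b: int) -> int:
    # True when at least one input is 1, but not both.
    return AND(OR(a, b), NAND(a, b))

def XNOR(a: int, b: int) -> int:
    return NOT(XOR(a, b))

# Hypothetical example circuit: a one-bit full adder as a "system of gates".
def half_adder(a: int, b: int) -> tuple:
    """Return (sum, carry) for two input bits."""
    return XOR(a, b), AND(a, b)

def full_adder(a: int, b: int, carry_in: int) -> tuple:
    """Return (sum, carry_out) for two input bits and an incoming carry."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, OR(c1, c2)
```

Chaining several such full adders would give a multi-bit adder, which is the kind of step-by-step composition that the integration terms above refer to at much larger scale.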

PLDs are able to carry out specialized tasks (that may vary over time) while still implementing them in hardware. Thus, they can be more ‘efficient’ (with more predictable behavior) than very general processing units (like CPUs) in various ways, but they are likewise, in general, less efficient than extremely specialized hardwired elements like ASICs. This holds true at least as long as the desired task is ‘relatively simple’. Efficient in this context can, for example, mean that a minimum amount of hardware is involved on the chip to process a requested operation or that as little power as possible is consumed. On the one hand, this generally comes (for PLDs) at the price of a lower circuit speed compared to fixed fabricated logic (like an ASIC), as reconfigurable logic elements cannot be packed as densely or work as fast as their static counterparts. On the other hand, a function in a PLD can be realized and changed without fabrication and within the actual system. Early works like that of Brown et al. [26] already examined the advantages and drawbacks of Field Programmable Gate Arrays (FPGAs, see Section 2.2), a special class of PLDs which is the focus of this work. Later works, for example that of Kuon and Rose [117] from 2007, compared the efficiency of ASIC designs in great detail to that of such FPGAs. The authors compared both hardware types from a 90-nm production generation in terms of logic density, circuit speed and power consumption. Their experimental results showed that the gap between ASICs and FPGAs was still very significant, for example, concerning the area needed for the logic (a factor of 35 was reported, to give an idea of the magnitude) or concerning speed (with a summarizing factor of 4). This gap is nowadays closing more and more with heterogeneous FPGAs (see Section 2.2.5) and with upcoming FPGA frequencies of up to 500 MHz (cf. Lim [134]) enabled by 28-nm production and other technological progress.

Besides the great potential due to the reconfigurability in productive systems, PLDs also offer the possibility of easy and cheap hardware prototyping of chips instead of only simulating their behavior in the construction phase or fabricating actual hardware prototypes. The decision whether to eventually produce ASICs with the functionality of the final PLD prototype or to use a PLD even in the resulting product, potentially the same PLD that was used for prototyping (as is common practice for different PLD-driven developments like network routers, modems, DVD players or automotive navigation systems), is not only governed by the option of later reconfigurability of the system. It is often also a question of production costs. As ASIC designs have to be fabricated with very high initial and fixed costs (a factory has to be consulted), it is only worth the effort (and the money) if a very large number of chips is finally produced. For specialized products, this is often not the case and an ASIC design could therefore drastically increase the production costs. A PLD chip, in contrast, can be used for many different applications and can therefore be fabricated and sold in large quantities, which can consequently make it cheaper (per unit).
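The cost argument above amounts to a simple break-even calculation. The following Python sketch illustrates it under loudly stated assumptions: all prices are hypothetical placeholders chosen for round numbers, the PLD is assumed to carry no fixed cost for the buyer, and the function names are invented for this illustration.

```python
import math
from typing import Optional

def total_cost(fixed_cost: float, unit_cost: float, units: int) -> float:
    """Total production cost: one-time fixed cost plus per-unit cost."""
    return fixed_cost + unit_cost * units

def break_even_units(asic_fixed: float, asic_unit: float,
                     pld_unit: float) -> Optional[int]:
    """Smallest volume at which the ASIC is no more expensive than the PLD.

    Assumes the PLD has no fixed cost. Returns None if the ASIC's
    per-unit cost is not lower, so that it never catches up.
    """
    if asic_unit >= pld_unit:
        return None
    return max(0, math.ceil(asic_fixed / (pld_unit - asic_unit)))

# Hypothetical numbers, NOT taken from the source text.
ASIC_FIXED = 1_000_000.0   # one-time fabrication setup cost
ASIC_UNIT = 2.0            # cost per ASIC once production runs
PLD_UNIT = 12.0            # cost per prefabricated PLD

volume = break_even_units(ASIC_FIXED, ASIC_UNIT, PLD_UNIT)
print("Break-even volume:", volume)
```

Below the break-even volume the PLD-based product is cheaper in total; above it the fixed cost of the ASIC is amortized. Time-to-market and reconfigurability are not captured by this simple model.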

One further advantage of PLD technologies compared to ASICs is a drastically reduced time-to-market. A chip that has finally been prototyped can simply be cloned to prefabricated PLDs in a very short time compared to the time that would be needed to produce the respective ASICs. Furthermore, the possibility to update a chip’s design after delivery is an advantage for certain applications, even when the chip is incorporated in ordinary products like those mentioned before.

Besides the already mentioned FPGAs (which will be further investigated in the remainder of this work), complex programmable logic devices (CPLDs) are the second market-dominant type of larger reconfigurable logic today. Smaller units (with only several hundred logic gates) are, for instance, programmable array logics (PALs), which are, in general, one-time programmable or reprogrammable only with great difficulty. Their reconfigurable equivalents are generic array logics (GALs), which are, hence, often used for PAL prototyping.

2.1.1 CPLDs and FPGAs

CPLDs are, compared to FPGAs, made of a relatively simple and homogeneous structure consisting mainly of a configurable matrix of AND and OR gates combined with a very small number of flip-flops to store states. The matrix is accessible from the outside of the CPLD through a large number of input and output pins, and these pins are often the only elements ‘equipped’ with a (single) flip-flop. The large number of inputs and outputs predestines CPLDs to be used in a highly parallel manner. Instead of describing the logic directly by connecting logic gates ‘manually’, higher-level languages are often used to describe the functionality in an abstract way. The translation from such an abstraction level into the actual netlist (logic description as a circuit diagram) is called synthesis and is, in general, the initial step of a compilation workflow (cf. Section 2.4) from a more or less abstract description language into an actual hardware description. Prior to the actual translation into hardware, the behavior of the later chip can be simulated in the chip-specific software environment. However, the more complex a chip is and the more degrees of freedom exist for the actual implementation, the more difficult any prediction becomes. Due to the rather homogeneous structure and also due to a very simple routing architecture, the timing of a CPLD is relatively easy to predict. Inter-logic delays are small and the overall timing is quite consistent over several compilation runs of the same functionality from a higher-level description language.

A further difference between FPGAs and CPLDs is that the latter use electrically erasable programmable read-only memory (EEPROM) to store the configuration of the chip (and on the chip), while the former often use static random-access memory (SRAM). One core advantage of using EEPROM (flash memories, for example, are a subclass of EEPROMs) is that a CPLD is ready to use just after powering it up. An SRAM-based FPGA configuration, in contrast, is volatile and needs to be loaded from an external memory (sometimes also an EEPROM) into the FPGA’s SRAM cells during the booting process. EEPROMs generally have the disadvantage that the number of erase/write cycles is rather limited, as erasing degrades the oxide barrier on the silicon, which at some point may lead to failures. This is, for example, described in the work of Buitenkamp [40]. The article primarily presents a software technique to extend the operational life of EEPROMs. However, hardware advancements could accomplish the same in the future. This technical difference between common CPLDs and FPGAs has already been overcome by some manufacturers providing flash-based FPGAs. Such developments only became possible by decreasing the size of flash cells to maintain the relatively high logic density of FPGAs.

Remark 1. Apart from electrically erasable programmable read-only memories, there are also EPROM memories that can be erased by ultraviolet (UV) light. However, such EPROMs are not really practicable when reconfiguration is frequently required.

Due to these characteristics, CPLDs are the right choice for relatively simple use cases like critical control applications or generally simple, purely combinatorial designs like glue logic that basically combines/connects other resources of the chip, see Greaves and Nam [78]. This is especially true if the functionality will probably not change too often while the system is frequently rebooted and the chip is expected to operate directly after power-up. Due to their simple structure, CPLDs also require only extremely low amounts of power, which is especially important in battery-operated systems. Finally, CPLDs are relatively cheap.

FPGAs (in contrast to CPLDs) are based on lookup tables (LUTs) as their principal building blocks instead of simple logic gates. A LUT with k binary inputs consequently has $2^k$ possible input constellations, while the output for each of these can be specified by the SRAM table. Just like a CPLD, an FPGA needs inputs and outputs to communicate with the rest of the system. In addition to these basic elements, FPGAs contain a relatively large number of flip-flops, and modern architectures provide more and more heterogeneous on-chip elements such as hardwired processor cores, dedicated random-access memory, digital signal processing elements (DSPs) including multipliers, various clock management systems and support for advanced device-to-device signaling technologies.

Even though the processing time of signals in a CPLD before the actual implementation in hardware (resp. in a simulation) is easier to predict than for the more complexly structured FPGAs, final hardware implementations of both systems have great advantages in terms of predictability compared to ordinary computing architectures like CPUs, as these include many mechanisms that are difficult to predict, for example, caches with hardware and software prefetchers.

Due to the characteristics mentioned, FPGAs are used in a wide variety of applications like networking hardware, data processing and storage, general instrumentation, telecommunication systems or even as hardware-configured digital signal processors.

Today, FPGAs are no longer available only as expensive, niche special-purpose hardware. Several FPGA manufacturers offer ‘all-in-one’ solutions, with the FPGA embedded in a so-called System-on-a-Chip (SoC), which are dedicated especially to early development and research. These boards often contain a central processing unit (e. g., an ARM CPU), potentially along with other specialized processing elements, memory regions, periphery, graphics processors, audio and further interfaces. The two most dominant manufacturers of FPGAs over the last years have been Xilinx and Altera¹,²,³ (part of Intel since 2015), both providing such relatively cheap development boards especially for researchers in addition to their ‘professional’ products. Other vendors of PLDs in general are Lattice Semiconductor (Vantis (AMD)), Microsemi (Actel), Quicklogic, Lucent, Cypress and Atmel. Altogether, the market of PLDs is constantly growing and the role of FPGAs has become even more important in the recent past⁴.

2.2 Field Programmable Gate Arrays

As already stated in the previous section, FPGAs form a special class of PLDs which is moving more and more into the field of ‘mainstream’ accelerators due to their wide applicability and improved programmability. An FPGA’s main feature is its reconfigurability. This can be achieved by the use of different programmable hardware elements. Many popular FPGA architectures are configured by SRAM cells, as opposed to the EEPROMs (flash memories) of regular CPLDs. However, some manufacturers base their FPGAs on flash memory or on antifuses, even though the latter class only allows for a one-time configuration of the system. Antifuse-based devices could therefore more precisely be called configurable instead of reconfigurable. Flash-based FPGAs are technically more difficult to realize than SRAM-based architectures because SRAM-based FPGAs can achieve a much higher density on the chip. Still, as mentioned before, flash-based (as well as antifuse-based) chips are ready to operate directly after powering up and have an exceptionally low power consumption. SRAM-based FPGAs are volatile and therefore need separate memory from which to load the configuration on start-up. This memory can in turn be a flash memory inside or outside the chip, or even an external configuration memory such as a hard disk drive or the like. Which kind of architecture is the best choice depends on the circumstances of the use case.

1 http://sourcetech411.com/2013/04/top-fpga-companies-for-2013/ (accessed 19 April 2016)
2 http://hackaday.com/2015/08/24/two-new-fpga-families-designed-in-china/ (accessed 19 April 2016)
3 http://www.fpgadeveloper.com/2011/07/list-and-comparison-of-fpga-companies.html (accessed 19 April 2016)
4 http://www.dailytech.com/Why+Intels+Massive+167B+USD+Plan+to+Purchase+Altera+Makes+

Remark 2. For the approach presented in this work, it does not matter how the reconfigurability is achieved. Even though the explanations in the following sections assume SRAM, this does not play a role for the introduced model at all. Performance indicators on which it would have an influence (e. g., in the timing prediction model) are provided by external pieces of software, for example, VTR/VPR (see Section 2.3) in the later benchmark sections.

2.2.1 Operating principle

Remark 3. The figures in the following section are partially based on schematics from the book ‘Architecture and CAD for Deep-Submicron FPGAs’ by Betz et al. [21] to preserve good comparability for any reader who wants to dive deeper into FPGA details with the cited book. The equations are also partially related to the book, as the framework described there has been the basis for the framework in which the methods of the presented work are finally implemented and benchmarked.

The most basic and important elements of an FPGA are the lookup tables (LUTs). A k-input lookup table (k-LUT) is a small memory element with k (binary) inputs and one (binary) output. The table can be programmed arbitrarily by specifying a desired output for each of the $2^k$ input combinations. Thus, $2^k$ cells of (e. g., SRAM) memory are required and such a table can be programmed into $2^{2^k}$ possible states. Figure 3a shows a 2- and a 4-LUT together with one possible configuration for each. The 2-LUT in Figure 3a in fact implements the same functionality as an ordinary OR gate. However, a simple 2-LUT can implement even more functions than the elementary logic gates which have been named in the previous section (see Figure 1; the symbols are defined in ANSI/IEEE Std 91-1984 [1]).
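The behavior of a k-LUT as a small programmable memory can be sketched in a few lines of Python. This is an illustrative model only: the class name `LUT`, the helper `num_configurations` and the convention that In[0] is the most significant address bit (chosen to match the row order of the truth tables) are assumptions made here, not definitions from the cited framework.

```python
from typing import Sequence

class LUT:
    """A k-input lookup table modeled as 2**k memory cells."""

    def __init__(self, k: int, table: Sequence[int]):
        if len(table) != 2 ** k:
            raise ValueError("a k-LUT needs exactly 2**k table entries")
        self.k = k
        self.table = list(table)

    def __call__(self, *inputs: int) -> int:
        if len(inputs) != self.k:
            raise ValueError("expected %d input bits" % self.k)
        # Assumed convention: In[0] is the most significant address bit.
        index = 0
        for bit in inputs:
            index = (index << 1) | bit
        return self.table[index]

def num_configurations(k: int) -> int:
    """Number of distinct functions a k-LUT can be programmed to."""
    return 2 ** (2 ** k)

# The 2-LUT configuration from the text, which behaves like an OR gate.
or_lut = LUT(2, [0, 1, 1, 1])
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", or_lut(a, b))
```

Reprogramming the same object with a different table, for example `[0, 1, 1, 0]` for XOR, changes the implemented function without changing the structure, which is the essence of the reconfigurability discussed above.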

LUTs are, in general, the dominant operating logic elements of an FPGA architecture.

An important characteristic of FPGAs is that designs implemented in them are generally ‘synchronous’. When the FPGA is operating, a clock (or several clocks) drives the transition from one state to the next. Hence, the processing of a signal from one point of the design to a subsequent one is aligned to the underlying clock by flip-flops (FFs), see Figure 3b. A flip-flop (or a latch) is a bistable multivibrator; thus, it can hold two stable states. Flip-flops can be clock-driven to form a simple 1-bit storage element in sequential logic. If so, the input data is latched and stored, and the output data is refreshed in each clock cycle. Inserting flip-flops into a design adds no additional logic, but the current state is stored and the different signals passing through the chip are harmonized. This also helps to equalize uncertainties (resp. unpredictabilities) or, in general, small variations in the timing. Section 2.2.2 will discuss the use and importance of flip-flops to steer the timing of an FPGA chip in more detail.

In[0]  In[1]  |  AND  OR  NAND  NOR  XOR  XNOR  MISC ...
  0      0    |   0    0    1    1    0    1     0   ...
  0      1    |   0    1    1    0    1    0     1   ...
  1      0    |   0    1    1    0    1    0     0   ...
  1      1    |   1    1    0    0    0    1     0   ...

Figure 1: Lookup Table - 2-LUT possibilities (a 2-LUT with inputs In[0] and In[1] and output Out)
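The role of clocked flip-flops as 1-bit storage elements can be made concrete with a small simulation. The following Python sketch is a simplified model under stated assumptions: it ignores propagation delay and setup/hold times, every flip-flop starts at 0, and the names `DFlipFlop` and `run_shift_register` are hypothetical.

```python
class DFlipFlop:
    """Idealized edge-triggered D flip-flop holding a single bit."""

    def __init__(self, initial: int = 0):
        self.q = initial

    def clock_edge(self, d: int) -> int:
        """On a clock edge, capture input D and expose it on output Q."""
        self.q = d
        return self.q

def run_shift_register(stream, stages: int = 2):
    """Feed a bit stream through a chain of flip-flops, one bit per cycle.

    Returns the output of the last flip-flop after each clock edge.
    """
    ffs = [DFlipFlop() for _ in range(stages)]
    outputs = []
    for bit in stream:
        # Sample every D input BEFORE the edge, so that all flip-flops
        # update simultaneously, as they would in synchronous hardware.
        d_values = [bit] + [ff.q for ff in ffs[:-1]]
        for ff, d in zip(ffs, d_values):
            ff.clock_edge(d)
        outputs.append(ffs[-1].q)
    return outputs

print(run_shift_register([1, 0, 1, 1, 0], stages=2))
```

Each additional stage delays the stream by one clock cycle while adding no logic of its own, which matches the statement that flip-flops store state and align signals with the clock rather than compute anything.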

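[Figure 2 schematic: a 4-LUT with inputs In[0] to In[3] feeds a flip-flop (FF, input D, output Q, driven by the clock); a MUX selects between the LUT output and Q to drive Out.]

Figure 2: Basic Logic Element (BLE)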

Figure 2 illustrates a basic logic element (BLE), which is a combination of the mentioned components. To align such a BLE with the clock, the flip-flop can be used to store the result from the LUT’s output. However, if no synchronization is needed at this point of the logic, the LUT’s output can be connected directly to the BLE output (Out), bypassing the flip-flop. This option is realized by the use of a (programmable) multiplexer (MUX). A (k : 1)-MUX can dynamically connect any of its k inputs to the output of the multiplexer. The MUX can be programmed (e. g., through an SRAM cell of size $\log_2(k)$) to enable one of the two strategies (with or without the flip-flop) for each BLE in the circuit. Since a flip-flop is able to store one bit, a so-called register of size r (a group of r flip-flops) is able to store r bits. The fact that a flip-flop needs a certain amount of time to ‘transport’ the input data to the output introduces a delay called the propagation delay. In an FPGA design, flip-flops are used for a number of purposes. The obvious one is to simply preserve a state within the circuit, e. g., for the synchronization (and combination) of multiple signals. Section 2.2.2 will show how the activation (or bypassing) of flip-flops by the multiplexers in BLEs can influence the maximum possible clock speed of the circuit, e. g., by pipelining the design flow.
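The interplay of LUT, flip-flop and multiplexer in a BLE can be sketched as follows. This Python model is a hedged illustration, not the implementation used in the cited framework: the class `BLE`, its attribute `use_ff` (standing in for the MUX configuration bit) and the helper `mux_select_bits` are hypothetical names, and rounding the number of select bits up for values of k that are not powers of two is an assumption made here.

```python
class BLE:
    """Basic logic element: a LUT, a flip-flop and a 2:1 output MUX."""

    def __init__(self, table, use_ff: bool):
        self.table = list(table)   # LUT contents, 2**k entries
        self.use_ff = use_ff       # MUX configuration bit (assumed name)
        self.ff_q = 0              # flip-flop state, starts at 0

    def lut(self, inputs) -> int:
        """Combinational LUT output; In[0] is the most significant bit."""
        index = 0
        for bit in inputs:
            index = (index << 1) | bit
        return self.table[index]

    def clock_edge(self, inputs) -> None:
        """On a clock edge the flip-flop captures the LUT output."""
        self.ff_q = self.lut(inputs)

    def output(self, inputs) -> int:
        """The MUX forwards either the stored bit or the raw LUT output."""
        return self.ff_q if self.use_ff else self.lut(inputs)

def mux_select_bits(k: int) -> int:
    """Configuration bits for a (k:1)-MUX, rounded up (an assumption)."""
    return (k - 1).bit_length()

xor_table = [0, 1, 1, 0]
bypassed = BLE(xor_table, use_ff=False)    # purely combinational
registered = BLE(xor_table, use_ff=True)   # synchronized to the clock

print(bypassed.output((1, 0)))     # follows the inputs immediately
print(registered.output((1, 0)))   # still shows the initial state
registered.clock_edge((1, 0))
print(registered.output((0, 0)))   # holds the captured value
```

The registered variant changes its output only on a clock edge, while the bypassed variant reacts immediately; choosing between the two per BLE is what the programmable multiplexer provides.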

2-LUT (inputs In[0], In[1]; output Out):

In[0]  In[1]  |  Out
  0      0    |   0
  0      1    |   1
  1      0    |   1
  1      1    |   1

4-LUT (inputs In[0] to In[3]; output Out):

In[0]  In[1]  In[2]  In[3]  |  Out      In[0]  In[1]  In[2]  In[3]  |  Out
  0      0      0      0    |   0         1      0      0      0    |   1
  0      0      0      1    |   1         1      0      0      1    |   1
  0      0      1      0    |   1         1      0      1      0    |   0
  0      0      1      1    |   1         1      0      1      1    |   0
  0      1      0      0    |   0         1      1      0      0    |   1
  0      1      0      1    |   0         1      1      0      1    |   1
  0      1      1      0    |   1         1      1      1      0    |   0
  0      1      1      1    |   0         1      1      1      1    |   1

(a) Lookup Table (LUT)

(b) Flip-Flop (FF): data input D, output Q

(c) Multiplexer (MUX): inputs In[0] to In[4], output Out

Figure 3: Main FPGA building blocks for CLBs

Finally, all these basic elements are hierarchically combined into the comprehensive logic element of the FPGA, the configurable logic block (CLB). A CLB consists of a number of BLEs sharing I inputs and N outputs, where the outputs of the BLEs can in turn be connected internally to the inputs of (the same or other) BLEs in the CLB (see Figure 4). For certain architectures, the contents of several BLEs are first combined into a slice and several of these slices are then combined into a CLB, a mainly technical difference that improves the performance of the logic’s execution.

The overall hierarchy of the logic units on an FPGA can be summarized as follows:

{LUTs, FFs, MUXs} ⊂ BLEs (⊂ slice) ⊂ CLBs ⊂ FPGA .
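How BLEs share inputs and feed their outputs back inside a CLB can be illustrated with a minimal model. The following Python sketch rests on several assumptions that are not taken from the source: every BLE is registered (so feedback always passes through a flip-flop), each BLE drives exactly one CLB output, every LUT input may select any CLB input or any BLE output, and the 2-bit counter with enable is merely a hypothetical example configuration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BLE:
    table: List[int]     # LUT contents, 2**k entries
    sources: List[int]   # per LUT input: index into the CLB signal list
    q: int = 0           # flip-flop state (all BLEs registered here)

@dataclass
class CLB:
    num_inputs: int      # I shared inputs
    bles: List[BLE]      # N BLEs, assumed to give N outputs

    def clock_edge(self, inputs: List[int]) -> List[int]:
        """Advance all BLEs by one clock cycle and return their outputs."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of CLB inputs")
        # Signals visible to every BLE: the CLB inputs first, then the
        # current BLE outputs (the internal feedback connections).
        signals = list(inputs) + [ble.q for ble in self.bles]
        next_q = []
        for ble in self.bles:
            index = 0
            for src in ble.sources:
                index = (index << 1) | signals[src]
            next_q.append(ble.table[index])
        # Update all flip-flops simultaneously.
        for ble, q in zip(self.bles, next_q):
            ble.q = q
        return [ble.q for ble in self.bles]

# Hypothetical configuration: 2-bit counter with enable (signal 0).
# Signals: 0 = enable, 1 = output of BLE 0 (bit 0), 2 = output of BLE 1.
bit0 = BLE(table=[0, 0, 1, 1, 1, 1, 0, 0], sources=[0, 1, 2])  # en XOR q0
bit1 = BLE(table=[0, 1, 0, 1, 0, 1, 1, 0], sources=[0, 1, 2])  # q1 XOR (en AND q0)
counter = CLB(num_inputs=1, bles=[bit0, bit1])

for _ in range(4):
    print(counter.clock_edge([1]))
```

The same structure would nest one level further for architectures that group BLEs into slices before forming a CLB; that grouping is omitted here for brevity.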
