Topic 7 - Virtual Processors (FPGAs)

INSTITUTO TECNOLÓGICO DE MAZATLAN

ELECTRONICS ENGINEERING

COURSE:

ADVANCED MICROPROCESSORS

INSTRUCTOR:

RUFINO JUAN DOMINGUEZ ARELLANO

STUDENTS:

NUÑEZ ZAMBRANO JOSÉ DE JESÚS

VAZQUEZ AYON ADAN ALBERTO

GROUP:

1. FPGAs

1.1 INTRODUCTION AND GENERAL CONCEPTS

Field programmable gate arrays (FPGAs) are digital integrated circuits (ICs) that contain configurable (programmable) blocks of logic along with configurable interconnects between these blocks.

Depending on the way in which they are implemented, some FPGAs may only be programmed a single time, while others may be reprogrammed over and over again.

Not surprisingly, a device that can be programmed only one time is referred to as one-time programmable (OTP).

The “field programmable” portion of the FPGA’s name refers to the fact that its programming takes place “in the field” (as opposed to devices whose internal functionality is hardwired by the manufacturer). This may mean that FPGAs are configured in the laboratory, or it may refer to modifying the function of a device resident in an electronic system that has already been deployed in the outside world. If a device is capable of being programmed while remaining resident in a higher-level system, it is referred to as being in-system programmable (ISP).

FPGAs occupy a middle ground between PLDs and ASICs because their functionality can be customized in the field like PLDs, but they can contain millions of logic gates and be used to implement extremely large and complex functions that previously could be realized only using ASICs.

The cost of an FPGA design is much lower than that of an ASIC (although the ensuing ASIC components are much cheaper in large production runs). At the same time, implementing design changes is much easier in FPGAs, and the time-to-market for such designs is much faster.

What can FPGAs be used for?

FPGAs are often used to prototype ASIC designs or to provide a hardware platform on which to verify the physical implementation of new algorithms. However, their low development cost and short time-to-market mean that they are increasingly finding their way into final products (some of the major FPGA vendors actually have devices that they specifically market as competing directly against ASICs).

By the early 2000s, high-performance FPGAs containing millions of gates had become available. Some of these devices feature embedded microprocessor cores, high-speed input/output (I/O) interfaces, and the like. The end result is that today’s FPGAs can be used to implement just about anything, including communications devices and software-defined radios; radar, image, and other digital signal processing (DSP) applications; all the way up to system-on-chip (SoC) components that contain both hardware and software elements.

To be just a tad more specific, FPGAs are currently eating into four major market segments: ASIC and custom silicon, DSP, embedded microcontroller applications, and physical layer communication chips. Furthermore, FPGAs have created a new market in their own right: reconfigurable computing (RC).

Fusible link technologies

One of the first techniques that allowed users to program their own devices was (and still is) known as fusible-link technology. In this case, the device is manufactured with all of the links in place, where each link is referred to as a fuse.

When an engineer purchases a programmable device based on fusible links, all of the fuses are initially intact. This means that, in its unprogrammed state, the output from our example function will always be logic 0. (Any 0 presented to the input of an AND gate will cause its output to be 0, so if input a is 0, the output from the AND will be 0. Alternatively, if input a is 1, then the output from its NOT gate, which we shall call !a, will be 0, and once again the output from the AND will be 0. A similar situation occurs in the case of input b.)

The point is that design engineers can selectively remove undesired fuses by applying pulses of relatively high voltage and current to the device’s inputs. For example, consider what happens if we remove fuses Faf and Fbt.

Removing these fuses disconnects the complementary version of input a and the true version of input b from the AND gate (the pull-up resistors associated with these signals cause their associated inputs to the AND to be presented with logic 1 values). This leaves the device to perform its new function, which is y = a & !b. (The “&” character in this equation is used to represent the AND, while the “!” character is used to represent the NOT. This syntax is discussed in a little more detail in chapter 3). This process of removing fuses is typically referred to as programming the device, but it may also be referred to as blowing the fuses or burning the device.
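The fuse behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real device: the fuse names Faf and Fbt come from the text, while Fat and Fbf (the true version of a and the complemented version of b) are assumed by analogy, and the function name `fused_and` is hypothetical.

```python
def fused_and(a, b, fuses):
    """Model a 2-input programmable AND fed by true and complemented inputs.

    fuses maps a link name to True (intact) or False (blown).
    Faf and Fbt are named in the text; Fat and Fbf are assumed names.
    """
    signals = {
        "Fat": a,        # true version of input a
        "Faf": 1 - a,    # complemented version of input a (!a)
        "Fbt": b,        # true version of input b
        "Fbf": 1 - b,    # complemented version of input b (!b)
    }
    y = 1
    for name, value in signals.items():
        # A blown fuse disconnects the signal, so the pull-up resistor
        # presents logic 1 to that input of the AND gate.
        and_input = value if fuses[name] else 1
        y &= and_input
    return y


unprogrammed = {"Fat": True, "Faf": True, "Fbt": True, "Fbf": True}
programmed = dict(unprogrammed, Faf=False, Fbt=False)  # blow Faf and Fbt

for a in (0, 1):
    for b in (0, 1):
        print(a, b, fused_and(a, b, unprogrammed), fused_and(a, b, programmed))
```

With all fuses intact the output column is always 0; with Faf and Fbt blown it matches y = a & !b.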

Devices based on fusible-link technologies are said to be one-time programmable, or OTP, because once a fuse has been blown, it cannot be replaced and there’s no going back. As fate would have it, although modern FPGAs are based on a wide variety of programming technologies, the fusible link approach isn’t one of them. The reasons for mentioning it here are that it sets the scene for what is to come, and it’s relevant in the context of the precursor device technologies referenced in chapter 3.

Antifuse technologies

In antifuse technologies, each configurable path has an associated link called an antifuse. In its unprogrammed state, an antifuse has such a high resistance that it may be considered an open circuit (a break in the wire).

This is the way the device appears when it is first purchased. However, antifuses can be selectively “grown” (programmed) by applying pulses of relatively high voltage and current to the device’s inputs. For example, if we add the antifuses associated with the complementary version of input a and the true version of input b, our device will now perform the function y = !a & b.

The act of programming this particular element effectively “grows” a link—known as a via—by converting the insulating amorphous silicon into conducting polysilicon (Figure 2-6b). Not surprisingly, devices based on antifuse technologies are OTP, because once an antifuse has been grown, it cannot be removed, and there’s no changing your mind.

SRAM-based technologies

There are two main versions of semiconductor RAM devices: dynamic RAM (DRAM) and static RAM (SRAM). In the case of DRAMs, each cell is formed from a transistor-capacitor pair that consumes very little silicon real estate. The “dynamic” qualifier is used because the capacitor loses its charge over time, so each cell must be periodically recharged if it is to retain its data. This operation, known as refreshing, is a tad complex and requires a substantial amount of additional circuitry. When the “cost” of this refresh circuitry is amortized over tens of millions of bits in a DRAM memory device, this approach becomes very cost effective. However, DRAM technology is of little interest with regard to programmable logic.

By comparison, the “static” qualifier associated with SRAM is employed because, once a value has been loaded into an SRAM cell, it will remain unchanged unless it is specifically altered or until power is removed from the system. The entire cell comprises a multitransistor SRAM storage element whose output drives an additional control transistor.

Depending on the contents of the storage element (logic 0 or logic 1), the control transistor will either be OFF (disabled) or ON (enabled).
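As a rough behavioral sketch of such a cell, the Python class below models the storage element and the control transistor it drives. The class and method names are hypothetical, and the mapping of a stored 1 to ON (and a stored 0 to OFF) is an assumption made for illustration.

```python
class SramConfigCell:
    """Hypothetical model of one SRAM configuration cell plus control transistor."""

    def __init__(self):
        self.value = None  # contents are undefined until the cell is loaded

    def load(self, bit):
        # Static storage: the value is held without refreshing while powered.
        self.value = bit & 1

    def power_down(self):
        # SRAM is volatile, so removing power loses the stored value.
        self.value = None

    def transistor_on(self):
        # Assumption: a stored 1 enables the transistor, a stored 0 disables it.
        return self.value == 1

    def pass_signal(self, signal):
        """Return the signal if the transistor conducts, else None (open circuit)."""
        return signal if self.transistor_on() else None


cell = SramConfigCell()
cell.load(1)
print(cell.pass_signal(1))   # transistor ON: the signal passes through
cell.load(0)
print(cell.pass_signal(1))   # transistor OFF: open circuit (None)
```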

USE OF TECHNOLOGIES IN ICs

1.2 FPGAs

Around the beginning of the 1980s, it became apparent that there was a gap in the digital IC continuum. At one end, there were programmable devices like SPLDs and CPLDs, which were highly configurable and had fast design and modification times, but which couldn’t support large or complex functions. At the other end were ASICs, which could support large and complex functions but were far more costly to design and much harder to change.

In order to address this gap, Xilinx developed a new class of IC called a field-programmable gate array, or FPGA, which they made available to the market in 1984.

The various FPGAs available today are discussed in detail in chapter 4. For the nonce, we need only note that the first FPGAs were based on CMOS and used SRAM cells for configuration purposes. Although these early devices were comparatively simple and contained relatively few gates (or the equivalent thereof) by today’s standards, many aspects of their underlying architecture are still employed to this day.

The multiplexer feeding the flip-flop could be configured to accept the output from the LUT or a separate input to the logic block, and the LUT could be configured to represent any 3-input logical function.

For example, assume that a LUT was required to perform the function:

y = (a & b) | !c

This could be achieved by loading the LUT with the appropriate output values.

Note that the 8:1-multiplexer-based LUT illustrated in the figure is used for purposes of simplicity.
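To see what "loading the LUT with the appropriate output values" means for y = (a & b) | !c, the sketch below enumerates the eight input combinations and stores one output bit per combination. The helper names `build_lut` and `lut_read` are hypothetical, and the bit ordering (a as the most significant select line) is an assumption; a real device may order its cells differently.

```python
def build_lut(func, n_inputs=3):
    """Return the list of output bits to load into an n-input LUT."""
    table = []
    for index in range(2 ** n_inputs):
        # Assumed ordering: the first input is the most significant select bit.
        bits = [(index >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, a, b, c):
    """Act as the 8:1 multiplexer: the inputs select one stored bit."""
    return table[(a << 2) | (b << 1) | c]


lut = build_lut(lambda a, b, c: (a & b) | (1 - c))
print(lut)  # → [1, 0, 1, 0, 1, 0, 1, 1]
print(lut_read(lut, 1, 1, 1))  # → 1
```

Changing the function only changes the eight stored bits; the multiplexer structure stays the same, which is why one LUT can represent any 3-input logical function.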

The complete FPGA comprised a large number of programmable logic block “islands” surrounded by a “sea” of programmable interconnects.

The device would also include primary I/O pins and pads (not shown here). By means of its own SRAM cells, the interconnect could be programmed such that the primary inputs to the device were connected to the inputs of one or more programmable logic blocks, and the outputs from any logic block could be used to drive the inputs to any other logic block, the primary outputs from the device, or both.

The end result was that FPGAs successfully bridged the gap between PLDs and ASICs. On the one hand, they were highly configurable and had the fast design and modification times associated with PLDs. On the other hand, they could be used to implement large and complex functions that had previously been the domain only of ASICs. (ASICs were still required for the really large, complex, high-performance designs, but as FPGAs increased in sophistication, they started to encroach further and further into ASIC design space.)

Platform FPGAs

The concept of a reference design or platform design has long been used at the circuit board level. This refers to creating a base design configuration from which multiple products can be derived.

In addition to tremendous amounts of programmable logic, today’s high-end FPGAs feature embedded (block) RAMs, embedded processor cores, high-speed I/O blocks, and so forth. Furthermore, designers have access to a wide range of IP. The end result is the concept of the platform FPGA. A company may use a platform FPGA design as a basis for multiple products inside that company, or it may supply an initial design to multiple other companies for them to customize and differentiate.

1.3 Architectures

Antifuse versus SRAM

SRAM-based devices

The majority of FPGAs are based on the use of SRAM configuration cells, which means that they can be configured over and over again. The main advantages of this technique are that new design ideas can be quickly implemented and tested, while evolving standards and protocols can be accommodated relatively easily. Furthermore, when the system is first powered up, the FPGA can initially be programmed to perform one function such as a self-test or board/system test, and it can then be reprogrammed to perform its main task.

In the past, memory devices were often used to qualify the manufacturing processes associated with a new technology node. More recently, the mixture of size, complexity, and regularity associated with the latest FPGA generations has resulted in these devices being used for this task. One advantage of using FPGAs over memory devices to qualify the manufacturing process is that, if there’s a defect, the structure of FPGAs is such that it’s easier to identify and locate the problem (that is, figure out what and where it is). For example, when IBM and UMC were rolling out their 0.09 μm (90 nm) processes, FPGAs from Xilinx were the first devices to race out of the starting gate.

Unfortunately, there’s no such thing as a free lunch. One downside of SRAM-based devices is that they have to be reconfigured every time the system is powered up. This either requires the use of a special external memory device (which has an associated cost and consumes real estate on the board) or of an on-board microprocessor.

Antifuse-based devices

Unlike SRAM-based devices, which are programmed while resident in the system, antifuse-based devices are programmed off-line using a special device programmer.

The proponents of antifuse-based FPGAs are proud to point to an assortment of (not-insignificant) advantages. First of all, these devices are nonvolatile (their configuration data remains when the system is powered down), which means that they are immediately available as soon as power is applied to the system. Following from their nonvolatility, these devices don’t require an external memory chip to store their configuration data, which saves the cost of an additional component and also saves real estate on the board.

One noteworthy advantage of antifuse-based FPGAs is the fact that their interconnect structure is naturally “rad hard,” which means they are relatively immune to the effects of radiation. This is of particular interest in the case of military and aerospace applications because the state of a configuration cell in an SRAM-based component can be “flipped” if that cell is hit by radiation (of which there is a lot in space).
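To illustrate why a flipped configuration cell matters, the sketch below takes the LUT contents for the function y = (a & b) | !c used earlier in this document and inverts a single stored bit. The bit ordering, the helper names, and the choice of which bit is hit are all assumptions made purely for illustration.

```python
def lut_output(config_bits, a, b, c):
    """Read a 3-input LUT; a is assumed to be the most significant select bit."""
    return config_bits[(a << 2) | (b << 1) | c]


def flip_bit(config_bits, position):
    """Return a copy of the configuration with one cell inverted (an upset)."""
    upset = list(config_bits)
    upset[position] ^= 1
    return upset


good = [1, 0, 1, 0, 1, 0, 1, 1]   # contents for y = (a & b) | !c
hit = flip_bit(good, 5)           # hypothetical radiation strike on cell 5

changed = [
    (a, b, c)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
    if lut_output(good, a, b, c) != lut_output(hit, a, b, c)
]
print(changed)  # → [(1, 0, 1)]
```

A single upset silently changes the logic function for one input combination, and the device keeps computing the wrong function until it is reconfigured.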

performed successfully (this is well worth doing when you’re talking about devices containing 50 million plus programmable elements). In order to do this, the device programmer requires the ability to read the actual states of the antifuses and compare them to the required states defined in the configuration file.

Once the device has been programmed, however, it is possible to set (grow) a special security antifuse that subsequently prevents any programming data (in the form of the presence or absence of antifuses) from being read out of the device. Even if the device is decapped (its top is removed), programmed and unprogrammed antifuses appear to be identical, and the fact that all of the antifuses are buried in the internal metallization layers makes it almost impossible to reverse-engineer the design.

Vendors of antifuse-based FPGAs may also tout a couple of other advantages relating to power consumption and speed, but if you aren’t careful this can be a case of the quickness of the hand deceiving the eye. For example, they might tease you with the fact that an antifuse-based device consumes only 20 percent (approximately) of the standby power of an equivalent SRAM-based component, that their operational power consumption is also significantly lower, and that their interconnect-related delays are smaller. Also, they might casually mention that an antifuse is much smaller and thus occupies much less real estate on the chip than an equivalent SRAM cell (although they may neglect to mention that antifuse devices also require extra programming circuitry, including a large, hairy programming transistor for each antifuse).

They will follow this by noting that when you have a device containing tens of millions of configuration elements, using antifuses means that the rest of the logic can be much closer together. This serves to reduce the interconnect delays, thereby making these devices faster than their SRAM cousins. And both of the above points would be true … if one were comparing two devices implemented at the same technology node. But therein lies the rub, because antifuse technology requires the use of around three additional process steps after the main manufacturing process has been qualified. For this (and related) reasons, antifuse devices are always at least one—and usually several—generations (technology nodes) behind SRAM-based components, which effectively wipes out any speed or power consumption advantages that might otherwise be of interest.

PROGRAMMING TECHNOLOGIES

CLBs versus LABs versus slices

A Xilinx logic cell

One niggle when it comes to FPGAs is that each vendor has its own names for things. But we have to start somewhere, so let’s kick off by saying that the core building block in a modern FPGA from Xilinx is called a logic cell (LC). Among other things, an LC comprises a 4-input LUT (which can also act as a 16 × 1 RAM or a 16-bit shift register), a multiplexer, and a register.
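The three roles of the 4-input LUT mentioned above (logic function, 16 × 1 RAM, 16-bit shift register) all reuse the same 16 storage cells. The class below is a hypothetical behavioral sketch of that idea, not a model of Xilinx's actual circuit; the class name, method names, and shift direction are assumptions.

```python
class Lut4:
    """Hypothetical model of the 16 storage cells behind a 4-input LUT."""

    def __init__(self, bits=None):
        self.bits = list(bits) if bits is not None else [0] * 16

    def read(self, addr):
        """LUT or RAM read: the four inputs form a 4-bit address."""
        return self.bits[addr & 0xF]

    def write(self, addr, bit):
        """16 x 1 RAM mode: store one bit at the addressed cell."""
        self.bits[addr & 0xF] = bit & 1

    def shift_in(self, bit):
        """Shift-register mode: push a bit in, return the bit that falls out."""
        out = self.bits[-1]
        self.bits = [bit & 1] + self.bits[:-1]
        return out


# As a LUT: a 4-input AND is a table with a single 1 at address 0b1111.
and4 = Lut4([0] * 15 + [1])
print(and4.read(0b1111), and4.read(0b0111))  # → 1 0

# As a RAM: write one bit, then read it back.
ram = Lut4()
ram.write(9, 1)
print(ram.read(9))  # → 1

# As a shift register: a 1 shifted in emerges after 16 further shifts.
sr = Lut4()
outs = [sr.shift_in(1)] + [sr.shift_in(0) for _ in range(16)]
print(outs.index(1))  # → 16
```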

It must be noted that the illustration presented in Figure 4-7 is a gross simplification, but it serves our purposes here. The register can be configured to act as a flip-flop, as shown, or as a latch. The polarity of the clock (rising-edge triggered or falling-edge triggered) can be configured, as can the polarity of the clock enable and set/reset signals (active-high or active-low).

In addition to the LUT, MUX, and register, the LC also contains a smattering of other elements, including some special fast carry logic for use in arithmetic operations (this is discussed in more detail a little later).

An Altera logic element

Just for reference, the equivalent core building block in an FPGA from Altera is called a logic element (LE). There are a number of differences between a Xilinx LC and an Altera LE, but the overall concepts are very similar.

Slicing and dicing

The reason for the “at the time of this writing” qualifier is that these definitions can, and do, change with the seasons.

The internal wires have been omitted from this illustration to keep things simple; it should be noted, however, that although each logic cell’s LUT, MUX, and register have their own data inputs and outputs, the slice has one set of clock, clock enable, and set/reset signals common to both logic cells.

CLBs and LABs

And moving one more level up the hierarchy, we come to what Xilinx calls a configurable logic block (CLB) and what Altera refers to as a logic array block (LAB). (Other FPGA vendors doubtless have their own equivalent names for each of these entities, but these are of interest only if you are actually working with their devices.)

There is also some fast programmable interconnect within the CLB. This interconnect is used to connect neighboring slices. The reason for having this type of logic-block hierarchy, LC → slice (with two LCs) → CLB (with four slices), is that it is complemented by an equivalent hierarchy in the interconnect. Thus, there is fast interconnect between the LCs in a slice, then slightly slower interconnect between slices in a CLB, followed by the interconnect between CLBs. The idea is to achieve the optimum trade-off between making it easy to connect things together and avoiding excessive interconnect-related delays.
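The hierarchy and its matching interconnect tiers can be written down as a small sketch. The counts (two LCs per slice, four slices per CLB) come from the text; the function names and the (CLB, slice, LC) tuple addressing are hypothetical, and, as noted earlier, these definitions change between device families.

```python
LCS_PER_SLICE = 2    # from the text: a slice contains two logic cells
SLICES_PER_CLB = 4   # from the text: a CLB contains four slices


def logic_cells(clbs):
    """Total logic cells in a given number of CLBs."""
    return clbs * SLICES_PER_CLB * LCS_PER_SLICE


def interconnect_tier(lc_a, lc_b):
    """Name the interconnect tier joining two logic cells.

    Each logic cell is addressed as a hypothetical (clb, slice, lc) tuple.
    """
    if lc_a[0] != lc_b[0]:
        return "between CLBs (slowest)"
    if lc_a[1] != lc_b[1]:
        return "between slices in a CLB (slightly slower)"
    return "within a slice (fastest)"


print(logic_cells(1))                              # → 8
print(interconnect_tier((0, 0, 0), (0, 0, 1)))     # → within a slice (fastest)
print(interconnect_tier((0, 0, 0), (0, 3, 1)))     # → between slices in a CLB (slightly slower)
print(interconnect_tier((0, 0, 0), (5, 0, 0)))     # → between CLBs (slowest)
```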

1.5 FPGA and FPAA vendors

Company                       Web site               Comment
Actel Corp.                   www.actel.com          FPGAs
Altera Corp.                  www.altera.com         FPGAs
Anadigm Inc.                  www.anadigm.com        FPAAs
Atmel Corp.                   www.atmel.com          FPGAs
Lattice Semiconductor Corp.   www.latticesemi.com    FPGAs
Leopard Logic Inc.            www.leopardlogic.com   Embedded FPGA cores
QuickLogic Corp.              www.quicklogic.com     FPGAs
Xilinx Inc.                   www.xilinx.com         FPGAs

1.6 Embedded processors

Hard cores

Each of the main FPGA vendors has opted for a particular processor type to implement its hard cores. For example, Altera offers embedded ARM processors, QuickLogic has opted for MIPS-based solutions, and Xilinx sports PowerPC cores.

Of course, each vendor will be delighted to explain at great length why its implementation is far superior to any of the others (the problem of deciding which one actually is better is only compounded by the fact that different processors may be better suited to different tasks).

There are two main approaches for integrating such cores into the FPGA. The first is to locate the core in a strip to the side of the main FPGA fabric. In this scenario, all of the components are typically formed on the same silicon chip, although they could also be formed on two chips and packaged as a multichip module (MCM).

One advantage of this implementation is that the main FPGA fabric is identical for devices with and without the embedded microprocessor core, which can make things easier for the design tools used by the engineers. The other advantage is that the FPGA vendor can bundle a whole load of additional functions in the strip to complement the microprocessor core, such as memory and special peripherals.

An alternative is to embed one or more microprocessor cores directly into the main FPGA fabric. One-, two-, and even four-core implementations are available at the time of this writing.

In this case, the design tools have to be able to take account of the presence of these blocks in the fabric; any memory used by the core is formed from embedded RAM blocks, and any peripheral functions are formed from groups of general-purpose programmable logic blocks.

Soft microprocessor cores

If the core is provided in the form of an RTL netlist that will be synthesized with the other logic, then this truly is a soft implementation. Alternatively, if the core is presented in the form of a placed and routed block of LUTs/CLBs, then this would typically be considered a firm implementation.

In both of these cases, all of the peripheral devices like counter timers, interrupt controllers, memory controllers, communications functions, and so forth are also implemented as soft or firm cores (the FPGA vendors are typically able to supply a large library of such cores).

Soft cores are slower and simpler than their hard-core counterparts (of course they are still incredibly fast in human terms). However, in addition to being practically free, they also have the advantages that you only have to implement a core if you need it and that you can instantiate as many cores as you require until you run out of resources in the form of programmable logic blocks.

Once again, each of the main FPGA vendors has opted for a particular processor type to implement its soft cores. For example, Altera offers the Nios, while Xilinx sports the MicroBlaze. The Nios has both 16-bit and 32-bit architectural variants, which operate on 16-bit or 32-bit chunks of data, respectively (both variants share the same 16-bit-wide instruction set). By comparison, the MicroBlaze is a true 32-bit machine (that is, it has 32-bit-wide instruction words and performs its magic on 32-bit chunks of data). Once again, each of the vendors will be more than happy to tell you why its soft core rules and how its competitors’ offerings fail to make the grade (sorry, you’re on your own here).

One cool thing about the integrated development environment (IDE) fielded by Xilinx is that it treats the PowerPC hard core and the MicroBlaze soft core identically. This includes both processors being based on the same CoreConnect processor bus and sharing common soft peripheral IP cores. All of this makes it relatively easy to migrate from one processor to the other.

Also of interest is the fact that Xilinx offers a small 8-bit soft core called the PicoBlaze, which can be implemented using only 150 logic cells (give or take a handful). By comparison, the MicroBlaze requires around 1,000 logic cells.
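Because soft cores can be instantiated until the programmable logic runs out, the approximate logic-cell figures above give a quick upper bound on how many cores fit. The sketch below uses the 150 and 1,000 logic-cell figures from the text; the device size of 10,000 logic cells is a made-up example, and the calculation ignores the logic needed for peripherals, memory, and routing.

```python
PICOBLAZE_LCS = 150     # approximate figure from the text
MICROBLAZE_LCS = 1000   # approximate figure from the text


def max_cores(available_lcs, lcs_per_core):
    """Upper bound on soft cores that fit, ignoring all other logic."""
    return available_lcs // lcs_per_core


device_lcs = 10_000  # hypothetical device size, for illustration only

print(max_cores(device_lcs, PICOBLAZE_LCS))   # → 66
print(max_cores(device_lcs, MICROBLAZE_LCS))  # → 10
```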

1.7 Programming Languages for FPGAs

Several programming languages are available for designing digital circuits for FPGAs. Some of them are listed below.

• VHDL
• Verilog
• SystemVerilog
• SystemC
• Handel-C
• Pure C/C++
• Simulink
• LabVIEW

As FPGAs grew in power and speed, FPGA-oriented versions of the well-known high-level languages C and C++ emerged.

Other options at a higher level of abstraction include Simulink and LabVIEW.

VHDL

VHDL is an acronym that combines VHSIC (Very High Speed Integrated Circuit) and HDL (Hardware Description Language).

VHDL was originally sponsored by the U.S. Department of Defense and later transferred to the IEEE (Institute of Electrical and Electronics Engineers). The IEEE ratified it as standard 1076 in 1987, a version known as VHDL-87. The IEEE revised the standard in 1993 (VHDL-93) and again in 2001 (VHDL-2001).

VHDL originally emerged as a language for simulating digital circuits; the design itself was done with other tools such as schematics and netlists.

As digital circuits became more complex, the need arose to describe them at a high level of abstraction, from a functional rather than a structural point of view.[8] That level of abstraction had already been reached by simulation tools (such as VHDL), because simulating parts of a circuit required a model that described the behavior of the circuit or its components. VHDL then began to be used for digital circuit design, as tools appeared that perform synthesis from an HDL description.

VHDL is a very large and complex language; however, only a small portion of it is used for synthesis. That small portion is what we will use throughout this course.

1.8 Development Boards

Development boards are available from various manufacturers. Each contains an FPGA and a set of external components such as push buttons, switches, LEDs, 7-segment displays, an LCD display, RAM, audio connectors, VGA video, USB, etc.