microEnable accelerates image processing algorithms on image data already available in the PC's main memory.
3. The online hardware accelerator and data grabber mode. In the online mode, microEnable has direct access to the external image source. The source, usually a digital camera, is connected directly to microEnable via a daughterboard. With appropriate programming of the FPGA, the processor acts like a conventional frame grabber. Because frame grabbing requires only a fraction of the FPGA resources, the processor can additionally execute image processing as in the offline mode. In this mode, microEnable combines the functionality of a conventional frame grabber with real-time image processing capabilities.
Because the image data cross the PCI bus twice in the offline mode, the bus is a potential bottleneck. For some algorithms, the RAM system and the FPGA can process images at a higher rate than the PCI bus can sustain. The online mode solves this problem.
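A back-of-the-envelope estimate makes the factor of two concrete. The sketch below rests on assumptions that do not come from the text. It assumes a 32-bit PCI bus at 33 MHz, which gives a theoretical peak of about 132 MB/s; sustained rates are lower. It also assumes 512 × 512 images with 8 bits per pixel, and that the processed image is the same size as the input. It computes only the upper bound on the frame rate imposed by the bus.

```python
def max_frame_rate(bus_bytes_per_s, frame_bytes, crossings):
    """Upper bound on frames/s when each frame crosses the bus `crossings` times."""
    return bus_bytes_per_s / (frame_bytes * crossings)

# Assumed figures, not taken from the text: 32-bit PCI at 33 MHz, peak rate only.
PCI_PEAK = 4 * 33_000_000   # bytes per second
FRAME = 512 * 512           # one 8-bit 512 x 512 image, in bytes

# Offline mode: host memory -> FPGA -> host memory, two bus crossings per frame.
offline = max_frame_rate(PCI_PEAK, FRAME, crossings=2)
# Online mode: camera -> FPGA via the daughterboard, then one crossing to the host.
online = max_frame_rate(PCI_PEAK, FRAME, crossings=1)
print(f"offline <= {offline:.0f} frames/s, online <= {online:.0f} frames/s")
```

Under these assumptions, the offline mode is limited by the bus to roughly 250 frames/s, however fast the FPGA itself could run. The online mode doubles that ceiling.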
2.4 Programming software for FPGA image processing
The more complex the FPGA, the more important fast, reliable, and effective programming tools become. The state-of-the-art tools are the hardware description languages (HDLs) that have been used in chip design for years; the most common are VHDL and Verilog. Currently, microEnable supports VHDL; a Verilog interface was announced for 1998.
FPGA coprocessors pose a new programming challenge because of their tight coupling to a CPU. Usually there is much interaction between the CPU and an FPGA coprocessor. The extreme case is true hardware/software codesign, where the application is divided into a part running on the FPGA coprocessor and a part running on the microprocessor. Simulating this behavior correctly requires a hardware/software cosimulator. Because current HDL tools do not support this feature, a programming language called CHDL (C++ based hardware description language) was developed.
A programmer using an HDL describes the architecture of a circuit that represents the application. Compared with the algorithmic description used in common software programming languages, the level of abstraction is lower, which leads to a longer development time for a given algorithm. Although there is increasing effort to develop high-level languages for FPGA coprocessors, no such language is commonly accepted today.
22 2 Field Programmable Gate Array Image Processing
The following sections describe the VHDL and CHDL programming tools for microEnable. High-level language support is expected by the end of 1998.
2.4.1 VHDL
The VHDL support for microEnable is a large library that can be used with any commercial VHDL compiler. A major part of this library consists of I/O modules such as master and slave interfaces, DMA channels, and DMA channels using a handshake protocol. For each of these modules there is a corresponding C routine in the Application Interface Library. A programmer who instantiates a DMA handshake module in the VHDL code uses the corresponding DMA routine in the application software. There is no need to deal with either microEnable's address space or the timing and function of the local bus. This mechanism simplifies the integration of hardware applets into the application software.
Another important class of modules comprises the interfaces to microEnable's RAM system and to the external connectors, so programmers need not care about the pinout or timing of the external RAM or interfaces. The VHDL library can be used with arbitrary VHDL synthesis and simulation tools.
Compared with schematic entry, VHDL has advantages that stem from its higher level of abstraction and optimal code reuse. It also has two drawbacks. First, it is difficult to generate optimized units that exploit special features of the FPGA logic cells; usually this is possible only through compiler- or vendor-specific constraints. Second, there is usually no C interface for simulation. The first drawback applies only to a limited class of low-level library elements. The missing C interface, however, is a serious obstacle when verifying complex algorithms. The main problem is verifying the hardware/software codesign, that is, the interaction between the part running on the hardware and the application software. Expressing this interaction in a VHDL testbench is time-consuming and error-prone.
2.4.2 CHDL
To solve the hardware/software cosimulation problem, a new language called CHDL (C++ based hardware description language) was developed. The syntax of CHDL is similar to common HDLs such as ABEL or Altera HDL, and anyone familiar with such languages can learn it in a couple of hours. CHDL is implemented as C++ classes, so only an ordinary C++ compiler is required to run it.
Compared with VHDL, the important advantage is testbench generation. The basic concept behind microEnable's CHDL simulator is that there is no difference between the testbench generation/verification and the final application software (i.e., the part controlling the hardware). The whole testbench is written as a normal C++ program. It accesses microEnable's internal address space, initiates DMA transfers, or generates interrupts, exactly the way the application software communicates with microEnable. The important difference is that all accesses are redirected from the hardware to the simulator. Each access defined in the testbench software is translated into a clock-cycle-based simulation of the CHDL program. With this mechanism, it is straightforward to transfer whole images to the simulator via a virtual DMA, which may take some 100,000 clock ticks. The processed image can then be read back into a buffer via another virtual DMA.
In order to verify the CHDL code the user can now compare this buffer with expected results or simply display the resulting image on screen. It is also possible to watch selected signals defined in CHDL on a waveform display. For algorithms using microEnable’s external RAM the simulator provides routines to virtually access the current content of this memory.
This mechanism has three major advantages:
• The whole testbench generation and verification is programmed in standard C++, which makes the simulation process fast and convenient.
• Even complex interactions between the host software and the hardware applet are easy to simulate. This feature is very important for hardware/software codesign, where applications are divided between the host CPU and the FPGA hardware. This type of simulation is extremely difficult with VHDL testbenches, which usually just load the contents of a (C-generated) test vector file into the simulator.
• The testbench generator software is identical to the software required to control the FPGA coprocessor while the real application runs. The simulator thus guarantees that the application control software corresponds to the simulated circuitry. This matters because, in complex hardware/software codesign applications, the interaction between coprocessor and software is not trivial.
These advantages greatly speed up the development of applets and make simulation results more reliable.
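The redirection idea can be sketched in a few lines. The following Python sketch is not CHDL and does not use microEnable's real interface. The names `Device`, `SimulatedDevice`, `dma_write`, `dma_read`, and `process_image` are hypothetical. The control code in `process_image` is written once, against an abstract device. A hardware backend (not shown) would forward the same calls to the driver. The simulated backend below instead pushes each transferred pixel through a cycle-stepped model of a trivial circuit: an 8-bit inverter that processes one pixel per clock.

```python
from abc import ABC, abstractmethod

class Device(ABC):
    """What the control software sees; a hardware backend would call the driver."""
    @abstractmethod
    def dma_write(self, data: bytes) -> None: ...

    @abstractmethod
    def dma_read(self, n: int) -> bytes: ...

class SimulatedDevice(Device):
    """Redirects every access to a clock-cycle-based model of the circuit."""
    def __init__(self):
        self.cycles = 0
        self._fifo = bytearray()

    def _clock(self, pixel: int) -> int:
        self.cycles += 1          # one pixel per simulated clock tick
        return 255 - pixel        # the modeled circuit: invert an 8-bit pixel

    def dma_write(self, data: bytes) -> None:
        for p in data:            # virtual DMA into the simulator
            self._fifo.append(self._clock(p))

    def dma_read(self, n: int) -> bytes:
        out, self._fifo = bytes(self._fifo[:n]), self._fifo[n:]
        return out                # virtual DMA back into a host buffer

def process_image(dev: Device, image: bytes) -> bytes:
    """Identical control code for the testbench and the final application."""
    dev.dma_write(image)
    return dev.dma_read(len(image))

# Testbench: run a test image through the simulator and compare with expectations.
sim = SimulatedDevice()
image = bytes(range(256)) * 4
result = process_image(sim, image)
expected = bytes(255 - p for p in image)
print("match" if result == expected else "mismatch", sim.cycles, "cycles")
```

Because `process_image` never knows which backend it is talking to, a passing testbench exercises the same control code that later drives the real board.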
In addition to easy test vector generation, CHDL gives the user the full functionality of C++. Instead of defining single modules, it is straightforward to declare whole classes of modules, so libraries behave more like library generators.
Module (or class) definitions in CHDL can be hierarchical to any depth. The language allows direct access to all special features of the FPGA cells, including placement and routing constraints, which eases the generation of highly optimized modules.
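The text does not show CHDL syntax, so the sketch below illustrates only the library-generator idea, and it does so in Python. A single parameterized class, `DelayLine` (a hypothetical name), builds a hierarchy of `Register` instances of any width and length. One definition therefore covers a whole family of modules. Each call to `clock` models one clock edge.

```python
class Register:
    """A width-bit register; on a clock edge its output takes the (truncated) input."""
    def __init__(self, width: int):
        self.mask = (1 << width) - 1
        self.q = 0

    def clock(self, d: int) -> int:
        self.q = d & self.mask
        return self.q

class DelayLine:
    """Generator for a chain of `stages` registers, each `width` bits wide."""
    def __init__(self, width: int, stages: int):
        self.regs = [Register(width) for _ in range(stages)]  # submodules

    def clock(self, d: int) -> int:
        # Update from the last stage backwards so every register samples the
        # value its predecessor held before this edge, as in real hardware.
        for i in range(len(self.regs) - 1, 0, -1):
            self.regs[i].clock(self.regs[i - 1].q)
        self.regs[0].clock(d)
        return self.regs[-1].q

# Two members of the same family: a 3-stage 8-bit line and a 1-stage 4-bit one.
line8 = DelayLine(width=8, stages=3)
outputs = [line8.clock(x) for x in [1, 2, 3, 4, 5]]
line4 = DelayLine(width=4, stages=1)
print(outputs, line4.clock(0x1F))
```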
The disadvantage, however, is the lower level of abstraction compared with VHDL (e.g., there is no behavioral description). In practice, this drawback is most evident when state machines are defined. For this reason, a state machine generator was developed that allows state machines to be defined using a common switch/case syntax.
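The generator's own syntax is not given here. As an illustration only, the Python sketch below defines a hypothetical request/acknowledge handshake controller in the style such a generator would use. The next-state logic is a case distinction on the current state, with Python's `if`/`elif` standing in for `switch`/`case`. The function `run` clocks the machine once per input sample.

```python
IDLE, REQ, XFER = "IDLE", "REQ", "XFER"

def next_state(state: str, start: bool, ack: bool, done: bool) -> str:
    """Next-state logic, one case per state."""
    if state == IDLE:                  # case IDLE: wait for a start request
        return REQ if start else IDLE
    elif state == REQ:                 # case REQ: hold request until acknowledged
        return XFER if ack else REQ
    elif state == XFER:                # case XFER: transfer until done
        return IDLE if done else XFER
    raise ValueError(f"illegal state {state!r}")

def outputs(state: str) -> dict:
    """Moore outputs derived from the current state only."""
    return {"req": state == REQ, "busy": state != IDLE}

def run(samples, state: str = IDLE) -> list:
    """Apply one (start, ack, done) sample per clock; return the state after each edge."""
    trace = []
    for start, ack, done in samples:
        state = next_state(state, start, ack, done)
        trace.append(state)
    return trace

trace = run([(1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1)])
print(trace, [outputs(s)["req"] for s in trace])
```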
The migration between VHDL and CHDL is simple. Code written in CHDL can be directly exported to VHDL (for both synthesis and simulation).
CHDL is available for the operating systems Linux, Solaris, and WinNT 4.0.
2.4.3 Optimization of algorithms for FPGAs
FPGAs offer the opportunity to design hardware that is optimally adapted to a specific problem. Optimizing a given algorithm for FPGAs therefore has many more degrees of freedom than optimizing it for a standard microprocessor. Some of these possibilities are briefly outlined here.
Multiple processing units. For each task it can be decided how many processing units run in parallel and/or which pipeline topology is the best choice. This includes the choice between a few rather complex processing units and a network of simpler units.
Variable bit-length processing. On a standard microprocessor the data types are fixed to multiples of a byte. The integer arithmetic/logical unit and the multiplication units are 32 bits wide or wider, which is much wider than the data from imaging sensors. FPGAs avoid this waste of resources, because the bit length of each processing unit can be set no wider than the required depth. It is thus possible to trade speed against accuracy as best suits a given task.
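The required width follows from the largest value a unit must hold. Below is a minimal sketch for unsigned integer data; signed data would need one extra bit, which the sketch ignores. The function names are illustrative only. For example, the sum over a 3 × 3 neighborhood of 8-bit pixels needs a 12-bit accumulator rather than a 32-bit register, and the product of two 8-bit values needs 16 bits.

```python
def sum_width(n_terms: int, in_bits: int) -> int:
    """Bits needed for the sum of n_terms unsigned in_bits-wide values."""
    return (n_terms * ((1 << in_bits) - 1)).bit_length()

def product_width(a_bits: int, b_bits: int) -> int:
    """Bits needed for the product of two unsigned values."""
    return (((1 << a_bits) - 1) * ((1 << b_bits) - 1)).bit_length()

print(sum_width(9, 8))        # 3 x 3 box filter on 8-bit pixels -> 12
print(sum_width(25, 10))      # 5 x 5 box filter on 10-bit pixels -> 15
print(product_width(8, 8))    # 8-bit pixel times 8-bit coefficient -> 16
```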
Special processing units. A standard microprocessor has only a few standard types of processing units; for integer processing these include an arithmetic/logical unit, a shifter, and a multiplication unit. With FPGAs, any special processing unit that a specific task requires can be implemented. Such dedicated processing units achieve higher performance and/or consume fewer resources than standard processing elements.
Lookup table processing. Multimedia instruction sets have considerably boosted the performance of standard microprocessors, but histogram computation and lookup table operations cannot be accelerated by such instructions (see Chapter 3). FPGA processors do not have this deficit, because lookup table operations are very easy to implement on them. Such tables can be used for operations like multiplication by a fixed constant or computing the square, the square root, or any other nonlinear function.
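In software terms, a lookup table operation replaces an arbitrary function of a pixel value with one table read. The Python sketch below precomputes a table entry for each possible 8-bit input. On an FPGA, the table would presumably sit in on-chip memory or in the logic cells' own lookup tables. The names `build_lut` and `apply_lut` are illustrative. The scaling of the square root, chosen so that 255 maps to 255, is an arbitrary choice for the example.

```python
import math

def build_lut(func, in_bits: int = 8, out_bits: int = 8) -> list:
    """Evaluate func once for every possible input; round and clip to out_bits."""
    top = (1 << out_bits) - 1
    return [min(top, max(0, round(func(x)))) for x in range(1 << in_bits)]

def apply_lut(lut: list, pixels) -> list:
    """One table read per pixel, however costly func was to evaluate."""
    return [lut[p] for p in pixels]

sqrt_lut = build_lut(lambda x: math.sqrt(255 * x))        # scaled square root
square_lut = build_lut(lambda x: x * x, out_bits=16)      # exact square
times3_lut = build_lut(lambda x: 3 * x, out_bits=10)      # fixed-constant multiply

print(apply_lut(sqrt_lut, [0, 64, 255]))
```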
2.4.4 Integration into a general software system
If an FPGA processor is not used as a standalone system, it must be integrated into a general software environment that coordinates the processing on the FPGA with the host system. Basically, two concepts are possible.
On the one hand, an FPGA image processing system that receives data from an image source can be handled as a freely programmable frame grabber (online hardware accelerator, Section 2.3.5). The FPGA processor receives the raw image data, processes them, and transfers the processed data to the host. Thus the data flow is basically unidirectional, except for downloading hardware applets to the FPGA during initialization (Section 2.3.4). The host merely controls the processing mode, starts and stops the data flow, and directs the incoming data to further processing.
On the other hand, one or more FPGAs could be used as coprocessors in a host system. In this case, the data flow is much more complex: the host sends the data to be processed to the FPGA processors and receives the processed data back. This bidirectional data transfer is much more difficult to handle, especially if several FPGA processors run in parallel, and it requires considerably more input/output bandwidth.
Thus the concept of the freely programmable frame grabber is by far preferable. It is the method of choice, provided that the FPGA processor is powerful enough to process the incoming image data in real time. This approach fits very well with modern concepts for frame buffers. Typically the software interface consists of two parts: a configuration utility and a device driver. The configuration utility is usually an interactive software module that allows the user to configure the frame grabber for the application. The user sets the camera to be used, the image resolution, the trigger signals, etc., and saves these settings in a configuration file. This file is then loaded during initialization and configures the frame grabber to the preselected values. This concept is ideally suited to choosing hardware applets for FPGA processors (Section 2.3.4). At runtime, one of the processing modes built into the hardware applet is selected. These modes include a pass-through mode that transfers the original image data to the host for direct data control. The mode, and the frequency with which image data are acquired and processed, are controlled via the standard grab and snap routines available in any frame grabber software interface. Of course, it should also be possible to load a new applet at runtime for a different type of application.
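The configuration-file concept can be sketched as follows. The key names, the `key = value` file format, `GrabberConfig`, and `load_config` are all hypothetical and do not describe any particular product. The point is only that settings chosen once in a configuration utility are saved, then reloaded to initialize the frame grabber. These settings include which hardware applet to download and which processing mode to start in.

```python
from dataclasses import dataclass

@dataclass
class GrabberConfig:
    """Settings saved by a (hypothetical) interactive configuration utility."""
    camera: str
    width: int
    height: int
    trigger: str
    applet: str                 # hardware applet to download at initialization
    mode: str = "passthrough"   # processing mode selected at start-up

def load_config(text: str) -> GrabberConfig:
    """Parse 'key = value' lines; everything after '#' is a comment."""
    fields = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        fields[key.strip()] = value.strip()
    for key in ("width", "height"):
        fields[key] = int(fields[key])
    return GrabberConfig(**fields)

SAMPLE = """
# written by the configuration utility (example values)
camera  = example_camera
width   = 768
height  = 576
trigger = external
applet  = edge_filter
"""

cfg = load_config(SAMPLE)
print(cfg)
```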
With this concept it is also very easy to perform parallel processing on the FPGA and the host. The software on the host just needs a number of synchronization routines to perform further processing on the host in sync with the incoming preprocessed image data. In addition, it is necessary to direct the incoming preprocessed data into image objects of the image processing software running on the host.
Such features are, for instance, built into the heurisko® image processing system.⁴ In this software, objects can be allocated so that background transfer via DMA over the PCI bus is possible. Such buffers can be individual images or image sequences, which can be used as image ring buffers. Background acquisition into such an object is started by the AcqStart() instruction and stopped by the AcqStop() instruction. The AcqCnt() instruction queries how many frames have been transferred since the last call to AcqStart(). AcqTest() queries whether a processed frame is currently being transferred, and AcqWait(n) waits until the next n frames have been acquired. With these commands, it is easy to synchronize further processing with the incoming preprocessed image data. As long as the capacity of the PCI bus is not exceeded, the data streams from multiple FPGA processors can be processed in this way.
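The synchronization pattern behind such instructions can be sketched with a counter guarded by a condition variable. This is not heurisko's implementation or API. `FrameCounter` and its methods are hypothetical, and a Python thread stands in for background DMA. The comments only indicate which of the instructions named above each method loosely corresponds to. `wait_until` takes an absolute frame count, which avoids missing frames that arrive between two calls; `wait(n)` is the relative form.

```python
import threading

class FrameCounter:
    """Counts frames written by background acquisition into a ring buffer."""
    def __init__(self, ring_size: int):
        self.ring_size = ring_size
        self._count = 0
        self._cond = threading.Condition()

    def frame_done(self) -> None:
        """Called by the acquisition side when a frame transfer completes."""
        with self._cond:
            self._count += 1
            self._cond.notify_all()

    def acquired(self) -> int:
        """Frames transferred since acquisition started (cf. AcqCnt())."""
        with self._cond:
            return self._count

    def wait_until(self, total: int, timeout: float = None) -> int:
        """Block until at least `total` frames have arrived; return the count."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._count >= total, timeout):
                raise TimeoutError(f"only {self._count} of {total} frames")
            return self._count

    def wait(self, n: int, timeout: float = None) -> int:
        """Block until the next n frames have arrived (cf. AcqWait(n))."""
        return self.wait_until(self.acquired() + n, timeout)

    def slot(self, frame: int) -> int:
        """Ring-buffer slot that frame number `frame` was written to."""
        return frame % self.ring_size

counter = FrameCounter(ring_size=4)

def simulated_dma(frames: int) -> None:
    """Stand-in for background acquisition delivering frames one by one."""
    for _ in range(frames):
        counter.frame_done()

grabber = threading.Thread(target=simulated_dma, args=(10,))
grabber.start()                            # plays the role of AcqStart()
n = counter.wait_until(10, timeout=5.0)    # host-side processing would start here
grabber.join()
print(n, counter.slot(n - 1))
```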