
3.7 Conclusions

Review Questions

Bibliography

3.1 Introduction

In recent years, continuous progress in silicon technology has made it possible to implement complex electronic systems in a single integrated circuit. Systems-on-chips (SoCs) have fueled the explosive growth of the market for electronic appliances: small mobile devices that provide communication and information capabilities for consumer electronics and industrial automation. These devices require complex electronics and high levels of system integration, and they must be delivered in a very short time in order to meet their market window.

The design complexity of these systems requires new design methodologies and the development of a seamless design flow that integrates existing and emerging tools. The International Technology Roadmap for Semiconductors (ITRS) and the MEDEA+ Roadmap highlight some key points that electronic design automation companies must consider in order to deal with such design complexity, among them:

• Intellectual Property Reuse

Intellectual property (IP) reuse is becoming critical for efficient system development; the need to shorten the time to market is stimulating reusability of both hardware and software. A good way to keep design costs under control is to minimize the number of new designs that are required each time a new SoC is developed: reuse existing design components where possible.

The development of reusable IPs requires:

– The development of standards, including general constraints and guidelines, as well as executable specifications for intra- and inter-company IP exchange, such as SystemC, XML and UML

– The creation of parameterizable, qualified and validated IPs

– The use of hierarchical reuse methodology, allowing the reuse of the IPs and of the testbenches at different levels of abstraction

Furthermore, the IP reuse methodology is indispensable when the design of a system is developed in cooperation between different companies, or when the design center is distributed all over the world and consequently the project management is distributed.

A lot of work has been done on the development of standards for IP qualification. The SPIRIT Consortium developed the IP-XACT specification to enable rapid, reliable deployment of IPs into advanced design environments. The Virtual Socket Interface Alliance (VSIA) developed the international standard QIP (Quality Intellectual Property) for measuring IP quality. OpenCores is the world’s largest community for development of open source hardware IPs.

• Low Power Design

The continuous progress of micro and nano technologies has led to growing integration and higher clock frequencies in electronic systems. These combined effects have increased both power density and energy dissipation, with important consequences above all in portable systems. Some design and technology issues related to power efficiency are becoming crucial, in particular power optimized cell libraries, clock gating and clock tree optimization, and dynamic power management. Emphasis is now moving to the architectural level (software energy optimization), optimum memory hierarchy organization and run time system management.

• System Level Design Methodologies and On-Chip Communication

The design of complex systems-on-chips and multi-core systems requires the exploration of a large solution space. Current design approaches start with low level models of components and interconnect them when most architectural decisions have been fixed. Multi-core system design methodologies perform architecture exploration at high level, taking into account constraints at this level. Multi-core system design methodologies must select:

– The global communication architecture, which may be multi-level bus architecture, network-on-chip (NoC) architecture or mixed-bus NoC

– Synchronous or asynchronous architectures for local and global communication

– The partitioning of system specification and the allocation of components, such as software (real time operating system) or hardware IPs to execute them

Transaction level modeling (TLM) [39] has been widely used to explore the solution space at system level in a fast and efficient way.

• Design for Testability and Manufacturability

As complexity increases, the time spent in verification and validation grows much faster than the time spent in design. A designer must therefore consider, among other specifications, how to simplify the test phase in prototyping and in production. Design methodologies that take these aspects into account are:

– Formal verification

– Hierarchical specification and verification and reuse of test benches at different levels of abstraction

– Reuse of qualified IPs

– Virtual prototyping

The chapter is organized as follows. In Section 3.2, system level power models and the state of the art of power analysis tools are presented. Section 3.3 presents a SystemC library, called PKtool, for system level power analysis. Some design considerations and existing analysis tools for network-on-chip are reported in Section 3.4. Section 3.5 presents a SystemC library, called NOCEXplore, for network-on-chip performance analysis. Section 3.6 presents the application of dynamic voltage scaling techniques in different on-chip communication architectures. Finally, Section 3.7 reports the conclusions.

3.2 Low Power Design

The mean energy dissipated during a time period $T$ in a CMOS circuit can be modeled by the following equation

\[
E_M(T) = E_{dyn} + E_{sc} + E_{leak} = \sum_{i=1}^{N} \left( C_i V_{DD}^2 D_i + V_{DD} I_{sc,i} \tau_i D_i + V_{DD} I_{leak,i} T \right) \tag{3.1}
\]

where the first term represents the capacitive switching energy, the second the short circuit energy, and the third the energy dissipated due to leakage currents. $N$ is the number of nodes of the circuit, $C_i$ is the capacitance associated with the i-th node, $D_i$ is the number of commutations of the i-th node during the period $T$, $I_{sc,i} \tau_i$ is the charge lost during a commutation of the i-th node due to the short circuit effect, $I_{leak,i}$ is the mean leakage current of the i-th node, and $V_{DD}$ is the supply voltage.
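As a rough illustration of Equation (3.1), the sketch below sums the three energy contributions over a list of circuit nodes. The `Node` class, the function name `mean_energy` and the numeric values in the usage lines are assumptions introduced only for this example; they do not come from the chapter or from any tool.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Per-node parameters of Equation (3.1); field names are illustrative."""
    capacitance: float  # C_i, in farads
    transitions: int    # D_i, commutations during the period T
    i_sc: float         # I_sc,i, short circuit current in amperes
    tau: float          # tau_i, short circuit conduction time per commutation, in seconds
    i_leak: float       # I_leak,i, mean leakage current in amperes


def mean_energy(nodes, vdd, period):
    """Return (E_dyn, E_sc, E_leak) in joules for supply voltage vdd and period T."""
    e_dyn = sum(n.capacitance * vdd ** 2 * n.transitions for n in nodes)
    e_sc = sum(vdd * n.i_sc * n.tau * n.transitions for n in nodes)
    e_leak = sum(vdd * n.i_leak * period for n in nodes)
    return e_dyn, e_sc, e_leak


# Made-up single-node circuit, only to show the call.
example_nodes = [Node(capacitance=1e-15, transitions=1000,
                      i_sc=1e-5, tau=1e-11, i_leak=1e-9)]
e_dyn, e_sc, e_leak = mean_energy(example_nodes, vdd=1.0, period=1e-6)
print(e_dyn, e_sc, e_leak)
```

Returning the three terms separately makes it easy to see which of them a given low power technique is expected to reduce.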

The different techniques, applied at different levels of the design to reduce the power dissipation, aim to reduce one or more terms of Equation (3.1). A summary of some design techniques for low power follows.

• Leakage Current Reduction

The reduction of feature size has, as a drawback, an increase of the sub-threshold current, the bulk leakage current and the leakage current through the gate oxide. As a consequence the leakage power is no longer negligible with respect to the other terms. It can be reduced and controlled using techniques such as multi-threshold MOS transistors, silicon on insulator technologies, back biasing, or switching off a complete block when it is inactive.

• Short Circuit Current Reduction

Short circuit current flows in a CMOS gate when both the pMOSFET and nMOSFET are on. The increment of clock frequency makes the commutation period of the logic devices comparable with the clock period, increasing the short circuit effect. A reduction of short circuit current is obtained using low level design techniques, trying to reduce the period of time in which both the pMOSFET and nMOSFET are on.

• Capacitance Reduction

From low level design to high level design the objective is the reduction of the complexity and therefore the area required to implement the desired functionality, with the additional objective of the reduction of cost of the silicon and the increment of clock frequency.

• Switching Activity Reduction

As the number of devices implemented in a single chip grows, the interconnections increase more than linearly. A large share of the power is dissipated in the interconnections rather than in the logic, and the delay due to the interconnections is more significant than the delay of the logic gates. Placement and routing algorithms should therefore optimize not only the delay but the power dissipation too. This means that the algorithms should shorten the interconnections of the signals whose switching activity is highest for the particular application for which the hardware will be used. The clock gating technique is used to stop the clock in parts of the circuit where no active computation is required. Some conditions for stopping the clock signal can be found directly from the state machine specification of the circuit [10, 12].

3.2.1 Power Models

System level design and IP modeling are the key to fast SoC innovation, since they make it possible to quickly examine different alternatives early in the design process and to establish the best possible architecture, taking into account HW/SW partitioning, cost, performance and power consumption trade-offs.

The first step toward low-power design is the estimation of the power dissipated by the system under development. This kind of analysis should be performed in the early phases of the design, when good ideas on optimizing power dissipation can drive the choice between different architectures.

Power analysis at system level is less accurate than at lower levels since the details of the real implementation of the functionality are not defined yet, but conversely the simulation is much faster, due to the absence of these details, and the power saving opportunity with an optimization is much higher. This concept is summarized in Figure 3.1.

[Figure 3.1: power analysis and optimization at System Level, RTL, Gate Level and Layout, compared along three axes: power saving opportunity, power estimation accuracy and simulation time.]

FIGURE 3.1: Power analysis and optimization at different levels of the design.

Essentially two methodologies exist for estimating the power dissipation at different levels of abstraction: simulation-based methods and probabilistic methods.

• Simulation-based methods. The power dissipation is obtained applying specific input patterns to the circuit, see for example [46]. Therefore the estimation depends not only on the accuracy of the model description, but on the input patterns too. The input patterns should be strictly related to the real application in which the circuit will be applied. Simulation-based methods are widely used, since they are strictly related to the timing and functional simulation and test of the system.

• Probabilistic methods. These methods require the specification of the typical behavior of the input patterns through their probabilities; in this way it is possible to cover a large number of patterns with limited computational effort [25]. The switching activity, necessary to perform power estimation, is computed from the signal probabilities of the circuit nodes. Approaches to such methods are represented by probabilistic simulation [65, 80], symbolic simulation [40] and simulation of transition densities [63, 64].
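The difference between the two families can be sketched in a few lines. The first function counts toggles in a simulated bit trace, which is the simulation-based view; the other two propagate signal probabilities, which is the probabilistic view. The sketch assumes that signals are temporally independent from cycle to cycle and that gate inputs are spatially independent, a common textbook simplification that the methods cited above refine. All function names are illustrative.

```python
def toggles(trace):
    """Simulation-based: count the transitions observed in a simulated bit trace."""
    return sum(a != b for a, b in zip(trace, trace[1:]))


def transition_probability(p_one):
    """Probabilistic: chance that a signal toggles between two consecutive cycles.

    Assumes temporal independence, so a toggle is either 0 -> 1 or 1 -> 0,
    each occurring with probability p_one * (1 - p_one).
    """
    return 2.0 * p_one * (1.0 - p_one)


def and_gate_probability(p_a, p_b):
    """Probability that a 2-input AND output is 1, assuming independent inputs."""
    return p_a * p_b


# Probability of a 1 at the output of an AND gate fed by two unbiased inputs,
# and the resulting per-cycle toggle probability of that output node.
p_out = and_gate_probability(0.5, 0.5)
print(p_out, transition_probability(p_out))
print(toggles([0, 1, 1, 0, 1]))
```

The probabilistic estimate needs no input vectors, only their statistics, which is why it covers many patterns at low cost; the price is the independence assumption, which breaks down on reconvergent or correlated signals.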

Many consolidated and accurate tools estimate power dissipation from RTL to circuit level, but at higher levels there is still a lot of research to be done. Power models are classified on the basis of the level of abstraction of the description of the system and are reviewed in the following.

• Transistor Level Power Estimation

An accurate estimate of power consumption can be carried out at transistor level, simulating the analog behavior of the circuit, analyzing the supply current, using SPICE-like simulators. The CPU time requested for the simulation is extremely high, making the simulation possible only for circuits with hundreds of transistors and few input patterns.

• Gate Level Power Estimation

At gate level it is possible to analyze the behavior of the circuit using digital simulators, provided the details of each logic gate are available. Power consumption is estimated from the switching activity and the capacitance of each node using the relationship reported in Equation (3.1). At this level the results of the power estimation strongly depend on the delay model used, which determines whether glitches are captured. In a “zero delay” model all transitions happen simultaneously and glitches are not considered, so the power estimate is very optimistic.

• RT Level Power Estimation

At register transfer level (RTL) power can be estimated using more complex blocks like multiplexers, adders, multipliers and registers. The inaccuracy at this level comes from the poor modeling of dynamic effects (e.g., glitches), which causes an inaccurate estimation of the switching activity, and from the limited detail in the description of the functional blocks and interconnections, which causes an inaccurate estimation of the capacitances.

The improvement in the automatic synthesis tools from RTL description allows us to estimate the power dissipation using a fast synthesis with a mapping into a technology and a library defined by the user.

Some analytical methods at RTL use complexity, or an equivalent gate count, as a capacitance estimate [62, 54]. In this way the power dissipated by a block can be roughly estimated as the number of equivalent gates multiplied by the power consumption of a single reference gate; a fixed activity factor is assumed.

Some methods are based on analytical macromodels (linear, piecewise linear, spline, . . . ) of the power dissipation of each block. The model fits the experimental data obtained from numerical simulations at lower levels or experimental data. The model is affected by an error intrinsic in the model, by an estimation error due to the limited number of experiments and by an error due to the dependence of the measurements on the input patterns. The model can be represented as an equation [6, 84] or as a multi-dimensional look-up table (LUT) [58, 45, 72].
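The two macromodel representations mentioned above can be sketched as follows: an equation obtained by a least-squares line fit, and a one-dimensional look-up table with linear interpolation. This is only a minimal illustration under simplifying assumptions (a single input variable, here called activity, and made-up characterization points); the cited works use richer models and multi-dimensional tables.

```python
from bisect import bisect_right


def fit_linear_macromodel(activities, energies):
    """Least-squares fit of energy = intercept + slope * activity."""
    n = len(activities)
    mean_x = sum(activities) / n
    mean_y = sum(energies) / n
    sxx = sum((x - mean_x) ** 2 for x in activities)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(activities, energies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def lut_energy(points, activity):
    """Interpolate linearly in a table of (activity, energy) pairs sorted by activity.

    Values outside the characterized range are clamped to the nearest entry.
    """
    xs = [x for x, _ in points]
    if activity <= xs[0]:
        return points[0][1]
    if activity >= xs[-1]:
        return points[-1][1]
    j = bisect_right(xs, activity)
    (x0, y0), (x1, y1) = points[j - 1], points[j]
    return y0 + (y1 - y0) * (activity - x0) / (x1 - x0)


# Hypothetical characterization data: (switching activity, energy per operation).
samples = [(0.1, 1.2), (0.2, 2.2), (0.4, 4.2)]
intercept, slope = fit_linear_macromodel([x for x, _ in samples],
                                         [y for _, y in samples])
print(intercept, slope)
print(lut_energy(samples, 0.3))
```

The equation form is compact and extrapolates, while the table form follows the characterization data more closely but is only valid inside the range that was simulated.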

• System Level Power Estimation

System level power estimation relies upon the power analysis of the hardware and software parts of the system. The components in a system level description are microprocessors, DSPs, buses, peripherals, whose internal architecture is, in general, not defined. Battery, thermal dissipation and cooling system modeling should also be considered at this level. Because the complete architecture of the system is not defined, power estimation is highly inaccurate; conversely, design exploration opportunity is high and so is power optimization.

At this level of abstraction power estimation usually is performed for the evaluation of different system architectures, in order to choose the best one in terms of power consumption too.

To enable power estimation, a model of the power dissipated by each block is created and the coefficients of the model are estimated from the information derived from the lower levels. The system level power model can be derived from the power dissipation of the single CMOS device, as reported in Equation (3.1), and can be represented by the following relationship

\[
E = N C V_{DD}^2 D + Q_{sc} V_{DD} D + I_{leak} V_{DD} T \tag{3.2}
\]

where $V_{DD}$ is the supply voltage, $D$ is the average number of commutations of the gates of the block, $N$ is the number of gates, $C$ is the average capacitance of the gates, $Q_{sc}$ is the average charge lost due to short-circuit current during commutation, and $I_{leak}$ is the average leakage current of the block.

The average number of commutations D must be calculated during the system level simulation and therefore depends on the specific application and test vector. The coefficients C, Qsc, Ileak are related to the specific technology chosen, N is the number of equivalent gates necessary to implement the block described at system level. If the block described at system level has already been implemented, these coefficients can be obtained from the low level implementation. If the block has not yet been implemented, the complexity of the block, that is, an estimation of the number of gates required for its implementation should be given. Of course, if the detailed architecture of the system is not yet defined, only a rough estimation can be given. An example of this procedure is given in Figure 3.2. From the SystemC code of each module the number of equivalent gates required for the implementation of the module is estimated.

[Figure 3.2: complexity estimation. SystemC source code files are processed to produce an estimated number of gates per type, for example AND 152, NAND 122, OR 973, XOR 23, FF 186, ...]

The mathematical operations on different SystemC types (sc_int, sc_uint, sc_bigint, sc_biguint, sc_fixed, sc_ufixed, sc_fix, sc_ufix), the bitwise and comparison operators, the assignments and the C++ control instructions (if else, switch case, for and while) are recognized and a module from a library of a reference technology is associated to each operator. A software tool has been developed to produce these results automatically [83].
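To make the use of Equation (3.2) concrete, the sketch below takes a per-type gate count such as the one produced by the complexity estimation step and evaluates the block energy. The gate counts are the entries legible in Figure 3.2 (the figure elides further rows), and simply summing them into N is a simplification, since a real flow would weight each gate type. The technology coefficients and the function name are hypothetical.

```python
def block_energy(n_gates, c_avg, q_sc, i_leak, vdd, d_avg, period):
    """Evaluate Equation (3.2) term by term, as it is written in the text.

    n_gates: N, number of equivalent gates of the block
    c_avg:   C, average gate capacitance in farads
    q_sc:    Q_sc, average short circuit charge per commutation in coulombs
    i_leak:  I_leak, average leakage current of the block in amperes
    vdd:     V_DD, supply voltage in volts
    d_avg:   D, average number of commutations during the period
    period:  T, observation period in seconds
    """
    e_dyn = n_gates * c_avg * vdd ** 2 * d_avg
    e_sc = q_sc * vdd * d_avg
    e_leak = i_leak * vdd * period
    return e_dyn + e_sc + e_leak


# Gate counts legible in Figure 3.2; further rows are elided in the figure.
gate_counts = {"AND": 152, "NAND": 122, "OR": 973, "XOR": 23, "FF": 186}
n_equivalent = sum(gate_counts.values())  # unweighted sum, a simplification

# Hypothetical technology coefficients and activity, for illustration only.
energy = block_energy(n_equivalent, c_avg=2e-15, q_sc=1e-14, i_leak=1e-6,
                      vdd=1.2, d_avg=500, period=1e-3)
print(n_equivalent, energy)
```

Only `d_avg` has to come from the system level simulation; the other arguments are fixed once the technology and the complexity estimate are known, which is what makes the model cheap to evaluate for many architectures.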

Instruction-based power analysis has been presented in [79, 41] and applied in many other works [34, 22]. The term “instruction” is used to indicate an action that, together with others, covers the entire set of core behaviors. At system level a core can be seen as a functional unit executing a sequence of instructions or processes without any information on their hardware or software implementations. Instruction-based power analysis associates an energy model to each instruction, for example, the one reported in Equation (3.2). The power model should be parametric in order to allow the reuse not only of the IP functional description, but of the power model too.

An example is the power model of an I2C driver reported in [22]. In this case two power models have been used: a model that associates a constant value to each block and instruction, independently of the data transmitted, and a model with a linear dependence on the switching activity and clock frequency obtained during high level functional simulations. The instruction set of an I2C driver is reported in Figure 3.3.

[Figure 3.3: state diagram with the states Idle, Master and Slave and the transitions Set Master, Set Slave, Reset, Tx, Rx and Wait.]

I2C instruction set:

 1  RsM  Reset (Master state)
 2  RsS  Reset (Slave state)
 3  RxM  Rx (Master state)
 4  RxS  Rx (Slave state)
 5  SMI  Set Master (Idle state)
 6  SMM  Set Master (Master state)
 7  SSI  Set Slave (Idle state)
 8  TxM  Tx (Master state)
 9  TxS  Tx (Slave state)
10  WaI  Wait (Idle state)

FIGURE 3.3: I2C driver instruction set.

The second step of the instruction-based power analysis is the association of the power model to the functional model, as shown in Figure 3.4. Functional and power models are described in the same language (VHDL, SystemC, ...).
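The idea of attaching a power model to a functional model can be sketched with the I2C instruction set of Figure 3.3. The code below is a toy stand-in written in Python, not the SystemC or VHDL model of [22]: the state transitions are one reading of the state diagram, the energy values are invented, the clock frequency dependence of the second model is left out, and the class names `PowerMonitor` and `I2CDriver` are hypothetical.

```python
# Instruction mnemonics of Figure 3.3, keyed by (action, current state).
INSTRUCTIONS = {
    ("Reset", "Master"): "RsM", ("Reset", "Slave"): "RsS",
    ("Rx", "Master"): "RxM", ("Rx", "Slave"): "RxS",
    ("SetMaster", "Idle"): "SMI", ("SetMaster", "Master"): "SMM",
    ("SetSlave", "Idle"): "SSI",
    ("Tx", "Master"): "TxM", ("Tx", "Slave"): "TxS",
    ("Wait", "Idle"): "WaI",
}

# Assumed reading of the state diagram: Tx, Rx and Wait keep the current state.
NEXT_STATE = {"Reset": "Idle", "SetMaster": "Master", "SetSlave": "Slave"}


class PowerMonitor:
    """Accumulates energy per instruction: constant part plus activity part."""

    def __init__(self, constant_energy, activity_slope=None):
        self.constant_energy = constant_energy      # energy per instruction
        self.activity_slope = activity_slope or {}  # extra energy per bit toggle
        self.total = 0.0

    def record(self, mnemonic, bit_toggles=0):
        slope = self.activity_slope.get(mnemonic, 0.0)
        self.total += self.constant_energy[mnemonic] + slope * bit_toggles


class I2CDriver:
    """Functional model reduced to its state machine; data handling is omitted."""

    def __init__(self, monitor):
        self.state = "Idle"
        self.monitor = monitor

    def execute(self, action, bit_toggles=0):
        mnemonic = INSTRUCTIONS[(action, self.state)]  # KeyError if not allowed
        self.monitor.record(mnemonic, bit_toggles)
        self.state = NEXT_STATE.get(action, self.state)
        return mnemonic


# Invented coefficients: 1.0 energy unit per instruction, 0.5 per toggled bit on TxM.
monitor = PowerMonitor({m: 1.0 for m in INSTRUCTIONS.values()}, {"TxM": 0.5})
driver = I2CDriver(monitor)
for action, bit_toggles in [("SetMaster", 0), ("Tx", 4), ("Reset", 0)]:
    driver.execute(action, bit_toggles)
print(monitor.total, driver.state)
```

Keeping the monitor separate from the driver mirrors the requirement that the power model be reusable: the same functional model can be simulated with the constant model, the activity-dependent model, or no power model at all.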

The simulation of a complete SoC that uses system level IP models can be several hundred times faster than an RTL simulation, so in a short time it is possible to evaluate hundreds of different configurations and architectures in order to reach the desired trade-offs in terms of different parameters like speed, throughput and power consumption. The complete steps for instruction-based power modeling and analysis are reported in Figure 3.5.

[Figure 3.4: an IP functional model passes each instruction to a power thread.]

FIGURE 3.4: Power dissipation model added to the functional model.

[Figure 3.5: flow diagram with the following blocks.]

– System Level Functional Description

– System Level Power Model Definition

– Instruction Set Definition

– Power Model Coefficients Estimation from Simulations

– Integration of Power Model with Functional Description

– System Functional and Power Simulation

– System Architecture Exploration for Performance and Power Optimization

– Characterizing simulations (levels labeled in the figure: System Level, RTL, Gate Level, Circuit)

FIGURE 3.5: System level power modeling and analysis.

3.2.2 Power Analysis Tools

A great effort has been put into the development of tools for a complete design flow that implements a top-down design methodology from high level modeling languages, e.g., C/C++, to silicon; see for example [20]. Some EDA companies started developing design tools with the goal of automatic or semiautomatic synthesis from a subset of system level languages, for example RT level descriptions generated by SystemC co-simulation and synthesis tools. In recent years low level synthesis has been replaced by behavioral synthesis, as proposed for example in CoCentric SystemC Compiler and Behavioral Compiler by Synopsys, PACIFIC by Alternative System Concepts (ASC) and
