
The fan-out of a net is the number of inputs driven by a particular output. High fan-out nets (nets that drive hundreds or even thousands of inputs) need to be handled differently from standard interconnections. Note: For timing analysis, we adjusted the pin limit (setUseDefaultDelayLimit) in order to treat them differently.

Every synchronous circuit has at least one high fan-out net, namely the clock net. In most circuits, the reset and scan-enable signals also have to be distributed to each and every flip-flop.

The main problem with high fan-out nets is the large load capacitance that needs to be driven. Each driven input adds its own input capacitance to the total load, and the interconnection required to distribute the signal to all these inputs increases the load capacitance further.
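To make the capacitance argument concrete, here is a minimal Python sketch that adds up the load seen by a single driver: the sum of the input capacitances of all driven pins plus a wire term. The linear wire model (length times capacitance per micrometre) and all numbers in the example are hypothetical placeholders, not values from our library or from parasitic extraction.

```python
def total_load_ff(pin_caps_ff: list[float], wire_length_um: float,
                  wire_cap_ff_per_um: float) -> float:
    """Total capacitive load of a net in fF.

    Pin load: every driven input contributes its own input capacitance.
    Wire load: crude linear estimate (assumption: capacitance proportional
    to total wire length); real values come from parasitic extraction.
    """
    pin_load = sum(pin_caps_ff)
    wire_load = wire_length_um * wire_cap_ff_per_um
    return pin_load + wire_load


# Hypothetical net: 1000 clock pins of 2 fF each and 5000 um of wire at 0.2 fF/um.
load = total_load_ff([2.0] * 1000, wire_length_um=5000.0, wire_cap_ff_per_um=0.2)
print(f"{load:.0f} fF")  # prints 3000 fF (2000 fF from pins + 1000 fF from wire)
```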

There are three important parameters for such nets:

Transition time: The time it takes to change the logic level of a node (e.g. 0 → 1). Basically, the more load an output has to drive, the more time is required to charge this load. CMOS drivers draw additional short-circuit current during the transition, so long transition times are undesirable. Furthermore, noise on signals with long transition times can result in glitching. Most libraries set an upper limit for the transition time (for the technology we are using, this is 1.79 ns for the typical libraries). To lower the transition time, a tree of buffers can be inserted so that the total load is shared among the buffers. The lower the desired transition time, the more buffers are required.

Insertion delay: The time required for the signal to travel from the driver to the endpoints. This delay is usually different for each endpoint. Each level of buffers in the buffer tree adds a delay to the signal.

Skew: The difference between the insertion delays of different endpoints. To minimize skew, a balanced buffer tree has to be built. Generally, the lower the desired skew, the more buffers are required.
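The relation between the transition-time target, the number of buffers, the insertion delay and the skew can be sketched with a deliberately simple model. This is our own assumption, not how CTS actually works: each buffer may drive at most `max_fanout` loads (standing in for the transition-time limit), and every level of the tree adds a fixed stage delay. The 443 sinks match the clock tree of the example design discussed later; the fan-out limits, the 0.2 ns stage delay and the endpoint delays are made up.

```python
import math


def buffer_levels(num_sinks: int, max_fanout: int) -> list[int]:
    """Buffers per level of a balanced tree, from the sinks up to the root.

    max_fanout stands in for the transition-time limit: the stricter the
    limit, the fewer loads a single buffer may drive.
    """
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need num_sinks >= 1 and max_fanout >= 2")
    levels = []
    loads = num_sinks
    while loads > max_fanout:
        loads = math.ceil(loads / max_fanout)  # buffers needed on this level,
        levels.append(loads)                   # which are the loads of the next one
    levels.append(1)  # root buffer drives whatever is left
    return levels


def insertion_delay_ns(levels: list[int], stage_delay_ns: float) -> float:
    """Each level adds one buffer stage to every driver-to-endpoint path."""
    return len(levels) * stage_delay_ns


def skew_ns(endpoint_delays_ns: list[float]) -> float:
    """Skew: spread between the latest and the earliest endpoint."""
    return max(endpoint_delays_ns) - min(endpoint_delays_ns)


for fanout in (20, 4):
    levels = buffer_levels(443, fanout)
    print(f"max fan-out {fanout}: {levels} -> {sum(levels)} buffers, "
          f"~{insertion_delay_ns(levels, 0.2):.1f} ns")
# max fan-out 20: [23, 2, 1] -> 26 buffers, ~0.6 ns
# max fan-out 4: [111, 28, 7, 2, 1] -> 149 buffers, ~1.0 ns

print(f"skew {skew_ns([1.20, 1.35, 1.28]):.2f} ns")  # skew 0.15 ns
```

In this model, tightening the fan-out limit from 20 to 4 (a stricter transition target) grows the tree from 26 to 149 buffers and from three to five levels, which raises the estimated insertion delay from 0.6 ns to 1.0 ns.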

Which parameters are most important depends on the type of net:

Clock: Our main concern is to reduce the skew, since it affects our timing. The maximum acceptable skew depends on the clock period. For example, for a 20 MHz clock (50 ns period), a clock skew of 0.5 ns (1% of the period) is acceptable. But for a 200 MHz clock (5 ns period), the same skew equals 10% of the clock period and would be too high.

If you over-constrain the skew, you will need a deep (and large) clock tree, and the insertion delay will rise, which affects your input and output timing. Therefore, you will want to balance the skew against the insertion delay and the number of buffers. Constraining the maximum insertion delay too tightly will usually degrade results.

Usually, a tree that gives you an acceptable skew will also give you a decent transition time, so you don’t have to worry about that.

Reset: We want the reset to propagate to all flip-flops in our design within one clock cycle.

For designs with on-chip reset synchronization, this is strictly required. The insertion delay should therefore be less than the clock period and the transition times must stay within the bounds imposed by the technology, while skew does not matter at all.

Scan Enable: Very similar to the reset signal. Since a slower clock is usually used for scan testing, an even larger insertion delay can be tolerated. Transition time and skew are treated the same way as for the reset.
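The different priorities for the three kinds of nets can be written down as simple checks. The sketch below is only an illustration under our own assumptions: skew on clock nets is judged as a fraction of the clock period, and a reset or scan-enable tree passes if its insertion delay stays below one period of the clock that applies to it and its worst transition stays within the 1.79 ns library limit quoted above; skew is deliberately not checked. The 12 ns tree, the 100 MHz functional clock and the 10 MHz scan clock are hypothetical.

```python
LIB_MAX_TRANSITION_NS = 1.79  # typical-library limit quoted in the text


def period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a frequency given in MHz."""
    return 1e3 / freq_mhz


def skew_fraction(skew_ns: float, freq_mhz: float) -> float:
    """Clock nets: skew expressed as a fraction of the clock period."""
    return skew_ns / period_ns(freq_mhz)


def reset_like_tree_ok(insertion_delay_ns: float, max_transition_ns: float,
                       freq_mhz: float) -> bool:
    """Reset / scan-enable nets: arrive within one period of the applicable
    clock and respect the library transition limit. Skew is not checked."""
    return (insertion_delay_ns < period_ns(freq_mhz)
            and max_transition_ns <= LIB_MAX_TRANSITION_NS)


for f_mhz in (20.0, 200.0):
    print(f"{f_mhz:.0f} MHz: 0.5 ns skew is {skew_fraction(0.5, f_mhz):.0%} of the period")
# 20 MHz: 0.5 ns skew is 1% of the period
# 200 MHz: 0.5 ns skew is 10% of the period

# Hypothetical 12 ns scan-enable tree: too slow for a 100 MHz functional
# clock, acceptable for a 10 MHz scan clock.
print(reset_like_tree_ok(12.0, 1.2, 100.0), reset_like_tree_ok(12.0, 1.2, 10.0))
# False True
```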

In CADENCE SOC ENCOUNTER, clock tree synthesis (CTS) is used to generate optimized buffer trees to drive high fan-out nets. It can be configured to satisfy a variety of constraints.

Student Task 26:

• A sample clock tree synthesis configuration file can be found under src/sample/chip.ctstch-sample. The sample file contains three different configurations for a clock, a reset and a scan enable signal.

• Copy this file to the src directory and adapt the 'AutoCTSRootPin' statements to match your design.

• For educational purposes, change the clock tree specifications as follows: max. skew 0.2 ns, max. insertion delay 4 ns, max. transition time at buffers 0.6 ns and at clock pins 0.4 ns.(a)

Take a closer look at the other two trees too.

(a) It is usually not a good idea to specify a maximum insertion delay so small that it becomes a limiting factor for CTS. Results may degrade significantly, and for most designs the insertion delay is not very important anyway.
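Before loading the modified specification, it can be worth sanity-checking the numbers from Student Task 26. The following Python sketch stores them in a small dataclass (the field names are our own and are not the keywords used in the .ctstch file) and flags two kinds of inconsistency: a transition target above the 1.79 ns library limit, and a skew target that can never bind, because skew, being the difference of two insertion delays, cannot exceed the maximum insertion delay.

```python
from dataclasses import dataclass

LIB_MAX_TRANSITION_NS = 1.79  # upper limit for typical libraries, quoted in the text


@dataclass
class ClockTreeTargets:
    """CTS targets in ns. Defaults are the values from Student Task 26;
    field names are illustrative, not .ctstch keywords."""
    max_skew: float = 0.2
    max_insertion_delay: float = 4.0
    max_buffer_transition: float = 0.6
    max_sink_transition: float = 0.4

    def problems(self) -> list[str]:
        """Return human-readable descriptions of inconsistent targets."""
        issues = []
        for name in ("max_buffer_transition", "max_sink_transition"):
            if getattr(self, name) > LIB_MAX_TRANSITION_NS:
                issues.append(f"{name} exceeds the library limit")
        # Skew is a difference of two insertion delays, so it can never be
        # larger than the maximum insertion delay: such a target never binds.
        if self.max_skew >= self.max_insertion_delay:
            issues.append("max_skew is not tighter than max_insertion_delay")
        return issues


print(ClockTreeTargets().problems())  # prints []
print(ClockTreeTargets(max_buffer_transition=2.0).problems())
# prints ['max_buffer_transition exceeds the library limit']
```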

If the design employs a reset synchronization register (the example design has one), the source of the reset tree must be the output of the synchronization register. Note the special option SetASyncSRPinAsSync YES in the reset tree definition. It allows asynchronous set and reset pins to be considered as targets for the clock tree optimization.

The scan-enable signal is also a special case. Normally, the clock tree synthesis algorithm starts at the AutoCTSRootPin and traces through the netlist to find valid endpoints. By default, it traces through combinational gates and stops at the clock and asynchronous input pins of sequential elements (flip-flops).

By specifying the NoGating rising option, we can make the tracer stop at the first gate it encounters. This is necessary because the scan-enable signal is often connected to multiplexers, and we want their select input pins to be endpoints. When this option is used, you need to specify the internal pin of the pad driving the scan-enable signal as the root; otherwise, tracing will stop prematurely at the pad cell.
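The difference between default tracing and the NoGating rising behavior can be illustrated with a toy breadth-first traversal. This is our own simplified model, not the Encounter algorithm: the netlist is a hand-written dictionary, every instance is either a combinational gate (traced through) or a sequential cell (whose pin becomes an endpoint), and the flag `stop_at_first_gate` is a hypothetical stand-in for NoGating rising. In the example, the scan-enable net drives two scan flip-flops and one multiplexer; all instance and net names are made up.

```python
from collections import deque

# Toy netlist (hypothetical): net -> list of (instance, input pin) it drives.
NETLIST = {
    "se": [("mux0", "S"), ("ff0", "SEL"), ("ff1", "SEL")],
    "mux0_out": [("ff2", "D")],
}
KIND = {"mux0": "comb", "ff0": "seq", "ff1": "seq", "ff2": "seq"}
OUTPUT_NET = {"mux0": "mux0_out"}  # output net of each combinational gate


def trace(root_net: str, stop_at_first_gate: bool = False) -> list[str]:
    """Collect endpoint pins reachable from root_net.

    Default: pass through combinational gates, stop at sequential cells.
    stop_at_first_gate: every driven pin is an endpoint (NoGating-like).
    """
    endpoints = []
    queue = deque([root_net])
    visited = {root_net}
    while queue:
        net = queue.popleft()
        for inst, pin in NETLIST.get(net, []):
            if stop_at_first_gate or KIND[inst] == "seq":
                endpoints.append(f"{inst}/{pin}")
            else:
                out = OUTPUT_NET[inst]
                if out not in visited:  # do not trace the same net twice
                    visited.add(out)
                    queue.append(out)
    return sorted(endpoints)


print(trace("se"))                           # ['ff0/SEL', 'ff1/SEL', 'ff2/D']
print(trace("se", stop_at_first_gate=True))  # ['ff0/SEL', 'ff1/SEL', 'mux0/S']
```

With default tracing, this toy scan-enable tree would wrongly end at the data pin ff2/D behind the multiplexer; stopping at the first gate keeps mux0/S as the endpoint, as intended.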

Student Task 27:

• Read in the clock tree specification by selecting Clock → Design Clock... from the menu. Using the file browser, select the clock tree specification file you have just modified.

Press LOADSPEC. DON’T PRESS OK yet.(a) You should now see a summary of all three clock specifications on the console; check it.

• Our netlist may have some buffers on the high fan-out nets we want to build trees on. We need to remove them prior to CTS with the following command:

enc > deleteClockTree -all

(a) Pressing OK will start the clock tree insertion. Make sure the clock tree specification is correct before going ahead with this step. If you accidentally pressed OK here, it is advisable to restart from the last saved point.

Many errors can be discovered by analyzing the pins connected to these nets, even before a clock tree is built.

Student Task 28:

• Select Clock → Trace Pre-CTS Clock Tree... from the menu. To start the trace, click the icon at the top left and accept the default trace file name. A summary will be displayed on the console, and the content of the trace file will be visualized in the GUI.

We can see what the trees currently look like and which pins are connected to them. Also look at the trace file directly. Things to look for include:

• Clock, reset, or scan-enable connecting to unexpected input pins, e.g. the reset signal should not connect to pins other than asynchronous set/reset pins of sequential elements.

• Unexpected latches on the clock tree can be discovered this way (G or GB pin).

• Discrepancies between the numbers of endpoints of the clock, reset, and scan trees. For our example, the numbers are as follows:

– clock tree: 443 with 442 flip-flop CK pins + 1 RAM CK pin
– reset tree: 441 flip-flop RB pins
– scan tree: 447 with 441 flip-flop SEL pins + 6 mux S pins, which choose between the functional and the test (scan chain) output signal

As we can see, 442 flip-flops are clocked but only 441 receive a reset signal. This is because the reset synchronization register is connected to the external reset signal rather than to the internal reset tree. Since the reset synchronization flip-flop is also not on the scan chain, and we otherwise use full scan, the 441 flip-flops on the scan tree match perfectly. You get the idea...
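Once the leaf pins of each tree have been extracted from the trace (we assume nothing about the trace file format here, so getting the pins out of it is left open), the checks listed above boil down to counting pins by name and comparing instance sets. The Python sketch below rebuilds the example's numbers from made-up instance names (ff0 plays the role of the reset synchronization register; ram0 and mux0 to mux5 stand for the RAM and the multiplexers) and flags pins that are not expected on a tree, such as a latch G pin on the clock tree or a D pin on the reset tree.

```python
from collections import Counter

# Pin names we expect at the leaves of each tree (taken from the example).
EXPECTED_PINS = {"clock": {"CK"}, "reset": {"RB"}, "scan": {"SEL", "S"}}


def pin_name(leaf: str) -> str:
    return leaf.rsplit("/", 1)[1]


def instance(leaf: str) -> str:
    return leaf.rsplit("/", 1)[0]


def check_tree(tree: str, leaves: list[str]) -> tuple[Counter, list[str]]:
    """Count leaf pins by pin name and list leaves with unexpected pin names
    (e.g. a latch G/GB pin on the clock tree)."""
    counts = Counter(pin_name(leaf) for leaf in leaves)
    unexpected = [leaf for leaf in leaves if pin_name(leaf) not in EXPECTED_PINS[tree]]
    return counts, unexpected


# Hypothetical leaf lists reproducing the numbers of the example design.
clock = [f"ff{i}/CK" for i in range(442)] + ["ram0/CK"]
reset = [f"ff{i}/RB" for i in range(1, 442)]  # ff0 = reset sync register
scan = [f"ff{i}/SEL" for i in range(1, 442)] + [f"mux{i}/S" for i in range(6)]

for name, leaves in (("clock", clock), ("reset", reset), ("scan", scan)):
    counts, unexpected = check_tree(name, leaves)
    print(name, len(leaves), dict(counts), unexpected)
# clock 443 {'CK': 443} []
# reset 441 {'RB': 441} []
# scan 447 {'SEL': 441, 'S': 6} []

clocked_ffs = {instance(leaf) for leaf in clock if instance(leaf).startswith("ff")}
print(sorted(clocked_ffs - {instance(leaf) for leaf in reset}))  # ['ff0']
```

Comparing instance sets rather than only totals is what pinpoints ff0, the one flip-flop that is clocked but not reset.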

Student Task 29:

• Open the file chip.cts trace and search for Clock Tree to examine the leaf pins.

• If everything looks OK, we can proceed with clock tree synthesis. In the SYNTHESIZE CLOCK TREE form, press OK.

After a few minutes, clock tree synthesis will be completed. Detailed reports are generated in the directory specified on the form (most likely clock_report). This directory also includes a simple report file (clock.report).

A summary report is also displayed on the CADENCE SOC ENCOUNTER console. The first column shows the achieved performance, while the second column reports the target specified in the configuration file.

Student Task 30:

• Check your results (summary and detailed reports). How many buffers were added? How many levels were created? What is the insertion delay? Are all constraints met?

Note 1: You will get a max transition time violation on ClkxCI_PAD/I, which can safely be ignored. Since we have specified an input transition time of 800 ps on all primary inputs, there is no way CTS could meet the 600 ps requirement at this point.

Note 2: Unless the "RouteClkNet YES" option was used (more on this later), the timing figures reported are only estimates and might change quite a bit with detailed routing.
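For Student Task 30, comparing the two columns of the summary by hand becomes error-prone once there are many entries. A minimal Python sketch, assuming you copy the achieved and target values from the console summary yourself (we do not parse the Encounter report format here), could flag every entry whose achieved value exceeds its target while skipping explicitly waived items such as the ClkxCI_PAD/I transition from Note 1. The skew and insertion-delay numbers are hypothetical; only the 0.8 ns input transition and the 0.6 ns buffer transition target come from the text.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    achieved: float  # first column of the summary
    target: float    # second column: upper bound from the specification


def violations(checks: list[Check], waived: frozenset[str] = frozenset()) -> list[str]:
    """Names of checks that exceed their target and are not waived."""
    return [c.name for c in checks if c.achieved > c.target and c.name not in waived]


# Hypothetical clock tree results (ns).
results = [
    Check("skew", 0.15, 0.2),
    Check("insertion delay", 2.9, 4.0),
    Check("max transition ClkxCI_PAD/I", 0.8, 0.6),
]
print(violations(results))  # ['max transition ClkxCI_PAD/I']
print(violations(results, waived=frozenset({"max transition ClkxCI_PAD/I"})))  # []
```

Keeping waivers as an explicit, named list (rather than mentally ignoring a line) makes it obvious when a new, unexpected violation appears next to the known one.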