Alud del 10-02-1999 - Anexo: Campañas experimentales

in the metal strips on neighboring tracks with a small but positive offset, proved to be problematic in the manufacturing of the chip.

For the placement algorithm, the authors show that sets of three transistors that can be placed without a gap can be found by applying Ullmann’s subgraph isomorphism algorithm [Ull76] to a certain graph that models the netlist and one of six “connection subgraphs” of constant size. To find all placements with a minimum number of diffusion gaps, the algorithm enumerates sets of such elementary chains that may overlap. All possible placements of minimum size are then evaluated in terms of routing density and expected stage-like line-end gaps, and the best one is returned. In the detailed layout generation after the placement, a maximum independent set is computed to optimize the number of tracks required on the second metal layer. Finally, a branch and bound method finds a routing on this layer that is optimal with respect to the proposed manufacturability measure.
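The subgraph isomorphism step can be illustrated with a generic Ullmann-style search. The sketch below is a minimal, assumed illustration, not the authors' implementation: graphs are plain adjacency sets (a hypothetical encoding), pattern vertices are numbered 0..k-1, and the search assigns pattern vertices one at a time while Ullmann's refinement prunes target candidates whose neighbourhoods cannot match. The netlist graph and the six connection subgraphs from the paper are not modelled.

```python
from typing import Dict, List, Optional, Set

# Undirected graph as adjacency sets. Pattern vertices must be 0..k-1.
Graph = Dict[int, Set[int]]


def refine(cand: List[Set[int]], pattern: Graph, target: Graph) -> bool:
    """Ullmann-style refinement: keep v in cand[u] only if every pattern
    neighbour x of u still has some candidate adjacent to v in the target.
    Returns False as soon as a candidate set becomes empty."""
    changed = True
    while changed:
        changed = False
        for u in pattern:
            for v in list(cand[u]):
                if any(not (cand[x] & target[v]) for x in pattern[u]):
                    cand[u].discard(v)
                    changed = True
            if not cand[u]:
                return False
    return True


def ullmann(pattern: Graph, target: Graph) -> Optional[Dict[int, int]]:
    """Return one injective, edge-preserving map pattern -> target, or None."""
    k = len(pattern)
    # Initial candidates: a target vertex needs at least the pattern degree.
    cand = [{v for v in target if len(target[v]) >= len(pattern[u])}
            for u in range(k)]

    def search(depth: int, cand: List[Set[int]]) -> Optional[Dict[int, int]]:
        if depth == k:
            # Every candidate set is now a singleton.
            return {u: next(iter(cand[u])) for u in range(k)}
        for v in sorted(cand[depth]):
            trial = [set(c) for c in cand]
            trial[depth] = {v}
            for u in range(depth + 1, k):
                trial[u].discard(v)  # keeps the mapping injective
            if refine(trial, pattern, target):
                found = search(depth + 1, trial)
                if found is not None:
                    return found
        return None

    if not refine(cand, pattern, target):
        return None
    return search(0, cand)
```

For example, a three-vertex path embeds into a four-cycle (`ullmann(path, square)` yields `{0: 0, 1: 1, 2: 2}`), whereas a triangle does not; the refinement alone already rules out every placement of the triangle's first vertex.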

3.6 Stacked Devices

Hill [Hil85] presented the first system, called SC2, that was able to perform FET folding (cf. Section 2.3.1), also called stacking in the academic literature, and thereby introduced one of the most important features of today’s CMOS layout. SC2 also splits large cells into smaller sub-instances and processes each of them independently. The placement algorithm first pairs n-FETs and p-FETs that share the same gate net, then rearranges these pairs using a Kernighan-Lin heuristic, and finally flips single transistors in an exponential yet practically feasible branch and bound method such that the number of diffusion gaps is as small as possible. Folding is done by simply splitting large FETs into smaller ones and connecting them in parallel. The author states that a practical upper limit for the number of FETs is 200–400.
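The two simplest ingredients described for SC2, splitting a wide FET into parallel pieces and choosing flips by branch and bound, can be sketched as follows. This is an assumed single-row model, not Hill's code: each FET is a hypothetical `(left_net, right_net)` pair, flipping swaps the two terminals, and two neighbours share diffusion exactly when their facing terminals carry the same net. The bound prunes any partial flip vector whose gap count already reaches the best complete solution, since gaps can only accumulate from left to right.

```python
import math
from typing import List, Optional, Tuple

# A FET in a row, given by the nets on its (left, right) diffusion terminals
# in its default orientation; flipping swaps the two.
Fet = Tuple[str, str]


def split_fet(width: float, max_width: float) -> List[float]:
    """Naive folding: replace a FET by ceil(width / max_width) equal
    parallel FETs, each no wider than max_width."""
    k = max(1, math.ceil(width / max_width))
    return [width / k] * k


def gaps(row: List[Fet]) -> int:
    """Diffusion gaps: neighbours whose facing terminals differ."""
    return sum(1 for a, b in zip(row, row[1:]) if a[1] != b[0])


def best_flips(order: List[Fet]) -> Tuple[int, List[bool]]:
    """Branch and bound over the 2^n flip vectors of a fixed FET order."""
    n = len(order)
    best_cost = n  # exceeds the n - 1 gaps any row can have
    best_flip: List[bool] = [False] * n
    flips: List[bool] = []

    def branch(i: int, cost: int, right: Optional[str]) -> None:
        nonlocal best_cost, best_flip
        if cost >= best_cost:  # bound: the gap count never decreases
            return
        if i == n:
            best_cost, best_flip = cost, flips.copy()
            return
        for flip in (False, True):
            left, new_right = order[i][::-1] if flip else order[i]
            extra = 1 if right is not None and right != left else 0
            flips.append(flip)
            branch(i + 1, cost + extra, new_right)
            flips.pop()

    branch(0, 0, None)
    return best_cost, best_flip
```

For the row `[("a", "b"), ("a", "c"), ("d", "c")]`, which has two gaps as given, `best_flips` returns `(0, [True, False, True])`: flipping the first and last FET yields `(b, a)(a, c)(c, d)` with full diffusion sharing.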

In the development of POLLUX [MD88], not only cell area was targeted but routability was considered as well. The resulting algorithm finds input orderings with a small (yet not necessarily optimal) number of diffusion gaps. Solutions are preferred if a routing exists that requires no more than a given number of horizontal tracks. The approach also supports transistors of non-uniform size, including a greedy a priori folding of transistors that are too large to fit in a row. Moreover, additional layout styles using two metal layers and power strips near the cell’s center are proposed.
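A common way to check a track limit like the one above is channel density: the largest number of nets whose horizontal extent crosses a single column, which is a lower bound on the number of horizontal tracks. The sketch below assumes this density criterion as a stand-in for POLLUX's actual routability test, and the `(gaps, columns)` candidate format is hypothetical. Nets confined to one column are ignored because they need no horizontal segment in this simplified model.

```python
from typing import Dict, List, Tuple

# columns[i] lists the nets that have a terminal in column i of the placement.
Columns = List[List[str]]


def spans(columns: Columns) -> Dict[str, Tuple[int, int]]:
    """Leftmost and rightmost column of every net."""
    out: Dict[str, Tuple[int, int]] = {}
    for i, nets in enumerate(columns):
        for net in nets:
            lo, hi = out.get(net, (i, i))
            out[net] = (min(lo, i), max(hi, i))
    return out


def density(columns: Columns) -> int:
    """Maximum number of multi-column nets crossing any single column."""
    intervals = [(lo, hi) for lo, hi in spans(columns).values() if lo < hi]
    return max((sum(lo <= i <= hi for lo, hi in intervals)
                for i in range(len(columns))), default=0)


def prefer(candidates: List[Tuple[int, Columns]], max_tracks: int):
    """Pick a (gaps, columns) candidate: routable ones first, then fewest gaps."""
    return min(candidates, key=lambda c: (density(c[1]) > max_tracks, c[0]))
```

With a track limit of 2, `prefer` picks a placement with one gap and density 2 over a gap-free placement of density 3; raising the limit to 3 makes the gap-free placement win again.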

While folding had been incorporated into automatic layout systems during the 1980s only in very simple ways, if at all, Gatti et al. [GML89] defined what they called “full stacked layout”, thereby initiating systematic research on transistors with flexible aspect ratios. The authors propose to exploit the fact that very long transistors, whose geometric form traditionally required a large aspect ratio, can also be realized with more than one gate contact by connecting the alternating source and drain contacts with comb-like wiring. With this technique, the shape of the FETs can be closer to a square, a property that usually benefits placement tools. In [GH98], four variants concerning the possible interdependence of folding and placement algorithms were formulated:

• Static placement, static folding: For given orderings of the FET rows, i.e. a placement, and given finger numbers for each FET, find orientations for the FETs such that the number of diffusion gaps, i.e. the cell area, is minimized.

• Static placement, dynamic folding: For a given placement, find finger numbers and orientations for the FETs such that the cell area is minimized.

• Dynamic placement, static folding: For given finger numbers, find a placement, including orientations, such that the cell area is minimized.

• Dynamic placement, dynamic folding: Among all possible folding configurations and placements, including orientations, find the solution that minimizes the cell area.
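The first variant (static placement, static folding) is small enough to solve exactly with a dynamic program over orientations. The sketch below assumes a single FET row with a fixed left-to-right order and uses the standard geometry of a folded FET: k fingers produce k + 1 alternating diffusion strips S, D, S, and so on. With an odd finger count the two outer strips are one source and one drain, while with an even count both outer strips belong to the same terminal, so orientation decides whether S or D faces outward. The `(source_net, drain_net, fingers)` tuple format is hypothetical.

```python
from typing import List, Tuple

# (source_net, drain_net, finger_count) for each FET, in placement order.
Fet = Tuple[str, str, int]


def end_options(fet: Fet) -> List[Tuple[str, str]]:
    """Nets on the outer diffusion strips for the two orientations.
    k fingers give k + 1 alternating strips S, D, S, ...: odd k exposes
    one source and one drain end, even k exposes the same terminal twice."""
    s, d, k = fet
    if k % 2 == 1:
        return [(s, d), (d, s)]
    return [(s, s), (d, d)]


def min_gaps(row: List[Fet]) -> int:
    """Exact dynamic program over orientations for a fixed order and fixed
    finger numbers; the state is the right-end net of the previous FET."""
    if not row:
        return 0
    prev = [(0, right) for _, right in end_options(row[0])]
    for fet in row[1:]:
        prev = [(min(cost + (r != left) for cost, r in prev), right)
                for left, right in end_options(fet)]
    return min(cost for cost, _ in prev)
```

The parity rule matters: the chain `a-b`, `b-c`, `c-d` abuts without gaps when every FET has one finger (`min_gaps` returns 0), but giving the middle FET two fingers exposes `b` or `c` on both sides, and one gap becomes unavoidable (`min_gaps` returns 1).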

A notable example of a system with static placement and dynamic folding is LIB [HHLH91]. The layout generation tool roughly follows the classical two-row style but reserves non-rectangular areas between the rows for routing. After forming initial FET clusters to reduce instance sizes to a manageable magnitude, an optimal transistor chain is formed for each cluster. The chains are placed using a Kernighan-Lin type heuristic that reduces the routing density as much as possible. Once a placement has been decided upon, a folding module chooses finger numbers and FET orientations.
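A Kernighan-Lin type heuristic for ordering chains can be sketched as follows. This is an assumed illustration, not LIB's actual cost model. Each chain is reduced to the set of nets it touches, and routing density is approximated by cut width: the largest number of nets that appear on both sides of any boundary between consecutive chains. In Kernighan-Lin fashion, one pass repeatedly applies the best swap of two unlocked positions, even if it worsens the cost, locks both positions, and finally keeps the best ordering seen during the pass. Passes repeat while they improve the cost.

```python
from itertools import combinations
from typing import List, Set, Tuple

# Each chain is modelled only by the set of nets it touches.
Chain = Set[str]


def cutwidth(order: List[Chain]) -> int:
    """Routing-density proxy: max over chain boundaries of the number of
    nets that occur on both sides of the boundary."""
    best = 0
    for cut in range(1, len(order)):
        left = set().union(*order[:cut])
        right = set().union(*order[cut:])
        best = max(best, len(left & right))
    return best


def swapped(order: List[Chain], i: int, j: int) -> List[Chain]:
    out = list(order)
    out[i], out[j] = out[j], out[i]
    return out


def kl_pass(order: List[Chain]) -> Tuple[int, List[Chain]]:
    """One pass: best unlocked swap at each step, then the best prefix."""
    n = len(order)
    locked: Set[int] = set()
    history = [(cutwidth(order), order)]
    while True:
        free = [p for p in combinations(range(n), 2)
                if p[0] not in locked and p[1] not in locked]
        if not free:
            break
        i, j = min(free, key=lambda p: cutwidth(swapped(order, *p)))
        order = swapped(order, i, j)
        locked |= {i, j}
        history.append((cutwidth(order), order))
    return min(history, key=lambda h: h[0])  # ties keep the earliest entry


def kl_order(order: List[Chain], max_passes: int = 10) -> Tuple[int, List[Chain]]:
    """Repeat passes until one fails to lower the cut width."""
    cost = cutwidth(order)
    for _ in range(max_passes):
        new_cost, new_order = kl_pass(order)
        if new_cost >= cost:
            break
        cost, order = new_cost, new_order
    return cost, order
```

For the alternating order `[{"a"}, {"b"}, {"a"}, {"b"}]` (cut width 2), a single pass groups the chains by net and reaches cut width 1, which is optimal here because the first boundary always separates one of the two chains of some net from its partner.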

An exact algorithm for the problem of static placement and dynamic folding is presented by Her and Wong [HW93]. Among all minimum-width placements, the algorithm returns the folded placement that minimizes the cell height, defined as the maximum, over all tracks, of the sum of the n-FET height and the p-FET height on that track. The runtime of this dynamic programming approach is polynomial in the number of required placement tracks.

An extension of the technique in [WPF87] is presented by Malavasi and Pandini [MP95]. It reduces symmetries and combines the algorithm with a branch and bound approach, yielding an algorithm that not only arranges transistors within the cell but also determines favorable aspect ratios, i.e. folding styles, for every transistor. The algorithm provably optimizes a complex target function that models cell area as well as favorable electrical properties.

XPRESS [GTH96] supports transistor folding in a framework that primarily … secondary optimization goal. Although the transistor chain generation is exhaustive in nature and thus finds optimal solutions among all possible transistor foldings, heuristic parts of the method, such as the pairing of n- and p-FETs performed before the placement algorithm starts, prevent it from finding globally optimal solutions. XPRESS supports arbitrary netlist structures but does not modify the given topology.

Instead of just the area, the algorithm from [BR96] minimizes a target function that favors diffusion sharing but also incorporates performance considerations in the form of criticality weights for parts of the cell; area minimization is the special case of uniform criticality weights. The method also addresses specific symmetry constraints that appear in analog circuits. For a given folding of the transistors, it finds a global minimum of the target function in linear time by covering a modified circuit graph with a minimum number of walks.
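The graph-covering idea rests on a classic counting argument from Euler trails. In a diffusion graph whose vertices are nets and whose edges are FETs, every trail corresponds to an unbroken transistor chain. A connected component with t odd-degree vertices can be covered by max(1, t/2) edge-disjoint trails and by no fewer. The sketch below computes this minimum for an unweighted, unmodified graph, so it illustrates the principle only, not the weighted, modified graph of [BR96]. A single row realized with T trails has T - 1 diffusion gaps.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def min_trail_cover(edges: List[Tuple[str, str]]) -> int:
    """Minimum number of edge-disjoint trails covering all edges.
    Classic result: a connected component with t odd-degree vertices needs
    max(1, t / 2) trails (t is always even)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    degree: Dict[str, int] = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        parent[find(u)] = find(v)  # union the two components

    odd: Dict[str, int] = defaultdict(int)
    for vertex, d in degree.items():
        odd[find(vertex)] += d % 2
    roots = {find(vertex) for vertex in degree}
    return sum(max(1, odd[r] // 2) for r in roots)
```

For instance, three FETs sharing only the net `x` (a star with four odd-degree vertices) need two trails, i.e. one diffusion gap, while a path or a cycle of FETs can be laid out as a single chain.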

CELLERITY [GMD+97] supports transistor folding exhaustively: for every possibility to fold a subset of the transistors, the whole design flow is executed, and the layout with the fewest tracks is returned. Besides the classical 1-dimensional cells with two rows, it supports a three-row image with n-FETs in the center and p-FETs near the top and bottom cell borders, as well as a four-row image with up to two vertically neighboring n-FETs near the bottom and two p-FETs in the top half. The software can place external ports within the cell boundary and includes a maze router that is not limited to a single routing layer.

An algorithm that finds, in time $O(n^2 \log n)$ where $n$ is the number of transistors, a folding configuration with provably optimal cell area is presented in [KK97]. In this model, the width of the 1-dimensional cell depends only on the number of fingers in the two FET rows, and the height depends only on the highest n-FET and the highest p-FET. Diffusion sharing is not considered.
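The cost model just described can be optimized by a simple enumeration, shown here as a brute-force sketch rather than the $O(n^2 \log n)$ algorithm of [KK97]. It makes several assumptions. Widths are integers, and cell width is measured in finger pitches as the finger count of the fuller row. The height is the n-row height plus the p-row height, and `max_fingers` is a hypothetical per-FET limit. The key observation is that an optimal n-row height equals some w/f, and once that height is fixed, giving every FET the fewest fingers that respect it can only shrink the width. Exact `Fraction` arithmetic avoids rounding errors in `ceil(w / h)`.

```python
import math
from fractions import Fraction
from typing import List, Optional


def fingers_for(widths: List[int], h: Fraction, max_fingers: int) -> Optional[List[int]]:
    """Fewest fingers per FET so that no finger is taller than h."""
    f = [math.ceil(Fraction(w) / h) for w in widths]
    return f if all(k <= max_fingers for k in f) else None


def best_folding(n_widths: List[int], p_widths: List[int], max_fingers: int):
    """Brute force over candidate row heights w / f (exact Fractions).
    Width = fingers in the fuller row, height = n-row height + p-row height,
    no diffusion sharing. Returns (area, n_fingers, p_fingers) or None."""
    def heights(widths: List[int]) -> List[Fraction]:
        return sorted({Fraction(w, f) for w in widths
                       for f in range(1, max_fingers + 1)})

    best = None
    for hn in heights(n_widths):
        fn = fingers_for(n_widths, hn, max_fingers)
        if fn is None:
            continue
        for hp in heights(p_widths):
            fp = fingers_for(p_widths, hp, max_fingers)
            if fp is None:
                continue
            area = max(sum(fn), sum(fp)) * (hn + hp)
            if best is None or area < best[0]:
                best = (area, fn, fp)
    return best
```

For one n-FET of width 4 and one p-FET of width 6 with at most three fingers each, `best_folding([4], [6], 3)` returns area 10 with three fingers per row; several other foldings tie at area 10 in this tiny instance, and the first one found is kept.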

Assuming a dual netlist structure in which n-FETs and p-FETs always come in pairs, [Ber01] describes a dynamic programming approach that optimally solves the problem of dynamic placement and dynamic folding, as an extension of the branch and bound method introduced in [BYFPW89]. The primary optimization target is the cell width; the secondary target models the preference to use as few fingers as possible for each FET, provided this does not increase the total width.

Theoretical results on the computational complexity of several variants of the problem of finding minimum-width transistor chains were obtained by Weyd [Wey11]. He augmented the linear-time solvable problem considered in [ML86] with several aspects employed in transistor-level layout since then, including folding (with prescribed or variable finger numbers) and more complex rules regarding the distance between neighboring FETs. We revisit and extend these results in Section 4.2.
