Chapter 6 Multi-Gigabit Transmitter

Figure 6.8: Variation of the impedance as a function of the tuning vector for four different segment types (S1, S2, S4, S8). Axes: segment impedance (0 to 4000) versus tuning vector (1 to 15).

Since the ESD structures at the pad add a considerable amount of capacitance to the output node, the low-pass characteristic of the transmission channel becomes more pronounced. A common compensation technique is to integrate a passive coil circuit at the output node for bandwidth extension. In this context T-coils have been a reasonable choice in the past, since they offer an even all-pass filter characteristic. Unfortunately, they alter the group delay, which leads to output signal distortion in the form of overshoots. Since these overshoots consist of high-frequency content, they are also attenuated, to a degree that depends on the transmission channel. Which trade-off between bandwidth and signal distortion is necessary for a given transmission channel is still under research at the computer architecture group.
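The effect of the added pad capacitance can be illustrated with a first-order RC estimate. The following is a minimal sketch that assumes the output node behaves like a single-pole low-pass; all component values are hypothetical and not taken from the design.

```python
import math


def rc_bandwidth_hz(resistance_ohm: float, capacitance_f: float) -> float:
    """-3 dB bandwidth of a first-order RC low-pass: f = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)


# Hypothetical values for illustration only (not from the thesis).
R_EFF = 25.0        # assumed 50 ohm driver in parallel with a 50 ohm line
C_DRIVER = 0.2e-12  # assumed output node capacitance without ESD structures
C_ESD = 0.6e-12     # assumed additional capacitance of the ESD structures

bw_without_esd = rc_bandwidth_hz(R_EFF, C_DRIVER)
bw_with_esd = rc_bandwidth_hz(R_EFF, C_DRIVER + C_ESD)

print(f"bandwidth without ESD: {bw_without_esd / 1e9:.2f} GHz")
print(f"bandwidth with ESD:    {bw_with_esd / 1e9:.2f} GHz")
```

In this simplified picture the bandwidth scales inversely with the total node capacitance, which is the loss that an inductive bandwidth-extension network tries to recover.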

6.3 Design

The development of an SST transmitter leads to very complex wiring structures. The largely digital circuits in particular benefit from a digital verification framework to ensure functional correctness. To keep track of the large mixed-signal design, a design and verification methodology has been used that allows faster system simulation by using models at different abstraction levels for the full-custom circuits [68]. This section describes the design methodology and elaborates the use of real number models for the full-custom part of the transmitter to reduce simulation times and ease system verification.

6.3.1 Methodology

A SerDes implementation is an extensive development of high-speed mixed-signal circuitry, which requires a well-defined methodology for design and verification. For digital designers, many metric-driven verification frameworks already exist, such as the Universal Verification Methodology [6]. In analog design verification, usually only a SPICE simulation of a circuit across corners is performed to check that it meets a specification. This requires very well-defined interfaces between the modules to ensure correct behavior of the whole design later. In a SerDes development the complexity is even higher, because many rather small full-custom cells are combined to fulfill a higher-level function; for example, many current sources are interconnected to provide a DAC or filter functionality. Verifying such a setup is very difficult, since the SPICE simulation time of analog circuits is orders of magnitude higher than that of a digital HDL simulation using EDA tools for digital design verification. Furthermore, while digital simulators evaluate logical expressions and show event-based behavior, analog simulations must resolve the continuous time, voltage and current values at every time step. To address this problem, different modeling languages are available that allow a reasonable abstraction and thereby simulation in a shorter time frame. Nonetheless, the challenge is to obtain an accurate representation of the analog circuit and to keep models and implemented circuits in sync during the whole design process.

Therefore, a top-down methodology has been used for the SerDes development, in which the design is subdivided into several different cell types. A cell can contain purely hierarchical structure information, described in Verilog, or synthesizable logic, also described in Verilog. Furthermore, cells modeling analog behavior can be integrated by a structural module, as depicted in fig. 6.9.

Figure 6.9: Design hierarchy example using the top-down methodology from [68]. The leaf cells can contain different abstraction level models, while structural cells are only used for connectivity and do not contain any behavioral description. (Blocks shown: Structural Module 1, Structural Module 2, Verilog Module, Leaf Cell, Leaf Cell.)

The representation within such a leaf cell can contain different abstraction levels for several kinds of simulation. A functional simulation, for example, reveals erroneous connections and runs very fast. Every leaf cell ultimately also has a transistor implementation, but the verification of the reference against the model is much easier to perform, since the same block-level verification test benches can be run on both for comparison. This methodology has been developed in [68] and allows a very efficient mixed-signal design flow with regard to specification, implementation and verification.
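The idea of running one block-level test bench on both views of a leaf cell can be sketched as follows. This is an illustration only: the two cell functions, the voltage values and the tolerance are hypothetical stand-ins, and the actual flow in [68] operates on HDL models and transistor-level netlists rather than Python functions.

```python
from typing import Callable, List, Sequence

# A leaf cell view maps a digital input bit to an output voltage.
LeafCell = Callable[[int], float]


def behavioral_model(bit: int) -> float:
    """Fast abstraction: ideal rail-to-rail output (assumed 1.0 V supply)."""
    return 1.0 if bit else 0.0


def transistor_level_stub(bit: int) -> float:
    """Stand-in for a slow reference simulation with small deviations."""
    return 0.98 if bit else 0.03


def run_testbench(cell: LeafCell, stimuli: Sequence[int]) -> List[float]:
    """Apply the same stimuli to whichever view of the cell is passed in."""
    return [cell(bit) for bit in stimuli]


def views_match(model: LeafCell, reference: LeafCell,
                stimuli: Sequence[int], tolerance_v: float) -> bool:
    """Compare both views sample by sample against a voltage tolerance."""
    model_out = run_testbench(model, stimuli)
    reference_out = run_testbench(reference, stimuli)
    return all(abs(m - r) <= tolerance_v
               for m, r in zip(model_out, reference_out))


print(views_match(behavioral_model, transistor_level_stub, [0, 1, 1, 0], 0.05))
```

Because the test bench only sees the cell interface, swapping the abstraction level does not require changes to the stimuli or the checks.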

6.3.2 Behavior Modeling

The verification and simulation of mixed-signal circuits is a challenging task, because transistor-level SPICE simulations are too time-consuming at system level, especially if all possible analog and digital interactions need to be covered. Thus, the creation of simulation models that describe the analog circuit behavior is mandatory. Although they provide a small simulation speed-up, languages like Verilog-A are not suitable for mixed-signal simulation because of their continuous-time modeling semantics and the need for a SPICE simulator. With Verilog-AMS or VHDL-AMS, event-driven circuit characteristics can also be described. Moreover, in contrast to plain Verilog, the wreal type allows discrete floating-point real numbers to represent voltage levels while still using the digital simulation environment. Therefore, real number models (RNMs) are the most effective way to abstract analog circuit behavior for the simulation and verification of a complex design, as depicted in fig. 6.10.

Figure 6.10: Model accuracy versus performance gain for mixed-signal simulation [15].

To overcome the limitation that the wreal type can only carry one RNM value, SystemVerilog can be used. The latest standard (IEEE 1800-2012 LRM) [1] also permits User-Defined Types (UDTs) and User-Defined Resolutions (UDRs), which allow for multi-value nets and user-defined records. UDRs are functions that specify how the UDT values are combined and resolved, especially in the case of multiple drivers.

To model the FIR behavior of the transmitter, where many driver segments are connected together depending on the emphasis settings, the real-number modeling capabilities of SystemVerilog have been used. The electrical package delivered with the Cadence design tools provides a UDT called EEstruct, which consists of three real values for voltage, current and resistance [19]. This makes it possible to represent the weighted impact of a segment correctly, since current and impedance information is passed on. At the output node, where the segments are connected together, the resolution function calculates the final values in accordance with Kirchhoff's laws. The resolution functions of the package had to be adapted, since by default they cannot distinguish between current directions, a feature that is necessary for the FFE modeling. To verify the functional and electrical behavior of the whole mixed-signal circuit, such as emphasis and impedance, the output voltage levels are compared to those of a reference transmitter that sends the same pattern, while assertions monitor the impedance.
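What a resolution function has to compute when several segments drive one node can be sketched with Millman's theorem, which follows from Kirchhoff's current law. The sketch below is not the Cadence package's resolution function: it assumes each driver is a voltage source with a series resistance, ignores the separate current field and the current-direction handling mentioned above, and uses hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Driver:
    """Simplified driver record: source voltage and series resistance."""
    voltage: float
    resistance: Optional[float]  # None marks a high-Z driver (assumption)


def resolve(drivers: Iterable[Driver]) -> Optional[Driver]:
    """Combine parallel drivers into one equivalent source.

    V = sum(V_i / R_i) / sum(1 / R_i),  R = 1 / sum(1 / R_i)
    Returns None if every driver is high-Z.
    """
    active = [d for d in drivers if d.resistance is not None]
    if not active:
        return None
    conductance = sum(1.0 / d.resistance for d in active)
    voltage = sum(d.voltage / d.resistance for d in active) / conductance
    return Driver(voltage=voltage, resistance=1.0 / conductance)


# Three segments pull high, one pulls low, one is in electrical idle.
segments = [Driver(1.0, 150.0)] * 3 + [Driver(0.0, 150.0), Driver(0.0, None)]
node = resolve(segments)
print(node)
```

The resolved voltage is the conductance-weighted average of the segment voltages, which is how the number of segments assigned to each level sets the output amplitude, while the parallel resistance gives the impedance that the assertions can monitor.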

As a simple example, the EEnet description of a driver segment in SystemVerilog is presented. The type of segment is determined by the resistor that is subsequently added in series.

    module MGT_TX_OBUF_DRIVER #(
        parameter NUM_DRV_TUNE_BITS = 4
    ) (
        input  wSup VDD,
        input  wSup VSS,
        input  wire IN,
        input  wire [NUM_DRV_TUNE_BITS-1:0] CFG_DRV_TUNE_P,
        input  wire [NUM_DRV_TUNE_BITS-1:0] CFG_DRV_TUNE_N,
        output TX_EEnet OUT
    );

        localparam RS_PFET = 1500.0;
        localparam RS_NFET = 1500.0;
        localparam I_PMOS  = 2275e-7;
        localparam I_NMOS  = 2275e-7;

        real r_pfet, r_nfet;
        real vout, rout, iout;

        wire driver_idle;
        wire [3:0] tune_pfet;

        assign tune_pfet = ~CFG_DRV_TUNE_P;

        // detect electrical idle condition
        assign driver_idle = ((CFG_DRV_TUNE_N == {NUM_DRV_TUNE_BITS{1'b0}}
                               &&
                               CFG_DRV_TUNE_P == {NUM_DRV_TUNE_BITS{1'b1}})
                              || (|CFG_DRV_TUNE_P === 1'bx)
                              || (|CFG_DRV_TUNE_N === 1'bx)) ? 1'b1 : 1'b0;

        always @(*) begin

            // calculating output resistance depending on tune vector
            r_nfet = (RS_NFET / CFG_DRV_TUNE_N);
            r_pfet = (RS_PFET / tune_pfet);

            // assigning output voltage, current and resistance values
            casex ({IN, driver_idle})
                2'b10: begin
                    vout <= VDD.V;
                    rout <= r_pfet;
                    iout <= I_PMOS;
                end

                2'b00: begin
                    vout <= VSS.V;
                    rout <= r_nfet;
                    iout <= -I_NMOS;
                end
                2'bx1: begin
                    vout <= `Z;
                    rout <= `Z;
                    iout <= 0;
                end
            endcase
        end

        assign OUT = '{vout, iout, rout, 0};

    endmodule

In this case one driver segment consists of an input for the digital values, two different tuning inputs for the PFETs and NFETs, and the UDT output TX_EEnet, which delivers voltage, current and resistance information. Depending on the tuning vector, r_nfet and r_pfet are calculated, which together with the subsequent termination resistor determine the output impedance of the segment. Finally, the individual voltage contribution is calculated by a resolution function, depending on current and resistance. If all tuning bits are unset, the segment's output switches to high-Z, which is equivalent to electrical idle.
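The tuning arithmetic of the listing can be checked numerically with a small Python mirror of the module. This is a sketch under stated assumptions: the constants are copied from the listing, the P tuning vector is treated as active-low as in the listing, and the guard that returns high-Z when only the selected side has no tuning bit enabled is an addition of this sketch to avoid a division by zero.

```python
from typing import Optional, Tuple

NUM_DRV_TUNE_BITS = 4
RS_PFET = 1500.0   # resistance constants as given in the listing
RS_NFET = 1500.0
I_PMOS = 2275e-7   # current constants as given in the listing
I_NMOS = 2275e-7


def driver_segment(data_in: int, vdd: float, vss: float,
                   tune_p: int, tune_n: int
                   ) -> Optional[Tuple[float, float, float]]:
    """Return (vout, iout, rout), or None for a high-Z output."""
    mask = (1 << NUM_DRV_TUNE_BITS) - 1
    tune_pfet = ~tune_p & mask  # P tuning bits are inverted in the listing

    # Electrical idle: no NFET enabled and all (active-low) P bits set.
    if tune_n == 0 and tune_p == mask:
        return None

    if data_in:
        if tune_pfet == 0:
            return None  # assumption of this sketch: avoid dividing by zero
        return (vdd, I_PMOS, RS_PFET / tune_pfet)

    if tune_n == 0:
        return None      # assumption of this sketch: avoid dividing by zero
    return (vss, -I_NMOS, RS_NFET / tune_n)


print(driver_segment(1, 1.0, 0.0, tune_p=0b0000, tune_n=0b1111))
print(driver_segment(0, 1.0, 0.0, tune_p=0b0000, tune_n=0b0101))
print(driver_segment(1, 1.0, 0.0, tune_p=0b1111, tune_n=0b0000))
```

With all tuning transistors enabled, the resistance is the constant divided by 15; enabling fewer transistors raises it, which matches the trend in fig. 6.8.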

6.4 Implementation

The implementation of the transmitter makes use of the top-down methodology and the real-number-based modeling of the FIR filter. Speed limitations of the design kit technology require a division into a fully digital semi-custom design part with synthesizable logic, consisting of a complex switch matrix and a multiplexer tree to support a segmented driver, and a full-custom multi-segment output buffer. This section describes the SSTL design structure and elaborates the techniques by which data-dependent jitter is reduced and impedance tuning is realized. A major advantage of this mixed-signal design is that all complex circuitry could easily be moved into the purely digital part, since structural Verilog files interconnect the full-custom modules.

Figure 6.11: The switch matrix connects the 16 bit parallel input data to the four taps, while every tap contains four different phases. (Labels: Parallel Data In, bits 00 to 15, with the registered, delayed bits 13R, 14R and 15R; TAP 0 to TAP 3; Phase 0, Phase 90, Phase 180, Phase 270.)

6.4.1 Switch Matrix

To feed the segments of the FIR with the correct bits, a switch matrix has been described that distributes the 16 bit parallel input data to the four phases of the quarter-rate architecture (000, 090, 180, 270), while providing the four FFE taps (PRE, MAIN, POST1, POST2) at the same time. This means that the signal bundle for one phase consists of signal groups for the four taps. It can be seen that the phase 90 group uses the bits that follow those of the phase 0 group, and so on. The input bits 13, 14 and 15 are additionally registered, because their delayed values are also required. The data are selected such that multiplexing them onto one bit stream yields the correct sequence.

Figure 6.12: The digital segment structure of MUX trees. (Labels: TAP 0 to TAP 3, Tap Select, Select Polarity, FF, L, MUX, Phase 0, Phase 90, Phase 180, Phase 270, Full Segment 16x, clk / 2, clk.)
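The bit distribution of the switch matrix can be sketched in a few lines. The sketch assumes that bit 0 of a word is transmitted first and that tap t sees the data stream delayed by t bit positions; the mapping of tap indices to PRE, MAIN, POST1 and POST2 is not fixed here, and all function names are hypothetical.

```python
from typing import List, Sequence

NUM_PHASES = 4
NUM_TAPS = 4
WORD_BITS = 16
BITS_PER_PHASE = WORD_BITS // NUM_PHASES


def switch_matrix(word: Sequence[int],
                  previous_word: Sequence[int]) -> List[List[List[int]]]:
    """Return matrix[phase][tap] as the list of four bits for that group.

    Tap t uses the bit stream delayed by t positions (assumption), so the
    three oldest positions fall back on bits 13, 14 and 15 of the previous
    word, which corresponds to the registered inputs 13R, 14R and 15R.
    """
    history = list(previous_word) + list(word)
    matrix = []
    for phase in range(NUM_PHASES):
        groups = []
        for tap in range(NUM_TAPS):
            groups.append([history[WORD_BITS + NUM_PHASES * k + phase - tap]
                           for k in range(BITS_PER_PHASE)])
        matrix.append(groups)
    return matrix


def serialize(matrix: List[List[List[int]]], tap: int) -> List[int]:
    """Interleave the four phases of one tap back into a single stream."""
    return [matrix[phase][tap][k]
            for k in range(BITS_PER_PHASE)
            for phase in range(NUM_PHASES)]


# Use bit positions as payload so the ordering is visible in the output.
previous = list(range(0, 16))
current = list(range(16, 32))
matrix = switch_matrix(current, previous)
print(serialize(matrix, tap=0))
print(serialize(matrix, tap=1))
```

Serializing tap 0 reproduces the input order, and each further tap yields the same stream shifted by one more bit, which is the property an FIR filter needs from its taps.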

6.4.2 MUX Segments

The second unit in the digital part consists of 16 MUX segments, where every full segment is subdivided into four phases. This structure is depicted in fig. 6.12. The tap selection allows the individual assignment of a segment to one of the four taps, depending on the FIR settings. If the cursor should contribute with negative weight, data inversion is also possible by configuration. The multiplexer tree serializes the particular tap data into the quarter-rate data streams required by the analog core. The data are retimed by flip-flops, and in one path the data are additionally held by a latch to ensure a timing-correct selection by the subsequent multiplexer. The first MUX stage is switched with a divided clock compared to the second MUX stage, so that the data rate is doubled. To ease the clock distribution within the digital part, the sequential logic runs with the same clock, and therefore the output data of all segments are in phase. Since this part is fully described in digital logic, the fulfillment of setup and hold times is ensured by constraints and the timing analysis of the semi-custom design flow.
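The data path of one phase of a MUX segment can be sketched as tap selection, optional inversion and a two-stage 2:1 multiplexer tree. The sketch is purely functional and ignores flip-flops, latches and clocking; the pairing of bits in the first stage and all names are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def mux_segment_phase(tap_data: Dict[int, Sequence[int]],
                      tap_select: int, invert: bool) -> List[int]:
    """Serialize the four bits of the selected tap for one phase.

    tap_data maps a tap index to its four parallel bits for this phase.
    """
    bits = list(tap_data[tap_select])
    if invert:
        # Polarity selection: a tap with negative weight sends inverted data.
        bits = [bit ^ 1 for bit in bits]

    # First stage (divided clock): two 2:1 multiplexers, each alternating
    # between two of the parallel bits (assumed pairing: 0/2 and 1/3).
    stage_a = [bits[0], bits[2]]
    stage_b = [bits[1], bits[3]]

    # Second stage (full clock): alternates between both first-stage
    # outputs, which doubles the data rate.
    stream = []
    for slot in range(2):
        stream.append(stage_a[slot])
        stream.append(stage_b[slot])
    return stream


taps = {0: [1, 0, 1, 1], 1: [0, 0, 1, 0]}
print(mux_segment_phase(taps, tap_select=0, invert=False))
print(mux_segment_phase(taps, tap_select=0, invert=True))
```

With this pairing the output preserves the order of the parallel bits, so reassigning a segment to another tap only changes which group is read, not the serialization itself.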

Figure 6.13: The full-custom core driver, consisting of a retime stage and the pseudo-differential output buffers. Segment impedance is determined by the source series resistor. (Blocks shown: Quadrature Clock Generation, Clock Buffer, Data Buffer, Sync, Latch, Retime Stage, 4:1 MUX, OBUF Driver, Output Buffer P, Output Buffer N, Tune_config.)

6.4.3 Core Driver

The core driver represents the full-custom top module, which is subdivided into retime stages and the pseudo-differential pairs of output buffers. A retime stage takes the data from the semi-custom digital design interface and synchronizes them into the full-custom domain. The four data phases of the quarter-rate architecture are thereby retimed with a dedicated clock to achieve enough positive slack to be safely selected by the output multiplexer. Additional data and clock buffers are needed, since the input loads of the MUX and driver are very high compared to the standard-cell flip-flops used in the digital domain. The subsequent resistor determines the type of segment and its contribution factor to the output signal. Since this part of the PHY is designed without any simulation models for timing analysis, the fulfillment of setup and hold times needs to be verified for every corner. Moreover, since the transmitter also has to be able to run at different speeds, a fixed timing relation between the digital part and the full-custom part is not possible. For that reason, a synchronization stage is used to shift the data in the digital domain to the correct position for safe sampling in the full-custom domain.

Figure 6.14: The 4 to 1 multiplexer implemented with transmission gates and using feed-forward charge injection to reduce the effective MUX output load. (Labels: clocks CLK_000, CLK_090, CLK_180, CLK_270; data inputs DATA_1 to DATA_4; intermediate nodes N1 to N4; output OUT; timing annotations "N1 driving", "N1 reload", "N1 boost", "N1 reload and boost", "N2 driving".)

Figure 6.15: The output buffer driver consists of a pre driver (green), the main driver (blue) and the stacked impedance tuning transistors above and below. (Signals shown: tuneP<0> to tuneP<3>, tuneN<0> to tuneN<3>, VDD, VSS.)

The 4:1 multiplexer is built from two transmission gates in series in the data path of every phase. Every transmission gate consists of two MOSFETs and needs complementary clocks. The two transmission gates in one path run with different clocks, with a phase difference of exactly one UI. Thereby the particular data value is sliced and driven to the output node of the MUX. A general problem of a quarter-rate MUX is its very high output capacitance. For the given process this leads to a best-case rise time of 15 ns when all other transmission gates hold a different digital value and must be reloaded, even if no further output load is connected. Increasing the transistor sizes, which results in higher currents, does not improve the rise times either, since the capacitances increase by the same factor. Moreover, this estimate is only valid for ideal settings, such as ideal clock pulses and an ideal power supply; under realistic simulation conditions, rise times are far worse and have a strong impact on the subsequent OBUF driver and the output signal. A transmission gate MUX also suffers from another problem, as depicted in fig. 6.14. Since the second pass gate in a path only drives a value for one UI but is enabled for two UIs, the actual output load is even larger. If a clock pattern is sent, an output transmission gate also has to reload the internal node of the previous data path. Since the load cannot be reduced, but the problem only exists if a reload is necessary, a feed-forward charge injection technique has been used to simultaneously recharge the intermediate node. For every data path a second, feed-forward path is added that uses the subsequent bit information to reload the intermediate node if necessary. In fig. 6.14 the principle of charge injection is explained for the path DATA_2. While CLK_090 activates the second pass gate in path DATA_1, the intermediate node N1 drives the output node only during the first half of the clock period. During the second half, the pass gate is still open, but N1 needs to be recharged from N2. To reduce the apparent load, a feed-forward charge injection path has been added to boost the same value onto node N1 and improve output rise times. This technique has also been proposed in [37] to reduce output ISI by 77 %.
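The observation that larger transistors do not shorten the rise time of a self-loaded MUX can be illustrated with a first-order RC estimate. The sketch assumes that the on-resistance scales inversely and the parasitic capacitance proportionally with transistor width, and that no external load is connected; the numeric values are hypothetical.

```python
import math


def rise_time_s(resistance_ohm: float, capacitance_f: float) -> float:
    """10 % to 90 % rise time of a first-order RC stage: ln(9) * R * C."""
    return math.log(9.0) * resistance_ohm * capacitance_f


def scaled_rise_time_s(resistance_ohm: float, capacitance_f: float,
                       width_scale: float) -> float:
    """Rise time after scaling the transistor width by width_scale.

    Assumption: on-resistance falls as 1/width while the self-loading
    capacitance grows proportionally to width.
    """
    return rise_time_s(resistance_ohm / width_scale,
                       capacitance_f * width_scale)


# Hypothetical values for illustration only (not from the thesis).
R_ON = 1000.0     # assumed on-resistance of the pass-gate path
C_SELF = 10e-15   # assumed parasitic capacitance of the output node

for scale in (1.0, 2.0, 4.0):
    print(scale, scaled_rise_time_s(R_ON, C_SELF, scale))
```

Since the product of resistance and capacitance stays constant under this scaling, the rise time is unchanged, which is why the load seen during a reload has to be reduced by other means such as charge injection.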

The subsequent output buffer driver consists of a predriver and a main driver stage in series to improve rise times and deliver enough current to drive the output loads. Although a larger buffer chain could further improve rise times, it would also carry a risk of increased duty cycle distortion and phase differences between the segments, since no further retiming is done. Impedance tuning is realized by two rows of four stacked transistors in parallel with binary weighted output resistances above and below, as depicted in fig. 6.15. By turning off tuning transistors it is possible to increase the output resistance of the output buffer driver. Since there are differences in the switching characteristics of NFETs and PFETs, they are tunable
