
CMOS D Flip-Flop Principles.pdf

1454 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 24, NO. 5, OCTOBER 1989

show that the total augmented area overhead is nearly 15 to 25 percent over the original PLA, but the chip yield can be improved significantly. It should be mentioned that the signal $S_0 = 0$ is to isolate the shift registers from the original PLA's during normal operation. Thus, the use of shift registers will not affect the propagation delay in the design. In addition, since the shift registers are used only for fault-diagnosis purposes, they may possibly be shared by many PLA's to reduce the chip area overhead.

Most existing testable PLA designs for fault detection employ shift registers that are capable of only WRITE operation and consume larger chip area than the proposed shift registers, which are capable of both WRITE and READ operations. Therefore, the proposed shift register can be implemented for the testable PLA design in less chip area.

Behavior Analysis of CMOS D Flip-Flops

H. JONATHAN CHAO, MEMBER, IEEE, AND CESAR A. JOHNSTON

Abstract: In this paper, we analyze two D flip-flops (DFF's) generally considered to be the fastest (and most widely used), and compare their speed performance and their robustness against clock skew when a two-phase clocking scheme is applied. The effect of clock skew on their speed and proper logic operation is analyzed and verified with SPICE simulation.

I. INTRODUCTION

Synchronous pipelined structures have been extensively used to obtain high-speed digital systems [1]. Pipelined structures are used to reduce the propagation delay between two registers (D flip-flops, or DFF's) where data and signal flow are synchronized with clocks. The clock's minimum period is constrained by the DFF's setup and delay times as well as the delay of any combinational circuits between two DFF's. Therefore, when building synchronous pipelined structures, we want to reduce the DFF's setup and delay times as much as possible so that more stages of combinational logic can be inserted between the pipelined registers.

In this paper, we try to find the fastest and most robust CMOS DFF. Since dynamic DFF's provide better speed performance than static DFF's, we consider only dynamic DFF's. In some applications, such as those where circuits may be exposed to high radiation, static DFF's are used in spite of their slower speed and larger silicon area. Several static DFF's along with their characteristics can be found in [2]-[4].

Fig. 1. Two dynamic DFF structures, each with a master latch and a slave latch: (a) DFFA and (b) DFFB.

Manuscript received February 3, 1989; revised April 11, 1989.
The authors are with Bellcore, Morristown, NJ 07960.
IEEE Log Number 8929744.
0018-9200/89/1000-1454$01.00 © 1989 IEEE
Authorized licensed use limited to: IEEE Xplore. Downloaded on January 7, 2009 at 06:53 from IEEE Xplore. Restrictions apply.

A variety of CMOS dynamic DFF configurations are presently available. In this paper, two DFF's generally considered to be the fastest and most widely used are chosen and shown in Fig. 1. They both use two-phase clock signals, clk and $\overline{\mathrm{clk}}$. The dynamic DFF in [5], with inverted output and using a single-phase clock, will not be considered here because of its lower speed compared with these two.

Most conventional CMOS DFF's employ two nonoverlapping clocks for proper operation. But when a chip's operating speed is above 100 Mbit/s, it is difficult to generate two nonoverlapping clocks and control the clock skew properly in a very large-scale chip because of the cumulative statistical variations of components in the clock distribution path. Thus, when the speed performances of these two DFF's are compared, their robustness against clock skew (the extent of acceptable clock skew) should also be taken into account. The effect of clock skew on their speed and logic operation will be analyzed and verified with SPICE simulation. One possible solution to the clock-skew problem is the use of four clock phases, but this requires considerable silicon area for the clock lines and thus will not be considered in this paper.

The two DFF's, each consisting of a pair of master and slave latches, are referred to as DFFA and DFFB, respectively, in this paper. All circuits were laid out, compacted, and then extracted based on a commercial 2-μm CMOS production process. When these circuits were laid out, their diffusion areas were shared whenever possible; for instance, at DFFA's node m (Fig. 1(a))
the diffusion areas of the inverter and transmission gate were shared. All p transistors have a width of 27 μm with the exception of those in transmission gates, where they are 12 μm. All n transistors have the same width, 12 μm. The physical layout areas of the DFF's, in square micrometers (μm²), are listed below:

DFF     Mask Area (μm²)
DFFA    5162
DFFB    3888

It can be seen that DFFB requires less layout area than DFFA, since in its layout most of its transistors' diffusion areas can be shared.

The behavior analysis of the two DFF's using two-phase clocks is given in Section II. Section III shows experimental results to support the analysis. A summary is given in Section IV.

Fig. 2. Basic pipelined structure (DFF1, combinational logic, DFF2).

II. BEHAVIOR ANALYSIS

The maximum operating speed of a chip can be determined by analyzing the critical path of a pipelined circuit, as shown in Fig. 2, which consists of two positive edge-triggered DFF's and some combinational logic in between. An input signal a is latched to o on the rising edge of the clock (clk). Before the next rising clock edge, the p signal should be stable at DFF2's d input for proper operation. This requires that the clock's period T be no less than the sum of DFF1's delay time, the combinational logic delay time, and DFF2's setup time, or

$$T \ge t_{d,\mathrm{DFF1}} + t_{op} + t_{s,\mathrm{DFF2}} \tag{1}$$

where $t_d$ is the delay time, $t_{op}$ is the propagation time from node o to node p, and $t_s$ is the setup time. If DFF1 and DFF2 have identical circuitry, as is the usual case, then for a given combinational logic circuit the speed of the pipelined structure depends on the value $t_s + t_d$. This value is also called the DFF's speed figure and is used to compare the performance of different DFF's.

A. DFFA Timing Analysis

In order to latch the d value at node x in Fig. 1(a) before $\overline{\mathrm{clk}}$ goes low, the timings shown in Fig. 3 have to meet the following condition:

$$t_s + t_{hh} \ge t_{dx}. \tag{2}$$

Thus, the setup time $t_s$ is given by

$$t_s = t_{dx} - t_{hh}. \tag{3}$$

If $t_{hh}$ is greater than $t_{dx}$, the setup time is negative. This means that the input d changes between $t_1$ and $t_2$ and has still been latched at node x. Since we allow the setup time to be negative, the hold time of the DFF is zero. At time $t_2$, the d value latched at node x is stable and will appear at the q output after the period of time $t_{xq}$. Thus, the delay time $t_d$ is given by

$$t_d = t_{hh} + t_{xq}. \tag{4}$$

Therefore, DFFA's speed figure is

$$t_s + t_d = t_{dx} + t_{xq}. \tag{5}$$

The above equations are valid as long as the master-slave structure of DFFA is preserved. But DFFA's proper operation might be affected under certain conditions. For example, consider the case when the input d changes right at $t_3$, as shown in Fig. 3. If $t_{ll}$ is not less than the propagation delay time from input d to node y ($t_{dy}$), the new d value will be latched at y before $\overline{\mathrm{clk}}$ goes high ($t_4$) and will appear at the output q after $t_4$. Consequently, the connection between d and q breaks the master-slave structure, causing a "race" problem. The race problem may also occur during $t_1$ to $t_2$. Although we allow d to change during $t_1$ to $t_2$, $t_{hh}$ must not be greater than the time $t_{dy}$ in order to avoid the race problem. The conditions for proper DFFA operation can then be written as follows:

$$t_{ll} < t_{dy}, \qquad t_{hh} \le t_{dy}. \tag{6}$$

From the above analysis, DFFA's speed figure seems insensitive to clock skew. But if we model a transistor as a linear resistor with two diffusion capacitors, we find that the DFFA speed performance is affected by clock skew. The linear resistor value is in the range of kilohms for turned-on transistors and megohms for off transistors. Let us assume that DFFA is driven by an inverter as shown in Fig. 4(a) and that the inverter is in the pull-up mode. The equivalent circuit for the input inverter and the transmission gate it drives is given in Fig. 4(b), where $R_{eff}$ is the effective turned-on resistance of the transmission gate, $C_g$ is an inverter's input capacitance, and $C_d$ is the diffusion capacitance. During $t_0$ to $t_1$ (see Fig. 3), both the P1 and N1 transistors are turned on; $R_{eff}$ is the parallel combination of their turned-on resistances and is lower than either one of them. During the time from $t_1$ to $t_2$, P1 is turned off and $R_{eff}$ is approximately N1's turn-on resistance. We can see that the $R_{eff}$ value increases due to clock skew, and this increases the delay time from node d to x, $t_{dx}$. This effect is illustrated in detail as follows.

The delay time from node a to d ($t_{ad}$) can be found in [6] and is approximately

$$t_{ad} = K\left[R_p\,(C_d + C_d + C_d + C_g)\right] \tag{7}$$

where K is a constant and $R_p$ is the equivalent pull-up resistance. Similarly, the $t_{ax}$ delay time is given by

$$t_{ax} \approx K\left[R_p\,(C_d + C_d + C_d + C_g) + R_{eff}\,(C_d + C_g)\right]. \tag{8}$$

The $t_{dx}$ ($= t_{ax} - t_{ad}$) is then given by

$$t_{dx} = K\,R_{eff}\,(C_d + C_g). \tag{9}$$

Equation (9) shows that $t_{dx}$ is dependent on $R_{eff}$. Since $R_{eff}$ is affected by clock skew, DFFA's speed figure is consequently affected as well.

B. DFFB Timing Analysis

For DFFB, shown in Fig. 1(b), the cases of distinct inputs (low and high) are analyzed separately since they lead to different results.
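The DFFA relations of Section II-A, (1), (3)-(6), and (9), can be collected into a small numerical sketch. This is only an illustration under stated assumptions: the class and function names are hypothetical, the numeric values are made up for the example and are not taken from the paper, and the race conditions follow the prose description of (6).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DffaTiming:
    """DFFA timing parameters in nanoseconds (illustrative values only)."""
    t_dx: float  # propagation time from d to x
    t_xq: float  # propagation time from x to q
    t_dy: float  # propagation time from d to y
    t_hh: float  # interval during which both clocks are high
    t_ll: float  # interval during which both clocks are low

    @property
    def setup(self) -> float:
        # Eq. (3): negative when t_hh exceeds t_dx
        return self.t_dx - self.t_hh

    @property
    def delay(self) -> float:
        # Eq. (4)
        return self.t_hh + self.t_xq

    @property
    def speed_figure(self) -> float:
        # Eq. (5): t_hh cancels, leaving t_dx + t_xq
        return self.setup + self.delay

    @property
    def race_free(self) -> bool:
        # Eq. (6): neither overlap interval may let d reach node y
        return self.t_ll < self.t_dy and self.t_hh <= self.t_dy


def min_clock_period(t_d: float, t_op: float, t_s: float) -> float:
    """Eq. (1): smallest period for the DFF1 -> logic -> DFF2 path."""
    return t_d + t_op + t_s


def parallel(r1: float, r2: float) -> float:
    """Parallel combination of two on-resistances (ohms)."""
    return r1 * r2 / (r1 + r2)


def t_dx_from_rc(k: float, r_eff: float, c_d: float, c_g: float) -> float:
    """Eq. (9): d-to-x delay from the linear RC model (seconds)."""
    return k * r_eff * (c_d + c_g)


if __name__ == "__main__":
    nominal = DffaTiming(t_dx=0.5, t_xq=0.6, t_dy=0.9, t_hh=0.2, t_ll=0.1)
    skewed = DffaTiming(t_dx=0.5, t_xq=0.6, t_dy=0.9, t_hh=0.2, t_ll=1.0)
    print("speed figure:", nominal.speed_figure, "race free:", nominal.race_free)
    print("large t_ll, race free:", skewed.race_free)

    # Hypothetical on-resistances: with P1 off, R_eff rises from the
    # parallel value to N1's resistance alone, so t_dx grows.
    r_p1, r_n1 = 4e3, 2e3
    both_on = t_dx_from_rc(1.0, parallel(r_p1, r_n1), 10e-15, 20e-15)
    n_only = t_dx_from_rc(1.0, r_n1, 10e-15, 20e-15)
    print("t_dx grows with skew:", n_only > both_on)
```

The speed figure returned by `speed_figure` does not depend on `t_hh`, matching the observation that (5) looks skew-insensitive, while `t_dx_from_rc` shows where the skew dependence enters once the transmission gate is modeled as a resistance.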
Fig. 3. Clock traces for DFFA behavior analysis. ($t_{hh}$ = time of both clocks high; $t_{ll}$ = time of both clocks low; $t_{dx}$ = propagation time from d to x; $t_{xq}$ = propagation time from x to q.)

Fig. 4. Equivalent circuit for DFFA.

Fig. 5. Clock traces for DFFB behavior analysis with input = low.

Fig. 6. Clock traces for DFFB behavior analysis with input = high.

1. Input = Low: Clock traces and time segments ($t_0$ to $t_5$) for input d equal to low are given in Fig. 5. Between $t_1$ and $t_2$, both master and slave latches are enabled if the input d and the intermediate node x are high. For d equal to low, the master latch is enabled before $t_1$. In order to latch the d inverse value at node x at $t_1$, the input d has to be stable for a time $t_{dx}$ before the clock rising edge ($t_1$). Thus, the setup time is

$$t_s = t_{dx}. \tag{10}$$

Since at time $t_1$ the value latched at the intermediate node x is high, the slave latch is enabled and the output q starts to evaluate the x value at $t_1$. Thus, the delay time is

$$t_d = t_{xq}. \tag{11}$$

If the input d changes to high between $t_1$ and $t_2$ and is latched at $t_2$, the setup time is negative; this belongs to the case of the input d equal to high, which will be discussed later. Hence, the hold time $t_h$ is defined to be zero.

As long as the total number of inversions between the two clocked CMOS (C2MOS) [7] stages is even, there will be no clock-race problem [8]. Here, we will briefly illustrate that DFFB is indeed race-free. Consider the situation where input d changes to high between $t_3$ and $t_4$: the x value will not be affected because during this period both master and slave latches are "low" enabled, which means that only a low value will turn on the latch. We conclude that the function of DFFB with input equal to low is immune to clock skew and no race problem is present.

2. Input = High: Clock traces and time segments ($t_0$ to $t_5$) for input d equal to high are given in Fig. 6. For this case, the master latch is enabled from $t_0$ to $t_2$. In order to latch the d inverse value at node x at $t_2$, the input d has to be stable for a time $t_{dx}$ before the $\overline{\mathrm{clk}}$ falling edge ($t_2$). Thus, the setup time is

$$t_s = t_{dx} - t_{hh}. \tag{12}$$

Once the d inverse value is latched at x, it takes $t_{xq}$ to appear at the output q. Hence, the delay time from time $t_1$ is

$$t_d = t_{hh} + t_{xq}. \tag{13}$$

Based on our definition of the setup time, which can be negative in some cases, the hold time $t_h$ is zero. For the situation where input d changes to low between $t_3$ and $t_4$, the x value is changed to high as shown in Fig. 6. Since during this period both master and slave latches are "low" enabled, the new updated value at node x will not propagate to q. Therefore, there is no "closed" path between the input d and the output q in any circumstance, and the master-slave structure is always preserved.

As a result of finding two different sets of timing values for DFFB, the speed figure has more than one value. The worst-case value, $t_{dx} + t_{hh} + t_{xq}$ (or $t_{dx} + t_{ll} + t_{xq}$ in some cases), is obtained by adding the $t_s$ in (10) and the $t_d$ in (13). We conclude that DFFB's speed figure is dependent on $t_{hh}$ but that no race problem is present, unlike DFFA, which can fail to perform properly due to breakdown of the master-slave structure.

III. SIMULATION RESULTS

A way of testing DFF speed in a pipelined structure is shown in Fig. 7. A divide-by-two counter is formed by feeding the DFF output q to an odd-number inverter chain and back to the DFF input. The physical layout of the pipelined structure associated with each DFF was created and extracted for SPICE simulation. The frequency of the clk signal is increased up to the point when the divide-by-two function fails, which corresponds to the configuration's maximum operating frequency. From the timing diagram in Fig. 7, we can find the DFF's setup time $t_s$ and the DFF's delay time plus the propagation delay through the inverter chain, $t_{dr}$. The ring-oscillator inverter chain delay $t_{osc}$ was found to be 2.59 ns, half of the ring-oscillator period. This value is then subtracted from $t_{dr}$ (in Fig. 7) to obtain the DFF's delay time $t_d$, or

$$t_d = t_{dr} - t_{osc}. \tag{14}$$

Similarly, $t_s$ is obtained by subtracting $t_{dr}$ from the clock's period T, or

$$t_s = T - t_{dr}. \tag{15}$$

Fig. 7. Divide-by-two test circuit: the DFF output q is fed back to the d input through an odd number of inverters (figure and timing diagram not reproduced).

In the case of no clock skew, SPICE simulation results for each DFF are summarized in the following table. It is noted that DFFA is faster than DFFB by 0.1 ns.

DFF     t_s (ns)    t_d (ns)    t_s + t_d (ns)
DFFA    0.8         0.81        1.61
DFFB    0.84        0.87        1.71

The speed performance, as affected by clock skew in both directions (leading or lagging), was also verified by SPICE simulations. The procedure we followed to find the maximum acceptable clock skew for the DFF's was: find the maximum operating frequency for each DFF with no clock skew; retain this frequency; delay or advance the signal $\overline{\mathrm{clk}}$ with respect to clk until the divide-by-two function fails. The maximum acceptable clock skew for the DFF's, operating at their maximum clock rates, is given in the table below, where $t_{skr}$ is for the case of $\overline{\mathrm{clk}}$ leading and $t_{skl}$ is for the case of $\overline{\mathrm{clk}}$ lagging:

DFF     max. t_skr (ns)    max. t_skl (ns)
DFFA    [illegible]        [illegible]
DFFB    0.4                0.3

The result shows the magnitude of the clock-skew effect on the maximum operating speed. In terms of speed performance, DFFB is more sensitive to clock skew than DFFA because its speed figure contains the clock-overlap interval. When operating at lower speed, however, DFFB's functionality is not as affected by clock skew as DFFA's. For example, when the test circuit is operated at a clock rate of 100 MHz, the circuit with DFFA fails at a skew of 2.6 ns while DFFB shows no race problem.

The effect of leading clock skew on DFFA differs from that of lagging clock skew because the p and n transistors of the transmission gate are identical in size, and this leads to different effective resistances for the transmission gate when both clk and $\overline{\mathrm{clk}}$ are high or both low.
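The extraction of the setup and delay times from the divide-by-two test, (14) and (15), can be sketched as follows. The function name is hypothetical, and the period and $t_{dr}$ inputs are not reported in the text: they are back-computed here from the tabulated $t_s$ and $t_d$ values and the 2.59-ns inverter-chain delay, purely to show that the two relations reproduce the table.

```python
T_OSC_NS = 2.59  # inverter-chain delay quoted in the text (ns)


def extract_dff_timing(period_ns: float, t_dr_ns: float,
                       t_osc_ns: float = T_OSC_NS) -> tuple[float, float]:
    """Return (t_s, t_d) in ns from the divide-by-two measurement.

    t_dr_ns is the DFF delay plus the inverter-chain delay, read from
    the timing diagram; period_ns is the clock period at the failure point.
    """
    t_d = t_dr_ns - t_osc_ns   # Eq. (14)
    t_s = period_ns - t_dr_ns  # Eq. (15)
    return t_s, t_d


if __name__ == "__main__":
    # Assumed inputs, back-computed from the table: (period, t_dr) in ns.
    measurements = {"DFFA": (4.20, 3.40), "DFFB": (4.30, 3.46)}
    for name, (period, t_dr) in measurements.items():
        t_s, t_d = extract_dff_timing(period, t_dr)
        print(f"{name}: t_s = {t_s:.2f} ns, t_d = {t_d:.2f} ns, "
              f"speed figure = {t_s + t_d:.2f} ns")
```

Note that the speed figure $t_s + t_d$ equals $T - t_{osc}$, so at the failure frequency it is fixed by the clock period and the separately measured inverter-chain delay.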
Both DFFA and DFFB can be driven by a single-phase clock if an inverter is used to generate the $\overline{\mathrm{clk}}$ signal locally. Simulations showed that their speed performance is not degraded.

IV. SUMMARY

Two high-performance dynamic DFF's (shown in Fig. 1), generally considered to be the fastest (and most widely used), were analyzed to understand their maximum speed performance and to determine any race problems related to clock skew. A divide-by-two test circuit was used as a benchmark to compare the DFF's.

If a single-phase clock is used in a chip and $\overline{\mathrm{clk}}$ is generated locally, DFFA is recommended because it is the faster of the two, although its use carries a slight silicon real-estate penalty. Examples using DFFA are Batcher and banyan chips (2-μm CMOS) for a fast packet switch network (140 Mbit/s), where $\overline{\mathrm{clk}}$ is generated locally with an inverter to drive two DFFA's [9]. On the other hand, if a two-phase clock is distributed throughout the entire chip, DFFA is not suggested for use unless the designer can assure that the clock-skew amount is bounded, since the balance between the clk and $\overline{\mathrm{clk}}$ capacitive loading is difficult to control. Instead, DFFB is recommended because its logic operation is immune to clock skew and it offers a smaller silicon area. For example, DFFB's have been successfully employed in a SONET-like framer chip (2-μm CMOS) [10], which is applicable in a broad-band ISDN to facilitate a high-speed data interface at a bit rate up to 210 Mbit/s. Inside the chip, clk and $\overline{\mathrm{clk}}$ are broadcast throughout the chip through a four-stage buffer, driving about a 10-pF capacitive load.

REFERENCES

[1] J. R. Jump and S. R. Ahuja, "Effective pipelining of digital systems," IEEE Trans. Comput., vol. C-27, no. 9, pp. 855-865, Sept. 1978.
[2] L. Spaanenburg, W. Pollok, and W. Vermeulen, "Novel switched logic CMOS latch building block," Electron. Lett., vol. 21, no. 9, pp. 398-399, Apr. 1985.
[3] H. Hatano, K. Doi, and J. Iwamura, "A 256-channel C2MOS LSI time switch using shift-register pipeline multiplexer," IEEE J. Solid-State Circuits, vol. SC-22, no. 2, pp. 251-254, Apr. 1987.
[4] D. W. R. Orton, "Novel CMOS latch with clock hysteresis," Electron. Lett., vol. 23, no. 23, pp. 1221-1222, Nov. 1987.
[5] Y. Ji-Ren, I. Karlsson, and C. Svensson, "A true single-phase-clock dynamic CMOS circuit technique," IEEE J. Solid-State Circuits, vol. SC-22, no. 5, pp. 899-901, Oct. 1987.
[6] J. Rubinstein, P. Penfield, and M. A. Horowitz, "Signal delay in RC tree networks," IEEE Trans. Computer-Aided Des., vol. CAD-2, no. 3, pp. 202-211, July 1983.
[7] Y. Suzuki, K. Odagawa, and T. Abe, "Clocked CMOS calculator circuitry," IEEE J. Solid-State Circuits, vol. SC-8, no. 6, pp. 462-469, Dec. 1973.
[8] N. F. Goncalves and H. J. De Man, "NORA: A racefree dynamic CMOS technique for pipelined logic structures," IEEE J. Solid-State Circuits, vol. SC-18, no. 3, pp. 261-266, June 1983.
[9] C. Day, J. Giacopelli, and J. Hickey, "Applications of self-routing switches to LATA fiber optic networks," in Proc. Int. Switching Symp. (Phoenix, AZ), Mar. 1987.
[10] H. J. Chao, T. J. Robe, and L. S. Smoot, "A 140 Mbit/s CMOS LSI framer chip for a broad-band ISDN local access system," IEEE J. Solid-State Circuits, vol. 23, no. 1, pp. 131-141, Feb. 1988.

High-Resolution Low-Power CMOS D/A Converter

JOHN W. YANG AND KENNETH W. MARTIN, SENIOR MEMBER, IEEE

Abstract: A very low-power, high-resolution, medium-speed D/A converter is described. The converter was realized using a standard analog CMOS technology. It achieved 15-bit monotonicity and less than 0.07-percent overall linearity at a clock frequency of 100 kHz, without requiring any trimming or calibration. The measured SNR was 85 dB, the power dissipation was less than 10 mW, and the distortion for a sinusoidal output was less than 0.04 percent. The D/A converter is intended for battery-powered speech and music synthesis applications where high dynamic range, low power, and low cost are all important.

I. INTRODUCTION

Many electronic systems require D/A converters with good dynamic range, resolution, and monotonicity, but do not necessarily require high absolute accuracy and linearity. Examples of this include low-cost speech and music synthesizers, high-resolution graphics plotters, control systems, and servo systems. In many applications, power dissipation and cost are also important factors. A CMOS D/A converter is a suitable choice for these applications. One example of such a D/A converter is described in [1]. An alternative, independently developed converter is described here. It has a voltage output, all of the required digital and clock generation circuitry is included on chip using a reasonably small area, and it does not require trimming or calibration. The converter is based on a combined resistor-string, capacitor-array approach, and achieves 15 bits of monotonicity and an 85-dB signal-to-noise ratio (SNR) for a 100-kHz clock frequency. The measured linearity errors are less than 0.07 percent and the total distortion is less than 0.04 percent for sinusoidal outputs. The total power dissipation is less than 10 mW, which makes the D/A converter an ideal candidate for battery-powered applications.

Manuscript received January 26, 1988; revised January 4, 1989. This work was supported in part by the National Science Foundation under Grants ECS-8105166 and ECS-8451260 and by the State of California/Hughes Aircraft Company Microelectronics Research Grant D860134.
The authors are with the Integrated Circuits and Systems Laboratory, Department of Electrical Engineering, University of California, Los Angeles, CA 90024.
IEEE Log Number 8929553.

Fig. 1. Two-stage high-resolution D/A converter (stages labeled 7 bits and 8 bits).

II. D/A ARCHITECTURE

The architecture chosen for the D/A converter is based on a two-stage approach where the most significant bits (MSB's) select two adjacent nodes of a resistor string and the least significant bits (LSB's) control the binary-weighted programmable capacitor array (PCA) in a precision switched-capacitor (SC) amplifier [2]. In addition, a sign bit can be used to control the clock phases of the amplifier to allow for either inverting or noninverting operation.
This approach is similar to that used previously [3] for a successive-approximation A/D converter. If the PCA is monotonic, it is guaranteed that the entire D/A converter will be monotonic. However, the absolute linearity of the D/A is restricted to the accuracy of the voltage division of the resistor string (assuming the PCA is linear within one half of an LSB). A simplified schematic of the D/A converter is shown in Fig. 1.

The SC gain amplifier is a precision amplifier that has been reported previously [4], [5]. It has many desirable features that are especially important in this application. Perhaps most important, the 1/f output noise of the amplifier is reduced by an

0018-9200/89/1000-1458$01.00 © 1989 IEEE