Design of 56 Gb/s PAM4 Wire-line Receiver With Ring VCO Based CDR in a 65 nm CMOS Technology

Fangxu Lv1,2, Jianye Wang1*, Dengjie Wang2, Yongcong Liu1, Ziqiang Wang2*
1 Air and Missile Defense College, Air Force Engineering University, Xi'an 710051, China
2 The Institute of Microelectronics, Tsinghua University, Beijing 100084, China
E-mail: wangziq@mail.tsinghua.edu.cn; lvfangxu1988@163.com.cn

Abstract
This paper presents a 56 Gb/s 4-level pulse amplitude modulation (PAM4) wire-line receiver that employs a quarter-rate architecture. A ring voltage-controlled oscillator (VCO) based clock and data recovery (CDR) circuit with a separate proportional path reduces complexity, power consumption and area. To reduce the noise of the detector and improve the stability of the CDR, both the major and minor transitions with a central crossover point are used to extract the phase error. The receiver is designed in a 65 nm CMOS technology with a 1.2 V supply. Simulation results show that the proposed PAM4 receiver works at 56 Gb/s while consuming 76 mW.

1. Introduction
The rapid development of applications such as the internet of things (IoT), big data and cloud computing drives exponentially growing bandwidth demand on computer systems, communication networks, and the bulk of consumer electronics [1]. This demand has pushed serial links towards data rates of 40 Gb/s and beyond [2-3]. However, as the data rate increases, the bandwidth constraints imposed by the channel, package, and ESD become more severe, which limits the further development of non-return-to-zero (NRZ) signaling. 4-level pulse amplitude modulation (PAM4) is drawing wide attention because it offers twice the bandwidth efficiency of NRZ signaling [4]. However, a PAM4 receiver is more difficult to design than an NRZ receiver, especially when the data rate reaches 56 Gb/s.
Clock and data recovery (CDR) is the most challenging part of the PAM4 receiver design, since it has to deal with multi-level signals. The outputs of all three slicers contain transitions with side (off-center) crossover points, which inevitably add noise to the phase detector (PD) and can make the CDR unstable. In [4], the CDR uses only the output of the middle slicer to extract the phase error, because it has more central-crossover transitions than the others. However, the middle slicer output also contains side-crossover transitions. In this paper, a transition filter is employed so that all useful transitions with a central crossover point, generated by all three slicers, are used to extract the phase error. In addition, to reduce complexity, power consumption and area, a ring VCO based CDR with a separated proportional path provides the local clock.

2. Architecture
Figure 1 illustrates the PAM4 receiver architecture. It includes equalization, slicer, sampler, decoder and CDR blocks. The equalization stage, which consists of a continuous time linear equalizer (CTLE) followed by an amplifier (AMP), compensates the insertion loss of the channel and amplifies the 4-level signal. The slicer contains three differential comparators with three different reference levels. In the slicer, the 4-level signal is converted into three parallel NRZ signals, which are thermometer-coded. Each of the three NRZ paths is sampled by two sets of four cascaded samplers. Thus, the data and edge information of each path can be obtained with the 8-phase clocks. The CDR, which has a separated proportional path, adopts a ring VCO. The phase error between the input data and the local clock is extracted by a detector consisting of a transition filter, 16 bang-bang phase detectors (BBPDs) and a majority voter. The main function of the filter is to remove the transitions with side crossover points, which introduce extra noise in the phase detector and degrade the stability of the CDR.
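The detector described above (bang-bang phase detectors followed by a majority voter) can be illustrated with a minimal behavioral sketch. It assumes the common Alexander-style decision rule, which the paper does not spell out; the function names `bbpd` and `majority_vote`, the sign convention (-1 for an early clock, +1 for a late clock) and the example sample values are all assumptions made for illustration, not the authors' circuit.

```python
from typing import Sequence


def bbpd(prev_data: int, edge: int, next_data: int) -> int:
    """Bang-bang decision from two data samples and the edge sample between them.

    Returns -1 (clock early), +1 (clock late) or 0 (no transition, so no
    timing information). The sign convention is an assumption of this sketch.
    """
    if prev_data == next_data:
        return 0  # no transition between the two data samples
    # If the edge sample still equals the old bit, the edge clock fired
    # before the data transition, so the clock is early. Otherwise it is late.
    return -1 if edge == prev_data else 1


def majority_vote(decisions: Sequence[int]) -> int:
    """Collapse many BBPD decisions into a single value in {-1, 0, +1}."""
    total = sum(decisions)
    return (total > 0) - (total < 0)


if __name__ == "__main__":
    # 16 hypothetical decisions, e.g. 4 filtered transition groups x 4 phases.
    decisions = [bbpd(0, 0, 1)] * 9 + [bbpd(0, 1, 1)] * 5 + [bbpd(1, 1, 1)] * 2
    print(majority_vote(decisions))
```

Reducing the 16 decisions to one three-valued result means that a few noisy decisions cannot flip the loop direction on their own, which is the motivation for voting in this kind of detector.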
The final lead/lag decision produced by the majority voter adjusts the VCO until its phase is aligned with that of the input data. In the PAM4 decoder, the three thermometer-coded digital paths DA, DB and DC, retimed by the recovered clock, are converted into two sets of parallel 7 Gb/s NRZ outputs (MSB and LSB).

Figure 1. PAM4 receiver architecture.

978-1-5090-6625-4/17/$31.00 ©2017 IEEE

3. Circuit Design
3.1. Analog front end
The analog front end of the receiver combines a CTLE and an amplifier. The CTLE compensates the frequency-dependent attenuation of the PAM4 signal. The following differential CML-based amplifier linearly amplifies the signal and shifts its output common-mode voltage to match the following circuits. Figure 2 shows the presented CTLE, which is implemented with RC degeneration and a peaking inductor. The variable resistor consists of an NMOS transistor and a poly resistor. The variable capacitance consists of two NMOS transistors. The boost gain of the CTLE at low frequency can be changed continuously by adjusting the gate voltage of the transistors. In addition, the peaking inductor extends the bandwidth and the boost gain of the CTLE without increasing power consumption. Figure 3 depicts the simulated frequency response of the presented CTLE. When VCTLE changes from 700 mV to 950 mV, the gain of the low-frequency component varies from -8 dB to 5 dB. The following amplifier provides 5.2 dB of gain.

Figure 2. Schematic of the presented CTLE.
Figure 3. Frequency responses for different control voltages (VCTLE = 700 mV to 950 mV in 50 mV steps).

3.2. Slicer
The slicer uses three different reference voltages to convert the 4-level signal into a 3-bit thermometer code, carried on three parallel NRZ signals. As depicted in Figure 1, the slicer includes three fully differential comparators to perform this conversion. The comparator is detailed in Figure 4. It uses two fixed reference voltages and amplifies the voltage difference between the reference and the input signal. The relationship between the PAM4 signal and the reference voltages is shown in Figure 5.

Figure 4. Schematic of the fully differential comparator.
Figure 5. Relationship between the PAM4 signal and the references.

3.3. D Sampler
Figure 6 shows the details of the sampler, which consists of a StrongARM latch and an RS latch. Compared with the traditional CML-based latch, the StrongARM latch is more power efficient, since it adopts return-to-zero (RZ) logic, which has no static power consumption. The operation of the StrongARM latch is described in [5]. The RS latch converts the RZ output to non-return-to-zero (NRZ), which is needed by the following stages.

Figure 6. Schematic of the StrongARM based sampler.

3.4. Transition Filter
Figure 7 shows an eye diagram of PAM4 at 56 Gb/s.
Compared with the eye diagram of NRZ, it has four levels, which means it contains 12 types of transition, labeled in Figure 7. From the point of view of phase detector design, all of these transitions carry phase information about the input data. However, some of the 12 transition types have a side crossover point, which adds noise to the phase detection and makes the CDR less stable. The problem gets worse at large channel loss, because the side crossover points then deviate further from the central crossover point. To mitigate this problem, only the transitions with a central crossover point should be used for phase detection, which means the transitions with a side crossover point should be removed. Table 1 lists the transitions seen at each comparator output; the useful transitions are marked Yes. The transition filter selects the good transitions at each comparator output. Thus, the minor transitions (①④; ⑤⑧; ⑨⑫) and the major transition (③⑩) can be selected by four blocks of digital logic, respectively.
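The selection rule behind the transition filter can be sketched behaviorally. The sketch below assumes idealized, symmetric transitions between equally spaced levels, so a transition crosses a comparator threshold at the center of the unit interval only when its start and end levels are equidistant from that threshold. The threshold values in level units, the function names and the mapping of VA/VB/VC to the highest/middle/lowest threshold (inferred from the transitions listed per path in Table 1) are assumptions of this illustration, not the authors' logic.

```python
from itertools import permutations

LEVELS = (1, 2, 3, 4)
# Comparator thresholds in level units (assumed): VC lowest, VA highest.
THRESHOLDS = {"VA": 3.5, "VB": 2.5, "VC": 1.5}


def thermometer(level):
    """Slicer outputs (DA, DB, DC) for one PAM4 level."""
    return tuple(int(level > THRESHOLDS[name]) for name in ("VA", "VB", "VC"))


def crosses(start, end, threshold):
    """True when the transition start->end toggles this comparator."""
    return (start - threshold) * (end - threshold) < 0


def central_crossover(start, end, threshold):
    """True when the crossing happens at the center of the unit interval.

    For an idealized symmetric transition this requires the two levels to be
    equidistant from the threshold, i.e. start + end == 2 * threshold.
    """
    return crosses(start, end, threshold) and start + end == 2 * threshold


def classify():
    """Map each comparator to {(start, end): is_central} for its transitions."""
    table = {}
    for name, threshold in THRESHOLDS.items():
        table[name] = {
            (a, b): central_crossover(a, b, threshold)
            for a, b in permutations(LEVELS, 2)
            if crosses(a, b, threshold)
        }
    return table


if __name__ == "__main__":
    for name, rows in classify().items():
        for (a, b), central in rows.items():
            print(f"{name}: {a}->{b} {'Yes' if central else 'No'}")
```

Under these assumptions the sketch marks as useful only the transitions between adjacent levels at their own comparator and the full-swing transitions between levels 1 and 4 at the middle comparator, which agrees with the Yes/No entries of Table 1.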
Figure 7. Transition waveforms with labels.

Table 1. Transitions at each comparator output.
Path   Level transition   Central crossover
VC     2->1               Yes
VC     3->1               No
VC     4->1               No
VC     1->2               Yes
VC     1->3               No
VC     1->4               No
VB     3->1               No
VB     4->1               Yes
VB     3->2               Yes
VB     4->2               No
VB     1->3               No
VB     2->3               Yes
VB     1->4               Yes
VB     2->4               No
VA     4->1               No
VA     4->2               No
VA     4->3               Yes
VA     1->4               No
VA     2->4               No
VA     3->4               Yes

3.5. CDR Design
In the CDR design, jitter generation, jitter tolerance and the stability of the whole loop must be carefully considered. The stability factor is defined as the ratio of the phase change contributed by the proportional path to that contributed by the integral path [3]:

$$\xi_{2nd\,order} = \frac{2 C f_{bb} f_{baud}}{I_{cp} K_{vco}} \qquad (1)$$

where $C$ is the integral capacitor, $f_{bb}$ is the bang-bang frequency step of the separated proportional path, $I_{cp}$ is the charge pump current, $K_{vco}$ is the tuning gain of the VCO in Hz/V, and $f_{baud}$ is the baud rate. To guarantee stability, $\xi_{2nd\,order}$ should be designed to be large [7].

In addition, the jitter tolerance should be analyzed and designed. Reference [6] gives an analytic expression, Eq. (2), for the maximum input phase jitter that causes the onset of slew-rate limiting, where $s = j 2\pi f_j$ and $f_j$ is the input sinusoidal jitter frequency. The tolerable sinusoidal jitter amplitude should be greater than 0.1 UI at $f_c = f_{baud}/1667$ for most standards. Numerical simulation indicates that a large $f_{bb}$ gives a large jitter tolerance, but a large $f_{bb}$ also leads to large self-generated hunting jitter, which degrades the recovered clock.

Figure 8 shows the architecture of the presented ring VCO based CDR, which is a quarter-rate structure. It includes the samplers, transition filter, BBPDs, majority voter, differential charge pump, proportional logic, integral capacitor, ring VCO and multiphase clock generator. The three sets of samplers extract the data and edge information of the three input data paths. The transition filter employs the four blocks of digital logic described in Section 3.4 to obtain four sets of data and edge information for the good transitions, which are then fed into four sets of BBPDs. The majority voter uses the resulting 16 sets of information to generate the final lead/lag decision, which produces the control voltage through the differential charge pump and the integral capacitor. At the same time, the control signal of the proportional path is generated from the same lead/lag decision by a digital logic circuit. Both the integral control voltage and the proportional signal adjust the phase of the VCO output until the phases of the VCO and the input data are aligned.

Figure 8. Architecture of the proposed CDR.

Figure 9 shows the details of the presented ring VCO with separated proportional path. The ring VCO, which contains two delay cells, covers a wide operating range of 5.1-9.4 GHz and is controlled by the integral voltage and the DAC. The DAC is the coarse frequency selector. The frequency change of the proportional path is realized by changing the voltage of two sets of varactors, which are placed between the two delay cells. In addition, the gains (Gain1 and Gain2 in Figure 9) of the two paths (integral and proportional) are adjustable to balance jitter generation and jitter tolerance. In this design, $f_{bb}$ covers 20~60 MHz, and $K_{vco}$ covers 0.5~1.2 GHz/V. The details of the delay cell and clock buffer are shown in Figure 10. The delay cell is a typical pseudo-differential CMOS inverter. A cross-coupled inverter pair is placed at the output stage of the pseudo-differential CMOS inverter to keep the duty cycle of the clock balanced. In addition, AC-coupled clock buffers with small parasitic capacitance are employed (see Figure 10(b)) to reduce the influence of parasitic capacitance.
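To get a feel for how the adjustable loop parameters trade off, the stability factor of Section 3.5 can be evaluated numerically. This sketch assumes the stability factor has the form $\xi = 2 C f_{bb} f_{baud} / (I_{cp} K_{vco})$, which follows from comparing the phase step $2\pi f_{bb} T$ of the proportional path with the phase ramp $\pi K_{vco} I_{cp} T^2 / C$ of the integral path over one update interval $T = 1/f_{baud}$. The $f_{bb}$ and $K_{vco}$ ranges are the ones quoted in the paper; the capacitor and charge pump values (`C_INT`, `I_CP`) are hypothetical placeholders, since the paper does not report them.

```python
def stability_factor(c_int, f_bb, f_baud, i_cp, k_vco):
    """Ratio of proportional-path to integral-path phase change (assumed form).

    c_int in farads, f_bb and f_baud in Hz, i_cp in amperes, k_vco in Hz/V.
    """
    return 2.0 * c_int * f_bb * f_baud / (i_cp * k_vco)


F_BAUD = 56e9 / 2   # 56 Gb/s PAM4 carries 2 bits per symbol -> 28 GBaud
C_INT = 10e-12      # hypothetical integral capacitor, not from the paper
I_CP = 100e-6       # hypothetical charge pump current, not from the paper


def sweep():
    """Stability factor over the corner values of the quoted tuning ranges."""
    rows = []
    for f_bb in (20e6, 40e6, 60e6):       # bang-bang step range from the paper
        for k_vco in (0.5e9, 1.2e9):      # VCO gain range from the paper
            xi = stability_factor(C_INT, f_bb, F_BAUD, I_CP, k_vco)
            rows.append((f_bb, k_vco, xi))
    return rows


def jitter_corner(f_baud):
    """Jitter-tolerance corner frequency f_baud / 1667 quoted in the text."""
    return f_baud / 1667.0


if __name__ == "__main__":
    for f_bb, k_vco, xi in sweep():
        print(f"f_bb={f_bb / 1e6:.0f} MHz  Kvco={k_vco / 1e9:.1f} GHz/V  xi={xi:.1f}")
    print(f"jitter corner = {jitter_corner(F_BAUD) / 1e6:.2f} MHz")
```

With this form, the stability factor grows linearly with $f_{bb}$ and falls with $K_{vco}$, which is consistent with the statement that the two path gains are made adjustable to balance jitter generation against jitter tolerance.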
Figure 9. Ring VCO with separated proportional path.
Figure 10. Details of (a) delay cell, (b) clock buffer.
Figure 11. PAM4 receiver layout (750 um x 500 um).

4. Experimental results
The PAM4 receiver is designed in a 65 nm CMOS technology, and its layout is shown in Figure 11. The core circuit occupies an area of 0.375 mm2 and consumes 76 mW from a 1.2 V supply. Selected simulation results with PRBS7-based PAM4 input data are described below. Figure 12 shows the control voltage versus time during the locking process; this voltage is the difference between the differential control voltages (VC_P, VC_N) shown in the inset of Figure 12. The result indicates that the CDR locks at 391.9 ns. Figure 13 shows the eye diagrams of the recovered 7 GHz clock with and without the transition filter after the CDR has locked. By employing the transition filter, the total jitter of the recovered clock is reduced from 11.25 ps to 7.03 ps.

Figure 12. Integral path voltage in locking process.
Figure 13. Eye diagram of the 7 GHz clock (period 142.86 ps) after CDR locked, (a) with transition filter (TJ: 7.03 ps), (b) without transition filter (TJ: 11.25 ps).

5. Summary
A 56 Gb/s wire-line PAM4 receiver with a ring VCO based CDR including a separated proportional path is presented in this paper. The independently adjustable gains of the integral and proportional paths allow jitter tolerance to be balanced against jitter generation. The simulation results indicate that the PAM4 receiver works at 56 Gb/s. In addition, by employing the transition filter, the jitter of the recovered clock is reduced by more than one third.

Acknowledgments
This work is supported by National Science Technology Major Project (No. 2016ZX01012101).

References
[1] U. Singh et al., "A 780 mW 4 × 28 Gb/s transceiver for 100 GbE gearbox PHY in 40 nm CMOS," IEEE J. Solid-State Circuits, vol. 49, no. 12, pp. 3116-3129, Dec. 2014.
[2] M.-S. Chen and C.-K. K. Yang, "A 50-64 Gb/s serializing transmitter with a 4-tap, LC-ladder-filter-based FFE in 65 nm CMOS technology," IEEE J. Solid-State Circuits, vol. 50, no. 8, pp. 1903-1916, Aug. 2015.
[3] P.-C. Chiang et al., "60Gb/s NRZ and PAM4 transmitters for 400GbE in 65nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 42-43, Feb. 2014.
[4] J. Lee et al., "Design of 56 Gb/s NRZ and PAM4 SerDes transceivers in CMOS technologies," IEEE J. Solid-State Circuits, vol. 50, no. 9, pp. 2061-2073, 2015.
[5] J. Kim, B. S. Leibowitz, J. Ren, et al., "Simulation and analysis of random decision errors in clocked comparators," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 56, no. 8, pp. 1844-1857, 2009.
[6] S. Wang et al., "Design considerations for 2nd-order and 3rd-order bang-bang CDR loops," CICC Dig. Tech. Papers, Sep. 2005.
[7] R. C. Walker, "Designing bang-bang PLLs for clock and data recovery in serial data transmission systems," in Phase-Locking in High-Performance Systems, pp. 34-45, 2003.