# A Wideband All-Digital CMOS RF Transmitter on HDI Interposers With High Power and Efficiency

Nai-Chung Kuo, Student Member, IEEE, Bonjern Yang, Student Member, IEEE,
Angie Wang, Graduate Student Member, IEEE, Lingkai Kong, Member, IEEE,
Charles Wu, Member, IEEE, Vason P. Srini, Elad Alon, Senior Member, IEEE,
Borivoje Nikolić, Fellow, IEEE, and Ali M. Niknejad, Fellow, IEEE

Abstract—This paper demonstrates a wideband CMOS all-digital polar transmitter with flip-chip connection to three high-density-interconnection PCB interposers. The interposers are designed to extract power from a CMOS open-drain inverse Class-D power amplifier core. Over a wide frequency range from 0.7 to 3.5 GHz, a continuous-wave output power higher than 25.5 dBm and a drain efficiency (DE) above 40% are demonstrated. The low-band package achieves a peak power of 29.2 dBm at 1.1 GHz with a DE of 60%, the mid-band package outputs 28.8 dBm at 1.5 GHz with a DE of 56%, and the high-band package generates 26 dBm at 3 GHz with a DE of 49%. The amplitude modulation (AM) is achieved by digitally modulating the switch conductance of the inverse Class-D core, and the on-chip phase modulation is achieved by digitally weighting the in-phase and quadrature bias currents in the IQ mixer. Detailed modulation tests, involving 64 quadrature amplitude modulation (QAM) and 20-MHz WLAN and LTE signals, exhibit excellent power and efficiency at 0.6, 1.2, 1.8, 2.4, 3, and 3.6 GHz. The associated spectral-mask and error-vector-magnitude specifications are satisfied.

Index Terms—Interposer, inverse Class-D, phase modulation, polar transmitter, RF-DAC, transformers.

## I. INTRODUCTION

THE desire for efficient RF transmission has been driving the development of switched power amplifiers (PAs), including Class-D [1]–[3], switched-capacitor [4]–[6], inverse Class-D [7]–[14], and Class-E [15]–[18] topologies. Their improved peak efficiency results from the reduced overlap between the current and voltage waveforms at the device output. The square-wave input drive to the switches usually carries no amplitude information. The RF amplitude is therefore typically modulated in one of three ways:
- by varying the device periphery of an inverse Class-D power core [7]–[14];
- by power-combining two out-phasing Class-D [1], [2] or Class-E [17], [18] power cells;
- by supply modulation [5], [13], [16], [19]–[22].

To efficiently output signals with a high peak-to-average power ratio (PAPR), a switch-based transmitter (Tx) must have both high peak and high back-off efficiencies.

The device-periphery-based amplitude modulation (AM) scheme results in a back-off efficiency similar to that of a linear Class-B PA. In such a PA, the drain efficiency (DE) degrades by 3 dB when the output power is 6 dB below the peak power. Owing to increased loss in the combiner, AM achieved by out-phasing combining of two nonlinear Class-D/E power cores also degrades the back-off efficiency. Several works have been dedicated to increasing this efficiency, either by improving the power combiners [2], [18] or by employing multiple supplies to reduce the out-phasing angle [17].

Additionally, supply modulation for AM eliminates the wasted voltage headroom in back-off operation. Typical methods employ linear/buck regulators [20]–[22] for envelope tracking (ET), or a lower supply voltage (Class G) for low-power operation [5], [13], [16]. Recently, a 0.5-to-3.5-V supply modulator with an efficiency of 80% was reported, targeting 10-MHz LTE signals [20], [22]. The main challenge for ET techniques is the supply-modulator efficiency, which drops as the bandwidth increases [23]. Similarly, for Class-G operation, ignoring the cost of generating the additional supplies results in an optimistic estimate of the Tx efficiency. Moreover, supply interference degrades the signal integrity; sources include buck-converter switching spurs/noise and glitches in the switched supply [24].

Manuscript received April 2, 2017; revised June 30, 2017; accepted July 13, 2017. Date of publication August 7, 2017; date of current version November 3, 2017. This work was supported in part by the DARPA RF-FPGA Program under Grant HR0011-12-9-0013 and in part by the NSF-EARS Program. (*Corresponding author: Nai-Chung Kuo.*)

N.-C. Kuo, B. Yang, A. Wang, L. Kong, E. Alon, B. Nikolić, and A. M. Niknejad are with the Berkeley Wireless Research Center, University of California at Berkeley, Berkeley, CA 94720 USA (e-mail: naichungkuo@berkeley.edu).

C. Wu is with Keysight Technologies, Inc., Santa Clara, CA 95051 USA. V. P. Srini is with Nokia, Berkeley, CA 94704 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMTT.2017.2731309

Another approach improves both the peak and the back-off efficiencies via better passives. It has been demonstrated that off-chip passives enhance the PA power and efficiency, because they can be designed with thicker metals and low-loss substrates. Such passives have been fabricated on IPD [15], [25], LTCC [11], [25], and PCB [20], [22].

Fully integrated solutions with on-chip passives allow the CMOS die to connect directly to the PCB antenna. However, a substantial amount of buffer space on the coarse-pitched PCB must be allocated to fan out the wirebond connections [7]–[9], and the wires introduce loss and impedance variation to the Tx. Directly interfacing a flip-chip package to a high-density-interconnection (HDI) PCB motherboard (antenna board) results in better signal integrity and more compact packaging [1], [2], [5], [26]. This approach is not cost-effective, however, since the motherboard is usually two orders of magnitude larger than the chip.

Alternatively, in this work, the CMOS die is flip-chip connected to HDI interposers. The interposers fan the signals out to a coarse-pitched ball-grid array (BGA) on their back side, and the BGA matches the connections on the coarse-pitched PCB motherboard. The interposer is still an order of magnitude larger than the chip, so substantial design space is available for high-quality passives and SMD components.

0018-9480 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Fig. 1. Block diagram of the realized all-digital RF transmitter with on-chip amplitude and phase modulators and an on-interposer transformer.

Another attractive aspect of combining a digitally modulated Tx with on-interposer passives is that the Tx operating frequency can be reconfigured simply by switching between interposers with different output passive networks. Digital transmitters employ a square-wave local oscillator (LO) drive and need no input matching network, whereas a linear PA requires multiple tuned passive networks. As a result, operation at different frequencies can be optimized by modifying only the Tx output passive network on the interposer, while the attached CMOS Tx chip remains the same. Compared to using multiple CMOS designs, this is a more efficient way to support the wide range of frequency bands required by modern communication systems.

This work demonstrates three HDI PCB interposers, low band (LB), mid band (MB), and high band (HB), targeting three frequency bands that collectively cover 0.7 to 3.5 GHz. All three packages are served by a single, generic 2.5-V all-digital Tx in 65-nm CMOS. Thanks to the high efficiency of the inverse Class-D core and the high-quality passives, the continuous-wave (CW) results are as follows:
- LB: 29.2 dBm at 1.1 GHz with a DE of 60%;
- MB: 27.7 dBm at 2.3 GHz with a DE of 54%;
- HB: 26 dBm at 3 GHz with a DE of 49%.

The Tx also performs well with 62.5-MS/s 64 quadrature amplitude modulation (QAM), 20-MHz WLAN, and 20-MHz LTE signals at six test frequencies between 0.6 and 3.6 GHz.

For the WLAN signal, the Tx has an output power of 21.6 (20.7) dBm at 1.8 (2.4) GHz, with a DE of 27% (25%) and a system efficiency (SE) of 23% (21%). The measured EVM is -31 dB, substantially better than the -25-dB specification, and the spectral mask is not violated.

For the LTE signal, the peak performance occurs at 1.8 GHz, with an output power of 24 dBm and an SE of 30%. The measured EVM and adjacent channel leakage ratios (ACLR<sub>1,2</sub>) are -27, -34, and -39 dB, respectively, meeting the specifications with margin.

Beyond excellent power and efficiency for both CW and modulated signals over a wide frequency range, this work features an all-digital input interface. Both an 8-b inverse Class-D amplitude modulator and a tunable 8-b IQ-mixer-based phase modulator (PM) are realized on silicon.

This paper is organized as follows. Section II provides an overview of the Tx system. Section III presents the inverse Class-D PA and its AM scheme, with a new analysis that improves on our previous work [7]. Section IV introduces the PM, and Section V characterizes the transformer designs on the three HDI PCB interposers. Measurement results are summarized in Section VI. The Appendix gives more details on the Tx periphery circuits and discusses various mismatch effects.

## II. ALL-DIGITAL RF TRANSMITTER SYSTEM OVERVIEW

Fig. 1 illustrates the block diagram of the designed all-digital RF transmitter. Most reported digitally modulated transmitters send amplitude/phase or I/Q codes to the chip in parallel through numerous I/O pins (see [7]–[10]). This design instead adopts a flip-chip chip-to-interposer connection to allow for a compact package, high-quality passives, and better signal integrity.

As a result, the number of available I/O pins is limited by the 200-$\mu$m bump pitch. Two high-speed 2.5-Gb/s LVDS receivers, each including a preamplifier and a StrongARM comparator, are therefore designed to receive the AM and PM serial-code streams. Two 1-to-10 deserializers (DeSer) follow the LVDS receivers and regenerate the parallel 8-b AM and PM codes (two of the ten bits are unused) at a symbol rate of 250 MS/s. Two clock receivers, one at 1.25 GHz and one at 250 MHz, are implemented for use by the LVDS receivers and deserializers. Both the rising and falling edges of the 1.25-GHz clock are used to sample the 2.5-Gb/s LVDS signals. More details on the Tx periphery circuits are in the Appendix.
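The rate arithmetic of this serial interface can be illustrated with a minimal behavioral sketch. The paper does not specify the frame format. The sketch assumes that each 10-bit frame is sent MSB first and that the two unused bits are the last two of the frame. `deserialize` and `serialize` are hypothetical helpers, not the on-chip logic.

```python
CLK_HZ = 1.25e9                       # LVDS sampling clock (from the text)
BIT_RATE = 2 * CLK_HZ                 # both clock edges used -> 2.5 Gb/s
DESER_RATIO = 10                      # 1-to-10 deserializer
CODE_BITS = 8                         # payload bits per frame; 2 bits unused
SYMBOL_RATE = BIT_RATE / DESER_RATIO  # 250 MS/s parallel code rate

def deserialize(bits):
    """Convert a serial bit list into 8-b codes, one per 10-bit frame.

    Hypothetical framing: MSB first, unused bits at the end of each frame.
    """
    if len(bits) % DESER_RATIO:
        raise ValueError("stream must contain whole 10-bit frames")
    codes = []
    for start in range(0, len(bits), DESER_RATIO):
        code = 0
        for bit in bits[start:start + CODE_BITS]:   # keep the payload bits only
            code = (code << 1) | bit
        codes.append(code)
    return codes

def serialize(codes):
    """Inverse of deserialize(): 8 payload bits MSB first, then 2 padding zeros."""
    bits = []
    for code in codes:
        bits.extend((code >> (CODE_BITS - 1 - i)) & 1 for i in range(CODE_BITS))
        bits.extend([0] * (DESER_RATIO - CODE_BITS))
    return bits

print(f"{BIT_RATE / 1e9:.1f} Gb/s -> {SYMBOL_RATE / 1e6:.0f} MS/s")
print(deserialize(serialize([0, 255, 42])))
```

The round trip shows why only 8 of every 10 received bits carry code information under this assumed framing.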

The 8-b amplitude modulator provides 256 amplitude states. It does so by dividing the inverse Class-D core into 19 parallel cells:
- 4 binary-weighted cells, with relative sizes of 1, 2, 4, and 8;
- 15 thermometer-coded cells, each with a relative size of 16.

On each differential side, the unit cell is 6 $\mu$m/65 nm for the thin-oxide nMOS switch device and 18 $\mu$m/280 nm for the cascode thick-oxide device.

In the LO path, a static divider creates the in-phase and quadrature LO signals for the 8-b IQ-mixer-based PM. The LO receiver must therefore operate at twice the Tx frequency. The supply voltage is 2.5 V for the power core and 1 V for the driver, PM, and LO/CLK receivers. The on-chip amplitude and phase paths have a delay mismatch below 100 ps, which causes only minor performance degradation (see the Appendix). The operation of the amplitude modulator and the PM is described in Sections III and IV, respectively.
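The segmented AM code can be decoded as in the sketch below. The paper does not spell out the decoder. This sketch assumes that the four LSBs of the code drive the binary cells directly and that the four MSBs set how many 16-unit thermometer cells are enabled. The helper names are hypothetical.

The check confirms two things. First, the number of active unit cells equals the code. Second, 255 units of 6 $\mu$m give the 1.53-mm switch width quoted in Section III-B.

```python
UNIT_WIDTH_UM = 6.0            # thin-oxide switch width per unit cell (from the text)
BINARY_WEIGHTS = [1, 2, 4, 8]  # relative sizes of the 4 binary cells
THERMO_WEIGHT = 16             # relative size of each thermometer cell
N_THERMO = 15

def am_decode(code):
    """Split an 8-b AM code into binary-cell and thermometer-cell enables.

    Assumed decoder: the 4 LSBs drive the binary cells directly, and the
    4 MSBs (0..15) set how many thermometer cells are turned on.
    """
    if not 0 <= code <= 255:
        raise ValueError("AM code must be in 0..255")
    binary = [(code >> bit) & 1 for bit in range(4)]
    n_on = code >> 4
    thermo = [1 if k < n_on else 0 for k in range(N_THERMO)]
    return binary, thermo

def active_units(code):
    """Number of enabled unit cells (proportional to G_ON) for a code."""
    binary, thermo = am_decode(code)
    return sum(w * b for w, b in zip(BINARY_WEIGHTS, binary)) + THERMO_WEIGHT * sum(thermo)

total_units = sum(BINARY_WEIGHTS) + THERMO_WEIGHT * N_THERMO
print(f"{total_units} unit cells, {total_units * UNIT_WIDTH_UM / 1000:.2f} mm total width")
```

Thermometer coding of the MSBs keeps the large-cell switching monotonic. The binary LSBs keep the cell count at 19 rather than 255.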

## III. DESIGN AND ANALYSIS OF THE INVERSE CLASS-D PA

### A. Inverse Class-D Performance With Ideal Switches

The inverse Class-D power core, illustrated in Fig. 2(a), features nonoverlapping voltage and current waveforms. Its ideal maximum DE is 100% (if the switch resistance $R_{\rm ON}$ is zero). Class-D PAs based on CMOS inverters suffer from the associated $CV^2 f$ parasitic loss [1]–[3]. In contrast, the inverse Class-D power core can absorb its parallel drain capacitance into the load resonator, so better performance can be expected at higher frequencies. In practice, the nonzero switch $R_{\rm ON}$ degrades both the power and the efficiency and causes overlap between the current and voltage waveforms.

The output power and DE of an inverse Class-D cell depend on the switch and load resistances, the LC tank, and the dc-feed inductance. The analytical expressions for them are given in [7, eqs. (11) and (12)]. That analysis solved the circuit KCL at all harmonics and modeled each switch as a time-varying conductance alternating between zero and the switch conductance $(1/R_{\rm ON})$. Because those expressions contain infinite series and are difficult to interpret, they are revised here into a more general and accessible form in which the load is represented by a complex number.

As in [7], the load impedance is assumed to be zero at the odd-order harmonics above the fundamental and infinite at the even-order harmonics. According to [7, Fig. 11], the load impedances at the even harmonics do not significantly affect the results. After some algebraic manipulation, the output (fundamental) power and DE of an inverse Class-D core are expressed by

$$P_{\text{out\_ClassD-1}} \approx \frac{V_{\text{DD}}^2}{4} \times \frac{20R_L}{(5R_{\text{on}} + R_L)^2}$$
(1)

$$DE_{ClassD-1} \approx \frac{1}{1 + 5R_{ON}/R_L}$$
(2)

where $R_L$ is the load resistance $(Z_L = R_L + jX_L)$. Equations (1) and (2) suggest that the performance of the inverse Class-D power core is more conveniently characterized in terms of the load impedance $Z_L$ than the load admittance $(Y_L = G_L + jB_L)$. The approximations are valid when $4\omega_0 C_s R_{\rm ON} \ll 1$, where $C_s$ is the switch capacitance. This condition means that the switch capacitance can be neglected. It holds for switches fabricated in modern CMOS processes (e.g., 65-nm CMOS) and operated at RF (e.g., 4 GHz).

Equations (1) and (2) indicate that the output power and DE are independent of the load reactance $X_L$. The simulated power and DE of the Fig. 2(a) schematic versus the load impedance are shown in Fig. 2(b) and (c), respectively. The ideal switches have an ON-resistance of 1 $\Omega$ ($R_{\rm ON} = 1\ \Omega$), and the supply voltage is 2.5 V.

Fig. 2(b) and (c) remain applicable if $R_{\rm ON} \neq 1\ \Omega$ or $V_{\rm DD} \neq 2.5$ V. In that case, the horizontal and vertical axes represent $R_L/R_{\rm ON}$ and $X_L/R_{\rm ON}$, respectively, and the output power must be scaled by $(V_{\rm DD}/2.5)^2/R_{\rm ON}$.

As expected, $X_L$ does not affect the power and efficiency. To illustrate this, Fig. 2(d)–(f) shows the time-domain current and voltage waveforms for $Z_L$ of 10, 100, and $10 + 10j\ \Omega$, respectively. Because the current is zero during the second half-cycle, the switches dissipate power only in the first half-cycle. With zero load reactance ($X_L = 0$), the peak voltage during the second half-cycle approaches a maximum value of $\pi V_{\rm DD}$ as $R_L$ increases (or as $R_{\rm ON}$ decreases). Meanwhile, the voltage in the first half-cycle approaches zero.

As shown in Fig. 2(f), with a nonzero load reactance ($X_L \neq 0$), the voltage swing can exceed $\pi V_{\rm DD}$ and can also become negative during the second half-cycle. However, the current/voltage waveforms in the first half-cycle, and thus the switch power dissipation, are identical to those obtained with $Z_L = R_L + j0$. A nonzero $X_L$ does produce a higher fundamental voltage, but that voltage is no longer in phase with the fundamental current. The output power (and DE) therefore does not change.

Fig. 2. (a) Schematic of the inverse Class-D cell ($V_{\text{DD}} = 2.5$ V, $R_{\text{ON}} = 1\ \Omega$). (b) Power (in dBm) versus $Z_L$. (c) DE versus $Z_L$. Switch current/voltage waveforms with $Z_L$ of (d) 10 $\Omega$, (e) 100 $\Omega$, and (f) $10 + 10j$ $\Omega$.

As predicted by (1), with a fixed $R_{\rm ON}$, the output power drops when $R_L$ is either very high or very low. Applying the AM–GM inequality to (1) shows that the maximum power of $V_{\rm DD}^2/4R_{\rm ON}$ occurs when $R_L = 5R_{\rm ON}$. Unfortunately, the DE at maximum power is only 50%. To achieve higher efficiency, the load resistance must be significantly higher than the switch resistance. A practical design uses $R_L \approx 10 R_{\rm ON}$, which gives an output power only 0.5 dB below the maximum and an efficiency of 67%.
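Equations (1) and (2) can be evaluated directly to reproduce the numbers above. The Python sketch below assumes the ideal-switch model of Fig. 2(a) and the approximation $4\omega_0 C_s R_{\rm ON} \ll 1$. `inv_classd` and `dbm` are hypothetical helper names. The (1.8 V, 0.5 $\Omega$) case is an arbitrary example, used only to check the Fig. 2 scaling rule.

```python
import math

def inv_classd(vdd, r_on, r_l):
    """Fundamental output power (W) and DE of an ideal inverse Class-D core,
    per (1) and (2); assumes 4*w0*Cs*R_ON << 1 and ideal harmonic loading."""
    p_out = vdd**2 / 4 * 20 * r_l / (5 * r_on + r_l) ** 2
    de = 1 / (1 + 5 * r_on / r_l)
    return p_out, de

def dbm(p_watts):
    """Convert watts to dBm."""
    return 10 * math.log10(p_watts / 1e-3)

VDD, RON = 2.5, 1.0                                   # Fig. 2 conditions
p_max, de_at_max = inv_classd(VDD, RON, 5 * RON)      # optimum load R_L = 5 R_ON
p_10, de_10 = inv_classd(VDD, RON, 10 * RON)          # practical load R_L = 10 R_ON
print(f"R_L = 5 R_ON:  {dbm(p_max):.2f} dBm, DE = {de_at_max:.0%}")
print(f"R_L = 10 R_ON: {dbm(p_10) - dbm(p_max):+.2f} dB vs. max, DE = {de_10:.0%}")

# Fig. 2 scaling rule: at the same normalized load R_L/R_ON, the power for an
# arbitrary (V_DD, R_ON) equals the 2.5-V, 1-ohm result times (V_DD/2.5)^2 / R_ON.
x = 7.0                                               # arbitrary normalized load
p_a, _ = inv_classd(1.8, 0.5, x * 0.5)
p_ref, _ = inv_classd(2.5, 1.0, x * 1.0)
assert math.isclose(p_a, p_ref * (1.8 / 2.5) ** 2 / 0.5)
```

Sweeping `r_l` with `inv_classd` shows the trade-off in (1) and (2): the power peaks at $R_L = 5R_{\rm ON}$, while the DE keeps rising with $R_L$.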

Fig. 3. (a) Model of a Class-D power core. (b) Efficiency comparison of a Class-D and inverse Class-D power core versus $x \equiv R_L/R_{ON}$.

Compared to a Class-D power core with the same supply and switch resistance, the inverse Class-D core has a higher output power. Because it needs no pMOS devices, the inverse Class-D core can also achieve better efficiency.

The Class-D power core, illustrated in Fig. 3(a), is modeled by a load resistance $R_L$. Each of its terminals is connected to a square-wave voltage source with a source resistance of $R_{ON}$. The two voltage sources are complementary and swing between 0 and $V_{DD}$. The Class-D (fundamental) power and efficiency are readily derived as

$$P_{\text{out\_ClassD}} \approx \frac{4V_{\text{DD}}^2}{5} \times \frac{R_L}{(2R_{\text{ON}} + R_L)^2}$$
 (3)

$$DE_{ClassD} \approx \frac{4}{5} \times \frac{1}{1 + 2R_{ON}/R_L}.$$
 (4)

According to (3), the maximum power achievable by the Class-D cell is $V_{DD}^2/10 R_{ON}$, reached when $R_L = 2 R_{ON}$. This is 4 dB lower than the maximum power of an inverse Class-D cell. Moreover, for the same $R_{ON}$ and $R_L$, the inverse Class-D cell delivers more power. The ratio $P_{\text{out\_ClassD-1}}/P_{\text{out\_ClassD}}$ is a function of $x \equiv R_L/R_{ON}$ and equals 1.9, 4.9, and 6 dB for x = 1, 5, and 10, respectively.

The DEs of the two topologies are shown in Fig. 3(b) as functions of x. The Class-D core has the higher DE for x < 10, while the inverse Class-D cell has the higher DE for x > 10. Note that an inverse Class-D cell does not need a pull-up pMOS device. It can therefore employ a larger switch with lower switch resistance ($R_{ON}$) but similar parasitic capacitance.

Finally, Fig. 4(a) and (b) shows the calculated output powers (in dBm) of an inverse Class-D cell and a Class-D cell, respectively, as functions of ($R_L$, $R_{ON}$). Both use a 2.5-V supply voltage. The efficiency, which depends only on $x \equiv R_L/R_{ON}$, is also annotated. Fig. 4 provides a preliminary guide to the design of Class-D and inverse Class-D power cores.
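The comparison between (1)–(4) can be checked numerically. This sketch evaluates the closed-form expressions for an assumed 2.5-V supply and $R_{ON} = 1\ \Omega$, as in Fig. 2. It is not a circuit simulation. The function names are illustrative.

```python
import math

def classd(vdd, r_on, r_l):
    """Fundamental power (W) and DE of the Fig. 3(a) Class-D model, per (3) and (4)."""
    return 4 * vdd**2 / 5 * r_l / (2 * r_on + r_l) ** 2, 0.8 / (1 + 2 * r_on / r_l)

def inv_classd(vdd, r_on, r_l):
    """Inverse Class-D power (W) and DE, per (1) and (2)."""
    return vdd**2 / 4 * 20 * r_l / (5 * r_on + r_l) ** 2, 1 / (1 + 5 * r_on / r_l)

def compare(x, vdd=2.5, r_on=1.0):
    """Power ratio (dB) and the two DEs at normalized load x = R_L / R_ON."""
    p_inv, de_inv = inv_classd(vdd, r_on, x * r_on)
    p_d, de_d = classd(vdd, r_on, x * r_on)
    return 10 * math.log10(p_inv / p_d), de_inv, de_d

for x in (1, 5, 10, 20):
    ratio_db, de_inv, de_d = compare(x)
    print(f"x = {x:2d}: P_inv/P_D = {ratio_db:4.1f} dB, "
          f"DE_inv = {de_inv:.1%}, DE_D = {de_d:.1%}")
```

Extending the sweep shows that the power ratio keeps growing with x, approaching $10\log_{10}(25/4) \approx 8$ dB as $x \to \infty$. The two DE curves cross exactly at x = 10.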

Fig. 4. (a) Inverse Class-D and (b) Class-D output power and DE as functions of $R_L$ and $R_{ON}$ ($x \equiv R_L/R_{ON}$).

Fig. 5. Back-off (a) power, (b) DE, (c) output voltage, and (d) voltage step of an inverse Class-D cell with 2.5-V supply and $R_L$ of 1, 10, and 100 $\Omega$. The switch conductance is swept with a step size of 0.01 S.

For ease of design, the AM of an inverse Class-D power core is usually realized by digitally modulating the switch conductance ($G_{ON} = 1/R_{ON}$) in uniform steps [7]–[14]. According to (1) and (2), both the output power and the efficiency degrade when $G_{ON}$ decreases from its peak value $G_{\rm ON,max}$ ($G_{\rm ON,max} = 1/R_{\rm ON,min}$). The exact roll-off characteristics depend on both $G_{\rm ON,max}$ and $R_L$.

Fig. 5 shows the simulated back-off power, efficiency, output voltage ($V_{out}$), and output voltage step of an ideal inverse Class-D power core with $R_L$ of 1, 10, and 100 $\Omega$. The switch conductance is swept from 0.01 to 1 S in steps of 0.01 S. The curves match the calculated results. Note that the voltage curve in Fig. 5(c) is a scaled version of the efficiency curve in Fig. 5(b). The reason is that combining (1) and (2) with $P_{\text{out}} = V_{\text{out}}^2/2R_L$ gives $V_{\text{out}} = \sqrt{10}\, V_{\rm DD} \cdot DE \approx \pi V_{\rm DD} \cdot DE$.

As explained above, using $R_L = 10R_{\rm ON,min}$ gives both good peak power and good peak efficiency. However, as the switch conductance decreases and fewer parallel cells operate, the voltage steps grow and the AM resolution degrades. According to (1), linearizing the output voltage response through the load design requires $R_L \ll 5R_{\rm ON,min}$. This inevitably leads to very low efficiency, so it is not a preferred method. Instead, digital predistortion with a lookup table is usually employed to find the proper AM code (AM<sub>code</sub>) for a given output magnitude [7]–[14]. This approach is easier to implement than designing a nonlinear $G_{\rm ON}$-to-AM<sub>code</sub> response.
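The lookup-table predistortion described above can be sketched as follows. The model makes four assumptions:
- the ideal inverse Class-D expression (1);
- a 2.5-V supply;
- $R_L = 10\ \Omega$;
- a conductance of 0.01 S per code step, as in the Fig. 5 sweep.

The nearest-amplitude search in `predistort` is one simple LUT strategy. It is not necessarily the one used in [7]–[14], and the names are hypothetical.

```python
import math

VDD, R_L = 2.5, 10.0        # supply (V) and load (ohm); one of the Fig. 5 cases
G_STEP = 0.01               # siemens per code step, as in the Fig. 5 sweep
CODES = range(1, 101)       # G_ON = 0.01 ... 1.00 S

def vout(g_on, vdd=VDD, r_l=R_L):
    """Fundamental output amplitude from (1): V_out = sqrt(2 * R_L * P_out)."""
    r_on = 1.0 / g_on
    p_out = vdd**2 / 4 * 20 * r_l / (5 * r_on + r_l) ** 2
    return math.sqrt(2 * r_l * p_out)

# Lookup table: code -> output amplitude (monotonic but nonlinear in code).
lut = {c: vout(c * G_STEP) for c in CODES}

def predistort(target_v):
    """Return the AM code whose LUT amplitude is closest to target_v."""
    return min(lut, key=lambda c: abs(lut[c] - target_v))

steps = [lut[c + 1] - lut[c] for c in range(1, 100)]
print(f"step at lowest code: {steps[0] * 1e3:.1f} mV, at highest: {steps[-1] * 1e3:.1f} mV")
print("code for half of full-scale amplitude:", predistort(lut[100] / 2))
```

With these values, half of the full-scale amplitude maps to code 25 rather than 50. This illustrates why a linear code-to-amplitude assumption fails and why the LUT is needed.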



Fig. 6. (a) Schematic for characterizing the designed CMOS core, and drain voltage and current with (b)  $Z_L = 10 \ \Omega$  and (c)  $Z_L = 10 + 10j \ \Omega$ .

### B. Practical Performance With nMOS Switch

The analysis above uses an ideal switch model, for which the derived power and efficiency are independent of the load reactance. This ideal characteristic does not hold when the switches are realized by nMOS devices. An nMOS switch cannot remain off when its drain voltage swings sufficiently negative, so the waveforms depicted in Fig. 2(f) do not occur in practice. Instead, when the drain voltage falls below zero by more than the device threshold, the switch conducts a negative current that clamps the negative voltage swing. This degrades both the output power and the efficiency. Additionally, the abrupt voltage jump at the transition between the first and second half-cycles must be smoothed by the device parasitic capacitance. The resulting large current injected into the load further degrades the performance.

The inverse Class-D cell used in this work, illustrated in Fig. 6(a), uses an nMOS switch device with a total size of 1.53 mm/65 nm (255 unit cells of 6 $\mu$m). A cascode thick-oxide device withstands the voltage swing, which can in theory reach $\pi V_{DD}$. With the cascode device, the simulated switch ON-resistance is 0.6 $\Omega$. The simulated voltage and current waveforms for load impedances of 10 and $10 + 10j\ \Omega$ are shown in Fig. 6(b) and (c), respectively.

Fig. 6(b) resembles Fig. 2(d), while the current waveform in Fig. 6(c) is very different from that in Fig. 2(f). The voltage can no longer swing deeply below zero, and two injection pulses appear. The overlap between the current and voltage waveforms therefore increases, and both power and efficiency are reduced.

The simulated peak power and DE of the designed inverse Class-D power core versus the load impedance $Z_L$ are shown in Figs. 7 and 8, respectively. The power performance is fairly wideband: the power contours simulated at 0.6 GHz [Fig. 7(a)] and 3.6 GHz [Fig. 7(b)] look similar. Fig. 7 confirms that a nonzero load reactance degrades the output power, and Fig. 8 shows that the DE also depends on the load reactance.

Fig. 7. Simulated peak power for the designed inverse Class-D core versus the load impedance $(Z_L)$ at (a) 0.6 GHz and (b) 3.6 GHz.

Fig. 8. Simulated DE for the designed inverse Class-D core versus the load impedance $(Z_L)$ at (a) 0.6 GHz and (b) 3.6 GHz.

As predicted by (1), once $R_L$ is sufficiently high (e.g., 10 $\Omega$), the output power degrades as $R_L$ increases further. As the output power decreases, the power dissipated in the device during the transitions between the two half-cycles can no longer be neglected and hurts the efficiency. The DE also decreases as the operating frequency increases, because the power dissipated during switch transitions is higher.

Finally, the simulation indicates that a load impedance of about $8 + 7j\ \Omega$ gives both good output power and good DE for the designed inverse Class-D core. This impedance is marked on Figs. 7 and 8 and corresponds to an output power of about 31.5 dBm and a DE of about 70%. The load is inductive to compensate for the device parasitic capacitance.
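For reference, the ideal-switch expressions (1) and (2) can be evaluated at the preferred load with the simulated $R_{\rm ON} = 0.6\ \Omega$. This is only a back-of-the-envelope sketch. The ideal model ignores $X_L$ and the nMOS effects described above, so it is expected to overestimate the simulated values.

```python
import math

VDD = 2.5              # power-core supply (V), from the text
R_ON = 0.6             # simulated switch ON-resistance with cascode (ohm), from the text
Z_L = complex(8, 7)    # preferred load from the simulation (ohm)

r_l = Z_L.real         # the ideal model of (1)-(2) ignores X_L = Z_L.imag
p_ideal = VDD**2 / 4 * 20 * r_l / (5 * R_ON + r_l) ** 2
de_ideal = 1 / (1 + 5 * R_ON / r_l)
print(f"ideal-switch estimate: {10 * math.log10(p_ideal / 1e-3):.1f} dBm, DE = {de_ideal:.0%}")
```

The estimate, about 33.2 dBm and 73%, exceeds the simulated 31.5 dBm and 70%. The gap is consistent with the negative-swing clamping and capacitive current injection discussed above.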

## IV. DESIGN OF THE CMOS PHASE MODULATOR

Fig. 9. Schematic of (a) digitally modulated PM, (b) current DAC in the IQ mixer, and (c) LO integrator.

The schematic of the digital PM, or phase interpolator (PI), is depicted in Fig. 9(a). The PI core is similar to a Gilbert-cell double-balanced IQ mixer [27, Ch. 6]. The four dc bias currents are digitally modulated and denoted by $I_{I+}$, $I_{I-}$, $I_{Q+}$, and $I_{Q-}$. The differential bias currents for the in-phase and quadrature mixers are $(I_{I+} - I_{I-})$ and $(I_{Q+} - I_{Q-})$, respectively. The fundamental current at the mixer output (at the LO frequency $\omega_{LO}$) therefore has two components:
- an in-phase component proportional to $(I_{I+} - I_{I-})$;
- a quadrature component proportional to $(I_{Q+} - I_{Q-})$.

Thus, the fundamental mixer output voltage at $\omega_{LO}$, denoted by $V_{\text{mixer\_fund}}$, is proportional to $[(I_{I+} - I_{I-})\cos(\omega_{LO}t) + (I_{Q+} - I_{Q-})\sin(\omega_{LO}t)]$. Note that the mixer output voltage, denoted by $V_{\text{mixer}}$, includes higher harmonics, so $V_{\text{mixer}} \neq V_{\text{mixer\_fund}}$.

The current DACs for the in-phase and quadrature mixers are both 7-b, as illustrated in Fig. 9(b). Each consists of three binary cells (with relative sizes of 1, 2, and 4) and 15 thermometer cells (each with a relative size of 8). The total current in each current DAC is $127I_{DAC}$, where $I_{DAC} \approx 5.5\ \mu$A is the unit-cell current. The 7-b current DAC used by the in-phase (or quadrature) mixer can output 128 differential current states from $127I_{DAC}$ to $-127I_{DAC}$, with a uniform step size of $2I_{DAC}$.

The two digitally modulated currents $(I_{I+} - I_{I-})$ and $(I_{Q+} - I_{Q-})$ are designed not to generate redundant phases at the PI output. The PI has an 8-b resolution, allocated as follows:
- seven bits select $I_{I+}$ from 0 to $127I_{DAC}$, so that $(I_{I+} - I_{I-}) = 2I_{I+} - 127I_{DAC}$;
- one bit selects $(I_{Q+} - I_{Q-})$ between $(128I_{DAC} - |I_{I+} - I_{I-}|)$ and $(|I_{I+} - I_{I-}| - 128I_{DAC})$.

The table in Fig. 10(a) lists the two mixer currents $(I_{I+} - I_{I-})$ and $(I_{Q+} - I_{Q-})$ versus the 8-b PM code (PM<sub>code</sub>). Fig. 10(b) shows $V_{\text{mixer\_fund}}$ as a complex number in the phasor domain; the output phase covers the full $2\pi$ range. The output phase as a function of PM<sub>code</sub> can be expressed by

$$\text{Phase}(V_{\text{mixer\_fund}}) = \tan^{-1}\left(\left|\frac{1+2x}{127-2x}\right|\right) + \frac{\pi}{2}\,\mathrm{floor}\left(\frac{\mathrm{PM}_{\text{code}}}{64}\right)$$
(5)

where  $x = (PM_{code} \mod 64)$ . Fig. 10(c) shows the output phase and phase step versus the  $PM_{code}$ . Notice that the phase step is not constant. The phase resolution has the finest value of  $0.9^{\circ}$  /step when the  $PM_{code} = 0$ , 64, 128, and 192 and the coarsest value of  $1.8^{\circ}$  /step when the  $PM_{code} = 32$ , 96, 160, and 224.
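The code-to-phase mapping of (5) can be checked with a short sketch. It assumes a hypothetical bit assignment: each quadrant uses $|I| = 127 - 2x$ and $|Q| = 1 + 2x$ (in units of $I_{DAC}$), rotated by 90° per quadrant. This satisfies the constraints stated above (odd I levels, $|I| + |Q| = 128$). The actual assignment, however, is the one tabulated in Fig. 10(a).

```python
import math

def pm_currents(code):
    """Normalized differential mixer currents (I, Q) in units of I_DAC.

    Hypothetical bit assignment: within a quadrant, |I| = 127 - 2x and
    |Q| = 1 + 2x with x = code mod 64 (so |I| + |Q| = 128); the phasor is
    rotated by +90 degrees per quadrant. The real mapping is Fig. 10(a).
    """
    x, quadrant = code % 64, code // 64
    i, q = 127 - 2 * x, 1 + 2 * x
    for _ in range(quadrant):
        i, q = -q, i                  # multiply the phasor i + jq by j
    return i, q

def phase_eq5(code):
    """Output phase in degrees, per (5)."""
    x = code % 64
    return math.degrees(math.atan(abs((1 + 2 * x) / (127 - 2 * x)))) + 90 * (code // 64)

phases = []
for code in range(256):
    i, q = pm_currents(code)
    assert abs(i) % 2 == 1 and abs(i) + abs(q) == 128   # odd I levels, |I| + |Q| = 128
    ph = math.degrees(math.atan2(q, i)) % 360
    assert math.isclose(ph, phase_eq5(code), abs_tol=1e-9)
    phases.append(ph)

# Phase step from each code to the next, wrapping 255 -> 0.
steps = [(phases[(k + 1) % 256] - phases[k]) % 360 for k in range(256)]
print(f"finest step {min(steps):.2f} deg, coarsest step {max(steps):.2f} deg")
```

The computed extremes, about 0.90° and 1.79° per step, agree with the 0.9° and 1.8° values quoted above. The finest steps occur at the quadrant boundaries, and the coarsest occur near the 45° points where $|I| \approx |Q|$.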

To drive the switch amplifier, a CML–CMOS converter turns the mixer output voltage $V_{\text{mixer}}$ into a digital waveform, removing the amplitude information. The converter consists of a two-stage differential amplifier followed by self-biased inverter chains (see the Appendix). This nonlinear process maps negative and positive values of $V_{\text{mixer}}$ to zero and $V_{\text{DD}}$, respectively. The output square wave of the converter is denoted by $D_{\text{out}}$.

For two $D_{\text{out}}$ waveforms, the (fundamental) phase difference equals the difference in their zero-crossing times multiplied by $\omega_{\text{LO}}$. Suppose all harmonics at the mixer output are filtered so that $V_{\text{mixer}} = V_{\text{mixer\_fund}}$. The CML–CMOS converter then preserves the input phase: the fundamental phase difference between two $D_{\text{out}}$ waveforms equals that of the corresponding two $V_{\text{mixer}}$ waveforms.

Fig. 10. (a) Normalized mixer I/Q bias current, (b) normalized mixer output fundamental voltage $V_{\text{mixer\_fund}}$, and (c) $V_{\text{mixer\_fund}}$ phase and phase step versus PM<sub>code</sub>.

Unfortunately, if harmonic components appear at the mixer output ($V_{\text{mixer}}$), the CML–CMOS converter can significantly distort the phase response. Because the PM must operate over a wide frequency range, the mixer cannot employ a bandpass or low-pass filter at its output to reject the harmonics [9]. The two integrators at the mixer LO inputs are critical to achieving a linear phase response at the CML–CMOS output ($D_{\text{out}}$). They create triangular I/Q LO waveforms of appropriate magnitude, as explained in detail below. The integrator schematic is illustrated in Fig. 9(c).

Fig. 11. Example without the integrators: (a) mixer I/Q (differential) currents, (b) mixer output ($V_{\text{mixer}}$), and (c) phase steps for $V_{\text{mixer\_fund}}$ and CML–CMOS output ($D_{\text{out}}$).

Fig. 12. Integrator output I/Q waveforms at (a) 0.6 GHz and (b) 3.6 GHz. Four integrator currents are tested.

Without the integrators, the LO inputs of the I/Q mixers, as well as the in-phase and quadrature currents flowing into the mixer load, are square waves. The in-phase and quadrature (differential) mixer currents are simulated with square-wave LO drives swinging between 0.4 and 0.8 V. The resulting waveforms are shown in Fig. 11(a) for PM<sub>code</sub> = 0, 16, 32, and 48. Both current waveforms resemble square waves with two distinct states and sharp transitions. The zero crossings of the weighted sum of the I/Q currents, and hence of $V_{\text{mixer}}$, therefore coincide with those of the dominant current waveform and do not necessarily change with PM<sub>code</sub>.

The corresponding $V_{\text{mixer}}$ waveforms for the four PM codes are shown in Fig. 11(b). The two $V_{\text{mixer}}$ waveforms for PM<sub>code</sub> = 0 and 16 have similar zero-crossing points. In contrast, $V_{\text{mixer}}$ for PM<sub>code</sub> = 32 has zero-crossing points with a very low slope ($dV_{\text{mixer}}/dt$). As a result, the PI performs poorly without the integrators: the phase steps are extremely fine at most PM codes (e.g., PM<sub>code</sub> = 16) and very coarse at PM<sub>code</sub> = 32.

The simulated phase steps of $V_{\text{mixer\_fund}}$ and $D_{\text{out}}$ versus PM<sub>code</sub> from 0 to 64 are shown in Fig. 11(c). The phase response of $V_{\text{mixer\_fund}}$ is close to the ideal response (5). The phase response of $D_{\text{out}}$, however, is significantly distorted and is not usable.
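The effect of the LO waveform shape on the converter's zero crossings can be illustrated with an idealized numerical sketch. It rests on three assumptions:
- a perfectly linear mixer whose output is the weighted sum $a\,\mathrm{LO}_I + b\,\mathrm{LO}_Q$, with the first-quadrant weights $a = 127 - 2x$ and $b = 1 + 2x$ from (5);
- ideal unit square or triangle LO waveforms;
- no mixer saturation, charge sharing, or output loading.

The function names are illustrative only.

```python
import numpy as np

N = 1 << 16                                # samples per LO period
theta = 2 * np.pi * np.arange(N) / N       # LO phase over one period (rad)
DEG_PER_SAMPLE = 360.0 / N

def tri(th):
    """Cosine-like unit triangle wave: +1 at 0, -1 at pi."""
    wrapped = np.mod(th + np.pi, 2 * np.pi) - np.pi   # wrap to [-pi, pi)
    return 1.0 - 2.0 * np.abs(wrapped) / np.pi

def square(th):
    """Cosine-like unit square wave with the same edges as tri()."""
    return np.where(tri(th) >= 0, 1.0, -1.0)

def falling_zero_crossing_deg(lo_wave, x):
    """First falling zero crossing (deg) of a*LO_I + b*LO_Q for quadrant code x.

    Idealized linear-mixer model (assumption): weights a = 127 - 2x and
    b = 1 + 2x as in (5); LO_Q lags LO_I by 90 degrees.
    """
    a, b = 127 - 2 * x, 1 + 2 * x
    s = a * lo_wave(theta) + b * lo_wave(theta - np.pi / 2)
    k = np.nonzero((s[:-1] > 0) & (s[1:] <= 0))[0][0]
    frac = s[k] / (s[k] - s[k + 1])          # linear interpolation between samples
    return (k + frac) * DEG_PER_SAMPLE

codes = range(64)                             # one quadrant
tri_zc = np.array([falling_zero_crossing_deg(tri, x) for x in codes])
sq_zc = np.array([falling_zero_crossing_deg(square, x) for x in codes])

print("triangle LO, first steps (deg):", np.round(np.diff(tri_zc)[:3], 4))
print("square LO, distinct crossings (deg):", sorted(set(np.round(sq_zc, 1).tolist())))
```

Under these assumptions, the square-wave drive leaves the zero crossing at only two positions: the edge of whichever LO is dominant. The triangle drive moves it by exactly 360°/256 ≈ 1.41° per code, because the weighted sum is piecewise linear and $|I| + |Q|$ is constant. The sketch does not model the noise penalty of small triangle amplitudes discussed below.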

The designed integrator sink current $I_{int}$ is digitally tunable over 13 current states (SEL from 1 to 13), from 60 to 780 $\mu$A in steps of 60 $\mu$A. The common-mode feedback (CMFB) is critical: it adjusts the pMOS current source in accordance with the sink current and maintains the assigned dc level of 0.6 V. The integrator in-phase and quadrature output waveforms at 0.6 and 3.6 GHz are shown in Fig. 12(a) and (b), respectively, for four integrator configurations: SEL = 1, 5, 9, and 13.

To achieve both good phase linearity and good phase noise, the integrator output magnitude must be held at a moderate level. Generally speaking, a low integrator current results in better phase linearity but worse phase noise. Conversely, the phase linearity is expected to degrade if the magnitude of the mixer input waveform is excessively high. The mixer differential pairs then operate mostly with full current steering, as if driven by square waves. The full PM, illustrated in Fig. 9(a), is simulated at 0.6 and 3.6 GHz. SEL = 5 ($I_{int} \approx 300\ \mu$A) is found to be appropriate for 0.6 GHz, and SEL = 13 ($I_{int} \approx 780\ \mu$A) for 3.6 GHz.

Fig. 13. (a) Phase step for the 0.6-GHz CML–CMOS output ($D_{out}$) versus PM<sub>code</sub>. (b) Simulated $D_{out}$ noise.

Fig. 14. 0.6-GHz mixer (a) output I/Q current composition and (b) output voltage ($V_{\text{mixer}}$) for six sample phase codes. Integrator SEL = 13.

For PM operation at 0.6 GHz, the simulated phase step of $D_{out}$ is shown in Fig. 13(a), and the simulated noise at a 10-MHz frequency offset is shown in Fig. 13(b). The simulations are conducted (in one of the four quadrants) with PM<sub>code</sub> swept from 0 to 64 and integrator configurations SEL = 1, 5, 9, and 13. The phase step of $V_{\text{mixer\_fund}}$ (simulated but not shown here) is close to the ideal (5), regardless of the integrator current. The results in Fig. 13 favor SEL = 5, which achieves relatively good noise and phase linearity. Using SEL = 13 results in a very nonlinear phase response, with phase steps as coarse as 4°/code at $\mathrm{PM}_{\mathrm{code}} \approx 30$, where the noise also degrades. To explain this behavior, the 0.6-GHz mixer I/Q currents are shown in Fig. 14(a) and $V_{\text{mixer}}$ is shown in Fig. 14(b). These simulations are conducted with SEL = 13 at six phase codes ($\mathrm{PM}_{\mathrm{code}} = 0, 12, 24, 36, 48, 60$). The current waveforms have many low-slope



Fig. 15. 0.6-GHz mixer (a) output I/Q current composition and (b) output voltage ( $V_{\text{mixer}}$ ) for six sample phase codes. Integrator SEL = 1.

regions (e.g., at 0.3 ns), corresponding to complete mixer current steering. The  $V_{\text{mixer}}$  slope at the zero-crossing points is lower when the I/Q currents are comparable (e.g.,  $PM_{code} = 24$ ). The zero-crossing point of  $V_{\text{mixer}}$ , with a low slope, is sensitive to the circuit noise and control variables and yields both high phase noise and a coarse phase step when processed by the CML–CMOS converter. The abrupt jumps in the mixer current are caused by charge sharing during the switch transitions.

It is also observed in Fig. 13(b) that using SEL = 1 at a PI frequency of 0.6 GHz results in the worst phase noise, around -125 dBc/Hz. The mixer currents and $V_{\text{mixer}}$ for SEL = 1 are shown in Fig. 15(a) and (b), respectively. With SEL = 1, the mixer LO drive has a peak-to-peak value of only 50 mV [see Fig. 12(a)], so the mixer differential pairs operate in the linear region, and the mixer I/Q currents follow the input drive and are also triangle waveforms, as shown in Fig. 15(a). Fig. 15(b) shows that the zero-crossing point of $V_{\text{mixer}}$ shifts nearly uniformly with the phase code, which implies a linear $D_{out}$ phase response; simulation confirms this good phase linearity. However, the $V_{\text{mixer}}$ peak-to-peak magnitude is only 0.1 V, and the slew rate (SR) at the zero-crossing point is only 0.12 V/ns. By comparison, Fig. 14(b) shows a 22-dB higher SR of 1.5 V/ns at (SEL = 13, $\mathrm{PM}_{\mathrm{code}} = 0$), which accounts for a 17-dB noise improvement (to -142 dBc/Hz).
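The slew-rate comparison above can be reproduced numerically by locating the zero crossing of a sampled waveform with linear interpolation and taking the local slope there. The sketch below is illustrative only: the helper name, the uniform sampling, and the ideal-sine test signal are assumptions, not the authors' simulation setup. The actual $V_{\text{mixer}}$ waveform is not sinusoidal, which is why the sine's slope (about 0.19 V/ns) differs from the 0.12 V/ns quoted above.

```python
import math

def zero_crossing_slew_rate(samples, dt):
    """Return (t_cross, slew_rate) for the first rising zero crossing.

    samples: uniformly spaced voltages (V); dt: sample spacing (s).
    The crossing time is linearly interpolated between the two samples
    that bracket zero; the slew rate is the slope of that segment (V/s).
    """
    for i in range(len(samples) - 1):
        v0, v1 = samples[i], samples[i + 1]
        if v0 < 0.0 <= v1:
            slope = (v1 - v0) / dt
            t_cross = i * dt - v0 / slope
            return t_cross, slope
    raise ValueError("no rising zero crossing found")

# Test signal (assumed): ideal 0.6-GHz sine with 0.1-V peak-to-peak swing.
# Its zero-crossing slope is A*omega = 0.05 * 2*pi*0.6e9 V/s (about 0.19 V/ns).
f, amp, dt = 0.6e9, 0.05, 1e-13
wave = [amp * math.sin(2 * math.pi * f * n * dt - 1.0) for n in range(20000)]
t0, sr = zero_crossing_slew_rate(wave, dt)
print(f"crossing at {t0 * 1e9:.4f} ns, SR = {sr * 1e-9:.3f} V/ns")

# SR ratio quoted in the text (1.5 V/ns versus 0.12 V/ns), in dB.
print(f"SR improvement: {20 * math.log10(1.5 / 0.12):.1f} dB")
```

The 20·log10(1.5/0.12) term evaluates to about 21.9 dB, consistent with the 22-dB figure in the text.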

The same simulations are conducted at a PI frequency of 3.6 GHz. The simulated $D_{out}$ phase step and noise are shown in Fig. 16(a) and (b), respectively. At such a high frequency, the maximum achievable peak-to-peak magnitude at the integrator output is only 90 mV [see Fig. 12(b)]; therefore, the mixer differential pairs always operate in the linear region, with the IQ mixer currents following the input signals. Fig. 16(b) shows that using SEL = 1 results in high $D_{out}$ noise. In this case, both the mixer output and input magnitudes are below 20 mV, and the integrator output voltage, shown in Fig. 12(b), is dominated by the voltage coupled from the integrator input and does not resemble a triangle waveform. Therefore, the highest integrator



Fig. 16. (a) Phase step for the 3.6-GHz CML–CMOS output $(D_{out})$ versus PM<sub>code</sub>. (b) Simulated $D_{out}$ noise.



Fig. 17. 3.6-GHz mixer (a) output I/Q current composition and (b) output voltage ( $V_{mixer}$ ) for six phase codes. Integrator SEL = 13.

current with SEL = 13 should be used. The corresponding mixer I/Q currents and $V_{\text{mixer}}$ are shown in Fig. 17(a) and (b), respectively. An SR of around 1.5 V/ns is obtained at the zero-crossing point. Although this SR is almost the same as that achieved with $\omega_{\text{LO}} = 0.6$ GHz, SEL = 13, and PM<sub>code</sub> = 0, the simulated noise degrades by 7 dB to -135 dBc/Hz because noise is injected more frequently. Generally speaking, the $D_{\text{out}}$ noise increases with the operating frequency $\omega_{\text{LO}}$ and decreases as the zero-crossing SR increases.
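A common first-order way to reason about these trends (an assumption here, not the model used in this work) treats the CML–CMOS converter as an ideal zero-crossing detector: an rms voltage noise $\sigma_v$ at the crossing shifts the crossing time by $\sigma_t = \sigma_v/\mathrm{SR}$, which corresponds to a phase error $\sigma_\phi = 2\pi f\,\sigma_t$. The sketch evaluates that relation with a hypothetical noise value; only the ratios are meaningful. Under this model, raising the SR from 0.12 to 1.5 V/ns lowers the phase-noise power by about 22 dB, and moving from 0.6 to 3.6 GHz at a fixed SR raises it by about 15.6 dB. The simulations above show 17 and 7 dB, so the model indicates the direction of the trends rather than their magnitude.

```python
import math

def phase_jitter_rad(sigma_v, slew_rate, freq):
    """RMS phase error (rad) of an ideal zero-crossing detector.

    sigma_v:   rms voltage noise at the crossing (V)
    slew_rate: waveform slope at the crossing (V/s)
    freq:      carrier frequency (Hz)
    """
    sigma_t = sigma_v / slew_rate         # timing jitter (s)
    return 2 * math.pi * freq * sigma_t   # phase jitter (rad)

SIGMA_V = 1e-3  # hypothetical 1-mV rms noise; only ratios are meaningful here
cases = [("0.6 GHz, SR = 0.12 V/ns", 0.12e9, 0.6e9),
         ("0.6 GHz, SR = 1.5 V/ns", 1.5e9, 0.6e9),
         ("3.6 GHz, SR = 1.5 V/ns", 1.5e9, 3.6e9)]
reference = phase_jitter_rad(SIGMA_V, cases[0][1], cases[0][2])
for label, sr, f in cases:
    phi = phase_jitter_rad(SIGMA_V, sr, f)
    print(f"{label}: {20 * math.log10(phi / reference):+.1f} dB relative to the first case")
```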

In summary, a high SR of the mixer output ($V_{\text{mixer}}$) at the zero-crossing point contributes to good phase noise and linearity. To achieve this, the integrator output should be a triangle waveform with a proper magnitude (e.g., $V_{\text{pp}} = 200$ mV): large enough to provide a high SR, yet small enough that the mixer differential pairs remain in the linear region as much as possible.

# V. INTERPOSER DESIGNS

Three transformers are designed on five-layer HDI PCBs. The three interposers target three frequency bands, LB, MB, and HB, centered at 0.9, 1.5, and 2.5 GHz, respectively. The substrate material is TU768, with a dielectric constant of 4.3 and a loss tangent of 0.019. The minimum metal width and spacing are 50 $\mu$m, and the copper thickness is 17 $\mu$m. While an edge-coupled transformer with the minimum spacing cannot achieve a sufficient coupling factor, the thin dielectric



Fig. 18. 3-D illustrations for (a) LB, (b) MB, and (c) HB transformers on HDI interposers.



Fig. 19. Transformer on PCB interposer (a) lumped model and (b) extracted parameters.

thickness of 60 $\mu$m supports the use of broadside-coupled transformers. As discussed in Section III-B, the inverse Class-D power core prefers a load impedance of $8+7j\ \Omega$ for good power and efficiency, so the transformers are designed with a 1:2 turns ratio, which ideally presents a load impedance of $50/2^2 = 12.5\ \Omega$ to the PA core. The one-turn primary winding, connected to the CMOS chip, uses the second PCB metal layer, and the secondary winding, connected to the PCB motherboard and the 50-$\Omega$ load, uses the first and third metal layers.

3-D illustrations of the LB, MB, and HB transformers are shown in Fig. 18. The passive structures are simulated with a full-wave electromagnetic simulator (ADS Momentum). The widely used simplified lumped model [28] is shown in Fig. 19(a), and the extracted parameters for the three transformers and some variations are listed in Fig. 19(b). The parameters are extracted from the simulated Y-parameters and include the primary and secondary winding inductances ($L_p$ and $L_s$) and quality factors ($Q_p$ and $Q_s$), the magnetic coupling factor ($k \equiv M/\sqrt{L_s L_p}$), and the coupling capacitor ($C_c$). The maximum power gain $G_{p,max}$ is also calculated. Although $G_{p,max}$ is extensively used in microwave circuit design, this metric assumes a freely chosen (optimal) load impedance and therefore does not properly characterize the transformer in this work, where only an SMD capacitor ($C_s$) is placed in parallel with the 50-$\Omega$ load. Instead, a figure of merit $G_{TXF,max}$ is used here, defined as the maximum power gain $G_p$ under the actual load condition. The simulated $G_{TXF,max}$ is approximately -0.5 dB for all three transformers, indicating very low loss, and is close to $G_{p,max}$. The simulation shows that the quality factors of the transformer inductors can exceed 40.
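For reference, the maximum power gain of a two-port follows from its Y-parameters through the Rollett stability factor $K$; for a reciprocal passive network ($y_{12} = y_{21}$), $G_{p,max} = K - \sqrt{K^2-1}$. The sketch below builds the Y-parameters of a simplified coupled-inductor model that keeps only series winding loss (no $C_c$, no shunt capacitance) and evaluates that expression. The LB inductances and coupling factor come from the text; the winding Q of 40 and the 1.1-GHz evaluation frequency are assumptions, and the model is not the extracted lumped model of Fig. 19. For this series-loss model the result should equal the closed-form maximum link efficiency $k^2Q_pQ_s/(1+\sqrt{1+k^2Q_pQ_s})^2$, which the last lines compute as a cross-check.

```python
import math

def transformer_y_params(l_p, l_s, k, q_p, q_s, freq):
    """Y-parameters of two coupled inductors with series winding loss only.

    Winding resistance is modeled as omega*L/Q; C_c and shunt capacitances
    are ignored in this simplified sketch.
    """
    w = 2 * math.pi * freq
    m = k * math.sqrt(l_p * l_s)
    z11 = w * l_p / q_p + 1j * w * l_p
    z22 = w * l_s / q_s + 1j * w * l_s
    z12 = 1j * w * m
    det = z11 * z22 - z12 * z12
    return z22 / det, -z12 / det, -z12 / det, z11 / det   # y11, y12, y21, y22

def max_power_gain(y11, y12, y21, y22):
    """G_p,max = |y21/y12| * (K - sqrt(K^2 - 1)), valid for K >= 1."""
    prod = y12 * y21
    k_rollett = (2 * y11.real * y22.real - prod.real) / abs(prod)
    return abs(y21 / y12) * (k_rollett - math.sqrt(k_rollett**2 - 1))

L_P, L_S, K, Q = 4e-9, 13.1e-9, 0.79, 40.0   # LB values; Q = 40 is assumed
g_max = max_power_gain(*transformer_y_params(L_P, L_S, K, Q, Q, 1.1e9))
a = K**2 * Q * Q
g_closed = a / (1 + math.sqrt(1 + a)) ** 2   # closed-form check for this model
print(f"G_p,max = {10 * math.log10(g_max):.2f} dB "
      f"(closed form: {10 * math.log10(g_closed):.2f} dB)")
```

With these assumed values, both expressions give roughly -0.27 dB; the extra loss in the paper's -0.5-dB $G_{TXF,max}$ reflects effects this toy model omits, such as $C_c$ and the fixed load condition.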

Parasitic extraction reveals a 2-pF parasitic capacitance on each PA output node and a coupling capacitance of 0.8 pF between them. The total differential capacitance, denoted by $C_p$, is 1.8 pF. The compact flip-chip connection prevents the use of an external parallel SMD capacitor on the transformer primary side, so $C_p$ remains constant across the three designs. Since a single CMOS die is used with all three packages, the transformer footprint must shrink in the MB and HB packages to reduce the inductance and accommodate higher operating frequencies. The LB, MB, and HB transformers are designed with primary inductances of 4, 2, and 1 nH, respectively, and achieve coupling factors of 0.79, 0.72, and 0.60, respectively. In general, a good coupling factor is more difficult to achieve when the transformer has a smaller footprint, and besides the metal resistance, a low coupling factor can be the main loss contributor. For a given primary inductance, multiple test structures are simulated to find the highest coupling factor and, therefore, better bandwidth and insertion loss. Increasing the transformer size enhances the coupling factor, but it also increases the inductance and lowers the operating frequency. To keep the inductance unchanged, a wider trace must then be used. This approach can increase the coupling factor $k$ to some extent, at the cost of a higher parasitic capacitance between the two windings.
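The capacitance and impedance bookkeeping in this paragraph, together with the 1:2 turns-ratio choice above, can be checked with a few lines. The sketch assumes that the two node-to-ground capacitances appear in series across the differential port (2 pF in series with 2 pF gives 1 pF) and that the 0.8-pF node-to-node capacitance adds in parallel. It also evaluates the primary-side resonance $\omega_p = 1/\sqrt{L_pC_p(1-k^2)}$ used later in the analysis for the three quoted primary inductances and coupling factors; only the LB value (about 3.1 GHz) is stated in the text, so the MB and HB values are estimates from this formula alone.

```python
import math

c_single_ended = 2.0e-12   # parasitic capacitance on each PA output node (F)
c_coupling = 0.8e-12       # coupling capacitance between the two nodes (F)
# Node-to-ground capacitances are in series across the differential port;
# the node-to-node coupling capacitance adds in parallel.
c_p = c_single_ended / 2 + c_coupling

# Ideal 1:n transformer reflects the secondary load by 1/n^2.
n, r_ant = 2, 50.0
r_reflected = r_ant / n**2

def primary_resonance_hz(l_p, c_p, k):
    """f_p = 1 / (2*pi*sqrt(L_p * C_p * (1 - k^2))), as used for X_p2."""
    return 1 / (2 * math.pi * math.sqrt(l_p * c_p * (1 - k**2)))

print(f"C_p = {c_p * 1e12:.1f} pF, reflected load = {r_reflected:.1f} ohm")
for name, l_p, k in [("LB", 4e-9, 0.79), ("MB", 2e-9, 0.72), ("HB", 1e-9, 0.60)]:
    print(f"{name}: f_p = {primary_resonance_hz(l_p, c_p, k) / 1e9:.2f} GHz")
```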

In our design, which targets a low transformer loss, the two *LC* tanks at the transformer primary and secondary windings use different resonance frequencies ($L_p C_p \neq L_s C_s$). The following analysis relates the transformer input impedance, or equivalently the PA load impedance ($Z_{in}$), to the transformer parameters. (Recall that an inductive $Z_{in}$ of $8+7j\ \Omega$ is desired.) For ease of analysis, the transformer

coupling capacitor is neglected, as in many previous works [28]. The low winding resistances are also ignored. With the model illustrated in Fig. 19(a), $Z_{in}$ is equal to a resistive load $R_p$ in parallel with two reactive loads $jX_{p1}$ and $jX_{p2}$ (i.e., $Z_{in} = R_p // jX_{p1} // jX_{p2}$), where $R_p$ and $jX_{p1,2}$ can be derived by

$$R_{\rm p}(\omega) = \frac{L_p R_{\rm ant}}{k^2 L_s} (P^2 + Q^2) \tag{6}$$

$$jX_{\rm p1}(\omega) = \frac{-R_p(\omega)}{R_{\rm ant}} \times \left[\frac{1}{j\omega C_s}//j\omega L_s(1-k^2)\right] \tag{7}$$

$$jX_{p2}(\omega) = \left(j\omega C_p + \frac{1}{j\omega L_p(1-k^2)}\right)^{-1} \tag{8}$$

where  $P \equiv 1 - \omega^2 L_s C_s (1 - k^2)$  and  $Q \equiv \omega L_s (1 - k^2)/R_{ant}$ .  $R_{ant}$  is the 50- $\Omega$  antenna load. In conventional designs with  $L_p C_p = L_s C_s$ , the resonance frequencies with a purely real input impedance can be obtained by solving  $X_{p1}(\omega) + X_{p2}(\omega) = 0$ . The three solution frequencies are expressed by

$$\omega_{1,2} = \frac{1}{\sqrt{L_s C_s}} \sqrt{\frac{1}{1-k^2} - \frac{\eta}{2} \pm \sqrt{\frac{\eta^2}{4} + \frac{k^2}{(1-k^2)^2} - \frac{\eta}{1-k^2}}} \tag{9}$$

and $\omega_3 = 1/\sqrt{L_s C_s (1-k^2)}$, where $\eta \equiv L_s/(C_s R_{ant}^2)$. If $\eta$ is sufficiently low, $\omega_1$ can be approximated by $1/\sqrt{L_s C_s (1+k)}$ and $\omega_2$ by $1/\sqrt{L_s C_s (1-k)}$, with $\omega_1 < \omega_3 < \omega_2$. As $\eta$ increases, $\omega_1$ increases while $\omega_2$ decreases. The two frequencies eventually become equal (below $\omega_3$) and then turn into a complex-conjugate pair. This well-known behavior does not apply here because $L_p C_p \neq L_s C_s$.
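Equation (9) can be evaluated numerically, including the regime where the inner discriminant turns negative and $\omega_{1,2}$ become complex. The sketch below implements (9) and $\omega_3$ as written, normalized to $1/\sqrt{L_sC_s}$, and uses complex square roots so that the merging point is visible. The function name and the swept $\eta$ values are illustrative choices. Setting the inner discriminant to zero gives the smallest merging value $\eta^* = 2(1-\sqrt{1-k^2})/(1-k^2)$, at which $\omega_1 = \omega_2 < \omega_3$.

```python
import cmath
import math

def normalized_resonances(k, eta):
    """Evaluate (9) and omega_3, normalized to 1/sqrt(L_s*C_s).

    k: magnetic coupling factor; eta = L_s / (C_s * R_ant**2).
    omega_1 and omega_2 are returned as complex numbers: they are real
    while the inner discriminant is positive and form a complex-conjugate
    pair once it turns negative.
    """
    a = 1 - k**2
    disc = eta**2 / 4 + k**2 / a**2 - eta / a
    root = cmath.sqrt(disc)
    w1 = cmath.sqrt(1 / a - eta / 2 - root)
    w2 = cmath.sqrt(1 / a - eta / 2 + root)
    w3 = 1 / math.sqrt(a)
    return w1, w2, w3

k = 0.79
a = 1 - k**2
eta_merge = 2 * (1 - math.sqrt(a)) / a   # smallest eta with a zero discriminant
for eta in (0.01, 1.0, eta_merge, 3.0):
    w1, w2, w3 = normalized_resonances(k, eta)
    print(f"eta = {eta:.2f}: w1 = {w1:.3f}, w2 = {w2:.3f}, w3 = {w3:.3f}")
```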

To estimate the transformer input impedance $Z_{in}$ graphically, $R_p$, $X_{p1}$, $X_{p2}$, and $X_{p1}//X_{p2}$ from (6)–(8) are shown versus the operating frequency in Fig. 20, using the LB design parameters $k = 0.79$, $L_p = 4$ nH, $C_p = 1.8$ pF, $L_s = 13.1$ nH, $C_s = 2.3$ pF, and $R_{ant} = 50\ \Omega$. Several critical points are annotated parametrically. First, $R_p$ at low frequency, denoted by $R_{p0}$, equals $R_{\rm ant}L_p/(k^2L_s)$. As the frequency increases, $R_p$ decreases and reaches its minimum value $R_{p,min} = R_{p0}(x - x^2/4)$ at $\omega_x = \omega_0 \sqrt{1 - 0.5x}$, where $\omega_0 = 1/\sqrt{L_sC_s(1-k^2)}$ and $x \equiv \eta(1-k^2)$. With the above parameters, $R_{\rm p0} = 24.5\ \Omega$, $x = 0.86$, $\omega_0 = 9.4\times10^9$ rad/s (1.5 GHz), $\omega_x = 7.1\times10^9$ rad/s (1.1 GHz), and $R_{p,min} = 16.5\ \Omega$. As the operating frequency increases beyond $\omega_x$, $R_p$ increases rapidly and can be approximated by $R_p \approx R_{p0}\omega^4 L_s^2 C_s^2 (1-k^2)^2$. Second, $X_{p1}(\omega)$ equals the input reactance of a capacitance $C_s$ in parallel with an inductance $L_s(1-k^2)$, scaled by the frequency-dependent factor $-R_p(\omega)/R_{ant}$. $X_{p1}(\omega)$ is negative for $\omega < \omega_0$, positive for $\omega > \omega_0$, and eventually approaches the cubic approximation $R_{p0}L_s^2C_s(1-k^2)^2\omega^3/R_{ant}$. Finally, $X_{p2}(\omega)$ equals the input reactance of a capacitor $C_p$ in parallel with an inductor $L_p(1-k^2)$; its resonance frequency is $\omega_p = 1/\sqrt{L_p C_p (1-k^2)}$ (3.1 GHz with these parameters).

The input impedance is the parallel combination of the three components, $R_p//jX_{p1}//jX_{p2}$. At $\omega_x$, where $R_p$ reaches $R_{p,min}$,



Fig. 20.  $R_p$ ,  $X_{p1}$ ,  $X_{p2}$ , and  $X_{p1}//X_{p2}$  as functions of the operating frequency. The input impedance  $Z_{in} = R_p//jX_{p1}//jX_{p2}$ .



Fig. 21. Simulated input impedances for LB, MB, HB<sub>A</sub> ( $C_s = 1.5$  pF), and HB<sub>B</sub> ( $C_s = 0.5$  pF) versus the operating frequency.

$(X_{p1}//X_{p2}) \approx \omega_x L_p(1-k^2/2)$. At $\omega_0$, where $X_{p1}$ is infinite, $X_{p1}//X_{p2} = X_{p2} \approx \omega_0 L_p(1-k^2)$ and $R_p = xR_{p0}$. Hence, the load impedances at $\omega_x$ and $\omega_0$ are $[R_{p,\min}//j\omega_x L_p(1-k^2/2)]$ and $[xR_{p0}//j\omega_0 L_p(1-k^2)]$, respectively. Plugging in the LB design parameters, $Z_{in}$ is calculated to be $10 + 8j\ \Omega$ at $\omega_x$ and $7 + 10j\ \Omega$ at $\omega_0$. At both frequencies, $Z_{in}$ is close to the preferred value.
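The hand estimates above can be cross-checked by evaluating (6)–(8) directly with the LB parameters. The sketch works with admittances, so the pole of $X_{p1}$ at $\omega_0$ causes no division by zero; the function and variable names are illustrative. Because it does not apply the approximations used for $X_{p1}//X_{p2}$ (in particular, it keeps $C_p$), it gives somewhat different values, roughly $10.9+7.8j\ \Omega$ at $\omega_x$ and $9.2+10.4j\ \Omega$ at $\omega_0$. These are still inductive and in the neighborhood of the targeted $8+7j\ \Omega$.

```python
import math

# LB design parameters quoted in the text: k, L_p, C_p, L_s, C_s, R_ant.
K, LP, CP, LS, CS, RANT = 0.79, 4e-9, 1.8e-12, 13.1e-9, 2.3e-12, 50.0

def z_in(w):
    """Transformer input impedance R_p // jX_p1 // jX_p2 from (6)-(8)."""
    a = 1 - K**2
    p = 1 - w**2 * LS * CS * a                    # P in (6)
    q = w * LS * a / RANT                         # Q in (6)
    r_p = LP * RANT * (p**2 + q**2) / (K**2 * LS)     # (6)
    y_par = 1j * w * CS + 1 / (1j * w * LS * a)       # C_s || L_s(1-k^2)
    y_p1 = -(RANT / r_p) * y_par                      # 1/(jX_p1), from (7)
    y_p2 = 1j * w * CP + 1 / (1j * w * LP * a)        # 1/(jX_p2), from (8)
    return 1 / (1 / r_p + y_p1 + y_p2)

a = 1 - K**2
r_p0 = RANT * LP / (K**2 * LS)
x = LS * a / (CS * RANT**2)            # x = eta * (1 - k^2)
w0 = 1 / math.sqrt(LS * CS * a)
wx = w0 * math.sqrt(1 - 0.5 * x)
r_p_min = r_p0 * (x - x**2 / 4)

print(f"R_p0 = {r_p0:.1f} ohm, x = {x:.2f}, R_p,min = {r_p_min:.1f} ohm")
for name, w in [("w_x", wx), ("w_0", w0)]:
    z = z_in(w)
    print(f"{name} = {w:.3g} rad/s ({w / (2 * math.pi) / 1e9:.2f} GHz): "
          f"Z_in = {z.real:.1f} {z.imag:+.1f}j ohm")
```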

The simulated input impedances of the three packages, employing the simulated S-parameters and with the operating frequency swept, are shown in Fig. 21. Some operating frequencies are annotated (in GHz). The input impedance is designed to be around  $8+7j \Omega$  at the operating frequencies of each of the three packages. The center frequencies of the LB, MB, and HB packages are 1.1, 1.8, and 2.5 GHz, respectively. According to the input impedance response shown in Fig. 21,



Fig. 22. Simulated (a) peak power and (b) DE for the designed transmitter incorporating the on-interposer passives.



Fig. 23. (a) Chip photograph. (b) Front and back views of the LB package. (c) Photographs of the LB, MB, and HB interposers after die attachment.

the LB, MB, and HB packages are expected to have good power and efficiency from 0.4 to 1.7 GHz, 0.9 to 2.4 GHz, and 1.6 to 3.2 GHz, respectively.

To improve the performance at higher frequencies (e.g., 3.5 GHz), a smaller $C_s$ of 0.5 pF can be used for the HB package (HB<sub>B</sub>). With $C_s = 0.5$ pF, $L_p C_p \approx L_s C_s$, so (9) applies; since the smaller $C_s$ raises $\eta$, (9) predicts only a single resonance frequency with $\text{Imag}(Z_{\text{in}}) \approx 0$. The transformer loss at 2.5 GHz is also expected to be higher with $C_s = 0.5$ pF. Comparing the two HB curves in Fig. 21, the filled-square curve, associated with $C_s = 1.5$ pF (HB<sub>A</sub>), maintains an input impedance of about $10 + 8j\ \Omega$ over a wider frequency range (from 1.6 to 3.2 GHz) than the empty-diamond curve, associated with $C_s = 0.5$ pF (HB<sub>B</sub>). However, as the frequency increases further, the reactance of the HB<sub>A</sub> curve rises faster than that of the HB<sub>B</sub> curve, so the output power and efficiency should be higher for the HB<sub>B</sub> package at higher frequencies (e.g., 3.5 GHz).

Fig. 22(a) shows the simulated peak power of the LB, MB, HB<sub>A</sub> ($C_s = 1.5$ pF), and HB<sub>B</sub> ($C_s = 0.5$ pF) packages, together with results for some variations of the passive designs listed in Fig. 19(b). These results indicate that higher power is obtained with a higher coupling factor $k$. Fig. 22(b) shows the simulated DE of the same designs. The simulation exhibits peak power/DE of 30.4 dBm/68%, 29.8 dBm/70%, and 29.7 dBm/68% for the LB, MB, and HB packages at 1.1, 1.6, and 2.3 GHz, respectively. The second HB solution (HB<sub>B</sub>), with $C_s = 0.5$ pF, improves both power and efficiency at frequencies above 3.4 GHz. In simulation, the LB, MB, and HB<sub>B</sub> packages together cover a wide bandwidth from 0.4 to 4 GHz with output power above 26 dBm and DE above 26%.

# VI. MEASUREMENT RESULTS

The chip photograph and the front/back views of the LB package are shown in Fig. 23(a) and (b), respectively. The chip is fabricated in the TSMC 65-nm CMOS LOGIC process without MIM capacitors, and the top metal thickness is only 0.9 $\mu$m. The active area on chip is about 1.7 mm<sup>2</sup>, and the chip area limits the number of available flip-chip bumps to fewer than 30. This limitation necessitates streaming the 8-b amplitude and phase codes into the chip via two serial 2.5-Gb/s



Fig. 24. Measured peak power and DE/SE for the LB, MB, HB<sub>A</sub>, and HB<sub>B</sub> packages.

LVDS inputs. An on-chip scan chain controls the static setup of the transmitter, including the bias currents for the integrator, phase interpolator, and LO/CLK receivers. As shown in Fig. 23(b), the interposer is about $7 \times 14\ \text{mm}^2$, limited by the $7 \times 14$ BGA on its bottom, which interfaces with the PCB motherboard through a spring-pin socket (Ironwood CBT-BGA 6001). Fig. 23(c) shows photographs of the three interposers after die attachment. The interposers have considerable unused area on the top and inner layers to accommodate the transformer designs. This supports the proposed strategy of moving the passive structures off chip for better power and efficiency and a more generic CMOS transmitter design.

The measured peak power, DE, and SE of the LB, MB, HB<sub>A</sub>, and HB<sub>B</sub> packages with a supply voltage of 2.5 V are shown in Fig. 24. The HB<sub>A</sub> package uses a $C_s$ of 1.5 pF, and the HB<sub>B</sub> package uses a $C_s$ of 0.5 pF. The SE includes the power consumption of all components in the all-digital Tx. The output power is measured with a spectrum analyzer calibrated against a power meter. In this measurement, both the AM and PM codes are set to the maximum value of 255 ($\mathrm{AM}_{\mathrm{code}} = \mathrm{PM}_{\mathrm{code}} = 255$). The two LVDS signals are generated by an FPGA board (Xilinx VC707) connected to the PCB motherboard and are fed into the chip via the socket and the interposer. The LO and the two clock signals are sinusoidal and are generated by three signal generators sharing the same 10-MHz reference. The LB package achieves its best power/DE/SE of 29.2 dBm/60%/56% at 1.1 GHz. The corresponding peak performances of the MB and HB<sub>A</sub> packages are 28.8 dBm/56%/52% and 28.4 dBm/58%/52% at 1.5 and 2.4 GHz, respectively. The HB<sub>B</sub> package outputs 25.5 dBm at 3.5 GHz with a good DE of 40% and SE of 35%. Together, the three packages (LB/MB/HB<sub>B</sub>) cover 0.7 to 3.5 GHz with peak power above 25.5 dBm and DE above 40%.
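As a consistency check, the quoted efficiencies imply the DC power drawn by the power core and by the whole transmitter. The sketch converts the LB peak numbers (29.2 dBm, DE 60%, SE 56%) using the usual definitions DE = P<sub>out</sub>/P<sub>DC,PA</sub> and SE = P<sub>out</sub>/P<sub>DC,total</sub>. The supply-current line assumes, for illustration, that all power-core DC power is drawn from the 2.5-V supply. Because the percentages are rounded, the split between the two DC figures is only approximate.

```python
def dbm_to_watt(p_dbm):
    """Convert dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000

p_out = dbm_to_watt(29.2)      # LB peak output power at 1.1 GHz
de, se, vdd = 0.60, 0.56, 2.5  # quoted DE, SE, and supply voltage

p_dc_pa = p_out / de           # DE = P_out / P_DC of the power core
p_dc_total = p_out / se        # SE = P_out / P_DC of the whole Tx

print(f"P_out      = {p_out * 1e3:.0f} mW")
print(f"P_DC,PA    ~ {p_dc_pa * 1e3:.0f} mW ({p_dc_pa / vdd * 1e3:.0f} mA from {vdd} V)")
print(f"P_DC,total ~ {p_dc_total * 1e3:.0f} mW; other blocks ~ "
      f"{(p_dc_total - p_dc_pa) * 1e3:.0f} mW")
```

Under these assumptions, the roughly 0.83-W output implies about 1.39 W into the power core and on the order of 0.1 W for the remaining blocks.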

Fig. 25 shows the normalized output magnitude and phase of the MB package at 2.4 GHz. The AM code is swept



Fig. 25. Normalized Tx output magnitude and phase versus the AM code.



Fig. 26. Phase responses at (a) 0.6 GHz with SEL = 5, (b) 0.6 GHz with SEL = 13, and (c) 3.6 GHz with SEL = 13.

quasi-statically from 0 to 255, while the PM code is held at $\mathrm{PM}_{\mathrm{code}} = 0$. In this measurement, the phase LVDS stream is held at dc, while the AM stream carries a bit sequence with a 4-ns repetition period. The compressive AM–AM response results from the inverse Class-D AM scheme, in which the switch conductance is proportional to the AM code. The AM–AM response closely resembles the illustrative curve in Fig. 5(c) with $R_L = 10$. For a desired complex transmitter output, the measured AM–AM table is consulted to select the AM code that best approximates the required output magnitude, and the AM–PM distortion associated with the selected AM code is then compensated by the PM.

Fig. 26 shows the measured phase response of the PM. In this measurement, the PM code is swept quasi-statically



Fig. 27. Measured 64-QAM constellation and Tx performance at the six testing frequencies 0.6, 1.2, 1.8, 2.4, 3, and 3.6 GHz.

from 0 to 255, while the AM code is set at 255. Fig. 26(a) and (b) show the output phase and phase step at $\omega_{\rm LO} = 0.6$ GHz with the integrator current set at 300 $\mu$A (SEL = 5) and 780 $\mu$A (SEL = 13), respectively. The phase responses cover the full 360°, and the phase steps exhibit a pattern that repeats every 90° (or 64 PM codes). Larger steps are found around PM codes of 32, 96, 160, and 224, and smaller steps around PM codes of 0, 64, 128, and 192. These nonlinear behaviors are expected with the designed PM and were explained in Section IV. Comparing Fig. 26(a) and (b), the phase response at SEL = 5, with a maximum phase step of about 2.5°/step, is more linear than that at SEL = 13; this dependence on the integrator setting was also explained in Section IV. At 3.6 GHz, a higher integrator current should be used; the measured phase response with SEL = 13 is shown in Fig. 26(c), and its maximum phase step is also about 2.5°/step. Fig. 26 demonstrates the wideband capability of the phase interpolator, which operates from 0.6 to 3.6 GHz. For a desired complex output, the PM–PM table is consulted to correct the phase distortion introduced by the AM and to approximate the required output phase. The PM–AM effect is very minor and can be neglected.
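The table-lookup procedure described above can be sketched as three steps: pick the AM code from the AM–AM table, subtract the AM–PM shift of that code from the target phase, then pick the PM code from the PM–PM table. The table shapes here are synthetic placeholders (a compressive AM–AM curve, a small AM–PM term, and a PM–PM curve with a 90°-periodic ripple), not the measured tables, and all function and variable names are hypothetical. In practice, each operating frequency would use its own measured tables.

```python
import bisect
import cmath
import math

N_CODES = 256

# Synthetic stand-ins for the measured tables (hypothetical shapes only).
am_am = [n / (n + 64) * (255 + 64) / 255 for n in range(N_CODES)]  # |out|, 0..1
am_pm = [math.radians(5.0) * (1 - m) for m in am_am]               # rad
pm_pm = [2 * math.pi * n / N_CODES
         + math.radians(1.0) * math.sin(8 * math.pi * n / N_CODES)
         for n in range(N_CODES)]                                   # rad

def nearest_am_code(magnitude):
    """AM code whose AM-AM entry is closest to the requested magnitude."""
    i = bisect.bisect_left(am_am, magnitude)
    candidates = [c for c in (i - 1, i) if 0 <= c < N_CODES]
    return min(candidates, key=lambda c: abs(am_am[c] - magnitude))

def nearest_pm_code(phase):
    """PM code whose PM-PM entry is closest to phase, with 2*pi wraparound."""
    return min(range(N_CODES),
               key=lambda c: abs(cmath.phase(cmath.exp(1j * (pm_pm[c] - phase)))))

def polar_codes(target):
    """Map a complex target with |target| <= 1 to (AM code, PM code)."""
    am_code = nearest_am_code(abs(target))
    # Pre-subtract the AM-PM shift that this AM code will add to the output.
    pm_code = nearest_pm_code(cmath.phase(target) - am_pm[am_code])
    return am_code, pm_code

def realized(am_code, pm_code):
    """Output predicted by the (synthetic) tables for a pair of codes."""
    return am_am[am_code] * cmath.exp(1j * (pm_pm[pm_code] + am_pm[am_code]))

target = 0.7 * cmath.exp(1j * 1.0)
am, pm = polar_codes(target)
print(am, pm, abs(realized(am, pm) - target))
```

A nearest-entry search is the simplest choice here; with 256 codes per table, a brute-force search over the PM table is cheap and handles the 360° wraparound without special cases.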

After the amplitude and phase modulators were characterized, modulation tests, including 64-QAM and 20-MHz WLAN and LTE signals, were performed at six RF frequencies with the LB, MB, and HB<sub>B</sub> packages. The LB, MB, and HB<sub>B</sub> packages cover 0.6/1.2 GHz, 1.8/2.4 GHz, and 3/3.6 GHz, respectively. Note that the transmitter AM–AM, AM–PM, and PM–PM responses, together with the dc settings, depend on the operating frequency; therefore, each operating frequency has its own dedicated AM–AM, AM–PM, PM–PM, and dc setup tables.

The measured 64-QAM constellations (via a Keysight spectrum analyzer and VSA software) at the six frequencies are



Fig. 28. Simulated EVM and noise for an ideal and quantized (by the all-digital Tx) WLAN signal versus the signal PAPR.

shown in Fig. 27. The modulation symbol rate is 62.5 MS/s, corresponding to a data rate of 375 Mb/s. The measured average power, DE and SE, and constellation error vector magnitude (EVM) are annotated in Fig. 27. Good power, efficiency, and linearity in terms of EVM are achieved at the six frequencies. For example, the MB package generates an output power of 23.3 dBm at 2.4 GHz with a DE of 34% and an EVM of -33 dB. The modulation output power is lower than the peak power due to a signal PAPR of 3.7 dB.
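The 3.7-dB PAPR and 375-Mb/s figures follow directly from the square 64-QAM constellation, which carries 6 b/symbol. The short calculation below assumes equiprobable symbols and ignores pulse shaping, which, if used, typically raises the PAPR of the transmitted waveform.

```python
import math

levels = [-7, -5, -3, -1, 1, 3, 5, 7]                  # per-axis 64-QAM levels
powers = [i * i + q * q for i in levels for q in levels]
papr_db = 10 * math.log10(max(powers) / (sum(powers) / len(powers)))
bits_per_symbol = int(math.log2(len(powers)))
data_rate = 62.5e6 * bits_per_symbol                    # symbol rate from the text

print(f"64-QAM PAPR = {papr_db:.2f} dB, data rate = {data_rate / 1e6:.0f} Mb/s")
```

The corner points have power 98 and the mean power is 42, giving 10·log10(98/42) ≈ 3.68 dB.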

To demonstrate wideband, multistandard Tx capability, the three packages are programmed to output a 20-MHz WLAN (802.11g) signal at the six test frequencies. The OFDM-based signal has a high PAPR of 8.7 dB, which, without further compression, results in low transmitter power and efficiency. Recall that in inverse Class-D designs, where the output magnitude is controlled by the switch conductance, the DE is proportional to the output voltage. Thus, the transmitter retains only 37% of its peak efficiency at an 8.7-dB power backoff ($10^{-8.7/20} \approx 0.37$). To improve both output power and efficiency, the ideal WLAN signal is scaled such that some of its magnitudes exceed the maximum transmitter output ($\mathrm{AM}_{\mathrm{code}} = 255$). The signal is then clipped by replacing the out-of-bound magnitudes with the maximum transmitter output. Scaling and clipping reduce the PAPR at the expense of increased signal distortion, degraded EVM, and a higher out-of-band noise floor. Fig. 28 shows the simulated EVM and noise (at an 80-MHz offset) of a WLAN signal versus the degree of scaling/clipping, expressed in terms of PAPR. Also shown are the simulated EVM and noise of the signal as approximated by the quantized outputs of the amplitude and phase modulators, using the AM–AM, AM–PM, and PM–PM tables measured at 2.4 GHz. After this approximation, the simulated noise degrades by roughly 15 dB, to approximately -120 dBc/Hz. This noise is dominated by the quantization noise of the all-digital transmitter and does not vary with PAPR the way the original signal's noise does. On the other hand, the transmitter output resolution is sufficient to approximate the original signal with little EVM degradation as long as the original signal has an EVM higher than -40 dB. To achieve good output power and efficiency while still satisfying the EVM specification of



Fig. 29. Measured WLAN performance (power, DE/SE, EVM) at 2.4 GHz and other frequencies (i.e., 0.6, 1.2, 1.8, 3, and 3.6 GHz).



Fig. 30. Measured WLAN spectrum and demodulated 64-QAM constellation at 2.4 GHz (MB).

-25 dB with sufficient margin, the transmitter is programmed to approximate a WLAN signal with a PAPR of 6.2 dB.
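The scaling-and-clipping step can be sketched as follows: an OFDM-like baseband signal is built from random 64-QAM subcarriers, its magnitude is limited to a clip level set relative to its rms value (keeping the phase), and the PAPR and an in-band EVM are measured before and after. Everything here is an illustrative assumption rather than the authors' flow: 52 of 64 bins are loaded to loosely mimic an 802.11g symbol, the signal is not oversampled (so the PAPR is underestimated), EVM is computed on the loaded bins without gain re-normalization, and no AM/PM quantization is modeled. The last lines apply the DE-proportional-to-voltage argument from the text, under which the DE at average power relative to peak is $10^{-\mathrm{PAPR}/20}$.

```python
import cmath
import math
import random

N_FFT = 64
USED = [k for k in range(-26, 27) if k != 0]   # 52 loaded bins (assumed layout)
LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]          # 64-QAM levels per axis

def idft(bins):
    """Time samples x[n] = sum_k X[k] exp(j*2*pi*k*n/N) for loaded bins only."""
    return [sum(X * cmath.exp(2j * math.pi * k * n / N_FFT) for k, X in bins.items())
            for n in range(N_FFT)]

def dft_bin(x, k):
    """Recover X[k] = (1/N) sum_n x[n] exp(-j*2*pi*k*n/N)."""
    return sum(v * cmath.exp(-2j * math.pi * k * n / N_FFT)
               for n, v in enumerate(x)) / N_FFT

def papr_db(x):
    p = [abs(v) ** 2 for v in x]
    return 10 * math.log10(max(p) / (sum(p) / len(p)))

def clip(x, level):
    """Limit |x| to level while keeping each sample's phase."""
    return [v if abs(v) <= level else level * v / abs(v) for v in x]

rng = random.Random(1)
frames = [{k: complex(rng.choice(LEVELS), rng.choice(LEVELS)) for k in USED}
          for _ in range(40)]
signal = [v for bins in frames for v in idft(bins)]

target_papr = 6.2   # dB, the value chosen for the WLAN test in the text
rms = math.sqrt(sum(abs(v) ** 2 for v in signal) / len(signal))
clipped = clip(signal, rms * 10 ** (target_papr / 20))

# In-band EVM over the loaded bins (no gain re-normalization).
err = ref = 0.0
for i, bins in enumerate(frames):
    seg = clipped[i * N_FFT:(i + 1) * N_FFT]
    for k, X in bins.items():
        err += abs(dft_bin(seg, k) - X) ** 2
        ref += abs(X) ** 2
evm_db = 10 * math.log10(err / ref)

print(f"PAPR {papr_db(signal):.1f} dB -> {papr_db(clipped):.1f} dB, EVM {evm_db:.1f} dB")

# If DE is proportional to output voltage, DE at average power relative to peak:
for papr in (8.7, target_papr):
    print(f"PAPR {papr} dB: DE_avg / DE_peak = {10 ** (-papr / 20):.2f}")
```

Clipping cannot push the PAPR below the clip ratio, because it also removes some average power; the result therefore lands slightly above the 6.2-dB target. The efficiency lines show why the trade is attractive: under the text's model, the relative DE rises from about 0.37 at 8.7 dB to about 0.49 at 6.2 dB.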

Finally, the measured output power, DE/SE, and EVM of the three packages (LB, MB, and HB<sub>B</sub>) at the six frequencies are shown in Fig. 29. At the standard frequency of 2.4 GHz, the transmitter achieves a high output power of 20.7 dBm with an EVM better than -30 dB; the DE and SE are 25% and 21%, respectively. When the transmitter operates instead at 1.8 or 3 GHz, similar output power and efficiency are obtained, and the EVMs still meet the specification. The measured output spectrum at 2.4 GHz and the 64-QAM constellation demodulated from the OFDM subcarriers are shown in Fig. 30. The spectrum meets the required mask, but the noise level is high, at -118 dBc/Hz at an 80-MHz offset. This noise level is about 15 dB higher than that of the state-of-the-art noise-filtering all-digital Tx [9], which employs a higher symbol rate of 1 GS/s and mixed-signal filtering.

A 20-MHz 64-QAM LTE uplink signal is used for the third modulation test. The LTE baseband signal also has a high PAPR, 8.9 dB, and must be scaled and clipped to support good output power and efficiency. Although the single-carrier frequency-division multiple access (SC-FDMA) technique used in LTE uplink tolerates a lower PAPR than WLAN OFDM, excessive scaling/clipping can still violate the linearity and ACLR requirements. Fig. 31(a) shows the simulated EVM and spectral noise of the original LTE signal versus the



Fig. 31. Simulated (a) EVM and noise and (b) ACLR for an ideal and quantized 20-MHz LTE signal versus the signal PAPR.



Fig. 32. Measured 20-MHz LTE power, DE/SE, ACLR, and EVM at the six testing frequencies from 0.6 to 3.6 GHz.

signal PAPR. The counterpart EVM/noise at the Tx output is also plotted, again using the AM–AM, AM–PM, and PM–PM tables measured at 2.4 GHz. To achieve an EVM of -30 dB, the signal PAPR can be as low as 4 dB, and the dominant quantization noise is approximately -117 dBc/Hz. The adjacent channel leakage power ratio (ACLR) requirements also need attention, so the ACLR is simulated versus the signal PAPR before and after transmitter quantization; the results are shown in Fig. 31(b). To achieve an ACLR<sub>1</sub> lower than -30 dB and an ACLR<sub>2</sub> lower than -36 dB with sufficient margin, the transmitter under test is programmed to output an LTE signal with a PAPR of 4.1 dB.



Fig. 33. Measured LTE spectrum and demodulated 64-QAM constellation at (a) 1.2 GHz (LB), (b) 2.4 GHz (MB), and (c) 3.6 GHz (HB<sub>B</sub>).

 TABLE I

 REPORTED CMOS CW PERFORMANCES WITH PEAK POWER > 26 dBm (ORDERED BY OPERATING FREQUENCY)

| Reference | Freq. (GHz) | CMOS Proc. (nm) | V<sub>DD</sub> (V) | Peak Power (dBm) | DE (%) | SE or PAE (%) | On-Chip Metal | Output Off-Chip Network | Power Core | Modulation Test |
|-----------|-------------|-----------------|--------------------|------------------|--------|---------------|---------------|-------------------------|------------|-----------------|
| [31] TMTT '12 | 0.93 | 90 | 2.0 | 29.4 | 28 | 26 | 3.2 µm | None | Class AB | 10-MHz LTE |
| [11] ASSCC '14 | 1.2 | 65 | 2.5 | 27.1 | 51 | N.A. | 0.9 µm | LTCC Interposer | Class D<sup>-1</sup> | 20-MHz LTE |
| [15] TMTT '09 | 1.7 | 180 | 3.3 | 33.8 | N.A. | 50 | 2.3 µm | Balun on IPD | Class E | None |
| [20] TMTT '13 | 1.85 | 180 | 4.5 | 30.2 | N.A. | 48 | N.A. | Balun on PCB | Class AB | 10-MHz LTE |
| [29] JSSC '15 | 1.9 | 40 | 1.5 | 28.0 | N.A. | 34 | N.A. | None | Doherty | 20-MHz LTE |
| [16] JSSC '09 | 2.0 | 130 | 3.3 | 29.3 | N.A. | 69<sup>&amp;</sup> | 4 µm | None | Class E | 20-MHz WLAN |
| [10] RFIC '14 | 2.25 | 65 | 2.6 | 29.1 | N.A. | 42 | N.A. | Use GSSG probe | Class E | 20-MHz 64QAM |
| [17] JSSC '12 | 2.4 | 65 | 2.5 | 27.7 | N.A. | 45 | N.A. | Discrete Balun | Class E | 20-MHz WLAN |
| [18] ESSCIRC '14 | 2.4 | 45 | 2.4 | 29.5 | 46.7 | N.A. | N.A. | None | Class E | 20-MHz LTE |
| [30] RFIC '16 | 2.48 | 65 | 3.3 | 26.7 | 48 | 39 | 3.4 µm | None | Class AB | 20-MHz WLAN |
| [14] JSSC '16 | 2.6 | 65 | 3.0 | 28.1 | 41 | 35 | N.A. | None | Class D<sup>-1</sup> | 8-MS/s 256QAM |
| [13] JSSC '16 | 3.7 | 65 | 3.0 | 26.7 | 40 | N.A. | N.A. | None | Doherty | 1-MS/s 16QAM |
| This Work (LB) | 0.7/1.1 | 65 | 2.5 | 27.7/29.2 | 40/60 | 38/56 | 0.9 µm | HDI PCB Interposer | Class D<sup>-1</sup> | 63-MS/s 64QAM, 20-MHz WLAN, 20-MHz LTE |
| This Work (MB) | 1.5/2.3 | 65 | 2.5 | 28.8/27.7 | 56/54 | 53/49 | 0.9 µm | HDI PCB Interposer | Class D<sup>-1</sup> | 63-MS/s 64QAM, 20-MHz WLAN, 20-MHz LTE |
| This Work (HB) | 3.0/3.5 | 65 | 2.5 | 26.8/25.5 | 37/40 | 33/35 | 0.9 µm | HDI PCB Interposer | Class D<sup>-1</sup> | 63-MS/s 64QAM, 20-MHz WLAN, 20-MHz LTE |

<sup>&amp;</sup> Loss in the supply modulator not included

The measured Tx output power, DE and SE, EVM, and ACLR<sub>1,2</sub> on both sides of the RF carrier are summarized in Fig. 32. At all six test frequencies, the output power is higher than 20 dBm, with DE above 25% and SE above 20%, and the measured EVM and ACLRs meet the specifications. Among the six test frequencies, the MB package at 1.8 GHz achieves a high output power of 24 dBm with an excellent SE of 30%. The measured spectra and demodulated 64-QAM constellations at three test frequencies, 1.2, 2.4, and 3.6 GHz, are shown in Fig. 33. The spectra meet the ACLR specifications with substantial margin, and the output noise is approximately -115 dBc/Hz at an 80-MHz offset at all tested frequencies.

Table I compares the CW performance of this work to reported CMOS Txs/PAs with peak power higher than 26 dBm. For a fair comparison, note that the works employing discrete baluns or tested with GSSG probes would suffer some degradation in power and efficiency (e.g., 1 dB) if an on-chip or on-package balun were realized and its loss were included. Using a moderate supply voltage of 2.5 V, this work achieves one of the highest peak powers, and its SE is arguably the best, considering that the power consumption of every component in the all-digital transmitter has been accounted for. Table II compares the WLAN performance of this work to state-of-the-art Txs/PAs with output power higher than 17 dBm. With the balun loss included, both the achieved power (20.7 dBm) and SE (>21%) are also excellent. Finally, the reported CMOS LTE Txs/PAs with output power higher than 20 dBm are summarized in Table III. The prevailing technique is envelope tracking (ET) with an off-chip buck converter [20]–[22]. Unlike in an all-digital Tx, the input signal of an ET amplifier is already a scaled version of the output RF signal, as in conventional linear PAs. As can be seen, the 24-dBm output power and 30% SE achieved by this all-digital Tx are comparable to those achieved by state-of-the-art ET amplifiers transmitting 10-MHz and 20-MHz LTE signals.
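To make the 1-dB balun example concrete: an added passive loss of L dB lowers the output power by L dB and, because the dc power is unchanged, scales the efficiency by the same linear factor 10<sup>-L/10</sup>. The short sketch below applies this to a hypothetical PA; the 28-dBm and 50% figures are illustrative and are not taken from any of the tables.

```python
def after_balun(p_out_dbm: float, efficiency: float, loss_db: float):
    """Output power (dBm) and efficiency after a balun with loss_db of insertion loss.

    The dc power is assumed unchanged, so the efficiency scales by the same
    linear factor as the output power.
    """
    factor = 10 ** (-loss_db / 10)
    return p_out_dbm - loss_db, efficiency * factor


# Hypothetical PA: 28 dBm at 50% efficiency before a 1-dB balun
p, eta = after_balun(28.0, 0.50, 1.0)
print(f"{p:.1f} dBm, {100 * eta:.1f}%")  # prints: 27.0 dBm, 39.7%
```

A 1-dB loss therefore costs about a fifth of the efficiency, which is why the comparison distinguishes designs with and without an included balun.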

TABLE II Reported 20-MHz WLAN CMOS Transmitters With Power > 17 dBm

| Reference      | Freq. (GHz) | Power (dBm) | DE/(SE or PAE) (%)     | EVM (dB) | Meet Mask | Output Off-Chip Network     | Modulator                             | All-Digital Tx |
|----------------|-------------|-------------|------------------------|----------|-----------|-----------------------------|---------------------------------------|----------------|
| [16] 09' JSSC  | 2.0         | 19.6        | N.A./22.6<sup>&</sup>  | -32      | Yes       | None                        | Class G + ET (Off-chip AM/PM)         | No             |
| [1] 11' JSSC   | 2.4         | 19.6        | N.A./21.8              | -25      | Yes       | Discrete Balun              | Outphasing AM (Off-chip PM)           | No             |
| [4] 11' JSSC   | 2.25        | 17.7        | N.A./27                | -32      | No        | Discrete Balun              | Switched-Cap AM (Off-chip PM)         | AM only        |
| [17] 12' JSSC  | 2.4         | 20.2        | N.A./27.6              | -31      | Yes       | Discrete Balun              | Class G + Outphasing AM (Off-chip PM) | No             |
| [2] 12' ISSCC  | 2.4         | 20          | 22/18.6                | -25      | Yes       | Discrete Balun              | Outphasing AM (On-chip PM)            | Yes            |
| [8] 13' ISSCC  | 2.4         | 18.8        | 17/15                  | -25      | Yes       | None                        | Digitally-modulated IQ Combining      | Yes            |
| [3] 15' RFIC   | 2.1         | 18.3        | N.A./13                | -26      | Yes       | Discrete Balun              | RF-Pulse Width Modulation             | Yes            |
| [12] 16' ISSCC | 2.5         | 19.5        | 14.5/10                | -30      | Yes       | Discrete Balun<sup>#</sup>  | Digitally-modulated IQ Combining      | Yes            |
| [30] 16' RFIC  | 2.48        | 20.1        | 21/18                  | -25      | Yes       | None                        | Linear PA                             | No             |
| This Work (MB) | 1.8         | 21.6        | 27/23                  | -31      | Yes       | HDI PCB Interposer          | Class D<sup>-1</sup> AM (On-chip PM)  | Yes            |
| This Work (MB) | 2.4         | 20.7        | 25/21                  | -31      | Yes       | HDI PCB Interposer          | Class D<sup>-1</sup> AM (On-chip PM)  | Yes            |

<sup>&</sup>Loss in supply modulator not included. <sup>#</sup>Balun loss included.

TABLE III

REPORTED 10/20-MHz LTE CMOS TRANSMITTERS WITH POWER > 20 dBm (ORDERED BY OPERATING FREQUENCY)

| Reference        | Freq. (GHz) | BW (MHz)/QAM | Power (dBm) | DE/(SE or PAE) (%) | EVM (dB) | Meet ACLR Spec. | Output Off-Chip Network | Modulator                            | All-Digital Tx |
|------------------|-------------|--------------|-------------|--------------------|----------|-----------------|-------------------------|--------------------------------------|----------------|
| [31] 12' TMTT    | 0.93        | 10/16        | 25.1        | N.A./15            | -25      | Yes             | None                    | Linear PA                            | No             |
| [6] 16' RFIC     | 1.8         | 10/64        | 20.9        | N.A./15.2          | -29      | Yes             | GSSG probe              | Switched-Cap AM and PM               | Yes            |
| [19] 13' ISSCC   | 1.8         | 20/64        | 21.3        | N.A./18            | -22      | Yes             | None                    | Linear PA                            | No             |
| [20] 13' TMTT    | 1.85        | 10/16        | 26          | N.A./34.1          | -31      | Yes             | Balun on PCB            | ET (Off-chip AM and PM)              | No             |
| [22] 15' MWCL    | 1.85        | 10/16        | 27.5        | N.A./42.4          | -32      | Yes             | Balun on PCB            | ET (Off-chip AM and PM)              | No             |
| [29] 15' JSSC    | 1.9         | 20/16        | 23.4        | N.A./23.3          | -23      | Yes             | None                    | Doherty                              | No             |
| [21] 14' ISSCC   | 1.95        | 20/16        | 25.6        | N.A./32.2          | N.A.     | Yes             | None                    | ET (Off-chip AM and PM)              | No             |
| [18] 14' ESSCIRC | 2.4         | 20/64        | 22.8        | 21/N.A.            | N.A.     | Yes             | None                    | Out-phasing AM (Off-chip PM)         | No             |
| This Work (LB)   | 1.2         | 20/64        | 24.5        | 32/29              | -27      | Yes             | HDI PCB Interposer      | Class D<sup>-1</sup> AM (On-chip PM) | Yes            |
| This Work (MB)   | 1.8         | 20/64        | 24.0        | 34/30              | -27      | Yes             | HDI PCB Interposer      | Class D<sup>-1</sup> AM (On-chip PM) | Yes            |
| This Work (MB)   | 2.4         | 20/64        | 23.1        | 28/24              | -28      | Yes             | HDI PCB Interposer      | Class D<sup>-1</sup> AM (On-chip PM) | Yes            |

# VII. CONCLUSION

This work demonstrates a frequency-reconfigurable CMOS all-digital transmitter with integrated phase and amplitude paths. The same CMOS chip is flip-chip connected to three HDI PCB interposers targeting three coarse frequency bands. Together, the three packages cover a wide range from 0.7 to 3.5 GHz, over which a CW power higher than 25.5 dBm and a DE above 40% are achieved. The LB and MB packages reach peak powers above 28 dBm, and the peak DE is 60%. The three packages have been tested with digital modulation schemes, including 64 QAM, 802.11g WLAN, and 20-MHz LTE, at frequencies from 0.6 to 3.6 GHz, demonstrating the design's ability to adapt to multiple standards and to be reconfigured in frequency. Compared to reported designs, the transmitter exhibits excellent output power and efficiency and features an all-digital input interface.

## APPENDIX

## A. Introduction to the Tx Periphery Circuits

The schematics of the transmitter periphery circuits, including the receivers for the sinusoidal LO and clocks and the 2.5-Gb/s LVDS receivers, are illustrated in Fig. 34. With two strong-arm comparators, the LVDS receiver exploits both the rising and falling edges of the 1.25-GHz LVDS Rx clock. The comparators alternately output their bit decisions, which are then held and retimed by the subsequent latch and flip-flops. The retimed bits are then presented to two 1-to-5 deserializers on the falling edge of the 1.25-GHz clock. The two deserializers are serial registers that take in input data on the rising edge of the 1.25-GHz clock and output 10 b in parallel on the rising edge of the 250-MHz DeSer clock. The same receiver topology is used for the LO and for the LVDS Rx and DeSer clocks. It is based on a CML–CMOS converter composed of a two-stage differential amplifier followed by self-biased inverter chains. The common-mode rejection provided by the differential amplifier is critical for rejecting coupling from the transmitter output into the clock receivers through ground bounce.
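The data path above can be summarized with a behavioral sketch: one comparator samples on the rising clock edge and the other on the falling edge, so each lane carries every other bit at 1.25 Gb/s; each lane is shifted into a 5-b register, and the two 5-b half-words are re-interleaved into one 10-b word per 250-MHz DeSer cycle. The assignment of even bits to the rising-edge comparator, the bit ordering, and all function names are hypothetical choices for illustration; the sketch models data flow only, not timing or retiming.

```python
from typing import List

BIT_RATE = 2.5e9                     # LVDS data rate (b/s)
RX_CLOCK = BIT_RATE / 2              # 1.25-GHz clock, both edges used
WORD_BITS = 10
DESER_CLOCK = BIT_RATE / WORD_BITS   # 250-MHz DeSer clock


def ddr_split(bits: List[int]):
    """Assumed: the rising-edge comparator takes even bits, the falling-edge one odd bits."""
    return bits[0::2], bits[1::2]


def deserialize(lane: List[int], width: int = 5) -> List[List[int]]:
    """1-to-`width` shift-register deserializer; emits complete words only."""
    return [lane[i:i + width] for i in range(0, len(lane) - width + 1, width)]


def lvds_rx(bits: List[int]) -> List[List[int]]:
    """Recover 10-b words from a serial DDR bit stream (behavioral model)."""
    rising, falling = ddr_split(bits)
    words = []
    for r, f in zip(deserialize(rising), deserialize(falling)):
        # Re-interleave the two 5-b half-words into the original bit order.
        words.append([b for pair in zip(r, f) for b in pair])
    return words


stream = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
print(lvds_rx(stream))                     # two 10-b words, equal to the halves of `stream`
print(RX_CLOCK / 1e9, DESER_CLOCK / 1e6)   # clock rates in GHz and MHz
```

Splitting the 2.5-Gb/s stream across two edges is what lets each comparator and deserializer run at half the bit rate.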


## B. Transmitter Performance Under Various Mismatch Effects

Mismatch effects occurring in the inverse Class-D cells, the PM bias currents, and the delays of the AM and PM paths all influence the Tx performance.

Fig. 34. Schematics for the LO and (LVDS Rx and DeSer) clock receivers and the LVDS data receiver (for the AM and PM streams).

Fig. 35. Mismatch study (a) Tx output magnitude and (b) magnitude step versus the AM code. (c) Tx performance (with an ideal PM).

First, mismatch can cause the 19 parallel-connected inverse Class-D cells to deviate from their nominal sizes and reduce the RF-DAC accuracy. Note that even without mismatch, the response of the output voltage to AM<sub>code</sub>, illustrated in Fig. 5(c), is nonlinear and requires digital predistortion to approximate the desired output magnitude. Therefore, the mismatch in the parallel-cell sizes can be partially corrected by the same predistortion. The output voltage versus AM<sub>code</sub> is simulated (at 1.8 GHz) and shown in Fig. 35(a) for the nominal case and three mismatch scenarios, in which a substantial 50% size increase is applied to the fourth cell $A\langle 3\rangle$, the eighth cell $A\langle 7\rangle$, and the twelfth cell $A\langle 11\rangle$, respectively. The corresponding voltage steps are shown in Fig. 35(b), and the simulated Tx performance is summarized in Fig. 35(c). An infinite-resolution PM is used in the simulation. Without mismatch, the $A\langle 3\rangle$ cell has a relative size of 8, and $A\langle 7\rangle$ and $A\langle 11\rangle$ have relative sizes of 16. The $A\langle 11\rangle$ cell is turned ON when $\mathrm{AM}_{\mathrm{code}} > 127$ and affects the performance very little because the voltage step is very small in that region. The $A\langle 7\rangle$ mismatch causes an output voltage gap between 0.66 and 0.7 V, and the (quantization) noise floor increases noticeably, while the in-band performance degrades little. The mismatched $A\langle 3\rangle$ cell is turned ON whenever $\mathrm{mod}(\mathrm{AM}_{\mathrm{code}}, 16) > 7$, and in this case the AM response no longer increases monotonically. Although the predistortion can exploit all the available output magnitudes, the noise floor rises because of the multiple voltage gaps that cannot be approximated accurately. In general, voltage gaps at lower magnitudes are less tolerable since the modulated signals have a high PAPR. In the layout, the outer thermometer cells are guarded by dummy cells, and one thermometer cell is rerouted to realize the four binary cells ($A\langle 0\rangle$ to $A\langle 3\rangle$).
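The monotonicity argument above can be checked with a simple counting model. The sketch assumes that the 19 cells are four binary cells (relative sizes 1, 2, 4, and 8 for $A\langle 0\rangle$ to $A\langle 3\rangle$) plus 15 thermometer cells of size 16 that switch ON in index order; this is consistent with the turn-on conditions stated above but is otherwise an assumption. It tracks only the total conducting cell size as a stand-in for the switch conductance. Because the real output voltage is a nonlinear function of that size (Fig. 5(c)), the sketch shows gaps and monotonicity, not voltages. All helper names are hypothetical.

```python
from typing import List

BINARY_SIZES = [1, 2, 4, 8]     # A<0>..A<3>
N_THERMO, THERMO_SIZE = 15, 16  # A<4>..A<18>, switched ON in index order (assumed)


def nominal_sizes() -> List[float]:
    return [float(s) for s in BINARY_SIZES] + [float(THERMO_SIZE)] * N_THERMO


def with_mismatch(index: int, factor: float = 1.5) -> List[float]:
    """Cell sizes with one cell scaled by `factor` (a 50% increase by default)."""
    sizes = nominal_sizes()
    sizes[index] *= factor
    return sizes


def active_cells(code: int) -> List[int]:
    """Cells ON for an 8-b AM code: the 4 LSBs drive the binary cells and
    the 4 MSBs set how many thermometer cells are ON."""
    if not 0 <= code <= 255:
        raise ValueError("AM code must be in 0..255")
    on = [i for i in range(4) if (code >> i) & 1]
    return on + list(range(4, 4 + code // 16))


def total_size(code: int, sizes: List[float]) -> float:
    """Sum of the sizes of all conducting cells (proxy for switch conductance)."""
    return sum(sizes[i] for i in active_cells(code))


def is_monotonic(sizes: List[float]) -> bool:
    levels = [total_size(c, sizes) for c in range(256)]
    return all(b >= a for a, b in zip(levels, levels[1:]))


def predistort(target: float, sizes: List[float]) -> int:
    """Nearest-level lookup: the code whose total size is closest to `target`."""
    return min(range(256), key=lambda c: abs(total_size(c, sizes) - target))


if __name__ == "__main__":
    for idx in (3, 7, 11):
        s = with_mismatch(idx)
        first_on = min(c for c in range(256) if idx in active_cells(c))
        print(f"A<{idx}>: first ON at code {first_on}, monotonic={is_monotonic(s)}")
    s7 = with_mismatch(7)
    print(total_size(63, s7), total_size(64, s7))  # a gap opens between these codes
```

In this model, the $A\langle 3\rangle$ case fails the monotonicity check (code 15 maps to a larger size than code 16), whereas the $A\langle 7\rangle$ and $A\langle 11\rangle$ cases stay monotonic but leave a single gap that a nearest-level predistortion cannot fill.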

Fig. 36. Mismatch study (a) mixer output voltage and (b) phase step versus the PM code. (c) Tx performance (with an ideal amplitude modulator).

Fig. 37. (a) WLAN and (b) LTE performance degradation due to the delay mismatch between the amplitude and the phase path.

Second, the mismatch between the two mixer bias currents in the PM path distorts the phase response. Together with the original nonlinear phase response (see Fig. 10), this distortion can be corrected to some extent by the predistortion. For several mismatch scenarios, Fig. 36(a) shows the simulated mixer fundamental output $V_{\text{mixer}}$ in the third quadrant, where 64 of the 256 PM codes are swept (frequency = 1.8 GHz, SEL = 10). Increasing (decreasing) the bias current of the in-phase mixer stretches (compresses) the in-phase component of $V_{\text{mixer}}$. Since the two bias currents are directed alternately to the mixer outputs, the current mismatch induces no dc mismatch. The corresponding phase steps of $V_{\text{mixer}}$ and $D_{\text{out}}$ (at the CML–CMOS converter output) are simulated and shown in Fig. 36(b), and the resulting Tx performance is summarized in Fig. 36(c). An amplitude modulator with infinite resolution is used in the simulation. The mismatches degrade the performance very little. The third mismatch scenario (case 3) increases one mixer load resistance from 400 to 600 $\Omega$. In this case, a 100-mV dc offset is created at the mixer output, which can be suppressed by the CML–CMOS converter; the mixer output voltage is amplified and introduces little phase distortion.
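A toy model helps illustrate why the phase response of a current-weighted IQ mixer is inherently nonlinear and how a bias-current mismatch shifts it. The sketch assumes that, within one quadrant, PM code k steers a fraction k/64 of a fixed bias current to the quadrature mixer and the remainder to the in-phase mixer, so the output phase is atan2(g<sub>Q</sub>k, g<sub>I</sub>(64 − k)); the gains g<sub>I</sub> and g<sub>Q</sub> stand in for mismatched bias currents. The weighting law, the first-quadrant framing (other quadrants follow by sign changes), and the function names are assumptions, not a model of the actual circuit.

```python
import math
from typing import List

CODES_PER_QUADRANT = 64  # 256 PM codes / 4 quadrants


def mixer_phase(code: int, i_gain: float = 1.0, q_gain: float = 1.0) -> float:
    """Output phase (rad) for a PM code within one quadrant.

    Assumed weighting: code k sends k/64 of the bias current to the Q mixer
    and the rest to the I mixer; i_gain/q_gain model bias-current mismatch.
    """
    if not 0 <= code < CODES_PER_QUADRANT:
        raise ValueError("code must be in 0..63")
    w_q = code / CODES_PER_QUADRANT
    return math.atan2(q_gain * w_q, i_gain * (1.0 - w_q))


def phase_steps(i_gain: float = 1.0, q_gain: float = 1.0) -> List[float]:
    """Phase increment (rad) between adjacent codes."""
    p = [mixer_phase(k, i_gain, q_gain) for k in range(CODES_PER_QUADRANT)]
    return [b - a for a, b in zip(p, p[1:])]


def predistort(target: float, i_gain: float = 1.0, q_gain: float = 1.0) -> int:
    """Nearest-phase code lookup, i.e., a minimal phase predistortion table."""
    return min(range(CODES_PER_QUADRANT),
               key=lambda k: abs(mixer_phase(k, i_gain, q_gain) - target))


if __name__ == "__main__":
    steps = phase_steps()
    print(f"nominal step range: {math.degrees(min(steps)):.2f} to "
          f"{math.degrees(max(steps)):.2f} deg")
    for g in (1.0, 1.2):
        print(f"i_gain={g}: code for 45 deg -> {predistort(math.pi / 4, i_gain=g)}")
```

Even with matched currents, the steps near 45° are about twice as large as those at the quadrant edges, and a 20% larger in-phase current simply moves the 45° point to a different code, which a lookup-table predistortion can absorb.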

Finally, the WLAN and LTE EVM and out-of-band noise of the polar transmitter are simulated versus the delay mismatch between the amplitude and phase paths and shown in Fig. 37. The WLAN results are close to those reported in [9]. As introduced previously, the 8-b modulators and the signal clipping (for a lower PAPR) already limit the signal integrity, so a substantial delay mismatch of 1 ns can be tolerated without noticeable additional degradation. In this design, the on-chip amplitude and phase paths have a delay mismatch lower than 100 ps. The settling time at the mixer output is about 50 ps, and the CML–CMOS converter and the following buffers introduce about 60 ps of delay before the AND gates that combine the AM path and the PM path (see Fig. 2). This delay is partially compensated by the buffer chains in the AM path before the AND gates.
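The mechanism behind the delay-mismatch study can be reproduced qualitatively with an idealized model. If the amplitude path lags the phase path by τ, the transmitted envelope is |x(t − τ)|·exp(j∠x(t)). The sketch below evaluates this for a single-carrier 20-MS/s 64-QAM stand-in with Hann-windowed sinc pulse shaping. It does not use the OFDM WLAN or LTE waveforms of the paper and includes no quantization, clipping, or other impairments, so its absolute EVM values are not comparable with the simulated results. It only shows that the delay-induced error grows with τ. The pulse shape, symbol rate, and function names are assumptions.

```python
import cmath
import math
import random
from typing import List

SPAN = 8  # pulse truncated to +/-SPAN symbol periods (assumed)


def pulse(t: float) -> float:
    """Hann-windowed sinc, t in symbol periods; zero ISI at nonzero integer t."""
    if abs(t) >= SPAN:
        return 0.0
    sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
    return sinc * 0.5 * (1.0 + math.cos(math.pi * t / SPAN))


def qam64(n: int, seed: int) -> List[complex]:
    """Random 64-QAM symbols normalized to unit average power."""
    rng = random.Random(seed)
    levels = [-7, -5, -3, -1, 1, 3, 5, 7]
    scale = math.sqrt(42.0)  # mean |s|^2 of unnormalized square 64-QAM
    return [complex(rng.choice(levels), rng.choice(levels)) / scale for _ in range(n)]


def envelope(symbols: List[complex], t: float) -> complex:
    """Pulse-shaped complex baseband at time t (in symbol periods)."""
    k0 = math.floor(t)
    acc = 0j
    for k in range(k0 - SPAN, k0 + SPAN + 2):
        if 0 <= k < len(symbols):
            acc += symbols[k] * pulse(t - k)
    return acc


def polar_evm_db(delay_ns: float, symbol_rate_hz: float = 20e6,
                 n_sym: int = 600, seed: int = 1) -> float:
    """EVM (dB) at the symbol instants when the AM path lags the PM path by delay_ns."""
    symbols = qam64(n_sym, seed)
    tau = delay_ns * 1e-9 * symbol_rate_hz  # delay in symbol periods
    err = ref_pow = 0.0
    for n in range(2 * SPAN, n_sym - 2 * SPAN):  # skip the pulse-shaping edges
        ref = envelope(symbols, n)
        amp = abs(envelope(symbols, n - tau))          # late amplitude
        out = amp * cmath.exp(1j * cmath.phase(ref))   # on-time phase
        err += abs(out - ref) ** 2
        ref_pow += abs(ref) ** 2
    return 10 * math.log10(max(err / ref_pow, 1e-30))


if __name__ == "__main__":
    for d in (0.0, 0.1, 0.5, 1.0):
        print(f"delay {d:.1f} ns -> EVM {polar_evm_db(d):.1f} dB")
```

Because the error at each symbol is roughly τ times the slope of the envelope, its power grows with τ², which is why a mismatch well below the symbol period, such as the sub-100-ps on-chip value, contributes little next to the quantization and clipping noise.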

## ACKNOWLEDGMENT

The authors would like to thank the TSMC University Shuttle Program for the chip fabrication.

## REFERENCES

- [1] H. Xu, Y. Palaskas, A. Ravi, M. Sajadieh, M. A. El-Tanani, and K. Soumyanath, "A flip-chip-packaged 25.3 dBm class-D outphasing power amplifier in 32 nm CMOS for WLAN application," *IEEE J. Solid-State Circuits*, vol. 46, no. 7, pp. 1596–1605, Jul. 2011.
- [2] P. Madoglio *et al.*, "A 20 dBm 2.4 GHz digital outphasing transmitter for WLAN application in 32 nm CMOS," in *IEEE ISSCC Tech. Dig. Papers*, Feb. 2012, pp. 168–170.
- [3] K. Cho and R. Gharpurey, "A 25.6 dBm wireless transmitter using RF-PWM with carrier switching in 130-nm CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp.*, May 2015, pp. 139–142.
- [4] S.-M. Yoo, J. S. Walling, E. C. Woo, B. Jann, and D. J. Allstot, "A switched capacitor RF power amplifier," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 2977–2987, Dec. 2011.
- [5] S.-M. Yoo et al., "A class-G switched-capacitor RF power amplifier," *IEEE J. Solid-State Circuits*, vol. 48, no. 5, pp. 1212–1224, May 2013.
- [6] W. Yuan and J. S. Walling, "A multiphase switched capacitor power amplifier in 130 nm CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp.*, May 2016, pp. 413–416.
- [7] D. Chowdhury, S. V. Thyagarajan, L. Ye, E. Alon, and A. M. Niknejad, "A fully-integrated efficient CMOS inverse class-D power amplifier for digital polar transmitters," *IEEE J. Solid-State Circuits*, vol. 47, no. 5, pp. 1113–1122, May 2012.
- [8] C. Lu et al., "A 24.7 dBm all-digital RF transmitter for multimode broadband applications in 40 nm CMOS," in *IEEE ISSCC Tech. Dig. Papers*, Feb. 2013, pp. 422–423.
- [9] L. Ye, "Design and analysis of digitally modulated transmitters for efficiency enhancement," Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., Univ. California at Berkeley, Berkeley, CA, USA, 2013.
- [10] R. Bhat and H. Krishnaswamy, "A watt-level 2.4 GHz RF I/Q power DAC transmitter with integrated mixed-domain FIR filtering of quantization noise in 65 nm CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp.*, Jun. 2014, pp. 413–416.

- [11] N.-C. Kuo *et al.*, "A frequency-reconfigurable multi-standard 65 nm CMOS digital transmitter with LTCC interposers," in *Proc. IEEE Asian Solid-State Circuits. Conf.*, Nov. 2014, pp. 345–348.
- [12] Z. Deng et al., "A dual-band digital-WiFi 802.11a/b/g/n transmitter SoC with digital I/Q combining and diamond profile mapping for compact die area and improved efficiency in 40 nm CMOS," in *IEEE ISSCC Tech. Dig. Papers*, Feb. 2016, pp. 172–173.
- [13] S. Hu, S. Kousai, and H. Wang, "A broadband mixed-signal CMOS power amplifier with a hybrid class-G Doherty efficiency enhancement technique," *IEEE J. Solid-State Circuits*, vol. 51, no. 3, pp. 598–613, Mar. 2016.
- [14] J. S. Park, S. Hu, Y. Wang, and H. Wang, "A highly linear dualband mixed-mode polar power amplifier in CMOS with an ultracompact output network," *IEEE J. Solid-State Circuits*, vol. 51, no. 8, pp. 1756–1770, Aug. 2016.
- [15] H. Lee, C. Park, and S. Hong, "A quasi-four-pair class-E CMOS RF power amplifier with an integrated passive device transformer," *IEEE Trans. Microw. Theory Techn.*, vol. 57, no. 4, pp. 752–759, Apr. 2009.
- [16] J. S. Walling, S. S. Taylor, and D. J. Allstot, "A class-G supply modulator and class-E PA in 130 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 9, pp. 2339–2347, Sep. 2009.
- [17] P. A. Godoy, S. Chung, T. W. Barton, D. J. Perreault, and J. L. Dawson, "A 2.4-GHz, 27-dBm asymmetric multilevel outphasing power amplifier in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 2372–2384, Dec. 2012.
- [18] A. Banerjee, R. Hezar, L. Ding, N. Schemm, and B. Haroun, "A 29.5 dBm class-E outphasing RF power amplifier with performance enhancement circuits in 45 nm CMOS," in *Proc. 40th Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2014, pp. 467–470.
- [19] K. Onizuka, S. Shigehito, and O. Shoji, "A 1.8 GHz linear CMOS power amplifier with supply-path switching scheme for WCDMA/LTE applications," in *IEEE ISSCC Tech. Dig. Papers*, Feb. 2013, pp. 90–92.
- [20] D. Kang, B. Park, D. Kim, J. Kim, Y. Cho, and B. Kim, "Envelope-tracking CMOS power amplifier module for LTE applications," *IEEE Trans. Microw. Theory Techn.*, vol. 61, no. 10, pp. 3763–3773, Oct. 2013.
- [21] K. Oishi et al., "A 1.95 GHz fully integrated envelope elimination and restoration CMOS power amplifier with envelope/phase generator and timing aligner for WCDMA and LTE," in *IEEE ISSCC Tech. Dig. Papers*, Feb. 2014, pp. 60–61.
- [22] S. Jin *et al.*, "A highly efficient CMOS envelope tracking power amplifier using all bias node controls," *IEEE Microw. Compon. Lett.*, vol. 25, no. 8, pp. 517–519, Aug. 2015.
- [23] P. Asbeck and Z. Popovic, "ET comes of age," *IEEE Microw. Mag.*, vol. 17, no. 3, pp. 16–25, Mar. 2016.
- [24] S. Sehajpal, S. S. Taylor, D. J. Allstot, and J. S. Walling, "Impact of switching glitches in class-G power amplifiers," *IEEE Microw. Wireless Compon. Lett.*, vol. 22, no. 6, pp. 282–284, Jun. 2012.
- [25] H.-C. Lu, C.-C. Kuo, S.-A. Wei, P.-S. Huang, and H. Wang, "Ultra broad band CMOS balanced amplifiers using quadrature power splitters on glass integrated passive device (GIPD) and LTCC with flip chip interconnects for SiP integration," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Jun. 2012, pp. 1–3.
- [26] S. Ramakrishnan, L. Calderin, A. Puglielli, E. Alon, A. Niknejad, and B. Nikolić, "A 65 nm CMOS transceiver with integrated active cancellation supporting FDD from 1 GHz to 1.8 GHz at +12.6 dBm TX power leakage," in *Proc. Symp. VLSI*, Jun. 2016, pp. 1–2.
- [27] B. Razavi, *RF Microelectronics*. Englewood Cliffs, NJ, USA: Prentice-Hall, 2011.
- [28] C. J. Galbraith, T. M. Hancock, and G. M. Rebeiz, "A low-loss double-tuned transformer," *IEEE Microw. Wireless Compon. Lett.*, vol. 17, no. 11, pp. 772–774, Nov. 2007.
- [29] E. Kaymaksut and P. Reynaert, "Dual-mode CMOS Doherty LTE power amplifier with symmetric hybrid transformer," *IEEE J. Solid-State Circuits*, vol. 50, no. 9, pp. 1974–1987, Sep. 2015.
- [30] H. Ahn, S. Baek, H. Ryu, I. Nam, and O. Lee, "A highly efficient WLAN CMOS PA with two-winding and single-winding combined transformer," in *Proc. IEEE Radio Freq. Integr. Circuits Symp.*, May 2016, pp. 310–313.
- [31] B. François et al., "A fully integrated watt-level linear 900-MHz CMOS RF power amplifier for LTE-applications," *IEEE Trans. Microw. Theory Techn.*, vol. 60, no. 6, pp. 1878–1885, Jun. 2012.



**Nai-Chung Kuo** (S'09) received the B.S. degree in electrical engineering (with a minor in philosophy) and M.S. degree in communication engineering from National Taiwan University, Taipei, Taiwan, in 2009 and 2011, respectively. He is currently pursuing the Ph.D. degree in electrical engineering at the University of California at Berkeley, Berkeley, CA, USA.

He is involved in the design of microwave and RF circuits and systems and wireless power transfer.

Mr. Kuo was a recipient of the IEEE T-CAS Darlington Best Paper Award in 2017, the IEEE MTT-S Graduate Fellowship in 2016, the Best Design Award from the National Chip Implementation Center in 2012, the First Prize Youth Thesis from the Chinese Institute of Electrical Engineering, the Best Annual Thesis from NTU in 2011, the NTU Presidential Awards, and the Silver Medal from the Asian–Pacific Mathematics Olympiad in 2005. He was a co-recipient of the IET Premium Award in 2015.



**Bonjern Yang** (S'12) received the B.S. degree in electrical engineering and computer sciences from the University of California at Berkeley, Berkeley, CA, USA, where he is currently pursuing the Ph.D. degree in electrical engineering.

His current research interests include RF power amplifiers, reconfigurable RF front ends, and the design and layout automation of analog and mixed-signal circuits.



**Angie Wang** (S'11–GS'12) received the B.S. degree in electrical engineering from the California Institute of Technology, Pasadena, CA, USA, in 2012. She is currently pursuing the Ph.D. degree in electrical engineering with the University of California at Berkeley, Berkeley, CA, USA.

She is involved in mixed-signal/RF integrated circuits and systems for use in consumer electronics. Her current research interests include the design of application-specific integrated circuit and field-programmable gate array hardware generators to ease the implementation of very-large-scale integration signal processing systems, with applications in sensor interfaces, software-defined radio, and beyond.



**Lingkai Kong** (S'07–M'12) received the B.S. degree in mathematics and physics from Tsinghua University, Beijing, China, in 2007, and the Ph.D. degree in electrical engineering from the University of California at Berkeley, Berkeley, CA, USA, in 2012.

He has held internship positions with the Inphi Corporation, Rambus Inc., and Xilinx Inc., where he was involved in various projects, including laser-driver, millimeter-wave front-end, and high-speed link designs. His current research interests include the design and optimization of energy-efficient integrated systems, as well as design automation for large mixed-signal systems.

Dr. Kong was a recipient of the 2012–2013 SSCS Predoctoral Achievement Award, the 2012 James H. Eaton Memorial Scholarship, and the 2011 Analog Devices Outstanding Designer Award. He was a co-recipient of the Best Student Paper Award of the 2011 Symposium on VLSI Circuits.



**Charles Wu** (S'09–M'14) received the B.S. and Ph.D. degrees from the University of California at Berkeley, Berkeley, CA, USA, in 2007 and 2014, respectively.

From 2007 to 2009, he was an Analog Designer with the Broadcom Corporation, Irvine, CA, USA, where he designed analog front-end modules and continuous-time sigma-delta ADCs for DSL cable modems and audio codecs. In 2010, he joined the Agilent Corporation as an Engineering Intern, where he explored novel techniques for high-speed ADC sampling front ends. From 2011 to 2013, he was an Intern with the Intel Corporation, where he designed a next-generation receiver architecture intended for LTE-Advanced. His current research interests include wide-dynamic-range receiver design for software-defined radio applications and high-speed ADC design techniques.

Dr. Wu was a recipient of the Best Young Scientist Paper Award of the European Solid-State Circuits Conference in 2013.



**Vason P. Srini** received the B.E. degree in electrical engineering from the University of Madras, Chennai, Tamil Nadu, India, in 1969, the M.S. degree in electrical engineering from Tennessee Technological University, Cookeville, TN, USA, in 1971, and the Ph.D. degree in computer science from the University of Louisiana, Lafayette, LA, USA, in 1980.

He was with Data Flux Systems Inc., where he was involved in parallel processor chips, integration of radar, camera, ladar, and GPS/IMU for autonomous ground vehicle navigation, autonomous landing of helicopters in unstructured environments, and wireless communication between vehicles. From 2011 to 2016, he was with the Nokia Research Center, Berkeley, CA, USA, focusing on dynamically reconfigurable transceiver research, 94-GHz radar development, integrated radar and camera, and massively parallel embedded processors. He was a Visiting Professor of computer science with Yonsei University, Seoul, South Korea, where he was involved in smartphone video, graphics, and image preprocessing parallel processors, and also at KAIST/ICU, Daejeon, South Korea, where he was involved in mobile computing and autonomous vehicles. He was one of the team leaders of the Berkeley-Sydney Driving Team for the DARPA Urban Challenge and Team Cyberrider for the DARPA Grand Challenge. He was one of the leaders of the Aquarius Project at the University of California (UC) at Berkeley, Berkeley, CA, USA, and developed single-chip solutions for processing Prolog to support inference in AI applications. He was involved in dynamically reconfigurable computing systems to support missile defense applications. He developed dataflow processors, compilers, and signal processing applications. He was also involved in the Cray X-MP and Cray Y-MP. He held faculty positions with UC at Berkeley and with the University of Alabama at Birmingham, Birmingham, AL, USA. His current research interests include the support for LTE-Advanced FDD/TDD transceivers, autonomous aerial systems, and securitized processors.

Dr. Srini was an Editor of the IEEE TRANSACTIONS ON COMPUTERS.



**Elad Alon** (S'02–M'06–SM'12) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 2001, 2002, and 2006, respectively.

In 2007, he joined the University of California at Berkeley, Berkeley, CA, USA, where he is currently a Professor of electrical engineering and computer sciences and the Co-Director of the Berkeley Wireless Research Center. He has held founding, consulting, visiting, or advisory positions at Dragonfly Technology, Lion Semiconductor, Wilocity, Cadence, Xilinx, Oracle, Intel, AMD, Rambus, Hewlett Packard, and IBM Research, where he was involved in digital, analog, and mixed-signal integrated circuits for computing, test and measurement, and high-speed communications. His current research interests include energy-efficient integrated systems, including the circuit, device, communications, and optimization techniques used to design them.

Dr. Alon was a recipient of the IBM Faculty Award in 2008, the 2009 Hellman Family Faculty Fund Award, and the 2010 and 2017 UC at Berkeley Electrical Engineering Outstanding Teaching Awards. He has co-authored papers that received the 2010 ISSCC Jack Raper Award for Outstanding Technology Directions Paper, the 2011 Symposium on VLSI Circuits Best Student Paper Award, and the 2012 and 2013 Custom Integrated Circuits Conference Best Student Paper Awards.



**Borivoje Nikolić** (S'93–M'99–SM'05–F'17) received the Dipl.Ing. and M.Sc. degrees in electrical engineering from the University of Belgrade, Belgrade, Serbia, in 1992 and 1994, respectively, and the Ph.D. degree from the University of California at Davis, Davis, CA, USA, in 1999.

In 1999, he joined the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA, where he is currently a National Semiconductor Distinguished Professor of Engineering. He co-authored *Digital Integrated Circuits: A Design Perspective* (Prentice-Hall, 2003, 2nd ed.). His current research interests include digital, analog, and RF integrated circuit design and very-large-scale integration implementation of communications and signal processing systems.

Dr. Nikolić was a recipient of the NSF CAREER Award in 2003, the College of Engineering Best Doctoral Dissertation Prize, and the Anil K. Jain Prize for the Best Doctoral Dissertation in Electrical and Computer Engineering at the University of California at Davis, in 1999, as well as the City of Belgrade Award for the Best Diploma Thesis in 1992. For work with his students and colleagues, he was the recipient of the Best Paper Awards of the IEEE International Solid-State Circuits Conference, the Symposium on VLSI Circuits, the IEEE International SOI Conference, the European Solid-State Device Research Conference, and the ACM/IEEE International Symposium of Low-Power Electronics. During 2014–2015, he was a Distinguished Lecturer of the IEEE Solid-State Circuits Society.



**Ali M. Niknejad** (S'93–M'00–SM'10–F'13) received the B.S.E.E. degree from the University of California at Los Angeles, Los Angeles, CA, USA, in 1994, and the master's and Ph.D. degrees in electrical engineering from the University of California (UC) at Berkeley, Berkeley, CA, USA, in 1997 and 2000, respectively.

He is currently a Professor with the EECS Department, UC at Berkeley, and a Faculty Director of the Berkeley Wireless Research Center. He is the Co-Founder of HMicro and RF Pixels and an inventor of the REACH Technology. His current research interests include wireless and broadband communications, biomedical imaging and sensors, integrated circuit technology (analog, RF, mixed-signal, and millimeter wave), device physics and compact modeling, and applied electromagnetics.

Prof. Niknejad and his co-authors were the recipients of the 2017 IEEE Transactions on Circuits and Systems Darlington Best Paper Award, the 2017 Most Frequently Cited Paper Award in 2010–2016 of the Symposium on VLSI Circuits, and the CICC 2015 Best Invited Paper Award. He was a recipient of the 2012 ASEE Frederick Emmons Terman Award for his textbook on electromagnetics and RF integrated circuits, the 2013 Jack Kilby Award for Outstanding Student Paper for his work on an efficient quadrature digital spatial modulator at 60 GHz, the 2010 Jack Kilby Award for Outstanding Student Paper for his work on a 90-GHz pulser with 30 GHz of bandwidth for medical imaging, and the Outstanding Technology Directions Paper of ISSCC 2004 for co-developing a modeling approach for devices up to 65 GHz.