
Contents lists available at ScienceDirect

# Engineering Science and Technology, an International Journal

journal homepage: www.elsevier.com/locate/jestch



Full Length Article

# Low power domino logic circuits in deep-submicron technology using CMOS



Sandeep Garg\*, Tarun Kumar Gupta

Maulana Azad National Institute of Technology, Near Mata Mandir, Bhopal 462003, India

#### ARTICLE INFO

Article history: Received 26 November 2017; Revised 10 June 2018; Accepted 16 June 2018; Available online 20 June 2018

Keywords: CMOS; Dynamic; Domino; Footer

#### ABSTRACT

Leakage power and propagation delay are the two major challenges in designing CMOS VLSI circuits in deep sub-micron technology. This paper proposes a novel technique, Foot Driven Stack Transistor Domino Logic (FDSTDL), for designing CMOS domino logic gates with reduced leakage power and improved noise performance. Two-, four-, eight- and sixteen-input OR gates are designed using the existing and proposed techniques. These logic gates are simulated at the PTM 32 nm node using HSPICE (Level = 54) in CMOS technology at a clock frequency of 100 MHz. Simulation results are compared based on power consumption, propagation delay and unity noise gain. The results show that the proposed domino technique has a maximum power reduction of 59.47% compared to the CSK-DL technique and a maximum delay reduction of 44.6% compared to the M-HSCD technique in CMOS technology.

© 2018 Karabuk University. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

# 1. Introduction

Wide fan-in dynamic logic gates are the preferred choice in large memories and high-speed processors because of their higher speed and smaller area compared to static CMOS logic gates [20]. Domino logic achieves high speed owing to its lower noise margin compared to static CMOS logic. A low noise margin also implies an increased sensitivity of domino logic circuits to noise sources. The noise immunity of domino logic circuits can be increased by downscaling the technology. However, this increases the power consumption of the circuit. To reduce the power consumption, the supply voltage is scaled down, which increases the delay of the circuit. To compensate for this delay, the threshold voltage is scaled along with the supply voltage. Reducing the threshold voltage increases the speed of the domino logic but decreases the noise immunity of the circuit because of the increase in sub-threshold leakage current [1,2]. Technology scaling also reduces the gate oxide thickness, which causes an exponential increase in subthreshold and gate leakage current. This leakage current may discharge the precharge node of domino circuits. Therefore, leakage currents, noise sources and low threshold voltage degrade the performance of domino logic circuits at high frequencies [17].

*E-mail address*: sandeepgargvlsi@gmail.com (S. Garg). Peer review under responsibility of Karabuk University.

The average power consumption of a domino logic gate is given by [4]:

$$P_g = P_{short} + P_{switch} + P_{leakage} \tag{1}$$

where P<sub>short</sub> is the short-circuit power consumption due to the shorting of the supply and ground,

P<sub>switch</sub> is the power consumed due to charging and discharging of the load capacitance, and

P<sub>leakage</sub> is the power consumed due to the gate and sub-threshold leakage currents.
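As a rough illustration of Eq. (1), the sketch below sums the three components. The switching term uses the common first-order estimate $\alpha C_L V_{DD}^2 f$, which is an assumption here since the paper does not give an expression for it. The load, supply and clock values follow the simulation setup reported later in the paper (1 fF, 0.9 V, 100 MHz); the short-circuit and leakage values and the activity factor are placeholders, not results from the paper.

```python
def switching_power(c_load, vdd, freq, activity=1.0):
    """First-order dynamic power estimate: alpha * C_L * VDD^2 * f (watts)."""
    return activity * c_load * vdd ** 2 * freq


def gate_power(p_short, p_switch, p_leakage):
    """Eq. (1): total average power of a domino gate (watts)."""
    return p_short + p_switch + p_leakage


# Operating point from the paper's setup: 1 fF load, 0.9 V supply, 100 MHz clock.
p_sw = switching_power(c_load=1e-15, vdd=0.9, freq=100e6)

# Hypothetical short-circuit and leakage contributions, for illustration only.
p_total = gate_power(p_short=10e-9, p_switch=p_sw, p_leakage=50e-9)

print(f"P_switch = {p_sw * 1e9:.1f} nW, P_g = {p_total * 1e9:.1f} nW")
```

With an activity factor of 1 the switching term is an upper bound for a node that toggles every cycle; a real gate would need a measured or estimated activity factor.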

To reduce various components of power consumption, device dimensions and supply voltage are scaled down. Scaling of device leads to an increase in the leakage current due to the unwanted Short Channel Effects (SCEs) [5,19,22]. These short channel effects reduce the effective channel length of the device, which in turn, reduces the threshold voltage of the device.

To reduce the power consumption in domino logic circuits, several techniques have been proposed in previous papers [7,11,13]. All of these techniques are modified forms of the basic Footerless Domino Logic (FLDL) [6] and Footed Domino Logic (FDL) [6]. In these techniques, additional P- and N-channel transistors and delay elements are used to reduce power consumption and propagation delay and to improve the noise immunity of the domino circuits. Section 2 discusses these techniques in detail. Section 4 compares these techniques in terms of power consumption, propagation delay and unity noise gain. This paper proposes a new domino technique for designing high-speed, large fan-in gates in deep sub-micron technology. The proposed technique uses stacked transistors to reduce leakage power and improve noise performance. In the proposed technique, the contention between the keeper and the evaluation network decreases, which reduces power consumption and propagation delay. For large fan-in gates, the proposed circuit improves noise performance by more than 1.4× compared to the existing techniques.

\* Corresponding author.

This paper discusses existing Domino styles in Section 2. Section 3 describes the proposed logic design using CMOS technology. Section 4 compares the proposed circuit with existing domino styles. Section 5 concludes the paper.

# 2. Domino logic styles

For implementing high-speed and high-performance microprocessors, domino logic is preferred over other dynamic logic styles due to lesser area and low power requirements [23]. Domino logic uses only one PMOS transistor in Pull-Up Network (PUN) [19], thus reducing the area and the power consumption compared to dynamic CMOS logic that uses n (No. of inputs) PMOS transistors in PUN. Noise immunity of domino circuit degrades due to the excessive scaling of technology. In addition, subthreshold leakage and gate leakage currents are major challenges in domino circuit design [14].

The first technique proposed for domino logic design was Footerless Domino Logic (FLDL) [6], shown in Fig. 1. In this technique, when the clock is low during the precharge phase, the precharge transistor (P2) turns ON and the dynamic node charges to the supply voltage (VDD) through P2. When the clock goes high in the evaluation phase, the output of the circuit changes according to the inputs applied to the Pull-Down Network (PDN). At this time, the keeper transistor turns ON and connects the dynamic node to the supply, thus preventing any undesirable discharge of the dynamic node due to the charge-sharing problem of the Pull-Down Network [8,9]. Therefore, increasing the size of the keeper transistor improves the robustness of FLDL logic. The keeper ratio [7,13] is given by:

$$K = \frac{W_{\text{keep}}}{W_{\text{eval}}} \tag{2}$$

where  $W_{\rm keep}$  is the width of keeper transistor and  $W_{\rm eval}$  is the width of evaluation transistors. Therefore, on increasing K, the robustness of the domino circuit increases with the increase in power consumption and propagation delay.

Fig. 2 shows the variation of power consumption in the FLDL domino circuit with increasing keeper size. As the size of the keeper increases, power consumption increases due to increased contention between the keeper and the evaluation logic.

Fig. 1. Footerless Domino Logic (FLDL).

Fig. 2. Effect of keeper sizing on power consumption of FLDL domino logic circuit.

The major drawback of the FLDL technique is that when all inputs are low during the evaluation phase, a leakage current flows through the Pull-Down Network (PDN) due to subthreshold and gate tunnelling currents. In the Footed Domino Logic (FDL) [6,15] technique, this leakage current is reduced by inserting a footer transistor N1 in series with the evaluation network, as shown in Fig. 3. The drawback of the FDL technique is that the footer transistor introduces a delay that reduces the speed of the circuit. The robustness of FDL decreases for high fan-in gates [3].

To reduce the delay, current mirror transistors N2 and N3 are inserted in the FDL circuit, as shown in Fig. 4. These transistors reduce the delay but increase the discharging current in the circuit. To stop the dynamic node from discharging, transistor N4 provides a feedback path from the gate of the current mirror to the output of the circuit. When the dynamic node discharges to ground due to noise at the input, transistor N4 connects the gate of the mirror transistors to ground. In evaluation mode, when the clock is high and all the inputs are low, the stacked transistors N1, N2 and the transistors in the evaluation logic decrease the subthreshold current. In this way, the stacked transistor N2 improves the noise immunity of the circuit. This technique is termed Current Mirror Footed Domino (CMFD) [10] logic.



Fig. 3. Footed Domino Logic (FDL).



Fig. 4. Current Mirror Footed Domino Logic (CMFDL).

In the High-Speed Clock Delay (HSCD) [7,11] circuit shown in Fig. 5, transistor N1 is ON at the beginning of precharge mode. Therefore, node N connects to ground through N1. In addition, node G<sub>N</sub> is at a low voltage, which turns off N2. Transistor N1 turns off after a delay equal to the delay of two inverters, so the voltage at node N increases. The transistors in the evaluation logic are sized such that the voltage at node G<sub>N</sub> remains below the V<sub>th</sub> of N2. Therefore, N2 remains off in precharge mode. In evaluation mode, N1 is off, so the voltage at node N increases. This voltage at node N biases the transistors in the evaluation logic, which decreases the leakage current in the evaluation logic and thus reduces the leakage power consumption. The speed of the logic increases on increasing the size of N1, N2 or the evaluation transistors. The voltage at node N decreases with an increase in the size of N1 and increases with an increase in the size of the evaluation transistors. Fig. 6 shows the effect of variation of the evaluation transistor width on the delay of the circuit.

In the HSCD technique, an AND gate G and an NMOS transistor are added to increase the speed of the evaluation logic. This circuit is a modified form of the HSCD technique [7]. Therefore, it is termed Modified HSCD (M-HSCD), shown in Fig. 7. In the M-HSCD technique, the dynamic node discharges when one or more inputs become high during the evaluation mode.

Fig. 5. High-speed Clocked delay Domino Logic (HSCD).

Fig. 6. Effect of Evaluation transistor width on delay.

At this time, the input A to gate G goes high. Input B to the gate G also becomes high after two inverter delays. Therefore, the output of G becomes high turning ON the transistor MD. In this way, the evaluation speed increases. Except for this case, the AND gate output remains zero. A major problem in the circuit is that in precharge phase, the gate of N2 is in high impedance state that causes additional power consumption [12].

In the circuit shown in Fig. 8, the floating-gate problem of the M-HSCD technique has been resolved. In this circuit, two NMOS transistors N2 and N3 are stacked between the dynamic node and ground. These two transistors turn ON according to the ON-OFF condition of the footer transistor N1. Here, the voltage at node N depends on the sizing of the footer transistor and the evaluation transistors: it decreases with an increase in the size of N1 and increases with an increase in the size of the evaluation transistors. The voltage at node N is kept at a minimum value to reduce power consumption. This technique is termed Conditional Evaluation Domino Logic (CEDL) [7].

Fig. 9 shows a domino logic technique in which the voltage at node N is fed back to transistors N2 and N3 for discharging the dynamic node. Here, transistor N2 reduces the voltage at node Q to $V_{DD}$–$V_{th}$, which increases the current through the keeper transistor. Therefore, the robustness of the circuit improves at the cost of degraded noise performance. When the clock goes high, the voltage at node N turns ON transistor N3, which discharges the dynamic node. This technique is referred to as Conditional Stacked Keeper Domino Logic (CSK-DL) [7].

Table 1 shows the differences in various domino logic styles compared to standard footerless domino logic (FLDL) [6] style.

# 3. Proposed domino logic

The literature review of Section 2 discusses various techniques for designing domino logic circuits. This section proposes a new technique for designing domino logic. This technique is termed as Foot Driven Stack Transistor Domino Logic (FDSTDL) shown in Fig. 10.

The circuit has two sections. The input section has a PMOS precharge transistor P1, an evaluation network consisting of NMOS transistors in parallel, and a footer transistor N1. The output section comprises the keeper transistor P2, a static inverter and the stacked NMOS transistors N2 and N3. The inputs to the circuit are applied to the gates of the NMOS transistors in the evaluation network. Transistor N2 is driven by the voltage at the foot N of the evaluation network, whereas transistor N3 is driven by the output voltage. Transistors N2 and N3 are used in a stack configuration. Whenever there is a voltage drop across N1 due to noise pulses, transistor N3 provides a stacking effect by making the gate-to-source voltage of N2 smaller. This reduces the leakage power of N2 and makes N1 conduct less.

Fig. 7. Modified High-speed Clocked delay Domino Logic (M-HSCD).

Fig. 8. Conditional Evaluated Domino Logic (CEDL).

The circuit operates in two phases:

i) Precharge phase: During the precharge phase, the clock is low, so transistor P1 turns ON. The dynamic node charges to $V_{DD}$ and the output of the circuit goes low because of the inverter. This output turns ON the keeper transistor, which holds the dynamic node at a high voltage and prevents any unwanted discharge of the dynamic node due to noise. Because the clock is low, transistor N1 is OFF, which also prevents the dynamic node from discharging. Inputs may be applied at this time, but since N1 is OFF they have no effect on the output.

Fig. 9. Conditional Stacked Keeper Domino Logic (CSK-DL).

**Table 1** Differences in various domino logic styles compared to standard FLDL.

| S. No. | Domino logic style | Comparison with standard Footerless Domino Logic (FLDL) |
|--------|--------------------|---------------------------------------------------------|
| 1 | FDL | A footer transistor N1 is added to the basic FLDL circuit to reduce leakage power by the stacking effect. |
| 2 | CMFD | Current mirror transistors N2 and N3 are added to FDL to reduce delay, and N4 is added to reduce the effect of noise. |
| 3 | HSCD | Transistors N2, N3, and P3 are added to the FDL circuit to maintain a dc bias voltage at foot node N, which reduces the leakage current of the evaluation network. The clock to transistor N1 is delayed by two inverter delay elements. |
| 4 | M-HSCD | It is a modified form of HSCD. An AND gate G and an N-channel transistor M0 are added to HSCD to increase the speed of the evaluation logic. |
| 5 | CEDL | Stacked transistors N2 and N3 are added to basic footed domino logic to maintain the voltage at foot node N in order to reduce leakage current. Node N drives N2 while delayed clock pulses drive N3. |
| 6 | CSK-DL | Transistors N2, N3, N4, P2, and P3 are added to FDL. Transistors N2 and N3 help in discharging the dynamic node during the evaluation phase. |

Now, if any of the inputs to PDN is high, then the voltage of node N will be nearly equal to the voltage at the dynamic node because N1 is OFF in precharge mode. At this time, N2 turns ON while N3 remains OFF due to low voltage at the output. Therefore, leakage power consumption of circuit reduces and noise performance improves.

ii) Evaluation phase: During the Evaluation phase, the clock is high. Therefore, transistor P1 turns off and transistor N1 turns ON. At this time, inputs are applied in the evaluation logic. If any of the inputs in the evaluation logic goes high during this phase, the corresponding N channel MOSFET turns on connecting the dynamic node to ground. Therefore, dynamic node discharges to low voltage and output of the circuit goes high which is in accordance with the logic of OR gate.

Here, transistors N2 and N3 switch ON and OFF alternately, reducing the leakage current in the circuit, and transistor N3 works as the stack transistor. The inverter between the dynamic node and the output introduces a delay between the discharging of the dynamic node and the charging of the output node.
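The two phases can be summarized with a small behavioral model. This is only a logic-level sketch under idealized assumptions (no noise, no leakage, no inverter delay, inputs held constant within a phase); the function name `footed_domino_or` is introduced here for illustration, and the stack transistors N2 and N3 are not modeled because they affect leakage rather than the logic function.

```python
def footed_domino_or(clk, inputs):
    """Ideal footed domino OR gate for one clock phase.

    Returns (dynamic_node, output) as logic levels 0 or 1.
    """
    if clk == 0:
        # Precharge: P1 is ON and footer N1 is OFF, so inputs are ignored.
        dynamic_node = 1
    else:
        # Evaluation: N1 is ON; any high input discharges the dynamic node.
        dynamic_node = 0 if any(inputs) else 1
    output = 1 - dynamic_node  # static inverter between dynamic node and output
    return dynamic_node, output


for clk in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            dyn, out = footed_domino_or(clk, (a, b))
            print(f"clk={clk} in_1={a} in_2={b} -> dyn_node={dyn} out={out}")
```

In the precharge rows the output stays low regardless of the inputs; in the evaluation rows the output follows the OR of the inputs.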

#### 3.1. Transistor sizing

Fig. 10. Proposed Foot Driven Stack Transistor Domino Logic (FDSTDL) (OR gate).

In this paper, the CMFD, CEDL, HSCD, M-HSCD, CSK-DL and proposed techniques are implemented in CMOS technology and simulated in HSPICE at the PTM 32 nm [24] node for comparison. To reduce the propagation delay of the proposed circuit, the width of the footer transistor N1 is taken as 4 $L_{min}$, whereas for the existing circuits it is taken as 2 $L_{min}$. Here, $L_{min}$ is the channel length, which is taken as 32 nm. The width of the N-channel transistors in the evaluation block (connected in parallel) is taken as 2 $L_{min}$. The ratio of the widths of the PMOS and NMOS transistors in the inverter ($W_p/W_n$) is set to 2: the PMOS width is 4 $L_{min}$ and the NMOS width is 2 $L_{min}$. The widths of the keeper transistor (P2) and the precharge transistor (P1) are set to 4 $L_{min}$ and 2 $L_{min}$ respectively. The width of transistor N2 is 6 $L_{min}$ and the width of transistor N3 is 2 $L_{min}$.
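The sizing rules above translate directly into absolute widths. The snippet below is a small bookkeeping sketch: the multipliers and the 32 nm channel length come from this section, while the dictionary keys are labels chosen here for readability.

```python
L_MIN_NM = 32  # channel length at the PTM 32 nm node

# Width multipliers (in units of L_min) quoted in Section 3.1 for the proposed circuit.
WIDTH_MULTIPLIERS = {
    "N1_footer": 4,
    "evaluation_nmos": 2,
    "inverter_pmos": 4,
    "inverter_nmos": 2,
    "P2_keeper": 4,
    "P1_precharge": 2,
    "N2_stack": 6,
    "N3_stack": 2,
}

# Absolute widths in nanometres.
widths_nm = {name: mult * L_MIN_NM for name, mult in WIDTH_MULTIPLIERS.items()}

# Keeper ratio of Eq. (2): keeper width over evaluation-transistor width.
keeper_ratio = widths_nm["P2_keeper"] / widths_nm["evaluation_nmos"]

for name, width in widths_nm.items():
    print(f"{name}: {width} nm")
print(f"keeper ratio K = {keeper_ratio}")
```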

The output waveforms of a 2 input OR gate implemented using the proposed technique are shown in Fig. 11. The simulation is done in HSPICE at the PTM 32 nm [24] technology node with a supply voltage of 0.9 V at a temperature of 25 °C. The output load capacitance is set to 1 fF. The clock frequency is set to 100 MHz (T = 10 ns). Here, in\_1 and in\_2 are the inputs to the OR gate and Dyn\_node is the voltage at the dynamic node of the proposed circuit. The graph shows that the voltage at the dynamic node is opposite to the voltage at the output. When the clock is low, the precharge transistor charges the dynamic node. Therefore, the dynamic node goes high and the output goes low because of the inverter. At this time, the inputs have no effect on the output. As the clock goes high, the output changes in accordance with the applied inputs. In Fig. 11, P denotes the precharge phase and E denotes the evaluation phase of the clock. During the evaluation phase, when any one of the inputs goes high, the dynamic node starts discharging. When the dynamic node has discharged fully, the output of the circuit goes high after some delay due to the inverter. Fig. 12 shows the output waveforms of the proposed circuit at a clock frequency of 1 GHz. It shows that the circuit works well at higher frequencies and that the output is in accordance with the OR gate logic. As the frequency increases, the dynamic power consumption of the circuit increases.

#### 3.2. AND gate using proposed logic

In Fig. 13, a 2 input AND gate is designed using the proposed FDSTDL technique. In the AND gate, the evaluation logic transistors N4 and N5 are connected in series. During the precharge phase, the dynamic node remains high and the output of the circuit remains low. During the evaluation phase, when both inputs (in\_1 and in\_2) become high, transistors N4 and N5 turn ON and the dynamic node is discharged. If any of the inputs is low during the evaluation phase, the dynamic node remains high and the output remains low. The output waveforms of the 2 input AND gate for different combinations of inputs are shown in Fig. 14.

#### 3.3. XOR gate using proposed logic

Fig. 15 shows the schematic of a 2 input XOR gate designed using proposed FDSTDL technique. For XOR operation, the N4-N5 series combination is connected in parallel with the N7-N8 series combination. Two inverters are added in the circuit for obtaining complement of the inputs A and B.

The output of a 2-input XOR gate is represented by the equation:

$$Y = \bar{A}B + A\bar{B}$$

Fig. 11. Transient waveform of 2 input OR gate designed using proposed (FDSTDL) logic at f = 100 MHz (T = 10 ns).

Fig. 12. Transient waveform of 2 input OR gate designed using proposed (FDSTDL) logic at f = 1 GHz (T = 1 ns).

Fig. 13. 2-input AND gate using proposed foot driven stack transistor domino logic.

Fig. 16 shows the output waveforms of 2 input XOR gate designed using the proposed technique. The output waveforms in the figure show that proposed logic is suitable for implementing XOR gate.
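The sum-of-products form of XOR, $\bar{A}B + A\bar{B}$, maps onto two series branches connected in parallel in the pull-down network. The check below enumerates the truth table of that structure. Which branch receives which pair of signals is an assumption made here for illustration; the schematic in Fig. 15 is the authority on the actual wiring.

```python
from itertools import product


def xor_pulldown_conducts(a, b):
    """True when either series branch of the pull-down network conducts."""
    not_a, not_b = 1 - a, 1 - b  # complements produced by the two input inverters
    branch_1 = not_a and b       # first series pair (assumed inputs: A-bar, B)
    branch_2 = a and not_b       # second series pair (assumed inputs: A, B-bar)
    return bool(branch_1 or branch_2)


# During evaluation, a conducting pull-down path discharges the dynamic node
# and the output inverter drives the output high.
truth_table = {
    (a, b): int(xor_pulldown_conducts(a, b))
    for a, b in product((0, 1), repeat=2)
}

for (a, b), y in truth_table.items():
    print(f"A={a} B={b} -> Y={y}")
```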

#### 3.4. Small signal analysis

Fig. 17(a) and (b) shows the small signal equivalent circuit model of PMOS and NMOS transistors respectively. The model is taken from BSIM 4.0 UC Berkeley library [25]. The different parameters used in this model are:

$V_{gs}$ = voltage between gate and source

$V_{bs}$ = voltage between bulk and source

$i_d$ = drain current

$$g_m = \left.\frac{\Delta i_d}{\Delta v_{gs}}\right|_{v_{ds}} = \text{transconductance due to change in } V_{gs}$$

$$g_{mb} = \left.\frac{\Delta i_d}{\Delta v_{bs}}\right|_{v_{ds},\,v_{gs}} = \text{transconductance due to change in } V_{bs}$$

$$C_{gs} = \frac{2}{3} W L C_{ox} + C_{ov} = \text{gate-source capacitance}$$

$C_{gb}$ = bulk-gate capacitance

$C_{gd}$ = gate-drain capacitance

$C_{sb}$ = source-bulk capacitance

$C_{db}$ = drain-bulk capacitance

$1/r_o$ = output conductance

To obtain the frequency response of the proposed circuit, AC analysis of the 2 input OR gate is performed by varying the frequency from 1 Hz to 5 GHz. One input of the OR gate is held at 0.9 V (logic high, about −0.91 dB relative to 1 V) while the other input is kept low. The magnitude and phase plots of the output are shown in Fig. 18. The figure shows that the proposed circuit has a constant gain of −100 dB over a large 3-dB bandwidth of 1.0037 GHz.
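The decibel figures quoted above follow the usual voltage convention, $20\log_{10}$ of a ratio. The helper functions below are a minimal sketch of that conversion; referencing the 0.9 V input level to 1 V is an assumption consistent with the −0.91 dB figure in the text.

```python
import math


def to_db(ratio):
    """Convert a voltage ratio to decibels."""
    return 20.0 * math.log10(ratio)


def from_db(db):
    """Convert decibels to a voltage ratio."""
    return 10.0 ** (db / 20.0)


input_level_db = to_db(0.9)   # 0.9 V input expressed relative to 1 V
gain_ratio = from_db(-100.0)  # linear voltage gain corresponding to -100 dB

print(f"0.9 V -> {input_level_db:.2f} dB")
print(f"-100 dB -> gain of {gain_ratio:.1e}")
```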

# 4. Results and discussion

The proposed FDSTDL technique and the existing domino techniques discussed in Section 2 are simulated at the 32 nm Predictive Technology Model (PTM) node using HSPICE. Two-, four-, eight- and sixteen-input OR gates are implemented using the existing and proposed techniques. Various metrics are used to measure the robustness and noise immunity of domino logic gates. The metrics used in this paper are:



Fig. 14. Output waveforms of 2 input AND gate using Proposed Logic.



Fig. 15. 2 input XOR gate using proposed Logic.



Fig. 16. Output waveforms of 2 input XOR gate using proposed technique.



Fig. 17. (a) Small signal equivalent circuit model of PMOS (b) Small signal equivalent circuit model of NMOS used in this paper.



Fig. 18. Frequency response of the proposed 2 input OR gate.

#### i) Average power

Average Power ( $P_{av}$ ) is calculated by transient analysis of the circuit. In this paper, transient analysis is done for a time span of 100 ns. The frequency of the clock is set to 100 MHz for transient analysis. Pulses of 0.9 V amplitude are applied to all the inputs of the evaluation logic.
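If the supply current from a transient run is exported as sampled data, the average power can be estimated as the time average of $V_{DD} \cdot i(t)$. The function below is a sketch of that post-processing step using trapezoidal integration; it is not the paper's measurement script, and the sample values in the usage lines are made up for illustration.

```python
def average_power(times, supply_currents, vdd):
    """Time-averaged supply power (watts) via trapezoidal integration."""
    energy = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        i_mid = 0.5 * (supply_currents[k] + supply_currents[k - 1])
        energy += vdd * i_mid * dt
    return energy / (times[-1] - times[0])


# Hypothetical samples over a 100 ns window (ten 100 MHz clock periods).
times = [0.0, 25e-9, 50e-9, 75e-9, 100e-9]
currents = [1e-6, 1e-6, 1e-6, 1e-6, 1e-6]  # constant 1 uA, for a simple check
p_av = average_power(times, currents, vdd=0.9)
print(f"P_av = {p_av * 1e6:.3f} uW")
```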

#### ii) Propagation delay

Propagation delay $(t_p)$ of the circuit is the time taken by a signal to propagate from input to output; it determines the speed of the circuit. The delay is measured when the circuit enters the evaluation phase, at which point the inputs are applied and the output is observed. To calculate the delay, one of the input signals is superimposed on the output signal. It is calculated as [12,21]:

$$t_p = \frac{t_{pHH} + t_{pLL}}{2} \tag{3}$$

where  $t_{pHH}$  is the delay between 50% level of rising input and output and  $t_{pLL}$  is the delay between 50% level of falling input and output.
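The 50% crossing measurement can be sketched on sampled waveforms as follows. This is an illustrative post-processing routine, not the paper's measurement setup: `crossing_time` and `propagation_delay` are names introduced here, and linear interpolation between samples is an assumption.

```python
def crossing_time(times, values, level, rising):
    """Linearly interpolated time of the first crossing of `level`."""
    for k in range(1, len(times)):
        v0, v1 = values[k - 1], values[k]
        crosses = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crosses:
            frac = (level - v0) / (v1 - v0)
            return times[k - 1] + frac * (times[k] - times[k - 1])
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(t_in_rise, t_out_rise, t_in_fall, t_out_fall):
    """Eq. (3): mean of the rising-edge and falling-edge delays."""
    t_phh = t_out_rise - t_in_rise
    t_pll = t_out_fall - t_in_fall
    return 0.5 * (t_phh + t_pll)


# Toy ramp from 0 V to 0.9 V between t = 0 and t = 1 (arbitrary time units).
t_half = crossing_time([0.0, 1.0, 2.0], [0.0, 0.9, 0.9], level=0.45, rising=True)

# Hypothetical edge times in seconds: 30 ps rising delay, 36 ps falling delay.
t_p = propagation_delay(0.0, 30e-12, 5e-9, 5e-9 + 36e-12)
```

In practice the four edge times would come from `crossing_time` applied to the input and output waveforms at half the supply voltage.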

#### iii) Unity noise gain

Unity Noise Gain (UNG) determines the performance of circuit under noise. For calculating UNG, a small portion of the supply voltage is applied as a pulse of very short duration to the input terminals of evaluation logic. The amplitude of input pulse is increased until the amplitude of output pulse becomes equal to input noise pulse while keeping the width of input pulse constant. This amplitude of the input pulse is the required Unity Noise Gain (UNG). In this paper, a noise pulse of width 50 ps is applied at the input and its amplitude is varied to obtain the required UNG [4,6,7,16]. UNG is calculated as:

$$UNG = \{\, V_{in(noise)} : V_{in(noise)} = V_{out} \,\} \tag{4}$$
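The amplitude sweep described above can be sketched as a simple search loop. Everything circuit-specific here is hypothetical: `toy_gate` is a made-up stand-in for a transient simulation that returns the output pulse amplitude for a fixed-width (50 ps) input noise pulse, and its trip point and slope are arbitrary. Only the search procedure mirrors the text.

```python
import math


def unity_noise_gain(output_peak, vdd, step=1e-3):
    """Smallest swept noise amplitude whose output peak reaches the input amplitude.

    `output_peak` is a callable mapping input noise amplitude (V) to output
    pulse amplitude (V) at a fixed pulse width.
    """
    n_steps = int(round(vdd / step))
    for k in range(1, n_steps + 1):
        amplitude = k * step
        if output_peak(amplitude) >= amplitude:
            return amplitude
    return None  # output never reaches the input amplitude within 0..VDD


def toy_gate(amplitude, vdd=0.9, trip=0.5, slope=0.03):
    """Hypothetical gate response; a real flow would run a circuit simulation."""
    return vdd / (1.0 + math.exp(-(amplitude - trip) / slope))


ung = unity_noise_gain(toy_gate, vdd=0.9)
print(f"UNG of the toy model: {ung:.3f} V")
```

A linear sweep matches the procedure in the text; a bisection search would need fewer simulations if the gate response is monotonic in the noise amplitude.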

#### iv) Power delay product

Power Delay Product (PDP) shows energy consumed in a switching operation. It is given by [18]:

$$PDP = P_{av} \cdot t_{p} \tag{5}$$

#### v) Standby power

Standby power is the power consumed by the circuit when all the inputs to the circuit are set to zero.

#### vi) Energy delay product

Energy Delay Product (EDP) shows the energy efficiency of the circuit in performing the logical operation. EDP is given by:

$$EDP = PDP \cdot t_{p} \tag{6}$$
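Eqs. (5) and (6) are straightforward to evaluate once power and delay are in SI units. The sketch below uses the average power and delay reported for the proposed 2 input OR gate (0.6003 µW and 32.85 ps); note that µW × ps gives $10^{-18}$ J, and multiplying by the delay again gives joule-seconds.

```python
def power_delay_product(p_av_watts, t_p_seconds):
    """Eq. (5): PDP in joules."""
    return p_av_watts * t_p_seconds


def energy_delay_product(p_av_watts, t_p_seconds):
    """Eq. (6): EDP in joule-seconds."""
    return power_delay_product(p_av_watts, t_p_seconds) * t_p_seconds


# Proposed FDSTDL, 2 input OR gate: 0.6003 uW average power, 32.85 ps delay.
p_av, t_p = 0.6003e-6, 32.85e-12
pdp = power_delay_product(p_av, t_p)
edp = energy_delay_product(p_av, t_p)

print(f"PDP = {pdp * 1e18:.2f} aJ, EDP = {edp * 1e27:.2f} x 1e-27 J*s")
```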

#### vii) Figure of merit

The Figure of Merit (FOM) is a quantity, which measures the efficiency and effectiveness of the device. It can be calculated as:

$$FOM = \frac{UNG_{Norm}}{P_{diss\_Norm} \cdot t_{p\_Norm} \cdot A_{Norm}} \tag{7}$$

where UNG<sub>Norm</sub>, P<sub>diss\_Norm</sub>, t<sub>p\_Norm</sub>, and A<sub>Norm</sub> are normalized values of UNG, average power, propagation delay and area respectively.
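A possible way to evaluate Eq. (7) is shown below. The text does not state what the quantities are normalized to, so dividing each by the corresponding value of a chosen reference design is an assumption of this sketch, and all numbers in the usage lines are placeholders rather than results from the paper.

```python
def figure_of_merit(ung, power, delay, area, reference):
    """Eq. (7), with each quantity normalized to a reference design (assumed)."""
    ung_norm = ung / reference["ung"]
    p_norm = power / reference["power"]
    t_norm = delay / reference["delay"]
    a_norm = area / reference["area"]
    return ung_norm / (p_norm * t_norm * a_norm)


# Placeholder values for illustration only.
reference = {"ung": 0.4, "power": 1.0, "delay": 40.0, "area": 1.0}
fom = figure_of_merit(ung=0.5, power=0.8, delay=32.0, area=1.0, reference=reference)
print(f"FOM relative to the reference design: {fom:.3f}")
```

With this normalization the reference design itself has a figure of merit of 1, so values above 1 indicate a better overall trade-off.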

Table 2 shows that the proposed (FDSTDL) technique has lower power consumption and propagation delay than the existing techniques. For a 2 input OR gate, the maximum reduction in power is 59.43% compared to the CSK-DL technique and the maximum reduction in propagation delay is 36.23% compared to the M-HSCD technique. The proposed technique uses fewer transistors than the other existing techniques, which reduces its power consumption. In addition, transistors N2 and N3 provide a stacking effect: when any of the series-connected (stacked) transistors turns off, the subthreshold leakage current through the stack is reduced. The table also shows that the proposed circuit has a lower power-delay product and energy-delay product than the existing techniques because of its lower power consumption and smaller propagation delay.

Table 3 shows that for a 4 input OR gate, the maximum reduction in power is 59.47% compared to CSK-DL technique and maximum reduction in propagation delay is 31.96% compared to M-HSCD technique.

Table 4 shows that for an 8 input OR gate, the maximum reduction in power is 57.22% compared to the CSK-DL technique and the maximum reduction in propagation delay is 29.64% compared to the M-HSCD technique.

**Table 2** Comparison of various domino topologies based on power, delay, PDP and EDP for a 2 input OR gate.

| Topology | Average power (µW) | Propagation delay (ps) | Power-delay product (PDP) (aJ) | Energy-delay product (EDP) ($\times 10^{-27}$ J·s) |
|-------------------|--------|-------|-------|------|
| CMFD              | 0.7468 | 43.99 | 32.85 | 1.45 |
| HSCD              | 0.7830 | 38.3  | 29.98 | 1.15 |
| M-HSCD            | 1.19   | 51.52 | 61.30 | 3.16 |
| CEDL              | 0.8490 | 38.77 | 32.91 | 1.28 |
| CSK-DL            | 1.48   | 39.87 | 59.00 | 2.35 |
| Proposed (FDSTDL) | 0.6003 | 32.85 | 19.71 | 0.65 |

**Table 3** Comparison of various domino topologies based on power, delay, PDP and EDP for a 4 input OR gate.

| Topology | Average power (µW) | Propagation delay (ps) | Power-delay product (PDP) (aJ) | Energy-delay product (EDP) ($\times 10^{-27}$ J·s) |
|-------------------|--------|-------|-------|------|
| CMFD              | 0.7587 | 46.11 | 34.98 | 1.61 |
| HSCD              | 0.7964 | 37.14 | 29.57 | 1.1  |
| M-HSCD            | 1.21   | 49.21 | 59.54 | 2.93 |
| CEDL              | 0.868  | 40.24 | 34.92 | 1.41 |
| CSK-DL            | 1.523  | 40.57 | 61.78 | 2.51 |
| Proposed (FDSTDL) | 0.6172 | 33.48 | 20.66 | 0.69 |

**Table 4** Comparison of various domino topologies based on power, delay, PDP and EDP for an 8 input OR gate.

| Topology | Average power (µW) | Propagation delay (ps) | Power-delay product (PDP) (aJ) | Energy-delay product (EDP) ($\times 10^{-27}$ J·s) |
|-------------------|--------|-------|-------|------|
| CMFD              | 0.7932 | 50.97 | 40.42 | 2.06 |
| HSCD              | 0.8416 | 38.42 | 32.33 | 1.24 |
| M-HSCD            | 1.24   | 49.39 | 61.24 | 3.02 |
| CEDL              | 0.8920 | 40.26 | 35.91 | 1.45 |
| CSK-DL            | 1.606  | 41.35 | 66.40 | 2.75 |
| Proposed (FDSTDL) | 0.6869 | 34.75 | 23.86 | 0.83 |

**Table 5** Comparison of various domino topologies based on power, delay, PDP and EDP for a 16 input OR gate.

| Topology | Average power (µW) | Propagation delay (ps) | Power-delay product (PDP) (aJ) | Energy-delay product (EDP) ($\times 10^{-27}$ J·s) |
|-------------------|--------|-------|-------|------|
| CMFD              | 1.141  | 62.47 | 71.27 | 4.45 |
| HSCD              | 1.21   | 42.57 | 51.50 | 2.19 |
| M-HSCD            | 1.45   | 51.74 | 75.02 | 3.88 |
| CEDL              | 1.04   | 45.27 | 47.08 | 2.13 |
| CSK-DL            | 1.857  | 46.80 | 86.90 | 4.07 |
| Proposed (FDSTDL) | 0.8550 | 40.94 | 35.00 | 1.43 |

The results in Table 5 show that, for a 16 input OR gate, the maximum reduction in power is 53.95% compared to the CSK-DL technique and the maximum reduction in propagation delay is 20.87% compared to the M-HSCD technique. Tables 1–4 show that the delay of the circuits is in the ps range while the clock period is in the ns range (10 ns at 100 MHz). The delay is therefore a small fraction of the clock period and has little impact on circuit operation. In addition, the above tables show that PDP and EDP are lowest for the proposed technique in all the OR gates. This shows that the proposed technique is effective in reducing power and is energy efficient.
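As a cross-check of the derived columns, the following minimal Python sketch recomputes PDP, EDP and the percentage reductions from the power and delay values of Table 5. It assumes PDP is average power times propagation delay (µW × ps gives units of $10^{-18}$ J) and EDP is PDP times delay; the function names are illustrative and not taken from the paper.

```python
# Values copied from Table 5 (16 input OR gate): (average power in µW, delay in ps).
TABLE5 = {
    "CMFD": (1.141, 62.47),
    "HSCD": (1.21, 42.57),
    "M-HSCD": (1.45, 51.74),
    "CEDL": (1.04, 45.27),
    "CSK-DL": (1.857, 46.80),
    "FDSTDL": (0.8550, 40.94),
}


def pdp_aj(power_uw, delay_ps):
    """Power-delay product in attojoules (1 µW x 1 ps = 1e-18 J)."""
    return power_uw * delay_ps


def edp_e27(power_uw, delay_ps):
    """Energy-delay product in units of 1e-27 J*s."""
    # PDP [1e-18 J] x delay [1e-12 s] = 1e-30 J*s, so divide by 1000.
    return pdp_aj(power_uw, delay_ps) * delay_ps / 1000.0


def reduction_percent(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


p_prop, d_prop = TABLE5["FDSTDL"]
power_saving = reduction_percent(TABLE5["CSK-DL"][0], p_prop)
delay_saving = reduction_percent(TABLE5["M-HSCD"][1], d_prop)

print(f"PDP = {pdp_aj(p_prop, d_prop):.2f} aJ")
print(f"EDP = {edp_e27(p_prop, d_prop):.2f} x 1e-27 J*s")
print(f"Power reduction vs CSK-DL: {power_saving:.2f} %")
print(f"Delay reduction vs M-HSCD: {delay_saving:.2f} %")
```

Under these assumptions the recomputed values agree with the tabulated PDP and EDP and with the quoted reductions to within rounding.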

Fig. 19 compares the existing domino logic topologies with the proposed technique based on normalized leakage power consumption. The comparison is done for 2, 4, 8 and 16 input OR gates. Fig. 19 shows that the proposed technique is effective in reducing leakage power consumption.

Fig. 20 compares the normalized values of standby power consumption for different domino logic circuits. To calculate standby power, all inputs to the evaluation logic are set to zero. For simulation of the proposed and existing circuits, clock pulses of 0.9 V amplitude and 100 MHz frequency are applied. In Table 6, different domino techniques are compared based on standby power consumption. The table shows that the proposed technique (FDSTDL) has the lowest standby power consumption because it uses the fewest transistors.

Table 7 compares the Unity Noise Gain (UNG) of different existing domino logic circuits with that of the proposed domino logic. A higher value of UNG indicates better noise immunity. To calculate UNG, a noise pulse of varying amplitude and constant width is applied at the input. The results show that the proposed technique has higher noise immunity than the existing domino techniques.

Fig. 21 compares different domino techniques for 2, 4, 8 and 16 input OR gates based on normalized values of UNG. The proposed circuit improves UNG by $1.07\times$ to $1.45\times$ compared to the existing techniques.
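The quoted improvement range can be reproduced from the UNG values of Table 7. The sketch below is illustrative only: it takes the ratio of the proposed UNG to each existing technique's UNG for every gate size and reports the smallest and largest ratio. The variable names are not from the paper.

```python
# UNG values copied from Table 7, ordered as OR2, OR4, OR8, OR16.
SIZES = ["OR2", "OR4", "OR8", "OR16"]
EXISTING = {
    "CMFD": [0.711, 0.615, 0.602, 0.558],
    "HSCD": [0.598, 0.546, 0.529, 0.503],
    "M-HSCD": [0.649, 0.562, 0.546, 0.516],
    "CEDL": [0.612, 0.531, 0.512, 0.484],
    "CSK-DL": [0.542, 0.477, 0.462, 0.429],
}
PROPOSED = [0.762, 0.671, 0.655, 0.620]

# One (ratio, technique, gate size) tuple per comparison.
ratios = [
    (proposed / existing, name, size)
    for name, values in EXISTING.items()
    for size, proposed, existing in zip(SIZES, PROPOSED, values)
]

lowest = min(ratios)    # smallest improvement over any existing technique
highest = max(ratios)   # largest improvement over any existing technique

print(f"Minimum improvement: {lowest[0]:.2f}x vs {lowest[1]} ({lowest[2]})")
print(f"Maximum improvement: {highest[0]:.2f}x vs {highest[1]} ({highest[2]})")
```

The minimum ratio occurs against CMFD for the 2 input OR gate and the maximum against CSK-DL for the 16 input OR gate.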

The Figure of Merit (FOM) of the various domino logic techniques is calculated using equation (7) from normalized values of UNG, area, propagation delay and power dissipation. Table 8 shows the FOM and area of 2, 4, 8 and 16 input OR gates for different domino techniques. The table shows that the proposed circuit has a higher FOM than the existing circuits. FOM reflects the overall performance of a circuit in terms of noise, power and propagation delay. The area of the proposed circuit is larger than that of the CMFD, HSCD and CEDL techniques due to the large size of the footer transistor in the proposed domino logic. Fig. 22 compares the existing and proposed domino techniques based on normalized FOM. The figure shows that the proposed technique has 5.56× higher



Fig. 19. Comparison of Normalized leakage power consumption of various domino techniques for 2, 4, 8 and 16 input OR gates. Normalization is done with respect to CSK-DL technique.



Fig. 20. Comparison of the Normalized stand-by power of various domino techniques for 2, 4, 8 and 16 input OR gates. Normalization is done with respect to CSK-DL technique.

**Table 6** Comparison of various domino topologies based on standby power consumption (µW) for 2, 4, 8 and 16 input OR gates.

| Topology                            | OR2    | OR4    | OR8    | OR16   |
|-------------------------------------|--------|--------|--------|--------|
| CMFD                                | 0.0748 | 0.0787 | 0.0766 | 0.0813 |
| HSCD                                | 0.0692 | 0.0911 | 0.0801 | 0.1116 |
| M-HSCD                              | 0.0898 | 0.1090 | 0.0971 | 0.1315 |
| CEDL                                | 0.0755 | 0.0955 | 0.0823 | 0.1320 |
| CSK-DL                              | 0.1091 | 0.1264 | 0.1154 | 0.1501 |
| Proposed (FDSTDL)                   | 0.0206 | 0.039  | 0.0277 | 0.0619 |

**Table 7** Comparison of various domino topologies based on UNG for 2, 4, 8 and 16 input OR gates.

| Topology          | OR2   | OR4   | OR8   | OR16  |
|-------------------|-------|-------|-------|-------|
| CMFD              | 0.711 | 0.615 | 0.602 | 0.558 |
| HSCD              | 0.598 | 0.546 | 0.529 | 0.503 |
| M-HSCD            | 0.649 | 0.562 | 0.546 | 0.516 |
| CEDL              | 0.612 | 0.531 | 0.512 | 0.484 |
| CSK-DL            | 0.542 | 0.477 | 0.462 | 0.429 |
| Proposed (FDSTDL) | 0.762 | 0.671 | 0.655 | 0.620 |

FOM than the CSK-DL technique for the 2 input OR gate and $4.18\times$ higher FOM for the 16 input OR gate.
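The FOM calculation can be illustrated with a short Python sketch for the 16 input OR gate. Equation (7) is not reproduced in this section, so the sketch assumes the form $\mathrm{FOM} = \mathrm{UNG}_n / (P_n \times D_n \times A_n)$, where each quantity is normalized to the CSK-DL technique. This is an assumption, and the function and variable names are hypothetical. The inputs are taken from Tables 5, 7 and 8.

```python
# 16 input OR gate data: (power in µW, delay in ps, UNG, area in 1e-15 m^2).
# Power and delay from Table 5, UNG from Table 7, area from Table 8.
OR16 = {
    "CMFD": (1.141, 62.47, 0.558, 25.06),
    "HSCD": (1.21, 42.57, 0.503, 28.13),
    "M-HSCD": (1.45, 51.74, 0.516, 35.30),
    "CEDL": (1.04, 45.27, 0.484, 29.15),
    "CSK-DL": (1.857, 46.80, 0.429, 34.27),
    "FDSTDL": (0.8550, 40.94, 0.620, 29.41),
}
REFERENCE = "CSK-DL"  # normalization reference used in the figures


def figure_of_merit(entry, reference):
    """Assumed FOM: normalized UNG over normalized power x delay x area."""
    power, delay, ung, area = entry
    ref_power, ref_delay, ref_ung, ref_area = reference
    ung_n = ung / ref_ung
    cost_n = (power / ref_power) * (delay / ref_delay) * (area / ref_area)
    return ung_n / cost_n


fom = {name: figure_of_merit(vals, OR16[REFERENCE]) for name, vals in OR16.items()}
for name, value in fom.items():
    print(f"{name:8s} FOM = {value:.2f}")
```

With these inputs the assumed form matches the OR16 FOM column of Table 8 to two decimal places, which suggests it is consistent with equation (7), although it should not be read as the authors' exact definition.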

Table 9 shows the simulation results for a 2-input AND gate using existing and proposed techniques. Leakage power consumption is lower for AND gates than for OR gates because the transistors in the evaluation logic of an AND gate are connected in series, which reduces the leakage current between the dynamic node and ground. In addition, the delay of the AND gate is higher than that of the OR gate due to the stacking of transistors N4-N5 in the evaluation logic. The table shows that the proposed 2-input AND gate has lower power consumption, propagation delay, PDP and EDP than the existing techniques. The proposed AND gate also has a higher UNG and FOM than the existing techniques.

Table 10 shows the simulation results for a 2-input XOR gate using existing and proposed techniques. The power of the proposed XOR gate is higher than that of the proposed OR gate due to the additional inverters and transistors used in the XOR gate. The additional inverters also increase the overall delay of the circuit. The table shows that the proposed XOR gate has lower power consumption, propagation delay, PDP and EDP than the existing techniques. The proposed XOR gate also has a higher UNG and FOM than the existing techniques.

### 4.1. Monte Carlo analysis

To verify reliability and performance, the proposed circuit is tested against variation in oxide thickness (process), temperature and voltage. The results for process variation are obtained by Monte-Carlo analysis of the proposed circuit. Table 11 shows the mean ($\mu$), standard deviation ($\sigma$) and variability ($\sigma/\mu$) of the power consumption of the 8-input proposed domino circuit obtained using Monte-Carlo simulation. For this purpose, ten thousand random samples (values) of oxide thickness ($t_{ox}$) are chosen in



Fig. 21. Comparison of Normalized UNG of various domino techniques for 2, 4, 8 and 16 input OR gates. Normalization is done with respect to CSK-DL technique.

**Table 8** Comparison of various domino topologies based on Area and FOM for 2, 4, 8 and 16 input OR gates.

| Topology          | OR2 |     |     | OR4 |     |     | OR8 |     |     | OR16 |     |     |
|-------------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|-----|-----|
|                   | Area<br>(×10<sup>-15</sup> m<sup>2</sup>) | FOM | No. of<br>transistors | Area<br>(×10<sup>-15</sup> m<sup>2</sup>) | FOM | No. of<br>transistors | Area<br>(×10<sup>-15</sup> m<sup>2</sup>) | FOM | No. of<br>transistors | Area<br>(×10<sup>-15</sup> m<sup>2</sup>) | FOM | No. of<br>transistors |
| CMFD              | 10.72                                        | 4.38 | 10                    | 12.77                                        | 3.92 | 12                    | 16.86                                        | 3.31 | 16                    | 25.06                                        | 2.17 | 20                    |
| HSCD              | 13.79                                        | 3.14 | 14                    | 15.84                                        | 3.32 | 16                    | 19.94                                        | 3.08 | 20                    | 28.13                                        | 2.41 | 24                    |
| M-HSCD            | 20.96                                        | 1.10 | 21                    | 23.01                                        | 1.17 | 23                    | 27.10                                        | 1.23 | 27                    | 35.30                                        | 1.35 | 31                    |
| CEDL              | 14.82                                        | 2.72 | 15                    | 16.86                                        | 2.57 | 17                    | 20.96                                        | 2.55 | 21                    | 29.15                                        | 2.45 | 25                    |
| CSK-DL            | 19.94                                        | 1.00 | 20                    | 21.98                                        | 1.00 | 22                    | 26.08                                        | 1.00 | 26                    | 34.27                                        | 1.00 | 30                    |
| Proposed (FDSTDL) | 15.07                                        | 5.56 | 9                     | 17.12                                        | 5.40 | 11                    | 21.22                                        | 4.85 | 15                    | 29.41                                        | 4.18 | 19                    |



Fig. 22. Comparison of normalized FOM of various domino techniques for 2, 4, 8 and 16 input OR gates. Normalization is done with respect to CSK-DL technique.

**Table 9** Comparison of various domino topologies based on average power, propagation delay, PDP, EDP, UNG and FOM for a 2 input AND gate.

| Topology          | Average power (µW) | Propagation delay (ps) | Power-delay product (PDP) (aJ) | Energy-delay product (EDP) ($\times 10^{-27}$ J·s) | UNG   | FOM  |
|-------------------|--------------------|------------------------|---------------------------|----------------------------------------------------------|-------|------|
| CMFD              | 0.6302             | 104.21                 | 65.67                     | 6.84                                                     | 0.624 | 4.23 |
| HSCD              | 0.6641             | 93.62                  | 62.17                     | 5.82                                                     | 0.511 | 2.91 |
| M-HSCD            | 1.0346             | 119.54                 | 123.68                    | 14.78                                                    | 0.557 | 1.05 |
| CEDL              | 0.7168             | 89.76                  | 64.34                     | 5.78                                                     | 0.538 | 2.78 |
| CSK-DL            | 1.2625             | 94.02                  | 118.70                    | 11.16                                                    | 0.482 | 1    |
| Proposed (FDSTDL) | 0.5152             | 79.11                  | 40.76                     | 3.22                                                     | 0.671 | 5.39 |

**Table 10** Comparison of various domino topologies based on average power, propagation delay, PDP, EDP, UNG and FOM for a 2 input XOR gate.

| Topology          | Average power (µW) | Propagation delay (ps) | Power-delay product (PDP) (aJ) | Energy-delay product (EDP) ($\times 10^{-27}$ J·s) | UNG   | FOM  |
|-------------------|--------------------|------------------------|---------------------------|----------------------------------------------------------|-------|------|
| CMFD              | 0.9455             | 134.21                 | 126.90                    | 17.03                                                    | 0.577 | 3.22 |
| HSCD              | 1.0426             | 114.43                 | 119.30                    | 13.65                                                    | 0.468 | 2.39 |
| M-HSCD            | 1.4244             | 145.35                 | 207.04                    | 30.09                                                    | 0.503 | 1.11 |
| CEDL              | 1.1163             | 117.11                 | 130.73                    | 15.31                                                    | 0.491 | 2.18 |
| CSK-DL            | 1.7685             | 121.04                 | 214.06                    | 25.91                                                    | 0.449 | 1    |
| Proposed (FDSTDL) | 0.7966             | 91.11                  | 72.58                     | 6.61                                                     | 0.614 | 4.87 |

**Table 11** Results for Monte-Carlo analysis of the proposed circuit.

| Parameters                                   | Proposed (FDSTDL) |
|----------------------------------------------|-------------------|
| Mean ( $\mu$ ) (in $\mu$ W)                  | 8.25              |
| Standard deviation ( $\sigma$ ) (in $\mu$ W) | 0.178             |
| Variability ( $\sigma$ / $\mu$ )             | 0.021             |

the range $\pm 10\%$ of the nominal value for the different transistors in the circuit. For a $\pm 10\%$ variation in $t_{ox}$ around its nominal value, the proposed circuit shows a small standard deviation (0.178 µW) and low variability ($\sigma/\mu = 0.021$). This implies that the proposed FDSTDL is robust against oxide-thickness variation.
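The statistics reported in Table 11 can be illustrated with a minimal Python sketch of the Monte-Carlo procedure. The paper obtains each sample from an HSPICE run. Here a hypothetical linear power model (`toy_power_uw`), an assumed uniform sampling distribution and an assumed nominal oxide thickness stand in for the simulator, so the numbers it produces are not the paper's results. Only the computation of $\mu$, $\sigma$ and $\sigma/\mu$ mirrors the text.

```python
import random
import statistics

NOMINAL_TOX_NM = 1.0      # hypothetical nominal oxide thickness (assumption)
NOMINAL_POWER_UW = 8.25   # mean from Table 11, used only to scale the toy model


def toy_power_uw(tox_nm):
    """Hypothetical stand-in for one circuit simulation; NOT the paper's model.

    The sensitivity coefficient 0.4 is arbitrary and chosen for illustration.
    """
    return NOMINAL_POWER_UW * (1.0 - 0.4 * (tox_nm / NOMINAL_TOX_NM - 1.0))


def monte_carlo(n_samples=10_000, spread=0.10, seed=0):
    """Sample t_ox within +/- spread of nominal and summarize the power."""
    rng = random.Random(seed)
    powers = []
    for _ in range(n_samples):
        # Uniform sampling is an assumption; the distribution is not stated.
        tox = NOMINAL_TOX_NM * (1.0 + rng.uniform(-spread, spread))
        powers.append(toy_power_uw(tox))
    mean = statistics.fmean(powers)
    sigma = statistics.stdev(powers)
    return mean, sigma, sigma / mean


mean, sigma, variability = monte_carlo()
print(f"mean = {mean:.3f} uW, sigma = {sigma:.3f} uW, sigma/mean = {variability:.4f}")
```

The variability in Table 11 follows the same definition: $0.178 / 8.25 \approx 0.022$, which the table lists as 0.021.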

#### 5. Conclusion

This paper proposes a novel approach for designing domino logic circuits with reduced leakage power consumption and improved noise performance by stacking NMOS transistors. The proposed design is compared with existing domino techniques on the basis of power, delay, UNG and FOM. The proposed circuit has a maximum power reduction of 59.47% compared with the CSK-DL technique and a maximum delay reduction of 44.6% compared with the M-HSCD technique in CMOS technology. Noise immunity is compared based on UNG. For the proposed circuit, the maximum improvement in noise immunity is $1.45\times$ compared to the CSK-DL technique and the minimum improvement is $1.07\times$ compared to the CMFD technique. The FOM of the proposed technique is $4.18\times$ that of the CSK-DL technique, to which FOM is normalized, for the 16 input OR gate. Based on this comparison, the proposed technique is more efficient than the existing techniques in terms of power consumption, propagation delay and noise immunity.

#### References

- [1] M. Anis, S. Areibi, M. Elmasry, Design and optimization of multi-threshold CMOS (MTCMOS) circuits, IEEE Trans. Comput. Aided Design Integr. Circuits Syst. 22 (10) (2003) 1324–1342.
- [2] K. Roy, S. Mukhopadhyay, H. Mahmoodi, Leakage current mechanisms and leakage reduction techniques in deep-submicron CMOS circuits, Proc. IEEE 91 (2) (2003) 305–327.
- [3] T.K. Gupta, A.K. Pandey, O.P. Meena, Analysis and design of lector-based dual-Vt domino logic with reduced leakage current, Circuit World 43 (3) (2017) 97–104.
- [4] A. Peiravi, M. Asyaei, Robust low leakage controlled keeper by current-comparison domino for wide fan-in gates, Integration, VLSI J. 45 (1) (2012) 22–32.

- [5] S.M. Sharroush, Y.S. Abdalla, A.A. Dessouki, Impact of technology scaling on the performance of domino CMOS logic, in: International Conference on Electronic Design 2008, Dec. 1–3.
- [6] H. Mahmoodi, K. Roy, Diode-footed domino: a leakage tolerant high fan-in dynamic circuit design style, IEEE Trans. Circ. Syst. I 51 (3) (2004) 495–503.
- [7] F. Moradi, T.V. Cao, E.I. Vatajelu, A. Peiravi, H. Mahmoodi, D.T. Wisland, Domino logic design for high performance and leakage-tolerant applications, Integr. VLSI J. 46 (3) (2013) 247–254.
- [8] C.H. Cheng, S.C. Chang, J.S. Wang, W.B. Jone, Charge sharing fault detection for CMOS domino logic circuits, International Symposium on Defect and Fault Tolerance in VLSI Systems, 1999, Nov. 1–3.
- [9] T.K. Gupta, K. Khare, Lector with footed-diode inverter: a technique for leakage reduction in domino circuits, Circuits Syst. Sig. Process. 32 (6) (2013) 2707–2722.
- [10] F. Moradi, A. Peiravi, H. Mahmoodi, A new leakage-tolerant design for high fan-in domino gates, Proceedings of 16th International Conference on Microelectronics, 2004, Tunisia, Dec. 6–8.
- [11] F. Moradi, H. Mahmoodi, A. Peiravi, A high speed and leakage-tolerant domino logic for high fan-in gates, Proceedings of the 15th ACM Great Lakes Symposium on VLSI (GLSVLSI), 2005, Chicago, IL, USA, Apr. 17–19.
- [12] A. Dadoria, K. Khare, T.K. Gupta, R.P. Singh, Ultra-low power FinFET-based domino circuits, Int. J. Electron. 104 (6) (2017) 952–967.
- [13] A. Dadoria, K. Khare, T.K. Gupta, R.P. Singh, A novel high-performance leakage-tolerant Wide Fan-In Domino Logic Circuit in deep Sub-micron technology, Circuit Syst. 6 (4) (2015) 103–111.
- [14] L. Nan, C. XiaoXin, L. Kai, M. KaiShengi, W. Di, W. Wei, L. Rui, Y. DunShan, Low power adiabatic logic based on FinFETs, Sci. China Informat. Sci. 57 (2) (2014)
- [15] N. Gong, B. Guo, J. Lou, J. Wang, Analysis and optimization of leakage current characteristics in sub-65 nm dual vt footed domino circuits, Microelectron. J. 39 (2008) 1149–1155.
- [16] N. Shanbhag, K. Soumyanath, S. Martin, Reliable low-power design in the presence of deep submicron noise, in: Proceedings of the 2000 International Symposium on Low Power Electronics and Design, 2000, Rapallo, Italy, July 25–27.
- [17] S.H. Choi, B.C. Paul, K. Roy, Dynamic Noise Analysis with Capacitive and Inductive Coupling, in: Proceedings of 7th Asia and South Pacific Design Automation Conference (ASPDAC), 2002, Jan 11.
- [18] A. Neve, H. Schettler, T. Ludwig, D. Flandre, Power-delay product minimization in high-performance 64-bit carry-select adders, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 12 (3) (2004) 235–244.
- [19] M. Jhumb, Garima, H. Lohani, Design implementation and performance comparison of multiplier topologies in power-delay space, Eng. Sci. Technol. Int. J. 19 (1) (2016) 355–363.
- [20] M. Jhumb, Gitanjali, Efficient adders for assistive devices, Eng. Sci. Technol. Int. J. 20 (1) (2017) 95–104.
- [21] P. Kumar, R.K. Sharma, Low voltage high performance hybrid full adder, Eng. Sci. Technol. Int. J. 19 (1) (2016) 559–565.
- [22] M.H. Moaiyeri, M. Nasiri, N. Khastoo, An Efficient ternary serial adder based on carbon nanotube FETs, Eng. Sci. Technol. Int. J. 19 (1) (2016) 271–278.
- [23] P. Gronowski, Issues in dynamic logic design, Design of High-Performance Microprocessor Circuits, A. Chandrakasan, W.J. Bowhill, and F. Fox, Eds. Piscataway, NJ: IEEE Press, 2001, Chapter 8, 140–157.
- [24] Technology library: 32 nm bulk CMOS, Predictive Technology Model, http://ptm.asu.edu/
- [25] Web address: http://bsim.berkeley.edu