# University of New Mexico UNM Digital Repository

Electrical and Computer Engineering ETDs

**Engineering ETDs** 

7-9-2009

# Circuit designs for low-power and SEU-hardened systems

Vallabh Srikanth Devarapalli

Follow this and additional works at: https://digitalrepository.unm.edu/ece_etds

#### Recommended Citation

Devarapalli, Vallabh Srikanth. "Circuit designs for low-power and SEU-hardened systems." (2009). https://digitalrepository.unm.edu/ece_etds/69

This Thesis is brought to you for free and open access by the Engineering ETDs at UNM Digital Repository. It has been accepted for inclusion in Electrical and Computer Engineering ETDs by an authorized administrator of UNM Digital Repository. For more information, please contact disc@unm.edu.

| Vallabh Srikanth Devarapalli                                                                    | Candidate               |
|-------------------------------------------------------------------------------------------------|-------------------------|
| Electrical and Computer Engineering                                                             | Department              |
| This thesis is approved, and it is acceptable in quality and form for publication on microfilm. |                         |
| Approved by the Thesis Committee:                                                               |                         |
|                                                                                                 | Chairperson             |
|                                                                                                 | Dr. Steven C. Suddarth  |
|                                                                                                 | Dr. L. Howard Pollard   |
| Accepted:                                                                                       |                         |
|                                                                                                 | Graduate School         |
|                                                                                                 | Date                    |

### CIRCUIT DESIGNS FOR LOW-POWER AND SEU-HARDENED SYSTEMS

By

Vallabh Srikanth Devarapalli

Bachelor of Engineering, Electronics and Communication, Andhra University, 2004

#### **THESIS**

Submitted in Partial Fulfillment of the Requirements for the Degree of

Master of Science Electrical Engineering

The University of New Mexico Albuquerque, New Mexico

May, 2009

© 2009, Vallabh Srikanth Devarapalli

#### **DEDICATION**

To my father, Dr. D. Prasada Rao, my motivation personified. A tireless teacher by virtue and profession, he has taught me to succeed and to complete the most formidable tasks by setting himself as an example.

To my mother, D. Lakshmi Kantham, for all her prayers and unconditional love.

To my brother, D. Naga Ravi Kanth, for always being there when I most needed him. A diligent student himself, he has taught me that passion and love for what one does knows no failure.

#### **ACKNOWLEDGEMENT**

I would like to thank Prof. Payman Zarkesh-Ha, my advisor and thesis chair, for his guidance, support and encouragement during my Master's program and also for being an outstanding teacher inside and outside of the classroom.

I would like to thank Prof. Steven C. Suddarth, my advisor and financial supporter, for his guidance in academic as well as non-academic topics. Through his knack for details, he has always provided me with many insightful observations and valuable comments. I am very grateful for all his advice, which has been and will continue to be of great value to me during my professional career.

I am also grateful to Prof. L. Howard Pollard for agreeing to serve as a committee member on very short notice. I am also grateful to him for patiently answering my questions even when they were not from his courses. I would like to thank him for all his time.

I gratefully acknowledge Craig Kief for making my life at graduate school most comfortable by always making sure I had everything I needed. He was always there to patiently review my work. I am very fortunate to have had him with me during my Master's program; without him it would not have been as successful.

I would like to thank my brother, D. Naga Ravi Kanth, and his wife, P. Ridhima, for taking good care of me over my entire stint at the University of New Mexico.

Finally, I convey many thanks to all my friends for filling my life with love and happiness.


#### CIRCUIT DESIGNS FOR LOW-POWER AND SEU-HARDENED SYSTEMS

#### By

#### Vallabh Srikanth Devarapalli

Bachelor of Engineering in Electronics and Communication, Andhra University, 2004

M.S. in Electrical Engineering, University of New Mexico, 2009

#### **ABSTRACT**

The desire to have smaller and faster portable devices is one of the primary motivations for technology scaling. Though advancements in device physics are moving at a very good pace, they might not be aggressive enough for present-day technology scaling trends. As a result, the MOS devices used in present-day integrated circuits are pushed to the limit in terms of performance, power consumption, and robustness, which are the most critical criteria for almost all applications.

Secondly, technology advancements have led to the design of complex chips with increasing chip densities and higher operating speeds. The design of such high-performance complex chips (microprocessors, digital signal processors, etc.) has massively increased the power dissipation and, as a result, the operating temperatures of these integrated circuits. In addition, due to aggressive technology scaling, the heat-withstanding capability of the circuits is declining, thereby increasing the cost of packaging and heat sink units. This has increased the prominence of smarter and more robust low-power circuit and system designs.

Apart from power consumption, another criterion affected by technology scaling is the robustness of the design, particularly for critical applications (security, medical, finance, etc.). Hence the need for error-free or error-immune designs. Until recently, radiation effects were a major concern in space applications only. With technology scaling reaching the nanometer level, terrestrial radiation has become a growing concern. As a result, Single Event Upsets (SEUs) have become a major challenge to robust designs. A single event upset is a temporary change in the state of a device due to a particle strike (usually from the radiation belts or from cosmic rays), which may manifest as an error at the output.

This thesis proposes a novel method that allows adaptive digital designs to work efficiently with the lowest possible power consumption. This new technique improves options in performance, robustness, and power. The thesis also proposes a new dual data rate flip-flop, which reduces the necessary clock speed by half, drastically reducing the power consumption. This new dual data rate flip-flop design culminates in a proposed unique radiation-hardened dual data rate flip-flop, "Firebird". Firebird offers a valuable addition to future circuit designs, especially given the increasing importance of Single Event Upsets (SEUs) and power dissipation with technology scaling.

# Table of Contents

- List of Figures
- List of Tables
- Preface
  - Motivation
  - Contributions of this Thesis
  - Thesis Organization
- Chapter 1: Introduction
  - 1.1. Measurement Criteria for Power Calculations
    - 1.1.1. Dynamic Power Consumption
    - 1.1.2. Static Power Consumption
    - 1.1.3. Direct-Path Power Consumption
  - 1.2. Impact of Technology Advancements on Power Dissipation
  - 1.3. Physical Mechanism of Particle-Silicon Interaction
    - 1.3.1. Direct Ionization
    - 1.3.2. Indirect Ionization
  - 1.4. Charge Collection Mechanism for a Typical CMOS Device
  - 1.5. Impact of Technology Scaling on SEU
- Chapter 2: Background
  - 2.1. Power Reduction Techniques
    - 2.1.1. Dynamic Power Reduction
    - 2.1.2. Leakage Power Reduction
  - 2.2. SEU Immunity in Combinational Logic
    - 2.2.1. Inherent Immunity
    - 2.2.2. Artificial Immunity
- Chapter 3: Scavenger Technique
  - 3.1. Introduction
  - 3.2. Adaptive Voltage and Frequency Scaling Technique
  - 3.3. Adaptive Error Detection/Correction Techniques
    - 3.3.1. Razor Technique
    - 3.3.2. Latch Based Time Borrowing Technique
  - 3.4. Scavenger Technique
  - 3.5. Benefit of Scavenger Technique
  - 3.6. Limitations
  - 3.7. Implementation
  - 3.8. Experimental Confirmation
  - 3.9. Conclusions
- Chapter 4: Firebird Flip-Flop
  - 4.1. Introduction
  - 4.2. Background
    - 4.2.1. D Flip-Flop
    - 4.2.2. Existing Rad-hard Flip-Flop Design
  - 4.3. Existing Dual Data Rate Flip-Flop Designs [27]
  - 4.4. Dual Data Rate Flip-Flop Design
  - 4.5. Firebird
  - 4.6. Simulations
  - 4.7. Conclusions
- Chapter 5: Conclusions
- References

# List of Figures

- Figure 1: Delay Distribution [1]
- Figure 1.1: Moore's Law [5]
- Figure 1.2: Power Density Problem
- Figure 1.3: Impact of Scaling on Leakage Current [6]
- Figure 1.4: Interaction of an Energetic Proton and Silicon [5]
- Figure 1.5: Electron concentration due to funneling in an n+/p silicon junction following an electron strike [9]
- Figure 1.6: The ALPEN Effect [11]
- Figure 2.1: Diminishing Returns in Parallelism [13]
- Figure 2.2: Logic Restructuring
- Figure 2.3: Input Reordering
- Figure 2.4: Components of MOS Leakage
- Figure 2.5: Stack Effect
- Figure 2.6: Leakage Control by Vector Manipulation [10]
- Figure 2.7: Logical Masking
- Figure 2.8: NAND Gate Sensitivity to Different Input Vectors [17]
- Figure 3.1: Power and Frequency Variation
- Figure 3.2: Razor Technique [23]
- Figure 3.3: Latch Based Time Borrowing [4]
- Figure 3.4: Scavenger Technique
- Figure 3.5: Scavenger for SEU Detection
- Figure 3.6: Scavenge from Previous Stage
- Figure 3.7: Solution to Short Path Issues in Critical Paths
- Figure 3.8: Trade-off
- Figure 3.9: Timing Variations controlled by Input Patterns
- Figure 3.10: Test Circuit
- Figure 3.11: Adaptive System Algorithm
- Figure 3.12: Risk Rate
- Figure 3.13: Results at 3.3V
- Figure 3.14: Results at 2.64V
- Figure 3.15: Results for Adaptive System
- Figure 3.16: Test Setup
- Figure 4.1: D Flip-Flop
- Figure 4.2: D Flip-Flop Output Waveform
- Figure 4.3: Simplified Error-Correction Scanout Flip-Flop
- Figure 4.4: Error-Correction Scanout Flip-Flop Output Waveform
- Figure 4.5: (a) DESPFF (b) DSPFF (c) PULS Generator [27]
- Figure 4.6: Dual Data Rate Flip-Flop
- Figure 4.7: Dual Data Rate Flip-Flop Output Waveform
- Figure 4.8: Firebird: A SEU Hardened Dual Data Flip Flop
- Figure 4.9: Firebird Flip-Flop Output Waveform
- Figure 4.10: Extended Firebird
- Figure 4.11: Extended Firebird Output Waveform
- Figure 4.12: Dual Data Rate Flip-Flop Output
- Figure 4.13: Existing Rad-hard Flip-Flop Output
- Figure 4.14: Firebird Flip-Flop Output
- Figure 4.15: Extended Firebird Output

# List of Tables

- Table 2.1: Stack Effect on Leakage [14]
- Table 4.1: C-Element Truth Table
- Table 4.2: Extended C-Element Truth Table
- Table 4.3: Results

# Preface

## Motivation

Device dimensions are being reduced to achieve high-speed, complex, and compact circuitry. Because of the larger number of transistors per unit chip area and the higher clock speeds, the predictability of designs has fallen drastically. Figure 1 shows that, even though the typical delays of newer technologies have improved considerably, the worst-case delays have not improved much. Thus, the traditional worst-case design approach is very expensive in terms of area, power, and performance. Moreover, using standard flip-flops in synchronous designs wastes half the clock edges (positive or negative), adding unnecessary switching power consumption to the design.



Figure 1: Delay Distribution [1]

The reduction in device capacitance and voltage levels, together with the increase in clock speeds and functionality, has also increased the probability of single event upsets. It is predicted that the soft error rate (SER) of combinational logic will equal the SER of unprotected memory by 2011 [2]. Traditional circuit-level hardening techniques, such as gate duplication and gate cloning, will result in unacceptable area and power overhead for future technology advancements due to higher SERs.

A new design approach that gives the designer flexibility in the three critical areas of power, performance, and robustness, with very low area overhead, is needed as a substitute for worst-case design approaches. Secondly, a new robust dual data rate flip-flop with as low a gate count as possible is needed to reduce the dynamic power of a digital system. Thirdly, a new radiation-hardened dual data rate flip-flop is required to increase the performance and robustness of future designs in critical cases. This thesis focuses on solutions to all three requirements, which might be critical for the survival of digital designs under future technology advancements.

## Contributions of this Thesis

This thesis work is a compilation of low-power adaptive digital design techniques, a robust low-power dual data rate flip-flop design, and an SEU-hardened dual data rate flip-flop design. The following are the contributions of this thesis work.

- [1a] Srikanth V. Devarapalli, Payman Zarkesh-Ha, and Steven C. Suddarth, "Adaptive Circuit Implementation in FPGAs," *FPGA Summit 2008*, December 2008.
- [2a] Srikanth V. Devarapalli, Payman Zarkesh-Ha, and Steven C. Suddarth, "Scavenger: An Adaptive Design Technique for Low Power ASIC/FPGA," *ICC 2009*, April 2009.
- [3a] Provisional Patent filed on April 7<sup>th</sup>, 2009. Title: SEU-hardened Dual Data Rate Flip-Flop Circuit Designs. Inventors: Srikanth V. Devarapalli, Payman Zarkesh-Ha, and Steven C. Suddarth.
- [4a] Srikanth V. Devarapalli, Payman Zarkesh-Ha, and Steven C. Suddarth, "Firebird: A SEU-hardened Dual Data Rate Flip-Flop," will be submitted to CICC 2009, September 2009.

## Thesis Organization

This thesis is organized in the following manner. Chapter 1 gives preliminary knowledge about power consumption and its criticality. It also covers the basic physical mechanisms by which a particle strike on a semiconductor device causes an upset in the logic value stored at a node in an integrated circuit. The increase in SEU influence with technology scaling is also discussed. Chapter 2 gives an overview of existing solutions employed to reduce power consumption. It also covers the process of SEUs in combinational logic, explaining the inherent and artificial immunity of standard combinational logic, for a better understanding of the SEU-hardened flip-flop design presented in this thesis. Chapter 3 discusses a new adaptive technique, "Scavenger", that can be used for future low-power ASIC/FPGA designs. This chapter concludes with results indicating the technique's advantages. Chapter 4 discusses a new dual data rate flip-flop which is as robust as a standard D flip-flop and also gives better power savings than the existing dual data rate flip-flop. This chapter also discusses a unique radiation-hardened dual data rate flip-flop, "Firebird". The Firebird flip-flop never latches faulty data due to SEUs occurring on the flip-flop, giving it an unprecedented advantage over the existing radiation-hardened flip-flop. This chapter also shows the benefits of both the dual data rate flip-flop and the Firebird flip-flop. Chapter 5 concludes the thesis with how this work has satisfied a design engineer's dream of having simple, efficient, and flexible solutions to power consumption and robustness issues in a very economical way.

# Chapter 1

# Introduction

## 1.1. Measurement Criteria for Power Calculations

The power consumption in conventional CMOS digital designs can be expressed as a sum of three main components: (1) Dynamic Power Consumption, (2) Static or Leakage Power Consumption and (3) Direct-Path Power Consumption.

$$P_{tot} = P_{dyn} + P_{stat} + P_{dp} \tag{1.1}$$

### 1.1.1. Dynamic Power Consumption

Dynamic power represents the power dissipation during a switching event in a digital design i.e., a transition from 1 to 0 or vice versa. Every time there is a transition from high to low (or low to high) the load capacitance at the output node discharges or charges, respectively. Each time the load capacitor gets charged through the PMOS transistor the voltage at the node rises from 0 to VDD and a certain amount of energy is drawn from the power supply. Part of this energy is dissipated in the PMOS device and the rest is stored on the load capacitor. Similarly, when the capacitor discharges the stored energy is dissipated in the NMOS transistor [3, 4].

The dynamic power consumption can be calculated as follows:

$$Q = C_L V_{DD} \tag{1.2}$$

$$E = C_L V_{DD}^2 \tag{1.3}$$

$$P_{dyn} = \frac{1}{2} C_L V_{DD}^2 f \alpha \tag{1.4}$$

where Q is the charge, E is the energy,  $P_{dyn}$  is the dynamic power consumption,  $C_L$  is the load capacitance,  $V_{DD}$  is the supply voltage, f is the clock frequency, and  $\alpha$  is the activity factor.
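As a rough numerical illustration of Equations (1.3) and (1.4), the following Python sketch evaluates the switching energy and the dynamic power for one set of values. The function names and the component values (50 fF, 3.3 V, 100 MHz, activity factor 0.2) are assumptions chosen only for illustration; they are not measurements from this thesis.

```python
def switching_energy(c_load, v_dd):
    """Energy drawn from the supply for one 0-to-1 output transition, Eq. (1.3)."""
    return c_load * v_dd ** 2


def dynamic_power(c_load, v_dd, freq, activity):
    """Dynamic power as written in Eq. (1.4): 0.5 * C_L * V_DD^2 * f * alpha."""
    return 0.5 * c_load * v_dd ** 2 * freq * activity


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the thesis).
    c_load = 50e-15   # load capacitance: 50 fF
    v_dd = 3.3        # supply voltage in volts
    freq = 100e6      # clock frequency: 100 MHz
    activity = 0.2    # activity factor alpha
    print(f"Energy per transition: {switching_energy(c_load, v_dd):.3e} J")
    print(f"Dynamic power:         {dynamic_power(c_load, v_dd, freq, activity):.3e} W")
```

Because the supply voltage enters as a square, halving `v_dd` in this sketch cuts the computed dynamic power to one quarter, while halving `freq` or `activity` only halves it.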

### 1.1.2. Static Power Consumption

Static power represents the power consumption of a circuit due to a current flow between the supply rails in the absence of switching activity. Ideally, the static current of a CMOS inverter should be zero, as the PMOS and NMOS transistors are never on simultaneously in steady-state operation. Unfortunately, leakage current flows through the reverse-biased diode junctions of the transistors, located between the source or drain and the substrate [3, 4].

$$P_{stat} = V_{DD}I_{leak} \tag{1.5}$$

where  $P_{stat}$  is the static power consumption,  $I_{leak}$  is the leakage current and  $V_{DD}$  is the supply voltage.

### 1.1.3. Direct-Path Power Consumption

Direct-path power represents the power consumption of a circuit due to a direct current path between  $V_{DD}$  and GND for a short period of time during switching, when both PMOS and NMOS transistors are conducting simultaneously. In an ideal case, where the rise and fall time of the input waveform is zero, the direct path power consumption will be zero. The finite slope of the input signal causes the direct current path resulting in a current spike [3, 4].

Approximating the current spikes as triangles, the direct path power consumption can be calculated as follows:

$$E_{dp} = t_{sc} V_{DD} I_{peak} \tag{1.6}$$

$$P_{dp} = t_{sc} V_{DD} I_{peak} f \tag{1.7}$$

$$P_{dp} = C_{sc} V_{DD}^{2} f \tag{1.8}$$

where  $E_{dp}$  is the energy consumption per switching period,  $P_{dp}$  is the direct-path power consumption,  $I_{peak}$  is the maximum current,  $V_{DD}$  is the supply voltage,  $t_{sc}$  is the time during which both PMOS and NMOS are conducting, f is the clock frequency, and  $C_{sc}$  is the short circuit capacitance [4].
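The three components of Equation (1.1) can be combined numerically as in the sketch below. It implements Equations (1.5) and (1.7) directly; the expression used for the short circuit capacitance, C_sc = t_sc I_peak / V_DD, is simply what results from equating Equations (1.7) and (1.8). All function names and sample values are illustrative assumptions.

```python
def static_power(v_dd, i_leak):
    """Static power, Eq. (1.5): V_DD * I_leak."""
    return v_dd * i_leak


def direct_path_power(t_sc, v_dd, i_peak, freq):
    """Direct-path power, Eq. (1.7): t_sc * V_DD * I_peak * f."""
    return t_sc * v_dd * i_peak * freq


def short_circuit_capacitance(t_sc, v_dd, i_peak):
    """C_sc that makes Eq. (1.8) agree with Eq. (1.7): t_sc * I_peak / V_DD."""
    return t_sc * i_peak / v_dd


def total_power(p_dyn, p_stat, p_dp):
    """Total power, Eq. (1.1)."""
    return p_dyn + p_stat + p_dp


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the thesis).
    v_dd, freq = 3.3, 100e6
    p_dyn = 0.5 * 50e-15 * v_dd ** 2 * freq * 0.2          # Eq. (1.4)
    p_stat = static_power(v_dd, i_leak=10e-9)               # 10 nA leakage
    p_dp = direct_path_power(50e-12, v_dd, 100e-6, freq)    # 50 ps, 100 uA peak
    print(f"P_tot = {total_power(p_dyn, p_stat, p_dp):.3e} W")
```

Expressing the direct-path term through `short_circuit_capacitance` gives it the same C V² f form as the dynamic term, which makes the two easy to compare.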

## 1.2. Impact of Technology Advancements on Power Dissipation

Power dissipation does not follow simple scaling rules. The various factors influencing power consumption are discussed in this section.

Technology scaling helps reduce dynamic power consumption: smaller devices reduce the load capacitance, and voltage scaling reduces the supply voltage. However, over the past few decades, scientists trying to keep up with Moore's Law (the number of transistors on a chip doubles every 18 to 24 months) have built chips with more complex designs, resulting in higher gate density and faster clock speeds. The impact of Moore's Law on the number of transistors per chip can be observed in Figure 1.1. This growth is increasing the effective dynamic power. The impact of technology scaling on power density over the years, as predicted by Intel Corporation, can be observed in Figure 1.2. Though these trends might seem overwhelming, they are not far-fetched given present technology advancements.



Figure 1.1: Moore's Law [5]



Figure 1.2: Power Density Problem

To keep up with the need for higher performance along with technology scaling, designers have had to reduce threshold voltages. Lower threshold voltages increase leakage power through higher subthreshold leakage. Device scaling has also increased the gate-induced drain leakage (GIDL). Though techniques like adaptive body biasing, high-V<sub>t</sub> transistors, and high-k dielectrics are being used, the effective leakage power is increasing with technology scaling. The influence of technology scaling and temperature on leakage current can be observed in Figure 1.3.



Figure 1.3: Impact of Scaling on Leakage Current [6]

## 1.3. Physical Mechanism of Particle-Silicon Interaction

SEUs typically arise from two main sources. Primarily, SEUs are caused by ionizing radiation components in the atmosphere, such as neutrons, protons, and heavy ions. Additionally, SEUs can be caused by alpha particles from the decay of trace concentrations of uranium and thorium present in some integrated circuit packaging materials [7]. Solar rays, which dominate the Earth's environment, and galactic cosmic rays from space contain energetic subatomic particles that collide with nitrogen and oxygen atoms and produce high-energy protons, neutrons, and heavy ions [8].

There are two primary methods by which ionizing radiation releases charge in a semiconductor device: direct ionization by the incident particle itself and ionization by secondary particles created by nuclear reactions between the incident particle and the struck device. Both mechanisms can lead to integrated circuit malfunction [9].



Figure 1.4: Interaction of an Energetic Proton and Silicon [5]

Figure 1.4 shows how an energetic proton produces an electric signal. The proton produces charges along its path, in the form of electrons and holes. These are collected at the source and drain of the transistor, producing a current pulse. This pulse can be large enough to change the state of a node (achieved by shorting the drain and substrate of the transistor under attack) from logic 1 to logic 0 or vice versa [8].

### 1.3.1. Direct Ionization

Direct ionization is a process that occurs when a heavy ion strikes a semiconductor material and it releases electron-hole pairs along its path as it loses energy. Any ion with atomic number greater than or equal to two (i.e., particles other than protons, electrons, neutrons, or pions) is classified as a heavy ion. Lighter particles, such as protons, do not usually produce enough charge by direct ionization to cause upsets. However, recent research has suggested that as devices become ever more susceptible, upsets in digital ICs due to direct ionization by protons may occur [9].

### 1.3.2. Indirect Ionization

Indirect ionization is a process in which a high-energy light particle (proton or neutron) enters the semiconductor lattice and undergoes an inelastic collision with a target nucleus. The interaction can take several forms: elastic collisions that produce Si recoils; the emission of alpha or gamma particles and the recoil of a daughter nucleus (e.g., Si emits an alpha particle and a recoiling Mg nucleus); or spallation reactions, in which the target nucleus is broken into two fragments (e.g., Si breaks into C and O ions), each of which can recoil. These particles are much heavier than the original proton or neutron and can deposit energy along their paths. They deposit higher charge densities as they travel and therefore may be capable of causing an SEU. Inelastic collision products typically have fairly low energies and do not travel far from the particle impact site, so all electron-hole pair generation occurs near the impact area [9].

## 1.4. Charge Collection Mechanism for a Typical CMOS Device

When a particle strikes a microelectronic device, the most sensitive regions are usually reverse-biased p/n junctions. The high field present in a reverse-biased junction depletion region can collect most of the particle-induced charge through drift processes, thereby resulting in a transient current at the junction contact. Strikes near a depletion region can cause carriers to diffuse into the vicinity of the depletion region field, where they can be efficiently collected. A strike can also temporarily extend the depletion region, a process referred to as funneling. The funneling effect can increase charge collection at the struck node by extending the junction electric field away from the junction and deep into the substrate, such that charge deposited some distance from the junction can be collected through the efficient drift process. Figure 1.5 shows the electron concentration due to funneling [9].



Figure 1.5: Electron concentration due to funneling in an n+/p silicon junction following an electron strike [9]

The two major mechanisms that cause SEUs are (1) the drift process and (2) the diffusion process. The drift process causes the initial flip of the logic state, as explained above. The more important factor is the diffusion process (electrons diffusing from the substrate to the drain/bulk potential barrier), which contributes to the late-time collection of current at the struck node, ensuring that a bit stays flipped [9, 10].

The charge collection mechanism in submicron devices results from a disturbance in the channel potential of the device, referred to as the funneling effect. The effect is triggered by a particle strike that passes through both the source and the drain at near-grazing incidence, as shown in Figure 1.6. Such a strike causes a significant (but short-lived) source-drain conduction current that mimics the "on" state of the transistor. This phenomenon is called the ALPEN effect [9]. The ALPEN effect tends to increase as the channel length decreases. Another effect, known as the bipolar transistor effect, is caused by the injection of electrons over the source/well barrier. For example, in an n-channel MOSFET, holes left in the well by a particle strike raise the well potential, effectively lowering the source/well potential barrier. This lowered potential barrier causes the source to inject electrons into the channel. These electrons can be collected at the drain, effectively increasing the original particle-induced current. This current increases the SEU sensitivity. Because the electrons are injected over the source/well barrier, this is referred to as a bipolar transistor effect, where the source acts as the emitter, the channel as the base region, and the drain as the collector. Reducing the channel length effectively decreases the base width, and the effect becomes more pronounced [9].



Figure 1.6: The ALPEN Effect [11]

## 1.5. Impact of Technology Scaling on SEU

The soft error rate does not have a linear relationship with device scaling. The various factors influencing SER are discussed in this section.

Technology scaling has reduced device sizes, effectively reducing node capacitances. As a result, the critical charge of a node decreases and less charge is needed to upset it [9, 12]. This means that a larger number of radiation strikes will be capable of causing upsets. Technology scaling has also increased the clock frequency and lowered the supply voltage, both of which have effectively increased soft error rates. Experiments indicate that the ALPEN effect increases rapidly for effective gate lengths below about 0.5 µm. It has also been predicted that the ALPEN effect can occur in 0.3 µm gate length MOSFETs even for normal incidence strikes and can lead to charge multiplication [9]. The bipolar transistor effect also increases with technology scaling. Even light particle strikes (proton and neutron) may lead to direct ionization in advanced technologies [9, 10]. First-order calculations suggest that the neutron-induced SER should increase with the mass density of a material. Therefore, SER is predicted to increase in CMOS processes that use heavy materials such as copper, tantalum, tungsten, and cobalt [12]. On the other hand, technology scaling reduces the collection volume as the drain depletion area shrinks. This reduction in collection volume reduces the charge collection efficiency, which helps improve soft error rates with scaling [9, 10, 12].
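The link between node capacitance, supply voltage, and critical charge can be made concrete with a small sketch. It assumes the common first-order estimate Q_crit ≈ C_node · V_DD, which is not stated in this chapter and ignores circuit feedback and pulse-shape effects; the node values below are likewise hypothetical and chosen only to contrast an older and a scaled technology.

```python
ELECTRON_CHARGE = 1.602e-19  # elementary charge in coulombs


def critical_charge(c_node, v_dd):
    """First-order estimate (assumed model): Q_crit ~ C_node * V_DD."""
    return c_node * v_dd


def electrons_to_upset(c_node, v_dd):
    """Approximate number of collected electrons equal to the critical charge."""
    return critical_charge(c_node, v_dd) / ELECTRON_CHARGE


if __name__ == "__main__":
    # Hypothetical nodes: (label, node capacitance in farads, supply voltage in volts).
    nodes = [("older node", 10e-15, 3.3), ("scaled node", 1e-15, 1.0)]
    for label, c_node, v_dd in nodes:
        q_crit = critical_charge(c_node, v_dd)
        n_e = electrons_to_upset(c_node, v_dd)
        print(f"{label}: Q_crit = {q_crit:.2e} C (~{n_e:.0f} electrons)")
```

Under this assumed model, shrinking both capacitance and supply voltage lowers the critical charge multiplicatively, which is why more particle strikes become capable of upsetting a scaled node.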

# Chapter 2

# Background

This chapter provides a brief overview of the standard solutions employed to reduce the power consumption of integrated circuits. The process of SEUs in combinational logic is also explained for a better understanding of the SEU hardened flip-flop presented in this thesis. This chapter ends with motivation for this thesis.

# 2.1. Power Reduction Techniques

## 2.1.1. Dynamic Power Reduction

As covered in the introduction section, dynamic power calculation is done using the basic equation (1.4),

$$P_{dyn} = \frac{1}{2} C_L V_{DD}^2 f \alpha \tag{2.4}$$

From this equation, we can clearly deduce that the dynamic power can be reduced by either decreasing the supply voltage, load capacitance, frequency, or activity factor.
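The relative weight of each term can be checked numerically. The following is a minimal sketch of the dynamic power equation; the function name and the component values (10 fF load, 100 MHz clock, activity factor of 0.1) are assumptions chosen only for illustration.

```python
def dynamic_power(c_load, vdd, freq, activity):
    """Dynamic power P_dyn = 0.5 * C_L * V_DD^2 * f * alpha, in watts."""
    return 0.5 * c_load * vdd ** 2 * freq * activity


# Illustrative (assumed) operating point: 10 fF, 2.5 V, 100 MHz, alpha = 0.1.
base = dynamic_power(10e-15, 2.5, 100e6, 0.1)
half_vdd = dynamic_power(10e-15, 1.25, 100e6, 0.1)
half_freq = dynamic_power(10e-15, 2.5, 50e6, 0.1)

print(f"halving V_DD scales power by {half_vdd / base:.2f}")
print(f"halving f scales power by {half_freq / base:.2f}")
```

Halving the supply voltage cuts the dynamic power to a quarter, whereas halving the frequency, load capacitance, or activity factor only cuts it in half.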

# 2.1.1.1. Supply Voltage Reduction

Supply voltage has a quadratic effect on dynamic power. This makes supply voltage scaling the most attractive technique to reduce dynamic power. It is important, however, to keep in mind that the performance of a circuit is directly proportional to the supply voltage. The challenge is to reduce supply voltage without adversely impacting the throughput.

Introducing parallelism/pipelining at the architectural level helps to maintain the efficiency at lower voltages, but parallelism increases chip area. Too much parallelism might also result in an increase in power consumption. It can be understood from the graph shown in Figure 2.1, which is a plot between normalized power and supply voltage. The plot shows reduction in power consumption with more parallelism, but beyond certain point it begins to show an increase in the power for an increase in parallelism. This phenomenon occurs because the capacitance overhead starts to dominate at high levels of parallelism [13].



Figure 2.1: Diminishing Returns in Parallelism [13]

Multiple voltage domains offer another method to reduce supply voltage by using separate voltage levels for different sections of the circuit. Therefore, lower supply voltages can be used for slower sections of the circuitry and normal supply for time critical sections. Unfortunately this comes at a cost of DC-DC converters.

Dynamic voltage scaling is a better option for applications with non-uniform throughput requirements. Therefore, depending on the need for the computation, the voltage can be scaled high or low. This would add additional circuitry to adaptively control the supply voltage to the circuitry. Threshold voltage of the devices can also be reduced to increase the performance of the design at the cost of increased leakage current. Adaptive threshold voltage control can also be done based on the computation needs.

#### 2.1.1.2. Load Capacitance Reduction

Device size is a measure of load capacitance. As the device size scales down, the transistor capacitance decreases, reducing the load capacitance for the previous stage. Unfortunately as the size of the device decreases, the time to drive the load increases due to increase in output resistance. This is because of the fact that  $R_{out}$  is inversely proportional to the width of the transistor. Therefore all transistors should be sized for lowest power along with the constraint to meet timing requirements.

Optimum gate sizing for a required power and timing constraint can be done by using the following delay equation [2]:

$$\hat{D} = t_{p0}(P + N\hat{f}) \tag{2.5}$$

$$P = Np \tag{2.6}$$

$$\hat{f} = \sqrt[N]{F} \tag{2.7}$$

$$F = GBH = \prod gh \tag{2.8}$$

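where $\hat{D}$ is the effective delay, $t_{p0}$ is the unit delay, N is the number of stages, P is the total intrinsic delay of the path, p is the intrinsic delay of a stage, $\hat{f}$ is the effort per stage, F is the path effort, G is the path logical effort, B is the path branching effort, H is the path electrical effort, g is the logical effort of a stage and h is the electrical effort of a stage.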

Intrinsic delay is a function of the technology. It is the delay primarily due to internal capacitances (unloaded gate delay). Logical effort is a function of the complexity of a gate and not its size. It is a ratio of the input capacitance of the gate to the input capacitance of an inverter (reference gate). It is a measure of the gate's ability to drive a given load. Electrical effort is characterized by the load. It is the ratio of the output capacitance to input capacitance. It represents the load that a given gate is subjected to.
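As a worked illustration of equations (2.5) to (2.8), the sketch below evaluates the delay of a path with equal effort per stage. The function name and the example path (three inverters driving a load 64 times the input capacitance, with p = 1 and no branching) are assumptions made for illustration.

```python
def min_path_delay(t_p0, n_stages, p, path_g, path_h, path_b=1.0):
    """Path delay from equations (2.5)-(2.8), assuming equal effort per stage."""
    path_effort = path_g * path_b * path_h          # F = G * B * H   (2.8)
    stage_effort = path_effort ** (1.0 / n_stages)  # f = F^(1/N)     (2.7)
    intrinsic = n_stages * p                        # P = N * p       (2.6)
    return t_p0 * (intrinsic + n_stages * stage_effort)  # (2.5)


# Three inverters (g = 1 each, so G = 1) driving H = 64, in units of t_p0.
delay = min_path_delay(t_p0=1.0, n_stages=3, p=1.0, path_g=1.0, path_h=64.0)
print(f"path delay = {delay:.2f} t_p0")
```

For this path the effort per stage is 4, so each of the three stages contributes an intrinsic delay of 1 and an effort delay of 4, for a total of about 15 unit delays.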

#### 2.1.1.3. Activity Factor Reduction

Logical restructuring can be used to reduce the switching activity of the intermediate nodes. Figure 2.2 gives a simple example of how switching activity on intermediate nodes can be decreased using logical restructuring. Chain structure gives a lower switching activity on node O2.

Logical restructuring is also used to reduce spurious transitions (glitch) which helps to bring down the overall switching factor. This can also be explained using the example shown in Figure 2.2. If all the inputs A, B, C and D are occurring at the same time then there is a possibility for glitch occurrence in a chain structure due to the gate delays introduced for O1 and O2 nodes. But in the tree structure, the transitions on both O1 and O2 will occur at the same time assuming equal gate delays.



Figure 2.2: Logic Restructuring
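The comparison between the two structures can be reproduced with a short calculation. The sketch below assumes that the circuit is a four-input AND function built from two-input AND gates, that the inputs are independent with a probability of 0.5 of being "1", and that the activity of a node is the probability of a 0 to 1 transition between two independent cycles. These assumptions and the helper names are illustrative and are not taken from Figure 2.2 itself.

```python
def and_prob(*input_probs):
    """P(output = 1) of an AND gate with independent inputs."""
    prob = 1.0
    for p in input_probs:
        prob *= p
    return prob


def activity(p_one):
    """Probability of a 0 -> 1 transition between two independent cycles."""
    return (1.0 - p_one) * p_one


pa = pb = pc = pd = 0.5  # assumed input signal probabilities

# Chain: O1 = A.B, O2 = O1.C, F = O2.D
chain_o2 = and_prob(and_prob(pa, pb), pc)
# Tree: O1 = A.B, O2 = C.D, F = O1.O2
tree_o2 = and_prob(pc, pd)

print(f"chain O2 activity = {activity(chain_o2):.4f}")
print(f"tree  O2 activity = {activity(tree_o2):.4f}")
```

Under these assumptions node O2 switches less often in the chain structure, because it is rarely "1" once three inputs have been ANDed together.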

Input reordering also helps to reduce the activity factor. This can be explained better with the help of the example shown in Figure 2.3. We can observe from the example that the internal node has a much lower activity factor (0.02) when B and C are applied to the first AND gate than in the other arrangement (0.1).



Figure 2.3: Input Reordering
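The effect of input ordering can be sketched numerically. The input probabilities used below, P(A=1) = 0.5, P(B=1) = 0.2 and P(C=1) = 0.1, are assumed values chosen because they reproduce the figures of 0.1 and 0.02 quoted above as the probability that the internal node is "1". The helper names are hypothetical.

```python
def and_prob(p_x, p_y):
    """P(output = 1) of a two-input AND gate with independent inputs."""
    return p_x * p_y


def activity(p_one):
    """Probability of a 0 -> 1 transition between two independent cycles."""
    return (1.0 - p_one) * p_one


# Assumed signal probabilities for the three inputs.
p_a, p_b, p_c = 0.5, 0.2, 0.1

p_ab = and_prob(p_a, p_b)  # A and B drive the first AND gate
p_bc = and_prob(p_b, p_c)  # B and C drive the first AND gate

print(f"A,B first: P(node=1) = {p_ab:.2f}, activity = {activity(p_ab):.4f}")
print(f"B,C first: P(node=1) = {p_bc:.2f}, activity = {activity(p_bc):.4f}")
```

Placing the two least active inputs on the first gate keeps the internal node quiet, and the most active input then only affects the output node.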

Resource sharing also sometimes increases the switching factor. For example, say two inputs use the same track through a multiplexer. Even when there is no transition on the signals individually, switching between them still results in activity on the track whenever the two signals do not have the same value. Therefore, avoiding resource sharing might sometimes save dynamic power.

There are many other optimization techniques, like clock gating, pre-computation, and dynamic power management that are proposed to reduce the switching activity of logic circuits.

#### 2.1.2. Leakage Power Reduction

The various components involved in semiconductor device leakage are (1) Junction leakage, $I_J$: the reverse bias p-n junction leakage at the Drain; (2) Weak inversion leakage (or subthreshold conduction), $I_{SUB}$: the diffusion of carriers; (3) Drain Induced Barrier Lowering (DIBL): interaction of the depletion region of the Drain with the Source under the channel, effectively reducing the gate control; (4) Gate Induced Drain Leakage (GIDL), $I_{GIDL}$: the high electric field under the Gate/Drain overlap region, which thins out the depletion width of the drain to well p-n junction; and (5) Gate oxide leakage, $I_G$: direct tunneling through the gate oxide. Gate oxide leakage, unlike other leakages, occurs when the gate is on. Figure 2.4 shows the various components of the MOS leakage current which are explained above.



Figure 2.4: Components of MOS Leakage

#### 2.1.2.1. Stack Effect

The stack effect refers to the reduction in leakage in a transistor stack when more than one transistor is turned off. This can be explained using the subthreshold current equation of a transistor.

$$I_{D} = I_{o} e^{\frac{V_{GS} - V_{Th}}{nV_{t}}} \left(1 - e^{\frac{-V_{DS}}{V_{t}}}\right) (1 + \lambda V_{DS}) \tag{2.1}$$

$$I_{leakage} = I_{D} \Big|_{V_{GS}=0} = I_{o} e^{\frac{-V_{Th}}{nV_{t}}} \left(1 - e^{\frac{-V_{DS}}{V_{t}}}\right) (1 + \lambda V_{DS}) \tag{2.2}$$

where $I_D$ is the drain current, $I_o$ and n are empirical parameters, $V_{GS}$ is the gate to source voltage, $V_{DS}$ is the drain to source voltage, $V_{Th}$ is the threshold voltage, $V_t$ is the thermal voltage, and $\lambda$ is the channel length modulation parameter.
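The sensitivity of equation (2.2) to the threshold and drain to source voltages can be illustrated with a short calculation. This is only a sketch; the parameter values ($I_o$ = 1 µA, n = 1.5, $V_t$ = 25.9 mV, $\lambda$ = 0.05 V⁻¹) and the bias points are assumed for illustration and do not describe any particular process.

```python
import math


def subthreshold_current(v_gs, v_ds, v_th, i_o=1e-6, n=1.5, v_t=0.0259, lam=0.05):
    """Subthreshold drain current from equation (2.1); v_gs = 0 gives (2.2)."""
    return (i_o * math.exp((v_gs - v_th) / (n * v_t))
            * (1.0 - math.exp(-v_ds / v_t))
            * (1.0 + lam * v_ds))


# Leakage of a single OFF transistor at an assumed bias point.
base = subthreshold_current(v_gs=0.0, v_ds=2.5, v_th=0.4)
# A 100 mV higher threshold voltage, for example from reverse body bias.
high_vth = subthreshold_current(v_gs=0.0, v_ds=2.5, v_th=0.5)
# A much smaller drain to source voltage, as seen by a device inside a stack.
low_vds = subthreshold_current(v_gs=0.0, v_ds=0.05, v_th=0.4)

print(f"higher V_Th: leakage x {high_vth / base:.3f}")
print(f"lower V_DS:  leakage x {low_vds / base:.3f}")
```

With these assumed parameters, a 100 mV increase in threshold voltage reduces the leakage by more than a factor of ten, and reducing the drain to source voltage lowers it further.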

From equations (2.1) and (2.2), we can clearly observe the heavy dependence of leakage current on the threshold and drain to source voltages. In the case of stacked transistors, the effective threshold voltage increases due to the increase in source to substrate reverse bias, and the drain to source voltage of the individual transistors decreases, as shown in Figure 2.5. This compound effect greatly reduces the leakage current in stacked transistors. Table 2.1 shows the factor of reduction in leakage current obtained with an increase in depth of the stack [14].



Figure 2.5: Stack Effect

Table 2.1: Stack Effect on Leakage [14]

|        | High $V_t$ | Low $V_t$  |
|--------|------------|------------|
| 2 NMOS | 10.7X      | 9.96X      |
| 3 NMOS | 21.1X      | 18.8X      |
| 4 NMOS | 31.5X      | 26.7X      |
| 2 PMOS | 8.6X       | 7.9X       |
| 3 PMOS | 16.1X      | 13.7X      |
| 4 PMOS | 23.1X      | 18.7X      |

The disadvantages of this technique are higher area overhead and larger input capacitance, which result in an increase in dynamic power.

#### 2.1.2.2. Vector manipulation

Vector manipulation is a technique with zero area overhead and zero dynamic power overhead for reducing leakage power consumption. Each input vector results in a particular leakage power consumption for the circuit at hand. After analyzing the circuit and determining the vector that gives the lowest leakage power, one can apply that input vector to the circuit during idle time. This can be better understood from the example shown in Figure 2.6 [15]. In this example, input vector "111" gives the highest amount of leakage, as the 3 PMOS transistors in the OFF state are in parallel, resulting in maximum leakage current, and the 3 NMOS transistors driving the output are in series, resulting in very weak drive strength.



Figure 2.6: Leakage Control by Vector Manipulation [10]
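The search for a low-leakage idle vector can be sketched as an exhaustive enumeration. The toy model below is an assumption made for illustration and is not the analysis behind Figure 2.6: it treats a 3-input NAND gate, gives every OFF transistor the same unit leakage, ignores the position of OFF devices within the stack, and reuses the High $V_t$ NMOS stack factors of Table 2.1.

```python
from itertools import product

# Leakage reduction factors for stacks of OFF NMOS devices (High V_t column
# of Table 2.1); a single OFF device is the reference (factor 1).
STACK_FACTOR = {1: 1.0, 2: 10.7, 3: 21.1}


def nand3_leakage(vector, i_off=1.0):
    """Toy leakage estimate (arbitrary units) for a 3-input NAND gate."""
    zeros = vector.count(0)
    if zeros == 0:
        # Output is low: three OFF PMOS devices leak in parallel.
        return 3 * i_off
    # Output is high: leakage flows through the series NMOS stack and is
    # limited by the number of OFF devices in that stack.
    return i_off / STACK_FACTOR[zeros]


leakage = {v: nand3_leakage(v) for v in product((0, 1), repeat=3)}
best = min(leakage, key=leakage.get)
worst = max(leakage, key=leakage.get)

print("lowest leakage vector: ", "".join(map(str, best)))
print("highest leakage vector:", "".join(map(str, worst)))
```

In this simplified model the vector "111" leaks the most and "000" the least, so "000" would be the vector to apply during idle time.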

### 2.1.2.3. Dual $V_{Th}$ and Adaptive Body Biasing

Dual $V_{Th}$ is a device level solution where some devices are made with high threshold voltage and others are made with low threshold voltage. By using the low $V_{Th}$ devices for timing critical parts of the circuit and high $V_{Th}$ devices for the non-timing critical parts of the circuit, the overall leakage power can be reduced without much decrease in performance.

Unlike the dual $V_{Th}$ approach, where the threshold voltage of the transistors is raised using device level solutions (like high-k dielectrics or thicker gate oxide), adaptive body biasing can be used to dynamically control the threshold voltage as shown below.

$$V_{Th} = V_{To} + \gamma \left(\sqrt{|2\phi_F + V_{SB}|} - \sqrt{|2\phi_F|}\right) \tag{2.3}$$

where $V_{Th}$ is the threshold voltage, $V_{To}$ is the threshold voltage when $V_{SB}$ is zero, $V_{SB}$ is the source to substrate voltage, $\gamma$ is the body effect coefficient, and $\phi_F$ is the Fermi potential.

From equation (2.3), we observe that the threshold voltage can be increased or decreased by varying the source to substrate voltage. This saves the manufacturing costs to a large extent.
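Equation (2.3) can be evaluated directly to see how far the threshold voltage moves with body bias. The device parameters in this sketch ($V_{To}$ = 0.43 V, $\gamma$ = 0.4 V$^{1/2}$, $\phi_F$ = 0.3 V) are assumed values for an n-channel device and are not taken from the thesis.

```python
import math


def threshold_voltage(v_sb, v_t0=0.43, gamma=0.4, phi_f=0.3):
    """Threshold voltage under body bias, following equation (2.3)."""
    return v_t0 + gamma * (math.sqrt(abs(2 * phi_f + v_sb))
                           - math.sqrt(abs(2 * phi_f)))


zero_bias = threshold_voltage(0.0)
reverse_bias = threshold_voltage(1.0)    # reverse body bias raises V_Th
forward_bias = threshold_voltage(-0.3)   # forward body bias lowers V_Th

print(f"V_SB =  0.0 V: V_Th = {zero_bias:.3f} V")
print(f"V_SB =  1.0 V: V_Th = {reverse_bias:.3f} V")
print(f"V_SB = -0.3 V: V_Th = {forward_bias:.3f} V")
```

A reverse bias can therefore be applied during idle periods to raise the threshold voltage and cut leakage, and removed (or replaced by a small forward bias) when performance is needed.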

# 2.2. SEU Immunity in Combinational Logic

There are various factors that influence soft error rates in combinational logic, like the drive strength of the gate, fan-out capacitance of the gate, clock speed, and logic depth. The SEU immunity of combinational logic can be classified into two broad categories: (1) intrinsic (inherent) immunity and (2) extrinsic (artificial) immunity. These inherent masking factors and man-made SEU mitigation techniques are explained in this section.

#### 2.2.1. Inherent Immunity

#### 2.2.1.1. Logical Masking

Even though a radiation strike is strong enough to cause an erroneous voltage level at the gate, it needs to propagate to a latching element to actually affect the functionality of the circuit. If, along the combinational path, this error gets logically masked, then the error will not be captured by the latching element [16].

Logical masking is explained with the help of a simple combinational circuit as shown in Figure 2.7. In this example, consider that input A to the inverter is "1". For normal operation (in absence of radiation strike) the output of the inverter will be "0". But when a particle strikes on the inverter, it might result in flipping the output Ā to "1". But the inverter has an AND gate at its fan out. If the second input, B to the AND gate has a value of "0" when this particle strike happens, it will have no impact on the OUT signal. This kind of masking occurs due to the fact that as long as one of the inputs to an AND gate is "0" the output is always "0" irrespective of the other input value. For the same case, if signal B is "1" during the particle strike then the OUT signal will be corrupted.



Figure 2.7: Logical Masking

Therefore, an AND gate provides logical masking when its other input is logic "0". Similarly, OR, NAND, and NOR gates provide logical masking when their other input is logic "1", "0", and "1" respectively. Inverters, XOR, and XNOR gates have no logical masking.
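The masking value of each two-input gate can be confirmed by enumerating its truth table. The following sketch checks, for each value of the unaffected input B, whether a flip on input A leaves the output unchanged; the dictionary and function names are illustrative.

```python
GATES = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR":  lambda a, b: 1 - (a | b),
    "XOR":  lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}


def masking_values(gate):
    """Values of input B for which a flip on input A never reaches the output."""
    return [b for b in (0, 1) if gate(0, b) == gate(1, b)]


masking = {name: masking_values(fn) for name, fn in GATES.items()}

for name, values in masking.items():
    print(f"{name:5s} masks an upset on A when B is in {values}")
```

An empty list means that an upset on one input always propagates to the output, which is the case for XOR and XNOR gates.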

#### 2.2.1.2. Electrical Masking

The pulse width and height that could cause an error depend on the drive strength of the gate under attack [16], and not all radiation strikes are strong enough to create such current pulses. The ability of the gate to attenuate the signal variation caused by such weak particle strikes is called the electrical masking property of the gate. The probability of attenuating weak pulses increases with logical depth. The ability of a gate to electrically mask itself increases with an increase in load capacitance and a reduction in that gate's drive strength. This is because a large spurious pulse is needed to discharge or charge a larger capacitance. Secondly, the ability to attenuate a spurious pulse is higher for a weaker gate; therefore, the lower the drive strength, the better the immunity to particle strikes.

Not all standard gates, or the transistors inside the gates, are equally sensitive. For example, for a NAND gate, the NMOS connected to the output node has the highest sensitivity. This is due to the location of the transistor inside the NAND gate and its carrier type (electrons). This makes the gate's sensitivity to radiation strikes dependent on the input vector as well. The degree of sensitivity of a sample NAND gate to each of the four input vectors can be seen in Figure 2.8. The gate is most sensitive for the input vector "01". This is because, for the input combination "01", the drain, channel and source region of the upper NMOS and the drain-channel region of the lower NMOS are sensitive to particle strikes. At the same time, there is only one PMOS driving the output node, reducing the drive strength available to provide the stabilizing current. For a "00" input, only the drain of the upper NMOS is sensitive to particle strikes, whereas two PMOSs in parallel drive the output node [17].



| Input A | Input B | Failure rate (arbitrary units) |
|---------|---------|--------------------------------|
| 0       | 0       | 10                             |
| 0       | 1       | 120                            |
| 1       | 0       | 90                             |
| 1       | 1       | 50                             |

Figure 2.8: NAND Gate Sensitivity to Different Input Vectors [17]

### 2.2.1.3. Temporal Masking

Even if a particle strike is strong enough to result in erroneous voltage level and also propagates to the latching element through a logically sensitized path, it might still not result in data corruption. This is due to the fact that all latching elements have a finite sampling window over which they capture data. If the erroneous data does not reach the latching element during that sampling window the output data is uncorrupted. The sampling window is equal to setup time plus hold time [16].

Therefore, any SEU occurring before or after the timing window will not corrupt the output data. This immunity to particle strikes occurring at time intervals outside this sampling window is called temporal masking of the circuit.
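The timing condition for temporal masking can be expressed as an interval check. The sketch below is a conservative simplification in which any overlap between the transient pulse and the sampling window counts as a potential capture; the function name and the example timing values are assumptions.

```python
def is_latched(pulse_start, pulse_width, clock_edge, t_setup, t_hold):
    """True if a transient pulse overlaps the sampling window of the latch."""
    window_start = clock_edge - t_setup
    window_end = clock_edge + t_hold
    pulse_end = pulse_start + pulse_width
    return pulse_start < window_end and pulse_end > window_start


# Assumed timing in ns: clock edge at 10.0, setup 0.2, hold 0.1.
early_pulse = is_latched(5.0, 0.5, 10.0, 0.2, 0.1)   # ends long before the window
late_pulse = is_latched(10.5, 0.5, 10.0, 0.2, 0.1)   # starts after the window
window_pulse = is_latched(9.9, 0.3, 10.0, 0.2, 0.1)  # overlaps the window

print(early_pulse, late_pulse, window_pulse)
```

Only the third pulse can corrupt the stored value. The first two are temporally masked even though they reach the input of the latching element.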

#### 2.2.2. Artificial Immunity

#### 2.2.2.1. Device-Level Hardening

Device level hardening is a technique where the efficiency of the device to collect charge is reduced thereby reducing the soft error rate. This is achieved by different technology solutions like using retrograde well or epitaxial layers and using silicon-on-insulator (SOI) devices. Device level hardening is a very expensive technique due to its need for new process technology [10].

#### 2.2.2.2. System-Level Hardening

System-level hardening is a technique where the designers provide architectural solutions to mitigate the effects of single event upsets. These architectural solutions involve techniques like triple modular redundancy (TMR) and majority voting. Due to the nature of these techniques, a large area overhead and design effort is incurred. System-level hardening also includes techniques, such as scrubbing and watchdog timers, where the data in the pipeline needs to be completely flushed on detection of an error [10]. This method of reinitializing the system to an earlier correct state can be a huge performance overhead.
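Majority voting over three redundant copies can be sketched at the bit level as follows. The function names and the fault-injection argument are hypothetical and serve only to illustrate how a single upset is outvoted.

```python
def majority(a, b, c):
    """Bitwise majority vote of three redundant one-bit results."""
    return (a & b) | (b & c) | (a & c)


def tmr(func, inputs, upset_copy=None):
    """Evaluate func three times and vote; optionally flip one copy's output."""
    outputs = [func(*inputs) for _ in range(3)]
    if upset_copy is not None:
        outputs[upset_copy] ^= 1  # model a single event upset in one copy
    return majority(*outputs)


def and_gate(a, b):
    return a & b


clean = tmr(and_gate, (1, 1))
upset = tmr(and_gate, (1, 1), upset_copy=1)
print(clean, upset)
```

The voted output is the same with and without the injected upset, at the cost of three copies of the logic plus the voter.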

### 2.2.2.3. Circuit-Level Hardening

Circuit-level hardening is a technique where the sensitivity of the gate or the logical arrangement of the gates is modified in such a way that the data corruption due to a single event upset is not transmitted to the next logic path. This kind of hardening is achieved by predicting sensitive gates and then sizing, duplicating, and/or cloning those sensitive gates [10]. Circuit-level hardening can also be done using indirect methods, such as dual $V_{DD}/V_{TH}$. Therefore, circuit-level hardening has a low area overhead and low manufacturing costs in comparison to device-level and system-level hardening.

## Chapter 3

## Scavenger Technique

#### 3.1. Introduction

In this chapter, a new technique that performs voltage scaling in conjunction with frequency scaling to achieve ultra low power design in ASICs/FPGAs is proposed. To detect errors and obtain the corrected data without any loss in performance, a delayed clock flip-flop is utilized to borrow time from non-critical paths for use in critical path evaluation. Conservative experimental results suggest that our design technique can reduce the power consumption in FPGAs by 31% for error free operation, with only a 1% disagreement between the output values of the main and the delayed flip-flop. This disagreement signal, referred to as "risk", is used as a status signal to indicate a drop in the available safety margins for error free operation. This approach thereby gives maximum flexibility in all three critical areas: performance, power, and robustness.

A major concern in modern VLSI circuit design is power consumption. This concern is more pronounced in FPGAs [18, 19] since a large number of transistors must be used to implement a Configurable Logic Block (CLB) to perform a single logical function. Our proposed technique takes the basic power calculation into consideration.

$$P = \frac{1}{2}C_L V_{DD}^2 f \alpha + V_{DD} I_{leak} \tag{3.1}$$

where P is the total power consumption, $C_L$ is the load capacitance, $V_{DD}$ is the supply voltage, f is the clock frequency, $\alpha$ is the activity factor, and $I_{leak}$ is the leakage current. The first term in equation 3.1 represents the dynamic power consumption and the second term represents the leakage power consumption. Direct-path power consumption is ignored in this analysis as it contributes a very small portion of the total power consumption.

There is a tradeoff between power and the frequency of operation. This is due to the relationship between the path delay and supply voltage as shown in the equation 3.2.

$$t_d \approx \frac{NC_L V_{DD}}{\left(\frac{W}{L}\right) k_n \left(V_{DD} - V_{Th}\right)^2} \tag{3.2}$$

where  $t_d$  is the path delay, N is the number of gates in the logic path, W and L are the width and length of the transistor,  $k_n$  is the process transconductance parameter and  $V_{Th}$  is the threshold voltage.

Our proposed technique takes advantage of the heavy dependence of dynamic power consumption on supply voltage. By reducing the supply voltage to a point just above the occurrence of erroneous data, one could substantially save power. The benefit of this technique can be explained from the basic power equation, which shows that power depends more strongly on supply voltage than frequency does.

$$P \propto V_{DD}^{2} \tag{3.3}$$

$$f \propto V_{DD} \tag{3.4}$$



Figure 3.1: Power and Frequency Variation

Figure 3.1 illustrates the impact of supply voltage reduction on clock frequency and power consumption using SPICE simulation on a CMOS inverter in TSMC 0.25  $\mu m$  technology.

### 3.2. Adaptive Voltage and Frequency Scaling Technique

As Figure 3.1 shows, voltage scaling yields large power savings for a comparatively small reduction in performance. This observation is the principal motivation behind the design of the adaptive voltage and frequency scaling technique, which can be used for ultra-low-power and non-critical timing applications. It has the ability to step the frequency up or down [20] on an as-needed basis, making it highly beneficial for all applications that have a non-uniform processing load. With this technique, the system can be switched back and forth between a high-power, high-throughput mode and a low-power, low-throughput mode on demand.

The ultra low power adaptive system is used in power critical applications such as in sensor networks with energy harvesting [21, 22]. The idea behind this approach is to let the power critical system operate at high-throughput mode, when there is a larger computation requirement and/or constant availability of power and to let it operate at low-throughput mode when there is a smaller computation requirement and/or limited availability of power.

In order for the adaptive voltage and frequency scaling technique to work error free, a new error detection/correction mechanism (scavenger) has been implemented into the design.

# 3.3. Adaptive Error Detection/Correction Techniques

Standard adaptive error detection/correction techniques can be broadly classified as either (1) Always Correct or Let Fail and Correct or (2) Spatial and/or Temporal Redundancy. System monitoring and frequency adjustment and adaptive delay control/body bias are some of the techniques used under the first classification. Triple modular redundancy (TMR), error correction codes (ECC) and delay clock latching (example: Razor [23, 24]) are some of the techniques used under the second classification.

Scavenger technique can be categorized as a temporal redundancy technique with an "always correct" approach. The two existing methods that assisted in designing the scavenger technique are (1) The Razor Technique [23, 24], (2) Latch Based Time Borrowing.

#### 3.3.1. Razor Technique

The Razor technique is a temporal redundancy technique with a "let fail and correct" approach, in which the data from all the critical paths are sampled twice at different times. This is achieved by adding a flip-flop with a delayed clock at the end of each critical path. The delayed clock ensures that correct data is always latched into the second flip-flop even when system performance is lowered by a reduction in supply voltage. The Razor technique is implemented using the circuit shown in Figure 3.2. The XOR gate acts as an error detection circuit: whenever the supply voltage is too low for flip-flop FF1 to latch the right value and there is a data transition, the output of the XOR gate goes high, which raises the error signal. The clock is stalled for one clock cycle every time the error signal goes high. During this time, the select signal to the multiplexer goes high, so that the output of flip-flop FF2 is selected and the correct value passes to the next stage. This allows the design to run at the lowest possible power consumption (no lower than the point where even FF2 misses the correct data) for a small area overhead and performance overhead (one clock cycle for every error). Like the scavenger technique, this technique has limitations, such as short paths, which are covered in the limitations section.



Figure 3.2: Razor Technique [23]
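The performance cost of the stall can be illustrated with a simple behavioral model. This sketch is not the circuit of Figure 3.2: it assumes that every cycle carries a data transition, that FF1 fails whenever the path delay exceeds the clock period, and that each error costs exactly one extra cycle. The function name and the delay values are hypothetical.

```python
def razor_cycles(path_delays, t_clk, t_delay):
    """Count the clock cycles consumed by a Razor-style stage (behavioral sketch)."""
    cycles = 0
    errors = 0
    for delay in path_delays:
        if delay > t_clk + t_delay:
            raise ValueError("delay exceeds even the delayed-clock window")
        cycles += 1
        if delay > t_clk:   # FF1 missed the data, FF2 caught it
            errors += 1
            cycles += 1     # one-cycle stall while FF2's value is forwarded
    return cycles, errors


# Assumed per-cycle critical path delays in ns, 10 ns clock, 3 ns clock delay.
cycles, errors = razor_cycles([8, 11, 9, 12], t_clk=10, t_delay=3)
print(f"{errors} errors, {cycles} cycles for 4 operations")
```

In this example two of the four operations miss the main clock edge, so the stage needs six cycles to complete four operations.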

#### 3.3.2. Latch Based Time Borrowing Technique

Latch based design style has a significant performance advantage compared to an edge triggered system. Since the worst-case logic path determines the minimum clock period in an edge triggered system, even if a logic block finishes before the end of the clock period, it has to sit idle until the next clock edge. But a latch based design enables more flexible timing by allowing one stage to pass slack or to borrow time from other stages. This flexibility increases the overall performance.

Latch based time borrowing can be better understood by the simple example shown in Figure 3.3. The earliest time that the combinational logic block A (CLB\_A) can start computing is at edge 1. It can happen if the previous logic block did not use any of its allocated time (CLK1 high phase) or if it finished by using slack from previous stages. Therefore, the maximum time that can be borrowed from the previous stage is half of a cycle ($\frac{1}{2}T_{clk}$). This implies that the maximum logic cycle delay is equal to $1.5 \times T_{clk}$. However, it is important to note that the overall logic delay for an n-stage pipeline cannot exceed the time available ($n \times T_{clk}$) [4].



Figure 3.3: Latch Based Time Borrowing [4]
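The two limits stated above can be combined into a quick feasibility check. The sketch below only tests these two conditions (no stage longer than $1.5 \times T_{clk}$ and a total delay of at most $n \times T_{clk}$); it is not a full latch timing analysis, and the function name and stage delays are assumptions.

```python
def borrowing_feasible(stage_delays, t_clk):
    """Check the per-stage and whole-pipeline limits for latch time borrowing."""
    per_stage_ok = all(delay <= 1.5 * t_clk for delay in stage_delays)
    total_ok = sum(stage_delays) <= len(stage_delays) * t_clk
    return per_stage_ok and total_ok


# Assumed stage delays in ns for a three-stage pipeline with a 10 ns clock.
print(borrowing_feasible([12, 8, 10], t_clk=10))   # slow stage covered by a fast one
print(borrowing_feasible([16, 4, 10], t_clk=10))   # one stage exceeds 1.5 x T_clk
print(borrowing_feasible([12, 12, 10], t_clk=10))  # total exceeds n x T_clk
```

Only the first pipeline satisfies both limits: its 12 ns stage borrows 2 ns, which the 8 ns stage gives back.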

### 3.4. Scavenger Technique

As mentioned in the previous section, the idea employed in the Razor technique [23, 24] is to decrease the supply voltage close to the point of failure, where a small amount of error can be detected and corrected. Our circuit solution for error detection is similar to the Razor circuit solution [23, 24], where a flip-flop with a delayed clock is used to detect errors and correct them using a temporal redundancy technique. However, a new technique is used to extract correct data with zero performance overhead.

We call this new approach the *Scavenger technique*. Its basic principle is to collect unused clock duration from logic paths adjacent to critical paths and utilize it to successfully complete critical path computations during low power operation. Unlike latch-based time borrowing techniques [4, 25, 26], we use a delayed- and/or early-clocked flip-flop to send the data earlier or collect the data later than the expected time for guaranteed data recovery.

Figure 3.4 shows the error detection circuit, where the guaranteed correct data is always available after the delayed clock. In the case where the supply voltage is decreased to the point where one clock period is not sufficient to complete the critical path computation, flip-flop FF1 cannot capture the correct data. However, flip-flop FF2 will always capture correct data, because the delayed clock gives it more than one clock period (clock period + delay) to capture the data. As a result, the correct value is always given to the next stage, and the reduced computation time (clock period - delay) for the stage after the critical path is compensated by the idle time available in that stage.



Figure 3.4: Scavenger Technique
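The timing conditions described above can be summarized in a small behavioral model. This is a sketch under simplifying assumptions, not a model of the circuit in Figure 3.4: it assumes a data transition in every cycle, so that FF1 and FF2 disagree whenever the critical path delay exceeds the clock period. The function name, dictionary keys and delay values are hypothetical.

```python
def scavenger_stage(critical_delay, next_stage_delay, t_clk, t_delay):
    """Evaluate the timing conditions of one scavenger-protected stage."""
    return {
        # FF1 (main clock) and FF2 (delayed clock) disagree.
        "risk": critical_delay > t_clk,
        # FF2 has clock period + delay to capture the data.
        "data_ok": critical_delay <= t_clk + t_delay,
        # The following stage is left with clock period - delay.
        "next_stage_ok": next_stage_delay <= t_clk - t_delay,
    }


# Assumed values in ns: 10 ns clock, 3 ns delay on the FF2 clock.
print(scavenger_stage(11, 6, t_clk=10, t_delay=3))
print(scavenger_stage(9, 8, t_clk=10, t_delay=3))
```

In the first case the risk flag is raised but the forwarded data is still correct and no cycle is lost. In the second case the following stage is too slow to give up 3 ns, which corresponds to the situation discussed in the limitations section.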

## 3.5. Benefit of Scavenger Technique

The Razor technique presented in [23, 24] is a preliminary development for ASIC type designs, where unfortunately every time an error occurs the clock is stalled for one cycle and the entire computation is repeated. Moreover, it is very difficult to implement such an approach in the reconfigurable fabrics of FPGAs. We utilize a new concept to improve the performance and make it possible to implement it in future FPGA fabrics. Our proposed technique provides a better solution by eliminating the need to stall the design.

As shown in Figure 3.4, in our modified error detection/correction circuit, the output will always be the corrected logic even if a mismatch occurs. This approach eliminates the need to stop the system clock for re-computation, thereby avoiding huge amounts of logic in the clock circuitry to incorporate this need based stalling. It is a significant design advantage as clock circuitry itself has very critical design constraints.

The scavenger technique can also be used as an SEU detection circuit. As shown in Figure 3.5, the XOR gate should be sized to detect pulses with pulse strength in the range of typical SEUs, and the delayed clock should be phase shifted from the main clock by an amount larger than the typical pulse widths generated by SEUs. If these two criteria are met, the scavenger technique can be used as an SEU detection circuit.



Figure 3.5: Scavenger for SEU Detection

### 3.6. Limitations

Because of the delayed clock used in our circuit implementation, the overall clock duration for the critical path increases by decreasing the available clock period for the stage following the modified path. If the logic stage following the critical path is also a critical path, then applying scavenger technique on the former critical path could impact the maximum clock frequency. Generally, it is very unlikely for a critical path to be followed by another critical path. However, in such cases, an early clock similar to the delayed clock can be used on the initial critical path as shown in the Figure 3.6. By this approach, it can steal a portion of the clock period from the previous stage rather than from the later stage.



Figure 3.6: Scavenge from Previous Stage

In cases where more than two critical paths occur consecutively, this technique can be applied only on the first and the last critical paths. The intermediate critical paths need to be broken down into two paths, which will result in lowering the throughput of the design. To avoid such rare situations, an initial constraint restricting not more than two critical paths to occur consecutively should be specified in the synthesis tool.

Short paths should also be checked for all the critical paths to avoid faulty error detection. Short paths are those paths, which cause a major part of the logic path to be bypassed for certain input patterns by changing the data much earlier than anticipated. As a result, the hold time margins of the output data prior to the input data will reduce due to a short path. This reduction in hold time margins, depending on the amount of delay provided to the delayed clock, might cause flip-flop FF2 to capture erroneous data. Padding inverter chains to the short paths as shown in Figure 3.7 resolves the problem by increasing the hold time margins. The overhead resulting from padding inverter chains is too low to lessen the advantage of the scavenger technique, as the padding needs to be applied only on the short paths occurring in the critical paths.



Figure 3.7: Solution to Short Path Issues in Critical Paths
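The amount of padding can be estimated from the hold requirement of the delayed flip-flop. The sketch below assumes the usual minimum-path constraint for delayed-clock sampling, namely that new data launched at the main clock edge must not reach FF2 before the delayed clock edge plus the hold time. The function name and timing values are hypothetical.

```python
def required_padding(short_path_delay, t_delay, t_hold):
    """Extra delay to add to a short path so that FF2 still samples the old data."""
    return max(0.0, t_delay + t_hold - short_path_delay)


# Assumed values in ns: 3 ns clock delay, 0.2 ns hold time.
print(f"{required_padding(1.0, 3.0, 0.2):.1f} ns of padding for a 1 ns short path")
print(f"{required_padding(5.0, 3.0, 0.2):.1f} ns of padding for a 5 ns short path")
```

Only paths shorter than the clock delay plus the hold time need inverter padding, which is why the overhead stays small when the padding is limited to the short paths in the critical paths.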

Power optimization techniques, in general, reduce the power consumption of a system at the cost of a reduction in the robustness of the system. As shown in Figure 3.8, the number of critical paths increases with power optimization. However, additional time for computation is available to all the critical paths on which the scavenger technique is applied.

This additional time is due to the use of delayed/early clock in the scavenger technique. During this additional time the "risk" signal is activated (goes high), which can be used as a status signal to make sure that the sensitivity of the system is kept in check to avoid any malfunctioning.



Figure 3.8: Trade-off

The critical path delay is dynamic to a certain extent, based on the best case and the worst case input pattern. This can be explained through the example shown in Figure 3.9. For both the cases in the example the output of the NAND gate changes from '0' to '1'. But in case 1 only one PMOS is in ON state which is driving the output high. Moreover one NMOS is in ON state further lowering the drive strength of the gate. Whereas in case 2 both the driving PMOS are in ON state and both the NMOS are in OFF state (less leakage due to stack effect) thereby taking much less time to drive the output from '0' to '1' than case 1. The difference in path delays between the two cases defines the minimum timing margin between the main clock and delayed/early clock. This constraint ensures error free operation even when a worst case input follows the best case input.



Figure 3.9: Timing Variations controlled by Input Patterns
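A first-order RC estimate illustrates why the two cases differ. The sketch below assumes each ON PMOS behaves as a fixed resistance `r_pmos_ohm` charging a load `c_load_f`, uses the common 0.69·RC approximation for the 50% delay, and ignores the NMOS contribution mentioned for case 1. The function names and the resistance and capacitance values are hypothetical, not taken from the thesis.

```python
def pullup_delay_s(n_pmos_on, r_pmos_ohm, c_load_f):
    """Approximate 50% rise delay with n_pmos_on identical PMOS devices ON."""
    if n_pmos_on < 1:
        raise ValueError("at least one PMOS must be ON to pull the output high")
    r_eq = r_pmos_ohm / n_pmos_on  # identical resistors in parallel
    return 0.69 * r_eq * c_load_f


def min_clock_margin_s(worst_delay_s, best_delay_s):
    """Minimum spacing between main and delayed/early clock, per the text."""
    return worst_delay_s - best_delay_s


if __name__ == "__main__":
    case1 = pullup_delay_s(1, 10e3, 10e-15)  # one PMOS ON (slower case)
    case2 = pullup_delay_s(2, 10e3, 10e-15)  # both PMOS ON (faster case)
    print(f"case 1: {case1 * 1e12:.1f} ps, case 2: {case2 * 1e12:.1f} ps")
    print(f"minimum margin: {min_clock_margin_s(case1, case2) * 1e12:.1f} ps")
```

In this simplified model the single-PMOS case takes twice as long as the two-PMOS case, and the difference between them is the margin the delayed/early clock must cover.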

At the system level, our adaptive circuit solution for very low power operation, when implemented in synchronous systems, requires a handshaking protocol so that a non-adaptive system can properly interface with our adaptive system. This protocol adds overhead.

## 3.7. Implementation

We implemented our design technique on an FPGA (Spartan3E). The test circuit in Figure 3.10 is used to verify the functioning of the scavenger design. The combinational logic used to increase the risk of erroneous operation is the XORed output of a 34x35-bit multiplier. The XOR length is chosen so that the combinational depth is as close as possible to the critical path. The delayed clock used in the test circuit is 180° out of phase with the main clock.



Figure 3.10: Test Circuit

The algorithm used to implement the adaptive voltage and frequency scaling system is as shown in Figure 3.11. The clock period is varied over a 20ns-60ns range with a step size of 20ns.



Figure 3.11: Adaptive System Algorithm
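Figure 3.11 is not reproduced in this text, so the following is only one plausible control policy and not the algorithm of the thesis. It assumes the controller lengthens the clock period by one step whenever the "risk" signal was raised during an observation window, and shortens it again after a run of risk-free windows. The function `next_period_ns`, the parameter `quiet_windows_needed`, and the policy itself are hypothetical; only the 20ns to 60ns range and the 20ns step come from the text.

```python
PERIOD_MIN_NS = 20   # from the text
PERIOD_MAX_NS = 60   # from the text
PERIOD_STEP_NS = 20  # from the text


def next_period_ns(period_ns, risk_seen, quiet_windows, quiet_windows_needed=3):
    """Return (new_period_ns, new_quiet_windows) after one observation window."""
    if risk_seen:
        # Back off: give the critical paths more time and restart the count.
        return min(period_ns + PERIOD_STEP_NS, PERIOD_MAX_NS), 0
    quiet_windows += 1
    if quiet_windows >= quiet_windows_needed:
        # Several risk-free windows in a row: try a faster clock again.
        return max(period_ns - PERIOD_STEP_NS, PERIOD_MIN_NS), 0
    return period_ns, quiet_windows


if __name__ == "__main__":
    period, quiet = 20, 0
    for risk in [True, True, True, False, False, False, False]:
        period, quiet = next_period_ns(period, risk, quiet)
        print(period)
```

Requiring several quiet windows before speeding up is a hypothetical hysteresis choice that keeps the clock from oscillating between two periods on every window.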

## 3.8. Experimental Confirmation

Figure 3.12 shows the increase in risk rate as the voltage is reduced. This evaluated risk rate can be used as a measure to avoid operating at voltages too low for even the delayed flip-flop to capture the correct data.



Figure 3.12: Risk Rate

Figure 3.13 and Figure 3.14 show the oscilloscope traces of a physical circuit. The waveforms are the results of the test circuit operated at 3.3V and 2.64V respectively. These results show that there is more risk for lower voltage operation.



Figure 3.13: Results at 3.3V



Figure 3.14: Results at 2.64V

Figure 3.15 shows the results of the same test circuit operated at 3.3V with the adaptive algorithm incorporated. These results indicate that there is relatively more risk even at 3.3V, which is due to the logic added by the adaptive algorithm and to changes in the input pattern that alter the critical path timing.

The waveform marked "Out1" represents the output signal from FF1 (the flip-flop with the main clock). The waveform marked "Out2" represents the output signal from FF2 (the flip-flop with the delayed clock). The waveform marked "Frequency" indicates the operating clock period of the system (20ns when it is low and 40ns when it is high).



Figure 3.15: Results for Adaptive System



Figure 3.16: Test Setup

Figure 3.16 shows the complete test setup used for the implementation of this new design technique. All the tests were done using Xilinx ISE 10.1, Digilent Spartan3E boards and Tektronix MSO 4054 Oscilloscope.

## 3.9. Conclusions

A new approach to designing a very robust low power system with inherent PVT (Process, Voltage and Temperature) variation protection is proposed. The proposed scavenger technique has very small area overhead and is easy to incorporate into existing designs. As long as there are no more than two consecutive critical paths, this technique can reduce power with no throughput loss.

The adaptive voltage and frequency scaling solution proposed in this paper is useful for very low energy consumption systems, such as energy harvesting applications, with non-uniform computation load.

# Chapter 4

## Firebird Flip-Flop

## 4.1. Introduction

Aggressive technology scaling is quickly exhausting the maximum available speed of operation and the acceptable energy consumption. We propose a new flip-flop design that gives double the data rate for the same clock speed by using both clock edges. Unlike the existing designs, this new dual data rate flip-flop uses almost the same number of gates for double the data rate, and its robustness is on par with that of a standard D flip-flop. Due to its low activity factor compared to the existing dual data rate flip-flops [27, 28], it also has considerably lower power consumption.

Another consequence of technology scaling is radiation-induced soft errors in flip-flops, which have become a major challenge for robust VLSI designs. Based on our new dual data rate flip-flop design, an SEU (Single Event Upset) hardened flip-flop, "Firebird", is also proposed. Unlike the existing rad-hard flip-flops [29, 30], the proposed Firebird design latches data on both clock edges. An Extended Firebird design with improved SEU hardening (it never latches faulty data due to SEUs occurring on the flip-flop) is proposed for more critical applications.

As already mentioned, one of the major concerns in modern VLSI circuit design is power consumption. Our proposed technique takes the basic power equation into consideration,

$$P = \frac{1}{2}C_L V_{DD}^2 f \alpha + V_{DD} I_{leak} \tag{4.1}$$

It has been previously reported that clock distribution networks alone consume a significant portion of the chip power. For instance, Bailey et al. showed that the clock distribution in a 600MHz Alpha microprocessor consumes about 50% of the total chip power dissipation [31]. By designing a flip-flop that can sample data on both clock edges, the clock switching required in a circuit is reduced by half. In other words, the dual data rate flip-flop allows the clock frequency to be reduced by 50%, thereby saving clock distribution energy. This will significantly reduce the total chip power dissipation in high performance digital systems.
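As a rough worked example of Equation 4.1, the sketch below evaluates the power of a clock network at full and at half clock frequency. The capacitance and activity values are hypothetical; only the form of the equation, the 600MHz figure, and the roughly 50% clock-power share come from the text. The chip-level estimate additionally assumes that clock network power is purely dynamic and that the rest of the chip is unaffected.

```python
def power_w(c_load_f, vdd_v, freq_hz, alpha, i_leak_a=0.0):
    """Equation 4.1: P = 1/2 * C_L * VDD^2 * f * alpha + VDD * I_leak."""
    return 0.5 * c_load_f * vdd_v ** 2 * freq_hz * alpha + vdd_v * i_leak_a


def chip_saving_fraction(clock_share, clock_freq_scale):
    """Fraction of total chip power saved if only clock power scales with f."""
    return clock_share * (1.0 - clock_freq_scale)


if __name__ == "__main__":
    # Hypothetical clock network: 1 nF switched capacitance, 1.8 V supply.
    full = power_w(1e-9, 1.8, 600e6, 1.0)
    half = power_w(1e-9, 1.8, 300e6, 1.0)
    print(f"clock power: {full:.3f} W -> {half:.3f} W")
    print(f"chip-level saving: {chip_saving_fraction(0.5, 0.5):.0%}")
```

Under these assumptions, halving the clock frequency of a chip whose clock network draws about half of the total power would save roughly a quarter of the total.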

The second major concern in modern VLSI circuit design is robustness. SEUs pose a major challenge to error-free circuit operation. SEUs are defined by NASA as radiation-induced errors in microelectronic circuits caused when charged particles (usually from the radiation belts or from cosmic rays) lose energy by ionizing the medium through which they pass, leaving behind a wake of electron-hole pairs. The charge created by these particle strikes produces a voltage spike that can cause an unwanted pulse in logic circuits or can cause a memory element to change state from a '1' to a '0' and vice versa. In the later part of this chapter, we propose a dual data rate flip-flop circuit that can mitigate the SEU effects.

## 4.2. Background

The two existing designs that serve as the basis for the proposed dual data rate flip-flop and Firebird flip-flop are: (1) Standard D Flip-Flop and (2) Error-Correcting Scanout flip-flop Design.

### 4.2.1. D Flip-Flop

A standard D flip-flop is built from a high-level triggered latch and a low-level triggered latch connected back to back. This structure can be seen in Figure 4.1. The D flip-flop shown acts as a negative edge triggered flip-flop. Its functionality can be better understood from the waveforms shown in Figure 4.2.



Figure 4.1: D Flip-Flop



Figure 4.2: D Flip-Flop Output Waveform

### 4.2.2. Existing Rad-hard Flip-Flop Design

A simplified version derived from the error-correction scanout flip-flop [30] is shown in Figure 4.3. The latches LA and LB together act like a single positive edge triggered D flip-flop. Similarly, the PH1 and PH2 latches act like another positive edge triggered D flip-flop. Each flip-flop is therefore replaced with two flip-flops, and the outputs of these two flip-flops are given to a C-element. Because the C-element inherently keeps its output unaltered when its two inputs mismatch (for example, due to a particle strike on one of the four latches), the entire structure acts like an SEU hardened flip-flop. The functioning of this flip-flop can be better understood from the waveforms shown in Figure 4.4. As can be observed from the waveforms, an important drawback of this design is that it cannot correct the data if an SEU occurs at the time of the active clock edge (such as SEU\_2); the data transition is then completely missed on the respective output. The functioning of the C-element is explained in detail in Section 4.4. A weak keeper structure is used at the output of the C-element to counter the leakage current in the C-element when both its pull-up and pull-down paths are shut off. Since the keeper structure is weak, a particle strike on it will not affect the output.



Figure 4.3: Simplified Error-Correction Scanout Flip-Flop



Figure 4.4: Error-Correction Scanout Flip-Flop Output Waveform

## 4.3. Existing Dual Data Rate Flip-Flop Designs [27]

The existing low power dual-edge triggered flip-flops presented here use a technique similar to that of edge-triggered latches, which create a narrow sampling window to overcome the race problem. Double-edge triggered flip-flops therefore latch the data on both the rising and the falling edge of the clock using a narrow pulse generated on both clock edges. Thus, the clock frequency is reduced by half while the data throughput is preserved. Figure 4.5 shows the two static pulsed flip-flop structures proposed in [27] and the pulse generator circuit. The pulse generator consists of four inverters, which generate the delayed and inverted clock signals CLK2 and CLK3, along with two NMOS transistors for pulse generation, as shown in Figure 4.5.

In Figure 4.5(a), the *PULS* signal applied to the NMOS transistor *MN1* creates a narrow transparency window in which the data inputs can affect the state of the static nodes *SB* and *S* through the NMOS transistors *MN2* and *MN3*. The PMOS transistor *MP5* (*MP4*) pulls the *S* (*SB*) node up to V<sub>dd</sub>. In Figure 4.5(b), the pass transistors *MN2* and *MN3* capture data during the pulse window defined by the *PULS* signal. Since the data inputs have direct access to the static nodes *SB* and *S* through *MN2* and *MN3*, this structure shows a smaller delay than the former one. To distinguish these two dual-edge triggered static pulsed flip-flops, the first was named DESPFF and the second DSPFF. Two weak NMOS transistors, *MN6* and *MN7*, are used so that the nodes *SB* and *S* never float, which could otherwise result in short-circuit current in the following inverter or even functional failure.



Figure 4.5: (a) DESPFF (b) DSPFF (c) PULS Generator [27]

## 4.4. Dual Data Rate Flip-Flop Design

The present idea works by splitting the sequential two-latch structure used in a conventional D flip-flop (explained in Section 4.2.1) and arranging the two latches in parallel. The outputs of these two latches are then given to the two inputs of a standard C-element [29], as shown in Figure 4.6. By virtue of its operation, the C-element holds the present data until both of its inputs become equal in value. Once both inputs are equal, it acts like a simple inverter. The basic operation of the C-element can be understood from its truth table, shown in Table 4.1.



Figure 4.6: Dual Data Rate Flip-Flop

**Table 4.1: C-Element Truth Table** 

| Q1 | Q2 | D_Out          |
|----|----|----------------|
| 0  | 0  | 1              |
| 1  | 1  | 0              |
| 0  | 1  | Previous Value |
| 1  | 0  | Previous Value |
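Table 4.1 can be expressed as a small behavioral model. The sketch below is a zero-delay logical abstraction, not a transistor-level description; the function name `c_element` is chosen here for illustration.

```python
def c_element(q1, q2, previous):
    """Inverting C-element of Table 4.1 for inputs in {0, 1}."""
    if q1 == q2:
        return 1 - q1  # inputs agree: the element acts like an inverter
    return previous    # inputs differ: the output holds its previous value


if __name__ == "__main__":
    for q1, q2 in [(0, 0), (1, 1), (0, 1), (1, 0)]:
        for previous in (0, 1):
            print(q1, q2, previous, "->", c_element(q1, q2, previous))
```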

The data input is given in parallel to both the high-level and the low-level triggered latches, which allows this new flip-flop to capture any transition in the input data during the entire clock period. Since the C-element stores the data until both latch outputs become equal, this new design incorporates both the storage and the edge triggering attributes of a flip-flop. The dual edge detection ability of this design can be understood from the waveforms shown in Figure 4.7.



Figure 4.7: Dual Data Rate Flip-Flop Output Waveform
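The dual-edge behavior can be reproduced with a small behavioral simulation. The sketch below is an idealized, zero-delay model under stated assumptions: the two latches are perfectly transparent or opaque, the keeper is not modeled, and the output is the inverted D_Out of Table 4.1. The function names and the sample sequence are hypothetical.

```python
def c_element(q1, q2, previous):
    """Inverting C-element (Table 4.1): invert when inputs agree, else hold."""
    return 1 - q1 if q1 == q2 else previous


def simulate_ddff(samples, q1=0, q2=0, d_out=1):
    """Step a behavioral dual data rate flip-flop through (clk, d) samples.

    q1 models the high-level triggered latch and q2 the low-level triggered
    latch. Returns the D_Out value after each sample.
    """
    outputs = []
    for clk, d in samples:
        if clk:
            q1 = d  # high-level latch is transparent, low-level latch holds
        else:
            q2 = d  # low-level latch is transparent, high-level latch holds
        d_out = c_element(q1, q2, d_out)
        outputs.append(d_out)
    return outputs


if __name__ == "__main__":
    # D changes in the middle of each clock phase; the new value reaches
    # the output only at the following clock edge (rising or falling).
    samples = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]
    print(simulate_ddff(samples))
```

In this model a change on D while one latch is transparent makes the two latch outputs disagree, so the C-element holds; at the next clock edge the other latch becomes transparent, the outputs agree again, and D_Out updates to the inverse of D.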

A weak keeper structure is used at the output of the dual data rate flip-flop. This keeper block has a twofold functionality. Firstly, it keeps the output state of the C-element even if there is an excessive leakage current passing through the C-element during the period when both the pull-up and pull-down paths of the C-element are shut off. Secondly, the keeper takes care of any charge sharing issues that might occur if the next stage is not isolated by static logic. Because the design inherently lets the C-element output float in order to store data, charge sharing can become a critical issue.

## 4.5. Firebird

Technology scaling is occurring in tandem with our increasing desire to explore space extensively. As the amount of charge needed to flip a bit decreases with technology scaling, radiation effects such as SEUs are becoming significant even at the terrestrial level. Triple modular redundancy (TMR) has been one of the popular techniques to reduce the impact of SEUs on a design. However, the area and power overhead of the TMR technique is extremely high, making complex designs unreasonably large. This dilemma has been the main motivation behind designing the Firebird flip-flop, which is completely immune to SEUs.

The Firebird design is shown in Figure 4.8. Whereas the dual data rate flip-flop has two latches in parallel, this design has four: two high-level triggered latches and two low-level triggered latches. All four latch outputs are given to an extended C-element, just as the two latch outputs of the dual data rate flip-flop are given to a C-element. The functioning of an extended C-element is very similar to that of a C-element, as shown in Table 4.2. Moreover, the extended C-element is immune to SEUs even during the float mode, as there are always two transistors ON/OFF, unlike a normal C-element where only one transistor is ON/OFF in the pull-up or pull-down path. The immunity of the Firebird flip-flop to SEUs can be better understood from the waveforms shown in Figure 4.9. The waveforms show that the new flip-flop still latches faulty data when SEUs occur on the respective latches during the holding clock edges (such as SEU\_4). In addition, the output data may be slightly affected when an SEU occurs at the beginning of the active level of the respective latch (such as SEU\_3), right when the extended C-element is transitioning from float mode to inverter mode. Similar effects are also present in the Error-Correction Scanout Flip-Flop. The waveforms also indicate that this scenario does not always affect the output, as can be observed for SEU\_5. The output is delayed only when an SEU causes a bit inversion that delays the transition of the extended C-element from float mode to inverter mode.



Figure 4.8: Firebird: A SEU Hardened Dual Data Flip Flop

**Table 4.2: Extended C-Element Truth Table**

| Q1 | Q2 | Q3 | Q4 | D_Out          |
|----|----|----|----|----------------|
| 0  | 0  | 0  | 0  | 1              |
| 1  | 1  | 1  | 1  | 0              |
| All other combinations | | | | Previous Value |
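The extended C-element of Table 4.2 generalizes the two-input rule to four latch outputs. The sketch below is a zero-delay logical abstraction with a hypothetical function name; it only illustrates that a single flipped latch output leaves D_Out unchanged in steady state, and it does not model upsets that coincide with clock edges, which the text discusses separately.

```python
def extended_c_element(latch_outputs, previous):
    """Extended C-element of Table 4.2 for a sequence of 0/1 latch outputs."""
    first = latch_outputs[0]
    if all(q == first for q in latch_outputs):
        return 1 - first  # all inputs agree: the element acts like an inverter
    return previous       # any disagreement: the output holds its value


if __name__ == "__main__":
    d_out = extended_c_element((1, 1, 1, 1), previous=1)
    print("all latches agree on 1 ->", d_out)
    # Hypothetical single event upset flips only the third latch output.
    d_out = extended_c_element((1, 1, 0, 1), previous=d_out)
    print("one latch output upset ->", d_out)
```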



Figure 4.9: Firebird Flip-Flop Output Waveform

The Firebird's immunity to SEUs can be improved at the cost of additional area overhead. This Extended Firebird with improved SEU hardening is shown in Figure 4.10. Its operation can be better understood from the waveforms shown in Figure 4.11. The Extended Firebird design never latches faulty data due to SEUs, even those that occur at the time of the active clock edges (such as SEU\_4). This advantage is due to the fact that Q1 and Q3 stabilize into C1, which remains unaffected by an SEU on Q1 or Q3 during clock edges. Similarly, Q2 and Q4 stabilize into C2. Since these stabilized values are given to the extended C-element, the final output never completely misses the data due to SEUs. As Q1 and Q3 disagree with each other only during a radiation hit, under the single event upset assumption a normal C-element and a weak keeper structure are sufficient to ensure SEU hardening at C1. Similarly, a normal C-element and a weak keeper structure are sufficient for SEU hardening at C2.



Figure 4.10: Extended Firebird



Figure 4.11: Extended Firebird Output Waveform

The extended C-element is made to float in order to store data similar to the dual data rate flip-flop design. Therefore, if the extended C-element keeper structure is made out of standard inverters and a particle strike happens on the keeper structure during the period when the extended C-element is afloat, it will drive the output to a faulty value even though the keeper structure is a weak driver. In order to avoid such situations the keeper block at the output of the Firebird is made up of back to back C-elements instead of standard inverter blocks. As the C-element by virtue of its functionality is immune to SEUs, the final keeper structure will never drive a faulty value at the output.

The main advantage of Firebird over the existing rad-hard flip-flop [29, 30] is that it latches data on both edges of the clock. The existing rad-hard flip-flop cannot correct any SEUs occurring during the active edge of the clock. However, due to its parallel latch format, Firebird can correct any SEUs occurring even during both the active edges of the clock. Secondly, it is also a dual-edge flip-flop for the same number of transistors, which makes it highly power efficient in comparison to the existing rad-hard flip-flop.

## 4.6. Simulations

The SPICE simulations of the dual data rate flip-flop were performed using the Berkeley Predictive Technology Model (BPTM) in a 0.18µm process technology node [32] with a supply voltage of 1.8V. The designs were optimized for a clock frequency of 400MHz and a data switching activity of 0.5. A load capacitance of 100fF was used at the output. Transistor sizing was optimized using an iterative procedure with the objective of achieving high speed and low power (minimum Power-Delay Product (PDP)). These criteria were picked so that the proposed dual data rate flip-flop (DDFF) can be compared with the existing dual data rate flip-flops, the Dual-Edge Triggered Static Pulsed Flip-Flops (DESPFF and DSPFF) [27].

Table 4.3 summarizes the numerical results for the two previous dual data rate flip-flops along with the proposed dual data rate flip-flop (DDFF). The proposed dual data rate flip-flop shows lower PDP as well as lower power consumption in comparison to the previously proposed dual data flip-flops.
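The PDP and normalized PDP columns of Table 4.3 can be cross-checked from the delay and power columns. The sketch below assumes the delay column is in picoseconds, which is the unit that reproduces the reported PDP values in femtojoules, and that the PDP is normalized to DSPFF, whose normalized value is 1.000. Small differences in the last digit are expected because the tabulated inputs are themselves rounded.

```python
# (name, delay in ps, power in uW, reported PDP in fJ, reported normalized PDP)
TABLE_4_3 = [
    ("DESPFF", 184.7, 116.0, 21.42, 0.968),
    ("DSPFF", 180.5, 122.6, 22.13, 1.000),
    ("DDFF", 203.6, 102.0, 20.77, 0.939),
]


def pdp_fj(delay_ps, power_uw):
    """Power-delay product in fJ: 1 ps * 1 uW = 1e-18 J = 1e-3 fJ."""
    return delay_ps * power_uw * 1e-3


if __name__ == "__main__":
    reference = pdp_fj(180.5, 122.6)  # DSPFF is the normalization baseline
    for name, delay_ps, power_uw, pdp_reported, norm_reported in TABLE_4_3:
        pdp = pdp_fj(delay_ps, power_uw)
        print(f"{name}: PDP {pdp:.2f} fJ (reported {pdp_reported}), "
              f"normalized {pdp / reference:.3f} (reported {norm_reported})")
```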

**Table 4.3: Results**

| Flip-flop  | FF Delay\* (ps) | Power (µW) | PDP (fJ) | Norm. PDP | Device Count |
|------------|-----------------|------------|----------|-----------|--------------|
| DESPFF[27] | 184.7           | 116.0      | 21.42    | 0.968     | 24           |
| DSPFF[27]  | 180.5           | 122.6      | 22.13    | 1.000     | 25           |
| DDFF       | 203.6           | 102        | 20.77    | 0.939     | 28           |

<sup>\*</sup> FF Delay is D-Q for DESPFF and DSPFF, and C-Q for DDFF.

The SPICE simulations of the existing rad-hard flip-flop, Firebird and Extended Firebird were also performed using the BPTM 0.18µm process technology models with a supply voltage of 1.8V. The simulations were run at 100MHz. A load capacitance of 50fF was used at the output. The SEUs are simulated using a current source with a peak current of 500µA and a width of 700ps. SEU immunity is tested by applying this current source at different time intervals to the drains of various transistors in the flip-flop. Data with a switching activity of 200% (data switching on both edges of the clock) is given to Firebird and Extended Firebird, and data with 100% switching activity is given to the existing rad-hard flip-flop.

Figure 4.12 shows the SPICE results for the proposed dual data rate flip-flop. The results show that the data is successfully captured on both clock edges. Figure 4.13 and Figure 4.14 show the SPICE results for the existing rad-hard flip-flop and the Firebird flip-flop, respectively. These results show that both the existing rad-hard flip-flop and Firebird miss data when SEUs occur during the active edge of the clock, but Firebird captures data on both clock edges. Thus Firebird is more power efficient than the existing rad-hard flip-flop. Extended Firebird, however, corrects SEUs occurring even during both active clock edges and also successfully captures data on both clock edges, as shown in Figure 4.15.



Figure 4.12: Dual Data Rate Flip-Flop Output



Figure 4.13: Existing Rad-hard Flip-Flop Output



Figure 4.14: Firebird Flip-Flop Output



Figure 4.15: Extended Firebird Output

The waveform marked "D\_in" represents the data input given to the respective flip-flop design. The waveform marked "D\_out" represents the output signal from the respective flip-flop design. The waveform marked "CLK" represents the clock given to the respective flip-flop design. The waveform marked "Simulated SEU" represents the artificial particle strikes, in the form of current spikes, applied to the respective flip-flop design for SEU immunity testing.

## 4.7. Conclusions

A new and robust dual data rate flip-flop with low power consumption is proposed. It uses almost the same number of gates as a standard D flip-flop for double the data rate, making it very attractive for future technologies.

A unique SEU-hardened dual data rate flip-flop, Firebird, is proposed. Unlike the existing rad-hard flip-flops, our proposed flip-flop is dual edged. Moreover, an Extended Firebird flip-flop with improved SEU immunity is also proposed.

# Chapter 5

## Conclusions

Over the past few years, designers have believed that the integrated circuit industry was going to saturate soon. They have been proven wrong by new technology advancements. Although, at the present rate of device scaling, it would be bold to state that technology scaling will never saturate, future technology advancements are inevitable.

One of the most critical bottlenecks for technology advancements is power consumption. With current operation speeds and on-chip densities, power consumption has drastically increased. Designers use various methods to reduce both dynamic and leakage power. Dynamic power is reduced by lowering switching activity, load capacitance, supply voltage or frequency, using techniques such as logic restructuring, input reordering, clock gating, device sizing, and dynamic voltage control. Static power is reduced using techniques such as stacked transistors, adaptive body biasing, sleep transistors, and vector manipulation. Though designers have been successful at reducing power consumption, the complexity, area and performance overhead involved in these techniques is slowly reaching unacceptable levels.

Another critical design challenge that arises from technology scaling is the robustness of designs. Single event upsets have plagued electronic systems for a long time. SEUs used to be a major concern mainly for space applications; however, with technology scaling they have started to affect electronics at the terrestrial level too. Though unprotected memories are more sensitive to SEUs, it is predicted that the soft error rate (SER) of combinational logic may dominate the SER of unprotected memory by the year 2011 [2].

The techniques used to reduce the SEU sensitivity of combinational logic include increasing the device size of sensitive gates [17], duplicating the sensitive gates [33], and gate cloning [34]. These techniques, which selectively harden gates to reduce the soft error rate (SER), result in huge area, power and/or delay overhead. With technology advancements, the likelihood of a gate being sensitive is increasing, which worsens the overhead incurred. Therefore, such techniques may not be suitable for future circuit designs.

A design engineer's dream is to have simple, efficient and flexible solutions to power and robustness issues with lowest possible area, performance and delay overhead. It would be a perfect complement if such a solution can be incorporated into the existing and future design with minimal or no redesign effort. We believe that we have satisfied these desires of the designers by proposing novel solutions like "Scavenger Technique" and "Firebird" flip-flop in this thesis.

The Scavenger technique proposed in this thesis is a new approach that uses temporal redundancy on the outputs of the critical paths of a design. The basic principle is to collect unused clock duration from logic paths adjacent to critical paths and utilize it to successfully complete critical path computations during low power operation. Scavenger's "always correct" approach eliminates the additional area, power and performance overhead required for error correction. Due to the availability of the status signal, "risk", this technique offers maximum flexibility with respect to performance, power and robustness. The simplicity of this method (resulting in very low area overhead) would make it a very good choice for future low power designs. Based on this Scavenger technique, we also proposed an adaptive voltage and frequency scaling technique suitable for very low power applications with non-uniform processing loads. With this technique, the system can be switched back and forth, on demand, between a high power, high throughput mode and a low power, low throughput mode.

We also proposed a new dual data rate flip-flop which takes up almost the same number of transistors as a standard D flip-flop. Thus this new dual data rate flip-flop gives double the processing capabilities for the same chip area. This new flip-flop is designed by organizing the sequentially placed latches (of a D flip-flop) in a parallel structure and giving the outputs of these two latches to the two inputs of the C-element. The simplicity and robustness of this design makes it a perfect choice for future digital designs. Based on this dual data rate flip-flop design, a unique SEU immune dual data rate flip-flop (Firebird) is proposed. The new design will never latch faulty data due to SEUs and will also latch data on both of the edges of the clock making it highly suitable for SEU sensitive low power applications. We expect that techniques like these will find more prominence with future technology advancements.

## References

- [1] Worm F., Thiran P., de Micheli G. and Ienne P., "Self-Calibrating Networks-On-Chip," *IEEE International Symposium on Circuits and Systems*, Vol. 3, May 2005, pp. 2361-2364.
- [2] Shivakumar, P. et al., "Modeling the effect of technology trend on the soft error rate of combinational logic," *International Conference on Dependable Systems and Networks*, 2002, pp. 389-398.
- [3] Daniel Mlynek and Yusuf Leblebici, Design of VLSI Systems, web based course, http://lsiwww.epfl.ch/LSI2001/teaching/webcourse/ch07/ch07.html#7.2
- [4] J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits, Prentice Hall Publishers, 2nd Edition, 2003.
- [5] Exponential increases in the number of transistors integrated into Intel's chips, http://www.intel.com/cd/corporate/techtrends/emea/eng/209729.htm
- [6] A. Chandrakasan, W. Bowhill, and F. Fox, "Design of High-Performance Microprocessor Circuits", IEEE Press, 2000.
- [7] R. Baumann, "Soft Errors in Advanced Semiconductor Devices-Part I: The Three Radiation Sources," *IEEE Transactions on Device and Materials Reliability*, Vol 1, NO. 1, March 2001, pp. 17-22.
- [8] Quick Logic Corporation, "Single Event Upsets in FPGAs," White Paper.
- [9] Paul E. Dodd, Lloyd W. Massengill, "Basic mechanism and modeling of single-event upset in digital microelectronics," *IEEE Transactions on Nuclear Science*, vol. 50, no. 3, June 2003.

- [10] Aahlad Srinivasa M., "Single Event Upset Hardened CMOS Combinational Logic and Clock Buffer Design," Master's Thesis, University of New Mexico, December 2008.
- [11] E. Takeda, K. Takeuchi, D. Hisamoto, T. Toyabe, K. Ohshima, and K. Itoh, "A cross section of α-particle-induced soft-error phenomena in VLSIs," *IEEE Trans. Electron. Devices*, Vol. 36, November 1989, pp. 2567–2575.
- [12] Tanay Karnik, Peter Hazucha, and Jagdish Patel "Characterization of soft errors caused by single event upsets in CMOS processes," *IEEE Transactions on Dependable and Secure Computing*, vol. 1, no. 2, April-June 2004.
- [13] Chandrakasan, A.P., Sheng, S., and Brodersen, R.W., "Low-power CMOS digital design," IEEE Journal of Solid-State Circuits, Vol. 27, Issue 4, April 1992, pp. 473-484.
- [14] Y. Ye, S. Borkar, and V. De, "A New Technique for Standby Leakage Reduction in High-Performance Circuits," *1998 Symposium on VLSI Circuits*, June 1998, pp. 40-41.
- [15] Payman Zarkesh-Ha, Advanced VLSI Design, ECE595 course, University of New Mexico, http://lsiwww.epfl.ch/LSI2001/teaching/webcourse/ch07/ch07.html#7.2
- [16] Chong Zhao, Xiaoliang Bai, and Sujit Dey, "A scalable soft spot analysis methodology for compound noise effects in Nano-meter Circuits," *Design Automation Conference (DAC)*, June 2004.
- [17] André K. Nieuwland, Samir Jasarevic, and Goran Jerin, "Combinational logic soft error analysis and protection," *12th IEEE International On-line Testing Symposium*, 2006.

- [18] Tim Tuan, Arif Rahman, Satyaki Das, Steve Trimberger, and Sean Kao, "A 90-nm Low-Power FPGA for Battery-Powered Applications," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 26, Issue 2, February 2007, pp. 296-300.
- [19] Jason H. Anderson, and Farid N. Najm, "Active Leakage Power Optimization for FPGAs," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 25, Issue 3, March 2006, pp. 423-437.
- [20] A.K. Uht, "Going Beyond Worst-Case Specs with TEAtime," *IEEE Computer*, vol. 37, Issue 3, March 2004, pp. 51-56.
- [21] Vijay Raghunathan, Kansal A., Hsu J., Friedman J., Mani Srivastava, "Design considerations for solar energy harvesting wireless embedded systems," 4th International Symposium Information Processing in Sensor Networks, April 2005, pp. 457-462.
- [22] S. Behrens and J. Davidson, "Energy Harvesting for Sensor Networks," 17th IEEE International Symposium Applications of Ferroelectrics, February 2008, pp. 1-2.
- [23] T. Austin, D. Blaauw, T. Mudge, and K. Flautner, "Making Typical Silicon Matter with Razor," *IEEE Computer Magazine*, vol. 27, Issue 3, March 2004, pp. 57-65.
- [24] D. Blaauw, et al., "Razor II: In Situ Error Detection and Correction for PVT and SER Tolerance," *IEEE ISSCC Dig. Tech. Papers*, February 2008, pp. 400-401.
- [25] Quan Yuan, Hai-gang Yang, Fang-yuan Dong, and Tao Yin, "'Time Borrowing' Technique for Design of Low-Power High-Speed Multi-Modulus Prescaler in Frequency Synthesizer," *IEEE International Symposium on Circuits and Systems*, May 2008, pp. 1004-1007.
- [26] Shi-Zheng Eric Lin, Chieh Changfan, Yu-Chin Hsu, and Fur-Shing Tsai, "Optimal Time Borrowing Analysis and Timing Budgeting Optimization for Latch-Based Designs," *ACM Transactions on Design Automation of Electronic Systems*, vol. 7, Issue 1, January 2002, pp. 217-230.
- [27] Aliakbar Ghadiri, Hamid Mahmoodi, "Dual-Edge Triggered Static Pulsed Flip-Flops," *IEEE 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design*, January 2005, pp. 846-849.
- [28] Nam Duc Nguyen, "Dual Data Rate Flip-Flop," US Patent # 7,242,235, July 10, 2007.
- [29] S. Mitra, N. Seifert, M. Zhang, Q. Shi and K.S. Kim, "Robust System Design with Built-In Soft Error Resilience," *IEEE Computer*, Vol. 38, No. 2, February 2005, pp. 43-52.
- [30] S. Mitra, M. Zhang, TM Mak, N. Seifert, Victor Zia, K. S. Kim, "Logic Soft Errors: A Major Barrier to Robust Platform Design," *IEEE International Test Conference*, November 2005, pp. 696.
- [31] D. W. Bailey and B. J. Benschneider, "Clocking Design and Analysis for a 600-MHz Alpha Microprocessor." *Journal of Solid-State Circuits*, pp. 1627–1633, Nov. 1998.
- [32] Berkeley Predictive Technology Model, http://www-.eas.asu.edu/~ptm/

- [33] Karthik Mohanram and Nur A. Touba, "Partial error masking to reduce soft error failure rate in logic circuits," *18th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems*, Nov. 2003, pp. 433-440.
- [34] Chong Zhao, Sujit Dey, "Improving transient error tolerance of digital VLSI circuits using robustness compiler (ROCO)," 7th International Symposium on Quality Electronic Design, 2006.