## Introduction to IC technology

Gi. Venkata Hanuman

An integrated circuit (IC) is an electronic assembly made in such a way that all components of the circuit are fabricated on a single chip (or) IC.

\* The number of devices per chip is described by the level of integration.

The levels of integration are:

- 1. SSI → Small Scale Integration
- 2. MSI → Medium Scale Integration
- 3. LSI → Large Scale Integration
- 4. VLSI → Very Large Scale Integration
- 5. ULSI → Ultra Large Scale Integration

| Type of IC technology | Devices per chip | Year | Applications              |
|-----------------------|------------------|------|---------------------------|
| SSI                   | 1-100            | 1960 | Gates, op-amps            |
| MSI                   | 100-1K           | 1965 | Registers, filters        |
| LSI                   | 1K-10K           | 1970 | A/D, D/A, microprocessors |
| VLSI                  | >10K             | 1975 | Memories, DSP kits        |
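The device-count ranges in the table can be turned into a small lookup. This is a minimal sketch: the function name `integration_level` is hypothetical, and treating each upper bound as inclusive is an assumption, since the table only gives approximate ranges (and no range for ULSI).

```python
def integration_level(devices_per_chip: int) -> str:
    """Map a device count to the integration level listed in the table.

    Assumption: upper bounds (100, 1K, 10K) are treated as inclusive.
    """
    if devices_per_chip < 1:
        raise ValueError("device count must be at least 1")
    if devices_per_chip <= 100:
        return "SSI"
    if devices_per_chip <= 1_000:
        return "MSI"
    if devices_per_chip <= 10_000:
        return "LSI"
    return "VLSI"  # the table gives no separate threshold for ULSI


if __name__ == "__main__":
    for count in (50, 500, 5_000, 50_000):
        print(count, integration_level(count))
```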

Advantages of ICs:

- \* Low power consumption
- \* Easy replacement
- \* High speed of operation
- \* Small in size
- \* Reduced cost
- \* High reliability

Moore's law states that the transistor count per chip doubles at a regular interval (commonly quoted as roughly every two years).





Metal oxide semiconductor (MOS) transistor:

Principle: A voltage applied to the gate terminal controls the current in a conducting channel between source and drain.

Ex: In NMOS, the majority carriers are electrons.

A +ve voltage applied on the gate with respect to the substrate enhances the number of electrons in the channel and increases the conductivity of the channel.

For a gate voltage less than the threshold voltage, the channel is cut off, causing a very low drain-to-source current.

Threshold voltage ($V_t$): The voltage at which a MOS device begins to conduct (turn on).

# key representations





NMOS transistor:

1. NMOS enhancement mode transistor:

It consists of a moderately doped p-type silicon substrate into which two heavily doped n+ regions, namely the source and drain, are diffused.



- Between source and drain there is a narrow region of p-type substrate called the channel, which is covered at the top with a SiO2 (oxide) layer.
- Over the SiO2 layer, a polysilicon gate is deposited over the region between drain and source.
- With zero gate voltage no channel exists between source and drain; hence the device is not conducting.

To create a channel, an electric field is established between gate and substrate.

The following conditions are considered:

Condition 1: $V_{ds} = 0$

In this condition, a channel between drain and source is established but no current flows between source and drain, i.e. $I_{ds} = 0$.

Condition 2: $V_{ds} < V_{gs} - V_t$

When $V_{ds}$ is applied between drain and source, current starts flowing through the channel.

\* Current flow results in an IR drop along the channel, so the gate-to-channel voltage varies along its length. This voltage is maximum at the source end.

The effective gate voltage is $V_g = V_{gs} - V_t$, as no current flows when $V_{gs} < V_t$. As long as $(V_{gs} - V_t) > V_{ds}$, the channel remains inverted at the drain end.

In this condition, the device is in the non-saturated region of operation.

If $V_{ds}$ is increased beyond $(V_{gs} - V_t)$, an IR drop equal to $V_{gs} - V_t$ takes place over less than the whole length of the channel. So, at the drain end there is insufficient electric field to give rise to an inversion layer and create a channel. The channel is pinched off.

\* In this condition, diffusion current completes the path from source to drain.

\* The device operates in the saturation region. It behaves as a constant-current source for increases of $V_{ds}$ above $V_{ds} = V_{gs} - V_t$.
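The conditions above amount to a simple classification of the operating region from $V_{gs}$, $V_{ds}$ and $V_t$. The sketch below assumes an ideal n-channel enhancement device with $V_{ds} \ge 0$; the function name `nmos_region` and the choice to count the boundary $V_{ds} = V_{gs} - V_t$ as saturated are illustrative assumptions.

```python
def nmos_region(vgs: float, vds: float, vt: float) -> str:
    """Classify the operating region of an ideal enhancement-mode nMOS device.

    Assumes vds >= 0. The boundary vds == vgs - vt is reported as saturated.
    """
    if vgs < vt:
        return "cut-off"          # no channel, negligible drain current
    if vds < vgs - vt:
        return "non-saturated"    # channel inverted along its whole length
    return "saturated"            # channel pinched off at the drain end


if __name__ == "__main__":
    print(nmos_region(vgs=0.5, vds=1.0, vt=1.0))
    print(nmos_region(vgs=3.0, vds=1.0, vt=1.0))
    print(nmos_region(vgs=3.0, vds=3.0, vt=1.0))
```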

When the transistor is operated in depletion mode, a channel exists even when $V_{gs} = 0$.

- \* Applying a +ve voltage to the gate, current flows in an n-type transistor and conduction takes place.
- \* Similarly, applying a -ve voltage to the gate, current flows in a p-type transistor and conduction takes place.

\* The channel width can be controlled by applying a suitable -ve voltage to the gate.

\* Variation in the gate voltage allows control of the current between source and drain.

\* Applying a -ve voltage between gate and source gives rise to the formation of a p-type channel between source and drain.

\* If the drain is made -ve with respect to the source, current can flow through the channel.

\* Here the current is carried by holes.

The hole mobility ($\mu_p$) is about 2.5 times lower than the electron mobility ($\mu_n$).

Commonly used symbols for NMOS and PMOS:

NMOS enhancement

NMOS depletion

PMOS enhancement

NMOS fabrication process:

Step 1: Wafer preparation:

Consider a single piece (wafer) of silicon, moderately doped p-type to form the substrate. The wafer diameter is 75 to 150 mm and its thickness is about 0.4 mm. The p-type dopant is boron.

Step 2: Oxidation: To protect the surface, a layer of silicon dioxide (SiO2) is grown over the surface. The thickness of the SiO2 is typically 1 μm.

This layer acts as a barrier to dopants during processing.

Step 3: Photoresist:

The surface is now covered with a layer of deposited photoresist.



Step 4: Masking: The photoresist layer is exposed to UV light through a mask.

The mask defines the regions into which diffusion is to take place, together with the transistor channels.

Step 5: Etching: These regions are then etched away so that the wafer surface is exposed in the window defined by the mask.

Step 6: The remaining photoresist is removed and a thin layer of SiO2 is grown over the surface; polysilicon is then deposited on top of this to form the gate structure.

The polysilicon layer consists of heavily doped polysilicon deposited by chemical vapour deposition (CVD).



Step 7: n+ diffusion:

The thin oxide is removed to expose the areas into which n-type impurities are to be diffused to form the source and drain.

\* Diffusion is achieved by heating the wafer to a high temperature and passing a gas containing the desired n-type impurity over the surface.

Step 8: Contact cuts (holes): Thick silicon dioxide (SiO2) is grown, then masked with photoresist and etched to expose specific areas of the polysilicon gate, drain and source where connections are to be made.

Step 9: Metallization:

The metal (Al) is deposited over the surface to a thickness of typically 1 μm. This metal layer is then masked and etched to form the required interconnection pattern.




CMOS fabrication process:

CMOS can be obtained by integrating both NMOS and PMOS devices on the same chip.

CMOS can be fabricated using different processes such as

- i) N-well process
- ii) P-well process
- iii) Twin-tub process

N-well CMOS fabrication: major steps:

- Formation of n-well
- Define NMOS and PMOS areas
- Field and gate oxidations (thinox)
- Form and pattern polysilicon
- p+ diffusion
- n+ diffusion
- Contact cuts
- Deposit and pattern metallization
- Overglass with cuts for bonding pads

Step 1: Consider a p-type substrate.

Step 2: Oxidation: The entire surface is coated with a SiO2 layer.

Step 3: Photoresist: Now, photoresist is coated over the entire surface.

Step 4: Masking: The photoresist is exposed to UV rays through the N-well mask.

Step 5: Etching

Step 6: Removal of photoresist:

Then the entire layer of photoresist is stripped off.

Step 7: Formation of N-well: By using a diffusion process, the N-well is formed.

Step 8: Pattern the polysilicon layer, which is deposited after the thin oxide.

Step 9: p+ diffusion and masking:

P-diffusion regions are diffused to form the terminals of the PMOS.

Step 10: Masking and n+ diffusion:

To form the n-transistor, n-diffusion regions are used.





# 2. P-well CMOS fabrication:

Step 2: Oxidation

The entire surface is coated with a SiO2 layer.

Step 3: Photoresist

Now the entire surface is covered with a photoresist.

Step 4: Masking: The photoresist is exposed to UV rays through the p-well mask.

Step 6: Removal of photoresist:

The photoresist can be removed using hydrofluoric acid. Then the entire layer of photoresist is stripped off.

Step 7: Formation of P-well: By using a diffusion process, the P-well is formed.

Step 8: Pattern the polysilicon layer:

The polysilicon layer is deposited after the thin oxide.

Step 9: p+ diffusion and masking: To form the p-transistor, p-diffusion regions are used.

Step 10: Masking and n+ diffusion:

To form the n-transistor, n-diffusion regions are used.

Step 11: Metallization





Twin-tub process:

- A twin-tub process is a logical extension of the p-well and n-well approaches.
- The process starts with a substrate of high-resistivity n-type material, and then both p-well and n-well regions are created.
- With this process it is possible to preserve the performance of n-transistors without compromising p-transistors, as the twin-tub process allows separate optimization of p- and n-transistors.

An inverter arrangement using twin-tub is as shown in fig.

Fig: Twin-tub inverter structure.

The twin-tub CMOS technology provides the basis for separate optimization of p-type and n-type transistors, making it possible for the threshold voltage, body effect and gain associated with n- and p-devices to be independently optimized.

Comparisons between CMOS and bipolar technologies

## CMOS technology

- \* Low static power dissipation
- \* High input impedance (low drive current)
- \* High noise margin
- \* High packaging density
- \* Low transconductance ($g_m$)
- \* Output current produced is low
- \* Bidirectional capability

## Bipolar technology

- \* High static power dissipation
- \* Low input impedance (high drive current)
- \* Low noise margin
- \* Low packaging density
- \* High transconductance
- \* Output current produced is high
- \* Unidirectional capability

Drain-to-source current ($I_{ds}$) versus drain-to-source voltage ($V_{ds}$) relationships:

MOS transistors are voltage-controlled devices. A voltage on the gate terminal induces a charge in the channel that exists between source and drain. The charge then moves from source to drain under the influence of the electric field generated by the voltage $V_{ds}$ applied between drain and source.

The charge induced depends on the gate-to-source voltage $V_{gs}$, so the current $I_{ds}$ depends on both $V_{gs}$ and $V_{ds}$.

The drain-to-source current $I_{ds}$ is given by

$$I_{ds} = \frac{\text{charge induced in channel } (Q_c)}{\text{electron transit time } (\tau)} = \frac{Q_c}{\tau} \qquad (1)$$

Electron transit time,

$$\tau_{sd} = \frac{\text{length of channel } (L)}{\text{velocity } (v)} = \frac{L}{v} \qquad (2)$$

The velocity is

$$v = \mu E_{ds} \qquad (3)$$

where $\mu$ = mobility of holes (or) electrons and $E_{ds}$ = electric field (drain to source), with

$$E_{ds} = \frac{V_{ds}}{L} \qquad (4)$$

Substituting eq. (4) in (3):

$$v = \frac{\mu V_{ds}}{L} \qquad (5)$$

Substituting eq. (5) in (2):

$$\tau_{sd} = \frac{L^2}{\mu V_{ds}} \qquad (6)$$

The electron and hole mobilities at room temperature are $\mu_n = 650\ \text{cm}^2/\text{V·sec}$ and $\mu_p = 240\ \text{cm}^2/\text{V·sec}$.
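As a rough numerical illustration of the transit-time relation $\tau_{sd} = L^2 / (\mu V_{ds})$, the sketch below evaluates it with the room-temperature mobilities quoted above. The channel length (5 μm) and $V_{ds}$ (3 V) in the usage lines are hypothetical values chosen only for illustration, and the function name `transit_time` is not from the notes.

```python
MU_N = 650.0  # electron mobility, cm^2/(V*s), value quoted in the notes
MU_P = 240.0  # hole mobility, cm^2/(V*s), value quoted in the notes


def transit_time(length_cm: float, vds: float, mobility: float) -> float:
    """Carrier transit time tau_sd = L^2 / (mu * V_ds), in seconds.

    length_cm is the channel length in cm, so that it matches the cm-based mobility.
    """
    if length_cm <= 0 or vds <= 0 or mobility <= 0:
        raise ValueError("length, V_ds and mobility must all be positive")
    return length_cm ** 2 / (mobility * vds)


if __name__ == "__main__":
    channel_length_cm = 5e-4  # hypothetical 5 um channel
    vds = 3.0                 # hypothetical drain-to-source voltage
    print("electron transit time (s):", transit_time(channel_length_cm, vds, MU_N))
    print("hole transit time (s):    ", transit_time(channel_length_cm, vds, MU_P))
```

Because holes are slower, the same geometry gives a longer transit time for a p-channel device.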

Non-saturated region:

The average voltage along the channel is taken as $\frac{V_{ds}}{2}$.

\* Effective gate voltage $V_g = V_{gs} - V_t$

where $V_t$ = threshold voltage needed to invert the charge under the gate and to establish the channel.

The charge per unit area = $E_g \varepsilon_0 \varepsilon_{ins}$

where $\varepsilon_0$ → permittivity of free space ($8.854 \times 10^{-14}$ F/cm)

$\varepsilon_{ins}$ → relative permittivity of the insulation between gate and channel (≈ 4 for SiO2)

$E_g$ → average electric field from gate to channel

For a channel of width $W$ and length $L$, the induced charge is

$$Q_c = E_g \varepsilon_0 \varepsilon_{ins} W L \qquad (1)$$

Consider

$$E_g = \frac{(V_{gs} - V_t) - \frac{V_{ds}}{2}}{D} \qquad (2)$$

where $D$ → oxide thickness.

Substituting eq. (2) in (1), and using $I_{ds} = Q_c / \tau_{sd}$ with $\tau_{sd} = L^2 / (\mu V_{ds})$:

$$I_{ds} = \frac{\varepsilon_0 \varepsilon_{ins} W L}{D} \left[ (V_{gs} - V_t) - \frac{V_{ds}}{2} \right] \frac{\mu V_{ds}}{L^2}$$

$$= \frac{\varepsilon_0 \varepsilon_{ins} \mu}{D} \cdot \frac{W}{L} \left[ (V_{gs} - V_t) - \frac{V_{ds}}{2} \right] V_{ds}$$

$$= \frac{K W}{L} \left[ (V_{gs} - V_t) - \frac{V_{ds}}{2} \right] V_{ds} \qquad \left[ \because K = \frac{\varepsilon_0 \varepsilon_{ins} \mu}{D} \right]$$

$$I_{ds} = \beta \left[ (V_{gs} - V_t) - \frac{V_{ds}}{2} \right] V_{ds} \qquad \left[ \because \beta = \frac{K W}{L} \right]$$

The gate-to-channel capacitance is $C_g = \frac{\varepsilon_0 \varepsilon_{ins} W L}{D}$, so $K = \frac{C_g \mu}{W L}$ and

$$I_{ds} = \frac{C_g \mu}{L^2} \left[ (V_{gs} - V_t) - \frac{V_{ds}}{2} \right] V_{ds} \qquad (3)$$

Consider

$$C_g = C_0 W L \qquad (4)$$

Substituting eq. (4) in (3):

$$I_{ds} = \frac{C_0 \mu W}{L} \left[ (V_{gs} - V_t) - \frac{V_{ds}}{2} \right] V_{ds} \qquad (5)$$

Saturated region:

In this region, consider $V_{ds} = V_{gs} - V_t$.

The non-saturated expression $I_{ds} = \frac{K W}{L} \left[ (V_{gs} - V_t) - \frac{V_{ds}}{2} \right] V_{ds}$ becomes

$$I_{ds} = \frac{K W}{L} \left[ (V_{gs} - V_t) - \frac{V_{gs} - V_t}{2} \right] (V_{gs} - V_t) \qquad \left[ \because V_{ds} = V_{gs} - V_t \right]$$

$$= \beta \left[ (V_{gs} - V_t)^2 - \frac{(V_{gs} - V_t)^2}{2} \right] \qquad \left[ \because \beta = \frac{K W}{L} \right]$$

$$= \frac{\beta}{2} (V_{gs} - V_t)^2$$

Now, the gate-to-channel capacitance is $C_g = \frac{\varepsilon_0 \varepsilon_{ins} W L}{D}$, so the equation becomes

$$I_{ds} = \frac{C_g \mu}{2 L^2} (V_{gs} - V_t)^2$$

Consider $C_g = C_0 W L$:

$$I_{ds} = \frac{C_0 \mu W}{2 L} (V_{gs} - V_t)^2$$
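The cut-off, non-saturated and saturated expressions can be combined into one function. This is a sketch of the ideal first-order model only, assuming an n-channel device with $V_{ds} \ge 0$; the name `drain_current` and the numeric values in the usage lines are illustrative assumptions.

```python
def drain_current(vgs: float, vds: float, vt: float, beta: float) -> float:
    """First-order drain current I_ds for an ideal nMOS device (vds >= 0 assumed).

    cut-off:        I_ds = 0                                   (V_gs <= V_t)
    non-saturated:  I_ds = beta * ((V_gs - V_t) - V_ds/2) * V_ds
    saturated:      I_ds = (beta / 2) * (V_gs - V_t)^2
    """
    effective_gate = vgs - vt
    if effective_gate <= 0:
        return 0.0
    if vds < effective_gate:
        return beta * (effective_gate - vds / 2) * vds
    return 0.5 * beta * effective_gate ** 2


if __name__ == "__main__":
    beta = 1e-4  # hypothetical gain factor, A/V^2
    for vds in (0.5, 1.0, 2.0, 4.0):
        print(vds, drain_current(vgs=3.0, vds=vds, vt=1.0, beta=beta))
```

The two branches meet at $V_{ds} = V_{gs} - V_t$, so the modelled current is continuous across the pinch-off point.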

The current in the 3 regions:

Fig: a) Depletion mode device

b) Enhancement mode device

MOS transistor threshold voltage ($V_t$):

Switching an enhancement mode MOS transistor from the OFF to the ON state consists in applying sufficient gate voltage to neutralize the charges stored in the gate structure and enable the underlying silicon to undergo an inversion due to the electric field from the gate.

Switching a depletion mode MOS transistor from the ON to the OFF state consists in applying enough voltage to the gate to add to the stored charge and invert the 'n' implant region to p.

The threshold voltage ($V_t$) is given by

$$V_t = \phi_{ms} + \frac{Q_B - Q_{SS}}{C_0} + 2 \phi_{fN}$$

where

$Q_B$ → charge per unit area in the depletion layer

$Q_{SS}$ → charge density at the Si:SiO2 interface

$C_0$ → capacitance per unit gate area

$\phi_{ms}$ → work function difference between gate and silicon

$\phi_{fN}$ → Fermi level potential between inverted surface and bulk silicon

To evaluate $V_t$, each term is determined as follows:

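$$C_0 = \frac{\varepsilon_{ins} \varepsilon_0}{D} \qquad (1)$$

$$Q_B = \sqrt{2 \varepsilon_0 \varepsilon_{Si} \, q N (2 \phi_{fN} + V_{SB})} \qquad (2)$$

$$\phi_{fN} = \frac{kT}{q} \ln \left( \frac{N}{n_i} \right) \text{ volts} \qquad (3)$$

$Q_{SS} = (1.5 \text{ to } 8) \times 10^{-8}$ coulomb/m², depending on crystal orientation

where,

$V_{SB}$ → substrate bias voltage (-ve w.r.t. source for NMOS, +ve for PMOS)

$q$ → $1.6 \times 10^{-19}$ coulomb

$N$ → impurity concentration in substrate ($N_A$ (or) $N_D$)

$\varepsilon_{Si}$ → relative permittivity of silicon (≈ 11.7)

$n_i$ → intrinsic electron concentration ($1.6 \times 10^{10}$ /cm³ at 300 K)

$k$ → Boltzmann's constant ($1.4 \times 10^{-23}$ joule/K)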
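The Fermi potential term $\phi_{fN} = \frac{kT}{q} \ln(N / n_i)$ is easy to evaluate numerically. The sketch below uses the constants listed above, except that Boltzmann's constant is taken as $1.38 \times 10^{-23}$ J/K (the notes round it to 1.4); the doping level in the usage line is a hypothetical value, and `fermi_potential` is an illustrative name.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K (the notes quote the rounded value 1.4e-23)
Q_ELECTRON = 1.6e-19    # C
N_INTRINSIC = 1.6e10    # intrinsic concentration per cm^3 at 300 K, as in the notes


def fermi_potential(n_substrate: float, temperature: float = 300.0,
                    n_i: float = N_INTRINSIC) -> float:
    """Fermi potential phi_fN = (kT/q) * ln(N / n_i), in volts.

    n_substrate and n_i must use the same units (per cm^3 here).
    """
    if n_substrate <= 0 or n_i <= 0:
        raise ValueError("concentrations must be positive")
    thermal_voltage = K_BOLTZMANN * temperature / Q_ELECTRON
    return thermal_voltage * math.log(n_substrate / n_i)


if __name__ == "__main__":
    # Hypothetical substrate doping of 1e15 per cm^3
    print("phi_fN (V):", fermi_potential(1e15))
```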

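An alternative expression for $V_t$ in terms of $V_{SB}$ is as follows:

$$V_t = V_t(0) + \frac{D}{\varepsilon_{ins} \varepsilon_0} \sqrt{2 \varepsilon_0 \varepsilon_{Si} \, q N} \, (V_{SB})^{1/2}$$

where $V_t(0)$ → threshold voltage for $V_{SB} = 0$

$D$ → oxide thickness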

Body effect:

Increasing $V_{SB}$ causes the channel to be depleted of charge carriers, and thus the threshold voltage is raised.

The variation of threshold voltage due to the source-to-substrate voltage is referred to as the "body effect".

The relationship between threshold voltage and substrate bias voltage is given by

$$V_t = V_t(0) + \gamma \, (V_{SB})^{1/2}$$

where $\gamma$ is a constant that depends on substrate doping, so that the more lightly doped the substrate, the smaller the body effect.
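The body-effect relation $V_t = V_t(0) + \gamma \sqrt{V_{SB}}$ can be explored with a few lines of code. In this sketch `vsb` is taken as the magnitude of the source-to-substrate bias, and the values of $V_t(0)$ and $\gamma$ in the usage lines are hypothetical; the function name is illustrative.

```python
import math


def threshold_with_body_effect(vt0: float, gamma: float, vsb: float) -> float:
    """Threshold voltage V_t = V_t(0) + gamma * sqrt(V_SB).

    vsb is the magnitude of the source-to-substrate bias (assumed >= 0).
    """
    if vsb < 0:
        raise ValueError("vsb is expected as a non-negative magnitude")
    return vt0 + gamma * math.sqrt(vsb)


if __name__ == "__main__":
    vt0, gamma = 1.0, 0.5  # hypothetical device parameters
    for vsb in (0.0, 1.0, 4.0):
        print(vsb, threshold_with_body_effect(vt0, gamma, vsb))
```

A smaller `gamma` (more lightly doped substrate) gives a smaller shift in threshold for the same bias.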

MOS transistor transconductance ($g_m$) and output conductance ($g_{ds}$):

Transconductance ($g_m$):

It is defined as the ratio of the change in output current to the change in input voltage, with the output voltage held constant:

$$g_m = \left. \frac{\delta I_{ds}}{\delta V_{gs}} \right|_{V_{ds} = \text{constant}}$$

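We know that

$$I_{ds} = \frac{Q_c}{\tau_{sd}} \Rightarrow \delta I_{ds} = \frac{\delta Q_c}{\tau_{sd}} \qquad (1)$$

But

$$\frac{1}{\tau_{sd}} = \frac{\mu V_{ds}}{L^2} \qquad (2)$$

Substituting eq. (2) in (1):

$$\delta I_{ds} = \frac{\delta Q_c \, \mu V_{ds}}{L^2} \qquad (3)$$

We know that the change in charge follows from $Q = CV$:

$$\delta Q_c = C_g \, \delta V_{gs} \qquad (4)$$

Substituting eq. (4) in (3):

$$\delta I_{ds} = \frac{C_g \, \delta V_{gs} \, \mu V_{ds}}{L^2} \Rightarrow g_m = \frac{C_g \mu V_{ds}}{L^2} \qquad (5)$$

Under the saturation region, $V_{ds} = V_{gs} - V_t$:

$$g_m = \frac{C_g \mu (V_{gs} - V_t)}{L^2} \qquad (6)$$

We know that $C_g = \frac{\varepsilon_0 \varepsilon_{ins} W L}{D}$. Substituting this in eq. (6):

$$g_m = \frac{\varepsilon_0 \varepsilon_{ins} \mu}{D} \cdot \frac{W}{L} (V_{gs} - V_t) = \frac{K W}{L} (V_{gs} - V_t) = \beta (V_{gs} - V_t)$$

Output conductance ($g_{ds}$):

$$g_{ds} = \frac{\delta I_{ds}}{\delta V_{ds}} = \lambda \cdot I_{ds} \propto \left( \frac{1}{L} \right)^2$$

From the above equation, $\lambda \propto 1/L$ and $I_{ds} \propto 1/L$, which gives the strong dependence on channel length.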
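The saturation-region result $g_m = \beta (V_{gs} - V_t)$ can be cross-checked against a numerical derivative of $I_{ds} = \frac{\beta}{2} (V_{gs} - V_t)^2$. This is a sketch under the ideal first-order model; all function names and the numeric values are illustrative assumptions.

```python
def ids_saturation(beta: float, vgs: float, vt: float) -> float:
    """Saturation current I_ds = (beta / 2) * (V_gs - V_t)^2, zero below threshold."""
    overdrive = max(vgs - vt, 0.0)
    return 0.5 * beta * overdrive ** 2


def gm_saturation(beta: float, vgs: float, vt: float) -> float:
    """Closed-form transconductance g_m = beta * (V_gs - V_t) in saturation."""
    return beta * max(vgs - vt, 0.0)


def gm_numeric(beta: float, vgs: float, vt: float, step: float = 1e-6) -> float:
    """Central-difference estimate of dI_ds/dV_gs at constant V_ds (saturation)."""
    upper = ids_saturation(beta, vgs + step, vt)
    lower = ids_saturation(beta, vgs - step, vt)
    return (upper - lower) / (2 * step)


if __name__ == "__main__":
    beta, vgs, vt = 1e-4, 3.0, 1.0  # hypothetical values
    print("closed form:", gm_saturation(beta, vgs, vt))
    print("numerical:  ", gm_numeric(beta, vgs, vt))
```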

MOS transistor figure of merit ($\omega_0$):

The figure of merit is a quantity used to characterize the performance of a device relative to other devices of the same type.

It is defined as the ratio of transconductance to the gate-to-channel capacitance:

$$\omega_0 = \frac{g_m}{C_g} = \frac{\mu (V_{gs} - V_t)}{L^2} = \frac{1}{\tau_{sd}} \qquad \left[ \because g_m = \frac{C_g \mu (V_{gs} - V_t)}{L^2} \right]$$

From the above equation, switching speed depends on

- i) carrier mobility
- ii) inversely on the square of the channel length
- iii) gate voltage

For high-speed switching circuits, $g_m$ should be as high as possible.
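The dependence of the figure of merit on channel length can be seen by evaluating $\omega_0 = \mu (V_{gs} - V_t) / L^2$ for two lengths. This is a sketch with hypothetical voltages and channel lengths; `figure_of_merit` is an illustrative name, and lengths are given in cm to match the cm-based mobility.

```python
def figure_of_merit(mobility: float, vgs: float, vt: float, length_cm: float) -> float:
    """Figure of merit omega_0 = mu * (V_gs - V_t) / L^2, in 1/s."""
    if length_cm <= 0:
        raise ValueError("channel length must be positive")
    return mobility * (vgs - vt) / length_cm ** 2


if __name__ == "__main__":
    mu_n = 650.0  # cm^2/(V*s), electron mobility quoted in the notes
    long_channel = figure_of_merit(mu_n, vgs=3.0, vt=1.0, length_cm=5e-4)
    short_channel = figure_of_merit(mu_n, vgs=3.0, vt=1.0, length_cm=2.5e-4)
    # Halving the channel length should raise omega_0 by a factor of four.
    print("ratio short/long:", short_channel / long_channel)
```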

The transistor can be used as an on-off switch as shown in fig.

Fig: MOS transistor as a switch

\* The switch is turned off by setting $V_{gs} = 0$; the channel disappears and only a small leakage current flows at the drain end.

\* With $V_{gs} = V_{DD}$, the switch is turned on and current flows through it.

If the gate and drain of a pass transistor are both HIGH, the source will rise to the lower of the two potentials (the drain voltage and one threshold below the gate voltage).

\* If the gate and drain are both at $V_{DD}$, the source can only rise to one threshold voltage below the gate, i.e. $V_s = V_{DD} - V_{th}$.
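The "lower of the two potentials" rule for an nMOS pass transistor can be written as a one-line function. This is a sketch of the ideal behaviour only (body effect and leakage are ignored); the name `pass_transistor_output` and the voltages used are illustrative assumptions.

```python
def pass_transistor_output(v_gate: float, v_drain: float, vth: float) -> float:
    """Highest voltage the source of an ideal nMOS pass transistor can reach.

    The source follows the drain, but cannot rise above one threshold
    below the gate voltage. The result is clamped at 0 V.
    """
    return max(0.0, min(v_drain, v_gate - vth))


if __name__ == "__main__":
    vdd, vth = 5.0, 1.0  # hypothetical supply and threshold
    print(pass_transistor_output(v_gate=vdd, v_drain=vdd, vth=vth))  # gate and drain at VDD
    print(pass_transistor_output(v_gate=vdd, v_drain=2.0, vth=vth))  # drain below VDD - Vth
```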



A number of transistors can be used as switches in series in switching logic arrays. An AND array is as shown in fig.



### NMOS inverter:

The basic inverter circuit requires a transistor with its source connected to ground and a load resistor of some sort connected from the drain to the positive supply rail $V_{DD}$. The output is taken from the drain, and the input is applied between gate and ground.

The NMOS inverter circuit is as shown in fig.

In this configuration, with no current drawn from the output, the currents $I_{ds}$ for both transistors are equal.

For the depletion mode transistor, the gate is connected to the source, so it is always on and only the characteristic curve $V_{gs} = 0$ is relevant.

In this configuration the depletion mode device is called the pull-up (p.u.) and the enhancement mode device is called the pull-down (p.d.) transistor.

The inverter transfer characteristic is obtained by superimposing the $V_{gs} = 0$ depletion mode characteristic curve on the family of curves for the enhancement mode device, with the condition that the maximum voltage across the enhancement mode device corresponds to the minimum voltage across the depletion mode device.

The transfer characteristics of the NMOS inverter are as shown in fig.

Now the input voltage $V_{in}$ is the gate-to-source voltage $V_{gs}$ of the pull-down transistor. As this exceeds the threshold voltage of the pull-down transistor, current begins to flow. This causes the output voltage $V_{out}$ to decrease, and a further increase in $V_{in}$ causes the pull-down transistor to come out of saturation and become resistive.


Scanned with CamScanner




Fig: NMOS inverter transfer characteristics

The point on the transfer characteristic at which $V_{out} = V_{in}$ is denoted as $V_{inv}$.

Determination of pull-up to pull-down ratio ($Z_{pu}/Z_{pd}$) for an NMOS inverter:

Consider two inverters arranged in cascade.

The two transistors of the inverter operate in the saturation region.

For the depletion mode transistor $V_{gs} = 0$, so maximum drain current flows, i.e. the device is on.

For equal margins around the inverter threshold, we set $V_{inv} = 0.5 \, V_{DD}$; at this point both transistors are in saturation.

We know that, in the saturation region,

$$I_{ds} = \frac{K W}{L} \cdot \frac{(V_{gs} - V_t)^2}{2} \qquad (1)$$

In depletion mode (pull-up),

$$I_{ds} = K \cdot \frac{W_{pu}}{L_{pu}} \cdot \frac{(V_{gs} - V_{td})^2}{2} = K \cdot \frac{W_{pu}}{L_{pu}} \cdot \frac{(-V_{td})^2}{2} \qquad (2) \quad [\because V_{gs} = 0\ \text{V}]$$

For enhancement mode (pull-down),

$$I_{ds} = K \cdot \frac{W_{pd}}{L_{pd}} \cdot \frac{(V_{inv} - V_t)^2}{2} \qquad (3) \quad [\because V_{gs} = V_{inv}]$$

Equating eq. (2) and (3):

$$K \cdot \frac{W_{pu}}{L_{pu}} \cdot \frac{(-V_{td})^2}{2} = K \cdot \frac{W_{pd}}{L_{pd}} \cdot \frac{(V_{inv} - V_t)^2}{2}$$

The aspect ratio ($Z$) is defined as the ratio of length to width, $Z = L/W$:

$$\frac{1}{Z_{pu}} (-V_{td})^2 = \frac{1}{Z_{pd}} (V_{inv} - V_t)^2$$

$$\Rightarrow (V_{inv} - V_t)^2 = \frac{Z_{pd}}{Z_{pu}} (-V_{td})^2$$

$$V_{inv} - V_t = \frac{-V_{td}}{\sqrt{Z_{pu}/Z_{pd}}} \qquad (4)$$

Substituting typical values $V_t = 0.2 \, V_{DD}$, $V_{td} = -0.6 \, V_{DD}$ and $V_{inv} = 0.5 \, V_{DD}$:

$$0.5 V_{DD} - 0.2 V_{DD} = \frac{0.6 V_{DD}}{\sqrt{Z_{pu}/Z_{pd}}}$$

$$\sqrt{\frac{Z_{pu}}{Z_{pd}}} = \frac{0.6 V_{DD}}{0.3 V_{DD}} = 2$$

$$\frac{Z_{pu}}{Z_{pd}} = \left( \frac{2}{1} \right)^2 = \frac{4}{1}$$

$$= 4:1$$
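The ratio calculation can be reproduced numerically from $V_{inv} - V_t = -V_{td} / \sqrt{Z_{pu}/Z_{pd}}$. The sketch below normalises all voltages to $V_{DD}$ and uses the typical values from the derivation; the function name `pullup_pulldown_ratio` is an illustrative choice.

```python
def pullup_pulldown_ratio(v_inv: float, vt: float, vtd: float) -> float:
    """Z_pu / Z_pd from V_inv - V_t = -V_td / sqrt(Z_pu / Z_pd).

    vtd is the (negative) threshold of the depletion mode pull-up.
    All voltages must be expressed in the same units.
    """
    if v_inv <= vt:
        raise ValueError("V_inv must exceed the pull-down threshold V_t")
    return (-vtd / (v_inv - vt)) ** 2


if __name__ == "__main__":
    # Typical values from the text, normalised to V_DD = 1
    print("Z_pu/Z_pd:", pullup_pulldown_ratio(v_inv=0.5, vt=0.2, vtd=-0.6))
```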

Determination of $Z_{pu}/Z_{pd}$ ratio for an NMOS inverter driven through one (or) more pass transistors:

Sometimes the input to an inverter comes from the output of another inverter after passing through one (or) more NMOS transistors that are used as pass transistors. The arrangement is as shown in fig.

When A is at 0 volts, B is at $V_{DD}$, but the voltage into inverter 2 at point C is reduced from $V_{DD}$ by the threshold voltage of the series pass transistor. There is a reduction in voltage by $V_{tp}$, where $V_{tp}$ is the threshold voltage of the pass transistor.

Hence the input voltage to inverter 2 is

$$V_{in2} = V_{DD} - V_{tp}$$

When the input to inverter 1 is $V_{DD}$, its pull-down transistor is conducting with a low voltage across it. At the same time, the pull-up transistor is in saturation and is represented as a current source.

The equivalent circuits of inverter 1 and inverter 2 are as shown in fig.

a) Inverter 1 with input = $V_{DD}$

b) Inverter 2 with input = $V_{DD} - V_{tp}$

For the pull-down transistor of inverter 1,

$$R_1 = \frac{V_{ds1}}{I_{ds}} = \frac{1}{K} \cdot \frac{L_{pd1}}{W_{pd1}} \cdot \frac{1}{V_{DD} - V_t - \frac{V_{ds1}}{2}}$$

$V_{ds1}$ is small and hence it can be neglected:

$$R_1 = \frac{1}{K} Z_{pd1} \frac{1}{(V_{DD} - V_t)} \qquad (1)$$

For the pull-up transistor (depletion mode, in saturation),

$$I_1 = I_{ds} = K \cdot \frac{W_{pu1}}{L_{pu1}} \cdot \frac{(V_{gs} - V_{td})^2}{2}$$

Consider $V_{gs} = 0$:

$$I_1 = K \cdot \frac{1}{Z_{pu1}} \cdot \frac{(-V_{td})^2}{2}$$

The output of inverter 1 is $V_{out1} = I_1 R_1$:

$$V_{out1} = \frac{Z_{pd1}}{Z_{pu1}} \cdot \frac{1}{V_{DD} - V_t} \cdot \frac{(-V_{td})^2}{2} \qquad (2)$$

Consider inverter 2 with input $V_{DD} - V_{tp}$:

$$R_2 = \frac{1}{K} Z_{pd2} \frac{1}{(V_{DD} - V_{tp}) - V_t} \qquad (3)$$

$$I_2 = K \cdot \frac{1}{Z_{pu2}} \cdot \frac{(-V_{td})^2}{2} \qquad (4)$$

The output of inverter 2 is $V_{out2} = I_2 R_2$:

$$V_{out2} = \frac{Z_{pd2}}{Z_{pu2}} \cdot \frac{1}{(V_{DD} - V_{tp}) - V_t} \cdot \frac{(-V_{td})^2}{2} \qquad (5)$$

The output of inverter 2 must be the same as that of inverter 1:

$$V_{out1} = V_{out2}$$

$$I_1 R_1 = I_2 R_2$$

$$\frac{Z_{pd1}}{Z_{pu1}} \cdot \frac{1}{V_{DD} - V_t} \cdot \frac{(-V_{td})^2}{2} = \frac{Z_{pd2}}{Z_{pu2}} \cdot \frac{1}{(V_{DD} - V_{tp}) - V_t} \cdot \frac{(-V_{td})^2}{2} \qquad (6)$$

$$\frac{Z_{pu2}}{Z_{pd2}} = \frac{Z_{pu1}}{Z_{pd1}} \cdot \frac{V_{DD} - V_t}{(V_{DD} - V_{tp}) - V_t}$$

Substituting $V_t = 0.2 \, V_{DD}$ and $V_{tp} = 0.3 \, V_{DD}$ in eq. (6):

$$\frac{Z_{pu2}}{Z_{pd2}} = \frac{Z_{pu1}}{Z_{pd1}} \cdot \frac{V_{DD} - 0.2 V_{DD}}{(V_{DD} - 0.3 V_{DD}) - 0.2 V_{DD}}$$

$$= \frac{Z_{pu1}}{Z_{pd1}} \cdot \left[ \frac{0.8 V_{DD}}{0.5 V_{DD}} \right] = \frac{Z_{pu1}}{Z_{pd1}} \cdot \left[ \frac{8}{5} \right] \approx 2 \cdot \frac{Z_{pu1}}{Z_{pd1}}$$

Therefore, with $Z_{pu1}/Z_{pd1} = 4/1$, an inverter driven through one (or) more pass transistors should have $Z_{pu}/Z_{pd} \ge 8/1$.
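The scaling of the ratio for an inverter driven through a pass transistor can be checked numerically using $\frac{Z_{pu2}}{Z_{pd2}} = \frac{Z_{pu1}}{Z_{pd1}} \cdot \frac{V_{DD} - V_t}{(V_{DD} - V_{tp}) - V_t}$. This is a sketch with voltages normalised to $V_{DD}$; the function name is illustrative. Note that the exact factor is 8/5, which the text rounds up to about 2 to arrive at the 8/1 guideline.

```python
def ratio_through_pass_transistor(ratio_direct: float, vdd: float,
                                  vt: float, vtp: float) -> float:
    """Required Z_pu/Z_pd when the inverter input has passed through a pass transistor.

    ratio_direct is Z_pu/Z_pd for an inverter driven directly by another inverter.
    vtp is the threshold voltage of the series pass transistor.
    """
    reduced_drive = (vdd - vtp) - vt
    if reduced_drive <= 0:
        raise ValueError("input level after the pass transistor is below threshold")
    return ratio_direct * (vdd - vt) / reduced_drive


if __name__ == "__main__":
    exact = ratio_through_pass_transistor(4.0, vdd=1.0, vt=0.2, vtp=0.3)
    print("exact scaled ratio:", exact)  # 4 * 8/5; rounded up to 8/1 in practice
```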


## Alternative forms of pull-up:

#### 1. Load resistance ($R_L$):

If logic '1' is given at $V_{in}$, a current flows from $V_{DD}$ to $V_{SS}$.

This arrangement is not often used because of the large space requirements of resistors produced in a silicon substrate.



Fig:- Resistor Pull-up

#### 2. NMOS depletion mode transistor pull-up:

- a) Dissipation is high since rail-to-rail current flows when $V_{in}$ = logical 1.
- b) Switching of the output from 1 to 0 begins when $V_{in}$ exceeds $V_t$ of the pull-down device.
- c) The pull-up device is non-saturated initially; this presents a lower resistance through which to charge capacitive loads.



a) circuit



b) Transfer characteristics

#### 3. NMOS enhancement mode pull-up:

- \* Dissipation is high since current flows when $V_{in}$ = logical 1.
- \* $V_{out}$ can never reach $V_{DD}$ (logical 1) if $V_{GG} = V_{DD}$, as is normally the case.
- \* $V_{GG}$ may be derived from a switching source, for example one phase of a clock, so that dissipation can be greatly reduced. If $V_{GG}$ is higher than $V_{DD}$, an extra supply rail is required.





b) transfer characteristics

#### 4. Complementary transistor pull-up:

- No current flows for either logical 1 (or) logical 0 inputs.
- For devices of similar dimensions, the n-channel device is faster than the p-channel device.

a) circuit



b) Transfer characteristics

The CMOS inverter can be designed by using PMOS and NMOS transistors.

The operation of the circuit is as follows:

When the input is at logic 0 (0 V), the gate of the p-channel transistor is at $V_{DD}$ below its source potential, i.e. $V_{gs} = -V_{DD}$. This turns on the p-channel transistor, and the load capacitor is charged up to $V_{DD}$. No current flows through the n-channel transistor since its $V_{gs} = 0$.

Fig: CMOS inverter

If the input voltage is raised to the threshold voltage of the n-channel transistor and then to $V_{DD}$, the n-channel transistor conducts and the p-channel transistor turns off, discharging the load capacitance to ground potential.

The transfer characteristics are as shown in fig. below.



The current/voltage relationships for the MOS transistor:

$$I_{ds} = \frac{\beta}{2} (V_{gs} - V_t)^2 \qquad \left[ \beta = \frac{K W}{L} \right]$$

'β' is applicable to both NMOS and PMOS transistors:

$$\beta_n = \frac{\varepsilon_0 \varepsilon_{ins} \mu_n}{D} \cdot \frac{W_n}{L_n}, \qquad \beta_p = \frac{\varepsilon_0 \varepsilon_{ins} \mu_p}{D} \cdot \frac{W_p}{L_p}$$

where $W_p$, $W_n$, $L_p$, $L_n$ are the p- and n-transistor dimensions, and $\mu_p$, $\mu_n$ are the hole and electron mobilities.



Fig: CMOS inverter current versus $V_{in}$

It has five distinct regions of operation.

#### Region 1:

$V_{in}$ = logic 0 = 0 V

PMOS is ON and NMOS is OFF. Hence no current flows through the inverter circuit and the output is directly connected to $V_{DD}$.

#### Region 5:

$V_{in}$ = logic 1 = $V_{DD}$

The n-transistor is ON and the p-transistor is OFF. In this condition again no current flows through the circuit.

#### Region 2:

$V_{in} > V_{tn}$ (the input voltage is increased above the threshold voltage)

The n-transistor conducts and has a very large voltage difference between drain and source, so it is in saturation.

The p-transistor also conducts; with only a small voltage difference between drain and source, it operates in the unsaturated resistive region.

#### Region 4:

Conditions are the same as in region 2 but with the roles of the p- and n-transistors reversed. That is, the p-transistor has a large voltage across it while the n-transistor has a small voltage across it. The drain current in both regions 2 and 4 is small.

#### Region 3:

Large current flows in region 3. In this region, both devices are in saturation.

Since the two transistors are in series, the current through them is the same. We can write

$$I_{ds} = K \frac{W}{L} \frac{\left( V_{gs} - V_{t} \right)^{2}}{2} \qquad \text{(saturation)}$$

$$I_{ds} = K \frac{W}{L} \left[ (V_{gs} - V_{t}) V_{ds} - \frac{V_{ds}^{2}}{2} \right] \qquad \text{(resistive)}$$

We know that, in saturation, $I_{ds} = \frac{K W}{2 L} \left( V_{gs} - V_{t} \right)^{2}$.

For the p-transistor, $V_{gs} = V_{in} - V_{DD}$ and $V_{t} = V_{tp}$, so

$$I_{dsp} = K \frac{W_p}{L_p} \frac{\left( V_{gs} - V_{t} \right)^{2}}{2} = K \frac{W_p}{L_p} \frac{\left( V_{in} - V_{DD} - V_{tp} \right)^{2}}{2}$$

$$I_{dsp} = \frac{\beta_p}{2} \left( V_{in} - V_{DD} - V_{tp} \right)^{2} \qquad (1)$$

For the n-transistor, $V_{gs} = V_{in}$ and $V_{t} = V_{tn}$, so

$$I_{dsn} = K \frac{W_n}{L_n} \frac{\left( V_{in} - V_{tn} \right)^{2}}{2}$$

$$I_{dsn} = \frac{\beta_n}{2} \left( V_{in} - V_{tn} \right)^{2} \qquad (2)$$

Equating the magnitudes of (1) and (2), we get

$$V_{in} = \frac{V_{DD} + V_{tp} + V_{tn} \left( \frac{\beta_n}{\beta_p} \right)^{1/2}}{1 + \left( \frac{\beta_n}{\beta_p} \right)^{1/2}} \qquad (3)$$
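As a quick numerical check of equation (3), the sketch below evaluates the region 3 input voltage for two cases. The supply and threshold values (5 V, ±1 V) and the function name `switching_threshold` are illustrative assumptions, not values taken from these notes.

```python
import math

def switching_threshold(vdd, vtn, vtp, beta_n, beta_p):
    """Evaluate equation (3): input voltage at which both devices saturate.

    vtp is the p-device threshold and is negative for an enhancement device.
    """
    ratio = math.sqrt(beta_n / beta_p)
    return (vdd + vtp + vtn * ratio) / (1.0 + ratio)

# Symmetric case: Vtn = -Vtp and beta_n = beta_p (assumed example values)
v_sym = switching_threshold(5.0, 1.0, -1.0, 1.0, 1.0)

# Skewed case: n-device 2.5 times stronger than the p-device
v_skew = switching_threshold(5.0, 1.0, -1.0, 2.5, 1.0)

print(f"symmetric: {v_sym:.3f} V, skewed: {v_skew:.3f} V")
```

In the symmetric case the result is half the supply voltage; a stronger n-device moves the switching point below $0.5\,V_{DD}$.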

In region 3, both transistors are in saturation and act as current sources. The equivalent circuit in this region is shown in the figure.



Region 3 is very unstable, and the changeover from one logic level to the other is rapid.

If $V_{tn} = -V_{tp}$ and $\beta_n = \beta_p$, equation (3) reduces to $V_{in} = 0.5\,V_{DD}$. At this point the two $\beta$ factors are equal.

For $\beta_n = \beta_p$, the device geometries should satisfy $\frac{\mu_n W_n}{L_n} = \frac{\mu_p W_p}{L_p}$.

The mobilities of electrons and holes are unequal.

It is therefore necessary that the width to length ratio (W/L) of the p-device be two to three times that of the n-device, i.e.

$$\frac{W_p}{L_p} = 2.5\,\frac{W_n}{L_n}$$

By keeping minimum size geometries for both p- and n-devices, the effect of the $\beta$ ratio is minimized.

The transfer characteristics are shown in the figure.



#### Bicmos Inverter:-

A simple BiCMOS inverter is shown in the figure below.

The inverter circuit consists of two bipolar transistors, $T_1$ and $T_2$, and two MOS devices, $T_3$ and $T_4$, both of which are enhancement mode.



Operation:

\* $V_{in}$ = 0 V, i.e. logic 0: $T_3$ is OFF, which keeps $T_1$ non-conducting.

$T_4$ is ON and supplies base current to $T_2$, which conducts and acts as a current source to charge the load $C_L$ toward +5 V ($V_{DD}$). The output $V_{out}$ goes to +5 V less the base-to-emitter drop $V_{BE}$ of $T_2$.

\* $V_{in}$ = logic 1 (+5 V): $T_4$ is OFF, so that $T_2$ is non-conducting.

$T_3$ is ON and supplies current to the base of $T_1$, which conducts and acts as a current sink, discharging the load $C_L$ through it toward 0 V (GND). $V_{out}$ falls to 0 V plus the saturation voltage $V_{CE(sat)}$ between the collector and emitter of $T_1$.

- Charging and discharging of the load $C_L$ is very fast because transistors $T_1$ and $T_2$ present low impedances when turned on into saturation.
- $V_{CE(sat)}$ is quite small and $V_{BE}$ is approximately 0.7 V.
- The inverter offers a low output impedance and a high input impedance. It occupies a relatively small area but has a high current drive capability.
- There is a constant DC path between the rails through $T_3$ and $T_1$, which allows a significant static current to flow whenever $V_{in}$ = logic 1.

There is another problem: there is no discharge path for current from the base of either npn transistor when it is turned off. This affects the speed of the circuit.

An improved inverter circuit, in which the static current path is eliminated, is shown in the figure.



Fig: - BiCMOS inverter with no static current flow

A further improved arrangement uses resistors, as shown in the figure.



The resistors provide an improved swing of output voltage when either bipolar transistor is off.

They also provide discharge paths for the base currents during turn-off.

An improved BiCMOS inverter using MOS transistors for base current discharge is shown in the figure.



In this circuit a MOS transistor is turned on whenever the associated bipolar transistor is being turned off. That is, when $T_2$ is to be turned off, $T_5$ gets turned on and provides a discharge path for the base current of $T_2$. BiCMOS inverters are more suitable where high load current sinking and sourcing is required.

#### Latch-up in CMOS circuits :-

- A problem inherent in the p-well and n-well processes is due to the large number of junctions formed in these structures and the consequent presence of parasitic transistors and diodes.
- Latch-up is a condition in which these parasitic components give rise to the establishment of a low resistance conducting path between $V_{DD}$ and $V_{SS}$.
- Latch-up may be induced by glitches on the supply rails or by incident radiation.

Consider the parasitic components associated with the p-well structure.



Fig: - Latch-up effect in p-well structure

- In effect there are two transistors and two resistances, which form a path between $V_{DD}$ and $V_{SS}$.
- If sufficient substrate current flows to generate enough voltage across $R_s$ to turn on transistor $T_1$, this will then draw current through $R_p$.
- If the voltage developed is sufficient, $T_2$ also turns on, establishing a self-sustaining low resistance path between the supply rails.
- If the current gains of the two transistors are such that $\beta_1 \times \beta_2 > 1$, latch-up may occur.
- The equivalent circuit is shown below.



Fig: Latch up circuit model.

→ With no injected current, the parasitic transistors exhibit high resistance, but sufficient substrate current flow will cause switching to low resistance.



Once latched up, this condition is maintained until the latch-up current drops below $I_l$.

Remedies for the latch-up problem include:

- \* Increase in substrate doping levels, with a consequent drop in the value of $R_s$
- \* Reducing $R_p$ by control of fabrication parameters and by ensuring a low contact resistance to $V_{SS}$
- \* Introducing guard rings

Latch-up in n-well fabrication is shown in the figure.



#### mos layers :-

MOS circuits are formed on four basic layers: n-diffusion, p-diffusion, polysilicon and metal.

- Each layer is isolated from the others by thick or thin insulating layers (SiO<sub>2</sub>).
- A transistor is formed wherever polysilicon and thinox regions cross one another.
- The basic MOS transistor properties can be modified by the use of an implant within the thinox region; this is used in nMOS circuits to produce depletion mode transistors.

#### stick diagrams :-

A stick diagram is used to convey layer information through the use of a color code or a monochrome encoding scheme.

-> Different processes use different color codes and different encoding schemes.

The encoding schemes are

- 1. Encodings for a simple single metal nMOS process
- 2. Encodings for a double metal CMOS p-well process
- 3. Encodings for a double metal, double poly. BiCMOS n-well process



Note: p-type transistors are placed above and n-type transistors below the demarcation line.

BiCMOS n-well process

| Color | Stick encoding (monochrome) | Layer | CIF layer |
|---|---|---|---|
| Orange | Monochrome | Polysilicon 2 | CPS |
| Pink | Not separately encoded | p-base of bipolar npn transistor | CBA |
| Pale green | Not separately encoded | Buried collector of bipolar npn transistor | |

Features shown in stick and mask layout form: n-type enhancement poly. 2 transistor, p-type enhancement poly. 2 transistor, and npn bipolar transistor.

## Design rules and Layout:

The design rules are the effective interface between the circuit/system designer and the fabrication engineer.

Circuit designers in general want tighter, smaller layouts for improved performance and decreased silicon area.

On the other hand, the process engineer wants design rules that result in a controllable and reproducible process.

Lambda-based design rules:





Fig: - Design rules for wires (nMOS and CMOS)

From the above figure:

- Minimum n<sup>+</sup>-diffusion and p<sup>+</sup>-diffusion width is 2λ.
- Minimum spacing between two n<sup>+</sup>-diffusion or p<sup>+</sup>-diffusion regions is 3λ.
- For metal 1, the minimum width is 3λ and the minimum separation from another metal 1 wire is 3λ.
- For metal 2, the minimum width is 4λ and the minimum separation from another metal 2 wire is 4λ.
- For a polysilicon wire, the minimum width is 2λ and the minimum separation from another polysilicon wire is 2λ. This separation is also called poly-poly separation.
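The wire rules above can be collected into a small lookup table, reading the listed values as multiples of λ (2λ, 3λ and 4λ). The following is a minimal sketch; the names `WIRE_RULES`, `rules_in_microns` and `check_wire`, and the choice λ = 2.5 µm (feature size 2λ = 5 µm), are assumptions made for illustration.

```python
# Minimum width and separation per layer, in units of lambda
WIRE_RULES = {
    "diffusion":   {"width": 2, "separation": 3},
    "polysilicon": {"width": 2, "separation": 2},
    "metal1":      {"width": 3, "separation": 3},
    "metal2":      {"width": 4, "separation": 4},
}

def rules_in_microns(lambda_um):
    """Convert every lambda-based rule to micrometres for a given lambda."""
    return {
        layer: {name: value * lambda_um for name, value in rule.items()}
        for layer, rule in WIRE_RULES.items()
    }

def check_wire(layer, width, separation):
    """Return True if a wire (dimensions in lambda) meets the minimum rules."""
    rule = WIRE_RULES[layer]
    return width >= rule["width"] and separation >= rule["separation"]

# Assumed example: lambda = 2.5 um
for layer, rule in rules_in_microns(2.5).items():
    print(layer, rule)
```

Because the rules are stored in λ units, the same table can be reused for any technology simply by changing the value of λ.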



Fig: construction rules for transistors

From the figure, the smallest transistor has width 2λ and length 2λ.



From the above figure:

- Separation from contact cut to transistor is 2λ.
- The implant for an nMOS depletion mode transistor has to extend 2λ from the channel in all directions.
- Separation from implant to another transistor is 2λ.



(Polysilicon to extend beyond diffusion by at least 2λ)

Fig: - Transistor Lay out rules

From the above figure:

- Polysilicon must extend beyond the diffusion by at least 2λ.
- Diffusion must not decrease in width for at least 2λ from the polysilicon.

Contact cuts:

When making contacts between polysilicon and diffusion in nMOS circuits, it should be recognized that there are three possible approaches:

- 1. Polysilicon to metal and then metal to diffusion
- 2. Buried contact (poly. to diffusion)
- 3. Butting contact (poly. to diffusion using metal)

# metal 1 to Polysilicon (or) to diffusion :-



2λ × 2λ cut centered on 4λ × 4λ superimposed areas of layers to be joined in all cases.







Minimum separation (multiple cuts)

#### via (contact from metal 2 to metal 1 and thence to other layers)



via and cut used to connect



Fig: - Design rules for contacts (nMOS and CMOS)









Fig: Design rules for wires (2 µm CMOS)



a) n-type enhancement

b) p-type enhancement

Fig: Polysilicon 1 transistors





b) metal 1 to poly 2





d) metal 1 to n-diffusion



e) multiple contact cuts



f) via (metal 2 to metal 1)

Fig: Rules for n-well and $V_{DD}$, $V_{SS}$ contacts


n-well spacings and width

Fig: Rules for n-well

1.2 µm double metal, single poly. CMOS rules:

As fabrication technology improves, the feature size reduces, and a separate set of micron-based design rules must accompany each new feature size.





Fig: Design rules for wires (Interconnects)



Fig: - All devices shown are n-type. The same rules apply for p-type.

1. Draw the stick diagram and layout diagram of an nMOS inverter.



Layout :-





4. Draw the layout and stick diagram of a 3-i/p NOR gate.



Layout



5. Draw the stick and layout diagrams of a CMOS inverter.



6. Draw the layout diagram of a 2-i/p CMOS NAND gate.



Layout :-



(or) Layout



8. Draw the layout diagram of a 3-i/p CMOS NAND gate.















10. Draw the layout diagrams in nMOS, pMOS and CMOS for the expression Y = (A+B)C.



Stick diagram

Draw the stick diagram of a 2-input CMOS NAND gate.



Draw the stick diagram of a 2-i/p CMOS NOR gate.



\* Y = a·b + c using CMOS.



\* Y = a(b+c+d): draw the stick diagram.

# Basic Circuit concepts

Sheet resistance ($R_s$) :-

Sheet resistance is a measure of the resistance of thin films that have a uniform thickness.

Consider a uniform slab of conducting material of resistivity $\rho$, width $W$, thickness $t$ and length $L$.



Fig: - Sheet resistance model

The resistance between the two end faces A and B is

$$R_{AB} = \frac{\rho L}{A} \ \text{ohm}$$

where $A$ is the area of cross section, $A = tW$. Thus

$$R_{AB} = \frac{\rho L}{tW} \ \text{ohm}$$

Consider $L = W$, i.e. a square of the uniform slab:

$$R_{AB} = \frac{\rho L}{tL} = \frac{\rho}{t} = R_s$$

The sheet resistance is $R_s = \frac{\rho}{t}$ ohm/square.

For example, a 1 µm per side square slab of material has exactly the same resistance as a 1 cm per side square slab of the same material if the thickness is the same.

Typical sheet resistances $R_s$ of MOS layers for 5 µm, Orbit 2 µm and Orbit 1.2 µm technologies are given in the table.

| Layer                 | $R_s$ (ohm per square) |                 |                 |
|-----------------------|------------------------|-----------------|-----------------|
|                       | 5 µm                   | Orbit 2 µm      | Orbit 1.2 µm    |
| Metal                 | 0.03                   | 0.04            | 0.04            |
| Diffusion (or active) | 10→50                  | 20→45           | 20→45           |
| Silicide              | 2→4                    | -               | -               |
| Polysilicon           | 15→100                 | 15→30           | 15→30           |
| n-transistor channel  | $10^4$                 | $2 \times 10^4$ | $2 \times 10^4$ |
| p-transistor channel  | $2.5 \times 10^4$      | $4.5 \times 10^4$ | $4.5 \times 10^4$ |

Sheet resistance concept applied to MOS transistors and inverters:

Consider an n-type transistor with channel length L = 2λ and width W = 2λ.

The resistance is defined in terms of the impedance ratio $Z$, i.e. $Z = \frac{L}{W} = \frac{2\lambda}{2\lambda} = 1$

Resistance (R) = 1 square × $R_s$ ohm/square

For 5 µm technology, $R = 10^4\ \Omega$

For Orbit 2 µm technology, $R = 2 \times 10^4\ \Omega$

For a transistor with channel length L = 8λ and width W = 2λ:

$$Z = \frac{L}{W} = \frac{8\lambda}{2\lambda} = 4$$

Channel resistance, $R = Z \cdot R_s$

$R = 4 \times 10^4\ \Omega$ (for 5 µm technology)

$R = 8 \times 10^4\ \Omega$ (for Orbit 2 µm technology)
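The channel resistance calculation $R = Z \cdot R_s$ can be sketched in a few lines. The n-channel sheet resistance values are the ones from the table above; the dictionary and function names are hypothetical and chosen only for this illustration.

```python
# n-transistor channel sheet resistance in ohm per square, per technology
RS_N_CHANNEL = {"5um": 1e4, "orbit_2um": 2e4, "orbit_1.2um": 2e4}

def channel_resistance(length_lambda, width_lambda, rs_ohm_per_sq):
    """R = Z * Rs, where Z = L/W is the number of squares in the channel."""
    z = length_lambda / width_lambda
    return z * rs_ohm_per_sq

for tech, rs in RS_N_CHANNEL.items():
    r_min = channel_resistance(2, 2, rs)   # L = 2 lambda, W = 2 lambda, Z = 1
    r_long = channel_resistance(8, 2, rs)  # L = 8 lambda, W = 2 lambda, Z = 4
    print(f"{tech}: Z=1 -> {r_min:.0f} ohm, Z=4 -> {r_long:.0f} ohm")
```

Only the ratio L:W matters, so the dimensions can be given in λ, in µm, or in any other consistent unit.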



#### Calculation of resistance of a simple inverter:

For the depletion mode pull-up transistor, L:W is 4:1, so for 5 µm technology

$$R_{p.u} = Z \times R_s = 4 \times 10^{4}\ \Omega = 40\ \text{k}\Omega$$

For the pull-down transistor, L:W is 1:1, so

$$R_{p.d} = Z \times R_s = 1 \times 10^{4}\ \Omega = 10\ \text{k}\Omega$$

Total resistance $R = R_{p.u} + R_{p.d}$

$= 40\ \text{k}\Omega + 10\ \text{k}\Omega = 50\ \text{k}\Omega$

Consider the simple CMOS inverter shown in the figure below.


Area capacitances of layers :-

For any layer, knowing the dielectric thickness, we can calculate the area capacitance:

$$C = \frac{\varepsilon_0 \varepsilon_{ins} A}{D}$$

where

$A$ → area of the plates

$\varepsilon_0$ → permittivity of free space ($8.85 \times 10^{-14}$ F/cm)

$D$ → thickness of SiO<sub>2</sub>

$\varepsilon_{ins}$ → relative permittivity of SiO<sub>2</sub> = 4

|                          | Value in pF × $10^{-4}$/µm² (relative values in brackets) |             |             |
|--------------------------|-------------|-------------|-------------|
| Capacitance              | 5 µm        | 2 µm        | 1.2 µm      |
| Gate to channel          | 4 (1.0)     | 8 (1.0)     | 16 (1.0)    |
| Diffusion (active)       | 1 (0.25)    | 1.75 (0.22) | 3.75 (0.23) |
| Polysilicon to substrate | 0.4 (0.1)   | 0.6 (0.075) | 0.6 (0.038) |
| Metal 1 to substrate     | 0.3 (0.075) | 0.33 (0.04) | 0.33 (0.02) |
| Metal 2 to substrate     | 0.2 (0.05)  | 0.17 (0.02) | 0.17 (0.01) |
| Metal 2 to metal 1       | 0.4 (0.1)   | 0.5 (0.06)  | 0.5 (0.03)  |
| Metal 2 to polysilicon   | 0.3 (0.075) | 0.3 (0.038) | 0.3 (0.018) |

Note:- Relative value = specified value / gate to channel value for that technology

Standard unit of capacitance $\square C_g$ :-

The unit is denoted by $\square C_g$.

It is defined as the gate to channel capacitance of a MOS transistor having $W = L$ = feature size, that is, a standard square.

For any MOS process, $\square C_g$ is evaluated as follows.

#### i) For 5 µm MOS circuits:-

Area/standard square = 5 µm × 5 µm = 25 µm²

Capacitance value from table = $4 \times 10^{-4}$ pF/µm² (gate to channel)

Standard value $\square C_g$ = 25 µm² × $4 \times 10^{-4}$ pF/µm² = 0.01 pF

#### ii) For 2 µm MOS circuits:-

Area/standard square = 2 µm × 2 µm = 4 µm²

Gate capacitance value = $8 \times 10^{-4}$ pF/µm²

Standard value $\square C_g$ = 4 µm² × $8 \times 10^{-4}$ pF/µm² = 0.0032 pF

#### iii) For 1.2 µm MOS circuits :-

Area/standard square = 1.2 µm × 1.2 µm = 1.44 µm²

Gate capacitance value = $16 \times 10^{-4}$ pF/µm²

Standard value $\square C_g$ = 1.44 µm² × $16 \times 10^{-4}$ pF/µm² = 0.0023 pF
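The three evaluations of the standard gate capacitance follow one pattern: standard square area times gate to channel capacitance per unit area. A minimal sketch, using the gate to channel values 4, 8 and 16 (× $10^{-4}$ pF/µm²) from the capacitance table; the function and dictionary names are assumptions for illustration.

```python
# Gate-to-channel capacitance in pF per square micrometre, keyed by feature size (um)
GATE_CAP_PF_PER_UM2 = {5.0: 4e-4, 2.0: 8e-4, 1.2: 16e-4}

def standard_gate_capacitance(feature_um):
    """Standard unit of capacitance: (feature size)^2 * gate-to-channel value."""
    area_um2 = feature_um ** 2
    return area_um2 * GATE_CAP_PF_PER_UM2[feature_um]

for feature in (5.0, 2.0, 1.2):
    print(f"{feature} um: {standard_gate_capacitance(feature):.4f} pF")
```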

Some area capacitance calculations:

Consider the area shown in the figure.



Total area = $20\lambda \times 3\lambda = 60\lambda^2$

Minimum channel area (standard square, feature size) = $2\lambda \times 2\lambda = 4\lambda^2$

Relative area = $\frac{\text{Total area}}{\text{standard square area}} = \frac{60\lambda^{2}}{4\lambda^{2}} = 15$

1. Consider the area in metal 1.

Capacitance to substrate = relative area × relative C value

2. Consider the same area in polysilicon.

Capacitance to substrate = relative area × relative C value

3. Consider the same area in n-type diffusion.

Capacitance to substrate = relative area × relative C value
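To make the three cases concrete, the sketch below multiplies the relative area of 15 by the 5 µm relative capacitance values read from the capacitance table (metal 1 to substrate 0.075, diffusion 0.25). The polysilicon value 0.1 is taken as 0.4/4 following the table note and is an assumption, as are the names used here. Results are in units of the standard gate capacitance.

```python
# Relative capacitance values for 5 um technology, in units of the standard gate capacitance
RELATIVE_C_5UM = {"metal1": 0.075, "polysilicon": 0.1, "diffusion": 0.25}

def relative_area(length_lambda, width_lambda, square_side_lambda=2):
    """Area expressed as a multiple of the standard (2 lambda x 2 lambda) square."""
    return (length_lambda * width_lambda) / (square_side_lambda ** 2)

def capacitance_to_substrate(length_lambda, width_lambda, layer):
    """Relative area times relative C value, in standard gate capacitance units."""
    return relative_area(length_lambda, width_lambda) * RELATIVE_C_5UM[layer]

for layer in RELATIVE_C_5UM:
    c = capacitance_to_substrate(20, 3, layer)
    print(f"{layer}: {c:.3f} standard gate capacitance units")
```

The same area is most heavily loaded when drawn in diffusion and least when drawn in metal 1, which is one reason long runs are preferably routed in metal.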

Consider the capacitance calculation in a multilayer structure.



Using the sheet resistance $R_s$ and the standard gate capacitance unit $\square C_g$, consider the case of one standard (feature size square) gate area capacitance being charged through one feature size square of n-channel resistance (that is, through $R_s$ for an nMOS pass transistor channel), as shown in the figure.



Time constant $\tau = (1 R_s \text{ (n-channel)} \times 1 \square C_g)$ seconds

5 µm technology:

$\tau = 10^4\ \Omega \times 0.01\ \text{pF} = 0.1\ \text{nsec}$

Orbit 2 µm technology:

$\tau = 2 \times 10^4\ \Omega \times 0.0032\ \text{pF} = 0.064\ \text{nsec}$

Orbit 1.2 µm technology:

$\tau = 2 \times 10^4\ \Omega \times 0.0023\ \text{pF} = 0.046\ \text{nsec}$
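The delay unit is simply the product of the n-channel sheet resistance and the standard gate capacitance. The sketch below repeats the three calculations with the unit conversion made explicit; the function name and the `TECHNOLOGIES` table are illustrative assumptions.

```python
def delay_unit_ns(rs_ohm, cg_pf):
    """tau = Rs * Cg. One ohm times one picofarad is 1e-12 s, i.e. 1e-3 ns."""
    return rs_ohm * cg_pf * 1e-3

# (n-channel Rs in ohm/square, standard gate capacitance in pF)
TECHNOLOGIES = {
    "5um": (1e4, 0.01),
    "orbit_2um": (2e4, 0.0032),
    "orbit_1.2um": (2e4, 0.0023),
}

for name, (rs, cg) in TECHNOLOGIES.items():
    print(f"{name}: tau = {delay_unit_ns(rs, cg):.3f} nsec")
```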

In practice, circuit wiring and parasitic capacitances must be allowed for, so that the figure taken for $\tau$ is often increased by a factor of 2 to 3. For a 5 µm circuit the worst case delay will then be around $\tau$ = 0.2 to 0.3 nsec.

$\tau$ is not much different from the transit time $\tau_{sd}$ from source to drain:

$$\tau_{sd} = \frac{L^{2}}{\mu_{n} V_{ds}}$$

$V_{ds}$ varies as $C_g$ charges from 0 volts to 63% of $V_{DD}$ in period $\tau$.

Consider $V_{DD} = 5\,\text{V}$ in the figure, so that an appropriate average value is $V_{ds} = 3\,\text{V}$. For 5 µm technology, with $\mu_n = 650\ \text{cm}^2/\text{V s}$,

$$\tau_{sd} = \frac{(5 \times 10^{-4}\ \text{cm})^{2}}{650\ \text{cm}^2/\text{V s} \times 3\ \text{V}} \approx 0.13\ \text{nsec}$$

This is very close to the theoretical time constant $\tau$ calculated above.

Thus the transit time and the time constant are comparable and can be used interchangeably. Stray capacitances are usually allowed for by doubling (or more) the theoretical values calculated.

For 5 µm MOS technology, $\tau$ = 0.3 nsec

For Orbit 2 µm MOS technology, $\tau$ = 0.2 nsec

For Orbit 1.2 µm MOS technology, $\tau$ = 0.1 nsec

Consider the basic 4:1 nMOS inverter. In order to achieve a 4:1 $Z_{p.u}$ to $Z_{p.d}$ ratio, $R_{p.u}$ will be $4R_{p.d}$, and if $R_{p.d}$ is contributed by a minimum size transistor then $R_{p.u} = 4R_s$ and $R_{p.d} = 1R_s$, so that the delay associated with the inverter will depend on whether it is being turned on or off.



Fig:- Nmos Inverter pair delay

Consider a pair of cascaded inverters; the delay over the pair will be constant irrespective of the sense of the logic level transition of the input to the first. This is shown in the figure and, assuming $\tau = 0.3$ nsec and making no extra allowances for wiring capacitance, the overall delay is $\tau + 4\tau = 5\tau$.

Generally, the delay through a pair of similar nMOS inverters is

$$T_{d} = \left(1 + \frac{Z_{p.u}}{Z_{p.d}}\right) \tau = \left(1 + \frac{4}{1}\right) \tau = 5\tau$$



Fig:- Minimum size CMOS inverter pair delay

When considering CMOS inverters, the nMOS ratio rule no longer applies, but we must allow for the natural resistance asymmetry in $R_s$ between the pull-up and pull-down devices, which are usually of equal size.

The figure shows the theoretical delay associated with a pair of minimum size lambda-based inverters. Note that the gate capacitance ($= 2\square C_g$) is double that of the nMOS inverter, since the input to a CMOS inverter is connected to both transistor gates. Note also the allowance made for the different channel resistances.

The asymmetry of resistance values can be eliminated by increasing the width of the p-device channel by a factor of 2 to 3.

Note that the gate input capacitance of the p-device transistor is increased by the same factor.

CMOS inverter delay is estimated by splitting the output transitions into a rise time $\tau_r$ and a fall time $\tau_f$, corresponding to the charging and discharging of the capacitive load $C_L$.

#### i) Rise time estimation :-

Assume that the p-device stays in saturation for the entire charging period of the load capacitor $C_L$.

The circuit may be modeled as shown in the figure below.

The saturation current for the p-type transistor is given by

$$I_{dsp} = \frac{\beta_p \left( V_{gs} - |V_{tp}| \right)^{2}}{2} \qquad (1)$$

This current charges $C_L$ and, since its magnitude is approximately constant,

$$V_{out} = \frac{I_{dsp}\, t}{C_L} \quad \Rightarrow \quad t = \frac{V_{out} C_L}{I_{dsp}} \qquad (2)$$



Fig: - Rise-time model

Substituting (1) into (2), with $t = \tau_r$ when $V_{out} = V_{DD}$, $V_{gs} = V_{DD}$ and $|V_{tp}| = 0.2\,V_{DD}$:

$$\tau_r = \frac{2 V_{DD} C_L}{\beta_p (V_{DD} - 0.2 V_{DD})^{2}} = \frac{2 V_{DD} C_L}{\beta_p (0.8 V_{DD})^{2}}$$

$$\tau_r \approx \frac{3 C_L}{\beta_p V_{DD}} \qquad (3)$$

#### ii) Fall time estimation :-

Similar reasoning can be applied to the discharge of $C_L$ through the n-transistor. The circuit model is shown in the figure below.



Fig: Fall-time model

Hence, the fall time is

$$\tau_f \approx \frac{3 C_L}{\beta_n V_{DD}} \qquad (4)$$

From equations (3) and (4) we deduce that

$$\frac{\tau_r}{\tau_f} = \frac{\beta_n}{\beta_p}$$

But $\mu_n = 2.5\,\mu_p$, hence $\beta_n = 2.5\,\beta_p$, so that the rise time is slower by a factor of 2.5 when both the n- and p-devices are minimum size.

To achieve symmetrical operation using minimum channel length we need to make $W_p = 2.5\,W_n$; for minimum size lambda-based geometries this would result in the inverter having an input capacitance of $1\square C_g$ (n-device) + $2.5\square C_g$ (p-device) = $3.5\square C_g$ in total.

This simple model is quite adequate for most practical situations, but it should be recognized that it gives optimistic results. It shows that

- 1. $\tau_r$ and $\tau_f$ are proportional to $1/V_{DD}$
- 2. $\tau_r$ and $\tau_f$ are proportional to $C_L$
- 3. $\tau_r = 2.5\,\tau_f$ for equal n- and p-transistor geometries.
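The three observations can be checked numerically with the approximation $\tau \approx 3C_L/(\beta V_{DD})$ used for both rise and fall time. In this sketch the load capacitance and the $\beta$ values are arbitrary assumed numbers; only the ratios between the results carry meaning.

```python
def transition_time(c_load_farad, beta, vdd):
    """Approximate rise or fall time: 3 * C_L / (beta * V_DD)."""
    return 3.0 * c_load_farad / (beta * vdd)

# Assumed example values (not from the notes)
c_load = 0.1e-12          # 0.1 pF load
beta_p = 20e-6            # A/V^2, p-device
beta_n = 2.5 * beta_p     # n-device is 2.5 times stronger for equal geometry
vdd = 5.0

tau_r = transition_time(c_load, beta_p, vdd)  # charging through the p-device
tau_f = transition_time(c_load, beta_n, vdd)  # discharging through the n-device

print(f"rise {tau_r * 1e9:.2f} ns, fall {tau_f * 1e9:.2f} ns, ratio {tau_r / tau_f:.2f}")
```

Doubling `vdd` halves both times and doubling `c_load` doubles them, matching points 1 and 2 of the list.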

Driving large capacitive loads:

The problem of driving comparatively large capacitive loads arises when signals must be propagated from the chip to off-chip destinations.

Generally, off-chip capacitances are

$$C_L \geq 10^4\ \square C_g$$

Capacitances of this order must be driven through low resistances, otherwise excessively long delays will occur.

### 1. Cascaded inverters as drivers:

For driving large capacitive loads, inverters should present low pull-up and pull-down resistances.

This means that the MOS devices must be designed with low L:W ratios to give low resistance values for $Z_{p.u}$ and $Z_{p.d}$. The channels must be made very wide to reduce the resistance, with the consequence that the inverter occupies a larger area.

The length L cannot be reduced below the minimum feature size, so the W:L ratio must be made large. Hence the gate region area L × W becomes significant and a large capacitance is presented at the input, which in turn slows down the rate of change of voltage at the input.

This situation can be improved by using N cascaded inverters, each of which is larger than the preceding stage by a width factor $f$, as shown in the figure.



Fig:- Driving large capacitive loads

The capacitive load presented at each inverter input increases in proportion to the increasing width, and the area also increases.

Let $\Delta V_{in}$ indicate a logic 0 to 1 transition and $\nabla V_{in}$ a logic 1 to 0 transition of the input voltage $V_{in}$.

Delay per stage = $f\tau$ for $\Delta V_{in}$

Delay per stage = $4f\tau$ for $\nabla V_{in}$

Total delay per nMOS pair = $5f\tau$.
Similarly, delay per CMOS pair = $7f\tau$.

Let $y = \frac{C_L}{\square C_g} = f^N$

$f$, $N$ → interdependent values.

To determine the value of $f$ which will minimize the overall delay for a given value of $y$:

$$\ln(y) = N \ln(f) \quad \Rightarrow \quad N = \frac{\ln(y)}{\ln(f)}$$

For N even:

Total delay = $\frac{N}{2}\,5f\tau = 2.5Nf\tau$ (nMOS)

Total delay = $\frac{N}{2}\,7f\tau = 3.5Nf\tau$ (CMOS)

In all cases, delay $\propto Nf\tau = \frac{\ln(y)}{\ln(f)}\,f\tau$

The total delay is minimized if $f$ assumes the value $e$.

Assume $f = e$; then the number of stages is $N = \ln(y)$.

For N even, $t_d = 2.5\,eN\tau$ (nMOS) and $t_d = 3.5\,eN\tau$ (CMOS).

For N odd, with $\Delta V_{in}$:

$t_d = [2.5(N-1) + 1]\,e\tau$ (nMOS), $t_d = [3.5(N-1) + 2]\,e\tau$ (CMOS)

and with $\nabla V_{in}$:

$t_d = [2.5(N-1) + 4]\,e\tau$ (nMOS), $t_d = [3.5(N-1) + 5]\,e\tau$ (CMOS)
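The claim that the delay is smallest for $f = e$ can be checked by evaluating the delay expression for a few width factors. This is a minimal sketch that treats N as a continuous quantity; the values $y = 10^4$ and $\tau = 0.3$ nsec are assumed examples, and the function names are hypothetical.

```python
import math

def num_stages(y, f):
    """N = ln(y) / ln(f), from y = f**N."""
    return math.log(y) / math.log(f)

def total_delay(y, f, tau, pair_factor=5.0):
    """(N/2) * pair_factor * f * tau; pair_factor is 5 for nMOS and 7 for CMOS."""
    return 0.5 * num_stages(y, f) * pair_factor * f * tau

y = 1e4      # assumed ratio of load capacitance to standard gate capacitance
tau = 0.3    # nsec, assumed delay unit

for f in (2.0, math.e, 4.0, 8.0):
    n = num_stages(y, f)
    print(f"f = {f:.3f}: N = {n:.2f}, "
          f"nMOS {total_delay(y, f, tau):.2f} ns, "
          f"CMOS {total_delay(y, f, tau, 7.0):.2f} ns")
```

The minimum is shallow: width factors somewhat larger than $e$ cost little extra delay while needing fewer stages, but the sketch makes no statement about area.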

### 2. Super buffers:

There are two types:

1. Inverting type nMOS super buffer
2. Non-inverting type nMOS super buffer

#### 1. Inverting type nMOS super buffer:

Consider a positive-going logic transition $V_{in}$ at the input. The inverter formed by $T_1$ and $T_2$ is turned on, and the gate of $T_3$ is pulled down toward 0 volts with a small delay.



Fig:- Inverting type nMOS super buffer

Thus $T_3$ is cut off while $T_4$ is turned on, and the output is pulled down quickly.

Now consider the opposite transition: when $V_{in}$ drops to 0 volts, the gate of $T_3$ is allowed to rise quickly to $V_{DD}$.

Thus, as $T_4$ is also turned off by $V_{in}$, $T_3$ is made to conduct with $V_{DD}$ on its gate, i.e. with twice the average voltage that would appear if the gate were connected to the source as in the conventional nMOS inverter. Now, as $I_{ds} \propto V_{gs}$, doubling the effective $V_{gs}$ increases the current and reduces the delay in charging the load capacitance at the output, so more symmetrical transitions are achieved.

#### 2. Non-inverting type nMOS super buffer:-

To gain an idea of the effectiveness of super buffer designs, we note that structures fabricated in 5 µm technology are capable of driving capacitances of 2 pF with a rise time of 5 nsec.



Fig:- Non-inverting type nMOS super buffer

#### 3. BiCMOS drivers:-

BiCMOS technology presents the possibility of using bipolar transistor drivers as the output stage of inverter and logic gate circuits.

In bipolar transistors there is an exponential dependence of the collector current $I_c$ on the base-to-emitter voltage $V_{be}$. Hence bipolar transistors can be operated with much smaller input voltage swings than MOS transistors and still switch large currents. Only a small amount of charge must be moved during switching.

Another consideration in bipolar devices is the temperature effect on the input voltage $V_{be}$. $V_{be}$ depends on the base width $W_B$, the doping level $N_A$, the electron mobility $\mu_n$ and the collector current $I_c$, and it is linearly dependent on temperature. However, the temperature differences across an IC are not very high, so the $V_{be}$ values of bipolar devices spread over the chip remain matched and do not differ by more than a few millivolts.

The switching performance of a bipolar transistor driving a capacitive load can be analyzed with the help of the equivalent circuit shown in the figure.



Fig:- Driving ability of bipolar transistor

The time $\Delta t$ required to change the output voltage $V_{out}$ by an amount equal to the input voltage $V_{in}$ is

$$\Delta t = \frac{C_L}{g_m}$$

where

$C_L$ → load capacitance

$g_m$ → transconductance

The value of $\Delta t$ is small because the transconductance $g_m$ is high.

The delay due to the bipolar transistor has two components, $T_{in}$ and $T_L$:

- i) $T_{in}$: the time to charge the base-emitter junction of the bipolar transistor.

  This time is typically 2 ns for a BiCMOS transistor-based driver.

  A CMOS driver requires 1 ns for $T_{in}$ to charge the input gate capacitance.

  For a GaAs driver, $T_{in}$ is about 50-100 ps.
- ii) $T_L$: the time required to charge the output load capacitance $C_L$; it equals $\frac{V}{I_d} \cdot \frac{1}{h_{fe}} \cdot C_L$. This is less for a bipolar driver by a factor $h_{fe}$ as compared to MOS drivers.



#### Propagation delays :-

#### 1. Cascaded pass transistors :-

Pass transistors are used as series or parallel combinations of switches in logic arrays.

Consider a chain of four pass transistors connected in series, with the gate of each transistor connected to $V_{DD}$ (logic 1).



Fig:- Propagation delays in pass transistor chain

Applying KCL at node $V_2$,

$$C \frac{dV_2}{dt} = I_1 - I_2 = \frac{(V_1 - V_2) - (V_2 - V_3)}{R}$$

Assume that there is a large number of pass transistors in series. The equation then reduces to

$$RC \frac{dV}{dt} = \frac{d^2V}{dx^2}$$

where

R → resistance per unit length

C → capacitance per unit length

x → distance along the network from the input

The propagation time $\tau_p$ for a signal to propagate a distance $x$ is $\tau_p \propto x^2$.

Let us define $r$ and $c$ such that $R = rR_s$ and $C = c\,\square C_g$ are the lumped network elements. Here $r$ is the relative resistance per section in terms of $R_s$, and $c$ is the relative capacitance per section in terms of $\square C_g$. The total time delay $\tau_d$ for $n$ sections is

$$\tau_d = n^2 r c (\tau)$$

As $n$ increases the total delay increases rapidly, and in practice no more than four pass transistors are connected in series. If this number is exceeded, a buffer is inserted between each group of four pass transistors.
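The square-law growth of the chain delay, and the reason for limiting chains to four pass transistors, can be illustrated with the expression $\tau_d = n^2 r c\,\tau$. This is a minimal sketch; the values of `r`, `c` and `tau` are assumed, and the delay of the inserted buffer itself is deliberately left out of the comparison.

```python
def chain_delay(n, r, c, tau):
    """Delay of n series pass transistors: n^2 * r * c * tau."""
    return n ** 2 * r * c * tau

# Assumed example: unit relative resistance and capacitance, tau = 0.3 nsec
r, c, tau = 1.0, 1.0, 0.3

unbuffered = chain_delay(8, r, c, tau)      # 8 transistors in one chain
grouped = 2 * chain_delay(4, r, c, tau)     # two groups of 4, buffer delay not included

print(f"8 in series: {unbuffered:.1f} ns")
print(f"2 groups of 4: {grouped:.1f} ns (plus the buffer delay)")
```

Splitting the chain halves the pass-transistor contribution here, so buffering pays off whenever the buffer delay is smaller than that saving.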

#### 2. Design of long polysilicon wires:

Long polysilicon wires require buffers, mainly because

i) signal propagation is speeded up

ii) sensitivity to noise is reduced

Slow signal propagation makes the signal more susceptible to noise, as shown in the figure below.

In the diagram, the slow rise time of the signal at the input of the inverter means that the input voltage spends a long time in the vicinity of $V_{inv}$, so that small disturbances due to noise will switch the inverter state between 0 and 1, as shown at the output.



Hence it is necessary that long polysilicon wires be driven by suitable buffers, to avoid the effects of noise and to speed up the rise time of propagated signal edges.

### wiring capacitances:

The Various sources of capacitance that contribute overall wiring capacitance are

- 1. Fringing fields
  - a. Inter layer capacitances
  - 3. Peripheral capacitances

## 1. Fringing fields:-

It can be major component of the overall capacitance for Interconnect wires.

For fine line metallization, the value of fringing field capacitance (Cff) can be same order as the area capacitance.

Thus cfe should be taken into account if accurate prediction is needed

$$c_{ff} = \xi_{io_2} \varepsilon_{ol} \left[ \frac{\pi}{2n \xi_{1} + 2d (1 + \sqrt{1 + \frac{1}{d}})^2} - \frac{t}{4d} \right]$$

where l-> length

to thickness of cuire

d > wire to substrate separation

Total capacitance, 
$$c_w = c_{area} + c_{ff}$$

#### White of the mat thinger is on the said 2. Inter layer capacitance:-

Capacitance can exist between layers due to parallel-plate effects.

This capacitance depends on the layout, i.e. on whether the layers cross (or) one layer underlies another.

For regular structures it is readily calculated, and it contributes significantly to the accuracy of circuit modeling and delay calculations.

# 3. peripheral capacitance:-

The source and drain n-diffusion regions form junctions with the p-substrate (or) p-well at well-defined and uniform depths.

Similarly, p-diffusion regions form junctions with the n-substrate (or) n-well at well-defined and uniform depths.

For diffusion regions, each diode thus formed has associated with it a peripheral capacitance from the diffusion region to the substrate; the smaller the source (or) drain area, the greater the relative value of the peripheral capacitance.

Total diffusion capacitance is given by

FAN-IN and FAN-OUT:

FAN-IN: - The number of inputs to a gate is called its FAN-IN.

FAN-OUT: - The number of gates (together with the length of metal tracks) connected to the output of a gate is called its FAN-OUT.

Choice of Layers:

The following rules must be considered for the proper choice of layers

- i) Except for very small distances, polysilicon should not be used for routing $V_{DD}$ and $V_{SS}$.
- ii) Long lengths of polysilicon should be used only after careful consideration, because the polysilicon layer has a relatively high value of $R_s$.
- iii) $V_{DD}$ and $V_{SS}$ (GND) must be distributed only on metal layers. This is also due to the $R_s$ value.

### scaling of mos circuits

scaling models and scaling factors:

Scaling means reducing the dimensions (size) of the MOS transistors.

There are 3 scaling models

- 1. Constant field scaling
- 2. Constant voltage scaling
- 3. Lateral scaling



Fig:- scaled nmos transistor

#### 1. Constant field scaling:-

All the parameters of the MOSFET are scaled by the factor $\alpha$, except the supply voltage $V_{DD}$ and the gate oxide thickness $t_{ox}$.

## 2. constant Voltage scaling:

The supply voltage $V_{DD}$ and the gate oxide thickness $t_{ox}$ are scaled down by $\beta$.

# 3. Lateral scaling:-

This is combined model of constant field and constant voltage scaling.

# scaling factors for device parameters:

1. Gate area ($A_g$):

$$A_g = L \cdot W$$

L -> channel length, W -> channel width, both scaled by $1/\alpha$

$A_g$ is scaled by $1/\alpha^2$

2. Gate capacitance per unit area Co (or) Cox:

$$C_0 = \frac{\varepsilon_{ox}}{D}$$

where $\varepsilon_{ox}$ -> permittivity of gate oxide

D -> gate oxide thickness ($t_{ox}$), scaled by $1/\beta$

$C_0$ is scaled by $\frac{1}{1/\beta} = \beta$

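3. Gate capacitance ($C_g$):-

$$C_g = C_0 \cdot W \cdot L$$

$C_0$ is scaled by $\beta$ and $WL$ is scaled by $1/\alpha^2$

$$C_g \rightarrow \frac{\beta}{\alpha^2}$$

$C_g$ is scaled by $\beta/\alpha^2$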

4. Parasitic capacitance ($C_x$):-

$$C_x \propto \frac{A_x}{d}$$

$A_x$ -> area of depletion region, d -> depletion width

$A_x$ is scaled by $1/\alpha^2$ and d is scaled by $1/\alpha$

$$C_x \rightarrow \frac{1}{\alpha^2} \Big/ \frac{1}{\alpha} = \frac{\alpha}{\alpha^2} = \frac{1}{\alpha}$$

5. Carrier density in channel ($Q_{on}$):-

$Q_{on}$ is the average charge per unit area in the channel in the ON state.

$$Q_{on} = C_0 \cdot V_{gs}$$

$C_0$ is scaled by $\beta$ and $V_{gs}$ is scaled by $1/\beta$

∴ Scaling factor ($Q_{on}$) = $\beta \cdot \frac{1}{\beta} = 1$

Channel resistance ($R_{on}$):-

$\mu$ -> carrier mobility in channel

$$f_0 = \frac{W}{L} \cdot \frac{\mu C_0 V_{DD}}{C_g}$$

$$J = \frac{I_{dss}}{A}$$

A -> cross-sectional area, which is scaled by $1/\alpha^2$

$I_{dss}$ is scaled by $1/\beta$

∴ Scaling factor (J) = $\frac{1/\beta}{1/\alpha^2} = \frac{\alpha^2}{\beta}$

## 10. Power dissipation per gate ($P_g$):

Power dissipation per gate consists of 2 types: static and dynamic.

Static power dissipation takes place when the device holds a particular state. Dynamic power dissipation takes place when the device changes its state.

$P_g$ (dynamic) is scaled by
$$\frac{1}{\alpha^2\beta} \cdot \frac{\alpha^2}{\beta} = \frac{1}{\beta^2}$$

11. Switching energy per gate ($E_g$):-

$$E_g = \frac{C_g}{2} \cdot (V_{DD})^2$$

13. Power dissipation per unit area ($P_a$):-

$$P_a = \frac{P_g}{A_g}$$

∴ Scaling factor for $P_a = \frac{1/\beta^2}{1/\alpha^2} = \frac{\alpha^2}{\beta^2}$

14. Speed power product ($P_T$):-

$$P_T = P_g \cdot T_d$$

∴ Scaling factor for $P_T = \frac{1}{\beta^2} \cdot \frac{\beta}{\alpha^2} = \frac{1}{\alpha^2\beta}$
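The scaling factors above can be collected in a small sketch that evaluates them as exact fractions for chosen values of $\alpha$ and $\beta$. The function name and dictionary keys are hypothetical, and the gate delay is assumed to scale as $T_d = R_{on} C_g$ with $R_{on}$ unchanged by scaling.

```python
from fractions import Fraction


def scaling_factors(alpha, beta):
    """Return the scaling factor of each device parameter for given alpha, beta."""
    a, b = Fraction(alpha), Fraction(beta)
    f = {}
    f["Ag"] = 1 / a**2                 # gate area: L and W each scale by 1/alpha
    f["C0"] = b                        # gate capacitance per unit area
    f["Cg"] = f["C0"] * f["Ag"]        # gate capacitance = C0 * W * L
    f["Cx"] = (1 / a**2) / (1 / a)     # parasitic capacitance ~ Ax / d
    f["Vdd"] = 1 / b                   # supply voltage
    f["Qon"] = f["C0"] * f["Vdd"]      # channel charge density = C0 * Vgs
    f["Idss"] = 1 / b                  # saturation current
    f["J"] = f["Idss"] / (1 / a**2)    # current density = Idss / A
    f["Eg"] = f["Cg"] * f["Vdd"] ** 2  # switching energy = Cg * Vdd^2 / 2
    f["Td"] = f["Cg"]                  # gate delay, assuming Ron scales by 1
    f["Pg"] = f["Eg"] / f["Td"]        # dynamic power per gate = Eg * f0
    f["Pa"] = f["Pg"] / f["Ag"]        # power per unit area
    f["PT"] = f["Pg"] * f["Td"]        # speed power product
    return f


if __name__ == "__main__":
    for name, value in scaling_factors(2, 2).items():
        print(f"{name:5s} {value}")
```

Running it with $\alpha = \beta$ and again with $\beta = 1$ makes it easy to compare how current density and power per unit area behave when the supply voltage is or is not scaled.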


Limitations of scaling:-

i) Substrate doping:-

The substrate doping level has a direct relation with the built-in potential (junction potential) $V_B$. Consider $V_B$ to be small compared with $V_{DD}$.

Scaling factors for substrate doping:

Scaling involves reducing the channel length of the MOS transistor. The depletion region widths must also be scaled down to prevent the source and drain depletion regions from meeting.

Depletion region width d is

$$d = \sqrt{\frac{2\varepsilon_{si}\varepsilon_0 V}{qN_B}} \quad (1)$$

where

$\varepsilon_{si}$ -> relative permittivity of silicon = 12

$\varepsilon_0$ -> permittivity of free space = $8.85 \times 10^{-14}$ F/cm

V -> effective voltage = $V_a + V_B$

$V_a$ -> applied voltage (maximum value = $V_{DD}$)

$V_B$ -> built-in potential (junction potential)

q -> charge of electron

$N_B$ -> doping level of substrate

and
$$V_B = \frac{kT}{q} \ln \left( \frac{N_B N_D}{n_i^2} \right) \quad (2)$$

$N_D$ -> donor concentration of source/drain, $n_i$ -> intrinsic carrier concentration
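As a rough numerical check, the sketch below evaluates the built-in potential $V_B = (kT/q)\ln(N_B N_D/n_i^2)$ and the depletion width $d = \sqrt{2\varepsilon_{si}\varepsilon_0 V/(qN_B)}$. The doping levels, the room-temperature value of $kT/q$ and the intrinsic concentration are assumed, illustrative values, not figures taken from these notes.

```python
import math

KT_OVER_Q = 0.0259   # thermal voltage in volts near 300 K (assumed)
EPS0 = 8.85e-14      # permittivity of free space, F/cm
EPS_SI = 12          # relative permittivity of silicon
Q = 1.602e-19        # electron charge, C
N_I = 1.5e10         # intrinsic carrier concentration of Si, cm^-3 (assumed)


def built_in_potential(n_b, n_d, n_i=N_I):
    """Junction potential V_B in volts for substrate doping n_b and donor level n_d."""
    return KT_OVER_Q * math.log(n_b * n_d / n_i**2)


def depletion_width_cm(v, n_b):
    """Depletion width in cm for effective voltage v = V_a + V_B."""
    return math.sqrt(2 * EPS_SI * EPS0 * v / (Q * n_b))


if __name__ == "__main__":
    n_b, n_d = 1e16, 1e20           # hypothetical doping levels, cm^-3
    v_b = built_in_potential(n_b, n_d)
    d = depletion_width_cm(v_b, n_b)  # V_a = 0
    print(f"V_B = {v_b:.3f} V, d = {d * 1e4:.3f} um")
```

For these assumed values $V_B$ is a little under 1 V and the zero-bias depletion width is a few tenths of a micrometre, which shows why d must shrink along with the channel length.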

For the combined voltage and dimension scaling model applied to a transistor, let $V_a = mV_B$, where m is a real number. Then

$$V = V_a + V_B = (m+1)V_B$$

If the applied voltage $V_a$ is scaled by $1/\beta$ we have

$$V_s = \frac{V_a}{\beta} + V_B = \frac{mV_B + \beta V_B}{\beta} = \frac{(m+\beta)V_B}{\beta}$$

From the expression for V,

$$V_B = \frac{V}{m+1} \quad (6)$$

Substituting eq. (6) in the expression for $V_s$,

$$\frac{V_s}{V} = \frac{m+\beta}{\beta(m+1)} \rightarrow \text{scaling factor}$$

$V_s$ is the effective scaled voltage across the depletion region.

$N_B$ should be scaled by $\frac{\alpha^2(\beta+m)}{\beta(m+1)}$ so that the scaling factor for d is $1/\alpha$.
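The relation between the voltage and doping scaling factors can be checked with a short sketch. It assumes $d \propto \sqrt{V/N_B}$, an effective voltage that scales by $(m+\beta)/(\beta(m+1))$ and a doping level that scales by $\alpha^2(m+\beta)/(\beta(m+1))$; the function names and sample values of m, $\alpha$ and $\beta$ are hypothetical.

```python
import math


def voltage_scaling(m, beta):
    """Scaling factor of the effective junction voltage V = V_a + V_B."""
    return (m + beta) / (beta * (m + 1))


def doping_scaling(alpha, beta, m):
    """Scaling factor required for N_B so that d scales by 1/alpha."""
    return alpha**2 * voltage_scaling(m, beta)


def depletion_width_scaling(alpha, beta, m):
    """Resulting scaling factor of d, using d proportional to sqrt(V / N_B)."""
    return math.sqrt(voltage_scaling(m, beta) / doping_scaling(alpha, beta, m))


if __name__ == "__main__":
    alpha, beta, m = 2.0, 2.0, 4.0
    print("V scales by  ", voltage_scaling(m, beta))
    print("N_B scales by", doping_scaling(alpha, beta, m))
    print("d scales by  ", depletion_width_scaling(alpha, beta, m))
```

Whatever m and $\beta$ are chosen, the depletion width comes out scaled by $1/\alpha$, which is the design goal of the doping adjustment.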

Scaling factor for depletion width:-

From the expression for d, as $N_B$ is increased to reduce the depletion width, the threshold voltage $V_t$ also increases.

The maximum electric field induced in the junction is

$$E_{max} = \frac{2V}{d}$$

If $N_B$ is increased by a factor $\alpha$ and if $V_a = 0$, then $V_B$ is increased by $\ln\alpha$ and d is decreased by $\sqrt{\ln\alpha/\alpha}$.

Therefore, the electric field E across the depletion region is increased by $\sqrt{\alpha/\ln\alpha}$ and will reach the critical level $E_{crit}$ with increasing $N_B$.

Fig (a) shows the depletion width d as a function of substrate concentration $N_B$ and supply voltage $V_{DD}$. The dashed line indicates the maximum depletion width for $E_{max} = E_{crit}$.

$$d = \sqrt{\frac{2\varepsilon_0\varepsilon_{si}V}{qN_B}} \quad (8)$$

Since

$$E_{max} = \frac{2V}{d} \Rightarrow V = \frac{E_{max}\, d}{2}$$

substituting this in eq. (8) gives

$$d = \frac{\varepsilon_0\varepsilon_{si}E_{max}}{qN_B}$$



Fig (a): Depletion width d versus substrate concentration $N_B$

In Fig (a), the region above the dashed line is that where the increased electric field E will induce breakdown.

The maximum electric field $E_{max}$ in the depletion layer as a function of $N_B$ is shown in fig (b).



Fig (b): d and E Versus substrate doping (NB)

From fig.(b), Any applied voltage more than 1/20 causes breakdown at lower values of NB.

# ii) Limits of miniaturization:-

The minimum size of a transistor is determined by

- i) The physics of the transistor
- ii) The technology involved in the fabrication process.

The reduction of device geometry currently depends on the alignment accuracy and resolution of photolithographic technology. The limit on feature size is about 0.3 µm, which may be reduced further using E-beam technology.

The size of a transistor is defined in terms of its channel length. The time $\tau$ for an electron to travel from source to drain (the transit time) depends on the channel length L.

The maximum carrier drift velocity is approximately equal to the saturation velocity $V_{sat}$.

The minimum transit time corresponds to the minimum-size transistor for $V_a = 0$ V.

Transit time as a function of $N_B$ and L is shown in fig 3(a), (b)



Fig 3(a) assumes transistor size L=2d with zero space between source and drain depletion regions.



Fig 3(b): Transit time $\tau$ versus L.

# iii) Limits of Interconnect and contact resistance:

The width, thickness and spacing of interconnects are scaled by $1/\alpha$, so cross-sectional areas must scale by $1/\alpha^2$.

For short-distance interconnections the conductor length is also scaled by $1/\alpha$, so the resistance is increased by $\alpha$. For constant field scaling, the current I is scaled by $1/\alpha$, so that the IR drop remains constant as a device is scaled.

One approach is to use optical interconnection techniques where a very high level of integration is required for high-speed circuits. To use optical-fiber techniques, laser diodes, receivers and amplifiers must be included in the integrated circuit. Performance will vary with the metal materials used, but rough estimations can be made for metal interconnects.



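The propagation delay $T_p$ along a single aluminium interconnect can be calculated from

$$T_p = R_{int}C_{int} + 2.3\left(R_{on}C_{int} + R_{on}C_L + R_{int}C_L\right)$$

which, when the load capacitance $C_L$ is small, reduces to

$$T_p \approx (2.3R_{on} + R_{int})\,C_{int}$$

with

$$R_{int} = \rho\frac{L}{WH}, \qquad C_{int} = \varepsilon_{ox}\left[1.15\frac{W}{t_{ox}} + 2.8\left(\frac{H}{t_{ox}}\right)^{0.222}\right]L$$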

where L, W and H are the length, width and height of the interconnect

$\rho$ -> resistivity of interconnect

$R_{on}$ -> transistor on resistance

$R_{int}$ -> resistance of interconnect, $C_{int}$ -> capacitance of interconnect

$t_{ox}$ -> thickness of dielectric oxide

$\varepsilon_{ox} = 3.45 \times 10^{-5}$ pF/µm -> permittivity of $SiO_2$

Consider $\rho = 3\ \mu\Omega\cdot$cm for aluminium, $t_{ox} = 0.8$ µm for thick oxide, L = 1 cm, W = 3 µm, H = 1 µm; then $T_p$ is given by

$$T_p = (2.3 \times 5\ k\Omega + 0.1\ k\Omega) \times 2.5 \times 10^{-12} = 29 \text{ nsec}$$
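The arithmetic of this example can be reproduced with a short sketch. It assumes $R_{int} = \rho L/(WH)$ and the simplified delay $T_p \approx (2.3R_{on} + R_{int})C_{int}$, and takes $R_{on} = 5\ k\Omega$ and $C_{int} = 2.5$ pF directly from the worked example; the function names are hypothetical.

```python
def interconnect_resistance_ohm(rho_ohm_cm, length_cm, width_cm, height_cm):
    """Resistance of a rectangular wire: R_int = rho * L / (W * H)."""
    return rho_ohm_cm * length_cm / (width_cm * height_cm)


def metal_delay_s(r_on_ohm, r_int_ohm, c_int_f):
    """Simplified delay T_p = (2.3 * R_on + R_int) * C_int (small load C_L)."""
    return (2.3 * r_on_ohm + r_int_ohm) * c_int_f


if __name__ == "__main__":
    # Aluminium wire: rho = 3 micro-ohm cm, L = 1 cm, W = 3 um, H = 1 um.
    r_int = interconnect_resistance_ohm(3e-6, 1.0, 3e-4, 1e-4)
    t_p = metal_delay_s(5e3, r_int, 2.5e-12)
    print(f"R_int = {r_int:.1f} ohm, T_p = {t_p * 1e9:.1f} ns")
```

The wire resistance works out to about 100 Ω and the delay to about 29 ns, so for this geometry the delay is dominated by the driver's on resistance charging the wire capacitance.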

Optical fibers are used to replace metal interconnects in critical applications. From the fig, $R_{int}$ and $C_{int}$ may be assumed to be zero.



Fig:- Electro-optical Interconnection

$$T_p = 2.3R_{on}C_L + t_{laser} + t_{int} + t_{rec}$$

where $C_L$ -> input capacitance of laser diode

$t_{laser}$ -> delay time through laser diode

$t_{int}$ -> propagation delay time along the optical fiber interconnect

$t_{rec}$ -> receiver delay time

$t_{int} = \frac{nL}{c}$, where L -> length of fiber

n -> refractive index of optical fiber material

c -> speed of light

Laser diodes and receivers are high-speed devices having self-delays around 100 psec. The refractive index is between 1.5 and 2. The capacitance of a discrete laser diode is about 1 pF. The propagation delay using these values is

$$T_p = 2.3 \times 5 \times 10^{3} \times 1 \times 10^{-12} + 1 \times 10^{-10} + t_{int} + 1 \times 10^{-10} \approx 11.7 \text{ nsec}$$

where $t_{int} = nL/c$ is only a few tens of picoseconds for a fiber about 1 cm long.
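The optical-link delay can be evaluated the same way. This sketch assumes $T_p = 2.3R_{on}C_L + t_{laser} + t_{int} + t_{rec}$ with $t_{int} = nL/c$; the fiber length of 1 cm and the refractive index of 1.5 are illustrative assumptions chosen to match the metal example.

```python
C_LIGHT_CM_PER_S = 3.0e10  # speed of light, cm/s


def optical_delay_s(r_on_ohm, c_l_f, t_laser_s, t_rec_s, n, length_cm):
    """Delay of an electro-optical link: driver RC + laser + fiber + receiver."""
    t_int = n * length_cm / C_LIGHT_CM_PER_S  # time of flight in the fiber
    return 2.3 * r_on_ohm * c_l_f + t_laser_s + t_int + t_rec_s


if __name__ == "__main__":
    t_p = optical_delay_s(5e3, 1e-12, 100e-12, 100e-12, n=1.5, length_cm=1.0)
    print(f"T_p = {t_p * 1e9:.2f} ns")
```

Almost all of the delay comes from the transistor driving the 1 pF laser-diode capacitance; the time of flight in the fiber contributes only about 50 ps.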

The propagation delay against varying length L of interconnect is shown in fig (a) below, and the propagation delay against width of interconnect is shown in fig (b).





The subthreshold current $I_{sub} \propto e^{(V_{gs}-V_t)q/kT}$

When the transistor is in the off state, $V_{gs}-V_t$ is negative and should be as large in magnitude as possible to minimize $I_{sub}$.

With scaled voltages, $(V_{gs}-V_t)/kT$ reduces in magnitude, due to which the subthreshold current increases. To avoid this, both $V_{gs}$ and $V_t$ may be scaled along with $V_{DD}$ by a larger factor. However, this causes the electric field strength to increase and thereby lowers breakdown voltages.

We derived that $E_{max}$ is scaled by $\alpha(\beta+m)/\beta(m+1)$. Junction breakdown voltage BV is given by

Thus the scaling factor for BV is $\beta(m+1)/\alpha^2(\beta+m)$, and BV decreases with scaling.

Limits on Logic levels and Supply voltage due to noise:-

main advantages in the scaling of devices are Smaller gate delay time, higher operating frequencies and Lower power dissipation.

scaling is accompanied by decreased Inter-feature spacing and greater switching speed.

is given by

$R_n$ -> equivalent noise resistance at input

$\Delta f$ -> bandwidth

when transistor works in saturation,  $g_m$  does not have a linear relation ship with gate voltage  $V_g$ .

where
$$\beta = \frac{W \mu C_{ox}}{L}$$

$V_p$ is the pinch-off voltage given by

$$V_p = V_g' - \frac{1}{2} \left( \frac{a}{C_{ox}} \right)^{2} \left[ \left(1 + 4V_g'\frac{C_{ox}^2}{a^2}\right)^{1/2} - 1 \right]$$

where,
$$v_g' = v_g - v_E + v_B$$

$$a = (2\, \varepsilon_{si}\, q\, N_B)^{1/2}$$

Equivalent resistance $R_n$ is given by

$$R_n = \left(\frac{1}{2}\frac{V_g'}{V'} + \frac{1}{6}\right) g_m^{-1}$$

where $V' = V_p + V_B$

since  $V_p$  is a monotonically decreasing function of gate oxide thickness  $t_{ox}$  and substrate doping  $N_B$ .

Thermal noise $R_n g_m$ is given by

Thermal noise $R_n g_m$ is directly dependent on $t_{ox}$ and $N_B$ and somewhat on $V_g$. This is graphically shown in fig.



Fig:-a)Thermal noise versus oxide thickness



fig: b) substrate doping

In the constant field scaling model, $V_g$ is scaled by $1/\alpha$, and $t_{ox}$ and $N_B$ are scaled as well. Hence the product $R_n g_m$ is decreased. As a result, the ratio of logic level to thermal noise undergoes degradation by almost the same factor.

Another type of noise is flicker noise, which is the result of fluctuations of carriers trapped in the channel by surface states. The current fluctuation $\Delta i$ at the output is given by

where $S = dn_t/dn$ is the surface state efficiency, I = D.C. drain current, $V_d$ = applied drain voltage

S is a process-dependent factor; the flicker noise has a scaling factor of one for constant field scaling (or) the combined scaling model.

Other noise sources to be considered are those occurring due to mutual inductive and mutual capacitive coupling, which become more significant at lower operating voltages. Cross talk can occur between two parallel signal lines on a chip. The cross-talk noise increases as the operating frequency is increased and as $t_r$, the rise time of the coupled signal, is reduced.

The designer has to provide precautions against other noise sources due to external Influences, such as radio frequency signals, unterminated signal lines and lines with non-uniform Impedance characteristics, voltage drops on Power lines (or) ground connections etc.

one serious effect of scaling down is that it enhances the effect of Internally and externally generated noise which degrades both reliability and production of high density chip Layouts.

Limits due to current density:-

The most widely used material for interconnections in VLSI chips is high-purity aluminium, which has high conductivity. Scaling down of dimensions also increases the current density in interconnects by the same factor for constant field scaling.

Hence the current density through interconnects increases.

When the current density in aluminium approaches $10^6$ A/cm², the interconnects are likely to be burned off owing to metal migration. Current densities are therefore set well below this limit, and figures of J = 1 to 2 mA/µm² are preferred.

Switch logic is based on the pass transistor (or) on transmission gates. This approach is fast for small arrays and takes no static current from the supply rails.

Basic AND and OR connections are set out below, but many combinations of switches are possible.



(Note: logic levels will be degraded by $V_t$ effects)

Vout = ? when A.B.C.D.E.F.G.H=1





Note the arrangement to satisfy both logic '1' and logic '0' states

pass transistors and Transmission gates:-

Switches and switch logic may be formed from simple n- (or) p-pass transistors (or) from transmission gates, each comprising an n-pass transistor and a p-pass transistor in parallel, as shown in the figure.



Fig: Pass transistor gate driven from another pass transistor

When using NMOS switch logic, there is one restriction which must always be observed: no pass transistor gate input may be driven through one (or) more pass transistors, as shown in the above figure.

Logic levels propagated through pass transistors are degraded by threshold voltage effects.

Gate Logic (Restoring Logic):-

#### i) Inverter

The NMOS inverter, with its $Z_{p.u}/Z_{p.d}$ ratio and the channel length-to-width ratio of each MOS transistor, is shown in the figure.



a) circuit symbols (Note:- n-and P-transistors assumed to be min-size unless stated otherwise)



b) Logic Symbols




Overall ratio = 4/w

## Two-Input NMOS, CMOS and BiCMOS NAND gates



$$Z_{p.u} = L_{p.u}/W_{p.u} = 8$$

$R_{p.u} = Z_{p.u} \times R_s = 80\ k\Omega$ (NMOS)

Similarly,

$R_{p.d} = Z_{p.d} \times R_s = 10\ k\Omega$

Power dissipation (on) $P_d = \frac{V^2}{R_{p.u} + R_{p.d}}$

Fig: - 8:1 Nmos Inverter



$$Z_{p.u} = L_{p.u}/W_{p.u} = 4$$

$R_{p.u} = Z_{p.u} \times R_s = 40\ k\Omega$ (NMOS)

Similarly,

$R_{p.d} = Z_{p.d} \times R_s = 5\ k\Omega$

Power dissipation (on) $P_d = \frac{V^2}{R_{p.u} + R_{p.d}} = 0.56$ mW

Input capacitance = 2.05

Fig: - 8:1 Nmos Inverter (Alternative)
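The two 8:1 inverter layouts can be compared numerically. This sketch assumes a sheet resistance $R_s = 10\ k\Omega$ per square (implied by $R_{p.u} = 80\ k\Omega$ for $Z_{p.u} = 8$) and a supply of V = 5 V (consistent with the 0.56 mW figure); both values and the function name are assumptions for illustration.

```python
R_S_OHM_PER_SQUARE = 10e3  # channel sheet resistance (assumed from the example)
V_SUPPLY = 5.0             # supply voltage in volts (assumed)


def inverter_on_power_w(z_pu, z_pd, r_s=R_S_OHM_PER_SQUARE, v=V_SUPPLY):
    """Static power of an NMOS inverter while the pull-down is conducting."""
    r_pu = z_pu * r_s  # pull-up resistance
    r_pd = z_pd * r_s  # pull-down resistance
    return v**2 / (r_pu + r_pd)


if __name__ == "__main__":
    print(f"Z_pu=8, Z_pd=1  : {inverter_on_power_w(8, 1) * 1e3:.2f} mW")
    print(f"Z_pu=4, Z_pd=0.5: {inverter_on_power_w(4, 0.5) * 1e3:.2f} mW")
```

Both layouts keep the 8:1 ratio, but the alternative halves every resistance and therefore draws twice the static power while the output is low.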

below.



a) circuit diagrams



b) Logic symbol

Consider the simple circuit model of the gate in the condition when all n-pull-down transistors are conducting, as shown in the fig. below

$Z_{p.d}$ applies to any one pull-down transistor.

The boundary condition is,




Fig: Nmos NAND ratio determination


The Pull-up to Pull-down ratio must be 4:1

Two significant factors arise:

1. NMOS NAND gate area requirements are considerably greater than those of the NMOS inverter, since not only must pull-down transistors be added in series to provide the desired number of inputs, but as inputs are added there must also be a corresponding adjustment of the length of the pull-up transistor channel to maintain the required overall ratio.

2. NMOS NAND gate delays are increased in direct proportion to the number of inputs added.

If each pull-down transistor is kept to minimum size ($2\lambda \times 2\lambda$), then each will present $1\,\square C_g$ at its input.

The delay associated with the NMOS NAND gate is

$$\tau_{NAND} = n\,\tau_{inv}$$

where n -> number of inputs, $\tau_{inv}$ -> NMOS inverter delay

Two-Input NMOS, CMOS and BiCMOS NOR gates:-







b) Logic symbol


The area occupied by the NMOS NOR gate is reasonable since the pull-up transistor dimensions are unaffected by the no. of Inputs.

The CMOS NOR gate consists of a p-transistor-based pull-up structure, which implements the logic 1 conditions, and an n-transistor arrangement, which implements the logic 0 conditions at the output.

The BiCMOS NOR gate is most useful where a large fan-out is required.

Other forms of CMOS logic: mainly consists of

- 1. Pseudo-NMOS logic
- 2. Dynamic CMOS logic
- 3. Clocked CMOS (C<sup>2</sup>MOS) logic
- 4. CMOS domino logic
- 5. n-p CMOS logic

#### 1. Pseudo-Nmos Logic:

If we replace the depletion-mode pull-up transistor of the standard NMOS circuits with a p-transistor whose gate is connected to $V_{SS}$, we obtain pseudo-NMOS logic.

Fig: - Pseudo-NMOS logic

To determine the required ratio, we consider the arrangement shown in the figure, in which a pseudo-NMOS inverter is driven by another similar inverter.

For the analysis, we consider  $V_{inv} = \frac{V_{DD}}{2}$

At this point the n-device is in saturation ($0 < V_{gsn} - V_{tn} < V_{dsn}$)

and the p-device is operating in the resistive region ($0 < V_{dsp} < V_{gsp} - V_{tp}$)



Fig: - Pseudo-NMOS inverter when driven from a similar inverter

Equating the currents of the n-transistor and the p-transistor,

$$V_{inv} = V_{tn} + \frac{(2\mu_p/\mu_n)^{1/2}\left[(-V_{DD} - V_{tp})V_{dsp} - V_{dsp}^2/2\right]^{1/2}}{(Z_{p.u}/Z_{p.d})^{1/2}}$$

where $Z_{p.u} = L_p/W_p$

$Z_{p.d} = L_n/W_n$

we obtain
$$\frac{Z_{p.u}}{Z_{p.d}} = \frac{3}{1}$$

The channel sheet resistance of the P-Pullup is about 2.5 times that of the n-Pull down, and allowing for the ratio 3:1,

## 2) Dynamic CMOS logic:

1. Charge sharing may be a problem unless the inputs are constrained not to change during the on period of the clock.

2. Single-phase dynamic logic structures cannot be cascaded since, owing to circuit delays, an incorrect input to the next stage may be present when evaluation begins, so the output is discharged and a wrong output results.





Fig: Type 3 arrangement

Fig: - Dynamic CMOS logic three-input NAND gate

The p-transistor is used for the non-time-critical precharging of the output line Z, so that the output capacitance is charged to $V_{DD}$ during the off period of the clock signal $\phi$.

To avoid erroneous evaluations, the gates must be connected in allowable sequences, as shown in the table.

| Gate type | Evaluate clock | Transmission gate clock | Allowable next types |
|-----------|----------------|-------------------------|----------------------|
| Type 1    | $\phi_{34}$    | $\phi_{41}$             | Types 2 (or) 3       |
| Type 2    | $\phi_{41}$    | $\phi_{12}$             | Types 3 (or) 4       |
| Type 3    | $\phi_{12}$    | $\phi_{23}$             | Types 4 (or) 1       |
| Type 4    | $\phi_{23}$    | $\phi_{34}$             | Types 1 (or) 2       |
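The sequencing rule in the table can be expressed as a small lookup. This is only a sketch of the "allowable next types" column; the names `ALLOWED_NEXT` and `is_valid_sequence` are hypothetical.

```python
# Allowable next gate types for each gate type, taken from the table above.
ALLOWED_NEXT = {
    1: {2, 3},
    2: {3, 4},
    3: {4, 1},
    4: {1, 2},
}


def is_valid_sequence(gate_types):
    """Return True if every gate is followed by an allowable next type."""
    return all(nxt in ALLOWED_NEXT[cur] for cur, nxt in zip(gate_types, gate_types[1:]))


if __name__ == "__main__":
    print(is_valid_sequence([1, 2, 3, 4, 1]))  # each step moves one type forward
    print(is_valid_sequence([1, 4]))           # type 4 may not follow type 1
```

A gate may feed either of the next two types in the cycle, but never its own type or the type immediately before it.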

### 4. Chip Input and output circuits

#### Introduction :-

- The I/O circuits and the clock generation and distribution circuits are essential to VLSI chip design.
- The design quality of these circuits is a critical factor that determines the reliability, signal integrity and inter-chip communication speed of the chip.
- If the package is considered the primary protection layer of the silicon chip, the I/O frame and clock circuits are secondary protection layers.

should be filtered out before Propagating to Internal circuits for their protection.

### ESD Protection:

Electrostatic discharge (ESD) is one of the most prevalent causes of chip failure, both in chip manufacturing and in field operation.

ESD occurs when the charge stored in machines (or) humans is discharged to the chip on contact (or) by static induction.





a) Human Body model (HBM) b) machine model (MM)



c) Charged-Device Model (CDM)

1. HBM :-

- -> A human walking across a synthetic carpet in 80% relative humidity can potentially induce 1.5 kV of static voltage stress.
- -> In the HBM, the touch of a charged person's finger is simulated by discharging a 100 pF capacitor through a 1.5 kΩ resistor.
- -> A protection network must be inserted into the I/O circuits of the chip so that the ESD effect can be filtered out before it is propagated to the internal logic circuits.
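The size of an HBM stress event can be estimated with a simple RC discharge. This sketch ignores the series inductance and stray capacitances of the lumped model and treats the chip pin as a short to ground; those simplifications and the function names are assumptions.

```python
import math


def hbm_peak_current_a(v_stress, r_body=1.5e3):
    """Initial (peak) discharge current of the body capacitor through R."""
    return v_stress / r_body


def hbm_time_constant_s(r_body=1.5e3, c_body=100e-12):
    """RC time constant of the discharge."""
    return r_body * c_body


def hbm_current_a(t, v_stress, r_body=1.5e3, c_body=100e-12):
    """Discharge current at time t for an ideal RC discharge."""
    return (v_stress / r_body) * math.exp(-t / (r_body * c_body))


if __name__ == "__main__":
    print(f"peak current  = {hbm_peak_current_a(1.5e3):.2f} A")
    print(f"time constant = {hbm_time_constant_s() * 1e9:.0f} ns")
```

A 1.5 kV stress therefore pushes a current of about 1 A into the pin, decaying with a time constant of about 150 ns, far above the tens of milliamperes a protection diode is expected to carry.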

## Protection Network

The protection network consists of a diffused resistor-diode structure, whose equivalent circuit is



The Input resistance is normally 2000 to 3km

The diodes clamp the signal level within a certain voltage range in order to minimize ESD stress.

To Protect the diode structure, the current through the diode is limited to less than several tens of milliamperes. For this purpose, polysilicon series resistors are used but they are damaged due to dielectric break down at high electric fields.

# 2. mm (machine model):

In this model the body resistance is absent, so contact with machines can cause higher stress.

## 3. Charged device model (CDM):

Electrostatic charge builds up on a chip due to improper grounding and then discharges when a low-resistance path becomes available.

-> Simplified HBM ESD and MM ESD lumped circuit models are shown below.



| Component  | HBM  | MM  |
|------------|------|-----|
| $C_c$ (pF) | 100  | 200 |
| $R_s$ (Ω)  | 1500 | 25  |
| $L_s$ (µH) | 5    | 2.5 |
| $C_s$ (pF) | 1    |     |
| $C_t$ (pF) | 10   | 10  |

This is overcome by the use of additional thick-oxide transistors, as shown in the figure.

In this circuit,

M1 -> thick-oxide punch-through device

M2 -> thick-oxide NMOS transistor

M3 -> thick-oxide NMOS transistor in saturation mode

For positive inputs, M1 and M2 have threshold values of 20 to 30 V



Fig:- Protection Network with thick oxide transistor.

## ESD failure modes:-

The figure below shows ESD failure modes caused by ESD-induced heat dissipation in an NMOS transistor, as observed with scanning electron microscopy (SEM).



Typical ESD failure modes.

A simple input circuit consisting of a transmission gate and an enable signal (E) is shown in the fig.



TG -> transmission gate

PN -> protection network

A -> external (off-chip) input signal

E -> internal (on-chip) enable signal

X -> internal (on-chip) output signal

### conditions

- 1. when E=0, X=A
- 2. when E=1, X = High-Impedance State

The incoming signal A is fed to the transmission gate through the protection network. The input pad circuit modules have a built-in internal pull-up (or) pull-down resistor with a resistance of 200 kΩ to 1 MΩ.

For example,

an inverting input circuit with TTL-to-CMOS level shift is shown in the fig.





Fig: - Inverting Input circuit with Protection N/w





Fig:- Noise margin levels

Fig:- Voltage transfer characteristics

For TTL, the noise margin levels are $V_{OL} = 0.8$ V, $V_{OH} = 2$ V. For CMOS, the noise margin levels are $V_{IL} = 0.3V_{DD}$, $V_{IH} = 0.8V_{DD}$

$$NM_L = V_{IL} - V_{OL}$$

$$NM_H = V_{OH} - V_{IH}$$

From the voltage transfer characteristic curve, voltages below 0.8 V should be interpreted as low and voltages above 2 V as high.

After the PN circuit, the input levels are shifted to the desired level, depending on their voltage levels.

The variations in the level-shift voltage transfer characteristics due to process variations are shown in the fig.

PM-NM -> nominal processing

PH-NL -> strong PMOS, weak NMOS process corner

PL-NH -> weak PMOS, strong NMOS process corner

\* strong NMOS, PMOS -> low $V_{T0n}$, low $|V_{T0p}|$; high $k_n$, high $k_p$

\* weak NMOS, PMOS -> high $V_{T0n}$, high $|V_{T0p}|$; low $k_n$, low $k_p$

## Tristate output circuit as shown in fig. below



If clk=1, Z=D

If clk=0, Z=High Impedance

| CLK | D | P | N | Z      |
|-----|---|---|---|--------|
| 1   | 1 | 0 | 0 | 1 = D  |
| 1   | 0 | 1 | 1 | 0 = D  |
| 0   | × | 1 | 0 | High Z |
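The behaviour of the tristate output can be sketched as a small function. It assumes that P and N are the gate signals of the pull-up PMOS and pull-down NMOS output transistors (P = 0 turns the PMOS on, N = 1 turns the NMOS on); this interpretation and the function name are assumptions.

```python
def tristate_output(clk, d):
    """Return (P, N, Z) for the tristate output stage.

    P, N: gate signals of the pull-up PMOS and pull-down NMOS (assumed).
    Z: 1, 0, or the string "Z" for the high-impedance state.
    """
    if clk == 1:
        p = n = 1 - d   # both gates driven with the complement of D
    else:
        p, n = 1, 0     # both output transistors turned off
    if p == 0 and n == 0:
        z = 1           # PMOS on, NMOS off: output pulled high
    elif p == 1 and n == 1:
        z = 0           # PMOS off, NMOS on: output pulled low
    else:
        z = "Z"         # both off: high impedance
    return p, n, z


if __name__ == "__main__":
    for clk in (1, 0):
        for d in (1, 0):
            print(clk, d, tristate_output(clk, d))
```

The printed rows reproduce the table: the pad follows D while CLK = 1 and floats when CLK = 0, regardless of D.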

Driving a large load requires sufficient current sink (or) source capability. The resulting rapid change in the current, di/dt, can cause significant on-chip noise problems due to the $L\left(\frac{di}{dt}\right)$ drop across the bonding wire connecting the pad to the package.





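Fig:- o/p circuit current waveform during switching

Assuming a triangular current waveform with peak $I_{max}$ over the switching time $t_s$, the charge delivered to the load is

$$C_{load}V_{DD} = \frac{I_{max}\, t_s}{2} \quad (1)$$

$$\left(\frac{di}{dt}\right)_{max} \geq \frac{I_{max}}{t_s/2} = \frac{2I_{max}}{t_s} \quad (2)$$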

substitute ev 1 in ev 2

For example,

An output circuit that reduces $L\frac{di}{dt}$ noise is shown in the fig.



when ST=1, CLK=0 > Inverter of z to VDD

on-chip clock generation and distribution:-

clock signals are the heart beats of digital systems. Hence, the stability of clock signals is highly Important.

Ideally, clock signals should have minimum rise and fall times, specified duty cycles and zero skew.

In reality clock signals have nonzero skews and rise time & fall times; duty cycles can also vary.

A simple technique for on-chip generation of a Primary clock signal would be use a ring oscillator as shown in fig.



Fig:- A simple on-chip clock generation circuit

The generated clock signal can be quite process dependent and unstable.

For high-performance chips, separate clock chips that use crystal oscillators have been used. The fig. below shows the circuit schematic of a Pierce crystal oscillator with good frequency stability.

Fig:- Pierce crystal oscillator circuit

In the above fig, the crystal can be represented as a series RLC circuit; the higher the series resistance, the lower the oscillation frequency.

The Inverter across the crystal provides necessary voltage differential and External Inverter provides Amplification to drive clock Loads.

A simple circuit that generates CLK1 and CLK2 from the original clock signal is shown in the fig.

The clock decoder circuit that takes in the primary clock signals and generates 4 phase signals is shown in the fig. below.

Since clock signals are required almost uniformly over the chip area, it is desirable that all clock signals are distributed with a uniform delay.

An ideal distribution Network would be H-tree Structure as shown in fig. below.



Fig: - General layout of an H-tree clock distribution network

In this structure, the distances from the centre to all branch points are the same, and hence the signal delays would be the same. This structure is difficult to implement in practice due to routing constraints and differing fanout requirements.

Regardless of the exact geometry of the clock distribution network, the clock signals must be buffered in multiple stages, as shown in the fig., to handle high fanout loads.

Fig: - Three-level buffered clock distribution network

# Design for testability

Fault: It is defined as the representation of a defect reflecting a physical condition that causes a circuit to fail to perform in a required manner.

Failure: A failure is a deviation in the performance of a circuit (or) system from its specified behaviour and represents irreversible state of component such that it must be repaired in order for it to provide its Intended design function.

# Fault types and models:-

A fault is a physical defect of one (or) more components (or) connections of the circuit.



permanent faults: - These are caused by breaking (or) wearing out of a component.

Such faults are always present and do not appear, disappear (or) change their nature during operation.

Temporary faults: These faults occur during certain intervals of time. They can be either transient (or) intermittent.

- i) Transient fault: These faults are caused by some externally induced signal perturbation, such as power supply fluctuations.
- ii) Intermittent fault: These faults occur when a component is in the process of developing a permanent fault.

# Depending on effect

Faults can be classified as i) logical faults ii) parametric faults

## Logical fault:-

Logical fault changes the Boolean function realized by the digital circuits.

### Parametric faults:

These faults alters the magnitude of a circuit
Parameter, causing a change in a factor such as circuit speed,
Voltage (or) current.

# Logical faults: - It contains 3 types

- i) Stuck-at faults
- ii) Bridging faults
- iii) Cross point faults

# i) Stuck-at faults:-

These faults occur if a signal line appears to have its value fixed at either logical 0 (or) logical 1, irrespective of the input signals applied to the circuit.

When the signal line is always at logical 1, the fault is known as stuck-at-one (SA1); when the signal line is always at logical 0, the fault is known as stuck-at-zero (SA0).
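The effect of a stuck-at fault can be checked by simulation. The following is a minimal sketch in Python, assuming a single 2-input NAND gate with lines named `a`, `b` and `y` (the gate and the names are chosen only for illustration). It forces one line to a fixed value and lists the input vectors for which the faulty output differs from the fault-free output; such vectors are the tests that detect the fault.

```python
from itertools import product


def nand_gate(a, b, fault=None):
    """2-input NAND gate. fault is None or a pair (line, stuck_value), line in {'a', 'b', 'y'}."""
    lines = {"a": a, "b": b}
    if fault is not None and fault[0] in lines:
        lines[fault[0]] = fault[1]          # input line forced to the stuck value
    y = 1 - (lines["a"] & lines["b"])
    if fault is not None and fault[0] == "y":
        y = fault[1]                        # output line forced to the stuck value
    return y


def detecting_vectors(fault):
    """Input vectors (a, b) whose faulty output differs from the fault-free output."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if nand_gate(a, b) != nand_gate(a, b, fault)]


if __name__ == "__main__":
    for line in ("a", "b", "y"):
        for value in (0, 1):
            print(f"{line} SA{value}:", detecting_vectors((line, value)))
```

A fault with an empty list would be undetectable at the output; for this small gate every stuck-at fault has at least one detecting vector.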

# ii) Bridging faults:-

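These faults occur if two signal lines are shorted together. The fault may be either an AND-type (or) an OR-type bridging fault: in the AND type both lines take the AND of the two driven values, and in the OR type both lines take the OR.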
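The two bridging types can be compared by simulation. The following is a minimal sketch in Python, assuming an illustrative circuit y = a OR (b AND c) in which the input lines `a` and `b` are shorted; the circuit and the function names are assumptions made for this example only.

```python
from itertools import product


def bridged(x, y, kind):
    """Value seen on both shorted lines for an AND-type or OR-type bridge."""
    if kind == "AND":
        return x & y
    if kind == "OR":
        return x | y
    raise ValueError("kind must be 'AND' or 'OR'")


def circuit(a, b, c, bridge=None):
    """Illustrative circuit y = a OR (b AND c); bridge shorts lines a and b."""
    if bridge is not None:
        a = b = bridged(a, b, bridge)   # both lines now carry the same value
    return a | (b & c)


def detecting_vectors(bridge):
    """Input vectors (a, b, c) on which the bridged circuit differs from the good one."""
    return [v for v in product((0, 1), repeat=3)
            if circuit(*v) != circuit(*v, bridge=bridge)]


if __name__ == "__main__":
    print("AND-type:", detecting_vectors("AND"))
    print("OR-type: ", detecting_vectors("OR"))
```

Only vectors with a ≠ b can detect the bridge, because the short changes nothing when both lines already carry the same value.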

# iii) cross point faults:-

Faults that occur in programmable logic arrays due to extra (or) missing devices (such as diodes (or) transistors) are called cross point faults.

Based on their effect, these are classified into Growth (G) faults, Shrinkage (S) faults, Appearance (A) faults and Disappearance (D) faults.
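The four classes can be illustrated on a small PLA model. The following is a minimal sketch in Python, assuming the usual reading of the classes: a missing device in the AND array drops a literal so the product term grows (G), an extra device in the AND array adds a literal so the term shrinks (S), an extra device in the OR array makes a product term appear in an output (A), and a missing device in the OR array makes a term disappear from the output (D). The PLA contents and all names are hypothetical.

```python
from itertools import product

VARIABLES = ("a", "b", "c")


def pla_output(and_array, or_array, inputs):
    """and_array: list of product terms {variable: required value}.
    or_array: set of indices of the product terms connected to the output."""
    return int(any(
        all(inputs[var] == val for var, val in and_array[i].items())
        for i in or_array
    ))


def on_set(and_array, or_array):
    """All input combinations (a, b, c) for which the output is 1."""
    return {
        bits for bits in product((0, 1), repeat=len(VARIABLES))
        if pla_output(and_array, or_array, dict(zip(VARIABLES, bits)))
    }


def with_term(and_array, index, term):
    """Copy of the AND array with one product term replaced."""
    copy = [dict(t) for t in and_array]
    copy[index] = term
    return copy


# Hypothetical fault-free PLA: output = a.b + b'.c ; term 2 (a'.b.c') is unused.
GOOD_AND = [{"a": 1, "b": 1}, {"b": 0, "c": 1}, {"a": 0, "b": 1, "c": 0}]
GOOD_OR = {0, 1}

good = on_set(GOOD_AND, GOOD_OR)
growth = on_set(with_term(GOOD_AND, 0, {"a": 1}), GOOD_OR)                      # a.b -> a
shrinkage = on_set(with_term(GOOD_AND, 0, {"a": 1, "b": 1, "c": 1}), GOOD_OR)   # a.b -> a.b.c
appearance = on_set(GOOD_AND, GOOD_OR | {2})                                     # term 2 connected
disappearance = on_set(GOOD_AND, GOOD_OR - {1})                                  # term 1 disconnected

if __name__ == "__main__":
    print("extra minterms, growth:       ", sorted(growth - good))
    print("lost minterms, shrinkage:     ", sorted(good - shrinkage))
    print("extra minterms, appearance:   ", sorted(appearance - good))
    print("lost minterms, disappearance: ", sorted(good - disappearance))
```

Growth and appearance faults add minterms to the function, while shrinkage and disappearance faults remove them.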

If a circuit has only one fault at a given time, it is said to have a "single fault". If there are 2 (or) more faults in the circuit, the circuit is said to have "multiple faults".

#### Fault models :-

A model of how faults occur and of their impact on circuits is called a fault model.

There are 2 fault models:

- i) stuck-at faults
- ii) stuck-open (or) stuck-short faults

# i) Stuck-at faults: stuck-at zero (SA0), stuck-at one (SA1)

- → The stuck-at fault model corresponds to real faults, but it does not represent all possible faults.
- → The fault models can be deduced by using basic circuits such as AND, OR, inverter and tri-state buffer.



Fig: - Stuck-At-0 (SA0)

Fig: - Stuck-At-1 (SA1)

→ Stuck-at faults mostly occur due to shorting of the gate oxide (NMOS gate to GND (or) PMOS gate to VDD) (or) metal-to-metal shorts.

These faults are usually referred to as "transistor faults". Physical faults which occur at the manufacturing level are called defects.

The electrical (or) logic-level faults caused by physical defects are referred to as defect-oriented faults, such as open links, improper semiconductor doping, bridging faults, etc.


Ex:- Consider CMOS NOR logic.

Q1, Q2 → PMOS transistors

Q3, Q4 → NMOS transistors

Fig:- CMOS NOR logic

When inputs A and B are 0, Q1 and Q2 are shorted (conducting) and Q3 and Q4 are open. Therefore, when A=B=0, output C is connected to VDD, i.e. 1.

If A=B=1, output C is connected to ground, i.e. 0.

→ Suppose Q1 has a stuck-open fault. If A=B=0, then Q1 and Q2 are shorted in the fault-free circuit, but only Q2 is shorted in the faulty circuit; Q3 and Q4 are open in both. Hence output C is 1 in the good circuit but is floating (neither VDD nor ground) in the faulty circuit.

→ The good and faulty states of output C are denoted by Z and z.

→ The output node C has a parasitic capacitance, so a floating node holds its previous value. To detect the fault, it must be ensured that the value of z is 0. This is done by applying the initializing vector A=1, B=0 before A=B=0. It sets node C to 0 in the faulty circuit by discharging the node capacitance to ground potential.

→ To complete the test, the input is then changed from 10 to 00. This produces an output transition 0→1 in the good circuit and 0→0 in the faulty circuit.
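The two-pattern test can be reproduced with a simple switch-level model. The following is a minimal sketch in Python, assuming a 2-input CMOS NOR gate with two series PMOS transistors in the pull-up path and two parallel NMOS transistors in the pull-down path, and assuming that a floating output node simply keeps its previous value. The parameter `pmos_stuck_open` models a stuck-open fault in one of the series PMOS transistors; all names are illustrative.

```python
def nor_output(a, b, previous, pmos_stuck_open=False):
    """Output of a CMOS NOR gate, with charge retention when the node floats."""
    # The series PMOS path conducts only for A=B=0 and only if no PMOS is stuck open.
    pull_up = (a == 0 and b == 0) and not pmos_stuck_open
    # The parallel NMOS path conducts if either input is 1.
    pull_down = (a == 1 or b == 1)
    if pull_up:
        return 1
    if pull_down:
        return 0
    return previous          # floating node: parasitic capacitance holds the old value


def apply_sequence(vectors, pmos_stuck_open=False, initial=1):
    """Apply input vectors (A, B) in order and return the output after each one."""
    out, trace = initial, []
    for a, b in vectors:
        out = nor_output(a, b, out, pmos_stuck_open)
        trace.append(out)
    return trace


if __name__ == "__main__":
    test = [(1, 0), (0, 0)]                       # initializing vector, then test vector
    print("good  :", apply_sequence(test))
    print("faulty:", apply_sequence(test, pmos_stuck_open=True))
    # Without initialization the fault can be masked by charge left on the node:
    print("masked:", apply_sequence([(0, 0)], pmos_stuck_open=True, initial=1))
```

The last line shows why the initializing vector is needed: if the node already holds a 1, the single vector 00 gives the same output in the good and the faulty circuit.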

# Example of Stuck at fault:-



It consists of 9 possible nodes (a, b, c, d, e, f, g, h, i). Any of these nodes can be shorted to the power supply (stuck at 1) (or) shorted to ground (stuck at 0). Hence the 9 nodes contribute 18 potential fault sites.
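The count of fault sites, and the output of each faulty circuit, can be generated by enumeration. The following is a minimal sketch in Python. The topology is an assumption made only for illustration: Y = x1·x2 + x2'·x3, with nodes a, b, c as the inputs, d and e as the two branches of b, f as the inverter output, g and h as the AND outputs, and i as the output Y.

```python
from itertools import product

NODES = "abcdefghi"
FAULTS = [(node, value) for node in NODES for value in (0, 1)]   # 9 nodes x 2 values


def simulate(x1, x2, x3, fault=None):
    """Evaluate the assumed circuit; fault is None or (node, stuck_value)."""
    def line(name, value):
        # A faulty node ignores its driver and keeps the stuck value.
        return fault[1] if fault is not None and fault[0] == name else value

    a = line("a", x1)
    b = line("b", x2)        # stem of x2
    c = line("c", x3)
    d = line("d", b)         # branch of b feeding the first AND gate
    e = line("e", b)         # branch of b feeding the inverter
    f = line("f", 1 - e)
    g = line("g", a & d)
    h = line("h", f & c)
    return line("i", g | h)


def truth_row(fault=None):
    """Output for inputs 000, 001, ..., 111 (order x1 x2 x3)."""
    return [simulate(*bits, fault=fault) for bits in product((0, 1), repeat=3)]


if __name__ == "__main__":
    print("fault sites:", len(FAULTS))
    print("fault free :", truth_row())
    for node, value in FAULTS:
        print(f"{node} SA{value}    :", truth_row((node, value)))
```

A test vector detects a given fault when the entry in the faulty row differs from the entry in the fault-free row for that input.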

The truth table for the fault-free circuit (Y) and the faulty circuits is given by:


|                   | x1 x2 x3 | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
|-------------------|----------|-----|-----|-----|-----|-----|-----|-----|-----|
| Fault-free output | Y        | 0   | 1   | 0   | 0   | 0   | 1   | 1   | 1   |
| Faulty outputs    | a SA0    | 0   | 1   | 0   | 0   | 0   | 1   | 0   | 0   |
|                   | a SA1    | 0   | 1   | 1   | 1   | 0   | 1   | 1   | 1   |
|                   | b SA0    | 0   | 1   | 0   | 1   | 0   | 1   | 0   | 1   |
|                   | b SA1    | 0   | 0   | 0   | 0   | 1   | 1   | 1   | 1   |
|                   | c SA0    | 0   | 0   | 0   | 0   | 0   | 0   | 1   | 1   |
|                   | c SA1    | 1   | 1   | 0   | 0   | 1   | 1   | 1   | 1   |
|                   | d SA0    | 0   | 1   | 0   | 0   | 0   | 1   | 0   | 0   |
|                   | d SA1    | 0   | 1   | 0   | 0   | 1   | 1   | 1   | 1   |
|                   | e SA0    | 0   | 1   | 0   | 1   | 0   | 1   | 1   | 1   |
| S                    | esal                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                        | . 0           | , o         | 0   | 0                | 0                                     | 0          | 1                                     | l L   |
|                      | f SAO                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                                        | 0             | 0           | 0   | 0                | 0                                     | 0          | 1                                     | L.    |
|                      | f SAI                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                                        | 0             | ı           | 0   | 1                | 0                                     | 1          | 1                                     | 1     |
|                      | gsAo                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                        | 0             | in corn     | 0   | 0                | 0                                     | The second | О                                     | 0     |
| 0 1                  | 9sal .                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                        | reg bio it,   | The Address | 1   | l.               | 1                                     | T.         | j.                                    | 1     |
|                      | hsAo                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                        | 0             | 0           | 0   | 0                | 0                                     | 0          | ELIO -                                | 1     |
|                      | hsal                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                        |               | 1           | I I | 7<br>2<br>2<br>2 | 1                                     | 1          | F SHI                                 | 1     |
|                      | isAo                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                        | 1-1,0 231101  | 0110        | 0   | 0                | 0                                     | 0          | 0                                     | 0     |
|                      | isal                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                        | MILL OUT      | id Sylve    | L.  | 1                | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 |            | 1                                     | 1     |
|                      | Charles and the same of the sa | Maria Welling | MIT III     |     |                  |                                       | 376        | Ar arda 3                             | 1     |


The task of determining whether a fault is present or not is called "fault detection", and the task of isolating the fault is called "fault location".

Fault diagnosis:- The combined task of fault detection and fault location is referred to as fault diagnosis.

Test vector (TV):- An input combination that, in the presence of a fault, produces an output different from the fault-free output is known as a test vector.

Test set: The set of test vectors used for testing the circuit is called the test set.

Fault simulation: It is the process of verifying a test set, i.e. determining which faults the test set detects.

Fault coverage: It refers to the percentage of faults that can be detected by the test set.
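
To make the definitions of test vector, fault simulation and fault coverage concrete, here is a minimal Python sketch of a single stuck-at fault simulator. The circuit `y = (a AND b) OR c`, the node names and the helper names (`simulate`, `detected_faults`, `fault_coverage`) are assumptions chosen for illustration; this is not the circuit analysed in these notes.

```python
from itertools import product

# Hypothetical circuit: d = a AND b, y = d OR c (5 nodes, 10 stuck-at faults).
NODES = ["a", "b", "c", "d", "y"]


def simulate(a, b, c, fault=None):
    """Return output y; fault is None or a (node, stuck_value) pair."""
    def force(name, value):
        # A stuck-at fault overrides whatever value the node would carry.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a = force("a", a)
    b = force("b", b)
    c = force("c", c)
    d = force("d", a & b)
    return force("y", d | c)


def detected_faults(test_set):
    """Fault simulation: find the faults for which some vector is a test vector."""
    faults = [(node, value) for node in NODES for value in (0, 1)]
    found = set()
    for vector in test_set:
        good = simulate(*vector)
        for fault in faults:
            if simulate(*vector, fault=fault) != good:
                found.add(fault)
    return found, faults


def fault_coverage(test_set):
    """Percentage of all single stuck-at faults detected by the test set."""
    found, faults = detected_faults(test_set)
    return 100.0 * len(found) / len(faults)


print(fault_coverage([(1, 1, 0)]))                         # 40.0
print(fault_coverage(list(product((0, 1), repeat=3))))     # 100.0
```

In this sketch the single vector 110 detects only the four stuck-at-0 faults on a, b, d and y, while the exhaustive set detects all ten faults.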

From the truth table, consider node a stuck at 0, i.e. connected to ground.

For the input 110, the circuit with node a stuck at 0 produces output 0, but the fault-free circuit has output 1. Hence the input 110 is a test vector for this fault.

The faults d SA1, e SA0 and f SA1 are each detected by only one test vector; all other faults are detected by 2 (or) more test vectors. Test vectors 011 and 100 must therefore be included in any set of test vectors that obtains 100% fault coverage.

These 2 test vectors detect a total of 10 faults; the remaining 8 faults can be detected with test vectors 001 and 110. Therefore, this single set of 4 test vectors obtains 100% stuck-at fault coverage for this circuit.
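
The reasoning above (first keep the vectors that are the only way to detect some fault, then add vectors for the remaining faults) can be sketched in Python as follows. The detection table, the fault names `f1` to `f7` and the function name `select_test_set` are made up for illustration and are not the truth table of these notes; the second phase is a simple greedy heuristic and is not guaranteed to give the smallest possible test set.

```python
def select_test_set(detects):
    """detects maps each test vector to the set of faults it detects."""
    all_faults = set().union(*detects.values())

    # Essential vectors: the only vector that detects at least one fault.
    essential = set()
    for fault in all_faults:
        owners = [vec for vec, faults in detects.items() if fault in faults]
        if len(owners) == 1:
            essential.add(owners[0])

    chosen = sorted(essential)
    covered = set().union(*(detects[vec] for vec in chosen))

    # Greedy phase: repeatedly add the vector that detects the most
    # still-undetected faults until every fault is covered.
    while covered != all_faults:
        best = max(sorted(detects), key=lambda vec: len(detects[vec] - covered))
        chosen.append(best)
        covered |= detects[best]
    return chosen, sorted(essential)


# Hypothetical fault-detection table (vector -> faults it detects).
table = {
    "001": {"f1", "f2"},
    "011": {"f3", "f4", "f5"},
    "100": {"f6", "f7"},
    "110": {"f1", "f2", "f4"},
    "111": {"f2", "f7"},
}

chosen, essential = select_test_set(table)
print(essential)  # ['011', '100']
print(chosen)     # ['011', '100', '001']
```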

# Controllability and observability (design strategies for testing):-

Controllability: It is defined as the capability of a node being driven to 1 (or) 0 through the circuit's inputs. If the node can be driven faithfully to 1 and 0, it is regarded as controllable.

- Controllability is important when assessing the degree of difficulty of testing a particular signal in a circuit.
- A node with little controllability may take hundreds of cycles to get into the right state.

Observability:- It is defined as the capability of the logic state of a node being observed at the circuit's outputs.

- Observability is the degree to which it can be observed at the output that a node operates correctly.
- Higher observability indicates that fewer cycles are required to measure the node value at the output.
- Poor observability is typical of sequential circuits with long feedback loops.

Testability is based on the two concepts of controllability and observability.

Some nodes in sequential circuits can be very difficult to control and observe. For example, the most significant bit of an n-bit counter can only be observed after 2<sup>n-1</sup> clock cycles.
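
The counter example can be checked with a short Python sketch. It assumes a simple binary up-counter that starts from reset (all zeros) and counts how many clock cycles pass before the most significant bit first becomes 1; the function name `cycles_until_msb_set` is illustrative.

```python
def cycles_until_msb_set(n_bits):
    """Clock an n-bit up-counter from reset until its MSB first becomes 1."""
    count = 0
    cycles = 0
    msb_mask = 1 << (n_bits - 1)
    while not (count & msb_mask):
        count = (count + 1) % (1 << n_bits)  # one clock cycle
        cycles += 1
    return cycles


for n in (4, 8, 16):
    print(n, cycles_until_msb_set(n))  # 4 8, then 8 128, then 16 32768
```

The cycle count doubles with every extra bit, which is why such nodes are considered hard to control and observe.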

### Design for testability (DFT) techniques are of 3 types

- 1. Ad-hoc testing
- 2. Scan-based testing (or) level-sensitive scan design (LSSD)
- 3. Built-in self-test (BIST)

### 1. Ad-hoc DFT techniques:

Ad-hoc testing combines a collection of tricks and techniques that can be used to increase the observability and controllability of a design and that are generally applied in an application-dependent fashion.



Fig (a): Design with Low testability



Fig (b): Adding a multiplexer (selector)

Fig (a) represents a simple processor with its data memory. Under normal configuration, the memory is only accessible through the processor.

Writing and reading a data value into and out of a single memory position requires a number of clock cycles.

The controllability and observability of the memory can be improved by adding multiplexers on the data and address buses, as shown in fig (b).

During normal mode, these selectors direct the memory ports to the processor. During test mode, the data and address ports are connected directly to the I/O pins, and testing the memory can proceed more efficiently.

To reduce the number of extra test pins, one can multiplex test signals and functional signals on the same pads. For example, the I/O bus in fig (b) serves as a data bus during normal operation and provides and collects the test patterns during testing.

An extensive collection of ad-hoc testing approaches has been devised. Examples include the partitioning of large state machines, the addition of extra test points, the provision of reset states, and the introduction of test buses.

Most of these techniques depend on the application and architecture.
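
The multiplexer idea can be modelled behaviourally in a few lines of Python. This is only a sketch: the `Memory` class, the `mux` and `memory_cycle` helpers and the port layout `(address, write_data, write_enable)` are assumptions made for illustration, not a description of any particular design.

```python
class Memory:
    """Tiny word-addressable memory model."""

    def __init__(self, size):
        self.cells = [0] * size


def mux(select_test, functional, test):
    """2-to-1 selector: pass the test input in test mode, else the functional one."""
    return test if select_test else functional


def memory_cycle(mem, test_mode, cpu_port, io_port):
    """One memory access; each port is (address, write_data, write_enable)."""
    addr, data, write = mux(test_mode, cpu_port, io_port)
    if write:
        mem.cells[addr] = data
    return mem.cells[addr]


mem = Memory(8)
idle = (0, 0, False)

# Test mode: the I/O pins drive the memory directly.
memory_cycle(mem, True, cpu_port=idle, io_port=(5, 0xA, True))
print(memory_cycle(mem, True, cpu_port=idle, io_port=(5, 0, False)))  # 10

# Normal mode: only the processor port reaches the memory.
memory_cycle(mem, False, cpu_port=(2, 7, True), io_port=(5, 0, True))
print(mem.cells[2], mem.cells[5])  # 7 10
```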

### 2. Boundary scan testing:

Boundary scan, as defined by IEEE Std 1149.1, is an integrated method for testing interconnects on printed circuit boards (PCBs) that is implemented at the integrated circuit (IC) level.

### Architecture:

The boundary-scan test architecture provides a means to test interconnects between integrated circuits on a board without using physical test probes.

Cells on device primary inputs are referred to as "input cells"; cells on primary outputs are referred to as "output cells". Here "input" and "output" are relative to the internal logic of the device.



The collection of boundary-scan cells is configured into a parallel-in, parallel-out shift register.

A parallel load operation, called a capture operation, causes the signal values on device input pins to be loaded into input cells and the signal values passing from the internal logic to device output pins to be loaded into output cells.

A parallel unload operation, called an update operation, causes signal values already present in the output scan cells to be passed out through the device output pins.

Signal values already present in the input scan cells will be passed into the internal logic. Data can also be shifted around the shift register, starting from a dedicated device input pin called Test Data In (TDI) and terminating at a dedicated device output pin called Test Data Out (TDO).

The test clock, TCK, is fed in via yet another dedicated device input pin, and the various modes of operation are controlled by a dedicated Test Mode Select (TMS) serial control signal.

The boundary-scan elements contribute nothing to the functionality of the internal logic. The boundary-scan path is independent of the function of the device. The value of the scan path is at the board level, as shown in the fig. below.


Fig:- Boundary scan test (Board level)

The above figure shows a board containing boundary-scan devices. There is an edge-connector input called TDI connected to the TDI of the 1st device.

TDO from the 1st device is connected to the TDI of the 2nd device, and so on, creating a global scan path terminating at the edge-connector output called TDO. TCK is connected in parallel to each device's TCK input.

TMS is connected in parallel to each device's TMS input.

Captured data is serially shifted out and compared with the expected results. Forced test data is serially shifted into the boundary-scan cells. All of this is controlled from a serial data path called the "scan path" (or) "scan chain".

Each scan cell has 4 modes of operation:

- 1. Normal mode: Data\_In is passed straight through to Data\_Out.
- 2. Update mode: the content of the update hold cell is passed through to Data\_Out.
- 3. Capture mode: the Data\_In signal is routed to the input capture scan cell and the value is captured by the next ClockDR. ClockDR is derived from TCK.
- 4. Shift mode: Scan\_Out of one capture scan cell is passed to Scan\_In of the next capture scan cell via a hard-wired path.
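
The four modes can be illustrated with a small behavioural model in Python. This is a sketch under assumptions: the class `BoundaryScanCell`, its method names and the helper `shift_chain` are hypothetical, and the model ignores the TAP controller that generates the real control signals from TCK and TMS.

```python
class BoundaryScanCell:
    """Behavioural model of one boundary-scan cell."""

    def __init__(self):
        self.capture_ff = 0   # capture/shift stage, clocked by ClockDR
        self.update_hold = 0  # update hold stage

    def clock_dr(self, data_in, scan_in, shift):
        """One ClockDR pulse: capture Data_In, or shift in Scan_In.

        Returns the previous content, which is what Scan_Out presents
        to the next cell during this clock.
        """
        scan_out = self.capture_ff
        self.capture_ff = scan_in if shift else data_in
        return scan_out

    def update(self):
        """Copy the capture stage into the update hold cell."""
        self.update_hold = self.capture_ff

    def data_out(self, data_in, test_mode):
        """Normal mode passes Data_In through; test mode drives the held value."""
        return self.update_hold if test_mode else data_in


def shift_chain(cells, tdi_bits):
    """Shift bits in at TDI through all cells; return the bits seen at TDO."""
    tdo_bits = []
    for bit in tdi_bits:
        carry = bit
        for cell in cells:
            carry = cell.clock_dr(data_in=0, scan_in=carry, shift=True)
        tdo_bits.append(carry)
    return tdo_bits


cells = [BoundaryScanCell() for _ in range(3)]

# Capture mode: load the pin values 1, 1, 0 in parallel.
for cell, pin in zip(cells, [1, 1, 0]):
    cell.clock_dr(data_in=pin, scan_in=0, shift=False)

# Shift mode: captured values leave at TDO (last cell first) while
# new test data 1, 0, 0 enters at TDI.
print(shift_chain(cells, [1, 0, 0]))  # [0, 1, 1]

# Update mode: the shifted-in data is driven onto Data_Out.
for cell in cells:
    cell.update()
print([cell.data_out(data_in=0, test_mode=True) for cell in cells])  # [0, 0, 1]
```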

# Advantages of Boundary scan :-

- 1. Shorter test time
- 2. High test coverage
- 3. Increased Diagnostic capability
- 4. Lower capital equipment cost

# 3. Built-In Self test (BIST):-

BIST is the technique of designing additional hardware and software features into integrated circuits to allow them to perform self-testing, i.e. testing of their own operation using their own circuits, thereby reducing dependence on external automated test equipment (ATE).

BIST is a design-for-testability (DFT) technique, because it makes the electrical testing of a chip easier, faster, more efficient and less costly. This concept is applicable to any kind of circuit.

The general format of a built-in self-test is shown in the fig.



Fig:- General format of a BIST

It consists of a means of supplying test patterns to the device under test and a means of comparing the device's response to a known correct sequence.

Stimulus generator:- It is also called the test pattern generator.

There are many ways of generating stimuli. The most widely used are

- 1. Exhaustive approach
- 2. Random approach

### 1. Exhaustive approach:

In the exhaustive approach the test length is 2<sup>n</sup>, where n is the number of inputs. In this method all detectable faults will be detected.

Ex:- n-bit counter
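
A minimal Python sketch of an exhaustive stimulus generator, mimicking an n-bit counter whose bits drive the circuit inputs. The function name `exhaustive_patterns` is an assumption for illustration.

```python
def exhaustive_patterns(n_inputs):
    """Yield all 2**n_inputs input patterns in counting order, MSB first."""
    for value in range(2 ** n_inputs):
        yield tuple((value >> bit) & 1 for bit in reversed(range(n_inputs)))


patterns = list(exhaustive_patterns(3))
print(len(patterns))  # 8
print(patterns[6])    # (1, 1, 0)
```

Because the test length doubles with every additional input, this approach is only practical for circuits with a small number of inputs.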

### 2. Random approach:-

Random testing implies the application of a randomly chosen subset of the 2<sup>n</sup> possible input patterns. This subset should be selected so that a reasonable fault coverage is obtained.

Ex:- A pseudo-random pattern generator such as the linear feedback shift register (LFSR).
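
As an illustration, here is a minimal Python sketch of a 4-bit LFSR used as a pseudo-random pattern generator. The width, the seed and the feedback taps (the two most significant bits, corresponding to the polynomial x<sup>4</sup> + x<sup>3</sup> + 1) are assumptions chosen for the example; the function name `lfsr_patterns` is hypothetical.

```python
def lfsr_patterns(seed=0b0001, width=4, taps=(3, 2), count=15):
    """Return `count` successive states of a left-shifting Fibonacci LFSR.

    taps are the bit positions (0 = LSB) XORed together to form the
    feedback bit that is shifted into the LSB. The seed must be non-zero.
    """
    state = seed
    mask = (1 << width) - 1
    states = []
    for _ in range(count):
        states.append(state)
        feedback = 0
        for tap in taps:
            feedback ^= (state >> tap) & 1
        state = ((state << 1) | feedback) & mask
    return states


patterns = lfsr_patterns()
print([format(p, "04b") for p in patterns[:5]])
# ['0001', '0010', '0100', '1001', '0011']
```

With these taps the register cycles through all 15 non-zero states before repeating, so it produces every pattern except all zeros in a scrambled order.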

# Response Analyzer:-

The response analyzer could compare the generated response with the expected response stored in an on-chip memory, but this approach represents too much area overhead to be practical. A cheaper technique is to compress the responses before comparing them.

Storing the compressed response of the correct circuit requires only a minimum amount of memory. The compressed output is called the signature of the circuit, and the overall approach is dubbed signature analysis.

An example of a signature analyzer that compresses a single bit stream is shown in the fig.



Fig:- Single bit-stream signature analysis

This circuit simply counts the number of 0->1 and 1->0 transitions in the input stream. This compression does not guarantee that the received sequence is the correct one, because many different sequences have the same number of transitions.
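
The limitation of transition counting can be seen in a short Python sketch. The bit streams below are made-up examples and the function name `transition_count` is illustrative.

```python
def transition_count(bits):
    """Signature = number of 0->1 and 1->0 transitions in the bit stream."""
    return sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)


good = [0, 1, 1, 0, 1, 0, 0, 1]     # hypothetical fault-free response
aliased = [0, 1, 0, 0, 1, 0, 1, 1]  # different stream, same signature
stuck = [0, 0, 0, 0, 0, 0, 0, 0]    # output stuck at 0

print(transition_count(good))     # 5
print(transition_count(aliased))  # 5
print(transition_count(stuck))    # 0
```

The stuck output is caught because its signature differs, but the second stream would pass even though it is not the correct response.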

Another technique is a modification of the linear feedback shift register, and has the advantage that the same hardware can be used for both pattern generation and signature analysis.



Fig:- BILBO register

Each incoming data word is XOR'd with the contents of the linear feedback shift register (LFSR). At the end of the test sequence, the LFSR contains the signature of the data sequence, which can be compared with the signature of the correct circuit. This register is used not only as a signature analyzer but also as a scan register, depending on the values of the control signals B0 and B1.

This test approach, which combines all the different techniques, is called built-in logic block observation (BILBO).

| B0 | B1 | Operation mode                             |
|----|----|--------------------------------------------|
| 0  | 0  | Normal                                     |
| 0  | 1  | Scan                                       |
| 1  | 0  | Pattern generation (or) signature analysis |
| 1  | 1  | Reset                                      |

The circuit can be used as a normal register or as a scan register, depending on the values of the control signals B0 and B1.
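
The signature-analysis mode, in which each incoming data word is XOR'd into the LFSR, can be sketched in Python as a multiple-input signature register. The 4-bit width, the feedback taps, the zero seed and the response words are assumptions made for the example; `misr_signature` is a hypothetical name.

```python
def misr_signature(words, width=4, taps=(3, 2), seed=0):
    """Compress a sequence of data words into one signature.

    On every clock the register performs one LFSR shift and the
    incoming word is XORed into the result.
    """
    state = seed
    mask = (1 << width) - 1
    for word in words:
        feedback = 0
        for tap in taps:
            feedback ^= (state >> tap) & 1
        state = (((state << 1) | feedback) & mask) ^ (word & mask)
    return state


good_response = [0b1010, 0b0111, 0b0001, 0b1100]    # hypothetical fault-free words
faulty_response = [0b1010, 0b0110, 0b0001, 0b1100]  # one bit flipped in word 2

print(format(misr_signature(good_response), "04b"))    # 0111
print(format(misr_signature(faulty_response), "04b"))  # 0011
```

Only the 4-bit signature of the correct circuit has to be stored on chip; a mismatch at the end of the test sequence flags the faulty response.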

# Advantages of BIST:

- 1. Lower cost of test
- 2. Better fault coverage
- 3. Shorter test times if the BIST can be designed to test more structures in parallel
- 4. Easier customer support
- 5. Capability to perform tests outside the production environment
- 6. Allows the consumers themselves to test the chips prior to mounting

# Disadvantages of BIST:

- 1. Additional silicon area and fab processing requirements for the BIST circuits
- 2. Reduced access times
- 3. Additional pin requirements to connect to the outside world
- 4. Possible issues with the correctness of BIST results

# Issues that need to be considered when implementing BIST:

- 1. Faults to be covered by the BIST and how these will be tested for
- 2. How much chip area will be occupied by the BIST circuits
- 3. External supply and excitation requirements of the BIST
- 4. Test time and effectiveness of the BIST
- 5. Flexibility and changeability of the BIST
- 6. How much the BIST will impact the production electrical test processes that are already in place


#### Chapter 2 FPGA Architectures: An Overview

Field Programmable Gate Arrays (FPGAs) were first introduced almost two and a half decades ago. Since then they have seen rapid growth and have become a popular implementation medium for digital circuits. Advances in process technology have greatly enhanced the logic capacity of FPGAs and have in turn made them a viable implementation alternative for larger and more complex designs. Further, the programmable nature of their logic and routing resources has a dramatic effect on the area, speed, and power consumption of the final device.

This chapter covers different aspects related to FPGAs. First, an overview of the basic FPGA architecture is presented. An FPGA comprises an array of programmable logic blocks that are connected to each other through a programmable interconnect network. Programmability in FPGAs is achieved through an underlying programming technology, so the chapter briefly discusses the different programming technologies. Details of basic FPGA logic blocks and different routing architectures are then described. After that, an overview of the different steps involved in the FPGA design flow is given. The design flow starts with the hardware description of the circuit, which is then synthesized, technology mapped and packed using different tools. After that, the circuit is placed and routed on the architecture to complete the design flow.

The programmable logic and routing interconnect of FPGAs makes them flexible and general purpose but at the same time it makes them larger, slower and more power consuming than standard cell ASICs. However, the advancement in process technology has enabled and necessitated a number of developments in the basic FPGA architecture. These developments are aimed at further improvement in the overall efficiency of FPGAs so that the gap between FPGAs and ASICs might be reduced. These developments and some future trends are presented in the last section of this chapter.

#### 2.1 Introduction to FPGAs

Field Programmable Gate Arrays (FPGAs) are pre-fabricated silicon devices that can be electrically programmed in the field to become almost any kind of digital circuit or system. For low to medium volume production, FPGAs provide a cheaper solution and faster time to market than Application Specific Integrated Circuits (ASICs), which normally require a lot of resources in terms of time and money to obtain the first device. FPGAs, on the other hand, take less than a minute to configure and cost anywhere from a few hundred dollars to a few thousand dollars. Also, for varying requirements, a portion of an FPGA can be partially reconfigured while the rest of the FPGA is still running. Future updates to the final product can be made simply by downloading a new application bitstream. However, the main advantage of FPGAs, i.e. flexibility, is also the major cause of their drawbacks. The flexible nature of FPGAs makes them significantly larger, slower, and more power consuming than their ASIC counterparts. These disadvantages arise largely because of the programmable routing interconnect of FPGAs, which accounts for almost 90% of their total area. Despite these disadvantages, FPGAs present a compelling alternative for digital system implementation because of their short time to market and low cost at low volumes.

Normally, FPGAs comprise:

- Programmable logic blocks which implement logic functions.
- Programmable routing that connects these logic functions.
- I/O blocks that are connected to logic blocks through routing interconnect and that make off-chip connections.

A generalized example of an FPGA is shown in Fig. 2.1 where configurable logic blocks (CLBs) are arranged in a two dimensional grid and are interconnected by programmable routing resources. I/O blocks are arranged at the periphery of the grid and they are also connected to the programmable routing interconnect. The "programmable/reconfigurable" term in FPGAs indicates their ability to implement a new function on the chip after its fabrication is complete. The reconfigurability/programmability of an FPGA is based on an underlying programming technology, which can cause a change in behavior of a pre-fabricated chip after its fabrication.

#### 2.2 Programming Technologies

There are a number of programming technologies that have been used for reconfigurable architectures. Each of these technologies has different characteristics, which in turn have a significant effect on the programmable architecture. Some of the well known technologies include static memory [122], flash [54], and anti-fuse [61].



Fig. 2.1 Overview of FPGA architecture [22]

#### 2.2.1 SRAM-Based Programming Technology

Static memory cells are the basic cells used for SRAM-based FPGAs. Most commercial vendors [76, 126] use static memory (SRAM) based programming technology in their devices. These devices use static memory cells that are distributed throughout the FPGA to provide configurability. An example of such a memory cell is shown in Fig. 2.2. In an SRAM-based FPGA, SRAM cells are mainly used for the following purposes:

- 1. To program the routing interconnect of FPGAs which are generally steered by small multiplexors.
- 2. To program Configurable Logic Blocks (CLBs) that are used to implement logic functions.

SRAM-based programming technology has become the dominant approach for FPGAs because of its re-programmability and its use of standard CMOS process technology, which leads to increased integration, higher speed and lower dynamic power consumption with each new process of smaller geometry. There are, however, a number of drawbacks associated with SRAM-based programming technology. For example, an SRAM cell requires 6 transistors, which makes this technology costly in terms of area compared to other programming technologies. Further, SRAM cells are volatile in nature, and external devices are required to permanently store the configuration data. These external devices add to the cost and area overhead of SRAM-based FPGAs.

Fig. 2.2 Static memory cell

#### 2.2.2 Flash Programming Technology

One alternative to the SRAM-based programming technology is the use of flash or EEPROM based programming technology. Flash-based programming technology offers several advantages. For example, this programming technology is non-volatile in nature. Flash-based programming technology is also more area efficient than SRAM-based programming technology. Flash-based programming technology has its own disadvantages also. Unlike SRAM-based programming technology, flash-based devices can not be reconfigured/reprogrammed an infinite number of times. Also, flash-based technology uses non-standard CMOS process.

#### 2.2.3 Anti-fuse Programming Technology

An alternative to SRAM and flash-based technologies is anti-fuse programming technology. The primary advantage of anti-fuse programming technology is its low area. This technology also has lower on-resistance and parasitic capacitance than the other two programming technologies. Further, this technology is non-volatile in nature. There are, however, significant disadvantages associated with this programming technology. For example, it does not make use of a standard CMOS process. Also, devices based on anti-fuse programming technology cannot be reprogrammed.

In this section, an overview of three commonly used programming technologies is given where all of them have their advantages and disadvantages. Ideally, one would like to have a programming technology which is reprogrammable, non-volatile, and that uses a standard CMOS process. Apparently, none of the above presented technologies satisfy these conditions. However, SRAM-based programming technology is the most widely used programming technology. The main reason is its use of standard CMOS process and for this very reason, it is expected that this technology will continue to dominate the other two programming technologies.

#### 2.3 Configurable Logic Block

A configurable logic block (CLB) is a basic component of an FPGA that provides the basic logic and storage functionality for a target application design. In order to provide the basic logic and storage capability, the basic component can be anything from a transistor to an entire processor. These are the two extremes. At one end (in the case of transistors), the basic component is very fine-grained and requires a large amount of programmable interconnect, which eventually results in an FPGA that suffers from area-inefficiency, low performance and high power consumption. At the other end (in the case of a processor), the basic logic block is very coarse-grained and cannot be used to implement small functions without wasting resources. In between these two extremes, there exists a spectrum of basic logic blocks. Some of them include logic blocks that are made of NAND gates [101], an interconnection of multiplexors [44], lookup tables (LUTs) [121] and PAL-style wide input gates [124]. Commercial vendors like Xilinx and Altera use LUT-based CLBs to provide basic logic and storage functionality. LUT-based CLBs provide a good trade-off between too fine-grained and too coarse-grained logic blocks. A CLB can comprise a single basic logic element (BLE), or a cluster of locally interconnected BLEs (as shown in Fig. 2.4). A simple BLE consists of a LUT and a Flip-Flop. A LUT with k inputs (LUT-k) contains  $2^k$  configuration bits and can implement any k-input Boolean function. Figure 2.3 shows a simple BLE comprising a 4-input LUT (LUT-4) and a D-type Flip-Flop. The LUT-4 uses 16 SRAM bits to implement any 4-input Boolean function. The output of the LUT-4 is connected to an optional Flip-Flop. A multiplexor selects the BLE output to be either the output of the Flip-Flop or that of the LUT-4.
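The behaviour of a LUT-k and of a simple BLE can be sketched in a few lines of Python. This is only an illustrative model: the class names, the `use_flip_flop` flag standing in for the output multiplexor's configuration bit, and the choice of a 4-input XOR as the programmed function are assumptions made for the example.

```python
class LUT:
    """k-input lookup table holding 2**k configuration bits."""

    def __init__(self, k, config_bits):
        if len(config_bits) != 2 ** k:
            raise ValueError("a LUT-k needs exactly 2**k configuration bits")
        self.k = k
        self.config_bits = list(config_bits)

    def evaluate(self, inputs):
        # The input values form the address of the configuration bit to read.
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.config_bits[address]


class BLE:
    """A LUT followed by an optional D flip-flop and an output multiplexor."""

    def __init__(self, lut, use_flip_flop):
        self.lut = lut
        self.use_flip_flop = use_flip_flop  # configuration bit of the output mux
        self.ff_state = 0

    def clock(self, inputs):
        # On a clock edge the flip-flop captures the current LUT output.
        self.ff_state = self.lut.evaluate(inputs)

    def output(self, inputs):
        # The mux selects either the registered or the combinational value.
        return self.ff_state if self.use_flip_flop else self.lut.evaluate(inputs)


# Program a LUT-4 as a 4-input XOR (parity) function: 16 configuration bits.
xor4_bits = [bin(address).count("1") % 2 for address in range(16)]
xor4 = LUT(4, xor4_bits)
registered_ble = BLE(xor4, use_flip_flop=True)
```

Changing only the 16 entries of `xor4_bits` turns the same structure into any other 4-input function, which is the sense in which the LUT is programmable.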

A CLB can contain a cluster of BLEs connected through a local routing network. Figure 2.4 shows a cluster of 4 BLEs; each BLE contains a LUT-4 and a Flip-Flop. The BLE output is accessible to other BLEs of the same cluster through a local routing network. The number of output pins of a cluster is equal to the total number of BLEs in the cluster (with each BLE having a single output). However, the number of input pins of a cluster can be less than or equal to the sum of the input pins required by all the BLEs in the cluster. Modern FPGAs typically contain 4 to 10 BLEs in a single cluster. Although here we have discussed only basic logic blocks, many modern FPGAs contain a heterogeneous mixture of blocks, some of which can only be used for specific purposes. These specific-purpose blocks, also referred to here as hard blocks, include memory, multipliers, adders and DSP blocks. Hard blocks are very efficient at implementing specific functions as they are designed optimally to perform these functions, yet they end up wasting a huge amount of logic and routing resources if unused. A detailed discussion on the use of a heterogeneous mixture of blocks for implementing digital circuits is presented in Chap. 4, where both the advantages and disadvantages of heterogeneous FPGA architectures and a remedy to counter the resource loss problem are discussed in detail.

Fig. 2.3 Basic logic element (BLE) [22]

#### 2.4 FPGA Routing Architectures

As discussed earlier, in an FPGA, the computing functionality is provided by its programmable logic blocks, and these blocks connect to each other through a programmable routing network. This programmable routing network provides routing connections among logic blocks and I/O blocks to implement any user-defined circuit. The routing interconnect of an FPGA consists of wires and programmable switches that form the required connections. These programmable switches are configured using the programming technology.

**Fig. 2.4** A configurable logic block (CLB) having four BLEs [22]

Since FPGA architectures claim to be potential candidates for the implementation of any digital circuit, their routing interconnect must be very flexible so that it can accommodate a wide variety of circuits with widely varying routing demands. Although the routing requirements vary from circuit to circuit, certain common characteristics of these circuits can be used to optimally design the routing interconnect of an FPGA architecture. For example, most designs exhibit locality and hence require abundant short wires. At the same time there are some distant connections, which lead to the need for sparse long wires. So, care needs to be taken while designing the routing interconnect of FPGA architectures, where both flexibility and efficiency have to be addressed. The arrangement of routing resources, relative to the arrangement of logic blocks of the architecture, plays a very important role in the overall efficiency of the architecture. This arrangement is termed here the global routing architecture, whereas the microscopic details regarding the switching topology of different switch blocks are termed the detailed routing architecture. On the basis of the global arrangement of routing resources, FPGA architectures can be categorized as either hierarchical [4] or island-style [22]. In this section, we present a detailed overview of both routing architectures.



Fig. 2.5 Overview of mesh-based FPGA architecture [22]

#### 2.4.1 Island-Style Routing Architecture

Figure 2.5 shows a traditional island-style FPGA architecture (also termed mesh-based FPGA architecture). This is the most commonly used architecture among academic and commercial FPGAs. It is called island-style architecture because its configurable logic blocks look like islands in a sea of routing interconnect. In this architecture, configurable logic blocks (CLBs) are arranged on a 2D grid and are interconnected by a programmable routing network. The Input/Output (I/O) blocks on the periphery of the FPGA chip are also connected to the programmable routing network. The routing network comprises pre-fabricated wiring segments and programmable switches that are organized in horizontal and vertical routing channels.

The routing network of an FPGA occupies 80–90% of the total area, whereas the logic occupies only 10–20% [22]. The flexibility of an FPGA is mainly dependent on its programmable routing network. A mesh-based FPGA routing network consists of horizontal and vertical routing tracks which are interconnected through switch boxes (SB). Logic blocks are connected to the routing network through connection boxes (CB). The flexibility of a connection box (Fc) is the number of routing tracks of the adjacent channel which are connected to the pin of a block. The connectivity of the input pins of logic blocks with the adjacent routing channel is called Fc(in); the connectivity of the output pins of the logic blocks with the adjacent routing channel is called Fc(out). An Fc(in) equal to 1.0 means that all the tracks of the adjacent routing channel are connected to the input pin of the logic block. The flexibility of a switch box (Fs) is the total number of tracks to which every track entering the switch box connects. The number of tracks in a routing channel is called the channel width of the architecture. The same channel width is used for all horizontal and vertical routing channels of the architecture. An example explaining the switch box and connection box flexibilities and the routing channel width is shown in Fig. 2.6. In this figure the switch box has Fs = 3, as each track incident on it is connected to 3 tracks of adjacent routing channels. Similarly, the connection box has Fc(in) = 0.5, as each input of the logic block is connected to 50% of the tracks of the adjacent routing channel.

**Fig. 2.6** Example of switch and connection box
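To make the Fc and Fs definitions concrete, the sketch below counts the tracks reachable from a pin for a given Fc and builds one possible switch box with Fs = 3. The "disjoint" topology (track i connects only to track i on the other three sides), the channel width of 4 and the function names are assumptions for illustration; the text does not prescribe a particular switch box topology.

```python
def connection_box_tracks(channel_width, fc):
    """Number of tracks in the adjacent channel that one pin connects to."""
    return round(fc * channel_width)


def disjoint_switch_box(channel_width):
    """Map every incoming track to the tracks it can connect to.

    Assumed topology: track i on one side connects to track i on each of
    the three other sides, so every track has exactly three targets.
    """
    sides = ["left", "top", "right", "bottom"]
    connections = {}
    for side in sides:
        for track in range(channel_width):
            connections[(side, track)] = [
                (other, track) for other in sides if other != side
            ]
    return connections


W = 4  # assumed channel width
box = disjoint_switch_box(W)
fs_values = {len(targets) for targets in box.values()}
print(connection_box_tracks(W, 0.5), fs_values)
```

With W = 4, an Fc(in) of 0.5 gives each input pin access to 2 tracks, and every track entering the box has Fs = 3.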

The routing tracks connected through a switch box can be bidirectional or unidirectional (also called directional) tracks. Figure 2.7 shows a bidirectional and a unidirectional switch box having Fs equal to 3. The input tracks (or wires) in both these switch boxes connect to 3 other tracks of the same switch box. The only limitation of a unidirectional switch box is that its routing channel width must be a multiple of 2.

Generally, the output pins of a block can connect to any routing track through pass transistors. Each pass transistor forms a tristate output that can be independently turned on or off. However, single-driver wiring technique can also be used to connect output pins of a block to the adjacent routing tracks. For single-driver wiring, tristate elements cannot be used; the output of block needs to be connected to the neighboring routing network through multiplexors in the switch box. Modern commercial FPGA architectures have moved towards using single-driver, directional routing tracks. Authors in [51] show that if single-driver directional wiring is used instead of bidirectional wiring, 25% improvement in area, 9% in delay and 32% in area-delay can be achieved. All these advantages are achieved without making any major changes in the FPGA CAD flow.
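The three improvement figures quoted from [51] can be cross-checked with a short calculation. The sketch assumes that "area-delay" means the product of area and delay and that the quoted percentages are reductions relative to the bidirectional baseline; both are assumptions, since the text does not define the metric.

```python
area_ratio = 1 - 0.25    # area relative to bidirectional wiring
delay_ratio = 1 - 0.09   # delay relative to bidirectional wiring

# Assumed metric: area-delay is the product of the two ratios.
area_delay_ratio = area_ratio * delay_ratio
improvement_percent = (1 - area_delay_ratio) * 100

print(f"{improvement_percent:.2f}%")  # prints 31.75%
```

Under these assumptions the result rounds to the 32% area-delay improvement cited above.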

In mesh-based FPGAs, multi-length wires are created to reduce delay. Figure 2.8 shows an example of different length wires. Longer wire segments span multiple blocks and require fewer switches, thereby reducing routing area and delay. However, they also decrease routing flexibility, which reduces the probability to route a hardware circuit successfully. Modern commercial FPGAs commonly use a combination of long and short wires to balance flexibility, area and delay of the routing network.



Fig. 2.7 Switch block, length 1 wires [51]



Fig. 2.8 Channel segment distribution

#### 2.4.1.1 Altera's Stratix II Architecture

Until now, we have presented a general overview of the island-style routing architecture. Now we present a commercial example of this kind of architecture. Altera's Stratix II [106] architecture is an industrial example of an island-style FPGA (Fig. 2.9). The logic structure consists of LABs (Logic Array Blocks), memory blocks, and digital signal processing (DSP) blocks. LABs are used to implement general-purpose logic, and are symmetrically distributed in rows and columns throughout the device fabric. The DSP blocks are custom designed to implement full-precision multipliers of different granularities, and are grouped into columns. Input- and output-only elements (IOEs) represent the external interface of the device. IOEs are located along the periphery of the device.

Fig. 2.9 Altera's Stratix II block diagram

Each Stratix II LAB consists of eight Adaptive Logic Modules (ALMs). An ALM consists of 2 adaptive LUTs (ALUTs) with eight inputs altogether. Construction of an ALM allows implementation of 2 separate 4-input Boolean functions. Further, an ALM can also be used to implement any six-input Boolean function, and some seven-input functions. In addition to lookup tables, an ALM provides 2 programmable registers, 2 dedicated full-adders, a carry chain, and a register-chain. Full-adders and carry chain can be used to implement arithmetic operations, and the register-chain is used to build shift registers. Outputs of an ALM drive all types of interconnect provided by the Stratix II device. Figure 2.10 illustrates a LAB interconnect interface.

Interconnections between LABs, RAM blocks, DSP blocks and the IOEs are established using the Multi-track interconnect structure. This interconnect structure consists of wire segments of different lengths and speeds. The interconnect wire-segments span fixed distances, and run in the horizontal (row interconnects) and vertical (column interconnects) directions. The row interconnects (Fig. 2.11) can be used to route signals between LABs, DSP blocks, and memory blocks in the same row. Row interconnect resources are of the following types:

- Direct connections between LABs and adjacent blocks.
- R4 resources that span 4 blocks to the left or right.
- R24 resources that provide high-speed access across 24 columns.

Fig. 2.10 Stratix II logic array block (LAB) structure

Each LAB owns its set of R4 interconnects. A LAB has approximately equal numbers of driven-left and driven-right R4 interconnects. An R4 interconnect that is driven to the left can be driven by either the primary LAB (Fig. 2.11) or the adjacent LAB to the left.

Similarly, a driven-right R4 interconnect may be driven by the primary LAB or the LAB immediately to its right. Multiple R4 resources can be connected to each other to establish longer connections within the same row. R4 interconnects can also drive C4 and C16 column interconnects, and R24 high speed row resources.

Column interconnect structure is similar to row interconnect structure. Column interconnects include:

- Carry chain interconnects within a LAB, and from LAB to LAB in the same column.
- Register chain interconnects.
- C4 resources that span 4 blocks in the up and down directions.
- C16 resources for high-speed vertical routing across 16 rows.

Carry chain and register chain interconnects are separated from the local interconnect (Fig. 2.10) in a LAB. Each LAB has its own set of driven-up and driven-down C4 interconnects. C4 interconnects can also be driven by the LABs that are immediately adjacent to the primary LAB. Multiple C4 resources can be connected to each other to form longer connections within a column, and C4 interconnects can also drive row interconnects to establish column-to-column interconnections. C16 interconnects are high-speed vertical resources that span 16 LABs. A C16 interconnect can drive row and column interconnects at every fourth LAB. A LAB local interconnect structure cannot be directly driven by a C16 interconnect; only C4 and R4 interconnects can drive a LAB local interconnect structure. Figure 2.12 shows the C4 interconnect structure in the Stratix II device.

Fig. 2.11 R4 interconnect connections

#### 2.4.2 Hierarchical Routing Architecture

Most logic designs exhibit locality of connections, which implies a hierarchy in the placement and routing of connections between different logic blocks. Hierarchical routing architectures exploit this locality by dividing FPGA logic blocks into separate groups/clusters. These clusters are recursively connected to form a hierarchical structure. In a hierarchical architecture (also termed tree-based architecture), connections between logic blocks within the same cluster are made by wire segments at the lowest level of hierarchy. However, connections between blocks residing in different groups require the traversal of one or more levels of hierarchy. In a hierarchical architecture, the signal bandwidth varies as we move away from the bottom level, and generally it is widest at the top level of hierarchy. The hierarchical routing architecture has been used in a number of commercial FPGA families including Altera Flex10K [10], Apex [15] and ApexII [16] architectures. Here, the term multilevel hierarchical interconnect covers both architectures with more than 2 levels of hierarchy and tree-based architectures.




Fig. 2.12 C4 interconnect connections

#### 2.4.2.1 HFPGA: Hierarchical FPGA

In the hierarchical FPGA called HFPGA, logic blocks (LBs) are grouped into clusters. Clusters are then recursively grouped together (see Fig. 2.13). The clustered VPR mesh architecture [22] has a hierarchical topology with only two levels. Here we consider multilevel hierarchical architectures with more than 2 levels. Various hierarchical structures are discussed in [1] and [129]. The routability of an HFPGA depends on the topologies of its switch boxes. HFPGAs comprising fully populated switch boxes ensure 100% routability but carry a heavy area penalty. The authors of [129] explored the HFPGA architecture, investigating how the switch pattern can be partly depopulated while maintaining good routability.



Fig. 2.13 Hierarchical FPGA topology

#### 2.4.2.2 HSRA: Hierarchical Synchronous Reconfigurable Array

An example of an academic hierarchical routing architecture is shown in Fig. 2.14. It has a strictly hierarchical, tree-based interconnect structure. In this architecture, the only wire segments that directly connect to the logic units are located at the leaves of the interconnect tree. All other wire segments are decoupled from the logic structure. A logic block of this architecture consists of a pair of 2-input Look Up Table (2-LUT) and a D-type Flip Flop (D-FF). The input-pin connectivity is based on a choose-k strategy [4], and the output pins are fully connected. The richness of this interconnect structure is defined by its base channel width c and interconnect growth rate p. The base channel width c is defined as the number of tracks at the leaves of the interconnect tree (in Fig. 2.14, c = 3). The growth rate p is defined as the rate at which the interconnect bandwidth grows towards the upper levels. The interconnect growth rate can be realized using either non-compressing or compressing switch blocks. The details of these switch blocks are as follows:

- Non-compressing (2:1) switch blocks: the number of tracks at the upper level is equal to the sum of the number of tracks of the children at the lower level. For example, in Fig. 2.14, non-compressing switch blocks are used between levels 1, 2 and levels 3, 4.
- Compressing (1:1) switch blocks: the number of tracks at the upper level is equal to the number of tracks of either child at the lower level. For example, in Fig. 2.14, compressing switch blocks are used between levels 2 and 3.

A repeating combination of non-compressing and compressing switch blocks can be used to realize any value of p less than one. For example, a repeating pattern of (2:1, 1:1) switch blocks realizes p = 0.5, while the pattern (2:1, 2:1, 1:1) realizes p = 0.67. An architecture that has only 2:1 switch blocks provides a growth rate of p = 1.
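The relation between the switch block pattern, the channel width at each level and the growth rate can be sketched as follows. The code assumes an arity-2 tree and reads p as the fraction of levels at which the width doubles, which reproduces the three values quoted above; the function names and the string encoding of the pattern are hypothetical.

```python
def tracks_per_level(base_width, switch_pattern, levels):
    """Channel width at each tree level, starting from the leaves.

    switch_pattern is a repeating sequence of "2:1" (non-compressing,
    the width doubles) and "1:1" (compressing, the width is unchanged).
    """
    widths = [base_width]
    for step in range(levels - 1):
        kind = switch_pattern[step % len(switch_pattern)]
        widths.append(widths[-1] * 2 if kind == "2:1" else widths[-1])
    return widths


def growth_rate(switch_pattern):
    """Assumed reading of p: fraction of levels at which the width doubles."""
    return switch_pattern.count("2:1") / len(switch_pattern)


# Pattern described for Fig. 2.14: 2:1 between levels 1 and 2,
# 1:1 between levels 2 and 3, 2:1 between levels 3 and 4, with c = 3.
print(tracks_per_level(3, ("2:1", "1:1"), 4))         # [3, 6, 6, 12]
print(round(growth_rate(("2:1", "2:1", "1:1")), 2))  # 0.67
```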

Another hierarchical routing architecture is presented in [132], where the global routing architecture (i.e. the position of routing resources relative to logic resources of the architecture) remains the same as in [4]. However, there are several key differences at the level of the detailed routing architecture (i.e. the way the routing resources are connected to each other, the flexibility of switch blocks, etc.) that separate the two architectures. For example, the architecture shown in Fig. 2.14 has one bidirectional interconnect that uses bidirectional switches, and it supports only arity-2 (i.e. each cluster can contain only two sub-clusters). In contrast, the architecture presented in [132] supports two separate unidirectional interconnect networks: a downward interconnect network and an upward interconnect network. Further, this architecture is more flexible, as it can support logic blocks of different sizes, and the clusters/groups of the routing architecture can have different arity sizes. Further details of this architecture, from now on alternatively termed the tree-based architecture, are presented in the next chapter.

Fig. 2.14 Example of hierarchical routing architecture [4]

**Fig. 2.15** The APEX programmable logic Devices [87]



#### 2.4.2.3 APEX: Altera

APEX architecture is a commercial product from Altera Corporation which includes 3 levels of interconnect hierarchy. Figure 2.15 shows a diagram of the APEX 20K400 programmable logic device. The basic logic-element (LE) is a 4-input LUT and DFF pair. Groups of 10 LEs are grouped into a logic-array-block or LAB. Interconnect within a LAB is complete, meaning that a connection from the output of any LE to the input of another LE in its LAB always exists, and any signal entering the input region can reach every LE.

Groups of 16 LABs form a MegaLab. Interconnect within a MegaLab requires an LE to drive a GH (MegaLab global H) line, a horizontal line, which switches into the input region of any other LAB in the same MegaLab. Adjacent LABs have the ability to interleave their input regions, so an LE in  $LAB_i$  can usually drive  $LAB_{i+1}$  without using a GH line. A 20K400 MegaLab contains 279 GH lines.

The top-level architecture is a 4 by 26 array of MegaLabs. Communication between MegaLabs is accomplished by global H (horizontal) and V (vertical) wires that switch at their intersection points. The H and V lines are segmented by a bidirectional segmentation buffer at the horizontal and vertical centers of the chip. In Fig. 2.15, we denote the use of a single (half-chip) line as H or V, and a double or full-chip line through the segmentation buffer as HH or VV. The 20K400 contains 100 H lines per MegaLab row, and 80 V lines per LAB-column.
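The three levels of the APEX hierarchy can be multiplied out to estimate the logic capacity of the 20K400. The sketch assumes that every MegaLab holds the full 16 LABs and every LAB the full 10 LEs, which the text implies but does not state outright; the constant names are chosen for the example.

```python
LES_PER_LAB = 10              # level 1: logic elements per LAB
LABS_PER_MEGALAB = 16         # level 2: LABs per MegaLab
MEGALAB_ARRAY = (4, 26)       # level 3: top-level array of MegaLabs

megalabs = MEGALAB_ARRAY[0] * MEGALAB_ARRAY[1]
labs = megalabs * LABS_PER_MEGALAB
logic_elements = labs * LES_PER_LAB

print(megalabs, labs, logic_elements)  # 104 1664 16640
```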

In this section, so far we have given an overview of the two routing architectures that are commonly employed in FPGAs. Both architectures have their positive and negative points. For example, hierarchical routing architectures exploit the locality exhibited by most designs and in turn offer smaller delays and more predictable routing compared to island-style architectures. The speed of a net is determined by the number of routing switches it has to pass through and the length of its wires. In a mesh-based architecture, the number of segments increases linearly with the Manhattan distance d between the logic blocks to be connected. However, for a tree-based architecture the number of segments increases only logarithmically with the distance d [82]. This fact is illustrated in Fig. 2.16. On the other hand, scalability is an issue in hierarchical routing architectures, and there might be some design mapping issues. In the case of the mesh-based architecture there are no such issues, as it offers a tile-based layout where a tile, once formed, can be replicated horizontally and vertically to make the architecture as large as we wish.

Fig. 2.16 a Number of series switches in a mesh structure b Number of series switches in a tree structure
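The contrast between linear and logarithmic growth can be sketched with a deliberately simplified model. It assumes length-1 wire segments in the mesh and, for the tree, a path that climbs to the lowest common ancestor of the two leaves and descends again; the function names, the leaf numbering and the arity of 4 are assumptions for illustration and are not taken from [82].

```python
def mesh_segments(src, dst):
    """Length-1 segments crossed between two grid positions (x, y)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def tree_segments(src_leaf, dst_leaf, arity):
    """Segments crossed between two leaves of a tree with the given arity.

    The path climbs to the lowest common ancestor and descends again,
    crossing one segment per level in each direction.
    """
    levels = 0
    while src_leaf != dst_leaf:
        src_leaf //= arity
        dst_leaf //= arity
        levels += 1
    return 2 * levels


# Two blocks that are 1023 positions apart in a row of 1024 blocks.
print(mesh_segments((0, 0), (1023, 0)))  # 1023
print(tree_segments(0, 1023, arity=4))   # 10
```

In this model the mesh count grows in step with the distance, while the tree count grows with the number of hierarchy levels separating the two blocks.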

#### 2.5 Software Flow

FPGA architectures have been intensely investigated over the past two decades. A major aspect of FPGA architecture research is the development of Computer Aided Design (CAD) tools for mapping applications to FPGAs. It is well established that the quality of an FPGA-based implementation is largely determined by the effectiveness of accompanying suite of CAD tools. Benefits of an otherwise well designed, feature rich FPGA architecture might be impaired if the CAD tools cannot take advantage of the features that the FPGA provides. Thus, CAD algorithm research is essential to the necessary architectural advancement to narrow the performance gaps between FPGAs and other computational devices like ASICs.

The software flow (CAD flow) takes an application design description in a Hardware Description Language (HDL) and converts it to a stream of bits that is eventually programmed on the FPGA. The process of converting a circuit description into a format that can be loaded into an FPGA can be roughly divided into five distinct steps, namely: synthesis, technology mapping, packing, placement and routing. The final output of FPGA CAD tools is a bitstream that configures the state of the memory bits in an FPGA. The state of these bits determines the logical function that the FPGA implements. Figure 2.17 shows a generalized software flow for programming an application circuit on an FPGA architecture. A description of the various modules of the software flow is given in the following part of this section. The details of these modules are generally independent of the kind of routing architecture being used, and they are applicable to both architectures described earlier unless otherwise specified.

**Fig. 2.17** FPGA software flow

### 2.5.1 Logic Synthesis

The FPGA flow starts with logic synthesis of the design to be mapped onto the device. Logic synthesis [26, 27] transforms an HDL description (VHDL or Verilog) into a set of Boolean gates and flip-flops. The synthesis tools transform the register-transfer-level (RTL) description of a design into a hierarchical Boolean network. Various technology-independent techniques are applied to optimize the Boolean network. The typical cost function of technology-independent optimizations is the total literal count of the factored representation of the logic function. The literal count correlates very well with the circuit area. Further details of logic synthesis are beyond the scope of this book.

**Fig. 2.18** Directed acyclic graph representation of a circuit

### 2.5.2 Technology Mapping

The output from synthesis tools is a circuit description consisting of Boolean logic gates, flip-flops and the wiring connections between these elements. The circuit can also be represented by a Directed Acyclic Graph (DAG). Each node in the graph represents a gate, flip-flop, primary input or primary output. Each edge in the graph represents a connection between two circuit elements. Figure 2.18 shows an example of a DAG representation of a circuit. Given a library of cells, the technology mapping problem can be expressed as finding a network of cells that implements the Boolean network. In the FPGA technology mapping problem, the library of cells is composed of k-input LUTs and flip-flops. Therefore, FPGA technology mapping involves transforming the Boolean network into k-bounded cells. Each cell can then be implemented as an independent k-LUT. Figure 2.19 shows an example of transforming a Boolean network into k-bounded cells.

Technology mapping algorithms can optimize a design for a set of objectives including depth, area or power. The FlowMap algorithm [64] is the most widely used academic tool for FPGA technology mapping. FlowMap is a breakthrough in FPGA technology mapping because it is able to find a depth-optimal solution in polynomial time. FlowMap guarantees depth optimality at the expense of logic duplication. Since the introduction of FlowMap, numerous technology mappers have been designed that optimize for area and run-time while still maintaining the depth-optimality of the circuit [65–67]. The technology mapping step produces a network of k-bounded LUTs and flip-flops.

**Fig. 2.19** Example of technology mapping
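To make the notion of a k-bounded cell more concrete, the following Python sketch computes the 2^k configuration bits that a k-LUT would need in order to implement a given Boolean cone. The helper names `lut_truth_table` and `lut_eval`, as well as the bit ordering (input 0 is the least significant address bit), are assumptions chosen for illustration and are not taken from any particular mapping tool.

```python
def lut_truth_table(func, k):
    """Return the 2**k configuration bits of a k-input LUT implementing func.

    Entry `addr` holds the output of func when input i equals bit i of addr
    (input 0 is the least significant address bit; this ordering is assumed).
    """
    bits = []
    for addr in range(2 ** k):
        inputs = [(addr >> i) & 1 for i in range(k)]
        bits.append(int(bool(func(*inputs))))
    return bits


def lut_eval(bits, inputs):
    """Evaluate a configured LUT by reading the addressed configuration bit."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return bits[addr]


def cone(a, b, c):
    """A small cone of gates, (a AND b) OR (NOT c), with three inputs."""
    return (a & b) | (1 - c)


# The whole cone is 3-bounded, so it fits into a single 3-LUT.
table = lut_truth_table(cone, 3)
```

Whatever the internal gate structure of a cone, only its number of distinct inputs decides whether it fits in one k-LUT, which is why the mapper works with k-bounded cells.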

### 2.5.3 Clustering/Packing

The logic elements in a Mesh-based FPGA are typically arranged in two levels of hierarchy. The first level consists of logic blocks (LBs), which are k-input LUT and flip-flop pairs. The second level groups k LBs together to form logic block clusters. The clustering phase of the FPGA CAD flow is the process of forming these groups of k LBs. Each cluster can then be mapped directly to a logic element on an FPGA. Figure 2.20 shows an example of the clustering process.

Clustering algorithms can be broadly categorized into three general approaches, namely top-down [39, 78], depth-optimal [84, 100] and bottom-up [14, 17, 43]. Top-down approaches partition the LBs into clusters by successively subdividing the network or by iteratively moving LBs between parts. Depth-optimal solutions attempt to minimize delay at the expense of logic duplication. Bottom-up approaches are generally preferred for FPGA CAD tools due to their fast run times and reasonable timing delays. They consider only local connectivity information and can easily satisfy cluster pin constraints. Top-down approaches offer the best solutions; however, their computational complexity can be prohibitive.

#### 2.5.3.1 Bottom-up Approaches

Bottom-up approaches build clusters sequentially, one at a time. The process starts by choosing an LB which acts as a cluster seed. LBs are then greedily selected and added to the cluster according to an attraction function. The VPack [14] attraction function is based on the number of shared nets between a candidate LB and the LBs that are already in the cluster. For each cluster, a first attraction function selects a seed LB from the set of all LBs that have not already been packed. After packing the seed LB into the new cluster, a second attraction function selects new LBs to pack into the cluster. LBs are packed into the cluster until the cluster reaches full capacity or all cluster inputs have been used. If all cluster inputs become occupied before the cluster reaches full capacity, a hill-climbing technique is applied, searching for LBs that do not increase the number of inputs used by the cluster. The VPack pseudo-code is outlined in Algorithm 2.1.

**Fig. 2.20** Example of packing

T-VPack [22] is a timing-driven version of VPack which gives added weight to grouping LBs on the critical path together. The algorithm is identical to VPack, except that the attraction functions which select the LBs to be packed into the clusters are different. The VPack seed function chooses LBs with the most used inputs, whereas the T-VPack seed function chooses LBs that are on the most critical path. VPack's second attraction function chooses LBs with the largest number of connections with the LBs already packed into the cluster. T-VPack's second attraction function has two components for an LB *B* being considered for cluster *C*:

$$Attraction(B, C) = \alpha \cdot Crit(B) + (1 - \alpha) \frac{|Nets(B) \cap Nets(C)|}{G} \tag{2.1}$$

where Crit(B) is a measure of how close LB B is to being on the critical path, Nets(B) is the set of nets connected to LB B, Nets(C) is the set of nets connected to the LBs already selected for cluster C, $\alpha$ is a user-defined constant which determines the relative importance of the attraction components, and G is a normalizing factor. The first component of T-VPack's second attraction function chooses critical-path LBs, and the second chooses LBs that share many connections with the LBs already packed into the cluster. By initializing and then packing clusters with critical-path LBs, the algorithm is able to absorb long sequences of critical-path LBs into clusters. This minimizes circuit delay since the local interconnect within the cluster is significantly faster than the global interconnect of the FPGA.

```
UnclusteredLBs = PatternMatchToLBs(LUTs, Registers);
LogicClusters = NULL;
while UnclusteredLBs != NULL do
   /*start a new cluster from a seed LB*/
   C = GetLBwithMostUsedInputs(UnclusteredLBs);
   while |C| < k do
      /*cluster is not full*/
      BestLB = MaxAttractionLegalLB(C, UnclusteredLBs);
      if BestLB == NULL then
         /*No LB can be added to this cluster*/
         break;
      endif
      UnclusteredLBs = UnclusteredLBs - BestLB;
      C = C ∪ BestLB;
   endw
   if |C| < k then
      /*Cluster is not full - try hill climbing*/
      while |C| < k do
         BestLB = MinClusterInputIncreaseLB(C, UnclusteredLBs);
         C = C ∪ BestLB;
         UnclusteredLBs = UnclusteredLBs - BestLB;
      endw
      if ClusterIsIllegal(C) then
         RestoreToLastLegalState(C, UnclusteredLBs);
      endif
   endif
   LogicClusters = LogicClusters ∪ C;
endw
```

**Algorithm 2.1** Pseudo-code of the VPack Algorithm [22]

RPack [43] improves the routability of a circuit by introducing a new set of routability metrics. RPack significantly reduces the channel widths required by circuits compared to VPack. T-RPack [43] is a timing-driven version of RPack which, like T-VPack, gives added weight to grouping LBs on the critical path. iRAC [17] improves the routability of circuits even further by using an attraction function that attempts to encapsulate as many low-fanout nets as possible within a cluster. If a net can be completely encapsulated within a cluster, there is no need to route that net in the external routing network. By encapsulating as many nets as possible within clusters, routability is improved because there are fewer external nets to route in total.
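As an illustration of how the attraction function of Eq. 2.1 drives a bottom-up packer, the following Python sketch greedily fills clusters of at most k LBs. It is a simplified sketch and not the T-VPack implementation: cluster input-pin limits and hill climbing are not modeled, the normalizing factor G is assumed to be the largest number of nets on any LB, and the names `attraction`, `pack`, `nets` and `crit` are hypothetical.

```python
def attraction(lb, cluster, nets, crit, alpha=0.75, g=1.0):
    """Eq. 2.1: alpha * Crit(B) + (1 - alpha) * |Nets(B) & Nets(C)| / G."""
    cluster_nets = set().union(*(nets[c] for c in cluster))
    shared = len(nets[lb] & cluster_nets)
    return alpha * crit[lb] + (1 - alpha) * shared / g


def pack(nets, crit, k, alpha=0.75):
    """Greedily pack LBs into clusters of at most k LBs (pin limits ignored)."""
    g = max(len(n) for n in nets.values())   # assumed normalizing factor G
    unclustered = set(nets)
    clusters = []
    while unclustered:
        # Timing-driven seed: the most critical unclustered LB (ties by name).
        seed = max(sorted(unclustered), key=lambda lb: crit[lb])
        cluster = [seed]
        unclustered.remove(seed)
        while len(cluster) < k and unclustered:
            best = max(sorted(unclustered),
                       key=lambda lb: attraction(lb, cluster, nets, crit, alpha, g))
            cluster.append(best)
            unclustered.remove(best)
        clusters.append(cluster)
    return clusters


# Hypothetical netlist: each LB maps to the set of nets it touches.
nets = {"A": {"n1", "n2"}, "B": {"n2", "n3"}, "C": {"n3", "n4"}, "D": {"n5"}}
crit = {"A": 1.0, "B": 0.9, "C": 0.2, "D": 0.1}
clusters = pack(nets, crit, k=2)
```

With these example values the critical LBs A and B, which also share net n2, end up in the same cluster, so the connection between them uses only the fast local interconnect.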

#### 2.5.3.2 Top-down Approaches

Top-down clustering can be formulated as a k-way hypergraph partitioning problem, in which the vertices of the netlist hypergraph (the LBs) are assigned to k disjoint parts. The k-way partitioning problem seeks to minimize a given cost function of such an assignment. A standard cost function is net cut, which is the number of hyperedges that span more than one part or, more generally, the sum of the weights of such hyperedges. Constraints are typically imposed on the solution and make the problem difficult. For example, some vertices can be fixed in their parts, or the total vertex weight in each part can be limited (a balance constraint, which here reflects the FPGA cluster size). With balance constraints, the problem of optimally partitioning a hypergraph is known to be NP-hard [85]. However, since partitioning is critical in several practical applications, heuristic algorithms with near-linear runtime have been developed. Such move-based heuristics for k-way hypergraph partitioning appear in [24, 34, 110].

##### Fiduccia-Mattheyses Algorithm

The Fiduccia-Mattheyses (FM) heuristics [34] work by prioritizing moves by gain. A move changes the partition to which a particular vertex belongs, and the gain is the corresponding change of the cost function. After each vertex is moved, the gains of the vertices connected to it are updated.

```
partitioning = initial_solution;
while solution quality improves do
    Initialize gain_container from partitioning;
    solution_cost = partitioning.get_cost();
    while not all vertices locked do
        move = choose_move();
        /*a positive gain reduces the solution cost*/
        solution_cost -= gain_container.get_gain(move);
        gain_container.lock_vertex(move.vertex());
        gain_update(move);
        partitioning.apply(move);
    endw
    roll back partitioning to best seen solution;
    gain_container.unlock_all();
endw
```

**Algorithm 2.2** Pseudo-code for FM Heuristic [38]

The Fiduccia-Mattheyses (FM) heuristic for partitioning hypergraphs is an iterative improvement algorithm. FM starts with a possibly random solution and changes the solution by a sequence of moves which are organized as passes. At the beginning of a pass, all vertices are free to move (unlocked), and each possible move is labeled with the immediate change to the cost it would cause; this is called the gain of the move (positive gains reduce solution cost, while negative gains increase it). Iteratively, a move with the highest gain is selected and executed, and the moving vertex is locked, i.e., it is not allowed to move again during that pass. Since moving a vertex can change the gains of adjacent vertices, all affected gains are updated after a move is executed. Selection and execution of a best-gain move, followed by gain update, are repeated until every vertex is locked. Then, the best solution seen during the pass is adopted as the starting solution of the next pass. The algorithm terminates when a pass fails to improve solution quality. Pseudo-code for the FM heuristic is given in Algorithm 2.2.

**Fig. 2.21** The gain bucket structure as illustrated in [34]

The FM algorithm has three main components: (1) the computation of initial gain values at the beginning of a pass; (2) the retrieval of the best-gain (feasible) move; and (3) the update of all affected gain values after a move is made. One contribution of Fiduccia and Mattheyses lies in observing that circuit hypergraphs are sparse, and that the gain of any move is bounded between plus and minus the maximal vertex degree $G_{max}$ in the hypergraph (times the maximal hyperedge weight, if weights are used). This allows moves to be prioritized by their gains. All affected gains can be updated in amortized-constant time, giving overall linear complexity per pass [34]. All moves with the same gain are stored in a linked list representing a "gain bucket". Figure 2.21 presents the gain bucket list structure. It is important to note that some gains may be negative, and as such, FM performs hill-climbing and is not strictly greedy.
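The structure of a single FM pass can be sketched in Python as follows. This is a minimal sketch for a two-way partition under simplifying assumptions: gains are recomputed by a naive scan instead of being kept in the gain bucket structure of Fig. 2.21, all vertices and nets have unit weight, and the balance constraint is a hypothetical `max_part_size` limit. The function names are illustrative only.

```python
def cut_size(nets, part):
    """Number of nets (hyperedges) with vertices in both parts."""
    return sum(1 for net in nets if len({part[v] for v in net}) > 1)


def gain(nets, part, v):
    """Reduction in cut size if vertex v moves to the other part."""
    before = cut_size(nets, part)
    part[v] ^= 1
    after = cut_size(nets, part)
    part[v] ^= 1                      # undo the trial move
    return before - after


def fm_pass(nets, part, max_part_size):
    """One FM pass: move each vertex at most once, then keep the best prefix."""
    part = dict(part)                 # work on a copy of the input partition
    locked, moves = set(), []
    cost = best_cost = cut_size(nets, part)
    best_prefix = 0
    while len(locked) < len(part):
        sizes = [sum(1 for p in part.values() if p == side) for side in (0, 1)]
        # Feasible moves keep the destination part within the balance limit.
        free = [v for v in sorted(part) if v not in locked
                and sizes[1 - part[v]] + 1 <= max_part_size]
        if not free:
            break
        v = max(free, key=lambda u: gain(nets, part, u))
        cost -= gain(nets, part, v)   # may increase the cost (hill climbing)
        part[v] ^= 1
        locked.add(v)
        moves.append(v)
        if cost < best_cost:
            best_cost, best_prefix = cost, len(moves)
    for v in moves[best_prefix:]:     # roll back to the best solution seen
        part[v] ^= 1
    return part, best_cost


# Hypothetical 4-vertex hypergraph with an initial cut of 3.
nets = [("a", "b"), ("c", "d"), ("b", "c")]
start = {"a": 0, "b": 1, "c": 0, "d": 1}
result, cost = fm_pass(nets, start, max_part_size=3)
```

Note that the pass keeps moving vertices even when the best available gain is zero or negative, and only the final rollback decides which prefix of moves is kept.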

##### Multilevel Partitioning

The multilevel hypergraph partitioning framework was successfully validated in [31, 48, 49] and has produced the best known partitioning results since then. The main advantage of multilevel partitioning over flat partitioners is its ability to search the solution space more effectively by spending comparatively more effort on smaller, coarsened hypergraphs. Good coarsening algorithms yield a high correlation between good partitionings of the coarsened hypergraphs and good partitionings of the initial hypergraph. Therefore, a thorough search at the top of the multilevel hierarchy is worthwhile because it is relatively inexpensive when compared to flat partitioning of the original hypergraph, but can still preserve most of the possible improvement.

The result is an algorithmic framework with both improved runtime and solution quality over a completely flat approach. Pseudo-code for an implementation of the multilevel partitioning framework is given in Algorithm 2.3.

```
level = 0;
hierarchy[level] = hypergraph;
min_vertices = 200;
while hierarchy[level].vertex_count() > min_vertices do
    next_level = cluster(hierarchy[level]);
    level = level + 1;
    hierarchy[level] = next_level;
endw
partitioning[level] = a random initial solution for top-level hypergraph;
FM(hierarchy[level], partitioning[level]);
while level>0 do
    level = level - 1;
    partitioning[level] = project(partitioning[level+1], hierarchy[level]);
    FM(hierarchy[level], partitioning[level]);
endw
```

**Algorithm 2.3** Pseudo-code for the Multilevel Partitioning Algorithm [38]

As illustrated in Fig. 2.22, multilevel partitioning consists of three main components: clustering, top-level partitioning and refinement or "uncoarsening". During clustering, hypergraph vertices are combined into clusters based on connectivity, leading to a smaller, clustered hypergraph. This step is repeated until only several hundred clusters remain, which yields a hierarchy of clustered hypergraphs. We describe this hierarchy, as shown in Fig. 2.22, with the smaller hypergraphs being "higher" and the larger hypergraphs being "lower". The smallest (top-level) hypergraph is partitioned with a very fast initial solution generator and improved iteratively, for example, using the FM algorithm. The resulting partitioning is then interpreted as a solution for the next hypergraph in the hierarchy. During the refinement stage, solutions are projected from one level to the next and improved iteratively. Additionally, the hMETIS partitioning program [49] introduced several new heuristics that are incorporated into its multilevel partitioning implementation and are reportedly performance critical.
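The clustering and projection steps described above can be sketched in a few lines of Python. The sketch below assumes a very simple first-fit pairing rule for coarsening (each unmatched vertex is paired with the first unmatched vertex that shares a net with it), which is only a stand-in for the connectivity-based schemes used by real partitioners such as hMETIS. The names `coarsen` and `project` are hypothetical.

```python
def coarsen(nets, vertices):
    """Pair each unmatched vertex with an unmatched neighbour sharing a net."""
    cluster_of, next_id = {}, 0
    for v in vertices:
        if v in cluster_of:
            continue
        cluster_of[v] = next_id
        partner = next((u for net in nets if v in net
                        for u in net if u not in cluster_of), None)
        if partner is not None:
            cluster_of[partner] = next_id
        next_id += 1
    # Rewrite each net over cluster ids; nets absorbed by one cluster vanish.
    coarse_nets = [tuple(sorted({cluster_of[v] for v in net})) for net in nets]
    coarse_nets = [net for net in coarse_nets if len(net) > 1]
    return cluster_of, coarse_nets


def project(coarse_part, cluster_of):
    """Give every fine vertex the part of the cluster that contains it."""
    return {v: coarse_part[c] for v, c in cluster_of.items()}


# Hypothetical 4-vertex hypergraph coarsened to 2 clusters.
nets = [("a", "b"), ("c", "d"), ("b", "c")]
cluster_of, coarse_nets = coarsen(nets, ["a", "b", "c", "d"])
fine_part = project({0: 0, 1: 1}, cluster_of)
```

The projected partition is only a starting point at the finer level; in the framework of Algorithm 2.3 it is subsequently refined by FM.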

### 2.5.4 Placement

Placement algorithms determine which logic block within an FPGA should implement each logic block (instance) required by the circuit. The optimization goals are to place connected logic blocks close together to minimize the required wiring (wirelength-driven placement), and sometimes to balance the wiring density across the FPGA (routability-driven placement) or to maximize circuit speed (timing-driven placement). The three major classes of placers in use today are min-cut (partitioning-based) placers [6, 40], analytic placers [32, 53], which are often followed by local iterative improvement, and simulated annealing-based placers [37, 105]. To investigate architectures fairly, we must make sure that our CAD tools attempt to use every feature of each FPGA. This means that the optimization approach and goals of the placer may change from architecture to architecture. Partitioning and simulated annealing are the approaches most commonly used in FPGA CAD tools, so we focus on these two techniques in what follows.

**Fig. 2.22** Multilevel hypergraph bisection

#### 2.5.4.1 Simulated Annealing Based Approach

Simulated annealing mimics the annealing process used to gradually cool molten metal to produce high-quality metal objects [105]. Pseudo-code for a generic simulated annealing-based placer is shown in Algorithm 2.4. A cost function is used to evaluate the quality of a given placement of logic blocks. For example, a common cost function in wirelength-driven placement is the sum over all nets of the half perimeter of their bounding boxes. An initial placement is created by assigning logic blocks randomly to the available locations in the FPGA. A large number of moves, or local improvements, are then made to gradually improve the placement. A logic block is selected at random, and a new location for it is also selected randomly. The change in cost function that results from moving the selected logic block to the proposed new location is computed. If the cost decreases, the move is always accepted and the block is moved. If the cost increases, there is still a chance to accept the move, even though it makes the placement worse. This probability of acceptance is given by $e^{-\frac{\Delta C}{T}}$, where $\Delta C$ is the change in cost function, and T is a parameter called temperature that controls the probability of accepting moves that worsen the placement. Initially, T is high enough that almost all moves are accepted; it is gradually decreased as the placement improves, in such a way that eventually the probability of accepting a worsening move is very low. This ability to accept hill-climbing moves that make a placement worse allows simulated annealing to escape local minima of the cost function.

```
S = RandomPlacement();
T = InitialTemperature();
R_limit = InitialR_limit;
while ExitCriterion() == false do
    while InnerLoopCriterion() == false do
        S_new = GenerateViaMove(S, R_limit);
        ΔC = Cost(S_new) - Cost(S);
        r = random(0, 1);
        if r < e^(-ΔC/T) then
            S = S_new;
        endif
    endw
    T = UpdateTemp();
    R_limit = UpdateR_limit();
endw
```

**Algorithm 2.4** Generic Simulated Annealing-based Placer [22]

The $R_{limit}$ parameter in Algorithm 2.4 controls how close together blocks must be to be considered for swapping. Initially, $R_{limit}$ is fairly large, and swaps of blocks far apart on a chip are more likely. Throughout the annealing process, $R_{limit}$ is adjusted to try to keep the fraction of accepted moves at any temperature close to 0.44. If the fraction of moves accepted, $\alpha$, is less than 0.44, $R_{limit}$ is reduced, while if $\alpha$ is greater than 0.44, $R_{limit}$ is increased.
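The acceptance test and the adjustment of $R_{limit}$ can be sketched in Python as follows. The acceptance rule follows Algorithm 2.4 directly. The multiplicative update `r_limit * (1 - 0.44 + accepted_fraction)`, clamped between 1 and a maximum value, is the rule we recall from the VPR annealing schedule and is used here as an assumption; the function names are hypothetical.

```python
import math
import random


def accept_move(delta_cost, temperature, rng=random):
    """Acceptance test of Algorithm 2.4: accept if r < exp(-dC / T)."""
    if delta_cost <= 0:
        return True                       # improving moves are always accepted
    return rng.random() < math.exp(-delta_cost / temperature)


def update_r_limit(r_limit, accepted_fraction, max_r_limit, target=0.44):
    """Shrink R_limit when too few moves are accepted, grow it otherwise.

    The multiplicative form is an assumed update rule (see the lead-in).
    """
    new_limit = r_limit * (1.0 - target + accepted_fraction)
    return min(max(new_limit, 1.0), max_r_limit)


# Acceptance rates below the target shrink the range, rates above it grow it.
shrunk = update_r_limit(10.0, accepted_fraction=0.20, max_r_limit=20.0)
grown = update_r_limit(10.0, accepted_fraction=0.90, max_r_limit=20.0)
```

Keeping the acceptance rate near 0.44 concentrates the annealer's effort on moves that still have a reasonable chance of being accepted as the temperature drops.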

In [22], the objective cost function is a function of the total wirelength of the current placement. The wirelength is an estimate of the routing resources needed to completely route all nets in the netlist. Reductions in wirelength mean that fewer routing wires and switches are required to route nets. This point is important because routing resources in an FPGA are limited. Fewer routing wires and switches typically also translate into reductions of the delay incurred in routing nets between logic blocks. The total wirelength of a placement is estimated using a semi-perimeter metric, and is given by Eq. 2.2. N is the total number of nets in the netlist, $bb_x(i)$ is the horizontal span of net i, $bb_y(i)$ is its vertical span, and q(i) is a correction factor. Figure 2.23 illustrates the calculation of the horizontal and vertical spans of a hypothetical net that has 6 terminals.

$$WireCost = \sum_{i=1}^{N} q(i) \times (bb_x(i) + bb_y(i)) \tag{2.2}$$

**Fig. 2.23** Bounding box of a hypothetical 6-terminal net [22]
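Equation 2.2 translates almost directly into code. The Python sketch below assumes that each span is measured as the difference between the extreme terminal coordinates of a net, and that the correction factor q(i) is 1 for every net unless a different function is supplied; both choices, as well as the names `wire_cost`, `positions` and `nets`, are assumptions made for illustration.

```python
def wire_cost(nets, positions, q=lambda num_terminals: 1.0):
    """Eq. 2.2: sum over all nets of q(i) * (bb_x(i) + bb_y(i))."""
    total = 0.0
    for terminals in nets:
        xs = [positions[t][0] for t in terminals]
        ys = [positions[t][1] for t in terminals]
        bb_x = max(xs) - min(xs)          # horizontal span of the bounding box
        bb_y = max(ys) - min(ys)          # vertical span of the bounding box
        total += q(len(terminals)) * (bb_x + bb_y)
    return total


# Hypothetical placement of three blocks and two nets.
positions = {"A": (0, 0), "B": (3, 1), "C": (1, 4)}
nets = [("A", "B"), ("A", "B", "C")]
cost = wire_cost(nets, positions)
```

In an annealer, only the nets connected to the moved blocks need to be re-evaluated to obtain the change in cost for a move.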



The temperature decrease rate, the exit criterion for terminating the anneal, the number of moves attempted at each temperature (InnerLoopCriterion), and the method by which potential moves are generated are defined by the annealing schedule. An efficient annealing schedule is crucial to obtaining good results in a reasonable amount of CPU time. Many proposed annealing schedules are "fixed" schedules with no ability to adapt to different problems. Such schedules can work well within the narrow application range for which they are developed, but their lack of adaptability means they are not very general. The authors of [86] propose an "adaptive" annealing schedule based on statistics computed during the anneal itself. Adaptive schedules are widely used to solve large-scale optimization problems with many variables.

#### 2.5.4.2 Partitioning Based Approach

Partitioning-based placement methods are based on graph partitioning algorithms such as the Fiduccia-Mattheyses (FM) algorithm [34] and the Kernighan-Lin (KL) algorithm [6]. Partitioning-based placement is well suited to Tree-based FPGA architectures. The partitioner is applied recursively to each hierarchical level to distribute netlist cells between clusters. The aim is to reduce external communications and to collect highly connected cells into the same cluster.

Partitioning-based placement is also used in the case of Mesh-based FPGAs. The device is divided into two parts, and a circuit partitioning algorithm is applied to determine the part in which a given logic block must be placed so as to minimize the number of cuts in the nets that connect the blocks between partitions, while leaving highly connected blocks in one partition.

A divide-and-conquer strategy is used in these heuristics. By partitioning the problem into sub-parts, a drastic reduction in search space can be achieved. On the whole, these algorithms work in a top-down manner, placing blocks in the general regions to which they should belong. In the Mesh FPGA case, partitioning-based placement algorithms are good from a "global" perspective, but they do not actually attempt to minimize wirelength. Therefore, the solutions obtained are sub-optimal in terms of wirelength. However, these classes of algorithms run very fast. They are normally used in conjunction with other search techniques for further quality improvement. Some algorithms [95, 130] combine multi-level clustering and hierarchical simulated annealing to obtain ultra-fast placement with good quality. In the following chapters, the partitioning-based placement approach will be used only for Tree-based FPGA architectures.

### 2.5.5 Routing

The FPGA routing problem consists of assigning nets to routing resources such that no routing resource is shared by more than one net. Pathfinder [80] is the current state-of-the-art FPGA routing algorithm. Pathfinder operates on a directed graph abstraction G(V, E) of the routing resources in an FPGA. The set of vertices V in the graph represents the IO terminals of logic blocks and the routing wires in the interconnect structure. An edge between two vertices represents a potential connection between them. Figure 2.24 presents a part of a routing graph in a Mesh-based interconnect.

Given this graph abstraction, the routing problem for a given net is to find a directed tree embedded in *G* that connects the source terminal of the net to each of its sink terminals. Since the number of routing resources in an FPGA is limited, the goal of finding unique, non-intersecting trees for all the nets in a netlist is a difficult problem.

Pathfinder uses an iterative, negotiation-based approach to successfully route all the nets in a netlist. During the first routing iteration, nets are freely routed without paying attention to resource sharing. Individual nets are routed using Dijkstra's shortest path algorithm [111]. At the end of the first iteration, resources may be congested because multiple nets have used them. During subsequent iterations, the cost of using a resource is increased, based on the number of nets that share the resource, and the history of congestion on that resource. Thus, nets are made to negotiate for routing resources. If a resource is highly congested, nets which can use lower congestion alternatives are forced to do so. On the other hand, if the alternatives are more congested than the resource, then a net may still use that resource.

The cost of using a routing resource n during a routing iteration is given by Eq. 2.3.

$$c_n = (b_n + h_n) \times p_n \tag{2.3}$$



Fig. 2.24 Modeling FPGA architecture as a directed graph [22]

Here $b_n$ is the base cost of using the resource n, $h_n$ is related to the history of congestion during previous iterations, and $p_n$ is proportional to the number of nets sharing the resource in the current iteration. The $p_n$ term represents the cost of using a shared resource n, and the $h_n$ term represents the cost of using a resource that has been shared during earlier routing iterations. The latter term is based on the intuition that a historically congested node should appear expensive, even if it is only slightly shared currently. Cost functions and the routing schedule are described in detail in [22]. The pseudo-code of the Pathfinder routing algorithm is presented in Algorithm 2.5.

```
Let: RT_i be the set of nodes in the current routing of net i
while shared resources exist do
   /*Illegal routing*/
   foreach net i do
      rip-up routing tree RT_i;
      RT_i = s_i;
      foreach sink t_{ij} do
         Initialize priority queue PQ to RT_i at cost 0;
         while sink t_{ij} not found do
            Remove lowest cost node m from PQ;
            foreach fanout node n of node m do
               Add n to PQ at PathCost(n) = c_n + PathCost(m);
            endfch
         endw
         foreach node n in path t_{ij} to s_i do
            /*backtrace*/
            Update c_n;
            Add n to RT_i;
         endfch
      endfch
   endfch
   update h_n for all n;
endw
```

**Algorithm 2.5** Pseudo-code of the *Pathfinder* Routing Algorithm [80]
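The negotiation mechanism of Algorithm 2.5 can be illustrated with a small Python sketch. It is a simplified sketch and not the Pathfinder implementation: only two-terminal nets are handled, every node has capacity 1, and the concrete forms chosen for the terms of Eq. 2.3 (p_n = 1 + pres_fac × current usage, and h_n incremented by the overuse after each iteration) are assumptions. The graph, the net names and all function names are hypothetical.

```python
import heapq


def route_net(graph, cost, source, sink):
    """Dijkstra over node costs; returns the node path from source to sink."""
    queue, seen = [(0.0, source, [source])], set()
    while queue:
        path_cost, node, path = heapq.heappop(queue)
        if node == sink:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (path_cost + cost(nxt), nxt, path + [nxt]))
    raise ValueError("sink unreachable")


def pathfinder(graph, nets, base_cost, max_iters=20, pres_fac=1.0, hist_fac=1.0):
    """Negotiated-congestion routing of two-terminal nets (node capacity 1)."""
    usage = {n: 0 for n in base_cost}
    history = {n: 0.0 for n in base_cost}
    routes = {name: [] for name in nets}
    for iteration in range(max_iters):
        pres = 0.0 if iteration == 0 else pres_fac   # first pass ignores sharing
        for name, (src, dst) in nets.items():
            for n in routes[name]:                   # rip up the previous route
                usage[n] -= 1

            def cost(n):
                # Eq. 2.3 with the assumed forms of p_n and h_n.
                return (base_cost[n] + history[n]) * (1.0 + pres * usage[n])

            routes[name] = route_net(graph, cost, src, dst)
            for n in routes[name]:
                usage[n] += 1
        shared = [n for n in usage if usage[n] > 1]
        if not shared:
            return routes                            # legal routing found
        for n in shared:
            history[n] += hist_fac * (usage[n] - 1)
    raise RuntimeError("no legal routing found within max_iters")


# Both nets prefer the cheap node "a"; each also has a costlier private detour.
graph = {"s1": ["a", "b"], "s2": ["a", "c"], "a": ["t1", "t2"],
         "b": ["t1"], "c": ["t2"]}
base_cost = {"s1": 1, "s2": 1, "a": 1, "b": 2, "c": 2, "t1": 1, "t2": 1}
nets = {"n1": ("s1", "t1"), "n2": ("s2", "t2")}
routes = pathfinder(graph, nets, base_cost)
```

In this example both nets use node a in the first iteration; once a has accumulated history and present-sharing cost, net n1 moves to its detour through b and the routing becomes legal.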

An important measure of the routing quality produced by an FPGA routing algorithm is the critical path delay. The critical path delay of a routed netlist is the maximum delay of any combinational path in the netlist. The maximum frequency at which a netlist can be clocked is inversely related to the critical path delay. Thus, larger critical path delays slow down the operation of the netlist. Delay information is incorporated into *Pathfinder* by redefining the cost of using a resource *n* (Eq. 2.4).

$$c_n = A_{ij} \times d_n + (1 - A_{ij}) \times (b_n + h_n) \times p_n \tag{2.4}$$

In Eq. 2.4, the term $(b_n + h_n) \times p_n$ is the congestion cost from Eq. 2.3, $d_n$ is the delay incurred in using the resource, and $A_{ij}$ is the criticality given by Eq. 2.5.

$$A_{ij} = \frac{D_{ij}}{D_{max}} \tag{2.5}$$

Here $D_{ij}$ is the maximum delay of any combinational path going through the source and sink terminals of the net being routed, and $D_{max}$ is the critical path delay of the netlist. Equation 2.4 is formulated as a sum of two cost terms. The first term in the equation represents the delay cost of using resource n, while the second term represents the congestion cost. When a net is routed, the value of $A_{ij}$ determines whether the delay or the congestion cost of a resource dominates. If a net is near critical (i.e., its $A_{ij}$ is close to 1), then congestion is largely ignored and the cost of using a resource is primarily determined by the delay term. If the criticality of a net is low, the congestion term in Eq. 2.4 dominates, and the route found for the net avoids congestion while potentially incurring delay.
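The trade-off expressed by Eqs. 2.4 and 2.5 can be checked numerically with the short Python sketch below. The function names and the numerical values are hypothetical and serve only to show how the criticality shifts the weight between the delay term and the congestion term.

```python
def criticality(d_ij, d_max):
    """Eq. 2.5: A_ij = D_ij / D_max."""
    return d_ij / d_max


def timing_driven_cost(a_ij, d_n, b_n, h_n, p_n):
    """Eq. 2.4: delay cost and congestion cost weighted by the criticality."""
    return a_ij * d_n + (1.0 - a_ij) * (b_n + h_n) * p_n


# The same congested resource (d_n = 2, b_n = 1, h_n = 1, p_n = 3) seen by
# a near-critical connection and by a non-critical one.
critical_cost = timing_driven_cost(criticality(9.0, 10.0), 2.0, 1.0, 1.0, 3.0)
relaxed_cost = timing_driven_cost(criticality(1.0, 10.0), 2.0, 1.0, 1.0, 3.0)
```

For the near-critical connection the resource costs about 2.4, dominated by its delay, whereas for the non-critical connection it costs about 5.6, dominated by congestion, so the latter is more likely to be routed elsewhere.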

Pathfinder has proved to be one of the most powerful FPGA routing algorithms to date. The negotiation-based framework that trades off delay for congestion is an extremely effective technique for routing signals on FPGAs. More importantly, Pathfinder is a truly architecture-adaptive routing algorithm. The algorithm operates on a directed graph abstraction of an FPGA's routing structure, and can thus be used to route netlists on any FPGA that can be represented as a directed routing graph.

### 2.5.6 Timing Analysis

Timing analysis [99] is used for two basic purposes:

- To determine the speed of circuits which have been completely placed and routed.
- To estimate the slack [68] of each source-sink connection during routing (and during placement and other parts of the CAD flow) in order to decide which connections must be made via fast paths to avoid slowing down the circuit.

First, the circuit under consideration is represented as a directed graph. Nodes in the graph represent input and output pins of circuit elements such as LUTs, registers, and I/O pads. Connections between these nodes are modeled with edges in the graph. Edges are also added between the inputs of combinational logic blocks (LUTs) and their outputs. All edges are annotated with a delay corresponding to the physical delay between the nodes. Register input pins are not joined to register output pins. To determine the delay of the circuit, a breadth-first traversal is performed on the graph starting at the sources (input pads and register outputs). The arrival time, $T_{arrival}$, at each node in the circuit is then computed with the following equation:

$$T_{arrival}(i) = \max_{j \in fanin(i)} \{T_{arrival}(j) + delay(j, i)\}$$

where node i is the node currently being computed, and delay(j, i) is the delay value of the edge joining node j to node i. The delay of the circuit, $D_{max}$, is then the maximum arrival time over all nodes in the circuit.

To guide a placement or routing algorithm, it is useful to know how much delay may be added to a connection before the path that the connection is on becomes critical. The amount of delay that may be added to a connection before it becomes critical is called the slack of that connection. To compute the slack of a connection, one must compute the required arrival time, $T_{required}$, at every node in the circuit. We first set $T_{required}$ at all sinks (output pads and register inputs) to $D_{max}$. The required arrival time is then propagated backwards, starting from the sinks, with the following equation:

$$T_{required}(i) = \min_{j \in fanout(i)} \{T_{required}(j) - delay(i, j)\}$$

Finally, the slack of a connection (i, j) driving node j is defined as:

$$Slack(i, j) = T_{required}(j) - T_{arrival}(i) - delay(i, j)$$
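The three equations above can be combined into a compact Python sketch. The circuit is given as a dictionary that maps each edge (driver, sink) to its delay; nodes are processed in topological order so that all fan-ins of a node are known before the node itself is evaluated. The function name `analyze_timing` and the example circuit are hypothetical.

```python
from collections import defaultdict


def analyze_timing(edges):
    """edges maps (driver, sink) -> delay on a DAG.

    Returns the arrival times, required times, connection slacks and D_max.
    """
    fanout, fanin, nodes = defaultdict(list), defaultdict(list), set()
    for (i, j) in edges:
        fanout[i].append(j)
        fanin[j].append(i)
        nodes.update((i, j))
    # Topological order (Kahn's algorithm): sources first.
    pending = {n: len(fanin[n]) for n in nodes}
    order = [n for n in sorted(nodes) if pending[n] == 0]
    for n in order:                       # the list grows while we iterate
        for j in fanout[n]:
            pending[j] -= 1
            if pending[j] == 0:
                order.append(j)
    arrival = {}
    for n in order:                       # forward pass: arrival times
        arrival[n] = max((arrival[j] + edges[(j, n)] for j in fanin[n]),
                         default=0.0)
    d_max = max(arrival.values())
    required = {}
    for n in reversed(order):             # backward pass: required times
        required[n] = min((required[j] - edges[(n, j)] for j in fanout[n]),
                          default=d_max)
    slack = {(i, j): required[j] - arrival[i] - d for (i, j), d in edges.items()}
    return arrival, required, slack, d_max


# Hypothetical circuit: two inputs feed a LUT "g", which drives an output pad.
edges = {("in1", "g"): 1.0, ("in2", "g"): 2.0,
         ("g", "out"): 3.0, ("in1", "out"): 1.0}
arrival, required, slack, d_max = analyze_timing(edges)
```

Connections with zero slack, here in2 to g and g to out, lie on the critical path, while the direct connection from in1 to out could absorb 4 units of additional delay.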

### 2.5.7 Bitstream Generation

Once a netlist is placed and routed on an FPGA, bitstream information is generated for the netlist. This bitstream is programmed on the FPGA using a bitstream loader. The bitstream of a netlist specifies which SRAM bits of the FPGA must be programmed to 0 and which to 1. The bitstream generator reads the technology mapping, packing and placement information to program the SRAM bits of the Look-Up Tables. The routing information of a netlist is used to correctly program the SRAM bits of the connection boxes and switch boxes.

## 2.6 Research Trends in Reconfigurable Architectures

So far, this chapter has presented a detailed overview of the logic architecture, routing architecture and software flow of FPGAs. In this section, we highlight some of the disadvantages associated with FPGAs and describe some of the trends that are currently being followed to remedy these disadvantages. FPGA-based products are very effective for low to medium volume production, as they are easy to program and debug, and have lower NRE cost and faster time-to-market. All these major advantages of an FPGA come from its reconfigurability, which makes it general purpose and field programmable. However, the very same reconfigurability is the major cause of its disadvantages, making an FPGA larger, slower and more power consuming than an ASIC.

However, the continued scaling of CMOS and increased integration have resulted in a number of alternative architectures for FPGAs. These architectures aim mainly to improve the area, performance and power consumption of FPGA architectures. Some of these propositions are discussed in this section.

### 2.6.1 Heterogeneous FPGA Architectures

The use of hard-blocks in FPGAs improves their logic density, performance and power consumption. There can be different types of hard-blocks, such as multipliers, adders, memories, floating-point units and DSP blocks. In this regard, [19] incorporated embedded floating-point units in FPGAs, and [30] developed a virtual embedded block methodology to model arbitrary embedded blocks on existing commercial FPGAs. This section presents some of the academic and commercial architectures that make use of hard-blocks to improve the overall efficiency of FPGAs.

### 2.6.1.1 Versatile Packing, Placement and Routing (VPR)

Versatile Packing, Placement and Routing for FPGAs (commonly known as VPR) [14, 22, 120] is the most widely used academic mesh-based FPGA exploration environment. It allows mesh-based FPGA architectures to be explored using an empirical approach: benchmark circuits are mapped, placed and routed on a desired FPGA architecture, and the area and delay of the FPGA are then measured to decide the best architectural parameters. The CAD tools in VPR are highly optimized to ensure high quality results.

Earlier versions of VPR supported only homogeneous architectures [120]. However, the latest version of VPR, known as VPR 5.0 [81], supports hard-blocks (such as multiplier and memory blocks) and single-driver routing wires. Hard-blocks are restricted to a column that is one grid unit wide, and that column can be composed of only one type of block. The height of a hard-block is quantized and must be an integral multiple of grid units. If the height of the FPGA is not divisible by the block height, some grid locations are left empty. Figure 2.25 illustrates a heterogeneous FPGA with 8 different kinds of blocks.
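The column restriction can be illustrated with a small calculation. The sketch below assumes a column that is `fpga_height` grid units tall and a hard-block that is `block_height` grid units tall (both names are hypothetical) and reports how many blocks fit and how many grid locations stay empty. It is only an arithmetic illustration, not VPR's actual placement code.

```python
def column_fill(fpga_height, block_height):
    """Return (hard-blocks that fit, grid locations left empty) for one column."""
    if block_height <= 0 or fpga_height < 0:
        raise ValueError("heights must be positive grid-unit counts")
    blocks, empty = divmod(fpga_height, block_height)
    return blocks, empty


# Hypothetical example: a column of 10 grid units filled with
# memory blocks that are 4 grid units tall.
print(column_fill(10, 4))  # (2, 2): two blocks fit, two locations stay empty
print(column_fill(12, 4))  # (3, 0): the height divides evenly, nothing is wasted
```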

**Fig. 2.25** A heterogeneous FPGA in VPR 5.0 [81]



### 2.6.1.2 Madeo, a Framework for Exploring Reconfigurable Architectures

Madeo [73] is another academic design suite for the exploration of reconfigurable architectures. It includes a modeling environment that supports multi-grained, heterogeneous architectures with irregular topologies. The Madeo framework first allows an FPGA architecture to be modeled; the architecture characteristics are represented as a common abstract model. Once the architecture is defined, the CAD tools of Madeo are used to map a target netlist on the architecture. Madeo uses the same placement and routing algorithms as VPR [120]. Along with placement and routing algorithms, its design suite also embeds a bitstream generator, a netlist simulator, and a physical layout generator. Madeo supports architectural prospection and very fast FPGA prototyping. Several FPGAs, including some commercial architectures (such as the Xilinx Virtex family) and prospective ones (such as STMicro LPPGA), have been modeled using Madeo. The physical layout is produced as a VHDL description.

### 2.6.1.3 Altera Architecture

Altera's Stratix IV [107] is an example of a commercial architecture that uses a heterogeneous mixture of blocks. Figure 2.26 shows the global architectural layout of Stratix IV. The logic structure of Stratix IV consists of LABs (Logic Array Blocks), memory blocks and digital signal processing (DSP) blocks. LABs are distributed symmetrically in rows and columns and are used to implement general purpose logic. The DSP blocks are used to implement full-precision multipliers of different granularities. The memory blocks and DSP blocks are placed in columns at equal distance from one another. Inputs and outputs (I/Os) are located at the periphery of the architecture.

Fig. 2.26 Stratix IV architectural elements

Logic array blocks (LABs) and adaptive logic modules (ALMs) provide the basic logic capacity of a Stratix IV device. They can be used to configure logic functions, arithmetic functions, and register functions. Each LAB consists of ten ALMs, carry chains, arithmetic chains, LAB control signals, local interconnect, and register chain connection lines. The local interconnect connects the ALMs that are inside the same LAB. The direct link allows a LAB to drive into the local interconnect of its left or right neighboring LAB. The register chain connects the output of an ALM register to the adjacent ALM register in the LAB. A memory LAB (MLAB) is a derivative of the LAB which can be used either just like a simple LAB or as a static random access memory (SRAM). Each ALM in an MLAB can be configured as a $64 \times 1$ or $32 \times 2$ block, resulting in a $64 \times 10$ or $32 \times 20$ simple dual-port SRAM block. MLAB and LAB blocks always coexist as pairs in Stratix IV families.
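The MLAB sizes quoted above follow directly from placing the ten ALMs of a LAB side by side. A short check, using only the figures given in the text (the helper name `mlab_shape` is hypothetical):

```python
ALMS_PER_LAB = 10
ALM_MODES = [(64, 1), (32, 2)]  # (depth, width) of one ALM used as memory


def mlab_shape(depth, width, alms=ALMS_PER_LAB):
    """ALMs sit side by side: the depth is shared and the widths add up."""
    return depth, width * alms


for depth, width in ALM_MODES:
    d, w = mlab_shape(depth, width)
    print(f"{depth}x{width} per ALM -> {d}x{w} MLAB, {d * w} bits")
```

Both modes give 640 bits per MLAB.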

The DSP blocks in Stratix IV are optimized for signal processing applications such as Finite Impulse Response (FIR) filters, Infinite Impulse Response (IIR) filters, Fast Fourier Transform (FFT) functions and encoders. A Stratix IV device has two to seven columns of DSP blocks that can implement different operations like multiplication, multiply-add, multiply-accumulate (MAC) and dynamic arithmetic or logical shift functions. The DSP block supports $9 \times 9$, $12 \times 12$, $18 \times 18$ and $36 \times 36$ multiplication operations. The Stratix IV devices contain three different sizes of embedded SRAMs. The memory sizes include 640-bit memory logic array blocks (MLABs), 9-Kbit M9K blocks, and 144-Kbit M144K blocks. The MLABs have been optimized to implement filter delay lines, small FIFO buffers, and shift registers. M9K blocks can be used for general purpose memory applications, and M144K blocks are generally meant to store code for a processor, packet buffering or video frame buffering.

## 2.6.2 FPGAs to Structured Architectures

The ease of designing and prototyping with FPGAs can be exploited to quickly design a hardware application on an FPGA. Later, improvements in area, speed, power and volume production can be achieved by migrating the application design from the FPGA to other technologies such as Structured-ASICs. In this regard, Altera provides a facility to migrate a Stratix IV based application design to its Structured-ASIC, called HardCopy IV [56]. The main idea is to design, test and even initially ship a design using an FPGA. Later, the application circuit that is mapped on the FPGA can be seamlessly migrated to HardCopy for high volume production. The latest HardCopy IV devices offer pin-to-pin compatibility with the Stratix IV prototype, making them exact replacements for the FPGAs. Thus, the same system board and software developed for prototyping and field trials can be retained, enabling the lowest risk and fastest time-to-market for high-volume production. Moreover, when an application circuit is migrated from a Stratix IV FPGA prototype to HardCopy IV, the core logic performance doubles and power consumption is halved.

The basic logic unit of HardCopy is termed as HCell. It is similar to Stratix IV logic cell (LAB) in the sense that the fabric consists of a regular pattern which is formed by tiling one or more basic cells in a two dimensional array. However, the difference is that HCell has no configuration memory. Different HCell candidates can be used, ranging from fine-grained NAND gates to multiplexors and coarse-grained LUTs. An array of such HCells, and a general purpose routing network which interconnects them is laid down on the lower layers of the chip. Specific layers are then reserved to form via connections or metal lines which are used to customize the generic array into specific functionality. Figure 2.27 illustrates the correspondence between an FPGA and a compatible structured ASIC. There is a one to one layout-level correspondence between MRAMs, phase-lock loops (PLLs), embedded memories, transceivers, and I/O blocks. The soft-logic DSP multipliers and logic cell fabric of the FPGA are re-synthesized to structured ASIC fabric. However, they remain functionally and electrically equivalent in FPGAs and HardCopy ASICs.

Apart from Altera, there are several other companies that provide a solution similar to that of Altera. For example, the eASIC Nextreme [41] uses an FPGA-like design flow to map an application design on SRAM programmable LUTs, which are later interconnected through mask programming of few upper routing layers. Tierlogic [113] is a recently launched FPGA vendor that offers 3D SRAM-based TierFPGA devices for prototyping and early production. The same design solution can be frozen to a TierASIC device with one low-NRE custom mask for error-free transition to an ASIC implementation. The SRAM layer is placed on an upper 3D layer of TierFPGA. Once the TierFPGA design is frozen, the bitstream information is used to create a single custom mask metal layer that will replace the SRAM programming layer.



Fig. 2.27 FPGA/Structured-ASIC (HardCopy) Correspondence [59]

## 2.6.3 Configurable ASIC Cores

Configurable ASIC Core (cASIC) [35] is another example of a reconfigurable device that can implement a limited set of circuits which operate at mutually exclusive times. cASICs are intended as accelerators in domain-specific systems-on-a-chip, and are not designed to replace the entire ASIC-only chip. The host executes software code, whereas compute-intensive sections can be executed on one or more cASICs. To execute the compute-intensive sections, cASICs implement only data-path circuits and thus support full-word blocks only (such as 16-bit wide multipliers, adders, RAMs, etc.). Since the application domain of cASICs is more specific, they are significantly smaller than FPGAs. As hardware resources are shared between different netlists, cASICs are even smaller than the sum of the standard-cell based ASIC areas of the individual circuits.

## 2.6.4 Processors Inside FPGAs

A considerable amount of FPGA area can be saved by incorporating a microprocessor in an FPGA. The microprocessor can execute the less compute-intensive tasks, whereas compute-intensive tasks can be executed on the FPGA fabric. Similarly, a microprocessor-based application can achieve large speed-up gains if an FPGA is attached to it: the FPGA can execute any compute-intensive functionality as a customized hardware instruction. These advantages have compelled commercial FPGA vendors to provide microprocessors in their FPGAs so that a complete system can be programmed on a single chip. A few vendors have integrated a fixed hard processor on their FPGA (like the AVR processor integrated in Atmel FPSLIC [18] or the PowerPC processors embedded in Xilinx Virtex-4 [126]). Others provide soft processor cores which are highly optimized to be mapped on the programmable resources of the FPGA. Altera's Nios [90] and Xilinx's Microblaze [88] are soft processors meant for FPGA designs which allow custom hardware instructions. [96] have shown that considerable area gains can be achieved if these soft processors are optimized for particular applications: unused instructions in a soft processor can be removed and different architectural tradeoffs can be selected to achieve, on average, a 25% area gain for soft processors required for specific applications. Reconfigurable units can also be attached to microprocessors to achieve execution-time speedup in software programs, as done in [28, 70, 104].

## 2.6.5 Application Specific FPGAs

The type of logic blocks and the routing network in an FPGA can be optimized to gain area and performance advantages for a given application domain (control-path-oriented applications, datapath-oriented applications, etc.). These types of FPGAs may include a different variety of desired hard-blocks, the appropriate amount of flexibility required for the given application domain, or bus-based interconnects rather than bit-based interconnects. The authors in [83] have presented a reconfigurable arithmetic array for multimedia applications, which they call CHESS. The principal goal of CHESS was to increase arithmetic computational density, to enhance flexibility, and to increase the bandwidth and capacity of internal memories significantly beyond the capabilities of existing commercial FPGAs. These goals were achieved by proposing an array of ALUs with embedded RAMs where each ALU is 4 bits wide and supports 16 instructions. Similarly, the authors in [42] present a coarse-grained, field programmable architecture for constructing deep computational pipelines. This architecture can efficiently implement applications related to media, signal processing, scientific computing and communications. Further, the authors in [128] have used bus-based routing and logic blocks to improve the density of FPGAs for datapath circuits. This is a partial multi-bit FPGA architecture that is designed to exploit the regularity that most datapath circuits exhibit.

## 2.6.6 Time-Multiplexed FPGAs

Time-multiplexed FPGAs increase the capacity of FPGAs by executing different portions of a circuit in a time-multiplexed mode [89, 114]. An application design is divided into different sub-circuits, and each sub-circuit runs as an individual context of FPGA. The state information of each sub-circuit is saved in context registers before a new context runs on FPGA. Authors in [114] have proposed a time-multiplexed FPGA architecture where a large circuit is divided into sub-circuits and each sub-circuit is sequentially executed on a time-multiplexed FPGA. Such an FPGA stores a set of configuration bits for all contexts. A context is shifted simply by using the SRAM bits dedicated to a particular context. The combinatorial and sequential outputs of a sub-circuit that are required by other sub-circuits are saved in context registers which can be easily accessed by sub-circuits at different times.

Time-multiplexed FPGAs increase their capacity by adding more SRAM bits rather than more CLBs. These FPGAs increase the logic capacity by dynamically reusing the hardware. Only the configuration bits of the currently executing context are active; the configuration bits of the remaining supported contexts are inactive. Intermediate results are saved and then shared with the contexts still to be run. Each context takes one micro-cycle to execute. The sum of the micro-cycles of all the contexts makes one user-cycle. The entire time-multiplexed FPGA, or a smaller portion of it, can be configured to (i) execute a single design, where each context runs a sub-design, (ii) execute multiple designs in time-multiplexed modes, or (iii) execute statically only one single design. Tabula [109] is a recently launched FPGA vendor that provides time-multiplexed FPGAs. It dynamically reconfigures logic, memory, and interconnect at multi-GHz rates with a Spacetime compiler.
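The relation between contexts, context registers, micro-cycles and the user-cycle can be sketched in a few lines of Python. This is a behavioural illustration only: the function `run_user_cycle`, the example sub-circuits and the micro-cycle durations are all hypothetical and are not taken from [114] or from any vendor tool.

```python
def run_user_cycle(contexts, inputs):
    """Run every context once, in order, on the same (reused) hardware.

    contexts: list of (evaluate, micro_cycle) pairs, where evaluate takes
              the context registers (a dict) and returns the values it produces.
    Returns the final context registers and the user-cycle duration,
    which is the sum of all micro-cycles.
    """
    context_registers = dict(inputs)
    user_cycle = 0
    for evaluate, micro_cycle in contexts:
        # Results needed by later contexts are saved in the context registers.
        context_registers.update(evaluate(context_registers))
        user_cycle += micro_cycle
    return context_registers, user_cycle


# Hypothetical design y = (a + b) * c split into two sub-circuits,
# with assumed micro-cycles of 2 and 3 time units.
contexts = [
    (lambda regs: {"s": regs["a"] + regs["b"]}, 2),
    (lambda regs: {"y": regs["s"] * regs["c"]}, 3),
]
regs, user_cycle = run_user_cycle(contexts, {"a": 1, "b": 2, "c": 4})
print(regs["y"], user_cycle)  # 12 5
```

The same hardware evaluates both sub-circuits; what grows with the number of contexts is the stored configuration and the length of the user-cycle.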

## 2.6.7 Asynchronous FPGA Architecture

Another approach that has been proposed to improve the overall performance of FPGA architectures is the use of asynchronous design elements. Conventionally, digital circuits are designed for synchronous operation, and in turn FPGA architectures have focused primarily on implementing synchronous circuits. Asynchronous designs are proposed to improve the energy efficiency of FPGAs, since they offer potentially lower energy: energy is consumed only when necessary. Asynchronous architectures can also simplify the design process, as complex clock distribution networks become unnecessary.

The first asynchronous FPGA was developed by [57]. It consisted of a modified version of a previously developed synchronous FPGA architecture. Its logic block was similar to the conventional logic block, with the added features of fast feedback and a latch that could be used to initialize an asynchronous circuit. Another asynchronous architecture was proposed in [112]. This architecture is designed specifically for dataflow applications. Its logic block is similar to that of a synchronous architecture, but it also contains a split unit, which enables conditional forwarding of data, and a merge unit, which allows conditional selection of data from different sources. An alternative to a fully asynchronous design is the globally asynchronous, locally synchronous (GALS) approach. This approach is used by [69], where the authors have introduced a level of hierarchy into the FPGA architecture. Standard hard or soft synchronous logic blocks are grouped together to form large synchronous blocks, and communication between these blocks is done asynchronously. More recently, the authors in [131] have applied the GALS approach to Network on Chip architectures to improve the performance, energy consumption and yield of future architectures in a synergistic manner.

It is clear that, despite each architecture offering its own benefits, a number of architectural questions remain unresolved for asynchronous FPGAs. Many architectures rely on logic blocks similar to those used for synchronous designs [57, 69] and, therefore, the same architectural issues such as LUT size, cluster size, and routing topology must be investigated. In addition to those questions, asynchronous FPGAs also add the challenge of determining the appropriate synchronization methodology.

## 2.7 Summary and Conclusion

This chapter first gave a brief introduction to the traditional logic and routing architectures of FPGAs. Next, the different steps involved in the FPGA design flow were detailed. Finally, various approaches that have been employed to reduce some of the disadvantages of FPGAs and ASICs, with or without compromising their major benefits, were described. Figure 2.28 presents a rough comparison of the different solutions used to reduce the drawbacks of FPGAs and ASICs. The remaining chapters of this book focus on the exploration of tree-based FPGA architectures using hard-blocks, tree-based Application Specific Inflexible FPGAs (ASIF), and their automatic layout generation methods.

This book presents new environment for the exploration of tree-based heterogeneous FPGAs. This environment is used to explore different architecture techniques for tree-based heterogeneous FPGA architecture. This book also presents an optimized environment for mesh-based heterogeneous FPGA. Further, the environments of two architectures are evaluated through the experimental results that are obtained by mapping a number of heterogeneous benchmarks on the two architectures.

Altera [11] has proposed prototyping, testing, and even shipping the first few designs on an FPGA; later, the FPGA-based design can be migrated to a Structured-ASIC (known as HardCopy). However, migration of an FPGA-based product to a Structured-ASIC supports only a single application design. An ASIF retains this property, and can be a possible future extension for the migration of FPGA-based applications to Structured-ASICs. Thus, when an FPGA-based product is in the final phase of its development cycle, and if the set of circuits to be mapped on the FPGA is known, the FPGA can be reduced to an ASIF for the given set of application designs. This book presents a new tree-based ASIF and a detailed comparison of the tree-based ASIF with the mesh-based ASIF. This book also presents automatic layout generation techniques for domain-specific FPGA and ASIF architectures.

Fig. 2.28 Comparison of different solutions used to reduce ASIC and FPGA drawbacks



http://www.springer.com/978-1-4614-3593-8

Tree-based Heterogeneous FPGA Architectures Application Specific Exploration and Optimization Farooq, U.; Marrakchi, Z.; Mehrez, H.

2012, XVI, 188 p., Hardcover

ISBN: 978-1-4614-3593-8

#### Altera FLEX 8000 Block Diagram



- FLEX 8000 chip contains 26–162 LABs
  - Each LAB contains 8 Logic Elements (LEs), so a chip contains 208–1296 LEs, totaling 2,500–16,000 usable gates
  - LABs arranged in rows and columns, connected by FastTrack Interconnect, with I/O elements (IOEs) at the edges

#### Altera FLEX 8000 Logic Array Block



■ LAB = 8 LEs, plus local interconnect, control signals, carry & cascade chains

#### **Altera FLEX 8000 Logic Element**



■ Each Logic Element (LE) contains:

- 4-input Look-Up Table (LUT)
  - Can produce any function of 4 variables
- Programmable flip-flop
  - Can configure as D, T, JK, SR, or bypass
  - Has clock, clear, and preset signals that can come from dedicated inputs, I/O pins, or other LEs
- Carry chain & cascade chain

# Altera FLEX 8000 Carry Chain (Example: n-bit adder)



- Figure from Altera technical literature
- Carry chain provides very fast (< 1ns) carry-forward between LEs
  - Feeds both LUT and next part of chain
  - Good for high-speed adders & counters
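A behavioural sketch of the n-bit adder example: each LE contributes one sum bit and one carry bit, and the carry is forwarded along the chain to the next LE. The split into a sum function and a carry function mirrors the two 3-input LUTs of the arithmetic mode described in these slides; the function names are hypothetical and timing is not modelled.

```python
def le_arithmetic(a, b, carry_in):
    """One LE: the first 3-input LUT gives the sum, the second gives the carry."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return sum_bit, carry_out


def carry_chain_add(x, y, n):
    """Add two n-bit numbers with n LEs linked by the carry chain."""
    carry = 0
    result = 0
    for i in range(n):
        s, carry = le_arithmetic((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(carry_chain_add(5, 3, 4))   # (8, 0)
print(carry_chain_add(15, 1, 4))  # (0, 1): the carry leaves the last LE
```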

#### Altera FLEX 8000 Cascade Chain



Figure from Altera technical literature

- Cascade chain provides wide fan-in
  - Adjacent LE's LUTs can compute parts of the function in parallel; cascade chain then serially connects intermediate values
  - Can use either a logical AND or a logical OR (using DeMorgan's theorem) to connect outputs of adjacent LEs
  - Each additional LE provides 4 more inputs to the width of the function
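A sketch of the wide fan-in idea for the AND case: each LE's 4-input LUT reduces its own group of inputs, and the cascade chain combines the partial results serially. The function name and the returned LE count are illustrative assumptions; the OR variant (via DeMorgan's theorem) is not modelled.

```python
def cascade_and(inputs, lut_width=4):
    """Wide AND built from 4-input LUTs linked by a cascade chain.

    Returns (result, number of LEs used).
    """
    cascade = 1
    les_used = 0
    for start in range(0, len(inputs), lut_width):
        chunk = inputs[start:start + lut_width]
        lut_out = int(all(chunk))  # each LUT works on its own 4 inputs
        cascade &= lut_out         # cascade chain ANDs in the partial result
        les_used += 1
    return cascade, les_used


print(cascade_and([1] * 16))        # (1, 4): a 16-input AND needs 4 LEs
print(cascade_and([1] * 15 + [0]))  # (0, 4)
```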

#### **Altera FLEX 8000 LE Operating Modes**



- Each mode uses LE resources differently
  - 7 out of 10 inputs (4 data from LAB local interconnect, feedback from register, and carry-in & cascade-in) go to specific destinations to implement the function
  - Remaining 3 provide clock, clear, and preset for register

## Altera FLEX 8000 Operating Modes (cont.)

#### Normal mode

- Used for general logic applications, and wide decoding functions that can benefit from the cascade chain

#### Arithmetic mode

- Provides two 3-input LUTs to implement adders, accumulators, and comparators
  - One LUT provides a 3-bit function
  - Other LUT generates a carry bit

#### Up/down counter mode

- Provides counter enable, synchronous up / down control, and data loading options
- Uses two 3-input LUTs
  - One LUT generates counter data
  - Other LUT generates fast carry bit
  - Use 2-to-1 multiplexer for synchronous data loading, clear and preset for


### Altera FLEX 8000 FastTrack Interconnect



Note:

(1) See Table 4 for the number of row channels.

#### Device-wide rows and columns

- Each LE in LAB drives 2 column (total 16) channels, which connects... that column
- Each LE in LAB drives 1 row channel, which connects to other LABs in that row
  - 3-to-1 muxes connect either LE outputs or column channels to row channels

Fall 2004, Lecture 21

#### Altera FLEX 8000 I/O Elements



- Eight I/O Elements (IOEs) are at the end of each row and column
  - Some restrictions on how many row / column channels each IOE connects to
  - Contains a register that can be used for either input or output
    - Associated I/O pins can be used as either input, output, or bidirectional pins

#### Altera FLEX 8000 Configuration

- Loading the FLEX 8000's SRAM with programming information is called *configuration*, and takes about 100ms
  - After configuration, the device initializes itself (resets its registers, enables its I/O pins, and begins normal operation)
  - Configuration & initialization = command mode, normal operation = user mode
- Six configuration schemes are available:
  - Active serial: the FLEX 8000 gives the configuration EPROM clock signals (not addresses) and keeps receiving new values in sequence
  - Active parallel up, active parallel down: the FLEX 8000 gives the configuration EPROM a sequence of addresses to read data from
  - Passive parallel synchronous, passive parallel asynchronous, passive serial: the FLEX 8000 passively receives data from some host

## Altera FLEX 8000 Block Diagram (Review)



- FLEX 8000 chip contains 26–162 LABs
  - Each LAB contains 8 Logic Elements (LEs), so a chip contains 208–1296 LEs, totaling 2,500–16,000 usable gates
  - LABs arranged in rows and columns, connected by FastTrack Interconnect, with I/O elements (IOEs) at the edges

#### **Altera FLEX 10K Block Diagram**



- FLEX 10K chip contains 72–1520 LABs
  - Each LAB contains 8 Logic Elements (LEs), so a chip contains 576–12,160 LEs, totaling 10,000–250,000 usable gates
- Each chip also contains 3–20 Embedded Array Blocks (EABs), which can provide 6,144–40,960 bits of RAM

#### Altera FLEX 10K Embedded Array Blocks (EABs)

- Each chip contains 3–20 EABs, each of which can be used to implement either logic or memory
- When used to implement logic, an EAB can provide 100 to 600 gate equivalents (in contrast, a LAB provides 96 g.e.'s)
  - Provides a very large LUT
    - Very fast: faster than general logic, since it's only a single level of logic
    - Delay is predictable: each RAM block is not scattered throughout the chip as in some FPGAs
  - Can be used to create complex logic functions such as multipliers (e.g., a 4x4 multiplier with 8 inputs and 8 outputs), microcontrollers, large state machines, and DSPs
  - Each EAB can be used independently, or combined to implement larger functions
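As an illustration of an EAB acting as one very large LUT, the sketch below fills a 256-word, 8-bit table with the products of two 4-bit operands, so a single lookup replaces the multiplier logic. The address layout (operand `a` in the upper four address bits) is an assumption for the example, not Altera's documented mapping.

```python
def build_multiplier_rom():
    """256 x 8 table: address = (a << 4) | b, data = a * b."""
    rom = []
    for address in range(256):
        a = address >> 4
        b = address & 0xF
        rom.append(a * b)  # at most 15 * 15 = 225, which fits in 8 bits
    return rom


def multiply(rom, a, b):
    """A 4x4 multiply is a single table lookup (one level of logic)."""
    return rom[(a << 4) | b]


rom = build_multiplier_rom()
print(len(rom), max(rom))   # 256 225
print(multiply(rom, 7, 9))  # 63
```

The table needs 256 x 8 = 2,048 bits, which is the capacity of one EAB.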

#### Altera FLEX 10K Embedded Array Blocks (cont.)

- Using EABs to implement memory, a chip can have 6K–40K bits of RAM
  - Each EAB provides 2,048 bits of RAM, plus input and output registers
  - Can be used to implement synchronous RAM, ROM, dual-port RAM, or FIFO
  - Each EAB can be configured in the following sizes:
    - 256x8, 512x4, 1024x2, or 2048x1
  - To get larger blocks, combine multiple EABs:
    - Example: combine two 256x8 RAM blocks to form a 256x16 RAM block
    - Example: combine two 512x4 RAM blocks to form a 512x8 RAM block
    - Can even combine all EABs on the chip into one big RAM block
    - Can combine so as to form blocks up to 2048 words without impacting timing
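The sizes above can be checked with a short calculation: every configuration of one EAB holds the same 2,048 bits, and larger memories are built by placing EABs side by side or stacking them. The helper `eabs_needed` is a hypothetical illustration that simply tries each EAB shape; it does not model timing or routing.

```python
EAB_BITS = 2048
EAB_CONFIGS = [(256, 8), (512, 4), (1024, 2), (2048, 1)]  # (words, width)


def eabs_needed(words, width):
    """Fewest EABs for a words x width RAM, trying every EAB shape."""
    best = None
    for depth, w in EAB_CONFIGS:
        # Ceiling division: EABs stacked for depth times EABs side by side for width.
        count = -(-words // depth) * -(-width // w)
        if best is None or count < best:
            best = count
    return best


print(all(depth * w == EAB_BITS for depth, w in EAB_CONFIGS))  # True
print(eabs_needed(256, 16))  # 2: two 256x8 blocks side by side
print(eabs_needed(512, 8))   # 2
print(3 * EAB_BITS, 20 * EAB_BITS)  # 6144 40960 bits for 3 and 20 EABs
```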

#### Altera FLEX 10K Embedded Array Blocks (cont.)



Figure from Altera technical literature

- EAB gets input from a row channel, and can output to up to 2 row channels and 2 column channels
- Input and output buffers are available

#### Altera APEX 20K Overview

#### APEX 20K chip contains:

- 256–3,456 LABs, each of which contains 10 Logic Elements (LEs), so a chip contains 2,560–51,840 LEs, 162,000–2,391,552 usable gates
- 16–216 Embedded System Blocks (ESBs), which together can provide 32,768–442,368 bits of memory
  - Can implement CAM, RAM, dual-port RAM, ROM, and FIFO

#### Organization:

- MultiCore architecture, combining LUT, product-terms, & memory in one structure
  - Designed for "system on a chip"
- MegaLAB structures, each of which contains 16 LABs, one ESB, and a MegaLAB interconnect (for routing within the MegaLAB)
  - ESB provides product terms *or* memory

#### APEX LABs and Interconnect

- Logic Array Block (LAB)
  - 10 LEs
  - Interleaved local interconnect (each LE connects to 2 local interconnects, each local interconnect connects to 10 LEs)
    - Each LE can connect to 29 other LEs through local interconnect
- Logic Element (LE)
  - 4-input LUT, carry chain, cascade chain, same as FLEX devices
  - Synchronous and asynchronous load and clear logic
- Interconnect
  - MegaLAB interconnect between 16 LABs, etc. inside each MegaLAB
  - FastTrack row and column interconnect between MegaLABs

## APEX Embedded System Blocks (ESBs)

- Each ESB can act as a macrocell and provide product terms
  - Each ESB gets 32 inputs from local interconnect, from adjacent LAB or MegaLAB interconnect
  - In this mode, each ESB contains 16 macrocells, and each macrocell contains 2 product terms and a programmable register (parallel expanders also provided)
- Each ESB can also act as a memory block (dual-port RAM, ROM, FIFO, or CAM memory) configured in various sizes
  - Inputs from adjacent local interconnect, which can be driven from MegaLAB or FastTrack interconnect
  - Outputs to MegaLAB and FastTrack, some outputs to local interconnect

### Power consumption: -

The average power consumption in CMOS digital circuits can be classified into three components:

- i) Dynamic (or) switching power consumption
- ii) Short-circuit power consumption
- iii) Leakage power consumption

# i) Switching power dissipation:

This component is dissipated when the output node voltage of a CMOS logic gate makes a logic transition. In digital CMOS circuits, switching power is dissipated when energy is drawn from the power supply to charge up the output node capacitance.

Fig: NOR gate driving two NAND gates through interconnection lines

During this charge up phase, output node voltage typically makes...

During discharging (charge down phase), output voltage drops from

One half of the energy drawn from the power supply is dissipated as heat in the conducting PMOS transistors. No energy is drawn from the power supply during the charge-down phase.

In the figure, a two-input NOR gate drives two NAND gates through interconnection lines. The total capacitive load at the output of the NOR gate consists of

- i) The output node capacitance of the gate itself
- ii) The total interconnect capacitance
- iii) The input capacitances of the driven gates.

The CMOS logic gate used for the switching power calculation is shown in the figure.



Fig:- CMOS logic gate for switching power calculation

Energy is required to charge up the output node to $V_{DD}$ and to charge down the total output load capacitance to ground level.

The average dynamic power consumption in CMOS logic gates is

$$P_{avg} = C_{load} \, V_{DD}^{2} \, f_{CLK}$$

We introduce $\alpha_T$ (the node transition factor), which is the effective number of power-consuming voltage transitions experienced per clock cycle.

Then, the average switching power consumption becomes

$$P_{avg} = \alpha_T \, C_{load} \, V_{DD}^{2} \, f_{CLK}$$

Consider a 2-input NOR gate: switching of an internal node results in dynamic power dissipation even if the output node voltage remains unchanged.



The Dynamic Power consumption can be reduced by

- i) Reduction of Power Supply Voltage VDD
- ii) Reduction of switching probability
- iii) Reduction of load capacitance
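The three reduction measures map directly onto the factors of the switching term $\alpha_T C_{load} V_{DD}^{2} f_{CLK}$ that appears in the total power expression of these notes. A small numerical sketch follows; all component values are assumed, purely for illustration.

```python
def switching_power(alpha_t, c_load, v_dd, f_clk):
    """Average switching power: alpha_T * C_load * V_DD^2 * f_CLK (watts)."""
    return alpha_t * c_load * v_dd ** 2 * f_clk


# Assumed values: alpha_T = 0.5, C_load = 1 pF, f_CLK = 100 MHz.
p_5v = switching_power(0.5, 1e-12, 5.0, 100e6)
p_3v3 = switching_power(0.5, 1e-12, 3.3, 100e6)
print(p_5v)          # about 1.25e-3 W
print(p_3v3 / p_5v)  # about 0.44: power falls with the square of V_DD

# Halving alpha_T or C_load only halves the power.
print(switching_power(0.25, 1e-12, 5.0, 100e6) / p_5v)  # 0.5
```

Because the supply voltage enters squared, lowering $V_{DD}$ is the most effective of the three measures, while the switching probability and the load capacitance act linearly.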

# ii) Short-circuit power dissipation:

The current component which passes through both the NMOS and PMOS devices during switching does not contribute to the charging of the capacitances in the circuit; hence it is called the "short-circuit current component".



Fig: - short circuit current during switching

Consider a symmetric CMOS inverter with $k_n = k_p = k$ and $V_{T,n} = |V_{T,p}| = V_T$, and with a very small capacitive load.

The NMOS transistor in the circuit starts conducting when the input voltage exceeds the threshold voltage $V_{T,n}$. The PMOS transistor remains on until the input reaches the voltage level $(V_{DD} - |V_{T,p}|)$.

If the inverter is driven with an input voltage waveform with equal rise and fall times ($\tau_{rise} = \tau_{fall} = \tau$), the average short-circuit current drawn from the power supply is

$$I_{avg}(\text{short-circuit}) = \frac{1}{12} \cdot \frac{k \, \tau \, f_{CLK}}{V_{DD}} \left( V_{DD} - 2 V_T \right)^{3}$$

Note that short-circuit power dissipation is linearly proportional to the input signal rise and fall times and also to the transconductance of the transistors.

The input and output voltage waveforms are shown in the figure.

Fig:- Short-circuit current as a function of input rise time/fall time

# iii) Leakage power dissipation:

The NMOS and PMOS transistors used in a CMOS logic gate have nonzero reverse leakage and subthreshold currents.


In a CMOS VLSI chip containing a very large number of transistors, these currents contribute to the overall power dissipation.

The magnitude of the leakage current is determined by two components:

- i) Reverse leakage current
- ii) Subthreshold leakage current

# i) Reverse leakage current in a CMOS inverter:

The reverse diode leakage occurs when the p-n junction between the drain and the bulk of the transistor is reverse biased.



Fig:- Reverse leakage current paths in a CMOS inverter with high input voltage

High input voltage:

When a high input voltage (logic 1) is applied, the NMOS is ON and the output node voltage is discharged to zero. The PMOS is OFF, and the reverse potential difference of $V_{DD}$ between its drain and the n-well causes a diode leakage current through the drain junction. The n-well region of the PMOS transistor is also reverse biased with $V_{DD}$ with respect to the p-substrate.

With a low input voltage, the PMOS is ON and the output node voltage is charged to $V_{DD}$. The reverse potential difference between the NMOS drain region and the p-type substrate causes a reverse leakage current, which is drawn from the power supply.

The reverse leakage current of a p-n junction is

$$I_{reverse} = A \, J_S \left( e^{\frac{q V_{bias}}{kT}} - 1 \right)$$

where

$V_{bias}$ -> Reverse bias voltage

$J_S$ -> Reverse saturation current density

$A$ -> Area
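A numerical sketch of the junction leakage expression, assuming the standard diode form with exponent $q V_{bias} / kT$ and the convention that $V_{bias}$ is negative for a reverse-biased junction. The area, current density and bias values in the example are assumed, for illustration only.

```python
import math

Q = 1.602176634e-19  # electron charge (C)
K_B = 1.380649e-23   # Boltzmann constant (J/K)


def reverse_leakage(area, j_s, v_bias, temperature=300.0):
    """I = A * J_S * (exp(q * V_bias / (k * T)) - 1).

    v_bias is negative for a reverse-biased junction.
    """
    return area * j_s * (math.exp(Q * v_bias / (K_B * temperature)) - 1.0)


# Assumed values: A = 2 (arbitrary area units), J_S = 3 (current per unit area).
print(reverse_leakage(2.0, 3.0, 0.0))   # 0.0: no current at zero bias
print(reverse_leakage(2.0, 3.0, -5.0))  # about -6.0: saturates at -A * J_S
```

For any reverse bias of more than a few times $kT/q$ (about 26 mV at room temperature) the exponential term is negligible, so the leakage magnitude settles at $A \cdot J_S$ and is essentially independent of the bias voltage.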

# ii) sub threshold leakage current:-

Another component of the leakage current is the subthreshold current, which is due to carrier diffusion between the source and drain regions of the transistor in weak inversion.

The behaviour of a MOS transistor in the subthreshold operating region is similar to that of a bipolar device.

Subthreshold leakage current can occur even if there is no switching activity in the circuit.



Fig:- Subthreshold leakage current path in a CMOS inverter with high input voltage

The total power dissipation in CMOS digital circuits is

$$P_{total} = \alpha_T \, C_{load} \, V_{DD}^2 \, f_{CLK} + V_{DD} \left( I_{\text{short-circuit}} + I_{\text{leakage}} + I_{\text{static}} \right)$$
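The total power expression can be evaluated directly. The sketch below takes the node transition factor, load capacitance, supply voltage and clock frequency for the switching term, plus the three current components for the remaining term. All numerical values are assumed for illustration only.

```python
def total_power(alpha_t, c_load, vdd, f_clk, i_short=0.0, i_leak=0.0, i_static=0.0):
    """P_total = alpha_T*C_load*VDD^2*f_CLK + VDD*(I_short + I_leak + I_static)."""
    switching = alpha_t * c_load * vdd ** 2 * f_clk
    other = vdd * (i_short + i_leak + i_static)
    return switching + other

# Assumed example: alpha_T = 0.25, C_load = 50 fF, f_CLK = 100 MHz,
# with the non-switching currents neglected.
p_5v = total_power(0.25, 50e-15, 5.0, 100e6)
p_2v5 = total_power(0.25, 50e-15, 2.5, 100e6)
print(p_5v, p_2v5, p_5v / p_2v5)
```

Halving the supply voltage cuts the switching term by a factor of four, which is the quadratic dependence that motivates voltage scaling.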

Low Power design through voltage scaling:-

The average switching power is $P_{avg} = \alpha_T \, C_{load} \, V_{DD}^2 \, f_{CLK}$

The Average switching power dissipation is proportional to the square of the power supply voltage, hence reduction of VDD will reduce power dissipation.

If the power supply voltage is scaled down while all other parameters are kept constant, the propagation delay time increases.

The normalized variation of delay as a function of VDD is shown in the figure, where the threshold voltages of the nMOS and pMOS transistors are VTn = 0.8 V and VTp = -0.8 V, respectively.



Fig:- Normalized delay and normalized power dissipation versus power supply voltage (VDD)

For example, reducing the threshold voltage from 0.8 V to 0.2 V can improve the delay at VDD = 2 V by a factor of 2.



Fig:- Variation of normalized propagation delay of a CMOS inverter, as a function of VDD and threshold voltage (VT)

The influence of threshold voltage reduction upon propagation delay is especially pronounced at low power supply voltages, for VDD < 2 V.
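The dependence of delay on threshold voltage can be checked with a first-order estimate. The sketch below assumes that the delay is proportional to VDD / (VDD - VT)^2; this long-channel approximation is an assumption of the example and is not taken from these notes.

```python
def relative_delay(vdd, vt):
    """First-order delay estimate, proportional to VDD / (VDD - VT)^2.

    The long-channel model used here is an assumption of this sketch.
    """
    if vdd <= vt:
        raise ValueError("VDD must exceed VT for the gate to switch")
    return vdd / (vdd - vt) ** 2

for vdd in (5.0, 3.0, 2.0, 1.5):
    speedup = relative_delay(vdd, 0.8) / relative_delay(vdd, 0.2)
    print(f"VDD = {vdd:.1f} V: delay(VT=0.8) / delay(VT=0.2) = {speedup:.2f}")
```

Under this model the ratio is 2.25 at VDD = 2 V, in line with the factor of about 2 quoted for that case, and the ratio grows as VDD is lowered further.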

It should be noted, however, that using low-VT transistors raises significant concerns about noise margins and subthreshold conduction.

Smaller threshold voltages lead to smaller noise margins for the CMOS logic gates.

For threshold voltages smaller than 0.2 V, leakage power dissipation due to subthreshold conduction may become a very significant component of the overall power consumption.

Two circuit design techniques are used to overcome the leakage and high power dissipation associated with low-VT circuits. The techniques are

- i) Variable-threshold CMOS (VTCMOS) circuits
- ii) Multiple-threshold CMOS (MTCMOS) circuits

### i) Variable-threshold CMOS circuits

Using a low supply voltage (VDD) and a low threshold voltage (VT) in CMOS logic circuits is an efficient method for reducing overall power dissipation while maintaining high-speed performance.

In conventional CMOS logic circuits, the substrate terminals of all nMOS transistors are connected to ground, while the substrate terminals of all pMOS transistors are connected to VDD.

In the VTCMOS circuit technique, the transistors are designed with a low threshold voltage, and the substrate bias voltages of the nMOS and pMOS transistors are generated by a variable-threshold substrate bias control circuit, as shown in the figure.




Fig:- Variable-threshold CMOS (VTCMOS) inverter circuit

In active mode, the substrate bias voltage of the nMOS transistor is VBn = 0 and that of the pMOS transistor is VBp = VDD. Thus the inverter transistors do not experience any body effect (back-gate bias effect).

In stand-by mode, the substrate bias control circuit generates a lower substrate bias voltage for the nMOS transistor and a higher substrate bias voltage for the pMOS transistor. The magnitudes of the threshold voltages V<sub>Tn</sub> and V<sub>Tp</sub> both increase in stand-by mode due to the back-gate bias effect.

The VTCMOS circuit can also be used to control the threshold voltages of the transistors in order to reduce leakage currents.

The block diagram of a low-power chip with a low internal supply voltage VDDL and threshold voltage control is shown in the figure below.
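The increase in threshold voltage caused by a back-gate bias can be estimated with the standard body-effect expression VT = VT0 + gamma * (sqrt(2*phi_F + V_SB) - sqrt(2*phi_F)). This expression is not stated in these notes, and the body-effect coefficient, surface potential term and bias value used below are assumed for illustration only.

```python
import math

def threshold_with_body_bias(vt0, gamma, two_phi_f, v_sb):
    """nMOS threshold voltage under a source-to-substrate bias V_SB.

    VT = VT0 + gamma * (sqrt(2*phi_F + V_SB) - sqrt(2*phi_F))
    """
    return vt0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))

# Assumed values: VT0 = 0.2 V, gamma = 0.4 V^0.5, 2*phi_F = 0.6 V.
vt_active = threshold_with_body_bias(0.2, 0.4, 0.6, v_sb=0.0)   # active mode
vt_standby = threshold_with_body_bias(0.2, 0.4, 0.6, v_sb=2.0)  # stand-by mode
print(f"active: {vt_active:.3f} V, stand-by: {vt_standby:.3f} V")
```

With these assumed numbers the stand-by threshold rises well above its active-mode value, which is the mechanism that suppresses subthreshold leakage.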



Fig: - Block diagram of Low-Power chip

- I/O circuits of the chip usually operate with a higher external supply voltage, in order to increase noise margins and to enable communication with peripheral devices.
- The internal circuitry operates at the lower internal supply voltage VDDL.
- Two signal level converters are used to reduce the voltage swing of the incoming input signals and to increase the voltage swing of the outgoing output signals.
- The internal low-voltage circuitry can be designed using VTCMOS techniques, where the threshold voltage control unit adjusts the substrate bias in order to suppress leakage currents.

### ii) Multiple-threshold CMOS circuits

In this technique, leakage control in stand-by mode is based on using two different types of transistors (both nMOS and pMOS) with two different threshold voltages:

- i) Low-VT transistors are used to design the logic gates, where switching speed is essential.
- ii) High-VT transistors are used to prevent leakage dissipation.



Fig:- Structure of an MTCMOS logic gate (high-VT transistors prevent subthreshold leakage in stand-by mode; low-VT logic gives high-speed operation with low power consumption)

In active mode, the high-V<sub>T</sub> transistors are turned on, and the logic gates consisting of low-V<sub>T</sub> transistors operate with low switching power dissipation and small propagation delay.

In stand-by mode, the high-VT transistors are turned off, and the conduction paths for any subthreshold leakage current that may originate from the internal low-VT circuitry are effectively cut off.

# Estimation and optimization of switching activity:

The node transition factor $\alpha_T$ is the effective number of power-consuming voltage transitions experienced by the output capacitance per clock cycle. This parameter is also called the "switching activity factor".

Consider the signal probabilities $P_0$ and $P_1$.

$P_0$ -> Probability of having logic '0' at the output

$P_1$ -> Probability of having logic '1' at the output

The probability that a power-consuming (0 to 1) transition occurs at the output node is the product of these two output signal probabilities:

$$P_{(0\rightarrow 1)} = P_0 \cdot P_1$$

For example, consider a static CMOS 2-input NOR gate.

The probability that the output is '1' is $P_1 = \frac{1}{4}$

The probability that the output is '0' is $P_0 = \frac{3}{4}$

| A | B | Z = $\overline{A+B}$ |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 0 |

The probability of a power-consuming transition at the output node is

$$P_{0\rightarrow 1} = P_{0} \cdot P_{1} = \frac{3}{4} \times \frac{1}{4} = \frac{3}{16}$$

The state transition diagram is shown in the figure.



Fig:- State transition diagram for a 2-input NOR gate

Generally, a CMOS logic gate with n input variables has $2^n$ possible input combinations.

$n_0$ -> Number of input combinations for which the output is '0'

$n_1$ -> Number of input combinations for which the output is '1'

Then

$$P_0 = \frac{n_0}{2^n}, \quad P_1 = \frac{n_1}{2^n}$$

The probability that the output node voltage makes a transition from '0' to '1' is

$$P_{0 \to 1} = P_0 \cdot P_1 = \frac{n_0}{2^n} \cdot \frac{n_1}{2^n} = \left(\frac{n_0}{2^n}\right) \cdot \left(\frac{2^n - n_0}{2^n}\right)$$

since $n_1 = 2^n - n_0$.
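The counting argument can be checked by enumerating a gate's truth table. The following Python sketch assumes equally likely, independent inputs and counts the rows with output 0 and output 1. The helper name `transition_probability` and the lambda gate definitions are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

def transition_probability(gate, n_inputs):
    """Return P(0->1) = P0 * P1 for equally likely, independent inputs."""
    outputs = [gate(*bits) for bits in product((0, 1), repeat=n_inputs)]
    total = len(outputs)         # number of truth-table rows, 2**n_inputs
    n1 = sum(outputs)            # rows with output 1
    n0 = total - n1              # rows with output 0
    return Fraction(n0, total) * Fraction(n1, total)

nor2 = lambda a, b: int(not (a or b))
nand2 = lambda a, b: int(not (a and b))
xor2 = lambda a, b: a ^ b

print(transition_probability(nor2, 2))   # 3/16
print(transition_probability(nand2, 2))  # 3/16
print(transition_probability(xor2, 2))   # 1/4
```

The NOR result reproduces the value of 3/16 obtained from the truth table.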

The output transition probabilities of different logic gates, as a function of the number of inputs, are shown in the figure.



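Consider a 2-input NOR gate with independent inputs.

$P_{1A}$ -> Probability of having logic '1' at input 'A'

$P_{1B}$ -> Probability of having logic '1' at input 'B'

The probability of logic '1' at the output node is

$$P_1 = (1 - P_{1A})(1 - P_{1B})$$

The probability of a power-consuming output transition is

$$P_{0 \to 1} = P_0 \cdot P_1 = \left(1 - (1 - P_{1A})(1 - P_{1B})\right) \cdot (1 - P_{1A})(1 - P_{1B})$$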

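Similarly, for a 2-input NAND gate with input probabilities $P_A$ and $P_B$ of logic '1':

Probability of logic '0' at the output is $P_0 = P_A P_B$

Probability of logic '1' at the output is $P_1 = 1 - P_0 = 1 - (P_A P_B)$

The probability of a power-consuming output transition is

$$P_{0 \to 1} = P_0 \cdot P_1 = P_A P_B \left(1 - P_A P_B\right)$$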
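When the two inputs are not equally likely to be '1', the transition probability follows from the individual input probabilities. The sketch below assumes statistically independent inputs; the function names are illustrative.

```python
def nor2_transition(p_a, p_b):
    """P(0->1) of a 2-input NOR for independent inputs with P(A=1)=p_a, P(B=1)=p_b."""
    p1 = (1.0 - p_a) * (1.0 - p_b)   # output is 1 only when both inputs are 0
    return (1.0 - p1) * p1

def nand2_transition(p_a, p_b):
    """P(0->1) of a 2-input NAND under the same independence assumption."""
    p0 = p_a * p_b                   # output is 0 only when both inputs are 1
    return p0 * (1.0 - p0)

for p in (0.1, 0.5, 0.9):
    print(p, nor2_transition(p, p), nand2_transition(p, p))
```

With both input probabilities equal to 0.5, both functions reduce to the uniform-input value of 3/16.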

Reduction of switching activity:

Switching activity in CMOS digital integrated circuits can be reduced by algorithmic optimization, by architecture optimization, by proper choice of logic topology and by circuit-level optimization.

Algorithmic optimization depends on the characteristics of the data, such as dynamic range, correlation and statistics of data transmission. Typical measures are:

- i) Use Gray codes instead of binary codes, to reduce the number of transitions.
- ii) Use sign-magnitude representation instead of 2's complement representation.
- -> An architecture-level measure to reduce switching activity is based on delay balancing and reduction of glitches.
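The benefit of Gray coding can be seen by counting bit flips over a full counting cycle. The sketch below compares a plain binary count with the binary-reflected Gray code for a 4-bit counter; the counter width is an arbitrary choice for the example.

```python
def bit_flips(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def to_gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)

def count_transitions(codes):
    """Total bit flips over one full cycle of the sequence, including wrap-around."""
    return sum(bit_flips(codes[k], codes[(k + 1) % len(codes)])
               for k in range(len(codes)))

n_bits = 4
binary_seq = list(range(2 ** n_bits))
gray_seq = [to_gray(i) for i in binary_seq]
binary_total = count_transitions(binary_seq)
gray_total = count_transitions(gray_seq)
print(binary_total, gray_total)  # 30 16
```

The Gray sequence changes exactly one bit per step, so a 16-step cycle costs 16 transitions, compared with 30 for the binary count.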

### Glitch reduction:-

In multilevel logic circuits, the propagation delay from one logic block to the next can cause spurious signal transitions (glitches) as a result of critical races or dynamic hazards.

Generally, if all input signals of a gate change simultaneously, no glitching occurs. A glitch (or dynamic hazard) occurs if the input signals change at different times.

In some cases, the signal glitches are only partial, i.e. the node voltage does not make a full transition between the ground and VDD levels.



Fig:- Signal glitching in multi-level static cmos circuits

Glitches occur primarily due to a mismatch (or imbalance) in the path lengths in logic networks.

Consider a 4-input XOR gate structure. Assume that all XOR gates have the same delay and that the four input signals arrive at the same time.



The network in fig (a) suffers from glitching because the input arrival times at its gates are different.

In fig (b), all inputs of each gate arrive at the same time, so no glitching occurs.

The structure in fig (b) also results in a smaller propagation delay.
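The difference between the two structures can be illustrated by tracking signal arrival times. The sketch below assumes that fig (a) is a cascaded chain of XOR gates and fig (b) a balanced tree, with a unit delay per gate and all primary inputs arriving at t = 0. These structural assumptions are made for the example, since the figures themselves are not reproduced here.

```python
def chain_arrivals(n_inputs, gate_delay=1):
    """Arrival-time pairs at each XOR of a cascaded chain."""
    pairs = []
    t_prev = 0                      # arrival time of the running partial result
    for _ in range(n_inputs - 1):
        pairs.append((t_prev, 0))   # partial result vs. a fresh primary input
        t_prev = t_prev + gate_delay
    return pairs, t_prev

def tree_arrivals(n_inputs, gate_delay=1):
    """Arrival-time pairs at each XOR of a balanced tree (n_inputs a power of two)."""
    pairs = []
    level = [0] * n_inputs
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level), 2):
            pairs.append((level[k], level[k + 1]))
            nxt.append(max(level[k], level[k + 1]) + gate_delay)
        level = nxt
    return pairs, level[0]

def mismatched(pairs):
    """Gates whose two inputs arrive at different times (candidates for glitching)."""
    return sum(1 for a, b in pairs if a != b)

chain_pairs, chain_delay = chain_arrivals(4)
tree_pairs, tree_delay = tree_arrivals(4)
print(mismatched(chain_pairs), chain_delay)  # 2 3
print(mismatched(tree_pairs), tree_delay)    # 0 2
```

In the chain, two of the three gates see inputs with unequal arrival times and the total delay is three gate delays. In the tree, every gate sees matched arrivals and the delay drops to two gate delays.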

# Reduction of Switched capacitance:

Switched capacitance plays a significant role in the dynamic power dissipation of the circuit. Hence reduction of parasitic capacitance is a major goal for low-power design of digital integrated circuits.

Interconnect design (or) system level measures:

At the system level, one approach to reducing the switched capacitance is to limit the use of shared resources.

A simple example is the use of a global bus structure for data transmission between a large number of operational modules.



Fig:- (a) Global bus structure (b) Smaller dedicated buses

From fig (a), this structure results in a large bus capacitance due to i) the large number of drivers and receivers sharing the same transmission medium, and ii) the parasitic capacitance of the long bus line. Obviously, driving the large bus capacitance will require a significant amount of power consumption during each bus access.

Alternatively, the global bus structure can be partitioned into a number of smaller dedicated buses to handle data transmission between neighbouring modules, as shown in fig (b).

In this case, the switched capacitance during each bus access is significantly reduced, yet multiple buses may increase the overall routing area on the chip.

# circuit-level measures:-

The physical capacitance is a function of the number of transistors that are required to implement a given function.

For example, one way to reduce the load capacitance is to use transfer gates (pass-transistor logic) instead of conventional CMOS logic gates to implement logic functions.

Pass-gate logic design is attractive since fewer transistors are required for certain functions such as XOR and XNOR. In arithmetic operations where binary adders and multipliers are used, pass-transistor logic offers significant advantages.

# clock design:-

One effective way to reduce the switching activity in CMOS logic circuits is the use of gated clock signals.

The block diagram of an N-bit comparator circuit designed using the gated clock technique is shown in the figure.



Fig:- Block diagram of an N-bit number comparator with gated clock

The circuit compares the magnitudes of two unsigned N-bit binary numbers (A and B) and produces an output to indicate which one is larger.

- → Conventionally, all input bits are first latched into two N-bit registers and then applied to the comparator circuit.
- → In this case, the two N-bit register arrays dissipate power in every clock cycle.
- → Yet, if the most significant bits, A(N-1) and B(N-1), of the two binary numbers are different from each other, the decision can be made by comparing the most significant bits (MSBs) only.
- → The two MSBs are latched in a two-bit register which is driven by the original system clock.
- → At the same time, the two MSBs are applied to an XNOR gate, whose output is used to generate the gated clock signal with an AND gate.
- → If the two MSBs are different (i.e. 01 or 10), the XNOR gate produces logic 0 at its output, disabling the clock signal of the lower-order registers. In this case, a separate MSB comparator circuit is used to decide which one of the two numbers is larger.
- → If the two MSBs are identical (i.e. 00 or 11), the gated clock signal is applied to the lower-order registers and the decision is made by the (N-1)-bit comparator circuit.
- → The amount of power dissipated in the lower-order registers and the (N-1)-bit comparator circuit can be quite significant if the bit length (N) is large.
- → Assuming that the incoming binary numbers are randomly distributed, the gated clock strategy reduces the overall switching power dissipation of the system by approximately 50%, since a large portion of the system is disabled for half of all input combinations.
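The 50% figure can be checked with a small behavioural model of the gated-clock comparator. The sketch below is a software illustration under the assumption of uniformly distributed inputs, not a circuit description; the function name `compare_gated` is hypothetical.

```python
from itertools import product

def compare_gated(a, b, n_bits):
    """Return (result, lower_clocked).

    result is 'A', 'B' or '='; lower_clocked tells whether the lower-order
    registers received a clock edge.
    """
    msb_a = (a >> (n_bits - 1)) & 1
    msb_b = (b >> (n_bits - 1)) & 1
    clock_enable = 1 - (msb_a ^ msb_b)        # XNOR of the two MSBs
    if not clock_enable:                      # MSBs differ: MSB comparator decides
        return ("A" if msb_a else "B"), False
    mask = (1 << (n_bits - 1)) - 1            # lower (N-1) bits
    lo_a, lo_b = a & mask, b & mask
    result = "A" if lo_a > lo_b else "B" if lo_b > lo_a else "="
    return result, True

n_bits = 4
pairs = list(product(range(2 ** n_bits), repeat=2))
clocked = sum(compare_gated(a, b, n_bits)[1] for a, b in pairs)
print(clocked / len(pairs))  # 0.5
```

Enumerating all input pairs shows that the lower-order registers are clocked for exactly half of the combinations, while the comparison result is unchanged.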


Power Grid:

Power distribution system design is an area of increasing concern in the semiconductor industry. According to published data, more than 50% of tape-outs using 0.13 micron technology would fail if the power distribution system were not validated.

Lower operating voltages, increased device integration density and leakage currents, higher operating frequencies and the use of low-power design techniques all tend to stress the power grid as technology evolves.

There are mainly 4 major problems that affect the power distribution system:

- 1. Voltage drop
- 2. Ground bounce
- 3. L di/dt noise
- 4. Electromigration

Voltage drop, also called IR drop, is the voltage reduction that occurs on power supply networks.

Voltage drop also called IR drop, is the voltage reduction that occurs on power supply networks.

The IR drop can be static or dynamic and results from the existence of non-ideal elements: the resistance within the power and ground supply wiring and the capacitance between them. Static voltage drop considers only average currents, while dynamic voltage drop considers current waveforms within clock cycles.

Similar effects may be found in the ground wiring, usually referred to as ground bounce.

Both effects contribute to lower effective operating voltages, which increases the overall response time of a device.

The L di/dt noise is caused by current spikes on wires that induce voltage changes on these wires and on their neighbouring wires, due to inductive coupling.
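A static IR drop estimate follows from Ohm's law applied to each wire segment. The sketch below models a single supply rail fed from one end, with equal segment resistances and one load per node. The rail topology and all numerical values are assumptions made for illustration.

```python
def static_ir_drop(vdd, r_segment, load_currents):
    """Node voltages along a single supply rail fed from one end.

    The segment feeding node k carries the current of every load at or beyond node k.
    """
    voltages = []
    v = vdd
    remaining = sum(load_currents)
    for i_load in load_currents:
        v -= r_segment * remaining   # drop across the segment feeding this node
        voltages.append(v)
        remaining -= i_load
    return voltages

# Assumed example: 1.2 V supply, 0.1 ohm per segment, three 100 mA loads.
node_v = static_ir_drop(1.2, 0.1, [0.1, 0.1, 0.1])
print([round(v, 3) for v in node_v])
```

The node farthest from the supply sees the largest drop, because its current flows through every segment of the rail.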
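As a worked illustration of this mechanism, the voltage induced on a wire of inductance $L$ by a changing current is $v = L \, \frac{di}{dt}$. The numbers below are assumed for illustration only: a supply-path inductance of 1 nH and a current step of 100 mA occurring in 1 ns.

$$v = L \frac{di}{dt} \approx 1\,\text{nH} \times \frac{100\,\text{mA}}{1\,\text{ns}} = 0.1\,\text{V}$$

Even a small inductance therefore produces a noticeable voltage disturbance when the current changes quickly.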

Power grid verification is usually accomplished by simulation. The disadvantage of simulation is that the stimuli must be generated very carefully, so that the relevant scenarios are accounted for.



Fig: - simplified power grid model

Simulating the power grid with all the devices might be impossible for VLSI circuits, as it consumes too many resources. Furthermore, simulating for all possible device settings is also impossible, as it would take too long.

Given the size of current designs, it is also impossible to assume that designer intervention will be sufficient to generate appropriate sets of stimuli for grid verification.

Power grid simulation that considers all the circuit devices as simultaneously active is clearly unnecessary. Voltage drop and ground bounce may occur if there is a significant number of devices becoming active in a short period of time and drawing current from close regions of the power grid.

We propose a technique to determine, within a time frame, how many devices become active in nearby regions of the power grid. The higher the number of devices in this situation, the greater the possibility of voltage drop/ground bounce effects in the power grid.
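The counting idea can be sketched as a sliding-window count of switching events per grid region. This is only an illustration of the principle, not the technique referred to in the notes; the event format of (switch time, region name) pairs and the function name are hypothetical.

```python
from collections import defaultdict

def max_active_per_region(events, window):
    """events: iterable of (switch_time, region) pairs.

    Returns, for each region, the largest number of devices that become
    active within any time interval of length `window`.
    """
    by_region = defaultdict(list)
    for t, region in events:
        by_region[region].append(t)
    peaks = {}
    for region, times in by_region.items():
        times.sort()
        best, start = 0, 0
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1           # shrink the window from the left
            best = max(best, end - start + 1)
        peaks[region] = best
    return peaks

# Hypothetical switching events (time in ns, region label).
events = [(0.0, "r0"), (0.1, "r0"), (0.15, "r0"), (0.9, "r0"),
          (0.2, "r1"), (0.8, "r1")]
peaks = max_active_per_region(events, window=0.2)
print(peaks)
```

Regions with a high peak count are the ones where voltage drop and ground bounce are most likely, so they are natural candidates for detailed simulation.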
