### **Parallel Adders**

### Introduction

Binary addition is a <u>fundamental</u> operation in most digital circuits. There are a variety of adders, each with its own performance characteristics. The type of adder is selected according to where it is to be used.

### Adders

- Basic Adder Unit
- Ripple Carry Adder
- Carry Skip Adders
- Carry Look Ahead Adder
- Carry Select Adder
- Pipelined Adder
- Manchester carry chain adder
- Multi-operand Adders
- Pipelined and Carry save adders

### **Basic Adder Unit**

A combinational circuit that adds two bits is called a half adder.

A full adder adds three bits, the third being the carry produced by a previous addition operation.
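The two definitions above can be sketched in VHDL as follows. This is a minimal illustration, not the implementation used in these slides: the entity names `half_adder_1bit` and `full_adder_1bit` and their port names are hypothetical, chosen so they do not clash with the vector-wide `half_adder` entity that appears later.

```vhdl
library IEEE;
use ieee.std_logic_1164.all;

-- Half adder: adds two bits (hypothetical entity name).
entity half_adder_1bit is
    port( a, b  : in  std_logic;
          sum   : out std_logic;    -- a xor b
          carry : out std_logic );  -- a and b
end half_adder_1bit;

architecture RTL of half_adder_1bit is
begin
    sum   <= a xor b;
    carry <= a and b;
end RTL;

library IEEE;
use ieee.std_logic_1164.all;

-- Full adder: adds two bits plus a carry-in from a previous addition
-- (hypothetical entity name).
entity full_adder_1bit is
    port( a, b, c_in : in  std_logic;
          sum        : out std_logic;
          c_out      : out std_logic );
end full_adder_1bit;

architecture RTL of full_adder_1bit is
    signal p, g : std_logic;  -- propagate and generate terms
begin
    p     <= a xor b;
    g     <= a and b;
    sum   <= p xor c_in;
    c_out <= g or (p and c_in);
end RTL;
```

The full adder is written in terms of propagate and generate signals because the same two terms are reused by the faster adder structures discussed later.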



2. A brief introduction to Ripple Carry Adder

- Reuse the carry term to implement the full adder

Figure 2.2: 1-bit full adder, complementary CMOS implementation

### **Ripple Carry Adder**

- The ripple carry adder is constructed by cascading full adder blocks in series.
- The carry-out of one stage is fed directly to the carry-in of the next stage.
- An n-bit parallel adder requires n full adders.



### **Ripple Carry Drawbacks**



- Not very efficient when large bit numbers are used
- Delay increases linearly with the bit length



Critical path in a 4-bit ripple-carry adder

Note: delay from carry-in to carry-out is more important than from A to carry-out or from carry-in to SUM, because the carry-propagation chain will determine the latency of the whole circuit for a Ripple-Carry adder.

#### Delay

The latency of a 4-bit ripple carry adder can be derived by considering the above worst-case signal propagation path. We can thus write the following expression:

$T_{\text{RCA-4bit}} = T_{\text{FA}}(A_0,B_0 \rightarrow C_o) + T_{\text{FA}}(C_{in} \rightarrow C_1) + T_{\text{FA}}(C_{in} \rightarrow C_2) + T_{\text{FA}}(C_{in} \rightarrow S_3)$

This is easy to extend to a k-bit RCA:

$T_{\text{RCA-k bit}} = T_{\text{FA}}(A_0,B_0 \rightarrow C_o) + (k-2) \cdot T_{\text{FA}}(C_{in} \rightarrow C_i) + T_{\text{FA}}(C_{in} \rightarrow S_{k-1})$
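As a rough illustration of the k-bit expression, the arithmetic below plugs in purely hypothetical full-adder delays: 0.30 ns for $A_0,B_0 \rightarrow C_o$, 0.20 ns for $C_{in} \rightarrow C_i$, and 0.25 ns for $C_{in} \rightarrow S_{k-1}$. These values are assumptions chosen only to make the numbers concrete; they are not taken from the simulation results in these slides.

$$
\begin{aligned}
T_{\text{RCA-4 bit}} &= 0.30 + (4-2)(0.20) + 0.25 = 0.95\ \text{ns} \\
T_{\text{RCA-32 bit}} &= 0.30 + (32-2)(0.20) + 0.25 = 6.55\ \text{ns}
\end{aligned}
$$

Under these assumed figures the carry-chain term contributes 6.00 ns of the 6.55 ns total at 32 bits, which is the linear growth with bit length listed among the ripple carry drawbacks.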

#### Comparison of CMOS and TG Logic

#### Simulation result

| Circuit logic structure | Area (µm²) | Total # of transistors | Input tr, tf (ps) | Tp(max) (ns) | Average power (mW) | Max power (mW) | AT     | AT<sup>2</sup> | DP     |
|-------------------------|------------|------------------------|-------------------|--------------|--------------------|----------------|--------|----------------|--------|
| CMOS (Normal)           | 305.76     | 112                    | 10                | 1.3          | 0.695              | 19.5           | 397.49 | 516.73         | 0.9035 |
|                         |            |                        | 250               | 1.3          | 0.784              | 9.06           | 397.49 | 516.73         | 1.0192 |
| CMOS (Optimized)        | 262.08     | 108                    | 10                | 0.9          | 0.33               | 13.3           | 235.87 | 212.28         | 0.297  |
|                         |            |                        | 250               | 0.9          | 0.372              | 4.94           | 235.87 | 212.28         | 0.3348 |
| TG (Normal)             | 280.8      | 104                    | 10                | 1.7          | 0.624              | 22.2           | 477.36 | 811.51         | 1.0608 |
|                         |            |                        | 250               | 1.8          | 0.749              | 7.98           | 505.44 | 909.79         | 1.3482 |
| TG (Optimized)          | 212.16     | 100                    | 10                | 1.4          | 0.452              | 17.3           | 297.02 | 415.83         | 0.6328 |
|                         |            |                        | 250               | 1.5          | 0.504              | 5.91           | 318.24 | 477.36         | 0.756  |

4-bit RCA performance comparison of CMOS and TG logic (min size)

#### Comparison of CMOS and TG Logic

#### Simulation result

| Circuit logic structure | Area (µm²) | Transistors | Input tr, tf (ps) | Tp(max) (ns) | Average power (mW) | Max power (mW) | AT     | AT<sup>2</sup> | DP     |
|-------------------------|------------|-------------|-------------------|--------------|--------------------|----------------|--------|----------------|--------|
| CMOS (2/1)              | 393.12     | 108         | 10                | 0.8          | 0.695              | 19.5           | 314.50 | 251.60         | 0.556  |
|                         |            |             | 250               | 0.8          | 0.784              | 9.06           | 314.50 | 251.60         | 0.6272 |
| TG (2/1)                | 280.8      | 100         | 10                | 0.9          | 0.452              | 17.3           | 252.72 | 227.45         | 0.4068 |
|                         |            |             | 250               | 1            | 0.504              | 5.91           | 280.80 | 280.80         | 0.504  |

4-bit RCA performance comparison of CMOS and TG logic (Wp/Wn=2/1)

### Carry Look-Ahead Adder

Calculates the carry signals in advance, based on the input signals.

#### **Boolean Equations**

- Carry propagate: $P_i = A_i \oplus B_i$
- Carry generate: $G_i = A_i B_i$
- Sum: $S_i = P_i \oplus C_i$
- Carry out: $C_{i+1} = G_i + P_i C_i$

Signals $P_i$ and $G_i$ depend only on the input bits.


Applying these equations to a 4-bit adder:

- $C_{1} = G_{0} + P_{0}C_{0}$
- $C_{2} = G_{1} + P_{1}C_{1} = G_{1} + P_{1}(G_{0} + P_{0}C_{0}) = G_{1} + P_{1}G_{0} + P_{1}P_{0}C_{0}$
- $C_{3} = G_{2} + P_{2}C_{2} = G_{2} + P_{2}G_{1} + P_{2}P_{1}G_{0} + P_{2}P_{1}P_{0}C_{0}$
- $C_{4} = G_{3} + P_{3}C_{3} = G_{3} + P_{3}G_{2} + P_{3}P_{2}G_{1} + P_{3}P_{2}P_{1}G_{0} + P_{3}P_{2}P_{1}P_{0}C_{0}$
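The expanded 4-bit carry equations map directly onto two-level logic. The sketch below writes them out as concurrent assignments; the entity name `cla_carry4` and its port names are hypothetical and are not part of the VHDL listed later in these slides, which computes the carries with a loop instead.

```vhdl
library IEEE;
use ieee.std_logic_1164.all;

-- 4-bit look-ahead carry generator (hypothetical entity name).
-- Every carry depends only on P, G and C0, never on another carry.
entity cla_carry4 is
    port( P, G : in  std_logic_vector(3 downto 0);
          C0   : in  std_logic;
          C    : out std_logic_vector(4 downto 1) );
end cla_carry4;

architecture RTL of cla_carry4 is
begin
    C(1) <= G(0) or (P(0) and C0);

    C(2) <= G(1) or (P(1) and G(0))
                 or (P(1) and P(0) and C0);

    C(3) <= G(2) or (P(2) and G(1))
                 or (P(2) and P(1) and G(0))
                 or (P(2) and P(1) and P(0) and C0);

    C(4) <= G(3) or (P(3) and G(2))
                 or (P(3) and P(2) and G(1))
                 or (P(3) and P(2) and P(1) and G(0))
                 or (P(3) and P(2) and P(1) and P(0) and C0);
end RTL;
```

Writing each carry as a flat sum of products is what removes the ripple: the widest term for $C_4$ needs a 5-input AND, which also hints at why this form becomes impractical for large word lengths.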



Look-Ahead Carry generator

#### Example: Design of a Large Carry Look-ahead Adder

Equations are in the Notes.



### **Carry Skip Adders**



- Carry skip adders are <u>composed of ripple carry adder blocks</u> of fixed size\* and a carry skip chain.
- The size of the blocks is chosen so as to minimize the longest <u>life of a carry</u>.

### **Carry Skip Mechanics**

#### **Boolean Equations**

- Carry Propagate:  $P_i = A_i \oplus B_i$
- Sum:  $S_i = P_i \oplus C_i$
- Carry Out:  $C_{i+1} = A_i B_i + P_i C_i$

#### Worthwhile to note:

- If $\underline{A_i = B_i}$ then $P_i = 0$, making the carry out, $C_{i+1}$, depend only on $A_i$ and $B_i$: $\underline{C_{i+1} = A_i B_i}$
  - $C_{i+1} = 0$ if $A_i = B_i = 0$
  - $C_{i+1} = 1$ if $A_i = B_i = 1$
- Alternatively, if $\underline{A_i \neq B_i}$ then $P_i = 1$, so $\underline{C_{i+1} = C_i}$
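The Boolean equations and the two cases above can be combined into a single carry skip block. The sketch below assumes a 4-bit block; the block size, the entity name `carry_skip4` and the port names are hypothetical choices made for illustration only.

```vhdl
library IEEE;
use ieee.std_logic_1164.all;

-- One 4-bit carry skip block (hypothetical entity name and block size).
entity carry_skip4 is
    port( A, B  : in  std_logic_vector(3 downto 0);
          C_in  : in  std_logic;
          S     : out std_logic_vector(3 downto 0);
          C_out : out std_logic );
end carry_skip4;

architecture RTL of carry_skip4 is
begin
    process(A, B, C_in)
        variable P    : std_logic_vector(3 downto 0);
        variable C    : std_logic_vector(4 downto 0);
        variable skip : std_logic;
    begin
        C(0) := C_in;
        skip := '1';
        for i in 0 to 3 loop
            P(i)   := A(i) xor B(i);                          -- carry propagate
            S(i)   <= P(i) xor C(i);                          -- sum
            C(i+1) := (A(i) and B(i)) or (P(i) and C(i));     -- rippled carry
            skip   := skip and P(i);                          -- '1' only if every bit pair differs
        end loop;

        -- Skip multiplexer: bypass the ripple chain when all P(i) = 1.
        if skip = '1' then
            C_out <= C_in;
        else
            C_out <= C(4);
        end if;
    end process;
end RTL;
```

Both multiplexer inputs carry the same logical value whenever `skip = '1'`; the benefit is purely in timing, since the bypass path avoids waiting for the carry to ripple through the block.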

### Carry Skip (example)

- **Two Random Bit Strings:**

|   | block 3 | block 2 | block 1 | block 0 |
|---|---------|---------|---------|---------|
| A | 10100   | 01011   | 10100   | 01011   |
| B | 01101   | 10100   | 01010   | 01100   |

- Compare the two binary strings inside each block.
- If all the bit pairs inside a block are <u>unequal</u>, as in block 2, then the <u>carry</u> in from block 1 is propagated to block 3.
- Every carry inside block 2 then equals the carry in from block 1.
- If a block contains a pair of bits that is <u>equal</u>, the carry skip mechanism <u>fails</u> for that block.
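To make the comparison concrete, the propagate word $P = A \oplus B$ can be worked out for each block of the example, reading the four 5-bit groups of A and B from left to right as blocks 3, 2, 1 and 0:

$$
\begin{aligned}
\text{block 3:}\quad & 10100 \oplus 01101 = 11001 \\
\text{block 2:}\quad & 01011 \oplus 10100 = 11111 \\
\text{block 1:}\quad & 10100 \oplus 01010 = 11110 \\
\text{block 0:}\quad & 01011 \oplus 01100 = 00111
\end{aligned}
$$

Only block 2 gives an all-ones propagate word, so it is the only block in which the incoming carry can skip straight to the next block.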

### Carry Skip Chain



Computes $(A_0 \oplus B_0) \cdots (A_5 \oplus B_5)$

#### Various Implementations of Multiplexer (MUX)



### Manchester Carry Adder



#### **Boolean Equations:**

1. $G_i = A_i B_i$ (carry generate of the i<sup>th</sup> stage)
2. $P_i = A_i \oplus B_i$ (carry propagate of the i<sup>th</sup> stage)
3. $S_i = P_i \oplus C_i$ (sum of the i<sup>th</sup> stage)
4. $C_{i+1} = G_i + P_iC_i$ (carry out of the i<sup>th</sup> stage)

#### Manchester Carry Adder with Skip Mechanism



### Carry Select Adder Example: 8-bit Adder



It is <u>composed of three 4-bit ripple carry adder sections</u>: one single 4-bit adder and a pair of 4-bit adders.

Both sum and carry bits are calculated for the two alternatives of the input carry, "0" and "1"

#### **8-Bit Carry Select Adder**



### 32 bit Carry Select (Mechanics)

- The <u>carry out of each section determines the carry in of the next section</u>, which then selects the appropriate ripple carry adder
- The very <u>first section has a carry in of zero</u>
- <u>Time delay</u>: time to compute first section + time to select sum from subsequent sections
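The time delay statement above can be written as a simple estimate. The symbols $T_{\text{first}}$ (delay of the first ripple carry section), $T_{\text{MUX}}$ (delay of one selection multiplexer) and $m$ (number of sections) are introduced here for illustration, and the split of 32 bits into eight 4-bit sections is an assumption rather than something specified in these slides.

$$
T_{\text{select}} \approx T_{\text{first}} + (m-1)\,T_{\text{MUX}}
\qquad\Rightarrow\qquad
T_{\text{32 bit}} \approx T_{\text{RCA-4bit}} + 7\,T_{\text{MUX}} \quad (m = 8)
$$

The estimate assumes every later section has finished both of its alternative additions by the time its select signal arrives, so only the multiplexer chain remains on the critical path.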



#### **Carry Select Adder Design**

#### **Linear and Nonlinear Carry Select Adders**

The linear carry-select adder is constructed by chaining a number of equal-length adder stages.

The nonlinear carry-select adder uses stage lengths chosen according to the delays of the MUX and the adder.

### **Multi-Operand and Pipelining**





Signal propagation in serial blocks



Signal Propagation in Pipelined serial Blocks

### Pipelined Adder



The added complexity of such a pipelined adder pays off if long sequences of numbers are being added.



### **Pipelined Adder**

Pipelining a design will increase its throughput. The trade-off is the use of registers.

For pipelining to be useful, these three conditions have to be met:

- The design repeatedly executes a basic function.
- The basic function must be divisible into independent stages having minimal overlap with each other.
- The stages must be of similar complexity.
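A minimal sketch of the idea, assuming an 8-bit adder split into two 4-bit stages of similar complexity. The entity name `pipelined_add8`, the port names and the two-stage split are hypothetical; the arithmetic uses the standard `ieee.numeric_std` package rather than explicit full adders.

```vhdl
library IEEE;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Two-stage pipelined 8-bit adder (hypothetical entity name).
-- Stage 1 adds the low nibbles; stage 2 adds the high nibbles plus the
-- registered carry. A result appears two clock edges after its operands.
entity pipelined_add8 is
    port( clk  : in  std_logic;
          A, B : in  std_logic_vector(7 downto 0);
          SUM  : out std_logic_vector(8 downto 0) );  -- bit 8 is the carry out
end pipelined_add8;

architecture RTL of pipelined_add8 is
    signal low_sum    : unsigned(4 downto 0);  -- low nibble sum, bit 4 is its carry
    signal a_hi, b_hi : unsigned(3 downto 0);  -- high nibbles delayed by one stage
begin
    process(clk)
        variable hi_sum : unsigned(4 downto 0);
    begin
        if rising_edge(clk) then
            -- Stage 1: low nibbles; high nibbles are only registered.
            low_sum <= ('0' & unsigned(A(3 downto 0))) + ('0' & unsigned(B(3 downto 0)));
            a_hi    <= unsigned(A(7 downto 4));
            b_hi    <= unsigned(B(7 downto 4));

            -- Stage 2: works on the values registered at the previous edge.
            hi_sum := ('0' & a_hi) + ('0' & b_hi);
            if low_sum(4) = '1' then
                hi_sum := hi_sum + 1;
            end if;
            SUM <= std_logic_vector(hi_sum) & std_logic_vector(low_sum(3 downto 0));
        end if;
    end process;
end RTL;
```

A new operand pair can be applied on every clock edge, so throughput is one addition per cycle even though each individual result takes two cycles; the extra registers are the cost mentioned above.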

### Carry Save adder



## The rest of these slides are for information only

#### Parallel Prefix Adder<sup>[13,15,2]</sup>

The parallel prefix adder is a kind of carry look-ahead adder that accelerates an n-bit addition by means of a parallel prefix carry tree.
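The carry tree is usually described with an associative operator on (generate, propagate) pairs. The notation below ($\circ$, $G_{i:j}$, $P_{i:j}$) is the common textbook convention and is introduced here as an assumption; it reuses the $G_i$, $P_i$ and $C_0$ symbols of the carry look-ahead equations.

$$
\begin{aligned}
(G, P) \circ (G', P') &= (G + P\,G',\; P\,P') \\
(G_{i:0}, P_{i:0}) &= (G_i, P_i) \circ (G_{i-1}, P_{i-1}) \circ \cdots \circ (G_0, P_0) \\
C_{i+1} &= G_{i:0} + P_{i:0}\,C_0
\end{aligned}
$$

For $i = 1$ this gives $(G_1 + P_1 G_0,\ P_1 P_0)$ and therefore $C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0$, matching the carry look-ahead expansion. Because the operator is associative, the pairs can be combined in a tree, and the different prefix adders differ in how that tree is arranged.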



#### Flagged Prefix Adder<sup>[13,15]</sup>




#### **Reference List**

[1] Reduced latency IEEE floating-point standard adder architectures. *Beaumont-Smith, A.; Burgess, N.; Lefrere, S.; Lim, C.C.;* Computer Arithmetic, 1999. Proceedings. 14th IEEE Symposium on , 14-16 April 1999

[2] M.D. Ercegovac and T. Lang, "Digital Arithmetic." San Francisco: Morgan Kaufmann, 2004.

[3] Using the reverse-carry approach for double datapath floating-point addition. *J.D. Bruguera and T. Lang*. In Proceedings of the 15th IEEE Symposium on Computer Arithmetic, pages 203-10.

[4] A low power approach to floating point adder design. *Pillai*, *R.V.K.; Al-Khalili*, *D.; Al-Khalili*, *A.J.*; Computer Design: VLSI in Computers and Processors, 1997. ICCD '97. Proceedings. 1997 IEEE International Conference on, 12-15 Oct. 1997 Pages:178 – 185

[5] An IEEE compliant floating-point adder that conforms with the pipeline packet-forwarding paradigm. *Nielsen, A.M.; Matula, D.W.; Lyu, C.N.; Even, G.*, Computers, IEEE Transactions on, Volume: 49, Issue: 1, Jan. 2000 Pages:33 - 47

[6] Design and implementation of the snap floating-point adder. *N. Quach and M. Flynn.* Technical Report CSL-TR-91-501, Stanford University, Dec. 1991.

[7] On the design of fast IEEE floating-point adders. *Seidel, P.-M.; Even, G.* Computer Arithmetic, 2001. Proceedings. 15th IEEE Symposium on , 11-13 June 2001 Pages:184 – 194

[8] Low cost floating point arithmetic unit design. *Seungchul Kim; Yongjoo Lee; Wookyeong Jeong; Yongsurk Lee;* ASIC, 2002. Proceedings. 2002 IEEE Asia-Pacific Conference on, 6-8 Aug. 2002 Pages:217 - 220

[9] Rounding in Floating-Point Addition using a Compound Adder. J.D. Bruguera and T. Lang. Technical Report. University of Santiago de Compostela. (2000)

[10] Floating point adder/subtractor performing ieee rounding and addition/subtraction in parallel. W.-C. Park, S.-W. Lee, O.-Y. Kown, T.-D. Han, and S.-D. Kim. IEICE Transactions on Information and Systems, E79-D(4):297–305, Apr. 1996.

[11] Efficient simultaneous rounding method removing sticky-bit from critical path for floating point addition. Woo-Chan Park; Tack-Don Han; Shin-Dug Kim; ASICs, 2000. AP-ASIC 2000. Proceedings of the Second IEEE Asia Pacific Conference on , 28-30 Aug. 2000 Pages: 223 – 226

[12] Efficient implementation of rounding units. Burgess, N.; Knowles, S.; Signals, Systems, and Computers, 1999. Conference Record of the Thirty-Third Asilomar Conference on, Volume: 2, 24-27 Oct. 1999 Pages: 1489 - 1493 vol.2

[13] The Flagged Prefix Adder and its Applications in Integer Arithmetic. Neil Burgess. Journal of VLSI Signal Processing 31, 263–271, 2002

[14] A family of adders. Knowles, S.; Computer Arithmetic, 2001. Proceedings. 15th IEEE Symposium on , 11-13 June 2001 Pages: 277 – 281

[15] PAPA - packed arithmetic on a prefix adder for multimedia applications. *Burgess, N.;* Application-Specific Systems, Architectures and Processors, 2002. Proceedings. The IEEE International Conference on, 17-19 July 2002 Pages:197 – 207

[16] Nonheuristic optimization and synthesis of parallel-prefix adders. *R. Zimmermann,* in Proc. Int. Workshop on Logic and Architecture Synthesis, Grenoble, France, Dec. 1996, pp. 123–132.

[17] Leading-One Prediction with Concurrent Position Correction. J.D. Bruguera and T. Lang. IEEE Transactions on Computers. Vol. 48. No. 10. pp. 1083-1097. (1999)

[18] Leading-zero anticipatory logic for high-speed floating point addition. *Suzuki, H.; Morinaka, H.; Makino, H.; Nakase, Y.; Mashiko, K.; Sumi, T.;* Solid-State Circuits, IEEE Journal of , Volume: 31, Issue: 8, Aug. 1996 Pages:1157 – 1164

[19] An algorithmic and novel design of a leading zero detector circuit: comparison with logic synthesis. *Oklobdzija*, *V.G.*; Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, Volume: 2, Issue: 1, March 1994 Pages: 124 – 128

[20] Design and Comparison of Standard Adder Schemes. Haru Yamamoto, Shane Erickson, CS252A, Winter 2004, UCLA

### Comparisons

| Adder            | Number of CLBs | Delay (ns) | Area   | Power Consumption (W) |
|------------------|----------------|------------|--------|-----------------------|
| Ripple-Carry     | 16             | 212.79     | 40.00  | 1.7318                |
| Carry Look-Ahead | 34             | 143.69     | 51.00  | 1.9668                |
| Carry-Select     | 44             | 102.74     | 108.00 | 3.3595                |
#### Which one should we choose?

#### **Comparison of 64 bit Adders Using FPGA**

For this comparison, Synopsys tools were used to perform logic synthesis.

- The implemented VHDL codes for all the 64-bit adders are translated into netlist files.
- The Virtex-II series library, XC2V250-4\_avg, is used for the synthesis and targeting of those 64-bit adders.
- After synthesis, the related power consumption, area, and propagation delay are reported.

| Primitive Component                      | Delay (ns) | Area  | Power (W) | AT       | AT <sup>2</sup> | PD                 |
|------------------------------------------|------------|-------|-----------|----------|-----------------|--------------------|
| 4-bit carry ripple adder                 | 72.1       | 160   | 0.8745784 | 11536    | 831745.6        | 63.058             |
| 8-bit carry ripple adder                 | 72.1       | 160   | 0.8745784 | 11536    | 831745.6        | 63.058             |
| 16-bit carry ripple adder                | 72.1       | 160   | 0.8745784 | 11536    | 831745.6        | 63.058             |
| 4-bit carry look-ahead adder             | 93.54      | 288   | 1.049     | 26939.52 | 2519922         | 98.12346           |
| 8-bit carry look-ahead adder             | 118.9      | 302   | 1.1627    | 35907.8  | 4269437         | 138.25             |
| 16-bit carry look-ahead adder            | 124.3      | 310   | 1.1757    | 38533    | 4789651         | 146.14             |
| Two-level 8-bit carry look-ahead<br>adder | 31.57      | 434   | 1.348     | 13701.38 | 432552          | 42.56              |
| 4-bit carry select adder                 | 24.72      | 422.5 | 1.6351    | 10444.2  | 258180          | 40.42              |
| 8-bit carry select adder                 | 20.48      | 394.5 | 1.5757    | 8079.36  | 165465          | 32.27              |
| 16-bit carry select adder                | 26         | 356.5 | 1.4792    | 9269     | 240994          | 38.4592            |
| Nonlinear Carry select adder             | 17.94      | 412   | 1.6267    | 7391.28  | 132599          | 29.183             |
| 4-bit Manchester adder                   | 27.58      | 256   | 1.0857    | 7060.48  | 194728          | 29.9436            |
| 8-bit Manchester adder                   | 27.58      | 256   | 1.0857    | 7060.48  | 194728          | 29.9436            |
| 16-bit Manchester adder                  | 27.58      | 256   | 1.0857    | 7060.48  | 194728          | 29.9436            |
| 16-bit Ladner-Fischer prefix<br>adder    | 24.79      | 326   | 1.23      | 8081.54  | 200341          | 30.4917            |
| 16-bit Brent-Kung prefix adder           | 26.94      | 290   | 1.15      | 7812.6   | 210471          | 30.981             |
| 16-bit Han-Carlson prefix adder          | 25.43      | 326   | 1.2758    | 8290.18  | 210819          | 32.4436            |
| 16-bit Kogge-Stone prefix adder          | 25.59      | 428   | 1.5546    | 10952.52 | 280274          | 39.78              |
| 64-bit Kogge-Stone adder                 | 11.97      | 611   | 1.919     | 7313.67  | 87544           | 22.97              |

#### Synthesis result parameter comparison listings:
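The derived columns are not defined in the slides, but the numbers are consistent with AT = Area × Delay, AT<sup>2</sup> = Area × Delay² and PD = Power × Delay. Taking that reading as an assumption, the 64-bit Kogge-Stone row (Delay 11.97 ns, Area 611, Power 1.919 W) can be checked as follows:

$$
\begin{aligned}
AT &= 611 \times 11.97 = 7313.67 \\
AT^2 &= 611 \times 11.97^2 \approx 87544.6 \\
PD &= 1.919 \times 11.97 \approx 22.97
\end{aligned}
$$

These agree with the tabulated 7313.67, 87544 and 22.97, so lower values in all three columns indicate a better area-delay or power-delay trade-off.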

### Compound Adder Design<sup>[2,13-16,20]</sup>

The Prefix Adder Scheme is chosen.

#### **Advantages:**

- Simple and regular structure
- Good performance
- A wide range of area-delay trade-offs

Moreover, the Flagged Prefix Adder is particularly useful in compound adder implementation because, unlike other adder schemes, which need a pair of adders to obtain sum and sum+1 simultaneously, it uses only one adder.



### Synthesis and Targeting

- Synopsys tools are used to perform logic synthesis.
- The implemented VHDL codes for all the 64-bit adders are translated into netlist files.
- The Virtex-II series library, XC2V250-4\_avg, is used for the synthesis and targeting of those 64-bit adders because its area and propagation delay are suitable for these adders.
- After synthesis, the related power consumption, area, and propagation delay are reported.
- From the synthesis, the related FPGA layout schematic is reported.









### 64-bit adders conclusion

- Adders can be implemented in different ways according to different requirements.
- Each kind of adder has different properties in area, propagation delay, and power consumption.
- No adder has an absolute advantage or disadvantage; usually an advantage in one property is paid for by a disadvantage in another.
- A ripple carry adder is easy to implement, and for short bit lengths its performance is good.
- For long bit lengths, a carry look-ahead adder is not practical, but a hierarchical structure improves it considerably.



# Adders Using Tables (FPGAs)



# **Ripple Carry's VHDL**

```
library IEEE;
use ieee.std_logic_1164.all;

entity ripple_carry is
    port( A, B  : in  std_logic_vector(15 downto 0);
          C_in  : in  std_logic;
          S     : out std_logic_vector(15 downto 0);
          C_out : out std_logic );
end ripple_carry;

architecture RTL of ripple_carry is
begin
    process(A, B, C_in)
        variable tempC : std_logic_vector(16 downto 0);  -- carry chain
        variable P     : std_logic_vector(15 downto 0);  -- propagate
        variable G     : std_logic_vector(15 downto 0);  -- generate
    begin
        tempC(0) := C_in;
        for i in 0 to 15 loop
            P(i) := A(i) xor B(i);
            G(i) := A(i) and B(i);
            S(i) <= P(i) xor tempC(i);
            tempC(i+1) := G(i) or (tempC(i) and P(i));
        end loop;
        C_out <= tempC(16);
    end process;
end;
```

# Carry Select's VHDL (ripple4)

```
library IEEE;
use ieee.std_logic_1164.all;

entity ripple_carry4 is
    port( e, f      : in  std_logic_vector(3 downto 0);
          carry_in  : in  std_logic;
          S         : out std_logic_vector(3 downto 0);
          carry_out : out std_logic );
end ripple_carry4;

architecture RTL of ripple_carry4 is
begin
    process(e, f, carry_in)
        variable tempC : std_logic_vector(4 downto 0);  -- carry chain
        variable P     : std_logic_vector(3 downto 0);  -- propagate
        variable G     : std_logic_vector(3 downto 0);  -- generate
    begin
        tempC(0) := carry_in;
        for i in 0 to 3 loop
            P(i) := e(i) xor f(i);
            G(i) := e(i) and f(i);
            S(i) <= P(i) xor tempC(i);
            tempC(i+1) := G(i) or (tempC(i) and P(i));
        end loop;
        carry_out <= tempC(4);
    end process;
end;
```

# Carry Select's VHDL (select4)

#### carry\_select4

```
library IEEE;
use ieee.std_logic_1164.all;

entity carry_select4 is
    port( c, d     : in  std_logic_vector(3 downto 0);
          C_input  : in  std_logic;
          Result   : out std_logic_vector(3 downto 0);
          C_output : out std_logic );
end carry_select4;

architecture RTL of carry_select4 is
    component ripple_carry4
        port( e, f      : in  std_logic_vector(3 downto 0);
              carry_in  : in  std_logic;
              S         : out std_logic_vector(3 downto 0);
              carry_out : out std_logic );
    end component;

    for S0 : ripple_carry4 use entity work.ripple_carry4(RTL);
    for S1 : ripple_carry4 use entity work.ripple_carry4(RTL);

    signal SUM0, SUM1     : std_logic_vector(3 downto 0);
    signal carry0, carry1 : std_logic;
    signal zero, one      : std_logic;
begin
    zero <= '0';
    one  <= '1';

    -- Two alternative results: S0 assumes carry-in 0, S1 assumes carry-in 1.
    S0 : ripple_carry4 port map( e=>c, f=>d, carry_in=>zero, S=>SUM0, carry_out=>carry0 );
    S1 : ripple_carry4 port map( e=>c, f=>d, carry_in=>one,  S=>SUM1, carry_out=>carry1 );

    -- Select the sum that matches the actual carry-in.
    Result <= SUM0 when C_input='0' else
              SUM1 when C_input='1' else
              "ZZZZ";

    C_output <= (C_input and carry1) or carry0;
end;
```

# Carry Select's VHDL (select16)

#### carry\_select16

```
library IEEE;
use ieee.std_logic_1164.all;

entity carry_select16 is
    port( A, B  : in  std_logic_vector(15 downto 0);
          C_in  : in  std_logic;
          SUM   : out std_logic_vector(15 downto 0);
          C_out : out std_logic );
end carry_select16;

architecture RTL of carry_select16 is
    component carry_select4
        port( c, d     : in  std_logic_vector(3 downto 0);
              C_input  : in  std_logic;
              Result   : out std_logic_vector(3 downto 0);
              C_output : out std_logic );
    end component;

    for S0 : carry_select4 use entity work.carry_select4(RTL);
    for S1 : carry_select4 use entity work.carry_select4(RTL);
    for S2 : carry_select4 use entity work.carry_select4(RTL);
    for S3 : carry_select4 use entity work.carry_select4(RTL);

    signal tempc1, tempc2, tempc3 : std_logic;  -- carries between sections
begin
    S0 : carry_select4 port map( c=>A(3 downto 0),   d=>B(3 downto 0),   C_input=>C_in,
                                 Result=>SUM(3 downto 0),   C_output=>tempc1 );
    S1 : carry_select4 port map( c=>A(7 downto 4),   d=>B(7 downto 4),   C_input=>tempc1,
                                 Result=>SUM(7 downto 4),   C_output=>tempc2 );
    S2 : carry_select4 port map( c=>A(11 downto 8),  d=>B(11 downto 8),  C_input=>tempc2,
                                 Result=>SUM(11 downto 8),  C_output=>tempc3 );
    S3 : carry_select4 port map( c=>A(15 downto 12), d=>B(15 downto 12), C_input=>tempc3,
                                 Result=>SUM(15 downto 12), C_output=>C_out );
end;
```

#### half\_adder

```
library IEEE;
use ieee.std_logic_1164.all;

entity half_adder is
    port( A, B : in  std_logic_vector(16 downto 1);
          P, G : out std_logic_vector(16 downto 1) );
end half_adder;

architecture RTL of half_adder is
begin
    P <= A xor B;  -- carry propagate
    G <= A and B;  -- carry generate
end;
```

#### carry\_generator

```
library IEEE;
use ieee.std_logic_1164.all;

entity carry_generator is
    port( P, G : in  std_logic_vector(16 downto 1);
          C1   : in  std_logic;
          C    : out std_logic_vector(17 downto 1) );
end carry_generator;

architecture RTL of carry_generator is
begin
    process(P, G, C1)
        variable tempC : std_logic_vector(17 downto 1);
    begin
        tempC(1) := C1;
        for i in 1 to 16 loop
            tempC(i+1) := G(i) or (P(i) and tempC(i));
        end loop;
        C <= tempC;
    end process;
end;
```

#### Look\_Ahead\_Adder

```
library IEEE;
use ieee.std_logic_1164.all;

entity Look_Ahead_Adder is
    port( A, B      : in  std_logic_vector(16 downto 1);
          carry_in  : in  std_logic;
          carry_out : out std_logic;
          S         : out std_logic_vector(16 downto 1) );
end Look_Ahead_Adder;

architecture RTL of Look_Ahead_Adder is
    component carry_generator
        port( P, G : in  std_logic_vector(16 downto 1);
              C1   : in  std_logic;
              C    : out std_logic_vector(17 downto 1) );
    end component;

    -- Port list copied from the half_adder entity.
    component half_adder
        port( A, B : in  std_logic_vector(16 downto 1);
              P, G : out std_logic_vector(16 downto 1) );
    end component;

    for CG : carry_generator use entity work.carry_generator(RTL);
    for HA : half_adder use entity work.half_adder(RTL);

    signal tempG, tempP : std_logic_vector(16 downto 1);
    signal tempC        : std_logic_vector(17 downto 1);
begin
    HA : half_adder port map( A=>A, B=>B, P=>tempP, G=>tempG );
    CG : carry_generator port map( P=>tempP, G=>tempG, C1=>carry_in, C=>tempC );
    S <= tempC(16 downto 1) xor tempP;
    carry_out <= tempC(17);
end;
```

#### **Ripple carry adder**

### Block diagram:



Critical path:




## Carry look-ahead adder

- Carry propagate: $P_i = A_i \oplus B_i$
- Carry generate: $G_i = A_i \cdot B_i$
- Summation: $S_i = P_i \oplus C_i$
- Carry out: $C_{i+1} = G_i + P_i C_i$

$C_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_2 P_1 G_0 + P_i P_{i-1} \cdots P_1 P_0 C_0$

### **Carry look-ahead adder**

Block diagram





When n increases, it is not practical to use a standard carry look-ahead adder, since the fan-out of the carry calculation becomes very large.

A hierarchical carry look-ahead adder structure can be implemented instead.

#### Hierarchical 2-level 8-bit carry look-ahead adder



#### **Carry select adder**

- Alternative results are computed in parallel, and the correct one is subsequently selected by the carry input calculated from the previous stage.
- This requires an extra circuit to calculate the alternative carry input and summation result.
- A multiplexer is needed to select the carry input for the next stage and the summation result.
- The drawback is that the area increases.
- The summation part can be implemented by a ripple carry adder, Manchester adder, carry look-ahead adder, or prefix adder.

### Carry select adder block diagram



#### **Carry select adder**

- An n-bit adder can be implemented with carry select sections of equal length; this is called a linear carry select adder.
- However, the linear carry select adder does not always have the best performance.
- A carry select adder can also be implemented with sections of different lengths; this is called a nonlinear carry select adder.
- A 64-bit adder can be implemented as a nonlinear structure with section lengths of 4, 4, 5, 6, 7, 8, 9, 10, and 11 bits.
- The 64-bit nonlinear carry select adder has a shorter propagation delay than the linear one.
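As a quick check on the nonlinear partition quoted above, the nine section lengths do add up to the full word length:

$$
4 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 = 64
$$

A plausible reading of the growing lengths, stated here as an assumption rather than something spelled out in the slides, is that each later section receives its select signal one multiplexer delay later than the previous one, so it can afford a ripple chain roughly one bit longer without extending the critical path.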

#### 64-bit nonlinear carry select adder

#### Block diagram



#### **Manchester carry adder**

- A Manchester adder can be constructed with a dynamic stage, static stage, or multiplexer stage structure.
- A Manchester adder based on multiplexers is called a conflict-free Manchester adder.

Block diagram:



#### 64-bit adders implemented in Manchester carry adder




### **Parallel prefix adder**

- Like a carry look-ahead adder, the prefix adder accelerates addition by means of a parallel prefix carry tree.
- The production of the carries in the prefix adder can be designed in many different ways, depending on the requirements.
- The main disadvantages of the prefix adder are the large fan-out of some cells and the long interconnection wires.
- The large fan-out can be eliminated by increasing the number of levels or cells; as a result, there are different structures.
- The long interconnections increase the delay, which can be reduced by including buffers.

#### **Ladner-Fischer parallel prefix adder**

- Carry stages: $\log_2 n$
- Number of cells: $(n/2) \log_2 n$
- Maximum fan-out: $n/2$

Block diagram (16 bits):



#### **Kogge-Stone parallel prefix adder**

- Carry stages: $\log_2 n$
- Number of cells: $n(\log_2 n - 1) + 1$
- Maximum fan-out: 2

Block diagram (64 bits):



#### **Brent-Kung parallel prefix adder**

- Carry stages: $2 \log_2 n - 1$
- Number of cells: $2(n-1) - \log_2 n$
- Maximum fan-out: 2

Block diagram (16 bits):



#### **Han-Carlson parallel prefix adder**

It is a hybrid structure combining the Brent-Kung and Kogge-Stone prefix adders.

- Carry stages: $\log_2 n + 1$
- Maximum fan-out: 2
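To compare the four prefix structures side by side, the expressions quoted for each adder can be evaluated at $n = 16$, so that $\log_2 n = 4$. This assumes the logarithms in the slides are base-2 logarithms of the word length $n$.

$$
\begin{aligned}
\text{Ladner-Fischer:}\quad & \text{stages} = 4, & \text{cells} &= (16/2)\cdot 4 = 32, & \text{max fan-out} &= 16/2 = 8 \\
\text{Kogge-Stone:}\quad & \text{stages} = 4, & \text{cells} &= 16\,(4-1)+1 = 49, & \text{max fan-out} &= 2 \\
\text{Brent-Kung:}\quad & \text{stages} = 2\cdot 4 - 1 = 7, & \text{cells} &= 2\,(16-1) - 4 = 26, & \text{max fan-out} &= 2 \\
\text{Han-Carlson:}\quad & \text{stages} = 4 + 1 = 5, & & & \text{max fan-out} &= 2
\end{aligned}
$$

The numbers show the trade-off: Kogge-Stone keeps both depth and fan-out low at the cost of the most cells, Brent-Kung uses the fewest cells but the most stages, and Ladner-Fischer reaches minimum depth with moderate cell count by accepting a fan-out of 8.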



### 64-bit adders implementations and simulations

- 18 kinds of adders are implemented, including ripple carry adders, carry look-ahead adders, carry select adders, Manchester carry adders, and parallel prefix adders.
- Each 64-bit adder may consist of 4-bit, 8-bit, or 16-bit adder components, as well as different prefix adder components.
- Hierarchical carry look-ahead adders and nonlinear carry select adders are also implemented.
- A test bench is written to test the simulation result.
- In the test bench, each bit of the 64-bit adder should be verified for carry propagation and summation.

### **Test bench simulation result**

Carry ripple adder, carry look-ahead adder, hierarchical carry look-ahead adder.

[Waveform screenshots, one per adder, 100 ns window. Each shows `a` = `AAAA...` (hex), `b` = `5555...` (hex), `cout` = `0`, `sum` = `FFFF...` (hex).]

### **Test bench simulation result- continued**

carry select adder, nonlinear carry select adder, Manchester carry adder.

| Name | Value | Stimulator | 100 ns |
|------|-------|------------|--------|
| a    | AAAA… |            |        |
| b    | 5555… |            |        |
| cout | 0     |            |        |
| sum  | FFFF… |            |        |

| Name | Value | Stimulator | 100 ns |
|------|-------|------------|--------|
| a    | AAAA… |            |        |
| b    | 5555… |            |        |
| cout | 0     |            |        |
| sum  | FFFF… |            |        |

| Name | Value | Stimulator | 100 ns |
|------|-------|------------|--------|
| a    | AAAA… |            |        |
| b    | 5555… |            |        |
| cout | 0     |            |        |
| sum  | FFFF… |            |        |
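
The values in these waveforms can be cross-checked with a few lines of Python. This is only a sketch: the operand width is not stated in the waveforms, so `WIDTH = 64` is an assumption, and `expected_sum` is a hypothetical helper name, not part of the original test bench.

```python
def expected_sum(a, b, width):
    """Reference model: (sum, cout) of a width-bit unsigned addition."""
    total = a + b
    return total & ((1 << width) - 1), total >> width

WIDTH = 64  # assumed; the waveform does not state the operand width

# Build the stimulus used in the waveforms: a = AAAA..., b = 5555... (hex)
a = int("A" * (WIDTH // 4), 16)
b = int("5" * (WIDTH // 4), 16)

s, cout = expected_sum(a, b, WIDTH)
print(f"a    = {a:X}")
print(f"b    = {b:X}")
print(f"sum  = {s:X}")
print(f"cout = {cout}")

# a and b never have a 1 in the same bit position, so no stage generates a carry
print("any bit generates a carry:", (a & b) != 0)
```

Because `a & b` is zero for this stimulus, every adder returns all ones with `cout = 0` without propagating a carry. A vector such as all ones plus 1 would be one way to exercise the full carry chain as well.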

### **Test bench simulation result- continued**

Simulation waveforms for the Ladner-Fischer, Brent-Kung, Han-Carlson, and Kogge-Stone prefix adders.

| Name | Value | Stimulator | 100 ns |
|------|-------|------------|--------|
| a    | AAAA… |            |        |
| b    | 5555… |            |        |
| cout | 0     |            |        |
| sum  | FFFF… |            |        |

| Name | Value | Stimulator | 100 ns |
|------|-------|------------|--------|
| a    | AAAA… |            |        |
| b    | 5555… |            |        |
| cout | 0     |            |        |
| sum  | FFFF… |            |        |

| Name | Value | Stimulator | 100 ns |
|------|-------|------------|--------|
| a    | AAAA… |            |        |
| b    | 5555… |            |        |
| cout | 0     |            |        |
| sum  | FFFF… |            |        |

| Name | Value | Stimulator | 100 ns |
|------|-------|------------|--------|
| a    | AAAA… |            |        |
| b    | 5555… |            |        |
| cout | 0     |            |        |
| sum  | FFFF… |            |        |
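
As a behavioral cross-check for the prefix adders, the following is a minimal Python sketch of a Kogge-Stone style parallel-prefix addition with carry-in fixed at 0. It is not the HDL that was simulated here. The function name `kogge_stone_add` and the 64-bit width are assumptions made for illustration.

```python
import random

def kogge_stone_add(a, b, width):
    """Behavioral model of a Kogge-Stone prefix adder (carry-in = 0).

    Returns (sum, cout) for width-bit unsigned operands.
    """
    mask = (1 << width) - 1
    g = a & b & mask        # per-bit generate
    p = (a ^ b) & mask      # per-bit propagate
    G, P = g, p             # group generate / propagate, one bit per position
    d = 1
    while d < width:
        # Combine each position with the group ending d positions below it
        G = (G | (P & (G << d))) & mask
        P = (P & (P << d)) & mask
        d *= 2
    carries = (G << 1) & mask       # carry into bit i is group generate of bits 0..i-1
    s = p ^ carries
    cout = (G >> (width - 1)) & 1   # carry out of the most significant bit
    return s, cout

WIDTH = 64  # assumed; the waveform does not state the operand width
a = int("A" * (WIDTH // 4), 16)
b = int("5" * (WIDTH // 4), 16)
s, cout = kogge_stone_add(a, b, WIDTH)
print(f"sum = {s:X}, cout = {cout}")

# Compare against plain integer addition on random operands
for _ in range(1000):
    x = random.getrandbits(WIDTH)
    y = random.getrandbits(WIDTH)
    total = x + y
    assert kogge_stone_add(x, y, WIDTH) == (total & ((1 << WIDTH) - 1), total >> WIDTH)
```

The loop doubles the span `d` on each pass, so the model uses about log2(width) combine steps. The random comparison covers carry patterns that the `AAAA…` + `5555…` stimulus does not reach.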