

#### IMPLEMENTATION OF MCC ADDER WITH LOW POWER DISSIPATION BASED ON SPST

Vidhya.M<sup>1\*</sup>, Kamalakannan.R.S<sup>2</sup> <sup>1\*2</sup>Shree Venkateswara Hi-Tech Engineering College, Chennai-600127, India Correspondence Author: <u>vidhyaanusri@gmail.com</u>

Keywords: Digital-signal processing chips, image coding, low-power design, video coding.

#### Abstract

This paper presents the implementation of an 8-bit Manchester carry chain (MCC) adder together with the design exploration and applications of a spurious-power suppression technique (SPST), which can dramatically reduce the power dissipation of combinational VLSI designs for multimedia/DSP purposes. In this brief, an efficient implementation of an 8-bit MCC adder in multi-output domino CMOS logic is proposed. The carries of this adder are computed in parallel by two independent 4-bit carry chains. Due to its limited carry chain length, the use of the proposed 8-bit adder module for the implementation of wider adders leads to significant operating speed improvement compared to the corresponding adders based on the standard 4-bit MCC adder module. The proposed SPST separates the target designs into two parts, i.e., the most significant part and least significant part (MSP and LSP), and turns off the MSP when it does not affect the computation results to save power. The SPST is evaluated on two multimedia/DSP design examples, a multi-transform design for H.264 and a versatile multimedia functional unit (VMFU). These two design examples have quite different hardware configurations; thus, the realization issues of the SPST on each design also differ remarkably from each other.

#### Introduction

One of the accompanying challenges in designing integrated circuits for portable electrical devices is lowering the power consumption to prolong the operating time given the limited energy supply from batteries. Integrated circuits were made possible by experimental discoveries which showed that semiconductor devices could perform the functions of vacuum tubes, and by mid-20th-century technology advancements in semiconductor device fabrication. The integration of large numbers of tiny transistors into a small chip was an enormous improvement over the manual assembly of circuits using discrete electronic components. The integrated circuit's mass-production capability, reliability, and building-block approach to circuit design ensured the rapid adoption of standardized ICs in place of designs using discrete transistors. There are two main advantages of ICs over discrete circuits: cost and performance. Cost is low because the chips, with all their components, are printed as a unit by photolithography and not constructed one transistor at a time. Performance is high since the components switch quickly and consume little power, because the components are small and close together. As of 2006, chip areas range from a few square mm to around 250 mm<sup>2</sup>, with up to 1 million transistors per mm<sup>2</sup>.

Among the most advanced integrated circuits are the microprocessors, which control everything from computers to cellular phones to digital microwave ovens. Digital memory chips are another family of integrated circuits that is crucially important to the modern information society. While the cost of designing and developing a complex integrated circuit is quite high, when spread across typically millions of production units the individual IC cost is minimized. The performance of ICs is high because the small size allows short traces, which in turn allows low-power logic (such as CMOS) to be used at fast switching speeds. ICs have consistently migrated to smaller feature sizes over the years, allowing more circuitry to be packed on each chip. As the feature size shrinks, almost everything improves: the cost per unit and the switching power consumption go down, and the speed goes up. This process, and the expected progress over the next few years, is well described by the International Technology Roadmap for Semiconductors (ITRS). The research "An Area Efficient Static CMOS Carry-Select Adder Based on a Compact Carry Look-Ahead Unit" explains the way-halting cache.



Figure 1.1 Data analysis of the input data in the transform design



It is a four-way set-associative cache that stores the four lowest-order bits of all ways' tags in a fully associative memory, which we call the halt tag array. The halt tag array has the additional feature of using static logic only, rather than the dynamic logic used in highly associative caches. Savings of 55% in memory-access-related energy were obtained over a conventional four-way set-associative cache. We show that the energy savings are greater than those of previous methods, while imposing no performance overhead and only 2% cache area overhead [1].

In "High-Speed Parallel-Prefix VLSI Ling Adders", leakage power is reduced by implementing the minimum set-associative scheme. This scheme and the base-offset load/store queue design are both developed based on the principle of locality, and they work together well: experimental results show that the minimum set-associative scheme can cut the static power consumption of the L1 data cache by 90% on average, while the execution times are reduced by 3% when the default 8-entry load/store queue is modified to the base-offset design. Furthermore, the SSA cache can trim the leakage power of the L2 data cache by 96% on average while still accomplishing a 3% reduction in execution times [2].

The paper "Area-Efficient High-Speed Carry Chain" proposes a method of saving energy by reducing the number of data cache accesses. It does so by modifying the load/store queue design to allow "caching" of previously accessed data values on both loads and stores. The reduction in the number of L1 cache accesses results in up to a 40% reduction in the L1 data cache energy consumption [3].

In "Enhanced 32-Bit Carry Look-Ahead Adder Using Multiple Output Enable-Disable CMOS Differential Logic", a technique is presented for eliminating redundant cache-tag and cache-way accesses to reduce power consumption. The basic idea is to keep a small number of most recently used (MRU) addresses in a memory address buffer (MAB) and to omit redundant tag and way accesses when there is a MAB hit. Experiments for 32 kB 2-way set-associative caches show that the power consumption of the I-cache and D-cache can be reduced by 40% and 50%, respectively [4].

"CMOS VLSI Design: A Circuit and Systems Perspective" explains that a set-associative data cache consumes a significant fraction of the total power budget in embedded processors. It describes a technique for reducing the D-cache power consumption and shows its impact on the power and performance of an embedded processor. The proposed mechanism allows only the desired way to be accessed for both tags and data. When running the MiBench embedded benchmark suite, it is shown to reduce the average L1 data cache power consumption for 8-, 16-, and 32-way set-associative caches by an average of 66%, 72%, and 76%, respectively [5].

High-speed adder architectures include the carry look-ahead (CLA) adders, carry-skip adders, carry-select adders, conditional sum adders, and combinations of these structures. The Manchester carry chain (MCC) is the most common dynamic (domino) CLA adder architecture with a regular, fast, and simple structure adequate for implementation in VLSI [6], [7]. The recursive properties of the carries in MCC have enabled the development of multi output domino gates, which have shown area–speed improvements with respect to single-output gates. Implementation of wider adders based on the use of the proposed 8-bit adder module shows significant operating speed improvement compared to their corresponding adders based on the standard 4-bit MCC adder module.

However, advanced multimedia/DSP applications such as H.264 CODECs induce much more algorithmic complexity [8],[9], which increases the power consumption in real-time operation in addition to the implementation cost. Therefore, dedicated low-power techniques are undoubtedly important for multimedia/DSP VLSI implementation. Various techniques have been developed for reducing the power consumption of VLSI designs, including voltage scaling. As the demand for higher-performance processors grows, there is a continuing need to improve the performance of arithmetic units and to increase their functionality.

From those analyses and our implementation experience, we observe the harm that glitches cause in combinational circuits and further propose a glitch-diminishing technique, a novel idea of this paper, which filters out useless switching power by asserting the data signals after the data transient period. This is realized easily by controlling the three-bit output of the detection-logic unit at extremely small cost. A similar design concept can be found in [10].

#### **Design strategies**

IC design productivity depends on the efficiency with which the design may be converted from concept to architecture, to logic and memory, to circuit and hence to a physical layout.

A good design strategy with a good design system should provide for consistent descriptions in various abstraction levels. The role of good design strategies is to reduce complexity, increase productivity, and assure working product.

Design is a continuous trade-off to achieve adequate results for:

- 1. Performance: speed, power, function, flexibility
- 2. Size of die (hence cost of die)
- 3. Time to design
- 4. Ease of test generation and testability

ISSN: 2349- 5197



Figure 2.1 Diagram for design flow

Specification is the stage at which the important parameters of the system to be designed are defined. A simple example would be: I want to design a counter; it should be 4 bits wide and should have a synchronous reset and an active-high enable; when reset is active, the counter output should go to "0". High-level design is the stage at which the various blocks in the design, and how they communicate, are defined. Let's assume that we need to design a microprocessor: high-level design means splitting the design into blocks based on their function; in our case the blocks are registers, ALU, instruction decode, memory interface, etc.
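The counter specification in the example above can be captured as a small behavioural model before any hardware is described. The following Python sketch is only an illustration: the class name `Counter4`, the priority of reset over enable, and the wrap-around from 15 to 0 are assumptions that the example specification does not state.

```python
class Counter4:
    """Behavioural model of a 4-bit up-counter (hypothetical name)."""

    WIDTH = 4

    def __init__(self):
        self.value = 0

    def tick(self, reset, enable):
        """Model one clock edge and return the new counter output."""
        if reset:
            # Synchronous reset: the output goes to 0 on the clock edge.
            # Assumption: reset takes priority over enable.
            self.value = 0
        elif enable:
            # Active-high enable: count up, wrapping at 2**WIDTH (assumed).
            self.value = (self.value + 1) % (1 << self.WIDTH)
        return self.value


counter = Counter4()
# Seventeen enabled clock edges: the count runs 1..15, wraps to 0, then 1.
trace = [counter.tick(reset=False, enable=True) for _ in range(17)]
```

A model of this kind gives a reference against which later waveforms of the implemented block can be compared.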

Low-level design, or micro design, is the phase in which the designer describes how each block is implemented. It contains details of state machines, counters, multiplexers, decoders, and internal registers. It is always a good idea to draw waveforms at the various interfaces. This is the phase where one spends a lot of time.

#### **Existing method**

In the following, the symbols $\cdot$, $+$, $\oplus$, and – are used to denote the AND, INCLUSIVE OR, EXCLUSIVE OR, and NOT logical operations, respectively. In binary addition, the computation of the carry signals is based on the following recursive formula:

$$c_i = g_i + z_i \cdot c_{i-1} \qquad (1)$$

Fig. 3.1 Domino implementation for the (a) generate, (b) XOR propagate & (c) OR propagate signals

In relation (1), $g_i = a_i \cdot b_i$ and $z_i$ are the carry generate and the carry propagate terms, respectively. The latter, for the case of INCLUSIVE OR adders, is defined as $z_i = t_i = a_i + b_i$, while for the case of EXCLUSIVE OR adders, it is defined as $z_i = p_i = a_i \oplus b_i$. Fig. 3.1 shows the implementation of the generate and the two types of propagate signals in domino CMOS logic. The sum bits of the adder are defined as $s_i = p_i \oplus c_{i-1}$, where $c_{-1}$ is the input carry.
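The carry recursion and the sum definition can be checked at the bit level with a short software model. The sketch below is a behavioural illustration only, not the domino circuit; the function names `carries` and `add` and the default 8-bit width are chosen here for illustration.

```python
def carries(a, b, cin=0, width=8, propagate="xor"):
    """Return [c_0, ..., c_{width-1}] using c_i = g_i + z_i * c_{i-1}."""
    out = []
    c = cin  # plays the role of c_{-1}
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g = ai & bi                                    # generate g_i
        z = (ai ^ bi) if propagate == "xor" else (ai | bi)  # p_i or t_i
        c = g | (z & c)
        out.append(c)
    return out


def add(a, b, cin=0, width=8):
    """Return (sum, carry_out) using s_i = p_i XOR c_{i-1}."""
    c = [cin] + carries(a, b, cin, width)  # c[i] holds c_{i-1}
    s = 0
    for i in range(width):
        p = ((a >> i) ^ (b >> i)) & 1
        s |= (p ^ c[i]) << i
    return s, c[width]


# 181 + 107 = 288, i.e. sum bits 32 with a carry out of the 8-bit adder.
result = add(181, 107)
```

Both propagate definitions yield the same carries, because the only input combination on which $t_i$ and $p_i$ differ ($a_i = b_i = 1$) already sets $g_i$.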

## International Journal of Research science & management

#### Conventional domino 4-bit MCC

The MCC generates all the carries, computed according to relation (1), in parallel, using an iterative shared-transistor structure. In practice, the CLA length is limited to four in order to cut down the number of series-connected transistors. Fig. 3.2.1 shows the conventional implementation of the 4-bit carry chain using multi-output domino CMOS logic.
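To make the length limit concrete, the recursive carry formula can be unrolled for a 4-bit chain. The expansion below follows directly from the recursion with $c_{-1}$ as the input carry; how each product term maps onto transistors is given by the circuit figure, not by this sketch.

$$
\begin{aligned}
c_0 &= g_0 + z_0 c_{-1} \\
c_1 &= g_1 + z_1 g_0 + z_1 z_0 c_{-1} \\
c_2 &= g_2 + z_2 g_1 + z_2 z_1 g_0 + z_2 z_1 z_0 c_{-1} \\
c_3 &= g_3 + z_3 g_2 + z_3 z_2 g_1 + z_3 z_2 z_1 g_0 + z_3 z_2 z_1 z_0 c_{-1}
\end{aligned}
$$

The longest product term of $c_3$ already has four propagate signals in series with the input carry, and each additional bit would add one more factor to that term.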



Figure 3.2.1 Conventional domino 4-bit MCC

MCC adders are EXCLUSIVE OR adders, i.e., the carry propagate signal is defined as $z_i = p_i = a_i \oplus b_i$, to avoid false discharges produced at the output nodes of the carry chain due to higher OR–AND forms of multi-output gates.



Figure. 3.2.2 Static CMOS implementation of the XOR gate for the sum computation

For the implementation of the sum signals, the domino chain is terminated, and the sum bits of the MCC adder are implemented using static CMOS XOR gates, the design of which is shown in Fig. 3.2.2. Several variations of the MCC adder in domino CMOS logic have been proposed in the literature. Moreover, static CMOS MCC implementations are also given. Among them, a high-speed design has been proposed, where the MCC is supported by a carry-skip capability to improve performance.

#### Implementation of odd and even carry chain (8-bit)

MCC adders can efficiently be designed in CMOS logic. As mentioned previously, due to technological constraints, the length of their carry chains is limited to 4 bits. However, these 4-bit adder blocks are used extensively in the literature in the design of wider adders.
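As an illustration of how fixed-length blocks are combined into wider adders, the sketch below cascades behavioural adder blocks by passing each block's carry-out to the next block's carry-in. The cascading scheme and the names `add_block` and `wide_add` are assumptions made for this example; the wider adders in the literature may use other block interconnections.

```python
def add_block(a, b, cin, width):
    """Behavioural model of one adder block: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def wide_add(a, b, cin=0, width=16, block=4):
    """Build a width-bit adder from cascaded block-bit adders."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, width, block):
        # Each block waits for the carry out of the previous block.
        s, carry = add_block((a >> shift) & mask, (b >> shift) & mask,
                             carry, block)
        result |= s << shift
    return result, carry


# The same 16-bit addition built from four 4-bit or two 8-bit blocks.
r4 = wide_add(0x1234, 0x0FCD, block=4)
r8 = wide_add(0x1234, 0x0FCD, block=8)
```

In this simple cascade a 16-bit adder needs four block-to-block carry hand-offs with 4-bit blocks but only two with 8-bit blocks, which is one way to see why a longer basic block can help speed.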



Figure 3.3.1 8-bit carry implementation for (a) even carry chain and (b) odd carry chain

http://www.ijrsm.com



In the following, we propose the design of an 8-bit adder module which is composed of two independent carry chains. These chains have the same length (measured as the maximum number of series-connected transistors) as the 4-bit MCC adders. According to our simulation results, the use of the proposed 8-bit adder as the basic block, instead of the 4-bit MCC adder, can lead to high-speed adder implementations.

In the following, the design of the proposed 8-bit MCC adder is analytically presented.
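One way to see how two chains can each produce half of the carries is to rewrite the carry recursion over two bit positions, $c_i = G_i + P_i \cdot c_{i-2}$ with $G_i = g_i + p_i \cdot g_{i-1}$ and $P_i = p_i \cdot p_{i-1}$. The sketch below uses this standard look-ahead identity as an assumption; the signal definitions actually used in the circuit of Fig. 3.3.1 are not given in the text and may differ. The names `chain` and `two_chain_carries` are hypothetical.

```python
def chain(g, p, cin, start):
    """Carries c_start, c_start+2, ... computed by one chain."""
    out = {}
    prev = cin  # c_{-1} feeds the first stage of either chain
    for i in range(start, len(g), 2):
        if i == 0:
            G, P = g[0], p[0]              # c_0 = g_0 + p_0 * c_{-1}
        else:
            G = g[i] | (p[i] & g[i - 1])   # two-bit group generate
            P = p[i] & p[i - 1]            # two-bit group propagate
        out[i] = G | (P & prev)
        prev = out[i]                      # next stage is two bits up
    return out


def two_chain_carries(a, b, cin=0, width=8):
    """Return [c_0, ..., c_{width-1}] from an even and an odd chain."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    even = chain(g, p, cin, start=0)   # c_0, c_2, c_4, c_6
    odd = chain(g, p, cin, start=1)    # c_1, c_3, c_5, c_7
    merged = {**even, **odd}
    return [merged[i] for i in range(width)]
```

Neither chain reads a carry produced by the other, so for 8 bits each chain has only four stages, the same length as a 4-bit MCC.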

#### **Proposed system**

This work covers the design exploration and applications of a spurious-power suppression technique (SPST), which can dramatically reduce the power dissipation of combinational VLSI designs for multimedia/DSP purposes. The proposed SPST separates the target designs into two parts, i.e., the most significant part and least significant part (MSP and LSP), and turns off the MSP when it does not affect the computation results to save power. This paper adopts two multimedia/DSP design examples, i.e., a multi-transform design for H.264 and a versatile multimedia functional unit (VMFU), to evaluate the proposed SPST. The multi-transform design can compute the three transforms which are required in H.264 encoding, while the VMFU possesses six commonly used multimedia/DSP functions, namely, addition, subtraction, multiplication, MAC, interpolation, and sum-of-absolute-difference. After careful optimization of the designs, we find that the proposed SPST can save, on average, 27% and 24% of the power dissipation of the H.264 multi-transform design and the VMFU, respectively, at the expense of less than 20% area augmentation.

#### **Spurious-power suppression technique**

One of the accompanying challenges in designing ICs for portable electrical devices is lowering the power consumption to prolong the operating time given the limited energy supply from batteries. Owing to the vigorous development of the wireless infrastructure and of personal electronic devices such as video mobile phones, mobile TV sets, and PDAs, multimedia and DSP applications have been adopted in wireless environments. However, advanced multimedia/DSP applications such as H.264 CODECs induce much more algorithmic complexity, which increases the power consumption in real-time operation in addition to the implementation cost.

Various techniques have been developed for reducing the power consumption of VLSI designs, including voltage scaling, switched-capacitance reduction, clock gating, power-down techniques, threshold-voltage controlling, multiple supply voltages, and dynamic voltage frequency scaling. Among these low-power techniques, a promising direction for significantly reducing power consumption is reducing the dynamic power, which dominates the total power dissipation.
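As background for the techniques listed above, the first-order model of dynamic power that is commonly used for CMOS circuits can be written as follows. This is the standard textbook approximation rather than a formula taken from this paper, and the symbols are introduced here only for illustration: $\alpha$ is the switching activity, $C_L$ the switched load capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency.

$$P_{dyn} = \alpha \, C_L \, V_{DD}^{2} \, f$$

In terms of this model, voltage scaling acts on $V_{DD}$, switched-capacitance reduction acts on $C_L$, and techniques that suppress unnecessary signal transitions act on $\alpha$.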

#### Analysis of SPST at low power

To illustrate the cause of the spurious signal transitions shown in Fig. 4.2.1, we explore five cases of 16-bit additions, as shown in Fig. 4.3.1. Exchanging the operands A and B in these additions leads to the same spurious transitions as those shown in Fig. 4.3.1. Hence, there is probably no other case beyond these five for this design. The first case illustrates a transient state in which spurious transitions of carry signals occur in the MSP, although the final result of the MSP is unchanged. The second and third cases describe situations in which one negative operand is added to a positive operand, without and with carry-in from the LSP, respectively. The fourth and fifth cases demonstrate the addition of two negative operands, without and with carry-in from the LSP, respectively.



In those cases, the results of the MSP are predictable; therefore, the computations in the MSP are useless and can be neglected. Eliminating those spurious computations not only saves power inside the adder/subtractor in the current stage but also decreases the glitching noise which wastes power inside the arithmetic circuits in the next stage. From the analysis of Fig. 4.3.1, we are motivated to propose the SPST, which separates the adder/subtractor into two parts and then latches the input data of the MSP whenever they do not affect the computation results. The SPST can be expanded to a fine-grain scheme in which the adder/subtractor is divided into more than two parts. In this paper, we propose a bi-partitioned SPST scheme.
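The idea of skipping the MSP when its result is predictable can be illustrated with a behavioural model of a 16-bit addition. The sketch below is not the detection logic of the proposed design: the 8/8 split, the detection condition (each operand's MSP is all zeros or all ones), the lookup table, and the name `spst_add16` are assumptions made for illustration, and the table entries follow from plain two's-complement arithmetic rather than from Fig. 4.3.1.

```python
MASK8 = 0xFF

# MSP result when both operand MSPs are all zeros or all ones, indexed by
# (number of operands with an all-ones MSP, carry from the LSP).
PREDICTED_MSP = {
    (0, 0): 0x00, (0, 1): 0x01,
    (1, 0): 0xFF, (1, 1): 0x00,
    (2, 0): 0xFE, (2, 1): 0xFF,
}


def spst_add16(a, b):
    """Add two 16-bit words; return (sum, msp_adder_was_skipped)."""
    a_lsp, a_msp = a & MASK8, (a >> 8) & MASK8
    b_lsp, b_msp = b & MASK8, (b >> 8) & MASK8

    lsp_total = a_lsp + b_lsp            # the LSP adder always runs
    carry = lsp_total >> 8
    lsp = lsp_total & MASK8

    # Assumed detection condition: each MSP is all zeros or all ones.
    if a_msp in (0x00, 0xFF) and b_msp in (0x00, 0xFF):
        negatives = (a_msp == 0xFF) + (b_msp == 0xFF)
        msp = PREDICTED_MSP[(negatives, carry)]   # no MSP addition needed
        return (msp << 8) | lsp, True

    msp = (a_msp + b_msp + carry) & MASK8         # MSP adder is needed
    return (msp << 8) | lsp, False


# 3 + (-2) in 16-bit two's complement: the MSP result is predicted.
example = spst_add16(0x0003, 0xFFFE)
```

In hardware the skipped branch corresponds to holding the MSP inputs latched so that the MSP adder does not toggle; the model above only shows that the result stays correct when that branch is taken.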



Figure 4.3.2 Adopting the spurious-power suppression technique for low power

#### Software for simulation

#### Tanner EDA

S-Edit<sup>TM</sup> is an easy-to-use PC-based design environment for schematic capture. It gives you the power you need to handle your most complex full custom IC design capture. S-Edit is tightly integrated with Tanner EDA's T-Spice<sup>TM</sup> simulation, L-Edit<sup>TM</sup> layout, and HiPer<sup>TM</sup> verification tools.

S-Edit helps you meet the demands of today's fast-paced market by optimizing your productivity and speeding your concepts to silicon. Its efficient design capture process integrates easily with third-party tools. S-Edit enables you to explore design choices and provides an easy-to-use view into the consequences of those choices. A faster design cycle gives you more flexibility in moving to an optimal solution, freeing up more time and resources for process corner validation. The results are less risk downstream, higher yield, and quicker time to market.

#### Simulation of 8-bit MCC adder



Figure 5.1 Simulation for 8-bit MCC Adder

#### Waveform for 8-bit MCC adder



Figure 5.3 Waveform for 8-bit MCC Adder



#### **Result & conclusion**

The MCC is an efficient and widely accepted design approach to construct CLA adders. In this brief, we have presented a new Manchester design style that is based on two independent carry chains. Each chain computes, in parallel with the other, half of the carries. In this way, the speed performance is significantly improved with respect to that of the standard MCC topology. The proposed design technique has been applied for the implementation of 8-, 16-, 32-, and 64-bit adders in multi output domino logic, and the simulation results verified its efficiency.

The proposed SPST can clearly decrease the switching (or dynamic) power dissipation, which comprises a significant portion of the whole power dissipation in integrated circuits. When applied to the H.264 multi-transform coding design (ETD), the proposed SPST can save 27% of the power consumption at the cost of only 20% area overhead. In addition, the proposed SPST can achieve a 24% saving in power consumption at the expense of only 10% area overhead for the proposed VMFU. Both the SPST-equipped ETD and the VMFU are verified in detail and physically implemented on chips using 0.18-μm CMOS technology. The performance comparisons also illustrate that the SPST-equipped designs are very competitive with the existing designs. Furthermore, the proposed SPST is a fully static CMOS circuit technique which does not aggravate the problems of leakage power, signal racing, and voltage dropping.

#### References

- 1. Amin, A. A., Nov 2007, Area-efficient high-speed carry chain, *Electron. Lett.*, vol. 43, no. 23, pp. 1258–1260.
- 2. Dimitrakopoulos. G. and Nikolos. D., Feb. 2005, High-speed parallel-prefix VLSI Ling adders, *IEEE Trans. Comput.*, vol. 54, no. 2, pp. 225–231.
- 3. Efstathiou. C., Vergos H. T., and Nikolos. D., Sep. 2002, Ling adders in CMOS standard cell technologies, in *Proc. 9th ICECS*.
- 4. Ercegovac. M. D. and Lang. T., 2004, *Digital Arithmetic*. San Mateo, CA, USA: Morgan Kaufmann.
- 5. Osorio. M., Sampaio. C., Reis. A., and Ribas. R., 2004, Enhanced 32-bit carry look-ahead adder using multiple output enable-disable CMOS differential logic, in *Proc. 17th Symp. Integr. Circuits Syst. Design*.
- 6. Perri. S., Corsonello. P., Pezzimenti. F., and Kantabutra. V., Dec. 2004, Fast and energy-efficient Manchester carry-bypass adders, *Proc. Inst. Elect. Eng.-Circuits Devices Syst.*, vol. 151, no. 6, pp. 497–502.
- 7. Ruiz. G. A. and Granda. M., Dec. 2004, An area-efficient static CMOS carry-select adder based on a compact carry lookahead unit, *Microelectron. J.*, vol. 35, no. 12, pp. 939–944.
- 8. K. Choi, R. Soma, and M. Pedram, "Dynamic voltage and frequency scaling based on workload decomposition," in *Proc. IEEE Int. Symp. Low Power Electron. Des.*, 2004, pp. 174–179.
- 9. J. Choi, J. Jeon, and K. Choi, "Power minimization of functional units by partially guarded computation," in *Proc. IEEE Int. Symp. Low Power Electron. Des.*, 2000, pp. 131–136.
- 10. O. Chen, R. Sheen, and S. Wang, "A low-power adder operating on effective dynamic data ranges," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 10, no. 4, pp. 435–453, Aug. 2002.
- 11. O. Chen, S. Wang, and Y. W. Wu, "Minimization of switching activities of partial products for designing low-power multipliers," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 11, no. 3, pp. 418–433, Jun. 2003.
- 12. L. Benini, G. D. Micheli, A. Macii, E. Macii, M. Poncino, and R. Scarsi, "Glitch power minimization by selective gate freezing," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 8, no. 3, pp. 287–298, Jun. 2000.
- 13. S. Henzler, G. Georgakos, J. Berthold, and D. Schmitt-Landsiedel, "Fast power-efficient circuit-block switch-off scheme," *Electron. Lett.*, vol. 40, no. 2, pp. 103–104, Jan. 2004.
- 14. T. Xanthopoulos and A. P. Chandrakasan, "A low-power DCT core using adaptive bitwidth and arithmetic activity exploiting signal correlations and quantization," *IEEE J. Solid-State Circuits*, vol. 35, no. 5, pp. 740–750, May 2000.