Low Power, Area Efficient Multiply-Accumulate and its Application to a DTMAC Unit

V. Vimal Raj; C.S. Manikandababu

Abstract

We propose a low-power, area-efficient two-cycle multiply-accumulate (2C-MAC) architecture that supports 2’s complement numbers and includes accumulation guard bits and saturation circuitry. The first MAC pipeline stage contains only the circuitry that generates the partial products. The second stage contains all other functionality, including the sign-extension block and the saturation unit. The proposed architecture needs no additional cycles to generate the final result: in each cycle it produces the sum of the accumulated value and the product. We also extend the new architecture to create a double-throughput MAC (DTMAC) unit, which can perform either multiply or multiply-accumulate operations.
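The two-stage split described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' circuit: the names `TwoCycleMac`, `partial_products`, `accumulate`, `saturate`, and `to_signed` are hypothetical, the operand width `n` and guard-bit count `guard` are illustrative parameters, and saturating at the full accumulator width of `2n + guard` bits is an assumption about how the guard bits and saturation unit interact.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a 2's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saturate(value, bits):
    """Clamp `value` to the signed range representable in `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def partial_products(a, b, n):
    """Stage 1 (assumed model): one partial product per multiplier bit.

    The most significant multiplier bit carries a negative weight,
    as required for 2's complement operands.
    """
    a, b = to_signed(a, n), to_signed(b, n)
    pps = []
    for i in range(n):
        bit = (b >> i) & 1
        weight = -(1 << i) if i == n - 1 else (1 << i)
        pps.append(a * weight * bit)
    return pps


def accumulate(pps, acc, n, guard):
    """Stage 2 (assumed model): add the partial products to the accumulator
    and saturate to an accumulator of 2n + guard bits."""
    return saturate(acc + sum(pps), 2 * n + guard)


class TwoCycleMac:
    """Two-stage pipeline: results appear with two cycles of latency,
    but one new operand pair can be issued every cycle."""

    def __init__(self, n=16, guard=8):
        self.n, self.guard = n, guard
        self.acc = 0
        self.pipe = None  # pipeline register between stage 1 and stage 2

    def clock(self, a=None, b=None):
        # Stage 2 consumes what stage 1 produced in the previous cycle.
        if self.pipe is not None:
            self.acc = accumulate(self.pipe, self.acc, self.n, self.guard)
        # Stage 1 generates partial products for the new operands, if any.
        self.pipe = partial_products(a, b, self.n) if a is not None else None
        return self.acc


mac = TwoCycleMac(n=8, guard=4)
mac.clock(3, -5)      # stage 1 only; accumulator still 0
mac.clock(2, 4)       # accumulator becomes -15
result = mac.clock()  # pipeline drains; accumulator becomes -15 + 8 = -7
```

In this model the accumulator is updated once per clock after the pipeline fills, which mirrors the claim that no extra cycles are needed to obtain the final result. The hardware details of the paper (for example how the partial products are reduced) are not represented here.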

Keywords

Arithmetic Circuits, Energy Efficient, High Speed, Multiply-Accumulate Unit

This work is licensed under a Creative Commons Attribution 3.0 License.
