CN1324456C - Digital signal processor using mixed compression two stage flow multiplication addition unit - Google Patents

Info

Publication number
CN1324456C
Authority
CN
China
Prior art keywords
unit
compressor
compressor reducer
compression
compressed tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2004100157377A
Other languages
Chinese (zh)
Other versions
CN1556467A (en)
Inventor
陈健
王田
徐如淏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University Han Yuan Technology Co., Ltd.
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University Han Yuan Technology Co Ltd
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University Han Yuan Technology Co Ltd and Shanghai Jiaotong University
Priority to CNB2004100157377A
Publication of CN1556467A
Application granted
Publication of CN1324456C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention relates to a digital signal processor using a mixed-compression two-stage pipelined multiply-add unit. The multiply-add unit inside the arithmetic unit is designed as a two-stage pipeline. The first pipeline stage consists of a radix-4 modified Booth encoding unit and a compression tree unit that mixes 3:2 compressors and 4:2 compressors; the second pipeline stage consists of a 72-bit 3:2 compressor, a 72-bit carry-lookahead adder, a selector, and a selector control line. The mixed compression tree takes one 4:2 compressor as its root and grows two branches upward from each unit until the number of signals that the top-layer branches can accept reaches or exceeds the number of signals to be compressed. Only the top layer may be built from 3:2 compressors, and, except for the second-highest layer, the branches grown on every lower layer are complete. The specially designed multiply-add unit reduces delay and chip area at the same time, which raises the chip's operating frequency and performance and improves its cost-performance ratio.

Description

Digital signal processor using a mixed-compression two-stage pipelined multiply-add unit
Technical field
The present invention relates to a digital signal processor, in particular a digital signal processor using a mixed-compression two-stage pipelined multiply-add unit, and belongs to the technical field of digital signal processing.
Background art
The multiply-add unit is a key arithmetic unit in many kinds of digital computing chips, digital signal processing chips in particular. It is usually split into two independent parts, a multiplier and an adder. In the original designs based on Wallace tree multipliers with 3:2 compression, the 3:2 compressor is the basic building block of the partial product compression tree; its compression ratio is not high enough, and the resulting compression tree is not regular enough. The paper "Ultra Low Voltage, Low Power 4-2 Compressor for High Speed Multiplications", in volume 5 of the Proceedings of the 2003 International Symposium on Circuits and Systems, discloses a 4:2 compressor unit that optimizes the Wallace tree multiplier design. A 4:2 compressor performs the function of two 3:2 compressors connected in series and, through circuit optimization, has a smaller delay than two 3:2 compressors. However, because a 4:2 compressor has more input ports (a single compressor has 5), many ports can be left idle in some cases, so its efficiency is not high. In addition, it does not guarantee that the delay of the partial product compression tree is optimal. Traditional multiply-add units complete in a single cycle and have no pipeline, as in the TMS320C54x DSP from Texas Instruments (see TMS320C54x DSP CPU and Peripherals Reference Set, Volume 1 (Rev. G)). When a high-speed digital signal processor is designed this way, the multiply-add unit becomes the critical path of the whole processor and limits the frequency of the entire chip; the chip also needs an extra adder, which increases chip area.
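The relationship between the two compressor types can be sketched at the bit level. The following is a minimal behavioural model, assuming the textbook construction in which a 4:2 compressor is two cascaded 3:2 compressors (full adders); the function names are illustrative, and the optimized circuit from the cited paper is not modeled.

```python
from itertools import product


def full_adder(a, b, c):
    """3:2 compressor: three input bits -> (sum bit, carry bit)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor modeled as two cascaded 3:2 compressors.

    Five input bits (four operand bits plus a carry-in) produce
    a sum bit, a carry bit, and a carry-out bit.
    """
    s1, cout = full_adder(x1, x2, x3)       # cout does not depend on cin
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def check_all_inputs():
    """Check x1+x2+x3+x4+cin == sum + 2*(carry + cout) for all 32 inputs."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True
```

The five arguments correspond to the 5 input ports mentioned above; when fewer than five signals are available in a column, the unused ports are tied to 0, which is the idle-port inefficiency the text describes.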
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a digital signal processor that uses a mixed-compression two-stage pipelined multiply-add unit. The two-stage pipeline design greatly reduces the delay on the critical path caused by the multiply-add unit, and the partial product compression stage uses a mixed 3:2 and 4:2 compression structure that reduces the delay of the compression tree. Together these substantially raise the frequency and performance of the digital signal processor and lower the production cost of the chip.
The core of the digital signal processor to which the invention relates comprises an address generation unit, an instruction decoding unit, a program control unit, and an arithmetic unit. The program control unit supplies instruction addresses to the instruction memory over the instruction bus, receives instructions from the instruction memory, and passes them to the instruction decoding unit. The instruction decoding unit sends the decoded data to two parallel data paths, namely the arithmetic unit and the address generation unit. The arithmetic unit passes its status information to the program control unit, and writes results to or reads data from the data memory. The address generation unit supplies address values to the data access unit, specifying the locations to be stored, read, or written. Inside the arithmetic unit, the invention provides a specially designed multiply-add unit with a two-stage pipeline structure, which mainly comprises: a radix-4 modified Booth encoding unit, a compression tree unit mixing 3:2 compressors and 4:2 compressors, a 72-bit 3:2 compressor, a 72-bit carry-lookahead adder unit, a selector, and a selector control line. In the first pipeline stage, the radix-4 modified Booth encoding unit converts the 32-bit multiplicand and multiplier into partial products, which are then fed into the mixed 3:2/4:2 compression tree unit. In the second pipeline stage, the two 65-bit partial products output by the compression tree unit are fed, together with another addend, into the 72-bit 3:2 compressor; the two 72-bit partial products produced by the 3:2 compression then pass through the selector, under the control of the selector control line, into the 72-bit carry-lookahead adder unit, which computes the final multiply-add result and completes one full multiply-add operation. Under the control of the selector control line, the selector can instead choose a 72-bit augend and a 72-bit addend, so that the same adder performs a 72-bit addition.
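The second pipeline stage can be sketched as a small behavioural model. This is an illustration under stated assumptions, not the patented circuit: all words are treated as 72-bit two's-complement values, the carry-lookahead adder is replaced by Python integer addition, and the names `compress_3_2`, `stage2`, and `select_mac` are hypothetical.

```python
WIDTH = 72
MASK = (1 << WIDTH) - 1


def compress_3_2(a, b, c):
    """Bitwise 3:2 (carry-save) compression of three 72-bit words."""
    a, b, c = a & MASK, b & MASK, c & MASK
    save_sum = a ^ b ^ c
    save_carry = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return save_sum, save_carry


def stage2(tree_sum, tree_carry, addend, augend, select_mac):
    """Second stage: 3:2 compressor, selector, then one shared adder.

    select_mac=True  -> multiply-add path (tree outputs plus addend)
    select_mac=False -> plain 72-bit addition (augend plus addend)
    """
    if select_mac:
        op_a, op_b = compress_3_2(tree_sum, tree_carry, addend)
    else:
        op_a, op_b = augend & MASK, addend & MASK
    # Stand-in for the 72-bit carry-lookahead adder (assumption).
    return (op_a + op_b) & MASK


def to_signed(value):
    """Interpret a 72-bit word as a two's-complement integer."""
    return value - (1 << WIDTH) if value >> (WIDTH - 1) else value
```

Because the selector sits in front of the only carry-propagating adder, both the multiply-add path and the plain addition path reuse it, which is the area saving the description attributes to this arrangement.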
The mixed compression structure used by the 3:2/4:2 compression tree unit takes one 4:2 compressor as the root of the whole partial product compression tree. The root can either grow two branches upward or directly accept four partial product signals and one carry input signal. If branches are grown, then depending on whether the two branches are 4:2 compressors or 3:2 compressors, they can together accept at most 10 signals (4:2 compressors) or 6 signals (3:2 compressors). If this number is still smaller than the number of signals to be compressed, further branches are grown on these branches, each at its own compression ratio, until the number of signals that the top-layer branches can accept reaches or exceeds the number of signals to be compressed. To keep the partial product compression tree regular, only its top layer may be built from 3:2 compressors, and, except for the second-highest layer, the branches grown on every lower layer are complete, that is, every unit on such a layer has two branches. The structure of the compression tree for each column can be determined in this way.
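The growth rule can be illustrated with a simplified capacity count. The sketch below uses only the figures stated above (a 4:2 unit accepts 5 signals including its carry input, a 3:2 unit accepts 3) and assumes fully populated layers; it ignores the carry signals exchanged between neighbouring columns and incomplete top layers, so a real column tree can need more layers than this count suggests. The preference for 3:2 units when both kinds fit at the same depth is also an assumption, and all names are hypothetical.

```python
# Signals accepted by one top-layer unit (4:2 includes its carry input).
INPUTS_PER_UNIT = {"4:2": 5, "3:2": 3}


def top_layer_capacity(layers, top_kind):
    """Signals accepted by the top layer of a complete tree.

    The root is layer 1 and every unit grows two branches, so the
    top layer of a tree with `layers` layers has 2**(layers - 1) units.
    """
    units = 2 ** (layers - 1)
    return units * INPUTS_PER_UNIT[top_kind]


def plan_tree(num_signals):
    """Return (layers, top_kind) for the shallowest tree that fits.

    The root is always a 4:2 compressor; only the top layer of a
    deeper tree may use 3:2 compressors.
    """
    layers = 1
    while True:
        kinds = ("3:2", "4:2") if layers > 1 else ("4:2",)
        for kind in kinds:
            if top_layer_capacity(layers, kind) >= num_signals:
                return layers, kind
        layers += 1
```

For example, `plan_tree(6)` returns a two-layer tree with a 3:2 top layer, while `plan_tree(10)` needs 4:2 compressors on the second layer, matching the 6-signal and 10-signal limits given in the text.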
When the number of partial products to be compressed is 16, and the carry signals from the preceding tree are taken into account, a partial product compression tree built from 3:2 compressors needs 6 layers; counting 2 XOR gate delays per 3:2 compressor, the resulting delay is 12 XOR gate delays. Under the same conditions, a compression tree built from 4:2 compressors also needs 12 XOR gate delays, whereas the mixed-structure compression tree needs only 11. The mixed-structure tree therefore has a smaller delay than either pure tree, and its area is significantly smaller than that of a tree built only from 4:2 compressors. In addition, the adder in the multiply-add unit of the invention can also serve as a stand-alone adder, so no separate 72-bit adder is needed for addition, which reduces the area of the DSP chip.
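The delay figures can be reproduced with simple arithmetic. In this sketch the 2 XOR gate delays per 3:2 compressor come from the text, while the 3 XOR gate delays per 4:2 compressor and the four-layer depth of the pure 4:2 tree are assumptions chosen to be consistent with the stated total of 12; the mixed layout (one 3:2 top layer over three 4:2 layers) follows the description of Fig. 3.

```python
# XOR gate delays per compressor layer; the 4:2 value is an assumption.
XOR_DELAYS = {"3:2": 2, "4:2": 3}


def tree_delay(layers):
    """Total delay, in XOR gate delays, along a list of layers (top to root)."""
    return sum(XOR_DELAYS[kind] for kind in layers)


pure_3_2 = ["3:2"] * 6            # six layers of 3:2 compressors
pure_4_2 = ["4:2"] * 4            # assumed four layers of 4:2 compressors
mixed = ["3:2"] + ["4:2"] * 3     # 3:2 top layer, then 4:2 layers down to the root

delays = {
    "pure 3:2": tree_delay(pure_3_2),
    "pure 4:2": tree_delay(pure_4_2),
    "mixed": tree_delay(mixed),
}
```

Under these assumptions the mixed tree saves one XOR gate delay because its top layer uses the faster 3:2 compressor where a 4:2 compressor would have idle inputs.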
The invention thus has substantive features and represents notable progress. It reduces chip area while also reducing the delay of the digital signal processor's multiply-add unit, which raises the chip's frequency and performance and improves its cost-performance ratio.
Brief description of the drawings
Fig. 1 is the overall block diagram of the digital signal processor of the invention.
As shown in Fig. 1, the digital signal processor of the invention is formed by connecting a digital signal processor core to an instruction memory and a data memory; the core comprises a program control unit, an instruction decoding unit, an arithmetic unit, and an address generation unit.
Fig. 2 is a block diagram of the multiply-add unit of the digital signal processor of the invention.
Fig. 3 is a block diagram of the partial product compression tree with the mixed 3:2 and 4:2 compression structure used in the digital signal processor of the invention.
Embodiment
The technical solution of the invention is further described below with reference to the drawings.
Fig. 1 shows the relationships among the modules of the digital signal processor of the invention. As shown in Fig. 1, the instruction decoding unit of the digital signal processor core is connected to the arithmetic unit and the address generation unit, and is bidirectionally connected to the program control unit; the program control unit is connected to the instruction memory and fetches instructions from it; the arithmetic unit has a unidirectional connection to the program control unit; the arithmetic unit is bidirectionally connected to the data memory; and the address generation unit is connected to the data memory through the address bus and can exchange data bidirectionally with the arithmetic unit. The multiply-add unit is located inside the arithmetic unit.
The multiply-add unit of the digital signal processor of the invention adopts a two-stage pipeline structure. As shown in Fig. 2, it comprises a radix-4 modified Booth encoding unit, a compression tree unit mixing 3:2 compressors and 4:2 compressors, a 72-bit 3:2 compression array, a 72-bit carry-lookahead adder, a selector, and a selector control line. They are connected as follows: the output of the radix-4 modified Booth encoding unit is connected to the mixed 3:2/4:2 compression tree unit; the output of the compression tree unit and a 72-bit addend are together connected to the 72-bit 3:2 compression array; the output of the array and two operands (a 72-bit addend and a 72-bit augend) are together connected to the inputs of the selector; and the output of the selector is connected to the 72-bit carry-lookahead adder.
In the 32-bit multiply-add unit, the radix-4 modified Booth encoding unit converts the 32-bit multiplicand and multiplier into 16 33-bit partial products of different weights; in addition, to avoid sign-bit extension, it generates one more partial product (the 32-bit sign-extension sum). These 17 partial products are then fed into the compression tree unit mixing 3:2 compressors and 4:2 compressors. The two partial products obtained by compression, together with one extended 72-bit addend, are fed into the 72-bit 3:2 compressor. Finally, the two partial products produced by the 72-bit 3:2 compressor and the two addends pass through the data selector, under the control of the selector control line, into the 72-bit carry-lookahead adder, whose output is the final result of the whole multiply-add unit. The radix-4 modified Booth encoding unit and the mixed 3:2/4:2 compression tree unit form the first pipeline stage; the 72-bit 3:2 compressor, the 72-bit carry-lookahead adder, the selector, and the selector control line form the second pipeline stage. This is the two-stage pipeline structure of the multiply-add unit of the digital signal processor of the invention.
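The way a 32-bit multiplier yields 16 partial products can be sketched with the standard radix-4 modified Booth recoding. This is a minimal arithmetic model, assuming signed two's-complement operands; it does not model the 33-bit hardware encoding or the additional sign-extension partial product, and the function names are hypothetical.

```python
def booth_radix4_digits(multiplier, width=32):
    """Recode a signed `width`-bit multiplier into width/2 digits in {-2..2}.

    Digit k is y[2k-1] + y[2k] - 2*y[2k+1], with y[-1] taken as 0.
    """
    m = multiplier & ((1 << width) - 1)
    # bits[0] is the implicit y[-1] = 0; bits[i + 1] is y[i].
    bits = [0] + [(m >> i) & 1 for i in range(width)]
    return [bits[i] + bits[i + 1] - 2 * bits[i + 2] for i in range(0, width, 2)]


def partial_products(multiplicand, multiplier, width=32):
    """Weighted partial products: digit k times the multiplicand times 4**k."""
    digits = booth_radix4_digits(multiplier, width)
    return [d * multiplicand * 4 ** k for k, d in enumerate(digits)]
```

Each digit selects 0, ±1, or ±2 times the multiplicand, so before weighting a partial product needs one bit more than the 32-bit multiplicand, which is consistent with the 33-bit width given above. Summing the weighted partial products gives the signed product, which is the job of the compression tree and the final adder.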
Fig. 3 is a block diagram of the partial product compression tree with the mixed 3:2 and 4:2 structure used in the digital signal processor of the invention; it shows how the mixed 3:2/4:2 partial product compression tree of the 32-bit multiplier is connected. It comprises the partial products, the 3:2 compressors, the 4:2 compressors, and the connections between the 3:2 and 4:2 compressors: the outputs of the top-layer 3:2 compressors are connected to the inputs of the second-layer 4:2 compressors, the outputs of that layer are connected to the inputs of the 4:2 compressors on the next layer down, and so on to the last layer. The inputs of the compression tree are connected to the radix-4 modified Booth encoding unit, and its outputs serve as inputs to the 72-bit 3:2 compression array. The figure shows the partial product compression tree with the longest delay in the mixed compression array. The partial products in this column are compressed by the top-layer 3:2 compressors and then sent to the inputs of the second-layer 4:2 compressors. Because every layer other than the second-highest has its full complement of 4:2 compressors, this layer contains 4 compressors. The values compressed by this layer are sent to the inputs of the third-layer 4:2 compressors, and so on down to the last 4:2 compressor, which is the root. The values output by the root enter the 72-bit 3:2 compression array in the second pipeline stage. This is how the mixed compression tree of the invention is constructed.

Claims (2)

1. A digital signal processor using a mixed-compression two-stage pipelined multiply-add unit, in which an instruction decoding unit is connected to an arithmetic unit and an address generation unit and is bidirectionally connected to a program control unit; the program control unit is connected to an instruction memory and fetches instructions from the instruction memory; the arithmetic unit has a unidirectional connection to the program control unit; the arithmetic unit is bidirectionally connected to a data memory; and the address generation unit is connected to the data memory through an address bus and can exchange data bidirectionally with the arithmetic unit; characterized in that the multiply-add unit inside the arithmetic unit adopts a two-stage pipeline structure, in which the output of a radix-4 modified Booth encoding unit is connected to a compression tree unit mixing 3:2 compressors and 4:2 compressors; the output of the compression tree unit and a 72-bit addend are together connected to a 72-bit 3:2 compression array; the output of the array and two operands, namely a 72-bit addend and a 72-bit augend, are together connected to the inputs of a selector; the output of the selector is connected to a 72-bit carry-lookahead adder; the radix-4 modified Booth encoding unit converts the 32-bit multiplicand and multiplier into 16 33-bit partial products of different weights and at the same time generates one further partial product, namely the 32-bit sign-extension sum; these 17 partial products are then fed into the compression tree unit mixing 3:2 compressors and 4:2 compressors; the two partial products obtained by compression, together with one extended 72-bit addend, are fed into the 72-bit 3:2 compressor; finally, the two partial products produced by the 72-bit 3:2 compressor and the two addends pass through the selector, under the control of a selector control line, into the 72-bit carry-lookahead adder, which outputs the final result of the whole multiply-add unit; the radix-4 modified Booth encoding unit and the compression tree unit mixing 3:2 compressors and 4:2 compressors form the first pipeline stage; and the 72-bit 3:2 compressor, the 72-bit carry-lookahead adder, the selector, and the selector control line form the second pipeline stage.
2. The digital signal processor using a mixed-compression two-stage pipelined multiply-add unit as claimed in claim 1, characterized in that the compression tree unit mixing 3:2 compressors and 4:2 compressors takes one 4:2 compressor as the root of the whole partial product compression tree and grows two branches upward from this root; if the number of signals that these two branches can accept is smaller than the number of signals to be compressed, further branches are grown on these branches, each at its own compression ratio, until the number of signals that the top-layer branches can accept reaches or exceeds the number of signals to be compressed; only the top layer of the partial product compression tree may be built from 3:2 compressors; and, except for the second-highest layer, the branches grown on all lower layers are complete.
CNB2004100157377A 2004-01-09 2004-01-09 Digital signal processor using mixed compression two stage flow multiplication addition unit Expired - Fee Related CN1324456C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100157377A CN1324456C (en) 2004-01-09 2004-01-09 Digital signal processor using mixed compression two stage flow multiplication addition unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004100157377A CN1324456C (en) 2004-01-09 2004-01-09 Digital signal processor using mixed compression two stage flow multiplication addition unit

Publications (2)

Publication Number Publication Date
CN1556467A CN1556467A (en) 2004-12-22
CN1324456C true CN1324456C (en) 2007-07-04

Family

ID=34351491

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100157377A Expired - Fee Related CN1324456C (en) 2004-01-09 2004-01-09 Digital signal processor using mixed compression two stage flow multiplication addition unit

Country Status (1)

Country Link
CN (1) CN1324456C (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100555212C (en) * 2007-07-18 2009-10-28 中国科学院计算技术研究所 The carry calibration equipment of a kind of floating dual MAC and multiplication CSA compressed tree thereof
CN102722352B (en) * 2012-05-21 2015-06-03 华南理工大学 Booth multiplier
CN103412737B (en) * 2013-06-27 2016-08-10 清华大学 Realize the gate circuit of base 4-Booth coded method and streamline large number multiplication device based on the method
WO2016037307A1 (en) * 2014-09-09 2016-03-17 华为技术有限公司 Processor
CN105653240A (en) * 2015-12-30 2016-06-08 深圳市正东源科技有限公司 Multiplying unit used for RFID (Radio Frequency Identification) security chip, and implementation method
CN107957976B (en) * 2017-12-15 2020-12-18 安徽寒武纪信息科技有限公司 Calculation method and related product
CN109542393B (en) * 2018-11-19 2022-11-04 电子科技大学 Approximate 4-2 compressor and approximate multiplier
CN111384958B (en) * 2018-12-27 2024-04-05 上海寒武纪信息科技有限公司 Data compression device and related product
CN110413254B (en) * 2019-09-24 2020-01-10 上海寒武纪信息科技有限公司 Data processor, method, chip and electronic equipment
CN113010146B (en) * 2021-03-05 2022-02-11 唐山恒鼎科技有限公司 Mixed signal multiplier

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1176425A (en) * 1996-06-06 1998-03-18 松下电器产业株式会社 Arithmetic processing device
CN1278341A (en) * 1997-10-28 2000-12-27 爱特梅尔股份有限公司 Fast regular multiplier architecture

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
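Microelectronics (微电子学), He Jing, Han Yueqiu, pp. 331-334, "Design of a double-precision floating-point multiplier", 2003 *
Microelectronics (微电子学), He Jing, Han Yueqiu, pp. 331-334, "Design of a double-precision floating-point multiplier", 2003; Microelectronics (微电子学), Xu Feng, Shao Bingxian, pp. 56-59, "Implementation of a 16×16-bit high-speed low-power parallel multiplier", 2003; Journal of Xidian University (Natural Science Edition) (西安电子科学大学学报(自然科学版)), Xu Qi, Yuan Wei, Shen Xubang, pp. 580-583, "Design of a new tree multiplier", 2002; Modern Electronics Technique (现代电子技术), Liu Dong, pp. 21-22, 25, "Design of a 16×16 parallel multiplier using the Booth algorithm", 2003 *
Microelectronics (微电子学), Xu Feng, Shao Bingxian, pp. 56-59, "Implementation of a 16×16-bit high-speed low-power parallel multiplier", 2003 *
Modern Electronics Technique (现代电子技术), Liu Dong, pp. 21-22, 25, "Design of a 16×16 parallel multiplier using the Booth algorithm", 2003 *
Journal of Xidian University (Natural Science Edition) (西安电子科学大学学报(自然科学版)), Xu Qi, Yuan Wei, Shen Xubang, pp. 580-583, "Design of a new tree multiplier", 2002 *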

Also Published As

Publication number Publication date
CN1556467A (en) 2004-12-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: SHANGHAI JIAOTONG UNIV.

Free format text: FORMER OWNER: SHANGHAI HANXIN SEMICONDUCTOR TECHNOLOGY CO., LTD.

Owner name: SHANGHAI JIAODA HISYS TECHNOLOGY CO., LTD.

Effective date: 20050805

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20050805

Address after: 200240 No. 800, Dongchuan Road, Shanghai, Minhang District

Applicant after: Shanghai Jiao Tong University

Co-applicant after: Shanghai Jiaotong University Han Yuan Technology Co., Ltd.

Address before: 201109 Shanghai Jianchuan Road No. 468

Applicant before: Shanghai Hanxin Semiconductor Technology Co., Ltd.

C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070704

Termination date: 20100209