CN1584821A - Cutting multiplying accumulating unit with parallel processing - Google Patents

Info

Publication number
CN1584821A
Authority
CN
China
Prior art keywords
partial product
divisible
accumulator
multiply
generation unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 03153649
Other languages
Chinese (zh)
Inventor
姜小波
陈杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MICROELECTRONIC CT CHINESE ACA
Original Assignee
MICROELECTRONIC CT CHINESE ACA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MICROELECTRONIC CT CHINESE ACA
Priority to CN 03153649
Publication of CN1584821A
Legal status: Pending

Landscapes

  • Complex Calculations (AREA)

Abstract

A multiply-accumulate unit comprising a partial product generation unit, a partial product accumulation array, and a divisible accumulator. The output of the partial product generation unit is connected to the input of the accumulation array, whose output is in turn connected to the input of the divisible accumulator. The unit can perform 32-bit, 16-bit, and 8-bit multiply-accumulate operations.

Description

Divisible multiply-accumulator with parallel processing
Technical field
The present invention relates to the field of digital signal processors. Specifically, it relates to a divisible, parallel-processing multiply-accumulator for a divisible digital signal processor data path.
Background art
Digital signal processing relies on a large number of multiply-accumulate operations, so the multiply-accumulator is an important component of a digital signal processor. Current digital signal processing must handle multimedia data, which requires higher throughput for 16-bit and 8-bit data and flexibility in bit width. In a general digital signal processor, however, the multiply-accumulator operates only at a fixed bit width, which is very inconvenient for multimedia applications. MPEG-4, for example, requires operand widths ranging from 8 to 64 bits. In addition, processors are being built ever wider: 32 bits, 64 bits, or even more. In practice, though, often only 16-bit operations are used; the widely deployed digital voice communication, for example, uses 16 bits, and image processing generally uses only 8-bit operations. Using a 32-bit processor for 16-bit or even 8-bit operations wastes power and area.
Multimedia-enhanced microprocessors and DSPs require high throughput for 16-bit and 8-bit processing. The pipelined multipliers most often reported in the literature use two 16-bit multipliers, completing one 32-bit multiply-accumulate operation in two cycles and two 16-bit multiply-accumulate operations in one cycle. For example, the Intel® XScale™ coprocessor from Intel uses two parallel 16-bit multipliers to strengthen its multimedia processing capability and increase throughput in 16-bit processing. This structure handles 16-bit operations well, but throughput drops for 32-bit operations. The pipelined structure also supports only two operand widths, which is not flexible enough. Moreover, when this structure is used in a processor, 32-bit and 16-bit operations take different numbers of cycles and must use different instruction encodings, which adds overhead.
A common divisible multiplier is a serial-parallel multiplier based on the Baugh-Wooley algorithm, in which the multiplicand is input in parallel and the multiplier is input serially. This approach is quite modular in implementing the divisible function, but because of the serial input, one 16-bit operation takes 32 cycles, which is unacceptable in DSP applications. The approach also assembles a large multiplier from small multipliers, which reduces the performance of the large multiplier.
Summary of the invention
The object of the invention is to provide a divisible multiply-accumulator with parallel processing that can complete one 32-bit, two 16-bit, or four 8-bit operations in one cycle.
The divisible multiply-accumulator with parallel processing of the present invention is characterized in that it comprises:
a partial product generation unit, a partial product accumulation array, and a divisible accumulator; wherein the output of the partial product generation unit is connected to the input of the partial product accumulation array; the output of the partial product accumulation array is connected to the input of the divisible accumulator; and this structure implements 32-bit, 16-bit, and 8-bit multiply-accumulate operations.
The divisible partial product generation unit consists of 32 sub-generation units, each of which consists of an AND gate and a two-input selector, the output of the selector being connected to one input of the AND gate; according to the mode control signal, the divisible partial product generation unit produces the partial products of one 32-bit multiplication, two 16-bit multiplications, or four 8-bit multiplications.
The divisible accumulator unit consists of four 20-bit accumulators; each 20-bit accumulator consists of 19 full-adder units and one divisible full-adder unit; the full-adder unit consists of a full adder and a multiplexer, the output of the multiplexer being connected to an input of the full adder; under control of the mode signal, the divisible multiply-accumulator can operate in 32-bit, 16-bit, or 8-bit mode, performing one 80-bit accumulation, two 40-bit accumulations, or four 20-bit accumulations, respectively.
With the multiply-accumulator of the invention, one 32-bit, two 16-bit, or four 8-bit multiply-accumulate operations can be completed in one clock cycle, which increases the flexibility of the DSP data path. The structure increases the throughput of 16-bit and 8-bit multiplications. It switches between operand widths by means of mode control bits, so the encoding of the arithmetic instructions need not change, which improves the coding efficiency of the processor. For 8-bit and 16-bit operations the structure reuses resources by zeroing the redundant signals; the resource reuse rate is high, and reusing the same adder array saves area. The structure adds functionality to a 32-bit multiplier with very little effect on performance.
Description of the drawings
The structure, advantages, and performance of the divisible multiply-accumulator of the invention are further described below through specific embodiments with reference to the accompanying drawings, in which:
Fig. 1 is the overall structure diagram of the parallel divisible multiply-accumulator of the invention.
Fig. 2 is the internal block diagram of the parallel divisible multiply-accumulator of the invention.
Fig. 3 is a schematic diagram of the partial products of the parallel divisible multiply-accumulator of the invention.
Fig. 4 is the structure diagram of the partial product generation unit of the parallel divisible multiply-accumulator of the invention.
Fig. 5 shows the internal sub-unit of the partial product generation unit of the parallel divisible multiply-accumulator of the invention.
Fig. 6 is the block diagram of the divisible accumulator of the parallel divisible multiply-accumulator of the invention.
Fig. 7 is the structure diagram of the 20-bit accumulator of the parallel divisible multiply-accumulator of the invention.
Fig. 8 shows the internal unit of the divisible accumulator of the parallel divisible multiply-accumulator of the invention.
Detailed description of the embodiments
Referring to Fig. 1, the divisible multiply-accumulator with parallel processing of the present invention comprises: a partial product generation unit 21, a partial product accumulation array 22, and a divisible accumulator 23; wherein the output of the partial product generation unit 21 is connected to the input of the partial product accumulation array 22; the output of the partial product accumulation array 22 is connected to the input of the divisible accumulator 23; and this structure implements 32-bit, 16-bit, and 8-bit multiply-accumulate operations.
The divisible partial product generation unit 21 consists of 32 sub-generation units 41 (see Fig. 4), each of which consists of an AND gate 51 and a two-input selector 52, the output of the selector 52 being connected to one input of the AND gate 51; according to the mode control signal, the divisible partial product generation unit produces the partial products of one 32-bit multiplication, two 16-bit multiplications, or four 8-bit multiplications.
The divisible accumulator unit 23 consists of four 20-bit accumulators 61 (see Fig. 6); each 20-bit accumulator 61 consists of 19 full-adder units 71 and one divisible full-adder unit 72; the full-adder unit 71 consists of a full adder 81 and a multiplexer 82 (see Fig. 8), the output of the multiplexer 82 being connected to an input of the full adder 81; under control of the mode signal, the divisible multiply-accumulator can operate in 32-bit, 16-bit, or 8-bit mode, performing one 80-bit accumulation, two 40-bit accumulations, or four 20-bit accumulations, respectively.
Referring again to Fig. 1, which shows the overall structure of one implementation of the divisible multiply-accumulator: 11 is the multiply-accumulator. 12 and 13 are 32-bit input ports for the multiplier and the multiplicand; in 32-bit mode each carries one 32-bit number; in 16-bit mode, two packed 16-bit numbers; in 8-bit mode, four packed 8-bit numbers. 14 is a 32-bit accumulate input port; in 32-bit mode it carries one 32-bit number; in 16-bit mode, two packed 16-bit numbers; in 8-bit mode, four packed 8-bit numbers. 15 is the mode control signal input port, which selects whether the multiply-accumulator operates in 32-bit, 16-bit, or 8-bit mode. 16 is the 80-bit result output port: in 32-bit mode it outputs one 80-bit multiply-accumulate result; in 16-bit mode, two 40-bit multiply-accumulate results; in 8-bit mode, four 20-bit multiply-accumulate results.
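The port-level behavior described for Fig. 1 can be sketched as a small behavioral model. This is only an illustration under stated assumptions, not the patented circuit: the function and variable names are hypothetical, operands are treated as unsigned (the text does not state signedness), and lane 0 is assumed to occupy the least-significant bits of every packed word.

```python
# Behavioral sketch only. Unsigned operands and low-bits-first lane order
# are assumptions; all names here are hypothetical.
LANES = {32: 1, 16: 2, 8: 4}  # operand width in bits -> number of parallel lanes

def split(word, width, count):
    """Unpack `count` fields of `width` bits, field 0 in the low bits."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(count)]

def divisible_mac(x, y, acc, mode):
    """Return the packed 80-bit result of x*y + acc, computed lane by lane."""
    lanes = LANES[mode]
    out_width = 80 // lanes          # 80, 40, or 20 result bits per lane
    out_mask = (1 << out_width) - 1
    xs = split(x, mode, lanes)       # multiplier lanes (32-bit port)
    ys = split(y, mode, lanes)       # multiplicand lanes (32-bit port)
    accs = split(acc, mode, lanes)   # accumulate-input lanes (32-bit port)
    result = 0
    for i in range(lanes):
        lane = (xs[i] * ys[i] + accs[i]) & out_mask
        result |= lane << (i * out_width)
    return result
```

For example, in 16-bit mode the call divisible_mac(0x00030002, 0x00050004, 0x00010001, 16) computes 2*4+1 in the low 40-bit field and 3*5+1 in the high 40-bit field of the 80-bit result.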
Referring to Fig. 2, the internal block diagram of the parallel divisible multiply-accumulator: it consists of a partial product generation unit 21, a partial product accumulation array 22, and a divisible accumulator unit 23. 21 is the partial product generation unit, which produces the partial products; it differs from a conventional partial product generation unit in that it produces the partial products appropriate to each operating mode. 22 is the partial product accumulation array, whose input is the output of the partial product generation unit. It sums the partial products generated by 21 and can be implemented with Wallace tree compression or in other ways. 23 is the divisible accumulator unit, whose input is the output of the partial product accumulation array 22. 24 and 25 are the 32-bit multiplier and multiplicand inputs. 26 is the mode control signal. 27 is the accumulate input. 28 is the multiply-accumulate result.
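The text leaves the accumulation array open (Wallace tree compression "or other ways"). As a hedged illustration of the general idea, the sketch below reduces partial-product rows with layers of 3:2 carry-save adders, in the spirit of a Wallace tree, until only two rows remain for a final carry-propagate addition. The helper names are hypothetical and the row-wise grouping is a simplification, not the structure of array 22.

```python
def carry_save_add(a, b, c):
    """3:2 compressor on whole rows: a + b + c == s + carry."""
    s = a ^ b ^ c                                # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # carries shifted to their weight
    return s, carry

def reduce_rows(rows):
    """Compress a list of rows to at most two rows with the same total."""
    rows = list(rows)
    while len(rows) > 2:
        nxt = []
        for k in range(0, len(rows) - 2, 3):     # full groups of three rows
            nxt.extend(carry_save_add(rows[k], rows[k + 1], rows[k + 2]))
        nxt.extend(rows[len(rows) - len(rows) % 3:])  # leftover rows pass through
        rows = nxt
    return rows

# Usage: 32 shifted rows of a plain 32 x 32 unsigned multiplication.
x, y = 0xDEADBEEF, 0x12345678
pp_rows = [(x << i) if (y >> i) & 1 else 0 for i in range(32)]
two_rows = reduce_rows(pp_rows)
```

Each layer needs no carry propagation, which is why this style of compression is commonly used ahead of a single final adder.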
Fig. 3 is a schematic diagram of the partial products of the parallel divisible multiply-accumulator, i.e. the output of the partial product generation unit 21. In 32-bit mode, all partial products are the values decoded from the 32-bit inputs. In 16-bit mode, the partial product generation unit zeroes the unfilled region 31 of the diagram, and the filled regions 32 and 33 form the partial products of two 16 × 16 multiplications. In 8-bit mode, the partial product generation unit zeroes regions 31 and 33, leaving only the blue filled region 32. Partial product generation can use Booth encoding or any other encoding method.
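The zeroing scheme of Fig. 3 can be checked numerically with a simple AND-array model. The sketch below is an assumption-laden illustration: it uses plain unsigned AND partial products rather than Booth encoding, the function name is hypothetical, and the "same lane" rule is an interpretation of the filled regions in the figure. Bit i of one operand and bit j of the other contribute only when both fall in the same lane.

```python
def masked_partial_product_sum(x, y, mode):
    """Sum the 32 x 32 AND-array, zeroing products that cross lane boundaries.

    mode is the lane width in bits (32, 16, or 8).
    """
    total = 0
    for i in range(32):          # bit index into y
        for j in range(32):      # bit index into x
            keep = (i // mode) == (j // mode)   # same lane -> product retained
            bit = ((x >> j) & 1) & ((y >> i) & 1) if keep else 0
            total += bit << (i + j)             # weight of this partial product
    return total
```

Because a w-bit by w-bit product fits in 2w bits, the retained products of lane k land in bits 2wk to 2wk + 2w - 1 of the 64-bit sum and never overlap with another lane, which is what lets one shared adder array serve all three modes.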
Fig. 4 is the structure diagram of the partial product generation unit of the parallel divisible multiply-accumulator of the invention, which consists of 32 sub-generation units 41. The structure of each unit is shown in Fig. 5.
Fig. 5 shows the internal sub-unit of the partial product generation unit of the parallel divisible multiply-accumulator. It consists of an AND gate 51 and a two-input selector 52. 53 is a multiplicand bit input. 5 is a multiplier bit input. 55 is the zero input. 57 is the select signal, generated from the mode control signal. 58 is the partial product. Signal 57 controls whether the partial product is XiYj or zero.
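A gate-level sketch of one such cell follows. Which operand bit passes through the selector, and the polarity of the select signal, are assumptions made for illustration; the function names are hypothetical.

```python
def mux2(sel, when_one, when_zero):
    """Two-input selector: returns when_one if sel is 1, else when_zero."""
    return when_one if sel else when_zero

def and_gate(a, b):
    return a & b

def partial_product_cell(x_bit, y_bit, select):
    """One cell: the selector feeds either y_bit or constant 0 into the AND gate."""
    gated = mux2(select, y_bit, 0)   # select = 0 forces the zero input
    return and_gate(x_bit, gated)    # output is x_bit*y_bit or 0

# Full truth table as (x_bit, y_bit, select, output) tuples.
truth_table = [(xb, yb, s, partial_product_cell(xb, yb, s))
               for s in (0, 1) for xb in (0, 1) for yb in (0, 1)]
```

With select = 1 the cell behaves like the plain AND gate of a conventional array multiplier; with select = 0 its output is always zero, which is how unwanted partial products are suppressed in the narrower modes.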
Fig. 6 is the block diagram of the divisible accumulator of the parallel divisible multiply-accumulator. It consists of four 20-bit accumulators 61. In 8-bit, 16-bit, and 32-bit modes it produces four 20-bit accumulation results, two 40-bit accumulation results, and one 80-bit accumulation result, respectively. The adders can be carry-lookahead adders or any other type of adder.
Fig. 7 is the structure diagram of the 20-bit accumulator of the parallel divisible multiply-accumulator of the invention. It consists of 19 full-adder units 71 and one divisible full-adder unit 72. The structure of the divisible full-adder unit is shown in Fig. 8.
Fig. 8 shows the divisible accumulator unit cell of the parallel divisible multiply-accumulator. It consists of a full adder 81 and a multiplexer 82. 83 and 84 are the inputs of the full adder. 85 and 86 are the sum bit and the carry bit output by the full adder, respectively. 87 is the carry input from the preceding full adder, and 88 is the zero input. 810 is the mode select signal. Placed in the first position of each 20-bit accumulator, this unit controls the carry chain and thereby implements the divisible accumulation function.
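The carry-chain control can be illustrated with a bit-level ripple-carry model of the 80-bit adder. This is a sketch under assumptions, not the actual circuit: a ripple chain stands in for whatever adder type is used, the names are hypothetical, and the choice of which boundaries are cut in each mode (bit 40 in 16-bit mode; bits 20, 40, and 60 in 8-bit mode) is inferred from the 80/40/20-bit result widths.

```python
# Bit positions whose carry-in is forced to zero in each mode (assumed mapping).
CUTS = {32: set(), 16: {40}, 8: {20, 40, 60}}

def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def divisible_add(a, b, mode):
    """Add two 80-bit words, breaking the carry chain at lane boundaries."""
    cuts = CUTS[mode]
    carry = 0
    result = 0
    for k in range(80):
        if k in cuts:
            carry = 0    # divisible cell: multiplexer picks the zero input
        s, carry = full_adder((a >> k) & 1, (b >> k) & 1, carry)
        result |= s << k
    return result        # the final carry out of bit 79 is discarded
```

For example, adding 1 to 0xFFFFF yields 0x100000 in 32-bit mode, because the carry ripples into bit 20, but yields 0 in 8-bit mode, because the carry out of the lowest 20-bit lane is blocked.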

Claims (3)

1. A divisible multiply-accumulator with parallel processing, characterized in that it comprises:
a partial product generation unit, a partial product accumulation array, and a divisible accumulator; wherein the output of the partial product generation unit is connected to the input of the partial product accumulation array; the output of the partial product accumulation array is connected to the input of the divisible accumulator; and this structure implements 32-bit, 16-bit, and 8-bit multiply-accumulate operations.
2. The divisible multiply-accumulator with parallel processing according to claim 1, characterized in that the divisible partial product generation unit consists of 32 sub-generation units, each of which consists of an AND gate and a two-input selector, the output of the selector being connected to one input of the AND gate; according to the mode control signal, the divisible partial product generation unit produces the partial products of one 32-bit multiplication, two 16-bit multiplications, or four 8-bit multiplications.
3. The divisible multiply-accumulator with parallel processing according to claim 1, characterized in that the divisible accumulator unit consists of four 20-bit accumulators; each 20-bit accumulator consists of 19 full-adder units and one divisible full-adder unit; the full-adder unit consists of a full adder and a multiplexer, the output of the multiplexer being connected to an input of the full adder; under control of the mode signal, the divisible multiply-accumulator can operate in 32-bit, 16-bit, or 8-bit mode, performing one 80-bit accumulation, two 40-bit accumulations, or four 20-bit accumulations, respectively.
CN 03153649 2003-08-19 2003-08-19 Cutting multiplying accumulating unit with parallel processing Pending CN1584821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 03153649 CN1584821A (en) 2003-08-19 2003-08-19 Cutting multiplying accumulating unit with parallel processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 03153649 CN1584821A (en) 2003-08-19 2003-08-19 Cutting multiplying accumulating unit with parallel processing

Publications (1)

Publication Number Publication Date
CN1584821A true CN1584821A (en) 2005-02-23

Family

ID=34597789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 03153649 Pending CN1584821A (en) 2003-08-19 2003-08-19 Cutting multiplying accumulating unit with parallel processing

Country Status (1)

Country Link
CN (1) CN1584821A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101729073B (en) * 2008-10-10 2012-10-24 国民技术股份有限公司 High-speed Sigma-Delta modulation method and modulator
CN111666066A (en) * 2017-04-28 2020-09-15 英特尔公司 Instructions and logic to perform floating point and integer operations for machine learning
CN111666066B (en) * 2017-04-28 2021-11-09 英特尔公司 Method for accelerating machine learning operation, graphic processing unit and data processing system
US11169799B2 (en) 2017-04-28 2021-11-09 Intel Corporation Instructions and logic to perform floating-point and integer operations for machine learning
US11720355B2 (en) 2017-04-28 2023-08-08 Intel Corporation Instructions and logic to perform floating point and integer operations for machine learning
US12039331B2 (en) 2017-04-28 2024-07-16 Intel Corporation Instructions and logic to perform floating point and integer operations for machine learning
CN108229668A (en) * 2017-09-29 2018-06-29 北京市商汤科技开发有限公司 Operation implementation method, device and electronic equipment based on deep learning
CN108229668B (en) * 2017-09-29 2020-07-07 北京市商汤科技开发有限公司 Operation implementation method and device based on deep learning and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication