CN1584821A - Cutting multiplying accumulating unit with parallel processing - Google Patents
- Publication number
- CN1584821A (application CN 03153649)
- Authority
- CN
- China
- Prior art keywords
- partial product
- divisible
- accumulator
- multiply
- generation unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Complex Calculations (AREA)
Abstract
A multiply-accumulate unit comprises a partial-product generation unit, a partial-product accumulation array, and a divisible accumulator. The output of the partial-product generation unit is connected to the input of the accumulation array, whose output is in turn connected to the input of the accumulator. The unit can perform 32-bit, 16-bit, and 8-bit multiply-accumulate operations.
Description
Technical field
The present invention relates to the field of digital signal processors. Specifically, it relates to a divisible multiply-accumulator for parallel processing in the data path of a digital signal processor.
Background technology
Digital signal processing makes heavy use of multiply-accumulate operations, so the multiply-accumulator is an important component of a digital signal processor. Current digital signal processing must handle multimedia data, which demands higher throughput for 16-bit and 8-bit data and greater flexibility in bit width. In a general-purpose digital signal processor, however, the multiply-accumulator supports only a fixed bit width, which is very inconvenient for multimedia applications. MPEG-4, for example, requires operand widths ranging flexibly from 8 to 64 bits. In addition, processor word widths keep growing, reaching 32 bits, 64 bits, or more, while practical applications often need only 16-bit operations (the widely used digital voice communication, for example, uses 16 bits), and image processing generally needs only 8-bit operations. Using a 32-bit processor for 16-bit or even 8-bit operations wastes power and area.
Multimedia-enhanced microprocessors and DSPs require high throughput for 16-bit and 8-bit processing. The pipelined multipliers most often reported in the literature use two 16-bit multipliers, completing one 32-bit multiply-accumulate operation in two cycles or two 16-bit multiply-accumulate operations in one cycle. For example, the coprocessor of Intel's Intel® XScale™ uses two 16-bit parallel multipliers to strengthen multimedia processing and raise throughput for 16-bit operations. This structure handles 16-bit operations well, but throughput drops for 32-bit operations. Such a pipelined structure also supports only two operand widths, so its flexibility is limited. Furthermore, when this structure is used in a processor, 32-bit and 16-bit operations take different numbers of cycles and must use different instruction encodings, which adds overhead.
A typical divisible multiplier is a serial-parallel multiplier based on the Baugh-Wooley algorithm, in which the multiplicand is input in parallel and the multiplier is input serially. This approach is well modularized for implementing the divisible function, but because of the serial input, one 16-bit operation takes 32 cycles, which is unacceptable in DSP applications. The approach also builds a large multiplier by assembling small multipliers, which degrades the performance of the large multiplier.
Summary of the invention
The object of the present invention is to provide a divisible multiply-accumulator with parallel processing that can complete one 32-bit, two 16-bit, or four 8-bit operations in a single cycle.
A divisible multiply-accumulator with parallel processing according to the present invention is characterized by comprising:
A partial-product generation unit, a partial-product accumulation array, and a divisible accumulator; wherein the output of the partial-product generation unit is connected to the input of the partial-product accumulation array, and the output of the partial-product accumulation array is connected to the input of the divisible accumulator; this structure implements 32-bit, 16-bit, and 8-bit multiply-accumulate operations.
The divisible partial-product generation unit consists of 32 sub-generation units. Each sub-generation unit consists of an AND gate and a two-input selector, the output of the selector being connected to one input of the AND gate. Depending on the mode control signal, the divisible partial-product generation unit produces the partial products of one 32-bit multiplication, two 16-bit multiplications, or four 8-bit multiplications.
The divisible accumulator unit consists of four 20-bit accumulators. Each 20-bit accumulator consists of 19 full-adder units and one divisible full-adder unit; the full-adder unit consists of a full adder and a multiplexer, the output of the multiplexer being connected to an input of the full adder. Under control of the mode signal, the divisible multiply-accumulator operates in 32-bit, 16-bit, or 8-bit mode, completing one 80-bit accumulation, two 40-bit accumulations, or four 20-bit accumulations, respectively.
With the multiply-accumulator of the present invention, one 32-bit multiply-accumulate operation, two 16-bit multiply-accumulate operations, or four 8-bit multiply-accumulate operations can be completed in one clock cycle, which increases the flexibility of the DSP data path and raises the throughput of 16-bit and 8-bit multiplication. The structure uses mode control bits to switch between operand widths, so the encoding of the operation instructions need not change, which improves the processor's code efficiency. For 8-bit and 16-bit operations the structure zeroes the redundant signals and reuses the same adder array, giving a high resource reuse rate and saving area. The added functionality has very little effect on the performance of the 32-bit multiplier.
Description of drawings
The structure, advantages, and performance of the divisible multiply-accumulator of the present invention are further described below through specific embodiments, with reference to the accompanying drawings, in which:
Fig. 1 is the overall structural diagram of the parallel divisible multiply-accumulator of the present invention.
Fig. 2 is the internal block diagram of the parallel divisible multiply-accumulator of the present invention.
Fig. 3 is a schematic diagram of the partial products of the parallel divisible multiply-accumulator of the present invention.
Fig. 4 is the structural diagram of the partial-product generation unit of the parallel divisible multiply-accumulator of the present invention.
Fig. 5 shows the internal sub-unit of the partial-product generation unit of the parallel divisible multiply-accumulator of the present invention.
Fig. 6 is the block diagram of the divisible accumulator of the parallel divisible multiply-accumulator of the present invention.
Fig. 7 is the structural diagram of the 20-bit accumulator of the parallel divisible multiply-accumulator of the present invention.
Fig. 8 shows the internal unit of the divisible accumulator of the parallel divisible multiply-accumulator of the present invention.
Embodiment
Referring to Fig. 1, a divisible multiply-accumulator with parallel processing according to the present invention comprises a partial-product generation unit 21, a partial-product accumulation array 22, and a divisible accumulator 23; wherein the output of the partial-product generation unit 21 is connected to the input of the partial-product accumulation array 22, and the output of the partial-product accumulation array 22 is connected to the input of the divisible accumulator 23; this structure implements 32-bit, 16-bit, and 8-bit multiply-accumulate operations.
The divisible partial-product generation unit 21 consists of 32 sub-generation units 41 (see Fig. 4). Each sub-generation unit consists of an AND gate 51 and a two-input selector 52, the output of the selector 52 being connected to one input of the AND gate 51. Depending on the mode control signal, the divisible partial-product generation unit produces the partial products of one 32-bit multiplication, two 16-bit multiplications, or four 8-bit multiplications.
The divisible accumulator unit 23 consists of four 20-bit accumulators 61 (see Fig. 6). Each 20-bit accumulator 61 consists of 19 full-adder units 71 and one divisible full-adder unit 72; the full-adder unit 71 consists of a full adder 81 and a multiplexer 82 (see Fig. 8), the output of the multiplexer 82 being connected to an input of the full adder 81. Under control of the mode signal, the divisible multiply-accumulator operates in 32-bit, 16-bit, or 8-bit mode, completing one 80-bit accumulation, two 40-bit accumulations, or four 20-bit accumulations, respectively.
Referring again to Fig. 1, which shows the overall structure of one embodiment of the divisible multiply-accumulator: 11 is the multiply-accumulator. 12 and 13 are 32-bit input ports for the multiplier and the multiplicand; in 32-bit mode each carries one 32-bit operand, in 16-bit mode two packed 16-bit operands, and in 8-bit mode four packed 8-bit operands. 14 is a 32-bit accumulate input port, packed in the same way: one 32-bit value in 32-bit mode, two 16-bit values in 16-bit mode, and four 8-bit values in 8-bit mode. 15 is the mode control signal input port, which selects whether the multiply-accumulator operates in 32-bit, 16-bit, or 8-bit mode. 16 is an 80-bit result output port: in 32-bit mode it outputs one 80-bit multiply-accumulate result, in 16-bit mode two 40-bit results, and in 8-bit mode four 20-bit results.
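The port behaviour described for Fig. 1 can be sketched as a small behavioural model. This is only an illustration under stated assumptions, not the patented circuit: operands are treated as unsigned, lane 0 is assumed to occupy the least significant bits, and the names `divisible_mac`, `unpack`, and `LANES` are hypothetical.

```python
# Behavioural sketch of the Fig. 1 ports (assumptions: unsigned operands,
# lane 0 in the least significant bits; all names are hypothetical).

LANES = {32: 1, 16: 2, 8: 4}  # operating mode (operand width) -> number of lanes


def unpack(word, width, count):
    """Split a packed word into `count` fields of `width` bits, lowest first."""
    mask = (1 << width) - 1
    return [(word >> (k * width)) & mask for k in range(count)]


def divisible_mac(x, y, acc_in, mode):
    """Return the packed 80-bit result for 32-bit packed inputs x, y, acc_in."""
    lanes = LANES[mode]
    out_width = 80 // lanes            # 80, 40, or 20 result bits per lane
    xs = unpack(x, mode, lanes)
    ys = unpack(y, mode, lanes)
    accs = unpack(acc_in, mode, lanes)
    result = 0
    for k in range(lanes):
        lane = (xs[k] * ys[k] + accs[k]) & ((1 << out_width) - 1)
        result |= lane << (k * out_width)
    return result
```

In this model a single call stands for one clock cycle: the same 32-bit ports carry one, two, or four independent operations, and only the mode argument changes.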
Referring to Fig. 2, the internal block diagram of the parallel divisible multiply-accumulator: it consists of a partial-product generation unit 21, a partial-product accumulation array 22, and a divisible accumulator unit 23. The partial-product generation unit 21 differs from a conventional one in that it produces the partial products appropriate to each operating mode. 22 is the partial-product accumulation array; its input is the output of the partial-product generation unit, and it sums the partial products generated by 21, using Wallace-tree compression or another method. 23 is the divisible accumulator unit; its input is the output of the partial-product accumulation array 22. 24 and 25 are the 32-bit multiplier and multiplicand inputs. 26 is the mode control signal. 27 is the accumulate input. 28 is the multiply-accumulate result.
Fig. 3 is a schematic diagram of the partial products of the parallel divisible multiply-accumulator, that is, the output of the partial-product generation unit 21. In 32-bit mode, all partial products are values decoded from the 32-bit inputs. In 16-bit mode, the partial-product generation unit zeroes the unfilled region 31 of the diagram, and the filled regions 32 and 33 form the partial products of two 16 × 16 multiplications. In 8-bit mode, the unit zeroes regions 31 and 33, leaving only the filled region 32. Partial-product generation may use Booth encoding or any other encoding method.
Fig. 4 is the structural diagram of the partial-product generation unit of the parallel divisible multiply-accumulator of the present invention. It consists of 32 sub-generation units 41, whose structure is shown in Fig. 5.
Fig. 5 shows the internal sub-unit of the partial-product generation unit of the parallel divisible multiply-accumulator. It consists of an AND gate 51 and a two-input selector 52. 53 is a multiplicand input bit. 54 is a multiplier input bit. 55 is the constant-zero input. 57 is the select signal, derived from the mode control signal. 58 is the partial product. The select signal 57 determines whether the partial product is XiYj or zero.
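The gating of Fig. 5 and its effect on the partial-product array of Fig. 3 can be illustrated with a bit-level sketch. It rests on several assumptions that the source does not state: operands are unsigned (no Booth or Baugh-Wooley sign handling), the array is modelled one bit at a time, and the select signal is taken to be active whenever bits Xi and Yj fall in different lanes. The names `pp_cell`, `select_signal`, and `partial_product_sum` are hypothetical.

```python
# Bit-level sketch of the partial-product gating (assumptions: unsigned
# operands, per-bit modelling, select derived from lane membership).

def pp_cell(x_i, y_j, sel):
    """One cell: the selector passes y_j or constant 0, then an AND with x_i."""
    gated = 0 if sel else y_j
    return x_i & gated


def select_signal(i, j, mode):
    """Hypothetical decode: force zero when bit i and bit j are in different lanes."""
    return 0 if (i // mode) == (j // mode) else 1


def partial_product_sum(x, y, mode, n=32):
    """Sum all gated partial-product bits with their binary weights."""
    total = 0
    for i in range(n):
        for j in range(n):
            bit = pp_cell((x >> i) & 1, (y >> j) & 1, select_signal(i, j, mode))
            total += bit << (i + j)
    return total
```

With the cross-lane terms zeroed, the lane products land in disjoint bit ranges of the sum (32 bits apart in 16-bit mode, 16 bits apart in 8-bit mode), which is why a single adder array can serve all three modes.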
Fig. 6 is the block diagram of the divisible accumulator of the parallel divisible multiply-accumulator. It consists of four 20-bit accumulators 61. In 8-bit, 16-bit, and 32-bit modes it yields four 20-bit accumulation results, two 40-bit accumulation results, and one 80-bit accumulation result, respectively. The adders may be carry-lookahead adders or any other type of adder.
Fig. 7 is the structural diagram of the 20-bit accumulator of the parallel divisible multiply-accumulator of the present invention. It consists of 19 full-adder units 71 and one divisible full-adder unit 72, whose structure is shown in Fig. 8.
Fig. 8 shows the divisible accumulator unit of the parallel divisible multiply-accumulator. It consists of a full adder 81 and a multiplexer 82. 83 and 84 are the inputs of the full adder. 85 and 86 are the sum bit and the carry bit output by the full adder. 87 is the carry from the preceding full adder, and 88 is the zero input. 810 is the mode select signal. This unit is placed at the first position of each 20-bit accumulator, where it controls the carry chain and thereby implements divisible accumulation.
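The carry-chain control of Figs. 6 to 8 can be sketched as a ripple-carry model in which the first cell of each 20-bit segment chooses between the incoming carry and zero. This is a sketch under assumptions: the source allows carry-lookahead or other adders, so the ripple structure is used only for clarity, and the `CUTS` table mapping each mode to the bit positions where the carry is broken is a hypothetical encoding of the mode select signal.

```python
# Ripple-carry sketch of the divisible 80-bit adder (assumptions: ripple
# structure for clarity; CUTS is a hypothetical decode of the mode signal).

def full_adder(a, b, cin):
    """Return (sum bit, carry-out) of a one-bit full adder."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def divisible_full_adder(a, b, carry_prev, cut):
    """Multiplexer feeds the full adder either the previous carry or zero."""
    cin = 0 if cut else carry_prev
    return full_adder(a, b, cin)


# Bit positions at which the carry chain is broken in each mode.
CUTS = {32: set(), 16: {40}, 8: {20, 40, 60}}


def divisible_add(a, b, mode, width=80, seg=20):
    """Add two packed 80-bit words as one, two, or four independent sums."""
    carry = 0
    result = 0
    for bit in range(width):
        a_bit = (a >> bit) & 1
        b_bit = (b >> bit) & 1
        if bit % seg == 0:   # first cell of a 20-bit segment
            s, carry = divisible_full_adder(a_bit, b_bit, carry, bit in CUTS[mode])
        else:                # ordinary full-adder cell
            s, carry = full_adder(a_bit, b_bit, carry)
        result |= s << bit
    return result
```

For example, adding 1 to a word whose low 20 bits are all ones carries into bit 20 in 32-bit and 16-bit modes, but wraps to zero within the first lane in 8-bit mode, because the carry is cut at the segment boundary.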
Claims (3)
1. A divisible multiply-accumulator with parallel processing, characterized by comprising:
A partial-product generation unit, a partial-product accumulation array, and a divisible accumulator; wherein the output of the partial-product generation unit is connected to the input of the partial-product accumulation array, and the output of the partial-product accumulation array is connected to the input of the divisible accumulator; this structure implements 32-bit, 16-bit, and 8-bit multiply-accumulate operations.
2. The divisible multiply-accumulator with parallel processing according to claim 1, characterized in that the divisible partial-product generation unit consists of 32 sub-generation units, each sub-generation unit consisting of an AND gate and a two-input selector, the output of the selector being connected to one input of the AND gate; and the divisible partial-product generation unit produces, according to the mode control signal, the partial products of one 32-bit multiplication, two 16-bit multiplications, or four 8-bit multiplications.
3. The divisible multiply-accumulator with parallel processing according to claim 1, characterized in that the divisible accumulator unit consists of four 20-bit accumulators, each 20-bit accumulator consisting of 19 full-adder units and one divisible full-adder unit, the full-adder unit consisting of a full adder and a multiplexer, the output of the multiplexer being connected to an input of the full adder; and the divisible multiply-accumulator operates, under control of the mode signal, in 32-bit, 16-bit, or 8-bit mode, completing one 80-bit accumulation, two 40-bit accumulations, or four 20-bit accumulations, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 03153649 CN1584821A (en) | 2003-08-19 | 2003-08-19 | Cutting multiplying accumulating unit with parallel processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1584821A (en) | 2005-02-23 |
Family
ID=34597789
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 03153649 Pending CN1584821A (en) | 2003-08-19 | 2003-08-19 | Cutting multiplying accumulating unit with parallel processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1584821A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101729073B (en) * | 2008-10-10 | 2012-10-24 | 国民技术股份有限公司 | High-speed Sigma-Delta modulation method and modulator |
CN111666066A (en) * | 2017-04-28 | 2020-09-15 | 英特尔公司 | Instructions and logic to perform floating point and integer operations for machine learning |
CN111666066B (en) * | 2017-04-28 | 2021-11-09 | 英特尔公司 | Method for accelerating machine learning operation, graphic processing unit and data processing system |
US11169799B2 (en) | 2017-04-28 | 2021-11-09 | Intel Corporation | Instructions and logic to perform floating-point and integer operations for machine learning |
US11720355B2 (en) | 2017-04-28 | 2023-08-08 | Intel Corporation | Instructions and logic to perform floating point and integer operations for machine learning |
US12039331B2 (en) | 2017-04-28 | 2024-07-16 | Intel Corporation | Instructions and logic to perform floating point and integer operations for machine learning |
CN108229668A (en) * | 2017-09-29 | 2018-06-29 | 北京市商汤科技开发有限公司 | Operation implementation method, device and electronic equipment based on deep learning |
CN108229668B (en) * | 2017-09-29 | 2020-07-07 | 北京市商汤科技开发有限公司 | Operation implementation method and device based on deep learning and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |