CN116450217A - Multifunctional fixed-point multiplication and multiply-accumulate operation device and method

Publication number
CN116450217A
Authority
CN
China
Prior art keywords
bit
module
Booth
multiply
multiplication
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310383363.7A
Other languages
Chinese (zh)
Inventor
张余超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Benyuan Microelectronics Co ltd
Original Assignee
Qingdao Benyuan Microelectronics Co ltd
Application filed by Qingdao Benyuan Microelectronics Co ltd
Priority to CN202310383363.7A
Publication of CN116450217A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001 Arithmetic instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/30156 Special purpose encoding of instructions, e.g. Gray coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention belongs to the field of computer data processing and discloses a multifunctional fixed-point multiplication and multiply-accumulate operation device and method. The operation device comprises an instruction decoding and data distribution module, a sign extension preprocessing module, a Booth encoding module, a Booth decoding module, a partial product digital distribution module, a Wallace tree module and an adder module, which are arranged in sequence. The device allows the Wallace compression tree logic of a basic 2n-bit multiplication to be fully multiplexed with the Wallace compression tree circuits of two multiply-accumulators, thereby greatly saving transistor resources and reducing area and power consumption. It can meet the clock cycle length requirement of a high-performance processor, with a small chip design area, low power consumption and a short timing path. The invention provides overflow protection processing, is well suited to embedded scenarios with demanding power, area and frequency requirements, and meets the overflow protection requirements that digital signal processing and other applications place on multiply-accumulate operations.

Description

Multifunctional fixed-point multiplication and multiply-accumulate operation device and method
Technical Field
The invention belongs to the field of computer data processing, and particularly relates to a multifunctional fixed-point multiplication and multiply-accumulate operation device and method.
Background
Modern processors such as CPUs and DSPs face large-scale, computation-intensive tasks such as image processing, digital signal processing and scientific computing. These tasks involve operations such as convolution and the Fourier transform, which can be reduced to basic operations such as multiplication and multiply-accumulate. Multipliers and multiply-accumulators are therefore extremely important in modern processor design and directly affect the performance, power consumption and cost of the processor.
Common multiplication operations include three types: signed by signed multiplication, unsigned by unsigned multiplication, and signed by unsigned multiplication. Common multiply-accumulate operations include four types: signed multiply-accumulate, signed multiply-accumulate-subtract, unsigned multiply-accumulate and unsigned multiply-accumulate-subtract. In arithmetic unit design it is desirable to integrate these multiplication and multiply-accumulate operations into a single unit to reduce chip area and power consumption. The main existing multiplexing scheme builds a 2n-bit multiplier from four n-bit multipliers (or n-bit multiply-accumulators) at the algorithm level, although many designs still implement the multiplier and the multiply-accumulator separately.
In modern processors, the traditional Booth-Wallace tree multiplication algorithm remains the primary implementation of high-performance multipliers and multiply-accumulators. Although the Booth-Wallace tree method is conceptually simple and computationally efficient, the implementation logic and wiring of a multiplier or multiply-accumulator are complex and consume a relatively large chip area. If the multiplier and the multiply-accumulator are implemented as two independent components, even more transistor resources are consumed, resulting in large area and high power consumption, which does not meet the low-power design requirements of modern processors.
Because multiplication and multiply-accumulate operations do not need to execute simultaneously in processors, especially scalar processors, some arithmetic unit designers implement a 2n-bit multiplier from four n-bit multiply-accumulators (or n-bit multipliers). Such a design supports both 2n-bit multiplication and n-bit multiply-accumulate operations, so the 2n-bit multiplier and the n-bit multiply-accumulator are logically multiplexed at the algorithm level. As shown in FIG. 1, however, this scheme performs Wallace tree compression twice to reduce the number of partial products, which lengthens the timing path. Where a high processor frequency is required, this path easily becomes the critical path and causes timing violations, which is unfavorable for high-performance processor design.
Therefore, given the constraint of the processor design frequency, a multiplexed multiplier and multiply-accumulator design must keep its timing path delay as small as possible to fit within the clock cycle, which makes the design of these operation components highly challenging.
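The prior-art decomposition described above can be illustrated with a small Python sketch. It covers only the unsigned case, and the function name `mul_2n_from_four_n` is hypothetical; the point is that the four n-bit sub-products still have to be summed with the correct weights, which corresponds to the second compression stage that lengthens the timing path.

```python
def mul_2n_from_four_n(a: int, b: int, n: int) -> int:
    """Unsigned 2n-bit product assembled from four n-bit by n-bit products."""
    mask = (1 << n) - 1
    a_lo, a_hi = a & mask, (a >> n) & mask
    b_lo, b_hi = b & mask, (b >> n) & mask

    # Each of these would be produced by one n-bit multiplier.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi

    # The sub-products are then combined with weights 1, 2^n and 2^(2n).
    # In hardware this is a second round of compression and addition.
    return lo_lo + ((lo_hi + hi_lo) << n) + (hi_hi << (2 * n))
```

With n = 8 the function reproduces an ordinary 16-bit unsigned multiplication.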
Disclosure of Invention
To solve the above technical problems, the invention provides a multifunctional fixed-point multiplication and multiply-accumulate operation device and method that greatly save transistor resources and reduce area and power consumption. At the same time, the device can meet the clock cycle length requirement of a high-performance processor, with a small chip design area, low power consumption and a short timing path. It also takes the design requirement of overflow protection into account by providing m-bit overflow protection processing, and is therefore well suited to embedded scenarios with demanding power, area and frequency requirements.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
a multifunctional fixed-point multiplication and multiply-accumulate operation device comprises an instruction decoding and data distribution module, a sign extension preprocessing module, a Booth encoding module, a Booth decoding module, a partial product digital distribution module, a Wallace tree module and an adder module, which are arranged in sequence;
the instruction decoding and data distribution module decodes the instruction fetched by the instruction fetch stage of the pipeline and distributes the fetched operands to different functional units for execution;
the sign extension preprocessing module performs sign extension preprocessing on the operands according to the operation type: signed numbers are sign-extended, and unsigned numbers are treated as positive numbers and zero-extended; the sign extension preprocessing module comprises an n-bit multiply-accumulate sign extension preprocessing module and a 2n-bit multiplication sign extension preprocessing module;
the Booth encoding module performs Booth encoding on the preprocessed multiplicand to obtain the Booth encoding results, namely the multiples of the multiplicand; the Booth encoding module comprises two n-bit multiply-accumulate Booth encoding modules and one 2n-bit multiplication Booth encoding module;
the Booth decoding module performs Booth decoding on the preprocessed multiplier according to a Booth coding table and combines the result with the Booth encoding results to obtain a plurality of partial products; the Booth decoding module comprises two n-bit multiply-accumulate Booth decoding modules and one 2n-bit multiplication Booth decoding module;
the input ends of the two n-bit multiply-accumulate Booth encoding modules are connected to the n-bit multiply-accumulate sign extension preprocessing module, and their output ends are connected to the two n-bit multiply-accumulate Booth decoding modules respectively; the input end of the 2n-bit multiplication Booth encoding module is connected to the 2n-bit multiplication sign extension preprocessing module, and its output end is connected to the 2n-bit multiplication Booth decoding module;
the partial product digital distribution module distributes different partial products as inputs to the Wallace tree module depending on whether the operation is a multiplication or a multiply-accumulate operation;
the Wallace tree module is a multiplexed 2n-bit multiplication and n-bit dual multiply-accumulate Wallace tree module with m-bit overflow protection processing;
the adder module comprises two m-bit carry-lookahead adder modules and one 2n-bit carry-lookahead adder module; the two operands produced by the Wallace tree module's compression of the partial products are added to give two m-bit multiply-accumulate results when the operation is a multiply-accumulate operation, or one 2n-bit multiplication result when the operation is a multiplication.
A multifunctional fixed-point multiplication and multiply-accumulate operation method uses the multifunctional fixed-point multiplication and multiply-accumulate operation device described above and includes a multiply-accumulate operation and a multiplication operation;
the multiply-accumulate operation includes the following process:
(1) According to the four multiply-accumulate types (signed multiply-accumulate, signed multiply-accumulate-subtract, unsigned multiply-accumulate and unsigned multiply-accumulate-subtract), the n-bit multiply-accumulate sign extension preprocessing module performs sign extension preprocessing on the n-bit multiplicand and n-bit multiplier delivered by the instruction decoding and data distribution module: signed numbers are sign-extended, unsigned numbers are treated as positive numbers and zero-extended, the multiplicand is extended to (n+1) bits and the multiplier is extended to (n+2) bits; for a multiply-accumulate-subtract operation, the m-bit accumulated value is also preprocessed and the carry input of the Wallace tree compression circuit is set to 1;
(2) The n-bit multiply-accumulate Booth encoding module performs Booth encoding on the preprocessed multiplicand, where an unsigned multiplicand is treated as a positive number by default; the n-bit multiply-accumulate Booth decoding module performs Booth decoding on the preprocessed multiplier according to the Booth coding table and combines the result with the Booth encoding results to obtain a plurality of partial products;
(3) The partial products are distributed by the partial product digital distribution module and input into the multiplexed 2n-bit multiplication and n-bit dual multiply-accumulate Wallace tree module with m-bit overflow protection processing; the Wallace tree module is built from full adders and half adders, which are used alternately at the different weight positions of each stage of the Wallace tree compression circuit to reduce the number of partial products; after several stages of Wallace tree compression, two compressed partial product results are obtained;
(4) Finally, the two operands produced by the Wallace tree compression circuit are input into the two m-bit carry-lookahead adder modules to obtain the m-bit multiply-accumulate result output;
the multiplication operation includes the following process:
(1) According to the three 2n-bit multiplication types (signed by signed, unsigned by unsigned, and signed by unsigned), the 2n-bit multiplication sign extension preprocessing module performs sign extension preprocessing on the 2n-bit multiplicand and 2n-bit multiplier delivered by the instruction decoding and data distribution module: signed numbers are sign-extended, unsigned numbers are treated as positive numbers and zero-extended, the multiplicand is extended to (2n+1) bits and the multiplier is extended to (2n+2) bits;
(2) The 2n-bit multiplication Booth encoding module performs Booth encoding on the preprocessed multiplicand to obtain the Booth encoding results, namely the {-2, -1, 0, 1, 2} multiples of the multiplicand, where an unsigned multiplicand is treated as a positive number by default; the 2n-bit multiplication Booth decoding module performs Booth decoding on the preprocessed multiplier according to the Booth coding table and combines the result with the Booth encoding results to obtain (n+1) partial products;
(3) The (n+1) partial products are distributed by the partial product digital distribution module and input into the multiplexed 2n-bit multiplication and n-bit dual multiply-accumulate Wallace tree module with m-bit overflow protection processing; the Wallace tree module is built from full adders and half adders, which are used alternately at the different weight positions of each stage of the Wallace tree compression circuit to reduce the number of partial products; after several stages of Wallace tree compression, two compressed partial product results are obtained;
(4) Finally, the two operands produced by the Wallace tree compression circuit are input into the 2n-bit carry-lookahead adder module to obtain and output the 2n-bit multiplication result.
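As a rough behavioural illustration of the multiplication steps above, the following Python sketch extends the multiplier to (2n+2) bits, recodes it into radix-4 Booth digits and sums the resulting partial products. It models arithmetic only, not the circuit: the function name `booth_multiply` and its parameters are hypothetical, and the Wallace tree and final adder are replaced by a plain `sum`. It does show why (2n+2) extended multiplier bits give exactly (n+1) partial products.

```python
def booth_multiply(a: int, b: int, n: int, a_signed: bool, b_signed: bool):
    """Multiply two 2n-bit patterns; returns (product, partial_product_count)."""
    width = 2 * n

    def value(x: int, signed: bool) -> int:
        # Interpret a 2n-bit pattern as signed or unsigned.
        x &= (1 << width) - 1
        return x - (1 << width) if signed and (x >> (width - 1)) else x

    multiplicand = value(a, a_signed)  # fits in (2n+1)-bit two's complement
    multiplier = value(b, b_signed)    # fits in (2n+2)-bit two's complement
    ext = multiplier & ((1 << (width + 2)) - 1)  # (2n+2)-bit extended pattern

    def bit(i: int) -> int:
        return (ext >> i) & 1 if i >= 0 else 0   # implicit 0 below the LSB

    partials = []
    for k in range(n + 1):             # (2n+2)/2 = n+1 radix-4 digits
        i = 2 * k
        digit = -2 * bit(i + 1) + bit(i) + bit(i - 1)  # in {-2,-1,0,1,2}
        partials.append((digit * multiplicand) << i)
    return sum(partials), len(partials)
```

For n = 8 this yields 9 partial products for each of the three multiplication types, matching the 16-bit case shown in the drawings.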
Through the above technical scheme, the multifunctional fixed-point multiplication and multiply-accumulate operation device and method provided by the invention have the following beneficial effects:
1. The device supports three multiplication types on 2n-bit fixed-point data (signed by signed, unsigned by unsigned, and signed by unsigned multiplication) and four multiply-accumulate types on n-bit fixed-point data with m-bit overflow protection processing (signed multiply-accumulate, signed multiply-accumulate-subtract, unsigned multiply-accumulate and unsigned multiply-accumulate-subtract).
2. The device allows the Wallace compression tree logic of a basic 2n-bit multiplication to be fully multiplexed with the Wallace compression circuits of two multiply-accumulators, which greatly saves transistor resources and reduces area and power consumption; only basic data gating and carry-cut logic is added, which has little influence on the timing path length.
3. The timing path delay of the device is only about that of a 2n-bit multiplication, so the device can meet the clock cycle length requirement of a high-performance processor, with a small chip design area, low power consumption and a short timing path.
4. The invention takes the design requirement of overflow protection into account and provides m-bit overflow protection processing; it is well suited to embedded scenarios with demanding power, area and frequency requirements, and meets the overflow protection requirements that digital signal processing and other applications place on multiply-accumulate operations.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic diagram of a multi-functional fixed-point multiply and multiply-accumulate computing device disclosed in the prior art.
Fig. 2 is a schematic diagram of a multi-functional fixed-point multiply and multiply-accumulate computing device according to an embodiment of the present invention.
FIG. 3 shows partial product Wallace tree compression dot diagrams of the prior art; (a) is the partial product Wallace tree compression dot diagram of multiply-accumulator one, (b) is that of multiply-accumulator two, and (c) is the Wallace tree compression dot diagram of the 9 partial products of a 16-bit multiplication.
FIG. 4 shows the partial product Wallace tree compression dot diagrams of the multiplexed 16-bit multiplier and two 8-bit multiply-accumulators employed in an embodiment of the present invention; (a) is the 2n-bit multiplication and n-bit multiply-accumulate Wallace tree compression dot diagram, (b) is the partial product digit distribution diagram of the 2n-bit multiplication and n-bit multiply-accumulate Wallace tree compression dot diagram, (c) is the first-stage Wallace tree compression dot diagram, (d) is the second-stage Wallace tree compression dot diagram, (e) is the third-stage Wallace tree compression dot diagram, and (f) is the fourth-stage Wallace tree compression dot diagram.
FIG. 5 shows the partial product Wallace tree compression dot diagrams shared by 8-bit signed multiply-accumulator one, with 12-bit overflow protection processing, and the 16-bit multiplier in a specific embodiment of the present invention; (a) is the first-stage Wallace tree compression dot diagram, (b) is the second-stage Wallace tree compression dot diagram, (c) is the third-stage Wallace tree compression dot diagram, and (d) is the fourth-stage Wallace tree compression dot diagram.
FIG. 6 shows the partial product Wallace tree compression dot diagrams shared by 8-bit signed multiply-accumulator two, with 12-bit overflow protection processing, and the 16-bit multiplier in a specific embodiment of the present invention; (a) is the first-stage Wallace tree compression dot diagram, (b) is the second-stage Wallace tree compression dot diagram, (c) is the third-stage Wallace tree compression dot diagram, and (d) is the fourth-stage Wallace tree compression dot diagram.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a multifunctional fixed-point multiplication and multiply-accumulate operation device which, as shown in FIG. 2, comprises an instruction decoding and data distribution module, a sign extension preprocessing module, a Booth encoding module, a Booth decoding module, a partial product digital distribution module, a Wallace tree module and an adder module, which are arranged in sequence.
1. Instruction decoding and data distribution module
The instruction decoding and data distribution module is responsible for decoding the instruction fetched by the instruction fetching stage of the pipeline and distributing the fetched operand to different functional units for execution.
2. Sign extension preprocessing module
The sign extension preprocessing module performs sign extension preprocessing on the operands according to the operation type: signed numbers are sign-extended, and unsigned numbers are treated as positive numbers and zero-extended. The sign extension preprocessing module comprises an n-bit multiply-accumulate sign extension preprocessing module and a 2n-bit multiplication sign extension preprocessing module.
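A minimal Python sketch of this preprocessing is given below. The helper names `extend_operand` and `as_signed` are hypothetical; the output widths used in the example follow the method steps of this description, where an n-bit multiplicand is extended to (n+1) bits and an n-bit multiplier to (n+2) bits.

```python
def extend_operand(value: int, bits: int, out_bits: int, signed: bool) -> int:
    """Return the out_bits-wide bit pattern of a bits-wide operand."""
    low_mask = (1 << bits) - 1
    pattern = value & low_mask
    if signed and (pattern >> (bits - 1)) & 1:
        # Signed and negative: replicate the sign bit into the new upper bits.
        pattern |= ((1 << out_bits) - 1) ^ low_mask
    # Unsigned operands keep zeros in the upper bits (zero extension),
    # so they read as positive numbers in the wider two's-complement format.
    return pattern


def as_signed(pattern: int, bits: int) -> int:
    """Interpret a bits-wide pattern as a two's-complement integer."""
    return pattern - (1 << bits) if (pattern >> (bits - 1)) & 1 else pattern
```

For example, the 8-bit pattern 0x80 becomes 0x180 (value -128) when sign-extended to 9 bits and stays 0x080 (value 128) when zero-extended, so a single signed datapath can handle both operand types afterwards.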
3. Booth encoding module
The Booth encoding module performs Booth encoding on the preprocessed multiplicand to obtain the Booth encoding results, namely the multiples of the multiplicand. The Booth encoding module comprises two n-bit multiply-accumulate Booth encoding modules and one 2n-bit multiplication Booth encoding module.
4. Booth decoding module
The Booth decoding module performs Booth decoding on the preprocessed multiplier according to the Booth coding table shown in Table 1 and combines the result with the Booth encoding results to obtain a plurality of partial products. The Booth decoding module comprises two n-bit multiply-accumulate Booth decoding modules and one 2n-bit multiplication Booth decoding module.
Table 1 Booth coding table
The input ends of the two n-bit multiply-accumulate Booth encoding modules are connected to the n-bit multiply-accumulate sign extension preprocessing module, and their output ends are connected to the two n-bit multiply-accumulate Booth decoding modules respectively; the input end of the 2n-bit multiplication Booth encoding module is connected to the 2n-bit multiplication sign extension preprocessing module, and its output end is connected to the 2n-bit multiplication Booth decoding module.
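The contents of Table 1 are not reproduced in this text. As a point of reference, the following Python sketch uses the standard radix-4 Booth recoding table, which is consistent with the {-2, -1, 0, 1, 2} multiples named in this description but is an assumption rather than a copy of Table 1. The names `BOOTH_TABLE`, `booth_digits` and `partial_products` are hypothetical.

```python
# Standard radix-4 Booth recoding: (b[i+1], b[i], b[i-1]) -> multiple.
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a bits-wide two's-complement pattern (bits even), LSB digit first."""
    assert bits % 2 == 0
    padded = (multiplier & ((1 << bits) - 1)) << 1  # append b[-1] = 0
    digits = []
    for i in range(0, bits, 2):
        triple = (padded >> i) & 0b111               # b[i+1], b[i], b[i-1]
        key = ((triple >> 2) & 1, (triple >> 1) & 1, triple & 1)
        digits.append(BOOTH_TABLE[key])
    return digits


def partial_products(multiplicand: int, multiplier: int, bits: int) -> list[int]:
    """Select a multiple of the multiplicand per digit and weight it by 4^k."""
    return [(d * multiplicand) << (2 * k)
            for k, d in enumerate(booth_digits(multiplier, bits))]
```

In this model the encoding side corresponds to preparing the five multiples of the multiplicand, and the decoding side corresponds to choosing one of them per overlapping 3-bit group of the multiplier. An 8-bit signed multiplier produces 4 digits and therefore 4 partial products.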
5. Partial product digital distribution module
The partial product digital distribution module distributes different partial product inputs to the Wallace tree module according to multiplication operation or multiply-accumulate operation.
6. Wallace tree module
The Wallace tree module is a multiplexed 2n-bit multiplication and n-bit dual multiply-accumulate Wallace tree module with m-bit overflow protection processing.
FIGS. 3(a) and 3(b) each show the Wallace tree compression dot diagram of a signed multiply-accumulate operation with 4 partial products and a 12-bit accumulated value, and FIG. 3(c) shows the Wallace tree compression dot diagram of the 9 partial products of a 16-bit multiplication. The multiplexed 2n-bit multiplication and n-bit dual multiply-accumulate Wallace tree module with m-bit overflow protection processing implements the Wallace tree logic of the two n-bit signed multiply-accumulators and the 2n-bit multiplier shown in FIG. 3 within a single 2n-bit multiplication and n-bit multiply-accumulate Wallace tree, as shown in FIG. 4(b); the compression dot diagram illustrates this multiplexing relationship.
The multiplexed 2n-bit multiplication and n-bit dual multiply-accumulate Wallace tree module with m-bit overflow protection processing is a compression circuit that can process the partial products of two n-bit multiply-accumulate operations with m-bit overflow protection processing and of one 2n-bit multiplication. The compression of these partial products can be represented by the Wallace tree compression dot diagrams shown in FIGS. 4, 5 and 6.
The Wallace tree compression dot diagram in FIG. 4(a) shows the distribution of the 2n-bit multiplication partial products. Each row of dots represents one partial product, each column represents one bit weight, and the weights increase from the rightmost column to the leftmost column. The hollow circles represent compression points that must be added for the multiplexed design.
As shown in FIG. 4(b), the Wallace tree compression dots of two n-bit multiply-accumulate operations can be laid out on the Wallace tree compression dot diagram of the 2n-bit multiplication. This indicates that the partial product compression of the two n-bit multiply-accumulate operations can be handled by the Wallace tree compression circuit that processes the 2n-bit multiplication partial products. The left and right sides of the vertical dotted line show the partial product allocations of the two n-bit multiply-accumulate operations, that is, multiply-accumulator one and multiply-accumulator two shown in FIGS. 3(a) and 3(b), respectively.
In FIG. 4, (c), (d), (e) and (f) show the process of compressing the partial products with a compression circuit composed of full adders and half adders.
FIGS. 5 and 6 show the compression process when the Wallace tree processes the partial products of a multiply-accumulate operation. The partial product dots on the right side of the dotted line in FIG. 5 correspond to the compression dots on the right side of the vertical dotted line in FIG. 4, and the partial product dots on the right side of the dotted line in FIG. 6 correspond to the compression dots on the left side of the vertical dotted line in FIG. 4.
To add overflow protection processing, the partial product dots shown in the dashed box to the left of the dotted line in FIGS. 5 and 6 are added. During compression, these overflow protection dots are compressed synchronously with the dots to the right of the dotted line, thereby realizing the overflow protection function.
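The compression principle can be sketched in Python as row-wise 3:2 carry-save reduction. This is a simplification under stated assumptions: the patent's tree places full adders and half adders per column according to the dot diagrams, whereas this sketch applies a full adder to every bit position of three rows at once, and it does not model the multiplexing, carry-cut or overflow protection logic. The names `full_adder_rows` and `compress` are hypothetical.

```python
def full_adder_rows(x: int, y: int, z: int):
    """Bitwise 3:2 compression: x + y + z == sum_row + carry_row."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1  # carries move up one weight
    return sum_row, carry_row


def compress(rows: list[int], width: int):
    """Reduce rows to two operands; returns (final_rows, stage_count)."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    stages = 0
    while len(rows) > 2:
        next_rows = []
        # Each group of three rows becomes one sum row and one carry row.
        for i in range(0, len(rows) - 2, 3):
            s, c = full_adder_rows(*rows[i:i + 3])
            next_rows += [s, c & mask]
        # Rows that do not fill a group pass through to the next stage.
        next_rows += rows[len(rows) - len(rows) % 3:]
        rows = next_rows
        stages += 1
    return rows, stages
```

With 9 input rows the reduction goes 9, 6, 4, 3, 2, that is four stages, which agrees with the four compression stages drawn for the 16-bit multiplication in FIG. 4. The two remaining rows are the operands handed to the final adder.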
7. Adder module
The adder module comprises two m-bit carry-lookahead adder modules and one 2n-bit carry-lookahead adder module. The two operands produced by the Wallace tree module's compression of the partial products are added to give two m-bit multiply-accumulate results when the operation is a multiply-accumulate operation, or one 2n-bit multiplication result when the operation is a multiplication.
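The description does not detail the internal structure of the carry-lookahead adders. As an illustration only, the sketch below computes all carries from generate and propagate signals using a parallel-prefix (Kogge-Stone style) formulation, which is one common way of realizing carry lookahead; the function name `cla_add` and its parameters are hypothetical.

```python
def cla_add(x: int, y: int, width: int, carry_in: int = 0):
    """Add two width-bit patterns; returns (sum mod 2^width, carry_out)."""
    mask = (1 << width) - 1
    x &= mask
    y &= mask
    generate = x & y     # position creates a carry by itself
    propagate = x ^ y    # position passes an incoming carry on

    # Shift by one so that bit i of G ends up holding the carry INTO bit i;
    # the external carry-in acts as a generate signal below bit 0.
    G = (generate << 1) | (carry_in & 1)
    P = propagate << 1
    distance = 1
    while distance <= width:
        # Combine each position with the group 'distance' places below it.
        G, P = G | (P & (G << distance)), P & (P << distance)
        distance <<= 1

    total = (propagate ^ G) & mask
    carry_out = (G >> width) & 1
    return total, carry_out
```

The carries are resolved in about log2(width) combining steps instead of rippling bit by bit, which is the property that keeps the final addition short on the timing path.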
A multi-functional fixed-point multiplication and multiplication accumulation operation method adopts the multi-functional fixed-point multiplication and multiplication accumulation operation device, and comprises multiplication accumulation operation and multiplication operation.
For multiply-accumulate operations, the following process is included:
(1) According to the four multiply-accumulate types (signed multiply-accumulate, signed multiply-accumulate-subtract, unsigned multiply-accumulate and unsigned multiply-accumulate-subtract), the n-bit multiply-accumulate sign extension preprocessing module performs sign extension preprocessing on the n-bit multiplicand and the n-bit multiplier delivered by the instruction decoding and data distribution module: signed numbers are sign-extended and unsigned numbers are treated as positive numbers and zero-extended, so that the multiplicand is extended to (n+1) bits and the multiplier to (n+2) bits. For a multiply-accumulate-subtract operation, the m-bit accumulation number is also preprocessed and the carry input of the Wallace tree compression circuit is set to 1;
(2) The n-bit multiply-accumulate Booth encoding module performs Booth encoding on the preprocessed multiplicand, wherein an unsigned multiplicand is treated as a positive number by default; the n-bit multiply-accumulate Booth decoding module performs Booth decoding on the preprocessed multiplier according to the Booth encoding table and combines the Booth encoding result to obtain a plurality of partial products;
(3) The partial products are distributed by the partial product digital distribution module and input into the multiplexed 2n-bit multiplication and n-bit dual multiply-accumulate Wallace tree module with m-bit overflow protection processing. The Wallace tree module is built from full adders and half adders; full adders and half adders are used alternately at the different weight positions of each stage of the Wallace tree compression circuit to reduce the number of partial products, and two partial product compression results are obtained after several stages of Wallace tree compression;
(4) Finally, the two operands obtained by compression in the Wallace tree compression circuit are input into the two m-bit carry-lookahead adder modules to obtain the m-bit multiply-accumulate result output.
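The preprocessing and Booth steps above can be sketched behaviorally as follows. This is a minimal model under stated assumptions, not the patented circuit: the names `extend`, `booth_digits` and `booth_mac` are hypothetical, the result is reduced to m bits by plain wraparound (the text does not say how out-of-range values are represented), and the subtract variants are omitted because the text does not spell out how the accumulation number is preprocessed.

```python
def extend(value: int, bits: int, signed: bool) -> int:
    """Read the low `bits` bits as a signed or unsigned number.

    Returning a Python int models sign extension (signed) or zero
    extension (unsigned) to any wider width.
    """
    value &= (1 << bits) - 1
    if signed and value >> (bits - 1):
        value -= 1 << bits
    return value


def booth_digits(multiplier: int, n: int) -> list:
    """Radix-4 Booth digits in {-2,-1,0,1,2} of a multiplier extended to n+2 bits."""
    width = n + 2
    pattern = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    padded = pattern << 1                      # implicit 0 below bit 0
    digits = []
    for i in range(0, width, 2):
        triplet = (padded >> i) & 0b111        # multiplier bits i+1, i, i-1
        low, mid, high = triplet & 1, (triplet >> 1) & 1, triplet >> 2
        digits.append(low + mid - 2 * high)
    return digits


def booth_mac(a: int, b: int, acc: int, n: int = 8, m: int = 12,
              signed: bool = True) -> int:
    """Return (acc + a*b) as an m-bit pattern (wraparound is an assumption)."""
    multiplicand = extend(a, n, signed)
    multiplier = extend(b, n, signed)
    # Each digit selects a multiple of the multiplicand, shifted by 2 bits per digit.
    partial_products = [(d * multiplicand) << (2 * k)
                        for k, d in enumerate(booth_digits(multiplier, n))]
    return (sum(partial_products) + acc) & ((1 << m) - 1)


if __name__ == "__main__":
    print(booth_mac(-3, 5, 100))                  # signed: 100 + (-3 * 5)
    print(booth_mac(200, 3, 10, signed=False))    # unsigned: 10 + 200 * 3
```

With n = 8 the extended multiplier has 10 bits, so this sketch produces five Booth digits. For a signed multiplier the top digit is always zero, because the three bits it examines are all copies of the sign bit, which leaves four non-trivial partial products.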
In the specific embodiment shown in FIG. 4, for a signed multiply-accumulate operation, 4 partial products are obtained by combining the Booth encoding result of the 8-bit multiply-accumulate Booth encoding module. The 4 partial products and the 12-bit accumulation number are distributed by the partial product digital distribution module and input into the multiplexed 16-bit multiplication and 8-bit dual multiply-accumulate Wallace tree module with 12-bit overflow protection processing. This module is built mainly from full adders and half adders; two partial product compression results are obtained after several stages of Wallace tree compression, and finally the two compressed operands are input into the 12-bit carry-lookahead adder module to obtain the final 12-bit multiply-accumulate result. The invention thus fuses one 16-bit multiplier and two 8-bit multiply-accumulators with 12-bit overflow protection processing, and the hardware processing flows of the two 8-bit multiply-accumulators are identical.
FIG. 4 shows the Wallace tree compression dot diagram for 16-bit multiplication and 8-bit multiply-accumulate with 12-bit overflow protection processing according to an embodiment of the present invention; it mainly comprises the partial product digital distribution and the four-level Wallace tree compression dot diagram.
In FIG. 4, (a) shows the basic structure of the Wallace tree compression dot diagram for 16-bit multiplication. E denotes the sign bit of the current partial product obtained after Booth encoding and Booth decoding, and its companion symbol denotes the inverse of that sign bit. S is a bit produced by Booth encoding and Booth decoding. Zeros are padded at bit 0 and at bits 16 and 17, and the remaining dots are determined by the Booth encoding result of the multiplicand and the Booth decoding result of the multiplier.
In FIG. 4, (b) shows how the two 8-bit signed multiply-accumulate operations reuse the Wallace tree compression dot diagram of the 16-bit multiplication by rearranging the partial product digital distribution. The partial product bits of multiply-accumulate one are assigned bit by bit to the solid dots to the right of the dashed line, and the lower 8 bits of the preprocessed 12-bit accumulation number 1 are assigned to the dashed dots to the right of the dashed line. The partial product bits of multiply-accumulate two are assigned bit by bit to the solid dots to the left of the dashed line, and the lower 8 bits of the preprocessed 12-bit accumulation number 2 are assigned to the dashed dots to the left of the dashed line. As the figure shows, 8-bit multiply-accumulator one and 8-bit multiply-accumulator two can fully reuse the Wallace tree compression dot diagram of the 16-bit multiplier, with only a few compression points added.
In addition, in the Wallace tree compression circuit as actually implemented, the low-weight carry of one stage of the 16-bit multiplier's Wallace tree compression circuit is output to the input of the compression circuit of the next stage. When the multiplier and the two multiply-accumulators share the tree, this couples the low-weight circuit to the high-weight circuit, and AND gates can be used to control that coupling. When the dual multiply-accumulate function is performed, one input of each AND gate is set to zero to isolate the carry between the high-weight and low-weight circuits, so that the Wallace tree compression circuits of the two multiply-accumulators are independent and do not affect each other. The two 8-bit multiply-accumulators can therefore each perform a multiply-accumulate operation on the multiplier's Wallace tree compression circuit without interference, which guarantees the correctness of both operations while having little effect on the length of the timing path.
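The carry isolation described above can be illustrated with a deliberately simplified model. The sketch below uses a plain ripple-carry adder instead of a Wallace tree, so it only demonstrates the gating principle; the function name `lane_add` and the `fused` flag are hypothetical and are not named in the patent.

```python
def lane_add(a: int, b: int, fused: bool, lane_bits: int = 8) -> int:
    """Add two 2*lane_bits-bit words with an AND gate on the mid-point carry.

    fused=True  : the boundary carry passes, giving one wide addition.
    fused=False : the boundary carry is forced to 0, giving two independent
                  lane_bits-wide additions packed in one word.
    """
    total_bits = 2 * lane_bits
    enable = 1 if fused else 0
    carry = 0
    result = 0
    for i in range(total_bits):
        if i == lane_bits:
            carry &= enable            # AND gate isolating high from low weights
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i             # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))        # full-adder carry bit
    return result


if __name__ == "__main__":
    print(hex(lane_add(0x12FF, 0x3401, fused=True)))    # one 16-bit addition
    print(hex(lane_add(0x12FF, 0x3401, fused=False)))   # two 8-bit additions
```

In split mode the low lane overflows without disturbing the high lane, which is the behavior the AND gates are described as providing between the two multiply-accumulators.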
FIGS. 5 and 6 show the Wallace tree compression dot diagrams with 12-bit overflow protection processing for 8-bit multiply-accumulator one and 8-bit multiply-accumulator two in the specific embodiment. The compression points to the right of the vertical dashed lines in FIGS. 5 and 6 come from the right and left sides, respectively, of the vertical dashed line in (c) of FIG. 4. As shown in FIGS. 5 and 6, the most significant carries of the portions of the 16-bit multiplication Wallace tree compression circuit reused by 8-bit multiply-accumulator one and 8-bit multiply-accumulator two are connected to the overflow protection processing units, which handle the compression points in the dashed boxes. In the actual circuit, each overflow protection processing unit contains only the full adder and half adder compression circuits needed to process the partial product bits in the dashed box. During operation, the overflow protection processing units of the two 8-bit multiply-accumulators are connected to the shared Wallace tree compression circuit to compress the partial product operands, so that 4 partial products and one accumulation number are compressed into two 12-bit operands. The two 12-bit carry-lookahead adders then add these operands to produce the final multiply-accumulate results of multiply-accumulator one and multiply-accumulator two with 12-bit overflow protection processing, which meets the overflow protection requirements placed on multiply-accumulate operations in digital signal processing and similar application scenarios.
The multiplication operation comprises the following process:
(1) According to the three 2n-bit multiplication types (signed by signed, unsigned by unsigned, and signed by unsigned), the 2n-bit multiplication sign extension preprocessing module performs sign extension preprocessing on the 2n-bit multiplicand and the 2n-bit multiplier delivered by the instruction decoding and data distribution module: signed numbers are sign-extended and unsigned numbers are treated as positive numbers and zero-extended, so that the multiplicand is extended to (2n+1) bits and the multiplier to (2n+2) bits;
(2) The 2n-bit multiplication Booth encoding module performs Booth encoding on the preprocessed multiplicand to obtain the {-2, -1, 0, 1, 2} multiples of the multiplicand, wherein an unsigned multiplicand is treated as a positive number by default; the 2n-bit multiplication Booth decoding module performs Booth decoding on the preprocessed multiplier according to the Booth encoding table and combines the Booth encoding result to obtain (n+1) partial products;
(3) The (n+1) partial products are distributed by the partial product digital distribution module and input into the multiplexed 2n-bit multiplication and n-bit dual multiply-accumulate Wallace tree module with m-bit overflow protection processing. The Wallace tree module is built from full adders and half adders; full adders and half adders are used alternately at the different weight positions of each stage of the Wallace tree compression circuit to reduce the number of partial products, and two partial product compression results are obtained after several stages of Wallace tree compression;
(4) Finally, the two operands obtained by compression in the Wallace tree compression circuit are input into the 2n-bit carry-lookahead adder module to obtain and output the 2n-bit multiplication result.
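The compression step in (3) and the final addition in (4) can be sketched at word level as follows. This is an assumption-laden model rather than the patented bit-level tree: every row is treated as a whole word and reduced with 3:2 compressors (bitwise full adders), half adders are not modeled, and all names (`to_int`, `booth_rows`, `compress_3_to_2`, `wallace_reduce`, `multiply`, `out_bits`) are hypothetical. The default output keeps the low 2n bits to match the 2n-bit adder named in the text; whether the device also exposes upper product bits is not stated.

```python
def to_int(value: int, bits: int, signed: bool) -> int:
    """Read the low `bits` bits as signed or unsigned (models sign/zero extension)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if signed and value >> (bits - 1) else value


def booth_rows(multiplicand: int, multiplier: int, bits: int) -> list:
    """Radix-4 Booth partial products for a multiplier extended to bits + 2 bits."""
    pattern = (multiplier & ((1 << (bits + 2)) - 1)) << 1   # append implicit 0
    rows = []
    for k in range((bits + 2) // 2):
        t = (pattern >> (2 * k)) & 0b111
        digit = (t & 1) + ((t >> 1) & 1) - 2 * (t >> 2)      # in {-2,...,2}
        rows.append((digit * multiplicand) << (2 * k))
    return rows


def compress_3_to_2(x: int, y: int, z: int, mask: int):
    """One full adder per bit position: three words in, sum and carry words out."""
    sum_word = x ^ y ^ z
    carry_word = (((x & y) | (x & z) | (y & z)) << 1) & mask
    return sum_word, carry_word


def wallace_reduce(operands: list, width: int):
    """Reduce any number of rows to two rows whose sum is unchanged mod 2**width."""
    mask = (1 << width) - 1
    rows = [op & mask for op in operands]    # negative rows become two's complement
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        next_rows = []
        for i in range(0, full, 3):
            next_rows.extend(compress_3_to_2(rows[i], rows[i + 1], rows[i + 2], mask))
        next_rows.extend(rows[full:])        # leftover rows pass to the next stage
        rows = next_rows
    rows += [0] * (2 - len(rows))
    return rows[0], rows[1]


def multiply(a, b, bits=16, a_signed=True, b_signed=True, out_bits=None):
    out_bits = bits if out_bits is None else out_bits
    rows = booth_rows(to_int(a, bits, a_signed), to_int(b, bits, b_signed), bits)
    s, c = wallace_reduce(rows, out_bits)
    return (s + c) & ((1 << out_bits) - 1)   # final carry-propagate addition


if __name__ == "__main__":
    print(multiply(300, 200))                # low 16 bits of the product
    print(multiply(-7, 9, out_bits=32))      # full-width two's-complement pattern
```

For 16-bit operands `booth_rows` returns nine rows, that is (n+1) with n = 8, and the grouping used here reduces them in four rounds (9, 6, 4, 3, 2 rows).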
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (2)

1. A multifunctional fixed-point multiplication and multiply-accumulate operation device, characterized by comprising an instruction decoding and data distribution module, a sign extension preprocessing module, a Booth encoding module, a Booth decoding module, a partial product digital distribution module, a Wallace tree module and an adder module arranged in sequence;
the instruction decoding and data distribution module is responsible for decoding the instruction fetched by the pipeline instruction fetch stage and distributing the fetched operands to different functional units for execution;
the sign extension preprocessing module performs sign extension preprocessing on the operands according to the operation type, sign-extending signed numbers and zero-extending unsigned numbers as positive numbers; the sign extension preprocessing module comprises an n-bit multiply-accumulate sign extension preprocessing module and a 2n-bit multiplication sign extension preprocessing module;
the Booth encoding module performs Booth encoding on the preprocessed multiplicand to obtain Booth encoding results that are multiples of the multiplicand, and comprises two n-bit multiply-accumulate Booth encoding modules and one 2n-bit multiplication Booth encoding module;
the Booth decoding module performs Booth decoding on the preprocessed multiplier according to the Booth encoding table and combines the Booth encoding result to obtain a plurality of partial products, and comprises two n-bit multiply-accumulate Booth decoding modules and one 2n-bit multiplication Booth decoding module;
the input ends of the two n-bit multiply-accumulate Booth encoding modules are connected to the n-bit multiply-accumulate sign extension preprocessing module, and their output ends are connected to the two n-bit multiply-accumulate Booth decoding modules respectively; the input end of the 2n-bit multiplication Booth encoding module is connected to the 2n-bit multiplication sign extension preprocessing module, and its output end is connected to the 2n-bit multiplication Booth decoding module;
the partial product digital distribution module distributes different partial product inputs to the Wallace tree module according to whether the operation is a multiplication operation or a multiply-accumulate operation;
the Wallace tree module is a multiplexed 2n-bit multiplication and n-bit dual multiply-accumulate Wallace tree module with m-bit overflow protection processing;
the adder module comprises two m-bit carry-lookahead adder modules and one 2n-bit carry-lookahead adder module, wherein, when the operation types are multiply-accumulate operation and multiplication operation, the two operands obtained from partial product compression in the Wallace tree module are added to obtain two m-bit multiply-accumulate result outputs and one 2n-bit multiplication result output respectively.
2. A multifunctional fixed-point multiplication and multiply-accumulate operation method using the multifunctional fixed-point multiplication and multiply-accumulate operation device according to claim 1, characterized by comprising a multiply-accumulate operation and a multiplication operation;
the multiply-accumulate operation comprises the following process:
(1) According to the four multiply-accumulate types (signed multiply-accumulate, signed multiply-accumulate-subtract, unsigned multiply-accumulate and unsigned multiply-accumulate-subtract), the n-bit multiply-accumulate sign extension preprocessing module performs sign extension preprocessing on the n-bit multiplicand and the n-bit multiplier delivered by the instruction decoding and data distribution module: signed numbers are sign-extended and unsigned numbers are treated as positive numbers and zero-extended, so that the multiplicand is extended to (n+1) bits and the multiplier to (n+2) bits; for a multiply-accumulate-subtract operation, the m-bit accumulation number is also preprocessed and the carry input of the Wallace tree compression circuit is set to 1;
(2) The n-bit multiply-accumulate Booth encoding module performs Booth encoding on the preprocessed multiplicand, wherein an unsigned multiplicand is treated as a positive number by default; the n-bit multiply-accumulate Booth decoding module performs Booth decoding on the preprocessed multiplier according to the Booth encoding table and combines the Booth encoding result to obtain a plurality of partial products;
(3) The partial products are distributed by the partial product digital distribution module and input into the multiplexed 2n-bit multiplication and n-bit dual multiply-accumulate Wallace tree module with m-bit overflow protection processing; the Wallace tree module is built from full adders and half adders, full adders and half adders are used alternately at the different weight positions of each stage of the Wallace tree compression circuit to reduce the number of partial products, and two partial product compression results are obtained after several stages of Wallace tree compression;
(4) Finally, the two operands obtained by compression in the Wallace tree compression circuit are input into the two m-bit carry-lookahead adder modules to obtain the m-bit multiply-accumulate result output;
the multiplication operation comprises the following process:
(1) According to the three 2n-bit multiplication types (signed by signed, unsigned by unsigned, and signed by unsigned), the 2n-bit multiplication sign extension preprocessing module performs sign extension preprocessing on the 2n-bit multiplicand and the 2n-bit multiplier delivered by the instruction decoding and data distribution module: signed numbers are sign-extended and unsigned numbers are treated as positive numbers and zero-extended, so that the multiplicand is extended to (2n+1) bits and the multiplier to (2n+2) bits;
(2) The 2n-bit multiplication Booth encoding module performs Booth encoding on the preprocessed multiplicand to obtain the {-2, -1, 0, 1, 2} multiples of the multiplicand, wherein an unsigned multiplicand is treated as a positive number by default; the 2n-bit multiplication Booth decoding module performs Booth decoding on the preprocessed multiplier according to the Booth encoding table and combines the Booth encoding result to obtain (n+1) partial products;
(3) The (n+1) partial products are distributed by the partial product digital distribution module and input into the multiplexed 2n-bit multiplication and n-bit dual multiply-accumulate Wallace tree module with m-bit overflow protection processing; the Wallace tree module is built from full adders and half adders, full adders and half adders are used alternately at the different weight positions of each stage of the Wallace tree compression circuit to reduce the number of partial products, and two partial product compression results are obtained after several stages of Wallace tree compression;
(4) Finally, the two operands obtained by compression in the Wallace tree compression circuit are input into the 2n-bit carry-lookahead adder module to obtain and output the 2n-bit multiplication result.
CN202310383363.7A 2023-04-11 2023-04-11 Multifunctional fixed-point multiplication and multiply-accumulate operation device and method Pending CN116450217A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310383363.7A CN116450217A (en) 2023-04-11 2023-04-11 Multifunctional fixed-point multiplication and multiply-accumulate operation device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310383363.7A CN116450217A (en) 2023-04-11 2023-04-11 Multifunctional fixed-point multiplication and multiply-accumulate operation device and method

Publications (1)

Publication Number Publication Date
CN116450217A true CN116450217A (en) 2023-07-18

Family

ID=87125068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310383363.7A Pending CN116450217A (en) 2023-04-11 2023-04-11 Multifunctional fixed-point multiplication and multiply-accumulate operation device and method

Country Status (1)

Country Link
CN (1) CN116450217A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116774966A (en) * 2023-08-22 2023-09-19 深圳比特微电子科技有限公司 Multiplier, multiply-accumulate circuit, operation circuit, processor and computing device
CN116931873A (en) * 2023-09-11 2023-10-24 安徽大学 Two-byte multiplication circuit, and multiplication circuit and chip with arbitrary bit width of 2-power

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116774966A (en) * 2023-08-22 2023-09-19 深圳比特微电子科技有限公司 Multiplier, multiply-accumulate circuit, operation circuit, processor and computing device
CN116774966B (en) * 2023-08-22 2023-12-08 深圳比特微电子科技有限公司 Multiplier, multiply-accumulate circuit, operation circuit, processor and computing device
CN116931873A (en) * 2023-09-11 2023-10-24 安徽大学 Two-byte multiplication circuit, and multiplication circuit and chip with arbitrary bit width of 2-power
CN116931873B (en) * 2023-09-11 2023-11-28 安徽大学 Two-byte multiplication circuit, and multiplication circuit and chip with arbitrary bit width of 2-power

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination