CN111221499A

CN111221499A - Approximate multiplier based on approximate 6-2 and 4-2 compressors and calculation method

Info

Publication number: CN111221499A
Application number: CN201911130135.9A
Authority: CN
Inventors: 梁华国; 方宝; 盛勇侠; 鲁迎春; 黄正峰; 易茂祥; 蒋翠云
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2019-11-18
Filing date: 2019-11-18
Publication date: 2020-06-02
Anticipated expiration: 2039-11-18
Also published as: CN111221499B

Abstract

The invention provides an approximate multiplier based on approximate 6-2 and 4-2 compressors, and a calculation method thereof. The approximate multiplier comprises a partial product generation module, a partial product tree compression module and a carry-propagate adder module. The partial product generation module is an AND gate array in which each bit of the multiplier is ANDed with each bit of the multiplicand to obtain the corresponding partial product. The partial product tree compression module comprises an exact compression unit, an approximate compression unit and a truncation unit. The carry-propagate adder adds the outputs of the exact compression unit and the approximate compression unit, and the sum is combined with the output of the truncation unit to obtain the result. The invention can greatly reduce the area overhead, delay and power consumption of the multiplier, thereby improving its performance and reducing its energy consumption.

Description

Approximate multiplier based on approximate 6-2 and 4-2 compressors and calculation method

Technical Field

The invention belongs to the technical field of integrated circuits, and particularly relates to an approximate multiplier based on an approximate 6-2 compressor and an approximate 4-2 compressor and a calculation method.

Background

In recent years, with the rapid development of big data and artificial intelligence, the computing power of computers has kept increasing, but so has their power consumption. Applications such as big data, artificial intelligence and multimedia are often fault-tolerant and do not require fully accurate results, so approximate computing can effectively address the problem of high power consumption. Approximate computing obtains large gains in performance and energy consumption by appropriately relaxing computational precision while still meeting the expected accuracy requirement. It exploits the resulting reduction in complexity and cost to change the design flow of existing digital circuits and systems, yielding approximate circuits with lower power consumption, delay and area, and thus better circuit performance.

Multipliers are key arithmetic units of digital processors and are widely used in applications ranging from filtering to convolutional neural networks. A conventional exact multiplier guarantees a fully correct output, but consumes a large amount of resources and has high delay and power consumption. An approximate multiplier can be obtained by modifying a conventional exact multiplier; however, altering the logic carelessly tends to produce large errors. For application scenarios that tolerate a certain error, both the complexity and the accuracy of the multiplier must be considered, so an approximate multiplier with high accuracy and low complexity is needed.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing an approximate multiplier based on approximate 6-2 and 4-2 compressors, and a calculation method thereof, so that the delay, power consumption and area of the multiplier are reduced while high accuracy is maintained, thereby improving computing performance and reducing energy consumption.

In order to achieve the purpose, the invention adopts the following technical scheme:

The approximate multiplier based on approximate 6-2 and 4-2 compressors of the invention is characterized by comprising: a partial product generation module, a partial product tree compression module and a carry-propagate adder module;

the partial product generation module is an AND gate array, which performs AND logic on an n-bit multiplier and an n-bit multiplicand to obtain n × n partial products, and the n × n partial products form a partial product compression tree of 2n-1 columns;

the partial product tree compression module comprises: an exact compression unit, an approximate compression unit and a truncation unit;

the exact compression unit uses exact 4-2 compressors and full adders to compress the partial products of the n-3 highest-weight columns of the partial product compression tree to obtain an exact compression result;

the approximate compression unit uses approximate 6-2 compressors and approximate 4-2 compressors to compress the partial products of the n-3 intermediate-weight columns (columns 6 to n+2) of the partial product compression tree to obtain an approximate compression result;

the exact compression result and the approximate compression result form a preprocessing result;

the truncation unit truncates the partial products of the 5 lowest-weight columns of the partial product compression tree to obtain a 5-bit all-zero truncation result;

and the carry-propagate adder adds the partial products of each column in the preprocessing result to obtain the upper 2n-5 bits of the binary result, and then concatenates them with the 5-bit all-zero truncation result to obtain the final 2n-bit binary result.

The approximate multiplier of the invention is characterized in that:

the approximate 6-2 compressor consists of 3 two-input AND gates, 3 two-input OR gates and 3 three-input OR gates, namely: first, second and third two-input AND gates; first, second and third two-input OR gates; and fourth, fifth and sixth three-input OR gates;

the first input of the fourth OR gate and the first input of the first OR gate are connected together and serve as the first input of the approximate 6-2 compressor;

the second input of the fourth OR gate serves as the second input of the approximate 6-2 compressor;

the third input of the fourth OR gate and the second input of the first OR gate are connected together and serve as the third input of the approximate 6-2 compressor;

the first input of the second OR gate and the first input of the second AND gate are connected together and serve as the fourth input of the approximate 6-2 compressor;

the second input of the second OR gate and the second input of the second AND gate are connected together and serve as the fifth input of the approximate 6-2 compressor;

the second input of the third OR gate serves as the sixth input of the approximate 6-2 compressor;

the output of the second AND gate is connected to the first input of the third OR gate; the outputs of the third OR gate and the second OR gate are connected to the first and second inputs of the third AND gate, respectively; the outputs of the fourth OR gate, the first OR gate and the third AND gate are connected to the first, second and third inputs of the sixth OR gate, respectively; the output of the sixth OR gate serves as the first output of the approximate 6-2 compressor;

the outputs of the fourth OR gate and the first OR gate are connected to the first and second inputs of the first AND gate, respectively; the outputs of the first AND gate, the second OR gate and the third OR gate are connected to the first, second and third inputs of the fifth OR gate, respectively; the output of the fifth OR gate serves as the second output of the approximate 6-2 compressor.
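The wiring of the approximate 6-2 compressor can be transcribed gate by gate into a small software model. The following is a minimal sketch that follows the connections literally as stated in this description; the function name `approx_6_2` and the variable names are illustrative and do not appear in the patent, and the sketch makes no claim about how the two outputs are weighted in the compression tree.

```python
def approx_6_2(x1: int, x2: int, x3: int, x4: int, x5: int, x6: int) -> tuple[int, int]:
    """Gate-level model of the approximate 6-2 compressor as described in the text.

    Returns (first_output, second_output). All inputs are single bits (0 or 1).
    """
    or4 = x1 | x2 | x3       # fourth OR gate (three-input)
    or1 = x1 | x3            # first OR gate (two-input)
    or2 = x4 | x5            # second OR gate (two-input)
    and2 = x4 & x5           # second AND gate
    or3 = and2 | x6          # third OR gate (two-input)
    and3 = or3 & or2         # third AND gate
    and1 = or4 & or1         # first AND gate
    or6 = or4 | or1 | and3   # sixth OR gate: first output
    or5 = and1 | or2 | or3   # fifth OR gate: second output
    return or6, or5


if __name__ == "__main__":
    print(approx_6_2(0, 1, 0, 0, 0, 0))
    print(approx_6_2(0, 0, 0, 1, 1, 0))
```

The model uses exactly 3 two-input AND gates, 3 two-input OR gates and 3 three-input OR gates, which matches the gate count stated for the compressor.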

The approximate 4-2 compressor comprises 2 inverters, 6 two-input AND gates and 2 four-input OR gates, namely: a first inverter and a second inverter; first through sixth two-input AND gates; and first and second four-input OR gates;

the first input of the first AND gate, the first input of the second AND gate and the first input of the sixth AND gate are connected together and serve as the first input of the approximate 4-2 compressor;

the second input of the first AND gate, the first input of the third AND gate and the first input of the fifth AND gate are connected together and serve as the second input of the approximate 4-2 compressor;

the second input of the second AND gate, the second input of the third AND gate, the first input of the fourth AND gate, the input of the first inverter and the input of the second inverter are connected together and serve as the third input of the approximate 4-2 compressor;

the second input of the fourth AND gate and the fourth input of the second OR gate are connected together and serve as the fourth input of the approximate 4-2 compressor;

the output of the first inverter is connected to the second input of the fifth AND gate; the output of the second inverter is connected to the second input of the sixth AND gate; the outputs of the first AND gate, the fifth AND gate and the sixth AND gate are connected to the first, second and third inputs of the second OR gate, respectively; the output of the second OR gate serves as the data output of the approximate 4-2 compressor;

the outputs of the first AND gate, the second AND gate, the third AND gate and the fourth AND gate are connected to the first, second, third and fourth inputs of the first OR gate, respectively; the output of the first OR gate serves as the carry output of the approximate 4-2 compressor.
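The connections listed above for the approximate 4-2 compressor can likewise be written down as a gate-level software model. This is a minimal sketch transcribed from the wiring in the text; the function name `approx_4_2` and the intermediate variable names are illustrative assumptions, not identifiers from the patent.

```python
def approx_4_2(y1: int, y2: int, y3: int, y4: int) -> tuple[int, int]:
    """Gate-level model of the approximate 4-2 compressor.

    Returns (data_output, carry_output). All inputs are single bits (0 or 1).
    """
    inv1 = y3 ^ 1                      # first inverter, driven by the third input
    inv2 = y3 ^ 1                      # second inverter, driven by the third input
    and1 = y1 & y2                     # first AND gate
    and2 = y1 & y3                     # second AND gate
    and3 = y2 & y3                     # third AND gate
    and4 = y3 & y4                     # fourth AND gate
    and5 = y2 & inv1                   # fifth AND gate
    and6 = y1 & inv2                   # sixth AND gate
    or1 = and1 | and2 | and3 | and4    # first OR gate: carry output
    or2 = and1 | and5 | and6 | y4      # second OR gate: data output
    return or2, or1


if __name__ == "__main__":
    for bits in [(0, 0, 1, 0), (0, 1, 0, 1), (1, 1, 0, 0)]:
        print(bits, approx_4_2(*bits))
```

Only inverters, AND gates and OR gates appear in the model, which reflects the absence of XOR gates in the described circuit.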

The calculation method of the approximate multiplier based on approximate 6-2 and 4-2 compressors of the invention is characterized by comprising the following steps:

step one: generation of partial products:

performing AND logic on the n-bit multiplier and the n-bit multiplicand by using an AND gate array to generate n × n partial products, and forming a partial product compression tree of 2n-1 columns from the n × n partial products;
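Partial product generation can be illustrated with a short software model. The sketch below assumes unsigned operands and groups each product bit by the sum of its bit indices, so that column 1 is the lowest-weight column; the function name `partial_product_columns` and the sample operands are illustrative only.

```python
def partial_product_columns(a: int, b: int, n: int) -> list[list[int]]:
    """Return the n*n partial products a_i AND b_j grouped into 2n-1 columns.

    columns[k] holds every product with i + j == k, i.e. weight 2**k
    (column k + 1 in the 1-indexed numbering used in the text).
    """
    columns: list[list[int]] = [[] for _ in range(2 * n - 1)]
    for i in range(n):
        a_i = (a >> i) & 1
        for j in range(n):
            b_j = (b >> j) & 1
            columns[i + j].append(a_i & b_j)  # one AND gate per bit pair
    return columns


cols = partial_product_columns(0b10110101, 0b01101110, n=8)
heights = [len(c) for c in cols]
# Summing every column with its weight reproduces the exact product.
total = sum(sum(c) << k for k, c in enumerate(cols))
print(heights)
print(total == 0b10110101 * 0b01101110)
```

For n = 8 the column heights rise from 1 to 8 and fall back to 1, which is the triangular shape that the compression tree then reduces.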

step two: building of an approximate 4-2 compressor:

according to the logical relation of the formula (1), 2 inverters, 6 two-input AND gates and 2 four-input OR gates are utilized to build an approximate 4-2 compressor:

in formula (1), Y₁, Y₂, Y₃ and Y₄ are the four inputs of the approximate 4-2 compressor, and Sum and Carry are its data output and carry output, respectively;
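The image of formula (1) is not reproduced in this text. As a hedged reconstruction, the gate connections given for the approximate 4-2 compressor in this disclosure (six two-input AND gates feeding two four-input OR gates, with the third input inverted for two of the AND gates) imply the following Boolean expressions. They are derived from that wiring and may differ in form from the patent's own formula:

```latex
\begin{aligned}
\mathrm{Sum}   &= Y_1 Y_2 + Y_1 \overline{Y_3} + Y_2 \overline{Y_3} + Y_4 \\
\mathrm{Carry} &= Y_1 Y_2 + Y_1 Y_3 + Y_2 Y_3 + Y_3 Y_4
\end{aligned}
```

Each product term corresponds to one AND gate, and each output to one four-input OR gate, which accounts for the stated count of 2 inverters, 6 AND gates and 2 OR gates.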

step three: building of an approximate 6-2 compressor:

according to the logical relation of formula (2), an approximate 6-2 compressor is built from 3 two-input AND gates, 3 two-input OR gates and 3 three-input OR gates:

in formula (2), X₁, X₂, X₃, X₄, X₅ and X₆ are the six inputs of the approximate 6-2 compressor, and Sum₁ and Sum₂ are its first and second outputs, respectively;

step four: simplification of partial product compression tree:

defining the column with the lowest weight of the partial product compression tree as column 1, defining columns 1 to 5 of the partial product compression tree as a truncation array, defining columns 6 to n+2 as an approximate compression array, and defining columns n+3 to 2n-1 as an exact compression array;

setting the partial products in the truncation array to zero, and outputting a 5-bit all-zero binary number as the truncation result;

compressing the partial products of each column in the approximate compression array by using approximate 6-2 compressors and approximate 4-2 compressors to obtain an approximate compression result;

compressing the partial products of each column in the exact compression array by using exact 4-2 compressors and full adders to obtain an exact compression result;
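The column partition of step four can be sketched as a small helper. The function name `partition_columns` and the lower bound on n are assumptions made for illustration; the column ranges themselves are taken from the text.

```python
def partition_columns(n: int) -> dict[str, range]:
    """Split the 2n-1 columns (1-indexed, lowest weight first) into three arrays."""
    if n < 4:
        # Assumption: each of the approximate and exact arrays needs
        # at least one column, which requires n - 3 >= 1.
        raise ValueError("n must be at least 4")
    return {
        "truncated": range(1, 6),            # columns 1 to 5
        "approximate": range(6, n + 3),      # columns 6 to n+2
        "exact": range(n + 3, 2 * n),        # columns n+3 to 2n-1
    }


parts = partition_columns(8)
for name, cols in parts.items():
    print(name, list(cols))
```

For n = 8 this gives columns 1 to 5 truncated, columns 6 to 10 approximate and columns 11 to 15 exact, so both compressed arrays contain n-3 = 5 columns.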

step five: generation of binary results:

adding the partial products of each column in the preprocessing result by using a carry-propagate adder to obtain the upper 2n-5 bits of the binary result, and concatenating them with the truncation result to obtain the final 2n-bit binary result.
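The effect of the truncation and of the final concatenation can be seen in a simplified behavioral model. The sketch below is an assumption-laden illustration: it drops columns 1 to 5 and sums all remaining columns exactly, so it models only the truncation unit and the final addition, not the approximate compressors. The function name `truncated_product` is hypothetical.

```python
def truncated_product(a: int, b: int, n: int, cut: int = 5) -> int:
    """Multiply two n-bit unsigned integers, discarding the `cut` lowest columns."""
    upper = 0
    for i in range(n):
        for j in range(n):
            keep = i + j >= cut                      # columns cut+1 .. 2n-1 only
            if keep and (a >> i) & 1 and (b >> j) & 1:
                upper += 1 << (i + j - cut)          # weight relative to column cut+1
    return upper << cut                              # append `cut` zero bits


full = 255 * 255
approx = truncated_product(255, 255, n=8)
print(full, approx, full - approx)
```

With both operands equal to 255, every partial product is 1, so the truncation discards 1·1 + 2·2 + 3·4 + 4·8 + 5·16 = 129 and the model returns 64896, whose five lowest bits are zero.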

Compared with the prior art, the invention has the beneficial effects that:

1. The prior-art exact 4-2 compressor contains 1 NAND gate, 2 OR gates, 2 inverters, 3 AND gates, 3 NOR gates and 4 XOR gates, whereas the approximate 4-2 compressor proposed by the invention contains only 2 inverters, 6 AND gates and 2 four-input OR gates. The proposed approximate 4-2 compressor therefore has much lower delay, power consumption and area than the exact 4-2 compressor.

2. The approximate 6-2 compressor proposed by the invention comprises two approximate full adders whose errors compensate each other, which greatly improves its accuracy. In addition, processing six partial-product bits with one proposed approximate 6-2 compressor costs less hardware than processing them with 2 exact 4-2 compressors, as an exact multiplier does.

3. The approximate multiplier proposed by the invention has lower hardware complexity. Its critical path is shorter than that of the exact multiplier, so its delay is shorter. It also uses far fewer logic gates than the exact multiplier, so its power consumption and area are far smaller.

Drawings

FIG. 1 is a schematic diagram of an approximate multiplier according to the present invention;

FIG. 2 is a circuit block diagram of the approximate 4-2 compressor of the present invention;

FIG. 3 is a circuit block diagram of a prior art precision 4-2 compressor;

FIG. 4 is a circuit block diagram of the approximate 6-2 compressor of the present invention;

FIG. 5 is the Karnaugh map of the Carry output of the approximate 4-2 compressor of the present invention;

FIG. 6 is the Karnaugh map of the Sum output of the approximate 4-2 compressor of the present invention;

FIG. 7 is a schematic diagram of the configuration of the approximate 6-2 compressor of the present invention;

FIG. 8 is a partial product compression tree diagram of an 8 × 8-bit approximate multiplier applying the present invention;

FIG. 9 is a partial product compression tree diagram of an exact 8 × 8-bit Dadda multiplier.

Detailed Description

In this embodiment, as shown in FIG. 1, an approximate multiplier based on approximate 6-2 and 4-2 compressors includes: a partial product generation module, a partial product tree compression module and a carry-propagate adder module;

the partial product generation module is an AND gate array, which performs AND logic on the n-bit multiplier and the n-bit multiplicand to obtain n × n partial products, and the n × n partial products form a partial product compression tree of 2n-1 columns;

the approximate compression unit uses approximate 6-2 compressors and approximate 4-2 compressors to compress the partial products of the n-3 intermediate-weight columns (columns 6 to n+2) of the partial product compression tree to obtain an approximate compression result;

the exact compression result and the approximate compression result form a preprocessing result;

the truncation unit truncates the partial products of the 5 lowest-weight columns of the partial product compression tree to obtain a 5-bit all-zero truncation result;

Specifically, as shown in FIG. 2, the approximate 4-2 compressor includes 2 inverters, 6 two-input AND gates and 2 four-input OR gates, namely: a first inverter and a second inverter; first through sixth two-input AND gates; and first and second four-input OR gates;

the second input of the first AND gate, the first input of the third AND gate and the first input of the fifth AND gate are connected together and serve as the second input of the approximate 4-2 compressor;

the second input of the second AND gate, the second input of the third AND gate, the first input of the fourth AND gate, the input of the first inverter and the input of the second inverter are connected together and serve as the third input of the approximate 4-2 compressor;

the second input of the fourth AND gate and the fourth input of the second OR gate are connected together and serve as the fourth input of the approximate 4-2 compressor;

the output of the first inverter is connected to the second input of the fifth AND gate; the output of the second inverter is connected to the second input of the sixth AND gate; the outputs of the first AND gate, the fifth AND gate and the sixth AND gate are connected to the first, second and third inputs of the second OR gate, respectively; the output of the second OR gate serves as the data output of the approximate 4-2 compressor;

As shown in FIG. 4, in this embodiment the approximate 6-2 compressor consists of 3 two-input AND gates, 3 two-input OR gates and 3 three-input OR gates, namely: first, second and third two-input AND gates; first, second and third two-input OR gates; and fourth, fifth and sixth three-input OR gates;

the first input of the fourth OR gate and the first input of the first OR gate are connected together and serve as the first input of the approximate 6-2 compressor;

the second input of the fourth OR gate serves as the second input of the approximate 6-2 compressor;

the third input of the fourth OR gate and the second input of the first OR gate are connected together and serve as the third input of the approximate 6-2 compressor;

the first input of the second OR gate and the first input of the second AND gate are connected together and serve as the fourth input of the approximate 6-2 compressor;

the second input of the second OR gate and the second input of the second AND gate are connected together and serve as the fifth input of the approximate 6-2 compressor;

the second input of the third OR gate serves as the sixth input of the approximate 6-2 compressor;

the outputs of the fourth OR gate and the first OR gate are connected to the first and second inputs of the first AND gate, respectively; the output of the second AND gate is connected to the first input of the third OR gate; the outputs of the first AND gate, the second OR gate and the third OR gate are connected to the first, second and third inputs of the fifth OR gate, respectively; the output of the fifth OR gate serves as the second output of the approximate 6-2 compressor.

In this embodiment, the calculation method of the approximate multiplier based on approximate 6-2 and 4-2 compressors includes the following steps:

Step one: generation of partial products:

Step two: building of an approximate 4-2 compressor:

Let Y₁, Y₂, Y₃ and Y₄ be the four inputs of the approximate 4-2 compressor, and let Sum and Carry be its data output and carry output, respectively. The Karnaugh map of Carry is shown in FIG. 5, in which the Carry values corresponding to Y₁Y₂Y₃Y₄ = 0101 and Y₁Y₂Y₃Y₄ = 1001 are replaced by '0'. The Karnaugh map of Sum is shown in FIG. 6, in which the Sum value corresponding to Y₁Y₂Y₃Y₄ = 0010 is replaced by '0', and the Sum values corresponding to Y₁Y₂Y₃Y₄ = 0011, 0101, 1100 and 1001 are replaced by '1'. Simplifying the Karnaugh maps of FIG. 5 and FIG. 6 gives the expression of the approximate 4-2 compressor:

According to the logical relation of formula (1), the approximate 4-2 compressor is built from 2 inverters, 6 two-input AND gates and 2 four-input OR gates; its circuit structure is shown in FIG. 2.

FIG. 3 shows the circuit structure of the exact 4-2 compressor. Comparing FIG. 2 with FIG. 3, the exact 4-2 compressor includes 1 NAND gate, 2 OR gates, 2 inverters, 3 AND gates, 3 NOR gates and 4 XOR gates, whereas the approximate 4-2 compressor proposed by the invention includes only 2 inverters, 6 AND gates and 2 four-input OR gates. The proposed approximate 4-2 compressor therefore has much lower delay, power consumption and area than the exact 4-2 compressor.
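The cost of this simplification in accuracy can be tabulated by enumerating all 16 input patterns. The sketch below is an illustration under two assumptions: the Boolean expressions are those implied by the gate connections described for FIG. 2, and Carry is taken to weigh twice as much as Sum, as in a conventional compressor. The reference value is the plain arithmetic sum of the four input bits; the names are hypothetical.

```python
from itertools import product


def approx_4_2(y1: int, y2: int, y3: int, y4: int) -> tuple[int, int]:
    """Approximate 4-2 compressor as implied by the described wiring."""
    n3 = y3 ^ 1
    s = (y1 & y2) | (y1 & n3) | (y2 & n3) | y4
    c = (y1 & y2) | (y1 & y3) | (y2 & y3) | (y3 & y4)
    return s, c


errors: dict[str, int] = {}
for bits in product((0, 1), repeat=4):
    s, c = approx_4_2(*bits)
    approx_value = s + 2 * c        # assumption: Carry has twice the weight of Sum
    exact_value = sum(bits)         # number of ones among Y1..Y4
    if approx_value != exact_value:
        errors["".join(map(str, bits))] = approx_value - exact_value

for pattern, err in sorted(errors.items()):
    print(pattern, err)
```

Under these assumptions six of the sixteen patterns deviate from the arithmetic sum, each by exactly one, and the deviations have mixed signs.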

Step three: building of an approximate 6-2 compressor:

Using an error mutual-compensation strategy, a pair of approximate full adders is designed. Let X₁, X₂ and X₃ be the inputs of approximate full adder 1, with Sum₁' and Carry₁' as its outputs; let X₄, X₅ and X₆ be the inputs of approximate full adder 2, with Sum₂' and Carry₂' as its outputs.

The mean error ME of an approximate full adder is:

in formula (2), Value is the value of the exact full adder, V_app is the value of the approximate full adder, and Err_i is the error between the exact full adder and the approximate full adder.
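The image of formula (2) is not reproduced in this text. One plausible reading, consistent with the symbols defined here and with the eighths reported for the two approximate full adders, is a mean over the eight input combinations of a three-input adder. Both the averaging over 8 patterns and the sign convention of the error are assumptions:

```latex
\mathrm{ME} = \frac{1}{8} \sum_{i=1}^{8} \mathrm{Err}_i ,
\qquad
\mathrm{Err}_i = V_{app,i} - \mathrm{Value}_i
```

Under this reading, an adder that is wrong by one on a single input pattern has an ME of magnitude 1/8, with the sign indicating whether it overestimates or underestimates.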

The expression for approximate full adder 1 is:

The expression for approximate full adder 2 is:

From formula (2), the ME of approximate full adder 1 is -1/8 and the ME of approximate full adder 2 is +1/8; the two errors are symmetric, so they can compensate each other.

The approximate 6-2 compressor is built from approximate full adder 1 and approximate full adder 2 using the error mutual-compensation strategy. Its structure is shown schematically in FIG. 7, and its expression is:

in formula (5), X₁, X₂, X₃, X₄, X₅ and X₆ are the six inputs of the approximate 6-2 compressor, and Sum₁ and Sum₂ are its first and second outputs, respectively. According to the logical relation of formula (5), the approximate 6-2 compressor is built from 3 two-input AND gates, 3 two-input OR gates and 3 three-input OR gates; its circuit structure is shown in FIG. 4.

As can be seen from FIG. 7, the approximate 6-2 compressor proposed by the invention comprises two approximate full adders whose errors compensate each other, which greatly improves its accuracy. In addition, processing six partial-product bits with one proposed approximate 6-2 compressor costs less hardware than processing them with 2 exact 4-2 compressors, as an exact multiplier does.

Step four: simplification of partial product compression tree:

Here, an 8 × 8 multiplier is taken as an example; its partial product compression tree is shown in FIG. 8. The column with the lowest weight of the partial product compression tree is defined as column 1. In the first compression stage, a binary '0' bit is appended after the last partial product of column 7 and inserted before the first partial product of column 9. The partial products of columns 13 and 14 are each compressed by an exact full adder (two in total), the partial products of columns 11 and 12 are each compressed by an exact 4-2 compressor (two in total), the partial products of columns 6 and 10 are each compressed by an approximate 6-2 compressor (two in total), and the upper and lower groups of partial products in columns 7, 8 and 9 are compressed by six approximate 4-2 compressors. The partial products of columns 1 to 5 are truncated directly and set to a five-bit binary '0'. The result then undergoes a second compression stage, in which the partial products of column 13 are compressed by a full adder and the partial products of columns 8, 9 and 10 are each compressed by an approximate 4-2 compressor (three in total).
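The first-stage assignment in the approximate region can be cross-checked against the column heights of an 8 × 8 array. The sketch below is an illustrative bookkeeping check, not part of the patent: the table `first_stage` is transcribed from the paragraph above, and the height formula assumes the usual triangular partial product array.

```python
n = 8

# Column k (1-indexed) of an n x n array holds min(k, 2n - k) partial products.
heights = {k: min(k, 2 * n - k) for k in range(1, 2 * n)}

# column -> (compressor kind, inputs per compressor, compressor count, padded '0' bits)
first_stage = {
    6: ("approximate 6-2", 6, 1, 0),
    7: ("approximate 4-2", 4, 2, 1),   # one '0' appended to column 7
    8: ("approximate 4-2", 4, 2, 0),
    9: ("approximate 4-2", 4, 2, 1),   # one '0' inserted in column 9
    10: ("approximate 6-2", 6, 1, 0),
}

consumed = {}
for col, (kind, width, count, pad) in first_stage.items():
    consumed[col] = width * count - pad   # real partial products absorbed
    print(col, heights[col], kind, count)

num_4_2 = sum(c for (kind, _, c, _) in first_stage.values() if kind == "approximate 4-2")
num_6_2 = sum(c for (kind, _, c, _) in first_stage.values() if kind == "approximate 6-2")
```

The padded '0' bits bring columns 7 and 9 from seven to eight entries, so that each of columns 7 to 9 splits evenly into two groups of four.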

The hardware complexity of the proposed approximate multiplier is clearly lower than that of the exact 8 × 8 multiplier shown in FIG. 9. Its critical path is therefore shorter, so its delay is shorter than that of the exact multiplier. It also uses far fewer logic gates than the exact multiplier, so its power consumption and area are far smaller.

Step five: generation of binary results:

The partial products of each column in the preprocessing result are added by a carry-propagate adder to obtain the upper 2n-5 bits of the binary result, which are concatenated with the truncation result to obtain the final 2n-bit binary result.

Compared with an exact 8 × 8 Dadda-tree multiplier, the 8 × 8-bit approximate multiplier of the invention reduces power consumption by 42.9%, delay by 5.2%, area by 41.2%, power-delay product by 45.9% and energy-delay product by 48.7%, with a normalized mean error of 0.00236. Therefore, by truncating the lowest five columns and using the proposed approximate 6-2 and 4-2 compressors to compress the partial products of columns 6 to n+2, the invention improves the accuracy of the approximate multiplier and reduces its hardware overhead, thereby improving the performance of the multiplier.
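The reported product metrics can be checked for internal consistency from the power and delay figures alone. This short sketch assumes the usual definitions, power-delay product = power × delay and energy-delay product = power × delay², which the text does not state explicitly.

```python
power_ratio = 1 - 0.429   # approximate / exact power, from the 42.9% reduction
delay_ratio = 1 - 0.052   # approximate / exact delay, from the 5.2% reduction

# Assumed definitions: PDP = power * delay, EDP = power * delay**2
pdp_reduction = 1 - power_ratio * delay_ratio
edp_reduction = 1 - power_ratio * delay_ratio ** 2

print(f"PDP reduction: {pdp_reduction:.1%}")
print(f"EDP reduction: {edp_reduction:.1%}")
```

Under these definitions the computed reductions round to 45.9% and 48.7%, in agreement with the figures quoted above.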

Claims

1. An approximate multiplier based on approximate 6-2 and 4-2 compressors, comprising: a partial product generation module, a partial product tree compression module and a carry-propagate adder module;

2. The approximate multiplier of claim 1, wherein:

3. The approximate multiplier of claim 1, wherein:

4. A method for calculating an approximate multiplier based on approximate 6-2 and 4-2 compressors, comprising the steps of:

step one: generation of partial products:

step two: building of an approximate 4-2 compressor:

step three: building of an approximate 6-2 compressor:

step four: simplification of partial product compression tree:

step five: generation of binary results: