CN111694543B

CN111694543B - Approximate multiplier design method, approximate multiplier and image sharpening circuit

Info

Publication number: CN111694543B
Application number: CN202010483412.0A
Authority: CN
Inventors: 杨志玺; 杨俊�; 李献斌; 郭熙业; 吴先宇; 赵振岩; 周超
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2020-06-01
Filing date: 2020-06-01
Publication date: 2022-06-14
Anticipated expiration: 2040-06-01
Also published as: CN111694543A

Abstract

The application relates to an approximate multiplier design method, an approximate multiplier and an image sharpening circuit. The method comprises: replacing a half adder used for partial product accumulation in an exact 4-bit Dadda tree multiplier with a full adder, and replacing the lower 4-bit calculation circuit of the exact 4-bit Dadda tree multiplier with an approximate addition circuit, to obtain an approximate 4-bit multiplier. The approximate addition circuit comprises a logical OR circuit for obtaining the value of the 2nd bit from the LSB of the final product of the approximate 4-bit multiplier. Based on the values of the 2nd to 4th columns from the LSB of the partial product accumulation result in the approximate 4-bit multiplier and the output value of the approximate addition circuit, an error detection and correction circuit is designed to eliminate the error in the final product output by the approximate 4-bit multiplier. The method substantially improves the structure of the multiplier: it reduces hardware resource usage and also provides an error correction mechanism, so that the result calculated by the approximate multiplier is more accurate.

Description

Approximate multiplier design method, approximate multiplier and image sharpening circuit

Technical Field

The present application relates to the field of low power consumption digital signal processing circuit design technology, and in particular, to an approximate multiplier design method, an approximate multiplier, and an image sharpening circuit.

Background

In some fault-tolerant applications, the accuracy of computed values may be moderately reduced and operations performed on an "approximate" basis; the related techniques are collectively referred to as approximate computing. The main idea of approximate circuit design is to change the logic implemented by the circuit and simplify its structure, thereby reducing the resources the circuit occupies. Approximate computing circuits have been widely used in digital signal processing (DSP) systems, multimedia, fuzzy logic and neural networks, where they simplify circuits and reduce chip area and power consumption by lowering calculation accuracy while still providing results that are usable by the application.

The multiplier is a major resource consumer in an arithmetic logic unit and is widely used in image processing circuits, so it is necessary to reduce its resource consumption through approximate design. At present, most approximate multiplier designs target only the basic components used in the partial product compression and accumulation process, such as half adders, full adders and compressors; they do not consider an error correction mechanism and provide no error detection or error correction function. For example, as shown in FIG. 1, in the partial product reduction (PPR) process, half adders (HA), 4:2 compressors and full adders are used to compress and accumulate the partial products, and a carry-propagate adder (CPA) performs the final accumulation to obtain the final product. In addition, approximate multiplier designs that do include an error correction mechanism implement it by modifying the basic components used for partial product compression and accumulation, without substantially improving the multiplier structure.

Disclosure of Invention

In view of the above, it is necessary to provide an approximate multiplier design method, an approximate multiplier and an image sharpening circuit capable of providing an error correction mechanism in order to solve the above technical problems.

A method of approximate multiplier design, the method comprising:

replacing a half adder used for partial product accumulation in an exact 4-bit Dadda tree multiplier with a full adder, and replacing the lower 4-bit calculation circuit of the exact 4-bit Dadda tree multiplier with an approximate addition circuit, to obtain an approximate 4-bit multiplier, wherein the approximate addition circuit includes a logical OR circuit for obtaining the value of the 2nd bit from the LSB of the final product of the approximate 4-bit multiplier; and

designing, based on the values of the 2nd to 4th columns from the LSB of the partial product accumulation result in the approximate 4-bit multiplier and the output value of the approximate addition circuit, an error detection and correction circuit for eliminating the error in the final product output by the approximate 4-bit multiplier.

In one embodiment, the method further comprises: designing a power gating circuit according to the approximate 4-bit multiplier and the error detection and correction circuit, wherein the power gating circuit turns the error detection and correction circuit on or off according to a preset calculation accuracy threshold.

In one embodiment, the method further comprises: performing the final accumulation calculation using a ripple carry adder to obtain the value of the upper 4 bits of the final product of the approximate 4-bit multiplier.

In one embodiment, the method further comprises: using one exact N/2-bit Dadda tree multiplier to form a circuit that calculates the exact final product of the upper N/2 bits of an N-bit multiplier and the upper N/2 bits of an N-bit multiplicand; and using three approximate N/2-bit multipliers to form a circuit that calculates the approximate final products of the other bits of the multiplier and the multiplicand, where N = 2^k, k = 3, 4, 5, ...

An approximate multiplier, comprising:

an approximate 4-bit multiplier, whose circuit is obtained by replacing a half adder used for partial product accumulation in an exact 4-bit Dadda tree multiplier with a full adder and replacing the lower 4-bit calculation circuit of the exact 4-bit Dadda tree multiplier with an approximate addition circuit, wherein the approximate addition circuit includes a logical OR circuit for obtaining the value of the 2nd bit from the LSB of the final product of the approximate 4-bit multiplier; and

an error detection and correction circuit, whose input is the values of the 2nd to 4th columns from the LSB of the partial product accumulation result in the approximate 4-bit multiplier and whose output is a correction value, wherein the correction value is used to eliminate the error in the final product output by the approximate 4-bit multiplier.

In one embodiment, the approximate multiplier further comprises a power gating circuit for turning the error detection and correction circuit on or off according to a preset calculation accuracy threshold.

In one embodiment, the approximate multiplier further comprises a ripple carry adder for obtaining the value of the upper 4 bits of the final product of the approximate 4-bit multiplier.

In one embodiment, the approximate multiplier further comprises: one exact N/2-bit Dadda tree multiplier for calculating the exact final product of the upper N/2 bits of the N-bit multiplier and the upper N/2 bits of the N-bit multiplicand; and three approximate N/2-bit multipliers for calculating the approximate final products of the other bits of the multiplier and the multiplicand, where N = 2^k, k = 3, 4, 5, ...

An image sharpening circuit comprises the approximate multiplier according to any one of the preceding embodiments.

In the approximate multiplier design method, the approximate multiplier and the image sharpening circuit described above, the half adder used for partial product accumulation in the exact 4-bit Dadda tree multiplier is replaced with a full adder, the lower 4-bit calculation circuit is redesigned for approximate calculation, and a corresponding error detection and correction circuit is designed.

Drawings

FIG. 1 is a schematic diagram of the circuit configuration of a prior-art exact 4-bit Dadda tree multiplier;

FIG. 2 is a schematic diagram of the steps of an approximate multiplier design method in one embodiment;

FIG. 3 is a schematic diagram of an exemplary approximate multiplier circuit;

FIG. 4 is a schematic diagram of an embodiment of an error detection and correction circuit;

FIG. 5 is a schematic diagram of an embodiment of a power gating circuit;

FIG. 6 is a schematic diagram of a calculation principle of an N-bit approximate multiplier in one embodiment;

FIG. 7 is a diagram illustrating an exemplary 16-bit approximate multiplier;

FIG. 8 is a diagram illustrating an exemplary image sharpening circuit.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In the prior art, as shown in FIG. 1, in the partial product reduction (PPR) stage, half adders (HA), 4:2 compressors and full adders are used to compress and accumulate the partial products, and a ripple carry adder, acting as the carry-propagate adder (CPA), performs the final accumulation to obtain the final product. The tree structure allows the accumulation of partial products to be executed in parallel, which improves the utilization of hardware resources and reduces delay; the benefit becomes more pronounced as the bit width of the multiplier increases.
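To make the partial product accumulation concrete, the following Python sketch groups the 16 partial product bits of a 4-bit multiplication into columns by weight. It is an illustration only and does not reproduce the adder and compressor arrangement of FIG. 1; the function names `partial_product_columns` and `sum_columns` are hypothetical.

```python
def partial_product_columns(a, b, n=4):
    """Group the partial product bits (a_i AND b_j) by column weight i + j."""
    columns = [[] for _ in range(2 * n - 1)]
    for i in range(n):
        for j in range(n):
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            columns[i + j].append(bit)
    return columns


def sum_columns(columns):
    """Accumulate every column with its binary weight."""
    return sum(sum(col) << weight for weight, col in enumerate(columns))


cols = partial_product_columns(0b1011, 0b0110)   # 11 * 6
heights = [len(col) for col in cols]             # bits per column, LSB first
product = sum_columns(cols)
```

Summing all columns with their weights gives the exact product; in hardware this reduction is what the half adders, compressors and full adders perform before the final adder.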

The application provides an approximate multiplier design method based on the exact 4-bit Dadda tree multiplier shown in FIG. 1. The method includes:

Step 202: replacing a half adder used for partial product accumulation in the exact 4-bit Dadda tree multiplier with a full adder, and replacing the lower 4-bit calculation circuit of the exact 4-bit Dadda tree multiplier with an approximate addition circuit, to obtain an approximate 4-bit multiplier. The approximate addition circuit comprises a logical OR circuit for obtaining the value of the 2nd bit from the LSB of the final product of the approximate 4-bit multiplier.

Specifically, the above approximate multiplier design method yields the approximate 4-bit multiplier shown in FIG. 3. Compared with the exact 4-bit Dadda tree multiplier shown in FIG. 1, the approximate 4-bit multiplier of FIG. 3 reduces the lower 4 columns (up to p3) from two accumulation terms to one accumulation term in the final accumulation stage (Final Addition), so that p0, p3 and p4 can be used directly as multiplier outputs. Each of p1 and p2 is either logic 0 or logic 1, and for both p1 and p2 the probabilities of logic 1 and logic 0 are 0.25 and 0.75, respectively. The combinations "00", "01" and "10" of p1 and p2 therefore occur with higher probability than the combination "11", so the addition of p1 and p2 can be approximated with a logical OR gate. With this circuit improvement, the number of adders used in the final accumulation stage of the approximate 4-bit multiplier shown in FIG. 3 is reduced to 3.
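The effect of replacing the addition of p1 and p2 with an OR gate can be checked with a small truth table. The Python sketch below is illustrative; the helper names are hypothetical, and it models only the two-bit addition, not the full circuit of FIG. 3.

```python
def exact_add(p1, p2):
    """Exact sum of two bits (0, 1 or 2)."""
    return p1 + p2


def or_add(p1, p2):
    """Approximate sum of two bits using a single OR gate."""
    return p1 | p2


# Collect every input combination where the OR gate differs from the exact sum.
mismatches = [(p1, p2) for p1 in (0, 1) for p2 in (0, 1)
              if or_add(p1, p2) != exact_add(p1, p2)]
print(mismatches)  # [(1, 1)]
```

Only the combination "11" is approximated incorrectly, which is the least likely of the four combinations under the probabilities stated above.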

Step 204: based on the values of the 2nd to 4th columns from the LSB of the partial product accumulation result in the approximate 4-bit multiplier and the output value of the approximate addition circuit, designing an error detection and correction circuit for eliminating the error in the final product output by the approximate 4-bit multiplier.

In the approximate 4-bit multiplier shown in FIG. 3, a carry signal needs to be output to the higher-weight bit only when p1 and p2 are both logic 1, so the error probability is the probability that p1 and p2 are both logic 1, and the magnitude of the error is 2. The error detection and correction circuit is designed based on this feature. When p1 and p2 are both logic 1, the correct sum bit is logic 0, whereas the sum bit produced by the approximate multiplier is logic 1. Therefore, for the 2nd and 3rd bits from the LSB of the final product, an error occurs only when both p1 and p2 are 1, and the correction is achieved by adding an extra binary number '10'. When p3 is 0 and p1 and p2 are both 1, the error does not propagate to p4 (that is, the 4th bit and the higher-weight bits are correct). However, when p3 is 1, the error propagates to the 4th bit and the higher-weight bits, in which case the 4th bit of the approximate final product is the opposite of the correct value. Thus, a NAND gate is used to detect whether an error exists, and an exclusive-NOR (XNOR) gate is used to generate the correction value. Similarly, if p1 to p4 are all 1, an extra carry is needed to correct the error. Based on the above description, the error detection and correction circuit designed in this embodiment is shown in FIG. 4. Its inputs are p1, p2, p3 and p4, and its outputs are the final low-weight results for the 2nd to 4th bits from the LSB of the final product, together with a carry value to the 5th bit from the LSB.
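The correction rule described above can be sketched behaviorally in Python. This is a minimal model under stated assumptions, not the gate-level NAND/XNOR circuit of FIG. 4: it assumes p1 and p2 are two terms of weight 2 that the approximate circuit combines with an OR gate, with p0, p3 and p4 at weights 1, 4 and 8. All function names are hypothetical.

```python
def approximate_low_bits(p0, p1, p2, p3, p4):
    """Low part of the product with p1 + p2 approximated by OR (assumed weights)."""
    return p0 | ((p1 | p2) << 1) | (p3 << 2) | (p4 << 3)


def exact_low_bits(p0, p1, p2, p3, p4):
    """Reference value with p1 + p2 added exactly."""
    return p0 + ((p1 + p2) << 1) + (p3 << 2) + (p4 << 3)


def detect_and_correct(p0, p1, p2, p3, p4):
    """Return (corrected 4-bit value, carry into the 5th bit from the LSB)."""
    value = approximate_low_bits(p0, p1, p2, p3, p4)
    if p1 & p2:            # error detected: both terms are logic 1
        value += 0b10      # add the correction value '10'
    return value & 0b1111, value >> 4
```

Under these assumptions the corrected value matches the exact one for every input combination, and a carry out of the 4th bit appears only when p1 to p4 are all 1, in line with the description.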

Table 1 compares the resource usage of the prior-art exact 4-bit Dadda tree multiplier with that of the approximate 4-bit multiplier provided by the present embodiment. The approximate 4-bit multiplier has lower resource consumption, that is, it can perform approximate multiplication with low power consumption.

TABLE 1 Comparison of resource usage of the exact 4-bit Dadda tree multiplier and the approximate 4-bit multiplier

Multiplier type	Resource consumption
Exact 4-bit Dadda tree multiplier	2 Compressor + 5 FA + 3 HA
Approximate 4-bit Dadda tree multiplier	2 Compressor + 4 FA + 1 HA + OR

The approximate multiplier design method provided by this embodiment is based on the exact 4-bit Dadda tree multiplier, replaces the half adder used for partial product accumulation with the full adder, performs approximate calculation design on the lower 4-bit calculation circuit, and designs the corresponding error detection and correction circuit, thereby substantially improving the multiplier in structure. On one hand, the usage amount of hardware resources is reduced, and on the other hand, an error correction mechanism is provided, so that the calculation result of the approximate multiplier is more accurate.

Specifically, the present embodiment designs the power gating circuit shown in FIG. 5 according to the 4-bit approximate multiplier and the error detection and correction circuit of the above embodiments. When the output of the 4-bit approximate multiplier meets the calculation accuracy requirement of the fault-tolerant application, the error detection and correction circuit can be turned off through the power gating circuit; otherwise, the error detection and correction circuit can be turned on through the power gating circuit to obtain an exact final product. The method provided by this embodiment allows the approximate multiplier to adapt flexibly to different calculation accuracy requirements, and reduces circuit activity and energy consumption when the accuracy requirement is low.
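The functional effect of the power gating can be mimicked in software with an enable flag. The sketch below is only a behavioral analogy: real power gating removes the supply from the correction circuit, and the threshold policy `ecc_needed` is a hypothetical example, since the source only states that a preset calculation accuracy threshold is used. The bit weights are the same assumption as in the behavioral model of the correction (p1 and p2 at weight 2).

```python
def multiply_low_bits(p0, p1, p2, p3, p4, ecc_enabled):
    """Low product bits; the correction is applied only when ecc_enabled is True."""
    value = p0 | ((p1 | p2) << 1) | (p3 << 2) | (p4 << 3)
    if ecc_enabled and (p1 & p2):
        value += 0b10
    return value


def ecc_needed(max_tolerated_error):
    """Hypothetical policy: enable correction if an error of 2 is not tolerable."""
    return max_tolerated_error < 2


strict = multiply_low_bits(0, 1, 1, 0, 0, ecc_needed(0))    # correction on
relaxed = multiply_low_bits(0, 1, 1, 0, 0, ecc_needed(4))   # correction off
```

With the correction enabled the result is 4 (exact); with it disabled the result is 2, an error of magnitude 2.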

In one embodiment, the method further comprises: performing the final accumulation calculation using a ripple carry adder to obtain the value of the upper 4 bits of the final product of the approximate 4-bit multiplier. Since the partial product result to be accumulated in this embodiment has only 3 columns, a ripple carry adder can be used to take advantage of its simple circuit while keeping the delay small.

In one embodiment, the method further comprises: using one exact N/2-bit Dadda tree multiplier to form a circuit that calculates the exact final product of the upper N/2 bits of an N-bit multiplier and the upper N/2 bits of an N-bit multiplicand, and using three approximate N/2-bit multipliers to form a circuit that calculates the approximate final products of the other bits of the multiplier and the multiplicand, where N = 2^k, k = 3, 4, 5, ...

Specifically, an N-bit approximate multiplier can be implemented recursively from the 4-bit approximate multiplier of the above embodiments. Let the multiplier be A, with upper N/2 bits A_H and lower N/2 bits A_L, and let the multiplicand be B, with upper N/2 bits B_H and lower N/2 bits B_L. As shown in FIG. 6, the 2N-bit final product is obtained by performing the four N/2-bit multiplications A_H·B_H, A_H·B_L, A_L·B_H and A_L·B_L. To maintain accuracy, A_H·B_H is implemented with an exact multiplier, while A_H·B_L, A_L·B_H and A_L·B_L are implemented with approximate multipliers. As shown in FIG. 7, taking a 16-bit approximate multiplier as an example, the A_H·B_H part uses an exact 8-bit multiplier, and each 8-bit multiplier is further decomposed into four 4-bit multipliers.
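The recursive decomposition can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the exact Dadda tree multiplier is modeled by ordinary integer multiplication, the approximate N/2-bit multipliers are assumed to be built by the same recursion, and `approx_mul4` is a hypothetical placeholder for whatever 4-bit approximate multiplier is plugged in.

```python
def exact_mul(a, b):
    """Stand-in for an exact Dadda tree multiplier."""
    return a * b


def approx_mul(a, b, n, approx_mul4):
    """N-bit product from one exact and three approximate N/2-bit products."""
    if n == 4:
        return approx_mul4(a, b)
    half = n // 2
    mask = (1 << half) - 1
    a_h, a_l = a >> half, a & mask
    b_h, b_l = b >> half, b & mask
    hh = exact_mul(a_h, b_h)                        # A_H * B_H: exact
    hl = approx_mul(a_h, b_l, half, approx_mul4)    # A_H * B_L: approximate
    lh = approx_mul(a_l, b_h, half, approx_mul4)    # A_L * B_H: approximate
    ll = approx_mul(a_l, b_l, half, approx_mul4)    # A_L * B_L: approximate
    # Recombine: A*B = HH * 2^N + (HL + LH) * 2^(N/2) + LL
    return (hh << n) + ((hl + lh) << half) + ll
```

Passing `exact_mul` as `approx_mul4` gives a quick sanity check of the recombination, since the result must then equal `a * b`. Because A_H·B_H carries the largest weight, keeping it exact limits the error that any approximate base multiplier can introduce into the upper bits.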

The embodiment provides a method for constructing a high-bit-width approximate multiplier, which can adjust a circuit structure according to the bit width on the basis of ensuring the accuracy of a final product result, so that the designed approximate multiplier can be flexibly suitable for the requirements of different calculation scenes.

It should be understood that, although the steps in the flowchart of FIG. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, an approximate multiplier is provided, comprising:

an approximate 4-bit multiplier, whose circuit is obtained by replacing a half adder used for partial product accumulation in an exact 4-bit Dadda tree multiplier with a full adder and replacing the lower 4-bit calculation circuit of the exact 4-bit Dadda tree multiplier with an approximate addition circuit. The approximate addition circuit includes a logical OR circuit for obtaining the value of the 2nd bit from the LSB of the final product of the approximate 4-bit multiplier.

and an error detection and correction circuit, whose input is the values of the 2nd to 4th columns from the LSB of the partial product accumulation result in the approximate 4-bit multiplier and whose output is a correction value. The correction value is used to eliminate the error in the final product output by the approximate 4-bit multiplier.

For the specific limitations of the approximate multiplier, refer to the limitations of the approximate multiplier design method above, which are not repeated here.

In one embodiment, an image sharpening circuit is provided, comprising the approximate multiplier according to any one of the preceding embodiments.

Specifically, the image sharpening circuit executes the image sharpening algorithm of equation (1), in which I and S are the original image and the processed image, respectively.

The algorithm operates on block images, that is, the image is divided into 5 x 5 image blocks. Two typical images, lenna.jpg and ela.jpg, were selected for processing.

The image sharpening circuit is shown in fig. 8, where the DSP system is written in Verilog, the input and output signals are connected through registers as loads, and the multiplier employs the approximate multiplier provided in any of the embodiments described above. The detailed functions of the components are described as follows:

Acc_Reg: a register for storing the pixel values of the block image being processed;

Counter: indicates whether processing of the image block is complete; if so, it generates the flag signal div_in;

Accum (accumulator): handles iterative addition and division; if the flag signal div_in is disabled, the accumulator acts as a register that stores the accumulated value, which is then used as an operand of the adder '+'; if div_in is enabled, the division is performed.

Equation (1) requires a division by 273. A divider is typically used for this operation, but it increases the complexity of the circuit implementation. In this embodiment, the division is therefore realized by shifting and subtraction according to equation (2).

Equation (2) indicates that the division can be approximated by shifting the operand by 8 bits and by 12 bits and then subtracting the corresponding terms. Note that the subtraction is performed as a 2's complement addition; the conversion to 2's complement is also included in Accum.

'*', '+', '-': the arithmetic operations multiplication, addition and subtraction. The multiplication is implemented with the approximate multiplier of the above embodiments; '+' is a 16-bit unsigned adder; for the subtraction, a 10-bit 2's complement adder is employed in Accum.

REG: registers that store intermediate signals; the global clock synchronizes the sequential circuit components such as REG, Acc_Reg, Counter and Accum.
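The shift-and-subtract division performed in Accum can be sketched in Python. Equation (2) is not reproduced in this text, so the sketch assumes the identity 1/273 ≈ 2^-8 - 2^-12 (that is, 15/4096), which is consistent with the shift amounts 8 and 12; the shifts are written as right shifts because they implement a division. The function names are hypothetical, and the 10-bit width follows the adder width mentioned above.

```python
def sub_twos_complement(a, b, width=10):
    """Compute a - b with a width-bit 2's complement adder."""
    mask = (1 << width) - 1
    neg_b = (~b + 1) & mask        # 2's complement of b
    return (a + neg_b) & mask      # the carry out of the adder is discarded


def div273_approx(x):
    """Approximate x / 273 as (x >> 8) - (x >> 12), with one subtraction."""
    return sub_twos_complement(x >> 8, x >> 12)


print(div273_approx(27300))  # 100, and 27300 / 273 is exactly 100
```

For 16-bit inputs the truncation of the two shifts keeps the result within about one unit of the true quotient, which is typically acceptable for 8-bit pixel values.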

By comparing the application effects of different types of multipliers in the image sharpening circuit provided in this embodiment, it can be known that the image sharpening circuit implemented based on the approximate multiplier in the above embodiments can output a high-quality image under the condition of low resource consumption. Table 2 shows the accuracy and circuit test results for image sharpening circuits using different types of multipliers. It can be seen that, after the approximate multiplier provided in the embodiment of the present application is adopted, the area and power consumption parameters of the image sharpening circuit are reduced, and the image processing performance of the image sharpening circuit is better than that of the image sharpening circuit adopting other approximate multipliers.

TABLE 2 precision and circuit test results for image sharpening circuits for different types of multipliers

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present application, and although their description is specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of the present patent application shall be subject to the appended claims.

Claims

1. A method of approximate multiplier design, the method comprising:

replacing a half adder for partial product accumulation in an exact 4-bit Dadda tree multiplier with a full adder, and replacing a lower 4-bit calculation circuit in the exact 4-bit Dadda tree multiplier with an approximate addition circuit to obtain an approximate 4-bit multiplier; the approximate addition circuit includes a logical OR circuit for obtaining a value of the 2nd bit from the least significant bit (LSB) of a final product of the approximate 4-bit multiplier; the exact 4-bit Dadda tree multiplier includes: a half adder, a 4:2 compressor, a full adder and a ripple carry adder; the partial product accumulation means that the half adder, the 4:2 compressor and the full adder perform compression accumulation of partial products;

and designing an error detection and correction circuit for eliminating errors of the final product output by the approximate 4-bit multiplier according to the values of the 2nd to 4th bits from the LSB of the partial product accumulation result in the approximate 4-bit multiplier and the output value of the approximate addition circuit.

2. The approximate multiplier design method of claim 1 further comprising:

designing a power gating circuit according to the approximate 4-bit multiplier and the error detection and correction circuit; the power gating circuit is used for switching on or switching off the error detection and correction circuit according to a preset calculation precision threshold value.

3. The approximate multiplier design method of claim 1 wherein:

and performing final accumulation calculation by using the ripple carry adder to obtain the value of the upper 4 bits of the final product of the approximate 4-bit multiplier.

4. The approximate multiplier design method according to any of claims 1-3, further comprising:

forming, using 1 exact N/2-bit Dadda tree multiplier, a calculation circuit for the exact final product of multiplying the upper N/2 bits of the N-bit multiplier by the upper N/2 bits of the N-bit multiplicand;

forming, using 3 approximate N/2-bit multipliers, a calculation circuit for the approximate final products of multiplying the other bits of the multiplier and the multiplicand; wherein N = 2^k, k = 3, 4, 5, ...

5. An approximate multiplier, comprising:

an approximate 4-bit multiplier, the circuit of the approximate 4-bit multiplier being obtained by: replacing a half adder for partial product accumulation in an exact 4-bit Dadda tree multiplier with a full adder, and replacing a lower 4-bit calculation circuit in the exact 4-bit Dadda tree multiplier with an approximate addition circuit; the approximate addition circuit comprises a logical OR circuit for obtaining the value of the 2nd bit from the least significant bit (LSB) of the final product of the approximate 4-bit multiplier; the exact 4-bit Dadda tree multiplier includes: a half adder, a 4:2 compressor, a full adder and a ripple carry adder; the partial product accumulation means that the half adder, the 4:2 compressor and the full adder perform compression accumulation of partial products;

an error detection and correction circuit having as its input the values of the 2nd to 4th bits from the LSB of the partial product accumulation result in the approximate 4-bit multiplier and as its output a correction value; the correction value is used to eliminate errors in the final product output by the approximate 4-bit multiplier.

6. The approximate multiplier of claim 5, further comprising:

and the power gating circuit is used for opening or closing the error detection and correction circuit according to a preset calculation precision threshold value.

7. The approximate multiplier of claim 5, further comprising:

a ripple carry adder for obtaining the upper 4-bit value of the final product of the approximate 4-bit multiplier.

8. The approximate multiplier of any of claims 5-7, further comprising:

1 exact N/2-bit Dadda tree multiplier for calculating an exact final product of multiplying the upper N/2 bits of the N-bit multiplier by the upper N/2 bits of the N-bit multiplicand; wherein N = 2^k, k = 3, 4, 5, ...;

3 approximate N/2-bit multipliers for calculating approximate final products of multiplying the other bits of the multiplier and the multiplicand.

9. An image sharpening circuit comprising the approximate multiplier of any one of claims 5 to 8.