CN114610268A

CN114610268A - High-precision logarithmic multiplier

Info

Publication number: CN114610268A
Application number: CN202210231433.2A
Authority: CN
Inventors: 孙大鹰; 秦力成; 王冲; 周义其; 顾文华
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2022-03-09
Filing date: 2022-03-09
Publication date: 2022-06-10

Abstract

The invention discloses a high-precision logarithmic multiplier. First, operands A and B are input and preprocessed by a preprocessing unit. The XOR results of operands A and B obtained from preprocessing are then input into an error part processing unit to obtain the final result of the Error Part (EP); meanwhile, the most significant bit positions k₁ and k₂ of operands A and B obtained from preprocessing are input into an approximate part calculation unit to obtain the final result of the Approximate Part (AP). Finally, the EP obtained by the error part processing unit and the AP obtained by the approximate part calculation unit are combined by an adder to obtain the final result P_approx. The approximate multiplier provided by the invention achieves the best balance between precision and hardware consumption and produces a double-sided error distribution, which helps reduce error accumulation in applications dominated by multiply-accumulate operations.

Description

High-precision logarithmic multiplier

Technical Field

The invention belongs to the technical field of approximate calculation, and particularly relates to a high-precision logarithmic multiplier.

Background

As the performance of integrated circuits has increased in recent years, power consumption and hardware complexity have become the bottleneck for further improvement, and approximate computing has emerged as a way to address this bottleneck. Many applications, such as machine learning, digital signal processing, and computer vision, tolerate a certain loss of accuracy. Circuit complexity can therefore be reduced by sacrificing some precision, yielding efficient digital systems with high performance and low power consumption. The multiplier is the most basic and essential arithmetic unit in a digital system, and approximate multipliers have therefore been widely studied.

Approximate multipliers can be mainly classified into three categories, namely approximate multipliers based on traditional multipliers, approximate multipliers based on logarithmic approximation, and approximate multipliers based on Cartesian Genetic Programming (CGP).

A conventional multiplier mainly consists of three stages: partial product generation, partial product accumulation, and final carry-propagate addition. Approximate multipliers based on conventional multipliers have been widely studied. Kulkarni proposed the Underdesigned Multiplier (UDM), which uses an approximate 2 × 2 multiplier as the basic block for building approximate multipliers of larger bit width. When both inputs are "11" the exact product is "1001", whereas the approximate 2 × 2 multiplier produces "111", so the 2 × 2 block has an error rate of 1/16. Kidambi first proposed the truncated multiplier, which splits the partial product accumulation into a Most Significant Part (MSP) and a Least Significant Part (LSP) and reduces circuit area by truncating the LSP, but this approach introduces large errors. Hashemi proposed the Dynamic Range Unbiased Multiplier (DRUM), in which k bits of each operand are multiplied with an exact multiplier; the starting position of the k bits depends on the position of the leading one of the operand, and the last of the k bits is set to one as compensation. If the leading one lies below the highest of the k bit positions, the low k bits of the operand are used directly. The resource cost of this approach comes mainly from the extra circuitry needed to select the operand bits dynamically. For tree-structured multipliers there are many studies on approximate compressors. Momeni simplified the logic of the exact 4:2 compressor and proposed two different approximate 4:2 compressors, one favoring low approximation error and the other favoring low resource usage.
DRUM is an approximate multiplier that dynamically selects operand bits to obtain different precisions. Similarly, the high-order compressor multiplier (HOCM) proposed by D. Esposito is an approximate multiplier that selects its compressors dynamically by an algorithm. Accuracy varies widely between different kinds of approximate compressors. To improve accuracy, HOCM divides the partial product accumulation into an MSP and an LSP and also into several stages, the number of which is configurable. Because the LSP has less influence on the final result, the LSP of every stage can use approximate compressors throughout to reach the number of partial products expected by the next stage. The MSP has a large influence on the final result, so at each stage the algorithm proposed with HOCM decides, for each MSP column, whether exact compressors are used and how many, as well as the kind and number of approximate compressors. In addition, H. Jiang proposed a novel approximate adder which, unlike conventional approximate adders, outputs a sum signal and an error signal from two adjacent input operands and is used for partial product accumulation. The error is compensated by adding the result computed from the generated error signals back to the original partial sum. A two-stage error compensation strategy is proposed for computing the error term: the first stage uses only OR gates, and the second stage uses OR gates together with approximate adders.

Approximate multipliers based on Cartesian Genetic Programming were mainly proposed by Mrazek. Starting from the combinational logic of an exact multiplier, the method randomly removes internal connections and searches the resulting space for the most efficient approximate multiplier, adding different constraints for different applications to reduce the search space. Although the method can find efficient approximate multipliers, it is very time-consuming.

Approximate multipliers based on logarithmic approximation are mainly implemented with the Mitchell algorithm, which approximates the logarithm of a binary number by the number itself. The integer part of the logarithm is determined by the position of the leading one, and the remaining bits form the fractional part. The multiplication is converted into an addition by transforming the operands into the logarithmic domain, and the antilogarithm of the sum gives the final approximate product. The specific algorithm flow is as follows. First, any binary operand can be represented by equation (1):

N＝2^k·(1+Σ_{i=j}^{k-1} 2^(i-k)·Z_i)＝2^k·(1+x) (1)

where k denotes the position of the leading one in the binary number, Z_i represents the value of the i-th bit, j depends on the precision of the represented number (j is 0 for an integer), and x represents the mantissa. According to this equation, the multiplicand and the multiplier can be expressed by equation (2):

A＝2^k₁·(1+x₁), B＝2^k₂·(1+x₂) (2)

Thus, the product of A and B can be expressed as:

A×B＝2^(k₁+k₂)·(1+x₁)·(1+x₂) (3)

the logarithm is taken on both sides of the equation, and the product of two numbers can be expressed as the sum of the two input operand logarithms, i.e. equation (4):

log₂(A×B)＝k₁+k₂+log₂(1+x₁)+log₂(1+x₂) (4)

where the term log₂(1+x) can be approximated by x:

log₂(A×B)≈k₁+k₂+x₁+x₂ (5)

The final approximate product of the Logarithmic Multiplier (LM) depends on x₁+x₂: the antilogarithm is taken according to whether x₁+x₂ generates a carry, which gives the approximate product of equation (6):

P_approx＝2^(k₁+k₂)·(1+x₁+x₂), if x₁+x₂＜1; P_approx＝2^(k₁+k₂+1)·(x₁+x₂), if x₁+x₂≥1 (6)
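The Mitchell flow of equations (1) to (6) can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's circuit: the function name `mitchell_multiply` and the special handling of zero operands (which have no leading one) are assumptions. The mantissas are kept as integers scaled by 2^(k₁+k₂) so that no floating point is needed.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b with Mitchell's logarithmic method (integers only)."""
    if a == 0 or b == 0:
        return 0  # assumption: zero has no leading one, so treat it separately
    k1 = a.bit_length() - 1          # position of the leading one of a
    k2 = b.bit_length() - 1          # position of the leading one of b
    # Mantissas scaled by 2^(k1+k2): x1 = (a - 2^k1) / 2^k1, likewise x2.
    m1 = (a - (1 << k1)) << k2       # x1 * 2^(k1+k2)
    m2 = (b - (1 << k2)) << k1       # x2 * 2^(k1+k2)
    one = 1 << (k1 + k2)             # the value 1.0 in the same scale
    s = m1 + m2                      # (x1 + x2) * 2^(k1+k2)
    if s < one:                      # no carry out of the mantissa sum
        return one + s               # 2^(k1+k2) * (1 + x1 + x2)
    return 2 * s                     # 2^(k1+k2+1) * (x1 + x2)
```

For example, 5 × 6 gives 28 instead of 30. The approximation never exceeds the exact product, which is why the plain LM has a one-sided error distribution.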

The Mitchell algorithm effectively reduces circuit complexity but introduces large approximation errors, which are unacceptable for many applications. Many methods have therefore been proposed to improve its accuracy. Mahalingam proposed an operand decomposition method in which the two original operands are decomposed into four operands, reducing the number of "1" bits in each operand. This lowers the probability of a carry and so improves the precision of a Mitchell-based logarithmic multiplier, but operand decomposition also requires additional hardware for operand preprocessing. Nandan therefore proposed an improved operand decomposition that removes unnecessary arithmetic logic from the original decomposition process. Piecewise linear approximation is another commonly used method for improving precision. Because the error of a logarithmic multiplier arises mainly from the fact that the logarithm and antilogarithm cannot be realized exactly in hardware, the input range is divided into intervals, the error in each interval is computed, and a compensation constant is chosen for each interval, taking hardware cost into account, so that the error is compensated separately in each range. The iterative technique is another approach to obtaining a high-precision approximate multiplier, first proposed by Z.

This method is also based on the Mitchell algorithm, but it rewrites the binary expression of the operands, ignores the carry of the mantissa, and divides the product of the operands into an approximate part and an error part. The approximate part can be realized with addition and shift operations, whereas the error part requires a multiplication; the error part is therefore itself divided into a new approximate part and a new error part, and the process iterates. The algorithm terminates when the error part is zero, at which point the exact product is obtained. The specific algorithm flow is as follows:

In an Iterative Multiplier (IM), equation (1) can be rewritten as follows:

x×2^k＝N-2^k (7)

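Substituting equation (7) into equation (3) gives equation (8), where k₁ and k₂ represent the positions of the most significant bits of A and B, respectively:

A×B＝2^(k₁+k₂)+(A-2^k₁)·2^k₂+(B-2^k₂)·2^k₁+(A-2^k₁)·(B-2^k₂) (8)

Equation (8) can be divided into two parts, AP and EP:

AP＝2^(k₁+k₂)+(A-2^k₁)·2^k₂+(B-2^k₂)·2^k₁

EP＝(A-2^k₁)·(B-2^k₂)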

The AP can be derived from a shift operation and an add operation, while the EP needs to be computed by a multiplier. Therefore, equation (8) is applied to iteratively calculate EP. The exact product cannot be obtained until EP is zero, as shown in detail below.

E⁽⁰⁾＝C⁽¹⁾+E⁽¹⁾ (13)

E⁽ⁱ⁾＝0 (17)

Compared with the conventional LM, iteration significantly improves accuracy. However, iterative computation necessarily consumes more hardware resources.
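The iterative scheme can be illustrated with a short software sketch, assuming the decomposition that follows from substituting equation (7) into equation (3): the approximate part is 2^(k₁+k₂)+(A-2^k₁)·2^k₂+(B-2^k₂)·2^k₁ and the error part is (A-2^k₁)·(B-2^k₂). The names `split` and `iterative_multiply` and the `max_corrections` parameter are illustrative, not taken from the source.

```python
def split(n: int):
    """Return (k, residue) such that n = 2^k + residue, for n > 0."""
    k = n.bit_length() - 1
    return k, n - (1 << k)


def iterative_multiply(a: int, b: int, max_corrections: int) -> int:
    """Sum the approximate parts of a * b, refining the error part repeatedly."""
    product = 0
    for _ in range(max_corrections + 1):
        if a == 0 or b == 0:
            break                    # error part is zero: product is exact
        k1, ra = split(a)
        k2, rb = split(b)
        # Approximate part: only shifts and additions are needed.
        product += (1 << (k1 + k2)) + (ra << k2) + (rb << k1)
        # The error part ra * rb becomes the next product to approximate.
        a, b = ra, rb
    return product
```

With `max_corrections=0` only the first approximate part is returned (7 × 7 gives 40, leaving an error part of 3 × 3 = 9). Each pass clears one set bit from both operands, so an 8-bit multiplication becomes exact after at most eight passes, which is the hardware cost the text refers to.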

In view of the above mentioned approximation error, a compensation algorithm needs to be proposed to improve the accuracy of LM, so as to achieve the best balance between accuracy and hardware consumption.

Disclosure of Invention

The invention aims to provide a high-precision logarithmic multiplier, which improves the precision of LM and converts multiplication operation in EP into left shift operation.

In order to achieve the object of the present invention, the present invention provides a high-precision logarithmic multiplier, which comprises a preprocessing unit, an error portion calculating unit (EP calculating unit), and an approximate portion calculating unit (AP calculating unit), and the working flow thereof is specifically as follows:

First, operands A and B are input into the preprocessing unit and preprocessed by the leading-one detector (LOD) modules and the priority encoders;

then, the XOR results of operands A and B obtained from preprocessing are input into the data comparison module of the error part processing unit, and the final result EP of the error part is obtained through the nearest-one detector (NOD) module, a priority encoder and a barrel shifter;

meanwhile, the most significant bit positions k₁ and k₂ of operands A and B obtained from preprocessing are input into the approximate part calculation unit, and the final result AP of the approximate part is obtained through an adder, a decoder and barrel shifters;

finally, the EP obtained by the error part processing unit and the AP obtained by the approximate part calculation unit are combined by an adder to obtain the final result P_approx.

Further, in the preprocessing unit, the two input operands A and B enter leading-one detector (LOD) module 1 and LOD module 2, respectively, which extract the most significant bits of A and B in the form of powers of two, 2^k₁ and 2^k₂. Then 2^k₁ and 2^k₂ pass through priority encoder 1 and priority encoder 2, which output k₁ and k₂ respectively. Input operand A is then XORed with 2^k₁ to generate A⊕2^k₁, and input operand B is XORed with 2^k₂ to generate B⊕2^k₂; each XOR clears the leading one, so the results equal A-2^k₁ and B-2^k₂.

Further, in the error part calculation unit, the XOR results are input into the data comparison module, which assigns the larger of A⊕2^k₁ and B⊕2^k₂ to the intermediate variable Q₁ and the smaller to the intermediate variable Q₂.

Q₁ then passes through the nearest-one detector (NOD) module of the error part calculation unit, which decides whether Q₁ is approximated as 2^k or 2^(k+1), where k is the position of the leading one of Q₁. If bit k-1 of Q₁ is 1, Q₁ is approximated as 2^(k+1); otherwise bit k-1 is 0 and Q₁ is approximated as 2^k. That is, round(Q₁) is 2^k or 2^(k+1). Priority encoder 3 in the error part calculation unit converts round(Q₁) into its exponent, k or k+1, and barrel shifter 3 then shifts Q₂ left by that number of bits, which realizes round(Q₁)·Q₂.

Further, in the approximate part calculation unit, the most significant bit positions k₁ and k₂ of operands A and B are summed by adder 1 to give k₁+k₂, from which the decoder generates 2^(k₁+k₂). Barrel shifter 1 shifts A⊕2^k₁ left by k₂ bits to obtain (A⊕2^k₁)·2^k₂, and barrel shifter 2 shifts B⊕2^k₂ left by k₁ bits to obtain (B⊕2^k₂)·2^k₁. Adder 2 adds the two to obtain (A⊕2^k₁)·2^k₂+(B⊕2^k₂)·2^k₁.

Further, round(Q₁)·Q₂ obtained by the error part processing unit is ORed with 2^(k₁+k₂) obtained by the approximate part calculation unit, and adder 3 adds the OR result to (A⊕2^k₁)·2^k₂+(B⊕2^k₂)·2^k₁ to obtain the final result P_approx.

Compared with the prior art, the invention has the following notable improvements: 1) The Normalized Mean Error Distance (NMED), the Mean Relative Error Distance (MRED), and the Worst Case Error (WCE) are lower than those of other approximate multipliers. NMED and MRED can be calculated by the following formulas:

ED＝|P_approx-P_exact| (27)

where N represents the total number of input operand combinations, M represents the maximum output of the exact multiplier, and P(ED) and P(RED) represent the probabilities with which the respective errors occur. The 8-bit multiplier designed in the invention is compared with other 8-bit multipliers. The input operands range from 0 to 255, and all possible input combinations were simulated to evaluate the multiplier; the results are shown in Table I. Compared with the most accurate existing design, Improved Logarithmic Multiplier A (ILM-A), NMED is reduced by 34% and MRED by 48%, which shows that the compensation algorithm effectively improves precision. In addition, because the larger operand in the EP is either overestimated or underestimated, a double-sided error distribution is obtained, as shown in FIG. 3. In applications whose main operation is multiply-accumulate, errors of opposite sign can cancel each other, so excessive error accumulation is avoided.

2) An OR gate is used instead of an adder in the calculation of P_approx. Table II compares the OR gate with a conventional adder; both designs were written in Verilog and synthesized with ISE 14.7 WebPACK for the Xilinx xc6slx16-2csg324 device. The adder is a carry-lookahead adder (CLA). Compared with the CLA, the OR gate uses 39% fewer LUTs, has 83% less delay, and consumes 74% less power. Using an OR gate therefore consumes fewer resources.

3) The multiplication in the EP calculation unit is replaced by a left shift. Following the strategy of minimizing WCE, the invention uses the NOD module to approximate the larger operand in the EP as 2^k or 2^(k+1), and then shifts the smaller operand left by k or k+1 bits to obtain the EP. A shift consumes fewer hardware resources than a multiplication.

4) Compared with other multipliers, the multiplier designed in the invention achieves the best balance between precision and hardware consumption. The approximate multiplier is implemented in the Verilog hardware description language; all designs are purely combinational, without pipelining, and are synthesized with ISE 14.7 WebPACK. The designs are then implemented on the Xilinx xc6slx16-2csg324 device with all I/O assigned to pins, and power is estimated at a clock frequency of 50 MHz. The comparison results are shown in Table III. In Table III, Improved Logarithmic Multiplier A (ILM-A) refers to the logarithmic multiplier using the original NOD block, and Improved Logarithmic Multiplier B (ILM-B) refers to the logarithmic multiplier using the simplified NOD block. Compared with ILM-A, the present design has better hardware performance, while its NMED, MRED and WCE are 34%, 48% and 25% lower, respectively; the compensation algorithm thus significantly improves precision without sacrificing hardware resources. Although power consumption is 22% higher than that of ILM-B, the present design does not restrict the range of input operands and is more accurate than ILM-B, so it is suitable for more applications. Compared with the Logarithmic Multiplier (LM), the NMED, MRED and WCE of the invention are reduced by 53%, 61% and 25%, respectively. In addition, the products of the power-delay product (PDP) with NMED and with MRED are calculated to show that the design is efficient. Table III shows that PDP·NMED and PDP·MRED of the present design are the smallest among the existing logarithmic multipliers. Furthermore, the relationship between MRED and PDP for the logarithmic multipliers considered is shown in FIG. 4, where different dashed lines represent different values of PDP·MRED. The smallest PDP·MRED means that the multiplier achieves the best balance between hardware consumption and accuracy.

To more clearly illustrate the functional characteristics and structural parameters of the present invention, the following description is given with reference to the accompanying drawings and the detailed description.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a schematic diagram of the overall structure of the present invention;

FIG. 2 is a schematic diagram of an error portion processing unit according to the present invention;

FIG. 3 is a schematic diagram of a two-sided error distribution generated by the present invention;

FIG. 4 is a graph of MRED versus PDP for various types of logarithmic multipliers;

FIG. 5 is a gate-level circuit diagram of the leading-one detector (LOD) module;

FIG. 6 is a gate-level circuit diagram of the nearest-one detector (NOD) module.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments; all other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention relates to a high-precision logarithmic multiplier comprising a preprocessing unit, an error part calculation unit (EP calculation unit) and an approximate part calculation unit (AP calculation unit). The overall flow, Algorithm 1, is as follows:

(1) Input: A, B, n-bit input operands; output: P_approx, the approximate product of A × B

(2) Leading-one detector (LOD): 2^k₁＝LOD(A)

Priority encoder (PE): k₁＝PE(2^k₁)

(3) Leading-one detector (LOD): 2^k₂＝LOD(B)

Priority encoder (PE): k₂＝PE(2^k₂)

(4).

(5).

(6).round(Q₁)·Q₂calculated by Algorithm 2

(7).

(8).

(9) Decoder: 2^(k₁+k₂)

(10).

The working process is as follows:

First, operands A and B are input and preprocessed by the leading-one detector (LOD) modules and the priority encoders.

Then, the XOR results of operands A and B obtained from preprocessing are input into the error part processing unit, where the data comparison module, the nearest-one detector (NOD) module, the priority encoder and the barrel shifter produce the final result of the Error Part (EP), round(Q₁)·Q₂.

Meanwhile, the most significant bit positions k₁ and k₂ of operands A and B obtained from preprocessing are input into the approximate part calculation unit, where an adder, a decoder and barrel shifters produce the final result of the Approximate Part (AP).

Finally, round(Q₁)·Q₂ obtained by the error part processing unit is ORed with 2^(k₁+k₂) obtained by the approximate part calculation unit, and an adder combines the result with the remaining terms of the AP to obtain the final result P_approx.
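The working process can be modelled in software as a behavioural sketch. It mirrors the units named in the text (LOD, priority encoder, data comparison, NOD, barrel shifter, decoder, OR gate, adders) but is not the gate-level design; the function names and the choice to return 0 for a zero operand are assumptions made here.

```python
def rounded_exponent(q1: int) -> int:
    """Exponent of round(Q1): k + 1 if bit k-1 of Q1 is set, otherwise k."""
    k = q1.bit_length() - 1                    # leading-one position of Q1
    if k >= 1 and (q1 >> (k - 1)) & 1:
        return k + 1
    return k


def approx_multiply(a: int, b: int) -> int:
    """Behavioural model of the described logarithmic multiplier."""
    if a == 0 or b == 0:
        return 0  # assumption: operands without a leading one give zero
    # Preprocessing: LOD + priority encoder, then XOR clears the leading one.
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    ra, rb = a ^ (1 << k1), b ^ (1 << k2)
    # Error part: compare, round the larger residue, shift the smaller one.
    q1, q2 = max(ra, rb), min(ra, rb)
    ep = (q2 << rounded_exponent(q1)) if q1 else 0
    # Approximate part: decoder output plus two shifted residues.
    top = 1 << (k1 + k2)
    shifted_sum = (ra << k2) + (rb << k1)      # adder 2
    return (top | ep) + shifted_sum            # OR gate, then adder 3
```

Running all 8-bit operand pairs through this model gives a worst case error of 3072 (for example at 224 × 224) and errors of both signs, matching the double-sided distribution the text describes.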

Example 1

As shown in FIGS. 1 and 2, in the data preprocessing unit the two input operands A and B enter leading-one detector (LOD) module 1 and LOD module 2, respectively, which extract the most significant bits of A and B in the form of powers of two, 2^k₁ and 2^k₂. Then 2^k₁ and 2^k₂ pass through the priority encoders, which output k₁ and k₂. Input operand A is XORed with 2^k₁ to generate A⊕2^k₁, and input operand B is XORed with 2^k₂ to generate B⊕2^k₂.

In the processing stage of the error part calculation unit, the XOR results are input into the data comparison module, which assigns the larger of A⊕2^k₁ and B⊕2^k₂ to the intermediate variable Q₁ and the smaller to the intermediate variable Q₂.

Q₁ then passes through the nearest-one detector (NOD) module of the error part calculation unit, which decides whether Q₁ is approximated as 2^k or 2^(k+1), where k is the position of the leading one of Q₁. If bit k-1 of Q₁ is 1, Q₁ is approximated as 2^(k+1); otherwise bit k-1 is 0 and Q₁ is approximated as 2^k. That is, round(Q₁) is 2^k or 2^(k+1). The priority encoder in the error part calculation unit converts round(Q₁) into its exponent, k or k+1, and the barrel shifter then shifts Q₂ left by that number of bits, which realizes round(Q₁)·Q₂.

In the processing stage of the approximate part calculation unit, the most significant bit positions k₁ and k₂ of operands A and B are summed by adder 1 to give k₁+k₂, from which the decoder generates 2^(k₁+k₂). Barrel shifter 1 shifts A⊕2^k₁ left by k₂ bits to obtain (A⊕2^k₁)·2^k₂, and barrel shifter 2 shifts B⊕2^k₂ left by k₁ bits to obtain (B⊕2^k₂)·2^k₁. Adder 2 adds the two to obtain (A⊕2^k₁)·2^k₂+(B⊕2^k₂)·2^k₁.

round(Q₁)·Q₂ obtained by the error part processing unit is ORed with 2^(k₁+k₂) obtained by the approximate part calculation unit, and adder 3 adds the OR result to (A⊕2^k₁)·2^k₂+(B⊕2^k₂)·2^k₁ to obtain the final result P_approx.

Specifically, in this embodiment, the data comparison module compares the two operands; the larger operand is Q₁ and the smaller is Q₂. The nearest-one detector (NOD) module examines Q₁: if bit k-1 is "1", the larger operand Q₁ is overestimated as 2^(k+1); otherwise it is underestimated as 2^k. The priority encoder converts round(Q₁) into its exponent, and the barrel shifter then shifts Q₂ left by k+1 or k bits to obtain round(Q₁)·Q₂. Because round(Q₁)·Q₂ is smaller than 2^(k₁+k₂), adding the two generates no carry, so an OR gate is used in place of an adder: 2^(k₁+k₂) and round(Q₁)·Q₂ are ORed to obtain their sum. Using an OR gate both compensates the error and consumes fewer resources than an adder.

Specifically, in this embodiment, the approximate product P_approx is calculated as follows: the result round(Q₁)·Q₂ produced by the EP calculation unit is ORed with 2^(k₁+k₂), and the OR result is then added to (A⊕2^k₁)·2^k₂+(B⊕2^k₂)·2^k₁ by adder 3 to obtain the approximate product P_approx.

Because round(Q₁)·Q₂ is always smaller than 2^(k₁+k₂), no carry is generated, so an OR gate has the same effect as an adder while using fewer hardware resources.

Specifically, in this embodiment, the purpose of the leading-one detector module (LOD module) is to extract the most significant bit k of input operands A and B in the form 2^k. Higher-order LOD modules are built by cascading 4-bit LOD modules. The gate-level circuit of the 4-bit LOD is shown in FIG. 5; it contains 3 three-input selectors (multiplexers) and 3 AND gates, with the output of each selector connected to the input of the corresponding AND gate. Assume that the input operand is d and the output is Z. The third bit (most significant bit) of Z equals the third bit of d. The second bit of Z is the AND of the second bit of d and the output of the first selector, which outputs 1 when the third bit of d is 0 and 0 when the third bit of d is 1. Similarly, the first bit of Z is the AND of the first bit of d and the output of the second selector, which outputs the output of the first selector when the second bit of d is 0 and 0 when the second bit of d is 1. The lowest bit of Z is the AND of the lowest bit of d and the output of the third selector, which outputs the output of the second selector when the first bit of d is 0 and 0 when the first bit of d is 1.
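The 4-bit LOD logic described above can be checked with a small bit-level model. This is a sketch of the described behaviour, not a transcription of FIG. 5; the function name `lod4` and the selector names `m1`, `m2`, `m3` are illustrative.

```python
def lod4(d: int) -> int:
    """4-bit leading-one detector: keep only the most significant set bit of d."""
    d3, d2, d1, d0 = (d >> 3) & 1, (d >> 2) & 1, (d >> 1) & 1, d & 1
    z3 = d3                      # top output bit copies the top input bit
    m1 = 0 if d3 else 1          # selector 1: 1 only if no higher bit is set
    z2 = d2 & m1
    m2 = 0 if d2 else m1         # selector 2: pass m1 on while d2 is 0
    z1 = d1 & m2
    m3 = 0 if d1 else m2         # selector 3: pass m2 on while d1 is 0
    z0 = d0 & m3
    return (z3 << 3) | (z2 << 2) | (z1 << 1) | z0
```

For instance, `lod4(0b0101)` returns `0b0100`: the chain of selectors blocks every bit below the first one found.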

Specifically, in this embodiment, the purpose of the priority encoder module is to extract the exponent k from 2^k. Assume that the input operand is D and the output is Z; a multiplexer is used. If D equals 2⁰, Z outputs 0; if D equals 2¹, Z outputs 1; and so on until D equals 2⁷, for which Z outputs 7.

Specifically, in this embodiment, the purpose of the barrel shifter module is to perform the shift operations. (A⊕2^k₁)·2^k₂ is obtained by shifting A⊕2^k₁ left by k₂ bits; similarly, (B⊕2^k₂)·2^k₁ is obtained by shifting B⊕2^k₂ left by k₁ bits.

Specifically, in this embodiment, the purpose of the decoder module is to convert k₁+k₂ into 2^(k₁+k₂). Assume that the input operand is D and the output is Z; a multiplexer is used. If D equals 0, Z outputs 2⁰; if D equals 1, Z outputs 2¹; and so on until D equals 7, for which Z outputs 2⁷.
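The priority encoder and the decoder described in this embodiment are inverse look-up operations, which a table-based sketch makes explicit. The names `priority_encode` and `decode` and the `width` parameter are assumptions for illustration; the text enumerates the cases 0 through 7.

```python
def priority_encode(one_hot: int, width: int = 8) -> int:
    """Map a one-hot value 2^k (0 <= k < width) to its exponent k."""
    table = {1 << k: k for k in range(width)}   # one multiplexer case per k
    return table[one_hot]


def decode(k: int, width: int = 8) -> int:
    """Map an exponent k (0 <= k < width) to the one-hot value 2^k."""
    table = {i: 1 << i for i in range(width)}
    return table[k]
```

In the full multiplier the decoder input is k₁+k₂, so a wider table than the eight cases listed in the text would be needed there (for example `width=15` for 8-bit operands).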

Specifically, in this embodiment, the adders are carry-lookahead adders (CLA). An ordinary ripple-carry adder must wait for the carry passed up from the lower-order full adder before each stage can compute, so the combinational delay becomes too long when there are many stages. The carry-lookahead adder instead computes the carry of every stage directly from the input operands, without waiting for the preceding stage to compute and pass on its carry. Assume that the input operands are A and B, the input carry is CIN, the sum output is S, and the output carry is CO. The principle of the carry-lookahead adder is as follows:

g＝A&B (29)

p＝A|B (30)

c[0]＝g[0]|(p[0]&CIN) (31)

c[i]＝g[i]|(p[i]&c[i-1]),i＝1,2,3,...,k (32)

S＝A^B^{c[k-1:0],CIN} (33)

CO＝c[k] (34)

The variable c holds the carry out of each bit. A CLA of large bit width can be built by cascading CLAs of small bit width.
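Equations (29) to (34) can be exercised with a functional model. The sketch below evaluates the carry recurrence in a loop, whereas hardware flattens the same recurrence into parallel logic; the function name `cla_add` and the `width` parameter are assumptions, with `width` corresponding to k+1 bits in the notation above.

```python
def cla_add(a: int, b: int, cin: int = 0, width: int = 8):
    """Add two width-bit integers from generate/propagate signals; return (S, CO)."""
    mask = (1 << width) - 1
    g = a & b                          # generate, equation (29)
    p = a | b                          # propagate, equation (30)
    carries = 0                        # bit i holds c[i], the carry out of bit i
    c = cin
    for i in range(width):
        c = ((g >> i) & 1) | (((p >> i) & 1) & c)   # equations (31) and (32)
        carries |= c << i
    carry_in = ((carries << 1) | cin) & mask        # carry into each bit position
    s = (a ^ b ^ carry_in) & mask                   # equation (33)
    co = (carries >> (width - 1)) & 1               # equation (34)
    return s, co
```

As a usage example, `cla_add(200, 100)` returns `(44, 1)`, that is 300 = 256 + 44.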

Specifically, in this embodiment, the purpose of the data comparison module is to compare the magnitudes of its two inputs and output the larger as Q₁ and the smaller as Q₂. As shown in FIG. 2, x and y are the input operands, and Q₁ and Q₂ are obtained from the following equations:

Specifically, in this embodiment, the purpose of the nearest-one detector module (NOD module) is to approximate its input operand as 2^k or 2^(k+1). As shown in FIG. 2, for an operand whose leading one is at position k, the operand is approximated as 2^(k+1) if the NOD finds that bit k-1 is 1, and as 2^k if bit k-1 is 0. FIG. 6 shows the gate-level circuit diagram of the 16-bit NOD module, which contains 61 AND gates, 14 OR gates and 44 NOT gates in total. Assume that the input operand is I and the output operand is O. Each bit of the output O is calculated as shown in the following formulas:

o₁₆＝I₁₅&I₁₄ (40)
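A behavioural model of the NOD rounding rule makes it easy to confirm that it agrees with equation (40). The sketch below is not the gate network of FIG. 6; the function name `nod` and the handling of a zero input are assumptions.

```python
def nod(i: int) -> int:
    """Round i to 2^k or 2^(k+1), where k is the leading-one position of i."""
    if i == 0:
        return 0                       # assumption: no leading one, output zero
    k = i.bit_length() - 1
    if k >= 1 and (i >> (k - 1)) & 1:  # bit k-1 set: round up
        return 1 << (k + 1)
    return 1 << k                      # bit k-1 clear (or k == 0): round down
```

For a 16-bit input, output bit 16 is set exactly when input bits 15 and 14 are both 1, which is what equation (40) states.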

Specifically, in this embodiment, an OR gate is used in place of an addition to add the final EP result to the AP.

Example 2

As can be seen from equation (9), in the prior art the error of the logarithmic multiplier is caused by the EP, so a WCE-based compensation algorithm is proposed to estimate the EP. The larger operand in the EP is approximated as a power of 2 while the other operand is left unchanged, so that the multiplication in the EP can be replaced by a shift operation. A shift uses fewer hardware resources than a multiplication. The details of Algorithm 2 are as follows:

if x ≥ y
    Q₁＝x, Q₂＝y
else
    Q₁＝y, Q₂＝x
endif
calculate k: the position of the leading one of Q₁
if k＝0
    EP＝Q₂
else if Q₁[k-1]＝1
    EP＝Q₂ << (k+1)
else
    EP＝Q₂ << k
endif

First, the two operands are compared; the larger operand is Q₁ and the smaller is Q₂. Q₁ passes through the NOD module: if bit k-1 of Q₁ is "1", Q₁ is overestimated as 2^(k+1); otherwise Q₁ is underestimated as 2^k. The EP is then obtained by shifting the other operand Q₂ left by k+1 or k bits, respectively.
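Algorithm 2 translates almost line for line into Python. In this sketch the function name `estimate_ep` and the early return for Q₁ = 0 (where no leading one exists) are assumptions; `x` and `y` stand for the two residues produced by preprocessing.

```python
def estimate_ep(x: int, y: int) -> int:
    """Estimate x * y by rounding the larger operand to a power of two."""
    q1, q2 = (x, y) if x >= y else (y, x)   # data comparison module
    if q1 == 0:
        return 0                            # assumption: both residues are zero
    k = q1.bit_length() - 1                 # leading-one position of Q1
    if k == 0:
        return q2                           # Q1 is 1, so EP is Q2 itself
    if (q1 >> (k - 1)) & 1:
        return q2 << (k + 1)                # overestimate Q1 as 2^(k+1)
    return q2 << k                          # underestimate Q1 as 2^k
```

For example, `estimate_ep(96, 96)` returns 12288 (an overestimate of 9216), while `estimate_ep(95, 10)` returns 640 (an underestimate of 950).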

Specifically, in the present embodiment, the selection strategy of the operand is based on the WCE minimization principle, and the error of the algorithm is caused by the approximation of the operand. For an n-bit multiplier, the Rounding Error (RE) at different values of k can be expressed as:

where k represents the position of the most significant bit of the operand.

The present invention analyzes WCE in both cases. The first case is to select smaller operands for approximation, i.e.

According to equation (18), the maximum rounding error is 2^(n-3). Suppose Q₁ is less than Q₂; then Q₂ lies in the range 0 to 2^(n-1)-1. Therefore, the WCE of the approximate multiplier can be calculated by the following equation:

WCE＝2^(n-3)·(2^(n-1)-1)

Thus, for an 8-bit multiplier, the WCE of the multiplier in the first case is 4064.

The second case is to select larger operands for approximation, i.e.

Here Q₁ is greater than or equal to Q₂. The maximum rounding error occurs when Q₁ equals 2^k+2^(k-1), so Q₂ lies in the range 0 to 2^k+2^(k-1). Therefore, the WCE of the approximate multiplier can be calculated by the following equation:

WCE＝2^(n-3)·(2^(n-2)+2^(n-3))

Thus, for an 8-bit multiplier, the WCE of the proposed multiplier in the second case is 3072.

The WCE is smaller in the second case than in the first case. Therefore, in the proposed compensation algorithm, the larger value of the operand is approximated in EP.
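The two worst case figures can be reproduced by brute force over all residue pairs. This is a sanity check under the assumption that, for n-bit operands, both residues range over 0 to 2^(n-1)-1; the names `round_pow2` and `worst_case_error` are illustrative.

```python
def round_pow2(q: int) -> int:
    """Round q to 2^k or 2^(k+1) depending on bit k-1 (0 stays 0)."""
    if q == 0:
        return 0
    k = q.bit_length() - 1
    return 1 << (k + 1) if k >= 1 and (q >> (k - 1)) & 1 else 1 << k


def worst_case_error(n_bits: int, approximate_larger: bool) -> int:
    """Largest |round(t) - t| * other over all residue pairs with small <= large."""
    limit = 1 << (n_bits - 1)          # residues lie in 0 .. 2^(n-1) - 1
    worst = 0
    for large in range(limit):
        for small in range(large + 1):
            target, other = (large, small) if approximate_larger else (small, large)
            worst = max(worst, abs(round_pow2(target) - target) * other)
    return worst
```

For n = 8 this gives 4064 when the smaller operand is rounded and 3072 when the larger operand is rounded, in agreement with the two cases above.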

Example 3

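As can be seen from equation (9), the AP portion of the multiplier is composed of three parts, namely 2^(k₁+k₂), (A ⊕ 2^k₁)·2^k₂ and (B ⊕ 2^k₂)·2^k₁.

The first part, 2^(k₁+k₂): operands A and B pass through the LOD module to generate 2^k₁ and 2^k₂, from which the priority encoders generate k₁ and k₂. Adder 1 adds k₁ and k₂ to generate k₁+k₂, and a decoder finally generates 2^(k₁+k₂).

The second part, (A ⊕ 2^k₁)·2^k₂: A is XORed with 2^k₁ to generate A ⊕ 2^k₁, which a barrel shifter shifts left by k₂ bits to obtain (A ⊕ 2^k₁)·2^k₂.

The third part, (B ⊕ 2^k₂)·2^k₁: B is XORed with 2^k₂ to generate B ⊕ 2^k₂, which a barrel shifter shifts left by k₁ bits to obtain (B ⊕ 2^k₂)·2^k₁.

Adder 2 adds the second and third parts to generate (A ⊕ 2^k₁)·2^k₂ + (B ⊕ 2^k₂)·2^k₁.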

Final approximate product

Calculation of the approximate product P_approx: the approximate product can be calculated by the following equation:

P_approx = 2^(k₁+k₂) + (A ⊕ 2^k₁)·2^k₂ + (B ⊕ 2^k₂)·2^k₁ + round(Q₁)·Q₂ (21)

Evidently, two adders are required to generate the AP and an additional adder is required to add the EP compensation. However, because adding the final EP result to the 2^(k₁+k₂) term of the AP does not generate a carry, an OR gate can be used in place of that adder. The specific steps are as follows:

Q₁ and Q₂ are generated through the LOD module, and the larger operand Q₁ is approximated as 2^k or 2^(k+1). Suppose that

Q₁ = A ⊕ 2^k₁, Q₂ = B ⊕ 2^k₂

The following inequalities can be derived:

round(Q₁) ≤ 2^k₁, Q₂ < 2^k₂

2^(k₁+k₂) and round(Q₁) are both in one-hot coded form, and round(Q₁)·Q₂ is less than 2^(k₁+k₂). Therefore, adding 2^(k₁+k₂) and round(Q₁)·Q₂ does not produce a carry, and their sum can be computed with an OR gate instead of an adder. Equation (21) can thus be written as:

P_approx = (2^(k₁+k₂) OR round(Q₁)·Q₂) + (A ⊕ 2^k₁)·2^k₂ + (B ⊕ 2^k₂)·2^k₁
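The datapath of this example can be modelled in software to check the no-carry property. The following is a sketch under assumptions, not the Verilog design: the function names `round_pow2` and `hplm`, the `use_or` switch, and the handling of zero operands (returned as 0) are all hypothetical choices made for illustration.

```python
def round_pow2(q: int) -> int:
    """Round q to 2^k or 2^(k+1) according to bit k-1 (k = leading-one position)."""
    if q < 2:
        return q                              # assumption: 0 and 1 are kept unchanged
    k = q.bit_length() - 1
    return 1 << (k + 1) if (q >> (k - 1)) & 1 else 1 << k


def hplm(a: int, b: int, use_or: bool = True) -> int:
    """Approximate a * b as AP + EP, optionally merging EP with an OR gate."""
    if a == 0 or b == 0:
        return 0                              # assumption: zero operands bypass the datapath
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1   # leading-one positions
    ra, rb = a ^ (1 << k1), b ^ (1 << k2)     # XOR clears the leading one
    q1, q2 = max(ra, rb), min(ra, rb)         # data comparison
    ep = round_pow2(q1) * q2                  # a left shift of q2 in hardware
    lead = 1 << (k1 + k2)                     # decoder output 2^(k1+k2)
    cross = (ra << k2) + (rb << k1)           # adder 2
    if use_or:
        return (lead | ep) + cross            # OR gate followed by adder 3
    return lead + ep + cross                  # reference version with an extra adder


# The OR-based and adder-based versions agree for every 8-bit operand pair.
assert all(hplm(a, b, True) == hplm(a, b, False)
           for a in range(256) for b in range(256))
print(hplm(224, 224), 224 * 224)
```

For A = B = 224 the model returns 53248 against an exact product of 50176, a difference of 3072.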

Example 4

The NMED and MRED of the invention can be calculated by the following formulas:

ED = |P_approx - P_exact| (27)

where N represents the total number of input operands, M represents the maximum output of the exact multiplier, and P(ED) and P(RED) represent the probabilities of the corresponding errors occurring. The 8-bit multiplier designed in the invention is compared with other 8-bit multipliers. The input operands range from 0 to 255, and all possible input operands were simulated to evaluate the performance of the multiplier designed in the invention; the results are shown in Table I.
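An exhaustive evaluation of this kind can be sketched as below. Several things here are assumptions rather than the patent's own definitions: NMED is taken as the mean error distance divided by the maximum exact product, MRED as the mean of ED / P_exact over pairs with a nonzero exact product, all input pairs are treated as equally likely, and the software model `hplm` (including returning 0 for a zero operand) is a hypothetical stand-in for the hardware. No values from Table I are reproduced.

```python
def hplm(a: int, b: int) -> int:
    """Software model of the approximate multiplier (zero handling is an assumption)."""
    if a == 0 or b == 0:
        return 0
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    ra, rb = a ^ (1 << k1), b ^ (1 << k2)          # operands without their leading ones
    q1, q2 = max(ra, rb), min(ra, rb)
    if q1 < 2:
        ep = q1 * q2                               # nothing to round
    else:
        k = q1.bit_length() - 1
        ep = q2 << (k + 1) if (q1 >> (k - 1)) & 1 else q2 << k
    return ((1 << (k1 + k2)) | ep) + (ra << k2) + (rb << k1)


def error_metrics(n_bits: int = 8):
    """Return (NMED, MRED) over all operand pairs, under the assumed definitions."""
    top = (1 << n_bits) - 1
    max_exact = top * top                          # M: maximum exact product
    total_ed, total_red, pairs, nonzero = 0, 0.0, 0, 0
    for a in range(top + 1):
        for b in range(top + 1):
            exact = a * b
            ed = abs(hplm(a, b) - exact)           # error distance
            total_ed += ed
            pairs += 1
            if exact:                              # relative error needs exact != 0
                total_red += ed / exact
                nonzero += 1
    return total_ed / pairs / max_exact, total_red / nonzero


nmed, mred = error_metrics()
print(f"NMED = {nmed:.6f}, MRED = {mred:.6f}")
```

Changing `n_bits` evaluates other operand widths, at a cost that grows with the square of the operand range.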

TABLE I

Error metric for logarithmic multiplier

Compared with the most accurate improved logarithmic multiplier A (ILM-A), the NMED is reduced by 34% and the MRED is reduced by 48%, which shows that the multiplier designed in the invention effectively improves precision through the compensation algorithm.

Fig. 3 is a schematic diagram of the double-sided error distribution generated by the present invention. The distribution is double-sided because the larger operand in the EP is either overestimated or underestimated. In applications whose main operation is multiply-accumulate, the errors are both positive and negative and may cancel each other out, so that excessive error accumulation can be avoided.
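The sign behaviour of the error can be illustrated with a short enumeration. This is a sketch under assumptions: it relies on the decomposition in which only the EP term is approximated, so that the signed error equals (round(Q₁) - Q₁)·Q₂, and the name `signed_ep_error` and the restriction to nonzero 8-bit operands are hypothetical choices.

```python
def signed_ep_error(a: int, b: int) -> int:
    """P_approx - P_exact for nonzero operands; only the EP term is approximated."""
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    ra, rb = a ^ (1 << k1), b ^ (1 << k2)      # operands without their leading ones
    q1, q2 = max(ra, rb), min(ra, rb)
    if q1 < 2:
        return 0                               # q1 is 0 or 1: no rounding error
    k = q1.bit_length() - 1
    rounded = 1 << (k + 1) if (q1 >> (k - 1)) & 1 else 1 << k
    return (rounded - q1) * q2                 # positive when q1 is overestimated


errors = [signed_ep_error(a, b) for a in range(1, 256) for b in range(1, 256)]
over = sum(e > 0 for e in errors)              # pairs whose product is overestimated
under = sum(e < 0 for e in errors)             # pairs whose product is underestimated
net = sum(errors)                              # signed errors partially cancel
total_abs = sum(abs(e) for e in errors)
print(over, under, net, total_abs)
```

Both counts are nonzero, and the magnitude of the accumulated signed error is smaller than the accumulated absolute error, which is the cancellation effect described above.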

In the P_approx calculation module, an addition operation is replaced by an OR gate. Table II compares the performance of the OR gate with that of a conventional adder; both designs are implemented in Verilog and synthesized with ISE 14.7-Webpack on a Xilinx xc6slx16-2csg324. The adder is a carry-lookahead (CLA) adder. Compared with the CLA, the OR gate uses 39% fewer LUTs, has 83% less delay, and consumes 74% less power. Using an OR gate therefore consumes fewer resources.

TABLE II

Hardware comparison of the CLA and the OR gate

The invention implements the approximate multiplier in the Verilog hardware description language. All designs are implemented purely in combinational logic, without pipelining, and are synthesized with ISE 14.7-Webpack. The designs are then implemented on a Xilinx xc6slx16-2csg324 with all I/O assigned to pins, and power is estimated at a clock frequency of 50 MHz. The comparison results are shown in Table III.

TABLE III

Hardware index of logarithmic multiplier

The improved logarithmic multiplier A (ILM-A) in Table III refers to a logarithmic multiplier using the original NOD block, and the improved logarithmic multiplier B (ILM-B) refers to a logarithmic multiplier using the simplified NOD block. Compared with ILM-A, the present design has better hardware performance, while its NMED, MRED and WCE are 34%, 48% and 25% lower, respectively; the compensation algorithm thus markedly improves the precision of the multiplier without sacrificing hardware resources. Although power consumption is 22% higher than that of ILM-B, the range of input operands is not limited in this design, and it is more accurate than ILM-B. The design is therefore suitable for more applications. Compared with the logarithmic multiplier (LM), the NMED, MRED and WCE of the design are reduced by 53%, 61% and 25%, respectively. In addition, the product of PDP and NMED and the product of PDP and MRED are calculated to show that the design is efficient. Table III shows that both products are the smallest for the present design among the existing logarithmic multipliers.

Furthermore, the relationship between MRED and PDP for the logarithmic multipliers considered is shown in FIG. 4, in which different values of PDP-MRED are represented by different dashed lines. The smaller the PDP, the larger the MRED of an approximate multiplier, so PDP-MRED serves as a measure of the cost-effectiveness of a multiplier: the smaller the PDP-MRED, the fewer hardware resources are consumed for the same precision. The HPLM designed in the invention lies on the lowest dashed line in the figure, i.e. it has the smallest PDP-MRED, which shows that the approximate multiplier designed in the invention achieves the best balance between hardware consumption and precision.

It is to be noted that, in the present invention, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A high-precision logarithmic multiplier is characterized by comprising a preprocessing unit, an error part calculating unit and an approximate part calculating unit, and the working flow of the high-precision logarithmic multiplier is as follows:

finally, the EP obtained by the error part processing unit and the AP obtained by the approximate part calculating unit are combined by an adder to obtain the final result P_approx.

2. A high-precision logarithmic multiplier as claimed in claim 1, wherein in the preprocessing unit the two input operands A and B enter LOD module 1 and LOD module 2, respectively, which extract the most significant bits of operands A and B as powers of two, 2^k₁ and 2^k₂. Input operand A is then XORed with 2^k₁ to generate A ⊕ 2^k₁, and input operand B is XORed with 2^k₂ to generate B ⊕ 2^k₂.

3. A high-precision logarithmic multiplier as claimed in claim 2, wherein in the error part calculating unit the XOR results are input to the data comparison module of the error part calculating unit. If A ⊕ 2^k₁ ≥ B ⊕ 2^k₂, the intermediate variables are Q₁ = A ⊕ 2^k₁ and Q₂ = B ⊕ 2^k₂; otherwise the intermediate variables are Q₁ = B ⊕ 2^k₂ and Q₂ = A ⊕ 2^k₁. Q₁ then passes through the NOD module of the error part calculating unit, which approximates Q₁ as 2^k or 2^(k+1): if bit k-1 of the binary representation of Q₁ is 1, Q₁ is approximated as 2^(k+1); otherwise bit k-1 is 0 and Q₁ is approximated as 2^k, i.e. round(Q₁) is 2^k or 2^(k+1). round(Q₁) passes through priority encoder 3 of the error part calculating unit, which outputs the most significant bit position k or k+1; Q₂ is then shifted left by that number of bits by barrel shifter 3 of the error part calculating unit to realize round(Q₁)·Q₂.

4. A high-precision logarithmic multiplier as claimed in claim 2, wherein in the approximate part calculating unit the most significant bit positions k₁ and k₂ of operands A and B pass through adder 1 to generate k₁+k₂, from which a decoder generates 2^(k₁+k₂); A ⊕ 2^k₁ is shifted left by k₂ bits by barrel shifter 1 to obtain (A ⊕ 2^k₁)·2^k₂; B ⊕ 2^k₂ is shifted left by k₁ bits by barrel shifter 2 to obtain (B ⊕ 2^k₂)·2^k₁; and adder 2 adds the two to obtain (A ⊕ 2^k₁)·2^k₂ + (B ⊕ 2^k₂)·2^k₁.

5. A high-precision logarithmic multiplier as claimed in claim 2, wherein round(Q₁)·Q₂ obtained by the error part processing unit is ORed with 2^(k₁+k₂) obtained by the approximate part processing unit, and the final result P_approx is obtained by adder 3.