CN114610268A - High-precision logarithmic multiplier - Google Patents

Info

Publication number
CN114610268A
Authority
CN
China
Prior art keywords
operand
error
multiplier
bit
approximate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210231433.2A
Other languages
Chinese (zh)
Inventor
孙大鹰
秦力成
王冲
周义其
顾文华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202210231433.2A priority Critical patent/CN114610268A/en
Publication of CN114610268A publication Critical patent/CN114610268A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a high-precision logarithmic multiplier. First, operands A and B are input and preprocessed by a preprocessing unit. Then, the XOR results obtained from preprocessing are input into an error part processing unit to obtain the final result of the Error Part (EP). Meanwhile, the most-significant-bit positions $k_1$ and $k_2$ of operands A and B obtained from preprocessing are input into an approximate part calculation unit to obtain the final result of the Approximate Part (AP). Finally, the EP obtained by the error part processing unit and the AP obtained by the approximate part calculation unit are combined by an adder to obtain the final result $P_{approx}$. The approximate multiplier provided by the invention achieves the best balance between precision and hardware consumption and produces a double-sided error distribution, which helps reduce error accumulation in applications dominated by multiply-accumulate operations.

Description

High-precision logarithmic multiplier
Technical Field
The invention belongs to the technical field of approximate calculation, and particularly relates to a high-precision logarithmic multiplier.
Background
As the performance of integrated circuits has increased in recent years, power consumption and hardware complexity have become the bottleneck for further improvement, and approximate computing has emerged as a new way to address this bottleneck. Many applications, such as machine learning, digital signal processing, and computer vision, tolerate a certain loss of accuracy. Circuit complexity can therefore be reduced by sacrificing some precision, yielding efficient digital systems with high performance and low power consumption. The multiplier is the most basic and indispensable arithmetic unit in a digital system, so approximate multipliers have been widely studied.
Approximate multipliers can be mainly classified into three categories, namely approximate multipliers based on traditional multipliers, approximate multipliers based on logarithmic approximation, and approximate multipliers based on Cartesian Genetic Programming (CGP).
A conventional multiplier consists mainly of three stages: partial product generation, partial product accumulation, and final carry-propagate addition. Many approximate multipliers build on conventional multipliers. Kulkarni proposed the UDM (Underdesigned Multiplier), which uses an approximate 2 x 2 multiplier as the basic block for building approximate multipliers of larger bit width. When both inputs are "11", the exact product is "1001", whereas the approximate 2 x 2 multiplier produces "111"; since this is the only one of the 16 input combinations that is wrong, the 2 x 2 multiplier has an error rate of 1/16. Kidambi first proposed the truncated multiplier, which splits the partial product accumulation into a Most Significant Part (MSP) and a Least Significant Part (LSP) and reduces circuit area by truncating the LSP, at the cost of large errors. Hashemi proposed the Dynamic Range Unbiased Multiplier (DRUM), in which k bits of each operand are multiplied by an exact multiplier. The k-bit segment starts at the position of the leading one of the operand, and its lowest bit is set to one as compensation. If the leading one lies below the highest bit of the k-bit window, the low k bits of the operand are used directly. The resource consumption of this approach comes mainly from the extra circuitry needed to select the operand segments dynamically. For tree-structured multipliers, there are many studies on approximate compressors. Momeni simplified the logic of an exact 4:2 compressor and proposed two different approximate 4:2 compressors, one favoring low approximation error and the other favoring low resource usage.
DRUM is an approximate multiplier that dynamically selects the operand bit width to obtain different precisions. Similarly, the high-order compressor multiplier (HOCM) proposed by D. Esposito is an approximate multiplier whose compressors are selected dynamically by an algorithm. Accuracy varies widely with the kind of approximate compressor used. To improve accuracy, the HOCM divides the partial product accumulation into an MSP and an LSP and into several stages, whose number is configurable. Because the LSP has little influence on the final result, the LSP of every stage can use approximate compressors throughout to reach the number of partial products expected by the next stage. The MSP strongly influences the final result, so at each stage the HOCM algorithm decides, for each MSP column, whether to use exact compressors and how many, as well as the kind and number of approximate compressors. In addition, H. Jiang proposed a novel approximate adder for partial product accumulation which, unlike conventional approximate adders, outputs a sum signal and an error signal from two adjacent input operands. The error is compensated by adding the result computed from the generated error signals to the original partial sum. A two-stage error compensation strategy is proposed for calculating the error term: in the first stage the error term is computed entirely by OR gates, and in the second stage it is computed partly by OR gates and partly by approximate adders.
Approximate multipliers based on Cartesian genetic programming were proposed mainly by Mrazek. Starting from the combinational logic of an exact multiplier, the method randomly removes internal connections and searches the resulting space for the most efficient approximate multiplier; different constraints are added for different applications to reduce the search space. Although the method can produce efficient approximate multipliers, it is very time-consuming.
Approximate multipliers based on logarithmic approximation are mainly implemented with the Mitchell algorithm, which approximates the logarithm of a binary number from the number itself. The integer part of the logarithm is determined by the position of the leading one, and the remaining bits form the fractional part. Converting the operands to the logarithmic domain turns the multiplication into an addition, and taking the antilogarithm of the sum gives the final approximate product. The algorithm proceeds as follows. First, any binary number N can be represented by equation (1):
$$N = 2^{k}\left(1 + \sum_{i=j}^{k-1} 2^{i-k} Z_i\right) = 2^{k}(1 + x) \tag{1}$$
where k denotes the position of the leading one in the binary number, $Z_i$ is the value of the i-th bit, j depends on the precision of the represented number (j = 0 for an integer), and x is the mantissa. According to this equation, the multiplicand A and the multiplier B can be expressed by equation (2):
$$A = 2^{k_1}(1 + x_1), \qquad B = 2^{k_2}(1 + x_2) \tag{2}$$
Thus, the product of A and B can be expressed as:
$$A \times B = 2^{k_1 + k_2}(1 + x_1)(1 + x_2) \tag{3}$$
Taking the base-2 logarithm of both sides expresses the logarithm of the product as the sum of the logarithms of the two input operands, i.e. equation (4):
$$\log_2(A \times B) = k_1 + k_2 + \log_2(1 + x_1) + \log_2(1 + x_2) \tag{4}$$
Since $0 \le x < 1$, the term $\log_2(1 + x)$ can be approximated by x, which gives:
$$\log_2(A \times B) \approx k_1 + k_2 + x_1 + x_2 \tag{5}$$
The final approximate product of the Logarithmic Multiplier (LM) depends on $x_1 + x_2$: the antilogarithm is taken according to whether $x_1 + x_2$ generates a carry, which gives the approximate product in equation (6):
$$A \times B \approx \begin{cases} 2^{k_1 + k_2}(1 + x_1 + x_2), & x_1 + x_2 < 1 \\ 2^{k_1 + k_2 + 1}(x_1 + x_2), & x_1 + x_2 \ge 1 \end{cases} \tag{6}$$
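As an illustration, the Mitchell flow of equations (1) to (6) can be sketched in Python with integer arithmetic only. The function name `mitchell_multiply`, the treatment of a zero operand, and the scaling of both mantissas by $2^{k_1+k_2}$ (so that no fractions are needed) are assumptions made for this sketch and do not come from the original text.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers with Mitchell's method."""
    if a == 0 or b == 0:
        return 0  # assumption: a zero operand gives a zero product
    k1 = a.bit_length() - 1          # position of the leading one of a
    k2 = b.bit_length() - 1          # position of the leading one of b
    r1 = a - (1 << k1)               # x1 * 2**k1
    r2 = b - (1 << k2)               # x2 * 2**k2
    one = 1 << (k1 + k2)             # 2**(k1 + k2)
    # (x1 + x2) * 2**(k1 + k2), kept as an exact integer
    frac_sum = (r1 << k2) + (r2 << k1)
    if frac_sum < one:               # x1 + x2 < 1: no carry
        return one + frac_sum        # 2**(k1+k2) * (1 + x1 + x2)
    return 2 * frac_sum              # 2**(k1+k2+1) * (x1 + x2)


if __name__ == "__main__":
    for a, b in [(4, 8), (5, 6), (3, 3)]:
        print(a, b, mitchell_multiply(a, b), a * b)
```

In this sketch the result never exceeds the exact product, which matches the one-sided error that motivates the compensation schemes discussed in the rest of the background.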
The Mitchell algorithm effectively reduces circuit complexity, but it introduces large approximation errors that are unacceptable for many applications, so many methods have been proposed to improve its accuracy. Mahalingam proposed operand decomposition, in which the two original operands are decomposed into four operands, reducing the number of "1" bits in each operand. This lowers the probability of a carry and thereby improves the precision of a Mitchell-based logarithmic multiplier, but it also requires additional hardware for operand preprocessing. Nandan therefore proposed an improved operand decomposition that removes unnecessary logic from the original decomposition process. Piecewise linear approximation is another commonly used way to improve precision. Because the error of a logarithmic multiplier arises mainly from the logarithm and antilogarithm not being realized exactly in hardware, the input range is divided into intervals, the error within each interval is computed, and a compensation constant is chosen for each interval, taking the hardware implementation into account, so that each interval is compensated separately and the error is reduced.
Iteration is a further way to obtain a high-precision approximate multiplier and was first proposed by Z. Babić. The method is also based on the Mitchell algorithm, but it rewrites the binary expression of the operands, ignores the carry of the mantissa, and divides the product of the operands into an approximate part and an error part. The approximate part can be realized with addition and shift operations, whereas the error part requires a multiplication; the error part is therefore itself split into a new approximate part and a new error part, and the process iterates. The algorithm terminates when the error part is zero, at which point the exact product is obtained. The algorithm proceeds as follows:
In the Iterative Multiplier (IM), equation (1) can be rewritten as follows:
$$x \cdot 2^{k} = N - 2^{k} \tag{7}$$
Substituting equation (7) into equation (3) gives equation (8), where $k_1$ and $k_2$ denote the positions of the most significant bits of A and B, respectively:
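$$A \times B = 2^{k_1 + k_2} + (A - 2^{k_1})2^{k_2} + (B - 2^{k_2})2^{k_1} + (A - 2^{k_1})(B - 2^{k_2}) \tag{8}$$
Equation (8) can be divided into two parts, AP and EP:
$$AP = 2^{k_1 + k_2} + (A - 2^{k_1})2^{k_2} + (B - 2^{k_2})2^{k_1}, \qquad EP = (A - 2^{k_1})(B - 2^{k_2}) \tag{9}$$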
The AP can be obtained with shift and add operations, while the EP requires a multiplier. Equation (8) is therefore applied again to compute the EP iteratively; the exact product is obtained only once the EP is zero, as detailed below.
Figure BDA0003538507730000043
Figure BDA0003538507730000044
Figure BDA0003538507730000045
$$E^{(0)} = C^{(1)} + E^{(1)} \tag{13}$$
Figure BDA0003538507730000046
Figure BDA0003538507730000047
Figure BDA0003538507730000048
$$E^{(i)} = 0 \tag{17}$$
Compared with the conventional LM, iteration improves accuracy significantly. However, iterative computation also inevitably consumes more hardware resources.
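The iterative scheme can be sketched as follows: each pass adds the approximate part of the current operand pair and then treats the two residues as the next operand pair, so that the error part is refined step by step. The function name `iterative_log_multiply` and the parameter `corrections` (the number of correction terms added after the first approximation) are hypothetical names introduced for this sketch.

```python
def iterative_log_multiply(a: int, b: int, corrections: int) -> int:
    """Approximate a * b using the AP of equation (9) plus correction terms."""
    total = 0
    for _ in range(corrections + 1):
        if a == 0 or b == 0:
            break                    # error part is zero: total is exact
        k1 = a.bit_length() - 1
        k2 = b.bit_length() - 1
        r1 = a - (1 << k1)           # A - 2**k1
        r2 = b - (1 << k2)           # B - 2**k2
        # approximate part: only shifts and additions
        total += (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
        # the error part r1 * r2 becomes the next product to approximate
        a, b = r1, r2
    return total


if __name__ == "__main__":
    print(iterative_log_multiply(5, 6, 0))  # approximate part only
    print(iterative_log_multiply(5, 6, 1))  # one correction term
```

Each pass clears the leading one of both operands, so for 8-bit inputs the loop reaches a zero error part after at most eight passes; this is the source of the extra hardware cost noted above.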
In view of the approximation errors described above, a compensation algorithm is needed that improves the accuracy of the LM while achieving the best balance between accuracy and hardware consumption.
Disclosure of Invention
The invention aims to provide a high-precision logarithmic multiplier that improves the precision of the LM and converts the multiplication in the EP into a left-shift operation.
To this end, the invention provides a high-precision logarithmic multiplier comprising a preprocessing unit, an error part calculation unit (EP calculation unit), and an approximate part calculation unit (AP calculation unit). Its workflow is as follows:
first, operands A and B are input into the preprocessing unit and preprocessed by the leading-one detector (LOD) modules and the priority encoders;
then, the XOR results obtained from preprocessing are input into the data comparison module of the error part calculation unit, and the final result EP of the error part is obtained through the nearest-one detector (NOD) module, a priority encoder, and a barrel shifter;
meanwhile, the most-significant-bit positions $k_1$ and $k_2$ of operands A and B obtained from preprocessing are input into the approximate part calculation unit, and the final result AP of the approximate part is obtained through an adder, a decoder, and barrel shifters;
finally, the EP obtained by the error part calculation unit and the AP obtained by the approximate part calculation unit are combined by an adder to obtain the final result $P_{approx}$.
Further, in the preprocessing unit, the two input operands A and B enter leading-one detector (LOD) module 1 and LOD module 2, respectively, which extract the most significant bits of A and B in the form of the powers of two $2^{k_1}$ and $2^{k_2}$. Then $2^{k_1}$ and $2^{k_2}$ pass through priority encoder 1 and priority encoder 2, which output $k_1$ and $k_2$, respectively. Operand A is then XORed with $2^{k_1}$ to generate $A - 2^{k_1}$, and operand B is XORed with $2^{k_2}$ to generate $B - 2^{k_2}$.
Further, in the error part calculation unit, the XOR results are input into the data comparison module. If $A - 2^{k_1}$ is larger than $B - 2^{k_2}$, the intermediate variables are $Q_1 = A - 2^{k_1}$ and $Q_2 = B - 2^{k_2}$; otherwise they are $Q_1 = B - 2^{k_2}$ and $Q_2 = A - 2^{k_1}$. $Q_1$ then passes through the nearest-one detector (NOD) module of the error part calculation unit, which approximates $Q_1$ as $2^k$ or $2^{k+1}$, where k is the position of the leading one of $Q_1$: if bit k-1 of $Q_1$ is 1, $Q_1$ is approximated as $2^{k+1}$; otherwise, bit k-1 being 0, it is approximated as $2^k$. That is, $\mathrm{round}(Q_1)$ is $2^k$ or $2^{k+1}$. $\mathrm{round}(Q_1)$ passes through priority encoder 3 of the error part calculation unit, which outputs k or k+1; barrel shifter 3 then shifts $Q_2$ left by that number of bits, realizing $\mathrm{round}(Q_1) \cdot Q_2$.
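The error part calculation can be sketched as follows. The function names `nearest_power_exponent` and `error_part` are hypothetical, and returning 0 when the larger residue is zero is an assumption of this sketch, since that corner case is not spelled out in the text.

```python
def nearest_power_exponent(q: int) -> int:
    """Exponent of round(q) for q > 0: k + 1 if bit k-1 of q is 1, else k,
    where k is the position of the leading one of q."""
    k = q.bit_length() - 1
    if k >= 1 and (q >> (k - 1)) & 1:
        return k + 1                 # overestimate q as 2**(k+1)
    return k                         # underestimate q as 2**k


def error_part(res_a: int, res_b: int) -> int:
    """Approximate res_a * res_b (the residues A - 2**k1 and B - 2**k2)
    with one comparison and one left shift."""
    q1, q2 = max(res_a, res_b), min(res_a, res_b)   # data comparison module
    if q1 == 0:
        return 0                     # assumption: nothing to compensate
    return q2 << nearest_power_exponent(q1)         # round(Q1) * Q2


if __name__ == "__main__":
    print(error_part(6, 3), 6 * 3)   # Q1 = 6 is overestimated as 8
    print(error_part(5, 3), 5 * 3)   # Q1 = 5 is underestimated as 4
```

Because the larger residue is sometimes rounded up and sometimes rounded down, the shifted result can land on either side of the exact residue product.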
Further, in the approximate part calculation unit, the most-significant-bit positions $k_1$ and $k_2$ of operands A and B pass through adder 1 to generate $k_1 + k_2$, from which a decoder generates $2^{k_1 + k_2}$. Barrel shifter 1 shifts $A - 2^{k_1}$ left by $k_2$ bits to obtain $(A - 2^{k_1})2^{k_2}$, and barrel shifter 2 shifts $B - 2^{k_2}$ left by $k_1$ bits to obtain $(B - 2^{k_2})2^{k_1}$. Adder 2 adds the two to obtain $(A - 2^{k_1})2^{k_2} + (B - 2^{k_2})2^{k_1}$.
Further, $\mathrm{round}(Q_1) \cdot Q_2$ obtained by the error part calculation unit is ORed bitwise with $2^{k_1 + k_2}$ obtained by the approximate part calculation unit. The OR result and $(A - 2^{k_1})2^{k_2} + (B - 2^{k_2})2^{k_1}$ then pass through adder 3 to give the final result
$$P_{approx} = \left(2^{k_1 + k_2} \mid \mathrm{round}(Q_1) \cdot Q_2\right) + (A - 2^{k_1})2^{k_2} + (B - 2^{k_2})2^{k_1}$$
Compared with the prior art, the invention has the following significant improvements: 1) The Normalized Mean Error Distance (NMED), the Mean Relative Error Distance (MRED), and the Worst Case Error (WCE) are lower than those of other approximate multipliers. NMED and MRED are calculated by the following formulas:
Figure BDA00035385077300000612
$$ED = |P_{approx} - P_{exact}| \tag{27}$$
Figure BDA0003538507730000071
where N is the total number of input operand combinations, M is the maximum output of the exact multiplier, and P(ED) and P(RED) are the probabilities with which the corresponding errors occur. The evaluation considers the 8-bit multiplier designed in the invention and other 8-bit multipliers. The input operands range from 0 to 255, and all possible input combinations were simulated to evaluate the multiplier; the results are shown in Table I. Compared with the most accurate existing design, Improved Logarithmic Multiplier A (ILM-A), NMED is reduced by 34% and MRED by 48%, which shows that the compensation algorithm effectively improves precision. In addition, overestimating or underestimating the larger operand in the EP yields a double-sided error distribution, as shown in FIG. 3. In applications dominated by multiply-accumulate operations, errors of opposite sign can cancel each other, which avoids excessive error accumulation.
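An exhaustive evaluation of this kind can be sketched as below. The sketch uses the commonly used definitions of the metrics and rests on several assumptions that are not taken from the text: all input pairs are equally likely, pairs whose exact product is zero are skipped when averaging the relative error, and WCE is reported as the unnormalized maximum error distance. The function name `error_metrics` is hypothetical.

```python
from typing import Callable, Tuple


def error_metrics(approx_mul: Callable[[int, int], int],
                  bits: int = 8) -> Tuple[float, float, int]:
    """Return (NMED, MRED, WCE) over all pairs of unsigned `bits`-bit operands."""
    top = (1 << bits) - 1
    max_product = top * top          # M: largest exact product
    pairs = (top + 1) ** 2           # N: number of input combinations
    total_ed = 0
    total_red = 0.0
    red_count = 0
    worst = 0
    for a in range(top + 1):
        for b in range(top + 1):
            exact = a * b
            ed = abs(approx_mul(a, b) - exact)   # error distance, equation (27)
            total_ed += ed
            worst = max(worst, ed)
            if exact != 0:           # relative error is undefined for exact == 0
                total_red += ed / exact
                red_count += 1
    nmed = total_ed / pairs / max_product
    mred = total_red / red_count
    return nmed, mred, worst


if __name__ == "__main__":
    # Stand-in approximate multiplier for demonstration: drop the LSB of a.
    print(error_metrics(lambda a, b: (a & ~1) * b, bits=8))
```

Any of the multipliers discussed here can be passed in as `approx_mul`; the stand-in used in the demo line is only a placeholder and is unrelated to the designs compared in Table I.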
2) An OR gate is used instead of an adder in the calculation of $P_{approx}$. Table II compares the performance of the OR gate with that of a conventional adder; both designs are implemented in Verilog and synthesized with ISE 14.7 WebPACK for the Xilinx xc6slx16-2csg324. The adder is a carry-lookahead adder (CLA). Compared with the CLA, the OR gate uses 39% fewer LUTs, has 83% less delay, and consumes 74% less power. Using an OR gate therefore consumes fewer resources.
3) The multiplication in the EP calculation unit is replaced by a left-shift operation. Based on a strategy that minimizes the WCE, the invention uses the NOD module to approximate the larger operand in the EP as $2^k$ or $2^{k+1}$ and then shifts the smaller operand left by k or k+1 bits to obtain the EP. A shift operation consumes fewer hardware resources than a multiplication.
4) Compared with other multipliers, the multiplier designed in the invention achieves the best balance between precision and hardware consumption. The approximate multipliers are described in the Verilog hardware description language; all designs are purely combinational, without pipelining, and are synthesized with ISE 14.7 WebPACK. The designs are implemented on the Xilinx xc6slx16-2csg324 with all I/O assigned to pins, and power is estimated at a clock frequency of 50 MHz. The comparison results are shown in Table III. In Table III, Improved Logarithmic Multiplier A (ILM-A) is the logarithmic multiplier using the original NOD block, and Improved Logarithmic Multiplier B (ILM-B) is the logarithmic multiplier using the simplified NOD block. Compared with ILM-A, the present design has better hardware performance, and its NMED, MRED and WCE are 34%, 48% and 25% lower, respectively, which shows that the compensation algorithm significantly improves the precision of the multiplier without sacrificing hardware resources. Although power consumption is 22% higher than that of ILM-B, the present design does not restrict the range of the input operands and is more accurate than ILM-B, so it suits more applications. Compared with the Logarithmic Multiplier (LM), the NMED, MRED and WCE of the invention are reduced by 53%, 61% and 25%, respectively. In addition, the products of the power-delay product (PDP) with NMED and with MRED are calculated to show that the design is efficient. Table III shows that the PDP-NMED and PDP-MRED products of the present design are the smallest among the existing logarithmic multipliers. Furthermore, FIG. 4 plots MRED against PDP for the logarithmic multipliers considered, with different PDP-MRED values represented by different dashed lines. The minimum PDP-MRED value indicates the multiplier that achieves the best balance between hardware consumption and accuracy.
To more clearly illustrate the functional characteristics and structural parameters of the present invention, the following description is given with reference to the accompanying drawings and the detailed description.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of the overall structure of the present invention;
FIG. 2 is a schematic diagram of an error portion processing unit according to the present invention;
FIG. 3 is a schematic diagram of a two-sided error distribution generated by the present invention;
FIG. 4 is a graph of MRED versus PDP for various types of logarithmic multipliers;
FIG. 5 is a gate-level circuit diagram of the leading-one detector (LOD) module;
FIG. 6 is a gate-level circuit diagram of the nearest-one detector (NOD) module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments; all other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention relates to a high-precision logarithmic multiplier comprising a preprocessing unit, an error part calculation unit (EP calculation unit), and an approximate part calculation unit (AP calculation unit). The overall flow, Algorithm 1, is as follows:
(1) Input: A, B, n-bit operands; output: $P_{approx}$, the approximate product of A x B
(2) Leading-one detector (LOD): $2^{k_1}$ from A; priority encoder (PE): $k_1$ from $2^{k_1}$
(3) Leading-one detector (LOD): $2^{k_2}$ from B; priority encoder (PE): $k_2$ from $2^{k_2}$
(4) $A - 2^{k_1} = A \oplus 2^{k_1}$
(5) $B - 2^{k_2} = B \oplus 2^{k_2}$
(6) $\mathrm{round}(Q_1) \cdot Q_2$ calculated by Algorithm 2
(7) $(A - 2^{k_1})2^{k_2}$: shift $A - 2^{k_1}$ left by $k_2$ bits
(8) $(B - 2^{k_2})2^{k_1}$: shift $B - 2^{k_2}$ left by $k_1$ bits
(9) Decode $k_1 + k_2$ into $2^{k_1 + k_2}$
(10) $P_{approx} = \left(2^{k_1 + k_2} \mid \mathrm{round}(Q_1) \cdot Q_2\right) + (A - 2^{k_1})2^{k_2} + (B - 2^{k_2})2^{k_1}$
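Read end to end, the overall flow can be sketched as a single Python function. The name `compensated_log_multiply` is hypothetical, Python's `int.bit_length` stands in for the LOD and priority encoder hardware, and the handling of zero operands and of a zero residue is an assumption of this sketch.

```python
def compensated_log_multiply(a: int, b: int) -> int:
    """Approximate a * b with the AP plus a shift-based EP."""
    if a == 0 or b == 0:
        return 0                     # assumption: zero operand, zero product
    # Preprocessing: leading-one positions and residues
    k1 = a.bit_length() - 1          # LOD + priority encoder for A
    k2 = b.bit_length() - 1          # LOD + priority encoder for B
    res_a = a ^ (1 << k1)            # A - 2**k1 via XOR
    res_b = b ^ (1 << k2)            # B - 2**k2 via XOR
    # Error part: round the larger residue to a power of two, shift the smaller
    q1, q2 = max(res_a, res_b), min(res_a, res_b)
    ep = 0
    if q1:
        k = q1.bit_length() - 1
        shift = k + 1 if k >= 1 and (q1 >> (k - 1)) & 1 else k
        ep = q2 << shift             # round(Q1) * Q2
    # Approximate part
    lead = 1 << (k1 + k2)            # decoder output 2**(k1 + k2)
    cross = (res_a << k2) + (res_b << k1)   # barrel shifters + adder 2
    # OR replaces the first addition because ep < 2**(k1 + k2)
    return (lead | ep) + cross


if __name__ == "__main__":
    for a, b in [(7, 7), (13, 11), (5, 5)]:
        print(a, b, compensated_log_multiply(a, b), a * b)
```

The three demo pairs give one overestimate, one underestimate, and one exact result, which illustrates the double-sided error behaviour on a small scale.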
The working process is as follows:
first, operands A and B are input and preprocessed by the leading-one detector (LOD) modules and the priority encoders;
then, the XOR results obtained from preprocessing are input into the data comparison module, the nearest-one detector (NOD) module, the priority encoder, and the barrel shifter of the error part calculation unit to obtain the final result $\mathrm{round}(Q_1) \cdot Q_2$ of the Error Part (EP);
meanwhile, the most-significant-bit positions $k_1$ and $k_2$ of operands A and B obtained from preprocessing are input into the approximate part calculation unit, and the final result of the Approximate Part (AP), $2^{k_1 + k_2} + (A - 2^{k_1})2^{k_2} + (B - 2^{k_2})2^{k_1}$, is obtained through an adder, a decoder, and barrel shifters;
finally, $\mathrm{round}(Q_1) \cdot Q_2$ obtained by the error part calculation unit is ORed with $2^{k_1 + k_2}$ obtained by the approximate part calculation unit, and the result passes through an adder to obtain the final result $P_{approx}$.
Example 1
As shown in FIGS. 1 and 2, in the data preprocessing unit, the two input operands A and B enter leading-one detector (LOD) module 1 and LOD module 2, respectively, which extract the most significant bits of A and B in the form of the powers of two $2^{k_1}$ and $2^{k_2}$. Then $2^{k_1}$ and $2^{k_2}$ pass through priority encoder 1 and priority encoder 2, which output $k_1$ and $k_2$, respectively. Operand A is then XORed with $2^{k_1}$ to generate $A - 2^{k_1}$, and operand B is XORed with $2^{k_2}$ to generate $B - 2^{k_2}$.
In the processing stage of the error part calculation unit, the XOR results are input into the data comparison module. If $A - 2^{k_1}$ is larger than $B - 2^{k_2}$, the intermediate variables are $Q_1 = A - 2^{k_1}$ and $Q_2 = B - 2^{k_2}$; otherwise they are $Q_1 = B - 2^{k_2}$ and $Q_2 = A - 2^{k_1}$. $Q_1$ then passes through the nearest-one detector (NOD) module of the error part calculation unit, which approximates $Q_1$ as $2^k$ or $2^{k+1}$, where k is the position of the leading one of $Q_1$: if bit k-1 of $Q_1$ is 1, $Q_1$ is approximated as $2^{k+1}$; otherwise, bit k-1 being 0, it is approximated as $2^k$. That is, $\mathrm{round}(Q_1)$ is $2^k$ or $2^{k+1}$. $\mathrm{round}(Q_1)$ passes through the priority encoder of the error part calculation unit, which outputs k or k+1; the barrel shifter of the error part calculation unit then shifts $Q_2$ left by that number of bits, realizing $\mathrm{round}(Q_1) \cdot Q_2$.
In the processing stage of the approximate part calculation unit, the most-significant-bit positions $k_1$ and $k_2$ of operands A and B pass through adder 1 to generate $k_1 + k_2$, from which a decoder generates $2^{k_1 + k_2}$. Barrel shifter 1 shifts $A - 2^{k_1}$ left by $k_2$ bits to obtain $(A - 2^{k_1})2^{k_2}$, and barrel shifter 2 shifts $B - 2^{k_2}$ left by $k_1$ bits to obtain $(B - 2^{k_2})2^{k_1}$. Adder 2 adds the two to obtain $(A - 2^{k_1})2^{k_2} + (B - 2^{k_2})2^{k_1}$.
$\mathrm{round}(Q_1) \cdot Q_2$ obtained by the error part calculation unit is ORed bitwise with $2^{k_1 + k_2}$ obtained by the approximate part calculation unit. The OR result and $(A - 2^{k_1})2^{k_2} + (B - 2^{k_2})2^{k_1}$ then pass through adder 3 to give the final result
$$P_{approx} = \left(2^{k_1 + k_2} \mid \mathrm{round}(Q_1) \cdot Q_2\right) + (A - 2^{k_1})2^{k_2} + (B - 2^{k_2})2^{k_1}$$
Specifically, in this embodiment, the data comparison module compares the two operands; the larger is $Q_1$ and the smaller is $Q_2$. The NOD module examines $Q_1$: if bit k-1 is "1", the larger operand $Q_1$ is overestimated as $2^{k+1}$; otherwise it is underestimated as $2^k$. The priority encoder encodes $\mathrm{round}(Q_1)$ into the shift amount k+1 or k, and the barrel shifter shifts $Q_2$ left by k+1 or k bits to obtain $\mathrm{round}(Q_1) \cdot Q_2$. Because $\mathrm{round}(Q_1) \cdot Q_2$ is always smaller than $2^{k_1 + k_2}$, the sum $2^{k_1 + k_2} + \mathrm{round}(Q_1) \cdot Q_2$ generates no carry, so an OR gate is used instead of an adder to perform the addition: ORing $2^{k_1 + k_2}$ with $\mathrm{round}(Q_1) \cdot Q_2$ yields $2^{k_1 + k_2} + \mathrm{round}(Q_1) \cdot Q_2$. The OR gate thus compensates the error while consuming fewer resources than an adder.
Specifically, in this embodiment, the approximate product $P_{approx}$ is calculated as follows: the result $\mathrm{round}(Q_1) \cdot Q_2$ produced by the EP calculation unit is ORed with $2^{k_1 + k_2}$ to obtain $2^{k_1 + k_2} + \mathrm{round}(Q_1) \cdot Q_2$, which is then added to $(A - 2^{k_1})2^{k_2} + (B - 2^{k_2})2^{k_1}$ by adder 3 to obtain the approximate product
$$P_{approx} = 2^{k_1 + k_2} + \mathrm{round}(Q_1) \cdot Q_2 + (A - 2^{k_1})2^{k_2} + (B - 2^{k_2})2^{k_1}$$
The distinctive point is that $\mathrm{round}(Q_1) \cdot Q_2$ is always smaller than $2^{k_1 + k_2}$, so $2^{k_1 + k_2} + \mathrm{round}(Q_1) \cdot Q_2$ generates no carry; an OR gate therefore has the same effect as an adder while using fewer hardware resources.
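The no-carry argument can be checked exhaustively for small operand widths. In the sketch below the helper names `lead_and_error` and `or_matches_add` are hypothetical, and the rounding rule follows the NOD description above; it is a sanity check written for this text, not the patent's verification procedure.

```python
from typing import Tuple


def lead_and_error(a: int, b: int) -> Tuple[int, int]:
    """Return (2**(k1 + k2), round(Q1) * Q2) for positive integers a and b."""
    k1 = a.bit_length() - 1
    k2 = b.bit_length() - 1
    res_a = a ^ (1 << k1)
    res_b = b ^ (1 << k2)
    q1, q2 = max(res_a, res_b), min(res_a, res_b)
    ep = 0
    if q1:
        k = q1.bit_length() - 1
        shift = k + 1 if k >= 1 and (q1 >> (k - 1)) & 1 else k
        ep = q2 << shift
    return 1 << (k1 + k2), ep


def or_matches_add(bits: int = 8) -> bool:
    """True if OR and addition agree for every pair of `bits`-bit operands."""
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            lead, ep = lead_and_error(a, b)
            if (lead | ep) != lead + ep:
                return False
    return True


if __name__ == "__main__":
    print(or_matches_add(8))
```

The check succeeds because the larger residue rounds to at most the leading power of two of its operand while the smaller residue stays strictly below its own leading power, so the error part never reaches bit $k_1 + k_2$.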
Specifically, in this embodiment, the purpose of the leading-one detector module (LOD module) is to extract the most significant bit of each input operand A, B in the form $2^k$. 4-bit LOD modules are cascaded to form LOD modules of larger width. The gate-level circuit diagram of the 4-bit LOD is shown in FIG. 5; it contains 3 three-input multiplexers and 3 AND gates, with the output of each multiplexer connected to an input of the corresponding AND gate. Let the input operand be d and the output be Z. Bit 3 (the most significant bit) of Z equals bit 3 of d. Bit 2 of Z is the AND of bit 2 of d and the output of the first multiplexer, which outputs 1 when bit 3 of d is 0 and 0 when bit 3 of d is 1. Similarly, bit 1 of Z is the AND of bit 1 of d and the output of the second multiplexer, which outputs the output of the first multiplexer when bit 2 of d is 0 and 0 when bit 2 of d is 1. Bit 0 (the lowest bit) of Z is the AND of bit 0 of d and the output of the third multiplexer, which outputs the output of the second multiplexer when bit 1 of d is 0 and 0 when bit 1 of d is 1.
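The 4-bit LOD logic described above can be modelled bit by bit as follows. The function names `lod4` and `lod8` are hypothetical, and the way two 4-bit blocks are combined in `lod8` (the upper block takes priority whenever it finds a one) is an assumption, since the cascading scheme is not detailed in the text.

```python
def lod4(d: int) -> int:
    """4-bit leading-one detector: keep only the most significant 1 of d."""
    d3, d2, d1, d0 = (d >> 3) & 1, (d >> 2) & 1, (d >> 1) & 1, d & 1
    m1 = 0 if d3 else 1      # first multiplexer: 1 only if bit 3 is 0
    m2 = 0 if d2 else m1     # second multiplexer: passes m1 if bit 2 is 0
    m3 = 0 if d1 else m2     # third multiplexer: passes m2 if bit 1 is 0
    z3 = d3
    z2 = d2 & m1
    z1 = d1 & m2
    z0 = d0 & m3
    return (z3 << 3) | (z2 << 2) | (z1 << 1) | z0


def lod8(d: int) -> int:
    """8-bit LOD built from two 4-bit blocks (assumed cascading scheme)."""
    high = lod4((d >> 4) & 0xF)
    low = lod4(d & 0xF)
    return (high << 4) if high else low


if __name__ == "__main__":
    for value in (0b0110, 0b1011, 0b0001):
        print(format(value, "04b"), format(lod4(value), "04b"))
```

Each multiplexer output acts as an enable signal meaning "no higher bit was set", which is why exactly one output bit survives for any non-zero input.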
Specifically, in this embodiment, the purpose of the priority encoder module is to extract the exponent k from $2^k$. Let the input be D and the output be Z. Using multiplexers, if D equals $2^0$ the output Z is 0, if D equals $2^1$ Z is 1, and so on, until D equals $2^7$, for which Z is 7.
Specifically, in this embodiment, the purpose of the barrel shifter module is to perform shift operations: $(A - 2^{k_1})2^{k_2}$ is obtained by shifting $A - 2^{k_1}$ left by $k_2$ bits, and likewise $(B - 2^{k_2})2^{k_1}$ is obtained by shifting $B - 2^{k_2}$ left by $k_1$ bits.
In particular, in the present embodiment, the purpose of the decoder module is to convert $k_1 + k_2$ into $2^{k_1+k_2}$. Assuming that the input operand is D and the output operand is Z, a multiplexer is used: if D equals 0, Z outputs $2^{0}$; if D equals 1, Z outputs $2^{1}$; and so on, until D equals 7, for which Z outputs $2^{7}$.
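The priority encoder, barrel shifter and decoder described above can be sketched in a few lines of Python. This is only an illustration: the function names and the `width` parameters are assumptions, and the decoder width defaults to 16 here because $k_1 + k_2$ can reach 14 for 8-bit operands.

```python
def priority_encode(d: int, width: int = 8) -> int:
    """Return k for a one-hot input d == 2**k (multiplexer-style look-up)."""
    for k in range(width):
        if d == (1 << k):
            return k
    raise ValueError("input is not one-hot within the given width")


def decode(k: int, width: int = 16) -> int:
    """Return the one-hot value 2**k for an index k (inverse of the encoder)."""
    if not 0 <= k < width:
        raise ValueError("index out of range for the given width")
    return 1 << k


def barrel_shift_left(value: int, amount: int) -> int:
    """Shift value left by amount bits in a single step."""
    return value << amount


# Worked example with A = 182 and B = 45 (illustrative values).
a, b = 182, 45
k1 = priority_encode(128)                # leading one of A is 2**7
k2 = priority_encode(32)                 # leading one of B is 2**5
part1 = decode(k1 + k2)                  # 2**(k1 + k2)
part2 = barrel_shift_left(a - 128, k2)   # (A - 2**k1) * 2**k2
part3 = barrel_shift_left(b - 32, k1)    # (B - 2**k2) * 2**k1
print(k1, k2, part1, part2, part3)       # prints: 7 5 4096 1728 1664
```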
Specifically, in the present embodiment, the adder is a carry-lookahead adder (CLA). An ordinary (ripple-carry) adder must wait for the carry passed up from the lower-order full adder before each stage can compute its result, so the combinational delay becomes too long when the number of stages is large. The carry-lookahead adder differs in that the carry of every stage is computed directly from the input operands, without waiting for the previous stage to compute and pass on its carry. Assume that the input operands are A and B, the input carry is CIN, the sum output is S, and the output carry is CO. For a k-bit adder, the carry-lookahead principle is as follows:
g = A & B (29)
p = A | B (30)
c[0] = g[0] | (p[0] & CIN) (31)
c[i] = g[i] | (p[i] & c[i-1]), i = 1, 2, 3, ..., k-1 (32)
S = A ^ B ^ {c[k-2:0], CIN} (33)
CO = c[k-1] (34)
Here c[i] is the carry out of bit i. A CLA with a large bit width can be built by cascading CLAs with a small bit width.
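To make the lookahead idea concrete, the sketch below expands each carry so that it depends only on the generate and propagate signals and on CIN, rather than on a carry rippled from the previous stage. It is a behavioural Python model under assumed names (`cla_add`, `width`), not the Verilog used in the embodiment.

```python
def cla_add(a: int, b: int, cin: int = 0, width: int = 4):
    """Add two width-bit numbers; return (sum, carry_out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # generate
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]   # propagate
    carries = []
    for i in range(width):
        # Carry out of bit i, expanded as
        # g[i] | p[i]g[i-1] | ... | p[i]...p[0]cin.
        carry = 0
        chain = 1                      # running product p[i] & p[i-1] & ...
        for j in range(i, -1, -1):
            carry |= g[j] & chain
            chain &= p[j]
        carry |= cin & chain
        carries.append(carry)
    carry_in = [cin] + carries[:-1]    # carry into each bit position
    s = 0
    for i in range(width):
        s |= (((a >> i) ^ (b >> i) ^ carry_in[i]) & 1) << i
    return s, carries[-1]


print(cla_add(0b1011, 0b0110))         # 11 + 6 = 17 -> (1, 1)
```

In hardware each expanded carry is a two-level AND-OR network, which is what removes the stage-to-stage wait; the nested loop here only mirrors that expansion.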
Specifically, in the present embodiment, the purpose of the data comparison module is to compare the magnitudes of its two inputs and to determine $Q_1$ and $Q_2$. As shown in Fig. 2, x and y are the input operands; the larger of the two is assigned to $Q_1$ and the smaller to $Q_2$:
$$Q_1 = \max(x, y), \qquad Q_2 = \min(x, y)$$
Specifically, in the present embodiment, the purpose of the adjacent one detector module (NOD module) is to approximate the input operand to $2^{k}$ or $2^{k+1}$, where k is the position of its most significant bit. As shown in Fig. 2, if the NOD determines that bit k-1 of the operand is 1, the operand is approximated as $2^{k+1}$; if bit k-1 is 0, it is approximated as $2^{k}$. Fig. 6 shows the gate-level circuit diagram of the 16-bit NOD detection module, which contains 61 AND gates, 14 OR gates and 44 NOT gates in total. Assume that the input operand is I and the output operand is O. Each bit of the output O is calculated as shown in the following formulas:
$O_{16} = I_{15} \mathbin{\&} I_{14}$ (40)
[Equations (41) to (47), which give the remaining output bits, appear only as images in the original and are not reproduced here.]
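The rounding performed by the NOD can be sketched behaviourally as follows. This Python model is an assumption-laden illustration: it does not reproduce the gate-level equations of Fig. 6, the name `nod` is hypothetical, and mapping a zero input to zero is an assumption not stated in the text.

```python
def nod(i: int) -> int:
    """Round i to 2**k or 2**(k+1), where k is the position of its leading one."""
    if i == 0:
        return 0                       # assumption: zero input gives zero
    k = i.bit_length() - 1             # position of the most significant 1
    if k >= 1 and (i >> (k - 1)) & 1:
        return 1 << (k + 1)            # bit k-1 is 1: round up
    return 1 << k                      # bit k-1 is 0 (or k == 0): round down


for value in (1, 5, 6, 96, 0xC000):
    print(value, "->", nod(value))
```

For a 16-bit input, bit 16 of the result is set exactly when bits 15 and 14 of the input are both 1, which agrees with equation (40).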
Specifically, in this embodiment, an OR gate takes the place of the addition operation when the final EP result is added to the AP.
Example 2
As can be seen from equation (9), the error of a prior-art logarithmic multiplier is caused by the EP. A compensation algorithm based on the WCE is therefore proposed in the EP calculation unit to estimate the EP. The larger operand in the EP is approximated to a power of 2 while the other operand is left unchanged, so that the multiplication in the EP can be replaced by a shift operation, which uses fewer hardware resources than a multiplication. The details of Algorithm 2 are as follows:
if x >= y
    Q1 = x, Q2 = y
else
    Q1 = y, Q2 = x
endif
calculate k: the position of the leading one of Q1
if k = 0
    EP = Q2
else if Q1[k-1] = 1
    EP = Q2 << (k+1)
else
    EP = Q2 << k
endif
First, the magnitudes of the two operands are compared; the larger operand is $Q_1$ and the smaller operand is $Q_2$. $Q_1$ then passes through the NOD module: if bit k-1 of $Q_1$ is 1, $Q_1$ is overestimated as $2^{k+1}$; otherwise $Q_1$ is underestimated as $2^{k}$. The EP is then obtained by shifting the other operand $Q_2$ left by k+1 bits or k bits, respectively.
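The compensation steps above translate directly into a short routine. The following Python sketch follows the listing of Algorithm 2; the function name `estimate_ep` is hypothetical, and returning 0 when both inputs are 0 is an assumption, since the listing does not cover an operand without a leading one.

```python
def estimate_ep(x: int, y: int) -> int:
    """Estimate x * y by rounding the larger operand to a power of two."""
    q1, q2 = (x, y) if x >= y else (y, x)   # Q1 is the larger operand
    if q1 == 0:
        return 0                            # assumption: no leading one, EP is 0
    k = q1.bit_length() - 1                 # position of the leading one of Q1
    if k == 0:
        return q2                           # Q1 == 1, nothing to round
    if (q1 >> (k - 1)) & 1:
        return q2 << (k + 1)                # overestimate Q1 as 2**(k+1)
    return q2 << k                          # underestimate Q1 as 2**k


for x, y in ((54, 13), (5, 3), (96, 96)):
    print(x, y, "exact:", x * y, "estimate:", estimate_ep(x, y))
```

For (54, 13) the larger operand 54 is rounded up to 64, giving 832 against an exact value of 702; for (5, 3) the operand 5 is rounded down to 4, giving 12 against 15. The estimate therefore errs in both directions.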
Specifically, in the present embodiment, the operand selection strategy is based on the principle of minimizing the WCE, since the error of the algorithm is caused by the approximation of the operand. For an n-bit multiplier, the rounding error (RE) of an approximated operand Q for a given value of k can be expressed as:
$$RE = \begin{cases} Q - 2^{k}, & Q[k-1] = 0 \\ 2^{k+1} - Q, & Q[k-1] = 1 \end{cases} \;\le\; 2^{k-1}$$
where k represents the position of the most significant bit of the operand.
The present invention analyzes the WCE in two cases. The first case is to select the smaller operand for approximation, i.e. $Q_1 < Q_2$. According to equation (18), the maximum rounding error is $2^{n-3}$. Since $Q_1$ is less than $Q_2$, $Q_2$ ranges from 0 to $2^{n-1}-1$. Therefore, the WCE of the approximate multiplier can be calculated by the following equation:
$$WCE = 2^{n-3} \cdot (2^{n-1} - 1)$$
Thus, for an 8-bit multiplier (n = 8), the WCE of the multiplier in the first case is $2^{5} \cdot 127 = 4064$.
The second case is to select the larger operand for approximation, i.e. $Q_1 \ge Q_2$. The rounding error of $Q_1$ is largest at $Q_1 = 2^{k} + 2^{k-1}$, where it equals $2^{k-1}$; to satisfy $Q_1 \ge Q_2$, $Q_2$ then ranges from 0 to $2^{k} + 2^{k-1}$. Therefore, the WCE of the approximate multiplier can be calculated by the following equation, with k = n - 2:
$$WCE = 2^{k-1} \cdot (2^{k} + 2^{k-1}) = 3 \cdot 2^{2n-6}$$
Thus, for an 8-bit multiplier, the WCE of the proposed multiplier in the second case is $32 \cdot 96 = 3072$.
The WCE is smaller in the second case than in the first. Therefore, the proposed compensation algorithm approximates the larger operand in the EP.
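Both worst-case figures can be checked by brute force. The sketch below assumes that, for an n-bit multiplier, the two operands of the EP are residues below $2^{n-1}$ (the operands with their leading one removed); the helper names `round_pow2` and `worst_case_error` are illustrative.

```python
def round_pow2(q: int) -> int:
    """Round q to 2**k or 2**(k+1) depending on bit k-1 (0 stays 0)."""
    if q == 0:
        return 0
    k = q.bit_length() - 1
    if k >= 1 and (q >> (k - 1)) & 1:
        return 1 << (k + 1)
    return 1 << k


def worst_case_error(n: int, approximate_larger: bool) -> int:
    """Exhaustively search the largest |round(Qa) * Qb - Qa * Qb|."""
    limit = 1 << (n - 1)               # EP operands are below 2**(n-1)
    worst = 0
    for x in range(limit):
        for y in range(limit):
            big, small = max(x, y), min(x, y)
            rounded, kept = (big, small) if approximate_larger else (small, big)
            worst = max(worst, abs(round_pow2(rounded) * kept - x * y))
    return worst


print(worst_case_error(8, approximate_larger=False))   # 4064
print(worst_case_error(8, approximate_larger=True))    # 3072
```

The search reproduces 4064 when the smaller operand is rounded and 3072 when the larger operand is rounded, in agreement with the two cases analysed above.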
Example 3
As can be seen from equation (9), the AP of the multiplier is composed of three parts, namely
$$AP = 2^{k_1+k_2} + (A - 2^{k_1}) \cdot 2^{k_2} + (B - 2^{k_2}) \cdot 2^{k_1}$$
The first part is $2^{k_1+k_2}$. Operands A and B pass through the LOD modules to generate $2^{k_1}$ and $2^{k_2}$, which pass through the priority encoders to generate $k_1$ and $k_2$. Adder 1 adds $k_1$ and $k_2$ to generate $k_1 + k_2$, from which the decoder finally generates $2^{k_1+k_2}$.
The second part is $(A - 2^{k_1}) \cdot 2^{k_2}$. A is XORed with $2^{k_1}$ to generate $A - 2^{k_1}$, which the barrel shifter shifts left by $k_2$ bits to obtain $(A - 2^{k_1}) \cdot 2^{k_2}$.
The third part is $(B - 2^{k_2}) \cdot 2^{k_1}$. B is XORed with $2^{k_2}$ to generate $B - 2^{k_2}$, which the barrel shifter shifts left by $k_1$ bits to obtain $(B - 2^{k_2}) \cdot 2^{k_1}$.
Adder 2 adds the second and third parts to generate $(A - 2^{k_1}) \cdot 2^{k_2} + (B - 2^{k_2}) \cdot 2^{k_1}$, and the final approximate part AP is the sum of all three parts.
Calculation of the approximate product $P_{approx}$: the approximate product can be calculated by the following equation:
$$P_{approx} = 2^{k_1+k_2} + (A - 2^{k_1}) \cdot 2^{k_2} + (B - 2^{k_2}) \cdot 2^{k_1} + \mathrm{round}(Q_1) \cdot Q_2 \quad (21)$$
Obviously, two adders are required to generate the AP, and an additional adder would be required to add the EP compensation. However, because adding the final EP result to the term $2^{k_1+k_2}$ of the AP never generates a carry, the same effect can be achieved with an OR gate instead of an adder. The specific steps are as follows:
$Q_1$ and $Q_2$ are generated after the LOD modules, and the larger operand $Q_1$ is approximated as $2^{k}$ or $2^{k+1}$. Suppose that $Q_1 = A - 2^{k_1}$ and $Q_2 = B - 2^{k_2}$ (the opposite case is symmetric). The following inequalities can be derived:
$$Q_1 < 2^{k_1}$$
$$\mathrm{round}(Q_1) \le 2^{k_1}$$
$$Q_2 < 2^{k_2}$$
$$\mathrm{round}(Q_1) \cdot Q_2 < 2^{k_1+k_2}$$
$2^{k_1+k_2}$ and $\mathrm{round}(Q_1)$ are both in one-hot coded form, and because $\mathrm{round}(Q_1) \cdot Q_2$ is less than $2^{k_1+k_2}$, every set bit of $\mathrm{round}(Q_1) \cdot Q_2$ lies below bit $k_1 + k_2$. Therefore, adding $2^{k_1+k_2}$ and $\mathrm{round}(Q_1) \cdot Q_2$ does not produce a carry, and their sum can be computed with an OR gate instead of an adder. Equation (21) can thus be written as:
$$P_{approx} = \left(2^{k_1+k_2} \mathbin{|} \mathrm{round}(Q_1) \cdot Q_2\right) + (A - 2^{k_1}) \cdot 2^{k_2} + (B - 2^{k_2}) \cdot 2^{k_1}$$
where | denotes the bitwise OR.
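The complete data path, including the replacement of the last addition by an OR, can be sketched end to end. This Python model is a behavioural illustration only: the names `round_pow2` and `approx_multiply` are hypothetical, and returning 0 for a zero operand is an assumption, since the text does not describe that case.

```python
def round_pow2(q: int) -> int:
    """Round q to 2**k or 2**(k+1) depending on bit k-1 (0 stays 0)."""
    if q == 0:
        return 0
    k = q.bit_length() - 1
    if k >= 1 and (q >> (k - 1)) & 1:
        return 1 << (k + 1)
    return 1 << k


def approx_multiply(a: int, b: int) -> int:
    """Approximate a * b as AP plus the compensated EP."""
    if a == 0 or b == 0:
        return 0                            # assumption: zero operand gives 0
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    x, y = a ^ (1 << k1), b ^ (1 << k2)     # operands with the leading one cleared
    q1, q2 = max(x, y), min(x, y)
    ep = round_pow2(q1) * q2                # realised in hardware as a shift of Q2
    lead = 1 << (k1 + k2)
    if lead & ep:
        raise AssertionError("OR and ADD would differ")
    return (lead | ep) + (x << k2) + (y << k1)


print(approx_multiply(182, 45), 182 * 45)   # prints: 8320 8190
```

The guard before the return never triggers for 8-bit inputs, which is the no-carry property that allows the OR gate to stand in for an adder.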
Example 4
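The NMED and MRED of the invention can be calculated by the following formulas:
$$NMED = \frac{1}{M} \sum_{i=1}^{N} ED_i \cdot P(ED_i) \quad (26)$$
$$ED = |P_{approx} - P_{exact}| \quad (27)$$
$$MRED = \sum_{i=1}^{N} RED_i \cdot P(RED_i), \qquad RED_i = \frac{ED_i}{P_{exact,i}} \quad (28)$$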
where N represents the total number of input operand combinations, M represents the maximum output of the exact multiplier, and P(ED), P(RED) represent the probabilities with which the corresponding errors occur. The 8-bit multiplier designed in the invention is compared with other 8-bit multipliers. The input operands range from 0 to 255, and all possible input combinations are simulated to evaluate the performance of the designed multiplier; the results are shown in Table I.
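An exhaustive evaluation of this kind can be sketched as follows. The Python code assumes uniformly distributed inputs, so every error has probability 1/N, and it skips input pairs whose exact product is 0 when averaging the relative error; both choices are assumptions, as are the function names. The multiplier model is a compact behavioural stand-in, not the synthesized design, so its figures need not match Table I.

```python
def approx_multiply(a: int, b: int) -> int:
    """Compact behavioural stand-in for the approximate multiplier."""
    if a == 0 or b == 0:
        return 0                            # assumption: zero operand gives 0
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    x, y = a ^ (1 << k1), b ^ (1 << k2)
    q1, q2 = max(x, y), min(x, y)
    ep = 0
    if q1:
        k = q1.bit_length() - 1
        up = k >= 1 and (q1 >> (k - 1)) & 1
        ep = q2 << (k + 1 if up else k)
    return ((1 << (k1 + k2)) | ep) + (x << k2) + (y << k1)


def error_metrics(multiply, bits: int = 8):
    """Return (NMED, MRED, WCE) over all pairs of bits-wide operands."""
    top = (1 << bits) - 1
    max_product = top * top                 # M: largest exact output
    total_ed, total_red, wce = 0, 0.0, 0
    pairs, nonzero = 0, 0
    for a in range(top + 1):
        for b in range(top + 1):
            exact = a * b
            ed = abs(multiply(a, b) - exact)
            total_ed += ed
            wce = max(wce, ed)
            pairs += 1
            if exact:                       # relative error undefined for 0
                total_red += ed / exact
                nonzero += 1
    return total_ed / pairs / max_product, total_red / nonzero, wce


nmed, mred, wce = error_metrics(approx_multiply)
print(f"NMED={nmed:.6f} MRED={mred:.6f} WCE={wce}")
```

Passing a different callable as `multiply` allows other multiplier models to be compared under the same conditions.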
TABLE I
Error metrics of the logarithmic multipliers
[Table provided as an image in the original; values not reproduced]
Compared with the most accurate existing design, Improved Logarithmic Multiplier A (ILM-A), the NMED is reduced by 34% and the MRED by 48%, which shows that the compensation algorithm effectively improves the precision of the multiplier designed in the invention.
Fig. 3 is a schematic diagram of the double-sided error distribution produced by the invention. As shown in Fig. 3, overestimating or underestimating the larger operand in the EP yields errors of both signs. In applications dominated by multiply-accumulate operations, positive and negative errors can cancel each other, which avoids excessive error accumulation.
In the $P_{approx}$ calculation module, the addition is replaced by an OR gate. Table II compares the performance of the OR gate with that of a conventional adder; both designs are implemented in Verilog and synthesized with ISE 14.7-Webpack for the Xilinx xc6slx16-2csg324. The adder is a carry-lookahead adder (CLA). Compared with the CLA, the OR gate uses 39% fewer LUTs, has 83% less delay, and consumes 74% less power. Using the OR gate therefore consumes fewer resources.
TABLE II
Hardware comparison of the CLA and the OR gate
[Table provided as an image in the original; values not reproduced]
The invention implements the approximate multiplier in the Verilog hardware description language. All designs are purely combinational, without pipelining, and are synthesized with ISE 14.7-Webpack. The designs are then implemented on the Xilinx xc6slx16-2csg324 with all I/O assigned to pins, and power is estimated at a clock frequency of 50 MHz. The comparison results are shown in Table III.
TABLE III
Hardware metrics of the logarithmic multipliers
[Table provided as an image in the original; values not reproduced]
In Table III, Improved Logarithmic Multiplier A (ILM-A) refers to the logarithmic multiplier using the original NOD block, and Improved Logarithmic Multiplier B (ILM-B) refers to the logarithmic multiplier using the simplified NOD block. Compared with ILM-A, the present design has better hardware performance, and its NMED, MRED and WCE are 34%, 48% and 25% lower, respectively; the compensation algorithm thus markedly improves the precision of the multiplier without sacrificing hardware resources. Although its power consumption is 22% higher than that of ILM-B, the present design places no restriction on the range of input operands and is more accurate than ILM-B, so it suits more applications. Compared with the Logarithmic Multiplier (LM), the NMED, MRED and WCE of the present design are reduced by 53%, 61% and 25%, respectively. In addition, the product of PDP and NMED and the product of PDP and MRED are calculated to demonstrate the efficiency of the design. Table III shows that both products are the smallest for the present design among the existing logarithmic multipliers.
Furthermore, the relationship between MRED and PDP for the logarithmic multipliers considered is shown in Fig. 4, where different values of PDP-MRED are represented by different dashed lines. In general, the smaller the PDP of an approximate multiplier, the larger its MRED, so PDP-MRED serves as a measure of the cost-effectiveness of a multiplier: the smaller the PDP-MRED, the fewer hardware resources are consumed for the same precision. The multiplier designed in the invention (HPLM) lies on the lowest dashed line in the figure, which shows that its PDP-MRED is the smallest; that is, it achieves the best balance between hardware consumption and precision among the multipliers considered.
It is to be noted that, in the present invention, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. A high-precision logarithmic multiplier, characterized by comprising a preprocessing unit, an error part calculation unit and an approximate part calculation unit, the working flow of which is as follows:
firstly, the operands A, B are input into the preprocessing unit and preprocessed by a first detector module and a priority encoder;
then, the XOR results of the operands A, B obtained after preprocessing are input into a data comparison module in the error part calculation unit, and the final result EP of the error part is obtained through the operation of an adjacent detector module, a priority encoder and a barrel shifter;
meanwhile, the most significant bit positions $k_1$ and $k_2$ of the operands A, B obtained after preprocessing are input into the approximate part calculation unit, and the final result AP of the approximate part is obtained through the operation of an adder, a decoder and a barrel shifter;
finally, the EP obtained by the error part calculation unit and the AP obtained by the approximate part calculation unit are combined by an adder to obtain the final result $P_{approx}$.
2. A high-precision logarithmic multiplier as claimed in claim 1, wherein in the preprocessing unit the two input operands A, B enter first detector module 1 and first detector module 2, respectively, which extract the most significant bits of the operands A, B as powers of 2, namely $2^{k_1}$ and $2^{k_2}$; $2^{k_1}$ and $2^{k_2}$ then pass through priority encoder 1 and priority encoder 2, respectively, which output $k_1$ and $k_2$; the input operand A is then XORed with $2^{k_1}$ to generate $A - 2^{k_1}$, and the input operand B is XORed with $2^{k_2}$ to generate $B - 2^{k_2}$.
3. A high-precision logarithmic multiplier as claimed in claim 2, wherein in the error part calculation unit the results of the XOR operations are input to the data comparison module in the error part calculation unit; if $A - 2^{k_1} \ge B - 2^{k_2}$, the intermediate variables are $Q_1 = A - 2^{k_1}$ and $Q_2 = B - 2^{k_2}$; otherwise the intermediate variables are $Q_1 = B - 2^{k_2}$ and $Q_2 = A - 2^{k_1}$; $Q_1$ is then examined by the adjacent detector module in the error part calculation unit and approximated as $2^{k}$ or $2^{k+1}$: if bit k-1 of the binary representation of $Q_1$ is 1, $Q_1$ is approximated as $2^{k+1}$; otherwise bit k-1 is 0 and $Q_1$ is approximated as $2^{k}$, i.e. $\mathrm{round}(Q_1)$ is $2^{k}$ or $2^{k+1}$; $\mathrm{round}(Q_1)$ passes through priority encoder 3 in the error part calculation unit, which outputs its most significant bit position, k or k+1; $Q_2$ is then shifted left by that number of bits by barrel shifter 3 in the error part calculation unit, realizing $\mathrm{round}(Q_1) \cdot Q_2$.
4. A high-precision logarithmic multiplier as claimed in claim 2, wherein in the approximate part calculation unit the most significant bit positions $k_1$ and $k_2$ of the operands A, B pass through adder 1 to generate $k_1 + k_2$, from which a decoder generates $2^{k_1+k_2}$; $A - 2^{k_1}$ is shifted left by $k_2$ bits by barrel shifter 1 to obtain $(A - 2^{k_1}) \cdot 2^{k_2}$; $B - 2^{k_2}$ is shifted left by $k_1$ bits by barrel shifter 2 to obtain $(B - 2^{k_2}) \cdot 2^{k_1}$; and the two results are added by adder 2 to obtain $(A - 2^{k_1}) \cdot 2^{k_2} + (B - 2^{k_2}) \cdot 2^{k_1}$.
5. A high-precision logarithmic multiplier as claimed in claim 2, wherein $\mathrm{round}(Q_1) \cdot Q_2$ obtained by the error part calculation unit and $2^{k_1+k_2}$ obtained by the approximate part calculation unit are ORed, and the OR result and $(A - 2^{k_1}) \cdot 2^{k_2} + (B - 2^{k_2}) \cdot 2^{k_1}$ are added by adder 3 to obtain the final result $P_{approx} = \left(2^{k_1+k_2} \mathbin{|} \mathrm{round}(Q_1) \cdot Q_2\right) + (A - 2^{k_1}) \cdot 2^{k_2} + (B - 2^{k_2}) \cdot 2^{k_1}$.
CN202210231433.2A 2022-03-09 2022-03-09 High-precision logarithmic multiplier Pending CN114610268A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210231433.2A CN114610268A (en) 2022-03-09 2022-03-09 High-precision logarithmic multiplier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210231433.2A CN114610268A (en) 2022-03-09 2022-03-09 High-precision logarithmic multiplier

Publications (1)

Publication Number Publication Date
CN114610268A true CN114610268A (en) 2022-06-10

Family

ID=81862056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210231433.2A Pending CN114610268A (en) 2022-03-09 2022-03-09 High-precision logarithmic multiplier

Country Status (1)

Country Link
CN (1) CN114610268A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination