CN112783469A - Method and device for executing floating-point exponential operation - Google Patents


Info

Publication number
CN112783469A
Authority
CN
China
Prior art keywords
value
data path
iteration
prediction
mantissa
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011592456.3A
Other languages
Chinese (zh)
Inventor
刘明
周彦兵
周小明
赵学华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Information Technology
Original Assignee
Shenzhen Institute of Information Technology
Application filed by Shenzhen Institute of Information Technology
Priority to CN202011592456.3A
Publication of CN112783469A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/485 Adding; Subtracting
    • G06F7/487 Multiplying; Dividing

Abstract

The invention discloses a method and a device for executing a floating-point exponential operation. A preprocessing module performs exception handling on the input of the floating-point exponential function and preprocesses it with a multiplier to obtain an output value num2 and the exponent part Y of the result of the floating-point exponential function. An exponential function mantissa iteration module applies a four-prediction CORDIC algorithm to the output value num2 of the preprocessing module, performing four-prediction iterative calculation on an X data path, a Y data path and a Z data path to obtain an x value and a y value, which are input into an adder to obtain the mantissa part X of the result of the floating-point exponential function. A floating-point regularization module performs leading-zero detection on the mantissa value X and the exponent value Y, converts them into the standard floating-point format by a shift operation, and finally combines them with the sign bit S for output in the normalized format. The method uses the four-prediction CORDIC algorithm to perform the exponential operation on floating-point numbers and predicts four iteration directions at a time, which greatly reduces the number of iterations and shortens the calculation period of the exponential function.

Description

Method and device for executing floating-point exponential operation
Technical Field
The present invention relates to the field of exponential operations, and more particularly to a method and a device for executing a floating-point exponential operation.
Background
In real life, many applications require the calculation of an exponential function, for example aircraft control, voice transmission and navigation in the aviation field, and image and real-time information transmission in the aerospace field. Interest and return calculations in the financial field also involve high-precision floating-point exponential functions. Improving the precision and speed of the exponential operation is therefore of great significance for practical applications.
In integrated circuit design, owing to limits such as the manufacturing process and the chip area, a traditional floating-point exponential hardware unit has a simple structure and low speed and can hardly meet the computation demand. In practice the floating-point exponential operation is therefore implemented by combining several mathematical transformations with software methods. This approach is easy to implement, but its efficiency is low, and it becomes a bottleneck that is hard to break through when high-precision calculation is required.
Disclosure of Invention
In view of the above problems, the invention provides a method and a device for executing a floating-point exponential operation, which use a newly improved four-prediction CORDIC algorithm to predict four iteration direction values in one clock cycle, improving the efficiency fourfold.
The technical scheme of the invention is as follows: a method for executing a floating-point exponential operation is provided, comprising the following steps:
S1, preprocessing: this step comprises exception handling and input value preprocessing. The input floating-point number $M \times 2^{E}$ of the exponential function is first split into a sign bit S, an exponent E and a mantissa M, and the exponent E is used to detect whether the input is abnormal. If the input is not abnormal, let the mantissa part of the result of the exponential function of $M \times 2^{E}$ be X and the exponent part be Y. Y is obtained from the mantissa M multiplied by $(\ln 2)^{-1}$ and shifted left by E bits. To satisfy the convergence requirement of the CORDIC algorithm, the mantissa part X is obtained by solving for 2X, which satisfies the formula
$$2X = e^{M \times 2^{E} - Y \ln 2 + \ln 2}$$
A 148-bit-wide value num1 is defined from 2X, specifically $\text{num1} = M \times 2^{E} - Y \times \ln 2 + \ln 2$, and num1 is shifted and spliced by a multiplier to obtain num2;
S2, exponential function mantissa iteration: num2 obtained by the preprocessing of S1 is received, and four-prediction iterative calculation is performed on the X data path, the Y data path and the Z data path for the input value num2 by the four-prediction CORDIC algorithm to obtain 136-bit output values x and y; x and y are input into an adder to obtain the result 2X, from which the mantissa value X of the calculation result is obtained;
S3, floating-point regularization: leading-zero detection is performed on the mantissa value X and the exponent value Y obtained in steps S1 and S2, they are converted into the standard floating-point format by a shift operation, and they are finally combined with the sign bit S for output in the normalized format.
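Further, the calculation formula of the four-prediction CORDIC algorithm in step S2 is as follows:
$$\begin{aligned}
\begin{bmatrix} x_{i+4} \\ y_{i+4} \end{bmatrix} &= \prod_{k=0}^{3} \begin{bmatrix} 1 & \sigma_{i+k}\,2^{-(i+k)} \\ \sigma_{i+k}\,2^{-(i+k)} & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} \\
z_{i+4} &= z_i - \sigma_i\theta_i - \sigma_{i+1}\theta_{i+1} - \sigma_{i+2}\theta_{i+2} - \sigma_{i+3}\theta_{i+3}
\end{aligned}$$
where $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are sign factors with the value -1 or 1 that represent the predicted rotation directions of the current iteration; i is the index of the four-prediction iteration; $\theta_i, \theta_{i+1}, \theta_{i+2}, \theta_{i+3}$ are the four rotation angles; $x_i, y_i, z_i$ are the initial values of the i-th four-prediction iteration on the X data path, the Y data path and the Z data path; and $x_{i+4}, y_{i+4}, z_{i+4}$ are the results of the i-th four-prediction iteration on the X data path, the Y data path and the Z data path.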
Further, one implementation of the four-prediction iterative calculation on the X data path, the Y data path and the Z data path by the four-prediction CORDIC algorithm in step S2 is as follows:
(1) traverse $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$, each taking the value -1 or 1, to form 16 combinations of values; the Z data path calculates the corresponding z value for each of the 16 combinations;
(2) select the value closest to 0 among the 16 z results as the z value of the current four-prediction iteration, and output the corresponding $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ as the predicted rotation direction $S = [\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}]$ of the current four-prediction iteration to the X and Y data paths; the X and Y data paths calculate the x value and the y value of the current four-prediction iteration from the rotation direction S passed to them.
Further, a second implementation of the four-prediction iterative calculation on the X data path, the Y data path and the Z data path by the four-prediction CORDIC algorithm in step S2 is as follows:
(1) traverse $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$, each taking the value -1 or 1, to form 16 combinations of values; the X data path, the Y data path and the Z data path each calculate in parallel for every combination, generating 16 sets of iteration results simultaneously;
(2) select the value closest to 0 among the 16 results of the Z data path as the value ite_z of the current four-prediction iteration, and directly take the x value already generated by the X data path and the y value already generated by the Y data path for the $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ corresponding to ite_z as the results of the current four-prediction iteration.
In another aspect of the present invention, a device for executing a floating-point exponential operation is provided, comprising:
a preprocessing module: used for exception handling and input value preprocessing. The input floating-point number $M \times 2^{E}$ of the exponential function is first split into a sign bit S, an exponent E and a mantissa M, and the exponent E is used to detect whether the input is abnormal. If the input is not abnormal, let the mantissa part of the result of the exponential function of $M \times 2^{E}$ be X and the exponent part be Y. Y is obtained from the mantissa M multiplied by $(\ln 2)^{-1}$ and shifted left by E bits. To satisfy the convergence requirement of the CORDIC algorithm, the mantissa part X is obtained by solving for 2X, which satisfies the formula
$$2X = e^{M \times 2^{E} - Y \ln 2 + \ln 2}$$
A 148-bit-wide value num1 is defined from 2X, specifically $\text{num1} = M \times 2^{E} - Y \times \ln 2 + \ln 2$, and num1 is shifted and spliced by a multiplier to obtain num2;
an exponential function mantissa iteration module: used for receiving num2 obtained by the preprocessing module and performing four-prediction iterative calculation on the X data path, the Y data path and the Z data path for the input value num2 by the four-prediction CORDIC algorithm to obtain 136-bit output values x and y; x and y are input into an adder to obtain the result 2X, from which the mantissa value X of the calculation result is obtained;
a floating-point regularization module: used for performing leading-zero detection on the mantissa value X and the exponent value Y obtained by the preceding modules, converting them into the standard floating-point format by a shift operation, and finally combining them with the sign bit S for output in the normalized format.
Further, the calculation formula of the four-prediction CORDIC algorithm in the exponential function mantissa iteration module is as follows:
$$\begin{aligned}
\begin{bmatrix} x_{i+4} \\ y_{i+4} \end{bmatrix} &= \prod_{k=0}^{3} \begin{bmatrix} 1 & \sigma_{i+k}\,2^{-(i+k)} \\ \sigma_{i+k}\,2^{-(i+k)} & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} \\
z_{i+4} &= z_i - \sigma_i\theta_i - \sigma_{i+1}\theta_{i+1} - \sigma_{i+2}\theta_{i+2} - \sigma_{i+3}\theta_{i+3}
\end{aligned}$$
where $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are sign factors with the value -1 or 1 that represent the predicted rotation directions of the current iteration; i is the index of the four-prediction iteration; $\theta_i, \theta_{i+1}, \theta_{i+2}, \theta_{i+3}$ are the four rotation angles; $x_i, y_i, z_i$ are the initial values of the i-th four-prediction iteration on the X data path, the Y data path and the Z data path; and $x_{i+4}, y_{i+4}, z_{i+4}$ are the results of the i-th four-prediction iteration on the X data path, the Y data path and the Z data path.
Further, one implementation of the four-prediction iterative calculation on the X data path, the Y data path and the Z data path by the four-prediction CORDIC algorithm in the exponential function mantissa iteration module is as follows:
(1) traverse $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$, each taking the value -1 or 1, to form 16 combinations of values; the Z data path calculates the corresponding z value for each of the 16 combinations;
(2) select the value closest to 0 among the 16 z results as the z value of the current four-prediction iteration, and output the corresponding $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ as the predicted rotation direction $S = [\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}]$ of the current four-prediction iteration to the X and Y data paths; the X and Y data paths calculate the x value and the y value of the current four-prediction iteration from the rotation direction S passed to them.
Further, a second implementation of the four-prediction iterative calculation on the X data path, the Y data path and the Z data path by the four-prediction CORDIC algorithm in the exponential function mantissa iteration module is as follows:
(1) traverse $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$, each taking the value -1 or 1, to form 16 combinations of values; the X data path, the Y data path and the Z data path each calculate in parallel for every combination, generating 16 sets of iteration results simultaneously;
(2) select the value closest to 0 among the 16 results of the Z data path as the value ite_z of the current four-prediction iteration, and directly take the x value already generated by the X data path and the y value already generated by the Y data path for the $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ corresponding to ite_z as the results of the current four-prediction iteration.
Further, the exponential function mantissa iteration module comprises: a four-prediction Z data path calculation unit; a single-iteration Z data path calculation unit; a four-prediction X data path calculation unit; a single-iteration X data path calculation unit; a four-prediction Y data path calculation unit; a single-iteration Y data path calculation unit; an iteration loop control unit, which uses a counter to control the iteration loop and determines whether each iteration is a four-prediction iteration or a single iteration; a lookup table unit, which supplies the iteration angle values required for the calculation according to the output of the iteration loop control unit; and a selector unit, which selects between the outputs of the four-prediction and single-iteration Z data path calculation units, between the outputs of the four-prediction and single-iteration X data path calculation units, and between the outputs of the four-prediction and single-iteration Y data path calculation units.
The method and the device for executing a floating-point exponential operation provided by the invention realize a 128-bit high-precision floating-point exponential function operation and have the following beneficial effects:
1. The improved four-prediction CORDIC algorithm is used for the exponential operation on floating-point numbers; four iteration directions are predicted in one clock cycle, which greatly reduces the number of iterations and shortens the calculation period of the exponential function.
2. A floating-point exponential function arithmetic device is designed that delivers 113-bit full-precision output when both input and output are 128-bit floating-point numbers.
Drawings
FIG. 1 is a diagram illustrating a flow of floating-point exponential function operations according to an embodiment of the present invention;
FIG. 2 is a simplified structural diagram of an apparatus for implementing floating-point exponential function operation according to an embodiment of the present invention;
FIG. 3 is a schematic flow diagram of a pre-processing module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a logic control method for four iterations of the predictive CORDIC algorithm according to an embodiment of the present invention;
FIG. 5 is a block diagram illustrating an exponential function mantissa iteration module according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating an iterative X data path of a four-prediction CORDIC algorithm according to an embodiment of the present invention;
FIG. 7 is a block diagram of a floating point regularization module according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical scheme of the present invention in detail, the present embodiment is implemented on the premise of the technical scheme of the present invention, and detailed implementation modes and specific steps are given.
According to the IEEE 754 floating-point standard, the floating-point number whose exponential is to be calculated in this embodiment is written as $M \times 2^{E}$. Let the mantissa part of the calculation result be X and the exponent part be Y. The result of the exponential function is then given by formula (1):
$$e^{M \times 2^{E}} = X \times 2^{Y} \qquad (1)$$
where $0.5 < X < 1$, i.e.
$$-1 < \log_2 X < 0$$
Taking the base-2 logarithm of formula (1) then gives:
$$M \times 2^{E} \times (\ln 2)^{-1} = Y + \log_2 X \qquad (2)$$
Rearranging formula (2) yields formula (3):
$$Y - 1 < M \times 2^{E} \times (\ln 2)^{-1} < Y \qquad (3)$$
Since Y is an integer, the value of Y can be obtained from formula (3), and only the value of X remains to be found. X needs to be preprocessed. Assume that:
$$2^{Y} = e^{Z} \qquad (4)$$
Then:
$$Z = Y \ln 2 \qquad (5)$$
which gives the expression for X, formula (6):
$$X = e^{M \times 2^{E} - Y \ln 2} \qquad (6)$$
where Y is obtained from formula (3) and the value range of X is $0.5 < X < 1$.
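The split of the result into X and Y can be checked numerically. The following is a minimal sketch in double precision, not the 128-bit hardware datapath; the function name `split_exponent` and the test inputs are illustrative assumptions, and `v` stands for the input value $M \times 2^{E}$.

```python
import math

def split_exponent(v):
    """Return (X, Y) such that exp(v) == X * 2**Y, following formulas (3) and (6).

    v plays the role of M * 2**E. Hypothetical helper for illustration only.
    """
    ln2 = math.log(2)
    y = math.floor(v / ln2) + 1        # the integer Y with Y - 1 < v / ln2 < Y
    x = math.exp(v - y * ln2)          # X = e^(v - Y ln2), lies in [0.5, 1)
    return x, y

for v in (0.3, 5.0, -2.75):
    x, y = split_exponent(v)
    # Recombining X and Y reproduces exp(v) up to rounding error.
    print(v, x, y, math.isclose(x * 2.0 ** y, math.exp(v)))
```

When `v / ln2` is exactly an integer the sketch returns X = 0.5, the boundary of the range.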
From the range of the exponential function, $M \times 2^{E} - Y \ln 2$ must fall within the convergence region [-1.1182, 1.1182] of CORDIC. In this embodiment, a newly improved four-prediction CORDIC algorithm is used to calculate X. The four-prediction CORDIC algorithm is based on the rotation mode of the CORDIC algorithm in hyperbolic coordinates, whose iteration formula is as follows:
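$$\begin{cases} x_{i+1} = x_i + \sigma_i\, y_i\, 2^{-i} \\ y_{i+1} = y_i + \sigma_i\, x_i\, 2^{-i} \\ z_{i+1} = z_i - \sigma_i \theta_i, \quad \theta_i = \tanh^{-1}\left(2^{-i}\right) \end{cases} \qquad (7)$$
The rotation direction $\sigma_i$ is determined by:
$$\sigma_i = \begin{cases} +1, & z_i \ge 0 \\ -1, & z_i < 0 \end{cases} \qquad (8)$$
The initial iteration condition is set to:
$$x_0 = 1, \quad y_0 = 0, \quad z_0 = \theta$$
where θ is the required rotation angle.
After N iterations one therefore obtains:
$$x_N = K \cosh\theta, \quad y_N = K \sinh\theta, \quad z_N \to 0 \qquad (9)$$
where K is the rotation compensation factor. From the definition of the hyperbolic functions:
$$e^{\theta} = \cosh\theta + \sinh\theta \qquad (10)$$
The value of $e^{\theta}$ is obtained by adding the iterated x and y values and multiplying the sum by the correction factor $1/K$.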
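The single-step hyperbolic iteration can be sketched in software as follows. This is a textbook rotation-mode hyperbolic CORDIC in double precision, not the patented circuit: the repetition of indices 4, 13 and 40 is the standard convergence fix and is an assumption here, as are the names `hyperbolic_schedule` and `cordic_exp`.

```python
import math

def hyperbolic_schedule(n):
    """Shift indices 1..n, with 4, 13, 40, ... repeated (textbook convergence fix)."""
    indices, repeat, i = [], 4, 1
    while i <= n:
        indices.append(i)
        if i == repeat:
            indices.append(i)           # repeated iteration
            repeat = 3 * repeat + 1
        i += 1
    return indices

def cordic_exp(theta, n=40):
    """Approximate e**theta for |theta| <= about 1.118 with single-step iterations."""
    indices = hyperbolic_schedule(n)
    gain = 1.0                          # accumulated scale factor of the rotations
    for i in indices:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * i))
    x, y, z = 1.0, 0.0, theta           # initial condition
    for i in indices:
        sigma = 1.0 if z >= 0 else -1.0 # rotation direction from the sign of z
        t = 2.0 ** -i
        x, y = x + sigma * y * t, y + sigma * x * t
        z -= sigma * math.atanh(t)
    return (x + y) / gain               # e^theta = cosh(theta) + sinh(theta)

print(cordic_exp(0.5), math.exp(0.5))
```

In hardware the multiplications by `2.0 ** -i` are shifts and the `atanh` values come from a lookup table; the sketch keeps them as floating-point operations for readability.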
The CORDIC algorithm in hyperbolic coordinates needs too many iterations at high precision, so a new four-prediction CORDIC algorithm is adopted for the hyperbolic coordinate system. It effectively reduces the number of CORDIC iterations and shortens the time of the exponential operation. The improved algorithm predicts four iteration direction values in one clock cycle, a fourfold gain in efficiency, and each direction value is judged from the difference between the four predicted angle values and the z value, which effectively reduces the error.
On the basis of the traditional CORDIC iteration formula, four σ values ($\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$) are predicted in one clock cycle. Substituting them in turn into the iteration formula gives the four-prediction CORDIC formula (11):
$$\begin{aligned}
\begin{bmatrix} x_{i+4} \\ y_{i+4} \end{bmatrix} &= \prod_{k=0}^{3} \begin{bmatrix} 1 & \sigma_{i+k}\,2^{-(i+k)} \\ \sigma_{i+k}\,2^{-(i+k)} & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} \\
z_{i+4} &= z_i - \sigma_i\theta_i - \sigma_{i+1}\theta_{i+1} - \sigma_{i+2}\theta_{i+2} - \sigma_{i+3}\theta_{i+3}
\end{aligned} \qquad (11)$$
Since each of $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ takes only the values -1 and 1, there are only 16 case branches. The first implementation in rotation mode is therefore as follows: the 16 corresponding z values are calculated, and the one closest to 0 is selected as the z result of the current iteration. The corresponding $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are taken as the predicted rotation directions of the current iteration and passed to the X and Y data paths, which calculate the x value and the y value of the current four-prediction iteration from the received rotation direction $S = [\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}]$. This completes the current four-prediction iteration.
When a factor among $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ is 1, the rotation is in the positive direction, that is, toward the required angle, and:
$$z_{i+1} = z_i - \sigma_i\theta_i = z_i - \theta_i \qquad (12)$$
When a factor among $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ is -1, the rotation is in the opposite direction, that is, away from the required angle, and:
$$z_{i+1} = z_i - \sigma_i\theta_i = z_i + \theta_i \qquad (13)$$
For data with a bit width of n, the traditional CORDIC algorithm needs n iterations, whereas the four-prediction CORDIC algorithm reduces the required number of iterations to n/4. The largest drawback of the CORDIC algorithm, its latency, is thereby addressed, and the calculation time is greatly shortened.
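The first implementation of the four-prediction iteration described above can be sketched as follows, in double precision. Several details are assumptions made for the sketch and are not taken from the source: the function names, the use of floating-point arithmetic, and the schedule in which the textbook repeated indices 4 and 13 are simply placed inside the groups of four (the source instead provides separate single-iteration units whose scheduling it does not spell out).

```python
import itertools
import math

def four_step(x, y, z, indices):
    """One four-prediction iteration over four shift indices."""
    shifts = [2.0 ** -i for i in indices]
    thetas = [math.atanh(t) for t in shifts]

    def z_after(signs):
        return z - sum(s * th for s, th in zip(signs, thetas))

    # Z data path: evaluate all 16 sign patterns, keep the z closest to 0.
    best = min(itertools.product((1, -1), repeat=4),
               key=lambda signs: abs(z_after(signs)))
    # X and Y data paths: apply the four selected rotations.
    for s, t in zip(best, shifts):
        x, y = x + s * y * t, y + s * x * t
    return x, y, z_after(best), best

def exp_four_step(theta, groups=10):
    """Approximate e**theta with `groups` four-prediction iterations."""
    schedule, i = [], 1
    while len(schedule) < 4 * groups:       # assumed schedule, see lead-in
        schedule.append(i)
        if i in (4, 13, 40) and len(schedule) < 4 * groups:
            schedule.append(i)              # repeated index for convergence
        i += 1
    gain = math.prod(math.sqrt(1.0 - 4.0 ** -i) for i in schedule)
    x, y, z = 1.0, 0.0, theta
    for g in range(groups):
        x, y, z, _ = four_step(x, y, z, schedule[4 * g:4 * g + 4])
    return (x + y) / gain

print(exp_four_step(0.5), math.exp(0.5))
```

Ten grouped iterations here cover the same 40 elementary rotations that a single-step loop would need 40 passes for, which is the n to n/4 reduction stated in the text.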
In an embodiment of the method for executing a floating-point exponential operation, the operation flow of the floating-point exponential function is shown in FIG. 1. Both the input and the final result of the floating-point exponential function are floating-point numbers. When a 128-bit floating-point number num_in is input, it first enters the preprocessing module, which performs exception judgment on it. If the input does not satisfy the floating-point standard, an exception is raised and the sign signal is 0. If the input is valid, sign is 1, and the 128-bit floating-point number num_in is split into a sign bit S, an exponent part E and a mantissa M according to the floating-point representation, as shown in formula (14):
$$S = \text{num\_in}[127], \quad E = \text{num\_in}[126:112], \quad M = \text{num\_in}[111:0] \qquad (14)$$
The main design idea of the invention is to calculate the value of Y first and then calculate the value of X from the values of Y, M and E. As derived above, the value range of X is (0.5, 1). According to the property of the exponential function, the corresponding abscissa of the rotation vector is then negative, and the convergence requirement of the CORDIC algorithm is not met. The value range of 2X is (1, 2), for which the rotation vector falls within the convergence region of CORDIC, so the value of 2X is calculated here. The expression for 2X is shown in formula (15):
$$2X = e^{M \times 2^{E} - Y \ln 2 + \ln 2} \qquad (15)$$
A 148-bit-wide value num1 is defined as:
$$\text{num1} = M \times 2^{E} - Y \times \ln 2 + \ln 2 \qquad (16)$$
To ensure calculation accuracy, the lookup table bit width is defined as 128 bits; taking the sign bit and the carry bit into account, num1 is converted into the 130-bit mantissa iteration input value num2.
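The role of num1 can be illustrated with a short numeric check. This is a sketch in double precision that ignores the 148-bit and 130-bit fixed-point formats; the name `num1_of` is hypothetical and `v` again stands for $M \times 2^{E}$.

```python
import math

def num1_of(v):
    """Return (num1, Y) for an input v = M * 2**E, following formula (16)."""
    ln2 = math.log(2)
    y = math.floor(v / ln2) + 1
    return v - y * ln2 + ln2, y

num1, y = num1_of(5.0)
two_x = math.exp(num1)                  # 2X, the quantity the iteration computes
# num1 lies in [0, ln 2), so 2X lies in [1, 2); then e^v = (2X) * 2^(Y-1).
print(num1, two_x, math.isclose(two_x * 2.0 ** (y - 1), math.exp(5.0)))
```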
num2 is input into the mantissa iteration part, which outputs 136-bit x and y values. Because shift operations are needed during the iteration, the 128-bit x and y words are extended by 8 compensation bits to prevent precision loss caused by shifting. According to the hyperbolic CORDIC model, formula (17) then holds:
$$x = \cosh(\text{num2}), \quad y = \sinh(\text{num2}) \qquad (17)$$
Inputting x and y into the adder gives Mx, as shown in formula (18):
$$M_x = x + y = e^{\text{num2}} = 2X \qquad (18)$$
The bit width of Mx is 128. The 2X value calculated at this point is not necessarily a standard mantissa under the floating-point rules and may contain leading zeros. The leading-zero detection unit therefore counts the leading zeros, and a shift operation converts the value into the standard floating-point format; the redundant bit width reduces the precision loss caused by the shift. Y is adjusted by the corresponding amount so that the value of the floating-point number as a whole is unchanged.
In fact, a 128-bit floating-point number in the IEEE floating-point specification has only a 112-bit mantissa, so the shifted data must be rounded and truncated to 112 bits and then combined with Y and S to output a standard floating-point number in the normalized format.
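The leading-zero detection, shift and rounding can be sketched on Python integers as follows. The function name `normalize`, the round-half-up mode and the convention that the result keeps 113 bits (the hidden bit plus the 112 stored bits) are assumptions of this sketch; the source does not specify the rounding mode.

```python
def normalize(mant, width, exp, keep=113):
    """Normalize a `width`-bit unsigned mantissa and round it to `keep` bits.

    The input represents mant * 2**exp; the result (m, e) represents
    m * 2**e with the top bit of m set. Requires width > keep.
    """
    if mant == 0:
        return 0, exp
    lz = width - mant.bit_length()              # leading-zero detection
    mant <<= lz                                 # shift the leading zeros out
    exp -= lz                                   # exponent compensates the shift
    drop = width - keep                         # surplus low-order bits
    mant = (mant + (1 << (drop - 1))) >> drop   # round half up (assumed mode)
    exp += drop
    if mant >> keep:                            # rounding carried into a new bit
        mant >>= 1
        exp += 1
    return mant, exp

# Small-width example: 0b00010110 (22) kept to 4 significant bits.
print(normalize(0b00010110, 8, 0, keep=4))
```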
The device for executing a floating-point exponential operation, as shown in FIG. 2, includes a preprocessing module, which is used for exception handling and input value preprocessing. The input floating-point number $M \times 2^{E}$ of the exponential function is first split into a sign bit S, an exponent E and a mantissa M, and the exponent E is used to detect whether the input is abnormal. If the input is not abnormal, let the mantissa part of the result of the exponential function of $M \times 2^{E}$ be X and the exponent part be Y. Y is obtained from the mantissa M multiplied by $(\ln 2)^{-1}$ and shifted left by E bits. To satisfy the convergence requirement of the CORDIC algorithm, the mantissa part X is obtained by solving for 2X, which satisfies the formula
$$2X = e^{M \times 2^{E} - Y \ln 2 + \ln 2}$$
A 148-bit-wide value num1 is defined from 2X, specifically $\text{num1} = M \times 2^{E} - Y \times \ln 2 + \ln 2$, and num1 is shifted and spliced to obtain num2.
The specific implementation is as follows. As shown in FIG. 3, exception judgment is first performed on the input 128-bit floating-point number. For the exponential function there are only two abnormal input conditions, NaN and ∞, and both require E to be examined first. In FIG. 3, f-NaN indicates that the input value is not a number and f-no indicates that the input value is infinite; both are input errors. When the input is valid, the Y calculation module is entered; after the Y value is obtained, the num2 value is calculated with a multiplier, and the preprocessing module finishes.
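The field split and the two exception checks can be sketched on a raw 128-bit integer. The layout used (1 sign bit, 15 exponent bits, 112 fraction bits, all-ones exponent for NaN and ∞) is the IEEE 754 binary128 format; the function names and return labels are illustrative assumptions.

```python
def split_binary128(bits):
    """Split a raw 128-bit pattern into (sign, biased exponent, fraction)."""
    sign = bits >> 127
    exp_field = (bits >> 112) & 0x7FFF
    frac = bits & ((1 << 112) - 1)
    return sign, exp_field, frac

def classify(bits):
    """Return 'nan', 'inf' or 'ok' by examining the exponent field first."""
    _, exp_field, frac = split_binary128(bits)
    if exp_field == 0x7FFF:              # all-ones exponent: NaN or infinity
        return "nan" if frac else "inf"
    return "ok"

one = 0x3FFF << 112                      # binary128 encoding of 1.0
print(split_binary128(one), classify(one))
print(classify(0x7FFF << 112), classify((0x7FFF << 112) | 1))
```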
The Y value is calculated as follows. From the derivation above, Y is obtained from the mantissa M multiplied by $(\ln 2)^{-1}$ and shifted left by E bits; the division by ln 2 is thus converted into a multiplication by $(\ln 2)^{-1}$.
Including the hidden bit, the mantissa M is actually 113 bits wide. Denote $(\ln 2)^{-1}$ as the parameter A:
$$A = (\ln 2)^{-1} \qquad (19)$$
To avoid loss of precision, the 226-bit-wide data numm is defined as:
$$\text{numm} = M \times A \qquad (20)$$
From the derivation above:
$$Y - 1 < \text{numm} \times 2^{E} < Y \qquad (21)$$
Y is therefore obtained by adding 1 to the integer part of numm shifted by E bits. Since the bit width of Y is 15 bits, the exponent E is discussed case by case:
(1) When E > 0: the binary point must be shifted right by E bits. The bit width of Y is 15, so if the binary point moves beyond bit (225 - 14), Y can no longer hold the integer part before the point, which loses precision and produces a wrong result. E < 13 is therefore required to guarantee accuracy. The range of E is restricted here, to a small range, to guarantee the accuracy of the calculation; different data extension methods would change the admissible range of E. Let b = numm[225:(224 - E)]; then Y = b + 1, and the mantissa portion is numm[(223 - E):0];
(2) When E = 0: the two bits before the binary point of numm are the integer part. Let c = numm[225:224]; then Y = c + 1, and the mantissa portion is numm[223:0];
(3) When E = -1: the binary point is shifted left by one bit. Let d = numm[225]; then Y = d + 1, and the mantissa portion is numm[224:0];
(4) When E ≤ -2: there is no significant digit before the binary point, so Y = 1. The mantissa significand is numm[225:0], preceded by |E + 2| leading zeros.
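The four cases above reduce to a single right shift when written with unbounded integers. The sketch below assumes a positive input, an unbiased exponent `e`, and a 113-bit constant `A` obtained by truncating $2^{112}/\ln 2$ with Python's `decimal` module; these choices and the name `compute_y` are assumptions, not details from the source.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60                            # enough digits for 113 bits
FRAC = 112                                        # fractional bits of M and of A
A = int(Decimal(2) ** FRAC / Decimal(2).ln())     # 1/ln2 as a 113-bit fixed-point value

def compute_y(m, e):
    """Return Y for mantissa m (1.f scaled by 2**112) and unbiased exponent e < 13."""
    if e >= 13:
        raise ValueError("E must be below 13 so that Y fits in 15 bits")
    numm = m * A                                  # 226 bits, 224 of them fractional
    integer_part = numm >> (2 * FRAC - e)         # bits left of the shifted binary point
    return integer_part + 1

print(compute_y(1 << 112, 0))    # input 1.0
print(compute_y(5 << 110, 2))    # input 1.25 * 2**2 = 5.0
print(compute_y(1 << 112, -3))   # input 0.125
```

For `e <= -2` the shift exceeds the width of numm, the integer part is 0, and the function returns 1, matching case (4).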
The upper limit of the input exponent E is 16396. After the Y value is obtained, the next step is to obtain the exponent of
$$2X = e^{M \times 2^{E} - Y \ln 2 + \ln 2}$$
and input it into the mantissa iteration part to solve for X. Since 2X belongs to (1, 2), the exponent of 2X is computed and defined as the 148-bit-precision number num1:
$$\text{num1} = M \times 2^{E} - Y \times \ln 2 + \ln 2 \qquad (22)$$
num1 is then shifted and spliced to obtain num2, which completes the preprocessing.
Exponential function mantissa iteration module: receives num2 from the preprocessing module and applies the four-step parallel-branch CORDIC algorithm to it. Specifically, the four-prediction CORDIC algorithm performs four-prediction iterative calculations on the X data path, the Y data path, and the Z data path to obtain 136-bit x and y output values, which are input to an adder to obtain the 2^X result and hence the mantissa value X of the calculation result.
The specific implementation is as follows: the mantissa iteration module receives the preprocessing result num2 and implements the four-prediction CORDIC algorithm in hardware. By iterating x, y, and z, it finally obtains the values cosh(num2) and sinh(num2) corresponding to the input num2, and passes them to the addition and combination module; their sum is the exponential value corresponding to num2, namely the 2^X value.
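The relation exp(t) = cosh(t) + sinh(t) that this module relies on can be illustrated with the basic, one-direction-per-step hyperbolic CORDIC. The sketch below is a floating-point model, not the patent's 136-bit four-prediction hardware; it assumes the usual repeated iterations at 4, 13, and 40 and an initial x equal to the reciprocal of the CORDIC gain, and the function name is hypothetical.

```python
import math


def hyperbolic_cordic_exp(t, n=40):
    """Approximate exp(t) as cosh(t) + sinh(t) using rotation-mode CORDIC."""
    # Build the iteration index list, repeating 4, 13, 40 (k -> 3k + 1).
    indices = []
    repeat = 4
    for i in range(1, n + 1):
        indices.append(i)
        if i == repeat:
            indices.append(i)
            repeat = 3 * repeat + 1
    # Accumulated gain of all steps; starting from 1/gain removes the scaling.
    gain = 1.0
    for i in indices:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, t
    for i in indices:
        sigma = 1.0 if z >= 0 else -1.0          # rotate so that z moves toward 0
        x, y = x + sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atanh(2.0 ** -i)
    return x + y                                  # cosh(t) + sinh(t)


print(hyperbolic_cordic_exp(0.5), math.exp(0.5))
```

The input must stay within the CORDIC convergence range; the reduced argument in (0, ln2) satisfies this.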
In the preferred embodiment, to reduce the computation delay of the iterative algorithm, the four-prediction CORDIC algorithm uses parallel computation for the Z data path: the results for all 16 possible combinations of the rotation directions σ are computed in parallel, and the combination σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3} that brings z closest to 0 is selected. The X data path and the Y data path then iterate according to these values, so that four rotation directions are predicted in a single operation.
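The selection performed by the Z data path can be modelled in a few lines. This is a software sketch under the assumption that the rotation angles are atanh(2^-k), as in standard hyperbolic CORDIC; in hardware the 16 candidates would be evaluated concurrently rather than in a loop, and the function name is hypothetical.

```python
import math
from itertools import product


def predict_four_directions(z, i):
    """Try all 16 sign combinations for steps i..i+3; keep the z nearest to 0."""
    angles = [math.atanh(2.0 ** -(i + k)) for k in range(4)]
    best = None
    for sigmas in product((-1, 1), repeat=4):
        z_next = z - sum(s * a for s, a in zip(sigmas, angles))
        if best is None or abs(z_next) < abs(best[1]):
            best = (sigmas, z_next)
    return best   # (S, ite_z): the chosen directions and the new z


sigmas, z_next = predict_four_directions(0.3, 1)
print(sigmas, z_next)
```

The returned tuple plays the role of S = [σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3}] that is handed to the X and Y data paths.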
In the preferred embodiment, in the rotation mode of the four-prediction CORDIC algorithm the iteration directions of x and y are determined by the value of z, so the x and y calculations must wait for the z result. If the computation delay of a single-coordinate four-step iteration is t_ite, at least 2t_ite is needed to complete one full-coordinate iteration. Although this already greatly reduces the delay compared with the conventional CORDIC algorithm, there is still room for optimization. To further reduce the delay and improve efficiency, while z is being computed to obtain the current σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3}, the same 16 groups of parallel calculations are also performed on x and y, so that x, y, and z simultaneously generate 16 groups of iteration results. The final x and y results are then obtained directly by multiplexing on the σ values produced once the z calculation finishes. This compresses the two calculation delays otherwise required, 2t_ite, into one calculation delay plus one multiplexer delay, and the multiplexer delay is negligible relative to that of a complex arithmetic circuit. In effect 2t_ite is compressed to t_ite: the calculation delay is halved and the calculation efficiency is effectively improved.
The logic control method for the four-prediction CORDIC iterations is shown in FIG. 4. To ensure convergence of the algorithm, the iteration is repeated at i = 1, 4, 13, 40, ..., K, 3K + 1 (where K is the previous value of i in the sequence). The algorithm uses a counter cnt that is initially 1, and on each rising clock edge cnt = cnt + 4, so a four-prediction CORDIC step is performed when cnt (i in the code) takes the values 1, 5, 9, 13, ..., 129. The values cnt = 5, 13, 41, 121 are the closest to the ideal repeat values of i, so convergence can be guaranteed by repeating the iteration when cnt is 5, 13, 41, or 121.
The iteration starts with i = 1 in the code and ends at 129. A state machine can be used to implement the iteration control described above. Four states are assumed:
00: initial state: assign the initial values x = k, y = 0, z = num2, i = 1, and clear x_next, y_next, z_next;
01: four-prediction CORDIC iteration: on each clock, i = i + 4. If i = 5 or i = 41, the state goes to 10; if i = 13 or i = 121, the state goes to 11; if i = 129, the iteration ends and the iterated x, y, and z values are output.
10: repeated-iteration state for i - 1: the sign of the third iteration is determined from the second-lowest bit S[1] of the output value S of module Z_pre, and the iteration is repeated once with i = i - 1. If S[1] is 1, σ of the third iteration is 1; if S[1] is 0, σ of the third iteration is -1. The repetition uses the basic single-iteration formula, and the state returns to 01 once the repeated iteration is finished.
11: repeated-iteration state for i: the sign of the fourth iteration is determined from the lowest bit S[0] of the output value of module Z_pre, and the iteration is repeated once with i = i. If S[0] is 1, σ of the fourth iteration is 1; if S[0] is 0, σ of the fourth iteration is -1. The repetition uses the basic single-iteration formula, and the state returns to 01 once the repeated iteration is finished.
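The control flow of the four states can be traced with a short simulation. This is one possible reading of the state descriptions, with the assumption that each state visit costs one clock and that the initial state counts as one clock; it models only the schedule, not the x, y, z arithmetic, and the function name is hypothetical.

```python
def iteration_schedule(last=129):
    """Return the list of (state, i) pairs visited by the control state machine."""
    INIT, FOUR, REPEAT_PREV, REPEAT_SAME = "00", "01", "10", "11"
    trace = [(INIT, 1)]                 # state 00: load x, y, z and i = 1
    i = 1
    state = FOUR
    while True:
        if state == FOUR:
            trace.append((FOUR, i))     # four-prediction step starting at i
            i += 4
            if i in (5, 41):
                state = REPEAT_PREV
            elif i in (13, 121):
                state = REPEAT_SAME
            elif i == last:
                break                   # iteration finished, outputs are valid
        elif state == REPEAT_PREV:
            trace.append((REPEAT_PREV, i - 1))   # single repeat at i - 1
            state = FOUR
        else:
            trace.append((REPEAT_SAME, i))       # single repeat at i
            state = FOUR
    return trace


trace = iteration_schedule()
print(len(trace))
print([i for state, i in trace if state in ("10", "11")])
```

Under these assumptions the trace contains 32 four-prediction steps, 4 single repeats (at 4, 13, 40, and 121), and the initial state, for 37 entries in total.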
The structure of the exponential function mantissa iteration module is shown in FIG. 5. The method targets quadruple-precision floating-point data. In this embodiment the maximum bit width of the input data is 128 bits, and the input value of the mantissa exponent operation unit is 113 bits wide. However, because multiplication and shift operations are required in the calculation, the bit widths of x and y must be larger. Analysis of the 16 parallel operations in the iterative process shows that the maximum multiplication factor is 35; allowing one further bit for the sign, at least 8 bits must be added in front of the input mantissa value to meet the needs of the calculation. The lookup table has 128 bits of precision, so the 128-bit initial values of x and y are taken, 8 leading zeros are added in front, and x and y are iterated as 136-bit data. After this extension, the whole calculation theoretically requires (128/4) + 4 + 1 = 37 iterations, where 128 is the original number of iterations, the 1 accounts for the extra four-prediction step needed to reach the final value 129, and the 4 accounts for the four single repeated iterations at 4, 13, 40, and 121. Thus 37 clock cycles are ultimately required to achieve 113-bit calculation accuracy.
In the four-prediction CORDIC algorithm, to further reduce the computation delay, 16 groups of parallel calculations are used for x and y as well as for z, and the x and y outputs are then selected according to which z result has the smallest absolute value. Driven by the clock, the loop control unit determines whether each iteration is a four-prediction iteration or a single repeated iteration. The four-prediction calculation unit uses the four-prediction CORDIC algorithm, the single repeated iteration calculation unit uses the basic CORDIC algorithm, and the lookup table unit outputs the iteration angle values required by the calculation under logic control.
In a specific implementation, the Z data path calculates the angle value for each four-prediction step, takes the smallest result as the value of the four iterations, namely ite_z, and outputs the corresponding S = [σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3}] to the X data path and the Y data path. Letting [σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3}] range from [-1, -1, -1, -1] to [1, 1, 1, 1], the result with the minimum absolute value among the 16 calculation results is selected and output together with its corresponding S = [σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3}].
The X data path and the Y data path iterate on the x and y values, predicting the x and y values after four iterations. Taking x as an example (y is handled similarly), as shown in FIG. 6, the inputs first enter a constant-multiplier calculation unit. The coefficients applied to x are of the form 2^-(4i+6) and 2^-(2i+5), and the coefficients applied to y are of the form 2^-(i+3) and 2^-(3i+6). Consolidating the coefficient table, the products required are x, 9x, 15x, 19x, 21x, 35x and y, 3y, 5y, 7y, 9y, 11y, 13y, 15y. These constant multiples of the input x and y values are computed and passed to the shift unit. The constant calculation unit uses shifts and adds; for example, 35x is the sum of 32x, 2x, and x, where 32x is x shifted left by 5 bits and 2x is x shifted left by 1 bit.
Output calculation unit: output0 to output15 correspond to the iteration results for [σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3}] = [-1, -1, -1, -1] through [1, 1, 1, 1]. For example, the calculation formula of output0 is shown in formula (23):
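The shift-and-add construction of the constant multiples (for example 35x = 32x + 2x + x) can be sketched as follows. The function name and the two constant tuples are illustrative names only; a hardware unit would wire the shifts directly instead of looping over bits.

```python
def shift_add_multiply(value, constant):
    """Multiply a non-negative integer by a constant using shifts and adds only."""
    result = 0
    bit = 0
    while constant >> bit:
        if (constant >> bit) & 1:
            result += value << bit    # add value * 2**bit for each set bit
        bit += 1
    return result


X_CONSTANTS = (1, 9, 15, 19, 21, 35)          # multiples of x listed in the text
Y_CONSTANTS = (1, 3, 5, 7, 9, 11, 13, 15)     # multiples of y listed in the text

x = 7
print(shift_add_multiply(x, 35), (x << 5) + (x << 1) + x)   # both are 35 * x
```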
Figure BDA0002869567740000111
In the shift operation unit, x × 2^-(4i+6) is x shifted right by (4i + 6) bits, and so on. All 16 groups of possible result values are calculated in parallel, and finally the address selector, driven by the input S value, selects the result of this round of four-prediction for output.
The floating-point regularization module receives the sum of the x and y results of the mantissa iteration part, which gives the 2^X value. Three items, namely the shifted value of X, the sign bit S, and the exponent part Y, must then be processed together to output a standard floating-point number.
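Formula (23) is only available as an image, so the following is a reconstruction by composing four basic hyperbolic CORDIC steps, not a transcription of the patent's formula. It uses exact fractions to check that a merged update built from the coefficient forms 2^-(2i+5), 2^-(4i+6), 2^-(i+3), and 2^-(3i+6) matches four sequential steps; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def four_step_sequential(x, y, i, sigmas):
    """Apply the basic step four times, for indices i, i+1, i+2, i+3."""
    for k, s in enumerate(sigmas):
        t = Fraction(1, 2 ** (i + k))
        x, y = x + s * y * t, y + s * x * t
    return x, y


def four_step_merged(x, y, i, sigmas):
    """Same result in one step, using small integer constants and shifts."""
    s0, s1, s2, s3 = sigmas
    # Pair products; the magnitude is one of 1, 9, 15, 19, 21, 35.
    cx = 16*s0*s1 + 8*s0*s2 + 4*s0*s3 + 4*s1*s2 + 2*s1*s3 + s2*s3
    # Single and triple products; each is an odd number between -15 and 15.
    c1 = 8*s0 + 4*s1 + 2*s2 + s3
    c3 = 8*s0*s1*s2 + 4*s0*s1*s3 + 2*s0*s2*s3 + s1*s2*s3
    a = 1 + cx * Fraction(1, 2 ** (2*i + 5)) + s0*s1*s2*s3 * Fraction(1, 2 ** (4*i + 6))
    b = c1 * Fraction(1, 2 ** (i + 3)) + c3 * Fraction(1, 2 ** (3*i + 6))
    return a * x + b * y, a * y + b * x


def all_outputs(x, y, i):
    """output0..output15, ordered from [-1,-1,-1,-1] to [1,1,1,1]."""
    return [four_step_merged(x, y, i, s) for s in product((-1, 1), repeat=4)]


outputs = all_outputs(Fraction(3, 2), Fraction(1, 4), 5)
address = 0b1010                      # S = [1, -1, 1, -1], with 1 encoded as bit 1
print(outputs[address])
```

Indexing the list with the 4-bit encoding of S models the address selector: all 16 candidates exist before S is known, and S only picks one.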
The specific structure of the floating point regularization module is shown in fig. 7, and the main steps are as follows:
(1) The X value produced by the mantissa iteration part may contain leading zeros. Leading-zero detection is performed on X to obtain the number n of leading zeros; X is shifted left by n bits to eliminate them, and the exponent Y is adjusted by n to compensate.
(2) A 128-bit floating-point number has only 112 mantissa bits, so the shifted data must be rounded and truncated to 112 bits.
(3) The rounded mantissa M is combined with Y and S and output as a standard floating-point number; the regularized exponent is calculated as Y + 16383.
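The three steps can be sketched on integers as follows. This is a minimal sketch under stated assumptions: x is a 136-bit unsigned fixed-point value whose top bit has weight 2^0, rounding is round-half-up, and exceptional cases (zero, overflow, subnormals) are ignored. None of these details are specified in the text, and the function name is hypothetical.

```python
BIAS = 16383        # exponent bias of the 128-bit format
MANT_BITS = 112     # stored mantissa bits


def regularize(sign, x, y, width=136):
    """Pack (sign, x, y) into a 128-bit floating-point bit pattern."""
    if x == 0:
        raise ValueError("a zero mantissa is not handled in this sketch")
    n = width - x.bit_length()                 # step (1): count leading zeros
    x <<= n                                    # leading one moves to the top bit
    y -= n                                     # compensate in the exponent
    drop = width - 1 - MANT_BITS               # step (2): bits below the mantissa
    frac = x & ((1 << (width - 1)) - 1)        # remove the implicit leading one
    rounded = (frac + (1 << (drop - 1))) >> drop
    if rounded >> MANT_BITS:                   # rounding carried into the leading one
        rounded = 0
        y += 1
    return (sign << 127) | ((y + BIAS) << MANT_BITS) | rounded   # step (3)


# 1.5 in the assumed layout is binary 1.1000..., i.e. 3 << 134.
print(hex(regularize(0, 3 << 134, 0)))
```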
In this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process or method.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (9)

1. A method for performing floating-point exponent operations, comprising the steps of:
S1, preprocessing: this step comprises exception handling and input-value preprocessing. The input floating-point value M × 2^E is first split into a sign bit S, an exponent E, and a mantissa M, and the exponent E is used to detect whether the input is exceptional. If the input is not exceptional, let the mantissa part of the result of the floating-point exponential function of M × 2^E be X and its exponent part be Y; Y is obtained by multiplying the mantissa M by (ln2)^-1 and shifting left by E bits, and, in accordance with the convergence requirement of the CORDIC algorithm, the mantissa part X is obtained by solving for 2^X, which satisfies the formula
Figure FDA0002869567730000011
A 148-bit-wide value num1 is defined according to 2^X, specifically num1 = M × 2^E - Y × ln2 + ln2, and num1 is shifted and concatenated through a multiplier to obtain num2;
S2, exponential function mantissa iteration: num2 obtained by the preprocessing of S1 is received, and the four-prediction CORDIC algorithm performs four-prediction iterative calculations on the X data path, the Y data path, and the Z data path for the input value num2 to obtain 136-bit x and y output values; the x value and the y value are input to an adder to obtain the 2^X result, from which the mantissa value X of the calculation result is obtained;
S3, floating-point regularization: leading-zero detection is performed on the mantissa value X and the exponent value Y obtained in steps S1 and S2, which are then converted to the standard floating-point format by shift operations and finally combined with the sign bit S for output in normalized format.
2. The method of claim 1, wherein the formula of the four-prediction CORDIC algorithm in step S2 is:
Figure FDA0002869567730000012
wherein σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3} are sign factors taking the value -1 or 1 that represent the predicted rotation directions of the current iteration, i is the index of the four-prediction iteration, θ_i, θ_{i+1}, θ_{i+2}, θ_{i+3} are the four rotation angles, x_i, y_i, z_i are the initial values of the i-th four-prediction iteration for the X data path, the Y data path, and the Z data path, and x_{i+4}, y_{i+4}, z_{i+4} are the results of the i-th four-prediction iteration for the X data path, the Y data path, and the Z data path.
3. The method of claim 2, wherein the four-prediction iterative calculation on the X data path, the Y data path, and the Z data path by the four-prediction CORDIC algorithm in step S2 is implemented as:
(1) σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3} each take the value -1 or 1, forming 16 combinations, and the Z data path calculates the corresponding z value for each of the 16 combinations;
(2) the value closest to 0 among the 16 z results is selected as the value of the current four-prediction iteration, namely ite_z, and the corresponding σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3} are output to the X data path and the Y data path as the predicted rotation direction S = [σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3}] of the current four-prediction iteration; the X data path and the Y data path calculate the x value and the y value of the current four-prediction iteration according to the received rotation direction S = [σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3}].
4. The method of claim 2, wherein a second implementation of the four-prediction iterative calculation on the X data path, the Y data path, and the Z data path by the four-prediction CORDIC algorithm in step S2 is:
(1) σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3} each take the value -1 or 1, forming 16 combinations; the X data path, the Y data path, and the Z data path each calculate in parallel for every combination, simultaneously generating 16 groups of iteration results;
(2) the value closest to 0 among the 16 results of the Z data path is selected as the value of the current four-prediction iteration, namely ite_z, and the X data path result x and the Y data path result y generated for the σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3} corresponding to ite_z are taken directly as the result of the current four-prediction iteration.
5. An apparatus for performing a floating-point exponent operation, the apparatus comprising:
a preprocessing module, used for exception handling and input-value preprocessing of the input to the floating-point exponential function: the input floating-point value M × 2^E is first split into a sign bit S, an exponent E, and a mantissa M, and the exponent E is used to detect whether the input is exceptional; if the input is not exceptional, let the mantissa part of the result of the floating-point exponential function of M × 2^E be X and its exponent part be Y; Y is obtained by multiplying the mantissa M by (ln2)^-1 and shifting left by E bits, and, in accordance with the convergence requirement of the CORDIC algorithm, the mantissa part X is obtained by solving for 2^X, which satisfies the formula
Figure FDA0002869567730000021
A 148-bit-wide value num1 is defined according to 2^X, specifically num1 = M × 2^E - Y × ln2 + ln2, and num1 is shifted and concatenated through a multiplier to obtain num2;
an exponential function mantissa iteration module, used for receiving num2 obtained by the preprocessing of S1 and performing, through the four-prediction CORDIC algorithm, four-prediction iterative calculations on the X data path, the Y data path, and the Z data path for the input value num2 to obtain 136-bit x and y output values; the x value and the y value are input to an adder to obtain the 2^X result, from which the mantissa value X of the calculation result is obtained;
a floating-point regularization module, used for performing leading-zero detection on the mantissa value X and the exponent value Y obtained in steps S1 and S2, converting them to the standard floating-point format by shift operations, and finally combining them with the sign bit S for output in normalized format.
6. The apparatus of claim 5, wherein the formula of the four-prediction CORDIC algorithm in the exponential function mantissa iteration module is:
Figure FDA0002869567730000031
wherein σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3} are sign factors taking the value -1 or 1 that represent the predicted rotation directions of the current iteration, i is the index of the four-prediction iteration, θ_i, θ_{i+1}, θ_{i+2}, θ_{i+3} are the four rotation angles, x_i, y_i, z_i are the initial values of the i-th four-prediction iteration for the X data path, the Y data path, and the Z data path, and x_{i+4}, y_{i+4}, z_{i+4} are the results of the i-th four-prediction iteration for the X data path, the Y data path, and the Z data path.
7. The apparatus of claim 6, wherein the four-prediction iterative calculation on the X data path, the Y data path, and the Z data path by the four-prediction CORDIC algorithm in the exponential function mantissa iteration module is implemented as:
(1) σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3} each take the value -1 or 1, forming 16 combinations, and the Z data path calculates the corresponding z value for each of the 16 combinations;
(2) the value closest to 0 among the 16 z results is selected as the value of the current four-prediction iteration, namely ite_z, and the corresponding σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3} are output to the X data path and the Y data path as the predicted rotation direction S = [σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3}] of the current four-prediction iteration; the X data path and the Y data path calculate the x value and the y value of the current four-prediction iteration according to the received rotation direction S = [σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3}].
8. The apparatus of claim 6, wherein a second implementation of the four-prediction iterative calculation on the X data path, the Y data path, and the Z data path by the four-prediction CORDIC algorithm in the exponential function mantissa iteration module is:
(1) σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3} each take the value -1 or 1, forming 16 combinations; the X data path, the Y data path, and the Z data path each calculate in parallel for every combination, simultaneously generating 16 groups of iteration results;
(2) the value closest to 0 among the 16 results of the Z data path is selected as the value of the current four-prediction iteration, namely ite_z, and the X data path result x and the Y data path result y generated for the σ_i, σ_{i+1}, σ_{i+2}, σ_{i+3} corresponding to ite_z are taken directly as the result of the current four-prediction iteration.
9. The apparatus of claim 5, wherein the exponential function mantissa iteration module comprises: a four-prediction Z data path calculation unit; a single-iteration Z data path calculation unit; a four-prediction X data path calculation unit; a single-iteration X data path calculation unit; a four-prediction Y data path calculation unit; a single-iteration Y data path calculation unit; an iteration loop control unit, which performs iteration loop control using a counter and determines whether each iteration is a four-prediction iteration or a single iteration; a lookup table unit, which outputs the iteration angle value required by the calculation according to the output of the iteration loop control unit; and a selector unit, which selects between the results of the four-prediction Z data path calculation unit and the single-iteration Z data path calculation unit, between the results of the four-prediction X data path calculation unit and the single-iteration X data path calculation unit, and between the results of the four-prediction Y data path calculation unit and the single-iteration Y data path calculation unit.
CN202011592456.3A 2020-12-29 2020-12-29 Method and device for executing floating-point exponential operation Pending CN112783469A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011592456.3A CN112783469A (en) 2020-12-29 2020-12-29 Method and device for executing floating-point exponential operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011592456.3A CN112783469A (en) 2020-12-29 2020-12-29 Method and device for executing floating-point exponential operation

Publications (1)

Publication Number Publication Date
CN112783469A true CN112783469A (en) 2021-05-11

Family

ID=75753237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011592456.3A Pending CN112783469A (en) 2020-12-29 2020-12-29 Method and device for executing floating-point exponential operation

Country Status (1)

Country Link
CN (1) CN112783469A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201140A (en) * 2021-12-16 2022-03-18 千芯半导体科技(北京)有限公司 Exponential function processing unit, method and neural network chip

Similar Documents

Publication Publication Date Title
CN105468331B (en) Independent floating point conversion unit
US7921149B2 (en) Division and square root arithmetic unit
US20210349692A1 (en) Multiplier and multiplication method
KR100824189B1 (en) Processing method and computer system for summation of floating point data
KR100241076B1 (en) Floating- point multiply-and-accumulate unit with classes for alignment and normalization
US5993051A (en) Combined leading one and leading zero anticipator
JPH0542011B2 (en)
US6988119B2 (en) Fast single precision floating point accumulator using base 32 system
JP5640081B2 (en) Integer and multiply-add operations with saturation
US6728739B1 (en) Data calculating device and method for processing data in data block form
CN111936965A (en) Random rounding logic
JP3345894B2 (en) Floating point multiplier
CN116400883A (en) Floating point multiply-add device capable of switching precision
JP2585649B2 (en) Division circuit
US20020129075A1 (en) Apparatus and method of performing addition and rounding operation in parallel for floating-point arithmetic logical unit
US20040267853A1 (en) Method and apparatus for implementing power of two floating point estimation
US5408426A (en) Arithmetic unit capable of performing concurrent operations for high speed operation
US6847986B2 (en) Divider
CN112783469A (en) Method and device for executing floating-point exponential operation
CN116643718B (en) Floating point fusion multiply-add device and method of pipeline structure and processor
US5170371A (en) Method and apparatus for rounding in high-speed multipliers
CN116450085A (en) Extensible BFLoat 16-point multiplication arithmetic unit and microprocessor
US20050188000A1 (en) Adder
CN112860218B (en) Mixed precision arithmetic unit for FP16 floating point data and INT8 integer data operation
CN112783470A (en) Device and method for executing floating point logarithm operation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination