CN112783469A - Method and device for executing floating-point exponential operation - Google Patents
- Publication number
- CN112783469A (application CN202011592456.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
- G06F7/485—Adding; Subtracting
- G06F7/487—Multiplying; Dividing
Abstract
The invention discloses a method and a device for performing a floating-point exponential operation. A preprocessing module performs exception handling on the input of the floating-point exponential function and preprocesses it with a multiplier to obtain an output value num2 and the exponent part Y of the result of the floating-point exponential function. An exponential function mantissa iteration module applies the four-prediction CORDIC algorithm to the output value num2 of the preprocessing module, performing four-prediction iterative calculation on the X data path, the Y data path and the Z data path to obtain an x value and a y value, which are input into an adder to obtain the mantissa part X of the result of the floating-point exponential function. A floating-point regularization module performs leading-zero detection on the mantissa value X and the exponent value Y, converts them into the standard floating-point format through a shift operation, and finally combines them with the sign bit S for output in normalized format. The method uses the four-prediction CORDIC algorithm for the exponential operation of floating-point numbers and predicts the directions of four iterations at once, which greatly reduces the number of iterations and shortens the calculation period of the exponential function.
Description
Technical Field
The present invention relates to the field of exponentiation, and more particularly, to an apparatus and method for performing floating-point exponentiation.
Background
Many practical applications require the calculation of an exponential function. Examples include aircraft control, voice transmission and navigation in the aviation field; image and real-time information transmission in the aerospace field; and interest and repayment calculations in the financial field, which also involve high-precision floating-point exponential functions. Improving the precision and speed of the exponential operation is therefore of great significance for practical applications.
In integrated circuit design, because of constraints such as the manufacturing process and chip area, a traditional floating-point exponential hardware unit is simple in structure and slow, and can hardly meet the calculation requirement. In practice the floating-point exponential operation is therefore realized by combining several mathematical transformations with software. This approach is easy to implement, but its efficiency is low, and under high-precision requirements it becomes a calculation bottleneck that is difficult to break through.
Disclosure of Invention
To address the above problems, the invention provides a method and a device for performing a floating-point exponential operation. They use an improved four-prediction CORDIC algorithm that predicts four iteration direction values in one clock cycle, improving efficiency fourfold.
The technical scheme of the invention is as follows: a method for performing a floating-point exponent operation is provided, comprising the steps of:
S1, preprocessing: this step comprises exception handling and input value preprocessing. The input floating-point number $M \times 2^{E}$ is first split into a sign bit S, an exponent E and a mantissa M, and the exponent E is used to detect whether the input is abnormal. If the input is not abnormal, let the mantissa part of the result of the floating-point exponential function $e^{M \times 2^{E}}$ be X and its exponent part be Y. Y is obtained by multiplying the mantissa M by $(\ln 2)^{-1}$ and shifting left by E bits. To meet the convergence requirement of the CORDIC algorithm, the mantissa part X is obtained by solving for 2X, which satisfies $2X = e^{M \times 2^{E} - Y\ln 2 + \ln 2}$. A 148-bit-wide value num1 is defined from 2X, specifically $\mathrm{num1} = M \times 2^{E} - Y \times \ln 2 + \ln 2$; num1, computed with a multiplier, is shifted and spliced to obtain num2;
S2, exponential function mantissa iteration: num2 obtained by the preprocessing in S1 is received, and the four-prediction CORDIC algorithm performs four-prediction iterative calculation on the X data path, the Y data path and the Z data path for the input value num2 to obtain 136-bit output x and y values. The x value and the y value are input into an adder to obtain the result 2X, from which the mantissa value X of the calculation result is obtained;
S3, floating-point regularization: leading-zero detection is performed on the mantissa value X and the exponent value Y obtained in steps S1 and S2; they are then converted into the standard floating-point format through a shift operation and finally combined with the sign bit S for output in normalized format.
Further, the calculation formula of the four-prediction CORDIC algorithm in step S2 is as follows:
where $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are sign factors taking the value -1 or 1 and represent the predicted rotation directions of the current iteration; i is the index of the four-prediction iteration; $\theta_i, \theta_{i+1}, \theta_{i+2}, \theta_{i+3}$ are the four rotation angles; $x_i, y_i, z_i$ are the initial values of the i-th four-prediction iteration on the X data path, the Y data path and the Z data path; and $x_{i+4}, y_{i+4}, z_{i+4}$ are the results of the i-th four-prediction iteration on the X data path, the Y data path and the Z data path.
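One way to write a four-step update that is consistent with the symbol definitions above is sketched below. It is an assumption, not the formula of the original filing: it takes the standard hyperbolic CORDIC micro-rotation with $\theta_j = \tanh^{-1}(2^{-j})$ and simply composes four of them; the matrix name $R_j$ is introduced here for illustration only.

```latex
% Assumed per-step matrix of a standard hyperbolic CORDIC micro-rotation
R_j = \begin{pmatrix} 1 & \sigma_j 2^{-j} \\ \sigma_j 2^{-j} & 1 \end{pmatrix},
\qquad \theta_j = \tanh^{-1}\!\left(2^{-j}\right)

% Four composed steps on the X and Y data paths
\begin{pmatrix} x_{i+4} \\ y_{i+4} \end{pmatrix}
  = R_{i+3}\, R_{i+2}\, R_{i+1}\, R_i
    \begin{pmatrix} x_i \\ y_i \end{pmatrix}

% Four composed steps on the Z data path
z_{i+4} = z_i - \sigma_i\theta_i - \sigma_{i+1}\theta_{i+1}
              - \sigma_{i+2}\theta_{i+2} - \sigma_{i+3}\theta_{i+3}
```

Under this assumption the Z data path needs only the four signs and four table angles, which is why the sign pattern can be chosen from z alone before x and y are updated.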
Further, the four-prediction iterative calculation on the X data path, the Y data path and the Z data path by the four-prediction CORDIC algorithm in step S2 is implemented as follows:
(1) $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are traversed, each taking the value -1 or 1, which gives 16 combinations of $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$; the Z data path calculates the corresponding z value for each of the 16 combinations;
(2) among the 16 results of z, the one closest to 0 is selected as the value ite_z of the current four-prediction iteration, and the corresponding $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are output to the X and Y data paths as the predicted rotation direction $S = [\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}]$ of the current four-prediction iteration; the X and Y data paths then calculate the x value and the y value of the current four-prediction iteration according to the rotation direction S passed to them.
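The two steps above can be sketched in software as follows. This is a minimal floating-point model, not the hardware design: the function name `four_prediction_step`, the use of `math.atanh(2**-j)` for the table angles, and the shift-add form of the X and Y updates are assumptions taken from the standard hyperbolic CORDIC iteration.

```python
import itertools
import math

def four_prediction_step(x, y, z, indices):
    """One four-prediction step: choose the sign pattern on the Z path,
    then apply the four selected micro-rotations on the X and Y paths."""
    thetas = [math.atanh(2.0 ** -j) for j in indices]  # assumed table angles

    def z_after(signs):
        return z - sum(s * t for s, t in zip(signs, thetas))

    # Z data path: evaluate all 16 sign patterns, keep the result nearest 0.
    best = min(itertools.product((-1, 1), repeat=4),
               key=lambda signs: abs(z_after(signs)))
    ite_z = z_after(best)

    # X and Y data paths: apply the selected rotation direction S.
    for s, j in zip(best, indices):
        shift = 2.0 ** -j
        x, y = x + s * shift * y, y + s * shift * x
    return x, y, ite_z, best

x1, y1, z1, signs = four_prediction_step(1.0, 0.0, 0.5, (1, 2, 3, 4))
```

In this model the X and Y updates cannot start until the Z selection has finished, which mirrors the dependency of the first implementation.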
Further, a second implementation of the four-prediction iterative calculation on the X data path, the Y data path and the Z data path by the four-prediction CORDIC algorithm in step S2 is as follows:
(1) $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are traversed, each taking the value -1 or 1, which gives 16 combinations of $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$; the X data path, the Y data path and the Z data path each calculate in parallel for every combination, so that 16 groups of iteration results are generated simultaneously;
(2) among the 16 results of the Z data path, the one closest to 0 is selected as the value ite_z of the current four-prediction iteration; the x value and the y value already generated by the X data path and the Y data path for the $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ corresponding to ite_z are then taken directly as the result of the current four-prediction iteration.
In another aspect of the present invention, an apparatus for performing a floating-point exponent operation is provided, including:
a preprocessing module, used for exception handling and input value preprocessing of the floating-point exponential function: the input floating-point number $M \times 2^{E}$ is first split into a sign bit S, an exponent E and a mantissa M, and the exponent E is used to detect whether the input is abnormal. If the input is not abnormal, let the mantissa part of the result of the floating-point exponential function $e^{M \times 2^{E}}$ be X and its exponent part be Y. Y is obtained by multiplying the mantissa M by $(\ln 2)^{-1}$ and shifting left by E bits. To meet the convergence requirement of the CORDIC algorithm, the mantissa part X is obtained by solving for 2X, which satisfies $2X = e^{M \times 2^{E} - Y\ln 2 + \ln 2}$. A 148-bit-wide value num1 is defined from 2X, specifically $\mathrm{num1} = M \times 2^{E} - Y \times \ln 2 + \ln 2$; num1, computed with a multiplier, is shifted and spliced to obtain num2;
an exponential function mantissa iteration module, used for receiving num2 obtained by preprocessing and performing four-prediction iterative calculation on the X data path, the Y data path and the Z data path for the input value num2 through the four-prediction CORDIC algorithm to obtain 136-bit output x and y values; the x value and the y value are input into an adder to obtain the result 2X, from which the mantissa value X of the calculation result is obtained;
a floating-point regularization module, used for performing leading-zero detection on the mantissa value X and the exponent value Y obtained by the preceding processing, converting them into the standard floating-point format through a shift operation, and finally combining them with the sign bit S for output in normalized format.
Further, the calculation formula of the four-prediction CORDIC algorithm in the exponential function mantissa iteration module is as follows:
where $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are sign factors taking the value -1 or 1 and represent the predicted rotation directions of the current iteration; i is the index of the four-prediction iteration; $\theta_i, \theta_{i+1}, \theta_{i+2}, \theta_{i+3}$ are the four rotation angles; $x_i, y_i, z_i$ are the initial values of the i-th four-prediction iteration on the X data path, the Y data path and the Z data path; and $x_{i+4}, y_{i+4}, z_{i+4}$ are the results of the i-th four-prediction iteration on the X data path, the Y data path and the Z data path.
Further, the four-prediction iterative calculation on the X data path, the Y data path and the Z data path by the four-prediction CORDIC algorithm in the exponential function mantissa iteration module is implemented as follows:
(1) $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are traversed, each taking the value -1 or 1, which gives 16 combinations of $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$; the Z data path calculates the corresponding z value for each of the 16 combinations;
(2) among the 16 results of z, the one closest to 0 is selected as the value ite_z of the current four-prediction iteration, and the corresponding $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are output to the X and Y data paths as the predicted rotation direction $S = [\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}]$ of the current four-prediction iteration; the X and Y data paths then calculate the x value and the y value of the current four-prediction iteration according to the rotation direction S passed to them.
Further, a second implementation of the four-prediction iterative calculation on the X data path, the Y data path and the Z data path by the four-prediction CORDIC algorithm in the exponential function mantissa iteration module is as follows:
(1) $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are traversed, each taking the value -1 or 1, which gives 16 combinations of $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$; the X data path, the Y data path and the Z data path each calculate in parallel for every combination, so that 16 groups of iteration results are generated simultaneously;
(2) among the 16 results of the Z data path, the one closest to 0 is selected as the value ite_z of the current four-prediction iteration; the x value and the y value already generated by the X data path and the Y data path for the $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ corresponding to ite_z are then taken directly as the result of the current four-prediction iteration.
Further, the exponential function mantissa iteration module comprises: a four-prediction Z data path calculation unit; a single-iteration Z data path calculation unit; a four-prediction X data path calculation unit; a single-iteration X data path calculation unit; a four-prediction Y data path calculation unit; a single-iteration Y data path calculation unit; an iteration loop control unit, which uses a counter to control the iteration loop and determines whether each iteration is a four-prediction iteration or a single iteration; a lookup table unit, which supplies the required iteration angle values according to the output of the iteration loop control unit; and a selector unit, which selects between the results of the four-prediction and single-iteration Z data path calculation units, between the results of the four-prediction and single-iteration X data path calculation units, and between the results of the four-prediction and single-iteration Y data path calculation units.
The method and the device for executing the floating-point exponential operation, provided by the invention, realize 128-bit high-precision floating-point exponential function operation, and have the beneficial effects that:
1. The improved four-prediction CORDIC algorithm is used for the exponential operation of floating-point numbers; the directions of four iterations are predicted in one clock cycle, which greatly reduces the number of iterations and shortens the calculation period of the exponential function.
2. A floating-point exponential function arithmetic device is designed that delivers full 113-bit precision output when both the input and the output are 128-bit floating-point numbers.
Drawings
FIG. 1 is a diagram illustrating a flow of floating-point exponential function operations according to an embodiment of the present invention;
FIG. 2 is a simplified structural diagram of an apparatus for implementing floating-point exponential function operation according to an embodiment of the present invention;
FIG. 3 is a schematic flow diagram of a pre-processing module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a logic control method for four iterations of the predictive CORDIC algorithm according to an embodiment of the present invention;
FIG. 5 is a block diagram illustrating an exponential function mantissa iteration module according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating an iterative X data path of a four-prediction CORDIC algorithm according to an embodiment of the present invention;
FIG. 7 is a block diagram of a floating point regularization module according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical scheme of the present invention in detail, the present embodiment is implemented on the premise of the technical scheme of the present invention, and detailed implementation modes and specific steps are given.
According to the IEEE 754 floating-point standard, the floating-point number whose exponential is to be calculated in this embodiment is $M \times 2^{E}$. Assume that the mantissa part of the calculation result is X and the exponent part is Y. The calculation result of the corresponding exponential function is then given by formula (1):
$$e^{M \times 2^{E}} = X \times 2^{Y} \tag{1}$$
Taking the natural logarithm of both sides gives formula (2):
$$M \times 2^{E} = \ln X + Y\ln 2 \tag{2}$$
Rearranging formula (2) yields formula (3):
$$Y = \frac{M \times 2^{E} - \ln X}{\ln 2} \tag{3}$$
Since Y is an integer, its value can be obtained from formula (3), and only the value of X remains to be found. X must first be preprocessed. Assume that:
$$2^{Y} = e^{Z} \tag{4}$$
Then:
$$Z = Y\ln 2 \tag{5}$$
which gives the expression for X, formula (6):
$$X = e^{M \times 2^{E} - Y\ln 2} \tag{6}$$
where Y is obtained from formula (3) and the value range of X is 0.5 < X < 1.
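The decomposition into X and Y can be checked numerically with a short sketch. It works in double precision rather than the 128-bit format of the embodiment, and the helper name `split_result` is hypothetical; it uses the fact that 0.5 < X < 1 places $M \times 2^{E} / \ln 2$ between Y - 1 and Y.

```python
import math

def split_result(v):
    """Return (X, Y) such that exp(v) == X * 2**Y with X in [0.5, 1).

    v stands for the input value M * 2**E.
    """
    y = math.floor(v / math.log(2)) + 1   # Y - 1 <= v / ln 2 < Y
    x = math.exp(v - y * math.log(2))     # X = e^(v - Y ln 2)
    return x, y

for v in (0.3, 5.0, -2.75):
    x, y = split_result(v)
    # X * 2^Y reproduces e^v, and X stays in the stated range.
    assert 0.5 <= x <= 1.0
    assert math.isclose(x * 2.0 ** y, math.exp(v), rel_tol=1e-12)
```

Only X then requires an iterative algorithm; Y comes from one multiplication and a shift.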
From the range of X, the value $M \times 2^{E} - Y\ln 2$ must fall within the convergence region of CORDIC, [-1.1182, 1.1182]. In this embodiment an improved four-prediction CORDIC algorithm is used to calculate X. It is based on the rotation mode of the CORDIC algorithm in hyperbolic coordinates, for which the corresponding CORDIC hyperbolic iteration formula is:
$$\begin{cases} x_{i+1} = x_i + \sigma_i 2^{-i} y_i \\ y_{i+1} = y_i + \sigma_i 2^{-i} x_i \\ z_{i+1} = z_i - \sigma_i \theta_i \end{cases} \qquad \theta_i = \tanh^{-1}\!\left(2^{-i}\right) \tag{7}$$
The rotation direction $\sigma_i$ is determined by:
$$\sigma_i = \begin{cases} 1, & z_i \ge 0 \\ -1, & z_i < 0 \end{cases} \tag{8}$$
Thus, after N iterations one obtains:
$$\begin{cases} x_N = K\,(x_0\cosh z_0 + y_0\sinh z_0) \\ y_N = K\,(y_0\cosh z_0 + x_0\sinh z_0) \\ z_N = 0 \end{cases} \tag{9}$$
where K is the rotation compensation factor. From the definition of the hyperbolic functions:
$$e^{\theta} = \cosh\theta + \sinh\theta \tag{10}$$
The value of $e^{\theta}$ is therefore calculated by adding the iterated x and y values and multiplying by the correction factor that compensates for K.
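A minimal double-precision sketch of the conventional (one direction per step) hyperbolic CORDIC in rotation mode illustrates how $e^{\theta}$ follows from x + y. Several details are assumptions rather than statements of the embodiment: the repeat indices 4, 13, 40 are the usual textbook choice, the gain is accumulated as the product of $\sqrt{1 - 2^{-2i}}$ over the executed steps, and the names `cordic_schedule` and `cordic_exp` are hypothetical.

```python
import math

def cordic_schedule(n_max=40, repeats=(4, 13, 40)):
    """Iteration indices 1..n_max, with the repeat indices executed twice."""
    seq = []
    for i in range(1, n_max + 1):
        seq.append(i)
        if i in repeats:
            seq.append(i)
    return seq

def cordic_exp(theta, n_max=40):
    """Approximate e**theta for |theta| inside the convergence region."""
    x, y, z = 1.0, 0.0, theta
    gain = 1.0
    for i in cordic_schedule(n_max):
        sigma = 1.0 if z >= 0 else -1.0       # rotation direction from z
        t = 2.0 ** -i
        x, y = x + sigma * t * y, y + sigma * t * x
        z -= sigma * math.atanh(t)            # angle from the lookup table
        gain *= math.sqrt(1.0 - t * t)        # accumulated scale factor
    return (x + y) / gain                     # cosh + sinh after correction

approx = cordic_exp(0.5)
```

Each pass through the loop resolves roughly one bit, which is the iteration count that the four-prediction scheme aims to cut down.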
The CORDIC algorithm in hyperbolic coordinates has the drawback that it needs too many iterations at high precision, so the new four-prediction CORDIC algorithm is adopted for the hyperbolic coordinate system. It effectively reduces the number of CORDIC iterations and shortens the time of the exponential operation. The improved algorithm predicts four iteration direction values in one clock cycle, a fourfold improvement in efficiency, and each direction value is judged from the differences between the four predicted angle values and the z value, which effectively reduces the error.
Based on the traditional CORDIC iteration equations, four σ values ($\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$) are predicted in each clock cycle, and substituting them in turn gives the four-prediction CORDIC algorithm, formula (11):
Since each of $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ takes only the value -1 or 1, there are only 16 case branches. The first combination scheme in rotation mode is therefore as follows: the 16 corresponding z values are calculated, and the z result closest to 0 among them is selected as the z result of the current iteration. The corresponding $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are taken as the predicted rotation direction of the current iteration and passed to the X and Y data paths, which calculate the x value and the y value of the current four-prediction iteration according to the predicted rotation direction $S = [\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}]$. This completes the current four-prediction iteration.
When one of $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ is 1, the rotation is in the positive direction, that is, toward the required angle, and then:
$$z_{i+1} = z_i - \sigma_i\theta_i \tag{12}$$
When one of $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ is -1, the rotation is in the opposite direction, that is, away from the required angle, and then:
$$z_{i+1} = z_i + \sigma_i\theta_i \tag{13}$$
For data of bit width n, the conventional CORDIC algorithm needs n iterations, whereas the four-prediction CORDIC algorithm reduces the number of iterations required to about n/4. This addresses the biggest drawback of the CORDIC algorithm, its latency, and greatly shortens the calculation time.
In an embodiment of the method for performing a floating-point exponential operation, the operation flow of the floating-point exponential function is shown in FIG. 1. Both the input and the final result of the floating-point exponential function are floating-point numbers. When a 128-bit floating-point number num_in is input, it first enters the preprocessing module, which checks the input for exceptions. If the input does not satisfy the floating-point standard, an exception is raised and the sign signal is 0. If the input is valid, sign is 1 and preprocessing continues: the 128-bit floating-point input num_in is split into a sign bit S, an exponent part E and a mantissa M according to the floating-point representation specification, as shown in formula (14):
The main design idea of the invention is to calculate the value of Y first and then calculate the value of X from the values of Y, M and E. As derived above, X lies in [0.5, 1]. By the properties of the exponential function, the corresponding abscissa of the rotation vector is then negative, which does not meet the convergence requirement of the CORDIC algorithm. 2X, however, lies in [1, 2], where the rotation vector falls within the convergence region of CORDIC, so the value of 2X is calculated instead. The expression for 2X is shown in formula (15):
$$2X = e^{M \times 2^{E} - Y\ln 2 + \ln 2} \tag{15}$$
Define the 148-bit-wide value num1:
$$\mathrm{num1} = M \times 2^{E} - Y \times \ln 2 + \ln 2 \tag{16}$$
To ensure calculation accuracy, the lookup table bit width is defined as 128 bits; allowing for the sign bit and the carry bit, num1 is converted into the 130-bit mantissa iteration input value num2.
num2 is input to the mantissa iteration part, which outputs 136-bit x and y values. Because the iteration involves shift operations, 8 compensation bits are added to the 128-bit word length of x and y to prevent the precision loss caused by shifting. According to the CORDIC hyperbolic model, formula (17) then holds:
$$x = \cosh(\mathrm{num2}), \qquad y = \sinh(\mathrm{num2}) \tag{17}$$
Inputting x and y to the adder gives Mx, as shown in formula (18):
$$Mx = x + y = e^{\mathrm{num2}} = 2X \tag{18}$$
The bit width of Mx is 128. The 2X value calculated at this point is not necessarily a standard mantissa under the floating-point rules, and it may contain leading zeros. The number of leading zeros is therefore detected by the leading-zero detection unit, and Mx is converted into the standard floating-point format through a shift operation; the redundant bit width reduces the precision loss caused by the shift. A corresponding adjustment is applied to Y so that the value of the floating-point number as a whole is unchanged.
A 128-bit floating-point number in the IEEE floating-point specification has only a 112-bit mantissa, so the shifted data must be rounded and truncated to 112 bits and then combined with Y and S to output a standard floating-point number in normalized format.
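The normalization step can be modelled on Python integers as below. This is a sketch under stated assumptions: the function name `normalize` is hypothetical, the mantissa is treated as an unsigned integer of `width` bits, 113 significant bits (hidden bit plus 112 stored bits) are kept, and round-half-up is used because the text does not name a rounding mode.

```python
def normalize(mx, exp, width=128, keep=113):
    """Shift out leading zeros of mx, then round to `keep` significant bits.

    Returns (mantissa, exponent, leading_zero_count).
    """
    if mx == 0:
        raise ValueError("zero mantissa cannot be normalized")
    lz = width - mx.bit_length()         # leading-zero detection
    mx <<= lz                            # left shift so the top bit is 1
    exp -= lz                            # keep the overall value unchanged
    drop = width - keep                  # low bits to be discarded
    rounded = (mx + (1 << (drop - 1))) >> drop   # assumed round-half-up
    if rounded >> keep:                  # rounding carried into a new bit
        rounded >>= 1
        exp += 1
    return rounded, exp, lz

# Small-width example so the bit patterns are easy to follow.
mant, e_out, zeros = normalize(0b00101100, 0, width=8, keep=4)
```

In the small example two leading zeros are shifted out, the exponent drops by two, and the kept mantissa is 0b1011.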
The apparatus for performing a floating-point exponential operation shown in FIG. 2 includes a preprocessing module, used for exception handling and input value preprocessing of the floating-point exponential function. The input floating-point number $M \times 2^{E}$ is first split into a sign bit S, an exponent E and a mantissa M, and the exponent E is used to detect whether the input is abnormal. If the input is not abnormal, let the mantissa part of the result of the floating-point exponential function $e^{M \times 2^{E}}$ be X and its exponent part be Y. Y is obtained by multiplying the mantissa M by $(\ln 2)^{-1}$ and shifting left by E bits. To meet the convergence requirement of the CORDIC algorithm, the mantissa part X is obtained by solving for 2X, which satisfies $2X = e^{M \times 2^{E} - Y\ln 2 + \ln 2}$. A 148-bit-wide value num1 is defined from 2X, specifically $\mathrm{num1} = M \times 2^{E} - Y \times \ln 2 + \ln 2$, and num1 is shifted and spliced to obtain num2.
The specific implementation is as follows. As shown in FIG. 3, exception checking is first performed on the input 128-bit floating-point number. For the exponential function there are only two abnormal input cases, NaN and ∞, and both are detected by first examining E. In FIG. 3, f-NaN indicates that the input value is not a number and f-no indicates that the input value is infinite; both are input errors. When the input is valid, it enters the Y calculation module; after the Y value has been calculated, a multiplier is used to calculate the num2 value, which ends the preprocessing module.
The Y value is calculated as follows. From the derivation above, Y is obtained by multiplying the mantissa M by $(\ln 2)^{-1}$ and shifting left by E bits; the division by ln 2 is thus converted into a multiplication by $(\ln 2)^{-1}$.
Including the hidden precision bit, the mantissa M is actually 113 bits wide. Denote $(\ln 2)^{-1}$ as parameter a.
To avoid loss of precision, the 226-bit-wide data numm is defined as the product of M and a.
From the above derivation:
$$Y - 1 < \mathrm{numm} \times 2^{E} < Y \tag{21}$$
Y is therefore obtained simply by adding 1 to the integer part of numm shifted by E bits. Since the bit width of Y is 15 bits, the exponent E is discussed case by case:
(1) When E > 0, the binary point must be shifted right by E bits. The bit width of Y is 15; if the binary point is shifted to position (225-14), Y can no longer hold the integer part in front of the binary point, which loses precision and gives a wrong result. E < 13 is therefore specified to guarantee accuracy; this restricts E to a small range, and the permitted range of E depends on the data expansion method adopted. Let b = numm[225:(224-E)]; then Y = b + 1 and the mantissa part is numm[(223-E):0];
(2) When E = 0, the two bits in front of the binary point of numm are the integer part. Let c = numm[225:224]; then Y = c + 1 and the mantissa part is numm[223:0];
(3) When E = -1, the binary point is shifted left by one bit. Let d = numm[225]; then Y = d + 1 and the mantissa part is numm[224:0];
(4) When E ≤ -2, there is no significant digit in front of the binary point and Y = 1. The mantissa significand is numm[225:0], preceded by |E + 2| leading zeros.
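The four cases above reduce to a single right shift when numm is held as an integer. The sketch below assumes a fixed-point layout that the text implies but does not spell out: M and a are 113-bit integers with 112 fractional bits, so numm has 2 integer bits and 224 fractional bits. The names `A`, `FRAC` and `exponent_part` are hypothetical, positive inputs and E ≤ 13 are assumed, and `decimal` is used only to obtain $(\ln 2)^{-1}$ to enough digits.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80                  # ample digits for a 113-bit constant
FRAC = 112                              # fractional bits of M and of a
A = int(Decimal(2 ** FRAC) / Decimal(2).ln())   # a = (ln 2)^-1, truncated

def exponent_part(m_int, e):
    """Y = integer part of (numm * 2**e) plus 1, for e <= 13.

    m_int is the 113-bit mantissa with the hidden bit, i.e. M * 2**112.
    """
    numm = m_int * A                    # 226-bit product, 224 fractional bits
    return (numm >> (2 * FRAC - e)) + 1 # drop the fractional bits, add 1

y_example = exponent_part(5 << 110, 2)  # M = 1.25, E = 2, input value 5.0
```

For E ≤ -2 the shift removes every bit of numm, so the expression returns 1, matching case (4).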
The upper limit of the input value E is 16396. After the Y value has been obtained, the next step is to obtain the exponent to be input into the mantissa iteration part in order to solve for X. Since 2X belongs to (1, 2), the exponent part of 2X is calculated, defined as the 148-bit-precision number num1:
$$\mathrm{num1} = M \times 2^{E} - Y \times \ln 2 + \ln 2 \tag{22}$$
num1 is shifted and spliced to obtain num2, which completes the preprocessing.
The exponential function mantissa iteration module receives num2 obtained by the preprocessing module and applies the four-step parallel-branch CORDIC algorithm to it. Specifically, the four-prediction CORDIC algorithm performs four-prediction iterative calculation on the X data path, the Y data path and the Z data path to obtain 136-bit output x and y values, which are input into an adder to obtain the result 2X, from which the mantissa value X of the calculation result is obtained.
The specific implementation is as follows. The mantissa iteration module receives the result num2 of the preprocessing module and implements the four-prediction CORDIC algorithm in hardware. By iterating x, y and z, it finally obtains the values cosh(num2) and sinh(num2) corresponding to the input value num2, and passes them to the addition and combination module, whose addition gives the exponential value corresponding to num2, that is, the 2X value.
In the preferred embodiment, to reduce the computation delay of the iterative algorithm, the four-prediction CORDIC algorithm uses parallel computation on the Z data path: all 16 possible combinations of the rotation directions σ are computed in parallel, the combination $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ that brings z closest to 0 is selected, and the X data path and the Y data path are iterated according to this value. In this way four rotation directions are predicted in a single operation.
In the preferred embodiment, in the rotation mode of the four-prediction CORDIC algorithm the iteration directions of x and y are determined by the value of z, so x and y must wait for the z result. If the computation delay of one four-step iteration on a single coordinate is $t_{ite}$, at least $2t_{ite}$ is needed to complete one full-coordinate iteration. Although this already greatly reduces the delay compared with the conventional CORDIC algorithm, there is still room for optimization. To further reduce the delay and improve efficiency, while z is being computed to obtain the current $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$, x and y are each also computed for all 16 combinations in the same manner, so that x, y and z produce 16 sets of iteration results simultaneously. The final iteration results of x and y are then obtained directly by multiplexing on the σ values produced when the z computation finishes. This compresses the two computation delays originally required, $2t_{ite}$, into one computation delay plus one multiplexer delay, and the multiplexer delay is negligible compared with that of a complex arithmetic circuit. In effect $2t_{ite}$ is compressed to $t_{ite}$: the computation delay is halved and the computation efficiency is effectively improved.
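A software sketch of one four-step prediction may help here. It enumerates the 16 sign combinations sequentially, where the hardware would evaluate them in parallel, and then picks the candidate whose z is closest to 0, which plays the role of the multiplexer. The function names are hypothetical and each single step is assumed to be the standard hyperbolic CORDIC update.

```python
import itertools
import math

def single_step(x, y, z, i, sigma):
    """One basic hyperbolic CORDIC step with rotation direction sigma (+1 or -1)."""
    t = 2.0 ** -i
    return x + sigma * y * t, y + sigma * x * t, z - sigma * math.atanh(t)

def four_step_candidates(x, y, z, i):
    """All 16 (signs, x, y, z) results of steps i..i+3; parallel branches in hardware."""
    cands = []
    for signs in itertools.product((-1, 1), repeat=4):
        cx, cy, cz = x, y, z
        for k, sigma in enumerate(signs):
            cx, cy, cz = single_step(cx, cy, cz, i + k, sigma)
        cands.append((signs, cx, cy, cz))
    return cands

def four_step_predict(x, y, z, i):
    """Select the candidate whose z is closest to 0 (the multiplexer stage)."""
    return min(four_step_candidates(x, y, z, i), key=lambda c: abs(c[3]))

if __name__ == "__main__":
    signs, x4, y4, z4 = four_step_predict(1.0, 0.0, 0.3, 1)
    print(signs, x4, y4, z4)
```

Because the x and y candidates are already available when the z comparison finishes, only the selection remains on the critical path, which is the latency argument made in the text.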
The logic control method for the four-prediction CORDIC iteration is shown in Fig. 4. To ensure convergence of the algorithm, the iteration is repeated at i = 1, 4, 13, 40, …, 3K+1 (where K is the previous value of i in the sequence). The algorithm uses a counter cnt with initial value 1; on each rising clock edge cnt = cnt + 4, so the four-prediction CORDIC algorithm is executed when cnt (i in the code) takes the values 1, 5, 9, 13, …, 129. The counter values 5, 13, 41 and 121 are the ones closest to the ideal repeat indices, so convergence can be guaranteed by repeating the iteration when cnt is 5, 13, 41 or 121.
In the code the iteration starts at i = 1 and ends at 129. A state machine can be used to implement this iteration control. Four states are assumed:
00: initial state: assign the initial values x = k, y = 0, z = num2, i = 1, and clear x_next, y_next, z_next;
01: four-prediction CORDIC iteration: on each clock i = i + 4. If i = 5 or i = 41, the state goes to 10; if i = 13 or i = 121, the state goes to 11; if i = 129, the iteration ends and the iterated x, y and z values are output.
10: repeated-iteration state for i - 1: the sign for the fourth iteration is determined by examining the second-lowest bit S[1] of the output value S of module Z_pre, and the iteration is repeated once with i = i - 1. If S[1] is 1, σ of the third iteration is 1; if S[1] is 0, σ of the third iteration is -1. The repeat uses the basic single-iteration formula, and the state returns to 01 when the repeated iteration is finished.
11: repeated-iteration state for i: the sign for the fourth iteration is determined by examining the lowest bit S[0] of the output value of module Z_pre, and the iteration is repeated once with i = i. If S[0] is 1, σ of the fourth iteration is 1; if S[0] is 0, σ of the fourth iteration is -1. The repeat uses the basic single-iteration formula, and the state returns to 01 when the repeated iteration is finished.
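The control flow of the four states can be traced with a small schedule generator. This is only a sketch of the counter logic as the state list reads (ending when i reaches 129); it records which operation runs in which order and does not model the datapath or the S-bit sign selection. The function name `iteration_schedule` is hypothetical.

```python
def iteration_schedule(end=129):
    """Return the list of (state, index) operations issued by the loop controller."""
    ops = []
    i = 1                                   # state 00: counter initialised to 1
    while i < end:
        ops.append(("01", i))               # four-prediction step covering i .. i+3
        i += 4
        if i in (5, 41):
            ops.append(("10", i - 1))       # single repeated iteration at index i-1
        elif i in (13, 121):
            ops.append(("11", i))           # single repeated iteration at index i
    return ops

if __name__ == "__main__":
    ops = iteration_schedule()
    print(len(ops), [idx for state, idx in ops if state != "01"])
```

Under this reading the trace contains 32 four-prediction steps and 4 single repeats at indices 4, 13, 40 and 121; the one further cycle counted in the text's total of 37 is not modelled here.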
The structure of the exponential function mantissa iteration module is shown in Fig. 5; the method targets quadruple-precision floating-point data. In this embodiment the maximum bit width of the input data is 128 bits, and the bit width of the input to the mantissa exponential operation unit is 113 bits. Because multiplication and shift operations are required during the calculation, the bit widths of x and y must be larger. Analysis of the 16 parallel operations in the iteration shows that the maximum multiplication factor is 35; with one additional sign bit, at least 8 bits must be prepended to the input mantissa value to meet the needs of the calculation. The lookup table has 128-bit precision, so 128-bit initial values of x and y are taken, 8 leading zeros are prepended, and x and y are iterated as 136-bit data. With this padding, the whole calculation theoretically requires (128/4) + 4 + 1 = 37 iterations. Here 128 is the original number of iterations, the 1 is the one additional four-prediction step needed to reach the final index 129, and the 4 stands for the four single repeated iterations at 4, 13, 40 and 121. Thus 37 clock cycles are ultimately required to reach 113-bit calculation accuracy.
In the four-prediction CORDIC algorithm, to further reduce the computation delay, 16 sets of parallel computations are used for x and y as well as for z, and the x and y outputs are then selected according to the minimum of the z results. The loop control unit determines from the clock whether each iteration is a four-prediction iteration or a single repeated iteration; the four-prediction calculation unit uses the four-prediction CORDIC algorithm, the single repeated iteration calculation unit uses the basic CORDIC algorithm to perform a single repeated iteration, and the lookup table unit outputs the iteration angle values required for the calculation under logic control.
In a specific implementation, the Z data path computes the angle values for each four-step prediction, takes the smallest one as the value of this four-step iteration, i.e. ite_z, and outputs the corresponding $S = [\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}]$ to the X data path and the Y data path. With $[\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}]$ ranging from [-1, -1, -1, -1] to [1, 1, 1, 1], the result with the minimum absolute value among the 16 calculation results is selected and output together with its corresponding $S = [\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}]$.
The X data path and the Y data path iterate on the x and y values and predict the x and y values after four iterations. Taking x as an example (y is handled similarly), as shown in Fig. 6, the input first enters a constant multiplier calculation unit. The coefficients applied to x are the series $2^{-(4i+6)}$ and $2^{-(2i+5)}$, and the coefficients applied to y are the series $2^{-(i+3)}$ and $2^{-(3i+6)}$. Collecting terms from the coefficient table, the multiples needed are x, 9x, 15x, 19x, 21x, 35x and y, 3y, 5y, 7y, 9y, 11y, 13y, 15y. The constant multiples of the input x and y values are computed and passed to the shift unit. The constant calculation unit uses shifting: for example, 35x is the sum of 32x, 2x and x, where 32x is x shifted left by 5 bits and 2x is x shifted left by 1 bit.
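The multiplier sets quoted above can be cross-checked by expanding four consecutive single steps, assuming each step is the standard hyperbolic CORDIC update x ← x + σ·y·2^(-i), y ← y + σ·x·2^(-i). The sketch below enumerates the 16 sign combinations, collects the integer multipliers of the $2^{-(2i+5)}$ term (applied to x) and the $2^{-(i+3)}$ term (applied to y), and shows a generic shift-and-add constant multiplier. All function names are hypothetical.

```python
import itertools

def x_multiplier_magnitudes():
    """Magnitudes of the 2**-(2i+5) coefficient: sum of s[a]*s[b]*2**(5-a-b) over pairs a<b."""
    mags = set()
    for s in itertools.product((-1, 1), repeat=4):
        c = sum(s[a] * s[b] * (1 << (5 - a - b))
                for a in range(4) for b in range(a + 1, 4))
        mags.add(abs(c))
    return mags

def y_multiplier_magnitudes():
    """Magnitudes of the 2**-(i+3) coefficient: sum of s[k]*2**(3-k) over the four steps."""
    mags = set()
    for s in itertools.product((-1, 1), repeat=4):
        mags.add(abs(sum(s[k] * (1 << (3 - k)) for k in range(4))))
    return mags

def shift_add_mul(x, c):
    """Multiply integer x by a non-negative constant c using only shifts and additions."""
    result, bit = 0, 0
    while c:
        if c & 1:
            result += x << bit      # e.g. c = 35 adds x, x << 1 and x << 5
        c >>= 1
        bit += 1
    return result

if __name__ == "__main__":
    print(sorted(x_multiplier_magnitudes()))
    print(sorted(y_multiplier_magnitudes()))
    print(shift_add_mul(7, 35))
```

The enumeration reproduces the two lists in the text, and the largest x multiplier, 35, is the factor used in the bit-width analysis.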
Output calculation unit: output0 to output15 correspond to the iteration results for $[\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}]$ from [-1, -1, -1, -1] to [1, 1, 1, 1]. For example, the calculation formula of output0 is shown in formula (23):
In the shift operation unit, $x \times 2^{-(4i+6)}$ is obtained by shifting x right by (4i+6) bits, and so on. All 16 sets of possible result values are calculated in parallel, and finally the S value fed to the address selector selects the result of the current four-step prediction for output.
The floating-point regularization module receives the sum of the x and y outputs of the mantissa iteration part, i.e. the 2X value. Three pieces of data, namely the shifted value of X, the sign bit S and the exponent part Y, must then be processed to output a standard floating-point number.
The specific structure of the floating point regularization module is shown in fig. 7, and the main steps are as follows:
(1) The X value produced by the mantissa iteration part may contain leading zeros. Leading-zero detection is performed on X to obtain the number n of leading zeros, X is shifted left by n bits to eliminate them, and the corresponding Y is adjusted by n to compensate.
(2) A 128-bit floating-point number in fact has only 112 mantissa bits, so the shifted data must be rounded and truncated to 112 bits.
(3) The rounded mantissa part M, together with Y and S, is normalized and output as a standard floating-point number; the regularized exponent is calculated as Y + 16383.
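The three steps can be sketched with Python integers. Several details here are assumptions rather than statements of the source: the mantissa word is taken to be an unsigned `width`-bit integer whose top bit has weight 2^0, rounding is round-half-up, and the function name `regularize` is hypothetical. The bias 16383 and the 112 fraction bits come from the text.

```python
def regularize(mant, width, y, sign, frac_bits=112, bias=16383):
    """Pack sign, exponent y and mantissa word into a 128-bit floating-point pattern.

    mant is an unsigned integer below 2**width whose top bit has weight 2**0,
    so the represented magnitude is mant / 2**(width - 1) * 2**y (assumed layout).
    """
    if width <= frac_bits + 1 or mant <= 0 or mant >= (1 << width):
        raise ValueError("mant must be a non-zero value that fits in width bits")
    # Step (1): leading-zero detection, left shift, exponent compensation.
    n = width - mant.bit_length()
    mant <<= n
    y -= n
    # Step (2): keep the hidden 1 plus frac_bits fraction bits.
    # Assumption: round half up; the source does not name the rounding mode.
    drop = width - (frac_bits + 1)
    kept = (mant + (1 << (drop - 1))) >> drop
    if kept >> (frac_bits + 1):             # rounding carried out of the mantissa
        kept >>= 1
        y += 1
    fraction = kept & ((1 << frac_bits) - 1)
    # Step (3): bias the exponent and assemble sign | exponent | fraction.
    biased = y + bias
    return (sign << 127) | (biased << frac_bits) | fraction

if __name__ == "__main__":
    W = 136
    print(hex(regularize(1 << (W - 1), W, 0, 0)))   # bit pattern of 1.0
```

Exception cases such as overflow of the biased exponent are outside this sketch; in the described design they belong to the preprocessing stage.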
In this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process or method.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.
Claims (9)
1. A method for performing floating-point exponent operations, comprising the steps of:
S1, preprocessing: this comprises two parts, exception handling and input value preprocessing. The input of the floating-point exponential function, $M \times 2^{E}$, is first split into a sign bit S, an exponent E and a mantissa M, and whether the input is abnormal is detected according to the exponent E. If the input value is not abnormal, then, taking the mantissa part of the exponential function of $M \times 2^{E}$ as X and its exponent part as Y, Y is obtained by multiplying the mantissa M by $(\ln 2)^{-1}$ and shifting left by E bits; according to the convergence requirement of the CORDIC algorithm, the mantissa part X is obtained by solving for 2X so that this requirement is satisfied. A 148-bit-wide num1 is defined according to 2X, specifically $\mathrm{num1} = M \times 2^{E} - Y \times \ln 2 + \ln 2$, and num1 is shifted and spliced by a multiplier into num2;
S2, exponential function mantissa iteration: num2 obtained by the preprocessing of S1 is received, and the four-prediction CORDIC algorithm performs four-step prediction iterations on the X data path, the Y data path and the Z data path for the input value num2 to obtain 136-bit output x and y values; the x and y values are fed to an adder to obtain the 2X result, from which the mantissa value X of the calculation result is obtained;
S3, floating-point regularization: leading-zero detection is performed on the mantissa value X and the exponent value Y obtained in steps S1 and S2, they are converted into the standard floating-point format by shift operations, and finally they are combined with the sign bit S and output in normalized format.
2. The method of claim 1, wherein the formula of the four-prediction CORDIC algorithm in step S2 is:
wherein $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are sign factors taking the value -1 or 1 and represent the predicted rotation directions of the current iteration, i represents the index of the four-step prediction iteration, $\theta_i, \theta_{i+1}, \theta_{i+2}, \theta_{i+3}$ represent the four rotation angles, $x_i, y_i, z_i$ represent the initial values of the i-th four-step prediction iteration of the X data path, the Y data path and the Z data path, and $x_{i+4}, y_{i+4}, z_{i+4}$ represent the results of the i-th four-step prediction iteration of the X data path, the Y data path and the Z data path.
3. The method of claim 2, wherein the four-step prediction iteration on the X data path, the Y data path and the Z data path by the four-prediction CORDIC algorithm in step S2 is implemented as follows:
(1) $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are traversed, each taking the value -1 or 1, to form 16 combinations of values, and the Z data path calculates the corresponding z value for each of the 16 combinations;
(2) the value closest to 0 among the 16 results of z is selected as the value of this four-step prediction iteration, namely ite_z, and the corresponding $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ is output to the X data path and the Y data path as the predicted rotation direction $S = [\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}]$ of this four-step prediction iteration; the X data path and the Y data path calculate the x value and the y value of the result of this four-step prediction iteration according to the rotation direction S passed to them.
4. The method of claim 2, wherein a second implementation of the four-step prediction iteration on the X data path, the Y data path and the Z data path by the four-prediction CORDIC algorithm in step S2 is as follows:
(1) $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are traversed, each taking the value -1 or 1, to form 16 combinations of values; the X data path, the Y data path and the Z data path each calculate in parallel for every combination, generating 16 sets of iteration results simultaneously;
(2) the value closest to 0 among the 16 results of the Z data path is selected as the value of this four-step prediction iteration, namely ite_z; the X data path iteration result x and the Y data path iteration result y generated for the $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ corresponding to ite_z are then taken directly as the result of this four-step prediction iteration.
5. An apparatus for performing a floating-point exponent operation, the apparatus comprising:
a preprocessing module: used for exception handling and input value preprocessing of the input of the floating-point exponential function. The input $M \times 2^{E}$ is first split into a sign bit S, an exponent E and a mantissa M, and whether the input is abnormal is detected according to the exponent E. If the input value is not abnormal, then, taking the mantissa part of the exponential function of $M \times 2^{E}$ as X and its exponent part as Y, Y is obtained by multiplying the mantissa M by $(\ln 2)^{-1}$ and shifting left by E bits; according to the convergence requirement of the CORDIC algorithm, the mantissa part X is obtained by solving for 2X so that this requirement is satisfied. A 148-bit-wide num1 is defined according to 2X, specifically $\mathrm{num1} = M \times 2^{E} - Y \times \ln 2 + \ln 2$, and num1 is shifted and spliced by a multiplier into num2;
an exponential function mantissa iteration module: used for receiving num2 obtained by the S1 preprocessing and performing, by the four-prediction CORDIC algorithm, four-step prediction iterations on the X data path, the Y data path and the Z data path for the input value num2 to obtain 136-bit output x and y values; the x and y values are fed to an adder to obtain the 2X result, from which the mantissa value X of the calculation result is obtained;
a floating-point regularization module: used for performing leading-zero detection on the mantissa value X and the exponent value Y obtained in steps S1 and S2, converting them into the standard floating-point format by shift operations, and finally combining them with the sign bit S for output in normalized format.
6. The apparatus of claim 5, wherein the formula of the four-prediction CORDIC algorithm in the exponential function mantissa iteration module is:
wherein $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are sign factors taking the value -1 or 1 and represent the predicted rotation directions of the current iteration, i represents the index of the four-step prediction iteration, $\theta_i, \theta_{i+1}, \theta_{i+2}, \theta_{i+3}$ represent the four rotation angles, $x_i, y_i, z_i$ represent the initial values of the i-th four-step prediction iteration of the X data path, the Y data path and the Z data path, and $x_{i+4}, y_{i+4}, z_{i+4}$ represent the results of the i-th four-step prediction iteration of the X data path, the Y data path and the Z data path.
7. The apparatus of claim 6, wherein the four-step prediction iteration on the X data path, the Y data path and the Z data path by the four-prediction CORDIC algorithm in the exponential function mantissa iteration module is implemented as follows:
(1) $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are traversed, each taking the value -1 or 1, to form 16 combinations of values, and the Z data path calculates the corresponding z value for each of the 16 combinations;
(2) the value closest to 0 among the 16 results of z is selected as the value of this four-step prediction iteration, namely ite_z, and the corresponding $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ is output to the X data path and the Y data path as the predicted rotation direction $S = [\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}]$ of this four-step prediction iteration; the X data path and the Y data path calculate the x value and the y value of the result of this four-step prediction iteration according to the rotation direction S passed to them.
8. The apparatus of claim 6, wherein a second implementation of the four-step prediction iteration on the X data path, the Y data path and the Z data path by the four-prediction CORDIC algorithm in the exponential function mantissa iteration module is as follows:
(1) $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ are traversed, each taking the value -1 or 1, to form 16 combinations of values; the X data path, the Y data path and the Z data path each calculate in parallel for every combination, generating 16 sets of iteration results simultaneously;
(2) the value closest to 0 among the 16 results of the Z data path is selected as the value of this four-step prediction iteration, namely ite_z; the X data path iteration result x and the Y data path iteration result y generated for the $\sigma_i, \sigma_{i+1}, \sigma_{i+2}, \sigma_{i+3}$ corresponding to ite_z are then taken directly as the result of this four-step prediction iteration.
9. The apparatus of claim 5, wherein the exponential function mantissa iteration module comprises: a four-prediction Z data path calculation unit; a single-iteration Z data path calculation unit; a four-prediction X data path calculation unit; a single-iteration X data path calculation unit; a four-prediction Y data path calculation unit; a single-iteration Y data path calculation unit; an iteration loop control unit, which performs iteration loop control using a counter and determines whether each iteration is a four-prediction iteration or a single iteration; a lookup table unit, which supplies the iteration angle values required for the calculation according to the output of the iteration loop control unit; and a selector unit, which selects between the result of the four-prediction Z data path calculation unit and that of the single-iteration Z data path calculation unit, between the result of the four-prediction X data path calculation unit and that of the single-iteration X data path calculation unit, and between the result of the four-prediction Y data path calculation unit and that of the single-iteration Y data path calculation unit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011592456.3A CN112783469A (en) | 2020-12-29 | 2020-12-29 | Method and device for executing floating-point exponential operation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112783469A true CN112783469A (en) | 2021-05-11 |
Family
ID=75753237
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011592456.3A Pending CN112783469A (en) | 2020-12-29 | 2020-12-29 | Method and device for executing floating-point exponential operation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112783469A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114201140A (en) * | 2021-12-16 | 2022-03-18 | 千芯半导体科技(北京)有限公司 | Exponential function processing unit, method and neural network chip |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105468331B (en) | Independent floating point conversion unit | |
US7921149B2 (en) | Division and square root arithmetic unit | |
US20210349692A1 (en) | Multiplier and multiplication method | |
KR100824189B1 (en) | Processing method and computer system for summation of floating point data | |
KR100241076B1 (en) | Floating- point multiply-and-accumulate unit with classes for alignment and normalization | |
US5993051A (en) | Combined leading one and leading zero anticipator | |
JPH0542011B2 (en) | ||
US6988119B2 (en) | Fast single precision floating point accumulator using base 32 system | |
JP5640081B2 (en) | Integer and multiply-add operations with saturation | |
US6728739B1 (en) | Data calculating device and method for processing data in data block form | |
CN111936965A (en) | Random rounding logic | |
JP3345894B2 (en) | Floating point multiplier | |
CN116400883A (en) | Floating point multiply-add device capable of switching precision | |
JP2585649B2 (en) | Division circuit | |
US20020129075A1 (en) | Apparatus and method of performing addition and rounding operation in parallel for floating-point arithmetic logical unit | |
US20040267853A1 (en) | Method and apparatus for implementing power of two floating point estimation | |
US5408426A (en) | Arithmetic unit capable of performing concurrent operations for high speed operation | |
US6847986B2 (en) | Divider | |
CN112783469A (en) | Method and device for executing floating-point exponential operation | |
CN116643718B (en) | Floating point fusion multiply-add device and method of pipeline structure and processor | |
US5170371A (en) | Method and apparatus for rounding in high-speed multipliers | |
CN116450085A (en) | Extensible BFLoat 16-point multiplication arithmetic unit and microprocessor | |
US20050188000A1 (en) | Adder | |
CN112860218B (en) | Mixed precision arithmetic unit for FP16 floating point data and INT8 integer data operation | |
CN112783470A (en) | Device and method for executing floating point logarithm operation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||