WO2006022048A1 - Arithmetic method and apparatus - Google Patents
Arithmetic method and apparatus
- Publication number
- WO2006022048A1 (PCT/JP2005/007250)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- value
- argument
- unit
- mantissa
- exponent
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/02—Digital function generators
- G06F1/03—Digital function generators working, at least partly, by table look-up
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/14—Conversion to or from non-weighted codes
- H03M7/24—Conversion to or from floating-point codes
Definitions
- The present invention relates to an arithmetic technique, and more particularly to an arithmetic method and apparatus for executing an arithmetic operation on a value expressed in a floating-point format.
- Fixed-point and floating-point numbers are the representations generally used when computers and DSPs (Digital Signal Processors) handle numerical values. Because floating-point arithmetic must also process the position of the decimal point, it tends to be slower than fixed-point arithmetic, in which the decimal point sits at a fixed position. On the other hand, because the position of the decimal point can change, a floating-point number can represent values of small absolute value with high precision and can still represent values of large absolute value.
- The present invention has been made in view of this situation, and its object is to provide an arithmetic method and apparatus that reduce the amount of arithmetic processing when operating on a numerical value expressed in a floating-point format.
- One embodiment of the present invention is an arithmetic device. This device includes: an input unit that inputs an argument of a function, expressed in a floating-point format including an exponent part and a mantissa part;
- a conversion unit that converts the exponent part included in the input argument according to an exponent conversion rule determined according to the function;
- a storage unit that stores in advance, as a table, values obtained by converting the mantissa part included in the argument according to a mantissa conversion rule determined according to the function; and an acquisition unit that derives a plurality of indexes of the table from at least the mantissa part of the input argument and acquires the corresponding values from the table.
- In the storage unit, the number of table indexes may be defined as the total number of values that the approximate value of at least the mantissa part of the argument can take, plus 1.
- The “conversion rule of the exponent part determined according to the function” and the “conversion rule of the mantissa part determined according to the function” are rules, determined by the function to be calculated, for converting the exponent part or the mantissa part.
- Even when the amount of processing is reduced by using a table, the number of indexes is defined as the number of values the index can take, plus 1. Values both larger and smaller than the value approximated from part of the argument are therefore available as indexes, and the accuracy of the return value can be improved.
- The function corresponding to the argument input to the input unit may be the reciprocal of the argument.
- In that case, the storage unit may use, as the table index, a value obtained by approximating the mantissa part of the argument by the upper bits of the mantissa part.
- The acquisition unit may derive, as the plurality of table indexes, the value obtained by approximating the mantissa part of the input argument by its upper bits and the value obtained by adding 1 to that approximated value.
- “Approximating the mantissa part of the argument by the upper bits of the mantissa part” means extracting predetermined upper bits from the plurality of bits constituting the mantissa part.
- The number of bits in the original mantissa part may differ from the number of bits in the approximation.
- Even when the bit array contains the same pattern of bits, this is included in the approximation.
- The function corresponding to the argument input to the input unit may be the square root of the argument, and the storage unit may use, as the table index, a value formed from the least significant bit of the exponent part of the argument and the mantissa part of the argument approximated by its upper bits.
- The acquisition unit may derive, as the plurality of table indexes, the value formed from the least significant bit of the exponent part of the input argument and the mantissa part of the input argument approximated by its upper bits, and the value obtained by adding 1 to that formed value.
- The “value formed from the least significant bit of the exponent part and the mantissa part approximated by its upper bits” is a combination of the least significant bit of the exponent part and predetermined upper bits extracted from the plurality of bits of the mantissa part.
- The approximation may be as described above.
- Another aspect of the present invention is a calculation method.
- This method includes: a step of inputting an argument of a function, expressed in a floating-point format including an exponent part and a mantissa part; a step of converting the exponent part included in the input argument according to an exponent conversion rule determined according to the function;
- a step of referring to a table in which values obtained by converting the mantissa part included in the argument according to a mantissa conversion rule determined according to the function are stored in advance,
- deriving a plurality of indexes of the table, and acquiring the respective values from the table based on the derived indexes; and a step that uses the acquired values together with the converted exponent part.
- The table referred to in the acquiring step may have its number of indexes defined as the total number of values that the approximate value of at least the mantissa part of the argument can take, plus 1.
- Yet another embodiment of the present invention is a program.
- This program causes a computer to execute: a step of inputting, through a predetermined interface, an argument of a function expressed in a floating-point format including an exponent part and a mantissa part;
- a step of converting the exponent part included in the input argument according to an exponent conversion rule determined according to the function; and a step of referring to a table, stored in memory in advance, of values obtained by converting the mantissa part included in the argument according to a mantissa conversion rule determined according to the function,
- deriving a plurality of indexes of the table, and acquiring the respective values from the table by accessing the memory based on the derived indexes.
- The table referred to in the acquiring step has its number of indexes defined as the total number of values that the approximate value of at least the mantissa part of the argument can take, plus 1.
- FIG. 1 is a diagram showing a format of a floating-point number according to Embodiment 1 of the present invention.
- FIG. 2 is a diagram showing a configuration of an image display apparatus according to Embodiment 1 of the present invention.
- FIG. 3 is a diagram showing a data structure of a table stored in the storage unit of FIG.
- FIG. 4 is a diagram showing a configuration of a derivation unit in FIG.
- FIG. 5 is a diagram schematically showing a procedure of arithmetic processing by the arithmetic device of FIG. 2.
- FIGS. 6 (a)-(b) are diagrams showing the relationship between the approximate solution and the true solution by the arithmetic unit of FIG.
- FIG. 7 is a diagram showing a configuration of a derivation unit according to Embodiment 2 of the present invention.
- FIG. 8 is a diagram schematically showing a procedure of arithmetic processing by an arithmetic device including the deriving unit in FIG.
- FIG. 9 is a diagram showing a data structure of a table stored in the storage unit according to the third embodiment of the present invention.
- FIG. 10 is a diagram schematically showing a procedure of arithmetic processing performed by an arithmetic device including the storage unit of FIG. 9.
- Embodiment 1 of the present invention relates to an image display device that performs lighting calculations and image processing calculations to generate an image to be displayed on a display device such as a display, and in particular to the arithmetic device within it that calculates the reciprocal of a floating-point number.
- The arithmetic unit receives, as its argument, a floating-point number composed of a sign part, an exponent part, and a mantissa part, and separates it into those three parts.
- The arithmetic unit subtracts the exponent part from a value stored in advance. This subtraction corresponds to taking the reciprocal.
- Values obtained by converting the mantissa part so as to correspond to the reciprocal are stored in advance as entries in a table.
- The index of this table is formed from the upper bits of the mantissa part, and the total number of indexes is defined as the number of values that the upper bits of the mantissa part can represent, plus 1.
- To generate an index, the arithmetic unit extracts the upper bits of the mantissa part, and adds 1 to the extracted upper bits to generate a second index. It then refers to the table and obtains the values of the two entries corresponding to the two generated indexes.
- The arithmetic unit combines the sign part and the subtracted exponent part with each of the two entry values to generate two temporary return values.
- The arithmetic unit performs subtraction on the exponent part and a table-based conversion on the mantissa part.
- The only computationally expensive multiplication or division is therefore the interpolation process, so the overall amount of processing can be reduced.
- Because the table index is the upper few bits of the mantissa part, the table size can be reduced.
- Because interpolation is executed while the table is kept small, a decrease in accuracy can be suppressed.
- The total number of indexes is defined as the number of values that the upper few bits of the mantissa part can represent, plus 1. Therefore, even when the upper few bits of the mantissa part of the input argument take their maximum value, the arithmetic unit can perform the interpolation process.
- FIG. 1 shows a floating-point number format according to the first embodiment of the present invention.
- The sign part 10 reflects the sign of the numerical value and is a 1-bit unsigned integer.
- The exponent part 12 represents the integer exponent of a power of 2, and is an 8-bit unsigned integer.
- The mantissa part 14 represents a value of at least 1.0 and less than 2.0, and is a 23-bit unsigned integer. As shown in the figure, if the sign part 10 is s, the exponent part 12 is e, and the mantissa part 14 is m, these represent the following floating-point number.
- In the following, x is the argument of the function.
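The field layout just described can be tried out with a short Python sketch. The widths (1, 8, and 23 bits) come from the text; the exponent bias of 127 and the helper names `split_float32` and `decode` are assumptions made for illustration, chosen because they match the common 32-bit single-precision layout and the constant 253 used later for the reciprocal.

```python
import struct

def split_float32(x: float) -> tuple[int, int, int]:
    """Split x, stored as a 32-bit float, into (s, e, m) bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31            # 1-bit sign part
    e = (bits >> 23) & 0xFF   # 8-bit exponent part (biased)
    m = bits & 0x7FFFFF       # 23-bit mantissa part (fraction bits)
    return s, e, m

def decode(s: int, e: int, m: int) -> float:
    """Value of a normalized number; the bias of 127 is an assumption."""
    return (-1) ** s * 2.0 ** (e - 127) * (1 + m / 2 ** 23)

s, e, m = split_float32(-6.5)
print(s, e, m, decode(s, e, m))
```

Zero, denormals, infinities, and NaN are deliberately left out of `decode`; the text does not discuss them at this point.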
- When the value obtained by converting the exponent part 12 (hereinafter referred to as “converted exponent part”) is denoted e′ and the value obtained by converting the mantissa part 14 (hereinafter referred to as “converted mantissa part”) is denoted m′, these values are expressed as follows.
- Once the converted exponent part and the converted mantissa part have been derived from the exponent part 12 and the mantissa part 14 of the argument x, the reciprocal 1/x can be calculated.
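The equations themselves did not survive in this text. A reconstruction that agrees with the constant 253 and the table formula given elsewhere in the description, assuming an exponent bias of 127, would be:

```latex
\begin{aligned}
x &= (-1)^{s}\, 2^{\,e-127}\left(1 + \frac{m}{2^{23}}\right),\\
\frac{1}{x} &= (-1)^{s}\, 2^{\,e'-127}\left(1 + \frac{m'}{2^{23}}\right),\\
e' &= 253 - e,\qquad
m' = 2^{23}\cdot\frac{2^{23}-m}{2^{23}+m}.
\end{aligned}
```

The step behind m′ is that 1/(1 + m/2^23) lies in (0.5, 1], so it is doubled to bring it back into the mantissa range and the exponent is lowered by one to compensate, which is why the constant is 253 rather than 254. For m = 0 this gives m′ = 2^23, which does not fit in 23 bits; that case is consistent with the clamping of table entries to 2^23 - 1 described in the text.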
- The converted exponent part can be derived by an 8-bit subtraction, but the converted mantissa part cannot be derived by a simple arithmetic unit, so a table is used. If all 23 bits of the mantissa part 14 were used as the table index, the table would become large, so the upper 8 bits of the mantissa part 14 are used as the table index. This is shown as follows.
- table0 [m [0: 7]] is the value obtained from the table using the upper 8 bits of the mantissa part 14 as an index, that is, the value of the entry.
- The value of the entry corresponds to the converted mantissa part.
- The size of one entry is 23 bits.
- The i-th entry, table0 [i], stores the value of (2^23 - i) / (2^23 + i) * 2^23 as an integer.
- The entry value is set to 2^23 - 1 if the value to be stored in the entry exceeds 2^23 - 1.
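A minimal sketch of how such a table might be generated follows. Two points are assumptions rather than statements from the text: the index i is shifted left by 15 bits so that it lines up with a 23-bit mantissa before the formula is applied, and fractional results are rounded down. The names `build_reciprocal_table`, `ONE`, and `SHIFT` are hypothetical.

```python
MANT_BITS = 23                    # width of the mantissa part
INDEX_BITS = 8                    # upper mantissa bits used as the index
SHIFT = MANT_BITS - INDEX_BITS    # 15 low bits are left for interpolation
ONE = 1 << MANT_BITS              # 2^23

def build_reciprocal_table() -> list[int]:
    """Return 257 converted mantissa values for indexes 0..256."""
    table = []
    for i in range((1 << INDEX_BITS) + 1):
        m = i << SHIFT                        # assumed: index aligned to the mantissa
        value = (ONE - m) * ONE // (ONE + m)  # assumed: rounded down to an integer
        table.append(min(value, ONE - 1))     # clamp to 23 bits, as the text states
    return table

table0 = build_reciprocal_table()
print(len(table0), table0[0], table0[128], table0[256])
```

With these assumptions only entry 0 is affected by the clamp, and entry 256 is 0, which corresponds to a mantissa that has reached 2.0.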
- With this lookup alone, the precision of the mantissa part is about 8 bits. In this embodiment, therefore, a table with 257 entries (the number of values the index can represent, plus 1) is prepared in advance, and two adjacent indexes are generated from the upper 8 bits of the mantissa part 14 and the value obtained by adding 1 to them. The values of the two entries are then read using the two indexes and interpolated to derive a more accurate approximate solution.
- The entry value obtained from the smaller index is referred to as converted mantissa part A.
- The entry value obtained from the larger index is referred to as converted mantissa part B.
- The temporary return value corresponding to converted mantissa part A (hereinafter referred to as “temporary return value A”) is denoted a.
- The temporary return value corresponding to converted mantissa part B (hereinafter referred to as “temporary return value B”) is denoted b; a and b are expressed as follows.
- 1/x is represented as follows.
- The converted values e′ and m′ are assumed to be within an appropriate range.
- FIG. 2 shows a configuration of the image display apparatus 100 according to Embodiment 1 of the present invention.
- the image display device 100 includes an image processing unit 20, an image output unit 22, an arithmetic device 24, and a storage unit 38.
- The arithmetic device 24 includes an input unit 26, a separation unit 28, a first conversion unit 30, a second conversion unit 32, an acquisition unit 34, and a derivation unit 36.
- The signals include a converted sign part 200, a converted exponent part 202, a converted mantissa part 204, and a return value 208.
- the image processing unit 20 performs lighting calculation and image processing calculation in order to generate an image. For example, the image processing unit 20 performs a lighting calculation to generate a sphere by a surface model in computer graphics or the like. Such lighting calculations require reciprocal calculations to derive a normalized vector. When it is necessary to calculate the reciprocal number, the image processing unit 20 outputs an argument to the arithmetic unit 24. Further, if a return value, that is, a calculated reciprocal number is input from the arithmetic unit 24, the image processing unit 20 generates an image using the reciprocal number.
- the image output unit 22 displays the image generated by the image processing unit 20.
- the image output unit 22 is configured by a display or the like.
- the input unit 26 inputs function arguments expressed in a floating-point format including the sign unit 10, the exponent unit 12, and the mantissa unit 14 as shown in FIG.
- the argument is input from the image processing unit 20.
- the separation unit 28 separates the input argument into a sign part 10, an exponent part 12, and a mantissa part 14. Separation unit 28 outputs sign unit 10 to first conversion unit 30, outputs exponent unit 12 to second conversion unit 32, and outputs mantissa unit 14 to acquisition unit 34 and derivation unit 36.
- The first conversion unit 30 performs a predetermined conversion on the sign part 10. When the function to be calculated is the reciprocal, as shown in Equation 5, the sign part 10 of the argument and the sign part of the return value have the same value. That is, the first conversion unit 30 outputs the input sign part 10 to the derivation unit 36 unchanged.
- The sign part 10 output from the first conversion unit 30 to the derivation unit 36 is referred to as the converted sign part 200.
- The second conversion unit 32 converts the exponent part 12 included in the argument according to the exponent conversion rule determined according to the function.
- The conversion rule corresponds to subtracting the exponent part 12 from “253”, as shown in Equation 4, Equation 7, and Equation 9.
- When the exponent part 12 is “254” or more, the result is set to “0”.
- The result of the subtraction is output to the derivation unit 36 as the converted exponent part 202.
- The converted exponent part 202 corresponds to e′ in Equation 4, Equation 7, and Equation 9.
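The exponent rule is small enough to state directly in code. The sketch below assumes that the value being compared against 254 is the exponent part e, and the function name `convert_exponent` is hypothetical.

```python
def convert_exponent(e: int) -> int:
    """Converted exponent part for the reciprocal, from the 8-bit exponent part e."""
    if e >= 254:
        return 0        # 253 - e would be negative, so the result is forced to 0
    return 253 - e      # 8-bit subtraction from the fixed value 253

# 1.5 has e = 127; 1/1.5 = 0.666... = 1.333... * 2^-1, whose biased exponent is 126.
print(convert_exponent(127))
```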
- the storage unit 38 stores in advance as a table the values obtained by converting the mantissa part 14 according to the mantissa part conversion rules determined according to the function.
- the conversion rule corresponds to the calculation formula of m 'in Equation 4.
- the table stored in the storage unit 38 uses the upper 8 bits of the mantissa part 14 as an index. That is, an index approximates the argument.
- the total number of values that can be taken by the upper 8 bits of the mantissa part 14 is 256.
- the number of table indexes is 257, which is 256 plus 1.
- The storage unit 38 stores m′ in Equation 4 as an entry.
- FIG. 3 shows the data structure of the table stored in the storage unit 38.
- Index is defined from “0” to “256”, that is, 257 items.
- entries corresponding to “indexes” are stored from “C0” to “C256”.
- C0 is the value of m′ in Equation 4 obtained by substituting “0” for m.
- C1 is the value obtained by substituting “1” for m.
- The acquisition unit 34 derives an index of the table stored in the storage unit 38 by extracting the upper 8 bits from the 23 bits constituting the mantissa part 14, that is, by approximating the mantissa part 14 by its upper 8 bits. To obtain a second index, the acquisition unit 34 also derives the number obtained by adding 1 to the mantissa part 14 approximated by the upper 8 bits. The acquisition unit 34 thus derives two indexes, and then acquires the values of the two entries from the table stored in the storage unit 38 based on them.
- That is, in the table of FIG., the values of the two entries corresponding to the two indexes, namely the converted mantissa part 204A and the converted mantissa part 204B, are acquired.
- The two acquired entry values correspond to m′ in Equation 7 and m″ in Equation 9.
- The values of the two entries are output to the derivation unit 36 as the converted mantissa part 204.
- When the storage unit 38 is composed of a storage medium such as a memory, the index may correspond to an address.
- The derivation unit 36 attaches the converted sign part 200 and the converted exponent part 202 to each of the two converted mantissa parts 204 acquired by the acquisition unit 34, and thereby derives the temporary return value A and the temporary return value B, expressed in the same floating-point format as the argument.
- the provisional return value A corresponds to a in Equation 8
- the provisional return value B corresponds to b in Equation 10.
- the floating-point format similar to the argument is the format shown in FIG. 1.
- The converted sign part 200, the converted exponent part 202, and the converted mantissa part 204 are arranged in this order from the left.
- the deriving unit 36 performs interpolation on the temporary return value A and the temporary return value B while using the lower 15 bits of the mantissa part 14 to derive the return value 208 of the function.
- the lower 15 bits of the mantissa part 14 are converted to a floating point number c as shown in Equation 11.
- Interpolation by a, b, and c is executed by linear interpolation as shown in Equation 12.
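Equations 11 and 12 are not reproduced in this text. A standard linear-interpolation form that matches the description, with m[8:22] denoting the lower 15 bits of the mantissa part in the same bit numbering as m[0:7], would be:

```latex
c = \frac{m[8{:}22]}{2^{15}},\qquad 0 \le c < 1,\qquad
\frac{1}{x} \approx a + (b - a)\,c = (1 - c)\,a + c\,b .
```

This is offered as a plausible reading only: c = 0 returns a exactly, and as c approaches 1 the result approaches b, which is why a second table entry is needed even for the largest 8-bit index.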
- In hardware, this configuration can be realized by the CPU, memory, and other LSIs of an arbitrary computer; in software, it can be realized by a program loaded into memory. Depicted here are functional blocks realized by their cooperation. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms by hardware only, software only, or a combination of the two.
- FIG. 4 shows a configuration of the derivation unit 36.
- the derivation unit 36 includes a synthesis unit 40 and an interpolation unit 42.
- The synthesis unit 40 generates a temporary return value by combining the converted sign part 200, the converted exponent part 202, and the converted mantissa part 204. Because the converted mantissa part 204 contains two values, a temporary return value A and a temporary return value B are generated, one for each value. Generating them simply means arranging the converted sign part 200, the converted exponent part 202, and the converted mantissa part 204 as shown in FIG.
- The synthesis unit 40 outputs the temporary return value A and the temporary return value B to the interpolation unit 42.
- The interpolation unit 42 performs linear interpolation on the temporary return value A and the temporary return value B to generate the return value 208.
- For this, it uses the value obtained by converting the lower 15 bits of the mantissa part 14 into a floating-point number.
- The temporary return value A, the temporary return value B, and the converted value are denoted a, b, and c in Equation 12.
- The return value 208 is denoted 1/x in Equation 12.
- c indicates how close the mantissa part 14 is to the value of its upper 8 bits used as the index. The farther the mantissa part 14 is from the value of its upper 8 bits, the closer it is to the value of the other index.
- FIG. 5 schematically shows the procedure of arithmetic processing by the arithmetic device 24.
- The input unit 26 inputs a 32-bit argument (S10).
- The separation unit 28 separates the input argument: the 1-bit sign part 10 is extracted (S12), the 8-bit exponent part 12 is extracted (S14), and the upper 8 bits of the mantissa part 14 are extracted (S16).
- The first conversion unit 30 outputs the sign part 10 as the converted sign part 200.
- The second conversion unit 32 inputs the fixed value “253” (S20), subtracts the exponent part 12 from the fixed value (S22), and outputs the converted exponent part 202.
- The acquisition unit 34 generates two indexes from the upper 8 bits of the mantissa part 14 (S24), acquires the converted mantissa part 204A from the storage unit 38 based on the indexes (S26), and likewise acquires the converted mantissa part 204B (S28).
- The synthesis unit 40 generates the temporary return value A from the converted sign part 200, the converted exponent part 202, and the converted mantissa part 204A (S30), and generates the temporary return value B from the converted sign part 200, the converted exponent part 202, and the converted mantissa part 204B (S32).
- The interpolation unit 42 extracts the lower 15 bits of the mantissa part 14 (S18) and converts them into a floating-point number (S34). It then interpolates the temporary return value A and the temporary return value B based on the converted floating-point number (S36) to generate the return value. Finally, the 32-bit return value is output (S38).
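The steps S10 to S38 can be followed end to end in a small software model. This is a sketch under several assumptions, not the patented circuit: an exponent bias of 127, table indexes aligned to the mantissa by a 15-bit shift, entries rounded down, and interpolation of the form a + (b - a)·c. All function and variable names are hypothetical, and zero, denormal, infinite, and NaN inputs are not handled.

```python
import struct

ONE = 1 << 23  # 2^23

# 257-entry table of converted mantissa values (index aligned by a 15-bit shift).
TABLE = [min((ONE - (i << 15)) * ONE // (ONE + (i << 15)), ONE - 1)
         for i in range(257)]

def float_to_bits(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(bits: int) -> float:
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def reciprocal(x: float) -> float:
    """Approximate 1/x for a normalized 32-bit float argument."""
    bits = float_to_bits(x)                  # S10: 32-bit argument
    s = bits >> 31                           # S12: sign part
    e = (bits >> 23) & 0xFF                  # S14: exponent part
    m = bits & 0x7FFFFF
    hi = m >> 15                             # S16: upper 8 bits of the mantissa
    lo = m & 0x7FFF                          # S18: lower 15 bits of the mantissa
    e2 = 253 - e if e < 254 else 0           # S20, S22: converted exponent part
    m_a, m_b = TABLE[hi], TABLE[hi + 1]      # S24 to S28: two adjacent entries
    a = bits_to_float((s << 31) | (e2 << 23) | m_a)  # S30: temporary return value A
    b = bits_to_float((s << 31) | (e2 << 23) | m_b)  # S32: temporary return value B
    c = lo / (1 << 15)                       # S34: lower bits as a fraction in [0, 1)
    return a + (b - a) * c                   # S36, S38: linear interpolation

for value in (1.5, 3.0, -7.25):
    print(value, reciprocal(value), 1.0 / value)
```

Because Python floats are double precision, the final interpolation here is carried out more precisely than a 32-bit datapath would; the sketch is meant to show the data flow, not bit-exact results.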
- FIGS. 6 (a) to 6 (b) show the relationship between the approximate solution produced by the arithmetic unit 24, that is, 1/x in Equation 12, and the true solution. The reason why accuracy improves when interpolation is performed is explained below with reference to these figures.
- FIG. 6 (a) shows an approximate solution by interpolation processing in this embodiment.
- Sample point (A) and sample point (B) on the horizontal axis are discrete values corresponding to indexes of the table.
- The vertical axis shows the entries corresponding to the indexes and the approximate solution.
- The approximate solution between sample point (A) and sample point (B) is calculated by linear interpolation between the two points, referring to the two adjacent entries.
- When the input is negative, the graph is simply symmetric about the origin and the essence does not change, so the description is omitted.
- The second derivative of the true solution is shown as follows.
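The formula is missing from this text; for the true solution f(x) = 1/x it follows from elementary calculus:

```latex
f(x) = \frac{1}{x},\qquad f'(x) = -\frac{1}{x^{2}},\qquad
f''(x) = \frac{2}{x^{3}} > 0 \quad (x > 0).
```

Since the second derivative is positive, the curve is convex and a chord between two exact sample points x_0 < x_1 lies on or above it. For exact endpoint values the gap works out to

```latex
\frac{x_0 + x_1 - x}{x_0 x_1} - \frac{1}{x} = \frac{(x - x_0)(x_1 - x)}{x\, x_0\, x_1},
```

so the relative error is at most (x_1 - x_0)^2 / (4 x_0 x_1). With a sample spacing of 2^-8 on a mantissa in [1, 2), that bound is about 2^-18. This back-of-the-envelope figure ignores rounding of the table entries and is given as a plausibility check, not as the document's own analysis.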
- When the upper 8 bits of the mantissa part 14 of the floating-point number are used as the index, the positional relationship between the true solution and the entries is as shown in FIG. 6 (b).
- It is not the relationship shown in FIG. 6 (a).
- The error in the interval between sample point (A) and sample point (B) is smallest at the center. That is, the accuracy of the approximate solution within the interval between sample point (A) and sample point (B) is no worse than the accuracy at the sample points.
- The above explanation also applies to functions other than 1/x.
- Because the table index is generated from a value approximating the mantissa part, the size of the table can be reduced.
- Because the number of indexes is defined as the total number of values the approximation can take plus 1, multiple indexes can be generated for every approximated value, which improves the accuracy of the return value.
- Because only subtraction is performed on the exponent part, the amount of processing can be reduced.
- Because the return value is derived by interpolating a plurality of values acquired from the table, the accuracy of the return value can be improved. The reciprocal is thus calculated with high accuracy while the amount of processing is kept small.
- Because the floating-point calculation can be executed with a single linear interpolation, the amount of processing can be reduced.
- Reciprocals with at least 17-bit precision can be calculated.
- The reciprocal can be calculated with a table of 257 indexes and 23-bit entries.
- The bit size of one entry can be reduced to adjust the trade-off between accuracy and SRAM (Static Random Access Memory) capacity.
- the second embodiment of the present invention relates to an arithmetic unit that calculates the reciprocal of a floating-point number argument, as in the first embodiment.
- The arithmetic device refers to the table and, using two indexes, acquires the values of the two corresponding entries, that is, two converted mantissa parts.
- The arithmetic unit interpolates the two converted mantissa parts, and then combines the interpolated value with the converted sign part and the converted exponent part to generate the return value. In other words, the order of interpolation and synthesis differs from that of Embodiment 1.
- the image display device 100 according to the second embodiment is the same type as the image display device 100 shown in FIG.
- FIG. 7 shows a configuration of the deriving unit 36 according to Embodiment 2 of the present invention.
- the derivation unit 36 includes an interpolation unit 42 and a synthesis unit 40.
- the derivation unit 36 in FIG. 7 differs from the derivation unit 36 in FIG. 4 in the order of the synthesis unit 40 and the interpolation unit 42.
- The interpolation unit 42 performs interpolation on the two converted mantissa parts 204 input from the acquisition unit 34 (not shown). That is, the converted mantissa part 204A and the converted mantissa part 204B are linearly interpolated.
- To perform the linear interpolation, the interpolation unit 42 uses the value obtained by converting the lower 15 bits of the mantissa part 14 into a floating-point number. Specifically, linear interpolation is performed with a in Equation 12 replaced by m′ and b replaced by m″, and a value (hereinafter referred to as “interpolated mantissa part”) is obtained in place of 1/x in Equation 12. The properties of c and so on are the same as in Embodiment 1, so their description is omitted. The interpolation unit 42 outputs the interpolated mantissa part to the synthesis unit 40.
- The synthesis unit 40 generates the return value 208 by combining the converted sign part 200, the converted exponent part 202, and the interpolated mantissa part. Generating it simply means arranging the converted sign part 200, the converted exponent part 202, and the interpolated mantissa part as shown in FIG.
- The synthesis unit 40 outputs the return value 208.
- FIG. 8 schematically shows the procedure of the arithmetic processing by the arithmetic device 24.
- Step 50 to step 68 correspond to step 10 to step 28 in FIG.
- The interpolation unit 42 interpolates the converted mantissa part 204A and the converted mantissa part 204B using the lower 15 bits of the mantissa part 14 (S70), generates the interpolated mantissa part, and outputs it to the synthesis unit 40.
- The synthesis unit 40 combines the converted sign part 200, the converted exponent part 202, and the interpolated mantissa part to generate the return value of the function (S72). The 32-bit return value is then output (S74).
- The same effects as in Embodiment 1 can be obtained.
- The amount of processing can be reduced because the interpolation is performed on the converted mantissa parts, which have fewer bits than the temporary return values. Because the amount of processing is small, processing is faster and power consumption can be reduced.
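The reordering in Embodiment 2 can be modeled by interpolating the two 23-bit table values first and assembling the floating-point word once. As before, this is a sketch under assumptions (bias of 127, 15-bit index alignment, entries rounded down, interpolated mantissa truncated to an integer), with hypothetical names, and it ignores zero, denormal, infinite, and NaN inputs.

```python
import struct

ONE = 1 << 23  # 2^23

# Same assumed 257-entry table of converted mantissa values as for Embodiment 1.
TABLE = [min((ONE - (i << 15)) * ONE // (ONE + (i << 15)), ONE - 1)
         for i in range(257)]

def reciprocal_interp_first(x: float) -> float:
    """Approximate 1/x, interpolating the mantissas before synthesis."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31                        # sign part, passed through unchanged
    e = (bits >> 23) & 0xFF               # exponent part
    m = bits & 0x7FFFFF                   # mantissa part
    hi, lo = m >> 15, m & 0x7FFF          # index bits and interpolation bits
    e2 = 253 - e if e < 254 else 0        # converted exponent part
    m_a, m_b = TABLE[hi], TABLE[hi + 1]   # converted mantissa parts A and B
    c = lo / (1 << 15)                    # lower 15 bits as a fraction in [0, 1)
    m_i = int(m_a + (m_b - m_a) * c)      # S70: interpolated mantissa part
    word = (s << 31) | (e2 << 23) | m_i   # S72: synthesis into one 32-bit word
    return struct.unpack(">f", struct.pack(">I", word))[0]  # S74

print(reciprocal_interp_first(3.0), 1.0 / 3.0)
```

Because both temporary return values in Embodiment 1 share the same sign and exponent, interpolating the mantissas alone gives essentially the same result while operating on narrower values.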
- the third embodiment of the present invention relates to an arithmetic unit that calculates the reciprocal of the square root for an argument of a floating-point number, unlike the previous embodiments.
- the arithmetic device separates the argument into the sign part, the exponent part, and the mantissa part, and also derives the converted exponent part by subtracting the exponent part power, and the conversion mantissa part of the table Store them as entries, and obtain the two converted mantissa parts from the two indexes.
- two temporary return values are derived by combining the sign part and the exponent part with the two converted mantissa parts, respectively. Finally, interpolation is performed on the two temporary return values to derive the return value.
- the number of bits in the index is 8 bits.
- the index consists of the least significant 1 bit of the exponent part followed by the most significant 7 bits of the mantissa part.
- the value of the conversion mantissa part differs depending on whether the exponent part is even or odd. Therefore, by placing a bit indicating whether the exponent part is even or odd in the most significant bit of the index and storing the value corresponding to each index as an entry, a table that takes the parity of the exponent part into account can be realized.
- the indexes are consecutive values. Nevertheless, the entries may not be continuous. Specifically, the entry for the index whose most significant bit is 0 and whose lower bits are all 1 and the entry for the next index, whose most significant bit is 1 and whose lower bits are all 0, are discontinuous, because, as described above, they are the values of the conversion mantissa part for an even and an odd exponent part, respectively.
- the arithmetic unit according to this embodiment devises processing in such a discontinuous case, and outputs a normal value while using the same table. As in the previous examples, the number of indexes is the number that can be represented by 8 bits plus 1.
- the expression is expanded separately for the case where the exponent part 12 is even and the case where it is odd.
- when the exponent part 12 is even, the expansion is as follows.
- e′, the conversion exponent part 202, and m′, the conversion mantissa part 204, are defined as follows.
- e′ = 190 − e/2
- when the exponent part 12 is odd, e′, the conversion exponent part 202, and m′, the conversion mantissa part 204, are defined as follows.
- the lower 1 bit of the exponent part 12 is used to discriminate between even and odd.
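Because the equations themselves are not reproduced in this text, the following is a hedged reconstruction of how the two cases can be derived for a single-precision argument x = 2^(e−127)·(1+m) with 0 ≤ m < 1. The even-case exponent agrees with the expression 190 − e/2 given above. The odd-case exponent and both mantissa expressions are derived here as assumptions and should be checked against Equations 17 and 20.

```latex
% Even exponent part e (so e - 127 is odd): write x = 2^{e-128} \cdot 2(1+m)
\frac{1}{\sqrt{x}}
  = 2^{-\frac{e-128}{2}} \cdot \frac{1}{\sqrt{2(1+m)}}
  = 2^{\left(190 - \frac{e}{2}\right) - 127} \cdot \sqrt{\frac{2}{1+m}},
\qquad
e' = 190 - \frac{e}{2}, \quad 1 + m' = \sqrt{\frac{2}{1+m}} \in (1, \sqrt{2}\,]

% Odd exponent part e (so e - 127 is even)
\frac{1}{\sqrt{x}}
  = 2^{-\frac{e-127}{2}} \cdot \frac{1}{\sqrt{1+m}}
  = 2^{\frac{379 - e}{2} - 127} \cdot \frac{2}{\sqrt{1+m}},
\qquad
e' = \frac{379 - e}{2}, \quad 1 + m' = \frac{2}{\sqrt{1+m}} \in (\sqrt{2}, 2\,]

% Both cases in one integer expression
e' = \left\lfloor \frac{380 - e}{2} \right\rfloor
```

In the odd case, m = 0 gives 1 + m′ = 2, which does not fit a 23-bit fraction, so under this reconstruction the first odd entry would have to saturate at the largest representable value.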
- the upper 1 bit of the 8 bits forming the index indicates whether the exponent part 12 is even or odd. Therefore, when it is even, the indexes from “0” to “127” and the corresponding entries are used. When it is odd, the indexes from “128” to “255” and the corresponding entries are used.
- the entry values are stored in the table divided in advance into an even part and an odd part. In other words, the values of the conversion mantissa part 204 for the even case and the odd case, as shown in Equations 17 and 20, each occupy one half of the table.
- a table having 257 indexes and entries is prepared, two adjacent entries are acquired from the 8-bit index, and the values of these two entries are used to obtain an approximate solution.
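The index formation described above can be sketched as follows. This is a minimal Python illustration that assumes an IEEE 754 single-precision argument; the helper names `split_float32` and `table_indexes` are hypothetical.

```python
import struct


def split_float32(x: float):
    """Return the (sign, exponent, mantissa) fields of x stored as float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def table_indexes(exponent: int, mantissa: int):
    """Build the 8-bit index and its neighbour used for interpolation."""
    parity = exponent & 1      # lower 1 bit of the exponent part: 0 even, 1 odd
    upper7 = mantissa >> 16    # upper 7 bits of the 23-bit mantissa part
    index = (parity << 7) | upper7
    return index, index + 1    # index + 1 can reach 256, hence 257 entries
```

For example, `table_indexes(128, 0x7FFFFF)` yields the pair 127 and 128, which straddles the boundary between the even half and the odd half of the table.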
- the provisional return value A is indicated as follows.
- the conversion exponent part 202 and the conversion mantissa part 204 are shown as follows.
- table0[e[7] m[0:6]] is the value obtained from the table, that is, the entry value, using the lower 1 bit of the exponent part 12 and the upper 7 bits of the mantissa part 14 as the index.
- the value of the entry corresponds to the conversion mantissa part 204.
- the size of one entry is 23 bits.
- table0[i] stores the value of m′ in Equation 17 or Equation 20 as an integer.
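A table of this shape could be generated as in the sketch below. The two entry formulas are assumptions standing in for Equations 17 and 20, which are not reproduced in this text; they follow from normalising 1/sqrt(x) so that the result has the form 1 + m′. Truncation to an integer and clamping to 23 bits are also assumptions, and `build_rsqrt_table` is a hypothetical name.

```python
import math

ENTRY_BITS = 23
SCALE = 1 << ENTRY_BITS


def build_rsqrt_table() -> list:
    """Return 257 entries holding m' as 23-bit unsigned integers."""
    table = []
    for i in range(257):
        if i < 128:
            # Even exponent part: assumed m' = sqrt(2 / (1 + m)) - 1
            m = i / 128
            value = math.sqrt(2.0 / (1.0 + m)) - 1.0
        else:
            # Odd exponent part: assumed m' = 2 / sqrt(1 + m) - 1
            m = (i - 128) / 128
            value = 2.0 / math.sqrt(1.0 + m) - 1.0
        # Truncate, and clamp m' = 1.0 (index 128) to the largest 23-bit value.
        table.append(min(int(value * SCALE), SCALE - 1))
    return table


table0 = build_rsqrt_table()
```

Under these assumptions the entry at index 127 evaluates to 0x4030, and the last entry (index 256) equals the first one, so the odd half ends where the even half of the next octave begins.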
- provisional return value B is expressed as follows.
- the conversion exponent part 202 is shown as follows.
- the conversion mantissa part 204 is shown as follows.
- the arithmetic device is configured to execute the processing of Expression 22 to Expression 29.
- the sum of Equation 31 and Equation 34 is as follows.
- e ′ that is the conversion exponent part 202 can be expressed by an 8-bit unsigned integer
- m ′ that is the conversion mantissa part 204 can be expressed by a 23-bit unsigned integer.
- the image display device 100 according to the third embodiment is the same type as the image display device 100 shown in FIG. However, some configurations and components have different functions. Here, we will focus on the differences.
- the separation unit 28 separates the input argument into a sign part 10, an exponent part 12, and a mantissa part 14.
- the separation unit 28 outputs the lower 1 bit of the exponent part 12 and the mantissa part 14 to the acquisition unit 34.
- the signal line for outputting the lower 1 bit of the exponent part 12 from the separation part 28 to the acquisition part 34 is not shown in FIG.
- the first conversion unit 30 converts the sign part 10 into a positive value and outputs the converted value to the deriving unit 36 as the conversion sign part 200.
- the second conversion unit 32 executes the calculation of Expression 25 or Expression 26. Unlike the first embodiment, this calculation includes division by two. However, since division by 2 can be realized by bit shift, the substantial increase in the amount of processing is small.
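The exponent conversion can be sketched with one subtraction and one right shift. That Equation 26 has exactly this form is an assumption; it is chosen because it is consistent with the even-case expression 190 − e/2 and with the fixed value 380 that the procedure inputs in step S112. The name `convert_exponent` is hypothetical.

```python
def convert_exponent(exponent: int) -> int:
    """Converted exponent for the reciprocal square root: subtract, then halve."""
    # Division by 2 is realised as a 1-bit right shift.
    return (380 - exponent) >> 1


even_case = convert_exponent(128)  # 126, equal to 190 - 128 / 2
odd_case = convert_exponent(129)   # 125, the shift discards the odd low bit
```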
- the storage unit 38 stores in advance, as a table, the values obtained by converting the mantissa part 14 according to the mantissa conversion rule determined by the function.
- the function is the reciprocal of the square root, so the conversion rule corresponds to the expression for m′ in Equation 17 or m′ in Equation 20. That is, if the lower 1 bit of the exponent part 12 is “0”, m′ in Equation 17 is the entry, and if the lower 1 bit of the exponent part 12 is “1”, m′ in Equation 20 is the entry.
- the former corresponds to the case where the exponent part 12 is even.
- the latter corresponds to the case where the exponent part 12 is odd.
- the table stored in the storage unit 38 uses the lower 1 bit of the exponent part 12 and the upper 7 bits of the mantissa part 14 as indexes so that at least the mantissa part 14 is included.
- the index approximates the argument by these values.
- the number of table indexes is 257, which is 256 plus 1.
- FIG. 9 shows a data structure of a table stored in the storage unit 38 according to Embodiment 3 of the present invention.
- “Index” and “Entry” in FIG. 9 correspond to “Index” and “Entry” in FIG. 3, respectively.
- the indexes from “0” to “127” correspond to the case where the lower 1 bit of the exponent part 12 is 0, and the corresponding entries correspond to m′ in Equation 17.
- the indexes from “128” to “256” correspond to the case where the lower 1 bit of the exponent part 12 is 1, and the corresponding entries correspond to m′ in Equation 20.
- the acquisition unit 34 derives two indexes. Thereafter, the acquisition unit 34 acquires the values of the two entries from the table stored in the storage unit 38 based on the two indexes. In other words, in the table in Fig. 3, two entries corresponding to the two indexes are obtained. The two acquired entries correspond to m′ in Expression 7 and m″ in Expression 9. The two entries are output as the conversion mantissa parts 204 to the deriving unit 36.
- the acquisition unit 34 combines the lower 1 bit of the exponent part 12 and the upper 7 bits of the mantissa part 14, that is, it approximates at least the mantissa part 14, to derive an index into the table stored in the storage unit 38.
- the acquisition unit 34 derives a number obtained by adding 1 to the above-described index in order to derive two indexes. After that, the process of obtaining the two converted mantissa parts 204 from the storage unit 38 based on the two derived indexes is the same as that in the first embodiment, and thus the description thereof is omitted.
- the deriving unit 36 generates a temporary return value A and a temporary return value B from the conversion sign part 200, the conversion exponent part 202, and the two conversion mantissa parts 204. Further, the temporary return value A and the temporary return value B are interpolated to generate a return value 208.
- the process of generating a provisional return value corresponds to Expression 22 and Expression 24, and the interpolation process follows Expression 28 and Expression 29. Since these processes are the same as those in the first embodiment, description thereof is omitted.
- when the values of the two indexes are "127" and "128", that is, when the values of the entries stored in the storage unit 38 corresponding to the two indexes are discontinuous, the acquisition unit 34 outputs a predetermined instruction to the second conversion unit 32.
- based on the predetermined instruction, the second conversion unit 32 also derives a value obtained by subtracting a predetermined value from the conversion exponent part 202. This corresponds to Equation 25.
- the deriving unit 36 generates the temporary return value A as usual.
- the temporary return value B is generated using, as its conversion exponent part 202, the value obtained by subtracting 1 from the conversion exponent part 202.
- a return value 208 is derived based on the provisional return value A and the provisional return value B.
- the value of the entry “127” in FIG. 9 is “0x4030”, and the value of the entry “128” is “0x7FFFFF”. If the entry “128” were “0x000000”, the temporary return value B could be derived using Equation 26. In fact, a different value, “0x7FFFFF” for the odd case, is stored there, so the value is approximated by lowering the exponent part by 1. In other words, the following relationship is used.
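The relationship can be checked numerically. The sketch below takes the entry at index 128 to be the largest 23-bit value, 0x7FFFFF (an assumption), and uses a hypothetical helper `pack` and an arbitrary sample exponent to compare the ideal endpoint with the value actually formed from the stored entry.

```python
import struct


def pack(sign: int, exponent: int, mantissa: int) -> float:
    """Arrange sign, exponent and mantissa fields into a float32 value."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack("<f", struct.pack("<I", bits))[0]


e_conv = 126  # sample conversion exponent part (hypothetical choice)

# Ideal endpoint: mantissa 0x000000 with the unchanged exponent.
ideal = pack(0, e_conv, 0x000000)

# Endpoint actually used: stored mantissa with the exponent lowered by 1.
stored = pack(0, e_conv - 1, 0x7FFFFF)

difference = ideal - stored
```

Here `ideal` is 0.5 and `stored` differs from it by 2**-25, a relative error of 2**-24, which is why the two adjacent entries can still be interpolated as if they were continuous.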
- FIG. 10 schematically shows the procedure of the arithmetic processing by the arithmetic device 24.
- the input unit 26 inputs a 32-bit argument (S100).
- the separation unit 28 separates the input argument. That is, the 1-bit sign part 10 is extracted (S102), the 8-bit exponent part 12 is extracted (S104), and the lower 1 bit of the exponent part 12 and the upper 7 bits of the mantissa part 14 are extracted (S106).
- the first conversion unit 30 converts the sign part 10 into a positive value (S110) and outputs it as the conversion sign part 200.
- the second conversion unit 32 inputs a fixed value “380” (S112), performs an integer operation such as Equation 26 using the fixed value and the exponent part 12 (S114), and outputs the conversion exponent part 202.
- the explanation for the integer operation as shown in Equation 25 is omitted.
- the acquisition unit 34 generates two indexes from the lower 1 bit of the exponent part 12 and the upper 7 bits of the mantissa part 14 (S116). Based on these indexes, the conversion mantissa part 204A is acquired from the storage unit 38 (S118), and the conversion mantissa part 204B is also acquired (S120).
- the deriving unit 36 generates a temporary return value A from the conversion sign part 200, the conversion exponent part 202, and the conversion mantissa part 204A (S122), and generates a temporary return value B from the conversion sign part 200, the conversion exponent part 202, and the conversion mantissa part 204B (S124). Further, the deriving unit 36 extracts the lower 16 bits of the mantissa part 14 (S108), and converts the lower 16 bits of the mantissa part 14 into a floating-point number (S126). The deriving unit 36 interpolates the temporary return value A and the temporary return value B based on the converted floating-point number (S128), and generates a return value. Finally, a 32-bit return value is output (S130).
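Steps S100 to S130 can be put together as in the following sketch. It is an illustration under stated assumptions rather than the patented circuit: the table formulas stand in for Equations 17 and 20, the exponent formula stands in for Equation 26, the linear interpolation stands in for Equations 28 and 29, and the names `pack`, `build_table`, `TABLE`, and `rsqrt_approx` are hypothetical. Only positive normal inputs are handled, and the result is a Python float that is not rounded back to 32 bits.

```python
import math
import struct


def pack(sign: int, exponent: int, mantissa: int) -> float:
    """Arrange sign, exponent and mantissa fields into a float32 value."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def build_table() -> list:
    """257 entries of 23 bits (assumed formulas, truncated and clamped)."""
    table = []
    for i in range(257):
        if i < 128:   # even exponent part
            value = math.sqrt(2.0 / (1.0 + i / 128)) - 1.0
        else:         # odd exponent part
            value = 2.0 / math.sqrt(1.0 + (i - 128) / 128) - 1.0
        table.append(min(int(value * (1 << 23)), 0x7FFFFF))
    return table


TABLE = build_table()


def rsqrt_approx(x: float) -> float:
    bits = struct.unpack("<I", struct.pack("<f", x))[0]   # S100
    exponent = (bits >> 23) & 0xFF                        # S104
    mantissa = bits & 0x7FFFFF
    index = ((exponent & 1) << 7) | (mantissa >> 16)      # S106, S116
    low16 = mantissa & 0xFFFF                             # S108
    conv_sign = 0                                         # S110: always positive
    conv_exp = (380 - exponent) >> 1                      # S112, S114
    m_a, m_b = TABLE[index], TABLE[index + 1]             # S118, S120
    # Discontinuous pair 127/128: lower the exponent of B by 1.
    exp_b = conv_exp - 1 if index == 127 else conv_exp
    a = pack(conv_sign, conv_exp, m_a)                    # S122
    b = pack(conv_sign, exp_b, m_b)                       # S124
    t = low16 / 65536.0                                   # S126
    return a + (b - a) * t                                # S128, S130
```

A call such as `rsqrt_approx(2.0)` returns a value close to 0.7071. The only multiplication on the data path is the one in the final interpolation, which matches the processing-amount argument made for this embodiment.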
- the size of the table can be reduced.
- since the number of indexes is the total number of approximate values that can be taken plus 1, a plurality of indexes can be generated for every approximate value, and the accuracy of the return value of the reciprocal of the square root can be improved.
- since only subtraction and a bit shift are performed on the exponent part, the amount of processing can be reduced.
- since the return value is derived by performing the interpolation process on a plurality of values obtained from the table, the accuracy of the return value of the reciprocal of the square root can be improved.
- since multiplication is executed only for the interpolation process, the amount of processing can be reduced.
- calculation of the reciprocal of the square root can be executed with high accuracy while suppressing the processing amount.
- the conversion mantissa part can be stored in one table regardless of whether the exponent part is even or odd.
- the continuity of the two values can be maintained because processing is performed to reduce the value of the conversion exponent part. It is also possible to calculate the reciprocal of the square root with an accuracy of at least 16 bits. Also, the reciprocal of the square root can be calculated based on a table of 257 indexes with 23-bit entries.
- Embodiment 4 of the present invention relates to an arithmetic unit that calculates a square root for a floating-point number argument.
- the arithmetic device according to the fourth embodiment has the same configuration as the arithmetic device according to the third embodiment. Thus, the square root can be calculated.
- provisional return value A is expressed as follows.
- the conversion exponent part 202 and the conversion mantissa part 204 are shown as follows.
- provisional return value B is expressed as follows.
- the conversion exponent part 202 is shown as follows.
- the conversion mantissa part 204 is shown as follows.
- the image display device 100 according to the fourth embodiment is the same type as the image display device 100 according to the third embodiment.
- the difference between the two is that the second conversion unit 32 corresponds to Equations 41 and 42 and that the values of the entries stored in the storage unit 38 correspond to Equation 37. Therefore, the description of the image display device 100 is omitted.
- the square root calculation can be performed while the effects described in the third embodiment are obtained.
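A square-root variant with the same structure might look like the sketch below. Equations 37, 41, and 42 are not reproduced in this text, so every formula here is an assumption derived from x = 2^(e−127)·(1+m): the entries sqrt(2(1+m)) − 1 for an even exponent part and sqrt(1+m) − 1 for an odd one, the exponent conversion (e + 127) >> 1, and raising the exponent of the second value by 1 at the 127/128 boundary. The names `pack`, `SQRT_TABLE`, and `sqrt_approx` are hypothetical, and only positive normal inputs are handled.

```python
import math
import struct


def pack(sign: int, exponent: int, mantissa: int) -> float:
    """Arrange sign, exponent and mantissa fields into a float32 value."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# 257 entries of 23 bits: indexes 0..127 even exponent, 128..256 odd exponent.
SQRT_TABLE = [
    int((math.sqrt(2.0 * (1.0 + i / 128)) - 1.0) * (1 << 23)) if i < 128
    else int((math.sqrt(1.0 + (i - 128) / 128) - 1.0) * (1 << 23))
    for i in range(257)
]


def sqrt_approx(x: float) -> float:
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    index = ((exponent & 1) << 7) | (mantissa >> 16)
    conv_exp = (exponent + 127) >> 1       # assumed exponent conversion
    # At the 127/128 boundary the ideal endpoint is 1.0 * 2**(conv_exp + 1 - 127),
    # and entry 128 is 0, so the exponent of B is raised by 1 in this sketch.
    exp_b = conv_exp + 1 if index == 127 else conv_exp
    a = pack(0, conv_exp, SQRT_TABLE[index])
    b = pack(0, exp_b, SQRT_TABLE[index + 1])
    t = (mantissa & 0xFFFF) / 65536.0
    return a + (b - a) * t
```

With these assumed formulas, exact powers of four such as 1.0 and 4.0 are reproduced exactly, because their index is 128 and the corresponding entry is 0.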
- the second conversion unit 32 executes a subtraction process between the fixed value and the exponent unit 12.
- the present invention is not limited to this.
- the second conversion unit 32 may execute addition processing.
- in that case, the sign of the exponent part 12 is inverted and the result is added to the fixed value.
- the configuration of the second conversion unit 32 may vary. In other words, it suffices that the amount of processing is not high, unlike, for example, multiplication processing.
- the acquisition unit 34 generates two indexes, and acquires two conversion mantissa parts 204 from the storage unit 38 based on the two indexes.
- the acquisition unit 34 may generate a plurality of indexes and acquire the plurality of converted mantissa units 204 from the storage unit 38 based on the plurality of indexes.
- the interpolation processing is also modified so as to correspond to the plurality of conversion mantissa parts 204 or a plurality of temporary return values. According to this modification, the accuracy of the approximate value can be increased. That is, two or more conversion mantissa parts 204 may be used.
- in Embodiments 1 to 4 of the present invention, the calculations of the reciprocal, the reciprocal of the square root, and the square root performed by the arithmetic unit 24 have been described.
- the present invention is not limited to this.
- the arithmetic unit 24 may calculate other functions.
- the present invention can be applied to various functions. In other words, any operation on an argument expressed as a floating-point number may be used.
- the arithmetic unit 24 executes any one of the reciprocal, the reciprocal of the square root, and the square root.
- the calculation device 24 may be capable of calculating a plurality of functions.
- entries corresponding to a plurality of functions are stored in the storage unit 38 in advance, and an instruction indicating the type of function to be calculated is input to the calculation device 24.
- the arithmetic unit 24 specifies the processing method of, for example, the first conversion unit 30 according to the instruction, and executes the calculation. According to this modification, it is possible to cope with various functions while suppressing an increase in the circuit scale of the arithmetic unit 24. In other words, any operation on an argument expressed as a floating-point number may be used.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/661,375 US8694567B2 (en) | 2004-08-27 | 2005-04-14 | Method and apparatus for arithmetic operation on a value represented in a floating-point format |
EP05730631A EP1783601A1 (en) | 2004-08-27 | 2005-04-14 | Operation method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004248395A JP2006065633A (ja) | 2004-08-27 | 2004-08-27 | Operation method and apparatus |
JP2004-248395 | 2004-08-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006022048A1 (ja) | 2006-03-02 |
Family
ID=35967269
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/007250 WO2006022048A1 (ja) | Operation method and apparatus | 2004-08-27 | 2005-04-14 |
Country Status (4)
Country | Link |
---|---|
US (1) | US8694567B2 (ja) |
EP (1) | EP1783601A1 (ja) |
JP (1) | JP2006065633A (ja) |
WO (1) | WO2006022048A1 (ja) |
Date | Country | Application | Publication | Status |
---|---|---|---|---|
2004-08-27 | JP | JP2004248395A | JP2006065633A (ja) | Pending |
2005-04-14 | WO | PCT/JP2005/007250 | WO2006022048A1 (ja) | Application Filing |
2005-04-14 | EP | EP05730631A | EP1783601A1 (en) | Withdrawn |
2005-04-14 | US | US11/661,375 | US8694567B2 (en) | Active |
Also Published As
Publication number | Publication date |
---|---|
US20080104160A1 (en) | 2008-05-01 |
US8694567B2 (en) | 2014-04-08 |
EP1783601A1 (en) | 2007-05-09 |
JP2006065633A (ja) | 2006-03-09 |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
WWE | WIPO information: entry into national phase | Ref document number: 2005730631; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: DE |
WWP | WIPO information: published in national office | Ref document number: 2005730631; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 11661375; Country of ref document: US |
WWP | WIPO information: published in national office | Ref document number: 11661375; Country of ref document: US |