WO2002052401A2 - Method of obtaining correctly rounded results in ieee double precision format of transcendental functions - Google Patents
Method of obtaining correctly rounded results in ieee double precision format of transcendental functions Download PDFInfo
- Publication number
- WO2002052401A2 WO2002052401A2 PCT/US2001/049052 US0149052W WO02052401A2 WO 2002052401 A2 WO2002052401 A2 WO 2002052401A2 US 0149052 W US0149052 W US 0149052W WO 02052401 A2 WO02052401 A2 WO 02052401A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- polynomial
- double precision
- pair
- precision
- storage device
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
Definitions
- the present invention relates to data processing systems having a floating point arithmetic unit. More particularly, the present invention relates to a data processing system performing correctly rounded floating point operations.
- Floating point units conventionally use hardware or software to perform mathematical operations. Regardless of the approach used, the floating point unit can produce an inexact result, that is a result having an error in the last few digits. Rounding the result discards one or more of the trailing significant digits and adjusts the retained part in accordance with some specified rule.
- Floating point notation is a method of writing numeric quantities by using an exponent and a significand (also called the mantissa).
- the exponent represents the location of the decimal point in base 10 of the number
- the significand represents the significant digits of the number.
- the number 12.846 can also be written as 1.2846 x 10 1 which comprises an exponent of 1 and a significand of 1.2846.
- the exponent and the significand of the result of a floating point operation exceed a microprocessor's word size.
- the addition of two numbers having n significand bits might result in a sum of having n+1 bits.
- the precision of a floating point number is determined by the length of the exponent and the significand. The longer the exponent and the significand length, the more precise it becomes. However, the tradeoff is that calculations take longer because more bits are being manipulated. Commonly, precisions are adopted according to the specifications set forth in Institute Electrical and Electronic Engineers (IEEE) standard 754 (1981). The standard specifies four precisions for a binary machine: single, single extended, double, and double extended. Implementations are not required to have all four precisions.
- single precision includes 1 sign bit, 7 exponent bits and 24 significand bits.
- Double precision includes 1 sign bit, 10 exponent bits and 53 significand bits.
- the sign bit, exponent bits, and significand bits are packed in a 32 bit word for single precision, and a 64 bit word for double precision.
- a ; method for correctly rounding a result of transcendental function such as sine, cosine, tangent, arc tangent, exponent, logarithm, or power in double precision arithmetic is disclosed. All arguments of the functions that are undefined, overflow or underflow are first purged off. Then the given argument is reduced within a primary range to compute the function as a pair of double precision numbers. Multi-precision arithmetic with a specified word length is used if the sum of the pair of double precision number is close to the boundary case. The multi-precision arithmetic is repeated with greater precision each time until the result of the multi-precision arithmetic is not close to the boundary case. The result of the multi-precision arithmetic or the sum of said pair of double precision numbers is finally rounded according to the given rounding mode.
- All arguments of the functions that are undefined, overflow or underflow are first purged off. Then the given argument is reduced within a primary range to compute the function as a
- FIG. 1 is a block diagram illustrating a method for correctly rounding one of the transcendental functions sine, cosine, tangent, arc tangent, exponential, logarithm, and power in IEEE double precision arithmetic in accordance with a specific embodiment of the present invention.
- FIG. 2 is a system block diagram of a system for implementing the method of the present invention.
- the components, algorithmic steps, and/or data structures are implemented using a source code programmed in the C source computer programming language.
- This implementation is not intended to be limiting in any way. Different implementations may be used and may include other types of language, computing platforms, program storage devices and/or computer programs.
- devices of a less general purpose nature such as hardwired devices, devices relying on FPGA (field programmable gate array) or ASIC (application specific integrated circuit) technology, or the like, may also be used without departing from the scope and spirit of the inventive concepts disclosed herewith.
- a transcendental function is a transcendental number, i.e., a number that cannot be represented with a finite number of digits. Normally one simply cuts the processing off after a certain specified number of bits, hence resulting in an approximation error, then rounds the number using a conventional rounding algorithm, and declares the result.
- the problem with this approach is that two different machines could, conceivably, produce different approximation errors and return slightly different rounded results.
- the precept of the JAVA programming language is that all machines should return the same result.
- the method of the present invention utilizes an iterative approach to request additional bits of the solution until the number can be correctly rounded even in the presence of an approximation error. Since the number will never be equal to a boundary case (which is the case for a rational number), the iterative process will terminate. It is strongly believed that one will never need to do more than a few iterations to achieve correct roundedness. In particular, the first computation will result in 73 bits accuracy, which will yield correctly rounded result 99.9999% of time. A second iteration will give 120 bits accuracy improving the chances of success to 1-2 "67 .
- the method of the present invention attempts to achieve "correct rounding" in compliance with IEEE standards without forcing all calculations to be done at a number of bits which would guarantee a solution in all cases. In this way, the correct rounding is performed significantly more efficiently than, for example, a 120 bit calculation were require for all results.
- FIG. 1 is a block diagram illustrating a general method for correctly rounding one of the transcendental functions sine, cosine, tangent, arc tangent, exponential, logarithm, and power in IEEE double precision arithmetic in accordance with a specific embodiment of the present invention.
- x is an argument that a transcendental function f(x) operates on.
- the transcendental function f may be sine, cosine, tangent, exponential, logarithm, power, or arc tangent (or a combination thereof).
- bit pattern of x is examined. If it is a not a number (NaN), then the code "NaN” is returned. If it is a positive infinite number, then a "+Infinity"code is returned. Next, the range of x is checked. If x is more than the threshold for an overflow condition of the exponential function, then an overflow result is returned. If x is less than the threshold for an underflow condition of the exponential function, then an underflow result is returned.
- argument x is preferably reduced to argument y such that y falls within a primary range.
- ⁇ /7z/4. Let n k mod 4. Then the computation of the trigonometric functions are reduced to the computation of sin(v) or cos(y) as indicated in the table below:
- n is chosen such that Also, y must be less than ln2/32 in absolute value.
- the value of the exponential of x is equal to
- n r*2 where l/sqrt(2) ⁇ r ⁇ sqrt(2) .
- a value is chosen from a table so that
- y r/t t .
- log( ) is computed as follows:
- ⁇ ogix n* ⁇ og(2) + logf ⁇ ) + log(y) where log(t,) is obtained from a look-up table.
- n is determined by rounding 16 /ln2 to the nearest integer. Then x - rc ⁇ 2)/16 is computed by simulating higher precision arithmetic as follows. First, (ln2)/16 is pre-computed to 144 significant bits and then split into four 36-bit chunks, ln2_nbhl, In2_nbh2, In2_nbh3, In2_nbh4. Each chunk is represented by a floating point number.
- f(y) where y is represented as two floating point number yl and y2, will also make use of simulated higher precision arithmetic.
- coefficients a are generated to achieve the desired accuracy of f(y).
- the above polynomial can be computed using a simulated higher precision arithmetic. However, higher precision arithmetic could be slow.
- an integer k is predetermined such that the "tail" portion of polynomial sny + ... + a n y" can be computed using normal arithmetic without affecting the desired final accuracy of f(y). Then simulated higher precision arithmetic is used to compute f(y) to achieve more than 73 significant bit accuracy. Then method of the simulation is similar to the code in the computation of the argument reduction.
- fix represented as a pair of double precision numbers (zl,z2), is retrieved by reversing the argument reduction step from f(y) to achieve at least 73 bits accuracy.
- a block 104 the sum of double precision number zl and z2 are checked for boundary cases by examining the bit pattern from the 54 th significant bit to the 73 rd significant bit.
- the rounding mode is nearest/even: if the 54 th significant bit is 0 and the rest of the bits from the 55 th significant bit to the 73 rd significant bit are l's, or if the 54 th significant bit is 1 and the rest of the bits from the 55 th significant bit to the 73 rd significant bit are O's, then this number is identified as too close to the boundary (halfway) number.
- a multi-precision routine as defined below in block 106 must be used for the computation of f(x) again. However, such multi-precision routine actually slows the computation.
- nw The number of words for a multi- precision number is called nw.
- the reliable number of significant bits is 24*(nw-3).
- the multi-precision number, mx is an integer array defined by:
- mx[0] is the sign word
- mx[l] is the exponent word
- mx[2] is the leading word
- mx[3] to mx[nw-l] are the 24-bit chung significant words.
- the addition, subtraction, multiplication, and division of multi-precision numbers are accomplished by calling special subroutines.
- An example of a multiplication subroutine code written in C language is illustrated below:
- the code first converts the "digit" of the multi-precision number a[] to a floating point number. Then it performs multiplication using an algorithm learnt in school. During the computation, care must be taken to avoid inexact arithmetic.
- exp( ) (((exp(z) 2 ) 2 )-) 2
- exp(x) 2" expO).
- a decision block 110 checks whether the multi-precision result of function fix) which has 24*(nw-3) reliable significant bits is too close to boundary cases. The procedure for checking boundary cases is repeated from block 104. If the multi- precision result with nw number of words is a boundary case, nw is replaced with nw+2 in block 112, and block 106 is reiterated. Otherwise, the result from block 110 is the correctly rounded result of function fix). Such result is then rounded in block 108 per IEEE standard 754.
- FIG. 2 is a system block diagram of a system for implementing the method of the present invention.
- the number on which a transcendental function will operate on is entered into the system through input 202.
- Input 202 may be in the form of any input source, such as a keyboard, or data stored in other format readable by input 202.
- a storage medium 204 for storing the above code can take the form of a hard drive, floppy disk, or laser disc.
- Storage medium 204 provides instructions to computing system 206 that comprises a central processing unit (CPU) 208 and a floating point unit (FPU) 210. Both CPU 208 and FPU 210 perform operations on the number provided by input 202.
- An output 212 provides a user with the results of the computation. Such output may be in the form of a display, printed output, or other output form.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Complex Calculations (AREA)
Abstract
Description
Claims
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2002227437A AU2002227437A1 (en) | 2000-12-27 | 2001-12-17 | Method of obtaining correctly rounded results in ieee double precision format of transcendental functions |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US75262800A | 2000-12-27 | 2000-12-27 | |
US09/752,628 | 2000-12-27 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002052401A2 true WO2002052401A2 (en) | 2002-07-04 |
WO2002052401A3 WO2002052401A3 (en) | 2003-01-23 |
Family
ID=25027107
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2001/049052 WO2002052401A2 (en) | 2000-12-27 | 2001-12-17 | Method of obtaining correctly rounded results in ieee double precision format of transcendental functions |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU2002227437A1 (en) |
WO (1) | WO2002052401A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109271134A (en) * | 2018-12-13 | 2019-01-25 | 上海燧原科技有限公司 | Surmount function operation method and device, storage medium and electronic equipment |
-
2001
- 2001-12-17 AU AU2002227437A patent/AU2002227437A1/en not_active Abandoned
- 2001-12-17 WO PCT/US2001/049052 patent/WO2002052401A2/en not_active Application Discontinuation
Non-Patent Citations (5)
Title |
---|
"CORRECTLY ROUNDED ELEMENTARY FUNCTIONS" IBM TECHNICAL DISCLOSURE BULLETIN, IBM CORP. NEW YORK, US, vol. 33, no. 1A, 1 June 1990 (1990-06-01), pages 353-355, XP000117760 ISSN: 0018-8689 * |
LEFEVRE V ET AL: "Towards correctly rounded transcendentals" PROCEEDINGS 13TH IEEE SYMPOSIUM ON COMPUTER ARITHMETIC. ASILOMAR, CA, JULY 6 - 9, 1997, IEEE SYMPOSIUM ON COMPUTER ARITHMETIC, LOS ALAMITOS, CA: IEEE COMP. SOC. PRESS, US, 6 July 1997 (1997-07-06), pages 132-137, XP010241202 ISBN: 0-8186-7846-1 * |
LYNCH T ET AL: "The K5 transcendental functions" COMPUTER ARITHMETIC, 1995., PROCEEDINGS OF THE 12TH SYMPOSIUM ON BATH, UK 19-21 JULY 1995, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 19 July 1995 (1995-07-19), pages 163-170, XP010146644 ISBN: 0-8186-7089-4 * |
SCHULTE M ET AL: "Exact rounding of certain elementary functions" COMPUTER ARITHMETIC, 1993. PROCEEDINGS., 11TH SYMPOSIUM ON WINDSOR, ONT., CANADA 29 JUNE-2 JULY 1993, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, 29 June 1993 (1993-06-29), pages 138-145, XP010128520 ISBN: 0-8186-3862-1 * |
ZIV A: "FAST EVALUATION OF ELEMENTARY MATHEMATICAL FUNCTIONS WITH CORRECTLY ROUNDED LAST BIT" ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, ASSOCIATION FOR COMPUTING MACHINERY, NEW YORK, US, vol. 17, no. 3, 1 September 1991 (1991-09-01), pages 410-423, XP000446457 ISSN: 0098-3500 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109271134A (en) * | 2018-12-13 | 2019-01-25 | 上海燧原科技有限公司 | Surmount function operation method and device, storage medium and electronic equipment |
CN109271134B (en) * | 2018-12-13 | 2020-08-25 | 上海燧原科技有限公司 | Transcendental function operation method and device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
WO2002052401A3 (en) | 2003-01-23 |
AU2002227437A1 (en) | 2002-07-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lefèvre et al. | Toward correctly rounded transcendentals | |
US5404324A (en) | Methods and apparatus for performing division and square root computations in a computer | |
US5892697A (en) | Method and apparatus for handling overflow and underflow in processing floating-point numbers | |
US5546333A (en) | Data processor having a data table for performing a dual function of alphanumeric notice and numerical calculations | |
CN1271508C (en) | Producing logarithmic signal approximation with vairable precision | |
KR100994862B1 (en) | Floating-point processor with reduced power requirements for selectable subprecision | |
EP0421092B1 (en) | Method and apparatus for performing mathematical functions using polynomial approximation and a rectangular aspect ratio multiplier | |
US5954789A (en) | Quotient digit selection logic for floating point division/square root | |
US5671170A (en) | Method and apparatus for correctly rounding results of division and square root computations | |
EP0329789A1 (en) | Galois field arithmetic unit | |
US5463574A (en) | Apparatus for argument reduction in exponential computations of IEEE standard floating-point numbers | |
US7865882B2 (en) | Fast correctly rounding floating point conversion and identifying exceptional conversion | |
EP0416308A2 (en) | Rectangular array signed digit multiplier | |
JPH05216620A (en) | Method and circuit for normalizing floating point | |
US20230161555A1 (en) | System and method performing floating-point operations | |
WO2002052401A2 (en) | Method of obtaining correctly rounded results in ieee double precision format of transcendental functions | |
JP6919539B2 (en) | Arithmetic processing unit and control method of arithmetic processing unit | |
US8239430B2 (en) | Accuracy improvement in CORDIC through precomputation of the error bias | |
CN114895871A (en) | Trailing or leading digit predictor | |
EP0377992A2 (en) | Floating point division method and apparatus | |
CN115357216A (en) | Data processing method, medium, electronic device, and program product | |
US20220137925A1 (en) | Device and method for hardware-efficient adaptive calculation of floating-point trigonometric functions using coordinate rotate digital computer (cordic) | |
US5710730A (en) | Divide to integer | |
US20090216823A1 (en) | Method, system and computer program product for verifying floating point divide operation results | |
Cowlishaw | General decimal arithmetic specification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
AK | Designated states |
Kind code of ref document: A3 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase in: |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |