US20120239719A1 - Floating-Point Addition Acceleration - Google Patents
- Publication number
- US20120239719A1 (US13/487,307)
- Authority
- US
- United States
- Prior art keywords
- exponent
- mantissa
- floating
- point
- normalized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
- G06F7/485—Adding; Subtracting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/499—Denomination or exception handling, e.g. rounding or overflow
- G06F7/49936—Normalisation mentioned as feature only
Definitions
- the present invention relates to electronics, and in particular, to the performance of mathematical operations by electronic circuits.
- the invention is a machine-implemented method for generating a normalized floating-point sum from at least first and second floating-point addends, where the floating point sum comprises a mantissa and an exponent.
- the mantissa of the normalized floating-point sum is generated based on the first and second floating-point addends.
- a plurality of possible values for the exponent of the normalized floating-point sum is generated based on a common exponent value.
- One of the possible values is then selected to generate the exponent of the normalized floating-point sum.
- FIG. 1 is a graphical depiction of a prior-art IEEE 754 single-precision 32-bit number 102 .
- FIG. 2 is a block diagram of circuitry 200 designed to implement a prior-art one-step FLSD method.
- FIG. 3 includes Table 1 which presents all the possible values of an LSD pointer in a system with a 24-bit mantissa, and the corresponding shift values in decimal and binary.
- FIG. 4 is a block diagram of prior-art Exponent Exceptions module 246 of FIG. 2 .
- FIG. 5 is a block diagram of circuitry 500 designed to implement an FLSD method according to one embodiment of the present invention.
- FIG. 6 is a block diagram of Exponent and Exception Lookup module 560 of FIG. 5 and Exponent Exceptions module 546 of FIG. 5 .
- FIG. 7 is a block diagram of each Exponent Precomputation module 602 of FIG. 6 .
- Floating-point representation of a number is scientific notation, i.e., $s \times i.f \times b^{n}$, where s is the sign digit (+1 or −1), i is the leading digit, f is the fraction, b is the base, and n is the exponent.
- The term $i.f$ is the significand, or mantissa, and is typically represented by m.
- IEEE 754, a widely used floating-point format, assumes a base b of 2 and an implied leading digit i of 1, yielding $s \times 1.f \times 2^{n}$. Thus, IEEE 754 data structures need only encode sign digit s, fraction f, and exponent n. IEEE 754 posits two types of floating-point formats: single-precision 32-bit format and double-precision 64-bit format. All examples which follow assume single-precision 32-bit format.
- FIG. 1 is a graphical depiction of a prior-art IEEE 754 single-precision 32-bit data structure 102 .
- Data structure 102 is terminated on the right with bit 0 and terminated on the left with bit 31 .
- Data structure 102 comprises three fields: 1-bit sign digit field 104 , 8-bit exponent field 106 , and 23-bit fraction field 108 .
- Sign digit field 104 (bit 31 ) represents sign digit s.
- Exponent field 106 represents exponent n. Starting with least-significant bit (LSB) 23 and ending with most-significant bit (MSB) 30 , exponent field 106 comprises 8 bits, and as such can store 8-bit binary numbers from 0 through 255 decimal. However, a floating-point format typically needs to represent both negative and positive exponents n. IEEE 754 represents both negative and positive exponents by using a bias, i.e., a number added to n which yields the value which will be stored. IEEE 754 single-precision 32-bit format uses a bias value of 127.
- Thus, an exponent n of −27 decimal will be stored in exponent field 106 as (−27+127) or 100 decimal (i.e., 01100100 binary), and an exponent n of 88 will be stored as (88+127) or 215 decimal (i.e., 11010111 binary).
- The binary values 00000000 and 11111111 are reserved for exceptions (discussed below).
- As such, the effective range of exponent n is −126 to +127.
- In FIG. 1, exponent field 106 is 10000010 binary, or 130 decimal. Subtracting a bias of 127 from 130 yields an exponent n of 3.
- Fraction field 108 represents fraction f.
- Fraction field 108 comprises 23 bits, starting with LSB 0 and ending with MSB 22.
- the 23 bits of fraction field 108 together with an implied leading 1 (except when exponent field 106 is all zeroes), yield a 24-bit mantissa.
- In FIG. 1, fraction field 108 is 01010100000000000000000, i.e., binary mantissa 1.010101, i.e., 1.328125 decimal.
- Thus, floating-point data structure 102 represents $+1 \times 1.328125 \times 2^{3}$, or 10.625 decimal.
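As a minimal sketch of the field layout described above (an illustration, not part of the patent), the following Python unpacks a single-precision value into its sign, exponent, and fraction fields and rebuilds the number from them. The name `decode_single` is hypothetical; the bias of 127 and the field widths are those given in the text.

```python
import struct

BIAS = 127  # single-precision exponent bias, as stated in the text


def decode_single(x: float) -> tuple[int, int, int]:
    """Return (sign field, exponent field, fraction field) of x as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30..23, stored with the bias added
    fraction = bits & 0x7FFFFF        # bits 22..0
    return sign, exponent, fraction


sign, exponent, fraction = decode_single(10.625)
mantissa = 1 + fraction / 2**23       # implied leading 1 plus fraction f
value = (-1) ** sign * mantissa * 2 ** (exponent - BIAS)
print(sign, exponent, f"{fraction:023b}", mantissa, value)
```

Running it on 10.625 reproduces the fields of the FIG. 1 example: a sign field of 0, a stored exponent of 130, and a mantissa of 1.328125.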
- floating-point numbers be normalized, i.e., that there be only one significant digit (which in binary can only be 1) to the left of the radix point of mantissa m.
- the addition of two floating-point numbers typically involves two normalized floating-point addends A and B, and yields a normalized sum S.
- the mantissas of these three numbers are designated M A , M B , and M S , respectively, and the exponents are designated E A , E B , and E S .
- mantissas M A and M B can be represented in 2's complement format, thus allowing a single simple structure to perform addition or subtraction.
- the general process for summing addends A and B consists of three steps: de-normalization of the addends, mantissa addition, and normalization of the sum.
- a typical method for de-normalizing addends is to increase the smallest exponent E smallest by x to equal the largest exponent E largest and shift the binary point of mantissa M smallest of the addend with the smallest exponent x places to the left to yield de-normalized mantissa M smallest,d .
- M smallest,d and the mantissa M largest associated with the largest exponent are added to yield a possibly un-normalized mantissa sum M S,u . If both addends are positive, or if both addends are negative, then the left-most significant digit (LSD) of M S,u will be either one or two places to the left of the binary point. If one addend is positive and the other is negative, then the LSD of M S,u can occur anywhere from one place to the left of the binary point to 23 places to the right. In the example above, M largest 1.11111 and M smallest,d 0.00001 are added to yield M S,u 10.00000.
- LSD left-most significant digit
- In normalization of the sum, if M S,u is not already in normalized form, then it is normalized to yield normalized sum S. In other words, if required, the binary point of M S,u is shifted left or right as appropriate until there is only one significant digit to the left of the binary point to yield normalized mantissa sum M S . Then, E largest is adjusted by y to yield the exponent E S of normalized sum S. In the example above, M S,u 10.0 is normalized by shifting the binary point one place to the left to yield M S 1.0. Then, E largest 7 is increased by 1 to yield E S of 8.
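The normalization step can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's circuit: the mantissa is modeled as a positive integer scaled by `2**FRAC_BITS`, `normalize` is a hypothetical helper name, and bits shifted out on a right shift are simply truncated (rounding is ignored).

```python
FRAC_BITS = 23  # bits to the right of the binary point (single precision)


def normalize(m_su: int, e_largest: int) -> tuple[int, int]:
    """Normalize an un-normalized mantissa sum and adjust the exponent.

    m_su is a positive integer holding the mantissa scaled by 2**FRAC_BITS.
    Returns (m_s, e_s) with exactly one significant digit left of the point.
    """
    if m_su <= 0:
        raise ValueError("a zero sum is a zero exception and is not handled here")
    lsd = m_su.bit_length() - 1          # position of the left-most significant digit
    y = lsd - FRAC_BITS                  # exponent adjustment: +1, 0, or negative
    m_s = m_su >> y if y > 0 else m_su << -y
    return m_s, e_largest + y


# Example from the text: M_S,u = 10.0 (binary), E_largest = 7
m_s, e_s = normalize(0b10 << FRAC_BITS, 7)
```

For the example in the text, the adjustment y is +1, so the result is a mantissa of 1.0 and an exponent of 8.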
- An exception occurs when a floating-point operation yields a result which cannot be represented in the floating-point numbering system used.
- Three common exceptions are overflow, underflow, and zero.
- Overflow and underflow exceptions occur when addition results in a sum, the absolute value of which is either too large (overflow) or too small (underflow) to be represented in the floating-point numbering system used.
- IEEE 754 32-bit single-precision format is not capable of representing a positive number greater than $(2 - 2^{-23}) \times 2^{127}$ (positive overflow) or less than $2^{-126}$ (positive underflow), or a negative number the absolute value of which is greater than $(2 - 2^{-23}) \times 2^{127}$ (negative overflow) or less than $2^{-126}$ (negative underflow).
- IEEE 754, with its implied leading digit of 1, is incapable of naturally representing 0 (zero exception).
- When a system encounters an exception, it typically generates a corresponding exception signal. In a typical system, that exception signal is then trapped and processed in a manner determined by the system administrator. In a system using IEEE floating-point format, the typical manner for processing exceptions is to use special, reserved combinations of exponents and fractions for specific exceptions, and to use the sign digit of the intermediate result.
- an overflow exception is typically represented by a fraction of all 0s and an exponent of all 1s (also known as infinity), and the sign digit of the intermediate result.
- positive overflow is represented by positive infinity
- negative overflow is represented by negative infinity.
- a negative underflow exception is typically represented by either negative zero (i.e., fraction is 0, exponent is 0, and sign digit is 1) or the smallest negative number that can be represented (i.e., $-2^{-126}$).
- a positive underflow exception is typically represented by either positive zero (i.e., fraction is 0, exponent is 0, and sign digit is 0) or the smallest positive number that can be represented (i.e., $2^{-126}$).
- a zero exception is typically represented by a fraction of all 0s, an exponent of all 0s, and a sign digit of 0. Note that the implied leading digit of 1 is not used in this case.
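The reserved bit patterns above can be checked with a short Python sketch. It is only an illustration: `from_fields` is a hypothetical helper that assembles a single-precision value from its three fields, and the interpretation of the patterns is done by Python's `struct` module rather than by anything in the patent.

```python
import struct


def from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Assemble a single-precision value from its sign, exponent, and fraction fields."""
    bits = (sign << 31) | (exponent << 23) | fraction
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


pos_overflow = from_fields(0, 0xFF, 0)            # exponent all 1s, fraction all 0s
neg_overflow = from_fields(1, 0xFF, 0)
pos_zero = from_fields(0, 0x00, 0)                # exponent and fraction all 0s
neg_zero = from_fields(1, 0x00, 0)
smallest_normal = from_fields(0, 0x01, 0)         # 2**-126
largest_normal = from_fields(0, 0xFE, 0x7FFFFF)   # (2 - 2**-23) * 2**127
print(pos_overflow, neg_overflow, pos_zero, neg_zero, smallest_normal, largest_normal)
```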
- U.S. Pat. No. 4,758,974 describes a set of related methods for reducing the time required for the addition of floating-point addends.
- the key to this time reduction is to calculate in parallel both the un-normalized mantissa sum M S,u and the location of the left-most significant digit (LSD) within M S,u .
- FIG. 2 is a block diagram of circuitry 200 designed to implement a prior-art one-step FLSD method.
- bolded arrows indicate the critical timing path.
- Exponent Compare module 202 receives exponents E A ( 204 ) and E B ( 206 ), determines which is the greatest (E largest ), and outputs three values.
- E largest ( 208 ) is sent to Add Exponent Adjustment module 210 .
- M largest Selector Bit ( 212 ) which indicates which of M A and M B is associated with E largest , is sent to Mantissa Selection module 214 .
- Mantissa Selection module 214 receives mantissas M A ( 220 ) and M B ( 222 ), and also receives the M largest Selector Bit ( 212 ) from Exponent Compare module 202 .
- the M largest Selector Bit ( 212 ) tells Mantissa Selection module 214 which of M A ( 220 ) and M B ( 222 ) is associated with E largest .
- Mantissa Selection module 214 sends the mantissa M smallest ( 224 ) of the addend with the smallest exponent to De Normalize M smallest module 218 , and sends the mantissa M largest ( 226 ) of the addend with the largest exponent to Add Mantissas/FLSD module 228 .
- De Normalize M smallest module 218 de-normalizes M smallest (224), i.e., shifts the binary point of M smallest (224) to the left by the number of places indicated by the M smallest Shift Value (216) received from Exponent Compare module 202. Module 218 then sends the de-normalized result, M smallest,d (230), to Add Mantissas/FLSD module 228.
- Add Mantissas/FLSD module 228 adds M largest ( 226 ) and M smallest,d ( 230 ), and sends resulting un-normalized mantissa sum M S,u ( 232 ) to Normalize M S,u module 234 . If M S,u ( 232 ) is 0, then Zero-Exception Signal ( 236 ) is set to 1; otherwise, to 0.
- module 228 also finds the location of the left-most significant digit (LSD) of M S,u ( 232 ) and encodes that location as a 25-bit LSD Pointer ( 238 ). Specifically, 24 of the 25 bits of the LSD Pointer ( 238 ) will be 0, and the location of a single bit of value 1 within the pointer will indicate the location of the LSD within M S,u ( 232 ).
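A one-hot LSD pointer of this kind can be modeled in Python as shown below. This is a sketch under assumptions: the un-normalized sum is taken to be a non-negative 25-bit integer, the helper name `lsd_pointer` is hypothetical, and the pointer is derived from the finished sum, whereas the circuit described here determines it in parallel with the addition.

```python
POINTER_BITS = 25  # 24-bit mantissa plus one extra position for a carry out


def lsd_pointer(m_su: int) -> int:
    """Return a one-hot word marking the left-most 1 bit of m_su (0 if m_su is 0)."""
    if not 0 <= m_su < (1 << POINTER_BITS):
        raise ValueError("m_su must fit in 25 bits")
    if m_su == 0:
        return 0                       # no LSD exists; the zero exception covers this
    return 1 << (m_su.bit_length() - 1)


pointer = lsd_pointer(0b0_0000_0000_0000_0000_0010_1101)
```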
- Module 228 sends the LSD Pointer ( 238 ) to Encode Exponent Adjust module 240 and to Normalize M S,u module 234 .
- Encode Exponent Adjust module 240 encodes the LSD Pointer ( 238 ) into a 2's complement binary Exponent Shift Value ( 242 ) to be added to E largest ( 208 ) to yield the exponent E s of the normalized sum ( 244 ).
- FIG. 3 includes Table 1 which presents all the possible values of an LSD pointer in a system with a 24-bit mantissa, and the corresponding shift values in decimal and binary.
- Encode Exponent Adjust module 240 then sends the Exponent Shift Value ( 242 ) to Add Exponent Adjustment module 210 .
- Add Exponent Adjustment module 210 adds the Exponent Shift Value ( 242 ) to E largest ( 208 ) to yield a 9-bit exponent E S ( 244 ), and sends E S ( 244 ) to Exponent Exceptions module 246 .
- E s ( 244 ) is a 9-bit number because exponent adjustment may result in an overflow/underflow exception and the 9 th bit can be used for detection of overflow and underflow.
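The encode-then-add path of the prior-art circuit might be modeled as follows. The actual pointer-to-shift mapping is defined by Table 1 in FIG. 3, which is not reproduced in this text, so the bit numbering (`NORMAL_POS`) and the sign convention here are assumptions, as are the function names.

```python
NORMAL_POS = 23  # assumed pointer bit index of an LSD that is already normalized


def encode_exponent_shift(pointer: int) -> int:
    """Convert a one-hot LSD pointer into a signed exponent shift value."""
    if pointer <= 0 or pointer & (pointer - 1):
        raise ValueError("pointer must have exactly one bit set")
    return (pointer.bit_length() - 1) - NORMAL_POS


def add_exponent_adjustment(e_largest: int, shift: int) -> int:
    """Add the shift to the 8-bit exponent, keeping a 9-bit two's-complement result."""
    return (e_largest + shift) & 0x1FF


shift = encode_exponent_shift(1 << 24)        # LSD one position above normal
e_s = add_exponent_adjustment(134, shift)
```

Because the addition can only start once the pointer has been encoded, both steps sit in series after the mantissa addition in this model.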
- Exponent Exceptions module 246 receives 9-bit exponent E s ( 244 ) from module 210 and the Zero-Exception Signal ( 236 ) from Add Mantissas/FLSD module 228 , determines whether an exception has occurred, and outputs 1) the appropriate E s value ( 248 ), 2) the appropriate Overflow-Exception Signal ( 250 ), and 3) the appropriate Underflow-Exception Signal ( 252 ).
- FIG. 4 is a block diagram of prior-art Exponent Exceptions module 246 of FIG. 2 .
- Received 9-bit exponent E s ( 244 of FIG. 2 ) is sent to logic blocks 402 and 404 , and to data input D 0 of multiplexor 406 . If an overflow exception has occurred, then logic block 402 sets Overflow-Exception Signal ( 250 of FIG. 2 ) to 1; otherwise, module 402 sets Overflow-Exception Signal ( 250 of FIG. 2 ) to 0.
- Overflow-Exception Signal ( 250 of FIG. 2 ) goes to 1) Select M s or Exception Value module 256 of FIG. 2 , 2) NOR gate 408 , and 3) select input S 2 on multiplexor 406 .
- If an underflow exception has occurred, then logic block 404 sets the Underflow-Exception Signal (252 of FIG. 2) to 1; otherwise, module 404 sets the Underflow-Exception Signal (252 of FIG. 2) to 0.
- the Underflow-Exception Signal ( 252 of FIG. 2 ) goes to 1) Select M s or Exception Value module 256 of FIG. 2 , 2) NOR gate 408 , and 3) select input S 3 on multiplexor 406 .
- Module 246 also receives the Zero-Exception Signal ( 236 ) from module 228 of FIG. 2 , which is sent to 1) NOR gate 408 and 2 ) select input S 1 on multiplexor 406 .
- NOR gate 408 receives three inputs: Overflow-Exception Signal (250 of FIG. 2), Underflow-Exception Signal (252 of FIG. 2), and Zero-Exception Signal (236 of FIG. 2). If all three of these inputs are 0 (i.e., there are no exceptions), then NOR gate 408 will output a 1 to select input S 0 on multiplexor 406; otherwise, it will output a 0.
- Multiplexor 406 receives four data inputs (D 0, D 1, D 2, D 3) and four select inputs (S 0, S 1, S 2, S 3). If the Underflow-Exception Signal (252 of FIG. 2) on select input S 3 is 1, then multiplexor 406 will output the 8-bit Underflow-Exception Exponent Value (410) on data input D 3, e.g., an E s (248 of FIG. 2) consisting of all zeroes. If the Overflow-Exception Signal (250 of FIG. 2) on select input S 2 is 1, then multiplexor 406 will output the 8-bit Overflow-Exception Exponent Value (412) on data input D 2, e.g., an E s (248 of FIG. 2) consisting of all ones. If the Zero-Exception Signal (236 of FIG. 2) on select input S 1 is 1, then multiplexor 406 will output the 8-bit Zero-Exception Exponent Value (414) on data input D 1, e.g., an E s (248 of FIG. 2) consisting of all zeroes.
- If there are no exceptions, NOR gate 408 will output a 1 to select input S 0, causing multiplexor 406 to output the first eight bits of the E s (244 of FIG. 2) received from Add Exponent Adjustment module 210 of FIG. 2 (data input D 0).
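The behaviour of the Exponent Exceptions module can be approximated in Python as below. The text does not say how logic blocks 402 and 404 test the 9-bit exponent, so the two comparisons are assumptions (the 9-bit word is read as two's complement, and the reserved field values 255 and 0 are treated as overflow and underflow). The function name and the order in which the select inputs are checked are also illustrative.

```python
def exponent_exceptions(e_s9: int, zero: int) -> tuple[int, int, int]:
    """Return (8-bit exponent, overflow signal, underflow signal)."""
    signed = e_s9 - 0x200 if e_s9 & 0x100 else e_s9    # read 9 bits as two's complement
    overflow = int(signed >= 0xFF)                     # assumed test for logic block 402
    underflow = int(signed <= 0)                       # assumed test for logic block 404
    s0 = int(not (overflow or underflow or zero))      # NOR gate 408
    if s0:
        return e_s9 & 0xFF, overflow, underflow        # data input D0: computed exponent
    if underflow:
        return 0x00, overflow, underflow               # data input D3: all zeroes
    if overflow:
        return 0xFF, overflow, underflow               # data input D2: all ones
    return 0x00, overflow, underflow                   # data input D1: zero exception


result = exponent_exceptions(135, 0)
```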
- Normalize M S,u module 234 normalizes un-normalized mantissa sum M s,u (232), i.e., if M s,u is not already in normal form, the binary point of M s,u is shifted to the left or right by the number of places indicated by the LSD Pointer (238). Module 234 then sends resulting normalized mantissa sum M s (254) to Select M s or Exception Value module 256.
- Select M s or Exception Value module 256 receives normalized mantissa sum M s (254) from Normalize M S,u module 234, and the Overflow-Exception Signal (250 of FIG. 2) and the Underflow-Exception Signal (252 of FIG. 2) from Exponent Exceptions module 246. If the Overflow-Exception Signal (250 of FIG. 2) is 1, then module 256 will output the mantissa value assigned to overflow exceptions (e.g., 00000000) as the final value 258 for M s. Likewise, if the Underflow-Exception Signal (252 of FIG. 2) is 1, then module 256 will output the mantissa value assigned to underflow exceptions (e.g., 00000000) as the final value 258 for M s. If neither of those two exception signals is 1, then module 256 outputs normalized mantissa sum M s (254) received from module 234 as the final value 258 for M s.
- mantissa addition (e.g., module 228 of FIG. 2)
- any normalization adjustment of E largest (208) (e.g., modules 240 and 210 of FIG. 2)
- the operations of encoding (e.g., module 240 of FIG. 2)
- the LSD Pointer 238
- adding (e.g., module 210 of FIG. 2)
- the Exponent Shift Value 242
- E s 244
- FIG. 5 is a block diagram of circuitry 500 designed to implement an FLSD method according to one embodiment of the present invention.
- Modules 502 , 514 , 518 , 528 , 534 , and 556 are analogous to modules 202 , 214 , 218 , 228 , 234 , and 256 of FIG. 2 .
- Signals 520, 522, 526, 524, 530, 532, 554, 558, 504, 506, 512, 516, 508, 536, 538, 550, 552, and 548 are analogous to signals 220, 222, 226, 224, 230, 232, 254, 258, 204, 206, 212, 216, 208, 236, 238, 250, 252, and 248 of FIG. 2.
- Modules 240 and 210 of FIG. 2 have been removed and replaced with a new Exponent and Exception Lookup module 560 .
- Modules 246 and 546 differ in several respects.
- Add Mantissas/FLSD module 528 now sends the LSD Pointer ( 538 ) to Exponent Exceptions module 546 .
- Exponent Compare module 502 now sends E largest ( 508 ) to new Exponent and Exception Lookup module 560 .
- Exponent and Exception Lookup module 560 sends its output 562 to Exponent Exceptions module 546 .
- FIG. 6 is a block diagram of Exponent and Exception Lookup module 560 of FIG. 5 and Exponent Exceptions module 546 of FIG. 5 .
- Modules 560 and 546 receive 1) E largest (508 of FIG. 5) from Exponent Compare module 502 of FIG. 5, 2) the LSD Pointer (538 of FIG. 5), and 3) the Zero-Exception Signal (536 of FIG. 5) from Add Mantissas/FLSD module 528 of FIG. 5.
- Modules 560 and 546 of FIG. 6 generate all N m +1 possible values of exponent E s and then, based on the Zero-Exception Signal (536 of FIG. 5) and the LSD Pointer (538 of FIG. 5), select one of those values.
- Exponent and Exception Lookup module 560 of FIG. 5 receives an 8-bit E largest (508 of FIG. 5) from Exponent Compare module 502 of FIG. 5, and sends that received E largest value to a number of Exponent Precomputation modules 602 simultaneously.
- the number of modules 602 will equal the number of mantissa digits of the numbering system, plus one (i.e., N m +1).
- the numbering system is IEEE 754 single-precision format.
- the number of mantissa digits is 24, and therefore there are 24+1 or 25 modules 602 .
- Associated with each 602 module is a constant ranging from −1 to +23.
- FIG. 7 is a block diagram of each Exponent Precomputation module 602 of FIG. 6 .
- Module 602 of FIG. 6 receives 8-bit input E largest ( 508 of FIG. 5 ), adds a constant value 704 to E largest ( 508 of FIG. 5 ), determines whether an underflow or overflow exception occurred as a result of that addition, and outputs a 10-bit Out i ( 562 of FIG. 5 ).
- the default value of bits 8 and 9 of Out i is 0 (i.e., the no overflow or underflow case).
- Adder 702 adds its associated constant value 704 to E largest ( 508 of FIG. 5 ) to yield a 9-bit sum E i ( 714 ).
- the 9 th bit of E i ( 714 ) is to accommodate potential overflow/underflow situations.
- E i ( 714 ) is then sent to logic blocks 706 , 708 , and 716 .
- If an overflow exception has occurred, logic block 706 sets bit 8 of Out i (562 of FIG. 5) to 1, and logic block 716 sets bits 0 through 7 of Out i (562 of FIG. 5) to a specified maximum exponent value. If an underflow exception has occurred, logic block 708 sets bit 9 of Out i (562 of FIG. 5) to 1, and logic block 716 sets bits 0 through 7 of Out i (562 of FIG. 5) to a specified minimum exponent value. If no exception has occurred, then bits 8 and 9 of Out i (562 of FIG. 5) remain 0, and logic block 716 sets bits 0 through 7 of Out i (562 of FIG. 5) equal to bits 0 through 7 of E i (714).
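One Exponent Precomputation module can be sketched in Python as a function that returns the 10-bit Out i word. Several details are assumptions because the text leaves them open: the overflow and underflow tests, the "specified" maximum and minimum exponent values (taken here as all ones and all zeroes), and the name `precompute`. The constant is passed in as a parameter, so no sign convention is implied.

```python
MAX_EXPONENT = 0xFF  # assumed "specified maximum exponent value"
MIN_EXPONENT = 0x00  # assumed "specified minimum exponent value"


def precompute(e_largest: int, constant: int) -> int:
    """Return the 10-bit Out_i word for one candidate exponent."""
    e_i = e_largest + constant            # adder 702, modeled as a signed integer sum
    overflow = e_i >= 0xFF                # assumed condition for logic block 706
    underflow = e_i <= 0                  # assumed condition for logic block 708
    if overflow:
        low = MAX_EXPONENT
    elif underflow:
        low = MIN_EXPONENT
    else:
        low = e_i & 0xFF                  # logic block 716 passes bits 0..7 through
    return low | (overflow << 8) | (underflow << 9)


out_i = precompute(134, 1)
```

Since each call depends only on E largest and a fixed constant, all of the candidate words can be produced while the mantissa addition is still in progress.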
- module 560 of FIG. 5 sends 25 10-bit Out i ( 562 of FIG. 5 ) values to data (bus) inputs D 0 through D 24 of multiplexor 610 in Exponent Exceptions module 546 of FIG. 5 .
- Module 546 of FIG. 5 also receives the LSD Pointer ( 538 of FIG. 5 ) and the Zero-Exception Signal ( 536 of FIG. 5 ) from Add Mantissa/FLSD module 528 of FIG. 5 .
- If mantissa addition resulted in 0, then the Zero-Exception Signal (536 of FIG. 5) will be a 1; otherwise, it is a 0.
- the Zero-Exception Signal ( 536 of FIG. 5 ) is sent to select input S 25 on multiplexor 610 , and also to inverter 606 , the output of which is sent to an input of each of 25 AND gates 604 .
- the other input to each of 25 AND gates 604 is one of the 25 bits of the LSD Pointer ( 538 of FIG. 5 ).
- the 25 outputs of AND gates 604 are sent to select inputs S 0 through S 24 on multiplexor 610 .
- If the Zero-Exception Signal (536 of FIG. 5) is 1 (i.e., mantissa addition resulted in 0), then the LSD Pointer (538 of FIG. 5) will be overwritten with a string of zeroes by AND gates 604, and thus none of the values D 0 through D 24 on multiplexor 610 will be selected. Instead, a value of 1 at select input S 25 will cause multiplexor 610 to select the Zero-Exception Value 608 at data input D 25, i.e., an 8-bit string of all 0s.
- If the Zero-Exception Signal (536 of FIG. 5) is 0, then inverter 606 and AND gates 604 cause the LSD Pointer (538 of FIG. 5) to pass through to select inputs S 0 through S 24, selecting one of 25 Out i values (562 of FIG. 5).
- Bits 0 through 7 of the selected Out i are outputted as normalized exponent E s ( 548 of FIG. 5 ).
- Bit 8 of the selected Out i is outputted as the Overflow-Exception Signal ( 550 of FIG. 5 ), and bit 9 of the selected Out i is outputted as the Underflow-Exception Signal ( 552 of FIG. 5 ).
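The selection stage can be modeled in Python as follows. This is a sketch, not the patent's implementation: `select_exponent` is a hypothetical name, pointer bit i is assumed to drive select input S i, and the exception signals are assumed to be 0 when the Zero-Exception Value is selected.

```python
def select_exponent(outs: list[int], lsd_pointer: int, zero: int) -> tuple[int, int, int]:
    """Model inverter 606, AND gates 604, and multiplexor 610.

    outs holds the precomputed 10-bit Out_i words (outs[i] feeds data input D_i).
    Returns (E_s, overflow signal, underflow signal).
    """
    not_zero = 1 - zero                                    # inverter 606
    selects = [((lsd_pointer >> i) & 1) & not_zero         # AND gates 604 drive S_0..S_24
               for i in range(len(outs))]
    if zero:                                               # S_25 selects D_25
        return 0x00, 0, 0                                  # Zero-Exception Value 608
    chosen = outs[selects.index(1)]                        # raises ValueError if no bit is set
    return chosen & 0xFF, (chosen >> 8) & 1, (chosen >> 9) & 1


candidates = list(range(100, 125))      # 25 placeholder Out_i words
candidates[3] |= 0x100                  # mark one candidate as an overflow case
selected = select_exponent(candidates, 1 << 5, 0)
```

In this model the only work left after the LSD pointer arrives is the multiplexor selection, since no addition remains to be done.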
- N addends are compared to determine E largest .
- N − 1 addends (all but the one with the largest exponent)
- multiple addends are added/subtracted to yield two addends.
- multiple mantissas can be added using a carry save adder tree to reduce N addends to two values, analogous to the tree-reduction operations used in parallel multipliers.
- the two addends are processed via the method discussed above. In theory, this can be done with nearly the same speed as a single pair of operands with only some time for the tree reduction added to the critical timing path
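The carry-save reduction mentioned above can be illustrated with a 3-to-2 compressor in Python. This is a generic sketch of the technique, not the patent's circuit: it handles non-negative integers only, the function names are hypothetical, and it applies compressors one after another, whereas hardware would arrange them as a tree to shorten the delay.

```python
def carry_save_3to2(a: int, b: int, c: int) -> tuple[int, int]:
    """Compress three non-negative integers into a (sum, carry) pair with the same total."""
    partial_sum = a ^ b ^ c                          # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1       # carries, moved up one position
    return partial_sum, carry


def reduce_to_two(values: list[int]) -> tuple[int, int]:
    """Reduce N aligned mantissas to two values; one ordinary add then gives the total."""
    pending = list(values)
    while len(pending) > 2:
        a, b, c = pending.pop(), pending.pop(), pending.pop()
        pending.extend(carry_save_3to2(a, b, c))
    while len(pending) < 2:
        pending.append(0)
    return pending[0], pending[1]


x, y = reduce_to_two([5, 9, 12, 7, 3])
```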
- One step in the addition of two floating-point addends is to de-normalize (if necessary) one or more of the addends such that their exponents match.
- the embodiment of the present invention presented above and in the attached figures accomplishes this step by de-normalizing the addend with the smallest exponent until the exponents match, i.e., by increasing E smallest by x so that E smallest is equal to E largest , and shifting the binary point of M smallest x places to the left.
- Alternative methods include 1) de-normalizing the addend with the largest exponent until the exponents match, and 2) adjusting both addends until their exponents match a third, common value.
- Although module 560 of FIG. 5 uses N m +1 adders to compute the N m +1 possible values of E s, another method would be to use an (N m +1)-deep lookup table.
- modules 502 , 514 , 518 , 528 , 534 , and 556 may be said to be implemented by a mantissa generator
- module 560 may be said to be implemented by an exponent and exception generator
- module 546 may be said to be implemented by an exponent selector.
- modules 502 , 514 , 518 , and 528 may be said to be implemented by an unnormalized mantissa sum generator
- modules 534 and 556 may be said to be implemented by a normalized mantissa generator.
- modules 502 , 514 , and 518 may be said to be implemented by a de-normalizer, and module 528 may be said to implement a mantissa adder, an LSD pointer generator, and a zero-exception generator.
- any one of a number of alternate devices could have been specified, e.g., tri-state drivers, parallel switches, etc.
- IEEE 754 64-bit double-precision format has an 11-bit exponent field and a 52-bit fraction field.
- an embodiment of the present invention would possess 54 adders 602 in FIG. 6 , 54 AND gates 604 in FIG. 6 , and a multiplexor 610 possessing 55 data inputs and 55 select inputs.
- circuits including possible implementation as a single integrated circuit, a multi-chip module, a single card, or a multi-card circuit pack
- present invention is not so limited.
- various functions of circuit elements may also be implemented as processing blocks in a software program.
- Such software may be employed in, for example, a digital signal processor, micro-controller, or general purpose computer.
- each numerical value and range should be interpreted as being approximate as if the word “about” or “approximately” preceded the value of the value or range.
- The use of figure numbers and/or figure reference labels in the claims is intended to identify one or more possible embodiments of the claimed subject matter in order to facilitate the interpretation of the claims. Such use is not to be construed as necessarily limiting the scope of those claims to the embodiments shown in the corresponding figures.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Nonlinear Science (AREA)
- Complex Calculations (AREA)
Abstract
Description
- This is a continuation of co-pending application Ser. No. 12/180,759, filed on Jul. 28, 2008, as attorney docket no. Rigge 13, the teachings of which are incorporated herein by reference.
- 1. Field of the Invention
- The present invention relates to electronics, and in particular, to the performance of mathematical operations by electronic circuits.
- 2. Description of the Related Art
- Floating-point representation of a number in scientific notation is well known in the art, as are the IEEE 754 floating-point format and floating-point data structures. U.S. Pat. No. 4,758,974, the teachings of which are hereby incorporated by reference in their entirety, describes a set of related methods for reducing the time required for the addition of floating-point addends.
- In one embodiment, the invention is a machine-implemented method for generating a normalized floating-point sum from at least first and second floating-point addends, where the floating point sum comprises a mantissa and an exponent. The mantissa of the normalized floating-point sum is generated based on the first and second floating-point addends. Independent of mantissa generation, a plurality of possible values for the exponent of the normalized floating-point sum is generated based on a common exponent value. One of the possible values is then selected to generate the exponent of the normalized floating-point sum.
- Other aspects, features, and advantages of the invention will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which like reference numerals identify similar or identical elements.
- FIG. 1 is a graphical depiction of a prior-art IEEE 754 single-precision 32-bit number 102.
- FIG. 2 is a block diagram of circuitry 200 designed to implement a prior-art one-step FLSD method.
- FIG. 3 includes Table 1 which presents all the possible values of an LSD pointer in a system with a 24-bit mantissa, and the corresponding shift values in decimal and binary.
- FIG. 4 is a block diagram of prior-art Exponent Exceptions module 246 of FIG. 2.
- FIG. 5 is a block diagram of circuitry 500 designed to implement an FLSD method according to one embodiment of the present invention.
- FIG. 6 is a block diagram of Exponent and Exception Lookup module 560 of FIG. 5 and Exponent Exceptions module 546 of FIG. 5.
- FIG. 7 is a block diagram of each Exponent Precomputation module 602 of FIG. 6.
- Floating-Point Number Format
- Floating-point representation of a number is scientific notation, i.e., $s \times i.f \times b^{n}$, where s is the sign digit (+1 or −1), i is the leading digit, f is the fraction, b is the base, and n is the exponent. The term $i.f$ is the significand, or mantissa, and is typically represented by m.
- IEEE 754, a widely used floating-point format, assumes a base b of 2 and an implied leading digit i of 1, yielding $s \times 1.f \times 2^{n}$. Thus, IEEE 754 data structures need only encode sign digit s, fraction f, and exponent n. IEEE 754 posits two types of floating-point formats: single-precision 32-bit format and double-precision 64-bit format. All examples which follow assume single-precision 32-bit format.
- FIG. 1 is a graphical depiction of a prior-art IEEE 754 single-precision 32-bit data structure 102. Data structure 102 is terminated on the right with bit 0 and terminated on the left with bit 31. Data structure 102 comprises three fields: 1-bit sign digit field 104, 8-bit exponent field 106, and 23-bit fraction field 108.
- Sign digit field 104 (bit 31) represents sign digit s. Sign digit field 104 can be either a 0 or a 1, where 0 indicates +1 and 1 indicates −1. For the example shown in FIG. 1, sign digit field 104 is 0, which means s = +1.
- Exponent field 106 represents exponent n. Starting with least-significant bit (LSB) 23 and ending with most-significant bit (MSB) 30, exponent field 106 comprises 8 bits, and as such can store 8-bit binary numbers from 0 through 255 decimal. However, a floating-point format typically needs to represent both negative and positive exponents n. IEEE 754 represents both negative and positive exponents by using a bias, i.e., a number added to n which yields the value which will be stored. IEEE 754 single-precision 32-bit format uses a bias value of 127. Thus, an exponent n of −27 decimal will be stored in exponent field 106 as (−27+127) or 100 decimal (i.e., 01100100 binary), and an exponent n of 88 will be stored as (88+127) or 215 decimal (i.e., 11010111 binary). The binary values 00000000 and 11111111 are reserved for exceptions (discussed below). As such, the effective range of exponent n is −126 to +127. In FIG. 1, exponent field 106 is 10000010 binary, or 130 decimal. Subtracting a bias of 127 from 130 yields an exponent n of 3.
- Fraction field 108 represents fraction f. Fraction field 108 comprises 23 bits, starting with LSB 0 and ending with MSB 22. The 23 bits of fraction field 108, together with an implied leading 1 (except when exponent field 106 is all zeroes), yield a 24-bit mantissa. In FIG. 1, fraction field 108 is 01010100000000000000000, i.e., binary mantissa 1.010101, i.e., 1.328125 decimal.
- Thus, floating-point data structure 102 represents $+1 \times 1.328125 \times 2^{3}$, or 10.625 decimal.
- Addition of Floating-Point Numbers
- It is desirable that floating-point numbers be normalized, i.e., that there be only one significant digit (which in binary can only be 1) to the left of the radix point of mantissa m. Thus, the addition of two floating-point numbers typically involves two normalized floating-point addends A and B, and yields a normalized sum S. The mantissas of these three numbers are designated MA, MB, and MS, respectively, and the exponents are designated EA, EB, and ES. In alternative floating-point formats to IEEE 754, mantissas MA and MB can be represented in 2's complement format, thus allowing a single simple structure to perform addition or subtraction.
- The general process for summing addends A and B consists of three steps: de-normalization of the addends, mantissa addition, and normalization of the sum.
- In de-normalization of the addends, if exponents EA and EB are not equal, then one or more of the addends is de-normalized until EA and EB match. A typical method for de-normalizing addends is to increase the smallest exponent Esmallest by x to equal the largest exponent Elargest and shift the binary point of mantissa Msmallest of the addend with the smallest exponent x places to the left to yield de-normalized mantissa Msmallest,d. For example, to add $1.0 \times 2^{2}$ (addend A) and $1.11111 \times 2^{7}$ (addend B), the method described above would increase Esmallest (in this case, EA, or 2) by 5 so that Esmallest equals Elargest (in this case, EB, or 7). Then the binary point of Msmallest (in this case, MA, or 1.0) is shifted an equal number of places to the left (i.e., 5) to yield de-normalized mantissa Msmallest,d 0.00001.
- When exponents EA and EB are equal, then one of the exponents is arbitrarily selected as Elargest.
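The de-normalization step can be sketched in software as follows. This is a minimal illustration, assuming positive mantissas held as integers with a fixed number of fractional bits (`FRAC_BITS`, chosen as 5 to match the worked example); the names `align` and `fmt` are hypothetical, and bits shifted past the right end are simply dropped.

```python
FRAC_BITS = 5  # fractional bits kept in this toy format (assumption)

def align(m_a, e_a, m_b, e_b):
    """Return (m_largest, m_smallest_d, e_largest) for two positive addends.

    Mantissas are integers scaled by 2**FRAC_BITS. When the exponents are
    equal, addend A is arbitrarily treated as the one with Elargest.
    """
    if e_a >= e_b:
        m_largest, m_smallest, e_largest, x = m_a, m_b, e_a, e_a - e_b
    else:
        m_largest, m_smallest, e_largest, x = m_b, m_a, e_b, e_b - e_a
    # Moving the binary point x places left equals a right shift by x.
    return m_largest, m_smallest >> x, e_largest

def fmt(m):
    """Render a scaled integer mantissa as a binary fixed-point string."""
    bits = format(m, "b").zfill(FRAC_BITS + 1)
    return bits[:-FRAC_BITS] + "." + bits[-FRAC_BITS:]

m_l, m_s_d, e_l = align(0b100000, 2, 0b111111, 7)
print(fmt(m_l), fmt(m_s_d), e_l)  # 1.11111 0.00001 7
```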
- In mantissa addition, Msmallest,d and the mantissa Mlargest associated with the largest exponent are added to yield a possibly un-normalized mantissa sum MS,u. If both addends are positive, or if both addends are negative, then the left-most significant digit (LSD) of MS,u will be either one or two places to the left of the binary point. If one addend is positive and the other is negative, then the LSD of MS,u can occur anywhere from one place to the left of the binary point to 23 places to the right. In the example above, Mlargest 1.11111 and Msmallest,d 0.00001 are added to yield MS,u 10.00000.
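Mantissa addition, together with the normalization step described in the next paragraph, can be sketched as below. The sketch assumes aligned mantissas stored as integers scaled by `2**FRAC_BITS`, with a negative integer standing for a negative addend; `add_and_normalize` is a hypothetical name, and low-order bits lost when the binary point moves left are truncated rather than rounded.

```python
FRAC_BITS = 5  # fractional bits kept in this toy format (assumption)

def add_and_normalize(m_largest, m_smallest_d, e_largest):
    """Add two aligned mantissas and renormalize; return (sign, m_s, e_s)."""
    m_su = m_largest + m_smallest_d      # possibly un-normalized sum
    if m_su == 0:
        return 0, 0, None                # zero: there is no LSD to find
    sign = -1 if m_su < 0 else 1
    magnitude = abs(m_su)
    lsd = magnitude.bit_length() - 1     # bit index of the LSD
    y = lsd - FRAC_BITS                  # exponent adjustment
    # y > 0: binary point moves left; y < 0: binary point moves right.
    m_s = magnitude >> y if y >= 0 else magnitude << -y
    return sign, m_s, e_largest + y

# 1.11111 + 0.00001 with Elargest = 7, as in the example in the text
print(add_and_normalize(0b111111, 0b000001, 7))   # (1, 32, 8)
# 1.00001 - 1.00000: the LSD lands five places to the right
print(add_and_normalize(0b100001, -0b100000, 7))  # (1, 32, 2)
```

In both calls the normalized mantissa 32 is 1.00000 in the 5-fraction-bit scaling.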
- In normalization of the sum, if MS,u is not already in normalized form, then it is normalized to yield normalized sum S. In other words, if required, the binary point of MS,u is shifted left or right as appropriate until there is only one significant digit to the left of the binary point to yield normalized mantissa sum MS. Then, Elargest is adjusted by y to yield the exponent ES of normalized sum S. In the example above, MS,u 10.0 is normalized by shifting the binary point one place to the left to yield MS 1.0. Then, Elargest 7 is increased by 1 to yield ES of 8.
- Exceptions: Overflow, Underflow, and Zero
- An exception occurs when a floating-point operation yields a result which cannot be represented in the floating-point numbering system used. Three common exceptions are overflow, underflow, and zero. Overflow and underflow exceptions occur when addition results in a sum, the absolute value of which is either too large (overflow) or too small (underflow) to be represented in the floating-point numbering system used. For example, IEEE 754 32-bit single-precision format is not capable of representing a positive number greater than $(2 - 2^{-23}) \times 2^{127}$ (positive overflow) or less than $2^{-126}$ (positive underflow), or a negative number the absolute value of which is greater than $(2 - 2^{-23}) \times 2^{127}$ (negative overflow), or less than $2^{-126}$ (negative underflow). Furthermore, IEEE 754, with its implied leading digit of 1, is incapable of naturally representing 0 (zero exception).
- When a system encounters an exception, it typically generates a corresponding exception signal. In a typical system, that exception signal is then trapped and processed in a manner determined by the system administrator. In a system using IEEE floating-point format, the typical manner for processing exceptions is to use special, reserved combinations of exponents and fractions for specific exceptions, and to use the sign digit of the intermediate result.
- Specifically, an overflow exception is typically represented by a fraction of all 0s and an exponent of all 1s (also known as infinity), and the sign digit of the intermediate result. Thus, positive overflow is represented by positive infinity, and negative overflow is represented by negative infinity. A negative underflow exception is typically represented by either negative zero (i.e., fraction is 0, exponent is 0, and sign digit is 1) or the smallest negative number that can be represented (i.e., $-2^{-126}$). A positive underflow exception is typically represented by either positive zero (i.e., fraction is 0, exponent is 0, and sign digit is 0) or the smallest positive number that can be represented (i.e., $2^{-126}$). Lastly, a zero exception is typically represented by a fraction of all 0s, an exponent of all 0s, and a sign digit of 0. Note that the implied leading digit of 1 is not used in this case.
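The reserved encodings just described can be illustrated with a short software sketch. It assumes the single-precision layout and, for underflow, picks the signed-zero option out of the two alternatives named above; `pack_single` and `as_float` are hypothetical helper names.

```python
import struct

BIAS, N_MIN, N_MAX = 127, -126, 127  # single-precision exponent limits

def pack_single(sign_bit, n, fraction, is_zero=False):
    """Pack sign, unbiased exponent n, and fraction into a 32-bit word."""
    if is_zero:
        return 0x00000000                        # zero exception: +0
    if n > N_MAX:
        return (sign_bit << 31) | (0xFF << 23)   # overflow: signed infinity
    if n < N_MIN:
        return sign_bit << 31                    # underflow: signed zero
    return (sign_bit << 31) | ((n + BIAS) << 23) | (fraction & 0x7FFFFF)

def as_float(word):
    """Reinterpret a 32-bit word as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

print(as_float(pack_single(0, 128, 0)))       # inf
print(as_float(pack_single(1, 128, 0)))       # -inf
print(as_float(pack_single(1, -127, 0)))      # -0.0
print(as_float(pack_single(0, 3, 0x2A0000)))  # 10.625
```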
- Find Left-Most Significant Digit (FLSD)
- U.S. Pat. No. 4,758,974 describes a set of related methods for reducing the time required for the addition of floating-point addends. The key to this time reduction is to calculate in parallel both the un-normalized mantissa sum MS,u and the location of the left-most significant digit (LSD) within MS,u. Thus, the subsequent normalization adjustments of MS,u and Elargest can take place in parallel rather than in serial.
- The methods described in U.S. Pat. No. 4,758,974 are referred to as Find Left-most Significant Digit, or FLSD. One of those methods is a two-step method, wherein a first step finds an approximate location of the LSD, and a second step finds the exact location. Another method finds the exact location of the LSD in one step (the one-step FLSD method).
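To make the notion of an LSD location concrete, the following sketch produces a one-hot word marking the left-most significant digit of a mantissa sum. It is only an illustration of what the pointer means: it derives the pointer from the finished sum, whereas the FLSD methods compute it alongside the addition. The 25-bit width (`SUM_BITS`) and the name `lsd_pointer` are assumptions.

```python
SUM_BITS = 25  # assumed width of MS,u: 24 mantissa bits plus one carry bit

def lsd_pointer(m_su: int) -> int:
    """Return a one-hot word whose single 1 bit marks the LSD of |m_su|."""
    magnitude = abs(m_su)
    if magnitude >= 1 << SUM_BITS:
        raise ValueError("sum does not fit in SUM_BITS bits")
    if magnitude == 0:
        return 0  # no significant digit at all (zero result)
    return 1 << (magnitude.bit_length() - 1)

print(bin(lsd_pointer(0b10110)))  # 0b10000
```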
- FIG. 2 is a block diagram of circuitry 200 designed to implement a prior-art one-step FLSD method. In FIG. 2, bolded arrows indicate the critical timing path.
- Exponent Compare module 202 receives exponents EA (204) and EB (206), determines which is the greatest (Elargest), and outputs three values. Elargest (208) is sent to Add Exponent Adjustment module 210. Mlargest Selector Bit (212), which indicates which of MA and MB is associated with Elargest, is sent to Mantissa Selection module 214. Msmallest Shift Value (216), which represents the difference between exponents EA and EB, is sent to De-Normalize module 218.
-
Mantissa Selection module 214 receives mantissas MA (220) and MB (222), and also receives the Mlargest Selector Bit (212) from Exponent Compare module 202. The Mlargest Selector Bit (212) tells Mantissa Selection module 214 which of MA (220) and MB (222) is associated with Elargest. Mantissa Selection module 214 sends the mantissa Msmallest (224) of the addend with the smallest exponent to De-Normalize Msmallest module 218, and sends the mantissa Mlargest (226) of the addend with the largest exponent to Add Mantissas/FLSD module 228.
- De-Normalize Msmallest module 218 de-normalizes Msmallest (224), i.e., shifts the binary point of Msmallest (224) to the left by the number of places indicated by the Msmallest Shift Value (216) received from Exponent Compare module 202. Module 218 then sends the de-normalized result, Msmallest,d (230), to Add Mantissas/FLSD module 228.
- Add Mantissas/FLSD module 228 adds Mlargest (226) and Msmallest,d (230), and sends resulting un-normalized mantissa sum MS,u (232) to Normalize MS,u module 234. If MS,u (232) is 0, then Zero-Exception Signal (236) is set to 1; otherwise, to 0.
- At the same time as mantissa addition, module 228 also finds the location of the left-most significant digit (LSD) of MS,u (232) and encodes that location as a 25-bit LSD Pointer (238). Specifically, 24 of the 25 bits of the LSD Pointer (238) will be 0, and the location of a single bit of value 1 within the pointer will indicate the location of the LSD within MS,u (232). Module 228 sends the LSD Pointer (238) to Encode Exponent Adjust module 240 and to Normalize MS,u module 234.
- Encode Exponent Adjust module 240 encodes the LSD Pointer (238) into a 2's complement binary Exponent Shift Value (242) to be added to Elargest (208) to yield the exponent Es of the normalized sum (244). FIG. 3 includes Table 1 which presents all the possible values of an LSD pointer in a system with a 24-bit mantissa, and the corresponding shift values in decimal and binary. Encode Exponent Adjust module 240 then sends the Exponent Shift Value (242) to Add Exponent Adjustment module 210.
- Add Exponent Adjustment module 210 adds the Exponent Shift Value (242) to Elargest (208) to yield a 9-bit exponent Es (244), and sends Es (244) to Exponent Exceptions module 246. Es (244) is a 9-bit number because exponent adjustment may result in an overflow/underflow exception and the 9th bit can be used for detection of overflow and underflow.
-
Exponent Exceptions module 246 receives 9-bit exponent Es (244) from module 210 and the Zero-Exception Signal (236) from Add Mantissas/FLSD module 228, determines whether an exception has occurred, and outputs 1) the appropriate Es value (248), 2) the appropriate Overflow-Exception Signal (250), and 3) the appropriate Underflow-Exception Signal (252).
-
FIG. 4 is a block diagram of prior-art Exponent Exceptions module 246 of FIG. 2. Received 9-bit exponent Es (244 of FIG. 2) is sent to the logic blocks and to multiplexor 406. If an overflow exception has occurred, then logic block 402 sets Overflow-Exception Signal (250 of FIG. 2) to 1; otherwise, logic block 402 sets Overflow-Exception Signal (250 of FIG. 2) to 0. Overflow-Exception Signal (250 of FIG. 2) goes to 1) Select Ms or Exception Value module 256 of FIG. 2, 2) NOR gate 408, and 3) select input S2 on multiplexor 406.
- Similarly, if an underflow exception has occurred, then logic block 404 sets the Underflow-Exception Signal (252 of FIG. 2) to 1; otherwise, logic block 404 sets the Underflow-Exception Signal (252 of FIG. 2) to 0. The Underflow-Exception Signal (252 of FIG. 2) goes to 1) Select Ms or Exception Value module 256 of FIG. 2, 2) NOR gate 408, and 3) select input S3 on multiplexor 406.
-
Module 246 also receives the Zero-Exception Signal (236) from module 228 of FIG. 2, which is sent to 1) NOR gate 408 and 2) select input S1 on multiplexor 406.
- NOR gate 408 receives three inputs: Overflow-Exception Signal (250 of FIG. 2), Underflow-Exception Signal (252 of FIG. 2), and Zero-Exception Signal (236 of FIG. 2). If all three of these inputs are 0 (i.e., there are no exceptions), then NOR gate 408 will output a 1 to select input S0 on multiplexor 406; otherwise, it will output a 0.
-
Multiplexor 406 receives four data inputs (D0, D1, D2, D3) and four select inputs (S0, S1, S2, S3). If the Underflow-Exception Signal (252 of FIG. 2) on select input S3 is 1, then multiplexor 406 will output the 8-bit Underflow-Exception Exponent Value (410) on data input D3, e.g., an Es (248 of FIG. 2) consisting of all zeroes. If the Overflow-Exception Signal (250 of FIG. 2) on select input S2 is 1, then multiplexor 406 will output the 8-bit Overflow-Exception Exponent Value (412) on data input D2, e.g., an Es (248) consisting of all ones. If the Zero-Exception Signal (236 of FIG. 2) on select input S1 is 1, then multiplexor 406 will output the 8-bit Zero-Exception Exponent Value (414) on data input D1, e.g., an Es (248 of FIG. 2) consisting of all zeroes. Otherwise, if none of the three exception signals is 1, then NOR gate 408 will output a 1 to select input S0, causing multiplexor 406 to output the first eight bits of the Es (244 of FIG. 2) received from Add Exponent Adjustment module 210 of FIG. 2 (data input D0).
- Returning to FIG. 2, Normalize MS,u module 234 normalizes un-normalized mantissa sum MS,u (232), i.e., if MS,u is not already in normal form, the binary point of MS,u is shifted to the left or right by the number of places indicated by the LSD Pointer (238). Module 234 then sends resulting normalized mantissa sum Ms (254) to Select Ms or Exception Value module 256.
- Select Ms or Exception Value module 256 receives normalized mantissa sum Ms (254) from Normalize MS,u module 234, and the Overflow-Exception Signal (250 of FIG. 2) and the Underflow-Exception Signal (252 of FIG. 2) from Exponent Exceptions module 246. If the Overflow-Exception Signal (250 of FIG. 2) is 1, then module 256 will output the mantissa value assigned to overflow exceptions (e.g., 00000000) as the final value 258 for Ms. Likewise, if the Underflow-Exception Signal (252 of FIG. 2) is 1, then module 256 will output the mantissa value assigned to underflow exceptions (e.g., 00000000) as the final value 258 for Ms. If neither of those two exception signals is 1, then module 256 outputs normalized mantissa sum Ms (254) received from module 234 as the final value 258 for Ms.
- Floating-Point Addition Acceleration
- In the prior-art one-step FLSD method illustrated in FIG. 2, mantissa addition (e.g., module 228 of FIG. 2) must be completed before any normalization adjustment of Elargest (208) (e.g., modules of FIG. 2) can begin. Further, the operations of encoding (e.g., module 240 of FIG. 2) the LSD Pointer (238) into the Exponent Shift Value (242), and then adding (e.g., module 210 of FIG. 2) the Exponent Shift Value (242) to Elargest (208) to yield Es (244), take significant amounts of time.
- In a floating-point system with Nm mantissa bits and Ne applicable exceptions, an addition of two numbers can result in only (Nm+1)+Ne possible values of Es: Elargest+1, Elargest, Elargest−1, . . . Elargest−(Nm−1), plus Ne exceptions (e.g., underflow, overflow, zero). Thus, in the addition of two numbers in IEEE 754 32-bit single-precision format, there are only (24+1+3) or 28 possible values for Es, a much smaller number than the roughly 256 possible exponent values.
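The count above can be checked with a few lines of code. The sketch below lists the Nm+1 non-exception candidates for a given Elargest; the function name `candidate_exponents` and the example value Elargest = 7 are chosen here purely for illustration.

```python
def candidate_exponents(e_largest: int, nm: int = 24):
    """List the Nm+1 non-exception values of Es, from Elargest+1 downward."""
    return [e_largest + 1 - k for k in range(nm + 1)]

NE = 3  # underflow, overflow, zero

cands = candidate_exponents(7)
print(len(cands), cands[0], cands[-1])  # 25 8 -16
print(len(cands) + NE)                  # 28
```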
- Thus, it is possible to compute all possible values of Es in parallel with each other and independent of mantissa addition. Those computed values then become data inputs to a selection device, such as a multiplexor, tri-state driver, parallel switches, etc. Then, instead of encoding the LSD Pointer into the Exponent Shift Value and adding that value to Elargest, the LSD Pointer itself becomes a control input of the selection device. The operations of encoding and addition/subtraction are replaced with the less-time-consuming operations of multiplexing and selecting. As such, a 10-15% reduction in computation time over the prior-art method can be realized.
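The difference between the two approaches can be sketched in software, although software cannot show the timing benefit, which comes from the hardware computing the candidates while the mantissas are still being added. The mapping of pointer bit i to a shift of i−(Nm−1) is an assumption made for this sketch (Table 1 of FIG. 3 is not reproduced in the text), and the function names are hypothetical.

```python
NM = 24            # mantissa bits (single precision)
PTR_BITS = NM + 1  # width of the one-hot LSD Pointer

def shift_for_bit(i: int) -> int:
    """Exponent adjustment implied by pointer bit i (assumed mapping)."""
    return i - (NM - 1)   # bit 24 -> +1, bit 23 -> 0, bit 0 -> -23

def encode_then_add(e_largest: int, lsd_pointer: int) -> int:
    """Prior-art order: encode the pointer into a shift value, then add."""
    i = lsd_pointer.bit_length() - 1
    return e_largest + shift_for_bit(i)

def precompute_then_select(e_largest: int, lsd_pointer: int) -> int:
    """Accelerated order: form every candidate first, then select one."""
    # These sums depend only on Elargest, not on the mantissa sum.
    candidates = [e_largest + shift_for_bit(i) for i in range(PTR_BITS)]
    # The pointer acts as the select input: each bit gates one candidate.
    selected = [c for i, c in enumerate(candidates) if (lsd_pointer >> i) & 1]
    assert len(selected) == 1, "LSD Pointer must be one-hot"
    return selected[0]

print(precompute_then_select(7, 1 << 24))  # 8
print(encode_then_add(7, 1 << 24))         # 8
```

Both functions return the same Es for every one-hot pointer; only the order of operations differs.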
- FIG. 5 is a block diagram of circuitry 500 designed to implement an FLSD method according to one embodiment of the present invention. Modules modules FIG. 2. Signals signals FIG. 2. Modules FIG. 2 have been removed and replaced with a new Exponent and Exception Lookup module 560. Modules
- Add Mantissas/FLSD module 528 now sends the LSD Pointer (538) to Exponent Exceptions module 546. Exponent Compare module 502 now sends Elargest (508) to new Exponent and Exception Lookup module 560. Exponent and Exception Lookup module 560 sends its output 562 to Exponent Exceptions module 546.
-
FIG. 6 is a block diagram of Exponent and Exception Lookup module 560 of FIG. 5 and Exponent Exceptions module 546 of FIG. 5. Together, these modules receive 1) Elargest (508 of FIG. 5) from Exponent Compare module 502 of FIG. 5, 2) the LSD Pointer (538 of FIG. 5), and 3) the Zero-Exception Signal (536 of FIG. 5) from Add Mantissas/FLSD module 528 of FIG. 5. The modules of FIG. 6 generate all Nm+1 possible values of exponent Es and then, based on the Zero-Exception Signal (536 of FIG. 5) and the LSD Pointer (538 of FIG. 5), output 1) the Es value (548 of FIG. 5) selected by the LSD Pointer (538 of FIG. 5), 2) the Overflow-Exception Signal (550 of FIG. 5), and 3) the Underflow-Exception Signal (552 of FIG. 5).
- Exponent and Exception Lookup module 560 of FIG. 5 receives an 8-bit Elargest (508 of FIG. 5) from Exponent Compare module 502 of FIG. 5, and sends that received Elargest value to a number of Exponent Precomputation modules 602 simultaneously. The number of modules 602 will equal the number of mantissa digits of the numbering system, plus one (i.e., Nm+1). In FIG. 6, the numbering system is IEEE 754 single-precision format. Thus, the number of mantissa digits is 24, and therefore there are 24+1 or 25 modules 602. Associated with each module 602 is a constant ranging from −1 to +23.
-
FIG. 7 is a block diagram of each Exponent Precomputation module 602 of FIG. 6. Module 602 of FIG. 6 receives 8-bit input Elargest (508 of FIG. 5), adds a constant value 704 to Elargest (508 of FIG. 5), determines whether an underflow or overflow exception occurred as a result of that addition, and outputs a 10-bit Outi (562 of FIG. 5). The default value of bits
-
Adder 702 adds its associated constant value 704 to Elargest (508 of FIG. 5) to yield a 9-bit sum Ei (714). The 9th bit of Ei (714) accommodates potential overflow/underflow situations. Ei (714) is then sent to the logic blocks.
- If an overflow exception has occurred, then logic block 706 sets bit 8 of Outi (562 of FIG. 5) to 1, and logic block 716 sets bits 0 through 7 of Outi (562 of FIG. 5) to a specified maximum exponent value. If an underflow exception has occurred, logic block 708 sets bit 9 of Outi (562 of FIG. 5) to 1, and logic block 716 sets bits 0 through 7 of Outi (562 of FIG. 5) to a specified minimum exponent value. If no exception has occurred, then the exception bits of Outi (562 of FIG. 5) remain 0, and logic block 716 sets bits 0 through 7 of Outi (562 of FIG. 5) equal to bits 0 through 7 of Ei (714).
- Returning to FIG. 6, module 560 of FIG. 5 sends 25 10-bit Outi (562 of FIG. 5) values to data (bus) inputs D0 through D24 of multiplexor 610 in Exponent Exceptions module 546 of FIG. 5. Module 546 of FIG. 5 also receives the LSD Pointer (538 of FIG. 5) and the Zero-Exception Signal (536 of FIG. 5) from Add Mantissas/FLSD module 528 of FIG. 5.
- If the mantissa addition performed in module 528 of FIG. 5 results in a 0, then the Zero-Exception Signal (536 of FIG. 5) will be a 1; otherwise, it is a 0. The Zero-Exception Signal (536 of FIG. 5) is sent to select input S25 on multiplexor 610, and also to inverter 606, the output of which is sent to an input of each of 25 AND gates 604. The other input to each of 25 AND gates 604 is one of the 25 bits of the LSD Pointer (538 of FIG. 5). The 25 outputs of AND gates 604 are sent to select inputs S0 through S24 on multiplexor 610.
- If the Zero-Exception Signal (536 of FIG. 5) is 1 (i.e., mantissa addition resulted in 0), then the LSD Pointer (538 of FIG. 5) will be overwritten with a string of zeroes by AND gates 604, and thus none of the values D0 through D24 on multiplexor 610 will be selected. Instead, a value of 1 at select input S25 will cause multiplexor 610 to select the Zero-Exception Value 608 at data input D25, i.e., an 8-bit string of all 0s.
- If, instead, the Zero-Exception Signal (536 of FIG. 5) is 0 (i.e., mantissa addition resulted in some number other than 0), then inverter 606 and AND gates 604 cause the LSD Pointer (538 of FIG. 5) to pass through to select inputs S0 through S24, selecting one of 25 Outi values (562 of FIG. 5). Bits 0 through 7 of the selected Outi are outputted as normalized exponent Es (548 of FIG. 5). Bit 8 of the selected Outi is outputted as the Overflow-Exception Signal (550 of FIG. 5), and bit 9 of the selected Outi is outputted as the Underflow-Exception Signal (552 of FIG. 5).
- Although the present invention has been described in the context of the addition of two addends, other embodiments of the present invention can add different numbers N of addends. In such embodiments, all N addends are compared to determine Elargest. Then, N−1 addends (all but the one with the largest exponent) are de-normalized. Then, multiple addends are added/subtracted to yield two addends. For example, multiple mantissas can be added using a carry save adder tree to reduce N addends to two values, analogous to the tree-reduction operations used in parallel multipliers. Then the two addends are processed via the method discussed above. In theory, this can be done with nearly the same speed as a single pair of operands with only some time for the tree reduction added to the critical timing path.
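The carry-save reduction mentioned for N addends can be illustrated as follows. This is a minimal sketch, assuming the de-normalized mantissas are non-negative integers; the 3:2 compressor and the grouping into threes are one common way to build such a tree and are not taken from the figures. The names `carry_save` and `reduce_to_two` are hypothetical.

```python
def carry_save(a: int, b: int, c: int):
    """3:2 compressor: return (sum_word, carry_word) with a+b+c preserved."""
    sum_word = a ^ b ^ c                            # per-bit sum, no carries
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # per-bit carries
    return sum_word, carry_word

def reduce_to_two(values):
    """Reduce N aligned mantissas to two values with the same total."""
    vals = list(values)
    while len(vals) > 2:
        nxt = []
        # Compress complete groups of three; leftovers pass through.
        for i in range(0, len(vals) - 2, 3):
            nxt.extend(carry_save(*vals[i:i + 3]))
        nxt.extend(vals[len(vals) - len(vals) % 3:])
        vals = nxt
    return vals

addends = [0b101101, 0b000111, 0b110000, 0b001001, 0b011110]
two = reduce_to_two(addends)
print(len(two), sum(two) == sum(addends))  # 2 True
```

The two remaining values would then go through a single carry-propagating addition, as in the two-addend case.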
- One step in the addition of two floating-point addends is to de-normalize (if necessary) one or more of the addends such that their exponents match. The embodiment of the present invention presented above and in the attached figures accomplishes this step by de-normalizing the addend with the smallest exponent until the exponents match, i.e., by increasing Esmallest by x so that Esmallest is equal to Elargest, and shifting the binary point of Msmallest x places to the left. However, the present invention is not limited to that one method. Alternative methods include 1) de-normalizing the addend with the largest exponent until the exponents match, and 2) adjusting both addends until their exponents match a third, common value.
- Furthermore, any use of the word “addition” regarding operands should be understood to represent both the operations of addition and subtraction.
- Furthermore, although module 560 of FIG. 5 uses Nm+1 adders to compute the Nm+1 possible values of Es, another method would be to use an (Nm+1)-deep lookup table.
- In FIG. 5, modules module 560 may be said to be implemented by an exponent and exception generator, and module 546 may be said to be implemented by an exponent selector. Within the mantissa generator, modules modules modules module 528 may be said to implement a mantissa adder, an LSD pointer generator, and a zero-exception generator.
- While the exemplary embodiments of the present invention specify a multiplexor as the selection device for selecting the proper value of Es, any one of a number of alternate devices could have been specified, e.g., tri-state drivers, parallel switches, etc.
- While the techniques described presume internal use of signed mantissas in 2's complement format, there exist other ways of processing a signed mantissa. For example, one could employ a sign magnitude adder/subtractor, or represent the sign magnitude mantissas in 1's complement format. Another method would be to use sequential logic to compute N+1 values, which may be useful in a heavily pipelined processor.
- While the exemplary embodiments of the present invention have been described with respect to IEEE 754 32-bit floating-point numbering format, other embodiments of the present invention can accommodate other floating-point formats. For example, IEEE 754 64-bit double-precision format has an 11-bit exponent field and a 52-bit fraction field. To accommodate IEEE 754 64-bit double-precision format, an embodiment of the present invention would possess 54 adders 602 in FIG. 6, 54 AND gates 604 in FIG. 6, and a multiplexor 610 possessing 55 data inputs and 55 select inputs.
- While the exemplary embodiments of the present invention have been described with respect to processes of circuits, including possible implementation as a single integrated circuit, a multi-chip module, a single card, or a multi-card circuit pack, the present invention is not so limited. As would be apparent to one skilled in the art, various functions of circuit elements may also be implemented as processing blocks in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general purpose computer.
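As a small software illustration, the element counts quoted above for single and double precision follow from the width of the fraction field. The helper below is hypothetical and simply restates that arithmetic: one implied leading digit, one extra candidate for the carry position, and one extra multiplexor input for the Zero-Exception Value.

```python
def lookup_sizing(fraction_bits: int) -> dict:
    """Element counts for a format with the given fraction-field width."""
    mantissa_bits = fraction_bits + 1   # implied leading 1
    modules = mantissa_bits + 1         # Nm+1 candidate exponents
    return {
        "mantissa_bits": mantissa_bits,
        "adders": modules,
        "and_gates": modules,
        "mux_inputs": modules + 1,      # one more for the zero exception
    }

print(lookup_sizing(23))  # single precision
print(lookup_sizing(52))  # double precision
```

For a 23-bit fraction this gives 25 adders and 26 multiplexor inputs (D0 through D25); for a 52-bit fraction it gives 54 adders and 55 inputs.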
- Unless explicitly stated otherwise, each numerical value and range should be interpreted as being approximate as if the word “about” or “approximately” preceded the value of the value or range.
- It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the scope of the invention as expressed in the following claims.
- The use of figure numbers and/or figure reference labels in the claims is intended to identify one or more possible embodiments of the claimed subject matter in order to facilitate the interpretation of the claims. Such use is not to be construed as necessarily limiting the scope of those claims to the embodiments shown in the corresponding figures.
- It should be understood that the steps of the exemplary methods set forth herein are not necessarily required to be performed in the order described, and the order of the steps of such methods should be understood to be merely exemplary. Likewise, additional steps may be included in such methods, and certain steps may be omitted or combined, in methods consistent with various embodiments of the present invention.
- Although the elements in the following method claims, if any, are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.
- Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments necessarily mutually exclusive of other embodiments. The same applies to the term “implementation.”
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/487,307 US20120239719A1 (en) | 2008-07-28 | 2012-06-04 | Floating-Point Addition Acceleration |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/180,759 US8214416B2 (en) | 2008-07-28 | 2008-07-28 | Floating-point addition acceleration |
US13/487,307 US20120239719A1 (en) | 2008-07-28 | 2012-06-04 | Floating-Point Addition Acceleration |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/180,759 Continuation US8214416B2 (en) | 2008-07-28 | 2008-07-28 | Floating-point addition acceleration |
Publications (1)
Publication Number | Publication Date |
---|---|
US20120239719A1 true US20120239719A1 (en) | 2012-09-20 |
Family
ID=41569586
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/180,759 Expired - Fee Related US8214416B2 (en) | 2008-07-28 | 2008-07-28 | Floating-point addition acceleration |
US13/487,307 Abandoned US20120239719A1 (en) | 2008-07-28 | 2012-06-04 | Floating-Point Addition Acceleration |
Country Status (1)
Country | Link |
---|---|
US (2) | US8214416B2 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9785405B2 (en) | 2015-05-29 | 2017-10-10 | Huawei Technologies Co., Ltd. | Increment/decrement apparatus and method |
US9836278B2 (en) | 2015-05-29 | 2017-12-05 | Huawei Technologies Co., Ltd. | Floating point computation apparatus and method |
US20230176817A1 (en) * | 2021-11-18 | 2023-06-08 | Imagination Technologies Limited | Floating Point Adder |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8214416B2 (en) * | 2008-07-28 | 2012-07-03 | Agere Systems Inc. | Floating-point addition acceleration |
WO2011137209A1 (en) | 2010-04-30 | 2011-11-03 | Cornell University | Operand-optimized asynchronous floating-point units and methods of use thereof |
US9047119B2 (en) * | 2010-07-01 | 2015-06-02 | Telefonaktiebolaget L M Ericsson (Publ) | Circular floating-point number generator and a circular floating-point number adder |
US8825727B2 (en) | 2012-03-15 | 2014-09-02 | International Business Machines Corporation | Software-hardware adder |
US9830129B2 (en) | 2013-11-21 | 2017-11-28 | Samsung Electronics Co., Ltd. | High performance floating-point adder with full in-line denormal/subnormal support |
US10019227B2 (en) | 2014-11-19 | 2018-07-10 | International Business Machines Corporation | Accuracy-conserving floating-point value aggregation |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8214416B2 (en) * | 2008-07-28 | 2012-07-03 | Agere Systems Inc. | Floating-point addition acceleration |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4758974A (en) | 1985-01-29 | 1988-07-19 | American Telephone And Telegraph Company, At&T Bell Laboratories | Most significant digit location |
US5010508A (en) * | 1989-02-14 | 1991-04-23 | Intel Corporation | Prenormalization for a floating-point adder |
US5373461A (en) * | 1993-01-04 | 1994-12-13 | Motorola, Inc. | Data processor a method and apparatus for performing postnormalization in a floating-point execution unit |
US5684729A (en) * | 1994-09-19 | 1997-11-04 | Hitachi, Ltd. | Floating-point addition/substraction processing apparatus and method thereof |
US5923575A (en) * | 1997-08-15 | 1999-07-13 | Motorola, Inc. | Method for eletronically representing a number, adder circuit and computer system |
US6094668A (en) * | 1997-10-23 | 2000-07-25 | Advanced Micro Devices, Inc. | Floating point arithmetic unit including an efficient close data path |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9785405B2 (en) | 2015-05-29 | 2017-10-10 | Huawei Technologies Co., Ltd. | Increment/decrement apparatus and method |
US9836278B2 (en) | 2015-05-29 | 2017-12-05 | Huawei Technologies Co., Ltd. | Floating point computation apparatus and method |
US20230176817A1 (en) * | 2021-11-18 | 2023-06-08 | Imagination Technologies Limited | Floating Point Adder |
US11829728B2 (en) * | 2021-11-18 | 2023-11-28 | Imagination Technologies Limited | Floating point adder |
US12314682B2 (en) | 2021-11-18 | 2025-05-27 | Imagination Technologies Limited | Floating point adder |
Also Published As
Publication number | Publication date |
---|---|
US20100023574A1 (en) | 2010-01-28 |
US8214416B2 (en) | 2012-07-03 |
Similar Documents
Publication | Title |
---|---|
US20120239719A1 (en) | Floating-Point Addition Acceleration |
US9639326B2 (en) | Floating-point adder circuitry |
US9582248B2 (en) | Standalone floating-point conversion unit |
US4758972A (en) | Precision rounding in a floating point arithmetic unit |
EP0483864B1 (en) | Hardware arrangement for floating-point addition and subtraction |
US10019231B2 (en) | Apparatus and method for fixed point to floating point conversion and negative power of two detector |
US5993051A (en) | Combined leading one and leading zero anticipator |
US12417078B2 (en) | Floating point accumulater with a single layer of shifters in the significand feedback |
US7290023B2 (en) | High performance implementation of exponent adjustment in a floating point design |
CN112783471A (en) | Device and method for calculating sine, cosine and arc tangent functions based on CORDIC algorithm |
US20070038693A1 (en) | Method and Processor for Performing a Floating-Point Instruction Within a Processor |
US11429349B1 (en) | Floating point multiply-add, accumulate unit with carry-save accumulator |
WO2020242526A1 (en) | Multi-input floating-point adder |
CN102495714A (en) | Apparatus and method for performing floating point subtraction and apparatus and method for predicting symbolic digit |
US11275559B2 (en) | Circular accumulator for floating point addition |
US6990505B2 (en) | Method/apparatus for conversion of higher order bits of 64-bit integer to floating point using 53-bit adder hardware |
US11366638B1 (en) | Floating point multiply-add, accumulate unit with combined alignment circuits |
US10275218B1 (en) | Apparatus and method for subtracting significand values of floating-point operands |
US20060136536A1 (en) | Data processing apparatus and method for converting a fixed point number to a floating point number |
GB2549153A (en) | Apparatus and method for supporting a conversion instruction |
CN113377334B (en) | Floating point data processing method and device and storage medium |
WO2022204069A1 (en) | Floating point multiply-add, accumulate unit with carry-save accumulator |
KR100929423B1 (en) | Floating-Point Complex Arithmetic Unit |
Nguyen et al. | A combined IEEE half and single precision floating point multipliers for deep learning |
JP3257278B2 (en) | Normalizer using redundant shift number prediction and shift error correction |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: AGERE SYSTEMS INC., PENNSYLVANIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: RIGGE, LAWRENCE A.; REEL/FRAME: 028308/0855; Effective date: 2008-07-25 |
AS | Assignment | Owner name: DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AG; Free format text: PATENT SECURITY AGREEMENT; ASSIGNORS: LSI CORPORATION; AGERE SYSTEMS LLC; REEL/FRAME: 032856/0031; Effective date: 2014-05-06 |
AS | Assignment | Owner name: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: AGERE SYSTEMS LLC; REEL/FRAME: 035365/0634; Effective date: 2014-08-04 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
AS | Assignment | Owner name: AGERE SYSTEMS LLC, PENNSYLVANIA; Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS (RELEASES RF 032856-0031); ASSIGNOR: DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT; REEL/FRAME: 037684/0039; Effective date: 2016-02-01. Owner name: LSI CORPORATION, CALIFORNIA; Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS (RELEASES RF 032856-0031); ASSIGNOR: DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT; REEL/FRAME: 037684/0039; Effective date: 2016-02-01 |