WO1999060475A1 - A leading bit anticipator for floating point multiplication - Google Patents

A leading bit anticipator for floating point multiplication

Publication number
WO1999060475A1
WO1999060475A1 PCT/US1999/008050 US9908050W WO9960475A1 WO 1999060475 A1 WO1999060475 A1 WO 1999060475A1 US 9908050 W US9908050 W US 9908050W WO 9960475 A1 WO9960475 A1 WO 9960475A1
Authority
WO
WIPO (PCT)
Prior art keywords
binary
floating point
zero bit
tuple
leading
Prior art date
Application number
PCT/US1999/008050
Other languages
French (fr)
Inventor
Narsing K. Vijayrao
Sudarshan Kumar
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corporation
Priority to AU34926/99A
Publication of WO1999060475A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/74 Selecting or encoding within a word the position of one or more bits having a specified value, e.g. most or least significant one or zero detection, priority encoders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/01 Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
    • G06F5/012 Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising in floating-point computations

Abstract

A floating point multiplier unit (200) with a leading bit anticipator (240) for predicting the leading non-zero bit of the sum of carry and sum terms, the leading bit anticipator comprising an array of logic gates to provide a binary tuple indicative of the logical OR of the carry and sum terms.

Description

A Leading Bit Anticipator for Floating Point Multiplication
Field of Invention
The present invention relates to floating point multiplier and add units in microprocessors, and more particularly, to floating point multiplier and add units with leading bit anticipators.
Background
In many floating point multiplier units in which two floating point numbers are to be multiplied, the mantissas are multiplied together and normalized, where the normalization involves shifting until the final mantissa has a leading non-zero bit. To speed up the floating point multiplication, it is useful to predict the amount of shifting necessary, so that a shift circuit can be configured while the product is still being computed. Add units also can benefit from predicting the leading non-zero bit of the sum. Some execution units can perform both multiplication and addition.
It is desirable for leading non-zero bit prediction to be fast and simple to implement so that the shift circuit can be quickly configured.
Brief Description of the Drawings
Fig. 1 is a high-level diagram of a microprocessor with a floating point multiply unit.
Fig. 2 is a high-level diagram of a portion of a floating point multiply unit with a leading non-zero bit anticipator.
Fig. 3 is an embodiment of a leading non-zero bit anticipator.
Detailed Description of Embodiments
Fig. 1 is a high-level diagram of microprocessor 100 with floating point multiply functional unit 110. Registers 120 and 140 hold two floating point numbers a and b to be multiplied together, where A and B denote their mantissas, respectively, in registers 150 and 160. The product p = ab may be computed by obtaining the product of the mantissas P = AB in register 130, setting register 130 to shift P so that its leading bit is 1 (i.e., normalization), and properly computing the exponent of p based upon the exponents of a and b as well as the number of bit shifts applied to P.
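The normalization step described above can be illustrated with a short Python sketch. It is not taken from the application: the function name `multiply_and_normalize`, the integer-mantissa convention (value = mantissa × 2^exponent), and the use of truncation instead of rounding are all assumptions made for illustration. Signs and exponent bias are left out.

```python
def multiply_and_normalize(A: int, exp_a: int, B: int, exp_b: int, width: int):
    """Multiply two (mantissa, exponent) pairs and normalize the product.

    Assumption: a value is mantissa * 2**exponent with an unsigned integer
    mantissa. The result mantissa has its leading 1 in bit (width - 1).
    """
    P = A * B                      # product of the mantissas
    exponent = exp_a + exp_b       # exponents add under multiplication
    if P == 0:
        return 0, 0                # nothing to normalize
    shift = width - P.bit_length() # left shift needed to place the leading 1
    if shift >= 0:
        P <<= shift
    else:
        P >>= -shift               # drops low bits; a real unit would round
    # Every left shift of P is compensated by decrementing the exponent.
    return P, exponent - shift


mantissa, exponent = multiply_and_normalize(3, 0, 5, 0, width=8)
print(bin(mantissa), exponent)
```

The exponent adjustment depends on the shift count, which is why knowing the shift amount early is useful.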
In Fig. 2, functional unit 200 is a high-level diagram of a portion of floating point multiply functional unit 110. In the particular embodiment of Fig. 2, the product P of the mantissas A and B is obtained by first obtaining carry terms C and sum terms S by carry-save adder (CSA) 210, where C and S are binary tuples and where P is related to the carry and sum terms by P = C + S. This sum is performed by full adder functional unit 220. In the particular embodiment of Fig. 2, it is seen that the carry and sum terms are 128 bits wide, so that the product P obtained from adding the carry and sum terms is also 128 bits wide. Other embodiments will have different word sizes.
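As an illustration of how carry and sum terms with P = C + S can arise, the following Python sketch reduces the partial products of A × B with a generic 3:2 carry-save step. The application does not describe the internals of CSA 210, so the structure here (a simple sequential reduction) and the names `carry_save_add` and `multiply_to_carry_save` are assumptions.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three addends to a (carry, sum) pair with x + y + z == carry + sum."""
    s = x ^ y ^ z                              # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority, moved up one weight
    return c, s


def multiply_to_carry_save(a: int, b: int) -> tuple[int, int]:
    """Accumulate the partial products of a * b in carry-save form."""
    c, s = 0, 0
    i = 0
    while b >> i:
        if (b >> i) & 1:
            # Fold the next partial product in without resolving carries.
            c, s = carry_save_add(c, s, a << i)
        i += 1
    return c, s


C, S = multiply_to_carry_save(0b1011, 0b1101)
print(bin(C), bin(S), C + S == 0b1011 * 0b1101)
```

Only the final addition C + S needs a carry-propagating adder, which is the slow step that the anticipator works around.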
To speed up the multiplication of floating point numbers, it is desirable to set up shift register functional unit 130 to properly shift the output of full adder 220 before P is finally computed. In this way, shift register 130 will be ready to shift P when it is available from full adder 220. It is therefore desirable to anticipate, or predict, the position of the leading non-zero bit of P based only upon the carry and sum terms. This prediction function is performed by leading bit anticipator (LZA) 240. As described below, LZA 240 does not always predict exactly the position of the leading non-zero bit. However, at most it will mispredict by one position. The final result of shift register 130 after shifting, denoted by P', will therefore be, to within one bit shift, the desired mantissa of the product p. Depending upon P, a final bit shift of P' may be required, but this is not time consuming. This final bit shift is not shown in Fig. 2.
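The one-position bound on the misprediction can be checked with a short argument. The derivation below is a sketch added for illustration, not text from the application; it treats C and S as unsigned integers and writes k for the leading non-zero bit position of C OR S.

```latex
\begin{align*}
C + S &= (C \,\mathrm{OR}\, S) + (C \,\mathrm{AND}\, S), \\
0 &\le C \,\mathrm{AND}\, S \le C \,\mathrm{OR}\, S, \\
\text{hence}\quad C \,\mathrm{OR}\, S &\le C + S \le 2\,(C \,\mathrm{OR}\, S), \\
2^{k} \le C \,\mathrm{OR}\, S < 2^{k+1}
  \quad&\Longrightarrow\quad 2^{k} \le C + S < 2^{k+2}.
\end{align*}
```

The leading non-zero bit of C + S is therefore at position k or k + 1, so the prediction is either exact or one position too low.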
Fig. 3 provides a high-level diagram of an embodiment of LZA 240. The Boolean binary operation OR is applied to each pair of bits of C and S. Functional unit 310 represents an array of OR gates, each OR gate applying an OR binary operation to a pair of bits from C and S. We denote this operation on C and S by C OR S, where C OR S is a binary tuple (word) having the same length as C and S and with ith component given by $[C \,\mathrm{OR}\, S]_i = [C]_i \,\mathrm{OR}\, [S]_i$. The predicted leading non-zero bit position of the product P is the position of the leading non-zero bit of C OR S.
Priority encoder 320 asserts one of its output lines 330 corresponding to the leading non-zero bit position of C OR S. Output lines 330 provide a binary tuple indicative of the leading non-zero bit position of C OR S, and are coupled to shift register 130 to provide this prediction information so that shift register 130 can be set up to shift P before it is computed by full adder 220.
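A minimal software model of the OR array and priority encoder is sketched below. The function names `or_array` and `priority_encode`, the one-hot integer used to stand in for output lines 330, and the 5-bit operands are assumptions for illustration.

```python
def or_array(c: int, s: int, width: int) -> int:
    """Model of the OR gate array: bit i of the result is c[i] OR s[i]."""
    mask = (1 << width) - 1
    return (c | s) & mask


def priority_encode(x: int, width: int) -> int:
    """Return a one-hot word with a single 1 at the leading non-zero bit of x.

    Returns 0 when x has no non-zero bit.
    """
    for i in range(width - 1, -1, -1):   # scan from the most significant bit
        if (x >> i) & 1:
            return 1 << i
    return 0


x = or_array(0b00101, 0b00110, width=5)
lines = priority_encode(x, width=5)
print(format(x, "05b"), format(lines, "05b"))
```

Neither step involves carry propagation, so the prediction can be ready well before the full adder finishes.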
As an example, let S = (0001000101) and C = (0000101101). The sum of S and C is (0001110010). Applying the Boolean binary operation OR to each pair of bits from C and S yields C OR S = (0001101101). For this example, the leading non-zero bit of C OR S is in the sixth position (where the least significant bit is the zeroth position), and the leading non-zero bit of C + S is predicted correctly. As an example of a misprediction, let S = (0001100101) and C = (0000101101). For this example, C + S = (0010010010) and C OR S = (0001101101), and it is seen that the leading non-zero bit of C OR S mispredicts the leading non-zero bit of C + S by one position.
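The two examples can be replayed in Python, including the final one-bit correction mentioned earlier. This is a sketch under stated assumptions: the function name `normalize_with_prediction` is hypothetical, the word is 10 bits wide as in the examples, and C + S is assumed to fit in that width.

```python
def leading_bit(x: int) -> int:
    """Position of the leading non-zero bit (LSB is position 0), or -1 for 0."""
    return x.bit_length() - 1


def normalize_with_prediction(c: int, s: int, width: int) -> tuple[int, int]:
    """Shift c + s so that its leading 1 lands in bit (width - 1).

    The shift amount is first taken from c OR s, then corrected by one
    position if the prediction turned out to be one bit too low.
    """
    predicted = leading_bit(c | s)
    shift = (width - 1) - predicted   # can be set up before the sum is known
    p = c + s                         # the slow, carry-propagating addition
    shifted = p << shift
    if shifted >> width:              # leading 1 went past the top bit
        shift -= 1                    # final one-bit correction
        shifted = p << shift
    return shifted, shift


C = 0b0000101101
for S in (0b0001000101, 0b0001100101):
    mantissa, shift = normalize_with_prediction(C, S, width=10)
    print(leading_bit(C | S), leading_bit(C + S), format(mantissa, "010b"), shift)
```

In the first case the prediction and the actual leading bit are both position 6; in the second the prediction is 6 while the actual position is 7, and the correction reduces the shift by one.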
In some situations, such as denormalized numbers, the product P discussed earlier is not shifted to have a leading non-zero bit. However, the embodiments disclosed herein have utility for denormalized numbers because the position of the leading non-zero bit is still useful for determining the amount of shifting necessary for denormalized numbers.
The embodiments disclosed herein were in reference to a floating point multiplication unit. However, it is to be appreciated that the invention claimed below is not limited to only multiplication units. Embodiments of the present invention are also applicable to addition units, and other kinds of execution units employing combinations of multiplication and addition. Consequently, the term floating point unit encompasses a floating point multiplication unit, a floating point addition unit, or a combination thereof. Various modifications may be made to the embodiment described above without departing from the scope of the invention as claimed below. For example, it is immaterial whether priority encoder 320 is included within LZA 240 or not. As another example, OR gates 310 may be replaced with NOR gates, in which case priority encoder 320 is modified to provide signals on output 330 indicative of the position of the first zero bit of C NOR S.
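The NOR-gate variant can be checked against the OR-gate version with a small Python sketch. The function names and the fixed word width are assumptions; the point illustrated is only that the leading zero bit of the NOR word sits at the same position as the leading non-zero bit of the OR word.

```python
def leading_one_of_or(c: int, s: int, width: int) -> int:
    """Leading non-zero bit position of c OR s, or -1 if there is none."""
    x = (c | s) & ((1 << width) - 1)
    return x.bit_length() - 1


def leading_zero_of_nor(c: int, s: int, width: int) -> int:
    """Leading zero bit position of c NOR s, or -1 if every bit is 1."""
    mask = (1 << width) - 1
    x = ~(c | s) & mask                  # bitwise NOR, limited to the word width
    for i in range(width - 1, -1, -1):   # scan from the most significant bit
        if not (x >> i) & 1:
            return i
    return -1


C, S = 0b0000101101, 0b0001000101
print(leading_one_of_or(C, S, 10), leading_zero_of_nor(C, S, 10))
```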

Claims

What is claimed is:
1. A circuit to predict the position of the leading non-zero bit of C + S, where C and S are binary tuples, the circuit comprising at least one logic gate to provide a binary tuple X, where X is indicative of C OR S.
2. The circuit as set forth in claim 1, further comprising a priority encoder circuit responsive to X to provide a binary tuple indicative of the position of the leading non-zero bit of C OR S.
3. A circuit to predict the position of the leading non-zero bit of C + S, where C and S are binary n-tuples, the circuit comprising: at least one logic gate to provide an n-binary tuple X, with ith component given by $[X]_i = ([C]_i \,\mathrm{OR}\, [S]_i)$, $i = 0, 1, \ldots, n-1$; and a priority encoder circuit responsive to X to provide a binary tuple indicative of the position of the leading non-zero bit of X.
4. A circuit to predict the position of the leading non-zero bit of C + S, where C and S are binary n-tuples, the circuit comprising: at least one logic gate to provide an n-binary tuple X, with ith component given by $[X]_i = ([C]_i \,\mathrm{NOR}\, [S]_i)$, $i = 0, 1, \ldots, n-1$; and a priority encoder circuit responsive to X to provide a binary tuple indicative of the position of the leading zero bit of X.
5. A circuit to predict the position of the leading non-zero bit of C + S, where C and S are binary tuples, the circuit comprising: at least one logic gate to provide a binary tuple X, where X is indicative of C OR S; and a priority encoder circuit responsive to X to provide a binary tuple indicative of the position of the leading non-zero bit of C OR S.
6. A floating point unit comprising: a shift unit to shift a binary tuple P; and an anticipator unit responsive to binary tuples C and S to provide the shift unit a shift signal indicative of the position of the leading non-zero bit of C OR S.
7. The floating point unit as set forth in claim 6, wherein the shift unit is responsive to the shift signal so as to shift P by m bits, wherein m is such that C OR S shifted by m bits has a leading one.
8. The floating point unit as set forth in claim 7, wherein the anticipator unit further comprises: at least one logic gate to provide a binary tuple X indicative of C OR S; and a priority encoder responsive to X to provide the shift signal.
9. The floating point unit as set forth in claim 6, further comprising an adder to provide P, where P = C + S.
10. The floating point unit as set forth in claim 7, further comprising an adder to provide P, where P = C + S.
11. The floating point unit as set forth in claim 8, further comprising an adder to provide P, where P = C + S.
12. The floating point unit as set forth in claim 9, further comprising a carry-save adder unit to provide the binary tuples C and S, where C represents carry terms and S represents sum terms.
13. The floating point unit as set forth in claim 10, further comprising a carry-save adder unit to provide the binary tuples C and S, where C represents carry terms and S represents sum terms.
14. The floating point unit as set forth in claim 11, further comprising a carry-save adder unit to provide the binary tuples C and S, where C represents carry terms and S represents sum terms.
15. A method to predict the position of the leading non-zero bit of C + S, where C and S are binary n-tuples, the method comprising: performing n Boolean binary operations on each pair of bits $\{[C]_i, [S]_i\}$, $i = 0, 1, \ldots, n-1$, to provide an n-tuple X indicative of C OR S; and providing, based upon X, a shift signal indicative of the leading non-zero bit position of C OR S, wherein the leading non-zero bit position of C OR S is the predicted position of the leading non-zero bit of C + S.
16. A method for normalization in a floating point unit, the method comprising: providing carry and sum tuples C and S; adding C and S to provide a binary tuple P, where P = C + S; performing Boolean binary operations on the components of C and S to provide a binary tuple X indicative of C OR S; and shifting, based upon X, the binary tuple P.
17. The method as set forth in claim 16, wherein in shifting the binary tuple P, P is shifted m bits, wherein m is such that C OR S shifted by m bits has a leading one.
PCT/US1999/008050 1998-05-19 1999-04-08 A leading bit anticipator for floating point multiplication WO1999060475A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU34926/99A AU3492699A (en) 1998-05-19 1999-04-08 A leading bit anticipator for floating point multiplication

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US8183398A 1998-05-19 1998-05-19
US09/081,833 1998-05-19

Publications (1)

Publication Number Publication Date
WO1999060475A1 (en) 1999-11-25

Family

ID=22166683

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/008050 WO1999060475A1 (en) 1998-05-19 1999-04-08 A leading bit anticipator for floating point multiplication

Country Status (2)

Country Link
AU (1) AU3492699A (en)
WO (1) WO1999060475A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5493520A (en) * 1994-04-15 1996-02-20 International Business Machines Corporation Two state leading zero/one anticipator (LZA)
US5530663A (en) * 1994-11-14 1996-06-25 International Business Machines Corporation Floating point unit for calculating a compound instruction A+B×C in two cycles
US5771183A (en) * 1996-06-28 1998-06-23 Intel Corporation Apparatus and method for computation of sticky bit in a multi-stage shifter used for floating point arithmetic
US5790444A (en) * 1996-10-08 1998-08-04 International Business Machines Corporation Fast alignment unit for multiply-add floating point unit
US5889690A (en) * 1994-11-17 1999-03-30 Hitachi, Ltd. Multiply-add unit and data processing apparatus using it

Also Published As

Publication number Publication date
AU3492699A (en) 1999-12-06

Similar Documents

Publication Publication Date Title
CN101438233B (en) Mode-based multiply-add processor for denormal operands
CN101263467B (en) Floating point normalization and denormalization
US5917741A (en) Method and apparatus for performing floating-point rounding operations for multiple precisions using incrementers
JP4953644B2 (en) System and method for a floating point unit providing feedback prior to normalization and rounding
EP0359809B1 (en) Apparatus and method for floating point normalization prediction
US5633819A (en) Inexact leading-one/leading-zero prediction integrated with a floating-point adder
US5384723A (en) Method and apparatus for floating point normalization
EP2409219B1 (en) Mechanism for fast detection of overshift in a floating point unit
US20050223055A1 (en) Method and apparatus to correct leading one prediction
US7437400B2 (en) Data processing apparatus and method for performing floating point addition
US7290023B2 (en) High performance implementation of exponent adjustment in a floating point design
US6571266B1 (en) Method for acquiring FMAC rounding parameters
Oberman et al. A variable latency pipelined floating-point adder
US8244783B2 (en) Normalizer shift prediction for log estimate instructions
WO1999060475A1 (en) A leading bit anticipator for floating point multiplication
US20030140074A1 (en) Leading Zero Anticipatory (LZA) algorithm and logic for high speed arithmetic units
US6615228B1 (en) Selection based rounding system and method for floating point operations
Oberman et al. Reducing the mean latency of floating-point addition
US5583805A (en) Floating-point processor having post-writeback spill stage
Seidel How to half the latency of IEEE compliant floating-point multiplication
US20230015430A1 (en) Floating-point accumulator
He et al. Multiply-add fused float point unit with on-fly denormalized number processing
EP0837390A1 (en) Improvements in or relating to microprocessor integrated circuits
US5805487A (en) Method and system for fast determination of sticky and guard bits
Pineiro et al. High-radix iterative algorithm for powering computation

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase