A Leading Bit Anticipator for Floating Point Multiplication
Field of Invention
The present invention relates to floating point multiplier and add units in microprocessors, and more particularly, to floating point multiplier and add units with leading bit anticipators.
Background
In many floating point multiplier units, the mantissas of the two floating point numbers being multiplied are multiplied together and the result is normalized, where normalization involves shifting until the final mantissa has a leading non-zero bit. To speed up floating point multiplication, it is useful to predict the amount of shifting necessary, so that a shift circuit can be configured while the product is still being computed. Add units can also benefit from predicting the position of the leading non-zero bit of the sum. Some execution units can perform both multiplication and addition.
It is desirable for leading non-zero bit prediction to be fast and simple to implement so that the shift circuit can be quickly configured.
Brief Description of the Drawings
Fig. 1 is a high-level diagram of a microprocessor with a floating point multiply unit.
Fig. 2 is a high-level diagram of a portion of a floating point multiply unit with a leading non-zero bit anticipator.
Fig. 3 is an embodiment of a leading non-zero bit anticipator.
Detailed Description of Embodiments
Fig. 1 is a high-level diagram of microprocessor 100 with floating point multiply functional unit 110. Registers 120 and 140 hold two floating point numbers a and b to be multiplied together, and registers 150 and 160 hold their mantissas A and B, respectively. The product p = ab may be computed by obtaining the product of the mantissas P = AB in register 130, setting up shift register 130 to shift P so that its leading bit is 1 (i.e., normalization), and computing the exponent of p from the exponents of a and b and the number of bit shifts applied to P.
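The mantissa-product, normalization, and exponent steps can be sketched in software as follows. This is a minimal illustration, not the circuit of Fig. 1: it assumes a hypothetical encoding not given in the text, in which each operand is a pair (M, e) with value M * 2**(e - (n - 1)) and an n-bit mantissa normalized so that 2**(n - 1) <= M < 2**n. The function name multiply_mantissas is illustrative only.

```python
def multiply_mantissas(A, ea, B, eb, n):
    """Multiply two normalized operands given as (mantissa, exponent) pairs.

    Hypothetical encoding (assumption): value = M * 2**(e - (n - 1)), with
    2**(n - 1) <= M < 2**n. Returns (P_norm, e_p): the 2n-bit mantissa product
    shifted so its leading 1 sits at bit 2n - 1, and the exponent of p.
    """
    width = 2 * n
    P = A * B                       # full-width mantissa product, P < 2**width
    lead = P.bit_length() - 1       # position of the leading non-zero bit of P
    shift = (width - 1) - lead      # left shift that normalizes P (0 or 1 here)
    P_norm = P << shift
    e_p = ea + eb + 1 - shift       # exponent compensates for the applied shift
    return P_norm, e_p
```

For example, with n = 4 the mantissa 0b1100 with exponent 0 represents 1.5, and multiply_mantissas(12, 0, 12, 0, 4) returns (144, 1), which encodes 144 * 2**(1 - 7) = 2.25.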
In Fig. 2, functional unit 200 is a high-level diagram of a portion of floating point multiply functional unit 110. In the particular embodiment of Fig. 1, the product P of the mantissas A and B is obtained by first obtaining carry terms C and sum terms S by carry-save adder (CSA) 210, where C and S are binary tuples related to P by P = C + S. This addition is performed by full adder functional unit 220. In the particular embodiment of Fig. 2, the carry and sum terms are 128 bits wide, so the product P obtained by adding them is also 128 bits wide. Other embodiments may use different word sizes.
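How a carry-save stage can leave the product as two words C and S with P = C + S is sketched below. The reduction order (rows compressed three at a time until two remain) is an illustrative assumption; the text does not specify the internal arrangement of CSA 210, and real designs typically use a Wallace or Dadda tree. The helper names csa and carry_save_multiply are hypothetical.

```python
def csa(x, y, z):
    """3:2 carry-save adder on integers treated as bit vectors.

    s is the bitwise XOR of the inputs and c is the bitwise majority shifted
    left by one, so that x + y + z == s + c with no carry propagation.
    """
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def carry_save_multiply(A, B, n):
    """Reduce the partial products of n-bit mantissas A and B to two words.

    Returns (C, S) such that C + S == A * B.
    """
    # One shifted copy of A for each set bit of B (zero rows otherwise).
    rows = [(A << i) if (B >> i) & 1 else 0 for i in range(n)]
    while len(rows) > 2:
        reduced = []
        # Compress the rows three at a time.
        for k in range(0, len(rows) - 2, 3):
            reduced.extend(csa(rows[k], rows[k + 1], rows[k + 2]))
        # One or two leftover rows pass through to the next level unchanged.
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
    while len(rows) < 2:
        rows.append(0)
    S, C = rows
    return C, S
```

Only the final addition C + S needs a carry-propagating adder, which is the role of full adder 220.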
To speed up the multiplication of floating point numbers, it is desirable to set up shift register functional unit 130 to properly shift the output of full adder 220 before P is finally computed. In this way, shift register 130 will be ready to shift P as soon as it is available from full adder 220. It is therefore desirable to anticipate, or predict, the position of the leading non-zero bit of P based only upon the carry and sum terms. This prediction function is performed by leading bit anticipator (LZA) 240. As described below, LZA 240 does not always predict the position of the leading non-zero bit exactly; however, it mispredicts by at most one position. The final result of shift register 130 after shifting, denoted by P', will therefore be, to within one bit shift, the desired mantissa of the product p. Depending upon P, a final one-bit shift of P' may be required, but this adds little delay. This final bit shift is not shown in Fig. 2.
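The flow of predict, shift, then correct by one bit can be sketched as below, using the OR-based prediction of LZA 240. This is a behavioral model under stated assumptions, not the hardware: it assumes 0 <= C, S and C + S < 2**width, and it models the shifter with an unbounded Python integer, so a one-position overshoot is kept and undone by the final right shift (a hardware shifter would need an extra bit position or an equivalent scheme). The name normalize_with_anticipation is hypothetical.

```python
def normalize_with_anticipation(C, S, width):
    """Normalize P = C + S using a shift amount predicted from C and S alone.

    Assumes 0 <= C, S and C + S < 2**width. Returns (P_prime, shift) where the
    leading 1 of P_prime is at bit width - 1 and P_prime == (C + S) << shift.
    """
    predictor = C | S
    if predictor == 0:
        return 0, 0                       # C + S == 0: nothing to normalize
    k = predictor.bit_length() - 1        # predicted leading-bit position
    shift = (width - 1) - k               # shifter configured before P exists
    P = C + S                             # full adder result, available later
    P_prime = P << shift
    if P_prime >> width:                  # prediction was one position too low
        P_prime >>= 1                     # final one-bit correction
        shift -= 1
    return P_prime, shift
```

Because the true leading bit is never below the predicted one, the correction only ever shifts back by a single position in one direction.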
Fig. 3 provides a high-level diagram of an embodiment of LZA 240. The Boolean binary operation OR is applied to each pair of bits of C and S. Functional unit 310 represents an array of OR gates, each OR gate applying an OR binary operation to a pair of bits from C and S. We denote this operation on C and S by C OR S, where C OR S is a binary tuple (word) having the same length as C and S and with i-th component given by $[C \,\mathrm{OR}\, S]_i = [C]_i \,\mathrm{OR}\, [S]_i$. The predicted leading non-zero bit position of the product P is the position of the leading non-zero bit of C OR S.
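The one-position bound on the misprediction follows directly from this definition. A short derivation, assuming C and S are non-negative integers and C OR S is non-zero:

```latex
Let $k$ be the position of the leading non-zero bit of $C \,\mathrm{OR}\, S$.
Bit $k$ is set in $C$ or in $S$, and every bit above $k$ is clear in both, so
\[
  2^k \le \max(C, S) \le C + S
  \quad\text{and}\quad
  C,\, S < 2^{k+1} \;\Rightarrow\; C + S < 2^{k+2}.
\]
Hence $2^k \le C + S < 2^{k+2}$: the leading non-zero bit of $P = C + S$
is at position $k$ or $k + 1$.
```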
Priority encoder 320 asserts one of its output lines 330 corresponding to the leading non-zero bit position of C OR S. Output lines 330 provide a binary tuple indicative of the leading non-zero bit position of C OR S, and are coupled to shift register 130 to provide this prediction information so that shift register 130 can be set up to shift P before it is computed by full adder 220.
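A behavioral model of OR gates 310 and priority encoder 320 is sketched below. It assumes the encoder produces a one-hot output (exactly one line asserted, at the leading non-zero bit, with index 0 as the least significant bit); the text allows any binary tuple indicative of that position, so the one-hot form and the names or_array and priority_encoder are illustrative assumptions.

```python
def or_array(C, S, width):
    """Model of OR gates 310: element i is [C]_i OR [S]_i (index 0 is the LSB)."""
    return [((C >> i) & 1) | ((S >> i) & 1) for i in range(width)]


def priority_encoder(bits):
    """Model of priority encoder 320 with a one-hot output.

    Asserts only the line for the most significant set bit; all lines stay
    low if no bit is set.
    """
    lines = [0] * len(bits)
    for i in reversed(range(len(bits))):
        if bits[i]:
            lines[i] = 1
            break
    return lines
```

For C = 0b0000101101 and S = 0b0001000101, or_array returns the bits of 0b0001101101 and the encoder asserts line 6.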
As an example, let S = (0001000101) and C = (0000101101). The sum of S and C is (0001110010). Applying the Boolean binary operation OR to each pair of bits from C and S yields C OR S = (0001101101). For this example, the leading non-zero bit of C OR S is in the sixth position (where the least significant bit is the zeroth position), and the leading non-zero bit of C + S is also in the sixth position, so the prediction is correct. As an example of a misprediction, let S = (0001100101) and C = (0000101101). For this example, C + S = (0010010010) and C OR S = (0001101101): the leading non-zero bit of C + S is in the seventh position, while that of C OR S is in the sixth, so the prediction is off by one position.
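The two worked examples can be checked mechanically, and an exhaustive search over small word sizes confirms that the prediction error is always zero or one position. The 8-bit width of the search and the helper names lead and check are arbitrary choices for illustration.

```python
def lead(x):
    """Position of the leading non-zero bit (LSB is position 0); -1 for x == 0."""
    return x.bit_length() - 1


def check(S, C):
    """Return (predicted position from C OR S, actual position in C + S)."""
    return lead(C | S), lead(C + S)


# Worked examples from the text (10-bit words).
print(check(0b0001000101, 0b0000101101))   # (6, 6): predicted correctly
print(check(0b0001100101, 0b0000101101))   # (6, 7): mispredicted by one

# Exhaustive check over all pairs of 8-bit words.
errors = {lead(c + s) - lead(c | s) for c in range(256) for s in range(256)}
print(errors)                               # {0, 1}
```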
In some situations, such as when the result is a denormalized number, the product P discussed earlier is not shifted until it has a leading non-zero bit. The embodiments disclosed herein are nevertheless useful for denormalized numbers, because the position of the leading non-zero bit is still needed to determine the amount of shifting required.
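One common way the predicted position remains useful for denormalized results is to cap the normalization shift at the available exponent range. The sketch below assumes a hypothetical convention, not taken from the text: each left shift decrements the exponent by one, and the exponent may not fall below a minimum e_min. It does not model results that would need a right shift, and the name normalization_shift is illustrative.

```python
def normalization_shift(predicted_shift, exponent, e_min):
    """Limit the normalization shift so the exponent does not drop below e_min.

    Assumption: 'exponent' is the exponent before shifting and each left shift
    lowers it by one. If the full shift would underflow, the result is left
    denormalized with exponent e_min.
    """
    headroom = max(exponent - e_min, 0)
    return min(predicted_shift, headroom)
```

With e_min = -126, a predicted shift of 5 is applied in full at exponent 10, but is capped at 2 when the exponent is -124.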
The embodiments disclosed herein have been described with reference to a floating point multiplication unit. However, it is to be appreciated that the invention claimed below is not limited to multiplication units. Embodiments of the present invention are also applicable to addition units and to other kinds of execution units that combine multiplication and addition. Consequently, the term "floating point unit" encompasses a floating point multiplication unit, a floating point addition unit, or a combination thereof.
Various modifications may be made to the embodiment described above without departing from the scope of the invention as claimed below. For example, it is immaterial whether priority encoder 320 is included within LZA 240 or not. As another example, OR gates 310 may be replaced with NOR gates, in which case priority encoder 320 is modified to provide signals on output lines 330 indicative of the position of the leading zero bit of the NOR output, which is the same as the position of the leading non-zero bit of C OR S.
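The NOR variant can be modeled the same way: every bit of the NOR output above the predicted position is 1, so the leading zero bit marks the same position as the leading non-zero bit of C OR S. The helper names nor_array and leading_zero_position are hypothetical.

```python
def nor_array(C, S, width):
    """Model of NOR gates: bit i is NOT([C]_i OR [S]_i), kept to width bits."""
    mask = (1 << width) - 1
    return ~(C | S) & mask


def leading_zero_position(word, width):
    """Position of the most significant 0 bit of a width-bit word (-1 if none)."""
    for i in reversed(range(width)):
        if not (word >> i) & 1:
            return i
    return -1
```

For the first worked example, the NOR output is 0b1110010010 and its leading zero is at position 6, matching the leading non-zero bit of C OR S.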