US6055554A - Floating point binary quad word format multiply instruction unit - Google Patents



Publication number: US6055554A
Authority: US (United States)
Legal status: Expired - Lifetime (as listed by Google Patents; not a legal conclusion)
Application number: US09/034,718
Inventor: Eric Mark Schwarz
Current assignee: International Business Machines Corp
Original assignee: International Business Machines Corp
Events:
Application filed by International Business Machines Corp
Priority to US09/034,718
Assigned to International Business Machines Corporation (assignment of assignors interest; assignor: Schwarz, Eric Mark)
Priority to KR1019990004766A (KR100317767B1)
Priority to JP04635999A (JP3163072B2)
Application granted
Publication of US6055554A
Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/487 Multiplying; Dividing
    • G06F7/4876 Multiplying
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G06F7/53 Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
    • G06F7/5324 Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel partitioned, i.e. using repetitively a smaller parallel parallel multiplier or using an array of such smaller multipliers

Abstract

An IEEE 754 standard floating point multiply instruction for binary extended precision format can be executed with a quad word format on an S/390 processor. The significands are partitioned, and the multiplication calculation multiplies each partition of one operand by each partition of the other. In the multiplication calculation dataflow, if either operand is a denormalized number, the operands are normalized at a stage which creates an expanded exponent range of one more bit, and the calculation continues to a parallel path multiplexor stage; if neither operand is denormalized, the exponent is expanded and the calculation likewise proceeds to the multiplexor stage. The calculation then splits into four parallel paths: the two operands' sign bits are processed in a sign calculation block stage, the operands' two 16 bit binary exponents are processed by an exponent conversion block stage, a partition multiplicand significand block stage receives the 113 bit multiplicand significand in a third path, and a partition multiplier significand block stage receives the 113 bit multiplier significand in a fourth path. The third and fourth paths converge in a calculation block stage which produces partial products, intermediate sums, and finally a final product. This output, the exponent from the second path, and the sign bit from the first path are merged to provide a product which is represented in hexadecimal internal format; the product is then converted back to binary format in a calculation block stage and rounded.

Description

FIELD OF THE INVENTION
This invention relates to computers and computer systems and their floating point units, and in particular to the execution of the floating point multiply operation in binary quad word format.
CROSS REFERENCE TO RELATED APPLICATION
The present application is related to the following co-pending patent application filed by the inventor concurrently herewith:
"Partitioning of binary quad word format multiply instruction on S/390 processor," filed Mar. 5, 1998, and accorded U.S. Ser. No. 09/033,626, now U.S. Pat. No. 6,021,422.
This co-pending application and the present application are owned by one and the same assignee, International Business Machines Corporation of Armonk, N.Y.
BACKGROUND OF THE INVENTION
The "IEEE Standard for Binary Floating-Point Arithmetic" (IEEE 754 Standard) defines a format called double extended, which specifies a representation with a one bit sign, 15 or more bits of exponent, and 64 or more bits of significand. There have been discussions among the standards committee about adopting a quad word format which is more restrictive but is a subset of the double extended format. Several manufacturers, including Hewlett-Packard (PA-RISC 1.1 Architecture) and IBM (S/390), have adopted a specific quad word format which contains one bit of sign information, 15 bits of exponent, and 113 bits of significand, where the most significant bit is implied. This format requires 128 bits, or one quad word, to represent an operand.
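The field layout just described can be illustrated with a short Python sketch. It assumes the usual IEEE bit order for a 128-bit quad word (sign in the most significant bit, then the 15 bit biased exponent, then the 112 stored fraction bits) and handles only normal operands. The helper name `split_quad` is hypothetical and is not part of any S/390 or PA-RISC interface.

```python
# Hypothetical helper (not from the patent): split a 128-bit quad word operand
# into sign, biased exponent, and full significand, assuming the IEEE layout
# sign | 15-bit exponent | 112 stored fraction bits.
SIGN_BITS, EXP_BITS, FRAC_BITS = 1, 15, 112
assert SIGN_BITS + EXP_BITS + FRAC_BITS == 128  # one quad word per operand


def split_quad(word: int) -> tuple[int, int, int]:
    """Return (sign, biased exponent Xb, 113-bit significand 1.F) of a normal operand."""
    if not 0 <= word < 1 << 128:
        raise ValueError("operand must fit in 128 bits")
    sign = word >> 127
    xb = (word >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = word & ((1 << FRAC_BITS) - 1)
    # The most significant significand bit is implied (1 for normal numbers),
    # which is why 112 stored bits give a 113 bit significand.
    return sign, xb, (1 << FRAC_BITS) | fraction


# 1.0 is encoded with biased exponent 16383 and an all-zero stored fraction.
one = 16383 << FRAC_BITS
print(split_quad(one) == (0, 16383, 1 << 112))  # True
```

Denormalized operands (exponent field of zero) would need the prenormalization step described later and are not modeled here.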
Most floating point units are not optimized for the quad word format and are instead optimized for the frequently used double word format. Quad word format operands must be partitioned into the format implemented in hardware, and multiple smaller operations are then performed to accomplish the overall quad word format operation. One floating point operation which is typically supported in this fashion is quad word format multiplication.
Note that the quad word format defined by Hewlett-Packard's PA-RISC 1.1 Architecture is called binary extended format by the S/390 Architecture; the two are identical except for the representation of signaling NaN and quiet NaN, which is not important for this discussion. S/390 Architecture also supports a unique hexadecimal based format which has short (word), long (double word), and extended (quad word) formats. In typical S/390 floating point units hexadecimal double word precision is optimized in hardware, which is similar to other manufacturers optimizing for the IEEE 754 standard double word format, since both formats require 64 bits to represent. Hexadecimal double format contains one bit of sign information, seven bits of exponent, and a 56 bit significand. S/390 floating point unit dataflows are typically optimized for 56 bit operands. The multiplier, if used for other purposes such as division, may have some additional guard bits. In a performance optimized multiplier implementation one of the operands, the multiplier, is not extended by any guard bits, since one additional bit has a dramatic effect on the overall delay of the counter tree. The other operand, the multiplicand, has a less substantial effect on the delay of the counter tree and can easily be extended by one or two hex digits. Typical S/390 multipliers have a multiplier which is 56 bits and a multiplicand which is 60 or 64 bits (see for instance "CMOS floating-point unit for the S/390 Parallel Enterprise Server G4," E. M. Schwarz, L. Sigal, T. J. McPherson, IBM Journal of Research and Development, Vol. 41, No. 4/5, July/September 1997, pp. 475-488).
A method of adapting the binary extended (quad word) format to an S/390 floating point unit, in a system where the 56 bit significand dataflow is slightly less than half as wide as the 113 bit significand of the binary extended format, has not been found since the adoption of the IEEE standard over a decade ago. In particular, the present invention is concerned with implementing the multiplication operation for binary quad word format on a typical S/390 multiplier.
SUMMARY OF THE INVENTION
Our invention provides a process for performing the binary quad word format multiplication operation on a hardware multiplier which supports 56 and 60 bit operands and a hexadecimal based exponent. This type of hardware multiplier is commonly available on S/390 floating point units. The process requires a minimal number of hardware operations to perform the overall quad word format multiplication and still retains intermediate results in a hexadecimal based format, which is an advantage for a hexadecimal based design.
The invention involves a unique partitioning of the 113 bit significand which requires only four multiplications and also handles some additional complications of executing binary operands on a hexadecimal dataflow. The multiplier operand's significand is partitioned into three groups of one, 56, and 56 bits, and the multiplicand operand's significand is partitioned into two groups of 56 and 60 bits.
S/390 floating point architecture defines a hexadecimal based format described by X=(-1)**S * 0.F * 16**(Xh-64), where X is the value of an operand, S is the one bit sign, F is the significand, and Xh is the characteristic, which, reduced by a bias of 64, forms the exponent. The binary extended format is described by X=(-1)**S * 1.F * 2**(Xb-16383), where Xb represents the binary characteristic. U.S. Pat. No. 5,687,106 [Schwarz et al] teaches a method for supporting both a hexadecimal floating point format and a binary floating point format using an internal floating point dataflow with hexadecimal floating point format. The suggested internal dataflow has a 14 bit exponent and a bias of 8192, which can be represented by X=(-1)**S * 0.F * 16**(Xi-8192), where Xi represents the internal characteristic. Transforming a binary exponent into hex format is equivalent to shifting the characteristic right two bits and applying a bias adjustment. The least significant two bits of the binary exponent, which are shifted out, represent the values 2**0, 2**1, 2**2, or 2**3. This part of the binary exponent is transformed into a shift of the significand of up to 3 bits in the internal hexadecimal based format. To represent a 53 bit long binary significand in this hex format, 3 additional bits are necessary for this shifting, which gives a total of 56 bits. For extended format, this would require a 116 bit significand.
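As a worked instance of these formulas (an illustrative value chosen here, not one taken from the patent), take 1.0, which in binary extended format has S = 0, F = 0, and Xb = 16383. Because the hexadecimal form needs a significand of the form 0.f, the value is shifted right four bit positions and the power of 16 is raised by one:

$$
1.0 \times 2^{16383-16383} = \left(1.0 \times 2^{-4}\right) \times 16^{1} = 0.0001_2 \times 16^{1} = 0.1_{16} \times 16^{8193-8192}
$$

So Xi = 8193, and the three zeros between the binary point and the leading 1 are the binary shift described above; together with a 113 bit significand they fill a 116 bit field.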
The partitioning solution applies the 3 bit binary shift to only one operand, the multiplicand, and the 3 bit shift used is the combination of the binary shifts for both input operands. The multiplier significand is separated into 3 parts, consisting of an explicit 1, a 56 bit high part, and a 56 bit low part, for a total of 113 bits. The multiplicand significand is expanded into a 116 bit significand using the combination of binary shifts for both input operands and is partitioned into a 56 bit part and a 60 bit part. So the multiplier has 3 parts, denoted 1, YH (56 bits), and YL (56 bits), and the multiplicand has 2 parts, denoted XH (56 bits) and XL (60 bits).
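A minimal Python sketch of this partitioning, assuming both significands are held as plain integers: the 113 bit multiplier significand with its leading 1 in bit 112, and the 116 bit multiplicand field already containing the combined binary shift. The function names are hypothetical; they only show which bits go to which part.

```python
# Part widths from the partitioning described above.
YH_BITS, YL_BITS = 56, 56   # multiplier: explicit 1 || YH || YL = 113 bits
XH_BITS, XL_BITS = 56, 60   # multiplicand: XH || XL = 116 bits


def split_multiplier(y_sig: int) -> tuple[int, int, int]:
    """Split a 113-bit multiplier significand 1.F into (1, YH, YL)."""
    if y_sig >> (YH_BITS + YL_BITS) != 1:
        raise ValueError("expected a normalized 113-bit significand")
    yh = (y_sig >> YL_BITS) & ((1 << YH_BITS) - 1)
    yl = y_sig & ((1 << YL_BITS) - 1)
    return 1, yh, yl


def split_multiplicand(x_field: int) -> tuple[int, int]:
    """Split a 116-bit multiplicand field into (XH, XL)."""
    if not 0 <= x_field < 1 << (XH_BITS + XL_BITS):
        raise ValueError("expected a 116-bit field")
    return x_field >> XL_BITS, x_field & ((1 << XL_BITS) - 1)


# Reassembling the parts gives back the original operands.
y = (1 << 112) | 0x123456789ABCDEF
one, yh, yl = split_multiplier(y)
assert (one << 112) | (yh << 56) | yl == y
# Multiplicand example: a 113-bit significand with no leading zeros ("0." row),
# followed by 3 zero bits to fill the 116-bit field.
xh, xl = split_multiplicand(y << 3)
assert (xh << 60) | xl == y << 3
```

Because every part is at most 60 bits wide, each of the four partial products fits a 56 by 60 bit multiplier.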
Only four multiplications are necessary to compute binary multiply extended with this partitioning: YL * XL, YL * XH, YH * XL, and YH * XH, and the multiplier need only be 56 by 60 bits. The multiplication by 1 does not need any calculation. Another advantage of this algorithm is that the least significant two bits of the binary exponent are maintained in the dataflow by the significand's binary normalization within the hex format, which is also how they are maintained for other binary floating point operations.
These and other improvements are set forth in the following detailed description. For a better understanding of the invention with advantages and features, refer to the description and to the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows schematically an overview of the preferred embodiment and particularly shows a process flow of the multiplication operation for binary extended format which partitions the multiplicand into 56 and 60 bit parts, and the multiplier into 1, 56, and 56 bit parts.
FIG. 2 shows schematically an overview of a first alternative and particularly shows a process flow of the multiplication operation for binary extended format which partitions the multiplicand into 56, 56, and 4 bit parts, and the multiplier into 56, 56, and 4 bit parts.
FIG. 3 shows schematically an overview of a second alternative and particularly shows a process flow of the multiplication operation for binary extended format which partitions the multiplicand into 56 and 60 bit parts, and the multiplier into 56, 56, and 4 bit parts.
FIG. 4 shows schematically an overview of a third alternative and particularly shows a process flow of the multiplication operation for binary extended format which partitions the multiplicand into 1, 56, and 56 bit parts, and the multiplier into 1, 56, and 56 bit parts.
(Note: For convenience of illustration, FIGURES may be separated in parts and as a convention we place the top of the FIGURE as the first sheet, with subsequent sheets proceeding down and across when viewing the FIGURE, in the event that multiple sheets are used.)
DETAILED DESCRIPTION OF THE INVENTION
Turning now to our invention in greater detail, FIG. 1 illustrates our preferred embodiment, showing the process flow of executing a floating point multiplication operation for binary extended precision format on floating point hardware which is optimized for hexadecimal based double word format. The process flow includes subprocesses for detecting whether there is a special case operand and whether the operands are denormalized, a sign computation step, an exponent calculation step, partitioning steps for the multiplier and multiplicand significands, and rounding and overflow detection steps.
The multiplier and multiplicand input operands contain sign, exponent, and significand portions, which are input to decision block stage 10, which determines whether either of the input operands is equal to one of the IEEE 754 special numbers zero, Not-a-Number (NaN), or infinity. If either operand is one of these special numbers, the calculation proceeds to special number handler 11 and no arithmetic calculation is necessary. If neither is one of the three special numbers, the process continues to decision block stage 12, which determines whether either operand is a denormalized number. If either operand is denormalized as defined by the IEEE 754 standard, the process is directed to a prenormalization block stage which normalizes the operands and creates an expanded exponent range of one more bit, and the process continues to multiplexor 15. If neither operand is denormalized as determined by decision block stage 12, then the exponent is expanded in process block stage 14 and the process converges with the prenormalization path in multiplexor 15.
The process then splits into four parallel paths. The two operands' sign bits are processed in sign calculation block stage 16, which is simply an exclusive-OR of the signs. The two 16 bit binary exponents are processed by binary exponent conversion to hex block stage 100. The hex exponents of the two operands are output to hex exponent calculation block stage 104, which adds the exponents and subtracts the bias. The 113 bit multiplicand significand is processed by block stage 101 using the binary shift amount information from binary exponent conversion block stage 100. The multiplicand is partitioned into two parts: a 56 bit high part designated XH and a 60 bit low part designated XL. 116 bits are used in the significand to represent the 113 bit multiplicand and the binary shift amount, which could be 2**0, 2**-1, 2**-2, or 2**-3. The 113 bit multiplier significand is input to partitioning block stage 102, which separates the operand into an explicit value of 1, a 56 bit high part designated YH, and a 56 bit low part designated YL.
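The sign and exponent paths can be sketched in a few lines of Python, assuming the internal characteristic bias of 8192 described earlier. The function names are hypothetical, and any exponent adjustment needed when the significand product is later renormalized, rounded, or checked for overflow is not modeled.

```python
from fractions import Fraction

HEX_BIAS = 8192  # bias of the 14-bit internal hex characteristic


def product_sign(sx: int, sy: int) -> int:
    """Sign calculation (block stage 16): exclusive-OR of the operand signs."""
    return sx ^ sy


def product_characteristic(xi: int, yi: int) -> int:
    """Hex exponent calculation (block stage 104): add, then subtract one bias.

    16**(xi-B) * 16**(yi-B) == 16**((xi+yi-B) - B), so one bias is removed.
    """
    return xi + yi - HEX_BIAS


# Check the exponent rule exactly with rational arithmetic.
xi, yi = 8193, 8190
zi = product_characteristic(xi, yi)
scale = Fraction(16) ** (xi - HEX_BIAS) * Fraction(16) ** (yi - HEX_BIAS)
print(zi, scale == Fraction(16) ** (zi - HEX_BIAS))  # 8191 True
```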
In U.S. Pat. No. 5,687,106 [Schwarz et al] the binary exponent conversion block stage consisted of taking one binary exponent and converting it to a hex exponent and an appropriate shift amount by the following formulas:
______________________________________
1.xxx * 2**(Xb - 16383) => 0.f * 16**(Xi - 8192)
Xb least significant two bits   Xi                      Hex internal significand
______________________________________
11                              SIGNEXT(Xb >> 2) + 2    0.000  113 bit significand
00                              SIGNEXT(Xb >> 2) + 1    0.00   113 bit significand
01                              SIGNEXT(Xb >> 2) + 1    0.0    113 bit significand
10                              SIGNEXT(Xb >> 2) + 1    0.     113 bit significand
______________________________________
where Xb represents the binary characteristic and Xi the transformed hex internal characteristic, and
where Xb>>2 is Xb shifted with truncation two bits to the right, || represents concatenation, and
SIGNEXT involves setting the most significant bit of Xi with the most significant bit of Xb and filling in any lesser significant bits with this bit complemented.
For extended format, Xb is 15 bits and the shifting truncates the least significant two bits which leaves 13 bits. Xi is 14 bits and the SIGNEXT of the 13 bit intermediate result can be expressed by the following equation:
SIGNEXT(Xb(0:14) >> 2) = Xb(0) || Xb(0)' || Xb(1:12)
where ' represents one's complementation.
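The table and the SIGNEXT equation can be checked with a short Python sketch. It assumes integer encodings (15 bit Xb, a 113 bit significand with its leading 1 in bit 112, and a 116 bit internal significand field) and uses exact rational arithmetic to confirm that each row preserves the operand's value. The function names are hypothetical, not the patent's.

```python
from fractions import Fraction

BIN_BIAS, HEX_BIAS = 16383, 8192


def signext(xb: int) -> int:
    """SIGNEXT(Xb >> 2) for a 15-bit Xb: Xb(0) || Xb(0)' || Xb(1:12), 14 bits."""
    a = xb >> 2                       # 13 bits, Xb(0:12)
    msb = (a >> 12) & 1               # Xb(0)
    return (msb << 13) | ((msb ^ 1) << 12) | (a & 0xFFF)


def to_hex_internal(xb: int, sig113: int) -> tuple[int, int]:
    """Map 1.F * 2**(Xb-16383) to (Xi, f) with value f/2**116 * 16**(Xi-8192)."""
    low = xb & 0b11
    xi = signext(xb) + (2 if low == 0b11 else 1)
    zeros = (2 - low) % 4             # 11 -> 0.000, 00 -> 0.00, 01 -> 0.0, 10 -> 0.
    return xi, sig113 << (3 - zeros)  # 113-bit significand right after the zeros


def binary_value(xb: int, sig113: int) -> Fraction:
    return Fraction(sig113, 1 << 112) * Fraction(2) ** (xb - BIN_BIAS)


def hex_value(xi: int, field: int) -> Fraction:
    return Fraction(field, 1 << 116) * Fraction(16) ** (xi - HEX_BIAS)


# Every row of the table preserves the operand's value exactly.
sig = (1 << 112) | 0xDEADBEEF
for xb in (1, 2, 3, 4, 16383, 16384, 32765, 32766):
    assert binary_value(xb, sig) == hex_value(*to_hex_internal(xb, sig))
print(to_hex_internal(16383, 1 << 112) == (8193, 1 << 112))  # True: 1.0 -> 0.0001b * 16**1
```

Note that SIGNEXT of the 13 bit value a simply equals a + 4096, which is the bias change from the truncated binary characteristic to the 8192-biased internal one.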
The binary exponent conversion to hex block stage 100 combines both binary exponent conversions, using a fixed multiplier significand of the form 1.,YH,YL and a multiplicand which takes on both binary shift amounts in its significand representation. The following formulas show that the fixed format multiplier significand requires a correction term, which is then applied in the conversion of the multiplicand. Note that in these formulas and tables Xb and Xi refer to the multiplier and Yb and Yi to the multiplicand, the reverse of the X/Y naming used for the significand parts.
______________________________________
1.xxx * 2**(Xb - 16383) => 1.f * 16**(Xi - 8192) * Correction
1.xxx * 2**(Yb - 16383) * Correction => 0.f * 16**(Yi - 8192)
______________________________________
Multiplier Conversion:
Xb least significant two bits   Xi                      Correction term
______________________________________
00                              SIGNEXT(Xb >> 2)        2**1
01                              SIGNEXT(Xb >> 2)        2**2
10                              SIGNEXT(Xb >> 2)        2**3
11                              SIGNEXT(Xb >> 2) + 1    2**0
______________________________________
Multiplicand Conversion without Correction:
Yb least significant two bits   Yi                      Hex internal significand
______________________________________
00                              SIGNEXT(Yb >> 2) + 1    0.00   113 bit significand
01                              SIGNEXT(Yb >> 2) + 1    0.0    113 bit significand
10                              SIGNEXT(Yb >> 2) + 1    0.     113 bit significand
11                              SIGNEXT(Yb >> 2) + 2    0.000  113 bit significand
______________________________________
Multiplicand Conversion with Correction:
Yb least significant two bits + correction   Yi                      Hex internal significand
______________________________________
000                                          SIGNEXT(Yb >> 2) + 1    0.00   113 bit significand
001                                          SIGNEXT(Yb >> 2) + 1    0.0    113 bit significand
010                                          SIGNEXT(Yb >> 2) + 1    0.     113 bit significand
011                                          SIGNEXT(Yb >> 2) + 2    0.000  113 bit significand
100                                          SIGNEXT(Yb >> 2) + 2    0.00   113 bit significand
101                                          SIGNEXT(Yb >> 2) + 2    0.0    113 bit significand
110                                          SIGNEXT(Yb >> 2) + 2    0.     113 bit significand
______________________________________
The binary exponent conversion to hex block stage 100 performs the fixed multiplier exponent conversion and passes the correction term to the multiplicand exponent conversion which also produces a binary shift amount signal which is transmitted to the multiplicand significand partitioning block stage 101.
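The combined conversion can be sketched in Python to show that folding the multiplier's correction term into the multiplicand leaves the product unchanged. Following the tables, xb is the multiplier's characteristic and yb the multiplicand's; the correction term 2**c is represented by its exponent c. The function names and integer encodings are assumptions of this sketch.

```python
from fractions import Fraction

BIN_BIAS, HEX_BIAS = 16383, 8192


def signext(b: int) -> int:
    """SIGNEXT(b >> 2) for a 15-bit characteristic b (14-bit result)."""
    a = b >> 2
    msb = (a >> 12) & 1
    return (msb << 13) | ((msb ^ 1) << 12) | (a & 0xFFF)


def multiplier_conversion(xb: int) -> tuple[int, int]:
    """Multiplier table: return (Xi, c) with 2**(Xb-16383) == 16**(Xi-8192) * 2**c."""
    low = xb & 0b11
    if low == 0b11:
        return signext(xb) + 1, 0        # correction term 2**0
    return signext(xb), low + 1          # 00 -> 2**1, 01 -> 2**2, 10 -> 2**3


def multiplicand_conversion(yb: int, c: int, sig113: int) -> tuple[int, int]:
    """Table with correction: 1.F * 2**(Yb-16383) * 2**c -> (Yi, 116-bit field)."""
    t = (yb & 0b11) + c                  # "two bits + correction", 000 to 110
    yi = signext(yb) + (1 if t <= 2 else 2)
    zeros = (2 - t) % 4                  # 000 -> 0.00, 010 -> 0., 011 -> 0.000, ...
    return yi, sig113 << (3 - zeros)


def binary_value(b: int, sig113: int) -> Fraction:
    return Fraction(sig113, 1 << 112) * Fraction(2) ** (b - BIN_BIAS)


# The multiplier keeps its 1.F significand; the multiplicand absorbs both shifts.
sx, sy = (1 << 112) | 0xABC, (1 << 112) | 0x123
for xb in (16380, 16381, 16382, 16383):
    for yb in (16380, 16381, 16382, 16383):
        xi, c = multiplier_conversion(xb)
        yi, yf = multiplicand_conversion(yb, c, sy)
        fixed_multiplier = Fraction(sx, 1 << 112) * Fraction(16) ** (xi - HEX_BIAS)
        hex_multiplicand = Fraction(yf, 1 << 116) * Fraction(16) ** (yi - HEX_BIAS)
        assert binary_value(xb, sx) * binary_value(yb, sy) == fixed_multiplier * hex_multiplicand
```

The loop covers all sixteen combinations of the two operands' low exponent bits. The largest combined shift (t = 110) still fits because the multiplicand field has 3 spare bits, which is why only the multiplicand needs 116 bits.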
The outputs XH and XL of partitioning block stage 101 and the outputs 1, YH, and YL of partitioning block stage 102 are input to a calculation block stage 103. The calculation process involves performing 4 multiplications and 4 additions. These calculations can be expressed by the following equations which produce partial products P1, P2, P3, and P4, intermediate sums S1, S2, and S3, and the final product denoted by PR.
1) P1=XL * YL
2) P2=XL * YH
3) S1=P2+P1
4) P3=XH * YL
5) S2=S1+P3
6) P4=XH * YH
7) S3=P4+S2
8) PR=S3+X
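The eight steps above can be traced in Python on integer significands. The equations leave the bit weights implicit; the shifts used here are this sketch's assumption, derived from X = XH*2**60 + XL and Y = 2**112 + YH*2**56 + YL, and the final term adds the whole multiplicand for the explicit 1 of the multiplier. The function name is hypothetical.

```python
import random

XH_BITS, XL_BITS, YH_BITS, YL_BITS = 56, 60, 56, 56


def quad_significand_product(x_field: int, y_sig: int) -> int:
    """Integer product of a 116-bit multiplicand X and a 113-bit multiplier 1.YH YL.

    Each raw multiply below is at most 60 x 56 bits; the shifts align the
    partial products (an assumption of this sketch; the equations omit weights).
    """
    if not 0 <= x_field < 1 << (XH_BITS + XL_BITS):
        raise ValueError("multiplicand must be a 116-bit field")
    if y_sig >> (YH_BITS + YL_BITS) != 1:
        raise ValueError("multiplier must be a normalized 113-bit significand")
    xh, xl = x_field >> XL_BITS, x_field & ((1 << XL_BITS) - 1)
    yh = (y_sig >> YL_BITS) & ((1 << YH_BITS) - 1)
    yl = y_sig & ((1 << YL_BITS) - 1)

    p1 = xl * yl                                   # 1) P1 = XL * YL
    p2 = (xl * yh) << YL_BITS                      # 2) P2 = XL * YH
    s1 = p2 + p1                                   # 3) S1 = P2 + P1
    p3 = (xh * yl) << XL_BITS                      # 4) P3 = XH * YL
    s2 = s1 + p3                                   # 5) S2 = S1 + P3
    p4 = (xh * yh) << (XL_BITS + YL_BITS)          # 6) P4 = XH * YH
    s3 = p4 + s2                                   # 7) S3 = P4 + S2
    return s3 + (x_field << (YH_BITS + YL_BITS))   # 8) PR = S3 + X (explicit 1 times X)


rng = random.Random(390)
for _ in range(1000):
    x = rng.getrandbits(116)
    y = (1 << 112) | rng.getrandbits(112)
    assert quad_significand_product(x, y) == x * y
```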
The product is output from calculation block stage 103 and merged with the hex exponent from calculation block stage 104 and the sign bit from calculation block stage 16 in merge block stage 105. The product, which is represented in hexadecimal internal format, is converted back to binary format in calculation block stage 106 and rounded in calculation block stage 17. The result is then checked in decision block stage 18 for an overflow or underflow condition; if such a condition exists, the result is sent to fix-up block stage 19, and otherwise it is the reported result and the process is complete.
The block stages numbered 10 through 19 are not unique to the preferred embodiment and only block stages 100 to 106 are unique. The proposed process provides a unique partitioning of the significand which reduces the calculation time to the latency of 4 multiplications and 4 additions. The intermediate results are kept in a hexadecimal format which is a benefit to a hexadecimal based floating point unit for ease of debugging hardware problems. Also, there is the benefit that a hexadecimal internal format can be transmitted earlier to the top of the pipeline to resolve a data dependency with the next sequential instruction.
Three alternatives to the preferred embodiment are described in FIGS. 2, 3, and 4. In FIG. 2 a first alternative is shown where block stages 10-19 are the same as in FIG. 1, as are block stages 104 and 105. The process block stages for the exponent and significands differ. In FIG. 2 the first alternative method receives the two 16 bit exponents in binary exponent conversion to hex block stage 200. Conversion block stage 200 computes binary shift amounts for the multiplicand and multiplier independently according to the implementation in U.S. Pat. No. 5,687,106 [Schwarz et al], and passes these shift amounts to partitioning block stages 201 and 202. Since the binary shift amounts are to be represented in both the multiplier and the multiplicand, both require 116 bits, and a partitioning into 56, 56, and 4 bit parts is chosen for both and performed by block stages 201 and 202 respectively. The 3 parts of the multiplicand are designated XH, XL, and XG, and the multiplier parts are designated YH, YL, and YG. This results in the significand calculation block stage requiring 9 multiplications of the 3 parts of X by the 3 parts of Y, and 8 additions to sum the 9 partial products. This gives much lower performance than the preferred embodiment, but it is a straightforward method of partitioning the multiplier and multiplicand given that they are to be represented internal to the pipeline in a hexadecimal format which requires 116 bits per operand.
In FIG. 3 a second alternative is shown where block stages 10-19 are the same as in FIG. 1, as are block stages 104 and 105, and binary exponent conversion to hex block stage 200 of FIG. 3 is the same as in FIG. 2 for the first alternative method. The process block stages for the significands differ, and the second alternative method is an optimization of the first alternative method given that the multiplier hardware supports one significand of 60 bits. In FIG. 3, the multiplicand is partitioned into 56 and 60 bits, designated XH and XL, by partitioning block stage 301. The multiplier is partitioned into 3 parts of 56, 56, and 4 bits, designated YH, YL, and YG respectively, by partitioning block stage 302. Significand calculation block stage 303 of this second alternative method requires 6 multiplications, since there are 2 multiplicand parts and 3 multiplier parts, and requires 5 additions to sum the partial products. This is an improvement over the first alternative but is still much slower than the preferred embodiment described in FIG. 1.
In FIG. 4 a third alternative is shown where block stages 10-19 are the same as in FIG. 1. This third alternative assumes the exponent calculations are executed directly on the binary exponents and the significands are treated as 113 bit quantities without a corresponding hexadecimal exponent. Binary exponent calculation block stage 400 sums the binary exponents. The 113 bit multiplicand significand is partitioned into 1, 56, and 56 bit parts by partitioning block stage 401, referred to as 1, XH, and XL respectively. The 113 bit multiplier significand is likewise partitioned into 1, 56, and 56 bit parts by partitioning block stage 402, referred to as 1, YH, and YL respectively. The significand calculation block stage can compute the product with only 4 multiplications, the same as the preferred embodiment in FIG. 1, but more additions are required due to the extra implied 1. Also, this third alternative does not have the advantage of resolving data dependencies early, and it keeps intermediate results in a non-native format, whereas the native floating point unit format is hexadecimal.
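The multiplication counts of the four partitionings can be compared with a small Python sketch. It splits each operand into the part widths given for FIGS. 1 to 4, sums every cross product at its bit weight, and checks the result against a direct multiply. As an assumption of this sketch, a 1 bit part (the implied leading 1) is treated as selecting a shifted copy of the other operand rather than as a multiplication. Addition counts are not modeled, since they depend on how those shifted copies are grouped (FIG. 1, for instance, adds the whole multiplicand X once).

```python
import random
from itertools import product

# Part widths (most significant first) for each figure: (multiplicand, multiplier).
SCHEMES = {
    "FIG. 1": ([56, 60], [1, 56, 56]),
    "FIG. 2": ([56, 56, 4], [56, 56, 4]),
    "FIG. 3": ([56, 60], [56, 56, 4]),
    "FIG. 4": ([1, 56, 56], [1, 56, 56]),
}


def split(value: int, widths: list[int]) -> list[tuple[int, int, int]]:
    """Return (part, width, weight shift) triples, most significant part first."""
    parts, shift = [], sum(widths)
    for w in widths:
        shift -= w
        parts.append(((value >> shift) & ((1 << w) - 1), w, shift))
    return parts


def partitioned_product(x: int, x_widths: list[int], y: int, y_widths: list[int]) -> tuple[int, int]:
    """Sum every cross product of parts; count the ones that need the multiplier."""
    total, multiplies = 0, 0
    for (xp, xw, xs), (yp, yw, ys) in product(split(x, x_widths), split(y, y_widths)):
        total += (xp * yp) << (xs + ys)
        if xw > 1 and yw > 1:          # 1-bit parts are shifted copies, not multiplies
            multiplies += 1
    return total, multiplies


rng = random.Random(0)
results = {}
for name, (xw, yw) in SCHEMES.items():
    x = rng.getrandbits(sum(xw)) | (1 << (sum(xw) - 1))  # top bit set, like a leading 1
    y = rng.getrandbits(sum(yw)) | (1 << (sum(yw) - 1))
    total, mults = partitioned_product(x, xw, y, yw)
    assert total == x * y
    results[name] = mults
print(results)  # {'FIG. 1': 4, 'FIG. 2': 9, 'FIG. 3': 6, 'FIG. 4': 4}
```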
While we have described our preferred embodiments of our invention, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first disclosed.

Claims (12)

What is claimed is:
1. A floating point unit (FPU) for executing an IEEE 754 standard floating point multiply instruction for binary extended precision format with a quad word format on a computer processor having floating point hardware which is optimized for hexadecimal based double word format and which defines a binary quad word format to have a 113 bit significand and wherein quad word format multiplication must be executed as a series of several long precision multiplications and extended precision or long precision additions by the computer processor with a multiplier and multiplicand input, comprising,
an FPU dataflow pipeline for performing the multiplications by said computer processor, having a floating point multiplier in the pipeline, in which the multiplier and multiplicand input operands contain sign, exponent, and significand portions which are input to a decision block stage which determines whether either of the input operands is equal to one of the IEEE 754 special numbers zero, Not-a-Number (NaN), or infinity; if either operand is one of said special numbers, the multiplication calculation process dataflow proceeds to a special number handler and no arithmetic calculation is necessary, but if neither operand is one of the three special numbers, the multiplication calculation process dataflow continues to a decision block stage which determines whether either operand is a denormalized number; if either operand is denormalized as defined for said IEEE 754 standard floating point multiply instruction, the process is directed to a prenormalization block stage path which normalizes the operands and creates an exponent range expanded by one bit, and the multiplication calculation process dataflow continues to a parallel path multiplexor stage, but if neither operand is denormalized as determined by said decision block stage, the operand exponents are expanded in an expansion process block stage and passed to said parallel path multiplexor stage, which receives the converging dataflow from said prenormalization block stage path as well as from said expansion process block stage, whereupon the multiplication calculation process dataflow splits into four parallel paths, a first of said parallel paths having a sign calculation block stage, a second of said parallel paths having an exponent conversion block stage, a third of said parallel paths having a partition multiplicand significand block stage, and a fourth of said parallel paths having a partition multiplier significand block stage, wherein the two operands' sign bits are processed in said sign calculation block stage, the operands' two 16 bit binary exponents are processed by said exponent conversion block stage, said partition multiplicand significand block stage receives a 113 bit multiplicand significand input from said parallel path multiplexor stage, and a 113 bit multiplier significand input is provided to said fourth of said parallel paths having a partition multiplier significand block stage, and wherein
said third and fourth paths converge in a calculation block stage which performs multiplications and additions which provide partial products, intermediate sums, and finally a final product as a calculation block stage output,
said calculation block stage output, said exponent from said second path, and the sign bit from said first path merging in a merge block stage to provide a product which is represented in hexadecimal internal format and is converted back to binary format in the calculation block stage and rounded.
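The decision stages of claim 1 can be sketched in Python for the IEEE 754 binary128 encoding (1 sign bit, a 15 bit biased exponent with bias 16383, and 112 stored fraction bits plus an implicit leading 1, giving the 113 bit significand). This is a minimal illustration under those assumptions, not the patented hardware: the names `classify`, `route`, `prepare`, and `product_sign` are hypothetical, and the route strings merely label the three paths the claim describes.

```python
from enum import Enum

FRAC_BITS = 112               # stored fraction bits; with the implicit 1 the significand is 113 bits
EXP_MASK = (1 << 15) - 1      # 15-bit biased exponent field; all ones means infinity or NaN
BIAS = 16383                  # binary128 exponent bias


class Kind(Enum):
    ZERO = 1
    INF = 2
    NAN = 3
    DENORMAL = 4
    NORMAL = 5


def unpack(bits: int) -> tuple[int, int, int]:
    """Split a 128-bit binary128 pattern into (sign, biased exponent, fraction)."""
    return (bits >> 127) & 1, (bits >> FRAC_BITS) & EXP_MASK, bits & ((1 << FRAC_BITS) - 1)


def classify(bits: int) -> Kind:
    _, exp, frac = unpack(bits)
    if exp == EXP_MASK:
        return Kind.NAN if frac else Kind.INF
    if exp == 0:
        return Kind.DENORMAL if frac else Kind.ZERO
    return Kind.NORMAL


def route(a: int, b: int) -> str:
    """Label the path a pair of operands takes through the decision stages."""
    kinds = {classify(a), classify(b)}
    if kinds & {Kind.ZERO, Kind.INF, Kind.NAN}:
        return "special"        # special number handler; no arithmetic needed
    if Kind.DENORMAL in kinds:
        return "prenormalize"   # normalize first, with a widened exponent range
    return "expand"             # normal operands: just widen the exponent


def prepare(bits: int) -> tuple[int, int, int]:
    """Return (sign, unbiased exponent, 113-bit significand) for a finite nonzero operand.

    A denormal is shifted left until bit 112 is set and its exponent is lowered
    to compensate; the result can drop below the smallest normal exponent
    (-16382), which is why the exponent range must be widened.
    """
    sign, exp, frac = unpack(bits)
    if exp == 0:
        shift = FRAC_BITS + 1 - frac.bit_length()
        return sign, 1 - BIAS - shift, frac << shift
    return sign, exp - BIAS, (1 << FRAC_BITS) | frac


def product_sign(a: int, b: int) -> int:
    """Sign calculation path: the product is negative when exactly one sign bit is set."""
    return unpack(a)[0] ^ unpack(b)[0]
```

For example, `route(1, BIAS << 112)` pairs the smallest positive denormal with 1.0 and returns `"prenormalize"`, and `prepare(1)` yields an unbiased exponent of -16494, well outside the normal 15 bit range.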
2. A floating point unit (FPU) according to claim 1 wherein said second of said parallel paths having an exponent conversion block stage is a binary exponent to hex block stage, said third of said parallel paths having a partition multiplicand significand block stage partitions the multiplicand significand into 56/60 bits, and said fourth of said parallel paths having a partition multiplier significand block stage partitions the multiplier significand into 1/56/56 bits, wherein the two operands' sign bits are processed in said sign calculation block stage, the operands' two 16 bit binary exponents are processed by a binary exponent to hex conversion block stage from which the hex exponents of the two operands are output to a hex exponent calculation block stage of said second parallel path which adds the exponents and subtracts any bias, and from which binary exponent to hex conversion block stage any shift is passed to said third of said parallel paths having a partition multiplicand significand block stage, which also receives a 113 bit multiplicand significand input from said multiplexor, and a 113 bit multiplier significand input is provided to said fourth of said parallel paths having a partition multiplier significand block stage, and wherein
said third and fourth paths converge in a calculation block stage which performs multiplications and additions which provide partial products, intermediate sums, and finally a final product which is the calculation block stage output,
said calculation block stage output, said hex exponent from said second path, and the sign bit from said first path merging in a merge block stage to provide a product which is represented in hexadecimal internal format and is converted back to binary format in the calculation block stage and rounded.
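One way to picture the binary exponent to hex conversion of claim 2, assuming the hex significand is held as a normalized 116 bit fraction (29 hex digits): the unbiased binary exponent e is split so that e mod 4 becomes a left shift of 0 to 3 bits on the 113 bit significand, and the remainder becomes a power of 16. That shift is why a 113 bit significand needs 116 bits, which matches the 56/60 multiplicand split of claim 3. The function names and the unbiased exponent convention are assumptions for illustration; the claim itself only says the exponents are added and any bias subtracted.

```python
from fractions import Fraction

HEX_BITS = 116  # 29 hex digits: the 113-bit significand plus up to 3 alignment bits


def binary_exp_to_hex(e: int, sig: int) -> tuple[int, int]:
    """Rewrite sig * 2**(e - 112) as (h / 2**116) * 16**q and return (q, h).

    sig is a 113-bit significand with bit 112 set; e is the unbiased binary
    exponent. k = e mod 4 (0..3) is the alignment shift handed to the
    multiplicand partitioning path, and h is a normalized hex fraction.
    """
    assert sig >> 112 == 1
    k = e % 4                 # Python's % is non-negative for a positive modulus
    q = (e - k) // 4 + 1
    return q, sig << k


def hex_value(q: int, h: int) -> Fraction:
    """Exact value of a hex-format number with 116-bit fraction h and exponent q."""
    return Fraction(h, 1 << HEX_BITS) * Fraction(16) ** q


def hex_multiply(qa: int, ha: int, qb: int, hb: int) -> tuple[int, int]:
    """Hex exponent calculation: unbiased exponents add and the fractions multiply
    into a 232-bit product. With biased exponents, one bias would be subtracted
    after the add, since each operand carries it."""
    return qa + qb, ha * hb
```

For example, 1.0 (e = 0) converts to q = 1 with no shift, that is 0.1 (hex) x 16^1, while 0.5 (e = -1) takes the maximum shift of 3 bits.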
3. A floating point unit (FPU) according to claim 2 wherein the multiplicand is partitioned into two parts: a 56 bit high part and a 60 bit low part.
4. A floating point unit (FPU) according to claim 3 wherein said fourth, multiplier partitioning, block stage separates the 113 bit multiplier significand into an explicit value of 1, a 56 bit high part, and a 56 bit low part.
5. A floating point unit (FPU) according to claim 2 wherein said third of said parallel paths having a partition multiplicand significand block stage partitions the multiplicand significand into 56/60 bits, and said fourth of said parallel paths having a partition multiplier significand block stage partitions the multiplier significand into 1/56/56 bits.
6. A floating point unit (FPU) according to claim 5 wherein said calculation block stage performs 4 multiplications and 4 additions which provide partial products and intermediate sums and a final product.
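The 56/60 by 1/56/56 scheme of claims 3 through 6 can be checked with exact integer arithmetic. This sketch assumes a 116 bit hex-aligned multiplicand and a 113 bit multiplier, and uses a counting convention of its own (not stated in the claims): only products of two chunks count as multiplications, and each term folded into the running sum counts as one addition. The multiplier's explicit leading 1 simply contributes the shifted multiplicand. Under that convention the totals come out to the claimed 4 multiplications and 4 additions. The function name is hypothetical.

```python
M56 = (1 << 56) - 1
M60 = (1 << 60) - 1


def multiply_56_60_by_1_56_56(a: int, b: int) -> tuple[int, int, int]:
    """Return (a * b, multiplications, additions) for the 56/60 x 1/56/56 split.

    a: 116-bit hex-aligned multiplicand, split into a 56-bit high and a 60-bit low part.
    b: 113-bit multiplier, split into its explicit leading 1 and 56-bit high/low parts.
    """
    assert 0 <= a < 1 << 116 and b >> 112 == 1
    a_hi, a_lo = a >> 60, a & M60
    b_hi, b_lo = (b >> 56) & M56, b & M56
    # Chunk products with their bit offsets; these are the only true multiplies.
    partials = [(a_hi * b_hi, 116), (a_hi * b_lo, 60), (a_lo * b_hi, 56), (a_lo * b_lo, 0)]
    # The multiplier's explicit 1 (weight 2**112) contributes a itself, shifted.
    terms = [a << 112] + [p << s for p, s in partials]
    product = terms[0]
    for t in terms[1:]:
        product += t          # one addition per additional term
    return product, len(partials), len(terms) - 1
```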
7. A floating point unit (FPU) according to claim 2 wherein said third of said parallel paths having a partition multiplicand significand block stage partitions the multiplicand significand into 56/56/4 bits, and said fourth of said parallel paths having a partition multiplier significand block stage partitions the multiplier significand into 56/56/4 bits.
8. A floating point unit (FPU) according to claim 7 wherein said calculation block stage performs 9 multiplications and 8 additions which provide partial products and intermediate sums and a final product.
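For the symmetric 56/56/4 split of claims 7 and 8, every chunk of one operand meets every chunk of the other, so 3 x 3 = 9 chunk products are summed with 8 additions. A minimal sketch, assuming both operands are 116 bit hex-aligned significands and counting one multiplication per chunk product and one addition per term added to the running sum; the function names are hypothetical.

```python
def split_56_56_4(x: int) -> list[tuple[int, int]]:
    """(chunk, bit offset) pairs for a 116-bit value: 56 high, 56 middle, 4 low bits."""
    assert 0 <= x < 1 << 116
    return [(x >> 60, 60), ((x >> 4) & ((1 << 56) - 1), 4), (x & 0xF, 0)]


def multiply_56_56_4(a: int, b: int) -> tuple[int, int, int]:
    """Return (a * b, multiplications, additions) using all 3 x 3 chunk products."""
    terms = [
        (ca * cb) << (oa + ob)          # each chunk product, placed at its combined offset
        for ca, oa in split_56_56_4(a)
        for cb, ob in split_56_56_4(b)
    ]
    product = terms[0]
    for t in terms[1:]:
        product += t
    return product, len(terms), len(terms) - 1
```

The 4 bit chunks keep most products at 56 x 56, at the cost of more terms to sum than the 56/60 split.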
9. A floating point unit (FPU) according to claim 2 wherein said third of said parallel paths having a partition multiplicand significand block stage partitions the multiplicand significand into 56/60 bits, and said fourth of said parallel paths having a partition multiplier significand block stage partitions the multiplier significand into 56/56/4 bits.
10. A floating point unit (FPU) according to claim 9 wherein said calculation block stage performs 6 multiplications and 5 additions which provide partial products and intermediate sums and a final product.
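The mixed split of claims 9 and 10 (56/60 multiplicand, 56/56/4 multiplier) yields 2 x 3 = 6 chunk products and 5 additions. The generic helper below is a hypothetical sketch that accepts any list of chunk widths, assuming 116 bit operands and counting one multiplication per chunk product and one addition per term added to the sum.

```python
def split(x: int, widths: list[int]) -> list[tuple[int, int]]:
    """Split x, most significant chunk first, into (chunk, bit offset) pairs."""
    offset = sum(widths)
    assert 0 <= x < 1 << offset
    parts = []
    for w in widths:
        offset -= w
        parts.append(((x >> offset) & ((1 << w) - 1), offset))
    return parts


def multiply_partitioned(a: int, b: int, widths_a: list[int], widths_b: list[int]) -> tuple[int, int, int]:
    """Return (a * b, multiplications, additions) for the given chunk widths."""
    terms = [
        (ca * cb) << (oa + ob)
        for ca, oa in split(a, widths_a)
        for cb, ob in split(b, widths_b)
    ]
    product = terms[0]
    for t in terms[1:]:
        product += t
    return product, len(terms), len(terms) - 1
```

Calling `multiply_partitioned(a, b, [56, 60], [56, 56, 4])` gives the 6 and 5 counts of claim 10; passing `[56, 56, 4]` for both operands gives 9 and 8.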
11. A floating point unit (FPU) according to claim 1 wherein said third of said parallel paths having a partition multiplicand significand block stage partitions the multiplicand significand into 1/56/56 bits, and said fourth of said parallel paths having a partition multiplier significand block stage partitions the multiplier significand into 1/56/56 bits.
12. A floating point unit (FPU) according to claim 11 wherein said calculation block stage performs 4 multiplications and 5 additions which provide partial products and intermediate sums and a final product.
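Claims 11 and 12 split both 113 bit significands as 1/56/56. One decomposition that reproduces the claimed 4 multiplications and 5 additions (an interpretation, not confirmed by the claim text) writes A = 2^112 + Af and B = 2^112 + Bf, so that A·B = A·2^112 + Bf·2^112 + Af·Bf. The two shifted operand terms need no multiplier, Af·Bf expands into four 56 by 56 chunk products, and the six resulting terms take five additions. The function name is hypothetical.

```python
M56 = (1 << 56) - 1


def multiply_1_56_56(a: int, b: int) -> tuple[int, int, int]:
    """Return (a * b, multiplications, additions) for two 113-bit 1/56/56 significands."""
    assert a >> 112 == 1 and b >> 112 == 1
    a_hi, a_lo = (a >> 56) & M56, a & M56
    b_hi, b_lo = (b >> 56) & M56, b & M56
    b_frac = b & ((1 << 112) - 1)
    # Af * Bf as four 56 x 56 chunk products with their bit offsets.
    partials = [(a_hi * b_hi, 112), (a_hi * b_lo, 56), (a_lo * b_hi, 56), (a_lo * b_lo, 0)]
    terms = [
        a << 112,        # multiplier's explicit 1 times the whole multiplicand
        b_frac << 112,   # multiplicand's explicit 1 times the multiplier fraction
    ] + [p << s for p, s in partials]
    product = terms[0]
    for t in terms[1:]:
        product += t
    return product, len(partials), len(terms) - 1
```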
US09/034,718 1998-03-04 1998-03-04 Floating point binary quad word format multiply instruction unit Expired - Lifetime US6055554A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US09/034,718 US6055554A (en) 1998-03-04 1998-03-04 Floating point binary quad word format multiply instruction unit
KR1019990004766A KR100317767B1 (en) 1998-03-04 1999-02-10 Floating point binary quad word format multiply instruction unit
JP04635999A JP3163072B2 (en) 1998-03-04 1999-02-24 Floating point unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/034,718 US6055554A (en) 1998-03-04 1998-03-04 Floating point binary quad word format multiply instruction unit

Publications (1)

Publication Number Publication Date
US6055554A (en) 2000-04-25

Family

ID=21878160

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/034,718 Expired - Lifetime US6055554A (en) 1998-03-04 1998-03-04 Floating point binary quad word format multiply instruction unit

Country Status (3)

Country Link
US (1) US6055554A (en)
JP (1) JP3163072B2 (en)
KR (1) KR100317767B1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10990389B2 (en) * 2019-04-29 2021-04-27 Micron Technology, Inc. Bit string operations using a computing tile

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4594679A (en) * 1983-07-21 1986-06-10 International Business Machines Corporation High speed hardware multiplier for fixed floating point operands
US5687106A (en) * 1995-03-31 1997-11-11 International Business Machines Corporation Implementation of binary floating point using hexadecimal floating point unit
US5740093A (en) * 1995-12-20 1998-04-14 Intel Corporation 128-bit register file and 128-bit floating point load and store for quadruple precision compatibility
US5909385A (en) * 1996-04-01 1999-06-01 Hitachi, Ltd. Multiplying method and apparatus

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"CMOS Floating-Point Unit For The S/390 Parallel Enterprise Server G4" by Schwarz et al., IBM Journal of Research and Development, vol. 41, No. 4/5, Jul./Sep. 1997, pp. 475-488. *
"Hardware Implementation of Sine/Cosine Polynomial Approximation" by Desrosiers et al., IBM TDB, vol. 37, No. 11, Nov. 1994, pp. 609-613. *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020095451A1 (en) * 2001-01-18 2002-07-18 International Business Machines Corporation Floating point unit for multiple data architectures
US6829627B2 (en) * 2001-01-18 2004-12-07 International Business Machines Corporation Floating point unit for multiple data architectures
US20040199562A1 (en) * 2003-04-07 2004-10-07 Vincent Dupaquis Efficient multiplication sequence for large integer operands wider than the multiplier hardware
US7392276B2 (en) 2003-04-07 2008-06-24 Atmel Corporation Efficient multiplication sequence for large integer operands wider than the multiplier hardware
US20090198758A1 (en) * 2008-01-31 2009-08-06 International Business Machines Corporation Method for sign-extension in a multi-precision multiplier
US20120236945A1 (en) * 2010-11-30 2012-09-20 M.S. Ramaiah School Of Advanced Studies Wavelet transformation using multicore processors
US9197902B2 (en) * 2010-11-30 2015-11-24 M.S. Ramaiah School Of Advanced Studies Wavelet transformation using multicore processors
US11175394B2 (en) 2019-03-27 2021-11-16 Raytheon Company Binary to quad phase generator
CN111198672A (en) * 2019-12-30 2020-05-26 华南师范大学 Point adding and point doubling circuit based on binary domain expansion and control method thereof

Also Published As

Publication number Publication date
JP3163072B2 (en) 2001-05-08
KR100317767B1 (en) 2001-12-22
KR19990077418A (en) 1999-10-25
JPH11296346A (en) 1999-10-29

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SCHWARZ, ERIC MARK;REEL/FRAME:009062/0919

Effective date: 19980304

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12