US4896286A - Floating-point arithmetic apparatus - Google Patents

Info

Publication number
US4896286A
Authority
US
United States
Prior art keywords
output
exponent
point data
floating point
arithmetic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US07/137,924
Inventor
Katsuhiko Ueda
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Assigned to MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD. reassignment MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD. ASSIGNMENT OF ASSIGNORS INTEREST. Assignors: HAMMERL, GUNTER
Application granted
Publication of US4896286A
Anticipated expiration
Expired - Lifetime

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/485Adding; Subtracting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/499Denomination or exception handling, e.g. rounding or overflow
    • G06F7/49936Normalisation mentioned as feature only
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/499Denomination or exception handling, e.g. rounding or overflow
    • G06F7/49942Significance control
    • G06F7/49947Rounding
    • G06F7/49957Implementation of IEEE-754 Standard

Definitions

  • bit pattern (1) represents the following negative number: ##EQU3## so that the absolute value calculation process 12 is required to calculate the complement of "2".
  • the complement process and the normalization process in this case can be performed in the following two different methods or sequences.
  • Equation (11) shows that the position of the absolute value calculation process and the position of the normalization process are exchangeable in the arithmetic sequence.
  • Equation (14) shows that the position of the absolute value calculation process and the position of the normalization process are exchangeable in the arithmetic sequence.
  • equations (11) and (14) reveal that, in respect of the bit pattern of equation (4), the position of the absolute value calculation process 12 and the position of the normalization process 13 are exchangeable in the arithmetic sequence.
  • each of the paths 16, 17, and 18 includes only one of the absolute value calculation process 12 and the roundoff process 14 and that the position of the absolute value calculation process 12 and the normalization process 13 are exchangeable
  • the arithmetic sequence of FIG. 2 is rewritten in a form of FIG. 4.
  • Each of the absolute value calculation process 12 and the roundoff process 14 requires an adder.
  • an adder can be used in common in the two processes 12 and 14.
  • the floating-point arithmetic apparatus of FIG. 1 is designed on the basis of this fact.
  • the processes 10-15 of FIG. 4 correspond to portions of FIG. 1 denoted by the arrows 10-15 respectively.
  • the floating-point arithmetic apparatus includes input registers 40 and 41 which hold two floating-point number data having signs s1 and s2, exponents e1 and e2, and mantissas f1 and f2 respectively.
  • a subtracter 42 subtracts the exponent e2 in the input register 41 from the exponent e1 in the input register 40, and generates an exponent difference signal ed, a borrow signal eb, and an equality signal ez.
  • the exponent difference signal ed represents the absolute value of the difference between the exponents e1 and e2.
  • the borrow signal eb is "0" when the exponent e1 is equal to or greater than the exponent e2, and is "1" when the exponent e1 is smaller than the exponent e2.
  • a multiplexer 43 selects one of the exponents e1 and e2 in accordance with the borrow signal eb. Specifically, the exponent e1 is selected when the borrow signal eb is "0". The exponent e2 is selected when the borrow signal eb is "1".
  • a multiplexer 44 selects one of the mantissas f1 and f2 in accordance with the borrow signal eb. Specifically, the mantissa f1 is selected when the borrow signal eb is "1". The mantissa f2 is selected when the borrow signal eb is "0".
  • a shifter 45 shifts the output from the multiplexer 44 rightward by a bit number corresponding to the exponent difference ed outputted by the subtracter 42.
  • a multiplexer 46 selects the mantissa f1 when the borrow signal eb is "0", and selects the output from the shifter 45 when the borrow signal eb is "1".
  • a multiplexer 47 selects the output from the shifter 45 when the borrow signal eb is "0", and selects the mantissa f2 when the borrow signal eb is "1".
  • a complementer 48 calculates the complement of the output from the multiplexer 46 with respect to "1".
  • a complementer 49 calculates the complement of the output from the multiplexer 47 with respect to "1".
  • An adder 50 adds the outputs from the complementers 48 and 49.
  • a counter 51 derives a number of bits required for the normalization of the output from the adder 50. As shown in FIG. 6, the counter 51 outputs left shift numbers as positive numbers and outputs right shift numbers as negative numbers in the expression of complements with respect to "2".
  • a shifter 52 shifts the output from the adder 50 leftward and rightward in accordance with the output from the counter 51, and thus normalizes the mantissa.
  • a subtracter 53 subtracts the output of the counter 51 from the output of the multiplexer 43, and thus corrects the exponent.
  • a roundoff control circuit 54 is connected to a main control circuit 61 and the shifter 52.
  • the roundoff control circuit 54 receives a roundoff mode designation signal.
  • a complementer 55 calculates the complement of the output from the shifter 52 with respect to "1".
  • the complementer 55 is connected to the main control circuit 61.
  • a multiplexer 56 selects one of the output from the roundoff control circuit 54 and fixed value data in which only the least significant bit is "1".
  • the multiplexer 56 is connected to the main control circuit 61.
  • An adder 57 adds the outputs from the complementer 55 and the multiplexer 56.
  • An overflow detector 58 senses or checks overflow in the output from the adder 57.
  • a shifter 59 outputs a signal representative of a final mantissa "f", and shifts rightward the output from the adder 57 by 1 bit when the overflow is detected by the overflow detector 58.
  • An incrementer 60 outputs a signal representative of a final exponent "e", and adds "1" to the output from the subtracter 53 when the overflow is detected by the overflow detector 58.
  • the main control circuit 61 generates control signals CPL1, CPL2, and CPL3 and determines a sign "s" of the final arithmetic result in accordance with an arithmetic mode designation signal, the signs s1 and s2, the outputs eb and ez from the subtracter 42, and the sign bit st outputted from the adder 50.
  • the control signals CPL1 and CPL2 are applied to the complementers 48 and 49 respectively.
  • the control signal CPL3 is applied to the complementer 55 and the multiplexer 56.
  • the floating-point arithmetic apparatus of FIG. 1 operates as follows:
  • the subtracter 42 subtracts the exponent e2 in the input register 41 from the exponent e1 in the input register 40, and derives the exponent difference signal ed, the borrow signal eb, and the equality signal ez.
  • the multiplexer 43 selects the larger of the exponents e1 and e2 in accordance with the borrow signal eb.
  • the multiplexer 44 selects one of the mantissas f1 and f2 which corresponds to the smaller of the exponents e1 and e2. Specifically, when the borrow signal eb is "0", the output of the multiplexer 43 represents the exponent e1 and the output of the multiplexer 44 represents the mantissa f2.
  • when the borrow signal eb is "1", the output of the multiplexer 43 represents the exponent e2 and the output of the multiplexer 44 represents the mantissa f1.
  • the shifter 45 shifts the output from the multiplexer 44 rightward by a bit number corresponding to the exponent difference ed.
  • the multiplexer 46 is controlled by the borrow signal eb.
  • when the borrow signal eb is "0", the multiplexer 46 selects the mantissa f1 in the input register 40.
  • when the borrow signal eb is "1", the multiplexer 46 selects the mantissa f1 which was shifted rightward by the shifter 45 by the ed bit or bits.
  • the multiplexer 47 is also controlled by the borrow signal eb.
  • when the borrow signal eb is "1", the multiplexer 47 selects the mantissa f2 in the input register 41.
  • when the borrow signal eb is "0", the multiplexer 47 selects the mantissa f2 which was shifted rightward by the shifter 45 by the ed bit or bits.
  • the complementers 48 and 49 and the adder 50 perform the addition and subtraction process of the mantissas represented by the outputs of the multiplexers 46 and 47.
  • the complementers 48 and 49 are controlled via the control signals CPL1 and CPL2 so that the number represented by the output from the adder 50 will be positive.
  • the main control circuit 61 generates the control signals CPL1 and CPL2 in accordance with the borrow signal eb, the equality signal ez, the signs s1 and s2, and the externally-supplied arithmetic mode designation signal by referring to the following rules shown in FIG. 7.
  • when the exponent e1 is greater than the exponent e2, the output f1 from the multiplexer 46 is greater than the output f2' from the multiplexer 47.
  • the character f2' means the result of a rightward shift of the mantissa f2 for scale matching.
  • when the exponent e1 is smaller than the exponent e2, the output f1' from the multiplexer 46 is smaller than the output f2 from the multiplexer 47.
  • the character f1' means the result of a rightward shift of the mantissa f1 for scale matching.
  • the main control circuit 61 makes one of the control signals CPL1 and CPL2 active.
  • the output from the adder 50 represents a positive number or a negative number.
  • the output from the adder 50 is inputted into the counter 51, and the bit number of shift required for normalization is counted in compliance with the rules of FIG. 6.
  • the shifter 52 normalizes the output from the adder 50 in accordance with the output from the counter 51.
  • the subtracter 53 corrects the exponent.
  • the complement and the selected fixed value are added by the adder 57 so that the absolute value of the output from the adder 50 is derived.
  • the control signal CPL3 is set to "0" so that the complementer 55 passes the output of the adder 50 as it is and the multiplexer 56 selects the output from the roundoff control circuit 54.
  • the output from the shifter 52 and the selected output from the roundoff control circuit 54 are added by the adder 57. In this case, since all of the G bit, R bit, and S bit are "0" and the roundoff control circuit 54 outputs "0" as shown in FIG. 5, the output of the adder 50 remains virtually unchanged.
  • the control signal CPL3 is generated by the main control circuit 61 in accordance with generating conditions of FIG. 8 which relate to the cases (d-1) and (d-2).
  • the sign bit "s" of the final result is determined as shown in FIG. 9. Specifically, when the exponents e1 and e2 are different, the sign bit "s" is determined in accordance with the sign bits s1 and s2, and the arithmetic mode designation signal. When the exponents e1 and e2 are equal, the sign bit "s" is determined in accordance with the sign bit st of the output from the adder 50.
  • The rules and the relationships in FIGS. 7-9 are shown together in FIG. 10.
  • the main control circuit 61 is designed so as to satisfy the relationship between the input and the output of FIG. 10.
  • the absolute value calculation and the roundoff are performed exclusively. Accordingly, a single adder 57 is used in common for the absolute value calculation and the roundoff, allowing a simple structure of the arithmetic apparatus. In addition, the exclusive executions of the absolute value calculation and the roundoff enable high-speed arithmetic.
  • the adders 50 and 57 may be composed of a common adding element.
  • the shifters 45 and 52 may be composed of a common shifting element.
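The sharing of the adder 57 between the absolute value calculation and the roundoff, as listed above for the complementer 55, the multiplexer 56, and the control signal CPL3, can be sketched in software. This is only an illustrative model under stated assumptions: the function name `adder57`, the `width` parameter, and the treatment of the mantissa as a two's-complement integer whose bit 0 is the LSB are not taken from the patent.

```python
def adder57(shifter_out: int, cpl3: bool, round_inc: int, width: int = 28) -> int:
    """Model of complementer 55 + multiplexer 56 + adder 57 (hypothetical widths).

    cpl3 active  : one's complement of the input plus the fixed value 1,
                   i.e. two's-complement negation (absolute value of a negative input).
    cpl3 inactive: input passed unchanged plus the roundoff increment (0 or 1).
    """
    mask = (1 << width) - 1
    # Complementer 55: invert every bit only when CPL3 is active.
    a = (~shifter_out & mask) if cpl3 else (shifter_out & mask)
    # Multiplexer 56: fixed value with only the LSB set, or the roundoff control output.
    b = 1 if cpl3 else round_inc
    # Adder 57: a single addition serves both purposes.
    return (a + b) & mask


# Absolute value path: -5 in 8-bit two's complement is 0xFB.
print(adder57(0xFB, cpl3=True, round_inc=0, width=8))   # 5
# Roundoff path: the same adder adds the rounding increment instead.
print(adder57(0x12, cpl3=False, round_inc=1, width=8))  # 19
```

Because the two operations are never needed in the same arithmetic path, one adder and one control signal are enough in this model.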

Abstract

A floating-point arithmetic apparatus includes a first device for matching scales of first and second floating-point data. A second device serves to perform addition and subtraction between outputs from the first device. A third device serves to normalize output from the addition/subtraction device. A fourth device serves to round off output from the third device. A fifth device serves to calculate an absolute value of output from the third device. The fourth device and the fifth device may include a common adder.

Description

BACKGROUND OF THE INVENTION
This invention relates to a floating-point arithmetic apparatus for processing floating-point data with a mantissa section expressed in an absolute value.
Conventional methods and apparatus for addition or subtraction between two floating-point data tend to be slow, as will be described hereinafter.
SUMMARY OF THE INVENTION
It is an object of this invention to provide an apparatus which can perform floating-point arithmetic at a high speed.
It is another object of this invention to provide a simple floating-point arithmetic apparatus.
In a floating-point arithmetic apparatus according to a first aspect of this invention, a first device matches scales of first and second floating-point data. A second device serves to perform addition and subtraction between outputs from the first device. A third device serves to normalize output from the addition/subtraction device. A fourth device serves to round off output from the third device. A fifth device serves to calculate an absolute value of output from the third device.
A floating-point arithmetic apparatus according to a second aspect of this invention handles first and second floating-point data each having an exponent section, a mantissa section expressed in an absolute value, and a sign section representing a sign of the mantissa section. The arithmetic apparatus includes a first device for selecting the greater of the exponent sections of the first and second data and matching scales of the mantissas of the first and second data, and a second device for performing addition and subtraction between mantissa sections of outputs from the first device. A third device serves to normalize a mantissa section of output from the second device and to correct an exponent section of output from the first device. A fourth device serves to round off a mantissa section of output from the third device. When a mantissa section of output from the fourth device overflows, a fifth device normalizes the mantissa section of the output from the fourth device and corrects the exponent section of the output from the third device. A sixth device serves to calculate an absolute value of output from the third device.
A floating-point arithmetic apparatus according to a third aspect of this invention handles first and second floating-point data each having an exponent section, a mantissa section expressed in an absolute value, and a sign section representing a sign of the mantissa section. The arithmetic apparatus includes a first device for subtracting the exponent sections of the first and second data, calculating a difference between the exponent sections of the first and second data, determining which of the exponent sections of the first and second data is greater, and selecting the greater of the exponent sections of the first and second data. A second device selects the mantissa section of one of the first and second data which has the smaller exponent section, and shifts rightward the selected mantissa section by a bit number corresponding to the difference between the exponent sections. A third device selects the mantissa section of one of the first and second data which has the greater exponent section. An addition/subtraction circuit performs addition and subtraction between outputs from the second and third devices. A fourth device controls the second device, the third device, and the addition/subtraction circuit and allows output from the addition/subtraction circuit to be always positive. A fifth device normalizes the output from the addition/subtraction circuit and corrects the greater exponent. A sixth device rounds off a mantissa section of output from the fifth device. A seventh device calculates an absolute value of the mantissa section of the output from the fifth device.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a floating-point arithmetic apparatus according to an embodiment of this invention.
FIG. 2 is a flowchart of a conventional floating-point arithmetic sequence.
FIG. 3 is a diagram of a guard bit, a round bit, and a sticky bit.
FIG. 4 is a flowchart of a floating-point arithmetic sequence in this invention.
FIG. 5 is a diagram of the relationship between input data to and output data from the round control circuit of FIG. 1.
FIG. 6 is a diagram of the relationship between input data to and output data from the counter of FIG. 1.
FIG. 7 is a diagram of the relationship between input data and output data in connection with the generation of control signals CPL1 and CPL2 in the control circuit of FIG. 1.
FIG. 8 is a diagram of the relationship between input data and output data in connection with the generation of a control signal CPL3 in the control circuit of FIG. 1.
FIG. 9 is a diagram of the relationship between input data and output data in connection with the generation of a sign bit "s" in the control circuit of FIG. 1.
FIG. 10 is a diagram of the relationship between input data to and output data from the control circuit of FIG. 1.
DESCRIPTION OF THE PREFERRED EMBODIMENT
In IEEE standard "P754", floating-point data N is expressed as:
$$N = (-1)^{s} \cdot 2^{e-\mathrm{bias}} \cdot (1.f)$$
where the character "s" denotes a sign bit; the character "e" denotes an exponent section with a bias; and the character "f" denotes an absolute mantissa section. One feature of this data format is that the mantissa section is expressed by an absolute value. In addition, IEEE standard "P754" prescribes a method of rounding off arithmetic results.
Prior to the detailed description of this invention, a conventional method of addition or subtraction between two floating-point data will be described for a better understanding of this invention. As shown in FIG. 2, a conventional method of addition or subtraction between two floating-point data generally requires the following sequential processes: scale matching 10 (which is generally called a binary point alignment process); addition or subtraction 11; absolute value calculation 12; normalization 13; roundoff 14; and overflow process (rightward 1 bit shift) 15. The absolute value calculation 12 and the roundoff 14 require processing times whose lengths increase in proportion to the bit numbers of the mantissas "f". Accordingly, it is usually difficult for the conventional method to perform addition or subtraction between floating-point numbers at a high speed.
FIG. 1 shows a floating-point arithmetic apparatus of this invention, which uses the floating-point format prescribed in IEEE standard "P754". Single-precision arithmetic is assumed to simplify the description. Specifically, two floating-point numbers N1 and N2 are expressed as:
$$N1 = (-1)^{s1} \cdot 2^{e1-127} \cdot (1.f1)$$
$$N2 = (-1)^{s2} \cdot 2^{e2-127} \cdot (1.f2)$$
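As an illustration of this format, the following minimal sketch splits a single-precision number into its sign "s", biased exponent "e", and mantissa "f" fields and evaluates the expression above. The helper names `decode_single` and `value_of` are hypothetical, and only normalized numbers are handled.

```python
import struct


def decode_single(x: float):
    """Return (s, e, f) of x after conversion to IEEE 754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31            # 1-bit sign
    e = (bits >> 23) & 0xFF   # 8-bit exponent with a bias of 127
    f = bits & 0x7FFFFF       # 23-bit mantissa held as an absolute value
    return s, e, f


def value_of(s: int, e: int, f: int) -> float:
    """Evaluate (-1)^s * 2^(e-127) * (1.f) for a normalized number (0 < e < 255)."""
    return (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)


print(decode_single(-2.5))        # (1, 128, 2097152)
print(value_of(1, 128, 2097152))  # -2.5
```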
In addition, the floating-point arithmetic apparatus of FIG. 1 uses a roundoff method which is prescribed in IEEE standard "P754". This roundoff method will be simply described hereinafter with reference to FIG. 3. When the mantissa f1 or f2 is shifted right for scale matching, two bits which move out of the scale beyond a bit (referred to as the LSB hereinafter) 80 having a weight of $2^{-23}$ are preserved as a guard bit (referred to as the G bit hereinafter) 81 and a round bit (referred to as the R bit hereinafter) 82, and a bit which shifts right beyond the R bit 82 is preserved as a sticky bit (referred to as the S bit hereinafter) 83 in a form subjected to OR operation. In addition and subtraction, the G bit, the R bit, and the S bit are considered, and bits having weights $2^{1}$ to $2^{-23}$, the G bit, the R bit, and the S bit are calculated. When the result of addition and subtraction is normalized, the G bit 81 and the R bit 82 are shifted into the LSB 80 to increase the accuracy of the mantissa section. IEEE standard "P754" prescribes four roundoff modes, that is, an RN (round to nearest) mode, an RP (round to plus) mode, an RM (round to minus) mode, and an RZ (round to zero) mode.
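The preservation of the G, R, and S bits during the rightward shift can be sketched as follows. This is a minimal model, assuming the mantissa (including the hidden leading "1") is held as a 24-bit integer and extended by three bits on the right; the function names `shift_right_grs` and `split_grs` are hypothetical.

```python
def shift_right_grs(mant: int, shift: int) -> int:
    """Shift a 24-bit mantissa right, returning 27 bits: mantissa, G, R, S."""
    ext = mant << 3                    # append G = R = S = 0 below the LSB
    lost = ext & ((1 << shift) - 1)    # bits that move out beyond the S position
    sticky = 1 if lost else 0          # OR of all bits shifted beyond the R bit
    return (ext >> shift) | sticky


def split_grs(ext: int):
    """Split the 27-bit result into (mantissa, G, R, S)."""
    return ext >> 3, (ext >> 2) & 1, (ext >> 1) & 1, ext & 1


m = 0x800001  # hidden bit set and LSB set
for n in (1, 2, 3, 4):
    print(n, split_grs(shift_right_grs(m, n)))
# 1 (4194304, 1, 0, 0)
# 2 (2097152, 0, 1, 0)
# 3 (1048576, 0, 0, 1)
# 4 (524288, 0, 0, 1)
```

The last two rows show the sticky behavior: once a "1" has passed beyond the R bit, the S bit stays at "1" regardless of further shifting.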
In the arithmetic apparatus of FIG. 1, an adder 57 performs roundoff. A value added to the LSB in the adder 57 is determined in accordance with a roundoff mode designation signal, the LSB, the G bit, the R bit, and the S bit of the output from a shifter 52, and the sign bit "s" of the final arithmetic result which is one output of a main control circuit 61, as shown in FIG. 5.
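The table of FIG. 5 is not reproduced in this text, so the following sketch is only an assumption of how the value added to the LSB could be derived. It follows the usual IEEE 754 meaning of the four modes (with RN resolving ties to the even mantissa); the function name `round_increment` and the string mode codes are hypothetical.

```python
def round_increment(mode: str, lsb: int, g: int, r: int, s: int, sign: int) -> int:
    """Return the value (0 or 1) added to the LSB for the given roundoff mode.

    lsb, g, r, s are single bits of the normalized mantissa;
    sign is the sign bit of the final result (0 = plus, 1 = minus).
    """
    inexact = g | r | s                # any discarded bit is set
    if mode == "RN":                   # round to nearest, ties to even
        return g & (r | s | lsb)
    if mode == "RP":                   # round toward plus infinity
        return inexact if sign == 0 else 0
    if mode == "RM":                   # round toward minus infinity
        return inexact if sign == 1 else 0
    if mode == "RZ":                   # round toward zero: truncate
        return 0
    raise ValueError(f"unknown roundoff mode: {mode}")


print(round_increment("RN", lsb=0, g=1, r=0, s=0, sign=0))  # 0 (tie, already even)
print(round_increment("RN", lsb=1, g=1, r=0, s=0, sign=0))  # 1 (tie, made even)
print(round_increment("RP", lsb=0, g=0, r=0, s=1, sign=0))  # 1
```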
The arithmetic apparatus of FIG. 1 is based on the fact that the conventional floating-point arithmetic sequence can be converted into an arithmetic sequence of FIG. 4. This conversion is possible as proved hereinafter.
The arithmetic sequence of FIG. 2 takes three different paths in respective three cases (1), (2), and (3) as follows.
(1) Addition is Performed with e1=e2
Since e1=e2, it is unnecessary to shift one of the mantissas f1 and f2 in the scale matching process 10. Since the addition process 11 forces the mantissa section to overflow, a rightward 1-bit shift in the normalization process 13, the roundoff process 14, and the overflow process 15 are necessary. Accordingly, the whole arithmetic sequence takes a path 16 (see FIG. 2 and also FIG. 4).
(2) Subtraction is Performed with e1=e2
Since e1=e2, it is unnecessary to shift one of the mantissas f1 and f2 in the scale matching process 10. Thus, the G bit, the R bit, and the S bit of the subtraction result are equal to "0" so that the roundoff process 14 and the overflow process 15 are unnecessary. Since the subtraction result is negative in some cases, the absolute value calculation process 12 is necessary. Accordingly, the whole arithmetic sequence takes a path 17 (see FIG. 2 and also FIG. 4).
(3) Exponents e1 and e2 are Unequal
Since the values e1 and e2 are unequal, it is necessary to shift one of the mantissas f1 and f2 rightward in dependence on which of the values e1 and e2 is greater. In the subtraction, since the shifted mantissa is smaller than the other mantissa, it is possible to eliminate the absolute value calculation process 12 by controlling the subtraction so as to make the arithmetic result positive. The right shift of the mantissa sometimes causes one of the G bit, the R bit, and the S bit of the addition or subtraction result to be equal to "1", so that the roundoff process 14 and the overflow process 15 are necessary. Accordingly, the whole arithmetic sequence takes a path 18 (see FIG. 2 and also FIG. 4).
As understood from the previous description, only one of the absolute value calculation process 12 and the roundoff process 14 is performed in any case.
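The three cases can be summarized in a short Python sketch. The table `PATH_PROCESSES` and the function `select_path` are illustrative names; the sets list only the processes that the description above states to be necessary on each path, without claiming any particular order.

```python
# Processes of FIG. 2 required on each path, as stated in cases (1) to (3).
PATH_PROCESSES = {
    16: {10, 11, 13, 14, 15},   # addition, e1 == e2
    17: {10, 11, 12, 13},       # subtraction, e1 == e2
    18: {10, 11, 13, 14, 15},   # e1 != e2
}

def select_path(e1: int, e2: int, subtraction: bool) -> int:
    """Return the path number (16, 17, or 18) taken by the arithmetic sequence."""
    if e1 != e2:
        return 18
    return 17 if subtraction else 16
```

In every entry of the table exactly one of the processes 12 and 14 is present, which is the property that lets a single adder serve both.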
The positions of the absolute value calculation process 12 and the normalization process 13 are exchangeable in the arithmetic sequence path 17 as proved hereinafter.
Since e1=e2, the bit sequence of the mantissa section which was subjected to the subtraction in the addition and subtraction process 11 can be expressed as:
(s, a_{1}, a_{0}, a_{-1}, . . . , a_{-i}, . . . , a_{-22}, a_{-23})    (1)
where the character "s" denotes a sign bit and the character "a_{-i}" denotes a bit having a weight of 2^{-i}. This bit sequence represents the following number "f": ##EQU1## the bit pattern (1) represents a positive number so that the absolute value calculation process 12 does not perform any operation virtually. Accordingly, it is clear that the positions of the absolute value calculation process 12 and the normalization process 13 can be exchanged.
In the case where: ##EQU2## and where the value "m" is in the range of 0 to 22, the bit pattern (1) represents the following negative number: ##EQU3## so that the absolute value calculation process 12 is required to calculate the complement with respect to "2". The complement process and the normalization process in this case can be performed in the following two different methods or sequences.
(1) First Method Where Normalization Follows Absolute Value Calculation
The number of equation (5) represented by the bit sequence (4) is subjected to the absolute value calculation process by referring to the following equations: ##EQU4## The value given by equation (6) is shifted leftward to perform normalization as follows:
(1a) In the case where at least one of the bits a_{-(m+2)} to a_{-23} is different from "0", since the bit a_{-(m+1)} equals "0" as shown by equation (4), the complemented bit at the position a_{-(m+1)} equals "1". The normalization is realized by setting a weight of this bit equal to 2^{0}, that is, shifting leftward the bit pattern of equation (6) by (m+1) bits. The result of shift is given as: ##EQU5##
(1b) In the case where all of the bits a_{-(m+1)} to a_{-23} are "0", the right-hand side of equation (6) equals 2^{-m}. The normalization is realized by a leftward m-bit shift. The result of shift is given as:
f_{1b} = 2^{m} · 2^{-m} = 2^{0}    (8)
(2) Second Method Where Normalization Precedes Absolute Value Calculation
The bit pattern of equation (5) is normalized. There are two cases.
(2a) In the case where at least one of the bits a_{-(m+2)} to a_{-23} is different from "0", the bit pattern of equation (4) is shifted leftward by (m+1) bits so that the bit a_{-m} will move to the position of the bit a_{1}. The result of shift is given as: ##EQU6## The complement of this value with respect to "2" is given as: ##EQU7## Accordingly, ##EQU8## Equation (11) shows that the position of the absolute value calculation process and the position of the normalization process are exchangeable in the arithmetic sequence.
(2b) In the case where all of the bits a_{-(m+1)} to a_{-23} are "0", the bit pattern of equation (4) is shifted leftward by m bits so that the bit a_{-m} will move to the position of the bit a_{0}. The result of shift is given as: ##EQU9## The complement of this value with respect to "2" is given as: ##EQU10## Accordingly, ##EQU11## Equation (14) shows that the position of the absolute value calculation process and the position of the normalization process are exchangeable in the arithmetic sequence.
In summary, equations (11) and (14) reveal that, in respect of the bit pattern of equation (4), the position of the absolute value calculation process 12 and the position of the normalization process 13 are exchangeable in the arithmetic sequence.
In the case where: ##EQU12## and where a leftward 23 bit shift or a rightward 1 bit shift is required, the position of the absolute value calculation process 12 and the position of the normalization process 13 are similarly proved to be exchangeable in the arithmetic sequence.
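The exchangeability can also be checked numerically. The following Python sketch is an illustration under stated assumptions: the mantissa section is modelled as a 26-bit two's complement word (s, a_1, a_0, and 23 fraction bits), and the names `negate`, `shift_from_negative_pattern`, `abs_then_normalize`, and `normalize_then_abs` are hypothetical. It covers negative patterns of the form (4) with m from 0 to 22, not the separate 23-bit case.

```python
WIDTH = 26                      # s, a_1, a_0 and the fraction bits a_-1 .. a_-23
MASK = (1 << WIDTH) - 1
ONE = 1 << 23                   # bit position having a weight of 2^0

def negate(word: int) -> int:
    """Complement of a WIDTH-bit word with respect to 2."""
    return (-word) & MASK

def shift_from_negative_pattern(word: int) -> int:
    """Left-shift count read directly from a negative bit pattern."""
    inverted = ~word & MASK
    lead = inverted.bit_length() - 1              # position of the first 0 bit, a_-(m+1)
    low_zero = (word & ((1 << lead) - 1)) == 0    # True in case (2b)
    m_plus_1 = 23 - lead
    return m_plus_1 - 1 if low_zero else m_plus_1

def abs_then_normalize(word: int) -> int:
    """First method: absolute value, then left shift to weight 2^0."""
    magnitude = negate(word)
    return (magnitude << (24 - magnitude.bit_length())) & MASK

def normalize_then_abs(word: int) -> int:
    """Second method: left shift the negative pattern, then complement."""
    shifted = (word << shift_from_negative_pattern(word)) & MASK
    return negate(shifted)
```

For every negative input in the covered range both functions return the same normalized magnitude, with its leading "1" at the position of weight 2^0.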
In view of the fact that each of the paths 16, 17, and 18 includes only one of the absolute value calculation process 12 and the roundoff process 14 and that the position of the absolute value calculation process 12 and the normalization process 13 are exchangeable, the arithmetic sequence of FIG. 2 is rewritten in a form of FIG. 4. Each of the absolute value calculation process 12 and the roundoff process 14 requires an adder. In the arithmetic sequence of FIG. 4, since the absolute value calculation process 12 and the roundoff process 14 have the same phases, an adder can be used in common in the two processes 12 and 14. The floating-point arithmetic apparatus of FIG. 1 is designed on the basis of this fact. The processes 10-15 of FIG. 4 correspond to portions of FIG. 1 denoted by the arrows 10-15 respectively.
As shown in FIG. 1, the floating-point arithmetic apparatus includes input registers 40 and 41 which hold two floating-point number data having signs s1 and s2, exponents e1 and e2, and mantissas f1 and f2 respectively. A subtracter 42 subtracts the exponent e2 in the input register 41 from the exponent e1 in the input register 40, and generates an exponent difference signal ed, a borrow signal eb, and an equality signal ez. The exponent difference signal ed represents the absolute value of the difference between the exponents e1 and e2. The borrow signal eb is "0" when the exponent e1 is equal to or greater than the exponent e2, and is "1" when the exponent e1 is smaller than the exponent e2.
A multiplexer 43 selects one of the exponents e1 and e2 in accordance with the borrow signal eb. Specifically, the exponent e1 is selected when the borrow signal eb is "0". The exponent e2 is selected when the borrow signal eb is "1". A multiplexer 44 selects one of the mantissas f1 and f2 in accordance with the borrow signal eb. Specifically, the mantissa f1 is selected when the borrow signal eb is "1". The mantissa f2 is selected when the borrow signal eb is "0". A shifter 45 shifts the output from the multiplexer 44 rightward by a bit number corresponding to the exponent difference ed outputted by the subtracter 42.
A multiplexer 46 selects the mantissa f1 when the borrow signal eb is "0", and selects the output from the shifter 45 when the borrow signal eb is "1". A multiplexer 47 selects the output from the shifter 45 when the borrow signal eb is "0", and selects the mantissa f2 when the borrow signal eb is "1". A complementer 48 calculates the complement of the output from the multiplexer 46 with respect to "1". A complementer 49 calculates the complement of the output from the multiplexer 47 with respect to "1". An adder 50 adds the outputs from the complementers 48 and 49.
A counter 51 derives a number of bits required for the normalization of the output from the adder 50. As shown in FIG. 6, the counter 51 outputs left shift numbers as positive numbers and outputs right shift numbers as negative numbers in the expression of complements with respect to "2". A shifter 52 shifts the output from the adder 50 leftward and rightward in accordance with the output from the counter 51, and thus normalizes the mantissa. A subtracter 53 subtracts the output of the counter 51 from the output of the multiplexer 43, and thus corrects the exponent.
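The cooperation of the counter 51, the shifter 52, and the subtracter 53 can be sketched in Python as follows. FIG. 6 is not reproduced here, so the treatment of a zero input and the function name `normalize` are assumptions; the G, R, and S bits are omitted, so a bit dropped by the right shift is simply lost in this sketch.

```python
from typing import Tuple

def normalize(mantissa: int, exponent: int, frac_bits: int = 23) -> Tuple[int, int, int]:
    """Normalize a positive mantissa so its leading 1 has a weight of 2^0.

    Returns (normalized_mantissa, corrected_exponent, shift), where shift
    is positive for a left shift and negative for a right shift.
    """
    if mantissa == 0:
        return 0, exponent, 0                         # assumption: zero is left as it is
    shift = frac_bits + 1 - mantissa.bit_length()     # counter 51
    if shift >= 0:
        normalized = mantissa << shift                # shifter 52, leftward
    else:
        normalized = mantissa >> -shift               # shifter 52, rightward
    return normalized, exponent - shift, shift        # subtracter 53
```

Because the shift number is subtracted from the exponent, a left shift lowers the exponent and a right shift (a negative shift number) raises it.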
A roundoff control circuit 54 is connected to a main control circuit 61 and the shifter 52. The roundoff control circuit 54 receives a roundoff mode designation signal. A complementer 55 calculates the complement of the output from the shifter 52 with respect to "1". The complementer 55 is connected to the main control circuit 61. A multiplexer 56 selects one of the output from the roundoff control circuit 54 and fixed value data in which only the least significant bit is "1". The multiplexer 56 is connected to the main control circuit 61. An adder 57 adds the outputs from the complementer 55 and the multiplexer 56. An overflow detector 58 senses or checks overflow in the output from the adder 57. A shifter 59 outputs a signal representative of a final mantissa "f", and shifts rightward the output from the adder 57 by 1 bit when the overflow is detected by the overflow detector 58. An incrementer 60 outputs a signal representative of a final exponent "e", and adds "1" to the output from the subtracter 53 when the overflow is detected by the overflow detector 58.
The main control circuit 61 generates control signals CPL1, CPL2, and CPL3 and determines a sign "s" of the final arithmetic result in accordance with an arithmetic mode designation signal, the signs s1 and s2, the outputs eb and ez from the subtracter 42, and the sign bit st outputted from the adder 50. The control signals CPL1 and CPL2 are applied to the complementers 48 and 49 respectively. The control signal CPL3 is applied to the complementer 55 and the multiplexer 56.
Since the mantissa sections f1 and f2 have no bit of a weight equal to 2^{0}, "1" is added to each of the left sides of the most significant bits of the outputs from the input registers 40 and 41 and the resulting data are applied to the multiplexers 44, 46, and 47.
The floating-point arithmetic apparatus of FIG. 1 operates as follows:
(a) Scale Matching 10
The subtracter 42 subtracts the exponent e2 in the input register 41 from the exponent e1 in the input register 40, and derives the exponent difference signal ed, the borrow signal eb, and the equality signal ez. The multiplexer 43 selects the larger of the exponents e1 and e2 in accordance with the borrow signal eb. The multiplexer 44 selects one of the mantissas f1 and f2 which corresponds to the smaller of the exponents e1 and e2. Specifically, when the borrow signal eb is "0", the output of the multiplexer 43 represents the exponent e1 and the output of the multiplexer 44 represents the mantissa f2. When the borrow signal eb is "1", the output of the multiplexer 43 represents the exponent e2 and the output of the multiplexer 44 represents the mantissa f1. The shifter 45 shifts the output from the multiplexer 44 rightward by a bit number corresponding to the exponent difference ed. The multiplexer 46 is controlled by the borrow signal eb. When the borrow signal eb is "0", the multiplexer 46 selects the mantissa f1 in the input register 40. When the borrow signal eb is "1", the multiplexer 46 selects the mantissa f1 which was shifted rightward by the shifter 45 by the ed bit or bits. The multiplexer 47 is also controlled by the borrow signal eb. When the borrow signal eb is "1", the multiplexer 47 selects the mantissa f2 in the input register 41. When the borrow signal eb is "0", the multiplexer 47 selects the mantissa f2 which was shifted rightward by the shifter 45 by the ed bit or bits.
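The scale matching data path can be sketched in Python as follows. The function name `scale_match` and the use of plain integers are illustrative assumptions; the mantissas are taken to include the added leading "1", and the G, R, and S bits are omitted for brevity.

```python
from typing import Tuple

def scale_match(e1: int, f1: int, e2: int, f2: int) -> Tuple[int, int, int, int, int]:
    """Model the subtracter 42, multiplexers 43, 44, 46, 47 and shifter 45.

    Returns (exponent, a, b, eb, ez), where a and b are the outputs of
    the multiplexers 46 and 47.
    """
    eb = 1 if e1 < e2 else 0            # borrow signal
    ez = 1 if e1 == e2 else 0           # equality signal
    ed = abs(e1 - e2)                   # exponent difference
    exponent = e2 if eb else e1         # multiplexer 43: larger exponent
    to_shift = f1 if eb else f2         # multiplexer 44: mantissa of smaller exponent
    shifted = to_shift >> ed            # shifter 45
    a = shifted if eb else f1           # multiplexer 46
    b = f2 if eb else shifted           # multiplexer 47
    return exponent, a, b, eb, ez
```

With e1=3 and e2=1, for instance, the mantissa f2 is shifted rightward by two bits while f1 passes through the multiplexer 46 unchanged.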
(b) Addition and Subtraction 11
The complementers 48 and 49 and the adder 50 perform the addition and subtraction process of the mantissas represented by the outputs of the multiplexers 46 and 47.
In the case where the exponents e1 and e2 are unequal, the complementers 48 and 49 are controlled via the control signals CPL1 and CPL2 so that the number represented by the output from the adder 50 will be positive. The main control circuit 61 generates the control signals CPL1 and CPL2 in accordance with the borrow signal eb, the equality signal ez, the signs s1 and s2, and the externally-supplied arithmetic mode designation signal by referring to the following rules shown in FIG. 7.
In the case where e1>e2(ez=0, eb=0), the output f1 from the multiplexer 46 is greater than the output f2' from the multiplexer 47. The character f2' means the result of a rightward shift of the mantissa f2 for scale matching. When subtraction is finally required, the arithmetic is performed by referring to the following equations: ##EQU13## In addition, the main control circuit 61 makes only the control signal CPL2 active and the adder 50 is forced to always output the positive number "f1-f2'".
In the case where e1<e2 (ez=0, eb=1), the output f1' from the multiplexer 46 is smaller than the output f2 from the multiplexer 47. The character f1' means the result of a rightward shift of the mantissa f1 for scale matching. When subtraction is finally required, the arithmetic is performed by referring to the following equations: ##EQU14## In addition, the main control circuit 61 makes only the control signal CPL1 active and the adder 50 is forced to always output the positive number "f2-f1'".
In the case where e1=e2 (ez=1, eb=0), it cannot be determined from the exponents e1 and e2 which of the outputs f1 and f2 from the multiplexers 46 and 47 is greater. When subtraction is finally required, the arithmetic is performed by referring to the following equations: ##EQU15## In addition, the main control circuit 61 makes one of the control signals CPL1 and CPL2 active. The output from the adder 50 represents a positive number or a negative number.
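The following Python sketch illustrates how two complementers and one adder can realize both addition and subtraction. The equations of the specification are not reproduced in this text, so the carry-in of "1" that accompanies an active complementer is an assumption (it is the usual way of turning a complement with respect to "1" into a subtraction). The 26-bit word width and the function name `adder50` are likewise illustrative.

```python
from typing import Tuple

WIDTH = 26
MASK = (1 << WIDTH) - 1

def adder50(a: int, b: int, cpl1: int, cpl2: int) -> Tuple[int, int]:
    """Add the outputs of complementers 48 and 49.

    Returns (sum, st), where st is the sign bit of the WIDTH-bit sum.
    """
    x = (~a & MASK) if cpl1 else a          # complementer 48
    y = (~b & MASK) if cpl2 else b          # complementer 49
    carry_in = 1 if (cpl1 or cpl2) else 0   # assumption: completes the subtraction
    total = (x + y + carry_in) & MASK
    st = total >> (WIDTH - 1)
    return total, st
```

When the exponents differ, the complementer on the shifted mantissa is activated and the sum is always positive; when they are equal, the sign bit st reports whether the difference came out negative.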
(c) Normalization 13
The output from the adder 50 is inputted into the counter 51, and the bit number of shift required for normalization is counted in compliance with the rules of FIG. 6. The shifter 52 normalizes the output from the adder 50 in accordance with the output from the counter 51. The subtracter 53 corrects the exponent.
(d-1) Absolute Value Calculation 12
In the case where ez=1 and the devices 48-50 virtually perform subtraction, since the shifter 45 does not perform a rightward shift, all of the G bit, R bit, and S bit of the output from the adder 50 are "0" and the output from the adder 50 is accurate so that a roundoff process is unnecessary. It should be noted that the output from the adder 50 is sometimes negative. In the case where ez=1 and the sign bit st of the output from the adder 50 is "1", the control signal CPL3 is set to "1" so that the complementer 55 calculates the complement of the output from the shifter 52 with respect to "1" and the multiplexer 56 selects a fixed value (0 . . . 01). The complement and the selected fixed value are added by the adder 57 so that the absolute value of the normalized output from the adder 50 is derived. In the case where ez=1 and the sign bit st of the output from the adder 50 is "0", the control signal CPL3 is set to "0" so that the complementer 55 passes the output of the shifter 52 as it is and the multiplexer 56 selects the output from the roundoff control circuit 54. The output from the shifter 52 and the selected output from the roundoff control circuit 54 are added by the adder 57. In this case, since all of the G bit, R bit, and S bit are "0" and the roundoff control circuit 54 outputs "0" as shown in FIG. 5, the output of the shifter 52 remains virtually unchanged.
(d-2) Roundoff 14
When ez=0, although the output from the adder 50 is always positive, the G bit, R bit, and S bit can differ from "0" since the shifter 45 performs a rightward shift. In the case where ez=1 and the devices 48-50 virtually perform addition, the shifter 52 performs a rightward shift so that the G bit of the normalized output is sometimes "1". In these cases, the control signal CPL3 is set to "0" so that the complementer 55 passes the output of the shifter 52 as it is and the multiplexer 56 selects the output from the roundoff control circuit 54. The output from the shifter 52 and the output from the roundoff control circuit 54 are added by the adder 57 so that a roundoff process is performed.
The control signal CPL3 is generated by the main control circuit 61 in accordance with generating conditions of FIG. 8 which relate to the cases (d-1) and (d-2).
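The sharing of the adder 57 between the cases (d-1) and (d-2) can be sketched in Python as follows. The function name `adder57`, the 26-bit default width, and the representation of the roundoff control output as a single increment value are assumptions made for illustration.

```python
def adder57(shifter_out: int, cpl3: int, round_inc: int, width: int = 26) -> int:
    """One adder serving absolute value calculation and roundoff.

    cpl3 = 1: complement with respect to 1, plus the fixed value 0...01.
    cpl3 = 0: pass through, plus the roundoff control output.
    """
    mask = (1 << width) - 1
    operand = (~shifter_out & mask) if cpl3 else shifter_out   # complementer 55
    addend = 1 if cpl3 else round_inc                          # multiplexer 56
    return (operand + addend) & mask
```

With cpl3 set to "1" the result is the complement with respect to "2", that is, the absolute value of a negative input; with cpl3 set to "0" the same adder applies the roundoff increment.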
(e) Right 1 Bit Shift 15
In the case where the adder 57 performs roundoff, an overflow sometimes occurs. When the overflow is detected by the overflow detector 58, the shifter 59 performs a rightward 1 bit shift and the incrementer 60 corrects the exponent.
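A minimal Python sketch of this final correction is given below. The function name `fix_overflow` is hypothetical, and the mantissa is assumed to be an integer whose bit of weight 2^0 sits at position `frac_bits`.

```python
from typing import Tuple

def fix_overflow(mantissa: int, exponent: int, frac_bits: int = 23) -> Tuple[int, int]:
    """Model the overflow detector 58, the shifter 59, and the incrementer 60."""
    if mantissa >> (frac_bits + 1):          # the rounded mantissa reached 2.0
        return mantissa >> 1, exponent + 1   # shift right by 1 bit, add 1 to exponent
    return mantissa, exponent
```

For example, rounding up a mantissa whose bits are all "1" carries into the position of weight 2^1, and the sketch then returns a mantissa of 1.0 with the exponent increased by one.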
In accordance with the previously-mentioned processes 10-15, the exponent "e" and the mantissa "f" of the final result are calculated and derived.
The sign bit "s" of the final result is determined as shown in FIG. 9. Specifically, when the exponents e1 and e2 are different, the sign bit "s" is determined in accordance with the sign bits s1 and s2, and the arithmetic mode designation signal. When the exponents e1 and e2 are equal, the sign bit "s" is determined in accordance with the sign bit st of the output from the adder 50.
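FIG. 9 is not reproduced in this text, so the following Python sketch is only one plausible reading of the sign rule. It assumes that the second operand is negated when the arithmetic mode designates subtraction, and that with equal exponents the adder 50 computes f1 minus f2, so that st equal to "1" means the second operand has the larger magnitude. The function name `result_sign` is hypothetical, and the sign of an exactly zero result is not treated.

```python
def result_sign(s1: int, s2: int, subtract: bool, e1: int, e2: int, st: int) -> int:
    """Sign of the final result (0 = positive, 1 = negative)."""
    s2_eff = s2 ^ (1 if subtract else 0)   # sign of the second operand after the mode
    if e1 > e2:
        return s1                          # first operand dominates
    if e1 < e2:
        return s2_eff                      # second operand dominates
    if s1 == s2_eff:
        return s1                          # magnitudes are added
    return s1 ^ st                         # magnitudes are subtracted: st decides
```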
The rules and the relationships in FIGS. 7-9 are shown together in FIG. 10. The main control circuit 61 is designed so as to satisfy the relationship between the input and the output of FIG. 10.
As described previously, the absolute value calculation and the roundoff are performed exclusively. Accordingly, a single adder 57 is used in common for the absolute value calculation and the roundoff, allowing a simple structure of the arithmetic apparatus. In addition, the exclusive executions of the absolute value calculation and the roundoff enable high-speed arithmetic.
As understood from FIG. 4, a pair of an addition (subtraction) process and a shift process are repeated twice after a scale matching process. Accordingly, the adders 50 and 57 may be composed of a common adding element. In addition, the shifters 45 and 52 may be composed of a common shifting element.

Claims (6)

What is claimed is:
1. A floating-point arithmetic apparatus for first and second floating point data each having an exponent section, a mantissa section expressed in an absolute value, and a sign section representing a sign of the mantissa section, the apparatus comprising:
first means for calculating the difference between the exponent sections of the first and second floating point data, determining which of the exponent sections of the first and second floating point data is greater, and selecting the greater exponent section of the first and second floating point data;
second means responsive to an output from the first means for selecting the mantissa section of one of the first and second floating point data which has the smaller exponent section, and shifting rightward the selected mantissa section by a bit number corresponding to the difference between the exponent sections;
third means responsive to the output from the first means for selecting the mantissa section of one of the first and second floating point data which has the greater exponent section;
an arithmetic means for performing addition and subtraction between outputs from the second and third means;
fourth means for normalizing a mantissa section which is an output from said arithmetic means, and correcting the greater exponent section of the first and second floating point data in correspondence with the normalizing;
fifth means for rounding off a mantissa section of an output from the fourth means;
sixth means for calculating an absolute value of the mantissa section of the output from the fourth means;
seventh means for generating a signal representative of a final arithmetic mode on the basis of the sign sections of the first and second floating point data, a result of the determination by the first means about which of the exponent sections of the first and second floating point data is greater, and an arithmetic mode designation signal, for outputting the final arithmetic mode signal to said arithmetic means, and for, in cases where the final arithmetic mode corresponds to subtraction and the output from the first means indicates that there is a difference in exponent between the first and second floating point data, instructing said arithmetic means to subtract the output of the second means from the output of the third means;
eighth means for, in cases where the output from the first means represents that there is a difference in exponent between the first and second floating point data, enabling the fifth means to execute its function, for, in cases where the output from the first means represents that there is no difference in exponent between the first and second floating point data and the final arithmetic mode corresponds to addition, enabling the fifth means to execute its function, and for, in cases where the output from the first means represents that there is no difference in exponent between the first and second floating point data and the final arithmetic mode corresponds to subtraction, enabling the sixth means to execute its function; and
ninth means for determining a sign of a final arithmetic result on the basis of the sign sections of the first and second floating point data, the result of the determination by the first means about which of the exponent sections of the first and second floating point data is greater, the arithmetic mode designation signal, and a sign bit of the output from said arithmetic means.
2. The apparatus of claim 1 further comprising means for, when an output from the fifth means overflows, shifting rightward the output from the fifth means by one bit and correcting an exponent section of an output from the fourth means.
3. The apparatus of claim 1, wherein the said arithmetic means and the sixth means each include an adder.
4. A floating-point arithmetic apparatus for first and second floating point data each having an exponent section, a mantissa section expressed in an absolute value, and a sign section representing a sign of the mantissa section, the apparatus comprising:
first means for calculating the difference between the exponent sections of the first and second floating point data, determining which of the exponent sections of the first and second floating point data is greater, and selecting the greater exponent section of the first and second floating point data;
second means responsive to an output from the first means for selecting the mantissa section of one of the first and second floating point data which has the smaller exponent section, and shifting rightward the selected mantissa section by a bit number corresponding to the difference between the exponent sections;
third means responsive to the output from the first means for selecting the mantissa section of one of the first and second floating point data which has the greater exponent section;
an arithmetic means for performing addition and subtraction between outputs from the second and third means;
fourth means for normalizing a mantissa section which is an output from said arithmetic means, and correcting the greater exponent section of the first and second floating point data in correspondence with the normalizing;
fifth means for rounding off a mantissa section of an output from the fourth means;
sixth means for calculating an absolute value of the mantissa section of the output from the fourth means;
seventh means for generating a signal representative of a final arithmetic mode on the basis of the sign sections of the first and second floating point data, a result of the determination by the first means about which of the exponent sections of the first and second floating point data is greater, and an arithmetic mode designation signal, for outputting the final arithmetic mode signal to said arithmetic means, and for, in cases where the final arithmetic mode corresponds to subtraction and the output from the first means indicates that there is a difference in exponent between the first and second floating point data, instructing said arithmetic means to subtract the output of the second means from the output of the third means;
eighth means for, in cases where the output from the first means represents that there is a difference in exponent between the first and second floating point data, enabling the fifth means to execute its function, for, in cases where the output from the first means represents that there is no difference in exponent between the first and second floating point data and the final arithmetic mode corresponds to addition, enabling the fifth means to execute its function, and for, in cases where the output from the first means represents that there is no difference in exponent between the first and second floating point data and the final arithmetic mode corresponds to subtraction, enabling the sixth means to execute its function; and
ninth means for determining a sign of a final arithmetic result on the basis of the sign sections of the first and second floating point data, the result of the determination by the first means about which of the exponent sections of the first and second floating point data is greater, the arithmetic mode designation signal, and a sign bit of the output from said arithmetic means;
wherein the fifth means and the sixth means each utilize an adding means, the adding means being common to the fifth means and the sixth means.
5. The apparatus of claim 4, further comprising means for, when output from the fifth means overflows, shifting rightward the output from the fifth means by one bit and correcting an exponent section of an output from the fourth means.
6. The apparatus of claim 4, wherein the said arithmetic means and the sixth means each include an adder.
US07/137,924 1986-12-29 1987-12-28 Floating-point arithmetic apparatus Expired - Lifetime US4896286A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP61-311024 1986-12-29
JP61311024A JP2558669B2 (en) 1986-12-29 1986-12-29 Floating point arithmetic unit

Publications (1)

Publication Number Publication Date
US4896286A true US4896286A (en) 1990-01-23

Family

ID=18012196

Family Applications (1)

Application Number Title Priority Date Filing Date
US07/137,924 Expired - Lifetime US4896286A (en) 1986-12-29 1987-12-28 Floating-point arithmetic apparatus

Country Status (5)

Country Link
US (1) US4896286A (en)
EP (1) EP0273753B1 (en)
JP (1) JP2558669B2 (en)
KR (1) KR910006143B1 (en)
DE (1) DE3786072T2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4994996A (en) * 1989-02-03 1991-02-19 Digital Equipment Corporation Pipelined floating point adder for digital computer
US4926370A (en) * 1989-04-17 1990-05-15 International Business Machines Corporation Method and apparatus for processing postnormalization and rounding in parallel

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4488252A (en) * 1982-02-22 1984-12-11 Raytheon Company Floating point addition architecture
US4562553A (en) * 1984-03-19 1985-12-31 Analogic Corporation Floating point arithmetic system and method with rounding anticipation
US4644490A (en) * 1983-04-11 1987-02-17 Hitachi, Ltd. Floating point data adder
US4698771A (en) * 1984-12-31 1987-10-06 Gte Communication Systems Corporation Adder circuit for encoded PCM samples

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5960637A (en) * 1982-09-30 1984-04-06 Toshiba Corp Arithmetic device for floating decimal point

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"An American National Standard; IEEE Standard for Binary Floating-Point Arithmetic" sponsored by Standards Committee of the IEEE Computer Society; approved by IEEE Standards Board & American National Standards Institute; ANSI/IEEE Std. 754--1985.

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5063530A (en) * 1988-05-31 1991-11-05 Kabushiki Kaisha Toshiba Method of adding/subtracting floating-point representation data and apparatus for the same
US5111421A (en) * 1990-02-26 1992-05-05 General Electric Company System for performing addition and subtraction of signed magnitude floating point binary numbers
US5267186A (en) * 1990-04-02 1993-11-30 Advanced Micro Devices, Inc. Normalizing pipelined floating point processing unit
US5568412A (en) * 1994-04-29 1996-10-22 Goldstar Company, Limited Rounding-off method and apparatus of floating point arithmetic apparatus for addition/subtraction
US20120197954A1 (en) * 2010-09-28 2012-08-02 Texas Instruments Incorporated Floating point multiplier circuit with optimized rounding calculation
US8832166B2 (en) * 2010-09-28 2014-09-09 Texas Instruments Incorporated Floating point multiplier circuit with optimized rounding calculation

Also Published As

Publication number Publication date
EP0273753B1 (en) 1993-06-02
KR880008143A (en) 1988-08-30
KR910006143B1 (en) 1991-08-16
JPS63167930A (en) 1988-07-12
EP0273753A2 (en) 1988-07-06
EP0273753A3 (en) 1990-12-27
DE3786072T2 (en) 1993-10-28
DE3786072D1 (en) 1993-07-08
JP2558669B2 (en) 1996-11-27

Legal Events

Date Code Title Description
AS Assignment

Owner name: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD., 1006, OA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST.;ASSIGNOR:HAMMERL, GUNTER;REEL/FRAME:004809/0300

Effective date: 19871221

Owner name: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.,JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HAMMERL, GUNTER;REEL/FRAME:004809/0300

Effective date: 19871221

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12