US20040068531A1 - Faster shift value calculation using modified carry-lookahead adder - Google Patents
Faster shift value calculation using modified carry-lookahead adder
- Publication number
- US20040068531A1
- Authority
- US
- United States
- Prior art keywords
- logic
- carry
- result
- lookahead
- control signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
- G06F7/505—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
- G06F7/5055—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination in which one operand is a constant, i.e. incrementers or decrementers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
- G06F7/505—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
- G06F7/506—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination with simultaneous carry generation for, or propagation over, two or more stages
- G06F7/508—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination with simultaneous carry generation for, or propagation over, two or more stages using carry look-ahead circuits
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03K—PULSE TECHNIQUE
- H03K19/00—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits
- H03K19/02—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components
- H03K19/08—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using semiconductor devices
- H03K19/094—Logic circuits, i.e. having at least two inputs acting on one output; Inverting circuits using specified components using semiconductor devices using field-effect transistors
- H03K19/096—Synchronous circuits, i.e. using clock signals
- H03K19/0963—Synchronous circuits, i.e. using clock signals using transistors of complementary type
Definitions
- the present invention relates to an apparatus and method for use in implementing a floating point multiply-accumulate operation.
- FMAC floating point multiply-accumulate operation
- FIG. 1 is a diagram of a prior art circuit 10 for use in implementing an FMAC operation.
- circuit 10 three latches 12 , 14 , and 16 contain three 17-bit operands A, B, and C. The values of those operands are input to a first carry-save adder (CSA) 18 .
- CSA carry-save adder
- the result of the first CSA 18 is input to a second CSA 20 along with the value of a constant received on line 22 .
- the output of the second CSA adder 20 is input to a carry-lookahead adder (CLA) 24 , which performs an add operation and outputs a resulting shift value on line 26 for use in an FMAC operation.
- CLA carry-lookahead adder
- the shift value is used to line up the mantissas for the add portion of the FMAC operation.
- the floating point numbers used by the FMAC operation are each expressed as a mantissa and an exponent.
- the result of the multiply operation (A*B) produces a product that typically has a different exponent than the exponent of operand C.
- the FMAC operation uses the shift value to shift, and hence “line up,” the mantissa of operand C for adding it with the mantissa of the A*B product. Although the mantissa of operand C is shifted, the mantissa of the A*B product could alternatively be shifted to perform the add operation. Calculating the shift value and performing the shifting of the mantissa of operand C occur during the multiply operation.
- the format of floating point numbers, the addition of floating point numbers and the multiplication of floating point numbers are known in the art.
- An embodiment consistent with the present invention reduces propagation delays within a circuit for performing an FMAC operation.
- An apparatus consistent with the present invention includes a plurality of latches for containing a plurality of operands.
- a CSA circuit performs a CSA operation on the operands to produce a first result, and a logic block performs a CLA operation on the first result to produce a second result.
- a logic circuit in the logic block performs a logic operation on the second result based upon a control signal to produce a shift value for use in the FMAC operation.
- a method consistent with the present invention includes receiving a plurality of operands.
- a CSA operation is performed on the operands to produce a first result, and a CLA operation is performed on the first result to produce a second result.
- a logic operation is performed on the second result, as part of the CLA operation, based upon a control signal to produce a shift value for use in the FMAC operation.
- FIG. 1 is a logic diagram of a prior art circuit for use in implementing an FMAC operation
- FIG. 2 is a logic diagram of a circuit for use in implementing an FMAC operation consistent with the present invention
- FIG. 3 is a transistor diagram of prior art circuitry for use in implementing an FMAC operation corresponding with the logic diagram in FIG. 1;
- FIG. 4 is a transistor diagram of circuitry for use in implementing an FMAC operation corresponding with the logic diagram in FIG. 2;
- FIG. 5 is a transistor diagram of a control circuit for generating control signals for use in implementing an FMAC operation using the circuitry shown in FIG. 4.
- Circuitry consistent with the present invention reduces propagation delays in performing an FMAC operation by eliminating one stage of logic used in generating a shift value for the operation. Another stage of logic is modified to perform a parallel logic operation and account for the reduced logic stage. This results in increased speed of execution in calculating the shift value for use in an FMAC operation.
- FIG. 2 is a logic diagram of a circuit 30 for use in performing an FMAC operation consistent with the present invention.
- Circuit 30 illustrates modification of prior art circuit 10 shown in FIG. 1.
- Circuit 30 includes three latches 32 , 34 , and 36 for containing three operands A, B and C for the FMAC operation, shown as 17-bit operands in this example.
- a CSA 38 receives the values of operands A, B, and C from latches 32 , 34 , and 36 .
- a second CSA corresponding with CSA 20 in circuit 10 is eliminated. Elimination of the second CSA increases speed of calculation of the resulting shift value for use in an FMAC operation by eliminating one stage of logic; it thus reduces the corresponding propagation delays.
- a logic block 40 receives the outputs from CSA 38 and provides a resulting shift value on line 48 .
- the shift value is used, as explained above, to line up mantissas for the add operation.
- logic block 40 is implemented using a CLA that is modified to logically perform an exclusive-OR (XOR) operation on the result of the CLA operation based upon a control signal 46 .
- XOR exclusive-OR
- the XOR function is performed on the most significant bit of the result.
- Control signal 46 is generated based upon whether the FMAC operation is of Single Instruction, Multiple Data (SIMD) type or non-SIMD type.
- SIMD operations are known in the art. For example, SIMD indicates packing two single precision (32 bit) floating point numbers in registers normally meant for a single double precision (64 bit) floating point number. SIMD calculations are, accordingly, used where full precision floating point calculations are not needed, thereby doubling the throughput of operations by accepting only single precision results. More detail regarding the usage of SIMD in computation is found throughout the literature, e.g., Abel et al., “Applications Tuning for Streaming SIMD Extensions,” Intel Technology Journal Q2, 1999.
- the XOR operation can be implemented within the existing circuitry of a CLA in logic block 40 and thus does not generate any additional propagation delay.
- the second CSA 20 can be eliminated based upon how the constant on line 22 operates.
- the second CSA 20 in circuit 10 uses only the lower eight bits of the constant on line 22 , and those lower eight bits only vary in the most significant bit position. This variance is known because the FMAC operation uses a standard for operating on floating point numbers, as specified in IEEE Standard for Binary Floating-Point Arithmetic, IEEE Std. 754-1985, which is incorporated herein by reference.
- CSAs and CLAs, along with the operations they implement, are known in the art.
- FIG. 3 is a transistor diagram of prior art circuitry for implementing a final stage in CLA 24 of prior art circuit 10 .
- FIG. 4 is a transistor diagram illustrating an example of how the prior art circuitry in FIG. 3 is modified to implement the XOR operation in circuit 30. Since CLAs are known in the art, only the final stage is shown for illustrative purposes. In addition, only the final stage is shown as modified in this example, although additional modifications may be made based on a particular use of the CLA. More particularly, the circuitry of FIG. 3 illustrates operations on only the most significant bit (MSB) of data, e.g., bit [7] of bits [7:0] within a byte of data.
- MSB most significant bit
- a final stage in CLA 24 includes two sets of circuits 50 and 60 corresponding with two bits for each input bit. Two bits exist because the implementation in this embodiment uses, for example, complementary logic referred to as dual rail Domino CMOS or mousetrap logic, which is known in the art.
- Circuit 50 includes a first stage 52 and second stage 54 producing a summation low (SUML) signal 58 and its complement, a signal sSUML 56 .
- Complementary circuitry 60 includes a first stage 62 and second stage 64 producing a summation high (SUMH) signal 68 and its complement, signal sSUMH 66 .
- the signals (CLK, DNG or GND, CARRY_INL, CARRY_INH, GROUP_PROPAGATE, GROUP_GENERATEH and GROUP_GENERATEL) shown in circuits 50 and 60 are known in the art with respect to FMAC operations.
- the signal pair CARRY_INH and CARRY_INL is the input carry signal from the least significant 4-bit nibble.
- the signal GROUP_PROPAGATE is true if and only if the propagate (P) signals for bits [ 6 : 4 ] are true, i.e., this is a group propagate signal (illustrated in the figures using the symbol GRP).
- the signal pair GROUP_GENERATEH and GROUP_GENERATEL is also a mutually exclusive signal pair (illustrated in the figures using the symbols GGH and GGL, respectively) based upon the equation:
- FIG. 4 illustrates circuitry 70 and 90 containing modifications, respectively, to the aforedescribed circuits 50 and 60 for implementing the XOR operation in the CLA of logic block 40 .
- circuits 70 and 90 illustrate processing on the most significant bit position in the final stage of the CLA in logic block 40 .
- logic block 40 also includes additional known circuitry for processing of the other bits received from CSA 38 for the CLA operation.
- Circuit 70 includes redundant logic for implementing the XOR operation, and it includes two stages 72 and 76 corresponding with the functions of stages 52 and 54 .
- Circuit 70 also includes a redundant stage 74 for stage 72 , and a redundant stage 78 for stage 76 .
- transistors 80 , 82 , 84 and 86 implement the XOR operation in, respectively, stages 72 , 74 , 76 and 78 . Therefore, the result of the stages, without use of a second CSA (such as CSA 20 ), produces a SUML signal 88 and its complement, a signal sSUML 87 .
- a second CSA such as CSA 20
- Circuit 90 corresponds with circuit 60 and likewise illustrates modification to implement the XOR operation for the output complementary to stage 70 .
- Circuit 90 includes stages 92 and 96 corresponding with, respectively, stages 62 and 64 .
- Circuit 90 also includes a redundant stage 94 for stage 92 , and a redundant stage 98 for stage 96 .
- Each of these stages also includes an additional transistor for implementing the XOR operation.
- transistors 100 , 102 , 104 and 106 implement the XOR operation in, respectively, stages 92 , 94 , 96 , and 98 . Therefore, operation of these stages, without use of a second CSA, produces a SUMH signal 108 and its complement, a signal sSUMH 107 .
- the signals 87 , 88 , 107 , and 108 produce the same resulting shift value on line 48 as the shift value produced on line 26 by signals 56 , 58 , 66 , and 68 . Since the XOR operation is performed through modification of a CLA to generate these signals, as shown in circuits 70 and 90 , it occurs in parallel with the CLA operation and does not add any significant propagation delay. As described in connection with FIG.
- FIG. 5 is a transistor diagram of a control circuit 110 for generating the XOR control signals, XOR high (XORH) and XOR low (XORL), used in circuits 70 and 90 . These control signals correspond with control signal 46 .
- the operation of control circuit 110 to generate the XORH and XORL signals occurs in parallel with the CLA operation in logic block 40 or other processing and thus does not affect the overall delay for the CLA operation in logic block 40 .
- control circuit 110 receives as inputs a SIMD low (SIMDL) signal 112, a SIMD high (SIMDH) signal 114, a propagate (P) signal 116, and a Generate_or Kill signal (GorK) 118.
- SIMDL SIMD low
- SIMDH SIMD high
- P propagate
- GorK Generate_or Kill signal
- Control circuit 110 logically processes these input signals to generate the XORL signal 120 and its complement, XORH signal 122 .
Description
- This application is a continuation-in-part of U.S. patent application Ser. No. 09/507,376, filed Feb. 18, 2000, and entitled “Faster Shift Value Calculation Using Modified Carry-Lookahead Adder.”
- The present invention relates to an apparatus and method for use in implementing a floating point multiply-accumulate operation.
- Logic circuitry has been developed to implement a floating point multiply-accumulate operation (FMAC). This operation performs on three operands (A, B, C) the operation A*B+C. The FMAC operation is useful in that it can be used to implement both addition and multiplication in logic circuitry. In particular, for an add operation, the operand A is set to a value one. For a multiply operation, the operand C is set to a value zero.
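As a minimal illustration of the paragraph above (a software model, not the patent's circuitry), the following Python sketch treats FMAC as A*B+C on ordinary floats and shows how fixing A to one or C to zero yields addition and multiplication. The names `fmac`, `add_via_fmac`, and `mul_via_fmac` are hypothetical and chosen only for this example.

```python
def fmac(a: float, b: float, c: float) -> float:
    """Software model of the multiply-accumulate operation A*B + C.

    Note: this rounds after the multiply and again after the add; it only
    illustrates the arithmetic identity, not any hardware rounding behavior.
    """
    return a * b + c


def add_via_fmac(b: float, c: float) -> float:
    # Addition: operand A is set to one, so 1*B + C == B + C.
    return fmac(1.0, b, c)


def mul_via_fmac(a: float, b: float) -> float:
    # Multiplication: operand C is set to zero, so A*B + 0 == A*B.
    return fmac(a, b, 0.0)
```

Both helpers reuse the same three-operand operation, which is the property the description relies on.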
- For example, FIG. 1 is a diagram of a prior art circuit 10 for use in implementing an FMAC operation. In circuit 10, three latches 12, 14, and 16 contain three 17-bit operands A, B, and C. The values of those operands are input to a first carry-save adder (CSA) 18. The result of the first CSA 18 is input to a second CSA 20 along with the value of a constant received on line 22. Finally, the output of the second CSA 20 is input to a carry-lookahead adder (CLA) 24, which performs an add operation and outputs a resulting shift value on line 26 for use in an FMAC operation.
- The shift value is used to line up the mantissas for the add portion of the FMAC operation. The floating point numbers used by the FMAC operation are each expressed as a mantissa and an exponent. The result of the multiply operation (A*B) produces a product that typically has a different exponent than the exponent of operand C. The FMAC operation uses the shift value to shift, and hence “line up,” the mantissa of operand C for adding it with the mantissa of the A*B product. Although the mantissa of operand C is shifted, the mantissa of the A*B product could alternatively be shifted to perform the add operation. Calculating the shift value and performing the shifting of the mantissa of operand C occur during the multiply operation. The format of floating point numbers, the addition of floating point numbers and the multiplication of floating point numbers are known in the art.
- Using these multiple stages within circuit 10 to produce the shift value can introduce a significant amount of delay in performing the FMAC operation. Accordingly, a need exists for a faster method of implementing an FMAC operation.
- An embodiment consistent with the present invention reduces propagation delays within a circuit for performing an FMAC operation. An apparatus consistent with the present invention includes a plurality of latches for containing a plurality of operands. A CSA circuit performs a CSA operation on the operands to produce a first result, and a logic block performs a CLA operation on the first result to produce a second result. A logic circuit in the logic block performs a logic operation on the second result based upon a control signal to produce a shift value for use in the FMAC operation.
- A method consistent with the present invention includes receiving a plurality of operands. A CSA operation is performed on the operands to produce a first result, and a CLA operation is performed on the first result to produce a second result. A logic operation is performed on the second result, as part of the CLA operation, based upon a control signal to produce a shift value for use in the FMAC operation.
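The three steps of the method above (carry-save operation, carry-lookahead add, control-dependent logic operation) can be sketched in software as follows. This is a behavioral model under stated assumptions, not the patented circuit: Python's `+` stands in for the CLA, the 17-bit operand width is taken from the example in the text, and the 8-bit result width, the `flip_msb` flag (a stand-in for the control signal), and the function names are assumptions made for illustration.

```python
WIDTH = 17                    # operand width used in the text's example
MASK = (1 << WIDTH) - 1


def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression: reduce three operands to a sum word and a carry word.

    Each bit position is handled independently, so no carry ripples here.
    """
    partial_sum = (a ^ b ^ c) & MASK
    carries = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return partial_sum, carries


def shift_value(a: int, b: int, c: int, flip_msb: bool, result_bits: int = 8) -> int:
    """CSA, then a final add, then an optional XOR on the result's MSB.

    result_bits=8 and flip_msb are illustrative assumptions.
    """
    partial_sum, carries = carry_save_add(a, b, c)
    total = (partial_sum + carries) & ((1 << result_bits) - 1)  # stands in for the CLA
    if flip_msb:
        total ^= 1 << (result_bits - 1)  # logic operation on the most significant bit
    return total
```

For example, `carry_save_add(5, 9, 3)` yields a sum word of 15 and a carry word of 2, which add to 17, the same as 5 + 9 + 3.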
- The accompanying drawings are incorporated in and constitute a part of this specification and, together with the description, explain the advantages and principles of the invention. In the drawings,
- FIG. 1 is a logic diagram of a prior art circuit for use in implementing an FMAC operation;
- FIG. 2 is a logic diagram of a circuit for use in implementing an FMAC operation consistent with the present invention;
- FIG. 3 is a transistor diagram of prior art circuitry for use in implementing an FMAC operation corresponding with the logic diagram in FIG. 1;
- FIG. 4 is a transistor diagram of circuitry for use in implementing an FMAC operation corresponding with the logic diagram in FIG. 2; and
- FIG. 5 is a transistor diagram of a control circuit for generating control signals for use in implementing an FMAC operation using the circuitry shown in FIG. 4.
- Circuitry consistent with the present invention reduces propagation delays in performing an FMAC operation by eliminating one stage of logic used in generating a shift value for the operation. Another stage of logic is modified to perform a parallel logic operation and account for the reduced logic stage. This results in increased speed of execution in calculating the shift value for use in an FMAC operation.
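One way to see why an adder stage can be traded for a single logic operation is sketched below, under assumptions. The description notes further on that the eliminated stage uses only the lower eight bits of its constant and that those bits vary only in the most significant position. For an 8-bit result, adding a constant whose only variable bit is bit 7 is the same as conditionally inverting bit 7 of the result, because any carry out of bit 7 is discarded. The value `BASE` is an arbitrary illustrative constant, not one taken from the patent, and how the fixed part of the constant is handled in the actual circuit is not modeled here.

```python
BITS = 8
MASK = (1 << BITS) - 1
MSB = 1 << (BITS - 1)
BASE = 0x35  # hypothetical fixed part of the constant (bit 7 clear)


def add_constant(x: int, msb_set: bool) -> int:
    """Add the full constant, whose bit 7 may or may not be set."""
    constant = BASE | (MSB if msb_set else 0)
    return (x + constant) & MASK


def add_then_xor(x: int, msb_set: bool) -> int:
    """Add only the fixed part, then conditionally flip bit 7 of the result."""
    result = (x + BASE) & MASK
    return result ^ MSB if msb_set else result


def equivalent_for_all_inputs() -> bool:
    """Exhaustively compare both formulations over every 8-bit input."""
    return all(
        add_constant(x, flag) == add_then_xor(x, flag)
        for x in range(1 << BITS)
        for flag in (False, True)
    )
```

The exhaustive check covers all 256 inputs for both settings of the variable bit, so the equivalence does not depend on the particular choice of `BASE` beyond bit 7 being clear.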
- FIG. 2 is a logic diagram of a circuit 30 for use in performing an FMAC operation consistent with the present invention. Circuit 30 illustrates modification of prior art circuit 10 shown in FIG. 1. Circuit 30 includes three latches 32, 34, and 36 for containing three operands A, B and C for the FMAC operation, shown as 17-bit operands in this example. A CSA 38 receives the values of operands A, B, and C from latches 32, 34, and 36. In circuit 30, however, a second CSA corresponding with CSA 20 in circuit 10 is eliminated. Elimination of the second CSA increases speed of calculation of the resulting shift value for use in an FMAC operation by eliminating one stage of logic; it thus reduces the corresponding propagation delays.
- A logic block 40 receives the outputs from CSA 38 and provides a resulting shift value on line 48. The shift value is used, as explained above, to line up mantissas for the add operation. In this example, logic block 40 is implemented using a CLA that is modified to logically perform an exclusive-OR (XOR) operation on the result of the CLA operation based upon a control signal 46. The XOR function is performed on the most significant bit of the result.
- As shown in FIGS. 1 and 2 of the Drawings, CLAs
- Control signal 46 is generated based upon whether the FMAC operation is of Single Instruction, Multiple Data (SIMD) type or non-SIMD type. SIMD operations are known in the art. For example, SIMD indicates packing two single precision (32 bit) floating point numbers in registers normally meant for a single double precision (64 bit) floating point number. SIMD calculations are, accordingly, used where full precision floating point calculations are not needed, thereby doubling the throughput of operations by accepting only single precision results. More detail regarding the usage of SIMD in computation is found throughout the literature, e.g., Abel et al., “Applications Tuning for Streaming SIMD Extensions,” Intel Technology Journal Q2, 1999.
- As explained below, the XOR operation can be implemented within the existing circuitry of a CLA in logic block 40 and thus does not generate any additional propagation delay. The second CSA 20 can be eliminated based upon how the constant on line 22 operates. In particular, the second CSA 20 in circuit 10 uses only the lower eight bits of the constant on line 22, and those lower eight bits only vary in the most significant bit position. This variance is known because the FMAC operation uses a standard for operating on floating point numbers, as specified in IEEE Standard for Binary Floating-Point Arithmetic, IEEE Std. 754-1985, which is incorporated herein by reference. In addition, CSAs and CLAs, along with the operations they implement, are known in the art. In particular, the structure and workings of carry-save and carry-lookahead adders are well known in the art, as are the equations for sum, carry, propagate (P), generate (G) and kill (K). The basic principles for the implementation of such adders are set forth in numerous texts, such as Weste and Eshraghian, hereinabove, which is also incorporated herein by reference. It should be understood that these equations are readily implemented in static or dynamic logic families, e.g., single-rail or dual-rail (mousetrap) logic.
- FIG. 3 is a transistor diagram of prior art circuitry for implementing a final stage in CLA 24 of prior art circuit 10. In comparison, FIG. 4 is a transistor diagram illustrating an example of how the prior art circuitry in FIG. 3 is modified to implement the XOR operation in circuit 30. Since CLAs are known in the art, only the final stage is shown for illustrative purposes. In addition, only the final stage is shown as modified in this example, although additional modifications may be made based on a particular use of the CLA. More particularly, the circuitry of FIG. 3 illustrates operations on only the most significant bit (MSB) of data, e.g., bit [7] of bits [7:0] within a byte of data.
- As shown in FIG. 3, a final stage in CLA 24 includes two sets of circuits 50 and 60 corresponding with two bits for each input bit. Two bits exist because the implementation in this embodiment uses, for example, complementary logic referred to as dual rail Domino CMOS or mousetrap logic, which is known in the art. Circuit 50 includes a first stage 52 and second stage 54 producing a summation low (SUML) signal 58 and its complement, a signal sSUML 56. Complementary circuitry 60 includes a first stage 62 and second stage 64 producing a summation high (SUMH) signal 68 and its complement, signal sSUMH 66. The signals (CLK, DNG or GND, CARRY_INL, CARRY_INH, GROUP_PROPAGATE, GROUP_GENERATEH and GROUP_GENERATEL) shown in circuits 50 and 60 are known in the art with respect to FMAC operations.
- In particular, the signal pair CARRY_INH and CARRY_INL is the input carry signal from the least significant 4-bit nibble. These two signals (illustrated in the figures using the symbols CIH and CIL, respectively) are mutually exclusive. In other words, if there is a carry from the least-significant nibble into the next nibble, CARRY_INH=1 and CARRY_INL=0; if no carry, then the values are reversed. Again, only operations for the MSB, bit [7], are shown in the figures. The signal GROUP_PROPAGATE is true if and only if the propagate (P) signals for bits [6:4] are true, i.e., this is a group propagate signal (illustrated in the figures using the symbol GRP). The signal pair GROUP_GENERATEH and GROUP_GENERATEL is also a mutually exclusive signal pair (illustrated in the figures using the symbols GGH and GGL, respectively) based upon the equation:
- K[2]+P[2]*K[1]+P[2]*(P[1]*K[0])
- Thus, if the equation is true, then GROUP_GENERATEH=1 and GROUP_GENERATEL=0; if not true, then the values are reversed.
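The dual-rail signal pairs and the group equation above can be modeled with a short Python sketch. It evaluates the expression exactly as written in the text and encodes each logical value as a mutually exclusive (high, low) pair. The function names and the convention that list index i corresponds to the bracketed index [i] in the equation are assumptions for illustration; the mapping of those indices onto bits [6:4] is not specified here.

```python
def group_equation(p: list[int], k: list[int]) -> bool:
    """Evaluate K[2] + P[2]*K[1] + P[2]*(P[1]*K[0]) as written in the text.

    '+' is logical OR and '*' is logical AND.
    """
    return bool(k[2] or (p[2] and k[1]) or (p[2] and (p[1] and k[0])))


def group_propagate(p: list[int]) -> bool:
    """True if and only if every propagate signal in the group is true."""
    return all(p)


def dual_rail(value: bool) -> tuple[int, int]:
    """Encode a logical value as a mutually exclusive (high, low) pair."""
    return (1, 0) if value else (0, 1)
```

With this encoding, a true equation gives the pair (1, 0), matching GROUP_GENERATEH=1 and GROUP_GENERATEL=0, and a false equation gives (0, 1). The same helper models the CARRY_INH and CARRY_INL pair.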
- FIG. 4 illustrates circuitry 70 and 90 containing modifications, respectively, to the aforedescribed circuits 50 and 60 for implementing the XOR operation in the CLA of logic block 40. As previously noted, circuits 70 and 90 illustrate processing on the most significant bit position in the final stage of the CLA in logic block 40. Accordingly, logic block 40 also includes additional known circuitry for processing of the other bits received from CSA 38 for the CLA operation. Circuit 70, as shown, includes redundant logic for implementing the XOR operation, and it includes two stages 72 and 76 corresponding with the functions of stages 52 and 54. Circuit 70 also includes a redundant stage 74 for stage 72, and a redundant stage 78 for stage 76. Within each of these stages an additional transistor implements the XOR operation. In particular, transistors 80, 82, 84 and 86 implement the XOR operation in, respectively, stages 72, 74, 76 and 78. Therefore, the result of the stages, without use of a second CSA (such as CSA 20), produces a SUML signal 88 and its complement, a signal sSUML 87.
- Circuit 90 corresponds with circuit 60 and likewise illustrates modification to implement the XOR operation for the output complementary to stage 70. Circuit 90 includes stages 92 and 96 corresponding with, respectively, stages 62 and 64. Circuit 90 also includes a redundant stage 94 for stage 92, and a redundant stage 98 for stage 96. Each of these stages also includes an additional transistor for implementing the XOR operation. In particular, transistors 100, 102, 104 and 106 implement the XOR operation in, respectively, stages 92, 94, 96, and 98. Therefore, operation of these stages, without use of a second CSA, produces a SUMH signal 108 and its complement, a signal sSUMH 107.
- Accordingly, the signals 87, 88, 107, and 108 produce the same resulting shift value on line 48 as the shift value produced on line 26 by signals 56, 58, 66, and 68. Since the XOR operation is performed through modification of a CLA to generate these signals, as shown in circuits 70 and 90, it occurs in parallel with the CLA operation and does not add any significant propagation delay.
- FIG. 5 is a transistor diagram of a control circuit 110 for generating the XOR control signals, XOR high (XORH) and XOR low (XORL), used in circuits 70 and 90. These control signals correspond with control signal 46. The operation of control circuit 110 to generate the XORH and XORL signals occurs in parallel with the CLA operation in logic block 40 or other processing and thus does not affect the overall delay for the CLA operation in logic block 40. In operation, control circuit 110 receives as inputs a SIMD low (SIMDL) signal 112, a SIMD high (SIMDH) signal 114, a propagate (P) signal 116, and a Generate_or Kill signal (GorK) 118. These input signals are known in the art with respect to FMAC operations. Control circuit 110 logically processes these input signals to generate the XORL signal 120 and its complement, XORH signal 122. In particular, control circuit 110 implements the following logic functions to generate those signals: XORL=(SIMDL)(P)+(SIMDH)(GorK); XORH=(SIMDH)(P)+(SIMDL)(GorK).
- Accordingly, with the use of these control signals an entire CSA has been eliminated within the exemplary implementation for use in implementing an FMAC operation. The resulting propagation delay has likewise been eliminated. This modification thus results in increased speed of calculation for the FMAC operation and corresponding improvement in performance for other circuitry that uses this implementation for the FMAC operation. Although dual rail Domino CMOS has been shown to implement the modified CLA operation, any type of suitable logic may be used. In addition, if a particular application does not require or use complementary outputs, then only one modified final stage in the CLA can be used.
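The logic functions given for control circuit 110, XORL=(SIMDL)(P)+(SIMDH)(GorK) and XORH=(SIMDH)(P)+(SIMDL)(GorK), can be checked with a small truth-table sketch. The code assumes that SIMDL and SIMDH form a complementary pair and that GorK is the complement of P (a bit position either propagates or else generates or kills); both are modeling assumptions made here, and the function names are hypothetical.

```python
from itertools import product


def xor_controls(simdl: int, simdh: int, p: int, gork: int) -> tuple[int, int]:
    """Return (XORL, XORH) per the logic functions quoted in the description."""
    xorl = (simdl and p) or (simdh and gork)
    xorh = (simdh and p) or (simdl and gork)
    return int(bool(xorl)), int(bool(xorh))


def truth_table() -> list[tuple[int, int, int, int, int, int]]:
    """Rows of (SIMDL, SIMDH, P, GorK, XORL, XORH) for complementary inputs."""
    rows = []
    for simdh, p in product((0, 1), repeat=2):
        simdl = 1 - simdh   # assumed complement of SIMDH
        gork = 1 - p        # assumed complement of P
        rows.append((simdl, simdh, p, gork) + xor_controls(simdl, simdh, p, gork))
    return rows
```

Under those assumptions every row has XORL and XORH taking opposite values, which is consistent with the description's statement that XORH is the complement of XORL.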
- While the present invention has been described in connection with an exemplary embodiment, it will be understood that many modifications will be readily apparent to those skilled in the art, and this application is intended to cover any adaptations or variations thereof. For example, different types of CSAs and CLAs, different types of transistors to implement the XOR and other logic functions, different size operands, and various types of logic for generating the control signals may be used without departing from the scope of the invention. This invention should be limited only by the claims and equivalents thereof.
Claims (16)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/613,095 US7240085B2 (en) | 2000-02-18 | 2003-07-07 | Faster shift value calculation using modified carry-lookahead adder |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US50737600A | 2000-02-18 | 2000-02-18 | |
US10/613,095 US7240085B2 (en) | 2000-02-18 | 2003-07-07 | Faster shift value calculation using modified carry-lookahead adder |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US50737600A | Continuation-In-Part | 2000-02-18 | 2000-02-18 |
Publications (2)
Publication Number | Publication Date |
---|---|
US20040068531A1 (en) | 2004-04-08 |
US7240085B2 (en) | 2007-07-03 |
Family
ID=24018399
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/613,095 Expired - Fee Related US7240085B2 (en) | 2000-02-18 | 2003-07-07 | Faster shift value calculation using modified carry-lookahead adder |
US10/853,518 Expired - Fee Related US7444366B2 (en) | 2000-02-18 | 2004-05-26 | Faster shift value calculation using modified carry-lookahead adder |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/853,518 Expired - Fee Related US7444366B2 (en) | 2000-02-18 | 2004-05-26 | Faster shift value calculation using modified carry-lookahead adder |
Country Status (2)
Country | Link |
---|---|
US (2) | US7240085B2 (en) |
DE (1) | DE10050589B4 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111106825A (en) * | 2018-10-25 | 2020-05-05 | Arm Limited | Data compressor logic circuit |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8402075B2 (en) * | 2009-03-16 | 2013-03-19 | Advanced Micro Devices, Inc. | Mechanism for fast detection of overshift in a floating point unit of a processing device |
US9213523B2 (en) * | 2012-06-29 | 2015-12-15 | Intel Corporation | Double rounded combined floating-point multiply and add |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5636157A (en) * | 1994-10-03 | 1997-06-03 | International Business Machines Corporation | Modular 64-bit integer adder |
US5719803A (en) * | 1996-05-31 | 1998-02-17 | Hewlett-Packard Company | High speed addition using Ling's equations and dynamic CMOS logic |
US5790444A (en) * | 1996-10-08 | 1998-08-04 | International Business Machines Corporation | Fast alignment unit for multiply-add floating point unit |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4228520A (en) * | 1979-05-04 | 1980-10-14 | International Business Machines Corporation | High speed multiplier using carry-save/propagate pipeline with sparse carries |
US4425623A (en) | 1981-07-14 | 1984-01-10 | Rockwell International Corporation | Lookahead carry circuit apparatus |
JPS5892036A (en) | 1981-11-27 | 1983-06-01 | Toshiba Corp | Addition circuit |
EP0152939B1 (en) | 1984-02-20 | 1993-07-28 | Hitachi, Ltd. | Arithmetic operation unit and arithmetic operation circuit |
JPS61213927A (en) | 1985-03-18 | 1986-09-22 | Hitachi Ltd | Processor for floating point arithmetic |
US4737926A (en) | 1986-01-21 | 1988-04-12 | Intel Corporation | Optimally partitioned regenerative carry lookahead adder |
US4811272A (en) | 1987-05-15 | 1989-03-07 | Digital Equipment Corporation | Apparatus and method for an extended arithmetic logic unit for expediting selected floating point operations |
US4841467A (en) | 1987-10-05 | 1989-06-20 | General Electric Company | Architecture to implement floating point multiply/accumulate operations |
US5043934A (en) | 1990-02-13 | 1991-08-27 | Hewlett-Packard Company | Lookahead adder with universal logic gates |
US5166899A (en) | 1990-07-18 | 1992-11-24 | Hewlett-Packard Company | Lookahead adder |
US5479356A (en) | 1990-10-18 | 1995-12-26 | Hewlett-Packard Company | Computer-aided method of designing a carry-lookahead adder |
US5253195A (en) * | 1991-09-26 | 1993-10-12 | International Business Machines Corporation | High speed multiplier |
US5195051A (en) | 1992-03-31 | 1993-03-16 | Intel Corporation | Computation of sign bit and sign extension in the partial products in a floating point multiplier unit |
US5351207A (en) | 1992-08-31 | 1994-09-27 | Intel Corporation | Methods and apparatus for subtraction with 3:2 carry-save adders |
US5508952A (en) | 1993-10-19 | 1996-04-16 | Kantabutra; Vitit | Carry-lookahead/carry-select binary adder |
US5757686A (en) | 1995-11-30 | 1998-05-26 | Hewlett-Packard Company | Method of decoupling the high order portion of the addend from the multiply result in an FMAC |
JPH09231055A (en) | 1996-02-27 | 1997-09-05 | Denso Corp | Logical operation circuit and carry look-ahead adder |
US5892698A (en) | 1996-04-04 | 1999-04-06 | Hewlett-Packard Company | 2's complement floating-point multiply accumulate unit |
US5860017A (en) | 1996-06-28 | 1999-01-12 | Intel Corporation | Processor and method for speculatively executing instructions from multiple instruction streams indicated by a branch instruction |
US5859999A (en) | 1996-10-03 | 1999-01-12 | Idea Corporation | System for restoring predicate registers via a mask having at least a single bit corresponding to a plurality of registers |
US5944777A (en) | 1997-05-05 | 1999-08-31 | Intel Corporation | Method and apparatus for generating carries in an adder circuit |
- 2000-10-12: DE application DE10050589A, patent DE10050589B4 (not active, Expired - Fee Related)
- 2003-07-07: US application US10/613,095, patent US7240085B2 (not active, Expired - Fee Related)
- 2004-05-26: US application US10/853,518, patent US7444366B2 (not active, Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
DE10050589A1 (en) | 2001-08-30 |
US7444366B2 (en) | 2008-10-28 |
DE10050589B4 (en) | 2006-04-06 |
US7240085B2 (en) | 2007-07-03 |
US20040220991A1 (en) | 2004-11-04 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US7395304B2 (en) | Method and apparatus for performing single-cycle addition or subtraction and comparison in redundant form arithmetic | |
EP0328063B1 (en) | Absolute value calculating circuit having a single adder | |
US6411980B2 (en) | Data split parallel shifter and parallel adder/subtractor | |
GB2172129A (en) | Adder/subtractor | |
KR100294015B1 (en) | Adder for handling multiple data with different data type | |
JPH0785221B2 (en) | Complementizer | |
US4878192A (en) | Arithmetic processor and divider using redundant signed digit arithmetic | |
WO1999040508A1 (en) | Fast adder/subtractor for signed floating point numbers | |
US8429213B2 (en) | Method of forcing 1's and inverting sum in an adder without incurring timing delay | |
WO2005010746A1 (en) | Arithmetic unit for addition or subtraction with preliminary saturation detection | |
Burgess | The flagged prefix adder and its applications in integer arithmetic | |
US7337202B2 (en) | Shift-and-negate unit within a fused multiply-adder circuit | |
KR20060128007A (en) | Arithmetic circuit with balanced logic levels for low-power operation | |
US7240085B2 (en) | Faster shift value calculation using modified carry-lookahead adder | |
US6813628B2 (en) | Method and apparatus for performing equality comparison in redundant form arithmetic | |
US6546411B1 (en) | High-speed radix 100 parallel adder | |
US4890127A (en) | Signed digit adder circuit | |
US6826588B2 (en) | Method and apparatus for a fast comparison in redundant form arithmetic | |
US5333120A (en) | Binary two's complement arithmetic circuit | |
KR20010014902A (en) | Three input split-adder | |
US6484193B1 (en) | Fully pipelined parallel multiplier with a fast clock cycle | |
US20030084084A1 (en) | Method and apparatus for a multi-purpose domino adder | |
US5978826A (en) | Adder with even/odd 1-bit adder cells | |
US7051062B2 (en) | Apparatus and method for adding multiple-bit binary-strings | |
US4979140A (en) | Signed digit adder circuit | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:THAYER, PAUL R.;KUMAR, SANJAY;REEL/FRAME:015368/0404;SIGNING DATES FROM 20040319 TO 20040516 Owner name: INTEL CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:THAYER, PAUL R.;KUMAR, SANJAY;REEL/FRAME:015368/0404;SIGNING DATES FROM 20040319 TO 20040516 |
|
CC | Certificate of correction | ||
FPAY | Fee payment |
Year of fee payment: 4 |
|
REMI | Maintenance fee reminder mailed | ||
LAPS | Lapse for failure to pay maintenance fees | ||
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20150703 |