US6539413B1 - Prefix tree adder with efficient sum generation - Google Patents

Prefix tree adder with efficient sum generation

Info

Publication number
US6539413B1
US6539413B1 US09/525,644 US52564400A US6539413B1 US 6539413 B1 US6539413 B1 US 6539413B1 US 52564400 A US52564400 A US 52564400A US 6539413 B1 US6539413 B1 US 6539413B1
Authority
US
United States
Prior art keywords
adder
group
sum
signal
generate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US09/525,644
Inventor
Alexander Goldovsky
Hosahalli R. Srinivas
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Avago Technologies International Sales Pte Ltd
Nokia of America Corp
Original Assignee
Agere Systems LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agere Systems LLC
Priority to US09/525,644
Assigned to LUCENT TECHNOLOGIES INC. reassignment LUCENT TECHNOLOGIES INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SRINIVAS, HOSAHALLI R., GOLDOVSKY, ALEXANDER
Application granted
Publication of US6539413B1
Assigned to DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT reassignment DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT PATENT SECURITY AGREEMENT Assignors: AGERE SYSTEMS LLC, LSI CORPORATION
Assigned to AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD. reassignment AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AGERE SYSTEMS LLC
Assigned to AGERE SYSTEMS LLC, LSI CORPORATION reassignment AGERE SYSTEMS LLC TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS (RELEASES RF 032856-0031) Assignors: DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT
Anticipated expiration
Expired - Fee Related

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50: Adding; Subtracting
    • G06F7/505: Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
    • G06F7/506: Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination with simultaneous carry generation for, or propagation over, two or more stages
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00: Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/506: Indexing scheme relating to groups G06F7/506 - G06F7/508
    • G06F2207/5063: 2-input gates, i.e. only using 2-input logical gates, e.g. binary carry look-ahead, e.g. Kogge-Stone or Ladner-Fischer adder

Definitions

  • the present invention relates generally to electronic circuits and more particularly to adder circuits for use in semiconductor integrated circuits and other electronic devices.
  • the recursive carry computation can also be reduced to a prefix computation, as described in, e.g., P. M. Kogge and H. S. Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Trans. on Computers, Vol. C-22, No. 8, pp. 786-793, August 1973.
  • a prefix tree can be used to compute the carry at the most-significant bit position, and an additional tree superimposed on the prefix tree can be used to compute the intermediate carries.
  • a problem associated with the above-noted full prefix tree adders, which are also known as Kogge-Stone adders, is the additional delay introduced as a result of exponentially growing interconnection complexity.
  • Existing architecture tradeoffs have emphasized reduction of interconnection complexity at the expense of higher gate fanouts.
  • Interconnection complexity can also be reduced by using hybrid carry lookahead/carry select architectures which eliminate the need to implement a full prefix tree for each bit position.
  • the use of low resistance and low capacitance materials can reduce the negative effects of architectures that depend on large amounts of interconnect, as described in J. Silberman et al., “A 1.0 GHz Single-Issue 64b PowerPC Integer Processor,” IEEE Intl. Solid-State Circuits Conf., pp. 230-231, February 1998.
  • the area overhead required to implement such adders is alleviated through the use of extensive “over-the-cell” routing, which removes the routing channels and further minimizes the interconnect capacitance.
  • $g_j = a_j b_j$
  • $p_j = a_j \oplus b_j$
  • $c_j = g_j + p_j c_{j-1}$
  • $s_j = p_j \oplus c_{j-1}$, $\forall j$, $0 \le j < n$
  • $c_{-1}$ is the primary carry-input.
  • the signals designated $g_j$, $p_j$ and $c_j$ are referred to herein as generate, propagate and carry signals, respectively.
  • the fundamental carry operator o is both associative and idempotent. At each bit position, the carry is given by
  • $c_{-1}$ is the primary carry input. If there is no primary carry input, then $c_j$ is simply $G_0^j$.
  • An additional speedup in the above-described conventional prefix tree adder can be achieved by using transmit signals $t_j$ instead of propagate signals $p_j$ to compute the carries for each bit position.
  • the final sum computation still requires the propagate signals $p_j$ to be generated from the primary inputs.
  • $c_{-1}$ is the primary carry input. If there is no primary carry input, then $c_j$ is simply $G_0^j$.
  • the $t_j$ signals can be computed faster than the $p_j$ signals since an OR gate is typically faster than an XOR gate. Hence, the carry computation through the prefix trees can start slightly earlier if the transmit signals are used. Since the sum generation step still uses the propagate signals, the load on the transmit signals in this architecture is smaller than the load on the propagate signals in the architecture which uses the $p_j$ signals to compute the carries. However, the load on the input signals is now higher since both transmit and propagate signals need to be generated.
  • the invention provides an improved prefix tree adder in which a significant delay reduction is achieved by implementing sum computation logic circuitry in a final stage of the adder so as to exploit the differing delays with which group-generate (G), group-transmit (T) and intermediate carries (c) are generated. Previous adder designs have not exploited these final-stage delay differences to reduce the overall computation delay of the adder.
  • an n-bit prefix tree adder includes n prefix trees, each associated with a bit position of the adder and including a number of computation stages.
  • the computation stages for each of the bit positions include a sum computation stage implemented in logic circuitry.
  • the corresponding sum computation logic circuitry computes a sum based at least in part on group-generate, group-transmit and intermediate carry signals.
  • the sum computation logic circuitry is configured to exploit differences in delay associated with generation of the group-generate, group-transmit and intermediate carry signals, so as to reduce the total computational delay of the adder.
  • additional delay reduction may be achieved by configuring the sum computation stages of the adder in accordance with a left-to-right routing of most-significant group-generate and group-transmit signals, such that the most-significant half of the sum bits are generated in the same prefix trees in which the least-significant half of the sum bits are generated.
  • the adder architecture of the present invention provides a reduced computational delay relative to conventional architectures.
  • the techniques of the invention are applicable to a wide variety of prefix tree adders, including both radix-2 adders and non-radix-2 adders.
  • FIG. 1 shows a set of prefix trees for an n-bit prefix tree adder with carry incorporated into the tree as described in the above-cited U.S. patent application Ser. No. 09/291,677.
  • FIG. 2 is a diagram illustrating maximum accumulated stage delays for a 32-bit prefix tree adder of the type illustrated in FIG. 1.
  • FIG. 3(a) shows logic circuitry used in the last stage of a prefix tree adder of the type illustrated in FIG. 1 for calculating a final sum result.
  • FIGS. 3(b) and 3(c) show logic circuitry used in the last stage of a prefix tree adder for calculating a final sum result, in accordance with an illustrative embodiment of the present invention.
  • FIG. 4 shows a set of prefix trees for an improved n-bit prefix tree adder in accordance with an illustrative embodiment of the present invention.
  • FIG. 5 is a diagram illustrating maximum accumulated stage delays for a 32-bit prefix tree adder of the type illustrated in FIG. 4.
  • FIG. 1 shows a set of superimposed prefix trees 10 for an n-bit prefix tree adder of the type described in the above-cited U.S. patent application Ser. No. 09/291,677.
  • the general algorithm for an n-bit radix-2 prefix tree adder of this type is described below.
  • Step 2 ($\lceil \log_2 n \rceil$ stages):
  • $c_j = G_{j-2^{k-1}+1}^{j} + T_{j-2^{k-1}+1}^{j}\, c_{j-2^{k-1}}$, $\forall j$, $2^{k-1}-1 \le j < 2^{k}-1$,
  • $c_{n-1} = G_0^{n-1} + T_0^{n-1} c_{-1}$.
  • the squares at the top of the figure compute $g_j$, $t_j$ and $p_j$ for each bit position in accordance with Step 1.
  • the empty circles apply the fundamental carry operator in accordance with Step 2.
  • the filled circles represent buffers.
  • the crossed circles compute carries in accordance with Step 2 and Step 3 above.
  • the diamonds at the bottom of the figure generate the sum at each bit position from the $p_j$ signal in accordance with the equation of Step 3. It should be noted that the sum computation in Step 3 occurs in parallel with the computation of the final carry output $c_{n-1}$ in Step 3.
  • the logic depth of an n-bit prefix tree adder configured as shown in FIG. 1 is $2 + \lceil \log_2 n \rceil$, and the fanout of the carry input $c_{-1}$ is $1 + \lceil \log_2 n \rceil$.
  • the above-described algorithm can also be extended in a straightforward manner to higher radix prefix trees.
  • This gate level model specifies that a 2-input NAND or NOR gate has a delay of $\delta$, while XOR/XNOR, AOI (and-or-invert), OAI (or-and-invert) and 2-to-1 multiplexer gates each have a delay of $1.5\delta$.
  • the interconnect delay is modeled as $\delta_v$ for a minimum width routing along the vertical pitch of the corresponding circuit design, and as $\delta_h$ for a minimum width routing along the horizontal pitch of the design.
  • the critical path delay for an n-bit adder design (with a total of $\lceil \log_2 n \rceil + 2$ logic stages) of the type illustrated in FIG. 1 is as follows:
  • $\Delta s_j = 1.5\delta + \Delta c_{j-1} + \delta_h$, $\forall j$, $0 \le j < n$,
  • $\Delta c_{n-1} = 1.5\delta + \Delta G_0^{n-1} + \delta_h$.
  • $\Delta t_j$ is selected to be the worst delay from stage 1 since an OR/NOR gate is typically slower than an AND/NAND gate.
  • FIG. 2 shows a graph of the maximum accumulated stage delays for group-generate (G), group-transmit (T), and intermediate carries (c) for a 32-bit prefix tree adder design of the type shown in FIG. 1, i.e., an adder design with c routing over n/2 bits. It can be seen from the graph that the group-generate, the group-transmit, and the intermediate carries are all generated with differing delays, and that this difference is maximum at the final stage of the parallel prefix tree of the adder.
  • the present invention provides an improved prefix tree adder design which significantly reduces delay relative to the FIG. 1 adder design. More particularly, the invention in an illustrative embodiment exploits the above-described difference in the delays for computing the group-generate, group-transmit, and the intermediate carries in the final stage of the prefix tree, by combining the last two stages of the most-significant half of the adder into a single stage. As will be described in greater detail below, this may be done by altering the carry and sum generation equations in the adder algorithm so as to take advantage of the latency of the signals.
  • FIG. 3(a) shows the Boolean logic used in the last stage, i.e., the sum generation stage, of the prefix tree adder of FIG. 1.
  • FIGS. 3(b) and 3(c) show the Boolean logic of two cells which may be used in the most-significant half of the final stage of the adder in order to decrease the adder delay in accordance with the present invention. More specifically, FIG. 3(b) shows the Boolean logic used in the last stage of an improved prefix tree adder in accordance with the invention for all values of j such that $n > j \ge 3n/4$, while FIG. 3(c) shows the Boolean logic used in the last stage of the prefix tree adder for all values of j such that $3n/4 > j \ge n/2$.
  • Step 2 ( ⁇ log 2 n ⁇ 1 stages)
  • Step 3 (1 stage) for the final stage, calculate
  • $c_{n-1} = G_0^{n-1} + T_0^{n-1} c_{-1}$.
  • a further improvement in computation speed is possible in accordance with the invention by rearranging the physical layout of the last stage of the adder so that the upper or most-significant half of the sum bits are generated in the same column as the lower or least-significant half of the sum bits. This reduces the routing delay on the intermediate carry signals that are on the critical path and therefore speeds up the sum computation.
  • Such an arrangement may be implemented as a left-to-right routing of the most-significant group-generate and group-transmit signals, and may be referred to as a “folded” arrangement.
  • This further improvement is particularly useful for adders having a large word length, i.e., a word length greater than or equal to 32, and for adder applications in which a regular layout is not required.
  • FIG. 4 shows a set of superimposed prefix trees 40 for an n-bit prefix tree adder incorporating the above-described improvements.
  • the empty circles apply the fundamental carry operator in accordance with Step 2.
  • the filled circles represent buffers.
  • the crossed circles compute carries in accordance with Step 2 and Step 3 of the general algorithm.
  • the empty diamonds represent the Boolean logic of FIG. 3(b).
  • the filled diamonds represent the Boolean logic of FIG. 3(c).
  • the crossed rectangles represent logic that implements the sum computation equation in Step 3 for values of j such that $n/2 > j \ge 0$.
  • FIGS. 3(a) and 3(b) are shown by way of example only. Those skilled in the art will recognize that numerous alternative arrangements of logic circuitry may be used to exploit the differences in delay in the group-generate, group-transmit and intermediate carry signals in accordance with the techniques of the present invention.
  • the improved prefix tree adder of FIG. 4 has a logic depth of $2 + \lceil \log_2 n \rceil$, and the fanout of the carry input $c_{-1}$ is $1 + \lceil \log_2 n \rceil$.
  • the above-described general algorithm for the improved prefix tree adder can be extended in a straightforward manner to higher radix prefix trees.
  • FIG. 5 shows a graph of the maximum accumulated stage delays for group-generate (G), group-transmit (T), and intermediate carries (c) for a 32-bit prefix tree adder design of the type shown in FIG. 4, i.e., an adder design with G and T routing over n/2 bits. It is apparent from the graph that the delay of the improved prefix tree adder is smaller than that of the adder of FIG. 1.
  • the adder architecture of the present invention thus reduces the gate delay of an n-bit prefix tree adder, as compared to existing architectures such as that illustrated in FIG. 1, while providing the same logic depth, fanout and wiring complexity.
  • a fully-static 32-bit radix-2 prefix tree adder configured in accordance with the invention has a delay on the order of 0.7 nsec in a 0.16 μm static CMOS implementation.
  • the wiring complexity is manageable in 0.16 μm technology using five layers of interconnect.
  • although static circuits were used in the above-described illustrative 32-bit implementations, it should be noted that the invention may be implemented using either static circuits, dynamic circuits or combinations of both static and dynamic circuits. Static circuits are often preferred to dynamic circuits because of their ease of design.
  • Adders in accordance with the invention may be used as elements of many different types of circuits, such as, e.g., arithmetic logic units (ALUs), multiply-add units, and comparators.
  • the invention can be incorporated in a wide variety of integrated circuits or other processing devices, including, e.g., microprocessors, digital signal processors (DSPs), microcontrollers, application-specific integrated circuits (ASICs), memory circuits, telecommunications hardware and other types of processing devices.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Logic Circuits (AREA)

Abstract

An n-bit prefix tree adder includes n prefix trees, each associated with a bit position of the adder and including a number of computation stages. The computation stages for each of the bit positions include a sum computation stage implemented in logic circuitry. For a subset of the bit positions, the corresponding sum computation logic circuitry computes a sum based at least in part on group-generate, group-transmit and intermediate carry signals. Advantageously, the sum computation logic circuitry is configured to exploit differences in delay associated with generation of the group-generate, group-transmit and intermediate carry signals, so as to reduce the total computational delay of the adder. Additional delay reduction may be achieved by configuring the sum computation stages of the adder in accordance with a left-to-right routing of the group-generate and group-transmit signals, such that a most-significant half of a given set of sum bits are generated in the same prefix trees as a least-significant half of the sum bits.

Description

RELATED APPLICATION
The present application is related to U.S. patent application Ser. No. 09/291,677 filed Apr. 14, 1999 in the name of inventors M. Besz et al. and entitled “Prefix Tree Adder with Efficient Carry Generation,” which is incorporated by reference herein.
FIELD OF THE INVENTION
The present invention relates generally to electronic circuits and more particularly to adder circuits for use in semiconductor integrated circuits and other electronic devices.
BACKGROUND OF THE INVENTION
As a result of ever-shrinking very large scale integration (VLSI) process geometries, it has become necessary to reexamine the tradeoffs that have been made in the existing design and implementation of computer arithmetic algorithms. Algorithms utilizing the so-called carry lookahead technique, as described in A. Weinberger and J. L. Smith, “A One-Microsecond Adder Using One-Megacycle Circuitry,” IRE Trans. on Electronic Computers, pp. 65-73, June 1956, speed up the addition process by unrolling a recursive carry equation. Both transistor count and interconnection complexity have typically limited the maximum unrolling to 4 bits. Larger adders have been built as block carry-lookahead adders, where the lookahead operation occurs within small blocks, as described in T.-F. Ngai et al., “Regular, Area-Time Efficient Carry-Lookahead Adders,” Journal of Parallel and Distributed Computing, Vol. 3, pp. 92-105, 1986.
The recursive carry computation can also be reduced to a prefix computation, as described in, e.g., P. M. Kogge and H. S. Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Trans. on Computers, Vol. C-22, No. 8, pp. 786-793, August 1973. As described in R. P. Brent and H. T. Kung, “A Regular Layout for Parallel Adders,” IEEE Trans. on Computers, Vol. C-31, No. 3, pp. 260-264, March 1982, a prefix tree can be used to compute the carry at the most-significant bit position, and an additional tree superimposed on the prefix tree can be used to compute the intermediate carries. Faster computation of all the carries can be achieved by using a separate prefix tree for each bit position, as described in D. Dozza et al., “A 3.5 NS, 64 Bit, Carry-Lookahead Adder,” in Proc. Intl. Symp. Circuits and Systems, pp. 297-300, 1996.
A problem associated with the above-noted full prefix tree adders, which are also known as Kogge-Stone adders, is the additional delay introduced as a result of exponentially growing interconnection complexity. Existing architecture tradeoffs have emphasized reduction of interconnection complexity at the expense of higher gate fanouts. Interconnection complexity can also be reduced by using hybrid carry lookahead/carry select architectures which eliminate the need to implement a full prefix tree for each bit position. The use of low resistance and low capacitance materials can reduce the negative effects of architectures that depend on large amounts of interconnect, as described in J. Silberman et al., “A 1.0 GHz Single-Issue 64b PowerPC Integer Processor,” IEEE Intl. Solid-State Circuits Conf., pp. 230-231, February 1998. Furthermore, with additional levels of interconnect, the area overhead required to implement such adders is alleviated through the use of extensive “over-the-cell” routing, which removes the routing channels and further minimizes the interconnect capacitance.
The operation of a conventional prefix tree adder will now be described in greater detail. In a general n-bit prefix tree adder, the addition of two numbers A and B,
$$A = -a_{n-1}2^{n-1} + \sum_{j=0}^{n-2} a_j 2^j, \qquad B = -b_{n-1}2^{n-1} + \sum_{j=0}^{n-2} b_j 2^j,$$
represented in two's complement binary form, can be accomplished by computing:
$$\left.\begin{aligned} g_j &= a_j b_j \\ p_j &= a_j \oplus b_j \\ c_j &= g_j + p_j c_{j-1} \\ s_j &= p_j \oplus c_{j-1} \end{aligned}\right\} \quad \forall j,\ 0 \le j < n,$$
where $c_{-1}$ is the primary carry-input. The signals designated $g_j$, $p_j$ and $c_j$ are referred to herein as generate, propagate and carry signals, respectively. The resulting sum of A and B is
$$S = -s_{n-1}2^{n-1} + \sum_{j=0}^{n-2} s_j 2^j.$$
An overflow occurs, and the resulting sum is invalid, if
$$c_{n-1} \oplus c_{n-2} = 1.$$
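By way of illustration only, the bit-level recurrences and the overflow condition above can be sketched in Python. The helper names `to_bits`, `from_bits` and `ripple_add` are hypothetical and do not appear in the original description, and the sketch evaluates the carries serially rather than with a prefix tree.

```python
def to_bits(x, n):
    """Two's complement bits of x, least-significant bit first."""
    return [(x >> j) & 1 for j in range(n)]


def from_bits(bits):
    """Value of a two's complement bit list (sign bit weighted negatively)."""
    n = len(bits)
    return -bits[n - 1] * 2 ** (n - 1) + sum(bits[j] << j for j in range(n - 1))


def ripple_add(a_bits, b_bits, c_in=0):
    """Serial evaluation of g, p, c and s; returns (sum bits, overflow flag)."""
    n = len(a_bits)
    carries = {-1: c_in}  # carries[-1] is the primary carry input
    s_bits = []
    for j in range(n):
        g = a_bits[j] & b_bits[j]               # generate
        p = a_bits[j] ^ b_bits[j]               # propagate
        carries[j] = g | (p & carries[j - 1])   # c_j = g_j + p_j c_{j-1}
        s_bits.append(p ^ carries[j - 1])       # s_j = p_j xor c_{j-1}
    overflow = carries[n - 1] ^ carries[n - 2]  # c_{n-1} xor c_{n-2}
    return s_bits, overflow


s, ovf = ripple_add(to_bits(5, 4), to_bits(4, 4))
print(from_bits(s), ovf)  # prints "-7 1": 5 + 4 does not fit in 4 bits
```

The overflow flag is set exactly when the true sum falls outside the n-bit two's complement range, which is why the 4-bit example reports an invalid result.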
The above-cited Dozza et al. reference defines $(G_j^j, P_j^j) = (g_j, p_j)$, and
$$(G_i^j, P_i^j) = (g_j, p_j) \mathbin{o} (g_{j-1}, p_{j-1}) \mathbin{o} \cdots \mathbin{o} (g_i, p_i) \quad \text{if } j > i,$$
where o is the fundamental carry operator described in the above-cited Brent and Kung reference and defined as
$$(g_j, p_j) \mathbin{o} (g_i, p_i) = (g_j + p_j g_i,\ p_j p_i).$$
The fundamental carry operator o is both associative and idempotent. At each bit position, the carry is given by
$$c_j = G_0^j + P_0^j c_{-1}$$
where $c_{-1}$ is the primary carry input. If there is no primary carry input, then $c_j$ is simply $G_0^j$.
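The properties stated for the fundamental carry operator can be checked by exhaustion. The following Python sketch is illustrative only; the names `carry_op`, `group` and `carry` are hypothetical, and the group signals are formed by a simple serial fold rather than by any particular tree structure.

```python
from functools import reduce
from itertools import product


def carry_op(hi, lo):
    """(g_j, p_j) o (g_i, p_i) = (g_j + p_j g_i, p_j p_i)."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def group(pairs, i, j):
    """(G_i^j, P_i^j): combine bit j down to bit i, most-significant first."""
    return reduce(carry_op, (pairs[m] for m in range(j, i - 1, -1)))


def carry(pairs, j, c_in=0):
    """c_j = G_0^j + P_0^j c_{-1}."""
    G, P = group(pairs, 0, j)
    return G | (P & c_in)


PAIRS = list(product((0, 1), repeat=2))  # every possible (g, p) value

# Associativity and idempotence, checked over all single-bit pairs.
associative = all(
    carry_op(carry_op(x, y), z) == carry_op(x, carry_op(y, z))
    for x, y, z in product(PAIRS, repeat=3)
)
idempotent = all(carry_op(x, x) == x for x in PAIRS)
```

Associativity is what allows the operands to be regrouped into a tree of logarithmic depth; the serial fold used in `group` is only the simplest grouping.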
An additional speedup in the above-described conventional prefix tree adder can be achieved by using transmit signals $t_j$ instead of propagate signals $p_j$ to compute the carries for each bit position. The final sum computation still requires the propagate signals $p_j$ to be generated from the primary inputs. However, the propagate signal $p_j$ can be computed as $p_j = \overline{g}_j t_j$, in order to reduce the load on the primary inputs and to eliminate the need for an XOR gate for generating the propagate signal $p_j$.
The addition operation in this case is defined as
$$\left.\begin{aligned} g_j &= a_j b_j \\ t_j &= a_j + b_j \\ p_j &= a_j \oplus b_j = \overline{g}_j t_j \\ c_j &= g_j + t_j c_{j-1} \\ s_j &= p_j \oplus c_{j-1} \end{aligned}\right\} \quad \forall j,\ 0 \le j < n,$$
where $(G_j^j, T_j^j) = (g_j, t_j)$, and
$$(G_i^j, T_i^j) = (g_j, t_j) \mathbin{o} (g_{j-1}, t_{j-1}) \mathbin{o} \cdots \mathbin{o} (g_i, t_i) \quad \text{if } j > i,$$
where o is the fundamental carry operator. The computation of $(G_0^j, T_0^j)$ $\forall j$ follows the same methodology as above for $(G_0^j, P_0^j)$. The carry $c_j$ for each bit position is then given by
$$c_j = G_0^j + T_0^j c_{-1}$$
where $c_{-1}$ is the primary carry input. If there is no primary carry input, then $c_j$ is simply $G_0^j$.
The tj signals can be computed faster than the pj signals since an OR gate is typically faster than an XOR gate. Hence, the carry computation through the prefix trees can start slightly earlier if the transmit signals are used. Since the sum generation step still uses the propagate signals, the load on the transmit signals in this architecture is smaller than the load on the propagate signals in the architecture which uses the pj signals to compute the carries. However, the load on the input signals is now higher since both transmit and propagate signals need to be generated.
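That the transmit signal can stand in for the propagate signal in the carry recurrence, and that the propagate signal can be recovered from the generate and transmit signals, follows from a single-bit truth table. A minimal Python sketch, with hypothetical function names, is:

```python
from itertools import product


def bit_signals(a, b):
    """Generate, transmit and propagate signals for one bit position."""
    g = a & b   # generate:  AND
    t = a | b   # transmit:  OR
    p = a ^ b   # propagate: XOR
    return g, t, p


def carry_with_propagate(a, b, c_prev):
    g, _, p = bit_signals(a, b)
    return g | (p & c_prev)   # c_j = g_j + p_j c_{j-1}


def carry_with_transmit(a, b, c_prev):
    g, t, _ = bit_signals(a, b)
    return g | (t & c_prev)   # c_j = g_j + t_j c_{j-1}


def propagate_from_gt(g, t):
    return (1 - g) & t        # p_j = (not g_j) and t_j


# Both carry forms agree on every input combination.
equivalent = all(
    carry_with_propagate(a, b, c) == carry_with_transmit(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
```

The two forms differ only when both inputs are 1, and in that case the generate term already forces the carry to 1.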
Improved prefix tree adders which provide significant reductions in logic depth, delay and circuit area relative to the above-described conventional prefix tree adders are disclosed in the above-cited U.S. patent application Ser. No. 09/291,677. Although these improved prefix tree adders provide substantial advantages over conventional prefix tree adders, a need nonetheless remains for further improvements, particularly in terms of the computational delay parameter.
SUMMARY OF THE INVENTION
The invention provides an improved prefix tree adder in which a significant delay reduction is achieved by implementing sum computation logic circuitry in a final stage of the adder so as to exploit the differing delays with which group-generate (G), group-transmit (T) and intermediate carries (c) are generated. Previous adder designs have not exploited these final-stage delay differences to reduce the overall computation delay of the adder.
In accordance with one aspect of the present invention, an n-bit prefix tree adder includes n prefix trees, each associated with a bit position of the adder and including a number of computation stages. The computation stages for each of the bit positions include a sum computation stage implemented in logic circuitry. For at least a subset of the bit positions, the corresponding sum computation logic circuitry computes a sum based at least in part on group-generate, group-transmit and intermediate carry signals. Advantageously, the sum computation logic circuitry is configured to exploit differences in delay associated with generation of the group-generate, group-transmit and intermediate carry signals, so as to reduce the total computational delay of the adder.
In accordance with another aspect of the invention, additional delay reduction may be achieved by configuring the sum computation stages of the adder in accordance with a left-to-right routing of most-significant group-generate and group-transmit signals, such that the most-significant half of the sum bits are generated in the same prefix trees in which the least-significant half of the sum bits are generated.
The adder architecture of the present invention provides a reduced computational delay relative to conventional architectures. The techniques of the invention are applicable to a wide variety of prefix tree adders, including both radix-2 adders and non-radix-2 adders. These and other features and advantages of the present invention will become more apparent from the accompanying drawings and the following detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a set of prefix trees for an n-bit prefix tree adder with carry incorporated into the tree as described in the above-cited U.S. patent application Ser. No. 09/291,677.
FIG. 2 is a diagram illustrating maximum accumulated stage delays for a 32-bit prefix tree adder of the type illustrated in FIG. 1.
FIG. 3(a) shows logic circuitry used in the last stage of a prefix tree adder of the type illustrated in FIG. 1 for calculating a final sum result.
FIGS. 3(b) and 3(c) show logic circuitry used in the last stage of a prefix tree adder for calculating a final sum result, in accordance with an illustrative embodiment of the present invention.
FIG. 4 shows a set of prefix trees for an improved n-bit prefix tree adder in accordance with an illustrative embodiment of the present invention.
FIG. 5 is a diagram illustrating maximum accumulated stage delays for a 32-bit prefix tree adder of the type illustrated in FIG. 4.
DETAILED DESCRIPTION OF THE INVENTION
The present invention will be illustrated below in conjunction with exemplary prefix tree adders. It should be understood, however, that the invention is not limited to use with any particular type of adder, but is instead more generally applicable to any carry-lookahead adder in which it is desirable to provide a significant reduction in critical path delay without unduly increasing the cost or complexity of the adder circuit. For example, although illustrated using radix-2 carry-lookahead prefix tree adders, it will be apparent to those skilled in the art that the disclosed techniques are readily applicable to other types of adders, including non-radix-2 adders.
FIG. 1 shows a set of superimposed prefix trees 10 for an n-bit prefix tree adder of the type described in the above-cited U.S. patent application Ser. No. 09/291,677. The general algorithm for an n-bit radix-2 prefix tree adder of this type is described below.
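Step 1 (1 stage):
Calculate
$$\left.\begin{aligned} g_j &= a_j b_j \\ t_j &= a_j + b_j \\ p_j &= a_j \oplus b_j = \overline{g}_j t_j \end{aligned}\right\} \quad \forall j,\ 0 \le j < n.$$
Step 2 ($\lceil \log_2 n \rceil$ stages):
For $k = 1, \ldots, \lceil \log_2 n \rceil$ calculate
$$c_j = G_{j-2^{k-1}+1}^{j} + T_{j-2^{k-1}+1}^{j}\, c_{j-2^{k-1}}, \quad \forall j,\ 2^{k-1}-1 \le j < 2^{k}-1,$$
$$(G_{j-2^{k}+1}^{j}, T_{j-2^{k}+1}^{j}) = (G_{j-2^{k-1}+1}^{j}, T_{j-2^{k-1}+1}^{j}) \mathbin{o} (G_{j-2^{k}+1}^{j-2^{k-1}}, T_{j-2^{k}+1}^{j-2^{k-1}}), \quad \forall j,\ 2^{k}-1 \le j < n.$$
Step 3 (1 stage):
Calculate
$$s_j = p_j \oplus c_{j-1}, \quad \forall j,\ 0 \le j < n,$$
and
$$c_{n-1} = G_0^{n-1} + T_0^{n-1} c_{-1}.$$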
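Steps 1 to 3 can be followed in software with a bit-level model. The Python sketch below is illustrative only: the function names are hypothetical, n is assumed to be a power of two for simplicity, and the index ranges are read as $2^{k-1}-1 \le j < 2^k-1$ for the carries and $2^k-1 \le j < n$ for the group signals. It models logical behavior only, not gate delays or routing.

```python
def carry_op(hi, lo):
    """Fundamental carry operator on (G, T) pairs."""
    g_hi, t_hi = hi
    g_lo, t_lo = lo
    return (g_hi | (t_hi & g_lo), t_hi & t_lo)


def prefix_tree_add(a, b, n, c_in=0):
    """Add two n-bit values following Steps 1-3; returns (sum, carry_out)."""
    if n < 1 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    abit = [(a >> j) & 1 for j in range(n)]
    bbit = [(b >> j) & 1 for j in range(n)]

    # Step 1: per-bit generate, transmit and propagate signals.
    g = [abit[j] & bbit[j] for j in range(n)]
    t = [abit[j] | bbit[j] for j in range(n)]
    p = [(1 - g[j]) & t[j] for j in range(n)]

    # group[j] holds (G, T) for the span of bits ending at position j;
    # entering stage k, positions j >= 2**(k-1) - 1 span 2**(k-1) bits.
    group = list(zip(g, t))
    carry = {-1: c_in}

    # Step 2: log2(n) stages.
    stages = n.bit_length() - 1
    for k in range(1, stages + 1):
        half = 2 ** (k - 1)
        # Intermediate carries for 2**(k-1) - 1 <= j < 2**k - 1.
        for j in range(half - 1, 2 * half - 1):
            G, T = group[j]
            carry[j] = G | (T & carry[j - half])
        # Double the span of the group signals for 2**k - 1 <= j < n.
        new_group = list(group)
        for j in range(2 * half - 1, n):
            new_group[j] = carry_op(group[j], group[j - half])
        group = new_group

    # Step 3: final carry and sum bits.
    G, T = group[n - 1]
    carry[n - 1] = G | (T & c_in)
    s = 0
    for j in range(n):
        s |= (p[j] ^ carry[j - 1]) << j
    return s, carry[n - 1]


print(prefix_tree_add(5, 3, 4))  # prints "(8, 0)"
```

Each stage reads only values produced in earlier stages, which reflects the parallelism of the prefix trees: all positions within a stage could be evaluated simultaneously.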
In the set of prefix trees of FIG. 1, the squares at the top of the figure compute $g_j$, $t_j$ and $p_j$ for each bit position in accordance with Step 1. The empty circles apply the fundamental carry operator in accordance with Step 2. The filled circles represent buffers. The crossed circles compute carries in accordance with Step 2 and Step 3 above. The diamonds at the bottom of the figure generate the sum at each bit position from the $p_j$ signal in accordance with the equation of Step 3. It should be noted that the sum computation in Step 3 occurs in parallel with the computation of the final carry output $c_{n-1}$ in Step 3.
The logic depth of an n-bit prefix tree adder configured as shown in FIG. 1 is $2 + \lceil \log_2 n \rceil$, and the fanout of the carry input $c_{-1}$ is $1 + \lceil \log_2 n \rceil$. The above-described algorithm can also be extended in a straightforward manner to higher radix prefix trees.
In order to quantify the computational delay of the FIG. 1 adder for comparison purposes, an exemplary gate level model will be used. This gate level model specifies that a 2-input NAND or NOR gate has a delay of δ, while XOR/XNOR, AOI (and-or-invert), OAI (or-and-invert) and 2-to-1 multiplexer gates each have a delay of 1.5*δ. The interconnect delay is modeled as δv for a minimum width routing along the vertical pitch of the corresponding circuit design, and as δh for a minimum width routing along the horizontal pitch of the design. These delays are assumed for a fundamental carry operator fanout of two. Increased fanout of the input carry does not affect the total delay because the carry input is available at the very instant the inputs are applied to the adder, and is used only in the last stage. The contribution of the vertical routing will generally be common across the illustrative adders considered herein and hence may be ignored during comparison. The term Δr will be used in the following description to represent the delay of a given signal r.
The critical path delay for an n-bit adder design (with a total of (┌log2 n┐+2) logic stages) of the type illustrated in FIG. 1 is as follows:
Stage 1:
$$\Delta g_j = \delta,\quad \Delta t_j = \delta,\quad \text{and}\quad \Delta p_j = 2\delta, \quad \forall j,\ 0 \le j < n.$$
Stages 2, . . . , ┌log2 n┐:
for k=1, . . . , ┌log2 n┐−1, where k is the stage number of the fundamental carry operator in the prefix tree:
$$\Delta c_j = (1.5k)\,\delta + (2^{k}-1)\,\delta_h + \Delta t_j, \quad \forall j,\ 2^{k-1}-1 \le j < 2^{k}-1,$$
$$\Delta G_{j-2^{k}+1}^{j} = (1.5k)\,\delta + (2^{k}-1)\,\delta_h + \Delta t_j \quad\text{and}\quad \Delta T_{j-2^{k}+1}^{j} = (k)\,\delta + (2^{k}-1)\,\delta_h + \Delta t_j, \quad \forall j,\ 2^{k}-1 \le j < n.$$
Stage ┌log2 n┐+1: k=┌log2 n┐:
$$\Delta c_j = (1.5k)\,\delta + (2^{k}-1)\,\delta_h + \Delta t_j, \quad \forall j,\ 2^{k-1}-1 \le j < 2^{k}-1,$$
$$\Delta G_{j-2^{k}+1}^{j} = (1.5k)\,\delta + (2^{k-1}-1)\,\delta_h + \Delta t_j \quad\text{and}\quad \Delta T_{j-2^{k}+1}^{j} = (k)\,\delta + (2^{k-1}-1)\,\delta_h + \Delta t_j, \quad \forall j,\ 2^{k}-1 \le j < n.$$
Stage ┌log2 n┐+2:
$$\Delta s_j = 1.5\,\delta + \Delta c_{j-1} + \delta_h, \quad \forall j,\ 0 \le j < n,$$
$$\Delta c_{n-1} = 1.5\,\delta + \Delta G_0^{n-1} + \delta_h.$$
In the stages 2 to ┌log2 n┐, the term Δt j is selected to be the worst delay from stage 1 since an OR/NOR gate is typically slower than an AND/NAND gate.
In the (┌log2 n┐+2)-th stage of the adder, the carry signals incur the maximum delay due to the accumulation of the routing delays of the intermediate carry terms, the group-generate signals, and the group-transmit signals that generate the carry terms. This contributes to the term $(2^{k}-1)\delta_h$ in the $\Delta c_j$ expression. Since the sum bits depend on the carry signals in the final stage, the total delay in generating the sum bits in an n-bit adder is $\Delta s = ((1.5\lceil\log_2 n\rceil)+2.5)\,\delta + 2^{\lceil\log_2 n\rceil}\,\delta_h$. From simulations it has been determined that δh is approximately related to δ through δh=δ/16 for an exemplary 0.16 μm technology.
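As a check on the closed form, the final-stage expressions can be chained by hand. This is only a sketch: it assumes that the routing term accumulated by the last carry-operator stage is $(2^{k}-1)\delta_h$ with $k=\lceil\log_2 n\rceil$, that $\Delta t_j=\delta$ from stage 1, and that the sum gate adds $1.5\delta+\delta_h$.

$$
\begin{aligned}
\Delta s &= 1.5\,\delta + \Delta c_{j-1} + \delta_h \\
&= 1.5\,\delta + \left[(1.5k)\,\delta + (2^{k}-1)\,\delta_h + \delta\right] + \delta_h \\
&= (1.5k + 2.5)\,\delta + 2^{k}\,\delta_h, \qquad k=\lceil\log_2 n\rceil .
\end{aligned}
$$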
FIG. 2 shows a graph of the maximum accumulated stage delays for group-generate (G), group-transmit (T), and intermediate carries (c) for a 32-bit prefix tree adder design of the type shown in FIG. 1, i.e., an adder design with c routing over n/2 bits. It can be seen from the graph that the group-generate, the group-transmit, and the intermediate carries are all generated with differing delays, and that this difference is maximum at the final stage of the parallel prefix tree of the adder. The total delta delay in this 32-bit prefix tree adder example, in accordance with the expression given above, is Δs=(1.5*5+2.5)δ+32*δh=12.0δ, assuming as mentioned previously that δh=δ/16.
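The 32-bit figure can be reproduced numerically. The short Python sketch below evaluates the closed-form expression for a few word lengths. The function name `fig1_sum_delay` and its parameters are illustrative and do not come from the patent, and δh=δ/16 is the value assumed in the text.

```python
def fig1_sum_delay(n, delta=1.0, delta_h=1.0 / 16):
    """Sum delay of a FIG. 1 style adder, per the closed-form expression.

    The result is in the same units as `delta`; `delta_h` is the horizontal
    routing delay per pitch (assumed here to be delta / 16).
    """
    stages = (n - 1).bit_length()          # ceil(log2(n)) for integer n >= 1
    return (1.5 * stages + 2.5) * delta + (2 ** stages) * delta_h


if __name__ == "__main__":
    # n = 32 gives 12.0, matching the value quoted in the text.
    for width in (16, 32, 64):
        print(f"n={width}: {fig1_sum_delay(width):.2f} delta")
```

Because the routing term grows as 2^┌log2 n┐ while the gate term grows only with ┌log2 n┐, the routing share of the total increases with word length under this model.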
The present invention provides an improved prefix tree adder design which significantly reduces delay relative to the FIG. 1 adder design. More particularly, the invention in an illustrative embodiment exploits the above-described difference in the delays for computing the group-generate, group-transmit, and the intermediate carries in the final stage of the prefix tree, by combining the last two stages of the most-significant half of the adder into a single stage. As will be described in greater detail below, this may be done by altering the carry and sum generation equations in the adder algorithm so as to take advantage of the latency of the signals.
FIG. 3(a) shows the Boolean logic used in the last stage, i.e., the sum generation stage, of the prefix tree adder of FIG. 1. FIGS. 3(b) and 3(c) show the Boolean logic of two cells which may be used in the most-significant half of the final stage of the adder in order to decrease the adder delay in accordance with the present invention. More specifically, FIG. 3(b) shows the Boolean logic used in the last stage of an improved prefix tree adder in accordance with the invention for all values of j such that n>j≧¾n, while FIG. 3(c) shows the Boolean logic used in the last stage of the prefix tree adder for all values of j such that ¾n>j≧n/2.
A general algorithm describing the operation of the improved prefix tree adder in greater detail is as follows:
Step 1: (1 stage): calculate:
$$\left.\begin{aligned} g_j &= a_j b_j \\ t_j &= a_j + b_j \\ p_j &= a_j \oplus b_j = \bar{g}_j\, t_j \end{aligned}\right\} \quad \forall j,\ 0 \le j < n.$$
Step 2: (┌log2 n┐−1 stages)
for k=1, . . . , ┌log2 n┐−1, calculate
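$$c_j = G_{j-2^{k-1}+1}^{j} + T_{j-2^{k-1}+1}^{j}\,c_{j-2^{k-1}} \quad \forall j,\ 2^{k}-1 > j \ge 2^{k-1}-1,$$
$$\left(G_{j-2^{k}+1}^{j},\,T_{j-2^{k}+1}^{j}\right) = \left(G_{j-2^{k-1}+1}^{j},\,T_{j-2^{k-1}+1}^{j}\right)\; o \;\left(G_{j-2^{k}+1}^{j-2^{k-1}},\,T_{j-2^{k}+1}^{j-2^{k-1}}\right) \quad \forall j,\ n > j \ge 2^{k-1}-1.$$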
Step 3: (1 stage) for the final stage, calculate
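$$s_j = p_j \oplus c_{j-1} \quad \forall j,\ n/2 > j \ge 0,$$
$$s_j = G_{j-n/2}^{j-1}\,\bar{p}_j + \overline{G_{j-n/2}^{j-1}}\left(T_{j-n/2}^{j-1}\,(p_j \oplus c_{j-n/2-1}) + \overline{T_{j-n/2}^{j-1}}\;p_j\right) \quad \forall j,\ \tfrac{3}{4}n > j \ge n/2,$$
$$s_j = G_{j-n/2}^{j-1}\,\bar{p}_j + \overline{G_{j-n/2}^{j-1}}\left(c_{j-n/2-1}\,(p_j \oplus T_{j-n/2}^{j-1}) + \bar{c}_{j-n/2-1}\;p_j\right) \quad \forall j,\ n > j \ge \tfrac{3}{4}n,$$
and
$$c_{n-1} = G_0^{n-1} + T_0^{n-1}\,c_{-1}.$$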
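To make the three steps concrete, here is a minimal bit-level Python sketch of the algorithm. It is an illustration rather than the patented circuit, and it rests on several assumptions: the names `carry_op` and `prefix_add` are hypothetical; n is a power of two no smaller than 4; the group update in Step 2 is applied only for j ≥ 2^k − 1, where both operand groups exist; the two upper-half sum forms are taken in the reading that makes them equal to p_j ⊕ (G + T·c), i.e. with complemented p_j and G terms; and G_0^{n−1}, T_0^{n−1} are formed with one extra carry-operator application in the final stage.

```python
def carry_op(hi, lo):
    """Fundamental carry operator o: (G, T) o (G', T') = (G + T G', T T')."""
    g_hi, t_hi = hi
    g_lo, t_lo = lo
    return (g_hi | (t_hi & g_lo), t_hi & t_lo)


def prefix_add(a, b, c_in=0, n=32):
    """Return (sum, carry_out) for two n-bit operands; n is a power of two >= 4."""
    m = n.bit_length() - 1                 # log2(n)
    half, three_q = n // 2, 3 * n // 4

    # Step 1: per-bit generate, transmit and propagate signals.
    bits_a = [(a >> j) & 1 for j in range(n)]
    bits_b = [(b >> j) & 1 for j in range(n)]
    g = [x & y for x, y in zip(bits_a, bits_b)]
    t = [x | y for x, y in zip(bits_a, bits_b)]
    p = [x ^ y for x, y in zip(bits_a, bits_b)]

    # Step 2: log2(n) - 1 carry-operator stages.
    grp = list(zip(g, t))                  # grp[j] = (G, T) of the group ending at bit j
    carry = {-1: c_in}                     # carry[j] = c_j; c_{-1} is the carry input
    for k in range(1, m):
        span = 1 << (k - 1)
        for j in range(span - 1, 2 * span - 1):
            G, T = grp[j]
            carry[j] = G | (T & carry[j - span])
        nxt = list(grp)
        for j in range(2 * span - 1, n):   # assumed bound: j >= 2^k - 1
            nxt[j] = carry_op(grp[j], grp[j - span])
        grp = nxt

    # Step 3: sum generation; the upper half never waits for c_{j-1} itself.
    s = [0] * n
    for j in range(n):
        if j < half:
            s[j] = p[j] ^ carry[j - 1]
            continue
        G, T = grp[j - 1]                  # G and T over bits j-n/2 .. j-1
        c = carry[j - half - 1]            # c_{j-n/2-1}
        if j < three_q:                    # transmit-selected form
            inner = (T & (p[j] ^ c)) | ((T ^ 1) & p[j])
        else:                              # carry-selected form
            inner = (c & (p[j] ^ T)) | ((c ^ 1) & p[j])
        s[j] = (G & (p[j] ^ 1)) | ((G ^ 1) & inner)

    g_all, t_all = carry_op(grp[n - 1], grp[half - 1])
    c_out = g_all | (t_all & c_in)
    return sum(bit << j for j, bit in enumerate(s)), c_out
```

For example, `prefix_add(0xFFFFFFFF, 1)` returns `(0, 1)`: the 32-bit sum wraps to zero and the carry output is set.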
A further improvement in computation speed is possible in accordance with the invention by rearranging the physical layout of the last stage of the adder so that the upper or most-significant half of the sum bits are generated in the same column as the lower or least-significant half of the sum bits. This reduces the routing delay on the intermediate carry signals that are on the critical path and therefore speeds up the sum computation. Such an arrangement may be implemented as a left-to-right routing of the most-significant group-generate and group-transmit signals, and may be referred to as a “folded” arrangement. This further improvement is particularly useful for adders having a large word length, i.e., a word length greater than or equal to 32, and for adder applications in which a regular layout is not required.
FIG. 4 shows a set of superimposed prefix trees 40 for an n-bit prefix tree adder incorporating the above-described improvements. The empty circles apply the fundamental carry operator in accordance with Step 2. The filled circles represent buffers. The crossed circles compute carries in accordance with Step 2 and Step 3 of the general algorithm. The empty diamonds represent the Boolean logic of FIG. 3(b), the filled diamonds represent the Boolean logic of FIG. 3(c), and the crossed rectangles represent logic that implements the sum computation equation in Step 3 for values of j such that n/2>j≧0.
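The two diamond cell types can be checked exhaustively against the plain sum equation. The sketch below is hedged accordingly: the function names are illustrative, the cells are written in the reading that makes them equal to p_j ⊕ (G + T·c), and the mapping of the carry-selected form to FIG. 3(b) and the transmit-selected form to FIG. 3(c) is inferred from the bit ranges given for each in Step 3.

```python
from itertools import product


def sum_reference(G, T, c, p):
    """s_j = p_j XOR c_{j-1}, with c_{j-1} = G + T*c."""
    return p ^ (G | (T & c))


def cell_fig_3b(G, T, c, p):
    """Carry-selected form, assumed to be the cell used for n > j >= 3n/4."""
    return (G & (p ^ 1)) | ((G ^ 1) & ((c & (p ^ T)) | ((c ^ 1) & p)))


def cell_fig_3c(G, T, c, p):
    """Transmit-selected form, assumed to be the cell used for 3n/4 > j >= n/2."""
    return (G & (p ^ 1)) | ((G ^ 1) & ((T & (p ^ c)) | ((T ^ 1) & p)))


def cells_match_reference():
    """Return True if both cells equal the reference for all 16 input combinations."""
    return all(
        cell_fig_3b(*v) == sum_reference(*v) == cell_fig_3c(*v)
        for v in product((0, 1), repeat=4)
    )
```

Both forms compute the same function; they differ only in which signal acts as the late-arriving select, which is what allows each quarter of the adder to place its slowest input closest to the output.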
It should be emphasized that the logic circuitry in FIGS. 3(a) and 3(b) is shown by way of example only. Those skilled in the art will recognize that numerous alternative arrangements of logic circuitry may be used to exploit the differences in delay in the group-generate, group-transmit and intermediate carry signals in accordance with the techniques of the present invention.
Like the FIG. 1 adder, the improved prefix tree adder of FIG. 4 has a logic depth of 2+┌log2 n┐, and the fanout of the carry input c−1 is 1+┌log2 n┐. In addition, the above-described general algorithm for the improved prefix tree adder can be extended in a straightforward manner to higher radix prefix trees.
FIG. 5 shows a graph of the maximum accumulated stage delays for group-generate (G), group-transmit (T), and intermediate carries (c) for a 32-bit prefix tree adder design of the type shown in FIG. 4, i.e., an adder design with G and T routing over n/2 bits. It is apparent from the graph that the delay of the improved prefix tree adder is smaller than that of the adder of FIG. 1. More specifically, the total adder delay to produce the sum output is given by:
$$\Delta s_j = \max\left(\Delta_{G_{j-n/2}^{j-1}\text{-to-}s_j},\ \Delta_{c_{j-n/2-1}\text{-to-}s_j},\ \Delta_{T_{j-n/2}^{j-1}\text{-to-}s_j}\right) \quad \forall j,\ n > j \ge \tfrac{n}{2},$$
$$\Delta s_j = \max\left((1.5k)\,\delta + 2.5\,\delta + 2^{k+1}\,\delta_h,\ \ (1.5k)\,\delta + 4.0\,\delta + (2^{k})\,\delta_h,\ \ (k)\,\delta + 5.5\,\delta + (2^{k+1})\,\delta_h\right),$$
where k=┌log2 n┐−1. For the present example, in which n=32, the above expression yields a total adder delay of 11.5δ.
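The 11.5δ figure can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function name is illustrative, δh=δ/16 as in the text, and the routing exponents are read as 2^(k+1), 2^k and 2^(k+1) for the G, c and T paths respectively, which is the reading that yields the stated total.

```python
def improved_sum_delay(n, delta=1.0, delta_h=1.0 / 16):
    """Worst-case sum delay of the improved adder for n > j >= n/2."""
    k = (n - 1).bit_length() - 1           # ceil(log2(n)) - 1
    via_g = 1.5 * k * delta + 2.5 * delta + 2 ** (k + 1) * delta_h
    via_c = 1.5 * k * delta + 4.0 * delta + 2 ** k * delta_h
    via_t = k * delta + 5.5 * delta + 2 ** (k + 1) * delta_h
    return max(via_g, via_c, via_t)


if __name__ == "__main__":
    # n = 32 gives 11.5, matching the value quoted in the text.
    print(improved_sum_delay(32))
```

For n=32 the three paths evaluate to 10.5δ, 11.0δ and 11.5δ, so under this reading the group-transmit path sets the total.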
The adder architecture of the present invention thus reduces the gate delay of an n-bit prefix tree adder, as compared to existing architectures such as that illustrated in FIG. 1, while providing the same logic depth, fanout and wiring complexity. For example, a fully-static 32-bit radix-2 prefix tree adder configured in accordance with the invention has a delay on the order of 0.7 nsec in a 0.16 μm static CMOS implementation. The wiring complexity is manageable in 0.16 μm technology using five layers of interconnect.
Although static circuits were used in the above-described illustrative 32-bit implementations, it should be noted that the invention may be implemented using either static circuits, dynamic circuits or combinations of both static and dynamic circuits. Static circuits are often preferred to dynamic circuits because of their ease of design.
The above-described illustrative embodiments of the invention may be configured to meet the requirements of a variety of different circuit applications, using any desired value of n. Adders in accordance with the invention may be used as elements of many different types of circuits, such as, e.g., arithmetic logic units (ALUs), multiply-add units, and comparators. The invention can be incorporated in a wide variety of integrated circuits or other processing devices, including, e.g., microprocessors, digital signal processors (DSPs), microcontrollers, application-specific integrated circuits (ASICs), memory circuits, telecommunications hardware and other types of processing devices. Moreover, as previously noted, a variety of other types of adders, including non-radix-2 adders, may also be implemented using the techniques of the present invention. These and numerous other alternative embodiments may be devised by those skilled in the art without departing from the scope of the following claims.

Claims (21)

What is claimed is:
1. An adder comprising:
a plurality of prefix trees, each associated with a bit position of the adder and including one or more computation stages, the computation stages for each of the bit positions including a sum computation stage implemented in logic circuitry, and wherein the logic circuitry of the sum computation stage for at least a subset of the bit positions of the adder computes a sum based at least in part on group-generate, group-transmit, propagate and intermediate carry signals, and is configured to exploit differences in delay associated with generation of the group-generate, group-transmit and intermediate carry signals so as to reduce the total computational delay of the adder.
2. The adder of claim 1 wherein the adder comprises a radix-2 adder.
3. The adder of claim 1 further including a separate prefix tree for each bit position.
4. The adder of claim 1 wherein a carry computed for a lower bit position is used to compute a carry for at least one higher bit position in parallel within the corresponding prefix trees.
5. The adder of claim 1 wherein a generate signal and at least one of a propagate signal and a transmit signal are generated in an initial stage of each of the prefix trees without utilizing a primary carry input signal.
6. The adder of claim 1 wherein an initial stage of each of the prefix trees calculates
$$\left.\begin{aligned} g_j &= a_j b_j \\ t_j &= a_j + b_j \\ p_j &= a_j \oplus b_j = \bar{g}_j\, t_j \end{aligned}\right\} \quad \forall j,\ 0 \le j < n,$$
where gj is a generate signal, tj is a transmit signal, pj is a propagate signal, and n is the number of bit positions of the adder.
7. The adder of claim 6 wherein a plurality of subsequent stages of each of the prefix trees each calculate
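$$c_j = G_{j-2^{k-1}+1}^{j} + T_{j-2^{k-1}+1}^{j}\,c_{j-2^{k-1}} \quad \forall j,\ 2^{k}-1 > j \ge 2^{k-1}-1,$$
$$\left(G_{j-2^{k}+1}^{j},\,T_{j-2^{k}+1}^{j}\right) = \left(G_{j-2^{k-1}+1}^{j},\,T_{j-2^{k-1}+1}^{j}\right)\; o \;\left(G_{j-2^{k}+1}^{j-2^{k-1}},\,T_{j-2^{k}+1}^{j-2^{k-1}}\right) \quad \forall j,\ n > j \ge 2^{k-1}-1,$$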
where G denotes a group-generate signal, T denotes a group-transmit signal, c denotes an intermediate carry signal, and o is a carry operator.
8. The adder of claim 6 wherein the sum computation stage of each of the prefix trees calculates a sum sj as follows:
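$$s_j = p_j \oplus c_{j-1} \quad \forall j,\ n/2 > j \ge 0,$$
$$s_j = G_{j-n/2}^{j-1}\,\bar{p}_j + \overline{G_{j-n/2}^{j-1}}\left(T_{j-n/2}^{j-1}\,(p_j \oplus c_{j-n/2-1}) + \overline{T_{j-n/2}^{j-1}}\;p_j\right) \quad \forall j,\ \tfrac{3}{4}n > j \ge n/2,$$
$$s_j = G_{j-n/2}^{j-1}\,\bar{p}_j + \overline{G_{j-n/2}^{j-1}}\left(c_{j-n/2-1}\,(p_j \oplus T_{j-n/2}^{j-1}) + \bar{c}_{j-n/2-1}\;p_j\right) \quad \forall j,\ n > j \ge \tfrac{3}{4}n,$$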
where G denotes a group-generate signal, T denotes a group-transmit signal, and c denotes an intermediate carry signal.
9. The adder of claim 8 wherein an output carry signal cn−1 for the prefix tree adder is calculated as:
$$c_{n-1} = G_0^{n-1} + T_0^{n-1}\,c_{-1}.$$
10. The adder of claim 1 wherein the sum computation stages of the adder are configured in accordance with a left-to-right routing of the group-generate and group-transmit signals such that a most-significant half of a given set of sum bits are generated in the same prefix trees as a least-significant half of the sum bits.
11. An integrated circuit comprising:
at least one adder, the adder comprising a plurality of prefix trees, each of the prefix trees being associated with a bit position of the adder and including one or more computation stages, the computation stages for each of the bit positions including a sum computation stage implemented in logic circuitry, and wherein the logic circuitry of the sum computation stage for at least a subset of the bit positions of the adder computes a sum based at least in part on group-generate, group-transmit, propagate and intermediate carry signals, and is configured to exploit differences in delay associated with generation of the group-generate, group-transmit and intermediate carry signals so as to reduce the total computational delay of the adder.
12. A method for performing an addition operation, the method comprising the steps of:
providing a plurality of prefix trees in an adder, each prefix tree associated with a bit position of the adder and including one or more computation stages, the computation stages for each of the bit positions including a sum computation stage; and
computing a sum in the sum computation stage for at least a subset of the bit positions of the adder, based at least in part on group-generate, group-transmit, propagate and intermediate carry signals, the computing step being configured to exploit differences in delay associated with generation of the group-generate, group-transmit and intermediate carry signals so as to reduce the total computational delay of the adder.
13. The method of claim 12 wherein the adder comprises a radix-2 adder.
14. The method of claim 12 further including a separate prefix tree for each bit position.
15. The method of claim 12 wherein a carry computed for a lower bit position is used to compute a carry for at least one higher bit position in parallel within the corresponding prefix trees.
16. The method of claim 12 wherein a generate signal and at least one of a propagate signal and a transmit signal are generated in an initial stage of each of the prefix trees without utilizing a primary carry input signal.
17. The method of claim 12 wherein an initial stage of each of the prefix trees calculates
$$\left.\begin{aligned} g_j &= a_j b_j \\ t_j &= a_j + b_j \\ p_j &= a_j \oplus b_j = \bar{g}_j\, t_j \end{aligned}\right\} \quad \forall j,\ 0 \le j < n,$$
where gj is a generate signal, tj is a transmit signal, pj is a propagate signal, and n is the number of bit positions of the adder.
18. The method of claim 17 wherein a plurality of subsequent stages of each of the prefix trees each calculate
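$$c_j = G_{j-2^{k-1}+1}^{j} + T_{j-2^{k-1}+1}^{j}\,c_{j-2^{k-1}} \quad \forall j,\ 2^{k}-1 > j \ge 2^{k-1}-1,$$
$$\left(G_{j-2^{k}+1}^{j},\,T_{j-2^{k}+1}^{j}\right) = \left(G_{j-2^{k-1}+1}^{j},\,T_{j-2^{k-1}+1}^{j}\right)\; o \;\left(G_{j-2^{k}+1}^{j-2^{k-1}},\,T_{j-2^{k}+1}^{j-2^{k-1}}\right) \quad \forall j,\ n > j \ge 2^{k-1}-1,$$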
where G denotes a group-generate signal, T denotes a group-transmit signal, c denotes an intermediate carry signal, and o is a carry operator.
19. The method of claim 17 wherein the sum computation stage of each of the prefix trees calculates a sum sj as follows:
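$$s_j = p_j \oplus c_{j-1} \quad \forall j,\ n/2 > j \ge 0,$$
$$s_j = G_{j-n/2}^{j-1}\,\bar{p}_j + \overline{G_{j-n/2}^{j-1}}\left(T_{j-n/2}^{j-1}\,(p_j \oplus c_{j-n/2-1}) + \overline{T_{j-n/2}^{j-1}}\;p_j\right) \quad \forall j,\ \tfrac{3}{4}n > j \ge n/2,$$
$$s_j = G_{j-n/2}^{j-1}\,\bar{p}_j + \overline{G_{j-n/2}^{j-1}}\left(c_{j-n/2-1}\,(p_j \oplus T_{j-n/2}^{j-1}) + \bar{c}_{j-n/2-1}\;p_j\right) \quad \forall j,\ n > j \ge \tfrac{3}{4}n,$$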
where G denotes a group-generate signal, T denotes a group-transmit signal, and c denotes an intermediate carry signal.
20. The method of claim 19 wherein an output carry signal cn−1 for the adder is calculated as:
$$c_{n-1} = G_0^{n-1} + T_0^{n-1}\,c_{-1}.$$
21. The method of claim 12 wherein the sum computation stages of the adder are configured in accordance with a left-to-right routing of the group-generate and group-transmit signals such that a most-significant half of a given set of sum bits are generated in the same prefix trees as a least-significant half of the sum bits.
US09/525,644 2000-03-15 2000-03-15 Prefix tree adder with efficient sum generation Expired - Fee Related US6539413B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/525,644 US6539413B1 (en) 2000-03-15 2000-03-15 Prefix tree adder with efficient sum generation

Publications (1)

Publication Number Publication Date
US6539413B1 true US6539413B1 (en) 2003-03-25

Family

ID=24094065

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/525,644 Expired - Fee Related US6539413B1 (en) 2000-03-15 2000-03-15 Prefix tree adder with efficient sum generation

Country Status (1)

Country Link
US (1) US6539413B1 (en)

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3814925A (en) * 1972-10-30 1974-06-04 Amdahl Corp Dual output adder and method of addition for concurrently forming the differences a−b and b−a
US3987291A (en) * 1975-05-01 1976-10-19 International Business Machines Corporation Parallel digital arithmetic device having a variable number of independent arithmetic zones of variable width and location
US4700325A (en) * 1984-02-08 1987-10-13 Hewlett-Packard Company Binary tree calculations on monolithic integrated circuits
US4737926A (en) * 1986-01-21 1988-04-12 Intel Corporation Optimally partitioned regenerative carry lookahead adder
US5122982A (en) * 1988-02-29 1992-06-16 Chopp Computer Corporation Carry generation method and apparatus
US5434810A (en) * 1988-04-20 1995-07-18 Fujitsu Limited Binary operator using block select look ahead system which serves as parallel adder/subtracter able to greatly reduce the number of elements of circuit with out sacrifice to high speed of computation
US5166899A (en) * 1990-07-18 1992-11-24 Hewlett-Packard Company Lookahead adder
US5479356A (en) * 1990-10-18 1995-12-26 Hewlett-Packard Company Computer-aided method of designing a carry-lookahead adder
US5208490A (en) * 1991-04-12 1993-05-04 Hewlett-Packard Company Functionally complete family of self-timed dynamic logic circuits
US5257218A (en) * 1992-01-06 1993-10-26 Intel Corporation Parallel carry and carry propagation generator apparatus for use with carry-look-ahead adders
US5500813A (en) * 1992-05-20 1996-03-19 Samsung Electronics Co., Ltd. Circuit for adding multiple-bit binary numbers
US5477480A (en) * 1992-07-10 1995-12-19 Nec Corporation Carry look ahead addition method and carry look ahead addition device
US5270955A (en) * 1992-07-31 1993-12-14 Texas Instruments Incorporated Method of detecting arithmetic or logical computation result
US5508952A (en) * 1993-10-19 1996-04-16 Kantabutra; Vitit Carry-lookahead/carry-select binary adder
US5581497A (en) * 1994-10-17 1996-12-03 Intel Corporation Carry skip adder with enhanced grouping scheme
US5636156A (en) * 1994-12-12 1997-06-03 International Business Machines Corporation Adder with improved carry lookahead structure
US5701504A (en) * 1994-12-28 1997-12-23 Intel Corporation Apparatus and method for addition based on Kogge-Stone parallel algorithm
US5719802A (en) * 1995-12-22 1998-02-17 Chromatic Research, Inc. Adder circuit incorporating byte boundaries
US5719803A (en) * 1996-05-31 1998-02-17 Hewlett-Packard Company High speed addition using Ling's equations and dynamic CMOS logic
US5881274A (en) * 1997-07-25 1999-03-09 International Business Machines Corporation Method and apparatus for performing add and rotate as a single instruction within a processor
US6175852B1 (en) * 1998-07-13 2001-01-16 International Business Machines Corporation High-speed binary adder

Non-Patent Citations (17)

* Cited by examiner, † Cited by third party
Title
A. Beaumont-Smith et al., "A GaAs 32-bit Adder," IEEE Symposium Computer Arithmetic, pp. 10-17, Jul. 1997.
A. Goldovsky et al., "A 1.0-nsec 32-bit Prefix Tree Adder in 0.25-μm Static CMOS," 43rd Midwest Symposium on Circuits and Systems, 5 pages, Aug. 1999.
A. Weinberger and J.L. Smith, "A One-Microsecond Adder Using One-Megacycle Circuitry," IRE Trans. on Electric Computers, pp. 65-73, Jun. 1956.
A. Weinberger, "High-Speed Binary Adder," IBM Technical Disclosure Bulletin, vol. 24, No. 8, pp. 4393-4398, Jan. 1982.
Akhilesh Tyagi, A Reduced-Area Scheme for Carry-Select Adders, Oct. 1993, IEEE Transaction on Computers, vol. 42 No. 10, p. 1163-1170.* *
Arjhan et al., A Novel Scheme for Irregular Parallel-Prefix Adders, 1997, IEEE Transaction on Computers. p. 74-78.* *
D. Dozza et al., "A 3.5 NS, 64 Bit, Carry-Lookahead Adder," in Proc. Intl. Symp. Circuits and Systems, pp. 297-300, 1996.
G. Bewick et al., "Approaching a Nanosecond: A 32 Bit Adder,"IEEE International Conference on Computer Design: VLSI in Computers & Processors, pp. 221-226, Oct. 1988.
J. Silberman et al., "A 1.0 GHz Single-Issue 64b PowerPC Integer Processor," IEEE Intl. Solid-State Circuits Conf., pp. 230-231, Feb. 1998.
P.M. Kogge and H.S. Stone, "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations," IEEE Trans. on Computers, vol. C-22, No. 8, pp. 786-793, Aug. 1973.
R.P. Brent and H.T. Kung, "A Regular Layout for Parallel Adders," IEEE Trans. on Computers, vol. C-31, No. 3, pp. 260-264, Mar. 1982.
S. Knowles, "A Family of Adders," IEEE Symposium Computer Arithmetic, pp. 30-34, 1999.
T.-F. Ngai et al., "Regular, Area-Time Efficient Carry-Lookahead Adders," Journal of Parallel and Distributed Computing, vol. 3, pp. 92-105, 1986.
W. Liu et al., "A 250-MHz Wave Pipelined Adder in 2-μm CMOS,"IEEE Journal of Solid-State Circuits, vol. 29, No. 9, pp. 1117-1128, Sep. 1994.
Z. Wang et al., "Fast Adders using Enhanced Multiple-Output Domino Logic," IEEE Journal of Solid-State Circuits, vol. 32, No. 2, pp. 206-214, Feb. 1997.

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020103842A1 (en) * 2000-12-08 2002-08-01 Alexander Goldovsky Adder with improved overflow flag generation
US6912560B2 (en) * 2000-12-08 2005-06-28 Agere Systems, Inc. Adder with improved overflow flag generation
US20040225706A1 (en) * 2003-05-05 2004-11-11 Harris David L. Parallel prefix networks that make tradeoffs between logic levels, fanout and wiring racks
US7152089B2 (en) * 2003-05-05 2006-12-19 Sun Microsystems, Inc. Parallel prefix networks that make tradeoffs between logic levels, fanout and wiring racks
US8683398B1 (en) * 2012-11-27 2014-03-25 International Business Machines Corporation Automated synthesis of high-performance two operand binary parallel prefix adder
US9619923B2 (en) 2014-01-14 2017-04-11 Raycast Systems, Inc. Computer hardware architecture and data structures for encoders to support incoherent ray traversal
US8928675B1 (en) 2014-02-13 2015-01-06 Raycast Systems, Inc. Computer hardware architecture and data structures for encoders to support incoherent ray traversal
US9035946B1 (en) 2014-02-13 2015-05-19 Raycast Systems, Inc. Computer hardware architecture and data structures for triangle binning to support incoherent ray traversal
US9058691B1 (en) 2014-02-13 2015-06-16 Raycast Systems, Inc. Computer hardware architecture and data structures for a ray traversal unit to support incoherent ray traversal
US9087394B1 (en) 2014-02-13 2015-07-21 Raycast Systems, Inc. Computer hardware architecture and data structures for packet binning to support incoherent ray traversal
US9761040B2 (en) 2014-02-13 2017-09-12 Raycast Systems, Inc. Computer hardware architecture and data structures for ray binning to support incoherent ray traversal
KR20150105209A (en) * 2014-03-06 2015-09-16 에이알엠 리미티드 Data processing apparatus and method for performing vector scan operation
US10001994B2 (en) * 2014-03-06 2018-06-19 Arm Limited Data processing apparatus and method for performing scan operations omitting a further step
GB2523805B (en) * 2014-03-06 2021-09-01 Advanced Risc Mach Ltd Data processing apparatus and method for performing vector scan operation
CN104899180B (en) * 2014-03-06 2019-05-17 Arm 有限公司 For executing the data processing equipment and method of vector scan operation
CN104899180A (en) * 2014-03-06 2015-09-09 Arm有限公司 Data processing apparatus and method for performing vector scan operation
US20150254076A1 (en) * 2014-03-06 2015-09-10 Arm Limited Data processing apparatus and method for performing vector scan operation
GB2523805A (en) * 2014-03-06 2015-09-09 Advanced Risc Mach Ltd Data processing apparatus and method for performing vector scan operation
US9684488B2 (en) * 2015-03-26 2017-06-20 Altera Corporation Combined adder and pre-adder for high-radix multiplier circuit
CN106020768B (en) * 2015-03-26 2019-01-22 阿尔特拉公司 Combined adder and pre- adder for high radix multiplier circuit
CN106020768A (en) * 2015-03-26 2016-10-12 阿尔特拉公司 Combined adder and pre-adder for high-radix multiplier circuit
US20160283196A1 (en) * 2015-03-26 2016-09-29 Altera Corporation Combined adder and pre-adder for high-radix multiplier circuit
US10073677B2 (en) 2015-06-16 2018-09-11 Microsoft Technology Licensing, Llc Mixed-radix carry-lookahead adder architecture
US11334318B2 (en) * 2018-07-12 2022-05-17 Intel Corporation Prefix network-directed addition
US11301213B2 (en) * 2019-06-24 2022-04-12 Intel Corporation Reduced latency multiplier circuitry for very large numbers

Similar Documents

Publication Publication Date Title
JP3689183B2 (en) Accurate and effective sticky bit calculation for accurate floating-point division / square root operations
JP3761977B2 (en) Floating-point multiplier with reduced critical path by using delay matching technology and its operation method
JPH06348454A (en) Detection of result of computation of arithmetic or logic operation
JPH0969040A (en) Circuit for computing/dividing of square root of radical number 2 by three overlapped stages with presumptive computing
US20020143841A1 (en) Multiplexer based parallel n-bit adder circuit for high speed processing
EP0642093B1 (en) Method, system and apparatus for automatically designing a multiplier circuit and multiplier circuit designed by performing said method
US6539413B1 (en) Prefix tree adder with efficient sum generation
KR20070030320A (en) Carry-skip adder having merged carry-skip cells with sum cells
US6529931B1 (en) Prefix tree adder with efficient carry generation
US7325025B2 (en) Look-ahead carry adder circuit
Stine et al. Constant addition utilizing flagged prefix structures
Maddisetti et al. Machine learning based power efficient approximate 4: 2 compressors for imprecise multipliers
US4890127A (en) Signed digit adder circuit
Nithya et al. Design of Delay Efficient Hybrid Adder for High Speed Applications
US6912560B2 (en) Adder with improved overflow flag generation
Liao et al. A carry-select-adder optimization technique for high-performance booth-encoded wallace-tree multipliers
Kulkarni et al. MAC unit optimization for area power and timing constraints
US6782406B2 (en) Fast CMOS adder with null-carry look-ahead
Chong et al. Low energy 16-bit Booth leapfrog array multiplier using dynamic adders
US4979140A (en) Signed digit adder circuit
US20040267862A1 (en) Adder including generate and propagate bits corresponding to multiple columns
US20060242219A1 (en) Asynchronous multiplier
De et al. Fast parallel algorithm for ternary multiplication using multivalued I²L technology
US20230266942A1 (en) Triple adder
US6584484B1 (en) Incorporation of split-adder logic within a carry-skip adder without additional propagation delay

Legal Events

Date Code Title Description
AS Assignment

Owner name: LUCENT TECHNOLOGIES INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GOLDOVSKY, ALEXANDER;SRINIVAS, HOSAHALLI R.;REEL/FRAME:010826/0701;SIGNING DATES FROM 20000404 TO 20000422

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

AS Assignment

Owner name: DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AG

Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:LSI CORPORATION;AGERE SYSTEMS LLC;REEL/FRAME:032856/0031

Effective date: 20140506

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
AS Assignment

Owner name: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AGERE SYSTEMS LLC;REEL/FRAME:035365/0634

Effective date: 20140804

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20150325

AS Assignment

Owner name: LSI CORPORATION, CALIFORNIA

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS (RELEASES RF 032856-0031);ASSIGNOR:DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT;REEL/FRAME:037684/0039

Effective date: 20160201

Owner name: AGERE SYSTEMS LLC, PENNSYLVANIA

Free format text: TERMINATION AND RELEASE OF SECURITY INTEREST IN PATENT RIGHTS (RELEASES RF 032856-0031);ASSIGNOR:DEUTSCHE BANK AG NEW YORK BRANCH, AS COLLATERAL AGENT;REEL/FRAME:037684/0039

Effective date: 20160201