CMOS full adder cell e.g. for multiplier array
Field of the Invention
This invention relates to the field of digital signal processing and, more particularly, to a fast full adder cell for use in integrated circuit multiplier arrays.
Background of the Invention
In digital computing and signal processing applications, multiplication is an important and frequently employed operation. Digital multipliers generally include certain hardware elements and an algorithm controlling their use. The hardware generally comprises an array of adders, which comprise the basic operational cells of each multiplier. For a more complete discussion of the array approach to multiplication, see, for example, V.C. Hamacher et al, Computer Organization (2nd ed.), McGraw-Hill Book Co., New York 1984, pp. 251-64.
Sometimes, large arrays of multipliers are required. It has become an accepted practice to implement such arrays on semiconductor integrated circuit "chips", particularly chips fabricated using very large scale integration (VLSI) techniques. The speed of such multipliers has a strong influence on the overall capabilities of the data or signal processing system, particularly in real-time signal processing applications. And the number of multipliers or multiplier arrays on a chip directly affects its functionality and capabilities. Consequently, multiplier cells (and their constituent adder cells) are sought which are physically small and operate at high speed. One recent example of this striving is shown in D.A. Henlin et al, "A 25 MHz 16 Bit x 16 Bit Pipelined Multiplier," Proceedings IEEE International Conference on Computer Design (1984) at 417-422.
Three basic approaches may be taken in attempting to improve the speed of a multiplication operation: (1) improving the semiconductor processing of the integrated circuit and making the devices employed therein smaller or of different materials, (2) modifying the multiplication algorithm and (3) increasing the speed of the basic computational element, the full adder cell. Naturally, many combinations of these approaches will yield additional improvements in performance.
Summary of the Invention
In the present invention, the third of these approaches has been followed. An improved circuit has been devised for a full adder cell. The semiconductor processing and the multiplication algorithm being held constant, it is expected that this new cell will operate approximately twice as fast as the prior art full adder cells. This cell uses simple 2-input gates and a pair of multiplexers. The sum and carry results from the cell propagate to its outputs with approximately the same speed. By contrast, the prior art frequently employs 3-input gates, which are substantially slower than 2-input gates since they incorporate many more transistors. The multiplexers are formed using pass transistors. The 2-input gates may also be formed using pass transistors (arranged to form additional multiplexers) and, in the case of AND and OR gates, a single additional field-effect transistor.
In a first, basic embodiment, the full adder cell is implemented with a one-bit-wide multiplexer; this multiplexer is used to select either the output of a 2-input exclusive-OR gate or the output of a 2-input exclusive-NOR gate; the multiplexer output is presented as the sum output of the adder. A second one-bit-wide multiplexer is used for generating the carry output; this multiplexer selects between the output of a 2-input OR gate and the output of a 2-input AND gate, presenting the selected gate's output as the carry output from the cell.
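The behavior of this basic embodiment can be illustrated with a short behavioral sketch. It is not a circuit description. It assumes, consistent with the detailed description given later, that the S signal is the select input of both multiplexers and that S = 0 selects the exclusive-OR and AND outputs. The function names mux, full_adder and check_all are illustrative only and do not appear in the drawing.

```python
from itertools import product


def mux(select, when_0, when_1):
    """One-bit 2:1 multiplexer: pass when_0 if select is 0, else when_1."""
    return when_1 if select else when_0


def full_adder(s, b, c):
    """Return (sum, carry) for one-bit inputs using only 2-input gates.

    Assumption: S is the select input of both multiplexers.
    """
    xor_bc = b ^ c           # 2-input exclusive-OR gate
    xnor_bc = 1 - xor_bc     # 2-input exclusive-NOR gate
    and_bc = b & c           # 2-input AND gate
    or_bc = b | c            # 2-input OR gate
    sum_out = mux(s, xor_bc, xnor_bc)
    carry_out = mux(s, and_bc, or_bc)
    return sum_out, carry_out


def check_all():
    """Compare the cell against ordinary addition for all eight input states."""
    for s, b, c in product((0, 1), repeat=3):
        sum_out, carry_out = full_adder(s, b, c)
        assert 2 * carry_out + sum_out == s + b + c
    return True


if __name__ == "__main__":
    print(check_all())
```

The exhaustive check confirms that selecting between XOR/XNOR and between AND/OR with the third addend yields the same sum and carry as a conventional full adder.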
In another embodiment, the 2-input exclusive-OR and exclusive-NOR gates also are formed from single-bit pass transistor multiplexers, while the AND and OR gates are formed from pass transistors with a pull-up or pull-down transistor on their outputs, as appropriate.
The invention is pointed out with particularity in the appended claims. The above and further objects, features and advantages of the invention may be better understood by referring to the following detailed description, which should be read in conjunction with the accompanying drawing.
Brief Description of the Drawing
In the drawing,
Fig. 1 is a schematic diagram of a sum-generating circuit according to the present invention, without showing the construction details of the exclusive-OR and exclusive-NOR gates;
Fig. 2 is a schematic diagram of a carry-generating circuit according to the present invention, without showing the construction details of the AND and OR gates;
Fig. 3 is a schematic circuit diagram of the exclusive-OR gate of Fig. 1 constructed from pass transistors;
Fig. 4 is a schematic circuit diagram of the exclusive-NOR gate of Fig. 1 constructed from pass transistors;
Fig. 5 is a schematic circuit diagram of the AND gate of Fig. 2 constructed from a pass transistor and a pull-down transistor;
Fig. 6 is a schematic circuit diagram of the OR gate of Fig. 2 constructed from a pass transistor and a pull-up transistor;
Fig. 7 is a schematic diagram showing the circuit which results when the circuits of Figs. 3 and 4 are used in the circuit of Fig. 1 to implement its exclusive-OR and exclusive-NOR gates;
Fig. 8 is a schematic diagram showing the circuit which results when the circuits of Figs. 5 and 6 are used in the circuit of Fig. 2 to implement its AND and OR gates;
Fig. 9 is a part-schematic circuit/part-block diagram showing how the inputs of the full adder cell are developed; and
Fig. 10 is a schematic circuit diagram for a sum-generating circuit in which two of the pass transistors of Fig. 7 have been replaced by an inverter.
Detailed Description of an Illustrative Embodiment
The basic form of the present invention is shown in Figs. 1 and 2 which depict, respectively, the sum circuit and the carry circuit of a one-bit full adder cell. Turning to Fig. 1, the sum circuit 10 comprises a multiplexer (enclosed in dotted lines 11) formed from a pair of pass transistors 12 and 14, an exclusive-OR gate 16 and an exclusive-NOR gate 18. Gates 16 and 18 both receive the same inputs, labelled "B" and "C". The B and C signals are two of the three possible addends, the third being the S signal. When the adder cell is being used in a multiplier array, the B signal is the encoded multiplication factor (i.e., the multiplier), generated as described below, and the C input is the carry output from the previous adder cell. The output of exclusive-OR gate 16 is connected via line 17 to the first signal input of multiplexer 11 (i.e., the signal input of pass transistor 12) and the output of exclusive-NOR gate 18 is connected via line 19 to the second signal input of the multiplexer 11 (i.e., the signal input of pass transistor 14). The outputs of both pass transistors are wired together to provide the multiplexer output on line 15. The signal on line 15 is the sum output of the cell, labelled SUM. The multiplexer 11 is controlled by a complementary pair of signals, S and S_BAR, applied to the control inputs of the multiplexer (i.e., the gates of pass transistors 12 and 14) in opposite phase - i.e., so that pass transistor 12 is "on" while pass transistor 14 is "off", and vice versa. By definition, the statement that the multiplexer is controlled by the S signal means that the S and S_BAR signals are applied as indicated in Fig. 1 (i.e., the noninverted form of the S signal is applied to the gate of the P-type device in pass transistor 12 and to the gate of the N-type device in pass transistor 14, while the inverted form of the S signal is applied to the other gates of those pass transistors). The S control signal represents the sum from a previous adder in the array. (The notation "_BAR" when added to a signal name herein is used to signify the logical complement, or inverted state, of that signal.)
The carry circuit of Fig. 2 is topologically identical to the sum circuit of Fig. 1, except that exclusive-OR gate 16 has been replaced by an AND gate 22 and exclusive-NOR gate 18 has been replaced by an OR gate 24; the outputs of AND gate 22 and OR gate 24 are supplied on line 23 and line 25, respectively, to the inputs of multiplexer 21 (i.e., pass transistors 26 and 28). The outputs of pass transistors 26 and 28 are wired together, providing the output of multiplexer 21 on line 29. The signal on line 29 is the carry output of the cell and is therefore labelled CARRY.
Figs. 3 and 4 show, respectively, how gate 16 may be made from two pass transistors 32A and 32B, and how gate 18 may be made from two pass transistors 34A and 34B. Figs. 5 and 6 show, respectively, how each of gates 22 and 24 may be made from a single pass transistor and a single field effect transistor each. In the case of AND gate 22, these are, respectively, pass transistor 36 and FET 38, while in the case of OR gate 24, they are, respectively, pass transistor 42 and FET 44. In addition, the carry-in signals ("C") and, for the sum output, the encoded multiplier ("B") must be supplied in both inverted and non-inverted states; the two additional inverters which are needed are not shown. It should be appreciated that such inverters can be shared among the gates of Figs. 3 - 6, provided they have sufficient fan-out capability. All of pass transistors 32A, 32B, 34A, 34B, 36 and 42 are controlled by the C and C_BAR signals; the control signals are applied in like phase (or polarity or state) to pass transistors 32A, 34A and 36, and in the opposite phase to pass transistors 32B, 34B, and 42. The three former pass transistors are thus all "on" while the three latter pass transistors are all "off" and vice versa.
Using the notation of the Figures (i.e., the top gate of a pass transistor is the one to which its internal arrow symbols are pointed), a pass transistor is turned "on" when the signal applied to the top gate is a logical 1 and the signal applied to its bottom gate is a logical 0. Thus, when the C signal is 1, pass transistor 32A is turned on (and pass transistor 32B is therefore turned off), and the output of gate 16 is the B_BAR signal. Conversely, when the C signal is 0, pass transistor 32B is turned on and pass transistor 32A is turned off, so the output of gate 16 is the B input signal. The resulting truth table is that of an exclusive-OR function.
The circuit of Fig. 4 is identical to that of Fig. 3 except that its B and B_BAR inputs have been interchanged. Thus, the output of the former circuit is the logical inverse of the output of the latter circuit and the Fig. 4 circuit implements an exclusive-NOR operation.
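The switching behavior described for Figs. 3 and 4 can be checked with a simple switch-level sketch. It assumes only what is stated above: a pass transistor conducts when its top gate is 1 and its bottom gate is 0, pass transistors 32A and 34A conduct when C is 1, and pass transistors 32B and 34B conduct when C is 0. The helper names pass_transistor and wired are hypothetical modeling aids, and a non-conducting pass transistor is represented by None (a floating output).

```python
def pass_transistor(signal, top_gate, bottom_gate):
    """Model a CMOS pass transistor.

    Conducts when the top gate is 1 and the bottom gate is 0; otherwise
    its output floats, which is represented here by None.
    """
    if top_gate == 1 and bottom_gate == 0:
        return signal
    return None


def wired(*outputs):
    """Join several outputs on one node; exactly one may drive it."""
    driven = [value for value in outputs if value is not None]
    assert len(driven) == 1, "node must have exactly one active driver"
    return driven[0]


def xor_gate(b, c):
    """Gate 16 of Fig. 3: passes B_BAR when C is 1 and B when C is 0."""
    b_bar, c_bar = 1 - b, 1 - c
    out_32a = pass_transistor(b_bar, c, c_bar)  # conducts when C = 1
    out_32b = pass_transistor(b, c_bar, c)      # conducts when C = 0
    return wired(out_32a, out_32b)


def xnor_gate(b, c):
    """Gate 18 of Fig. 4: as Fig. 3 with B and B_BAR interchanged."""
    b_bar, c_bar = 1 - b, 1 - c
    out_34a = pass_transistor(b, c, c_bar)      # conducts when C = 1
    out_34b = pass_transistor(b_bar, c_bar, c)  # conducts when C = 0
    return wired(out_34a, out_34b)


if __name__ == "__main__":
    for b in (0, 1):
        for c in (0, 1):
            print(b, c, xor_gate(b, c), xnor_gate(b, c))
```

Because the two pass transistors of each gate are driven in opposite phase, exactly one of them conducts for either state of C, so the shared output node is never left floating and never driven by both.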
Referring to Fig. 5, the output of the gate 22 is controlled by the pass transistor 36 or by the pull-down transistor (FET) 38, depending on the state of the C signal. The B signal is provided to the input of pass transistor 36; the C signal is provided to its top gate and the C_BAR signal is provided to its bottom gate as well as to the gate of FET 38. The output of pass transistor 36 and the drain of FET 38 are connected together to provide the output of gate 22. The source of FET 38 is connected to ground. Thus when the C input is 1, FET 38 is cut off and has no effect on the output; the output corresponds to the B signal. And when the C input is 0, the pass transistor 36 is turned off (i.e., its output floats), FET 38 is turned on and it connects the output of gate circuit 22 to ground through a very low resistance, setting that output to 0 irrespective of the state of the B signal. Thus, the output on line 23 is high only when both the B signal and the C signal are 1; this, of course, is the AND function.
In Fig. 6, the output of the gate 24 is controlled by the pass transistor 42 or by the pull-up transistor (FET) 44, depending on the state of the C signal. The B signal is provided to the input of pass transistor 42; the C_BAR signal is provided to the top gate of pass transistor 42 as well as to the gate of FET 44. The C signal is provided to the bottom gate of the pass transistor. The output of pass transistor 42 and the drain of FET 44 are connected together to provide the output of gate 24. The source of FET 44 is connected to the supply voltage, Vcc. Thus when the C input is 1, the pass transistor 42 is turned off (i.e., its output floats), but FET 44 is turned on and connects the output on line 25 to the supply voltage through a low resistance, setting that output to a 1 irrespective of the state of the B signal. And when the C input is 0, the FET 44 is turned off and pass transistor 42 is turned on; this causes the output on line 25 to correspond to the B input. Thus, the output on line 25 is high if at least one of the B signal or the C signal is 1; this, of course, is the OR function.
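The operation of the gates of Figs. 5 and 6 may be illustrated by the following switch-level sketch. It is a logical model only and ignores resistance and timing. The device polarities are inferred from the behavior described above rather than stated in the text: FET 38 is modeled as conducting when its gate (C_BAR) is 1, and FET 44 as conducting when its gate (C_BAR) is 0. The helper names are hypothetical, and None represents a floating output.

```python
def pass_transistor(signal, top_gate, bottom_gate):
    """Conducts when top gate is 1 and bottom gate is 0; else floats (None)."""
    if top_gate == 1 and bottom_gate == 0:
        return signal
    return None


def wired(*outputs):
    """Join several outputs on one node; exactly one may drive it."""
    driven = [value for value in outputs if value is not None]
    assert len(driven) == 1, "node must have exactly one active driver"
    return driven[0]


def and_gate(b, c):
    """Gate 22 of Fig. 5: pass transistor 36 plus pull-down FET 38."""
    c_bar = 1 - c
    out_36 = pass_transistor(b, c, c_bar)  # passes B when C = 1
    # Inferred: FET 38 ties the output to ground when its gate (C_BAR) is 1.
    out_38 = 0 if c_bar == 1 else None
    return wired(out_36, out_38)


def or_gate(b, c):
    """Gate 24 of Fig. 6: pass transistor 42 plus pull-up FET 44."""
    c_bar = 1 - c
    out_42 = pass_transistor(b, c_bar, c)  # passes B when C = 0
    # Inferred: FET 44 ties the output to Vcc when its gate (C_BAR) is 0.
    out_44 = 1 if c_bar == 0 else None
    return wired(out_42, out_44)


if __name__ == "__main__":
    for b in (0, 1):
        for c in (0, 1):
            print(b, c, and_gate(b, c), or_gate(b, c))
```

In each gate the pass transistor and the single FET conduct in alternation, so the output is either the B signal or a fixed level, which is what allows a 2-input AND or OR function to be formed from three devices.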
Figs. 7 - 9 show how these circuit blocks can be put together to provide a full adder corresponding to Figs. 1 and 2. Inverters 50 and 51 have been added to drive the lines to the next full adder cell and to provide compensation for the unequal propagation times for logical 0 and logical 1, as explained more fully below. The total circuit thus has 30 transistors, which compares favorably to the 28 transistors used in traditional prior art designs.
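One tally that reproduces the stated total of 30 transistors is sketched below. The breakdown is an assumption, not taken from the drawing: each pass transistor is counted as two devices (one P-type and one N-type), each inverter as two devices, and the two inverters that generate B_BAR and C_BAR are included. The accounting actually intended may differ.

```python
FETS_PER_PASS_TRANSISTOR = 2  # one P-type and one N-type device
FETS_PER_INVERTER = 2         # assumption: conventional CMOS inverter


def tally():
    """Return an assumed per-block device count for the full adder cell."""
    counts = {
        # Pass transistors 32A, 32B, 34A, 34B, 12 and 14.
        "sum circuit": 6 * FETS_PER_PASS_TRANSISTOR,
        # Pass transistors 36, 42, 26 and 28, plus FETs 38 and 44.
        "carry circuit": 4 * FETS_PER_PASS_TRANSISTOR + 2,
        # Inverters 50 and 51.
        "output inverters": 2 * FETS_PER_INVERTER,
        # Assumed inverters producing B_BAR and C_BAR (not shown).
        "input inverters": 2 * FETS_PER_INVERTER,
    }
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    for block, count in tally().items():
        print(f"{block}: {count}")
```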
Some further reduction in component count is possible without substantial penalty. Thus, the circuit of Fig. 10 may be substituted for the circuit of Fig. 7. There, the stand-alone exclusive-NOR ("XNOR") circuit comprising pass transistors 34A and 34B has been replaced by an inverter 52 between the output of the exclusive-OR ("XOR") circuit on line 17 and pass transistor 14. This is possible since the XNOR and XOR functions are complementary.
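The equivalence relied upon in Fig. 10 can be confirmed with a brief behavioral sketch. It assumes that S = 0 selects the XOR path through pass transistor 12 and S = 1 selects the path through pass transistor 14, and it counts a pass transistor and an inverter as two devices each. The function names are illustrative only.

```python
from itertools import product


def sum_fig7(s, b, c):
    """Sum circuit with a stand-alone XNOR gate feeding pass transistor 14."""
    xor_bc = b ^ c
    xnor_bc = 1 if b == c else 0  # separate XNOR gate (34A, 34B)
    return xnor_bc if s else xor_bc


def sum_fig10(s, b, c):
    """Sum circuit with inverter 52 on the XOR output feeding transistor 14."""
    xor_bc = b ^ c
    inverted = 1 - xor_bc         # inverter 52
    return inverted if s else xor_bc


def devices_saved():
    """Devices removed (two pass transistors) minus devices added (inverter)."""
    removed = 2 * 2  # pass transistors 34A and 34B, two devices each
    added = 2        # assumption: inverter 52 is a two-device CMOS inverter
    return removed - added


if __name__ == "__main__":
    same = all(sum_fig7(*v) == sum_fig10(*v) for v in product((0, 1), repeat=3))
    print(same, devices_saved())
```

Under these assumptions the substitution leaves the sum output unchanged for every input state while removing two devices. The inverter adds a stage of delay to the XNOR path, which the sketch does not model.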
When used in a multiplier array, each full adder receives both a sum and a carry input from full adders in the row immediately above, and a third input from so-called "Booth logic" 62, the details of which are well-known to those skilled in the art. Booth logic implements the so-called "standard Booth algorithm" or, preferably, the "modified Booth algorithm" for coding or recoding the multiplier (i.e., multiplication factor). Other algorithms for multiplier recoding, now known or hereafter to be developed, may be used as well, as the invention is not limited to use with any particular such algorithm. The standard and modified Booth algorithms are described in numerous popular texts, such as L.R. Rabiner et al, Theory and Application of Digital Signal Processing, Prentice-Hall, Inc., Englewood Cliffs, N.J. 1975, at pp. 514-524.
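For orientation only, the textbook form of the modified Booth algorithm (radix-4 recoding) can be sketched as follows. This is not a description of Booth logic 62, whose details are not given here. The function names and the default 16-bit width are assumptions chosen for illustration.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Returns bits // 2 digits, least significant first, each in
    {-2, -1, 0, 1, 2}. The width `bits` must be even.
    """
    if bits % 2:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern

    def bit(i):
        # The bit to the right of bit 0 is taken as 0.
        return 0 if i < 0 else (pattern >> i) & 1

    # Each digit examines three overlapping bits: i - 1, i and i + 1.
    return [bit(i - 1) + bit(i) - 2 * bit(i + 1) for i in range(0, bits, 2)]


def booth_multiply(multiplicand, multiplier, bits=16):
    """Multiply by summing one recoded partial product per digit."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum(d * multiplicand * 4 ** k for k, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix4_digits(7, 4))
    print(booth_multiply(123, -45))
```

Radix-4 recoding halves the number of partial products relative to one per multiplier bit, which in turn reduces the number of adder rows each result must traverse.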
While the three inputs of the full adder cell can be interchanged in any of the six possible orderings and remain logically correct, optimal performance is achieved when they are connected in a specific manner. That is, it is preferable that the full adder cells of the present invention be interconnected as follows: input "C" is driven by the carry-out of a preceding full adder; input "S" is driven by the sum-out of a preceding full adder; and input "B" is driven by the Booth logic. This preference arises from the following three considerations: (1) the characteristics of pass transistors, (2) the differences in propagation time between the sum portion and the carry portion and (3) the fact that, in a multiplier array, one of the three inputs to each full adder is stable throughout the multiplication. On the first point, it should be appreciated that the control input (i.e., the signal turning the pass transistor on and off) requires less drive capability than the signal-passing input. On the second point, simulation techniques demonstrate the carry portion of the full adder cell to be slightly faster than the sum portion.
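The preferred interconnection can be illustrated logically with a sketch of one row of adder cells. It models function only, not drive strength or timing, which are the actual reasons for the preference. It is a simplification: the incoming carry bits are assumed to be already aligned to the column of the cell receiving them, and the names full_adder, adder_row and to_int are hypothetical.

```python
def full_adder(s, b, c):
    """Mux-based cell: S selects between XOR/XNOR and between AND/OR of B, C."""
    xor_bc = b ^ c
    sum_out = (1 - xor_bc) if s else xor_bc
    carry_out = (b | c) if s else (b & c)
    return sum_out, carry_out


def adder_row(sum_in, carry_in, booth_in):
    """Apply one cell per column; all arguments are bit lists, LSB first.

    Wiring follows the preferred connection: S from the sum-out above,
    C from the carry-out above, B from the Booth logic.
    """
    cells = [full_adder(s, b, c) for s, c, b in zip(sum_in, carry_in, booth_in)]
    sum_out = [cell[0] for cell in cells]
    carry_out = [cell[1] for cell in cells]
    return sum_out, carry_out


def to_int(bits):
    """Convert an LSB-first bit list to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    s, c, b = [1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1]
    sum_out, carry_out = adder_row(s, c, b)
    # Each carry has twice the weight of the sum bit in the same column.
    print(to_int(sum_out) + 2 * to_int(carry_out) == to_int(s) + to_int(c) + to_int(b))
```

Because no carry propagates along the row, the delay through a row is that of a single cell, which is why the speed of the individual cell governs the speed of the array.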
The performance of the full adder cell is affected significantly by the amount of the capacitive load which the sum and carry signals must drive. This load comprises the interconnection (metal or polysilicon) to the next row of adder cells and the gates of those cells' several pass transistors which are connected to the interconnection. Minimization of that capacitance is achieved by minimizing the interconnection length and the areas of such pass transistors. A pass transistor, of course, is formed of a P-type device and an N-type device; usually pass transistors are constructed such that the P-type device is twice the size of the N-type device, to allow the logic 1 and the logic 0 to propagate at equal speed. In the present invention, however, both the P-type device and the N-type device may have the minimum possible width of about 4 microns. While this will cause a logical 1 to propagate relatively slowly, compensation may be provided in the inverter which follows each pair of pass transistors; that is, the inverters may be made such that their P-type and N-type devices are of approximately the same size. This allows the "slow" logical 1 out of the preceding pass transistors to be inverted about twice as fast as the faster logical 0, achieving compensation.
Due to this approach, most of the pass transistors can be two-thirds the size of their counterparts in conventional full adders. This results in substantial improvements in speed. In a simulation using a conventional circuit analysis system, the worst case propagation delay for a 16 by 16 multiplication array according to the present invention was indicated to be about 22 nanoseconds, as compared with about 44 nanoseconds for the customary approaches. Although real delay times may be expected to differ from these theoretical, calculated values, the ratio of the real delay times should not be much different from the ratio of the delay times in the simulation results.
Having thus described exemplary embodiments of the invention, it will be apparent that various alterations, modifications and improvements will readily occur to those skilled in the art. Such obvious alterations, modifications and improvements, though not expressly described above, are nonetheless intended to be implied and are within the spirit and scope of the invention. Accordingly, the foregoing discussion is intended to be illustrative only, and not limiting; the invention is limited and defined only by the following claims and equivalents thereto.
What is claimed is: