CARRY LOOKAHEAD ADDER FOR DIFFERENT DATA TYPES
The present invention relates to addition circuitry, particularly to packed arithmetic prefix adder circuitry, often referred to as "carry lookahead" adders or adder trees, and, more particularly to prefix adder circuitry capable of calculating the sum of or difference between pairs of packed or unpacked binary numbers.
Multimedia processor chips (and others) make much use of "packed" arithmetic operations, in which long wordlength numbers are optionally treated as several independent shorter wordlength numbers - for example, a 32-bit word may be treated as 2 separate 16-bit words or as 4 8-bit words. Moreover, a common arithmetic operation used in video processing is "absolute difference", denoted | A - B | , in which the magnitude of the difference between A and B is calculated. This operation is used in video motion estimation and prediction algorithms. Hence, a most valuable operation is a "packed absolute difference" operation which returns the absolute differences of several independent short numbers simultaneously.
Ordinarily, absolute differences are computed by performing a subtraction operation followed by a separate "absolute value" operation which returns the magnitude of a signed number. Absolute differences can be obtained by computing both A - B and B - A, and using the signs of the two results to select the positive result, which corresponds to the absolute difference. However, this is wasteful and a better technique is sought. This document describes how parallel prefix adder trees - widely used in VLSI processor chips - may be modified to support packed arithmetic operations, including packed absolute difference calculations.
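By way of illustration only, the packed absolute difference operation described above may be modelled behaviourally as follows. This is a reference sketch of the arithmetic result, not of the circuit; the function name `packed_abs_diff` and its parameters are hypothetical and do not appear in the specification.

```python
def packed_abs_diff(a: int, b: int, lanes: int = 4, lane_bits: int = 8) -> int:
    """Reference model: |A - B| computed independently in each lane of a packed word."""
    mask = (1 << lane_bits) - 1
    result = 0
    for lane in range(lanes):
        shift = lane * lane_bits
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        # Conventional approach: form both differences and keep the non-negative one.
        diff = x - y if x >= y else y - x
        result |= diff << shift
    return result
```

For example, treating a 32-bit word as 4 separate 8-bit numbers, `packed_abs_diff(0x10FF0305, 0x20010304)` yields `0x10FE0001`.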
Parallel prefix carry-lookahead adders are a popular VLSI design technique that accelerates a w-bit addition by means of a parallel prefix tree.
The addition process at each bit position can be defined in terms of signals as follows in relation to Figure 1.
A block diagram of a prefix adder is illustrated in Figure 1, where the adder is seen to consist of three blocks: input bit propagate, generate, and not kill cells; the prefix tree; output sum cells. The input cells derive the bit propagate, generate, and not kill signals according to:
$p(i) = a(i) \oplus b(i)$
$g(i) = a(i) \wedge b(i)$
$\neg k(i) = a(i) \vee b(i)$    (1)
respectively, where:
g(i) is called the bit generate condition, a value 1 indicating that the bits a(i) and b(i) produce an output carry bit (c(i)) irrespective of the incoming carry,
p(i) is called the bit propagate condition, a value 1 indicating that the bits a(i) and b(i) produce an output carry bit (c(i)) only if there is an incoming carry, and
$\neg k(i)$ is called the not kill bit condition, a value 0 indicating that the bits a(i) and b(i) produce no output carry bit (the symbol "$\neg$" being used to indicate the NOT condition);
c(i) is a binary carry bit which is received by each next most significant bit position, c(i) having the same significance as a(i) and b(i).
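The input cell definitions of equation (1) may be sketched in Python as follows. The function name `bit_signals` and the least-significant-bit-first list layout are assumptions made for this illustration.

```python
def bit_signals(a: int, b: int, width: int):
    """Return per-bit (p, g, nk) lists; index 0 is the least significant bit."""
    p, g, nk = [], [], []
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        p.append(ai ^ bi)   # bit propagate: a XOR b
        g.append(ai & bi)   # bit generate:  a AND b
        nk.append(ai | bi)  # bit not kill:  a OR b
    return p, g, nk
```

Note that g(i) = 1 implies that the not kill signal is also 1, so the input cells only ever produce the pairs (g, not kill) = (0,0), (0,1) or (1,1).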
The prefix tree combines the bit generate and bit not kill signals to derive "group generate" and "group not kill" signals, denoted $G_{i-1}^0$ and $\neg K_{i-1}^0$ respectively. Next, the "group generate" and "group not kill" signals are combined with control signals, denoted "inc" and "abs", to derive carry signals c(i):
Incremented Sum, A + B + 1: $c(i) = G_{i-1}^0 \vee \neg K_{i-1}^0 \wedge \mathrm{inc}$    (2a)
Absolute Difference, |A - B|: $c(i) = (G_{i-1}^0 \equiv \mathrm{abs}) \vee \neg K_{i-1}^0 \wedge \mathrm{abs}$    (2b)
That is, if "inc" = 1, A + B + 1 is computed instead of A + B, and if "abs" = 0, B - A is computed instead of A - B. Finally, the carry signals are XOR'd with the bit propagate signals to return the result:
$s(i) = p(i) \oplus c(i)$    (3)
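The carry and sum equations may be checked with a small behavioural sketch. Several points below are assumptions rather than statements of the specification: the group signals are produced by a serial scan standing in for the prefix tree; the empty group below bit 0 is taken to behave as a propagate condition, (G, not K) = (0, 1); equation (2b) is read as c(i) = (G ≡ abs) OR (not K AND abs); operand B is taken to enter the adder complemented for the absolute difference; and "abs" is taken from the most significant group generate signal. All function names are hypothetical.

```python
def group_signals(a, b, width):
    """Return (p, G, nK): bit propagates and group signals G_i^0, nK_i^0."""
    p, G, nK = [], [], []
    G_lo, nK_lo = 0, 1  # assumed value for the empty group below bit 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, nk = ai & bi, ai | bi                      # equation (1)
        p.append(ai ^ bi)
        # Serial stand-in for the prefix tree (same outputs, no parallelism).
        G_lo, nK_lo = g | (nk & G_lo), nk & nK_lo
        G.append(G_lo)
        nK.append(nK_lo)
    return p, G, nK


def incremented_sum(a, b, width, inc):
    """(a + b + inc) mod 2**width via equations (2a) and (3)."""
    p, G, nK = group_signals(a, b, width)
    s = 0
    for i in range(width):
        G_lo, nK_lo = (G[i - 1], nK[i - 1]) if i else (0, 1)
        c = G_lo | (nK_lo & inc)                      # equation (2a)
        s |= (p[i] ^ c) << i                          # equation (3)
    return s


def absolute_difference(a, b, width):
    """|a - b| via equations (2b) and (3), under the assumptions stated above."""
    not_b = ~b & ((1 << width) - 1)   # assumption: B enters the adder complemented
    p, G, nK = group_signals(a, not_b, width)
    abs_sel = G[width - 1]            # assumption: abs = carry out of A + ~B
    s = 0
    for i in range(width):
        G_lo, nK_lo = (G[i - 1], nK[i - 1]) if i else (0, 1)
        c = (1 ^ G_lo ^ abs_sel) | (nK_lo & abs_sel)  # equation (2b)
        s |= (p[i] ^ c) << i                          # equation (3)
    return s
```

With abs = 1 every carry becomes G OR not K, which increments A + ~B to give A - B; with abs = 0 every carry becomes the complement of G, which inverts each sum bit of A + ~B to give B - A.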
The present invention is aimed at modifying the prefix tree such that it returns group generate and group not kill signals for use in either full-length or packed arithmetic calculations to provide added functionality.
Some background information to this process is desirable. The prefix tree converts the input bit generate and bit not kill signals, g(i) and $\neg k(i)$, into group generate and group not kill signals, $G_i^0$ and $\neg K_i^0$, through a number of levels of logic operations. In general, $G_z^x$ represents a "group generate" signal across the bits from significance x up to and including significance z, and $\neg K_z^x$ represents a "group not kill" signal across the same significances. It should be noted that $G_i^i$ = g(i) and $K_i^i$ = k(i). Each level of logic in the tree widens the range of the groups until the lower value of the range covered by the group is 0, and the upper value is i.
The bit combinations of ($G_z^x$, $\neg K_z^x$) may be interpreted in terms of carry conditions, c(i), as shown in Table 1.
Table 1 Interpretation of ($G_z^x$, $\neg K_z^x$) bit combinations
where 'X' denotes a "don't care" condition (i.e. either a logic '1' or a logic '0'). Pairs of group signals, $C_z^y$ and $C_w^x$, are combined to yield compound group signals, $C_z^x$, as shown in Table 2, where z ≥ y, z ≥ w, w ≥ x, and y ≤ w+1.
Table 2 Prefix adder cell function
Adopting the coding of Table 1 for the CK, CP, and CG conditions yields Table 3, which gives the required logic operations for the prefix tree.
Table 3 Prefix adder cell logic
Figure 2 shows the prefix tree proposed by Ladner and Fischer. The black squares are prefix cells, which implement the equation pair:
Both the group generate and the group not kill expressions are implementable as individual CMOS logic gates, and exploit the don't cares in Table 3 so as to minimise the complexity of the equation pair. At the output of the prefix tree, a pattern of carry conditions always emerges which comprises a string of CG & CK conditions, followed by a string (possibly null) of CP conditions. The trailing string of CP conditions identifies the trailing string of sum bits that must change from 1 to 0 when the sum is incremented, whence equation (2a).
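A conventional prefix tree of this kind may be sketched as follows. The cell expressions used here are one common realisation that is consistent with the (G, not K) coding; they are an assumption, since the equation pair itself is not reproduced above. The level schedule is a divide-and-conquer arrangement whose groupings for 8 bits match those recited later for Figure 8; it is assumed, not confirmed, to correspond to the tree of Figure 2. The names `prefix_cell`, `prefix_tree` and `add_with_tree` are hypothetical.

```python
def prefix_cell(hi, lo):
    """Combine (G, nK) pairs: hi covers bits z..y, lo covers bits w..x."""
    G_hi, nK_hi = hi
    G_lo, nK_lo = lo
    # Assumed cell logic: a carry is generated by hi, or generated by lo and
    # not killed by hi; the group is "not kill" only if both halves are.
    return (G_hi | (nK_hi & G_lo), nK_hi & nK_lo)


def prefix_tree(signals):
    """signals[i] = (g(i), nk(i)); length must be a power of two.

    Returns a list whose entry i is (G_i^0, nK_i^0).
    """
    x = list(signals)
    level = 0
    while (1 << level) < len(x):
        for i in range(len(x)):
            if (i >> level) & 1:
                # Most significant bit of the neighbouring lower block.
                j = ((i >> level) << level) - 1
                x[i] = prefix_cell(x[i], x[j])
        level += 1
    return x


def add_with_tree(a, b, width):
    """(a + b) mod 2**width using the tree outputs as carries."""
    sig = [(((a >> i) & (b >> i)) & 1, ((a >> i) | (b >> i)) & 1) for i in range(width)]
    out = prefix_tree(sig)
    s = 0
    for i in range(width):
        c = out[i - 1][0] if i else 0          # carry into bit i is G_{i-1}^0
        s |= ((((a >> i) ^ (b >> i)) & 1) ^ c) << i
    return s
```

Within one level, the entries read as the lower operand are never written in that same level, so the in-place update models a row of cells operating in parallel.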
According to the present invention there is provided an adder having circuitry for calculating the sum of or difference between pairs of unpacked binary numbers having $2^n$ bits or packed binary numbers having $2^{n-m}$ bits where m < n, including: $2^m$ sub-adders, each sub-adder partition including a plurality of columns and a plurality of rows of cells, each column of cells having an input cell in the lowermost row for receiving bits of each of the pairs of numbers, each sub-adder above the lowest significance sub-adder having a lowest significance column input cell arranged to receive a third input bit, and the cells in the remaining rows of the or each sub-adder above the lowest significance sub-adder being arranged to prevent the carry-over of a carry bit from the most significant column of the preceding sub-adder being introduced into the sub-adder, depending on whether the third input bit is zero or one.
Preferably, the lowest significance column input cell of the lowest significance sub-adder is the same as the lowest significance column input cell of the or each sub-adder above the lowest significance sub-adder.
Preferably, cells in the remaining rows below the uppermost row of the or each sub-adder having a lowest significance column input cell arranged to receive a third input bit, may include operational logic as set out in Table 5. In one form of adder, the cells in the uppermost row of the or each sub-adder above the lowest significance sub-adder may include operational logic as set out in Table 6.
In an alternative form, the cells in the uppermost row of the lowest significance sub-adder may include operational logic as set out in Table 7 and the cells in the uppermost row of the or each sub-adder above the lowest significance sub-adder may include operational logic as set out in Table 6.
The lowest significance column input cells of the lowest significance sub-adder may be the same as the lowest significance column input cells of the or each sub- adder above the lowest significance sub-adder.
Examples of parallel prefix adders in accordance with the present invention will now be described with reference to the accompanying drawings, in which:-
Figure 1 is a block diagram of a generic parallel prefix adder;
Figure 2 is a representation of the layout of cells in a 16-bit conventional Ladner-Fischer parallel prefix tree;
Figure 3 is a representation of the layout of cells in a 16-bit Ladner-Fischer parallel prefix tree as modified in accordance with the invention;
Figures 4 & 5 show the topology of the Ladner-Fischer parallel prefix adder including the parallel prefix tree shown in Figure 2;
Figure 6 illustrates the logic gates of the cells of the parallel prefix tree shown in Figure 5;
Figures 7a-e show the logic diagrams of the numbered cells used to make up the cells of Figures 4, 5, 8 & 9;
Figures 8 & 9 show the topology of the Ladner-Fischer parallel prefix adder including the parallel prefix tree shown in Figure 3;
Figures 10a - d show the logic diagrams of additional numbered cells used to make up the cells of Figures 8 & 9 and,
Figure 11 illustrates the logic gates of the cells of the parallel prefix tree shown in Figure 9.
Figure 12 is a modified representation of the layout of cells in a 16-bit Ladner-Fischer parallel prefix tree.
The key issue for designing a packed arithmetic prefix tree is to arrange for successive independent strings of the general pattern {CG|CK} : {CP} to be returned so that the same structure for both full wordlength (w-bit) and sub-wordlength arithmetic may be employed.
The invention recognises that the introduction of a fourth symbol, denoted CB (B for "block"), that exploits the don't care states available in the prefix tree, and which replaces the CP condition at the least significant bit (lsb) of a sub-adder, can lead to improved operation by redesign of appropriate cells.
The required characteristics of the CB condition are:
(a) unlike the CP condition, it must prevent bits with lower significances from interacting with bits with higher significances;
(b) in common with the CP condition, it must return a string of output CP conditions at the prefix tree's output from each sub-adder.
If these constraints can be met, a w-bit adder is partitionable into independent sub-wordlength adders.
The CB condition is representable by the combination ($G_z^x$, $\neg K_z^x$) = (1,0), implying that the CG condition must be represented by ($G_z^x$, $\neg K_z^x$) = (1,1), and not ($G_z^x$, $\neg K_z^x$) = (1,X). The codings for the CK and CP conditions remain unchanged. Table 4 below gives the input-output relationship for the packed prefix adder. It should be noted that Table 2 is subsumed by Table 4, indicating that ordinary addition operations are also supported if no CB conditions are introduced to the prefix tree.
Table 4 Packed prefix adder cell function
Table 5 presents the same information using the coding scheme for the different carry conditions just described.
Table 5 Packed prefix adder cell logic
The major effect of this cell's function relative to a normal prefix cell's function is to return CB conditions, ($G_z^x$, $\neg K_z^x$) = (1,0), instead of CP conditions, ($G_z^x$, $\neg K_z^x$) = (0,1), at the prefix tree's output. The logic equations described by Table 5 are:
$\neg K_z^x = \neg K_z^y \wedge (\neg K_w^x \vee G_z^y)$    (5)
Both expressions are implementable as single CMOS logic gates.
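The behaviour of the packed prefix cell may be illustrated by the sketch below. The not kill expression is that of equation (5). The group generate expression, and the symbolic rule that the upper condition decides the output unless it is CP, are assumptions inferred from characteristics (a) and (b) of the CB condition and from the codings CK = (0,0), CP = (0,1), CB = (1,0), CG = (1,1); they are not taken from Tables 4 and 5. The function names are hypothetical.

```python
# (G, nK) codings of the four carry conditions.
CK, CP, CB, CG = (0, 0), (0, 1), (1, 0), (1, 1)


def packed_cell_symbolic(hi, lo):
    """Assumed rule: hi decides the result unless it is CP, in which case lo passes."""
    return lo if hi == CP else hi


def packed_cell_logic(hi, lo):
    """Gate-level view of the same cell; hi covers bits z..y, lo covers bits w..x."""
    G_hi, nK_hi = hi
    G_lo, nK_lo = lo
    G = G_hi | (nK_hi & G_lo)      # assumed companion expression (not in the text)
    nK = nK_hi & (nK_lo | G_hi)    # equation (5)
    return (G, nK)
```

Under this reading a CB on the upper input is returned unchanged whatever the lower input is, which is the blocking property (a), and a run of CP conditions above a CB yields CB for the whole group.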
Now, because the sub-adders have a shorter wordlength than the entire adder, the bottom row of cells is logically redundant. However, in order to support the packed absolute difference operation, the output CB's must be converted to output CP's. This is readily accomplished by defining a second cell for the packed prefix adder that operates as a normal prefix cell if no CB's are input, but which also converts CB's to CP's. This cell is placed only at the foot of each column in the prefix tree, as illustrated in Figure 3, where the grey squares denote the second cell type.
The function and logic of the second cell are described in Tables 6 and 7 below. Cells with only one apparent input are reduced complexity cells whose sole function is to convert input CB conditions to output CP conditions.
Table 6 Second prefix adder cell function
Table 7 Second prefix adder cell logic
The logic equations described by Table 7 are:
$G_z^x = \neg K_z^y \wedge (G_z^y \vee G_w^x)$
Again, both expressions are implementable as single CMOS logic gates. The cells with only one input (g, $\neg k$) pair have the following simplified expressions:
i.e. compared with equation (6), $G_w^x$ is set to a logic '0', and $\neg K_w^x$ to a logic '1'. Again, both these expressions are implementable as single CMOS logic gates.
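The second cell may be sketched as follows. The group generate expression is G = not K (upper) AND (G (upper) OR G (lower)), as recited for Table 7. The not kill expression is an assumption, chosen so that the cell behaves as a normal prefix cell when no CB is input and converts a CB on its upper input to CP. The single-input variant ties the lower input to (G, not K) = (0, 1) as stated above. The function names are hypothetical.

```python
# (G, nK) codings of the four carry conditions.
CK, CP, CB, CG = (0, 0), (0, 1), (1, 0), (1, 1)


def conversion_cell(hi, lo):
    """Second cell type; hi covers bits z..y, lo covers bits w..x."""
    G_hi, nK_hi = hi
    G_lo, nK_lo = lo
    G = nK_hi & (G_hi | G_lo)      # expression recited for Table 7
    nK = G_hi | (nK_hi & nK_lo)    # assumed companion expression (not in the text)
    return (G, nK)


def conversion_cell_single(hi):
    """Reduced cell with one input pair: lower input tied to G = 0, nK = 1."""
    return conversion_cell(hi, (0, 1))
```

With these expressions a CB on the upper input always emerges as CP, while CK, CP and CG on the upper input are combined with the lower input exactly as in a normal prefix cell.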
The inputs to the adder where a CB condition may need to be injected to mark the lsb's of sub-adders require extra logic to return the correct values of (g, $\neg k$). CG and CK conditions prevent carries from lower significances interacting with carries at higher significances in any case: hence, we only need to replace input CP conditions by input CB conditions where necessary. The full table for deriving (g, $\neg k$) duples under normal or packed operation as a function of a control signal labelled mode is shown in Table 8:
Table 8 Input logic for packed prefix tree
The equations implied by Table 8 are again implementable using single CMOS logic gates whose inputs are a, b, and mode:
$\neg k = a \wedge b \vee \neg \mathrm{mode} \wedge (a \vee b)$
$g = a \wedge b \vee \mathrm{mode} \wedge (a \vee b)$    (8)
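The special input cell may be sketched directly from these two expressions. The function name `input_cell` and the tuple ordering (g, not k) are assumptions made for this illustration.

```python
def input_cell(a: int, b: int, mode: int):
    """Return (g, nk) for one bit position; mode = 1 marks a sub-adder lsb."""
    nk = (a & b) | ((1 - mode) & (a | b))
    g = (a & b) | (mode & (a | b))
    return (g, nk)
```

With mode = 0 the cell reduces to g = a AND b and not k = a OR b. With mode = 1 a propagating input (a and b differing) yields (1, 0), the CB coding, while the CG and CK inputs are unchanged at (1, 1) and (0, 0).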
Once the carry conditions (following either a full wordlength or packed arithmetic operation) emerge from the prefix tree, they must be combined with the bit propagate signals and other control signals so as to return the required results, according to equation (2). This involves supplying the correct control signals to the appropriate sub-adders so that the desired result is computed using some simple output logic.
For packed arithmetic to be supported, the topology of the prefix tree is restricted such that for any sub-adder traversing bits j to j+k (i.e. a (k+1)-bit sub-adder, whose lsb is at bit position j), the prefix tree's topology must be able to return the group signals, $C_i^j$, for values of i that satisfy j ≤ i ≤ j+k at the penultimate logic level. This restriction is to accommodate the necessary CB to CP conversions in packed arithmetic mode within an existing prefix tree. For example, the Ladner-Fischer prefix tree of Figures 2 and 3 does satisfy this restriction, permitting both 4-bit and 8-bit arithmetic to be supported.
Two examples of adders according to the invention will now be described with reference to Figures 8 to 11.
The individual cells used in the packed arithmetic prefix adder circuits illustrated in Figures 7 & 10 have been described earlier in this document. In summary, there are 5 distinct cell types in the invention, two of which have two versions:
cells 2a & 2b    Figures 7a & 7b      carry update cell
cell 3           Figure 7d            output cell
cell 5           Figure 10a           PAPA prefix cell
cells 6 & 7      Figures 10b & 10c    CB:CP conversion cells
cell 8           Figure 10d           input cell
The input to output relationships of cells 2a, 2b and 3 are described by equations (2a), (2b), and (3) respectively. In tabular form, these cells' functions may be written as shown in Tables 9 to 11:
Table 9 Cell 2a input to output relationship
Table 10 Cell 2b input to output relationship
Table 11 Cell 3 input to output relationship
The function of cell 5 was presented in Table 5, the function of cell 6 was presented in Table 7 (cell 7 is derived from cell 6 by assigning ($G_w^x$, $\neg K_w^x$) = (0,1)) and the input cell's function (cell 8) was shown in Table 8.
The operation of the packed arithmetic prefix adder circuit drawn in Figure 8 proceeds as follows: two 8-bit numbers, a(0:7) and b(0:7), are supplied on the eight pairs of a(i) and b(i) inputs. If the adder is to operate as a single 8-bit adder, mode is set to logical '0'; otherwise, if the adder is to operate as two independent 4-bit adders, mode is set to logical '1'. The input cells (types 1 and 8) convert the input bits to the appropriate carry conditions $CX_i$, for 0 ≤ i ≤ 7, where X signifies one of G, P, K, or B, as listed in Table 12.
Table 12 Conversion of input bits to carry conditions
In the prefix carry tree, the separate carry conditions are represented as pairs of bits, ($G_z^x$, $\neg K_z^x$), as shown in Table 13. Cell types 5, 6, 7 and 2 can all be described using the concept of carry conditions.
Table 13 Interpretation of ($G_z^x$, $\neg K_z^x$) bit combinations
The operation of the prefix carry tree in Figure 8 proceeds as follows: the first (lefthand-most) column of cell 5's combines the carry conditions of selected pairs of adjacent bits to form group carry conditions, $CX_{i+1}^i$. The top cell 5 combines g(0), $\neg k(0)$, g(1), and $\neg k(1)$ to form the bit-pair ($G_1^0$, $\neg K_1^0$), also referred to as $CX_1^0$; the next cell down combines g(2), $\neg k(2)$, g(3), and $\neg k(3)$ to determine $CX_3^2$; the third cell down combines g(4), $\neg k(4)$, g(5), and $\neg k(5)$ to determine $CX_5^4$; and the bottom cell combines g(6), $\neg k(6)$, g(7), and $\neg k(7)$ to determine $CX_7^6$.
The second column of cell 5's combines the outputs of the first column of cell 5's and some of the outputs of cell 1's to yield further carry conditions (reading from top to bottom): $CX_2^0$, $CX_3^0$, $CX_6^4$, and $CX_7^4$ as follows: the top cell combines $G_1^0$, $\neg K_1^0$, g(2) and $\neg k(2)$ to determine $CX_2^0$; the next cell down combines $G_1^0$, $\neg K_1^0$, $G_3^2$, and $\neg K_3^2$ to determine $CX_3^0$; the third cell down combines $G_5^4$, $\neg K_5^4$, g(6) and $\neg k(6)$ to determine $CX_6^4$; the bottom cell combines $G_5^4$, $\neg K_5^4$, $G_7^6$, and $\neg K_7^6$ to determine $CX_7^4$.
Next, the column of cell 6's and cell 7's either converts CB conditions to CP conditions if mode = 1 (introduced by cell 8) or groups the outputs from selected cell 5's (and cell 8) to return $CX_4^0$, $CX_5^0$, $CX_6^0$, and $CX_7^0$ if mode = 0. That is, if any of the four cell 7's receive a CB input condition, they will change it to a CP condition. Similarly, if the top cell 6 receives a CB input condition on the bit-pair $\neg k(4)$, g(4), it will output a CP condition on $G_4^0$, $\neg K_4^0$; the other cell 6's operate in this manner too, converting CB conditions received on $G_i^4$, $\neg K_i^4$ to CP conditions on their outputs, $G_i^0$, $\neg K_i^0$ for 5 ≤ i ≤ 7. If mode = 0, no CB conditions can occur and the cell 6's combine the outputs of cell 5's and some cell 1's to yield further carry conditions (reading from top to bottom): $CX_4^0$, $CX_5^0$, $CX_6^0$, and $CX_7^0$ as follows: the top cell combines $G_3^0$, $\neg K_3^0$, g(4) and $\neg k(4)$ to determine $CX_4^0$; the next cell down combines $G_5^4$, $\neg K_5^4$, $G_3^0$, and $\neg K_3^0$ to determine $CX_5^0$; the third cell down combines $G_6^4$, $\neg K_6^4$, $G_3^0$, and $\neg K_3^0$ to determine $CX_6^0$; the bottom cell combines $G_7^4$, $\neg K_7^4$, $G_3^0$, and $\neg K_3^0$ to determine $CX_7^0$. In all cases, no CB conditions are output from the column of cell 6's and cell 7's.
Next, the column of cell 2a's combines the carry conditions, $CX_i^0$, with the control signal inc if 1 is to be added to the result to adjust the carry conditions from $CP_i^0$ to $CG_i^0$ as needed. Finally, the last column of cell 3's XOR's the carry bits, c(i), with the bit propagate signals, p(i), to return the resultant sum bits.
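The column-by-column operation recited for Figure 8 may be modelled behaviourally as below. This is a sketch under several assumptions that are not taken from the specification: the group generate expression of cell 5 and the not kill expression of cell 6 are inferred companions to the recited expressions; the special input cells (cell 8) are placed at the positions listed in `block_lsbs`; and the carry into each sub-adder lsb in packed mode is taken to be inc, standing in for the output control logic, which is not detailed here. The names `cell5`, `cell6` and `packed_add8` are hypothetical.

```python
def cell5(hi, lo):
    """Packed prefix cell on (G, nK) pairs; nK follows equation (5)."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & (lo[1] | hi[0]))


def cell6(hi, lo):
    """CB-to-CP conversion cell; cell 7 is cell6(hi, (0, 1))."""
    return (hi[1] & (hi[0] | lo[0]), hi[0] | (hi[1] & lo[1]))


def packed_add8(a, b, mode, inc=0, block_lsbs=(0, 4)):
    """mode = 0: one 8-bit add; mode = 1: two independent 4-bit adds."""
    p, x = [], []
    for i in range(8):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        m = mode if i in block_lsbs else 0   # cell 8 at assumed lsbs, cell 1 elsewhere
        p.append(ai ^ bi)                    # p(i) is unaffected by mode
        x.append(((ai & bi) | (m & (ai | bi)),          # g,  equation (8)
                  (ai & bi) | ((1 - m) & (ai | bi))))   # nk, equation (8)
    # First column of cell 5's: CX_1^0, CX_3^2, CX_5^4, CX_7^6.
    for i, j in ((1, 0), (3, 2), (5, 4), (7, 6)):
        x[i] = cell5(x[i], x[j])
    # Second column of cell 5's: CX_2^0, CX_3^0, CX_6^4, CX_7^4.
    for i, j in ((2, 1), (3, 1), (6, 5), (7, 5)):
        x[i] = cell5(x[i], x[j])
    # Third column: cell 7's on bits 0..3, cell 6's on bits 4..7 (lower input CX_3^0).
    lo = x[3]
    x = [cell6(x[i], (0, 1)) for i in range(4)] + [cell6(x[i], lo) for i in range(4, 8)]
    s = 0
    for i in range(8):
        if i == 0 or (mode and i in block_lsbs):
            c = inc                          # assumed carry-in at each sub-adder lsb
        else:
            G, nK = x[i - 1]
            c = G | (nK & inc)               # equation (2a), cell 2a
        s |= (p[i] ^ c) << i                 # equation (3), cell 3
    return s
```

For instance, `packed_add8(0x9F, 0x11, mode=1)` gives `0xA0`, the two 4-bit sums 9 + 1 and F + 1 taken independently, whereas `packed_add8(0x9F, 0x11, mode=0)` gives `0xB0`.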
The operation of the prefix carry tree in Figure 9 proceeds in a largely similar fashion to that of Figure 8, except that there are no cell 7's in the 4th column and the control signal is abs, not inc. As before, the first (lefthand-most) column of cell 4's and cell 5's combines the carry conditions of selected pairs of adjacent bits to form group carry conditions, $CX_{i+1}^i$. The second column of cell 4's and cell 5's combines the outputs of the first column of cell 4's and cell 5's and some of the outputs of cell 1's to yield the following carry conditions (reading from top to bottom): $CX_2^0$, $CX_3^0$, $CX_6^4$, and $CX_7^4$. Again, the column of cell 6's either converts CB conditions to CP conditions if mode = 1 (introduced by cell 8) or groups the outputs from selected cell 5's (and cell 8) to return $CX_4^0$, $CX_5^0$, $CX_6^0$, and $CX_7^0$ if mode = 0. In either case, no CB conditions are output from the column of cell 6's. Next, the column of cell 2b's combines the carry conditions, $CX_i^0$, with the control signal abs to return the final carry signals, c(i+1). Finally, the last column of cell 3's XOR's the carry bits, c(i), with the bit propagate signals, p(i), to return the resultant sum bits.
The packed prefix adder has been described above using the group generate and group not kill signals, $G_z^x$ and $\neg K_z^x$. However, packed arithmetic prefix adders may also be constructed using the codings laid out in either Table 14 or Table 15 in conjunction with Tables 4 and 6, that is, any "2-out-of-3" combination of g(i), p(i), and $\neg k(i)$ may be employed in the prefix tree. However, different logic expressions from those presented in equations (3-6) would result. At the input logic stage, the same expressions for g(i) and $\neg k(i)$ are required as listed in equation (7), with p(i) unaffected by the value of mode.
Table 14 Interpretation of ($G_z^x$, $P_z^x$) bit combinations
Table 15 Interpretation of ($\neg K_z^x$, $P_z^x$) bit combinations
The topology of the trees described above may place restrictions on the placement of the special input cells.
This disadvantage can be overcome by changing the topology of the tree as shown in Fig. 12, where the black and grey squares represent the same cells as in Fig. 3. This new topology causes all of the restrictions on the topology of the tree and on the placement of the special input cells to disappear. This follows because in the adder structure, every output from the prefix cell is a function of all the input positions up to and including its own significance. Hence, provided the outputs can be restricted to the set {CP,CG,CK} as in a non-packed prefix adder, no restrictions are
placed on the location of the special input cells that introduce the CB conditions. The grey cells have the property that they only output the CB condition if they receive a CB condition on their "horizontal" input from a cell of lower significance and simultaneously receive a CP condition on their "vertical" input from the same significance.
Hence, by inspection of Fig. 12, it will be appreciated that the only way any of the grey cells can output a CB condition is if a CB condition is injected at bit 0. To prevent this from happening, the constraint is imposed that bit 0 may not have a special input cell that partitions the adder. This is not a serious constraint because the adder begins at bit 0 in any case.
Fig. 12 is a modified representation of the layout of cells in a 16-bit Ladner-Fischer parallel prefix tree. The illustrated device may be better understood by considering Fig. 12 in conjunction with Fig. 8. Relative to Fig. 8, the modification of Fig. 12 proposes: firstly, that the cell 7's are removed; secondly, that the cell 5's in rows 2 and 3 and the right hand most cell 5 in row 4 are changed to cell 6's; and finally, that any of the cell 1's may be converted to cell 8's except the one in row 1, i.e., at bit position 0.
This change permits any prefix tree topology to support packed arithmetic with partitions happening anywhere in the adder except at bit position 0.