GB2166272A

GB2166272A - Serial multiplier circuit

Info

Publication number: GB2166272A
Application number: GB08427236A
Authority: GB
Inventors: Stephen John Harrold
Original assignee: STC PLC
Current assignee: STC PLC
Priority date: 1984-10-27
Filing date: 1984-10-27
Publication date: 1986-04-30
Also published as: GB2166272B; GB8427236D0

Abstract

The delay latch of a serial multiplier module is disposed at the module input. This allows cascading of a plurality of similar modules without the need for additional circuitry. For handling bits having a negative weighting, as required by two's complement data in a multiplier implementing a shift and add algorithm, an extra input is provided. This input is connected to logic zero in all stages except the final module, where it is logic one for identification. Modules are described which implement Booth's algorithm and a modified Booth's algorithm.

Description

SPECIFICATION

Serial multiplier circuit

This invention relates to cascadable circuits for performing pipelined two's complement serial multiplication.

A number of integrated circuits have been designed for performing two's complement multiplication. In such arrangements the multiplication of two serial data words is generally performed by arranging that each bit of the multiplier (or coefficient) word is stored in a different stage of the multiplier and that the multiplicand (or data) word is then operated upon by each coefficient bit to form partial products. Summing the partial products with the correct weighting then produces the final product.

In order to be able to process a continuous stream of data words and take full advantage of the pipelining of a multiplier, the product (and all partial products) of a multiplication operation must be truncated or rounded to the same length as the multiplicand and the multiplier words. Further, if longer coefficient words are to be processed it is necessary to cascade two or more identical multiplier circuit chips.

Such chips and designs that are currently available require a modified final stage to correctly perform two's complement multiplication due to the negative weighting of the most-significant-bit (MSB). This modification, however, is only required in the final stage, and thus these chips and designs cannot be simply cascaded together.

The object of the present invention is to minimise or to overcome this disadvantage.

According to the invention there is provided a cascadable circuit module for performing pipelined two's complement serial multiplication, the circuit having means for multiplying a pair of digital words, and means whereby the circuit may be cascaded by direct coupling with a plurality of similar modules.

The circuit module may be cascaded with similar modules on a single chip, and a plurality of such chips may also be cascaded without the need for interface circuitry.

Many algorithms exist to simplify the calculation of the product of two's complement words; for example, it is possible to treat all data bits as positive and then to add a correction term to the product after its calculation. For serial multipliers three algorithms are suitable, namely the Shift-and-Add algorithm, Booth's algorithm, and a five-level modified Booth's algorithm. These algorithms are described below.

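The shift and add algorithm

A two's complement n-bit word (a_{n-1}, a_{n-2}, ..., a_0) has a value of:

$$-a_{n-1}\,2^{n-1} + \sum_{i=0}^{n-2} a_i\,2^{i}$$

In the examples that follow, the words are written as fractions with a binary point, which scales this value by a fixed power of two without changing the negative weighting of the MSB.

The multiplication of two words must therefore take into consideration the negative weighting of the MSB (a_{n-1}). This is generally taken care of by "inverting and adding one" when forming the most significant partial product (PP), and by "sign extending" the MSB of less significant partial products.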

For example:

            1.01
          x 1.11

          1.1101    note sign extension of MSB
          1.101     note sign extension of MSB
          0.11      "invert and add one"
     (10) 0.0011

The overflow bits in the final product are ignored.
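As a check on the example above, the following minimal Python sketch evaluates each binary string with the MSB weighted negatively and confirms that the three partial products sum to 0.0011. The helper name `tc_value` and the string format are illustrative assumptions, not part of the specification.

```python
def tc_value(bits: str) -> float:
    """Value of a two's complement word written like '1.01'.

    The leading bit carries a negative weight; the binary point only
    scales the result. (Hypothetical helper for illustration.)
    """
    int_part, _, frac_part = bits.partition(".")
    digits = int_part + frac_part
    raw = int(digits, 2)
    if digits[0] == "1":
        # MSB set: subtract 2**n so that the MSB is weighted negatively
        raw -= 1 << len(digits)
    return raw / (1 << len(frac_part))


# Full-precision product of the two example words
product = tc_value("1.01") * tc_value("1.11")

# The three partial products listed in the example
partial_products = [tc_value("1.1101"), tc_value("1.101"), tc_value("0.11")]

print(product, sum(partial_products), tc_value("0.0011"))
```

All three printed values agree, which is consistent with the overflow bits being discarded in the example.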

If a product is to be formed in a continuously pipelined serial multiplier, then all partial products must be restricted to n bits. This is the basis of the "shift and add" algorithm. Instead of forming all the partial products separately and then the final product in one accumulation calculation, a "partial product sum" (PPS) is formed after the calculation of each partial product. This PPS is then sign-extended and truncated and then added to the next PP at the correct weighting (by "shifting and adding") to form the next PPS, i.e.:

    1.01 x 1.11

        0.00    PPS0
        1.01    PP1 (first partial product)
        1.01    PPS1 (first partial product sum)
        1.10    Shift (sign-extend and truncate)
        1.01    PP2
    (1) 0.11    PPS2 Add (NB carry overflow is "lost")
        0.01    Shift
        0.11    PP3 ("invert and add one" for most significant partial product)
        1.00    PPS3 Add

i.e. Product = 1.00

Note that each PP and PPS is restricted to n bits throughout the calculation. This means that input and output data are the same length, and a continuous stream of data can be processed.
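The shift and add sequence described above can be sketched at word level as follows. This is a minimal illustration only: words are held as unsigned Python integers of n bits, and the function name `shift_and_add` is an assumption made for this sketch rather than anything defined in the specification.

```python
def shift_and_add(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply two n-bit two's complement words, keeping every PP and PPS
    to n bits. Words are unsigned ints in [0, 2**n). Illustrative sketch."""
    mask = (1 << n) - 1
    pps = 0
    for i in range(n):  # multiplier bits, LSB first
        # Shift: sign-extend the PPS by one place and truncate its LSB
        sign = pps >> (n - 1)
        pps = (pps >> 1) | (sign << (n - 1))
        if (multiplier >> i) & 1:
            if i == n - 1:
                # Most significant partial product: invert and add one
                pp = (~multiplicand + 1) & mask
            else:
                pp = multiplicand
            # Add; any carry out of the top bit is lost
            pps = (pps + pp) & mask
    return pps


# 1.01 x 1.11 with n = 3, as in the example above
print(format(shift_and_add(0b101, 0b111, 3), "03b"))
```

Run on the example words, the sketch returns the bit pattern 100, i.e. 1.00, reproducing the PPS3 value shown above.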

In the example given above, an error arises in the final product due to the overflow which occurred during the formation of PPS2, so that the incorrect bit was sign-extended. This overflow error can be avoided if the initial multiplicand word is sign-extended so that the multiplication becomes:

    11.01 x 1.11

        00.00    PPS0
        11.01    PP1
        11.01    PPS1
        11.10    Shift
        11.01    PP2
    (1) 10.11    PPS2 (carry overflow lost, but sign-extension is now of correct bit, '1')
        11.01    Shift
        00.11    PP3 (invert and add one)
    (1) 00.00    (ignore carry overflow)

i.e. Final product = 00.00

The truncation which occurs during the shift and add operation means that the product is also truncated (the full result should be 00.0011). If rounding of the final product is desired, a '1' should be added at an appropriate stage so that it has 1/2 LSB significance relative to the final (truncated) product. This "rounding bit" could, for example, be added to the penultimate stage, e.g.:

    11.01 x 1.11

        00.00
        11.01    PP1
        11.01    PPS1
        11.10    shift
        11.01    PP2
            1    rounding bit
    (1) 11.00    PPS2
        11.10    shift
        00.11    PP3
    (1) 00.01    Product

i.e. Rounded product = 00.01

Alternatively, the rounding bit could be added in the first stage, e.g.:

    11.01 x 1.11

        00.00
        11.01    PP1
           1     rounding bit
        11.11    PPS1
        11.11    shift
        11.01    PP2
    (1) 11.00    PPS2
        11.10    shift
        00.11    PP3
    (1) 00.01

i.e. Rounded product = 00.01

Any stage may be used for adding in the rounding bit, provided that the bit has the correct weighting significance relative to the final product.
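The effect of sign-extending the multiplicand, and of adding a rounding bit at a chosen stage, can be sketched as below. This is an illustrative word-level model under stated assumptions: the function name `shift_and_add_rounded`, the `round_stage` parameter (0 for the first stage) and the integer word representation are all introduced here for the sketch. The rounding bit is given weight 2**(n - 2 - stage) in LSB units so that, after the remaining shifts, it has 1/2 LSB significance in the final product.

```python
def shift_and_add_rounded(multiplicand, width, multiplier, n, round_stage=None):
    """Shift and add with a (sign-extended) multiplicand of `width` bits and an
    n-bit multiplier. `round_stage` (0 .. n-2, or None) selects the stage at
    which a rounding bit is added. Illustrative sketch only."""
    mask = (1 << width) - 1
    pps = 0
    for i in range(n):
        # Shift: sign-extend and truncate the partial product sum
        sign = pps >> (width - 1)
        pps = (pps >> 1) | (sign << (width - 1))
        pp = multiplicand if (multiplier >> i) & 1 else 0
        if i == n - 1 and pp:
            pp = (~multiplicand + 1) & mask  # invert and add one for the MSB
        if round_stage == i:
            # Weight chosen so the bit ends up as 1/2 LSB of the final product
            pp += 1 << (n - 2 - i)
        pps = (pps + pp) & mask  # carry overflow is ignored
    return pps


# 11.01 x 1.11: truncated, rounded at the penultimate stage, rounded at the first stage
for stage in (None, 1, 0):
    print(stage, format(shift_and_add_rounded(0b1101, 4, 0b111, 3, stage), "04b"))
```

For the example words this gives 0000 without rounding and 0001 with the rounding bit added at either stage, matching the three worked examples above.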

Booth's algorithm

The shift and add algorithm requires the stage forming the most significant PP to "invert and add one", and all other stages to simply add. One possible way of using identical stages throughout is to recode the multiplier word using Booth's algorithm. Here each bit is compared with the preceding (less significant) bit, and a new (recoded) bit is then formed. The recoded bit value is listed below:

    Multiplier bits      Recoded bit    Action
    b_i    b_{i-1}
     0       0              0           Add zero
     0       1             +1           Add
     1       0             -1           Subtract
     1       1              0           Add zero

All stages are now required to perform either "add", "subtract", or "do nothing", and can thus all be identical. Cascadability is inherent with this algorithm. Note that in order to form the least significant recoded bit, the LSB of the multiplier is compared with a '0'.
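The recoding table above amounts to taking the difference between the preceding bit and the current bit. A minimal Python sketch of that rule follows; the function name `booth_recode` and the LSB-first list representation are assumptions made for illustration.

```python
def booth_recode(multiplier: int, n: int) -> list:
    """Recode an n-bit two's complement multiplier into digits in {-1, 0, +1}.

    Digits are returned least significant first. Each digit is
    b[i-1] - b[i], with the bit below the LSB taken as 0.
    (Illustrative helper, not from the specification.)
    """
    digits = []
    prev = 0  # the LSB is compared with a '0'
    for i in range(n):
        bit = (multiplier >> i) & 1
        digits.append(prev - bit)
        prev = bit
    return digits


# Multiplier 1.11 from the examples, listed LSB first
print(booth_recode(0b111, 3))
```

Because the recoded digits carry their own signs, the weighted sum of the digits equals the signed value of the original word, which is why no special final stage is needed.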

If the Booth algorithm is combined with the shift and add algorithm, the product (and all PPs) can be restricted in length to allow a constant data flow. Again, rounding can be effected by the addition of a rounding bit of the correct significance to any stage.

e.g. 1.01 x 1.11

Recoded multiplier is 0.0(-1), where (-1) denotes a recoded digit of -1:

    1.01 x 0.0(-1)

        0.00
        0.11    PP1 ("invert and add one" for subtraction)
        0.11    PPS1
        0.01    Shift
        0.00    PP2
        0.01    PPS2
        0.00    Shift
        0.00    PP3
        0.00

i.e. (Truncated) product = 0.00

It is not necessary to sign-extend the original multiplicand to prevent errors due to carry-overflow.
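Combining Booth recoding with the shift and add sequence can be sketched as follows. This is a word-level illustration under the same assumptions as before (unsigned integers standing for n-bit words; the function name `booth_shift_and_add` is hypothetical).

```python
def booth_shift_and_add(multiplicand: int, multiplier: int, n: int) -> int:
    """Booth-recoded shift and add on n-bit words held as unsigned ints.
    Every stage does the same thing: add, subtract, or nothing. Sketch only."""
    mask = (1 << n) - 1
    pps = 0
    prev = 0  # bit below the multiplier LSB is taken as 0
    for i in range(n):
        # Shift: sign-extend and truncate the partial product sum
        sign = pps >> (n - 1)
        pps = (pps >> 1) | (sign << (n - 1))
        bit = (multiplier >> i) & 1
        digit = prev - bit  # recoded digit: +1, 0 or -1
        prev = bit
        if digit == 1:
            pps = (pps + multiplicand) & mask
        elif digit == -1:
            # Subtract by inverting and adding one
            pps = (pps + (~multiplicand + 1)) & mask
    return pps


# 1.01 x 1.11, as in the example above
print(format(booth_shift_and_add(0b101, 0b111, 3), "03b"))
```

For the example words the sketch returns 000, the truncated product 0.00 shown above, without any sign extension of the multiplicand.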

Five-level (modified Booth) algorithm

Further recoding of the multiplier can be achieved by comparing alternate bits with the preceding and the following bits. This has the advantage that the recoded multiplier has only half the number of bits of the original multiplier, and thus a serial multiplier can be constructed using half the number of stages. The recoding sequence is listed below:

    Multiplier bits              Recoded bit    Action
    b_{i+1}   b_i   b_{i-1}
       0       0       0             0          add zero
       0       0       1            +1          add
       0       1       0            +1          add
       0       1       1            +2          add x 2
       1       0       0            -2          subtract x 2
       1       0       1            -1          subtract
       1       1       0            -1          subtract
       1       1       1             0          add zero

Note that when recoding the multiplier LSB, the preceding bit is again assumed to be '0'. Furthermore, when recoding the MSB, a sign-extension bit is assumed to follow.
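The five-level table above is equivalent to computing b[i-1] + b[i] - 2*b[i+1] for every alternate bit position. A minimal sketch, assuming an LSB-first digit list and a hypothetical function name `modified_booth_recode`:

```python
def modified_booth_recode(multiplier: int, n: int) -> list:
    """Recode an n-bit two's complement multiplier into digits in
    {-2, -1, 0, +1, +2}, least significant first (one digit per two bits).
    Illustrative helper, not from the specification."""

    def bit(k: int) -> int:
        if k < 0:
            return 0           # bit below the LSB is taken as 0
        if k >= n:
            k = n - 1          # sign-extension bit above the MSB
        return (multiplier >> k) & 1

    # Examine alternate bits together with their neighbours
    return [bit(i - 1) + bit(i) - 2 * bit(i + 1) for i in range(0, n, 2)]


# Multiplier 1.11 from the examples, listed LSB first
print(modified_booth_recode(0b111, 3))
```

Each digit has a weight four times that of the previous one, so the weighted sum of the digits again equals the signed value of the original multiplier.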

e.g. 1.01 x 1.11: Recoded multiplier = 0.(-1), where (-1) denotes a recoded digit of -1.

The add/subtract x 2 operation is simply performed by shifting the multiplicand left, and then adding or subtracting.

By combining with the shift and add algorithm, the five-level algorithm can also produce a restricted bit-length product, and thus can also be used to process a continuous data flow. Since the recoding is only by alternate bits, a double-shift operation is required in order to ensure that the partial products are added with the correct weighting, e.g.:

    1.01 x 0.(-1), where (-1) is the recoded digit -1

        0.00
        0.11    PP1
        0.11    PPS1
        0.00    Shift (twice)
        0.00    PP2
        0.00

i.e. (Truncated) product = 0.00

Again, rounding can be effected by adding a rounding bit with the correct significance at any stage.
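The double-shift variant can be sketched in the same word-level style. The function name `five_level_multiply` is an assumption for this sketch, and the x 2 operation is modelled by multiplying the multiplicand by the recoded digit modulo 2**n, which is equivalent to a left shift followed by an add or subtract with the overflow discarded.

```python
def five_level_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Five-level (modified Booth) shift and add on n-bit words held as
    unsigned ints, using a double shift per stage. Illustrative sketch."""
    mask = (1 << n) - 1

    def bit(k: int) -> int:
        if k < 0:
            return 0           # bit below the LSB is taken as 0
        if k >= n:
            k = n - 1          # sign-extension bit above the MSB
        return (multiplier >> k) & 1

    def shift(word: int) -> int:
        # Sign-extend by one place and truncate the LSB
        return (word >> 1) | ((word >> (n - 1)) << (n - 1))

    pps = 0
    for i in range(0, n, 2):   # one stage per pair of multiplier bits
        pps = shift(shift(pps))
        digit = bit(i - 1) + bit(i) - 2 * bit(i + 1)
        # digit * multiplicand covers add, subtract and the x 2 cases
        pps = (pps + digit * multiplicand) & mask
    return pps


# 1.01 x 1.11, as in the example above
print(format(five_level_multiply(0b101, 0b111, 3), "03b"))
```

For the example words this returns 000 after only two stages, in agreement with the truncated product 0.00 above.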

Further levels of recoding are possible, but offer no advantage as there is no simple way of implementing add/subtract x 3 in one operation.

Embodiments of the invention will now be described with reference to the accompanying drawings in which:

Figure 1 shows the architecture of a conventional non-cascadable 3-stage serial multiplier;

Figure 2 shows the basic cascadable module;

Figure 3 shows a cascadable module for implementing the shift and add algorithm;

Figure 3a illustrates the data format of the module of Figure 3;

Figures 4 and 5 show modules for implementing respectively Booth's algorithm and the modified Booth's algorithm;

Figures 4a and 5a illustrate the data formats corresponding to the modules of Figures 4 and 5 respectively; and

Figure 6 shows a latch arrangement for cascading five-level multiplier chips.

Figure 1 shows the architecture of a 3-stage serial multiplier. The modular block is indicated by the lines x-x. Data and coefficient words (LSB-first) are clocked through the circuit, the control signal L (synchronised with the data LSB) ensuring that successive bits of the coefficient word are latched into each module. A partial product is then formed by the action of this latched coefficient bit upon the data word through the logical AND gate (PP1, PP2, PP3). A full-adder then sums this partial product with the partial product sum formed by the previous stage to form a new partial product sum. Delay latches between stages ensure that this summation occurs between bits having the same weighting factor. Note that with this architecture no truncation of the partial product sums occurs, sign extension bits must be obtained by supplying sign extension bits with the original data word, and no bits having a negative weighting can be handled. The arrangement of Figure 1 is not cascadable without the introduction of additional circuitry.

Figure 2 shows a module suitable for a serial multiplier with the modification necessary to allow a continuous stream of data words and products (pipelining), i.e. the module is cascadable. The delay latch at the PPS input can be configured by the control signal to simultaneously extend the sign of the partial product sum of one calculation and to truncate the partial product sum of the next calculation. This latch and the controlling circuitry are normally associated with the PPS output of a conventional module, and it is this position which requires that the final stage of such a conventional module be modified and thus prevents chips being cascadable (since sign-extension and truncation of the final PPS is not required). In the module shown in Figure 2 the delay latch and control circuitry are associated with the PPS input. It is this construction of the module which allows identical two's complement multiplier chips to be simply cascaded.

The switch associated with the carry-input (Cj) of the full-adder ensures that Cj=0 when the LSB of each partial product sum is formed.

The basic module shown in Figure 2 is not capable of handling bits having a negative weighting as required by two's complement data. A modification to effect this is shown in Figure 3 for a multiplier implementing the shift-and-add algorithm. Here an extra input (FLAG) is supplied, and this input is tied to logic zero in all stages except the final module where it is logic one, and thus identifies the final stage. The partial product sum formed in the final stage is then inverted by the exclusive-OR gate, and the switch associated with the Cj input of the full-adder ensures that a '1' is added as required by the shift-and-add algorithm described earlier.
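The role of the FLAG input can be illustrated with a word-level behavioural sketch of identical modules cascaded together. This is not a gate-level model of Figure 3: the function names `module` and `cascade` are hypothetical, whole words are processed at once rather than bit-serially, and the exclusive-OR is assumed to act on the partial product entering the adder, as the "invert and add one" step of the shift-and-add algorithm requires.

```python
def module(pps_in: int, data: int, coeff_bit: int, flag: int, width: int) -> int:
    """One stage: sign-extend and truncate at the PPS *input*, then add the
    partial product. With flag = 1 the partial product is inverted (XOR)
    and a '1' is supplied as carry-in. Behavioural sketch only."""
    mask = (1 << width) - 1
    # Input latch: sign-extend and truncate the incoming partial product sum
    sign = pps_in >> (width - 1)
    shifted = (pps_in >> 1) | (sign << (width - 1))
    # AND gate forms the partial product; XOR with FLAG inverts it
    pp = (data if coeff_bit else 0) ^ (mask if flag else 0)
    # Full adder with carry-in equal to FLAG; carry out is discarded
    return (shifted + pp + flag) & mask


def cascade(data: int, coeff: int, n_stages: int, width: int) -> int:
    """Chain identical modules; FLAG is 1 only on the final stage."""
    pps = 0
    for i in range(n_stages):
        flag = 1 if i == n_stages - 1 else 0
        pps = module(pps, data, (coeff >> i) & 1, flag, width)
    return pps


# Sign-extended data word 11.01 and coefficient 1.11
print(format(cascade(0b1101, 0b111, 3, 4), "04b"))
```

Because the shift is applied at each module's input, the output of the last module needs no further treatment, and the same `module` function serves every position in the chain; only the FLAG value differs.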

Figure 3a shows the data format of a multiplier using the module of Figure 3.

Figures 4 and 5 show modules, derived from the basic module of Figure 2, for implementing Booth's algorithm and the modified Booth's algorithm, the corresponding data formats being shown in Figures 4a and 5a respectively. Note that the important feature in Figures 2, 3, 4 and 5 is the position of the delay latch responsible for performing the sign-extension and truncation, which makes two's complement multipliers formed from these modules fully cascadable, although the modified Booth's algorithm version requires extra latches as shown in Figure 6. Many other possible implementations exist; for instance, the control lines L and R may be merged if desired. In all these constructions sign-extension and truncation are performed at the input of each module rather than the output. This allows chips for two's complement multiplication to be cascaded so that long coefficient wordlengths can be handled without difficulty.

The module arrangements of Figures 2 to 5 may be employed in a variety of applications, but they are particularly suitable for the construction of GaAs cascadable serial multiplier integrated circuits.

Claims

1. A cascadable circuit module for performing pipelined two's complement serial multiplication, the circuit having means for multiplying a pair of digital words, and means whereby the circuit may be cascaded by direct coupling with a plurality of similar modules.

2. A circuit module as claimed in claim 1, wherein said cascading means includes an input delay latch.

3. A circuit module as claimed in claim 1 or 2 and adapted to implement Booth's algorithm or a modified Booth's algorithm.

4. A cascadable circuit module substantially as described herein with reference to Figure 2 together with any one of Figure 3, Figure 4 or Figure 5 of the accompanying drawings.

5. A serial multiplier incorporating a plurality of cascaded modules as claimed in any one of claims 1 to 4.