CN107977191B - Low-power-consumption parallel multiplier - Google Patents

Publication number
CN107977191B
Authority
CN
China
Prior art keywords
transistor
bit
gate
multiplier
partial product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610920203.1A
Other languages
Chinese (zh)
Other versions
CN107977191A (en)
Inventor
陈岚
张琦
吴玉平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Microelectronics of CAS
Original Assignee
Institute of Microelectronics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Microelectronics of CAS
Priority to CN201610920203.1A
Publication of CN107977191A
Application granted
Publication of CN107977191B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G06F7/527 Multiplying only in serial-parallel fashion, i.e. one operand being entered serially and the other in parallel
    • G06F7/5277 Multiplying only in serial-parallel fashion, i.e. one operand being entered serially and the other in parallel with column wise addition of partial products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G06F7/533 Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even
    • G06F7/5332 Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by skipping over strings of zeroes or ones, e.g. using the Booth Algorithm

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a low-power-consumption parallel multiplier comprising a partial product generation module, a partial product compression module and a carry skip adder. The partial product generation module comprises a Booth coding circuit and a decoding circuit, wherein the Booth coding circuit codes adjacent bit values of a first multiplier into target parameters, and the decoding circuit decodes bit values of a second multiplier and the target parameters into partial products. The partial product generation module reduces the number of partial products by half, which greatly saves multiplier circuit area and improves the operation speed of the multiplier circuit. The partial product compression module comprises a one-bit full adder and a summation circuit, wherein the one-bit full adder outputs an inverted value of the carry according to the partial products, and the summation circuit adds the partial products to generate two target partial products with different weights and outputs them to a lower-level compression module, which greatly improves the speed of compressing the partial products. The carry skip adder includes a plurality of CSA modules for obtaining a target product.

Description

Low-power-consumption parallel multiplier
Technical Field
The invention relates to the technical field of integrated circuit design, in particular to a low-power-consumption parallel multiplier.
Background
With the increasing demand for portable mobile devices, low power design has become a major requirement for integrated circuit design. The multiplier is an important arithmetic unit in devices such as a processor, a filter, a Digital Signal Processor (DSP), and the like, and the calculation speed thereof directly determines the performance of the processor.
At present, as shown in fig. 1, a commonly used parallel multiplier generally uses a Booth coding algorithm to generate partial products in parallel, then performs accumulation compression on all the obtained partial products to obtain two partial products, and then adds the two partial products by an adder to obtain a final product.
The inventor finds that the algorithm of the existing parallel multiplier is complex, its circuit structure is complex, and it occupies a large area. Therefore, how to provide a multiplier that combines a simple circuit structure and a fast calculation speed with low power consumption is a major technical problem to be solved urgently at present.
Disclosure of Invention
In view of this, the present invention provides a low power consumption parallel multiplier, which has a simple circuit structure, a fast calculation speed and low power consumption.
In order to achieve the purpose, the invention provides the following technical scheme:
a low power consumption parallel multiplier comprising: a partial product generation module, a partial product compression module and a carry skip adder,
the partial product generation module comprises a Booth coding circuit and a decoding circuit, wherein the Booth coding circuit codes adjacent bit values of a first multiplier into a target parameter, and the decoding circuit decodes bit values of a second multiplier and the target parameter into a partial product;
the partial product compression module comprises a one-bit full adder and a summation circuit, wherein the one-bit full adder outputs an inverted value of a carry bit according to the partial products, the summation circuit adds the partial products to generate two target partial products with different weights, and the generated target partial products are output to a lower-level compression module;
the carry skip adder comprises a plurality of CSA modules, and each CSA module comprises a plurality of the one-bit full adders for obtaining the target product.
Preferably, the Booth encoding circuit includes: a first exclusive-OR gate, a first exclusive-NOR gate and a second exclusive-NOR gate,
a first bit value of the first multiplier and a second bit value of the first multiplier are respectively used as input ends of the first exclusive-OR gate, and an output end of the first exclusive-OR gate is used for outputting a first target parameter;
the second bit value of the first multiplier and the third bit value of the first multiplier are respectively used as input ends of the first exclusive-NOR gate, and the output end of the first exclusive-NOR gate is used for outputting a second target parameter;
a first bit value of the first multiplier and a second bit value of the first multiplier are respectively used as input ends of the second exclusive-NOR gate, and an output end of the second exclusive-NOR gate is used for outputting a third target parameter;
a third bit value of the first multiplier is taken as a fourth target parameter.
Preferably, the decoding circuit includes: a third exclusive-OR gate, a fourth exclusive-OR gate, a first NAND gate, a second NAND gate and a third NAND gate,
the first bit value of the second multiplier and the fourth target parameter are used as input ends of the third exclusive-OR gate;
a second bit value of the second multiplier and the fourth target parameter are used as input ends of the fourth exclusive-OR gate;
the output end of the third exclusive-or gate, the second target parameter and the first target parameter are used as the input end of the first nand gate;
the output end of the fourth exclusive-or gate and the third target parameter are used as the input end of the second nand gate;
and the output end of the first NAND gate and the output end of the second NAND gate are used as the input ends of the third NAND gate, and the output end of the third NAND gate is used for outputting the partial product.
Preferably, the logic output of the one-bit full adder is:
S=A⊕B⊕Ci
Co=AB+Ci(A+B)
wherein S is a target partial product, A is a first partial product, B is a second partial product, Ci is a third partial product, and Co is the carry value of the partial products.
Preferably, the one-bit full adder includes: a first transistor, a second transistor, a third transistor, a fourth transistor, a fifth transistor, a sixth transistor, a seventh transistor, an eighth transistor, a ninth transistor, and a tenth transistor,
the source electrode of the first transistor, the source electrode of the second transistor and the source electrode of the seventh transistor are connected and are connected with Vcc;
the drain electrode of the first transistor, the drain electrode of the second transistor and the source electrode of the third transistor are connected;
the drain electrode of the seventh transistor is connected with the source electrode of the eighth transistor;
the drain electrode of the third transistor, the drain electrode of the eighth transistor, the drain electrode of the fourth transistor and the drain electrode of the ninth transistor are connected, and the common connection end is used for outputting an inverted value of the carry bit;
the source electrode of the fourth transistor, the drain electrode of the fifth transistor and the drain electrode of the sixth transistor are connected;
a source of the ninth transistor is connected to a drain of the tenth transistor;
the source electrode of the fifth transistor, the source electrode of the sixth transistor and the source electrode of the tenth transistor are all grounded;
the grid electrode of the third transistor is connected with the grid electrode of the fourth transistor and is used as the input end of the third partial product;
the grid electrode of the first transistor, the grid electrode of the seventh transistor, the grid electrode of the fifth transistor and the grid electrode of the ninth transistor are used as input ends of the first partial product;
and the grid electrode of the second transistor, the grid electrode of the eighth transistor, the grid electrode of the sixth transistor and the grid electrode of the tenth transistor are used as input ends of the second partial product.
Preferably, the summing circuit includes: a fifth xor gate and a sixth xor gate,
the first partial product and the second partial product are used as input terminals of the fifth exclusive-or gate, the output terminal of the fifth exclusive-or gate and the third partial product are used as input terminals of the sixth exclusive-or gate, and the output terminal of the sixth exclusive-or gate is used for outputting the target partial product.
Preferably, said carry skip adder comprises four 8-bit modules, each 8-bit module comprises two 4-bit CSA modules, and each CSA module comprises four of said one-bit full adders and a 2-input data selector.
Preferably, four of the one-bit full adders are cascaded, and if the input of the previous full adder is not inverted, the input of the current full adder is inverted.
Preferably, each one-bit full adder in the 4-bit CSA module generates a carry-propagate signal, and the four carry-propagate signals are subjected to an AND operation whose output is used as the control end of the 2-input data selector; the carry input of the lowest-order full adder and the carry output of the highest-order full adder in the 4-bit CSA module are used as the input ends of the 2-input data selector, and the output end of the 2-input data selector is used as the carry output of the 4-bit CSA module.
Compared with the prior art, the technical scheme provided by the invention has the following advantages:
The invention provides a low-power-consumption parallel multiplier comprising a partial product generation module, a partial product compression module and a carry skip adder. The partial product generation module comprises a Booth coding circuit and a decoding circuit, wherein the Booth coding circuit codes adjacent bit values of a first multiplier into target parameters, and the decoding circuit decodes bit values of a second multiplier and the target parameters into partial products. The partial product generation module reduces the number of partial products by half, which greatly saves multiplier circuit area, improves the operation speed of the multiplier circuit, and is easy to implement in VLSI. The partial product compression module comprises a one-bit full adder and a summation circuit, wherein the one-bit full adder outputs an inverted value of the carry according to the partial products, and the summation circuit adds the partial products to generate two target partial products with different weights and outputs them to a lower-level compression module; this greatly improves the speed of compressing the partial products. The carry skip adder comprises a plurality of CSA modules, each comprising a plurality of one-bit full adders and a 2-input data selector, and is used for obtaining the target product. Because the partial product compression module adopts a one-bit full adder, the circuit structure is simple while fast calculation is achieved, and extremely low power consumption is realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a diagram illustrating a multiplier according to the prior art;
fig. 2 is a schematic structural diagram of a low power consumption parallel multiplier according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a specific structure of a Booth encoding circuit provided in this embodiment;
fig. 4 is a schematic diagram of a specific structure of a decoding circuit according to this embodiment;
fig. 5 is a schematic circuit diagram of a 1-bit full adder provided in this embodiment;
fig. 6 is a schematic diagram of a specific structure of a summing circuit according to this embodiment;
FIG. 7 is a schematic structural diagram of a 4:2 compressor employed in the present embodiment;
FIG. 8 is a schematic diagram of a compressor assembly employed in the present embodiment;
FIG. 9 is a schematic diagram of another compressor combination used in the present embodiment;
fig. 10 is a schematic diagram of another specific structure of the output circuit provided in this embodiment;
fig. 11 is a schematic circuit diagram of an xor gate provided in this embodiment;
fig. 12 is a schematic circuit diagram of a circuit for implementing a four-input and gate function by combining a nand gate and a nor gate according to the present embodiment;
fig. 13 is a circuit diagram of a 2-input MUX according to the present embodiment;
fig. 14 is a schematic circuit diagram of a first 4-bit CSA module in the 8-bit module according to the present embodiment;
fig. 15 is a schematic circuit diagram of an 8-bit module according to the present embodiment;
fig. 16 is a schematic circuit diagram of the 32-bit module according to this embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a low-power-consumption parallel multiplier comprising a partial product generation module, a partial product compression module and a carry skip adder. The partial product generation module comprises a Booth coding circuit and a decoding circuit, wherein the Booth coding circuit codes adjacent bit values of a first multiplier into target parameters, and the decoding circuit decodes bit values of a second multiplier and the target parameters into partial products. The partial product generation module reduces the number of partial products by half, which greatly saves multiplier circuit area, improves the operation speed of the multiplier circuit, and is easy to implement in VLSI. The partial product compression module comprises a one-bit full adder and a summation circuit, wherein the one-bit full adder outputs an inverted value of the carry according to the partial products, and the summation circuit adds the partial products to generate two target partial products with different weights and outputs them to a lower-level compression module; this greatly improves the speed of compressing the partial products. The carry skip adder comprises a plurality of CSA modules, each comprising a plurality of one-bit full adders and a 2-input data selector, and is used for obtaining the target product. Because the partial product compression module adopts a one-bit full adder, the circuit structure is simple while fast calculation is achieved, and extremely low power consumption is realized.
Referring to fig. 2, fig. 2 is a schematic diagram illustrating a low power consumption parallel multiplier according to the present embodiment. The parallel multiplier comprises: a partial product generation module 10, a partial product compression module 20, and a carry skip adder 30.
The partial product generation module comprises a Booth coding circuit and a decoding circuit, wherein the Booth coding circuit codes adjacent bit values of a first multiplier into a target parameter, and the decoding circuit decodes bit values of a second multiplier and the target parameter into a partial product; the partial product generating module reduces the number of partial products by half, greatly saves the area of a multiplier circuit, improves the operation speed of the multiplier circuit and is easy to realize VLSI.
The partial product compression module comprises a one-bit full adder and a summation circuit, wherein the one-bit full adder outputs an inverted value of a carry bit according to the partial products, and the summation circuit adds the partial products to generate two target partial products with different weights and outputs them to a lower-level compression module; the partial product compression module greatly increases the speed of compressing the partial products.
The carry skip adder comprises a plurality of CSA modules, each comprising a plurality of one-bit full adders and a 2-input data selector, and is used for obtaining the target product. Because the partial product compression module adopts the one-bit full adder, fast calculation is achieved with a simple circuit structure and extremely low power consumption.
On the basis of the foregoing embodiments, the present embodiment provides a specific structure of a Booth encoding circuit, as shown in fig. 3, including: a first exclusive-OR gate I1, a first exclusive-NOR gate I2, and a second exclusive-NOR gate I3,
a first bit value B_{2i-1} of the first multiplier and a second bit value B_{2i} of the first multiplier are respectively used as input ends of the first exclusive-OR gate I1, and the output end of the first exclusive-OR gate I1 is used for outputting a first target parameter X2;
a second bit value B_{2i} of the first multiplier and a third bit value B_{2i+1} of the first multiplier are respectively used as input ends of the first exclusive-NOR gate I2, and the output end of the first exclusive-NOR gate I2 is used for outputting a second target parameter Z;
a first bit value B_{2i-1} of the first multiplier and a second bit value B_{2i} of the first multiplier are respectively used as input ends of the second exclusive-NOR gate I3, and the output end of the second exclusive-NOR gate I3 is used for outputting a third target parameter X1;
a third bit value B_{2i+1} of the first multiplier is taken as a fourth target parameter Neg.
In addition, the present embodiment provides a decoding circuit, as shown in fig. 4, including: a third exclusive-OR gate I4, a fourth exclusive-OR gate I5, a first NAND gate I6, a second NAND gate I7 and a third NAND gate I8,
a first bit value A_{j-1} of the second multiplier and the fourth target parameter Neg are used as input ends of the third exclusive-OR gate I4;
a second bit value A_{j} of the second multiplier and the fourth target parameter Neg are used as input ends of the fourth exclusive-OR gate I5;
the output end of the third exclusive-OR gate I4, the second target parameter Z and the first target parameter X2 are used as input ends of the first NAND gate I6;
the output end of the fourth exclusive-OR gate I5 and the third target parameter X1 serve as input ends of the second NAND gate I7;
the output end of the first NAND gate I6 and the output end of the second NAND gate I7 serve as the input ends of the third NAND gate I8, the output end of which is used for outputting the partial product PP_{ij}.
With reference to fig. 3 and 4, the truth table of the above circuit diagram is as follows:
TABLE 1

B_{2i+1}  B_{2i}  B_{2i-1}  Func  Neg  Z  X1  X2
0         0       0         0     0    1  1   0
0         0       1         +A    0    1  0   1
0         1       0         +A    0    0  0   1
0         1       1         +2A   0    0  1   0
1         0       0         -2A   1    0  1   0
1         0       1         -A    1    0  0   1
1         1       0         -A    1    1  0   1
1         1       1         0     1    1  1   0
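The encoding in Table 1 can be checked with a short behavioral sketch. The gate functions below are inferred from the table columns (Neg copies B_{2i+1}, Z and X1 behave as exclusive-NOR outputs, X2 as an exclusive-OR output); the function name `booth_encode` and the dictionary `TABLE_1` are illustrative names, not taken from the patent.

```python
from itertools import product

def booth_encode(b_hi, b_mid, b_lo):
    """Encode (B_{2i+1}, B_{2i}, B_{2i-1}) into (Neg, Z, X1, X2) as listed in Table 1."""
    neg = b_hi                  # Neg is B_{2i+1} itself
    z = 1 - (b_hi ^ b_mid)      # exclusive-NOR of B_{2i+1} and B_{2i}
    x1 = 1 - (b_mid ^ b_lo)     # exclusive-NOR of B_{2i} and B_{2i-1}
    x2 = b_mid ^ b_lo           # exclusive-OR of B_{2i} and B_{2i-1}
    return neg, z, x1, x2

# Rows of Table 1: input bits -> (Neg, Z, X1, X2)
TABLE_1 = {
    (0, 0, 0): (0, 1, 1, 0),
    (0, 0, 1): (0, 1, 0, 1),
    (0, 1, 0): (0, 0, 0, 1),
    (0, 1, 1): (0, 0, 1, 0),
    (1, 0, 0): (1, 0, 1, 0),
    (1, 0, 1): (1, 0, 0, 1),
    (1, 1, 0): (1, 1, 0, 1),
    (1, 1, 1): (1, 1, 1, 0),
}

for bits in product((0, 1), repeat=3):
    print(bits, booth_encode(*bits))
```

Every row printed by the loop matches the corresponding row of Table 1.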
As is well known, the size of the multiplier is determined by both the Booth encoding unit and the partial product compression unit. The number of transistors required by the Booth encoding unit has a great influence on the total area of the multiplier. The encoding described here halves the number of partial products while occupying very little area and adding very little delay, and it keeps the circuit structure regular and simple, which makes it easier to implement in VLSI.
It should be noted that the truth table takes the multiplication of two N-bit numbers as an example, wherein the multiplicand and the multiplier are denoted A and B, with bits A_{i} and B_{i} respectively.
In the above table, Neg, Z, X1 and X2 are encoded from 3 adjacent bits of the multiplier B, and Func is the partial product selected. Decoding these signals yields the partial product PP_{ij}.
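The Func column corresponds to the usual radix-4 Booth digit, -2·B_{2i+1} + B_{2i} + B_{2i-1}. The following sketch, which assumes 16-bit two's-complement operands and uses illustrative function names, shows that summing the shifted Func values reproduces the product with only N/2 partial products.

```python
def booth_digits(b, n_bits=16):
    """Radix-4 Booth digits (each in -2..2) of an n_bits two's-complement pattern."""
    bits = [(b >> k) & 1 for k in range(n_bits)]
    digits = []
    prev = 0  # B_{-1} is taken as 0
    for i in range(0, n_bits, 2):
        lo, mid, hi = prev, bits[i], bits[i + 1]
        digits.append(-2 * hi + mid + lo)   # matches the Func column of Table 1
        prev = hi
    return digits

def booth_multiply(a, b, n_bits=16):
    """Return (product, number of partial products) for signed a and b."""
    pattern = b & ((1 << n_bits) - 1)       # two's-complement bit pattern of b
    digits = booth_digits(pattern, n_bits)
    partial_products = [(d * a) * 4 ** i for i, d in enumerate(digits)]
    return sum(partial_products), len(partial_products)

product_value, count = booth_multiply(1234, -567)
print(product_value, count)
```

For 16-bit operands the sketch forms 8 partial products instead of 16.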
In addition, the partial product compression module has a function of performing an accumulation operation of all partial products, wherein a widely used compression method is a Wallace Tree (Wallace Tree) structure, the structure groups the partial products in rows, each row corresponds to a group of adders, the addition operation of each row is performed simultaneously, a carry generated by a previous row is transmitted to a next row, and a new partial product is generated. The new partial product is simplified in the same way until the last two rows of partial products remain, and then the products are obtained by adding with a fast adder. The Wallace tree structure has the advantage of high operation speed and is suitable for multiplication operation with more than 16 bits.
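As a rough behavioral illustration of this reduction, the sketch below repeatedly applies bitwise 3:2 compression to groups of three rows until two rows remain. It models rows as non-negative Python integers and does not reproduce the wiring of any particular Wallace tree; the function names are illustrative.

```python
def compress_3_2(a, b, c):
    """Bitwise 3:2 compression: three rows in, a sum row and a carry row out."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (c & (a | b))) << 1   # carries carry twice the weight
    return sum_row, carry_row

def reduce_to_two(rows):
    """Compress a list of non-negative rows until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3
        next_rows = []
        for k in range(0, grouped, 3):
            next_rows.extend(compress_3_2(*rows[k:k + 3]))
        next_rows.extend(rows[grouped:])         # leftover rows pass through
        rows = next_rows
    return rows

rows = [13, 7, 250, 1, 99, 1024, 5, 77]
final_rows = reduce_to_two(rows)
print(final_rows, sum(final_rows) == sum(rows))
```

The two remaining rows are then added by a fast adder, which in this document is the carry skip adder.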
In compressor array structures, the commonly used compressor types are 3:2 compressors, 4:2 compressors and higher-order compressors. Of these, the 3:2 compressor is the most basic, while the 4:2 compressor is the most widely used because its structure is very regular and its 2:1 compression ratio is favorable. Although higher-order compressors have a higher compression ratio, their structure is too complex and their wiring is irregular.
The 3:2 compressor works by adding 3 partial products of the same bit by an adder to generate 2 partial products with different weights, and then outputting the partial products to the next compressor, wherein the compression ratio is 3: 2. The 3:2 compressor in the invention adopts a 1-bit full adder structure, and the logic output of the 1-bit full adder is as follows:
S=A⊕B⊕Ci
Co=AB+Ci(A+B)
wherein S is a target partial product, A is a first partial product, B is a second partial product, Ci is a third partial product, and Co is the carry value of the partial products.
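A minimal check of these logic outputs, taking S as the exclusive-OR of the three inputs (as produced by the two cascaded exclusive-OR gates of the summing circuit) and Co as given above; `full_adder` is an illustrative name.

```python
from itertools import product

def full_adder(a, b, ci):
    """One-bit full adder used as a 3:2 compressor."""
    s = a ^ b ^ ci                   # S = A xor B xor Ci
    co = (a & b) | (ci & (a | b))    # Co = AB + Ci(A + B)
    return s, co

# The three equally weighted inputs are replaced by two outputs of weights 1 and 2.
for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    print(a, b, ci, "->", s, co, a + b + ci == s + 2 * co)
```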
As shown in fig. 5, the present embodiment provides a circuit structure of a 1-bit full adder, where the one-bit full adder includes: a first transistor, a second transistor, a third transistor, a fourth transistor, a fifth transistor, a sixth transistor, a seventh transistor, an eighth transistor, a ninth transistor, and a tenth transistor,
the source electrode of the first transistor, the source electrode of the second transistor and the source electrode of the seventh transistor are connected and are connected with Vcc;
the drain electrode of the first transistor, the drain electrode of the second transistor and the source electrode of the third transistor are connected;
the drain electrode of the seventh transistor is connected with the source electrode of the eighth transistor;
the drain electrode of the third transistor, the drain electrode of the eighth transistor, the drain electrode of the fourth transistor and the drain electrode of the ninth transistor are connected, and the common connection end is used for outputting an inverted value of the carry bit;
the source electrode of the fourth transistor, the drain electrode of the fifth transistor and the drain electrode of the sixth transistor are connected;
a source of the ninth transistor is connected to a drain of the tenth transistor;
the source electrode of the fifth transistor, the source electrode of the sixth transistor and the source electrode of the tenth transistor are all grounded;
the grid electrode of the third transistor is connected with the grid electrode of the fourth transistor and is used as the input end of the third partial product;
the grid electrode of the first transistor, the grid electrode of the seventh transistor, the grid electrode of the fifth transistor and the grid electrode of the ninth transistor are used as input ends of the first partial product;
and the grid electrode of the second transistor, the grid electrode of the eighth transistor, the grid electrode of the sixth transistor and the grid electrode of the tenth transistor are used as input ends of the second partial product.
As can be seen from the figure, the one-bit full adder provided by this scheme adopts a mirror circuit structure to obtain the inverted value of the carry. As soon as the inputs arrive, the inverted carry is available, with a delay equal to that of a minimum-sized inverter. In addition, the pull-up and pull-down networks each have only two transistors in series, which provides a good Ion/Ioff ratio.
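The connection list above can be checked with a simple switch-level sketch. It assumes that the transistors tied toward Vcc (first, second, third, seventh and eighth) are PMOS devices that conduct on a 0 input, and that the transistors tied toward ground (fourth, fifth, sixth, ninth and tenth) are NMOS devices that conduct on a 1 input; the text does not state the device types, so this is an inference from the supply connections.

```python
from itertools import product

def mirror_carry_bar(a, b, ci):
    """Switch-level model of the ten-transistor network; returns the inverted carry."""
    na, nb, nci = 1 - a, 1 - b, 1 - ci
    # Pull-up: (T1 parallel T2) in series with T3, in parallel with T7 in series with T8.
    pull_up = ((na | nb) & nci) | (na & nb)
    # Pull-down: T4 in series with (T5 parallel T6), in parallel with T9 in series with T10.
    pull_down = ((a | b) & ci) | (a & b)
    if pull_up == pull_down:
        raise ValueError("output would float or be shorted")
    return pull_up

for a, b, ci in product((0, 1), repeat=3):
    co = (a & b) | (ci & (a | b))
    print(a, b, ci, "inverted carry:", mirror_carry_bar(a, b, ci), "Co:", co)
```

For every input combination exactly one network conducts, and the output is the complement of Co=AB+Ci(A+B).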
On the basis of the above, the present embodiment further provides a specific structure of the summing circuit, as shown in fig. 6, the summing circuit including: a fifth exclusive-OR gate and a sixth exclusive-OR gate.
Specifically, the first partial product and the second partial product are used as input terminals of the fifth exclusive or gate, the output terminal of the fifth exclusive or gate and the third partial product are used as input terminals of the sixth exclusive or gate, and the output terminal of the sixth exclusive or gate is used for outputting the target partial product.
The inventor considers that:
the 3:2 compressor, when used, is typically fitted with other types of compressors, such as a 4:2 compressor. The 5 inputs of the 4:2 compressor comprise 4 partial product signals and 1 carry signal of the previous stage to the current stage, and the 3 output signals comprise 2 output signals of the current stage and 1 carry output signal output to the next stage compressor structure. Thus, a 4:2 compressor is also referred to as a 5:3 compressor. The 4:2 compressor has the advantages of good compression ratio and regular circuit structure. The 4:2 compressor employed in this embodiment is a structure in which a selector and an exclusive or gate are combined, as shown in fig. 7.
As can be seen from fig. 7, the compressor with this structure has substantially uniform delays in three paths, and is well balanced in delay and area, and is a very ideal 4:2 compressor structure.
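The input/output behavior of a 4:2 compressor can be illustrated with a behavioral model built from two cascaded full adders. This is a substitute model chosen for clarity, not the selector and exclusive-OR structure of fig. 7, and the signal names are illustrative.

```python
from itertools import product

def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (c & (a | b))

def compressor_4_2(x1, x2, x3, x4, cin):
    """Behavioral 4:2 (5:3) compressor: returns (sum, carry, cout)."""
    s1, cout = full_adder(x1, x2, x3)   # cout does not depend on cin
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout

# The sum output has weight 1; carry and cout both have weight 2.
ok = all(
    x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout)
    for x1, x2, x3, x4, cin in product((0, 1), repeat=5)
    for s, carry, cout in [compressor_4_2(x1, x2, x3, x4, cin)]
)
print(ok)
```

Because cout is independent of cin, carries do not ripple along a row of such compressors.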
As shown in fig. 2, the second-order booth encoding algorithm generates 8 partial products, the number of numbers to be added in each column varies from 2 to 8 when accumulation is performed, and if 4:2 compressors are all used, resource waste occurs, so a mixed structure array combining 3:2 compressors and 4:2 compressors is used in the present invention. When the number of added numbers is 8, the compressor configuration is as shown in fig. 8, and when the number of added numbers is 6 to 7, the compressor configuration is as shown in fig. 9. When the number of added numbers is smaller, the compressor unit needs only one 4:2 compressor or only one 3:2 compressor.
On the basis of the above embodiments, the carry skip adder employed in this embodiment includes four 8-bit modules, where each 8-bit module includes two 4-bit CSA modules, and each CSA module includes four one-bit full adders and a 2-input data selector. Preferably, four of the one-bit full adders are cascaded, and if the input of the previous full adder is not inverted, the input of the current full adder is inverted.
Specifically, the carry skip adder may be divided into a number of small blocks, each block being a 4-bit modified adder structure, and the blocks being connected to form a 32-bit adder. The adder adopts modules with the same size, so that the complexity, the non-modularity and the high energy consumption caused by different sizes in low voltage are avoided. Connecting 4 1-bit full adders in series can obtain a 4-bit CSA module.
Each one-bit full adder in the 4-bit CSA module generates a carry-propagate signal, and the four carry-propagate signals are subjected to an AND operation. The resulting output is used as the control end of the 2-input data selector; the carry input of the lowest-bit full adder and the carry output of the highest-bit full adder in the 4-bit CSA module are used as the input ends of the 2-input data selector, and the output end of the 2-input data selector is used as the carry output of the 4-bit CSA module.
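A behavioral sketch of one such 4-bit block is given below. It assumes the propagate signal of each full adder is the exclusive-OR of its two operand bits, and it ignores the carry inversions used in the actual circuit; `csa_block_4bit` is an illustrative name.

```python
def csa_block_4bit(a, b, cin):
    """4-bit carry-skip block: returns (4-bit sum, carry out)."""
    carry = cin
    total = 0
    p_all = 1
    for k in range(4):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        p = ak ^ bk                                  # carry-propagate of bit k
        total |= (p ^ carry) << k                    # sum bit
        carry = (ak & bk) | (carry & (ak | bk))      # ripple carry
        p_all &= p                                   # AND of the four propagates
    # 2-input data selector: skip the ripple chain when every bit propagates.
    cout = cin if p_all else carry
    return total, cout

print(csa_block_4bit(0b1010, 0b0101, 1))
```

When all four propagate signals are 1, the block's carry output equals its carry input, so the selector can forward the carry without waiting for the ripple chain.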
It should be noted that the 1-bit full adder used in this embodiment additionally provides a carry-propagate output P, and its logic outputs are:
Figure BDA0001135553590000101
The logic circuit that generates the outputs P and S is shown in Fig. 10.
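For reference, one form of the full-adder outputs that is consistent with the carry expression given in claim 4 and with the two-XOR summing circuit of claim 6 is sketched below. The expression for P is an assumption based on the standard definition of a carry-propagate signal; the equation image itself is not reproduced in this text.

```latex
\begin{aligned}
P   &= A \oplus B \\
S   &= P \oplus C_i = A \oplus B \oplus C_i \\
C_o &= AB + C_i\,(A + B)
\end{aligned}
```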
Specifically, the XOR gates use pass-transistor logic to generate P and S. Notably, all XOR gate outputs are buffered, because it is important to maintain a high on/off current ratio in the subthreshold region. Without buffering, the V_OH of the output S would be less than 0.9 V_DD and the drive current would be small, resulting in a slow circuit. The circuit schematic of the XOR gate is shown in Fig. 11.
Further, the 4-bit CSA module is formed by cascading four one-bit full adders (FA). Inverting the inverted carry output of the previous stage a second time would reduce the speed of the whole circuit. To avoid this, in those FA stages whose carry input is the inverted carry, the inputs A and B are inverted as well; the P value is then unchanged, and only the S value is inverted.
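The claim that inverting the inputs leaves P unchanged and only inverts S (and the carry) can be checked exhaustively. The sketch below assumes the standard full-adder equations P = A xor B, S = P xor Ci and Co = AB + Ci(A + B); the function names are illustrative.

```python
from itertools import product


def full_adder(a, b, ci):
    """Return (p, s, co) for one-bit inputs a, b and carry input ci."""
    p = a ^ b
    s = p ^ ci
    co = (a & b) | (ci & (a | b))
    return p, s, co


def inverted_inputs_property():
    """True if inverting A, B and Ci keeps P and inverts both S and Co
    for every input combination."""
    for a, b, ci in product((0, 1), repeat=3):
        p, s, co = full_adder(a, b, ci)
        p_n, s_n, co_n = full_adder(1 - a, 1 - b, 1 - ci)
        if (p_n, s_n, co_n) != (p, 1 - s, 1 - co):
            return False
    return True
```

Because the property holds for all eight input combinations, a stage that receives an inverted carry and inverted operands produces an inverted sum and a non-inverted propagate signal, so no extra inverter is needed on the carry path.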
In this module, P* = P3·P2·P1·P0. If this value is 1, the carry is skipped. To avoid the low on/off current ratio associated with high fan-in at subthreshold voltages, a combination of NAND gates and NOR gates is used instead of a single four-input AND gate, as shown in Fig. 12.
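One common way to form a four-input AND from two-input gates is two NAND gates followed by a NOR gate. The Python sketch below checks that decomposition; whether Fig. 12 uses exactly this arrangement is an assumption, and the function names are hypothetical.

```python
from itertools import product


def nand2(x, y):
    """Two-input NAND on single bits."""
    return 1 - (x & y)


def nor2(x, y):
    """Two-input NOR on single bits."""
    return 1 - (x | y)


def group_propagate(p3, p2, p1, p0):
    """P* = P3 & P2 & P1 & P0 built from two NAND2 gates and one NOR2 gate."""
    return nor2(nand2(p3, p2), nand2(p1, p0))


def matches_and4():
    """True if the NAND/NOR network equals a four-input AND for all inputs."""
    return all(
        group_propagate(*bits) == (bits[0] & bits[1] & bits[2] & bits[3])
        for bits in product((0, 1), repeat=4)
    )
```

Each gate in this network has a fan-in of two, which is the property the paragraph above relies on for subthreshold operation.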
The carry output of each 4-bit CSA module is obtained from a 2-input MUX, as shown in Fig. 13. An inverter serves as the output buffer, and it has two functions. First, it makes the carry output signal of the module stronger. Second, because C0 can sometimes be passed directly to the carry output, without any intermediate buffering C0 could travel straight through all eight 4-bit blocks; the inverter prevents this. Without buffering, the drive current would be small, the strength of the final output signal would be greatly reduced, and the overall speed of the adder would also drop. Because the carry output is thereby inverted again, part of the inputs of the next-stage 4-bit CSA module must be inverted as well. Two such 4-bit CSA modules constitute one 8-bit module, and the final 32-bit CSA consists of four 8-bit modules connected in series.
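The hierarchy described above (two 4-bit blocks per 8-bit module, four 8-bit modules per 32-bit adder) can be sketched as a functional model in Python. The model is an assumption-laden simplification: it reproduces only the arithmetic and the skip selection, not the inverters, buffering or signal polarities of the circuit, and all function names are hypothetical.

```python
def csa4_block(a, b, cin):
    """4-bit carry-skip block on 4-bit integers a and b. Returns (sum, carry_out)."""
    carry, s, all_p = cin, 0, 1
    for i in range(4):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi                      # carry-propagate signal of this bit
        s |= (p ^ carry) << i            # sum bit
        carry = (ai & bi) | (carry & (ai | bi))
        all_p &= p
    # 2-input selector: skip the ripple chain when every bit propagates.
    return s, (cin if all_p else carry)


def csa8_module(a, b, cin):
    """8-bit module made of two 4-bit blocks in series."""
    lo, c = csa4_block(a & 0xF, b & 0xF, cin)
    hi, c = csa4_block((a >> 4) & 0xF, (b >> 4) & 0xF, c)
    return lo | (hi << 4), c


def csa32(a, b, cin=0):
    """32-bit adder made of four 8-bit modules in series. Returns (sum, carry_out)."""
    result, carry = 0, cin
    for k in range(4):
        s, carry = csa8_module((a >> (8 * k)) & 0xFF, (b >> (8 * k)) & 0xFF, carry)
        result |= s << (8 * k)
    return result, carry
```

For example, `csa32(0xFFFFFFFF, 1)` exercises the longest carry chain: the first block generates a carry and the remaining seven blocks all select their carry input.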
Specifically, Fig. 14 shows the circuit schematic of the first 4-bit CSA module in an 8-bit module, Fig. 15 shows the circuit schematic of an 8-bit module, and Fig. 16 shows the circuit schematic of the final 32-bit CSA.
In summary, the low-power parallel multiplier provided by the present invention includes a partial product generation module, a partial product compression module and a carry-skip adder. The partial product generation module comprises a Booth encoding circuit and a decoding circuit: the Booth encoding circuit encodes adjacent bit values of a first multiplier into target parameters, and the decoding circuit decodes bit values of a second multiplier and the target parameters into partial products. This module halves the number of partial products, which greatly reduces the area of the multiplier circuit, improves its operation speed, and makes it easy to implement in VLSI. The partial product compression module comprises a one-bit full adder and an output circuit: the one-bit full adder outputs the inverted value of the carry according to the partial products, and the output circuit adds the partial products to generate two target partial products with different weights, which are output to the next-level compression module. This module greatly improves the speed of partial product compression. The carry-skip adder comprises a plurality of CSA modules, each comprising a plurality of one-bit full adders and a 2-input data selector, and is used to obtain the target product. Because the partial product compression module uses one-bit full adders, the circuit structure is simple and low power consumption is achieved while the requirement for fast calculation is met.
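The halving of the partial-product count can be illustrated with a behavioral sketch of second-order (radix-4) Booth recoding. This is the textbook algorithm, not the gate-level encoder and decoder of the invention; the 16-bit operand width is an assumption inferred from the 8 partial products and the 32-bit final adder, and the function names are hypothetical.

```python
def booth_radix4_digits(y, n=16):
    """Recode an n-bit two's-complement multiplier y into n/2 digits,
    each in {-2, -1, 0, 1, 2}, least significant digit first."""
    y &= (1 << n) - 1
    digits = []
    prev = 0  # implicit bit to the right of bit 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, n=16):
    """Multiply signed n-bit integers x and y by summing one partial product
    per Booth digit, each weighted by 4**i."""
    digits = booth_radix4_digits(y, n)
    return sum((d * x) << (2 * i) for i, d in enumerate(digits))
```

With `n=16` the recoding always yields 8 digits, so only 8 partial products have to be accumulated instead of the 16 that a bit-by-bit multiplier would produce.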
Compared with a multiplier of conventional structure, this multiplier reduces the complexity of circuit implementation, eases layout implementation, effectively improves operating speed and greatly reduces power consumption.
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A low power consumption parallel multiplier, comprising: a partial product generation module, a partial product compression module and a carry skip adder,
the partial product generation module comprises a Booth coding circuit and a decoding circuit, wherein the Booth coding circuit codes adjacent bit values of a first multiplier into a target parameter, and the decoding circuit decodes bit values of a second multiplier and the target parameter into a partial product;
the partial product compression module comprises a one-bit full adder and a summing circuit, wherein the one-bit full adder outputs an inverted value of a carry according to the partial product, the summing circuit adds the partial products to generate two target partial products with different weights, and the generated target partial products are output to a lower-level compression module;
said carry-skip adder comprising a plurality of CSA modules, said CSA modules comprising a plurality of said one-bit full adders for obtaining a target product; wherein said carry skip adder comprises 4 8-bit modules, said 8-bit modules comprising two 4-bit CSA modules, said CSA modules comprising 4 of said one-bit full adders and a 2-input data selector.
2. The low power consumption parallel multiplier of claim 1, wherein the Booth encoding circuit comprises: a first exclusive-OR gate, a first exclusive-OR gate and a second exclusive-OR gate,
a first bit value of the first multiplier and a second bit value of the first multiplier are respectively used as input ends of the first exclusive-OR gate, and an output end of the first exclusive-OR gate is used for outputting a first target parameter;
the second bit value of the first multiplier and the third bit value of the first multiplier are respectively used as the input end of the first exclusive-or gate, and the output end of the first exclusive-or gate is used for outputting a second target parameter;
a first bit value of the first multiplier and a second bit value of the first multiplier are respectively used as input ends of the second exclusive-or gate, and an output end of the second exclusive-or gate is used for outputting a third target parameter;
a third bit value of the first multiplier is taken as a fourth target parameter.
3. The low power consumption parallel multiplier of claim 2, wherein said decoding circuit comprises: a third exclusive-OR gate, a fourth exclusive-OR gate, a first NAND gate, a second NAND gate and a third NAND gate,
a first bit value of the second multiplier and the fourth target parameter are used as input ends of the third exclusive-OR gate;
a second bit value of the second multiplier and the fourth target parameter are used as input ends of the fourth exclusive-OR gate;
the output end of the third exclusive-or gate, the second target parameter and the first target parameter are used as the input end of the first nand gate;
the output end of the fourth exclusive-or gate and the third target parameter are used as the input end of the second nand gate;
and the output end of the first NAND gate and the output end of the second NAND gate are used as the input ends of the third NAND gate, and the output end of the third NAND gate is used for outputting the partial product.
4. The low power consumption parallel multiplier of claim 1, wherein the logic output of said one-bit full adder is:
Figure FDA0003119659330000021
Co=AB+Ci(A+B)
wherein S is a target partial product, A is a first partial product, B is a second partial product, Ci is a third partial product, and Co is the carry value of the partial product.
5. The low power consumption parallel multiplier of claim 4, wherein said one-bit full adder comprises: a first transistor, a second transistor, a third transistor, a fourth transistor, a fifth transistor, a sixth transistor, a seventh transistor, an eighth transistor, a ninth transistor, and a tenth transistor,
the source electrode of the first transistor, the source electrode of the second transistor and the source electrode of the seventh transistor are connected and are connected with Vcc;
the drain electrode of the first transistor, the drain electrode of the second transistor and the source electrode of the third transistor are connected;
the drain electrode of the seventh transistor is connected with the source electrode of the eighth transistor;
the drain electrode of the third transistor, the drain electrode of the eighth transistor, the drain electrode of the fourth transistor and the drain electrode of the ninth transistor are connected, and the common connection end is used for outputting an inverted value of the carry bit;
the source electrode of the fourth transistor, the drain electrode of the fifth transistor and the drain electrode of the sixth transistor are connected;
a source of the ninth transistor is connected to a drain of the tenth transistor;
the source electrode of the fifth transistor, the source electrode of the sixth transistor and the source electrode of the tenth transistor are all grounded;
the grid electrode of the third transistor is connected with the grid electrode of the fourth transistor and is used as the input end of the third partial product;
the grid electrode of the first transistor, the grid electrode of the seventh transistor, the grid electrode of the fifth transistor and the grid electrode of the ninth transistor are used as input ends of the first partial product;
and the grid electrode of the second transistor, the grid electrode of the eighth transistor, the grid electrode of the sixth transistor and the grid electrode of the tenth transistor are used as input ends of the second partial product.
6. The low power consumption parallel multiplier of claim 4, wherein said summing circuit comprises: a fifth xor gate and a sixth xor gate,
the first partial product and the second partial product are used as input terminals of the fifth exclusive-or gate, the output terminal of the fifth exclusive-or gate and the third partial product are used as input terminals of the sixth exclusive-or gate, and the output terminal of the sixth exclusive-or gate is used for outputting the target partial product.
7. The low power consumption parallel multiplier of claim 6, wherein four said one-bit full adders are cascaded, and the input of the full adder of the current stage is inverted if the input of the full adder of the previous stage is not inverted.
8. The low power consumption parallel multiplier of claim 6, wherein each one-bit full adder in the 4-bit CSA module generates a carry-propagate signal, the four carry-propagate signals are ANDed and the resulting output is used as the control terminal of the 2-input data selector, the carry input of the least significant one-bit full adder and the carry output of the most significant one-bit full adder in the 4-bit CSA module are used as the input terminals of the 2-input data selector, and the output terminal of the 2-input data selector is used as the carry output of the 4-bit CSA module.
CN201610920203.1A 2016-10-21 2016-10-21 Low-power-consumption parallel multiplier Active CN107977191B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610920203.1A CN107977191B (en) 2016-10-21 2016-10-21 Low-power-consumption parallel multiplier

Publications (2)

Publication Number Publication Date
CN107977191A CN107977191A (en) 2018-05-01
CN107977191B true CN107977191B (en) 2021-07-27

Family

ID=62003870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610920203.1A Active CN107977191B (en) 2016-10-21 2016-10-21 Low-power-consumption parallel multiplier

Country Status (1)

Country Link
CN (1) CN107977191B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant