CN110058840B - Low-power-consumption multiplier based on 4-Booth coding - Google Patents

Low-power-consumption multiplier based on 4-Booth coding

Info

Publication number
CN110058840B
Authority
CN
China
Prior art keywords
power gating
input
gating switch
stage
power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910238829.8A
Other languages
Chinese (zh)
Other versions
CN110058840A (en)
Inventor
余宁梅
马文恒
高钰迪
黄自力
张文东
刘和娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology
Priority to CN201910238829.8A
Publication of CN110058840A
Application granted
Publication of CN110058840B
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only

Abstract

The invention discloses a low-power-consumption multiplier based on 4-Booth coding. It comprises an encoder group formed by connecting at least two encoders in parallel. The input end of the encoder group is connected with a bit selector, whose input end is connected with a multiplier input port and a multiplicand input port. First power gating switches are connected between the input end of the bit selector and the multiplier input port and between the input end of the bit selector and the multiplicand input port. The output end of the encoder group is connected with the input end of a compressor through a second power gating switch, and the output end of the compressor is connected with the input end of a carry-lookahead adder through a third power gating switch. The disclosed multiplier reduces power consumption while ensuring correct calculation results.

Description

Low-power-consumption multiplier based on 4-Booth coding
Technical Field
The invention belongs to the technical field of low-power-consumption multipliers, and particularly relates to a low-power-consumption multiplier based on 4-Booth coding.
Background
In chips such as high-speed digital signal processors (DSPs), microcontrollers (MCUs) and RISC processors, the multiplier is an indispensable unit. It often lies on the critical path, so the speed of the system often depends on the speed of the multiplier. For the pipeline to operate normally, the multiplier in the execution unit must complete its operation within one clock cycle. Optimizing the multiplier design can therefore improve the operating efficiency and stability of the whole processor. High-speed, portable and low-power multiplier design is thus an important and necessary link in system design for application-specific integrated circuits, digital signal processing and digital filtering.
One way to implement a high-speed, portable and low-power multiplier is to increase the amount of parallel computation and reduce the amount of subsequent computation. For an N-bit multiplication, the conventional algorithm generates N partial products, and the final result is obtained after accumulating them. Since the Booth encoding algorithm was introduced, multiplier performance has improved greatly. Its basic principle is to simplify the operation by reducing the number of partial products; the more bits the multiplier and multiplicand have, the greater the simplification. Typical Booth encoding algorithms are the radix-2, radix-4 and radix-8 Booth encoding algorithms. The radix-2 coding table is simple and the algorithm is easy to implement, but it does not reduce the amount of computation. The radix-4 algorithm halves the amount of computation, and its encoding circuit is easy to implement. The radix-8 algorithm can remove 3/4 of the computation, but its coding table contains a multiplication of the multiplicand by (-3), which cannot be implemented with a simple shift-and-complement circuit.
When the processor performs a multiplication, the multiplier and the multiplicand are both 32 bits wide. Since 2^32 = 4 294 967 296, the 64-bit product of two 32-bit numbers can be far larger still. Such huge numbers are rarely used in practice; that is, the high-order bits of the multiplier B and the multiplicand A are very likely to contain many zeros. In this situation, conventional coding, compression and summation not only waste a great deal of time but also occupy a large amount of hardware resources and increase the power consumption of the whole system.
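As an illustration of the two observations above (radix-4 Booth recoding roughly halves the number of partial products, and operands with zero high-order bits yield many zero terms), the following minimal Python sketch recodes a multiplier into radix-4 Booth digits. It is not taken from the patent: the function name `radix4_booth_digits`, the unsigned zero-extended interpretation of the operand, and the resulting count of 17 digits for 32 bits are assumptions made for the example.

```python
def radix4_booth_digits(b: int, n: int = 32) -> list[int]:
    """Recode an unsigned n-bit integer b (n even) into radix-4 Booth digits.

    Assumption: b is zero-extended by two bits, which yields n // 2 + 1
    digits, each in {-2, -1, 0, 1, 2}, with b == sum(d * 4**i).
    """
    digits = []
    prev = 0  # b_{2i-1}; b_{-1} is defined as 0
    for i in range(n // 2 + 1):
        lo = (b >> (2 * i)) & 1      # b_{2i}
        hi = (b >> (2 * i + 1)) & 1  # b_{2i+1}
        digits.append(prev + lo - 2 * hi)
        prev = hi
    return digits


digits = radix4_booth_digits(1000)
nonzero = sum(1 for d in digits if d != 0)
print(len(digits), nonzero)  # 17 digits, only 3 of them nonzero for b = 1000
```

For a small operand most digits are zero, so most partial products are zero as well; this is the kind of idle work that the power gating described below is meant to avoid.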
In addition, when the multiplier circuit performs a multiplication, the encoding unit, the compression unit and the carry-lookahead adder unit operate in series. Until a preceding stage finishes its operation, the following stage simply waits: it is powered on but takes no part in the computation, which increases the power consumption of the system. Moreover, when partial products enter the compression unit for summation, the carry signals and the partial-product signals are generated with different delays, so a race hazard can arise when they enter the next stage of the Wallace tree compression circuit and produce an erroneous result.
Disclosure of Invention
The invention aims to provide a low-power-consumption multiplier based on 4-Booth coding, which can reduce power consumption while ensuring the correctness of a calculation result.
The first technical scheme adopted by the invention is as follows: a low-power-consumption multiplier based on 4-Booth coding comprises an encoder group formed by at least two encoders connected in parallel. The input end of the encoder group is connected with a bit selector, and the input end of the bit selector is connected with a multiplier input port and a multiplicand input port. First power gating switches are connected between the input end of the bit selector and the multiplier input port and between the input end of the bit selector and the multiplicand input port; the first power gating switches turn the circuit on or off according to whether the input multiplier or multiplicand is zero. The encoder group outputs the partial products under the control of a complement signal. The output end of the encoder group is connected with the input end of a compressor through a second power gating switch, which turns the circuit on according to the maximum delay with which the encoder group generates the partial products. The output end of the compressor is connected with the input end of a carry-lookahead adder through a third power gating switch, which turns the circuit on upon receiving the pseudo-sum and carry signals finally output by the compressor. The output end of the carry-lookahead adder outputs the product of the multiplicand and the multiplier.
The present invention is also characterized in that,
the encoder has three data input ends; the three data input ends of each encoder are connected with the output end of the bit selector, and the output end of each encoder is connected with the second power gating switch.
The logic circuit of the encoder comprises a three-input AND gate whose three input ends are the input ends of the encoder. The output end of the three-input AND gate is connected with an input end of a carry-save adder, and the three input ends of the AND gate are also connected with an input end of the carry-save adder through register I, which outputs the Ei value corresponding to the bit selector output signal. The pseudo-sum output end of the carry-save adder is connected with the input end of register II, the output end of register II is connected with the input end of a shift register, and the shift register outputs the partial product. A complement participation control circuit is connected between the output end of the three-input AND gate and the shift register and controls whether a complement takes part in forming the partial product.
The complement participation control circuit comprises register III. An inverter is connected between register III and the output end of the three-input AND gate, with the input end of the inverter connected with the output end of the AND gate. The output end of register III is connected with the input end of the shift register through a 2-to-1 selector. The 2-to-1 selector is connected with a fourth power gating switch, and the fourth power gating switch is further connected with the 2-to-1 selector through a complement circuit. The complement circuit generates the complements used to compute partial products, and the fourth power gating switch turns the complement circuit on and off.
The compressor is a Wallace tree compressor built from a plurality of carry-save adders. Each stage of compression circuit in the Wallace tree compressor is connected with a fifth power gating switch, which turns that stage on and off. The fifth power gating switch of each stage is connected in series with that of the next stage and turns the fifth power gating switch of the next stage on and off according to the maximum computation delay of its own stage. Each carry-save adder in the Wallace tree compressor is connected with a sixth power gating switch, which turns that carry-save adder on and off.
The invention has the beneficial effects that:
(1) In the low-power-consumption multiplier based on 4-Booth coding, the first power gating switch controls whether the multiplier as a whole operates before any computation starts, a second power gating switch is placed before the compressor, and a third power gating switch is placed before the carry-lookahead adder, so that each circuit is turned on exactly when it is needed and unnecessary standby power consumption is reduced;
(2) In the low-power-consumption multiplier based on 4-Booth coding, a plurality of encoders operate in parallel and each encoder contains a three-input AND gate. Depending on the level output by the AND gate, the remaining partial products are formed by shifting or complementing, so the partial products are obtained quickly and power consumption is reduced;
(3) In the low-power-consumption multiplier based on 4-Booth coding, the compressor adopts a Wallace tree compression structure composed of carry-save adders (CSAs). A logic unit turns the CSAs on and off according to the sum of the maximum delays of the computation and decision processes. A power gating switch is added to each CSA to determine whether its three input signals are all zero; if they are, the CSA circuit is turned off and zero is output directly, achieving accurate calculation and reduced power consumption.
Drawings
FIG. 1 is a schematic diagram of a low-power-consumption multiplier based on 4-Booth coding according to the present invention;
FIG. 2 is a circuit diagram of an encoder in a 4-Booth encoding-based low-power-consumption multiplier according to the invention;
FIG. 3 is a circuit diagram of a compressor in a 4-Booth coding-based low-power-consumption multiplier according to the invention;
FIG. 4 is the radix-4 Booth encoding table of an encoder in a 4-Booth coding-based low-power-consumption multiplier according to the present invention;
FIG. 5 is a design timing diagram of a low-power-consumption multiplier based on 4-Booth coding according to the present invention;
FIG. 6 is a partial product arrangement diagram of a low-power-consumption multiplier based on 4-Booth coding according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention is a low-power-consumption multiplier based on 4-Booth coding. Its concrete structure is explained taking 32-bit multiplication as an example. In the drawings, PG1, PG2, PG3, PG4, PG5 and PG6 denote the first, second, third, fourth, fifth and sixth power gating switches respectively, and CSA denotes a carry-save adder.
As shown in FIG. 1, the low-power-consumption multiplier based on 4-Booth coding includes an encoder group composed of 17 encoders connected in parallel. The input end of the encoder group is connected with a bit selector, and the input end of the bit selector is connected with a multiplier input port and a multiplicand input port. First power gating switches are connected between the input end of the bit selector and the multiplier input port and between the input end of the bit selector and the multiplicand input port; the first power gating switches turn the circuit on or off according to whether the input multiplier or multiplicand is zero. The encoder group outputs the partial products under the control of a complement signal. The output end of the encoder group is connected with the input end of a compressor through a second power gating switch, which turns the circuit on according to the maximum delay with which the encoder group generates the partial products. The output end of the compressor is connected with the input end of a carry-lookahead adder through a third power gating switch, which turns the circuit on upon receiving the pseudo-sum and carry signals finally output by the compressor. The output end of the carry-lookahead adder outputs the product of the multiplicand and the multiplier.
As shown in FIG. 2, each encoder has three data input ends, which are connected with the output end of the bit selector; the output end of each encoder is connected with the second power gating switch.
The logic circuit of the encoder comprises a three-input AND gate whose three input ends are the input ends of the encoder. The output end of the three-input AND gate is connected with an input end of a carry-save adder, and the three input ends of the AND gate are also connected with an input end of the carry-save adder through register I, which outputs the Ei value corresponding to the bit selector output signal. The pseudo-sum output end of the carry-save adder is connected with the input end of register II, the output end of register II is connected with the input end of a shift register, and the shift register outputs the partial product. A complement participation control circuit is connected between the output end of the three-input AND gate and the shift register and controls whether a complement takes part in forming the partial product.
The complement participation control circuit comprises register III. An inverter is connected between register III and the output end of the three-input AND gate, with the input end of the inverter connected with the output end of the AND gate. The output end of register III is connected with the input end of the shift register through a 2-to-1 selector. The 2-to-1 selector is connected with a fourth power gating switch, and the fourth power gating switch is further connected with the 2-to-1 selector through a complement circuit. The complement circuit generates the complements used to compute partial products, and the fourth power gating switch turns the complement circuit on and off.
As shown in FIG. 3, the compressor is a Wallace tree compressor built from a plurality of carry-save adders. Each stage of compression circuit in the Wallace tree compressor is connected with a fifth power gating switch, which turns that stage on and off. The fifth power gating switch of each stage is connected in series with that of the next stage and turns the fifth power gating switch of the next stage on and off according to the maximum computation delay of its own stage. Each carry-save adder in the Wallace tree compressor is connected with a sixth power gating switch, which turns that carry-save adder on and off.
All power gating switches are connected to a power supply, which drives them.
The operating principle of the low-power-consumption multiplier based on 4-Booth coding is as follows. The multiplier is a fast arithmetic unit used for integer instruction processing in an execution module. The design uses the radix-4 Booth encoding algorithm, and the CSAs in the multiplier are 3-2 CSAs. Before a multiplication is performed, the entire logic of the multiplier is turned off. Before the multiplier operates, the first power gating switch determines whether the multiplicand and the multiplier are both nonzero. If at least one of them is zero, zero is output directly and passed to the write-back module. If neither the multiplier B nor the multiplicand A is zero, the logic circuits of the encoder part are turned on, and CSA1, CSA2, CSA3, CSA4, CSA5 and CSA6 are then turned on after the sum of the maximum delay of partial-product generation and the decision time of the second power gating switch. Because the first-stage CSAs operate in parallel, they turn on and off synchronously and need only be connected to the same power gating switch. After CSA1 to CSA6 have been turned on simultaneously, the second-stage compression circuits CSA7, CSA8, CSA9 and CSA10 are turned on after waiting for the sum of the maximum delay of one CSA and the decision time of the first fifth power gating switch, while the first-stage compression circuit and the encoding circuit remain on.
Similarly, after waiting for the sum of the maximum delay of one CSA and the power gating decision time, the third-stage compression circuits CSA11 and CSA12 are turned on, while the encoding circuit and the first-stage and second-stage compression circuits remain on. After waiting again for the sum of the maximum delay of one CSA and the power gating decision time, the fourth-stage compression circuits CSA13 and CSA14 are turned on, while the encoding circuit and the first three compression stages remain on. After a further wait of the same length, the fifth-stage compression circuit CSA15 is turned on, while the encoding circuit and the first four compression stages remain on. After one more such wait, the sixth-stage compression circuit CSA16 is turned on, while the encoding circuit and the first five compression stages remain on, yielding the final carry signal and the final sum signal; adding them according to their weights gives the result of the multiplier. At this moment all circuits in the multiplier are on. After the multiplication is finished, the encoder group, the Wallace tree compressor and the carry-lookahead adder of the multiplier are turned off to wait for the next multiplication. The timing diagram is shown in FIG. 5.
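The staged turn-on sequence described above can be sketched as a simple schedule. This is an illustrative model only: the function `turn_on_schedule`, its parameter names, the use of a single decision time for every power gating switch, the timing of the carry-lookahead adder, and the delay values in arbitrary units are all assumptions, not figures from the patent.

```python
def turn_on_schedule(a: int, b: int, t_encode: int, t_pg: int, t_csa: int,
                     stages: int = 6) -> dict[str, int]:
    """Turn-on time of each block, measured from the moment the encoders start.

    t_encode: assumed maximum delay of partial-product generation
    t_pg:     assumed decision time of a power gating switch
    t_csa:    assumed maximum delay of one CSA
    """
    if a == 0 or b == 0:
        return {}  # first power gating switch: nothing is turned on, product is 0
    schedule = {"encoders": 0}
    t = t_encode + t_pg  # stage 1 opens after the partial products are ready
    for k in range(1, stages + 1):
        schedule[f"stage{k}"] = t
        t += t_csa + t_pg  # each later stage waits one CSA delay plus a decision
    schedule["cla"] = t  # assumed: the adder opens once stage 6 has settled
    return schedule


print(turn_on_schedule(3, 5, t_encode=5, t_pg=1, t_csa=2))
```

In this model every block stays on once enabled, matching the statement that earlier stages are kept on until the multiplication finishes.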
The operation of the multiplier is divided into three steps: generating the partial products, compressing the partial products, and adding the carry and the pseudo-sum to obtain the final result. Specifically:
Let the multiplicand be $A = a_{31}a_{30}\ldots a_{0}$ and the multiplier be $B = b_{31}b_{30}\ldots b_{0}$, where $a_{31}$ and $b_{31}$ are the sign bits, and let P be the product, given by:
[Equation image BDA0002009036910000071]
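The equation referenced above survives only as an image identifier. As a hedged reference, the textbook radix-4 Booth expansion of a 32-bit two's-complement multiplier is given below; the summation limits and the convention $b_{-1} = 0$ are assumptions based on the standard form, not a transcription of the patent figure.

```latex
\[
P = A \times B = \sum_{i=0}^{15} \left( b_{2i-1} + b_{2i} - 2\,b_{2i+1} \right) \cdot 4^{i} \cdot A ,
\qquad b_{-1} = 0 .
\]
```

Each bracketed term takes a value in {-2, -1, 0, 1, 2}, which corresponds to the Ei digit selected by the coding table. Because the embodiment uses 17 encoders, the patent's own sum presumably contains one further term covering extension bits.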
The coding table is shown in FIG. 4. It has eight cases corresponding to five different operations: adding one times the multiplicand, subtracting one times the multiplicand, adding two times the multiplicand, subtracting two times the multiplicand, and adding zero times the multiplicand. For the two cases that add zero times the multiplicand, the original Booth encoding gives (+0) and (-0) respectively. In the encoding stage of this multiplier, the (-0) case is processed by a carry-save adder, that is, the signal is inverted to (+0), so the (+0) and (-0) entries of the coding table become two (+0) codes. The change of the Ei value is thus realized by an adding circuit, which lowers the toggle rate of the Ei signals passed to register II and register III, that is, the toggle rate of signals during encoding, and so reduces power consumption.
The specific circuit is as follows. A structure that ANDs the three signals $b_{2i+1}$, $b_{2i}$ and $b_{2i-1}$ is added to the original encoding circuit in the encoder. The three signals are the inputs of a register and of a three-input AND gate, and the AND gate outputs the result EN1. If EN1 is high, that is, $(b_{2i+1}, b_{2i}, b_{2i-1}) = (1, 1, 1)$, EN1 acts as the enable signal of the CSA and makes it perform an addition, namely a "+0" operation on Ei1. The source operand Ei1 of the addition is read from register I, which stores the Ei1 value corresponding to $(b_{2i+1}, b_{2i}, b_{2i-1})$. After the CSA addition, Ei1 is output as the sum signal S of the CSA, Ei2 is then fetched through register II, and the partial product is finally output through the shift register.
If EN1 is low, that is, $(b_{2i+1}, b_{2i}, b_{2i-1}) \neq (1, 1, 1)$, EN1 is inverted by the inverter to give the EN2 signal, and EN2 enters register III to fetch Ei3. If Ei3 is (-2) or (-1), the fourth power gating switch turns on the complement circuit and its path, and the complement signal is passed to the shift register to output the partial product; that is, when the fourth power gating switch is on, the 2-to-1 selector is switched to the complement circuit to generate the partial product. In all other cases, Ei3 is fed directly into the shift register for the corresponding shift operation to obtain the partial product. Registers I, II and III store the eight codes of $(b_{2i+1}, b_{2i}, b_{2i-1})$ and the Ei value corresponding to each.
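The encoder decision described above can be summarized in a few lines. The sketch below is an interpretation rather than the circuit of FIG. 2: the table values are the standard radix-4 Booth digits (FIG. 4 itself is not reproduced in this text), and the function name `encode` and the path labels are hypothetical.

```python
# Standard radix-4 Booth table: (b_{2i+1}, b_{2i}, b_{2i-1}) -> Ei.
# Both zero cases map to +0, following the text's treatment of (-0).
BOOTH4_TABLE = {
    (0, 0, 0): 0,
    (0, 0, 1): 1,
    (0, 1, 0): 1,
    (0, 1, 1): 2,
    (1, 0, 0): -2,
    (1, 0, 1): -1,
    (1, 1, 0): -1,
    (1, 1, 1): 0,  # (-0) in the classic table, handled as (+0)
}


def encode(b_hi: int, b_mid: int, b_lo: int) -> tuple[int, str]:
    """Return (Ei, path) for one bit triplet.

    path is a hypothetical label for the branch taken:
      "csa_plus_zero": EN1 high, the CSA performs the "+0" operation
      "complement":    Ei is -1 or -2, the complement circuit is gated on
      "shift":         Ei goes straight to the shift register
    """
    en1 = b_hi & b_mid & b_lo  # three-input AND gate
    if en1:
        return 0, "csa_plus_zero"
    ei = BOOTH4_TABLE[(b_hi, b_mid, b_lo)]
    return (ei, "complement") if ei < 0 else (ei, "shift")


for bits in BOOTH4_TABLE:
    print(bits, encode(*bits))
```

The eight triplets yield only five distinct Ei values, which matches the eight cases and five operations named in the text.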
In the partial-product generation circuit, this design extracts all the $b_{2i+1}$, $b_{2i}$, $b_{2i-1}$ groups of the multiplier B simultaneously, compares them bit by bit with the $b_{2i+1}$, $b_{2i}$, $b_{2i-1}$ entries of the radix-4 Booth coding table, obtains the partial products quickly using 17 encoders in parallel, and inputs them to the compression module. To ensure the correctness of the partial products, the Booth encoding module remains on throughout the operation of the multiplier.
Because the coding table contains the operations of subtracting one times the multiplicand (-A) and subtracting two times the multiplicand (-2A), subtraction in an actual circuit is performed with two's complement. Although using the complement can reduce power consumption by reducing the number of signal toggles, the complement of a negative number is obtained by inverting every bit and adding one, so an extra adder would be needed each time a complement is computed, which increases system area, lowers operating speed and adds power consumption. This design adds a complement circuit to the encoding circuit. An adder unit first computes the complements of the two operands (-A) and (-2A), and the two complements are passed to every partial-product generation unit over power-gated interconnect. If $b_{2i+1}$, $b_{2i}$, $b_{2i-1}$ equal 1, 0, 0, power gating turns on the complement signal of (-2A); if they equal 1, 0, 1 or 1, 1, 0, power gating turns on the complement signal of (-A); otherwise power gating turns both complement signals off. In other words, when no complement is needed to generate the partial product, the 2-to-1 selector deselects the complement circuit and no complement signal is passed into the encoding unit. This reduces the extra power consumption caused by signal toggling, reduces the number of adders and the system area, and speeds up partial-product generation because the complements of the multiplicand are computed in advance.
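To make the benefit of precomputing the complements concrete, the following sketch computes (-A) and (-2A) once and lets all 17 partial-product generators select from the shared values. It is a behavioral model under stated assumptions: operands are treated as unsigned 32-bit values, arithmetic is done modulo 2^64, and the names `partial_products`, `WIDTH` and `MASK` are invented for the example.

```python
WIDTH = 64                 # assumed product width
MASK = (1 << WIDTH) - 1


def partial_products(a: int, b: int, n: int = 32) -> list[int]:
    """Generate n // 2 + 1 weighted radix-4 Booth partial products of a * b."""
    neg_a = (-a) & MASK       # two's complement of A, computed once
    neg_2a = (-2 * a) & MASK  # two's complement of 2A, computed once
    select = {0: 0, 1: a, 2: 2 * a, -1: neg_a, -2: neg_2a}
    products = []
    prev = 0  # b_{2i-1}, with b_{-1} = 0
    for i in range(n // 2 + 1):
        lo = (b >> (2 * i)) & 1
        hi = (b >> (2 * i + 1)) & 1
        digit = prev + lo - 2 * hi
        prev = hi
        # Shifting by 2i applies the weight 4**i; no per-product adder is needed.
        products.append((select[digit] << (2 * i)) & MASK)
    return products


pps = partial_products(12345, 6789)
print(len(pps), sum(pps) & MASK == 12345 * 6789)
```

Only one negation per multiplication is performed here, regardless of how many Booth digits are negative, which mirrors the saving in adders that the text describes.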
The Wallace tree is a tree algorithm for reducing partial products. The compressor uses a Wallace tree compression structure composed of CSAs. For 32-bit multiplication the design requires six compression stages, and the total number of partial products and carries changes as 17 → 12 → 8 → 6 → 4 → 3 → 2. Because carry signals and partial-product signals are generated with different delays, a race hazard may arise when the carry signal C and the sum signal S output by the current compression stage enter the next stage of the Wallace tree compression circuit. The design uses a logic circuit that turns the CSAs on and off through the fifth and sixth power gating switches according to the maximum delay of partial-product generation, ensuring that carries and partial products are computed correctly while reducing the power consumption of the system.
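The stage count and the sequence of operand counts can be checked with a short calculation. The sketch assumes that each stage groups as many operands as possible into 3-2 CSAs and passes any leftover operands through unchanged; the function name `wallace_stage_counts` is hypothetical. The patent instead feeds the two leftover first-stage operands into CSA6 with a zero-padded input, which gives the same count of 12.

```python
def wallace_stage_counts(n: int) -> list[int]:
    """Operand count after each 3-2 compression stage, down to two operands."""
    counts = [n]
    while n > 2:
        n = 2 * (n // 3) + n % 3  # every full group of 3 becomes 2
        counts.append(n)
    return counts


counts = wallace_stage_counts(17)
print(counts, len(counts) - 1)  # [17, 12, 8, 6, 4, 3, 2] and 6 stages
```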
The 17 partial products generated in the first step, one for each bit group of the multiplier B, have different weights, as shown in FIG. 6. On entering the Wallace tree compressor, each partial product must be extended at its high and low ends according to its weight so that the three operands entering the same CSA are aligned in weight and the correct carry and sum signals can be computed. The CSA is the most basic unit of this module. How the CSAs are arranged determines the logic depth and complexity of the whole circuit and also affects the routing requirements and the complexity of the interconnect, which has a very marked effect on power consumption.
In the CSA arrangement, the six 3-2 CSAs of the first stage are connected to a fifth power gating switch through interconnect. This fifth power gating switch turns on all first-stage CSAs in parallel according to the maximum delay with which the encoder group produces the partial products, i.e. the maximum delay of the encoder that computes (-2A). In addition, each 3-2 CSA is connected to its own sixth power gating switch, which determines whether the three inputs of that CSA are all zero; if they are, the CSA is kept off. When the inputs of a first-stage CSA are not all zero, the encoder group remains on, and the 17 partial products are sent to the Wallace tree compressor for compression under the control of the power gating array.
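The behavior of a single 3-2 CSA guarded by its sixth power gating switch can be modeled as follows. This is a word-level sketch, not a gate-level description: the function name `csa_3to2` and the returned `active` flag are assumptions introduced to show when the adder would stay off.

```python
def csa_3to2(x: int, y: int, z: int) -> tuple[int, int, bool]:
    """3-2 carry-save adder on aligned non-negative words.

    Returns (sum_word, carry_word, active). When all three inputs are zero
    the adder is modeled as gated off and zero is output directly.
    """
    if x == 0 and y == 0 and z == 0:
        return 0, 0, False  # sixth power gating switch keeps the CSA off
    sum_word = x ^ y ^ z                               # bitwise sum without carries
    carry_word = ((x & y) | (x & z) | (y & z)) << 1    # majority, shifted one weight up
    return sum_word, carry_word, True


s, c, active = csa_3to2(5, 3, 6)
print(s, c, active, s + c == 5 + 3 + 6)
```

The invariant that the two outputs add up to the three inputs is what allows a tree of such adders to reduce many operands to two.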
The specific connection structure and compression principle of the Wallace tree compressor are as follows. The encoders output 17 partial products (P0 to P16).
First stage: P0, P1 and P2 are the 3 inputs of CSA1; P3, P4 and P5 are the 3 inputs of CSA2; P6, P7 and P8 are the 3 inputs of CSA3; P9, P10 and P11 are the 3 inputs of CSA4; P12, P13 and P14 are the 3 inputs of CSA5; P15 and P16 are 2 inputs of CSA6, whose remaining input is padded with zero.
Second stage: the outputs S1 and C1 of CSA1 and S2 of CSA2 are the 3 inputs of CSA7; the output C2 of CSA2 and the outputs S3 and C3 of CSA3 are the 3 inputs of CSA8; the outputs S4 and C4 of CSA4 and S5 of CSA5 are the 3 inputs of CSA9; the output C5 of CSA5 and the outputs S6 and C6 of CSA6 are the 3 inputs of CSA10.
Third stage: the outputs S7 and C7 of CSA7 and S8 of CSA8 are the 3 inputs of CSA11; the output C8 of CSA8 and the outputs S9 and C9 of CSA9 are the 3 inputs of CSA12.
Fourth stage: the outputs S11 and C11 of CSA11 and S12 of CSA12 are the 3 inputs of CSA13; the output C12 of CSA12 and the outputs S10 and C10 of CSA10 are the 3 inputs of CSA14.
Fifth stage: the outputs S13 and C13 of CSA13 and S14 of CSA14 are the 3 inputs of CSA15.
Sixth stage: the outputs S15 and C15 of CSA15 and C14 of CSA14 are the 3 inputs of CSA16. The outputs S16 and C16 of CSA16 are the 2 inputs of the carry-lookahead adder.
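The wiring listed above can be checked with a behavioral model that connects sixteen 3-2 CSAs stage by stage. The connection list in this passage is partly garbled in the source text, so the assignment of C2, C5 and C15 below is inferred on the assumption that every CSA output is consumed exactly once; the names `csa` and `wallace_17` are hypothetical, and power gating is omitted.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3-2 carry-save adder on aligned non-negative words: returns (S, C)."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def wallace_17(p: list[int]) -> tuple[int, int]:
    """Reduce 17 weight-aligned partial products to a final (S16, C16) pair."""
    assert len(p) == 17
    # Stage 1: CSA1..CSA6 (CSA6 has one input padded with zero)
    s1, c1 = csa(p[0], p[1], p[2])
    s2, c2 = csa(p[3], p[4], p[5])
    s3, c3 = csa(p[6], p[7], p[8])
    s4, c4 = csa(p[9], p[10], p[11])
    s5, c5 = csa(p[12], p[13], p[14])
    s6, c6 = csa(p[15], p[16], 0)
    # Stage 2: CSA7..CSA10
    s7, c7 = csa(s1, c1, s2)
    s8, c8 = csa(c2, s3, c3)
    s9, c9 = csa(s4, c4, s5)
    s10, c10 = csa(c5, s6, c6)
    # Stage 3: CSA11, CSA12 (S10 and C10 pass through)
    s11, c11 = csa(s7, c7, s8)
    s12, c12 = csa(c8, s9, c9)
    # Stage 4: CSA13, CSA14
    s13, c13 = csa(s11, c11, s12)
    s14, c14 = csa(c12, s10, c10)
    # Stage 5: CSA15 (C14 passes through)
    s15, c15 = csa(s13, c13, s14)
    # Stage 6: CSA16
    return csa(s15, c15, c14)


p = list(range(1, 18))
s16, c16 = wallace_17(p)
print(s16 + c16 == sum(p))  # the final adder only has to add two words
```

Because each CSA preserves the sum of its inputs, the final pair always adds up to the sum of all 17 partial products, whatever their values.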
The 17 partial products (P0 to P16) output by the encoders enter the CSAs in order of weight, from low to high. The first-stage CSAs (CSA1 to CSA6) are connected in parallel with one another and in series with the first-stage fifth power gating switch, which turns all first-stage CSA compression units on or off together so that they compress simultaneously. Each CSA is also connected to its own sixth power gating switch, which turns that CSA off when its three inputs are all zero.
The output of the first-stage compression circuit is connected to the input of the second-stage compression circuit, which consists of CSA7 to CSA10 connected in parallel. The second-stage fifth power gating switch is connected in series with CSA7 to CSA10 and with the first-stage fifth power gating switch; it is in the on state if and only if the first-stage fifth power gating switch is on, and it turns on the compression circuit of its stage. Each of CSA7 to CSA10 is connected to its own sixth power gating switch, which checks whether the three inputs of that CSA are all zero and turns the CSA on or off accordingly.
The output of the second-stage compression circuit is connected to the input of the third-stage compression circuit, which consists of CSA11 and CSA12 connected in parallel. The third-stage fifth power gating switch is connected in series with CSA11 and CSA12 and with the second-stage fifth power gating switch; it is in the on state if and only if the fifth power gating switches of the first and second stages are both on, and it turns on the compression circuit of its stage. CSA11 and CSA12 are each connected to their own sixth power gating switch, which checks whether the three inputs of that CSA are all zero and turns the CSA on or off accordingly.
The output of the third-stage compression circuit is connected to the input of the fourth-stage compression circuit, which consists of CSA13 and CSA14 connected in parallel. The fourth-stage fifth power gating switch is connected in series with CSA13 and CSA14 and with the third-stage fifth power gating switch; it is in the on state if and only if the fifth power gating switches of the first, second and third stages are all on, and it turns on CSA13 and CSA14. CSA13 and CSA14 are each connected to their own sixth power gating switch, which checks whether the three inputs of that CSA are all zero and turns the CSA on or off accordingly.
The output of the fourth-stage compression circuit is connected to the input of the fifth-stage compression circuit, which consists of CSA15 connected in series with the fifth-stage fifth power gating switch. This switch is in the on state if and only if the fifth power gating switches of the first to fourth stages are all on, and it turns on the CSA15 compression circuit. CSA15 is also connected to a sixth power gating switch, which checks whether its three inputs are all zero and turns the CSA on or off accordingly.
The output of the fifth-stage compression circuit is connected to the input of the sixth-stage compression circuit, which consists of CSA16 connected in series with the sixth-stage fifth power gating switch. This switch is in the on state if and only if the fifth power gating switches of the first to fifth stages are all on, and it turns on the CSA16 compression circuit. CSA16 is also connected to a sixth power gating switch, which checks whether its three inputs are all zero and turns the CSA on or off accordingly. Together these form a Wallace tree compressor with a power gating switch array, and the control exercised by the array reduces the power consumption of the compressor.
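The series connection of the stage-level switches means that a stage can be on only when every earlier stage is on. A minimal Python sketch of that chaining is given below; the function name `stage_enables` and the per-stage `ready` flags (standing in for each stage's own turn-on condition) are assumptions introduced for illustration.

```python
from itertools import accumulate


def stage_enables(ready):
    """Return the on/off state of each stage-level switch.

    ready[k] is a hypothetical flag saying that stage k's own turn-on
    condition is met. Stage k is on only if it is ready and every earlier
    stage is on, which mirrors the series connection of the switches.
    """
    return list(accumulate(ready, lambda earlier_on, r: earlier_on and r))
```

For example, if the third stage is not ready, the third to sixth stages all remain off, whatever their own flags say.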
The compression process is as follows. First, the partial products P0, P1 and P2 input to CSA1 are extended at their high and low ends so that their bits are aligned by weight: four zero bits are appended below the lowest bit of P2, and two zero bits are appended below the lowest bit of P1. The partial products P3, P4 and P5 input to CSA2, P6, P7 and P8 input to CSA3, P9, P10 and P11 input to CSA4, P12, P13 and P14 input to CSA5, and P15 and P16 input to CSA6 are aligned by weight in the same way as the inputs of CSA1, and the first stage of compression is then carried out.
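The two-bit step in weight between successive partial products follows from radix-4 Booth recoding. The Python sketch below illustrates this under assumptions that are not taken from the text: the operands are treated as unsigned 32-bit values (which yields 17 Booth digits), negative partial products are represented as signed Python integers rather than sign-extended two's-complement words, and the names `booth4_digits` and `aligned_partials` are hypothetical.

```python
def booth4_digits(b, n_bits=32):
    """Radix-4 Booth digits, each in {-2, -1, 0, 1, 2}, of an unsigned multiplier."""
    ext = b << 1                        # append the implicit bit b[-1] = 0
    digits = []
    for k in range(n_bits // 2 + 1):    # 17 digits for 32 bits
        w = (ext >> (2 * k)) & 0b111    # bits b[2k+1], b[2k], b[2k-1]
        digits.append(-2 * (w >> 2) + ((w >> 1) & 1) + (w & 1))
    return digits


def aligned_partials(a, b, n_bits=32):
    """Partial product k is digit_k * a shifted left by 2k bits (weight 4**k)."""
    return [(d * a) << (2 * k) for k, d in enumerate(booth4_digits(b, n_bits))]
```

In this model P1 carries two low-order zero bits and P2 carries four, as in the description, and the aligned partial products sum to `a * b`.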
After waiting for the detection time of the first-stage fifth power gating switch plus the maximum computation delay of the CSAs, the first-stage fifth power gating switch turns on the second-stage CSA compression circuit. S1 and C1 output by CSA1 and S2 output by CSA2 are fed to CSA7 as its three inputs; C2 output by CSA2 and S3 and C3 output by CSA3 are fed to CSA8; S4 and C4 output by CSA4 and S5 output by CSA5 are fed to CSA9; C5 output by CSA5 and S6 and C6 output by CSA6 are fed to CSA10. Before compressing, each CSA's sixth power gating switch checks whether its inputs are all zero, and the CSA starts operating only if they are not.
After waiting for the detection time of the second-stage fifth power gating switch plus the maximum computation delay of the CSAs, the third-stage CSA compression circuit is turned on. S7 and C7 output by CSA7 and S8 output by CSA8 are fed to CSA11 as its three inputs; C8 output by CSA8 and S9 and C9 output by CSA9 are fed to CSA12; the outputs S10 and C10 of CSA10 are passed on to the next stage of compression. Before compressing, each CSA's sixth power gating switch checks whether its inputs are all zero, and the CSA starts operating only if they are not.
After waiting for the detection time of the third-stage fifth power gating switch plus the maximum computation delay of the CSAs, the fourth-stage CSA compression circuit is turned on. S11 and C11 output by CSA11 and S12 output by CSA12 are fed to CSA13 as its three inputs; C12 output by CSA12 and S10 and C10 output by CSA10 are fed to CSA14. Before compressing, each CSA's sixth power gating switch checks whether its inputs are all zero, and the CSA starts operating only if they are not.
After waiting for the detection time of the fourth-stage fifth power gating switch plus the maximum computation delay of the CSAs, the fifth-stage CSA compression circuit is turned on. S13 and C13 output by CSA13 and S14 output by CSA14 are fed to CSA15 as its three inputs; the output C14 of CSA14 is passed on to the next stage of compression. Before compressing, the CSA's sixth power gating switch checks whether its inputs are all zero, and the CSA starts operating only if they are not.
After waiting for the detection time of the fifth-stage fifth power gating switch plus the maximum computation delay of the CSA, the sixth-stage CSA compression circuit is turned on. S15 and C15 output by CSA15 and C14 output by CSA14 are the three inputs of CSA16, which generates the final pseudo-sum signal S and carry signal C. Before compressing, the CSA's sixth power gating switch checks whether its inputs are all zero, and the CSA starts operating only if they are not.
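The staged turn-on amounts to accumulating, stage by stage, the switch detection time and the maximum CSA delay. The Python sketch below illustrates the schedule; the function name `turn_on_times` and the delay values used in the example are hypothetical and are expressed in arbitrary time units, since the text gives no figures.

```python
from itertools import accumulate


def turn_on_times(detect, csa_delay):
    """Turn-on time of each stage, with the first stage switched on at t = 0.

    detect[k] and csa_delay[k] are the (assumed) switch detection time and
    maximum CSA computation delay of stage k. Stage k + 1 turns on once both
    have elapsed after stage k turned on.
    """
    waits = [d + c for d, c in zip(detect, csa_delay)]
    return [0] + list(accumulate(waits))[:-1]


# Example with made-up delays: 1 unit of detection and 3 units per CSA stage.
print(turn_on_times([1] * 6, [3] * 6))
```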
In summary, the 17 partial products generated by the 32-bit multiplication enter the first-stage CSAs, which are turned on under the control of the first-stage fifth power gating switch, and are compressed in parallel into 6 partial products and 6 carry signals. The second-stage fifth power gating switch turns on the second-stage CSAs, which compress the 6 partial products and 6 carry signals in parallel into 4 partial products and 4 carry signals. The third-stage fifth power gating switch turns on the third-stage CSAs, which compress these into 3 partial products and 3 carry signals (counting S10 and C10, which pass through this stage). The fourth-stage fifth power gating switch turns on the fourth-stage CSAs, which compress the 3 partial products and 3 carry signals into 2 partial products and 2 carry signals. The fifth-stage fifth power gating switch turns on the fifth-stage CSA, which leaves 1 partial product and 2 carry signals (S15, C15 and the passed-through C14). Finally, the sixth-stage fifth power gating switch turns on the sixth-stage CSA, which compresses the 1 partial product and 2 carry signals into a pseudo-sum signal S and a carry signal C.
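The stage-by-stage counts can be reproduced with the usual 3-2 reduction rule: every group of three operand vectors becomes two (one sum and one carry), and any leftover vectors pass through. The following Python sketch is illustrative, and the function name `wallace_levels` is an assumption.

```python
def wallace_levels(n):
    """Number of operand vectors (sums plus carries) after each 3-2 stage."""
    counts = [n]
    while n > 2:
        n = 2 * (n // 3) + n % 3   # each full group of 3 becomes 2
        counts.append(n)
    return counts


print(wallace_levels(17))
```

For 17 inputs this gives 12, 8, 6, 4, 3 and 2 vectors after the six stages, i.e. 6+6, 4+4, 3+3, 2+2, 1+2 and finally S and C. In the first stage of the described circuit the two leftover inputs P15 and P16 go through a zero-padded CSA6 instead of passing straight through, which also yields 12 vectors.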
Finally, the carry look-ahead adder is turned on through the third power gating switch, and the pseudo-sum signal S and the carry signal C are added to obtain the final result. Because partial products of similar weight are processed together, the number of sign-extension bits grows step by step to the required width. This reduces data transmission on the interconnect wires while preserving the correctness of the result, and it reduces the computation time and power consumption of the system.
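The final addition of S and C can be illustrated with generate and propagate signals, which are the basis of a carry look-ahead adder. The Python sketch below is an assumption-laden illustration rather than the adder of the invention: it uses 4-bit look-ahead blocks whose carries are written out as flat expressions, lets the carry ripple between blocks for brevity, assumes a 64-bit result width, and does not model the third power gating switch. The names `cla4` and `cla_add` are hypothetical.

```python
def cla4(a, b, c0):
    """4-bit carry look-ahead block; returns (4-bit sum, carry out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]     # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]   # propagate
    # Each carry is a flat function of g, p and c0 (no rippling inside the block).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))
    return s, c4


def cla_add(a, b, width=64):
    """Add two non-negative integers using chained 4-bit look-ahead blocks."""
    result, carry = 0, 0
    for k in range(0, width, 4):
        s, carry = cla4((a >> k) & 0xF, (b >> k) & 0xF, carry)
        result |= s << k
    return result | (carry << width)
```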

Claims (5)

1. A low-power-consumption multiplier based on 4-Booth coding, characterized by comprising an encoder group formed by at least two encoders in parallel, wherein the input end of the encoder group is connected with a bit selector; the input end of the bit selector is connected with a multiplier input port and a multiplicand input port respectively; first power gating switches are connected between the input end of the bit selector and the multiplier input port and between the input end of the bit selector and the multiplicand input port, respectively, and the first power gating switches are used for switching the circuit on or off according to whether the input multiplier or multiplicand is zero; the encoder group controls the output of the partial products in which a complement signal participates; the output end of the encoder group is connected with the input end of a compressor through a second power gating switch, and the second power gating switch is used for switching on the circuit according to the maximum delay with which the encoder group generates the partial products; the output end of the compressor is connected with the input end of a carry look-ahead adder through a third power gating switch, and the third power gating switch is used for switching on the circuit upon receiving the final pseudo-sum signal and carry signal output by the compressor; and the carry look-ahead adder outputs the product of the multiplier and the multiplicand.
2. The low-power-consumption multiplier based on 4-Booth coding of claim 1, wherein each encoder has three data inputs, the three data inputs of each encoder are connected to the output of the bit selector, and the output of each encoder is connected to the second power gating switch.
3. The low-power-consumption multiplier based on 4-Booth coding of claim 1, wherein the logic circuit of the encoder comprises a three-input AND gate; the three input ends of the three-input AND gate are the input ends of the encoder; the output end of the three-input AND gate is connected with an input end of a carry-save adder; the three input ends of the three-input AND gate are connected with an input end of the carry-save adder through a register I, and the register I outputs Ei corresponding to the output signal of the bit selector; the pseudo-sum signal output end of the carry-save adder is connected with the input end of a register II; the output end of the register II is connected with the input end of a shift register, and the shift register outputs the partial product; and a complement participation control circuit is connected between the output end of the three-input AND gate and the shift register and is used for controlling the formation of partial products in which a complement participates.
4. The low-power-consumption multiplier based on 4-Booth coding of claim 3, wherein the complement participation control circuit comprises a register III; an inverter is connected between the register III and the output end of the three-input AND gate, with the input end of the inverter connected to the output end of the three-input AND gate; the output end of the register III is connected with the input end of the shift register through a two-to-one selector; the two-to-one selector is connected with a fourth power gating switch; the fourth power gating switch is further connected with the two-to-one selector through a complement circuit; the complement circuit performs the complement calculation that generates the partial product; and the fourth power gating switch is used for controlling the on and off of the complement circuit.
5. The low-power-consumption multiplier based on 4-Booth coding of claim 1, wherein the compressor is a Wallace tree compressor formed by a plurality of carry-save adders; each stage of the Wallace tree compressor is connected to a fifth power gating switch, which is used for controlling the on and off of the compression circuit of that stage; the fifth power gating switch of each stage is connected in series with the fifth power gating switch of the next stage and controls the on and off of the fifth power gating switch of the next stage according to the maximum computation delay of the compression circuit of its own stage; and each carry-save adder in the Wallace tree compressor is connected to a sixth power gating switch, which is used for controlling the on and off of that carry-save adder.
CN201910238829.8A 2019-03-27 2019-03-27 Low-power-consumption multiplier based on 4-Booth coding Active CN110058840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910238829.8A CN110058840B (en) 2019-03-27 2019-03-27 Low-power-consumption multiplier based on 4-Booth coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910238829.8A CN110058840B (en) 2019-03-27 2019-03-27 Low-power-consumption multiplier based on 4-Booth coding

Publications (2)

Publication Number Publication Date
CN110058840A (en) 2019-07-26
CN110058840B (en) 2022-11-25

Family

ID=67317466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910238829.8A Active CN110058840B (en) 2019-03-27 2019-03-27 Low-power-consumption multiplier based on 4-Booth coding

Country Status (1)

Country Link
CN (1) CN110058840B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20220074965A (en) 2019-11-21 2022-06-03 후아웨이 테크놀러지 컴퍼니 리미티드 Multiplier and operator circuits
CN110955403B (en) * 2019-11-29 2023-04-07 电子科技大学 Approximate base-8 Booth encoder and approximate binary multiplier of mixed Booth encoding
CN113031915A (en) * 2019-12-24 2021-06-25 上海寒武纪信息科技有限公司 Multiplier, data processing method, device and chip
CN113031913A (en) * 2019-12-24 2021-06-25 上海寒武纪信息科技有限公司 Multiplier, data processing method, device and chip
EP4080350A4 (en) * 2020-04-01 2022-12-28 Huawei Technologies Co., Ltd. Multimode fusion multiplier
CN111831255A (en) * 2020-06-30 2020-10-27 深圳市永达电子信息股份有限公司 Processing method and computer readable storage medium for ultra-long digit multiplication
CN113222132B (en) * 2021-05-22 2023-04-18 上海阵量智能科技有限公司 Multiplier, data processing method, chip, computer device and storage medium
CN116205244B (en) * 2023-05-06 2023-08-11 中科亿海微电子科技(苏州)有限公司 Digital signal processing structure

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5040139A (en) * 1990-04-16 1991-08-13 Tran Dzung J Transmission gate multiplexer (TGM) logic circuits and multiplier architectures
CN101382882A (en) * 2008-09-28 2009-03-11 宁波大学 Booth encoder based on CTGAL and thermal insulation complement multiplier-accumulator
CN103092560A (en) * 2013-01-18 2013-05-08 中国科学院自动化研究所 Low-power consumption multiplying unit based on Bypass technology
CN109388373A (en) * 2018-10-12 2019-02-26 胡振波 Multiplier-divider for low-power consumption kernel

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7272624B2 (en) * 2003-09-30 2007-09-18 International Business Machines Corporation Fused booth encoder multiplexer

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Design of a 32-bit low-power high-speed multiplier;张明英;《微处理机》;20160228 (No. 1); pp. 18-21 *
A Compiler Based Leakage Reduction Technique by Power-Gating Functional units in Embedded Microprocessors;Soumyaroop Roy et al;《20th International Conference on VLSI Design held jointly with 6th International Conference on Embedded Systems》;20070212;82-85页 *
32×32-bit multiplier based on modified Booth encoding;崔晓平;《电子测量技术》;20070222 (No. 01); pp. 1-6 *

Also Published As

Publication number Publication date
CN110058840A (en) 2019-07-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant