CN100552620C

CN100552620C - Large-number multiplier based on secondary Booth encoding

Info

Publication number: CN100552620C
Application number: CNB2007101220865A
Authority: CN
Inventors: 李树国; 颜晓东; 张坚
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2007-09-21
Filing date: 2007-09-21
Publication date: 2009-10-21
Anticipated expiration: 2027-09-21
Also published as: CN101122850A

Abstract

A large-number multiplier based on secondary Booth encoding, belonging to the technical field of integrated circuit (IC) design for public-key cryptosystem algorithms. It is characterized in that the linear transform B = 8a + b is used to re-encode the Booth-64 results that generate the partial products, and the multiplier based on secondary Booth-64 encoding is divided into a 3-stage pipeline. Stage 1 precomputes 3 times the multiplicand with a carry-lookahead adder. In parallel with this precomputation, secondary Booth encoding is performed for a_j, which has weight 8^1, and for b_j, which has weight 8^0. Stage 2 consists of two identical partial-product selection and compression arrays, which reduce the partial products of a_j and b_j respectively. Stage 3 adds the partial products obtained by stage 2 with an adder. The invention increases the speed of multiplication, can be used to implement high-performance RSA and ECC chips, and is suited to server-side applications in large-scale PKI systems.

Description

Large-number multiplier based on secondary Booth encoding

Technical field

The present invention relates to the field of integrated circuit (IC) design for public-key cryptosystem algorithms, and in particular to a hardware implementation of large-number multiplication suitable for public-key encryption algorithms.

Background technology

Rapidly developing applications such as e-commerce and secure communication place higher demands on information security over open networks. Public-key cryptosystems such as RSA and ECC are widely used for key transport and digital signatures. The core operation of both RSA and prime-field ECC is modular multiplication. To guarantee an adequate level of security, the RSA modulus must be at least 1024 bits long, and the ECC modulus at least 233 bits. The most widely used modular multiplication algorithm is the Montgomery algorithm, whose core idea is to convert modular multiplication into ordinary multiplication. In summary, the key operation in implementing the RSA and ECC algorithms is large-number multiplication. Multiplication at this scale is very inefficient in software and consumes substantial system resources, which has motivated a variety of hardware designs for large-number multipliers.

In multiplication, if the multiplier has two or more digits, the multiplicand is multiplied by each digit of the multiplier in turn; each product so obtained is called a partial product. High-speed large-number multipliers usually adopt a parallel structure, generally divided into 3 parts: first, generating the partial products; second, compressing the generated partial products into two partial products, the sum (Sum) and the carry (Carry); third, adding these two partial products with an adder to obtain the result.

The simplest way to generate partial products is to AND the multiplicand X with each bit Y_i of the multiplier Y. An N-bit binary multiplier then generates N partial products. The algorithm is expressed as:

Function Mult(X, Y) = X × Y:
    temp ← 0
    For i from 0 to n-1 step by 1
        if Y_i equal 1 then
            temp ← temp + X
        X ← X × 2
    Return temp

where n is the number of bits of the binary numbers X and Y, and

Y = \sum_{i=0}^{n-1} Y_i 2^i

The modified Booth-4 algorithm is a common method of generating partial products. Its principle is to encode three adjacent bits Y_{i+1} Y_i Y_{i-1} of the multiplier Y at a time, so that the number of partial products is reduced by nearly half. The mathematical expression of the modified Booth-4 algorithm is as follows:

Z = X \times Y = X \times \sum_{j=0}^{n/2-1} \left( (-2 Y_{2j+1} + Y_{2j} + Y_{2j-1}) \times 2^{2j} \right) = X \times \sum_{j=0}^{n/2-1} (B_j \times 2^{2j}), \quad Y_{-1} = 0

where B_j = -2 Y_{2j+1} + Y_{2j} + Y_{2j-1} (with Y_{-1} = 0), and n is the number of bits of the signed binary numbers X and Y.

Fig. 1 illustrates the modified Booth-4 algorithm using an 8 × 8-bit multiplier as an example. Modified Booth-4 encoding examines three multiplier bits at a time: the current bit, the adjacent higher bit, and the adjacent lower bit. Because consecutive groups of three overlap by one bit, each encoding step actually consumes two multiplier bits, so the number of partial products is reduced by nearly 1/2 compared with no encoding. When the multiplier is encoded, one extra bit must be appended below bit 0, namely bit -1, Y_{-1}, which is always 0. The partial products in Fig. 1 are selected by the modified Booth-4 encoding shown in Table 1, where X denotes the multiplicand. In Table 1, twice the multiplicand, 2X, is obtained by shifting X left by 1 bit, and the two's-complement negation -X is obtained by inverting every bit of the multiplicand and adding one. When the selected partial product is positive, the compensation bit S is 0; when it is negative, the multiplicand is inverted and the compensation bit S is 1, which together implement the invert-and-add-one operation.

Table 1: Modified Booth-4 encoding

Fig. 2 illustrates the Booth-4 algorithm with a concrete example. The multiplier is 91, which is 01011011 in binary; the multiplicand X is 100, which is 01100100 in binary. The multiplier is encoded three bits at a time from the low end to the high end, following the encoding rule in Table 1. For example, bits 1, 0, and -1 of the multiplier are 110, so according to the encoding rule the partial product generated is -X. Because the hardware uses two's-complement representation, -X is obtained by inverting every bit of the multiplicand and then adding one. The most significant bit of a partial product is its sign bit and can be sign-extended directly. As another example, bits 5, 4, and 3 of the multiplier are 011, so the partial product generated is 2X, which is implemented in hardware by shifting the multiplicand left by one bit. The other partial products are generated in the same way. Booth-4 encoding produces 4 partial products; adding these 4 partial products gives 9100, the product of the multiplier 91 and the multiplicand 100.
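The worked example of Fig. 2 can be checked with a short script. The following Python sketch is illustrative only and is not part of the patent: the function names booth4_digits and booth4_multiply are hypothetical, and the code models only the arithmetic of the recoding, not the hardware (there is no compensation bit S and no sign extension).

```python
def booth4_digits(y: int, n: int) -> list[int]:
    """Modified Booth-4 digits B_j of an n-bit two's-complement multiplier y (n even)."""
    assert n % 2 == 0, "bit width must be even"

    def bit(i: int) -> int:
        # Y_{-1} is the padded bit below bit 0 and is always 0.
        return 0 if i < 0 else (y >> i) & 1

    # B_j = -2*Y_{2j+1} + Y_{2j} + Y_{2j-1}
    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1) for j in range(n // 2)]


def booth4_multiply(x: int, y: int, n: int) -> int:
    """Sum the partial products B_j * X, each weighted by 2^(2j)."""
    return sum(d * x * 4 ** j for j, d in enumerate(booth4_digits(y, n)))


print(booth4_digits(91, 8))         # [-1, -1, 2, 1]: partial products -X, -X, 2X, X
print(booth4_multiply(100, 91, 8))  # 9100
```

The four digits correspond to the four partial products of the example, listed from the lowest bit group to the highest.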

The modified Booth-4 algorithm generalizes to the Booth-8 algorithm, which reduces the number of partial products to 1/3 of the original. Its partial products are selected from {±0X, ±1X, ±2X, ±3X, ±4X}. Because 3 times the multiplicand, 3X, cannot be obtained by shifting alone, it must be computed as 2X + X; the remaining partial products can be obtained by shifting. The mathematical expression of the modified Booth-8 algorithm is as follows:

Z = X \times Y = X \times \left( -Y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} Y_j 2^j \right)

= X \times \sum_{j=0}^{n/3-1} \left( (-4 Y_{3j+2} + 2 Y_{3j+1} + Y_{3j} + Y_{3j-1}) \times 2^{3j} \right), \quad Y_{-1} = 0

where B_j = -4 Y_{3j+2} + 2 Y_{3j+1} + Y_{3j} + Y_{3j-1} (with Y_{-1} = 0), and n is the number of bits of the signed binary numbers X and Y.

In the same way, the modified Booth-8 algorithm generalizes to the Booth-64 algorithm. It reduces the number of partial products to 1/6 of the original, but requires 3X, 5X, ..., 31X to be computed in advance, which clearly limits its practical use. The present invention solves the problem that high-radix Booth algorithms must precompute a large number of odd multiples of the multiplicand, and increases the speed of large-number multipliers.

Summary of the invention

The object of the invention is to propose a secondary-encoding Booth-64 linear transform suitable for large-number multipliers, and to provide a circuit implementation of a large-number multiplier based on this transform. The method meets the large-number multiplication speed requirements of high-speed public-key systems and increases the number of signature and verification operations that can be performed.

The idea of the method of the invention is to use the secondary-encoding Booth-64 linear transform to encode the high-radix Booth-64 result a second time, which greatly reduces the number of odd multiples of the multiplicand that must be precomputed, reduces the multiplier area, raises the partial-product compression ratio, and increases the speed of the large-number multiplier. The secondary-encoding Booth-64 linear transform of the invention refers to using the linear transform B = 8a + b to re-encode the Booth-64 results that generate the partial products. With this transform, partial products are no longer chosen from the set {±0, ±1, ±2, ±3, ..., ±32}, which contains many odd multiples of the multiplicand, but from the set {±0, ±1, ±2, ±3, ±4}, which greatly simplifies the circuit design.

The idea of the system of the invention is as follows: according to the multiplier bit width handled by the secondary Booth encoding method of the invention, a corresponding partial-product compression array is adopted together with a 3-stage pipeline structure. This shortens the critical-path delay and raises the hardware operating frequency, yielding a hardware system that completes large-number multiplication in a 3-stage pipeline. The architecture of the large-number multiplier is described in detail in the embodiment below.

The invention is characterized in that it consists of a first pipeline stage, a second pipeline stage, and a third pipeline stage, wherein:

The first pipeline stage comprises an addition precomputation unit, an encoder for a_j (weight 8^1), and an encoder for b_j (weight 8^0), wherein:

The addition precomputation unit takes as input signals the multiplicand X[n-1:0] and 2X, generated by shifting the multiplicand X left; its output signal is 3X.

The encoder for a_j (weight 8^1) takes the multiplier Y[n-1:0] as its input signal and performs the following operations to obtain the output signal sel_a:

A secondary Booth-64 encoding table is set up, which maps each secondary Booth-64 encoding result B_j to its corresponding signal a and signal b, j = 1, 2, 3, ..., 7, 8, ..., 32.

The secondary Booth-64 encoding result B_j is calculated as follows:

B_j = -32 Y_{6r+5} + 16 Y_{6r+4} + 8 Y_{6r+3} + 4 Y_{6r+2} + 2 Y_{6r+1} + Y_{6r} + Y_{6r-1}

where r = 0, 1, 2, ..., n/6. The secondary Booth-64 encoding table is looked up to obtain the output signal sel_a, of bit width 5, corresponding to the encoding result B_j. Exactly one bit of sel_a is high and all other bits are low; the high bit is denoted sel_a[i], where i is the secondary Booth-64 encoding result for a and identifies the i-th entry.

The encoder for b_j (weight 8^0) takes the multiplier Y[n-1:0] as its input signal and uses the same secondary Booth-64 encoding table to obtain from the signal B_j the corresponding output signal sel_b, of bit width 5. Exactly one bit of sel_b is high and all other bits are low; the high bit is denoted sel_b[j], where j is the secondary Booth encoding result for b and identifies the j-th entry.

The second pipeline stage comprises partial-product selection multiplexer 1, partial-product array n/6:2 compressor 1, partial-product selection multiplexer 2, and partial-product array n/6:2 compressor 2, wherein:

Partial-product selection multiplexer 1 takes as input signals the multiplicand X, the multiple 3X generated by the first pipeline stage, and the control signal sel_a; when sel_a[i] is high, it outputs i times the multiplicand.

Partial-product selection multiplexer 2 takes as input signals the multiplicand X, the multiple 3X generated by the first pipeline stage, and the control signal sel_b; when sel_b[j] is high, it outputs j times the multiplicand.

Partial-product array compressor 1 adopts a Wallace tree structure; the partial-product array is reduced with 4-2 compressors and carry-save adders until two partial products, carry_a and sum_a, are output.

Partial-product array compressor 2 adopts a Wallace tree structure; the partial-product array is reduced with 4-2 compressors and carry-save adders until two partial products, carry_b and sum_b, are output.

The third pipeline stage is a 4:2 partial-product compressor. Its input signals are carry_a, sum_a, carry_b, and sum_b generated by the second pipeline stage. For signal a and signal b, according to the secondary Booth-64 transform B = 8a + b, 8_carry_a and 8_sum_a are first produced by shifting; 8_carry_a, 8_sum_a, carry_b, and sum_b then pass through a 4:2 partial-product compression circuit to give the final two partial products, Sum and Carry.

The design was coded at the behavioral and RTL levels in Verilog, and functional simulation was used to verify correct system function. Logic synthesis (DC) was performed with the SMIC 0.18 µm technology library, gate delay information was extracted, and gate-level simulation was carried out to confirm both functional correctness and timing. The resulting 570 × 570-bit large-number multiplier has a critical-path delay of 5.8 ns and an area of about 29.5 mm².

Description of drawings

The drawings in this specification are provided for illustration only and do not limit the invention in any way:

Fig. 1 is a block diagram of a classical 8 × 8 modified Booth-4 multiplier;

Fig. 2 is a worked multiplication example of the Booth-4 algorithm;

Fig. 3 is a block diagram of the secondary Booth-64 multiplier proposed by the invention;

Fig. 4 is a block diagram of the secondary Booth-64 encoder proposed by the invention;

Fig. 5 is a block diagram of a classical CSA counter;

Fig. 6 is a block diagram of a classical 4-2 compressor;

Fig. 7 is a block diagram of a classical 4-2 compressor chain;

Fig. 8 is a schematic of the partial-product array of the invention;

Fig. 9 is the Wallace tree topology used for partial-product compression in the invention.

Embodiment

The invention uses the linear transform B = 8a + b to re-encode the Booth-64 results that generate the partial products, and the multiplier based on secondary Booth-64 encoding is divided into a 3-stage pipeline. Stage 1 precomputes 3 times the multiplicand with a carry-lookahead adder. In parallel with this precomputation, secondary Booth encoding is performed for a_j (weight 8^1) and b_j (weight 8^0). Stage 2 consists of two identical partial-product selection and compression arrays, which reduce the partial-product arrays of a_j (weight 8^1) and b_j (weight 8^0) respectively. Partial-product compression uses a Wallace tree structure, and the partial-product array is reduced with 4-2 compressors and carry-save adders. Stage 3 adds the partial products obtained by stage 2 with an adder.

1. The secondary-encoding Booth-64 linear transform of the invention is described in detail as follows:

1.1 High-radix Booth-64 algorithm

Let two n-bit two's-complement binary numbers X and Y denote the multiplicand and the multiplier respectively:

X = -X_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} X_i 2^i \qquad (1)

Y = -Y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} Y_j 2^j \qquad (2)

where X_i, Y_j ∈ {0, 1}.

If the multiplication result is Z, the basic multiplication is

Z = X \times Y = X \times \left( -Y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} Y_j 2^j \right). \qquad (3)

High-radix Booth-64 encoding examines 7 multiplier bits at a time. Because consecutive groups of 7 overlap by one bit, each encoding step actually consumes 6 multiplier bits. Its mathematical expression is as follows:

Y = -Y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} Y_j 2^j = \sum_{j=0}^{n/6-1} \left( (-32 Y_{6j+5} + 16 Y_{6j+4} + 8 Y_{6j+3} + 4 Y_{6j+2} + 2 Y_{6j+1} + Y_{6j} + Y_{6j-1}) \times 2^{6j} \right), \quad Y_{-1} = 0. \qquad (4)

Formula (4) assumes that the bit width n of the multiplier Y is a multiple of 6. If it is not, Y is sign-extended at the high end until its bit width is a multiple of 6.

According to formula (4), the encoding rule by which the high-radix Booth-64 algorithm generates partial products is

B_j = -32 Y_{6j+5} + 16 Y_{6j+4} + 8 Y_{6j+3} + 4 Y_{6j+2} + 2 Y_{6j+1} + Y_{6j} + Y_{6j-1}, \qquad (5)

so the high-radix Booth-64 algorithm can be expressed as

Z = X \times Y = X \times \sum_{j=0}^{n/6-1} (B_j \times 2^{6j}). \qquad (6)

Formula (6) shows that the partial products are selected from the set {±0X, ±1X, ±2X, ±3X, ..., ±32X}. Before the partial products are generated, the odd multiples of the multiplicand 3X, 5X, ..., 31X must be precomputed. The number of partial products is reduced from n to n/6.
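Formulas (5) and (6) can be exercised numerically. The Python sketch below is an illustration, not the patent's circuit: the names booth64_digits and booth64_multiply are hypothetical, and Python integers stand in for n-bit two's-complement words. It checks exhaustively for n = 12 that the digits lie in [-32, 32] and reconstruct Y, and counts the odd multiples that a plain Booth-64 multiplier would have to precompute.

```python
def booth64_digits(y: int, n: int) -> list[int]:
    """Booth-64 digits B_j of an n-bit two's-complement multiplier y, per formula (5)."""
    assert n % 6 == 0, "sign-extend y so that n is a multiple of 6"
    weights = (1, 1, 2, 4, 8, 16, -32)  # weights of bits 6j-1, 6j, ..., 6j+5

    def bit(i: int) -> int:
        return 0 if i < 0 else (y >> i) & 1  # Y_{-1} = 0

    return [sum(w * bit(6 * j - 1 + k) for k, w in enumerate(weights))
            for j in range(n // 6)]


def booth64_multiply(x: int, y: int, n: int) -> int:
    """Formula (6): Z = X * sum(B_j * 2^(6j))."""
    return sum(B * x * 64 ** j for j, B in enumerate(booth64_digits(y, n)))


n = 12
for y in range(-2 ** (n - 1), 2 ** (n - 1)):
    digits = booth64_digits(y, n)
    assert all(-32 <= B <= 32 for B in digits)
    assert sum(B * 64 ** j for j, B in enumerate(digits)) == y

odd_multiples = list(range(3, 32, 2))  # 3X, 5X, ..., 31X
print(len(booth64_digits(0, 570)), len(odd_multiples))  # 95 15
```

For a 570-bit multiplier this gives 95 digits, hence 95 partial products, at the cost of 15 precomputed odd multiples.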

1.2 Secondary Booth-64 encoding

Because the high-radix Booth-64 algorithm must precompute many odd multiples of the multiplicand, it is difficult to implement in hardware. To overcome this difficulty, a further linear-transform encoding is applied on top of the Booth encoding: the high-radix Booth-64 result is expressed by the linear expression B = ka + b, where k is a coefficient and a and b are variables.

The secondary Booth-64 linear transform proposed by the invention is:

B = 8a + b, \qquad (7)

where a ∈ {0, 1, 2, 3, 4}, b ∈ {±0, ±1, ±2, ±3, ±4}, and B ∈ {±0, ±1, ±2, ±3, ..., ±32}.

The secondary Booth-64 encoding is obtained from formula (7); see Table 2.

Table 2: Secondary Booth-64 encoding table

The table above shows that every result of the high-radix Booth-64 algorithm, {±0, ±1, ±2, ±3, ..., ±32}, can be expressed by the linear expression B = 8 × a + b. With this transform, partial products are no longer chosen from the set {±0, ±1, ±2, ±3, ..., ±32}, which contains many odd multiples of the multiplicand, but from the set {±0, ±1, ±2, ±3, ±4}, which greatly simplifies the circuit design.
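That every Booth-64 digit can be written as B = 8a + b with small a and b is easy to confirm by enumeration. The Python sketch below uses an assumed decomposition rule (round |B| to the nearest multiple of 8, then reapply the sign to both a and b); it is not necessarily the mapping of Table 2, whose contents are not reproduced in this text, and the name split_digit is hypothetical. The sign is folded into a and b here, whereas the encoder described later carries the magnitude and a separate polarity signal.

```python
def split_digit(B: int) -> tuple[int, int]:
    """Return (a, b) with B == 8*a + b, |a| <= 4 and |b| <= 4 (assumed rule)."""
    assert -32 <= B <= 32
    sign = -1 if B < 0 else 1
    magnitude = abs(B)
    a = (magnitude + 4) // 8   # nearest multiple of 8, ties rounded up
    b = magnitude - 8 * a      # remainder in -4..3
    return sign * a, sign * b


needed = set()
for B in range(-32, 33):
    a, b = split_digit(B)
    assert B == 8 * a + b and abs(a) <= 4 and abs(b) <= 4
    needed.update((abs(a), abs(b)))

print(sorted(needed))  # [0, 1, 2, 3, 4]
```

Only the multiples 0, X, 2X, 3X, and 4X are ever needed, and 3X is the only one that cannot be produced by shifting.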

Substituting formula (7) into formula (6) and simplifying gives

Z = X \times \sum_{j=0}^{n/6-1} \left( (8 a_j + b_j) \times 2^{6j} \right) = 8 \times \left[ \sum_{j=0}^{n/6-1} (a_j \times 2^{6j} \times X) \right] + \left[ \sum_{j=0}^{n/6-1} (b_j \times 2^{6j} \times X) \right]. \qquad (8)

Formula (8) shows that secondary Booth-64 encoding produces two partial-product compression arrays. The number of partial products in each compression array is n/6.

The partial products generated by a_j must be multiplied by 8 before being added to the partial products generated by b_j. Multiplication by 8 is obtained by a left shift.
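Formula (8) can be checked end to end with a small functional model. The Python sketch below is illustrative only: the helper names are hypothetical, split_digit is an assumed stand-in for the Table 2 mapping (with the sign folded into a and b), and ordinary integer addition replaces the compression arrays. It builds the a array and the b array separately, shifts the a sum left by 3 bits, and compares the result with the direct product.

```python
def booth64_digits(y: int, n: int) -> list[int]:
    """Booth-64 digits B_j of an n-bit two's-complement multiplier, formula (5)."""
    assert n % 6 == 0, "sign-extend y so that n is a multiple of 6"
    weights = (1, 1, 2, 4, 8, 16, -32)  # weights of bits 6j-1, 6j, ..., 6j+5

    def bit(i: int) -> int:
        return 0 if i < 0 else (y >> i) & 1  # Y_{-1} = 0

    return [sum(w * bit(6 * j - 1 + k) for k, w in enumerate(weights))
            for j in range(n // 6)]


def split_digit(B: int) -> tuple[int, int]:
    """Assumed stand-in for Table 2: B = 8a + b with |a| <= 4 and |b| <= 4."""
    sign = -1 if B < 0 else 1
    a = (abs(B) + 4) // 8
    return sign * a, sign * (abs(B) - 8 * a)


def multiply_secondary_booth64(x: int, y: int, n: int) -> int:
    """Formula (8): Z = 8 * sum(a_j * 2^(6j) * X) + sum(b_j * 2^(6j) * X)."""
    pairs = [split_digit(B) for B in booth64_digits(y, n)]
    array_a = sum((a * x) << (6 * j) for j, (a, _) in enumerate(pairs))
    array_b = sum((b * x) << (6 * j) for j, (_, b) in enumerate(pairs))
    return (array_a << 3) + array_b  # multiply the a array by 8 with a shift


x, y = 2 ** 569 - 1, -(2 ** 569)  # extreme 570-bit two's-complement operands
assert multiply_secondary_booth64(x, y, 570) == x * y
```

Each array holds n/6 rows, which is 95 rows for n = 570.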

A distinguishing feature of the invention is that the secondary-encoding Booth-64 linear transform B = 8a + b is used to re-encode the high-radix Booth-64 encoding result, so that partial products are no longer chosen from the set {±0, ±1, ±2, ±3, ..., ±32}, which contains many odd multiples of the multiplicand, but from the set {±0, ±1, ±2, ±3, ±4}. The number of odd multiples of the multiplicand that must be precomputed is greatly reduced, which reduces the multiplier area, raises the partial-product compression ratio, and increases the speed of the large-number multiplier.

A second feature of the invention is that a secondary-encoding Booth-64 encoder structure is proposed. The Booth encoding logic comprises 3 parts: high-radix Booth encoding, secondary encoding, and partial-product selection logic. Using information on whether adjacent bits in the high-radix Booth-64 encoding are equal, secondary encoding is performed according to the linear transform B = 8a + b, which reduces the hardware logic complexity of the secondary Booth-64 encoder.

A third feature of the invention is a three-stage pipelined multiplier. This structure matches the three steps required by the secondary-encoding Booth-64 linear transform method: encode first, then compress, then merge. It distributes the computation sensibly and shortens the critical-path delay.

A fourth feature of the invention is a digital circuit system that implements the proposed secondary-encoding Booth-64 linear transform method. The system reduces the number of odd multiples of the multiplicand that must be precomputed and performs large-number multiplication quickly.

Specific embodiments of the invention are described in detail below with reference to the accompanying drawings.

2. The structure of the large-number multiplier designed on the basis of the secondary-encoding Booth-64 linear transform is as follows:

2.1 Circuit structure

Fig. 3 shows the pipelined implementation structure of the multiplier based on secondary Booth-64 encoding.

1) In the first stage, 3 times the multiplicand is precomputed with a carry-lookahead adder. The adder inputs are the multiplicand X and twice the multiplicand, 2X; 2X is implemented in hardware by a left shift. In parallel with the precomputation, secondary Booth encoding is performed for a_j (weight 8^1) and b_j (weight 8^0). The encoder and partial-product selector logic are described in Section 2.2.

2) In the second stage there are two identical partial-product selection and compression arrays, which reduce the partial products of a_j (weight 8^1) and b_j (weight 8^0) respectively. Partial products are selected from the set {±0X, ±1X, ±2X, ±3X, ±4X}. Compression uses a Wallace tree structure. The partial-product array is reduced with 4-2 compressors and carry-save adders (CSA) until two partial products (Sum and Carry) remain. The two selection and compression arrays thus produce 4 partial products in total: Sum_a, Carry_a, Sum_b, and Carry_b.

3) In the third stage, the 4 partial products obtained by the second stage are added. According to formula (7), the 2 partial products Sum_a and Carry_a produced for a_j (weight 8^1) are first shifted to produce 8_Sum_a and 8_Carry_a. The 4 partial products 8_Sum_a, 8_Carry_a, Sum_b, and Carry_b then pass through a 4:2 partial-product compression circuit to give the final two partial products, Sum and Carry.

2.2 Secondary Booth encoder design

As shown in Fig. 4, the Booth encoding logic comprises 3 parts: high-radix Booth encoding, secondary encoding, and partial-product selection logic.

1) High-radix Booth encoding

High-radix Booth-64 encoding takes each group of 7 adjacent multiplier bits Y_6 Y_5 Y_4 Y_3 Y_2 Y_1 Y_0 as input; its encoding rule is given by formula (5).

Because

\begin{aligned} B &= -32 Y_6 + 16 Y_5 + 8 Y_4 + 4 Y_3 + 2 Y_2 + Y_1 + Y_0 \\ &= -[-32 (1 - Y_6) + 16 (1 - Y_5) + 8 (1 - Y_4) + 4 (1 - Y_3) + 2 (1 - Y_2) + (1 - Y_1) + (1 - Y_0)] \\ &= -[-32 \bar{Y}_6 + 16 \bar{Y}_5 + 8 \bar{Y}_4 + 4 \bar{Y}_3 + 2 \bar{Y}_2 + \bar{Y}_1 + \bar{Y}_0], \end{aligned}

where B is the high-radix Booth-64 encoding result of the multiplier bits Y_6 Y_5 Y_4 Y_3 Y_2 Y_1 Y_0, and \bar{Y}_i is the bitwise complement of Y_i.

Therefore, if the encoding result of Y_6 Y_5 Y_4 Y_3 Y_2 Y_1 Y_0 is B, the encoding result after inverting every bit of Y_6 Y_5 Y_4 Y_3 Y_2 Y_1 Y_0 is -B. Exploiting this property, encoder logic I uses XOR gates to produce the signals E_5 E_4 E_3 E_2 E_1 E_0, which indicate whether adjacent bits of Y_6 Y_5 Y_4 Y_3 Y_2 Y_1 Y_0 are equal: when Y_{i+1} equals Y_i, E_i is 0; when Y_{i+1} differs from Y_i, E_i is 1, for i = 0, 1, 2, 3, 4, 5. Encoder logic II derives the output signals sel_B and polarity from E_5 E_4 E_3 E_2 E_1 E_0 and the high-radix Booth encoding rule, formula (5). The signal sel_B is 33 bits wide; exactly one bit of sel_B is high and all other bits are low. The high bit sel_B[j] indicates that the absolute value |B| of the high-radix Booth encoding result B equals j. The signal polarity indicates the sign of the high-radix Booth encoding result B.
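The property that makes the XOR-based encoder work can be verified exhaustively over all 128 input groups. The Python sketch below is a behavioural check, not the gate-level encoder: the names booth64_value and adjacent_xor are hypothetical. It confirms that inverting all seven bits negates B, that |B| is fully determined by E_5 ... E_0, and that the sign of B can be read from Y_6 (one possible source for the polarity signal, which is an assumption here since the text does not state how polarity is derived).

```python
from itertools import product


def booth64_value(bits: tuple[int, ...]) -> int:
    """B for one group, bits given as (Y6, Y5, Y4, Y3, Y2, Y1, Y0)."""
    y6, y5, y4, y3, y2, y1, y0 = bits
    return -32 * y6 + 16 * y5 + 8 * y4 + 4 * y3 + 2 * y2 + y1 + y0


def adjacent_xor(bits: tuple[int, ...]) -> tuple[int, ...]:
    """(E5, ..., E0) with E_i = Y_{i+1} XOR Y_i."""
    return tuple(bits[k] ^ bits[k + 1] for k in range(6))


magnitude_of = {}  # maps each E pattern to |B|
for bits in product((0, 1), repeat=7):
    B = booth64_value(bits)
    inverted = tuple(1 - v for v in bits)
    assert booth64_value(inverted) == -B              # bitwise inversion negates B
    E = adjacent_xor(bits)
    assert magnitude_of.setdefault(E, abs(B)) == abs(B)  # |B| depends on E only
    assert (B <= 0) if bits[0] else (B >= 0)          # Y6 gives the sign

print(len(magnitude_of), max(magnitude_of.values()))  # 64 32
```

Working from the six E signals instead of the seven Y bits halves the number of input patterns the magnitude logic has to distinguish.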

2) Secondary encoding

The secondary encoding logic is generated from the mapping in Table 2, the secondary Booth-64 encoding table. Its inputs are the high-radix Booth-64 encoding signals sel_B and polarity. The output signals sel_a and sel_b are each 5 bits wide. Exactly one bit of each of sel_a and sel_b is high, and all other bits are low. When sel_a[i] and sel_b[j] are the high bits, the secondary encoding results for a and b in the linear transform (7) are i and j respectively.

3) Partial-product selection

The partial-product selection logic is a multiplexer (MUX) that generates partial products according to the select signals sel_a and sel_b. When sel_a[i] is high, the multiplexer outputs i times the multiplicand; for example, when sel_a[3] is high, it outputs 3X, 3 times the multiplicand. The multiplexer logic for sel_b is identical to that for sel_a. In the multiplier architecture, n/6 secondary Booth-64 encoders generate the two partial-product compression arrays.
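The selection step can be modelled as a one-hot lookup over the five available multiples. The Python sketch below is a behavioural illustration under stated assumptions: the names one_hot and select_partial_product are hypothetical, and the negative argument stands in for the polarity handling, which the text does not specify at this point.

```python
def one_hot(i: int, width: int = 5) -> list[int]:
    """One-hot select vector with bit i high."""
    return [1 if k == i else 0 for k in range(width)]


def select_partial_product(x: int, sel: list[int], negative: bool = False) -> int:
    """Output i*X for the single high bit sel[i]; negate if requested."""
    assert len(sel) == 5 and sum(sel) == 1, "sel must be one-hot, 5 bits wide"
    three_x = x + (x << 1)                        # the only multiple needing an adder
    multiples = (0, x, x << 1, three_x, x << 2)   # 0, X, 2X, 3X, 4X
    chosen = multiples[sel.index(1)]
    return -chosen if negative else chosen


print(select_partial_product(100, one_hot(3)))  # 300
```

Because 2X and 4X are shifts and 3X arrives precomputed from the first stage, the multiplexer itself contains no arithmetic.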

2.3 Partial-product compression array design

When partial products are very wide, carry-propagate addition is quite slow, because the carry must travel along a long path from the low end to the high end. The most important way to speed up a multiplier is to use carry-save adders (due to Wallace; also known as full adders or 3-2 counters), which represent the sum of three or more numbers in redundant sum-and-carry form without propagating carries.

This method is illustrated in Fig. 5: PP1 + PP2 + PP3 = result2 + result1. The compression delay is the delay of one full adder and does not depend on the partial-product width. By arranging basic three-input adders recursively, any number of partial products can be added and reduced to 2 without using a carry-propagate adder. A single carry-propagate adder is needed only at the end, to convert the last 2 partial products into the final result. This general method can be applied in tree or linear arrangements to improve performance.
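The identity PP1 + PP2 + PP3 = result2 + result1 can be demonstrated with bitwise operations on whole words. The Python sketch below is illustrative: the function names are hypothetical, the operands are non-negative integers, and the rows are consumed in a simple linear order instead of the tree arrangement used in the design.

```python
def carry_save_add(p1: int, p2: int, p3: int) -> tuple[int, int]:
    """3:2 compression: every bit position is an independent full adder."""
    sum_word = p1 ^ p2 ^ p3
    carry_word = ((p1 & p2) | (p1 & p3) | (p2 & p3)) << 1  # carries move up one bit
    return sum_word, carry_word


def reduce_to_two(rows: list[int]) -> list[int]:
    """Apply carry-save addition repeatedly until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        s, c = carry_save_add(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]
    return rows


partial_products = [91, 100, 7, 2 ** 40 + 3, 12345]
final_two = reduce_to_two(partial_products)
assert len(final_two) == 2
assert sum(final_two) == sum(partial_products)  # one carry-propagate add at the end
```

No carry crosses more than one bit position inside carry_save_add, which is why its delay is independent of the word width.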

The drawbacks of the tree structure described by Wallace are its irregular interconnect and the difficulty of laying it out. A more regular tree structure is based on a binary tree built from a series of 4-2 compressors, each of which adds 4 input numbers to give 2 results. The number of levels needed to add N partial products is proportional to log N. Such a tree structure is more regular. The internal logic of the 4-2 compressor is shown in Fig. 6; 4-2 compressors are chained to form a 4-2 compressor chain, as shown in Fig. 7.

With Cin = 0, the input/output relation is I3 + I2 + I1 + I0 = 2Cout + 2C + S.
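The input/output relation can be confirmed over all input combinations. The Python sketch below builds a single-bit 4-2 compressor from two cascaded full adders, which is one common realization and is assumed here; the internal logic of Fig. 6 is not reproduced in this text and may differ. The function names are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Single-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(i0: int, i1: int, i2: int, i3: int, cin: int = 0) -> tuple[int, int, int]:
    """Single-bit 4-2 compressor: returns (S, C, Cout)."""
    s1, cout = full_adder(i0, i1, i2)  # cout does not depend on cin
    s, c = full_adder(s1, i3, cin)
    return s, c, cout


for i0, i1, i2, i3, cin in product((0, 1), repeat=5):
    s, c, cout = compressor_4_2(i0, i1, i2, i3, cin)
    assert i0 + i1 + i2 + i3 + cin == s + 2 * c + 2 * cout
```

Because Cout is independent of Cin, chaining compressors side by side does not create a ripple path along the chain.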

The large-number multiplier of the invention is a 570 × 570-bit multiplier. After Booth encoding in the first stage, the partial-product selectors in the second pipeline stage generate the partial products. With the secondary-encoding Booth-64 linear transform, 570/6 = 95 partial products are generated. The partial-product array is shown in Fig. 8.

Since the partial products range over {±0, ±X, ±2X, ±3X, ±4X}, the partial-product width grows from 570 bits to 572 bits. Under the secondary-encoding Booth-64 linear transform, the multiplier is encoded in groups of 7 bits, with adjacent groups overlapping by one bit, so the weights of adjacent partial products differ by 6 bit positions. Because partial products can be positive or negative, a carry bit Carry is attached below the least significant bit of each partial product. Taking this carry information into account, the partial-product array actually has 96 rows, with the carry information of each partial product merged into the row of the next partial product.

A Wallace tree structure combining CSAs and 4:2 compressors finally reduces the 96 rows of partial products to 2 partial products. Its topology is shown in Fig. 9.

From the Wallace tree topology it can be seen that the whole partial-product summation network needs 5 levels of 4:2 compressors and one level of CSAs. From the delay analysis of the 4:2 compressor and the CSA above, the delay of the whole partial-product summation network is 5 × 3 + 2 = 17 XOR-gate delays.
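The level count and the 17-XOR figure can be reproduced with a simple row-count model. The Python sketch below is an assumption-laden illustration: it places the single CSA level first and the 4:2 levels after it, which matches the stated counts but may not match the ordering in Fig. 9, and it takes the per-level delays (2 XOR delays for a CSA, 3 for a 4:2 compressor) from the 5 × 3 + 2 expression in the text. The function name reduction_schedule is hypothetical.

```python
XOR_DELAYS = {"CSA": 2, "4:2": 3}  # per-level delay in XOR-gate delays, from the text


def reduction_schedule(rows: int) -> list[tuple[str, int]]:
    """Return (level type, rows remaining) for each level until 2 rows are left."""
    stages = []
    if rows > 2:
        rows = 2 * (rows // 3) + rows % 3   # one CSA level: every 3 rows become 2
        stages.append(("CSA", rows))
    while rows > 2:
        if rows >= 4:
            rows = 2 * (rows // 4) + rows % 4  # 4:2 level: every 4 rows become 2
            stages.append(("4:2", rows))
        else:                                  # exactly 3 rows left
            rows = 2
            stages.append(("CSA", rows))
    return stages


stages = reduction_schedule(96)
print(stages)
print(sum(XOR_DELAYS[kind] for kind, _ in stages))  # 17
```

For 96 rows the model gives 96 → 64 after the CSA level and then 32, 16, 8, 4, 2 after the five 4:2 levels.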

The design was coded at the behavioral and RTL levels in Verilog, and functional simulation was used to verify correct system function. Logic synthesis (DC) was performed with the SMIC 0.18 µm technology library, gate delay information was extracted, and gate-level simulation was carried out to confirm both functional correctness and timing. The resulting 570 × 570-bit large-number multiplier has a critical-path delay of 5.8 ns and an area of about 29.5 mm².

Claims

1. A large-number multiplier based on secondary Booth encoding, characterized in that it consists of a first pipeline stage, a second pipeline stage, and a third pipeline stage, wherein:

The secondary Booth-64 encoding result B_j is calculated as follows:

B_j = -32 Y_{6r+5} + 16 Y_{6r+4} + 8 Y_{6r+3} + 4 Y_{6r+2} + 2 Y_{6r+1} + Y_{6r} + Y_{6r-1}