CN100552620C - Large-number multiplier based on secondary Booth encoding - Google Patents

Info

Publication number: CN100552620C
Application number: CNB2007101220865A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN101122850A (en)
Inventors: 李树国, 颜晓东, 张坚
Original assignee: Tsinghua University
Current assignee: Tsinghua University (as listed; the list may be inaccurate and is not based on a legal analysis)
Application filed by: Tsinghua University
Legal status: Expired - Fee Related (an assumption, not a legal conclusion)
Prior art keywords: partial product, secondary, coding, carry, multiplicand
Classification: Compression, Expansion, Code Conversion, And Decoders

Abstract

A large-number multiplier based on secondary Booth encoding, belonging to the technical field of integrated circuit design for public-key cryptosystem algorithms. It is characterized in that the linear transform B = 8a + b is used to re-encode the Booth 64 results from which partial products are generated, and in that the multiplier based on secondary Booth 64 encoding is organized as a 3-stage pipeline. Stage 1 precomputes 3 times the multiplicand with a carry-lookahead adder and, in parallel with the precomputation, performs secondary Booth encoding for a_j (weight 8^1) and b_j (weight 8^0). Stage 2 consists of two identical partial product selection and compression arrays, which reduce the partial products of a_j and b_j respectively. Stage 3 adds the partial products obtained in stage 2 with an adder. The invention increases multiplication speed, can be used to implement high-performance RSA and ECC chips, and is suitable for large-scale PKI applications on servers.

Description

Large-number multiplier based on secondary Booth encoding
Technical field
The present invention relates to the field of integrated circuit design for public-key cryptosystem algorithms, and in particular to a hardware implementation of large-number multiplication suitable for public-key encryption algorithms.
Background
Rapidly developing applications such as e-commerce and secure communication place higher demands on information security over open networks. Public-key cryptosystems such as RSA and ECC are widely used for key transport and digital signatures. The core operation of both RSA and prime-field ECC is modular multiplication. To guarantee a reasonable level of security, the RSA modulus needs to be at least 1024 bits long and the ECC modulus at least 233 bits long. The most widely used modular multiplication algorithm is the Montgomery algorithm, whose core idea is to convert modular multiplication into ordinary multiplication. In short, the key operation in implementing the RSA and ECC algorithms is large-number multiplication. Multiplication at this scale is very inefficient in software and consumes substantial system resources, which is why various hardware designs for large-number multipliers have emerged.
In multiplication, if the multiplier has two or more digits, each digit of the multiplier is multiplied by the multiplicand in turn, and each product so obtained is called a partial product. High-speed large-number multipliers usually adopt a parallel structure, generally divided into 3 parts: first, generating the partial products; second, compressing the generated partial products down to two, a sum (Sum) and a carry (Carry); third, adding these two partial products with an adder to obtain the result.
The simplest way to generate partial products is to AND the multiplicand X with each bit Y_i of the multiplier Y, so an N-bit binary multiplier produces N partial products. The algorithm can be expressed as:
Function Mult(X, Y) = X × Y:
    temp ← 0
    For i from 0 to n-1 step by 1
        if Y_i equal 1 then
            temp ← temp + X
        X ← X × 2
    Return temp
where n is the number of bits of the binary numbers X and Y, and $Y = \sum_{i=0}^{n-1} Y_i 2^i$.
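The shift-and-add procedure above can be sketched in Python as follows. This is a minimal illustration, assuming an unsigned n-bit multiplier; the function name `shift_add_multiply` is chosen here and does not come from the patent.

```python
def shift_add_multiply(x: int, y: int, n: int) -> int:
    """Multiply x by an n-bit unsigned multiplier y, adding one partial product per set bit."""
    temp = 0
    for i in range(n):
        if (y >> i) & 1:   # bit Y_i of the multiplier
            temp += x      # add the current (shifted) multiplicand
        x <<= 1            # X <- X * 2
    return temp


print(shift_add_multiply(100, 91, 8))  # 9100
```

Each loop iteration corresponds to one partial product, which is why an N-bit multiplier needs N of them before any encoding is applied.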
The modified Booth 4 algorithm is a common way to generate partial products. It encodes three adjacent bits $Y_{i+1} Y_i Y_{i-1}$ of the multiplier Y at a time, which reduces the number of partial products by nearly half. The modified Booth 4 algorithm is expressed mathematically as
$$Z = X \times Y = X \times \sum_{j=0}^{\frac{n}{2}-1} \left( \left( -2Y_{2j+1} + Y_{2j} + Y_{2j-1} \right) \times 2^{2j} \right) = X \times \sum_{j=0}^{\frac{n}{2}-1} \left( B_j \times 2^{2j} \right), \qquad Y_{-1} = 0$$
where $B_j = -2Y_{2j+1} + Y_{2j} + Y_{2j-1}$ (with $Y_{-1} = 0$) and n is the number of bits of the signed binary numbers X and Y.
Fig. 1 illustrates the modified Booth 4 algorithm using an 8 × 8-bit multiplier as an example. Modified Booth 4 encoding examines three multiplier bits at a time: the current bit, the adjacent higher bit, and the adjacent lower bit. Because consecutive groups of three bits overlap by one bit, each encoding step actually consumes two multiplier bits, so the number of partial products is nearly halved compared with the unencoded case. When the multiplier is encoded, one extra bit is appended below the least significant bit (bit 0), namely bit -1, $Y_{-1}$, which is always 0. The partial products in Fig. 1 are selected according to the modified Booth 4 encoding shown in Table 1, where X denotes the multiplicand. In Table 1, twice the multiplicand, 2X, is obtained by shifting the multiplicand X left by 1 bit, and the negated multiplicand -X in two's-complement form is obtained by inverting every bit of the multiplicand and adding one. When the selected partial product is positive, the compensation bit S is 0; when it is negative, the multiplicand is inverted and the compensation bit S is 1, which together implement the invert-and-add-one operation.
Table 1: Modified Booth 4 encoding
Fig. 2 illustrates the Booth 4 algorithm with a concrete example. The multiplier is 91, which is 01011011 in binary; the multiplicand X is 100, which is 01100100 in binary. The multiplier is encoded three bits at a time from the low end to the high end, following the encoding rule in Table 1. For example, bits 1, 0, and -1 of the multiplier are 110, so by the encoding rule the partial product generated is -X. Because the hardware uses two's-complement representation, -X is obtained by inverting every bit of the multiplicand and then adding one. The most significant bit of a partial product is its sign bit and can be extended directly. As another example, bits 5, 4, and 3 of the multiplier are 011, so by the encoding rule the partial product generated is 2X, which is obtained in hardware by shifting the multiplicand left by one bit. The other partial products are generated in the same way. Booth 4 encoding thus produces 4 partial products, and adding these 4 partial products gives 9100, the product of the multiplier 91 and the multiplicand 100.
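The worked example can be checked with a short Python sketch. It derives the Booth 4 digits directly from the formula B_j = -2Y_{2j+1} + Y_{2j} + Y_{2j-1} rather than from Table 1, which is not reproduced in this text; the function names are illustrative assumptions.

```python
def booth4_digits(y: int, n: int) -> list[int]:
    """Booth 4 digits B_j = -2*Y[2j+1] + Y[2j] + Y[2j-1] with Y[-1] = 0.

    y is an n-bit two's-complement multiplier and n is assumed to be even.
    """
    def bit(i: int) -> int:
        return 0 if i < 0 else (y >> i) & 1

    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1) for j in range(n // 2)]


def booth4_multiply(x: int, y: int, n: int) -> int:
    """Sum the n/2 partial products B_j * X * 2^(2j)."""
    return sum((b * x) << (2 * j) for j, b in enumerate(booth4_digits(y, n)))


print(booth4_digits(91, 8))         # [-1, -1, 2, 1]
print(booth4_multiply(100, 91, 8))  # 9100
```

The digit list reads from the low end to the high end, so the first entry -1 corresponds to the group 110 and the third entry 2 to the group 011 discussed above.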
The modified Booth 4 algorithm can be generalized to the Booth 8 algorithm, which reduces the number of partial products to 1/3 of the original. Its partial products are selected from {±0X, ±1X, ±2X, ±3X, ±4X}. Because three times the multiplicand, 3X, cannot be obtained by a shift, it has to be computed as 2X + X; the remaining partial products can be obtained by shifting. The modified Booth 8 algorithm is expressed mathematically as
$$Z = X \times Y = X \times \left( -Y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} Y_j 2^j \right) = X \times \sum_{j=0}^{\frac{n}{3}-1} \left( \left( -4Y_{3j+2} + 2Y_{3j+1} + Y_{3j} + Y_{3j-1} \right) \times 2^{3j} \right), \qquad Y_{-1} = 0$$
where $B_j = -4Y_{3j+2} + 2Y_{3j+1} + Y_{3j} + Y_{3j-1}$ (with $Y_{-1} = 0$) and n is the number of bits of the signed binary numbers X and Y.
Likewise, the modified Booth 8 algorithm can be generalized to the Booth 64 algorithm. It reduces the number of partial products to 1/6 of the original but requires 3X, 5X, ..., 31X to be computed in advance, which clearly limits its practical use. The present invention solves the problem that high-radix Booth algorithms need a large number of odd multiples of the multiplicand to be precomputed, and thereby increases the speed of the large-number multiplier.
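The trade-off between fewer partial products and more precomputed odd multiples can be tabulated with a small Python sketch. The helper names are assumptions made for this illustration, and the 570-bit operand width is taken from the embodiment described later in the document.

```python
def odd_multiples(k: int) -> list[int]:
    """Odd multiples of X (other than 1X) needed when k multiplier bits are consumed per digit."""
    return list(range(3, (1 << (k - 1)) + 1, 2))


def partial_product_count(n: int, k: int) -> int:
    """Number of Booth digits for an n-bit multiplier, padding n up to a multiple of k."""
    return -(-n // k)  # ceiling division


for name, k in (("Booth 4", 2), ("Booth 8", 3), ("Booth 64", 6)):
    multiples = odd_multiples(k)
    print(name, partial_product_count(570, k), len(multiples), multiples)
```

For Booth 64 the list runs from 3 to 31, that is 15 odd multiples, each of which would need its own adder if no secondary encoding were applied.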
Summary of the invention
The object of the invention is to propose a secondary-encoding Booth 64 linear transform suitable for large-number multipliers, and to provide a circuit implementation of a large-number multiplier based on this transform. The method meets the multiplication speed requirements of high-speed public-key algorithm systems and increases the number of signature and authentication operations that can be performed.
The idea behind the method of the invention is to re-encode the results of high-radix Booth 64 encoding using the secondary-encoding Booth 64 linear transform. This greatly reduces the number of odd multiples of the multiplicand that must be precomputed, reduces the multiplier area, raises the partial product compression ratio, and increases the speed of the large-number multiplier. The secondary-encoding Booth 64 linear transform of the invention means using the linear transform B = 8a + b to re-encode the Booth 64 results from which partial products are generated. With this transform, partial products are no longer selected from the set {±0, ±1, ±2, ±3, ..., ±32}, which contains many odd multiples of the multiplicand, but from the set {±0, ±1, ±2, ±3, ±4}, which greatly simplifies the circuit design.
The idea behind the system of the invention is to use a partial product compression array sized to the bit width of the multiplier handled by the secondary Booth encoding method, and to adopt a 3-stage pipeline structure. This shortens the critical path delay, raises the hardware operating frequency, and yields a hardware system that completes large-number multiplication in a 3-stage hardware pipeline. The architecture of the large-number multiplier is described in detail in the embodiment below.
The invention is characterized in that it consists of a first pipeline stage, a second pipeline stage, and a third pipeline stage, wherein:
The first pipeline stage comprises an addition precomputation unit, an encoder for $a_j$ (weight $8^1$), and an encoder for $b_j$ (weight $8^0$), wherein:
The addition precomputation unit takes as input signals the multiplicand X[n-1:0] and 2X, which is generated by shifting the multiplicand X left; its output signal is 3X.
The encoder for $a_j$ (weight $8^1$) takes the multiplier Y[n-1:0] as its input signal and obtains the output signal sel_a by the following operations:
A secondary Booth 64 encoding table is set up, which maps each Booth 64 encoding result $B_j$ to its corresponding signal a and signal b, j = 1, 2, 3, ..., 7, 8, ..., 32.
The Booth 64 encoding result $B_j$ is computed as:
$$B_j = -32Y_{6r+5} + 16Y_{6r+4} + 8Y_{6r+3} + 4Y_{6r+2} + 2Y_{6r+1} + Y_{6r} + Y_{6r-1}$$
where r = 0, 1, 2, ..., n/6. Looking up the secondary Booth 64 encoding table yields the 5-bit output signal sel_a corresponding to the encoding result $B_j$. Exactly one bit of sel_a is high and all other bits are low. The high bit is denoted sel_a[i], where i is the secondary Booth 64 encoding result for a and denotes the i-th entry.
The encoder for $b_j$ (weight $8^0$) takes the multiplier Y[n-1:0] as its input signal and, using the same secondary Booth 64 encoding table, obtains from $B_j$ the corresponding 5-bit output signal sel_b. Exactly one bit of sel_b is high and all other bits are low. The high bit is denoted sel_b[j], where j is the secondary Booth encoding result for b and denotes the j-th entry.
The second pipeline stage comprises a first partial product selection multiplexer, a first partial product array n/6:2 compressor, a second partial product selection multiplexer, and a second partial product array n/6:2 compressor, wherein:
The first partial product selection multiplexer takes as input signals the multiplicand X, the partial product 3X produced by the first pipeline stage, and the control signal sel_a; when sel_a[i] is high, i times the multiplicand is selected for output.
The second partial product selection multiplexer takes as input signals the multiplicand X, the partial product 3X produced by the first pipeline stage, and the control signal sel_b; when sel_b[j] is high, j times the multiplicand is selected for output.
Compression of the first partial product array uses a Wallace tree structure; the partial product array (PPA) is reduced with 4-2 counters and carry-save adders until two partial products, carry_a and sum_a, are output.
Compression of the second partial product array uses a Wallace tree structure; the partial product array (PPA) is reduced with 4-2 counters and carry-save adders until two partial products, carry_b and sum_b, are output.
The third pipeline stage is a 4:2 compressor for the partial product array. Its input signals are carry_a, sum_a, carry_b, and sum_b produced by the second pipeline stage. For signal a and signal b, following the secondary Booth 64 transform formula B = 8a + b, 8_carry_a and 8_sum_a are first produced by shifting; 8_carry_a, 8_sum_a, carry_b, and sum_b are then passed through a 4:2 partial product compression circuit to obtain the final two partial products Sum and Carry.
The design was coded at the behavioral and RTL levels in Verilog and functionally simulated to verify the correctness of the system function. Logic synthesis (DC) was completed with the SMIC 0.18 µm process library, gate delay information was extracted, and gate-level simulation was performed to ensure functional correctness and timing accuracy. The resulting 570 × 570-bit large-number multiplier has a critical path delay of 5.8 ns and an area of about 29.5 mm².
Description of drawings
The drawings in this specification are provided for illustration only and do not limit the content of the invention in any way:
Fig. 1 shows the block diagram of a classical 8 × 8 modified Booth 4 multiplier;
Fig. 2 shows a worked multiplication example of the Booth 4 algorithm;
Fig. 3 shows the block diagram of the secondary Booth 64 multiplier proposed by the invention;
Fig. 4 shows the block diagram of the secondary Booth 64 encoder proposed by the invention;
Fig. 5 shows the block diagram of the classical CSA counter;
Fig. 6 shows the block diagram of the classical 4-2 compressor;
Fig. 7 shows the block diagram of the classical 4-2 compressor chain;
Fig. 8 shows the partial product array (PPA) of the invention;
Fig. 9 shows the Wallace tree topology used for partial product compression in the invention.
Embodiment
The invention uses the linear transform B = 8a + b to re-encode the Booth 64 results from which partial products are generated, and the multiplier based on secondary Booth 64 encoding is organized as a 3-stage pipeline. Stage 1 precomputes 3 times the multiplicand with a carry-lookahead adder and, in parallel with the precomputation, performs secondary Booth encoding for $a_j$ (weight $8^1$) and $b_j$ (weight $8^0$). Stage 2 consists of two identical partial product selection and compression arrays, which reduce the partial product arrays of $a_j$ (weight $8^1$) and $b_j$ (weight $8^0$) respectively. Partial product compression uses a Wallace tree structure, and the partial product array (PPA) is reduced with 4-2 counters and carry-save adders. Stage 3 adds the partial products obtained in stage 2 with an adder.
1. The secondary-encoding Booth 64 linear transform of the invention is described in detail as follows:
1.1 High-radix Booth 64 algorithm
Let two n-bit two's-complement binary numbers X and Y denote the multiplicand and the multiplier respectively:
$$X = -X_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} X_i 2^i \qquad (1)$$
$$Y = -Y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} Y_j 2^j \qquad (2)$$
where $X_i, Y_j \in \{0, 1\}$.
Let the product be Z; the basic multiplication is then
$$Z = X \times Y = X \times \left( -Y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} Y_j 2^j \right). \qquad (3)$$
High-radix Booth 64 encoding examines 7 multiplier bits at a time. Because consecutive groups of 7 bits overlap by one bit, each encoding step actually consumes 6 multiplier bits. Its mathematical expression is:
$$Y = -Y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} Y_j 2^j = \sum_{j=0}^{n/6-1} \left( \left( -32Y_{6j+5} + 16Y_{6j+4} + 8Y_{6j+3} + 4Y_{6j+2} + 2Y_{6j+1} + Y_{6j} + Y_{6j-1} \right) \times 2^{6j} \right), \qquad Y_{-1} = 0. \qquad (4)$$
Formula (4) assumes that the bit width n of the multiplier Y is a multiple of 6. If it is not, the multiplier is sign-extended at the high end until its bit width is a multiple of 6.
According to formula (4), the encoding rule by which the high-radix Booth 64 algorithm generates partial products is
$$B_j = -32Y_{6j+5} + 16Y_{6j+4} + 8Y_{6j+3} + 4Y_{6j+2} + 2Y_{6j+1} + Y_{6j} + Y_{6j-1}, \qquad (5)$$
so the high-radix Booth 64 algorithm can be expressed as
$$Z = X \times Y = X \times \sum_{j=0}^{n/6-1} \left( B_j \times 2^{6j} \right). \qquad (6)$$
Formula (6) shows that the partial products are selected from the set {±0X, ±1X, ±2X, ±3X, ..., ±32X}. Before the partial products are generated, the odd multiples 3X, 5X, ..., 31X of the multiplicand must be precomputed. The number of partial products is reduced from n to $n/6$, assuming as in formula (4) that n is a multiple of 6.
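Formulas (5) and (6) can be exercised with a short Python sketch. It assumes the multiplier is an n-bit two's-complement integer with n a multiple of 6, as formula (4) requires; the function names are chosen for this illustration only.

```python
def booth64_digits(y: int, n: int) -> list[int]:
    """Digits B_j of formula (5), with Y[-1] = 0; n is assumed to be a multiple of 6."""
    def bit(i: int) -> int:
        return 0 if i < 0 else (y >> i) & 1

    return [
        -32 * bit(6 * j + 5) + 16 * bit(6 * j + 4) + 8 * bit(6 * j + 3)
        + 4 * bit(6 * j + 2) + 2 * bit(6 * j + 1) + bit(6 * j) + bit(6 * j - 1)
        for j in range(n // 6)
    ]


def booth64_multiply(x: int, y: int, n: int) -> int:
    """Formula (6): Z = X * sum(B_j * 2^(6j))."""
    return sum((b * x) << (6 * j) for j, b in enumerate(booth64_digits(y, n)))


print(booth64_digits(91, 12))         # [27, 1]
print(booth64_multiply(100, 91, 12))  # 9100
```

In this example the digit 27 would require the precomputed odd multiple 27X, which is exactly the cost that the secondary encoding described next is meant to avoid.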
1.2 Secondary Booth 64 encoding
Because the high-radix Booth 64 algorithm needs a large number of odd multiples of the multiplicand to be precomputed, it is difficult to implement in hardware. To overcome this difficulty, a further linear-transform encoding is applied on top of the Booth encoding. That is, the result of high-radix Booth 64 encoding is expressed by a linear expression B = ka + b, where k is a coefficient and a, b are variables.
The secondary Booth 64 linear transform proposed by the invention is:
$$B = 8a + b. \qquad (7)$$
where $a \in \{0, 1, 2, 3, 4\}$, $b \in \{\pm 0, \pm 1, \pm 2, \pm 3, \pm 4\}$, and $B \in \{\pm 0, \pm 1, \pm 2, \pm 3, \ldots, \pm 32\}$.
The secondary Booth 64 encoding follows from formula (7); see Table 2, the secondary Booth 64 encoding table.
Table 2: Secondary Booth 64 encoding table
(Table 2 appears as image C20071012208600094 in the original document.)
The table above shows that the results {±0, ±1, ±2, ±3, ..., ±32} produced by the high-radix Booth 64 algorithm can be fully expressed by the linear expression B = 8 × a + b. With this transform, partial products are no longer selected from the set {±0, ±1, ±2, ±3, ..., ±32}, which contains many odd multiples of the multiplicand, but from the set {±0, ±1, ±2, ±3, ±4}, which greatly simplifies the circuit design.
Substituting formula (7) into formula (6) and simplifying gives
$$Z = X \times \sum_{j=0}^{n/6-1} \left( (8a_j + b_j) \times 2^{6j} \right) = 8 \times \left[ \sum_{j=0}^{n/6-1} \left( a_j \times 2^{6j} \times X \right) \right] + \left[ \sum_{j=0}^{n/6-1} \left( b_j \times 2^{6j} \times X \right) \right]. \qquad (8)$$
Formula (8) shows that secondary Booth 64 encoding produces two groups of partial product compression arrays. The number of partial products in each compression array is $n/6$, the number of terms in each sum of formula (8). The partial products generated from $a_j$ must be multiplied by 8 before they are added to the partial products generated from $b_j$. The multiplication by 8 is obtained by a left shift.
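A Python sketch of formulas (7) and (8) follows. Table 2 is not reproduced in this text, so the particular split used here is an assumption: the sign of B is carried separately as a polarity flag (in the spirit of the polarity signal of section 2.2), and |B| is split as 8a + b with a in {0, ..., 4} and b in {-3, ..., 4}. The patent's actual table may choose differently where two splits are possible, for example 12 = 8·1 + 4 = 8·2 - 4.

```python
def booth64_digits(y: int, n: int) -> list[int]:
    """Digits B_j of formula (5), with Y[-1] = 0; n is assumed to be a multiple of 6."""
    def bit(i: int) -> int:
        return 0 if i < 0 else (y >> i) & 1

    weights = (1, 2, 4, 8, 16, -32)
    return [
        sum(w * bit(6 * j + m) for m, w in enumerate(weights)) + bit(6 * j - 1)
        for j in range(n // 6)
    ]


def secondary_encode(B: int) -> tuple[int, int, int]:
    """Return (polarity, a, b) with B = polarity * (8*a + b). Assumed mapping, not Table 2."""
    polarity = -1 if B < 0 else 1
    magnitude = abs(B)
    a = (magnitude + 3) // 8   # 0..4
    b = magnitude - 8 * a      # -3..4
    return polarity, a, b


def multiply_formula8(x: int, y: int, n: int) -> int:
    """Formula (8): accumulate the a-array and b-array separately, then Z = 8*A + B."""
    sum_a = 0
    sum_b = 0
    for j, B in enumerate(booth64_digits(y, n)):
        polarity, a, b = secondary_encode(B)
        sum_a += (polarity * a * x) << (6 * j)   # multiples 0..4 of X only
        sum_b += (polarity * b * x) << (6 * j)
    return (sum_a << 3) + sum_b                  # multiply the a-array by 8 with a shift


print(secondary_encode(27))             # (1, 3, 3)
print(multiply_formula8(100, 91, 12))   # 9100
```

With this split the only multiple of X that is not a plain shift is 3X, which matches the single precomputation adder described for the first pipeline stage.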
A distinguishing feature of the invention is that the secondary-encoding Booth 64 linear transform B = 8a + b re-encodes the results of high-radix Booth 64 encoding, so that partial products are no longer selected from the set {±0, ±1, ±2, ±3, ..., ±32}, which contains many odd multiples of the multiplicand, but from the set {±0, ±1, ±2, ±3, ±4}. The number of odd multiples of the multiplicand that must be precomputed is greatly reduced, which reduces the multiplier area, raises the partial product compression ratio, and increases the speed of the large-number multiplier.
A second feature of the invention is a secondary-encoding Booth 64 encoder structure. The Booth encoding logic comprises 3 parts: high-radix Booth encoding, secondary encoding, and partial product selection logic. Information on whether adjacent bits in high-radix Booth 64 encoding are equal is used to perform the secondary encoding according to the linear transform B = 8a + b, which simplifies the hardware logic of the secondary Booth 64 encoder.
A third feature of the invention is a three-stage pipelined multiplier. This structure matches the three steps required by the secondary-encoding Booth 64 linear transform method (encode first, then compress, then merge), distributes the computation sensibly, and shortens the critical path delay.
A fourth feature of the invention is a digital circuit system that implements the secondary-encoding Booth 64 linear transform method proposed by the invention. The system reduces the number of precomputed odd multiples of the multiplicand and performs large-number multiplication quickly.
Specific embodiments of the invention are described in detail below with reference to the drawings.
2. The structure of the large-number multiplier designed around the secondary-encoding Booth 64 linear transform is as follows:
2.1 Circuit structure
Fig. 3 shows the pipelined implementation of the multiplier based on secondary Booth 64 encoding.
1) In the first stage, a carry-lookahead adder precomputes 3 times the multiplicand. The adder inputs are the multiplicand X and twice the multiplicand, 2X, which is obtained in hardware by a left shift. In parallel with the precomputation, secondary Booth encoding is performed for $a_j$ (weight $8^1$) and $b_j$ (weight $8^0$). The encoder and partial product selector logic are described in section 2.2.
2) In the second stage, two identical partial product selection and compression arrays reduce the partial products of $a_j$ (weight $8^1$) and $b_j$ (weight $8^0$) respectively. Partial products are selected from the set {±0X, ±1X, ±2X, ±3X, ±4X}. Partial product compression uses a Wallace tree structure. The partial product array (PPA) is reduced with 4-2 counters and carry-save adders (CSA) until two partial products (Sum and Carry) remain. The two selection and compression arrays therefore produce 4 partial products in total: Sum_a, Carry_a, Sum_b, and Carry_b.
3) In the third stage, the 4 partial products obtained in the second stage are added. According to formula (7), the 2 partial products Sum_a and Carry_a produced for $a_j$ (weight $8^1$) must first be shifted to produce 8_Sum_a and 8_Carry_a. The resulting 4 partial products 8_Sum_a, 8_Carry_a, Sum_b, and Carry_b are then passed through a 4:2 partial product compression circuit to obtain the final two partial products Sum and Carry.
2.2 Secondary Booth encoder design
As shown in Fig. 4, the Booth encoding logic comprises 3 parts: high-radix Booth encoding, secondary encoding, and partial product selection logic.
1) High-radix Booth encoding
High-radix Booth 64 encoding takes each group of 7 adjacent multiplier bits $Y_6Y_5Y_4Y_3Y_2Y_1Y_0$ as input; its encoding rule is given by formula (5).
Because
$$\begin{aligned} B &= -32Y_6 + 16Y_5 + 8Y_4 + 4Y_3 + 2Y_2 + Y_1 + Y_0 \\ &= -\left[ -32(1 - Y_6) + 16(1 - Y_5) + 8(1 - Y_4) + 4(1 - Y_3) + 2(1 - Y_2) + (1 - Y_1) + (1 - Y_0) \right] \\ &= -\left[ -32\overline{Y_6} + 16\overline{Y_5} + 8\overline{Y_4} + 4\overline{Y_3} + 2\overline{Y_2} + \overline{Y_1} + \overline{Y_0} \right], \end{aligned}$$
where B is the high-radix Booth 64 encoding result of the multiplier bits $Y_6Y_5Y_4Y_3Y_2Y_1Y_0$ and $\overline{Y}$ is the bitwise complement of Y,
it follows that when the encoding result of $Y_6Y_5Y_4Y_3Y_2Y_1Y_0$ is B, inverting every bit of $Y_6Y_5Y_4Y_3Y_2Y_1Y_0$ gives an encoding result of -B. Exploiting this property, encoder logic I uses XOR gates to produce output signals $E_5E_4E_3E_2E_1E_0$ that indicate whether each pair of adjacent bits of $Y_6Y_5Y_4Y_3Y_2Y_1Y_0$ is equal: $E_i$ is 0 when $Y_{i+1}$ equals $Y_i$ and 1 when $Y_{i+1}$ differs from $Y_i$, for i = 0, 1, 2, 3, 4, 5. Encoder logic II derives the output signals sel_B and polarity from $E_5E_4E_3E_2E_1E_0$ and the high-radix Booth encoding rule of formula (5). The signal sel_B is 33 bits wide; exactly one bit of sel_B is high and all other bits are low. The high bit sel_B[j] indicates that the absolute value |B| of the high-radix Booth encoding result B equals j. The signal polarity indicates the sign of the high-radix Booth encoding result B.
2) Secondary encoding
The secondary encoding logic is generated from the mapping in Table 2, the secondary Booth 64 encoding table. Its inputs are the high-radix Booth 64 signals sel_B and polarity. The output signals sel_a and sel_b are each 5 bits wide. In each of sel_a and sel_b exactly one bit is high and all other bits are low. When the bits sel_a[i] and sel_b[j] are high, the secondary encoding results for a and b in the linear transform (7) are i and j respectively.
3) Partial product selection
The partial product selection logic is a multiplexer that generates partial products according to the select signals sel_a and sel_b. When sel_a[i] is high, the multiplexer selects i times the multiplicand for output. For example, when sel_a[3] is high, the multiplexer outputs 3 times the multiplicand, 3X. The multiplexer logic for sel_b is identical to that for sel_a. In the multiplier architecture, $n/6$ secondary Booth 64 encoders (one per Booth 64 digit) generate the two groups of partial product compression arrays.
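The encoder behaviour described in this section can be modelled in Python as below. This is a behavioural sketch only: Fig. 4 is not reproduced, the function names are chosen here, and taking the polarity flag directly from Y6 is an assumption of this sketch rather than something the text states.

```python
def booth64_value(group: int) -> int:
    """B for a 7-bit group; bit i of `group` is Y_i, so bit 6 is Y6."""
    y = [(group >> i) & 1 for i in range(7)]
    return -32 * y[6] + 16 * y[5] + 8 * y[4] + 4 * y[3] + 2 * y[2] + y[1] + y[0]


def encoder_logic_1(group: int) -> int:
    """E5..E0 packed in an int: bit i is Y[i+1] XOR Y[i]."""
    return (group ^ (group >> 1)) & 0x3F


# |B| depends only on E, because inverting all seven bits keeps E and negates B.
MAGNITUDE_FROM_E: dict[int, int] = {}
for _group in range(128):
    _e, _mag = encoder_logic_1(_group), abs(booth64_value(_group))
    assert MAGNITUDE_FROM_E.setdefault(_e, _mag) == _mag


def encoder_logic_2(group: int) -> tuple[int, int]:
    """Return (sel_B, polarity): a 33-bit one-hot |B| and a sign flag (1 means B <= 0)."""
    sel_B = 1 << MAGNITUDE_FROM_E[encoder_logic_1(group)]
    polarity = (group >> 6) & 1   # assumption: Y6 = 1 implies B <= 0
    return sel_B, polarity


def select_partial_product(sel: int, x: int, x3: int) -> int:
    """Multiplexer: a one-hot 5-bit select picks 0, X, 2X, 3X or 4X (3X is precomputed)."""
    multiples = (0, x, x << 1, x3, x << 2)
    return multiples[sel.bit_length() - 1]


print(select_partial_product(0b01000, 100, 300))  # sel[3] high selects 3X: 300
```

The table-building loop doubles as a check of the symmetry derived above: it would raise an assertion error if any two groups with the same E signals had different magnitudes.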
2.3 Partial product compression array design
When the partial products are wide, carry-propagate addition is quite slow, because a long path is needed to propagate carries from the low bits to the high bits. The most important technique for speeding up a multiplier is the carry-save adder (invented by Wallace; also called a full adder or 3-2 counter), which represents the sum of three or more numbers in redundant form without carry propagation.
This method is illustrated in Fig. 5: PP1 + PP2 + PP3 = result2 + result1. The compression delay is that of a single full adder and does not depend on the width of the partial products. Using basic three-input adders arranged recursively, any number of partial products can be added and reduced to a final 2 without a carry-propagate adder. A single carry-propagate adder is needed only at the end, to turn the last 2 partial products into the final result. This general method can be applied in tree or linear arrangements to improve performance.
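The relation PP1 + PP2 + PP3 = result2 + result1 can be illustrated at word level in Python. This is a sketch that relies on Python's arbitrary-precision integers, so sign extension is implicit; the function name is an assumption of this example.

```python
def carry_save_add(pp1: int, pp2: int, pp3: int) -> tuple[int, int]:
    """3-2 compression: return (result2, result1) with pp1 + pp2 + pp3 == result2 + result1."""
    result1 = pp1 ^ pp2 ^ pp3                                # per-bit sum, no carry propagation
    result2 = ((pp1 & pp2) | (pp1 & pp3) | (pp2 & pp3)) << 1  # per-bit carries, moved up one weight
    return result2, result1


carry, partial_sum = carry_save_add(9100, 570, 233)
print(carry + partial_sum)  # 9903
```

Every output bit depends only on the three input bits in the same column, which is why the delay is that of one full adder regardless of the operand width.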
A drawback of the tree structure described by Wallace is its irregular interconnect, which makes layout difficult. A more regular tree structure is based on the binary tree, which is built from a series of 4-2 counters, each adding 4 input numbers to produce 2 results. The delay of summing N partial products is proportional to log N, and such a tree structure is more regular. The internal logic of the 4-2 compressor is shown in Fig. 6; 4-2 compressors are connected into a 4-2 compressor chain as shown in Fig. 7.
With Cin = 0, the input/output relation is I3 + I2 + I1 + I0 = 2Cout + 2C + S.
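The input/output relation of the 4-2 compressor can be checked with the Python sketch below. Fig. 6 is not reproduced in this text, so the construction from two cascaded full adders used here is an assumption (it is a common textbook form), and the function names are illustrative.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry); works bitwise on single bits or on whole words."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(i0: int, i1: int, i2: int, i3: int, cin: int) -> tuple[int, int, int]:
    """One bit slice: i0 + i1 + i2 + i3 + cin == s + 2 * (c + cout)."""
    s1, cout = full_adder(i0, i1, i2)   # cout does not depend on cin
    s, c = full_adder(s1, i3, cin)
    return s, c, cout


def compress_words(w0: int, w1: int, w2: int, w3: int) -> tuple[int, int]:
    """Chain of 4-2 compressors over whole words: Cout of each bit feeds Cin of the next bit."""
    s1, cout = full_adder(w0, w1, w2)
    s, c = full_adder(s1, w3, cout << 1)
    return s, c << 1                    # (Sum, Carry) with w0 + w1 + w2 + w3 == Sum + Carry


sum_word, carry_word = compress_words(9100, 570, 233, 64)
print(sum_word + carry_word)  # 9967
```

Because Cout depends only on the first three inputs, the carry between neighbouring slices never ripples further than one position along the chain.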
The large-number multiplier of the invention is a 570 × 570-bit large-number multiplier. After Booth encoding in the first stage, the partial product selectors in the second pipeline stage generate the partial products. With the secondary-encoding Booth 64 linear transform, 570/6 = 95 partial products are generated. The partial product array (PPA) is shown in Fig. 8.
Because the partial products range over {±0, ±X, ±2X, ±3X, ±4X}, the partial product width grows from 570 to 572 bits. Under the secondary-encoding Booth 64 linear transform, the multiplier is encoded in groups of 7 bits, and adjacent groups overlap by one bit, so the weights of adjacent partial products differ by 6 bit positions. Because partial products may be positive or negative, each partial product has a carry bit (Carry) below its least significant bit. Taking this carry information into account, the partial product array actually has 96 rows, with the carry bit of each partial product merged into the row of the next partial product.
A Wallace tree structure combining CSAs and 4:2 compressors finally reduces the 96 rows of partial products to 2 partial products. Its topology is shown in Fig. 9.
The Wallace tree topology shows that the whole partial product summation network needs 5 levels of 4:2 compressors and one level of CSA. From the delay analysis of the 4:2 compressor and the CSA in the preceding two subsections, the delay of the whole partial product summation network can be estimated as 5 × 3 + 2 = 17 XOR gate delays.
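The level count and delay figure can be reproduced with a small Python sketch. Fig. 9 is not reproduced in this text, so the schedule below is only one possible arrangement that is consistent with the stated totals, not necessarily the topology of the patent; the per-level delays of 3 and 2 XOR gates are read off the expression 5 × 3 + 2 in the text.

```python
XOR_DELAY = {"4:2": 3, "CSA": 2}  # XOR-gate delays per level, as implied by 5*3+2 = 17


def reduction_schedule(rows: int) -> list[tuple[str, int, int]]:
    """Assumed schedule: use a 4:2 level when rows divide by 4, otherwise a CSA level."""
    levels = []
    while rows > 2:
        if rows % 4 == 0:
            kind, remaining = "4:2", rows // 2           # every 4 rows become 2
        else:
            kind, remaining = "CSA", rows - rows // 3    # every full group of 3 rows becomes 2
        levels.append((kind, rows, remaining))
        rows = remaining
    return levels


schedule = reduction_schedule(96)
for kind, before, after in schedule:
    print(f"{kind}: {before} -> {after}")
print(sum(XOR_DELAY[kind] for kind, _, _ in schedule), "XOR gate delays")
```

Under this assumed schedule the row counts go 96, 48, 24, 12, 6, 4, 2, which uses five 4:2 levels and one CSA level.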
The design was coded at the behavioral and RTL levels in Verilog and functionally simulated to verify the correctness of the system function. Logic synthesis (DC) was completed with the SMIC 0.18 µm process library, gate delay information was extracted, and gate-level simulation was performed to ensure functional correctness and timing accuracy. The resulting 570 × 570-bit large-number multiplier has a critical path delay of 5.8 ns and an area of about 29.5 mm².

Claims (1)

1. A large number multiplication device based on quadratic Booth coding, characterized in that it consists of a first pipeline stage, a second pipeline stage and a third pipeline stage, wherein,
The first pipeline stage comprises: an addition precomputation unit, an encoder for $a_j$ with weight $8^1$, and an encoder for $b_j$ with weight $8^0$, wherein,
The addition precomputation unit has as input signals the multiplicand X[n-1:0] and the value 2X generated by shifting the multiplicand X left; the output signal of this addition precomputation unit is 3X,
The encoder for $a_j$ with weight $8^1$ has as input signal the multiplier Y[n-1:0] and performs the following operations to obtain the output signal sel_a:
A quadratic Booth64 coding table is set up, which is a mapping table between the quadratic Booth64 coding result $B_j$ and its corresponding signal a and signal b, j=1,2,3 ... 7,8 ..., 32,
The quadratic Booth64 coding result $B_j$ is calculated as follows:
$B_j = -32Y_{6r+5} + 16Y_{6r+4} + 8Y_{6r+3} + 4Y_{6r+2} + 2Y_{6r+1} + Y_{6r} + Y_{6r-1}$
where r=0,1,2 ..., n/6; the quadratic Booth64 coding table is looked up to obtain the 5-bit-wide output signal sel_a corresponding to the coding result $B_j$; exactly one bit of the output signal sel_a is high and all other bits are low; this high bit is denoted sel_a[i], where i is the quadratic Booth64 coding result for a and denotes the i-th term,
The encoder for $b_j$ with weight $8^0$ has as input signal the multiplier Y[n-1:0] and uses the quadratic Booth64 coding table to obtain, from the signal $B_j$, the corresponding 5-bit-wide output signal sel_b; exactly one bit of sel_b is high and all other bits are low; this high bit is denoted sel_b[j], where j is the quadratic Booth coding result for b and denotes the j-th term,
The second pipeline stage comprises: a first partial product selection multiplexer, a first partial product array n/6:2 compressor, a second partial product selection multiplexer, and a second partial product array n/6:2 compressor, wherein,
The first partial product selection multiplexer has as input signals the multiplicand X, the partial product 3X produced by the first pipeline stage, and the control signal sel_a; when sel_a[i] is high, i times the multiplicand is selected as the output,
The second partial product selection multiplexer has as input signals the multiplicand X, the partial product 3X produced by the first pipeline stage, and the control signal sel_b; when sel_b[j] is high, j times the multiplicand is selected as the output,
The first partial product array is compressed with a Wallace tree structure; the partial product array is reduced using 4-2 counters and carry-save adders until two partial products, carry_a and sum_a, are obtained as outputs,
The second partial product array is compressed with a Wallace tree structure; the partial product array is reduced using 4-2 counters and carry-save adders until two partial products, carry_b and sum_b, are obtained as outputs,
The third pipeline stage is a partial product array 4:2 compressor whose input signals are carry_a, sum_a, carry_b and sum_b produced by the second pipeline stage; for the signal a and the signal b, according to the quadratic Booth64 coding transform formula B=8a+b, 8_carry_a and 8_sum_a are first produced by shifting, and then 8_carry_a, 8_sum_a, carry_b and sum_b are passed through a 4:2 partial product compressor circuit to obtain the final two partial products Sum and Carry.
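The three pipeline stages recited in the claim can be checked against ordinary integer multiplication with a small behavioral model. The following is a hedged sketch rather than the claimed circuit: the split of each digit B into (a, b) with B=8a+b is one possible mapping chosen for illustration, since the coding table itself is not reproduced here. The rows are reduced by a simple chain of CSAs instead of a Wallace tree, and all function names are hypothetical.

```python
def booth64_digits(y: int, n: int) -> list[int]:
    """Radix-64 Booth digits of an n-bit unsigned multiplier (out-of-range bits read as 0)."""
    def bit(k: int) -> int:
        return (y >> k) & 1 if 0 <= k < n else 0
    return [-32 * bit(6 * r + 5) + 16 * bit(6 * r + 4) + 8 * bit(6 * r + 3)
            + 4 * bit(6 * r + 2) + 2 * bit(6 * r + 1) + bit(6 * r) + bit(6 * r - 1)
            for r in range(n // 6 + 1)]


def split_digit(B: int) -> tuple[int, int]:
    """Assumed secondary coding: B = 8*a + b with a in [-4, 4] and b in [-4, 3]."""
    a = (B + 4) // 8
    return a, B - 8 * a


def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """Carry-save adder: a + b + c == s + cy (also valid for negative Python ints)."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def reduce_to_two(rows: list[int]) -> tuple[int, int]:
    """Reduce rows to a (sum, carry) pair with a chain of CSAs (not a Wallace tree)."""
    rows = list(rows) + [0, 0]  # padding guarantees at least two rows
    while len(rows) > 2:
        s, cy = csa(rows.pop(), rows.pop(), rows.pop())
        rows += [s, cy]
    return rows[0], rows[1]


def multiply(x: int, y: int, n: int) -> tuple[int, int]:
    """Return (Sum, Carry) such that Sum + Carry == x * y."""
    rows_a, rows_b = [], []
    for r, B in enumerate(booth64_digits(y, n)):      # stage 1: encoding
        a, b = split_digit(B)
        # Stage 2 selection: hardware would pick 0, X, 2X, 3X or 4X with a sign;
        # the model simply multiplies and applies the 6-bit weight per digit.
        rows_a.append((a * x) << (6 * r))
        rows_b.append((b * x) << (6 * r))
    sum_a, carry_a = reduce_to_two(rows_a)            # stage 2: compression
    sum_b, carry_b = reduce_to_two(rows_b)
    # Stage 3: shift the a-path by 3 bits (factor 8), then a 4:2 compression.
    s1, c1 = csa(carry_a << 3, sum_a << 3, carry_b)
    return csa(s1, c1, sum_b)
```

Adding the two returned values with a final adder gives the product, for example `sum(multiply(3, 5, 6))` evaluates to 15.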
CNB2007101220865A 2007-09-21 2007-09-21 Large number multiplication device based on quadratic B ooth coding Expired - Fee Related CN100552620C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007101220865A CN100552620C (en) 2007-09-21 2007-09-21 Large number multiplication device based on quadratic B ooth coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007101220865A CN100552620C (en) 2007-09-21 2007-09-21 Large number multiplication device based on quadratic B ooth coding

Publications (2)

Publication Number Publication Date
CN101122850A CN101122850A (en) 2008-02-13
CN100552620C true CN100552620C (en) 2009-10-21

Family

ID=39085195

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007101220865A Expired - Fee Related CN100552620C (en) 2007-09-21 2007-09-21 Large number multiplication device based on quadratic B ooth coding

Country Status (1)

Country Link
CN (1) CN100552620C (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI489375B (en) * 2010-12-03 2015-06-21 Via Tech Inc Carryless multiplication apparatus and method
CN102184086B (en) * 2011-05-11 2012-11-07 电子科技大学 Booth encoder and multiplier
CN102270110B (en) * 2011-06-30 2013-06-12 西安电子科技大学 Improved 16Booth-based coder
CN103684763A (en) * 2012-09-19 2014-03-26 北京握奇数据系统有限公司 Data encryption method based on RSA algorithm, device and smart card
CN102999311A (en) * 2012-12-10 2013-03-27 张友能 48*30 bit multiplier based on Booth algorithm
CN103645883A (en) * 2013-12-18 2014-03-19 四川卫士通信息安全平台技术有限公司 FPGA (field programmable gate array) based high-radix modular multiplier
CN106775577B (en) * 2017-01-03 2019-05-14 南京航空航天大学 A kind of design method of the non-precision redundant manipulators multiplier of high-performance
CN110196709B (en) * 2019-06-04 2021-06-08 浙江大学 Nonvolatile 8-bit Booth multiplier based on RRAM
CN110428247A (en) * 2019-07-02 2019-11-08 常州市常河电子技术开发有限公司 The variable weight value Fast implementation of multiplication and divisions is counted in asymmetric encryption calculating greatly
CN110413254B (en) * 2019-09-24 2020-01-10 上海寒武纪信息科技有限公司 Data processor, method, chip and electronic equipment
CN111025133B (en) * 2019-10-24 2022-02-22 北京时代民芯科技有限公司 Test method of second-order Booth coding Wallace tree multiplier circuit
CN110955403B (en) * 2019-11-29 2023-04-07 电子科技大学 Approximate base-8 Booth encoder and approximate binary multiplier of mixed Booth encoding
CN111488133B (en) * 2020-04-15 2023-03-28 电子科技大学 High-radix approximate Booth coding method and mixed-radix Booth coding approximate multiplier
CN112068800B (en) * 2020-08-10 2022-10-25 北京草木芯科技有限公司 Array compressor and large number multiplier with same
CN112988112B (en) * 2021-04-27 2021-08-10 北京壁仞科技开发有限公司 Dot product calculating device
CN116991359B (en) * 2023-09-26 2023-12-22 上海为旌科技有限公司 Booth multiplier, hybrid Booth multiplier and operation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
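A 32×32-bit multiplier based on modified Booth encoding. Cui Xiaoping. Electronic Measurement Technology, Vol. 30, No. 1. 2007 *
Design of a low-power, high-speed parallel multiplier with an improved Booth Wallace tree. Wang Ding et al. Chinese Journal of Electron Devices, Vol. 30, No. 1. 2007 *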

Also Published As

Publication number Publication date
CN101122850A (en) 2008-02-13

Similar Documents

Publication Publication Date Title
CN100552620C (en) Large number multiplication device based on quadratic B ooth coding
Mathews Theory of numbers
CN103176767B (en) The implementation method of the floating number multiply-accumulate unit that a kind of low-power consumption height is handled up
CN101521504B (en) Implementation method for reversible logic unit used for low power consumption encryption system
CN201153259Y (en) Parallel data cyclic redundancy check apparatus and bidirectional data transmission system
CN103092560A (en) Low-power consumption multiplying unit based on Bypass technology
CN106775577A (en) A kind of high-performance non-precision redundant manipulators multiplier and its method for designing
Hiasat A reverse converter and sign detectors for an extended RNS five-moduli set
CN101262345A (en) Time point system for ellipse curve password system
CN1326397C (en) DCT rapid changing structure
CN103412737B (en) Realize the gate circuit of base 4-Booth coded method and streamline large number multiplication device based on the method
CN107992283A (en) A kind of method and apparatus that finite field multiplier is realized based on dimensionality reduction
Surendran et al. Implementation of fast multiplier using modified Radix-4 booth algorithm with redundant binary adder for low energy applications
CN100527073C (en) High efficiency modular multiplication method and device
Hiasat Sign detector for the extended four‐moduli set
CN106682732A (en) Gaussian error function circuit applied to neural networks
CN101482808B (en) 7:2 compressor used for large number multiplier
Basha et al. Design and Implementation of Radix-4 Based High Speed Multiplier for ALU's Using Minimal Partial Products
Cui et al. A parallel decimal multiplier using hybrid binary coded decimal (BCD) codes
CN103955585A (en) FIR (finite impulse response) filter structure for low-power fault-tolerant circuit
US20020103840A1 (en) Apparatus and method for digital multiplication using redundant binary arithmetic
Timarchi et al. A novel high-speed low-power binary signed-digit adder
Méloni et al. Efficient double bases for scalar multiplication
CN100382012C (en) Montgomery's modular multiply method of expansion operation number length
Mahapatra et al. RSA cryptosystem with modified Montgomery modular multiplier

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091021

Termination date: 20100921