CN102122241A

CN102122241A - Modular multiplier/divider applicable to prime fields and polynomial fields

Info

Publication number: CN102122241A
Application number: CN2010100226476A
Authority: CN
Inventors: 韩军; 黄伟; 曹丹; 曾晓洋
Original assignee: Fudan University
Current assignee: Fudan University
Priority date: 2010-01-08
Filing date: 2010-01-08
Publication date: 2011-07-13

Abstract

The invention relates to a dual-field modular multiplier/divider suitable for the ECC (elliptic curve cryptography) algorithm required in high-speed network applications and portable mobile devices. The modular multiplier/divider comprises four processing elements (PEs), five register files (Regfile), a Booth encoding unit, an input register (Load file), a control module (control) and 17 multiplexers. By means of the 17 multiplexers it changes the interconnection of the four PEs and the positions from which data are read, so as to perform either modular multiplication or modular division. The design is scalable and supports modular multiplication/division of operands up to 480 bits, and multiplication and division share the same hardware units, which reduces hardware area. In the algorithms, addition, subtraction and shifting of long operands are carried out word by word, which greatly accelerates the convergence of the algorithm and multiplies the operating speed.

Description

Modular multiplier/divider applicable to prime fields and polynomial fields

Technical field

The invention belongs to the field of integrated circuit (IC) design, and specifically relates to a dual-field modular multiplier/divider for the elliptic curve cryptography (ECC) algorithm required in high-speed network applications and portable mobile devices.

Background art

As informatization deepens, more and more information is exposed in public media. To protect sensitive information, various cryptographic algorithms are used in wireless communication. However, the relatively limited processing power of communication devices, especially portable devices, cannot keep up with the ever-increasing volume of data.

Cryptosystems are generally divided into two types: symmetric-key cryptosystems and public-key cryptosystems. The great advantage of symmetric-key cryptosystems is their high efficiency, but their drawbacks are also obvious. The first is key distribution: the channel used to distribute keys must be both confidential and authentic. The second is key management: in a network of N entities, each entity must hold the keys of the other N-1 entities. Public-key cryptosystems do not have these drawbacks: they only require the key exchange to be authentic, not confidential, and they provide both confidentiality and non-repudiation.

Elliptic curve cryptography (ECC) has become one of the most favored public-key cryptosystems besides RSA, and it provides the same functions as the RSA cryptosystem. Its security rests on the difficulty of the elliptic curve discrete logarithm problem (ECDLP). It is generally believed that 160-bit ECC provides a security level equivalent to 1024-bit RSA. Because the keys are short, encryption and decryption are fast in practice, and power, bandwidth and storage are saved. The main operations of the ECC algorithm are modular multiplication and modular inversion, whose performance determines the performance of the whole ECC chip. In addition, the ECC algorithm exhibits clear temporal parallelism. A high-performance, multi-functional modular multiplier/inverter that fully exploits this parallelism can achieve a high encryption rate at small hardware cost, meeting the needs of current high-speed network applications; because its hardware cost and power consumption are low, it can also meet the needs of portable mobile devices. Moreover, as computers develop, computing speed keeps increasing rapidly, so data lengths must be increased to strengthen cryptographic algorithms. This design has a scalable hardware structure, so the data width can easily be extended.

Summary of the invention

The objective of the invention is to propose a dual-field modular multiplier/divider for the elliptic curve cryptography (ECC) algorithm required in high-speed network applications and portable mobile devices, which is scalable and at the same time significantly reduces hardware cost.

The technical solution of the invention is a modular multiplier/divider applicable to prime fields and polynomial fields (as shown in Fig. 1), consisting of 17 multiplexers (MUX, 1-17), 4 processing elements PE (26-29), an input register (23), 5 register files (18-22), a Booth encoding unit (24) and a control module (25); wherein:

A. When the second multiplexer (2) and the eighth multiplexer (8) select state 1 and the 13 multiplexers (3-7, 10-17) select state 0, the first multiplexer (1), the ninth multiplexer (9), the 5 register files (18-22), the input register (23), the Booth encoding unit (24), the control module (25) and the 4 PEs (26-29) form a modular divider (see Fig. 5); modular division uses the Euclidean algorithm; wherein:

The input of the first multiplexer (1) is the output of PE0 (26); selected by the stage signal, it outputs to the first register file (18) or the second register file (19);

The input of the ninth multiplexer (9) is the output of PE2 (28); selected by the stage signal, it outputs to the third register file (20) or the fourth register file (21);

The outputs of the first register file (18) and the second register file (19) both go to the input register (23);

The outputs of the third register file (20) and the fourth register file (21) both go to PE3 (29);

The fifth register file (22) outputs to the Booth encoder (24);

The input register (23) outputs to PE0 (26);

The Booth encoding unit (24) outputs to PE2 (28);

The control module (25) outputs to PE0 and PE1 (26, 27);

The output of PE0 (26) is written into the first register file (18) or the second register file (19) as selected by the stage signal;

PE1 (27) outputs to PE2 (28);

The output of PE2 (28) is written into the third register file (20) or the fourth register file (21) as selected by the stage signal;

PE3 (29) outputs to PE1 (27);

B. When the first multiplexer (1) and the ninth multiplexer (9) are inactive and the 13 multiplexers (3-7, 10-17) select state 1, the second multiplexer (2), the eighth multiplexer (8), the 5 register files (18-22), the input register (23), the Booth encoding unit (24), the control module (25) and the 4 PEs (26-29) form a modular multiplier (see Fig. 6); modular multiplication uses the Montgomery algorithm; wherein:

The inputs of the second multiplexer (2) are the data stored in the first register file (18) and the fourth register file (21); with First_round as the select signal, it outputs the correct data to the input register (23);

The inputs of the eighth multiplexer (8) are the data stored in the second register file (19) and the fifth register file (22); with First_round as the select signal, it outputs the correct data to the input register (23);

The third register file (20) outputs to the Booth encoder (24);

Both outputs of the input register (23) go to PE0 (26);

The outputs of the Booth encoding unit (24) provide inputs to each of the 4 PEs (26-29);

PE0 (26) outputs to PE1 (27);

PE1 (27) outputs to PE2 (28);

PE2 (28) outputs to PE3 (29);

The outputs of PE3 (29) go to the fifth register file (22) and the fourth register file (21), respectively.

The PE described above (as shown in Fig. 2) consists of 6 multiplexers (30-35), 2 registers (36, 37), 2 inverting controllers (38, 39), 3 carry-save adders (40, 41, 42), a PE internal control module (43) and a PE internal shifter (44); wherein:

The inputs of the eighteenth multiplexer (30) are mul_h, the multiple of the multiplicand to be accumulated during modular multiplication, and div_h, the multiple of the D register value to be accumulated during modular division; with the operating function Func_sel as the select signal, it outputs the select signal for the required multiple of the W register value to the select input of the nineteenth multiplexer (31);

The inputs of the nineteenth multiplexer (31) are 0, the value of the W register and the value 2*W; with the output of the eighteenth multiplexer (30) as the select signal, it passes the correct operand value to the first inverting controller (38);

The inputs of the twentieth multiplexer (32) are double1 and double2, the multiple-of-P signals used in word operations; with Func_sel as the select signal, it outputs the correct multiple select signal to the twenty-third multiplexer (35);

The inputs of the twenty-first multiplexer (33) are zero1 and zero2, the multiple-of-P signals used in word operations; with Func_sel as the select signal, it outputs the correct multiple select signal to the twenty-third multiplexer (35);

The inputs of the twenty-second multiplexer (34) are neg1 and neg2, the multiple-of-P signals used in word operations; with Func_sel as the select signal, it outputs the correct multiple select signal to the second inverting controller (39);

The inputs of the twenty-third multiplexer (35) are 0, the modulus P and 2*P; with the outputs of the twentieth and twenty-first multiplexers (32, 33) as select signals, it passes the correct operand value to the second inverting controller (39);

The seventh register (36) holds one word of the modulus P during computation;

The eighth register (37) holds one word of W during computation;

The input of the first inverting controller (38) is the output of the nineteenth multiplexer (31), and its control signal is the output of the eighteenth multiplexer (30); its output is the input value, negated when the control signal requires it;

The input of the second inverting controller (39) is the output of the twenty-third multiplexer (35), and its control signal is the output of the twenty-second multiplexer (34); its output is the input value, negated when the control signal requires it;

The inputs of the first carry-save adder (40) are the sum and carry (Us, Uc) of the current word of the U register, which holds one operand, the field-select signal Field (prime field or polynomial field), the first-word flag First_word, and the output of the first inverting controller (38); it outputs the sum Us_1 and carry Uc_1;

The inputs of the second carry-save adder (41) are the outputs Us_1 and Uc_1 of the first carry-save adder (40), the field-select signal Field, the first-word flag First_word, and the output of the second inverting controller (39); it outputs the sum Us_2 and carry Uc_2;

The inputs of the third carry-save adder (42) are the outputs Us_3 and Uc_3 of the PE internal shifter (44) and a carry input; it outputs the sum Us_out and carry Uc_out;

The inputs of the PE internal control module (43) are the low two bits of the outputs Us_1 and Uc_1 of the first carry-save adder (40), the low two bits of the modulus P, and the right-shift amount shift; its output is the control signal that determines the multiple of P to be added next;

The inputs of the PE internal shifter (44) are the outputs Us_2 and Uc_2 of the second carry-save adder (41) and the right-shift amount shift; its outputs are the right-shifted values Us_3 and Uc_3 and the carry shifted out.
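The two chained carry-save adders and the field-select signal Field can be illustrated with a small behavioural model. This is a hypothetical sketch, not the patent's circuit: `csa` and `pe_accumulate` are illustrative names, operands are unbounded Python integers rather than fixed-width words, and treating Field as "force the carry to zero" in the polynomial field is an assumption based on the fact that addition in GF(2)[x] is XOR, so no carries propagate.

```python
def csa(a: int, b: int, c: int, prime_field: bool = True) -> tuple[int, int]:
    """3:2 carry-save adder: reduce three addends to a (sum, carry) pair.

    Prime field: sum + carry == a + b + c, computed without carry propagation.
    Polynomial (binary) field: coefficients are in GF(2), so the carry is 0
    and the sum is simply the XOR of the three inputs.
    """
    s = a ^ b ^ c                                   # bitwise sum, carries ignored
    if not prime_field:
        return s, 0
    carry = ((a & b) | (a & c) | (b & c)) << 1      # majority bits, weight 2
    return s, carry


def pe_accumulate(us: int, uc: int, x: int, y: int,
                  prime_field: bool = True) -> tuple[int, int]:
    """Two chained CSAs as in the PE: (Us, Uc) + x -> (Us_1, Uc_1), then + y -> (Us_2, Uc_2).

    Here x stands for the (possibly negated) multiple of W and y for the
    (possibly negated) multiple of P.
    """
    us1, uc1 = csa(us, uc, x, prime_field)
    return csa(us1, uc1, y, prime_field)


us2, uc2 = pe_accumulate(13, 6, 9, 27)
print(us2 + uc2)  # 55, i.e. 13 + 6 + 9 + 27
```

Keeping the result as a (sum, carry) pair is what lets each PE avoid a long carry chain per clock; the pair is only resolved into a single value when needed.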

The dual-field modular multiplier/divider proposed by the invention, intended for the ECC (elliptic curve cryptography) algorithm in high-speed network applications and portable mobile devices, supports modular division of up to 480 bits. Synthesized in a SMIC 0.18 µm CMOS process, it has a critical path delay of 4.71 ns and a maximum frequency of 212.3 MHz. A 256-bit modular division takes 13 µs, with an area of 37k equivalent gates; a 256-bit modular multiplication takes 1.4 µs, with an area of 11.4k equivalent gates.
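The reported figures are consistent with each other: the maximum frequency is the reciprocal of the critical-path delay. The short calculation below also converts the latencies into approximate clock-cycle counts, under the assumption (not stated in the text) that both latencies were measured at the 212.3 MHz maximum frequency.

```python
critical_path_ns = 4.71
f_max_mhz = 1e3 / critical_path_ns          # 1 / 4.71 ns, expressed in MHz

# Assumption: the 13 us and 1.4 us latencies are measured at f_max.
div_cycles = 13e-6 * f_max_mhz * 1e6        # 256-bit modular division
mul_cycles = 1.4e-6 * f_max_mhz * 1e6       # 256-bit modular multiplication

print(f"f_max = {f_max_mhz:.1f} MHz")                                   # f_max = 212.3 MHz
print(f"division: ~{div_cycles:.0f} cycles")                            # division: ~2760 cycles
print(f"multiplication: ~{mul_cycles:.0f} cycles")                      # multiplication: ~297 cycles
print(f"division/multiplication ratio: {div_cycles / mul_cycles:.1f}")  # 9.3
```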

ECC is a public-key algorithm. In ECDH and ECDSA the main computation is point multiplication, which consists of point doubling and point addition; these in turn consist mainly of modular multiplication and modular division over a finite field. Implementing finite-field modular multiplication and division in hardware, while the upper layers implement the different public-key algorithms in software, is a good trade-off between performance and flexibility. Finite-field modular division is now commonly handled with the Euclidean algorithm (the specific algorithm is shown in Fig. 3), and finite-field modular multiplication mainly with the Montgomery modular multiplication algorithm (the specific algorithm is shown in Fig. 4).
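To show why modular division matters for point multiplication, the sketch below implements double-and-add over a toy prime-field curve in affine coordinates, where every non-trivial point addition or doubling needs one modular division to compute the slope. The text only states that point multiplication is built from doubling and addition; the affine formulas, the small textbook curve y^2 = x^3 + 2x + 2 over GF(17) with base point (5, 1), and all names are illustrative assumptions. Python's `pow(den, -1, p)` (Python 3.8+) stands in for the hardware divider.

```python
# Toy curve y^2 = x^3 + A*x + B over GF(P_MOD); None is the point at infinity O.
P_MOD, A, B = 17, 2, 2
G = (5, 1)


def field_div(num: int, den: int) -> int:
    """num / den mod P_MOD (a stand-in for the hardware modular divider)."""
    return num * pow(den % P_MOD, -1, P_MOD) % P_MOD


def point_add(p1, p2):
    """Affine point addition/doubling; each non-trivial call needs one modular division."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # P + (-P) = O
    if p1 == p2:
        lam = field_div(3 * x1 * x1 + A, 2 * y1)      # tangent slope: point doubling
    else:
        lam = field_div(y2 - y1, x2 - x1)             # chord slope: point addition
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mul(k: int, point):
    """Left-to-right double-and-add point multiplication k*point."""
    result = None
    for bit in bin(k)[2:]:
        result = point_add(result, result)            # point doubling
        if bit == "1":
            result = point_add(result, point)         # point addition
    return result


print(scalar_mul(2, G), scalar_mul(3, G))  # (6, 3) (10, 6)
```

Projective coordinates avoid the per-step division at the cost of extra multiplications; the affine form is used here only because it makes the role of division explicit.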

The first advantage of the invention is that it can handle modular multiplication and modular division over both prime fields and binary fields in ECC. The main arithmetic part consists of PEs of identical structure, and different algorithms can be accelerated by reconfiguring the connections between the PEs. At the same time, hardware is reused as much as possible, which reduces hardware area while meeting performance requirements.

The second advantage of the invention is that it proposes a PE dedicated to modular multiplication and division in ECC. By analyzing the computation processes over prime and binary fields, the fundamental operation U = (U + mul_h*W + kP) >> shift was extracted. Every PE can perform this fundamental operation, so modular multiplication and division can be accelerated by distributing the work sensibly among the PEs.
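A behavioural sketch of the fundamental operation U = (U + mul_h*W + kP) >> shift follows, for both fields. The name `pe_fundamental_op` is hypothetical, and the rule for k is the standard Montgomery-style choice k = -t*P^(-1) mod 2^shift, which makes the low `shift` bits zero so the right shift is exact; mapping k = 3 to k = -1 is an assumption chosen to fit the 0/P/2P multiplexer plus inverting controller described above. The binary-field branch is deliberately limited to shift = 1 and mul_h in {0, 1}.

```python
def pe_fundamental_op(u: int, w: int, p: int, mul_h: int, shift: int,
                      prime_field: bool = True) -> int:
    """One PE step: U = (U + mul_h*W + k*P) >> shift, with k chosen so no set bits are shifted out."""
    if prime_field:
        t = u + mul_h * w
        m = 1 << shift
        k = (-t * pow(p, -1, m)) % m      # t + k*P == 0 (mod 2**shift); needs odd p
        if shift == 2 and k == 3:
            k = -1                        # assumption: keep k in {-1, 0, 1, 2}
        return (t + k * p) >> shift       # exact: the low `shift` bits are zero
    # Polynomial field GF(2^m): addition is XOR. Sketch limited to shift == 1, mul_h in {0, 1}.
    if shift != 1 or mul_h not in (0, 1):
        raise ValueError("binary-field sketch supports only shift == 1 and mul_h in {0, 1}")
    t = u ^ (w if mul_h else 0)
    k = t & 1                             # P has constant term 1, so adding P clears bit 0
    return (t ^ (p if k else 0)) >> 1


print(pe_fundamental_op(7, 5, 13, -2, 2))                     # -4
print(pe_fundamental_op(0b101, 0b011, 0b1011, 0, 1, False))   # 7
```

In both fields the result satisfies U_new * 2^shift = U + mul_h*W (mod P), which is the property that lets the same step serve Montgomery multiplication and the shift-based division updates.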

The third advantage of the invention is that the main arithmetic part of the modular multiplier/divider consists of several PEs of identical structure. The structure is configurable (Fig. 1 shows four PEs), so PEs can be added or removed according to hardware or power requirements to meet new requirements.

Description of drawings

Fig. 1 is the top-level structure diagram of the dual-field modular multiplier/divider of the invention;

Fig. 2 is the structure diagram of a PE of the dual-field modular multiplier/divider of the invention;

Fig. 3 is the improved Euclidean modular division algorithm of the invention;

Fig. 4 is the improved Montgomery modular multiplication algorithm of the invention;

Fig. 5 is the equivalent hardware structure diagram of the dual-field modular multiplier/divider of the invention when performing modular division;

Fig. 6 is the equivalent hardware structure diagram of the dual-field modular multiplier/divider of the invention when performing modular multiplication;

Fig. 7a is the algorithm description of the data path of the invention when performing modular division;

Fig. 7b is the equivalent schematic of Fig. 5;

Fig. 8a is the algorithm description of the data path of the invention when performing modular multiplication;

Fig. 8b is the equivalent schematic of Fig. 6.

Reference numerals in the figures: 1 to 17 are the first to seventeenth multiplexers, respectively; 18 to 22 are the first to fifth register files, respectively; 23 is the input register; 24 is the Booth encoding unit; 25 is the control module; 26, 27, 28 and 29 are the PE0, PE1, PE2 and PE3 processing elements; 30 to 35 are the eighteenth to twenty-third multiplexers, respectively; 36 is the seventh register; 37 is the eighth register; 38 is the first inverting controller; 39 is the second inverting controller; 40, 41 and 42 are the first, second and third carry-save adders; 43 is the PE internal control module; and 44 is the PE internal shifter.

Embodiment

The invention is further described below with reference to the drawings.

Fig. 1 is the top-level structure diagram of the dual-field modular multiplier/divider of the invention. The main arithmetic units are the four PEs PE0 (26), PE1 (27), PE2 (28) and PE3 (29), which have identical structure (see Fig. 2) and accelerate the fundamental operations of modular multiplication and division. The unit performs modular multiplication and division over both prime and binary fields. During modular multiplication (division), the multiplicand (dividend), multiplier (divisor), product (quotient) and modulus are all stored in the five register files (18-22). The Booth encoding unit supplies the mul_h values used in modular multiplication, and the control unit supplies the div_h values used in modular division. The 17 multiplexers (1-17) change the connections between the four PEs and the positions from which data are read, so as to perform modular division or modular multiplication; the whole ECC modular multiplier/divider can thus execute different algorithms by reconfiguring the data paths between units.

1. Modular division mode

In modular division mode the circuit consists of the first and ninth multiplexers (1, 9), the five register files Regfile (18-22), the input register (23), the Booth encoding unit (24), the control module (25) and the four PEs (26-29). The data path is shown in Fig. 5, the modular division algorithm performed is shown in Fig. 7a, and the top-level schematic for modular division is shown in Fig. 7b, wherein:

The input of the first multiplexer (1) is the result computed by PE0; selected by the stage signal, this result is written into Regfile1 or Regfile2.

The input of the ninth multiplexer (9) is the result computed by PE2; selected by the stage signal, this result is written into Regfile3 or Regfile4.

When the stage signal is 0, the first register file (18) stores the computed sum and carry (Cc, Cs) of the C register, which is initialized with the divisor; when the stage signal is 1, it stores the computed sum and carry (Dc, Ds) of the D register, which is initialized with the modulus P. Its input is the result computed by PE0, and it outputs to the input register (23).

When the stage signal is 0, the second register file (19) stores the computed sum and carry (Dc, Ds) of the D register, which is initialized with the modulus P; when the stage signal is 1, it stores the computed sum and carry (Cc, Cs) of the C register, which is initialized with the divisor. Its input is the result computed by PE0, and it outputs to the input register (23).

When the stage signal is 0, the third register file (20) stores the computed sum and carry (Uc, Us) of the U register, which is initialized with the dividend; when the stage signal is 1, it stores the computed sum and carry (Wc, Ws) of the W register, which is initialized with 0. Its input is the result computed by PE2, and it outputs to PE3.

When the stage signal is 0, the fourth register file (21) stores the computed sum and carry (Wc, Ws) of the W register, which is initialized with 0; when the stage signal is 1, it stores the computed sum and carry (Uc, Us) of the U register, which is initialized with the dividend. Its input is the result computed by PE2, and it outputs to PE3.

Whether the stage signal is 0 or 1, the fifth register file (22) stores the modulus value (0, P); it outputs to the Booth encoder (24).

The inputs of the input register (23) are the outputs of the first and second register files (18, 19); it outputs to PE0.

When Func_sel selects modular multiplication, the Booth encoding unit (24) acts as a Booth encoder for the multiplier; during modular division it is used only as a register. Its input is the output of the fifth register file (22), and it outputs to PE2.

The control module (25) computes the parameter div_h required by PE0 and PE1 and outputs it to both.

PE0 (26) performs C = (C + div_h*D) >> shift. Its inputs are the values (C, D) output by the input register (23) and div_h output by the control module (25); its output is the carry-save adder (CSA) result (Cc, Cs), which is written into the first register file (18) or the second register file (19) as selected by the stage signal.

PE1 (27) performs U = U + div_h*W. Its inputs are the values (U, W) from PE3 (29) and div_h output by the control module (25); its output is the CSA result (Uc, Us), which goes to PE2 (28).

PE2 (28) performs U = (U + kP) >> shift. Its inputs are the output of PE1 (27) and the modulus P supplied by the Booth encoding unit; its output is the CSA result (Uc, Us), which is written into the third register file (20) or the fourth register file (21) as selected by the stage signal.

PE3 (29) only reads the data stored in the third register file (20) or the fourth register file (21) and outputs them to PE1 (27).

During modular division, after multiplexer selection the four PEs are connected as shown in Fig. 5: PE0 (26) performs C = (C + div_h*D) >> shift; PE1 and PE2 together update register U, with PE1 performing U = U + div_h*W and PE2 (28) performing U = (U + kP) >> shift; and PE3 serves as data storage.

The basic idea of modular division (see Fig. 3) is as follows: given X, Y and P, compute Z = X/Y mod P. Two congruences are maintained: CX ≡ UY (mod P) and DX ≡ WY (mod P). At initialization C = Y, U = X, D = P and W = 0, so both congruences hold trivially. The extended Euclidean algorithm then reduces gcd(C, D) to gcd(0, 1) or gcd(0, -1), while W and U undergo the corresponding linear transformations over the prime field. At the end D = ±1, so the second congruence reads ±X ≡ WY (mod P), giving W = X/Y mod P or W = -X/Y mod P.
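Fig. 3 itself is not reproduced in this text, so the sketch below does not implement the patent's improved algorithm (which uses div_h multiples, signed values and word-serial carry-save arithmetic). It is a plain binary extended-Euclidean division that maintains the same two congruences, CX ≡ UY and DX ≡ WY (mod P), from the same initial values. The names are illustrative, U and W are kept reduced mod P, and because C and D stay non-negative the loop ends with D = 1 rather than ±1.

```python
def halve_mod(v: int, p: int) -> int:
    """Return v / 2 mod p for odd p and 0 <= v < p."""
    return v // 2 if v % 2 == 0 else (v + p) // 2


def mod_div(x: int, y: int, p: int) -> int:
    """Return x / y mod p for an odd prime p using binary extended-Euclid steps.

    Invariants kept throughout: c*x == u*y and d*x == w*y (mod p).
    """
    c, u = y % p, x % p          # C = Y, U = X
    d, w = p, 0                  # D = P, W = 0
    if c == 0:
        raise ZeroDivisionError("y must be nonzero mod p")
    while c != 0:
        while c % 2 == 0:        # strip factors of two from C, halving U mod p
            c //= 2
            u = halve_mod(u, p)
        while d % 2 == 0:        # likewise for D and W
            d //= 2
            w = halve_mod(w, p)
        if c >= d:               # both odd: subtract the smaller from the larger
            c, u = c - d, (u - w) % p
        else:
            d, w = d - c, (w - u) % p
    # Now c == 0 and d == gcd(y, p) == 1, so x == w*y (mod p).
    return w


print(mod_div(3, 5, 17))  # 4, since 4 * 5 = 20 = 3 (mod 17)
```

Halving and subtraction are the same kinds of operations as the shift and add-multiple steps listed below, which is why a single shift-and-accumulate PE can serve division.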

The fundamental operations used in the modular division algorithm are C = (C + div_h*D) >> shift, U = U + div_h*W and U = (U + kP) >> shift. They share a similar structure, so a basic arithmetic unit capable of X = (X + kY) >> shift is designed, which meets the computational requirements of modular division. To obtain maximum parallelism, the four PEs (26-29) perform the corresponding operations, the control module (25) supplies the parameter div_h, and the five register files (18-22) store the data required by the computation. For scalability and flexibility, all multi-bit modular arithmetic is decomposed into word-level operations. The resulting hardware for modular division is shown in Fig. 5, and its top-level schematic in Fig. 7b.

The PEs are designed with a common computing structure that performs Z = (Z + mX + nY) >> shift, as shown in Fig. 1, so they also meet the computational requirements of division.

PE0 (26) performs C = (C + div_h*D) >> shift; the operands C, D, div_h and shift are supplied by the external control module and register files. It processes one word of C = (C + div_h*D) >> shift per clock cycle. To allow operation at a high clock frequency, the computation uses carry-save adders (CSA), so the result has two parts, the sum and the carry, each 32 bits wide; both parts are temporarily stored back in the register file.

PE1 (27) performs U = U + div_h*W; the operands U, W and div_h come from the external control module and another PE. It processes one word of U = U + div_h*W per clock cycle. The result consists of the sum and carry (Uc, Us), which are passed to the next PE, PE2, for the subsequent operation, so the two PEs form a two-stage pipeline.

PE2 (28) performs U = (U + kP) >> shift; the operands U, P, k and shift come from another PE or the external register files. It processes one word of U = (U + kP) >> shift per clock cycle, and the result, consisting of the sum and carry (Uc, Us), is temporarily stored in the external register files.

PE3 (29) then performs only data storage and takes no part in computation. This keeps the hardware consistent with the algorithm without adding extra hardware to store the operands of PE1 (27).

The operands needed at this point are C, D, U, W, P, div_h and shift. C, D, U and W must hold their initial values and must also support the exchange of C with D and of U with W, so five register files are used in the implementation. To avoid adding excessive area, the register files are reused during division, with the stage signal distinguishing their roles.

The first register file (18) has 16 64-bit registers; when the stage signal is 0 it stores the (Cc, Cs) data, and when the stage signal is 1 it stores the (Dc, Ds) data.

The second register file (19) has 16 64-bit registers; when the stage signal is 0 it stores the (Dc, Ds) data, and when the stage signal is 1 it stores the (Cc, Cs) data.

The third register file (20) has 16 64-bit registers; when the stage signal is 0 it stores the (Uc, Us) data, and when the stage signal is 1 it stores the (Wc, Ws) data.

The fourth register file (21) has 16 64-bit registers; when the stage signal is 0 it stores the (Wc, Ws) data, and when the stage signal is 1 it stores the (Uc, Us) data.

The fifth register file (22) has 16 64-bit registers; whether the stage signal is 0 or 1, it stores the modulus data (0, P).
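The benefit of the stage signal is that exchanging C with D (or U with W) reduces to toggling one bit instead of copying every word between register files. The class below is a hypothetical software model of that idea for the first and second register files; the text does not describe when the stage signal is toggled, so `swap` is an assumed trigger.

```python
class StagedRegfilePair:
    """Hypothetical model of two register files whose roles are chosen by the stage bit.

    files[0] stands for the first register file (18) and files[1] for the second (19).
    Stage 0: file 0 holds C and file 1 holds D; stage 1: the roles are exchanged.
    """

    def __init__(self, c_words: list[int], d_words: list[int]) -> None:
        self.files = [list(c_words), list(d_words)]
        self.stage = 0

    def regfile(self, name: str) -> list[int]:
        """Return the physical register file currently holding logical register `name`."""
        index = {"C": 0, "D": 1}[name]
        return self.files[index ^ self.stage]

    def swap(self) -> None:
        """Exchange C and D by toggling the stage bit; no words are copied."""
        self.stage ^= 1


rf = StagedRegfilePair(c_words=[1, 2], d_words=[3, 4])
rf.swap()
print(rf.regfile("C"), rf.regfile("D"))  # [3, 4] [1, 2]
```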

At this time the Booth encoding unit (24) only stores the modulus P. Reusing the Booth encoding unit of the multiplication mode in this way avoids adding dedicated temporary storage for P.

2. Modular multiplication mode

In modular multiplication mode the circuit consists of the second and eighth multiplexers (2, 8), the five register files (18-22), the input register (23), the Booth encoding unit (24) and the four PEs (26-29). The data path is shown in Fig. 6, the modular multiplication algorithm implemented is shown in Fig. 8a, and the top-level schematic for modular multiplication is shown in Fig. 8b, wherein:

The inputs of the second multiplexer (2) are the data stored in the first register file (18) and the fourth register file (21); with the signal First_round, which indicates whether this is the first pass of the computation, as the select signal, it passes the correct data to the input register (23).

The inputs of the eighth multiplexer (8) are the data stored in the second register file (19) and the fifth register file (22); with the signal First_round, which indicates whether this is the first pass of the computation, as the select signal, it passes the correct data to the input register (23).

The first register file (18) stores the values of the multiplicand W and the modulus P, (W, P), word by word; it outputs to the second multiplexer (2).

The second register file (19) stores the value and carry of U during computation, (Uc, Us), also word by word; it outputs to the eighth multiplexer (8).

The third register file (20) stores the value of the multiplier, (0, C), also word by word; it outputs to the Booth encoder (24).

The fourth register file (21) stores the multiplicand and modulus (W, P) during computation; in modular multiplication it acts as a FIFO, and its data are also stored word by word. It outputs to the second multiplexer (2).

The fifth register file (22) stores the value and carry of U during computation, (Uc, Us); in modular multiplication it also acts as a FIFO, and its data are stored word by word. It outputs to the eighth multiplexer (8).

The input register (23) has two 64-bit data inputs: the multiplicand and modulus (W, P), and the sum and carry being computed (Uc, Us); it outputs to PE0 (26).

The input of the Booth encoding unit (24) is the data of the multiplier C; it supplies the four PEs with the multiples of W, mul_h0, mul_h1, mul_h2 and mul_h3, respectively.
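Radix-4 Booth recoding is the standard way to turn a multiplier into digits in {-2, -1, 0, 1, 2}, which is consistent with the PE needing only 0, W and 2*W plus an inverting controller. The sketch below assumes that the mul_h values are such digits, produced least significant first; how the digits are assigned to mul_h0 to mul_h3 is not specified in the text and is not modelled. The function name is illustrative.

```python
def booth_radix4(c: int, nbits: int) -> list[int]:
    """Radix-4 Booth recoding of a non-negative integer c < 2**nbits.

    Returns digits d_i in {-2, -1, 0, 1, 2}, least significant first, with
    sum(d_i * 4**i) == c. Digit i looks at bits (2i+1, 2i, 2i-1), with bit -1 = 0.
    """
    def bit(j: int) -> int:
        return (c >> j) & 1 if j >= 0 else 0

    # nbits // 2 + 1 digits: the extra top digit keeps the recoded value non-negative.
    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range(nbits // 2 + 1)]


print(booth_radix4(11, 4))  # [-1, -1, 1], since -1 - 4 + 16 == 11
```

Each digit consumes two multiplier bits, which is why the number of accumulations is roughly halved compared with bit-by-bit processing.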

The PE0 arithmetic element is input as the data of two 64bit, be respectively multiplicand and mould P (W, P) and the carry in calculating and result (Uc, Us); Finish computing U=(U+mul_h*W+kP)＞＞shift; Be output as calculate back result and carry (Uc, Us) and multiplicand and mould P (W, value P) output in the PE1 arithmetic element.

The PE1 arithmetic element is input as the data of two 64bit, be respectively multiplicand and mould P (W, P) and the carry in calculating among the PE0 and result (Uc, Us); Finish computing U=(U+mul_h*W+kP)＞＞shift; Be output as calculate back result and carry (Uc, Us) and multiplicand and mould P (W, value P) output in the PE2 arithmetic element.

The PE2 arithmetic element is input as the data of two 64bit, be respectively multiplicand and mould P (W, P) and the carry in calculating among the PE1 and result (Uc, Us); Finish computing U=(U+mul_h*W+kP)＞＞shift; Be output as calculate back result and carry (Uc, Us) and multiplicand and mould P (W, value P) output in the PE3 arithmetic element.

The PE3 arithmetic element is input as the data of two 64bit, be respectively multiplicand and mould P (W, P) and the carry in calculating among the PE2 and result (Uc, Us); Finish computing U=(U+mul_h*W+kP)＞＞shift; Be output as calculate back result and carry (Uc, Us) and multiplicand and mould P (the former outputs to (22) in the 5th register file for W, value P), and the latter outputs in the 4th register file (21).

During modular multiplication, the MUXes connect the four PE units as shown in Fig. 8b. The four PE units form a pipeline in which every unit performs U = (U + mul_h*W + kP) >> shift.

For modular multiplication in the prime field, Montgomery modular multiplication with radix-4 Booth encoding is used, and the basic operation of the algorithm is U = (U + mul_h*W + kP) >> shift. To increase speed, all four PE units (26-29) perform this operation, forming a four-stage pipeline.
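The loop that the four PE units share can be modeled in software as below. This is a minimal sketch with several assumptions: radix 4 (shift = 2), P odd, 0 <= W < P, k taken from [0, 4), and a single conditional add or subtract as the final correction into [0, P). The function name `mont_mul_prime` is hypothetical. The Booth digits are computed inline from the zero-extended multiplier, and whole integers are used instead of the word-serial carry-save form the hardware uses.

```python
def mont_mul_prime(w: int, c: int, p: int, h: int) -> int:
    """Radix-4 Booth Montgomery product: returns W * C * 4**(-m) mod P, m = h // 2 + 1.

    Each inner step models one PE operation U = (U + mul_h*W + k*P) >> 2.
    """
    if p % 2 == 0 or not 0 <= w < p or not 0 <= c < (1 << h):
        raise ValueError("need odd P, 0 <= W < P and an h-bit C")
    m = h // 2 + 1
    p_inv = pow(p, -1, 4)  # P^-1 mod 4, exists because P is odd (Python 3.8+)
    u = 0
    for base in range(0, m, 4):                  # one pass through PE0..PE3
        for i in range(base, min(base + 4, m)):  # PE(i - base) handles digit i
            g = ((c << 1) >> (2 * i)) & 0b111    # bits c[2i+1], c[2i], c[2i-1]
            mul_h = -2 * (g >> 2) + ((g >> 1) & 1) + (g & 1)  # Booth digit
            t = u + mul_h * w
            k = (-t * p_inv) % 4                 # makes t + k*P divisible by 4
            u = (t + k * p) >> 2                 # exact division, also when negative
    # Starting from 0, u stays in (-P, 2P), so one correction lands in [0, P).
    if u < 0:
        u += p
    elif u >= p:
        u -= p
    return u
```

Grouping the digits four at a time mirrors the Booth unit handing mul_h0 to mul_h3 to PE0 to PE3 in one clock. The result carries the usual Montgomery factor 4**(-m), which is removed in the standard way by working with operands in Montgomery form.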

The parameters of each accumulation step are set according to Field (prime field or polynomial field) and the value of C. mul_h is the multiple of the multiplicand W added in the step. shift is the number of bits U is shifted right after the step. k is determined as in Algorithm 1, again so that the lowest shift bits of the sum are 0 and the right shift discards only zeros. h is the bit length of the multiplier. In the prime field, the algorithm applies a final correction so that the result lies in GF(P). In the polynomial field this step is skipped (see Fig. 4).
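In the polynomial field the same step can be written with carry-less arithmetic: addition becomes XOR, and mul_h*W becomes a carry-less product. Because deg(U) stays below deg(P), no final correction is needed, which is consistent with that step being skipped. The sketch below assumes a 2-bit digit of C per step (no Booth recoding, shift = 2) and an irreducible P with constant term 1. The helpers `clmul`, `poly_mod` and `mont_mul_gf2` are hypothetical names.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two bit-packed polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_mod(a: int, p: int) -> int:
    """Remainder of a modulo p in GF(2)[x]."""
    dp = p.bit_length()
    while a.bit_length() >= dp:
        a ^= p << (a.bit_length() - dp)
    return a


def mont_mul_gf2(w: int, c: int, p: int, h: int) -> int:
    """Return W * C * x**(-2m) mod P, with m = (h + 1) // 2 two-bit digits of C."""
    if not p & 1 or w.bit_length() >= p.bit_length():
        raise ValueError("need P with constant term 1 and deg(W) < deg(P)")
    p_inv = p & 0b11  # P^-1 mod x^2, since (1 + p1*x)^2 == 1 mod x^2
    u = 0
    for i in range((h + 1) // 2):
        mul_h = (c >> (2 * i)) & 0b11      # next 2-bit digit of C
        t = u ^ clmul(mul_h, w)            # U + mul_h*W, carry-less
        k = clmul(t & 0b11, p_inv) & 0b11  # makes the low 2 bits of t + k*P zero
        u = (t ^ clmul(k, p)) >> 2         # deg(u) < deg(P): already reduced
    return u


# x**8 * result == W * C in GF(2^8) with P = x^8 + x^4 + x^3 + x + 1
r = mont_mul_gf2(0x57, 0x83, 0x11B, 8)
print(hex(poly_mod(r << 8, 0x11B)))
```

The final line multiplies the Montgomery result back by x**8 and reduces it, which recovers the ordinary field product of 0x57 and 0x83 modulo P.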

As seen, this algorithm can be supported two territories modular multiplication down, and the modular multiplication under the prime field is because of having adopted booth coding, the number of times of accumulating operation can be reduced to original half.Need to prove: in order to realize expanding, and mould is taken advantage of can the common hardware unit with the mould division operation, the plus-minus method of the long operand in algorithm 1 and the algorithm 2 and shift operation all are that unit carries out with the word.If operand length is n, word length is w,

Then operand with the word be unit vector representation for 0, W ^(e-1)..., W ⁽¹⁾, W ⁽⁰⁾.The purpose that increases all-zero word on the high position is that the result is overflowed when preventing additive operation, and in addition, it can also be as the symbol word, the sign bit expansion after making things convenient for addition results to move to right.
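The word-level representation can be sketched as follows. The sketch assumes little-endian Python lists of w-bit words, with the extra all-zero word on top acting as the sign word of a two's-complement value. All function names are hypothetical. Addition and subtraction run one word per step with a carry passed between steps. The right shift pulls bits down from the next word and uses the sign word for sign extension.

```python
def to_words(x: int, w: int, e: int) -> list[int]:
    """Split 0 <= x < 2**(w*e) into e little-endian w-bit words plus a zero top word."""
    if not 0 <= x < 1 << (w * e):
        raise ValueError("x does not fit in e words")
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(e)] + [0]


def from_words(words: list[int], w: int) -> int:
    """Read a word vector as two's complement; the top word carries the sign."""
    n = w * len(words)
    x = sum(v << (w * i) for i, v in enumerate(words))
    return x - (1 << n) if x >> (n - 1) else x


def add_words(a: list[int], b: list[int], w: int, sub: bool = False) -> list[int]:
    """Word-serial a + b, or a - b computed as a + ~b + 1, one word per step."""
    mask = (1 << w) - 1
    out, carry = [], 1 if sub else 0
    for x, y in zip(a, b):
        s = x + ((~y & mask) if sub else y) + carry
        out.append(s & mask)
        carry = s >> w  # carry into the next word; dropped after the top word
    return out


def shr_words(a: list[int], s: int, w: int) -> list[int]:
    """Arithmetic right shift by 0 <= s < w bits, filling from the sign word."""
    mask = (1 << w) - 1
    fill = mask if a[-1] >> (w - 1) else 0
    nxt = a[1:] + [fill]
    return [((lo >> s) | (hi << (w - s))) & mask for lo, hi in zip(a, nxt)]


w, e = 32, 2
a = to_words(0x123456789ABCDEF0, w, e)
b = to_words(0xFEDCBA9876543210, w, e)
d = add_words(a, b, w, sub=True)        # negative difference, kept in two's complement
print(from_words(shr_words(d, 2, w), w))
```

The extra word lets a sum of two e-word operands, or a negative difference, be held without overflow, which is the role the text gives to the all-zero top word.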

The PE0 unit (26) performs U = (U + mul_h0*W + kP) >> shift. The operands U, W, P, mul_h0 and shift are supplied by the external control module and the register files. In each clock cycle it processes one word of this operation. To sustain a higher clock frequency, the computation uses carry-save adders (CSA), so the result has two parts, the sum and the carry, each 32 bits wide. Both are then written into the next PE unit for pipelined operation.
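The carry-save idea can be illustrated with a small Python model. A 3:2 carry-save adder turns three operands into a sum vector and a carry vector using only per-bit full adders, so no carry ripples across the word and the critical path does not grow with word length. The model assumes non-negative, unbounded Python integers rather than fixed 32-bit registers. The order in which operands are fed to the adders is an illustrative assumption, not the PE's actual wiring.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save adder on non-negative integers: a + b + c == s + cy."""
    s = a ^ b ^ c                            # per-bit sum, no carry propagation
    cy = ((a & b) | (a & c) | (b & c)) << 1  # per-bit majority = carry, one place up
    return s, cy


def accumulate(terms: list[int]) -> tuple[int, int]:
    """Fold terms into a redundant (sum, carry) pair, like U held as (Us, Uc)."""
    s, cy = 0, 0
    for t in terms:
        s, cy = csa(s, cy, t)
    return s, cy


# U + mul_h*W + k*P with U held as (Us, Uc); the numbers are arbitrary examples.
us, uc, w, p, mul_h, k = 0x1234, 0x0F0F, 0xBEEF, 0xFFF1, 2, 3
s, cy = accumulate([us, uc, mul_h * w, k * p])
print(s + cy == us + uc + mul_h * w + k * p)  # True: one ordinary add resolves the pair
```

Keeping U in redundant (sum, carry) form between pipeline stages defers the slow carry-propagating addition until the value is actually needed in binary form.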

The PE1 unit (27) performs U = (U + mul_h1*W + kP) >> shift. The operands U, W, P, mul_h1 and shift are supplied by the external control module and the preceding PE unit. In each clock cycle it processes one word of this operation. To sustain a higher clock frequency, the computation uses carry-save adders (CSA), so the result has two parts, the sum and the carry, each 32 bits wide. Both are then written into the next PE unit for pipelined operation.

The PE2 unit (28) performs U = (U + mul_h2*W + kP) >> shift. The operands U, W, P, mul_h2 and shift are supplied by the external control module and the preceding PE unit. In each clock cycle it processes one word of this operation. To sustain a higher clock frequency, the computation uses carry-save adders (CSA), so the result has two parts, the sum and the carry, each 32 bits wide. Both are then written into the next PE unit for pipelined operation.

The PE3 unit (29) performs U = (U + mul_h3*W + kP) >> shift. The operands U, W, P, mul_h3 and shift are supplied by the external control module and the preceding PE unit. In each clock cycle it processes one word of this operation. To sustain a higher clock frequency, the computation uses carry-save adders (CSA), so the result has two parts, the sum and the carry, each 32 bits wide. Both are then written back into the register files for pipelined operation.

The PE units need the parameters U, W, P and mul_h. The operands U, W, P, etc. are therefore stored in the first through fifth register files (18-22), and the four parameters mul_h0, mul_h1, mul_h2 and mul_h3 used by the four PE units are provided by the Booth encoding unit (24). In each clock cycle the Booth encoding unit supplies all four mul_h values at once, so that the four PE units (26-29) form a four-stage pipeline and compute in parallel. Specifically:

The first register file (18) stores the initial multiplicand and modulus (W, P) and outputs them to the PE0 unit (26) for computation.

The second register file (19) stores the carry and result (Uc, Us) computed by the PE units and outputs them to the PE0 unit (26) for computation.

The third register file (20) stores the multiplier C, zero-extended as (0, C), and outputs it to the Booth encoding unit (24). The Booth encoding unit in turn simultaneously supplies the four parameters mul_h required by the four PE units (26-29).

The fourth register file (21) stores the (W, P) values of the current computation as a FIFO and outputs them to the PE0 unit (26) for computation.

The fifth register file (22) stores the (Uc, Us) values of the current computation as a FIFO and outputs them to the PE0 unit (26) for computation.

Claims

1. A modular multiplier/divider applicable to prime fields and polynomial fields, characterized in that it consists of 17 MUXes (1-17), 4 PE units (26-29), an input register (23), 5 register files (18-22), a Booth encoding unit (24) and a control module (25), wherein:

a. when the second MUX (2) and the eighth MUX (8) select state 1 and the 13 MUXes (3-7, 10-17) select state 0, the first MUX (1), the ninth MUX (9), the 5 register files (18-22), the input register (23), the Booth encoding unit (24), the control module (25) and the 4 PE units (26-29) form a modular divider, and modular division uses the Euclidean algorithm;

b. when the first MUX (1) and the ninth MUX (9) are idle and the 13 MUXes (3-7, 10-17) select state 1, the second MUX (2), the eighth MUX (8), the 5 register files (18-22), the input register (23), the Booth encoding unit (24), the control module (25) and the 4 PE units (26-29) form a modular multiplier, and modular multiplication uses the Montgomery modular multiplication algorithm.
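The two configurations of claim 1 can be summarized as a select table. The sketch below is a hypothetical encoding. The 0/1 states of MUX 3-7 and 10-17, and of MUX 2 and 8 in division mode, come from claim 1. Representing the stage and First_round signals as 0/1 inputs, and an idle MUX as None, is an assumption made for illustration.

```python
from typing import Optional

SHARED = [*range(3, 8), *range(10, 18)]  # the 13 MUXes 3-7 and 10-17


def mux_selects(mode: str, stage: int = 0, first_round: bool = False) -> dict[int, Optional[int]]:
    """Select state of MUX 1-17; None marks a MUX that is idle in this mode."""
    if mode == "div":                        # configuration a: modular divider
        sel = {m: 0 for m in SHARED}
        sel[2] = sel[8] = 1
        sel[1] = sel[9] = stage              # routed by the stage signal (claim 2)
    elif mode == "mul":                      # configuration b: modular multiplier
        sel = {m: 1 for m in SHARED}
        sel[1] = sel[9] = None               # first and ninth MUX are idle
        sel[2] = sel[8] = int(first_round)   # hypothetical 0/1 encoding of First_round (claim 3)
    else:
        raise ValueError(f"unknown mode {mode!r}")
    return dict(sorted(sel.items()))


print(mux_selects("mul", first_round=True))
```

Sharing the 13 MUXes between the two modes, with only MUX 1, 2, 8 and 9 acting as mode-specific routing, is what lets the same four PE units and register files serve both operations.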

2. The modular multiplier/divider applicable to prime fields and polynomial fields according to claim 1, characterized in that, in item a:

The input of the first MUX (1) is the output of the PE0 unit (26); selected by the stage signal, its output goes to the first register file (18) or the second register file (19);

The input of the ninth MUX (9) is the output of the PE2 unit (28); selected by the stage signal, its output goes to the third register file (20) or the fourth register file (21);

The fifth register file (22) outputs to the Booth encoder (24);

The input register (23) outputs to the PE0 unit (26);

The Booth encoding unit (24) outputs to the PE2 unit (28);

The control module (25) outputs to the PE0 and PE1 units (26, 27);

The output of the PE0 unit (26) is written to the first register file (18) or the second register file (19), as selected by the stage signal;

The PE1 unit (27) outputs to the PE2 unit (28);

The output of the PE2 unit (28) is written to the third register file (20) or the fourth register file (21), as selected by the stage signal;

The PE3 unit (29) outputs to the PE1 unit (27).

3. The modular multiplier/divider applicable to prime fields and polynomial fields according to claim 1, characterized in that, in item b:

The inputs of the second MUX (2) are the data stored in the first register file (18) and the fourth register file (21); using the First_round signal in the input register (23) as the select signal, it outputs the correct data;

The inputs of the eighth MUX (8) are the data stored in the second register file (19) and the fifth register file (22); using the First_round signal in the input register (23) as the select signal, it outputs the correct data;

The third register file (20) outputs to the Booth encoder (24);

Both outputs of the input register (23) go to the PE0 unit (26);

The PE0 unit (26) outputs to the PE1 unit (27);

The PE1 unit (27) outputs to the PE2 unit (28);

The PE2 unit (28) outputs to the PE3 unit (29).

4. The modular multiplier/divider according to claim 1, characterized in that each PE unit consists of 6 MUXes (30-35), 2 registers (36, 37), 2 inverting controllers (38, 39), 3 carry-save adders (40, 41, 42), a PE internal control module (43) and a PE internal shifter (44), wherein: