CN104252332A

CN104252332A - Multiplier and multiplier processing element for ellipse cipher apparatus

Info

Publication number: CN104252332A
Application number: CN201410414896.8A
Authority: CN
Inventors: 潘正祥; 杨春生; 李秋莹; 闫立军; 蔡正富
Original assignee: Airmate Electrical Shenzhen Co Ltd; Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Airmate Electrical Shenzhen Co Ltd; Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2014-08-20
Filing date: 2014-08-20
Publication date: 2014-12-31
Anticipated expiration: 2034-08-20
Also published as: CN104252332B

Abstract

The invention relates to a multiplier processing element (PE) for an elliptic curve cryptosystem device. The PE comprises a computing unit, inputs B_in, C_in and X_in, and outputs B_out, C_out and X_out. The values entering through B_in, C_in and X_in are processed by the computing unit and leave through B_out, C_out and X_out. Inside the computing unit, B_in and X_in are cyclically shifted left by d bits, that is, B_out = B_in << d and X_out = X_in << d. C_in is cyclically shifted right by d bits and added to the value L(B_in, X_in) computed from B_in and X_in, according to C_out = (C_in >> d) + L(B_in, X_in), where + is addition in GF(2^m), i.e. bitwise XOR. Here C_in is the result of the previous PE (zero initially for the first PE); C_out is the product result computed by this PE and is passed to the next PE; d is the digit width; k is the number of segments; and L denotes the digit-level operation. Because the shifts and the J-function computation are carried out during the calculation, the PE offers high operating speed and low computational complexity, which improves the performance of the cryptosystem.

Description

Multiplier processing element for an elliptic curve cryptosystem device, and multiplier

Technical field

The invention belongs to the field of numerical coding, and relates in particular to a multiplier processing element and a multiplier over a finite (Galois) field, suitable for elliptic curve cryptosystem devices.

Background art

In recent years, efficient, high-performance and low-complexity designs of finite field arithmetic, and their applications, have received much attention. For example, cryptographic algorithms and systems must meet the security requirements of the National Institute of Standards and Technology (NIST) and the Institute of Electrical and Electronics Engineers (IEEE), in order to reduce potential attacks and ensure hardware security. At the same time, an important aspect of cryptographic hardware is resisting side-channel attacks while keeping costs low. In industry, error detection has received considerable attention, for example in [3-6], which is also evident from attacks on cryptographic systems based on fault analysis and side-channel analysis. In practice, original designs often have to absorb extra overhead, so efficient designs that can tolerate this overhead are needed. Recently, elliptic curve cryptosystems have emerged as an effective technique that meets public-key cryptography requirements and have been deployed in many high-performance and security-constrained applications. For example, the algorithm can be used in mobile ad hoc networks (MANETs) to provide confidentiality and integrity checking efficiently, regardless of whether the physical layer is secure.
An elliptic curve cryptosystem is based on the algebraic structure of elliptic curves over finite fields, and its arithmetic operations determine the efficiency of cryptographic systems built on it. Many research efforts have therefore focused on efficient, low-complexity, high-performance designs of arithmetic units, which are used in elliptic curve cryptosystems and in the RSA public-key algorithm. Gaussian normal basis (GNB) multipliers have recently been widely used to compute point multiplication (also called scalar multiplication) in elliptic curve cryptosystems. This operation must not only be efficient; in time-constrained applications its implementation must also deliver high performance.

Over large binary fields, field multiplication can be implemented with the systolic-array approach, which yields fast and regular VLSI (very large scale integration) designs. Systolic arrays avoid irregular circuitry; in other words, for different choices of binary field the hardware structures are modular and very similar. Their concurrency, balanced input/output and simple, regular design make them well suited to high-performance applications. Systolic architectures are widely used where high speed is required, usually at the cost of an acceptable area complexity. For example, [16] proposes an optimal normal basis systolic multiplier with a highly regular structure that can be implemented in a data-serial manner, and [17] achieves a high-performance implementation of such a multiplier on reconfigurable hardware.

[3] A. Yazdani, H. Sepahvand, M. Crow, and M. Ferdowsi, "Fault Detection and Mitigation in Multilevel Converter STATCOMs," IEEE Trans. Ind. Electron., vol. 58, no. 4, pp. 1307-1315, 2011.

[4] M. A. Rodriguez-Blanco, A. Claudio-Sanchez, D. Theilliol, L. Vela-Valdes, P. Sibaja-Teran, L. Hernandez-Gonzalez, and J. Aguayo-Alquicira, "A Failure-Detection Strategy for IGBT Based on Gate-Voltage Behavior Applied to a Motor Drive System," IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1625-1633, 2011.

[5] T. A. Najafabadi, F. R. Salmasi, and P. Jabehdar-Maralani, "Detection and Isolation of Speed-, DC-Link Voltage-, and Current-Sensor Faults Based on an Adaptive Observer in Induction-Motor Drives," IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1662-1672, 2011.

[6] S. Cruz, M. Ferreira, A. Mendes, and A. J. M. Cardoso, "Analysis and Diagnosis of Open-Circuit Faults in Matrix Converters," IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1648-1661, 2011.

[16] S. Kwon, "A Low Complexity and a Low Latency Bit Parallel Systolic Multiplier over GF(2^m) Using an Optimal Normal Basis of Type II," in Proc. IEEE Symp. Computer Arithmetic (Arith-16), pp. 196-202, 2003.

[17] J. Fan, D. Bailey, L. Batina, T. Guneysu, C. Paar, and I. Verbauwhede, "Breaking Elliptic Curves Cryptosystems using Reconfigurable Hardware," in Proc. 20th Int. Conf. Field Programmable Logic and Applications (FPL 2010), 2010, pp. 133-138.

Summary of the invention

The invention provides a multiplier processing element for an elliptic curve cryptosystem device, aiming to solve the problems of slow computation and long operation time in existing processing elements.

The invention is realized as follows. In a processing element for the multiplier of an elliptic curve cryptosystem device, the multiplier processing element PE comprises a computing unit, inputs B_in, C_in and X_in, and outputs B_out, C_out and X_out. The values at B_in, C_in and X_in enter the computing unit and, after processing, leave through B_out, C_out and X_out. In the computing unit, B_in and X_in are cyclically shifted left by d bits: B_out = B_in << d and X_out = X_in << d. The value C_in is cyclically shifted right by d bits and added to the value computed from B_in and X_in: C_out = (C_in >> d) + L(B_in, X_in), where + is addition in GF(2^m), i.e. bitwise XOR. Here C_in is the result of the previous PE (for the first PE, C_in is initially zero); C_out is the product result computed by this PE and serves as the input of the next PE; d is the digit width; k is the number of segments; and L denotes the digit-level operation.
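As a sketch of the dataflow in the paragraph above (not the patent's circuit), one PE step can be modelled in Python with GF(2^m) elements held as m-bit integers, coefficient v_i at bit i. The helper names `rot_r`, `rot_l` and `pe` are hypothetical. `rot_r` implements the document's cyclic right shift ">>" (entry i moves to position i+s mod m), which in a normal basis is also raising to the power 2^s. L is passed in as a parameter; the `toy_L` used below is an arbitrary stand-in, not the L of formula (7), and the sketch keeps the claim's argument order L(B_in, X_in).

```python
from typing import Callable, Tuple

def rot_r(v: int, s: int, m: int) -> int:
    """Document's cyclic right shift 'v >> s': coefficient v_i moves to index (i + s) mod m.

    With v_i stored at integer bit i, this is an integer rotation toward the MSB.
    """
    s %= m
    mask = (1 << m) - 1
    return ((v << s) | (v >> (m - s))) & mask

def rot_l(v: int, s: int, m: int) -> int:
    """Document's cyclic left shift 'v << s' (inverse of rot_r)."""
    return rot_r(v, -s, m)

def pe(b_in: int, c_in: int, x_in: int,
       L: Callable[[int, int], int], d: int, m: int) -> Tuple[int, int, int]:
    """One processing element step; returns (B_out, C_out, X_out)."""
    b_out = rot_l(b_in, d, m)                    # B_out = B_in << d
    x_out = rot_l(x_in, d, m)                    # X_out = X_in << d
    c_out = rot_r(c_in, d, m) ^ L(b_in, x_in)    # C_out = (C_in >> d) + L(B_in, X_in), '+' = XOR
    return b_out, c_out, x_out

# Hypothetical stand-in for L, for illustration only.
toy_L = lambda b, x: b & x
print(pe(0b00000001, 0, 0b00000011, toy_L, d=3, m=8))  # (32, 1, 96)
```

Only the shift-and-add skeleton of the PE is modelled here; in the described hardware, L is produced by the AND/XOR network of the Fig. 1 core.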

Another object of the invention is to provide a one-dimensional (1-D) multiplier comprising k multiplier processing elements PE according to claim 1 and an accumulation circuit AC. The k PEs are connected in series and followed by the AC, and the inputs of each PE are the outputs of the preceding PE. The three input streams of the first PE are B_0, B_1, ..., B_{n-1} (input B), 0, 0, ..., 0 (input C) and X_0, X_1, ..., X_{n-1} (input X), where X is obtained by shifting A. The output is computed as:

C = C_{0} + C_{1}^{2^{kd}} + \cdots + C_{n-1}^{2^{(n-1)kd}} = \Big(\big((C_{n-1})^{2^{kd}} + C_{n-2}\big)^{2^{kd}} + \cdots\Big)^{2^{kd}} + C_{0}.

Further, the accumulation circuit AC comprises an adder unit, a temporary storage unit and a shift unit. The output of the shift unit feeds the adder unit, the output of the adder unit feeds the temporary storage unit, and the output of the temporary storage unit feeds the shift unit. The AC shifts the result previously produced by the k PEs and adds it to the next result produced by the k PEs.
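The accumulation circuit evaluates the output formula in Horner form: the stored value is shifted right by kd (that is, raised to 2^{kd}) and XORed with the next partial product. A minimal sketch, assuming the same m-bit integer representation (coefficient v_i at bit i) and hypothetical function names, checks that this loop gives the same value as the direct sum.

```python
import random

def rot_r(v: int, s: int, m: int) -> int:
    """Cyclic right shift in the document's sense (v_i -> index i+s mod m), v_i at bit i."""
    s %= m
    return ((v << s) | (v >> (m - s))) & ((1 << m) - 1)

def accumulate(partials: list[int], kd: int, m: int) -> int:
    """AC sketch: partials = [C_0, ..., C_{n-1}], fed MSD-first (C_{n-1} first)."""
    stored = 0                           # temporary storage unit, cleared at start
    for c in reversed(partials):
        shifted = rot_r(stored, kd, m)   # shift unit: raise to 2^(kd)
        stored = shifted ^ c             # adder unit: GF(2^m) addition is XOR
    return stored

def direct(partials: list[int], kd: int, m: int) -> int:
    """Left-hand side of the output formula: sum of C_i^(2^(i*kd))."""
    out = 0
    for i, c in enumerate(partials):
        out ^= rot_r(c, i * kd, m)
    return out

random.seed(1)
m, kd = 11, 6
parts = [random.getrandbits(m) for _ in range(5)]
assert accumulate(parts, kd, m) == direct(parts, kd, m)
```

The Horner form is what lets a single shifter, adder and register serve all n partial products.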

Another object of the invention is to provide a two-dimensional (2-D) multiplier comprising k one-dimensional multipliers according to claim 2 or 3, 2k-2 CS modules, k-1 accumulation circuits AC1 and one accumulation circuit AC2. The k 1-D multipliers operate in parallel. The output of the first 1-D multiplier is connected to the shift unit of the first AC1; the k-1 AC1 circuits are connected in series, and the (k-1)-th AC1 is connected to AC2. The outputs of the second through k-th 1-D multipliers are connected to the k-1 AC1 circuits respectively, and the B and X inputs of the second through k-th 1-D multipliers are each connected to a CS module, while the first 1-D multiplier receives its inputs directly. The operation is:

C = C_{0} + C_{1}^{2^{k^{2}d}} + \cdots + C_{n-1}^{2^{(n-1)k^{2}d}}.

Further, the accumulation circuit AC1 comprises a shift unit and an adder unit, the output of the shift unit feeding the adder unit. AC1 shifts its input, adds the output of the 1-D multiplier connected to it, and outputs the sum; its shift unit performs a cyclic right shift by kd bits.

Further, the accumulation circuit AC2 comprises a shift unit, an adder unit and a temporary storage unit. The output of the shift unit feeds the adder unit, the output of the adder unit feeds the temporary storage unit, and the output of the temporary storage unit is connected to the input of the adder unit. The shift unit cyclically shifts its input right by k^2 d bits.

Further, each CS module cyclically shifts its input right by kd bits.


The beneficial effects of the invention are as follows: because the shifts and the J-function computation are carried out during the calculation, the processing element operates fast with low computational complexity, which improves the performance of the cryptosystem. The proposed multiplier is based on a systolic array architecture, so it is easy to implement in VLSI and offers low latency and high performance.

Description of the drawings

Fig. 1 shows the DL-PIPO GNB multiplier circuit on which the invention is based;

Fig. 2 is a structural diagram of the processing element PE provided by an embodiment of the invention;

Fig. 3 shows the one-dimensional multiplier circuit provided by an embodiment of the invention;

Fig. 4 shows the two-dimensional multiplier circuit provided by an embodiment of the invention.

Embodiment

Fig. 2 shows the processing element for an elliptic curve cryptosystem multiplier provided by the invention. The multiplier processing element PE comprises a computing unit, inputs B_in, C_in and X_in, and outputs B_out, C_out and X_out. The values at B_in, C_in and X_in enter the computing unit and, after processing, leave through B_out, C_out and X_out. In the computing unit, B_in and X_in are cyclically shifted left by d bits: B_out = B_in << d and X_out = X_in << d. The value C_in is cyclically shifted right by d bits and added to the value computed from B_in and X_in: C_out = (C_in >> d) + L(B_in, X_in). Here C_in is the result of the previous PE (for the first PE, C_in is initially zero); C_out is the product result computed by this PE and serves as the input of the next PE; d is the digit width; k is the number of segments; and L denotes the digit-level operation. Because the shifts and the J-function computation are carried out during the calculation, the processing element operates fast with low computational complexity, which improves the performance of the cryptosystem.

Fig. 3 shows the one-dimensional multiplier provided as another object of the invention. It comprises k multiplier processing elements PE according to claim 1 and an accumulation circuit AC. The k PEs are connected in series and followed by the AC, and the inputs of each PE are the outputs of the preceding PE. The three input streams of the first PE are B_0, B_1, ..., B_{n-1} (input B), 0, 0, ..., 0 (input C) and X_0, X_1, ..., X_{n-1} (input X), where X is obtained by shifting A. The output is computed as:

C = C_{0} + C_{1}^{2^{kd}} + \cdots + C_{n-1}^{2^{(n-1)kd}} = \Big(\big((C_{n-1})^{2^{kd}} + C_{n-2}\big)^{2^{kd}} + \cdots\Big)^{2^{kd}} + C_{0}.

The accumulation circuit AC comprises an adder unit, a temporary storage unit and a shift unit. The output of the shift unit feeds the adder unit, the output of the adder unit feeds the temporary storage unit, and the output of the temporary storage unit feeds the shift unit. The AC shifts the result previously produced by the k PEs and adds it to the next result produced by the k PEs.

Fig. 4 shows the two-dimensional multiplier provided as another object of the invention. It comprises k one-dimensional multipliers according to claim 2 or 3, 2k-2 CS modules, k-1 accumulation circuits AC1 and one accumulation circuit AC2. The k 1-D multipliers operate in parallel. The output of the first 1-D multiplier is connected to the shift unit of the first AC1; the k-1 AC1 circuits are connected in series, and the (k-1)-th AC1 is connected to AC2. The outputs of the second through k-th 1-D multipliers are connected to the k-1 AC1 circuits respectively, and the B and X inputs of the second through k-th 1-D multipliers are each connected to a CS module, while the first 1-D multiplier receives its inputs directly. The operation is:

C = C_{0} + C_{1}^{2^{k^{2}d}} + \cdots + C_{n-1}^{2^{(n-1)k^{2}d}}.

The accumulation circuit AC1 comprises a shift unit and an adder unit, the output of the shift unit feeding the adder unit. AC1 shifts its input, adds the output of the 1-D multiplier connected to it, and outputs the sum; its shift unit performs a cyclic right shift by kd bits.

The accumulation circuit AC2 comprises a shift unit, an adder unit and a temporary storage unit. The output of the shift unit feeds the adder unit, the output of the adder unit feeds the temporary storage unit, and the output of the temporary storage unit is connected to the input of the adder unit. The shift unit cyclically shifts its input right by k^2 d bits.

Each CS module cyclically shifts its input right by kd bits.

Below, a decomposition method is used to derive two new digit-level GNB multipliers.

Let N = {β, β^2, β^{2^2}, ..., β^{2^{m-1}}} be a normal basis (NB) of GF(2^m), where β ∈ GF(2^m) is a normal element, i.e. its conjugates form a basis of GF(2^m) over GF(2). Let m and T be positive integers such that p = mT + 1 is prime and gcd(mT/k, m) = 1, where k is the multiplicative order of 2 modulo p (this k is unrelated to the segment count k used below). Let α be a primitive p-th root of unity in GF(2^{mT}). Then, for any element τ of order T in Z_p^*, β = Σ_{j=0}^{T-1} α^{τ^j} generates a normal basis N of GF(2^m) over GF(2), called a type-T Gaussian normal basis (GNB). The time and space complexity of a GNB multiplier depends on its type T (T > 1). NIST recommends five binary fields, with m = 163, 233, 283, 409 and 571; for these fields T is even, namely 4, 2, 6, 4 and 10, respectively.
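The existence condition for a type-T GNB in the paragraph above can be checked directly. The sketch below (helper names hypothetical; trial-division primality, adequate only for small p) tests whether p = mT + 1 is prime and whether gcd(mT/k, m) = 1 with k the order of 2 modulo p, then searches for the smallest such T. For the five NIST field sizes it should reproduce the types quoted above.

```python
from math import gcd
from typing import Optional

def is_prime(p: int) -> bool:
    """Trial division; adequate for p = mT + 1 of a few thousand."""
    if p < 2:
        return False
    f = 2
    while f * f <= p:
        if p % f == 0:
            return False
        f += 1
    return True

def order_of_2(p: int) -> int:
    """Multiplicative order of 2 modulo an odd prime p."""
    k, x = 1, 2 % p
    while x != 1:
        x = 2 * x % p
        k += 1
    return k

def gnb_exists(m: int, T: int) -> bool:
    """Type-T GNB criterion: p = mT + 1 prime and gcd(mT/k, m) = 1 with k = ord_p(2)."""
    if m < 2:
        raise ValueError("m must be at least 2")
    p = m * T + 1
    if not is_prime(p):
        return False
    return gcd(m * T // order_of_2(p), m) == 1

def lowest_gnb_type(m: int, t_max: int = 100) -> Optional[int]:
    """Smallest T <= t_max for which a type-T GNB of GF(2^m) exists."""
    for T in range(1, t_max + 1):
        if gnb_exists(m, T):
            return T
    return None

print([lowest_gnb_type(m) for m in (163, 233, 283, 409, 571)])  # [4, 2, 6, 4, 10]
```

The same check confirms that GF(2^27) has a type-6 GNB (p = 163), the field used in Example 1 below, while type 4 fails the gcd condition for m = 27.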

GNB multiplication is computed using the multiplication matrix R_{(m-1)×T} of [18]. Let A = (a_0, a_1, ..., a_{m-1}) and B = (b_0, b_1, ..., b_{m-1}) be two elements of GF(2^m) represented in a type-T GNB. Their product in GF(2^m) can be expressed as:

where

S(i, B) = (B \ll R(i,1)) \oplus (B \ll R(i,2)) \oplus \cdots \oplus (B \ll R(i,T)), \quad 1 \le i \le m-1 \qquad (2)

Here (X << i) denotes the i-bit cyclic left shift of X ∈ GF(2^m); X ⊙ Y = (x_0 y_0, ..., x_{m-1} y_{m-1}) denotes the bitwise AND of the coefficients of X and Y, and ⊕ denotes bitwise XOR. A finite field multiplier can be designed as a bit-level architecture (space complexity O(m), time complexity O(m)), a digit-level architecture (space complexity O(md), time complexity O(m/d)) or a bit-parallel architecture (space complexity O(m^2), time complexity O(1)).
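Formulas (1) and (2) rely on the matrix R of [18], which is not reproduced here. As an independent cross-check of GNB arithmetic, the sketch below multiplies two type-T GNB elements directly from the Gauss-period definition of β given earlier: it writes each element as a sum of powers x^r (x standing for α, r = 1..p-1) in GF(2)[x]/(1 + x + ... + x^{p-1}), multiplies there, and reads the coefficient of x^{2^i} back as c_i. This is a swapped-in textbook technique, not the R-matrix method of [18]; it costs O(mp) per product and is meant only as a test oracle. The coefficient convention A = Σ a_i β^{2^i} follows the text; the function names are hypothetical.

```python
def gauss_index_map(m: int, T: int) -> list[int]:
    """F[r] = i where r = 2^i * tau^j (mod p), for r = 1..p-1 and p = mT + 1.

    Assumes a type-T GNB exists for (m, T); raises ValueError otherwise.
    """
    p = m * T + 1
    # tau: an element of multiplicative order exactly T modulo p
    tau = next(u for u in range(1, p)
               if pow(u, T, p) == 1 and all(pow(u, e, p) != 1 for e in range(1, T)))
    F = [-1] * p
    w = 1
    for _ in range(T):              # walk the cosets tau^j * {2^i}
        n = w
        for i in range(m):
            F[n] = i
            n = 2 * n % p
        w = w * tau % p
    if -1 in F[1:]:
        raise ValueError("no type-T GNB for these parameters")
    return F

def gnb_mul(a: list[int], b: list[int], F: list[int]) -> list[int]:
    """Product of A = sum a_i beta^(2^i) and B = sum b_i beta^(2^i), bits 0/1."""
    m, p = len(a), len(F)
    ax = [0] + [a[F[r]] for r in range(1, p)]   # A as a sum of x^r, r = 1..p-1
    bx = [0] + [b[F[r]] for r in range(1, p)]
    # Terms landing on x^0 = 1 are rewritten as x + x^2 + ... + x^(p-1).
    const = 0
    for r in range(1, p):
        const ^= ax[r] & bx[p - r]
    c, t = [], 1
    for _ in range(m):              # c_i is the coefficient of x^t, t = 2^i mod p
        acc = const
        for r in range(1, p):
            s = (t - r) % p
            if s:
                acc ^= ax[r] & bx[s]
        c.append(acc)
        t = 2 * t % p
    return c

F = gauss_index_map(5, 2)           # GF(2^5), type-2 GNB, p = 11
beta = [1, 0, 0, 0, 0]
print(gnb_mul(beta, beta, F))       # [0, 1, 0, 0, 0], i.e. beta^2
```

Two sanity properties follow from the construction: squaring is the cyclic shift c_i = a_{i-1} (the ">> 1" used throughout the text), and the all-ones vector acts as 1, because Σ β^{2^i} = Σ_{r=1}^{p-1} α^r = 1 in characteristic 2.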

Recently, several low-complexity digit-level parallel-in parallel-out (DL-PIPO) GNB multipliers have been proposed in [18], [19] and [20], of which [20] is the best. The DL-PIPO architecture is shown in Fig. 1. In this multiplier, the two operands A and B (stored beforehand in registers <X> and <Y>) must be retained throughout the computation, and the result is obtained after q = ⌈m/d⌉ clock cycles. Note that, for a given field size, the digit width d should be chosen carefully to lower the time and space complexity. The area complexity of the digit-level GNB multiplier consists of dm AND gates and a number of XOR gates. Using the common subexpression elimination algorithm proposed in [20], the area complexity is reduced further, so that the number of XOR gates depends on n_p, where

n_{p} \le \min\left\{ \frac{v_{p} T}{2}, \binom{m}{2} \right\}, \quad v_{p} = \frac{d(m-1)}{2}.
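The bound on n_p above is easy to evaluate, and doing so shows which of its two terms is active. A small sketch, assuming the formula is used exactly as written (the function name is hypothetical), for the NIST field m = 163 with T = 4; the exact XOR count from [20] is not reproduced here.

```python
from math import comb

def np_bound(m: int, T: int, d: int) -> float:
    """Upper bound on n_p: min(v_p * T / 2, C(m, 2)) with v_p = d(m - 1)/2."""
    v_p = d * (m - 1) / 2
    return min(v_p * T / 2, comb(m, 2))

for d in (1, 4, 16):
    print(d, np_bound(163, 4, d))
```

For m = 163 and T = 4 the first term equals 162d, so it stays below C(163, 2) = 13203 as long as d ≤ 81.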

A. 1-D digit-level systolic structure

From the symmetric structure of the matrix R_{(m-1)×T} in (1), S(i, B) can be rewritten as:

S(m-k, B) = S(k, B) \gg k, \quad 1 \le k \le \frac{m-1}{2} \qquad (3)

Therefore, instead of the matrix R_{(m-1)×T}, we can define a matrix u_{((m-1)/2)×T} as:

where u_k is the k-th row of the matrix u. Fig. 1 shows the DL-PIPO GNB multiplier architecture. Suppose the input element A (preloaded in register <X>) is re-expressed as

\langle X \rangle = (x_{0}, x_{m-1}, x_{m-2}, \ldots, x_{2}, x_{1}) = \bar{A} \gg 1,

where

\bar{A} = \sum_{i=0}^{m-1} a_{m-1-i} \beta^{2^{i}}.
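The register layout <X> = Ā >> 1 can be checked on symbolic coefficients. This sketch assumes coefficient lists are indexed from β (index 0) up to β^{2^{m-1}}, and that ">>" moves entry i to position i+1 cyclically, the same convention as the squaring map; the helper names are hypothetical.

```python
def bar(a: list) -> list:
    """Coefficients of A-bar: the coefficient of beta^(2^i) is a_{m-1-i}."""
    return a[::-1]

def rot_r_list(v: list, s: int = 1) -> list:
    """Cyclic right shift of a coefficient list: entry i moves to (i + s) mod m."""
    s %= len(v)
    return v[-s:] + v[:-s] if s else list(v)

a = ["a0", "a1", "a2", "a3", "a4"]
X = rot_r_list(bar(a), 1)
print(X)  # ['a0', 'a4', 'a3', 'a2', 'a1']
```

The result matches the stated order (x_0, x_{m-1}, ..., x_2, x_1): a_0 stays in front and the remaining coefficients appear in descending order.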

Then, using the matrix u_{((m-1)/2)×T}, the product of A and B can be obtained as:

C = AB = \sum_{i=0}^{m-1} J^{2^{i}}(X \gg i, B \gg i) \qquad (5)

where J(X, Y) = X ⊙ P(Y), with

P(Y) = (y_{1}, s'(1,Y), s'(2,Y), \ldots, s'(2,Y), s'(1,Y)), \quad s'(k,Y) = \sum_{i \in u_{k}} y_{i}, \quad 1 \le k \le \frac{m-1}{2}.

For each coordinate, the function J(X, Y) obtains its result by suitably shifting the input parameters. These functions are weighted sums of the bits of input B (preloaded in register <Y>), and which bits of B take part is determined by the matrix u. The matrix u is realized as a P block, which computes these linear combinations of B using XOR trees.
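The P block and J(X, Y) can be sketched on coefficient lists. The index sets u_k come from matrix u, whose defining formula is not reproduced in this text, so the rows below are hypothetical placeholders for m = 5; only the construction of P(Y) and J(X, Y) as written above is illustrated, and the function names are hypothetical.

```python
def P(y: list[int], u: list[list[int]]) -> list[int]:
    """P(Y) = (y_1, s'(1,Y), ..., s'(h,Y), s'(h,Y), ..., s'(1,Y)) with h = (m-1)/2.

    u[k-1] holds the index set u_k; s'(k, Y) is the XOR of y_i over i in u_k.
    """
    if len(y) != 2 * len(u) + 1:
        raise ValueError("expected m = 2h + 1 coefficients")
    s = []
    for row in u:
        acc = 0
        for i in row:
            acc ^= y[i]          # one XOR tree per row of u
        s.append(acc)
    return [y[1]] + s + s[::-1]

def J(x: list[int], y: list[int], u: list[list[int]]) -> list[int]:
    """J(X, Y) = X (.) P(Y): bitwise AND with the P-block output."""
    return [xi & pi for xi, pi in zip(x, P(y, u))]

u_toy = [[0, 2], [1, 3]]         # hypothetical rows u_1, u_2 for m = 5
print(J([1, 1, 1, 1, 1], [1, 0, 1, 1, 0], u_toy))  # [0, 0, 1, 1, 0]
```

The mirrored layout of P(Y) reflects the symmetry in (3): each s'(k, Y) is computed once and fanned out to two coordinates.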

Let q = ⌈m/d⌉. Then the product in (5) can be written as:

C = \sum_{i=0}^{q-1} L^{2^{id}}(X \gg id, B \gg id) \qquad (6)

where

L(X, B) = \sum_{j=0}^{d-1} J^{2^{j}}(X \gg j, B \gg j) \qquad (7)

Suppose n and k are two integers satisfying q = kn. If q is not divisible by k, zeros are padded at the least significant positions of X and B so that q = kn holds. Define the partial product C_i as:

C_{i} = \sum_{j=0}^{k-1} L^{2^{jd}}(X_{i} \gg jd, B_{i} \gg jd) \qquad (8)

Here, in terms of the integer k, the index i and the digit width d, X_i = X >> kid and B_i = B >> kid. The product C in (6) can then be compressed into n partial products:

C = C_{0} + C_{1}^{2^{kd}} + \cdots + C_{n-1}^{2^{(n-1)kd}} = \Big(\big((C_{n-1})^{2^{kd}} + C_{n-2}\big)^{2^{kd}} + \cdots\Big)^{2^{kd}} + C_{0} \qquad (9)

where the product C is computed most significant digit first (MSD-first). To compute the partial product C_i in (8), let

\bar{X}_{i} = X_{i} \gg (k-1)d = X \gg (kid + (k-1)d), \quad \bar{B}_{i} = B_{i} \gg (k-1)d = B \gg (kid + (k-1)d)

be precomputed. Each partial product C_i can then be re-expressed as:

C_{i} = \Big(\big(L(\bar{X}_{i}, \bar{B}_{i})^{2^{d}} + L(\bar{X}_{i} \ll d, \bar{B}_{i} \ll d)\big)^{2^{d}} + \cdots\Big)^{2^{d}} + L(\bar{X}_{i} \ll (k-1)d, \bar{B}_{i} \ll (k-1)d) \qquad (10)
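The step from (6) to (8), (9) and (10) is only a change of summation order, because raising to 2^s is the cyclic shift ">> s" and field addition is XOR. The sketch below checks this numerically for an arbitrary function L: a chain of k PEs evaluates (10), an MSD-first Horner loop combines the C_i as in (9), and the result is compared with the flat sum (6). The stand-in `toy_L` is hypothetical (it is not the J-based L of (7)), elements are m-bit integers with coefficient v_i at bit i, and L is called in the (X, B) order of (7).

```python
import random

def rot_r(v, s, m):
    """Document's 'v >> s', and also the power v^(2^s): v_i -> index (i + s) mod m."""
    s %= m
    return ((v << s) | (v >> (m - s))) & ((1 << m) - 1)

def rot_l(v, s, m):
    """Document's 'v << s'."""
    return rot_r(v, -s, m)

def flat_sum(X, B, L, m, d, q):
    """Formula (6): XOR over t of L(X >> td, B >> td)^(2^(td))."""
    c = 0
    for t in range(q):
        c ^= rot_r(L(rot_r(X, t * d, m), rot_r(B, t * d, m)), t * d, m)
    return c

def pe_chain(X, B, L, m, d, k, i):
    """Formula (10): feed X-bar_i, B-bar_i through k PEs with C starting at 0."""
    off = k * i * d + (k - 1) * d          # X-bar_i = X >> (kid + (k-1)d)
    x, b, c = rot_r(X, off, m), rot_r(B, off, m), 0
    for _ in range(k):
        c = rot_r(c, d, m) ^ L(x, b)       # C_out = C_in^(2^d) + L(X_in, B_in)
        x, b = rot_l(x, d, m), rot_l(b, d, m)
    return c

def segmented(X, B, L, m, d, k, n):
    """Formula (9): MSD-first Horner combination of the partial products C_i."""
    acc = 0
    for i in reversed(range(n)):
        acc = rot_r(acc, k * d, m) ^ pe_chain(X, B, L, m, d, k, i)
    return acc

m, d, k, n = 27, 3, 3, 3                               # the sizes used in Example 1
toy_L = lambda x, b: (x & b) ^ rot_r(x | b, 1, m)      # hypothetical stand-in for (7)
random.seed(2)
X, B = random.getrandbits(m), random.getrandbits(m)
assert segmented(X, B, toy_L, m, d, k, n) == flat_sum(X, B, toy_L, m, d, k * n)
```

Because the check holds for any L, it validates the dataflow of the PE chain and AC independently of how L itself is built.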

Algorithm 1 describes the proposed 1-D systolic GNB multiplication using (9) and (10). Based on Algorithm 1, Figs. 2 and 3 describe the proposed 1-D digit-level systolic GNB multiplier, and Fig. 3 shows the proposed digit-level systolic multiplier. The proposed structure consists of k processing elements (PEs), PE_0, PE_1, ..., PE_{k-1}, and one accumulation circuit (AC). The core circuit of Fig. 1 computes the multiplication in (7), so the PE of Fig. 2 can be built from the circuit of Fig. 1. Each PE implements steps 8 and 9 of Algorithm 1, and the AC circuit implements step 11.

We now explain the multiplication steps shown in Fig. 3. Considering the PE operation in Fig. 2 and Algorithm 1, the output B of PE_j used for computing the partial product C_i is denoted B_{i,j}. In the initial step, register <C> is initialized to 0, and X_i and B_i, 0 ≤ i ≤ n-1, are obtained by cyclic shifts. In the first clock cycle, the two elements X_{n-1} and B_{n-1} enter the proposed 1-D systolic multiplier to compute the partial product C_{n-1}. In the next clock cycle, X_{n-2} and B_{n-2} enter to compute C_{n-2}, and so on. Each partial result C_i passes through the k PEs and is stored in register <C>, which takes k+1 clock cycles in total. Therefore, the proposed 1-D systolic multiplier completes the GNB multiplication C = AB after k+n clock cycles. The following proposition gives the number of clock cycles of the proposed 1-D digit-level systolic multiplier.
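The k+n cycle count can be reproduced with a clock-by-clock model. This is a sketch under an assumed clocking scheme (each PE latches its output once per cycle, a new input pair enters PE_0 each cycle, and the AC absorbs the last PE's latch one cycle later); it is not taken from the patent's timing diagram. All names are hypothetical, `toy_L` is an arbitrary stand-in for (7), and elements are m-bit integers with coefficient v_i at bit i.

```python
import random

def rot_r(v, s, m):
    """Document's 'v >> s' (and v^(2^s)): v_i -> index (i + s) mod m."""
    s %= m
    return ((v << s) | (v >> (m - s))) & ((1 << m) - 1)

def rot_l(v, s, m):
    """Document's 'v << s'."""
    return rot_r(v, -s, m)

def pe(b, c, x, L, d, m):
    """(B, C, X) -> (B << d, (C >> d) + L(X, B), X << d)."""
    return rot_l(b, d, m), rot_r(c, d, m) ^ L(x, b), rot_l(x, d, m)

def flat_sum(X, B, L, m, d, q):
    """Formula (6), used as the reference result."""
    c = 0
    for t in range(q):
        c ^= rot_r(L(rot_r(X, t * d, m), rot_r(B, t * d, m)), t * d, m)
    return c

def systolic_1d(X, B, L, m, d, k, n):
    """Clock-by-clock model of k PEs plus the AC; returns (C, cycles)."""
    stages = [None] * k                  # (b, c, x) latched after each PE
    queue = list(range(n - 1, -1, -1))   # MSD-first: i = n-1 enters first
    acc, done, cycles = 0, 0, 0
    while done < n:
        cycles += 1
        if stages[-1] is not None:       # AC: stored value >> kd, plus C_i
            acc = rot_r(acc, k * d, m) ^ stages[-1][1]
            done += 1
        for j in range(k - 1, 0, -1):    # each PE processes its left neighbour's latch
            prev = stages[j - 1]
            stages[j] = pe(*prev, L, d, m) if prev is not None else None
        if queue:                        # PE_0 takes the next X-bar_i, B-bar_i
            i = queue.pop(0)
            off = k * i * d + (k - 1) * d
            stages[0] = pe(rot_r(B, off, m), 0, rot_r(X, off, m), L, d, m)
        else:
            stages[0] = None
    return acc, cycles

m, d, k, n = 27, 3, 3, 3
toy_L = lambda x, b: (x & b) ^ rot_r(x | b, 1, m)    # hypothetical stand-in for (7)
random.seed(3)
X, B = random.getrandbits(m), random.getrandbits(m)
C, cycles = systolic_1d(X, B, toy_L, m, d, k, n)
print(cycles)  # 6
```

Under this clocking, the first pair reaches the AC after k+1 cycles and the last pair enters at cycle n, giving k+n cycles in total, which matches the 6 cycles quoted for Example 1.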

Proposition 1. For a type-T GNB over GF(2^m), the proposed 1-D digit-level parallel-output systolic multiplier needs at most 2⌈√q⌉ clock cycles, where q = ⌈m/d⌉ and d is the selected digit width. In other words, the latency of the proposed 1-D digit-level systolic multiplier is at most 2⌈√q⌉ clock cycles.

Proof: the GNB multiplication is split into q segments. In the digit-level systolic structure of Figs. 2 and 3, suppose there are k PEs and one AC, with q = kn. The GNB multiplication can therefore be split into n partial results, where X_i and B_i of each partial product C_i serve as inputs of the systolic array of PEs. The whole GNB multiplication needs k+n clock cycles. For a given q = kn, if k (the number of PEs) is very small, n (the number of input elements fed to the PEs) becomes very large, and so does the latency k+n. To obtain the minimum latency and good performance for finite field multiplication with large m, we minimize k + n = k + q/k; setting its first derivative 1 - q/k^2 to zero gives k = √q. Choosing k = ⌈√q⌉ therefore gives n = ⌈q/k⌉ ≤ ⌈√q⌉, so the latency of the proposed multiplier is at most 2⌈√q⌉ clock cycles. If q is not a perfect square, the computation may need even fewer clock cycles. This proves the proposition.
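Proposition 1 can be checked numerically. The sketch below (names hypothetical) uses q = ⌈m/d⌉, k = ⌈√q⌉ and n = ⌈q/k⌉, and compares k + n with an exhaustive search over all PE counts; integer ceilings are used throughout so no floating-point square roots are involved.

```python
from math import isqrt

def ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def ceil_sqrt(q: int) -> int:
    r = isqrt(q)
    return r if r * r == q else r + 1

def latency_1d(m: int, d: int) -> tuple[int, int, int]:
    """Return (k + n, k, n) with q = ceil(m/d), k = ceil(sqrt(q)), n = ceil(q/k)."""
    q = ceil_div(m, d)
    k = ceil_sqrt(q)
    n = ceil_div(q, k)
    return k + n, k, n

def best_latency_1d(m: int, d: int) -> int:
    """Exhaustive minimum of k + ceil(q/k) over all PE counts k."""
    q = ceil_div(m, d)
    return min(k + ceil_div(q, k) for k in range(1, q + 1))

print(latency_1d(27, 3))    # (6, 3, 3), Example 1
print(latency_1d(409, 3))   # (24, 12, 12)
```

For Example 1 (m = 27, d = 3) this gives k = n = 3 and 6 cycles; for GF(2^409) with d = 3 it gives 24 cycles with 12 PEs. Over the range tested, the choice k = ⌈√q⌉ also equals the exhaustive minimum.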

Corollary 1. By Proposition 1, if d = 1 and the number of PEs is k = ⌈√m⌉, the latency of the proposed 1-D systolic GNB multiplier is at most 2⌈√m⌉ clock cycles.

To clarify the 1-D systolic GNB multiplier proposed above, the following example illustrates the operation of the PEs in different clock cycles.

Example 1. Let A and B be two elements of GF(2^27) represented in a type-6 GNB, and suppose the selected digit width is d = 3. Then q = 9, and taking k = n = 3, according to (9) the product can be expressed as C = C_0 + C_1^{2^9} + C_2^{2^{18}}, where

C_{i} = L(X_{i} \ll 6, B_{i} \ll 6) + L^{2^{3}}(X_{i} \ll 3, B_{i} \ll 3) + L^{2^{6}}(X_{i}, B_{i}), \quad X_{i} = X \gg (9i + 6),

B_i = B >> (9i + 6), for i = 0, 1, 2. Table 1 lists the operation of each PE in each clock cycle. The proposed 1-D digit-level systolic GNB multiplier needs 6 clock cycles.

When Figs. 2 and 3 are realized as a 1-D systolic array, the resulting GNB multiplier comprises k PEs and one AC circuit. Each PE is built from the structure of Fig. 1 and contains dm AND gates, the XOR gates of that structure, and three m-bit registers. The AC circuit comprises one m-bit GF(2^m) adder and one m-bit register. With the structure of Figs. 2 and 3, the latency of the proposed GNB multiplier is k+n clock cycles, which is at most 2⌈√q⌉.
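The component counts above can be tallied for a concrete case. This sketch counts only what the paragraph states (dm AND gates and three m-bit registers per PE, one m-bit adder and one m-bit register in the AC); the XOR count of the Fig. 1 core is not modelled, and an m-bit GF(2^m) adder is taken as m two-input XOR gates since field addition is bitwise XOR. The function name is hypothetical. The example uses GF(2^409) with d = 3 and k = 12, the PE count given by k = ⌈√⌈409/3⌉⌉.

```python
def resources_1d(m: int, d: int, k: int) -> dict:
    """Counts stated for the 1-D design; the XOR network of the Fig. 1 core is not modelled."""
    return {
        "PEs": k,
        "AND gates": k * d * m,            # dm AND gates per PE
        "register bits": k * 3 * m + m,    # three m-bit registers per PE, one in the AC
        "AC adder bits": m,                # one m-bit GF(2^m) adder (m XOR gates)
    }

print(resources_1d(409, 3, 12))
```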

B. 2-D digit-level systolic structure

In this section, to obtain a high-performance implementation, we present a 2-D systolic GNB multiplier, which achieves higher performance than the preceding segmented systolic structure for small digit widths (or large field sizes). Let k and n be two integers satisfying q = k^2 n. If q is not divisible by k^2, zeros are padded to X and B so that q = k^2 n holds. To derive the 2-D digit-level systolic multiplier, we compress (6) into the sum of n partial results:

C = C_{0} + C_{1}^{2^{k^{2}d}} + \cdots + C_{n-1}^{2^{(n-1)k^{2}d}} \qquad (11)

where

C_{i} = \sum_{j=0}^{k-1} C_{ij}^{2^{kjd}} \qquad (12)

C_{ij} = \sum_{z=0}^{k-1} L^{2^{dz}}(X_{ij} \gg dz, B_{ij} \gg dz) \qquad (13)

X_{ij} = X \gg (k^{2}id + kjd), \quad B_{ij} = B \gg (k^{2}id + kjd)

Each partial result C_ij is the sum of k partial products of the form (7), and the partial result C_i in (12) is the sum of the k partial results C_ij. To compute C_i in a fully pipelined manner, each C_ij is computed by a 1-D systolic array built from the PEs of Fig. 2. The proposed 2-D systolic array multiplier that computes C_i is shown in Fig. 4. It consists of k 1-D systolic arrays, (k-1) cyclic shift circuits, (k-1) AC1 structures and one AC2 structure. Each CS module performs a cyclic right shift by kd bits, which in hardware requires only rewiring. The blocks labelled "1-D systolic array [i]" in the figure produce the k partial results C_ij, which are then summed.
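The 2-D decomposition (11)-(13) is again a regrouping of (6), now over t = k^2 i + kj + z. The sketch below evaluates it in the AC2 / AC1 / 1-D-array nesting and compares it with the flat sum. It applies right shifts X_ij >> dz inside (13), which is the choice that makes every term coincide with a term of (6) with the same shift and the same power of 2. The stand-in L and all function names are hypothetical, and elements are m-bit integers with coefficient v_i at bit i.

```python
import random

def rot_r(v, s, m):
    """Document's 'v >> s' (and v^(2^s)): v_i -> index (i + s) mod m."""
    s %= m
    return ((v << s) | (v >> (m - s))) & ((1 << m) - 1)

def flat_sum(X, B, L, m, d, q):
    """Formula (6), used as the reference result."""
    c = 0
    for t in range(q):
        c ^= rot_r(L(rot_r(X, t * d, m), rot_r(B, t * d, m)), t * d, m)
    return c

def two_d(X, B, L, m, d, k, n):
    """Formulas (11)-(13), evaluated as nested Horner loops."""
    C = 0
    for i in reversed(range(n)):                 # (11): previous sum >> k^2 d, then add C_i
        Ci = 0
        for j in reversed(range(k)):             # (12): previous sum >> kd, then add C_ij
            off = (k * k * i + k * j) * d        # X_ij = X >> (k^2 id + kjd)
            Xij, Bij = rot_r(X, off, m), rot_r(B, off, m)
            Cij = 0
            for z in range(k):                   # (13): one 1-D systolic array
                Cij ^= rot_r(L(rot_r(Xij, d * z, m), rot_r(Bij, d * z, m)), d * z, m)
            Ci = rot_r(Ci, k * d, m) ^ Cij
        C = rot_r(C, k * k * d, m) ^ Ci
    return C

m, d, k, n = 31, 1, 3, 2
toy_L = lambda x, b: (x & b) ^ rot_r(x | b, 2, m)    # hypothetical stand-in for (7)
random.seed(4)
X, B = random.getrandbits(m), random.getrandbits(m)
assert two_d(X, B, toy_L, m, d, k, n) == flat_sum(X, B, toy_L, m, d, k * k * n)
```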

Proposition 2. For a field constructed with a type-T GNB, T even, the proposed 2-D systolic GNB multiplier needs at most 3⌈∛q⌉ clock cycles, where q = ⌈m/d⌉, and it uses k^2 PEs with k = ⌈∛q⌉.

Proof: let k and n be two positive integers satisfying q = k^2 n with q = ⌈m/d⌉, where d is the selected digit width. The GNB multiplication uses k^2 PEs to build the 2-D systolic array structure, and this circuit computes C_i in (12) in 2k clock cycles. Therefore, the GNB multiplication in (11) needs 2k + n = 2k + q/k^2 clock cycles. As in Proposition 1, the first derivative with respect to k must be zero, i.e.

2 - \frac{2 q}{k^{3}} = 0,

which requires

k = \sqrt[3]{q} = \sqrt[3]{\frac{m}{d}} .

We therefore choose k = ⌈∛q⌉; then n = ⌈q/k^2⌉ ≤ k, and the latency of the 2-D systolic multiplier is at most 3⌈∛q⌉ clock cycles. When q is not a perfect cube, the computation may need fewer clock cycles. This proves the proposition.
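The 2-D bound can be checked the same way as Proposition 1. With k = ⌈∛q⌉ we have k^3 ≥ q, so n = ⌈q/k^2⌉ ≤ k and 2k + n ≤ 3k. The sketch below (names hypothetical) computes the integer cube-root ceiling without relying on floating-point exactness and checks the bound over a range of field sizes.

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def ceil_cbrt(q: int) -> int:
    """Smallest integer r with r**3 >= q (q >= 1)."""
    r = max(1, round(q ** (1 / 3)))
    while r ** 3 < q:
        r += 1
    while r > 1 and (r - 1) ** 3 >= q:
        r -= 1
    return r

def latency_2d(m: int, d: int) -> tuple[int, int, int]:
    """(2k + n, k, n) with q = ceil(m/d), k = ceil(cbrt(q)), n = ceil(q / k^2)."""
    q = ceil_div(m, d)
    k = ceil_cbrt(q)
    n = ceil_div(q, k * k)
    return 2 * k + n, k, n

print(latency_2d(409, 1))   # (23, 8, 7)
```

For GF(2^409) with d = 1 this gives k = 8 and n = 7, i.e. 2k + n = 23 cycles with 64 PEs, within the bound 3⌈∛409⌉ = 24.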

Corollary 2. By Proposition 2, the latency of the proposed 2-D systolic GNB multiplier is at most 3⌈∛q⌉ clock cycles.

Fig. 4 is realized as a 2-D systolic array. The proposed 2-D systolic multiplier consists of k^2 PEs, 2k-2 CS modules, k-1 AC1 circuits and one AC2 circuit. With this structure, the latency of the GNB multiplier is at most 3⌈∛q⌉ clock cycles. Compared with the 1-D systolic multiplier, the latency is lower when a small digit width (or a large field size) is chosen. For example, with digit width 1, the latency bound of the 2-D systolic multiplier over GF(2^409) is 24 clock cycles, the same as that of the 1-D systolic multiplier with digit width 3. This means the 2-D systolic multiplier performs better under these conditions.
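The comparison above can be tabulated for the five NIST field sizes. This sketch takes the component counts from the claims (k^2 PEs, 2k-2 CS modules, k-1 AC1 circuits and one AC2 for the 2-D design; k PEs and one AC for the 1-D design) and the latency bounds from Propositions 1 and 2. The choice of d = 3 for 1-D and d = 1 for 2-D simply mirrors the example in the text, and the function names are hypothetical.

```python
from math import isqrt

def ceil_div(a, b):
    return -(-a // b)

def ceil_sqrt(q):
    r = isqrt(q)
    return r if r * r == q else r + 1

def ceil_cbrt(q):
    r = 1
    while r ** 3 < q:
        r += 1
    return r

def design_1d(m, d):
    k = ceil_sqrt(ceil_div(m, d))
    return {"latency_bound": 2 * k, "PEs": k, "AC": 1}

def design_2d(m, d):
    k = ceil_cbrt(ceil_div(m, d))
    return {"latency_bound": 3 * k, "PEs": k * k, "CS": 2 * k - 2, "AC1": k - 1, "AC2": 1}

for m in (163, 233, 283, 409, 571):
    print(m, design_1d(m, 3), design_2d(m, 1))
```

For m = 409 both rows report a 24-cycle bound; the 2-D design reaches it with 64 PEs of digit width 1, while the 1-D design uses 12 PEs of digit width 3.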

[18] A. Reyhani-Masoleh, "Efficient Algorithms and Architectures for Field Multiplication Using Gaussian Normal Bases," IEEE Trans. Computers, vol. 55, no. 1, pp. 34-47, Jan. 2006.

[20] R. Azarderakhsh and A. Reyhani-Masoleh, "A Modified Low Complexity Digit-Level Gaussian Normal Basis Multiplier," in Proc. Int. Workshop Arithmetic of Finite Fields (WAIFI), vol. 6087, pp. 25-40, 2010.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A multiplier processing element PE for an elliptic curve cryptosystem device, characterized in that the multiplier processing element PE comprises a computing unit, input terminals B_in, C_in and X_in, and output terminals B_out, C_out and X_out; the input terminals B_in, C_in and X_in each feed the computing unit, and after computation the results are output from the output terminals B_out, C_out and X_out of the computing unit; in the computing unit, B_in and X_in are cyclically shifted left by d bits, namely B_out = B_in << d and X_out = X_in << d; in the computing unit, the operation value of B_in and X_in is added to C_in cyclically shifted right by d bits, according to the formula C_out = (C_in >> d) + L(B_in, X_in), where << and >> denote cyclic shifts, C_in is the result of the previous processing element PE, the C_in of the first processing element PE is initially zero, C_out is the product result computed and output by the processing element PE and serves as the input of the next processing element PE, d denotes the digit length, k denotes the number of segments, and L denotes the operation identifier.

2. A one-dimensional multiplier, characterized in that it comprises k multiplier processing elements PE according to claim 1 and an accumulation circuit AC; the accumulation circuit AC is connected after the k serially connected processing elements PE, the inputs of each PE are obtained from the computed outputs of the preceding PE, and the three inputs of the first PE are B_0, B_1, ..., B_{n-1}; 0, 0, ..., 0; and X_0, X_1, ..., X_{n-1}, respectively, where X is obtained by shifting A; the output is computed as:

C = C_{0} + C_{1}^{2^{kd}} + \cdots + C_{n-1}^{2^{(n-1)kd}} = \Bigl(\bigl((C_{n-1})^{2^{kd}} + C_{n-2}\bigr)^{2^{kd}} + \cdots\Bigr)^{2^{kd}} + C_{0} .

3. The one-dimensional multiplier according to claim 2, characterized in that the accumulation circuit AC comprises an adder unit, a temporary storage unit and a shift unit; the output of the shift unit is connected to the input of the adder unit, the output of the adder unit is connected to the input of the temporary storage unit, and the output of the temporary storage unit is connected to the input of the shift unit; the accumulation circuit shifts the result of one computation of the k processing elements PE and adds it to the next output of the k processing elements PE.

4. A two-dimensional multiplier, characterized in that it comprises k one-dimensional multipliers according to claim 2 or 3, 2k-2 CS modules, k-1 accumulation circuits AC1 and an accumulation circuit AC2; the k one-dimensional multipliers are connected in parallel; the output of the first one-dimensional multiplier is connected to the shift unit of the first accumulation circuit AC1; the k-1 accumulation circuits AC1 are connected in series, and the (k-1)-th accumulation circuit AC1 is connected to the accumulation circuit AC2; the outputs of the second to the k-th one-dimensional multipliers are connected to the k-1 accumulation circuits AC1, respectively; the B and X inputs of the second to the k-th one-dimensional multipliers are each connected to a CS module; the inputs of the first one-dimensional multiplier are fed directly; its operation formula is:

5. The two-dimensional multiplier according to claim 4, characterized in that the accumulation circuit AC1 comprises a shift unit and an adder unit, the output of the shift unit being connected to the input of the adder unit; the accumulation circuit AC1 shifts its input, adds it to the output of the one-dimensional multiplier connected to it, and outputs the sum; the shift unit performs a cyclic right shift by kd bits.

6. The two-dimensional multiplier according to claim 5, characterized in that the accumulation circuit AC2 comprises a shift unit, an adder unit and a temporary storage unit; the output of the shift unit is connected to the input of the adder unit, the output of the adder unit is connected to the temporary storage unit, and the output of the temporary storage unit is connected to an input of the adder unit; the shift unit cyclically shifts the input value right by k²d bits.

7. The two-dimensional multiplier according to claim 6, characterized in that the CS module cyclically shifts the input value right by kd bits.
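The claims above describe the data path purely in terms of cyclic shifts and additions. The following Python sketch models those pieces on m-bit integers under stated assumptions: field elements are in normal-basis coordinates, addition in GF(2^m) is bitwise XOR, raising to the power 2^e is modelled as a cyclic right shift by e positions (the shift direction is an assumed bit-ordering convention), and the operation L(B_in, X_in) of claim 1, which is not specified here, is a caller-supplied function. The Horner form in claim 2 is what the register loop of claim 3 computes: in characteristic 2 the map x ↦ x^(2^(kd)) is additive and composes by adding exponents, so the nested expression expands term by term into the sum, and the example checks this. AC2 and the exact CS wiring are not modelled, and all names are hypothetical.

```python
"""Hypothetical bit-level model of the claimed PE, AC, AC1 and CS components."""
from typing import Callable, List, Tuple


def rotl(x: int, s: int, m: int) -> int:
    """Cyclic (ring) left shift of the m-bit word x by s positions."""
    s %= m
    return ((x << s) | (x >> (m - s))) & ((1 << m) - 1)


def rotr(x: int, s: int, m: int) -> int:
    """Cyclic (ring) right shift of the m-bit word x by s positions."""
    return rotl(x, -s, m)


def pe_step(b_in: int, c_in: int, x_in: int, d: int, m: int,
            op_l: Callable[[int, int], int]) -> Tuple[int, int, int]:
    """Claim 1: B and X rotate left by d; C_out = (C_in >> d) + L(B_in, X_in).

    op_l is a caller-supplied stand-in for the unspecified operation L.
    Returns (B_out, C_out, X_out).
    """
    return (rotl(b_in, d, m),
            rotr(c_in, d, m) ^ op_l(b_in, x_in),
            rotl(x_in, d, m))


def pe_chain(b: int, x: int, k: int, d: int, m: int,
             op_l: Callable[[int, int], int]) -> Tuple[int, int, int]:
    """Claim 2: k PEs in series; the first PE starts with C_in = 0."""
    c = 0
    for _ in range(k):
        b, c, x = pe_step(b, c, x, d, m, op_l)
    return b, c, x


class AccumulationCircuit:
    """Claim 3: temporary storage -> shift unit -> adder -> temporary storage."""

    def __init__(self, shift: int, m: int) -> None:
        self.shift, self.m, self.temp = shift, m, 0

    def clock(self, pe_output: int) -> int:
        # Shift the stored result by kd (the power 2^(kd)), add the new PE output.
        self.temp = rotr(self.temp, self.shift, self.m) ^ pe_output
        return self.temp


def direct_sum(parts: List[int], kd: int, m: int) -> int:
    """Left-hand side of the claim-2 formula: sum of C_i^(2^(i*kd))."""
    acc = 0
    for i, c in enumerate(parts):
        acc ^= rotr(c, i * kd, m)
    return acc


def cs_module(x: int, k: int, d: int, m: int) -> int:
    """Claim 7: cyclic right shift of the input by k*d bits."""
    return rotr(x, k * d, m)


def ac1_chain(outputs: List[int], k: int, d: int, m: int) -> int:
    """Claims 4-5: the first 1-D output enters the first AC1's shift unit;
    each AC1 rotates its chained input right by k*d and adds its local output."""
    acc = outputs[0]
    for local_out in outputs[1:]:
        acc = rotr(acc, k * d, m) ^ local_out
    return acc


if __name__ == "__main__":
    m, kd = 11, 3
    parts = [0b101, 0b11110000000, 0b1, 0b10010010010]  # arbitrary C_0 .. C_3
    ac = AccumulationCircuit(kd, m)
    for c in reversed(parts):  # Horner order: C_{n-1} is accumulated first
        result = ac.clock(c)
    print(result == direct_sum(parts, kd, m))  # True
```

With a placeholder L such as `lambda b, x: b & x` (chosen only for illustration, not the GNB operation), `pe_chain(1, 1, 1, 1, 8, lambda b, x: b & x)` returns `(2, 1, 2)`: B and X are rotated left by one position and C picks up the placeholder term.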