CN104252332A - Multiplier and multiplier processing element for ellipse cipher apparatus - Google Patents


Info

Publication number
CN104252332A
Authority
CN
China
Prior art keywords
multiplier
unit
input end
output terminal
summation circuit
Prior art date
Legal status
Granted
Application number
CN201410414896.8A
Other languages
Chinese (zh)
Other versions
CN104252332B (en)
Inventor
潘正祥
杨春生
李秋莹
闫立军
蔡正富
Current Assignee
Airmate Electrical Shenzhen Co Ltd
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Airmate Electrical Shenzhen Co Ltd
Shenzhen Graduate School Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Airmate Electrical Shenzhen Co Ltd and Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201410414896.8A
Publication of CN104252332A
Application granted
Publication of CN104252332B
Expired - Fee Related (current legal status)
Anticipated expiration


Abstract

The invention relates to a multiplier processing element (PE) for an elliptic curve cryptosystem. The PE comprises a computing unit, inputs $B_{in}$, $C_{in}$ and $X_{in}$, and outputs $B_{out}$, $C_{out}$ and $X_{out}$. The parameters enter the computing unit through $B_{in}$, $C_{in}$ and $X_{in}$, are processed, and leave it through $B_{out}$, $C_{out}$ and $X_{out}$. Inside the computing unit, $B_{in}$ and $X_{in}$ are cyclically shifted left by $d$ bits, $B_{out} = B_{in} \ll d$ and $X_{out} = X_{in} \ll d$, while $C_{in}$ is cyclically shifted right by $d$ bits and added to the value computed from $B_{in}$ and $X_{in}$: $C_{out} = (C_{in} \gg d) + L(B_{in}, X_{in})$. Here $C_{in}$ is the result of the previous PE ($C_{in}$ of the first PE is initially zero), $C_{out}$ is the product result computed by this PE and passed to the next PE, $d$ is the digit length, $k$ is the number of segments, and $L$ denotes the operation performed by the PE. Because shifting and evaluation of the J function are carried out during the computation, the PE operates quickly with low computational complexity, which improves the performance of the cryptographic device.

Description

Multiplier processing element for an elliptic curve cryptosystem, and multiplier
Technical field
The invention belongs to the field of numerical coding, and in particular relates to a multiplier processing element and a multiplier over a Galois field that are suitable for elliptic curve cryptosystems.
Background art
In recent years, efficient, high-performance and low-complexity designs of finite field arithmetic and their applications have received considerable attention. For example, cryptographic algorithms and systems must satisfy the security requirements of the National Institute of Standards and Technology (NIST) and the Institute of Electrical and Electronics Engineers (IEEE) in order to reduce potential attacks and ensure hardware security. At the same time, an important aspect of cryptographic hardware is resisting side-channel attacks while keeping cost low. Error detection has received much attention in industry, for example in [3-6], and its importance is also evident from attacks on cryptosystems based on fault analysis and side-channel analysis. In practical applications the original design often has to carry additional overhead, so efficient designs that can tolerate this overhead are needed. Recently, elliptic curve cryptography (ECC), an efficient technique that meets public-key cryptography requirements, has been deployed in many high-performance and security-constrained applications. For example, it can be used in mobile ad hoc networks (MANETs) to provide confidentiality and integrity checking efficiently, without having to consider whether the physical layer is secure.
Elliptic curve cryptography is based on the algebraic structure of elliptic curves over finite fields, and the arithmetic operations over the field determine the efficiency of an ECC-based cryptosystem. Consequently, much research has focused on efficient, low-complexity and high-performance designs of the arithmetic units used in ECC and in the RSA public-key algorithm. Recently, Gaussian normal basis (GNB) multipliers have been widely used to compute point multiplication (also called scalar multiplication) in ECC. This operation must not only be efficient; in time-constrained applications its implementation must also be high-performance.
In large binary fields, field multiplication can be designed with the systolic array method to obtain high-speed, regular VLSI (very large scale integration) implementations. Systolic arrays avoid irregular circuit design; in other words, for different choices of binary field, their hardware structures are modular and very similar. Their concurrency, balanced input and output, and simple, regular design make them suitable for high-performance applications. Although systolic architectures are widely used in applications that need high-speed structures, this usually presupposes that their area complexity is acceptable. For example, [16] proposes an optimal normal basis systolic multiplier with high regularity that can be implemented in a data-serial fashion, and [17] obtained a high-performance implementation of this systolic multiplier on reconfigurable hardware.
[3] A. Yazdani, H. Sepahvand, M. Crow, and M. Ferdowsi, “Fault Detection and Mitigation in Multilevel Converter STATCOMs,” IEEE Trans. Ind. Electron., vol. 58, no. 4, pp. 1307-1315, 2011.
[4] M. A. Rodr A. Claudio-Sanchez, D. Theilliol, L. Vela-Valdes, P. Sibaja-Teran, L. Hernandez-Gonzalez, and J. Aguayo-Alquicira, “A Failure-Detection Strategy for IGBT Based on Gate-Voltage Behavior Applied to a Motor Drive System,” IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1625-1633, 2011.
[5] T. A. Najafabadi, F. R. Salmasi, and P. Jabehdar-Maralani, “Detection and Isolation of Speed-, DC-Link Voltage-, and Current-Sensor Faults Based on an Adaptive Observer in Induction-Motor Drives,” IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1662-1672, 2011.
[6] S. Cruz, M. Ferreira, A. Mendes, and A. J. M. Cardoso, “Analysis and Diagnosis of Open-Circuit Faults in Matrix Converters,” IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1648-1661, 2011.
[16] S. Kwon, “A Low Complexity and a Low Latency Bit Parallel Systolic Multiplier over GF(2^m) Using an Optimal Normal Basis of Type II,” in Proc. IEEE Symp. Computer Arithmetic (Arith-16), pp. 196-202, 2003.
[17] J. Fan, D. Bailey, L. Batina, T. Guneysu, C. Paar, and I. Verbauwhede, “Breaking Elliptic Curves Cryptosystems using Reconfigurable Hardware,” in Proc. 20th Intl Conf. on Field Programmable Logic and Applications (FPL 2010), 2010, pp. 133-138.
Summary of the invention
The invention provides a multiplier processing element for an elliptic curve cryptosystem, aimed at solving the problem that existing processing elements compute slowly and take a long time to operate.
The invention is realized as follows. A multiplier processing element PE for an elliptic curve cryptosystem comprises a computing unit, inputs $B_{in}$, $C_{in}$ and $X_{in}$, and outputs $B_{out}$, $C_{out}$ and $X_{out}$. The inputs $B_{in}$, $C_{in}$ and $X_{in}$ feed the computing unit, and after the computation the results leave the computing unit through $B_{out}$, $C_{out}$ and $X_{out}$. In the computing unit, $B_{in}$ and $X_{in}$ are cyclically shifted left by $d$ bits, $B_{out} = B_{in} \ll d$ and $X_{out} = X_{in} \ll d$, and the value computed from $B_{in}$ and $X_{in}$ is added to $C_{in}$ cyclically shifted right by $d$ bits: $C_{out} = (C_{in} \gg d) + L(B_{in}, X_{in})$. Here $C_{in}$ is the result of the previous PE ($C_{in}$ of the first PE is initially zero); $C_{out}$ is the product result computed and output by the PE and serves as the input of the next PE; $d$ is the digit length; $k$ is the number of segments; and $L$ denotes the operation performed by the PE.
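The PE equations above can be sketched in Python. This is a minimal model, assuming each field element is stored as an m-bit integer with coefficient $x_i$ in bit $i$, so the cyclic right shift $X \gg s$ (coefficient $i$ moves to index $i+s$) becomes an integer rotate-left. The digit operation `L` is passed in as a callable because its internals (the J function over the GNB matrix) are defined later; `shr`, `shl`, `pe_step` and `pe_chain` are illustrative names, not taken from the patent.

```python
from typing import Callable, Tuple

def shr(x: int, s: int, m: int) -> int:
    """Cyclic right shift X >> s of an m-coefficient vector (coefficient i -> (i + s) mod m).

    Coefficient x_i is stored in bit i, so this is an integer rotate-left by s.
    """
    s %= m
    mask = (1 << m) - 1
    return ((x << s) | (x >> (m - s))) & mask if s else x

def shl(x: int, s: int, m: int) -> int:
    """Cyclic left shift X << s, the inverse of shr."""
    return shr(x, -s, m)

def pe_step(b_in: int, c_in: int, x_in: int, d: int, m: int,
            L: Callable[[int, int], int]) -> Tuple[int, int, int]:
    """One processing element; returns (B_out, C_out, X_out).

    B_out = B_in << d, X_out = X_in << d, C_out = (C_in >> d) + L(B_in, X_in),
    where "+" in GF(2^m) is bitwise XOR.
    """
    b_out = shl(b_in, d, m)
    x_out = shl(x_in, d, m)
    c_out = shr(c_in, d, m) ^ L(b_in, x_in)
    return b_out, c_out, x_out

def pe_chain(b: int, x: int, k: int, d: int, m: int,
             L: Callable[[int, int], int]) -> int:
    """k PEs in series; C_in of the first PE is zero. Returns the last C_out."""
    c = 0
    for _ in range(k):
        b, c, x = pe_step(b, c, x, d, m, L)
    return c

# Toy run with m = 8, d = 2 and a hypothetical L(b, x) = b XOR x.
print(pe_step(0b00000001, 0b00000011, 0b10000000, 2, 8, lambda b, x: b ^ x))  # (64, 141, 32)
```

The toy `L` only exercises the shift-and-add wiring; a real PE would evaluate the digit-level function of equation (7).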
Another object of the present invention is to provide a one-dimensional (1-D) multiplier comprising $k$ multiplier processing elements PE according to claim 1 and an accumulation circuit AC. The $k$ PEs are connected in series and followed by the accumulation circuit AC; the inputs of each PE are the computed outputs of the previous PE, and the three inputs of the first PE are $B_0, B_1, \ldots, B_{n-1}$; $0, 0, \ldots, 0$; and $X_0, X_1, \ldots, X_{n-1}$, respectively, where $X$ is obtained by shifting $A$. The output is computed as
$$C = C_0 + C_1^{2^{kd}} + \cdots + C_{n-1}^{2^{(n-1)kd}} = \Big(\big((C_{n-1})^{2^{kd}} + C_{n-2}\big)^{2^{kd}} + \cdots\Big)^{2^{kd}} + C_0.$$
In a further technical solution, the accumulation circuit AC comprises an adder, a temporary storage register and a shifter: the shifter output is connected to the adder input, the adder output to the register input, and the register output to the shifter input. The accumulation circuit shifts the result produced by the $k$ PEs in one pass and adds it to the next result produced by the $k$ PEs.
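The shifter, adder and register loop of the AC is exactly a Horner evaluation of the output formula above. The following is a minimal sketch, assuming the same integer encoding as the PE sketch (coefficient $i$ in bit $i$) and assuming the partial results arrive MSD-first, $C_{n-1}$ first; `accumulate` and `direct_sum` are hypothetical helper names.

```python
from typing import List

def shr(x: int, s: int, m: int) -> int:
    """Cyclic right shift X >> s (coefficient i -> (i + s) mod m), i.e. raising to 2^s."""
    s %= m
    mask = (1 << m) - 1
    return ((x << s) | (x >> (m - s))) & mask if s else x

def accumulate(partials_msd_first: List[int], kd: int, m: int) -> int:
    """AC loop: register <- (register >> kd) + incoming partial result (XOR)."""
    reg = 0  # temporary storage register starts at zero
    for c_i in partials_msd_first:
        reg = shr(reg, kd, m) ^ c_i  # shifter -> adder -> register
    return reg

def direct_sum(partials: List[int], kd: int, m: int) -> int:
    """C = C_0 + C_1^(2^kd) + ... + C_{n-1}^(2^((n-1)kd)); partials is [C_0, ..., C_{n-1}]."""
    c = 0
    for i, c_i in enumerate(partials):
        c ^= shr(c_i, i * kd, m)
    return c

parts = [0x1234567, 0x7654321, 0x0ABCDEF]  # arbitrary C_0, C_1, C_2 in GF(2^27)
print(accumulate(parts[::-1], 9, 27) == direct_sum(parts, 9, 27))  # True
```

Because the AC only ever shifts by the fixed amount $kd$, its shifter can be fixed wiring rather than a barrel shifter.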
Another object of the present invention is to provide a two-dimensional (2-D) multiplier comprising $k$ one-dimensional multipliers according to claim 2 or 3, $2k-2$ CS modules, $k-1$ accumulation circuits AC1 and an accumulation circuit AC2. The $k$ one-dimensional multipliers are connected in parallel. The output of the first one-dimensional multiplier is connected to the shifter of the first AC1; the $k-1$ AC1 circuits are connected in series, and the $(k-1)$-th AC1 is connected to AC2. The outputs of the second to the $(k-1)$-th one-dimensional multipliers are connected to the $k-1$ accumulation circuits respectively; the B and X inputs of the second to the $(k-1)$-th one-dimensional multipliers are each connected to a CS module; and the inputs of the first one-dimensional multiplier are fed directly. The operation is
$$C = C_0 + C_1^{2^{k^2 d}} + \cdots + C_{n-1}^{2^{(n-1)k^2 d}}.$$
In a further technical solution, the accumulation circuit AC1 comprises a shifter and an adder, with the shifter output connected to the adder input. AC1 shifts its input, adds it to the output of the one-dimensional multiplier connected to it, and outputs the sum; the shifter performs a cyclic right shift by $kd$ bits.
In a further technical solution, the accumulation circuit AC2 comprises a shifter, an adder and a temporary storage register: the shifter output is connected to the adder input, the adder output to the register, and the register output to the adder input. The shifter cyclically shifts its input right by $k^2 d$ bits.
In a further technical solution, each CS module cyclically shifts its input right by $kd$ bits.
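The AC1 chain and AC2 together evaluate equations (12) and (11): AC1 stages combine the $k$ results $C_{ij}$ of one index $i$ with shifts of $kd$, and AC2 combines the $C_i$ with shifts of $k^2 d$. The sketch below is a behavioural model only. It assumes the first one-dimensional multiplier delivers $C_{i,k-1}$ (so the chain is a Horner evaluation from the highest $j$ down), which is an assumption about the wiring, and all function names are hypothetical.

```python
from typing import List

def shr(x: int, s: int, m: int) -> int:
    """Cyclic right shift X >> s (coefficient i -> (i + s) mod m)."""
    s %= m
    mask = (1 << m) - 1
    return ((x << s) | (x >> (m - s))) & mask if s else x

def ac1_chain(c_row_high_first: List[int], kd: int, m: int) -> int:
    """AC1 stages: acc <- (acc >> kd) + next 1-D multiplier output, giving C_i of eq. (12)."""
    acc = 0
    for c_ij in c_row_high_first:
        acc = shr(acc, kd, m) ^ c_ij
    return acc

def ac2(c_i_high_first: List[int], k2d: int, m: int) -> int:
    """AC2: register <- (register >> k^2 d) + C_i, MSD-first, giving C of eq. (11)."""
    reg = 0
    for c_i in c_i_high_first:
        reg = shr(reg, k2d, m) ^ c_i
    return reg

def two_d_accumulate(c: List[List[int]], k: int, d: int, m: int) -> int:
    """c[i][j] holds C_ij for 0 <= i < n, 0 <= j < k; returns the product C."""
    rows = [ac1_chain(row[::-1], k * d, m) for row in c]  # C_0, ..., C_{n-1}
    return ac2(rows[::-1], k * k * d, m)

# k = 2, d = 1, m = 8: C_ij is shifted by (4i + 2j), so all-ones inputs set bits 0, 2, 4, 6.
print(two_d_accumulate([[1, 1], [1, 1]], 2, 1, 8))  # 85
```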
Further technical scheme of the present invention is:
The beneficial effects of the invention are as follows: because shifting and evaluation of the J function are performed during the computation, the processing element operates quickly with low computational complexity, which improves the performance of the cryptographic device. The proposed multiplier is based on a systolic array architecture, so it is easy to implement in VLSI and offers low latency and high performance.
Brief description of the drawings
Fig. 1 shows the DL-PIPO GNB multiplier circuit on which the invention is based;
Fig. 2 is a structural diagram of the processing element PE provided by an embodiment of the invention;
Fig. 3 shows the one-dimensional multiplier circuit provided by an embodiment of the invention;
Fig. 4 shows the two-dimensional multiplier circuit provided by an embodiment of the invention.
Detailed description of the embodiments
Fig. 2 shows the multiplier processing element for an elliptic curve cryptosystem provided by the invention. The multiplier processing element PE comprises a computing unit, inputs $B_{in}$, $C_{in}$ and $X_{in}$, and outputs $B_{out}$, $C_{out}$ and $X_{out}$. The inputs $B_{in}$, $C_{in}$ and $X_{in}$ feed the computing unit, and after the computation the results leave the computing unit through $B_{out}$, $C_{out}$ and $X_{out}$. In the computing unit, $B_{in}$ and $X_{in}$ are cyclically shifted left by $d$ bits, $B_{out} = B_{in} \ll d$ and $X_{out} = X_{in} \ll d$, and the value computed from $B_{in}$ and $X_{in}$ is added to $C_{in}$ cyclically shifted right by $d$ bits: $C_{out} = (C_{in} \gg d) + L(B_{in}, X_{in})$. Here $C_{in}$ is the result of the previous PE ($C_{in}$ of the first PE is initially zero); $C_{out}$ is the product result computed and output by the PE and serves as the input of the next PE; $d$ is the digit length; $k$ is the number of segments; and $L$ denotes the operation performed by the PE. Because shifting and evaluation of the J function are performed during the computation, the processing element operates quickly with low computational complexity, which improves the performance of the cryptographic device.
Fig. 3 shows the one-dimensional (1-D) multiplier, another object of the invention. It comprises $k$ multiplier processing elements PE according to claim 1 and an accumulation circuit AC. The $k$ PEs are connected in series and followed by the accumulation circuit AC; the inputs of each PE are the computed outputs of the previous PE, and the three inputs of the first PE are $B_0, B_1, \ldots, B_{n-1}$; $0, 0, \ldots, 0$; and $X_0, X_1, \ldots, X_{n-1}$, respectively, where $X$ is obtained by shifting $A$. The output is computed as
$$C = C_0 + C_1^{2^{kd}} + \cdots + C_{n-1}^{2^{(n-1)kd}} = \Big(\big((C_{n-1})^{2^{kd}} + C_{n-2}\big)^{2^{kd}} + \cdots\Big)^{2^{kd}} + C_0.$$
The accumulation circuit AC comprises an adder, a temporary storage register and a shifter: the shifter output is connected to the adder input, the adder output to the register input, and the register output to the shifter input. The accumulation circuit shifts the result produced by the $k$ PEs in one pass and adds it to the next result produced by the $k$ PEs.
Fig. 4 shows the two-dimensional (2-D) multiplier, another object of the invention. It comprises $k$ one-dimensional multipliers according to claim 2 or 3, $2k-2$ CS modules, $k-1$ accumulation circuits AC1 and an accumulation circuit AC2. The $k$ one-dimensional multipliers are connected in parallel. The output of the first one-dimensional multiplier is connected to the shifter of the first AC1; the $k-1$ AC1 circuits are connected in series, and the $(k-1)$-th AC1 is connected to AC2. The outputs of the second to the $(k-1)$-th one-dimensional multipliers are connected to the $k-1$ accumulation circuits respectively; the B and X inputs of the second to the $(k-1)$-th one-dimensional multipliers are each connected to a CS module; and the inputs of the first one-dimensional multiplier are fed directly. The operation is
$$C = C_0 + C_1^{2^{k^2 d}} + \cdots + C_{n-1}^{2^{(n-1)k^2 d}}.$$
The accumulation circuit AC1 comprises a shifter and an adder, with the shifter output connected to the adder input. AC1 shifts its input, adds it to the output of the one-dimensional multiplier connected to it, and outputs the sum; the shifter performs a cyclic right shift by $kd$ bits.
The accumulation circuit AC2 comprises a shifter, an adder and a temporary storage register: the shifter output is connected to the adder input, the adder output to the register, and the register output to the adder input. The shifter cyclically shifts its input right by $k^2 d$ bits.
Each CS module cyclically shifts its input right by $kd$ bits.
Two new digit-level GNB multipliers are derived below using a decomposition method.
Let $N = \{\beta, \beta^{2}, \beta^{2^2}, \ldots, \beta^{2^{m-1}}\}$, where $\beta \in GF(2^m)$. If $\beta$ is a normal element of $GF(2^m)$, this set is a normal basis (NB) of $GF(2^m)$. Let $m$ and $T$ be positive integers such that $p = mT + 1$ is prime and $\gcd(mT/k, m) = 1$, where $k$ is the multiplicative order of 2 modulo $p$. Let $\alpha$ be a primitive $(mT+1)$-th root of unity in $GF(2^{mT})$. Then, for any primitive $T$-th root of unity $\tau$ in $GF(p)$, $\beta = \sum_{i=0}^{T-1} \alpha^{\tau^i}$ generates a normal basis of $GF(2^m)$ over $GF(2)$, called a type-$T$ Gaussian normal basis (GNB). The time and space complexity of a GNB multiplier depend on its type $T > 1$. NIST recommends five binary fields, $m = 163, 233, 283, 409$ and $571$; the GNB types of these fields are all even, namely $T = 4, 2, 6, 4$ and $10$ respectively.
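The existence conditions above are easy to check numerically. The following sketch, under the assumption that "exists" means exactly the two stated conditions ($p = mT+1$ prime and $\gcd(mT/k, m) = 1$), searches for the smallest type $T$ for each NIST field; the function names and the `max_T` search limit are hypothetical.

```python
from math import gcd
from typing import Optional

def is_prime(p: int) -> bool:
    """Trial division; adequate for the small p = mT + 1 used here."""
    if p < 2:
        return False
    f = 2
    while f * f <= p:
        if p % f == 0:
            return False
        f += 1
    return True

def order_of_two(p: int) -> int:
    """Multiplicative order k of 2 modulo an odd prime p (k divides p - 1)."""
    k, x = 1, 2 % p
    while x != 1:
        x = (x * 2) % p
        k += 1
    return k

def gnb_type_exists(m: int, T: int) -> bool:
    """True if p = mT + 1 is an odd prime and gcd(mT / k, m) = 1."""
    p = m * T + 1
    if p < 3 or not is_prime(p):
        return False
    k = order_of_two(p)
    return gcd(m * T // k, m) == 1

def smallest_gnb_type(m: int, max_T: int = 100) -> Optional[int]:
    """Smallest T <= max_T satisfying the conditions, or None if none is found."""
    for T in range(1, max_T + 1):
        if gnb_type_exists(m, T):
            return T
    return None

print([smallest_gnb_type(m) for m in (163, 233, 283, 409, 571)])  # [4, 2, 6, 4, 10]
```

The printed list reproduces the even types quoted for the five NIST fields. The same check confirms that $GF(2^{27})$, used in Example 1 later, has a type-6 GNB.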
GNB multiplication is computed with the $(m-1) \times T$ multiplication matrix $R$ of [18]. Let $A = (a_0, a_1, \ldots, a_{m-1})$ and $B = (b_0, b_1, \ldots, b_{m-1})$ be two elements of $GF(2^m)$ represented in a type-$T$ GNB. Their product in $GF(2^m)$ can be expressed as:
where
$$S(i, B) = (B \ll R(i,1)) \oplus (B \ll R(i,2)) \oplus \cdots \oplus (B \ll R(i,T)), \quad 1 \le i \le m-1. \qquad (2)$$
Here $X \ll i$ denotes the $i$-fold cyclic left shift of $X \in GF(2^m)$, $X \odot Y = (x_0 y_0, \ldots, x_{m-1} y_{m-1})$ denotes the coefficient-wise AND of $X$ and $Y$, and $\oplus$ denotes bitwise XOR. Finite field multipliers can be designed as bit-level (space complexity $O(m)$, time complexity $O(m)$), digit-level (space complexity $O(md)$, time complexity $O(m/d)$) or bit-parallel (space complexity $O(m^2)$, time complexity $O(1)$) architectures.
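The vector operations used in (2) can be written out on coefficient lists. This is a minimal sketch with coefficient $x_i$ at list index $i$; the row of $R$ used in the usage line is a hypothetical stand-in, since the actual matrix comes from [18].

```python
from functools import reduce
from typing import List, Sequence

Vec = List[int]  # coefficient vector (x_0, ..., x_{m-1}) over GF(2)

def rotl(x: Sequence[int], i: int) -> Vec:
    """X << i: cyclic left shift, giving (x_i, x_{i+1}, ..., x_{i-1})."""
    i %= len(x)
    return list(x[i:]) + list(x[:i])

def hadamard(x: Sequence[int], y: Sequence[int]) -> Vec:
    """X ⊙ Y = (x_0 y_0, ..., x_{m-1} y_{m-1}): coefficient-wise AND."""
    return [a & b for a, b in zip(x, y)]

def vxor(x: Sequence[int], y: Sequence[int]) -> Vec:
    """Coefficient-wise XOR, i.e. addition in GF(2^m)."""
    return [a ^ b for a, b in zip(x, y)]

def S(B: Sequence[int], r_row: Sequence[int]) -> Vec:
    """Eq. (2): S(i, B) = (B << R(i,1)) ⊕ ... ⊕ (B << R(i,T)), given row i of R."""
    return reduce(vxor, (rotl(B, r) for r in r_row))

# Hypothetical row R(i, .) = (1, 3) for a toy m = 5 vector.
print(S([1, 0, 0, 0, 0], [1, 3]))  # [0, 0, 1, 0, 1]
```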
Recently, several low-complexity digit-level parallel-in parallel-out (DL-PIPO) GNB multipliers have been proposed in [18], [19] and [20], of which the one in [20] is the best. The DL-PIPO architecture is shown in Fig. 1. In this multiplier, the two operands $A$ and $B$ (stored beforehand in registers <X> and <Y>) must be retained throughout the computation, and the result is obtained in parallel at the end of the computation. For a given field size, the digit width $d$ should be chosen carefully to lower the time and space complexity. The area complexity of the digit-level GNB multiplier is $dm$ AND gates together with its XOR gates. Using the common subexpression elimination algorithm proposed in [20], the number of XOR gates is reduced further to an expression in $n_p$, where $n_p \le \min\{v_p T/2, \binom{m}{2}\}$ and $v_p = d(m-1)/2$.
A. 1-D digit-level systolic structure
From the symmetric structure of the matrix $R_{(m-1)\times T}$ in (1), $S(i, B)$ can be written as
$$S(m-k, B) = S(k, B) \gg k, \quad 1 \le k \le \frac{m-1}{2}. \qquad (3)$$
Therefore, instead of the matrix $R_{(m-1)\times T}$, we can define the matrix $U_{\frac{m-1}{2}\times T}$ as:
where $u_k$ is the $k$-th row of the matrix $U$. Fig. 1 illustrates the DL-PIPO GNB multiplier architecture. Suppose the input element $A$ (preloaded in register <X>) is re-expressed as $\langle X\rangle = (x_0, x_{m-1}, x_{m-2}, \ldots, x_2, x_1) = \bar{A} \gg 1$, where $\bar{A} = \sum_{i=0}^{m-1} a_{m-1-i}\, \beta^{2^i}$. Then, using the matrix $U_{\frac{m-1}{2}\times T}$, the product of $A$ and $B$ can be obtained as
$$C = AB = \sum_{i=0}^{m-1} J^{2^i}(X \gg i,\; B \gg i). \qquad (5)$$
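The re-expression of $A$ as the register contents $\langle X\rangle$ is a fixed permutation of coefficients: reverse them to obtain $\bar{A}$, then cyclically shift right by one. A minimal sketch on symbolic coefficient names (the helper names are illustrative):

```python
from typing import List, Sequence

def a_bar(a: Sequence[str]) -> List[str]:
    """Coefficients of Ā = sum_i a_{m-1-i} β^(2^i): the coefficient list reversed."""
    return list(reversed(a))

def rotr(x: Sequence[str], s: int) -> List[str]:
    """X >> s: cyclic right shift, coefficient i moves to index (i + s) mod m."""
    s %= len(x)
    return list(x[-s:]) + list(x[:-s]) if s else list(x)

def x_register(a: Sequence[str]) -> List[str]:
    """Contents of register <X> = Ā >> 1 = (a_0, a_{m-1}, ..., a_1)."""
    return rotr(a_bar(a), 1)

print(x_register(["a0", "a1", "a2", "a3", "a4"]))  # ['a0', 'a4', 'a3', 'a2', 'a1']
```

In hardware this permutation is pure wiring, so it costs no gates.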
where $J(X, Y) = X \odot P(Y)$, $P(Y) = (y_1, s'(1,Y), s'(2,Y), \ldots, s'(2,Y), s'(1,Y))$, and $s'(k, Y) = \sum_{i \in u_k} y_i$ for $1 \le k \le \frac{m-1}{2}$. For each coordinate, the function $J(X, Y)$ obtains its result by suitably shifting the input parameters. These functions are weighted sums of the bits of input $B$ (preloaded in register <Y>), and the positions of the bits of $B$ involved are determined by the matrix $U$. The matrix $U$ is re-expressed as a P block, which computes the linear combinations of $B$ and is implemented with XOR trees.
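The structure of $P(Y)$ (one bit followed by a palindromic run of XOR sums) can be sketched as follows. This assumes odd $m$ and takes the rows $u_k$ as input; the toy rows in the usage line are hypothetical, since the real $U$ depends on the GNB.

```python
from functools import reduce
from operator import xor
from typing import List, Sequence

def s_prime(Y: Sequence[int], u_k: Sequence[int]) -> int:
    """s'(k, Y): sum over GF(2) (XOR) of the bits y_i with i in row u_k."""
    return reduce(xor, (Y[i] for i in u_k), 0)

def P(Y: Sequence[int], U: Sequence[Sequence[int]]) -> List[int]:
    """P(Y) = (y_1, s'(1,Y), ..., s'((m-1)/2, Y), ..., s'(1,Y)) for odd m."""
    m = len(Y)
    assert len(U) == (m - 1) // 2, "need one row u_k for each k = 1..(m-1)/2"
    s = [s_prime(Y, u_k) for u_k in U]
    return [Y[1]] + s + s[::-1]  # XOR-tree outputs mirrored around the centre

def J(X: Sequence[int], Y: Sequence[int], U: Sequence[Sequence[int]]) -> List[int]:
    """J(X, Y) = X ⊙ P(Y): coefficient-wise AND of X with the P-block output."""
    return [x & p for x, p in zip(X, P(Y, U))]

U_toy = [[0, 2], [1, 3]]  # hypothetical rows u_1, u_2 for m = 5, T = 2
print(J([1, 1, 1, 0, 1], [1, 0, 1, 1, 0], U_toy))  # [0, 0, 1, 0, 0]
```

Because of the mirror symmetry, only $(m-1)/2$ distinct XOR sums need to be built, which is where the matrix $U$ saves area over $R$.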
Let $q = \lceil m/d \rceil$; then the product in (5) can be written as
$$C = \sum_{i=0}^{q-1} L^{2^{id}}(X \gg id,\; B \gg id) \qquad (6)$$
where
$$L(X, B) = \sum_{j=0}^{d-1} J^{2^j}(X \gg j,\; B \gg j). \qquad (7)$$
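The step from (5) to (6) and (7) is only a regrouping of the index $i$ into $id + j$; it works because raising to $2^s$ in a normal basis is a cyclic shift by $s$, which is XOR-linear and composes additively. The sketch below checks this numerically under stated assumptions: $m = 27$, $d = 3$ (so $q = 9$ with no padding), coefficient $i$ stored in bit $i$, and a hypothetical `toy_J` in place of the real $J$, since the identity holds for any $J$.

```python
M, D = 27, 3                 # m and d as in Example 1; m is a multiple of d, so q = m / d
Q = M // D
MASK = (1 << M) - 1

def shr(x: int, s: int) -> int:
    """Cyclic right shift X >> s: coefficient i moves to index (i + s) mod M.

    In a normal basis this equals raising to the power 2^s.
    """
    s %= M
    return ((x << s) | (x >> (M - s))) & MASK if s else x

def toy_J(x: int, y: int) -> int:
    """Hypothetical stand-in for J(X, Y) = X ⊙ P(Y); the real P depends on the matrix U."""
    return x & (y ^ shr(y, 1))

def product_eq5(X: int, B: int) -> int:
    """Eq. (5): XOR over i < m of J^(2^i)(X >> i, B >> i)."""
    c = 0
    for i in range(M):
        c ^= shr(toy_J(shr(X, i), shr(B, i)), i)
    return c

def L(X: int, B: int) -> int:
    """Eq. (7): XOR over j < d of J^(2^j)(X >> j, B >> j)."""
    acc = 0
    for j in range(D):
        acc ^= shr(toy_J(shr(X, j), shr(B, j)), j)
    return acc

def product_eq6(X: int, B: int) -> int:
    """Eq. (6): XOR over i < q of L^(2^(id))(X >> id, B >> id)."""
    c = 0
    for i in range(Q):
        c ^= shr(L(shr(X, i * D), shr(B, i * D)), i * D)
    return c

print(product_eq5(0x5A3C1E7, 0x2B9D0F4) == product_eq6(0x5A3C1E7, 0x2B9D0F4))  # True
```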
Suppose $n$ and $k$ are two integers satisfying $q = kn$. If $q$ is not divisible by $k$, we pad $X$ and $B$ with zeros at the least significant positions so that $q = kn$. Define the partial products $C_i$ as
$$C_i = \sum_{j=0}^{k-1} L^{2^{jd}}(X_i \gg jd,\; B_i \gg jd) \qquad (8)$$
Here, in terms of the integer $k$, the index $i$ and the digit width $d$ introduced above, $X_i = X \gg kid$ and $B_i = B \gg kid$. The product $C$ in (6) can then be compressed into $n$ partial products:
$$C = C_0 + C_1^{2^{kd}} + \cdots + C_{n-1}^{2^{(n-1)kd}} = \Big(\big((C_{n-1})^{2^{kd}} + C_{n-2}\big)^{2^{kd}} + \cdots\Big)^{2^{kd}} + C_0 \qquad (9)$$
so that $C$ is computed most significant digit first (MSD-first). To compute the partial product $C_i$ in (8), assume that $\bar{X}_i = X_i \gg (k-1)d = X \gg (kid + (k-1)d)$ and $\bar{B}_i = B_i \gg (k-1)d = B \gg (kid + (k-1)d)$ have been determined beforehand. Each partial product $C_i$ can then be re-expressed as
$$C_i = \Big(\big(\big(L(\bar{X}_i, \bar{B}_i)\big)^{2^d} + L(\bar{X}_i \ll d,\; \bar{B}_i \ll d)\big)^{2^d} + \cdots\Big)^{2^d} + L\big(\bar{X}_i \ll (k-1)d,\; \bar{B}_i \ll (k-1)d\big) \qquad (10)$$
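Equation (10) is what a chain of $k$ PEs computes: each PE applies $C_{out} = (C_{in} \gg d) + L(\cdot)$ and shifts its operands left by $d$, and the AC then applies the Horner form (9). The sketch below checks that this pipeline reproduces (6), under the same assumptions as before ($m = 27$, $d = 3$, $k = n = 3$, coefficient $i$ in bit $i$, and a hypothetical `toy_J`). Note that $L$ is called here with arguments in the order $(X, B)$ of equation (7).

```python
M, D, K, N = 27, 3, 3, 3     # Example 1 sizes: q = K * N = 9 digits of width 3
MASK = (1 << M) - 1

def shr(x: int, s: int) -> int:
    """Cyclic right shift X >> s (coefficient i -> (i + s) mod M); equals raising to 2^s."""
    s %= M
    return ((x << s) | (x >> (M - s))) & MASK if s else x

def shl(x: int, s: int) -> int:
    """Cyclic left shift X << s."""
    return shr(x, -s)

def toy_J(x: int, y: int) -> int:
    """Hypothetical stand-in for J(X, Y); any function works for this identity check."""
    return x & (y ^ shr(y, 1))

def L(x: int, b: int) -> int:
    """Eq. (7) with digit width D."""
    acc = 0
    for j in range(D):
        acc ^= shr(toy_J(shr(x, j), shr(b, j)), j)
    return acc

def product_eq6(X: int, B: int) -> int:
    """Reference: eq. (6) summed directly over q = K * N digits."""
    c = 0
    for i in range(K * N):
        c ^= shr(L(shr(X, i * D), shr(B, i * D)), i * D)
    return c

def partial_eq10(X: int, B: int, i: int) -> int:
    """Eq. (10): start from X̄_i = X >> (kid + (k-1)d) and run K PE steps."""
    x = shr(X, K * i * D + (K - 1) * D)
    b = shr(B, K * i * D + (K - 1) * D)
    c = 0                                  # C_in of the first PE is zero
    for _ in range(K):
        c = shr(c, D) ^ L(x, b)            # C_out = (C_in >> d) + L
        x, b = shl(x, D), shl(b, D)        # X_out = X_in << d, B_out = B_in << d
    return c

def product_eq9(X: int, B: int) -> int:
    """Eq. (9), MSD-first: the AC folds in C_{n-1}, ..., C_0 with shifts of kd."""
    c = 0
    for i in reversed(range(N)):
        c = shr(c, K * D) ^ partial_eq10(X, B, i)
    return c

print(product_eq9(0x5A3C1E7, 0x2B9D0F4) == product_eq6(0x5A3C1E7, 0x2B9D0F4))  # True
```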
Algorithm 1 describes the proposed 1-D systolic GNB multiplication using (9) and (10), and the proposed 1-D digit-level systolic GNB multiplier is shown in Figs. 2 and 3 according to Algorithm 1. Fig. 3 shows the proposed digit-level systolic multiplier. The proposed structure consists of $k$ processing elements (PEs), $PE_0, PE_1, \ldots, PE_{k-1}$, and an accumulation circuit (AC). The core circuit of Fig. 1 computes the multiplication in (7); therefore, the circuit of Fig. 1 can be used to build the PE circuit shown in Fig. 2. Each PE carries out steps 8 and 9 of Algorithm 1, and the AC circuit carries out step 11.
We now explain the multiplication steps shown in Fig. 3. Considering the PE operation in Fig. 2 and Algorithm 1, the output B of $PE_j$ used to compute the partial product $C_i$ is denoted $B_{i,j}$. In the initial step, register <C> is initialized to 0, and $X_i$ and $B_i$, for $0 \le i \le n-1$, are obtained by cyclic shift operations. In the first clock cycle, the two elements $X_{n-1}$ and $B_{n-1}$ are fed into the proposed 1-D systolic multiplier to compute the partial product $C_{n-1}$. In the next clock cycle, $X_{n-2}$ and $B_{n-2}$ are fed into the multiplier to compute $C_{n-2}$, and so on. Each partial result $C_i$ is computed by passing through the $k$ PEs and is then stored in register <C>, which takes $k+1$ clock cycles in total. Therefore, for the proposed 1-D systolic multiplier, the GNB multiplication $C = AB$ completes after $k+n$ clock cycles. The proposition below gives the number of clock cycles of the proposed 1-D digit-level systolic multiplier.
Proposition 1. For a type-$T$ GNB over $GF(2^m)$, the proposed 1-D digit-level systolic multiplier with bit-parallel output requires at most $2\lceil\sqrt{m/d}\,\rceil$ clock cycles, where $d$ is the selected digit width. In other words, the latency of the proposed 1-D digit-level systolic multiplier is $2\lceil\sqrt{m/d}\,\rceil$ clock cycles.
Proof: The GNB multiplication is divided into $q$ segments. For the digit-level systolic structure of Figs. 2 and 3, suppose we have $k$ PEs and an AC, where $q = kn$. Hence the GNB multiplication can also be divided into $n$ partial results, and the partial products $C_i$, with inputs $X_i$ and $B_i$, are computed by the systolic array of PEs. The whole GNB multiplication takes $k+n$ clock cycles. For a given $q = kn$, if $k$ (the number of PEs) is very small, then $n$ (the number of input elements fed to the PEs) becomes very large, and so does the latency $k+n$. To obtain the minimum latency, and good performance for multiplication in fields with large $m$, we need to minimize $k + q/k$. Setting the first derivative $1 - q/k^2$ to zero gives $k = \sqrt{q}$. Therefore, choosing $k = \lceil\sqrt{q}\,\rceil$, the latency of the proposed multiplier becomes $2\lceil\sqrt{q}\,\rceil$ clock cycles. If $q$ is not a perfect square, the computation may need even fewer clock cycles. This proves the proposition.
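The cycle count in Proposition 1 can be checked with integer arithmetic. The sketch below, assuming $q = \lceil m/d \rceil$ and $n = \lceil q/k \rceil$ as in the proof, compares the choice $k = \lceil\sqrt{q}\,\rceil$ with a brute-force minimum of $k + \lceil q/k \rceil$; the function names are hypothetical.

```python
from math import isqrt
from typing import Tuple

def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def ceil_sqrt(q: int) -> int:
    """Smallest integer k with k * k >= q (q >= 1)."""
    return isqrt(q - 1) + 1

def latency_1d(m: int, d: int) -> Tuple[int, int, int]:
    """(k, n, k + n) with q = ceil(m/d) digits and k = ceil(sqrt(q)) PEs."""
    q = ceil_div(m, d)
    k = ceil_sqrt(q)
    n = ceil_div(q, k)      # partial products fed through the array
    return k, n, k + n

def best_latency_1d(m: int, d: int) -> int:
    """Brute-force minimum of k + ceil(q/k) over all k, for comparison."""
    q = ceil_div(m, d)
    return min(k + ceil_div(q, k) for k in range(1, q + 1))

print(latency_1d(27, 3), latency_1d(409, 3), best_latency_1d(409, 3))
# (3, 3, 6) (12, 12, 24) 24
```

Since $k^2 \ge q$ implies $n \le k$, the count $k+n$ never exceeds $2k = 2\lceil\sqrt{q}\,\rceil$, which is the bound stated in the proposition.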
Corollary 1. By Proposition 1, if $d = 1$ and the number of PEs is $\lceil\sqrt{m}\,\rceil$, the latency of the proposed 1-D systolic GNB multiplier is at most $2\lceil\sqrt{m}\,\rceil$ clock cycles.
To clarify the operation of the proposed 1-D systolic GNB multiplier, the following example illustrates the operation of the PEs in different clock cycles.
Example 1. Let $A$ and $B$ be two elements of $GF(2^{27})$ represented in a type-6 GNB, and suppose the selected digit width is $d = 3$. Then $q = 9$; choosing $k = n = 3$, by (9) the product can be expressed as $C = C_0 + C_1^{2^9} + C_2^{2^{18}}$, where $C_i = L(\bar{X}_i \ll 6, \bar{B}_i \ll 6) + L^{2^3}(\bar{X}_i \ll 3, \bar{B}_i \ll 3) + L^{2^6}(\bar{X}_i, \bar{B}_i)$, $\bar{X}_i = X \gg (9i+6)$ and $\bar{B}_i = B \gg (9i+6)$, for $i = 0, 1, 2$. Table 1 lists the operation of each PE in each clock cycle. The proposed 1-D digit-level systolic GNB multiplier needs 6 clock cycles.
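The timing described above can be tabulated with a small model. This is a sketch of the schedule implied by the text (partial product $C_i$ enters $PE_0$ in cycle $n-i$, reaches $PE_j$ in cycle $n-i+j$, and is added in the AC in cycle $n-i+k$), not a reproduction of Table 1; the `schedule` function is hypothetical.

```python
from typing import Dict

def schedule(k: int, n: int) -> Dict[int, Dict[str, int]]:
    """Map each clock cycle to {unit: index i of the partial product C_i it handles}."""
    table: Dict[int, Dict[str, int]] = {}
    for t in range(1, k + n + 1):
        row: Dict[str, int] = {}
        for j in range(k):
            i = n - t + j                 # C_i is in PE_j during cycle n - i + j
            if 0 <= i < n:
                row[f"PE{j}"] = i
        i = n - t + k                     # C_i is added into register <C> in cycle n - i + k
        if 0 <= i < n:
            row["AC"] = i
        table[t] = row
    return table

for cycle, row in schedule(3, 3).items():   # Example 1: k = n = 3
    print(cycle, row)
```

For $k = n = 3$ the model yields six cycles. In cycle 3 all three PEs are busy ($PE_0$ with $C_0$, $PE_1$ with $C_1$, $PE_2$ with $C_2$), and the last addition, of $C_0$, happens in cycle 6, matching the 6 clock cycles stated in the example.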
Implementing Figs. 2 and 3 with a 1-D systolic array, the presented GNB multiplier consists of $k$ PEs and an AC circuit. Each PE is built from the structure of Fig. 1 and contains $dm$ AND gates, the XOR gates of that structure, and three $m$-bit registers. The AC circuit contains one $m$-bit $GF(2^m)$ adder and one $m$-bit register. With the structure of Figs. 2 and 3, the latency of the proposed GNB multiplier is $2\lceil\sqrt{m/d}\,\rceil$ clock cycles.
B. 2-D digit-level systolic structure
In this section, to obtain a high-performance implementation, we present a 2-D systolic GNB multiplier that achieves higher performance than the segmented systolic structure above for small digit widths (or large field sizes). Let $k$ and $n$ be two integers satisfying $q = k^2 n$. If $q$ is not divisible by $k^2$, we pad $X$ and $B$ with zeros so that $q = k^2 n$. To derive the 2-D digit-level systolic multiplier, we compress (6) into a sum of $n$ partial results:
$$C = C_0 + C_1^{2^{k^2 d}} + \cdots + C_{n-1}^{2^{(n-1)k^2 d}} \qquad (11)$$
where
$$C_i = \sum_{j=0}^{k-1} C_{ij}^{2^{kjd}} \qquad (12)$$
$$C_{ij} = \sum_{z=0}^{k-1} L^{2^{dz}}(X_{ij} \gg dz,\; B_{ij} \gg dz) \qquad (13)$$
and $X_{ij} = X \gg (k^2 id + kjd)$, $B_{ij} = B \gg (k^2 id + kjd)$.
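The 2-D decomposition splits each digit index of (6) as $s = k^2 i + kj + z$. The sketch below checks that (11) to (13), written with right shifts $X_{ij} \gg dz$ so that every term lines up with (6), reproduce (6). It assumes $m = 27$, $d = 1$, $k = n = 3$ (so $q = k^2 n = 27$), coefficient $i$ in bit $i$, and a hypothetical `toy_J` standing in for $J$.

```python
M, D, K, N = 27, 1, 3, 3     # q = m / d = 27 = K^2 * N
MASK = (1 << M) - 1

def shr(x: int, s: int) -> int:
    """Cyclic right shift X >> s (coefficient i -> (i + s) mod M); equals raising to 2^s."""
    s %= M
    return ((x << s) | (x >> (M - s))) & MASK if s else x

def toy_J(x: int, y: int) -> int:
    """Hypothetical stand-in for J(X, Y)."""
    return x & (y ^ shr(y, 1))

def L(x: int, b: int) -> int:
    """Eq. (7) with digit width D."""
    acc = 0
    for j in range(D):
        acc ^= shr(toy_J(shr(x, j), shr(b, j)), j)
    return acc

def product_eq6(X: int, B: int) -> int:
    """Reference: eq. (6) over q = K * K * N digits."""
    c = 0
    for s in range(K * K * N):
        c ^= shr(L(shr(X, s * D), shr(B, s * D)), s * D)
    return c

def c_ij(X: int, B: int, i: int, j: int) -> int:
    """Eq. (13) with X_ij = X >> (k^2 id + kjd)."""
    off = K * K * i * D + K * j * D
    x, b = shr(X, off), shr(B, off)
    c = 0
    for z in range(K):
        c ^= shr(L(shr(x, D * z), shr(b, D * z)), D * z)
    return c

def product_eq11(X: int, B: int) -> int:
    """Eqs. (11) and (12): combine C_ij with shifts kjd, then C_i with shifts k^2 id."""
    c = 0
    for i in range(N):
        c_i = 0
        for j in range(K):
            c_i ^= shr(c_ij(X, B, i, j), K * j * D)   # eq. (12)
        c ^= shr(c_i, K * K * i * D)                  # eq. (11)
    return c

print(product_eq11(0x5A3C1E7, 0x2B9D0F4) == product_eq6(0x5A3C1E7, 0x2B9D0F4))  # True
```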
Each partial result $C_{ij}$ is the sum of $k$ partial products of the form (7), and each partial result $C_i$ in (12) is the sum of $k$ partial results $C_{ij}$. To compute the partial product $C_i$ in a fully pipelined manner, we compute each partial result $C_{ij}$ with the 1-D systolic array structure of Figs. 2 and 3. On this basis, the proposed 2-D systolic array multiplier for computing $C_i$ is shown in Fig. 4. It consists of $k$ 1-D systolic arrays, $(k-1)$ cyclic shift (CS) circuits, $(k-1)$ AC1 structures and one AC2 structure. Each CS module performs a cyclic right shift by $kd$ bits, which in hardware requires only rewiring. In the figure, 1-D systolic array [i] realizes the sum of the $k$ partial products $C_{ij}$.
Proposition 2. Let $GF(2^m)$ be constructed with a type-$T$ GNB, $T$ even. The proposed 2-D systolic GNB multiplier has a maximum latency of $3\lceil\sqrt[3]{m/d}\,\rceil$ clock cycles, and the number of PEs is $\lceil\sqrt[3]{m/d}\,\rceil^2$.
Proof: Let $k$ and $n$ be two positive integers satisfying $q = k^2 n$ with $q = \lceil m/d \rceil$, where $d$ is the selected digit width. The GNB multiplication uses $k^2$ PEs to build the 2-D systolic array structure; computing $C_i$ in (12) with this circuit takes $2k$ clock cycles. Therefore the GNB multiplication in (11) needs $2k + n$ clock cycles. As in Proposition 1, the first derivative of $2k + q/k^2$ must be zero, i.e. $2 - 2q/k^3 = 0$, which gives $k = \sqrt[3]{q} = \sqrt[3]{m/d}$. Choosing $k = \lceil\sqrt[3]{q}\,\rceil$, the 2-D systolic multiplier achieves a latency of $3\lceil\sqrt[3]{q}\,\rceil$ clock cycles. When $q$ is not a perfect cube, the computation needs fewer clock cycles. This proves the proposition.
Corollary 2. By Proposition 2, the latency of the proposed 2-D systolic GNB multiplier is at most $3\lceil\sqrt[3]{m/d}\,\rceil$ clock cycles.
Fig. 4 shows the implementation using the 2-D systolic array. The proposed 2-D systolic multiplier consists of […] PEs, […] CS modules, […] AC1 structures and one AC2 structure. With this structure, the minimum latency of the GNB multiplier is […] clock cycles. Note that, compared with the 1-D systolic multiplier, the latency is lower when a small digit width (or a large field size) is selected. For example, over $\mathrm{GF}(2^{409})$ with a digit width of 1, the latency of the 2-D systolic multiplier is 24 clock cycles, the same as that of the 1-D systolic multiplier with a digit width of 3. This means that the 2-D systolic multiplier performs better under these conditions.
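The closed-form bounds of Propositions 1 and 2 are not reproduced in this text. Purely as an assumption, this sketch takes them to be $2\lceil\sqrt{q}\,\rceil$ for the 1-D multiplier and $3\lceil\sqrt[3]{q}\,\rceil$ for the 2-D multiplier, with $q = \lceil m/d \rceil$. These forms follow the choice $k = \sqrt[3]{m/d}$ in the proof (and its square-root analogue for 1-D) and reproduce the 24-cycle figures quoted for $\mathrm{GF}(2^{409})$, but they may differ from the exact expressions in the original.

```python
import math


def ceil_cbrt(q: int) -> int:
    """Smallest integer r with r**3 >= q (exact, no floating point)."""
    r = 0
    while r ** 3 < q:
        r += 1
    return r


def ceil_sqrt(q: int) -> int:
    """Smallest integer r with r**2 >= q."""
    r = math.isqrt(q)
    return r if r * r == q else r + 1


def digits(m: int, d: int) -> int:
    """Number of d-bit digits in an m-bit operand, q = ceil(m / d)."""
    return -(-m // d)


def bound_1d(m: int, d: int) -> int:
    """ASSUMED form of the 1-D bound (Proposition 1): 2 * ceil(sqrt(q))."""
    return 2 * ceil_sqrt(digits(m, d))


def bound_2d(m: int, d: int) -> int:
    """ASSUMED form of the 2-D bound (Proposition 2): 3 * ceil(cbrt(q))."""
    return 3 * ceil_cbrt(digits(m, d))


# GF(2^409): 2-D with digit width 1 versus 1-D with digit width 3.
print(bound_2d(409, 1), bound_1d(409, 3))  # 24 24
```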
[18] A. Reyhani-Masoleh, "Efficient Algorithms and Architectures for Field Multiplication Using Gaussian Normal Bases," IEEE Trans. Computers, vol. 55, no. 1, pp. 34-47, Jan. 2006.
[20] R. Azarderakhsh and A. Reyhani-Masoleh, "A Modified Low Complexity Digit-Level Gaussian Normal Basis Multiplier," in Proc. Int'l Workshop Arithmetic of Finite Fields (WAIFI), vol. 6087, pp. 25-40, 2010.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A multiplier processing unit PE for an elliptic curve cryptosystem device, characterized in that the multiplier processing unit PE comprises a computing unit, an input end $B_{in}$, an input end $C_{in}$, an input end $X_{in}$, an output end $B_{out}$, an output end $C_{out}$ and an output end $X_{out}$; $B_{in}$, $C_{in}$ and $X_{in}$ are each input to the computing unit and, after computation, are output from the computing unit through $B_{out}$, $C_{out}$ and $X_{out}$; in the computing unit, $B_{in}$ and $X_{in}$ are cyclically shifted left by d bits, namely $B_{out} = B_{in} \ll d$ and $X_{out} = X_{in} \ll d$; in the computing unit, $C_{in}$ is cyclically shifted right by d bits and added to the operation value of $B_{in}$ and $X_{in}$, namely $C_{out} = (C_{in} \gg d) + L(B_{in}, X_{in})$; where $C_{in}$ is the result of the preceding processing unit PE and is initially zero for the first processing unit PE, $C_{out}$ is the product result computed and output by the processing unit PE and serves as the input of the next processing unit PE, d denotes the digit length, k denotes the number of segments, and L denotes the operation.
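Claim 1 can be modelled behaviourally to show the dataflow of one PE and of k PEs connected in series. In this Python sketch, addition in $\mathrm{GF}(2^m)$ is modelled as XOR of coordinate vectors and the ring shifts as bit rotations. The operation L is not specified in the claim text, so it is passed in as a function; the bitwise AND used in the example is a hypothetical placeholder chosen only to exercise the wiring, not the patent's L.

```python
from typing import Callable, Tuple

OpL = Callable[[int, int], int]


def rotr(x: int, s: int, m: int) -> int:
    """Cyclic right shift of an m-bit word."""
    s %= m
    return ((x >> s) | (x << (m - s))) & ((1 << m) - 1)


def rotl(x: int, s: int, m: int) -> int:
    """Cyclic left shift of an m-bit word."""
    return rotr(x, -s, m)


def pe(b_in: int, c_in: int, x_in: int, d: int, m: int, L: OpL) -> Tuple[int, int, int]:
    """One PE per claim 1; field addition is modelled as XOR."""
    b_out = rotl(b_in, d, m)                   # B_out = B_in << d
    x_out = rotl(x_in, d, m)                   # X_out = X_in << d
    c_out = rotr(c_in, d, m) ^ L(b_in, x_in)   # C_out = (C_in >> d) + L(B_in, X_in)
    return b_out, c_out, x_out


def pe_chain(b: int, x: int, k: int, d: int, m: int, L: OpL) -> Tuple[int, int, int]:
    """k PEs in series: each PE's outputs feed the next PE; C_in of the first PE is zero."""
    c = 0
    for _ in range(k):
        b, c, x = pe(b, c, x, d, m, L)
    return b, c, x


def toy_L(b: int, x: int) -> int:
    """Hypothetical stand-in for L, used only to exercise the dataflow."""
    return b & x


print(pe_chain(0b1, 0b1, k=2, d=1, m=8, L=toy_L))  # (4, 130, 4)
```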
2. A one-dimensional multiplier, characterized in that the one-dimensional multiplier comprises k multiplier processing units PE according to claim 1 and a summation circuit AC; the k processing units PE are connected in series and then connected to the summation circuit AC; the inputs of each PE are obtained from the computed outputs of the preceding PE; the three parameters input to the first PE are, respectively, $B_0, B_1, \ldots, B_{n-1}$; $0, 0, \ldots, 0$; and $X_0, X_1, \ldots, X_{n-1}$, where X is obtained by shifting A; and its output is computed as:

$$C = C_0 + C_1^{2^{kd}} + \cdots + C_{n-1}^{2^{(n-1)kd}} = \left(\left(\left(C_{n-1}\right)^{2^{kd}} + C_{n-2}\right)^{2^{kd}} + \cdots\right)^{2^{kd}} + C_0.$$
3. The one-dimensional multiplier according to claim 2, characterized in that the summation circuit AC comprises an adder unit, a temporary storage unit and a shift unit; the output end of the shift unit is connected to the input end of the adder unit, the output end of the adder unit is connected to the input end of the temporary storage unit, and the output end of the temporary storage unit is connected to the input end of the shift unit; the summation circuit shifts the result of one computation by the k PE units and adds it to the next output of the k PE units.
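The two sides of the output formula in claim 2 are equal because cyclic shifts commute with XOR and compose additively, which is what lets the summation circuit of claim 3 evaluate the sum in Horner form with a single kd-bit shift per step. This sketch checks that identity under stated assumptions: the Frobenius power $C^{2^{s}}$ is modelled as a cyclic right shift by s (the bit ordering is assumed), and the block results are assumed to arrive with $C_{n-1}$ first. Names and values are hypothetical.

```python
from typing import List


def rotr(x: int, s: int, m: int) -> int:
    """Cyclic right shift of an m-bit word; stands in for C^(2^s) (assumed bit ordering)."""
    s %= m
    return ((x >> s) | (x << (m - s))) & ((1 << m) - 1)


def direct_sum(parts: List[int], k: int, d: int, m: int) -> int:
    """Left-hand side of claim 2: C = sum_i C_i^(2^(i*k*d)), with '+' modelled as XOR."""
    acc = 0
    for i, c in enumerate(parts):
        acc ^= rotr(c, i * k * d, m)
    return acc


def accumulate_ac(parts: List[int], k: int, d: int, m: int) -> int:
    """Horner form of claim 2 as an AC loop (claim 3): storage -> shift -> adder -> storage.

    Assumes the block outputs arrive highest index first (C_{n-1}, ..., C_0).
    """
    reg = 0  # temporary storage unit
    for c in reversed(parts):
        reg = rotr(reg, k * d, m) ^ c  # shift unit, then adder unit
    return reg


parts = [0x3A5, 0x1F0, 0x2C3, 0x0FF]  # hypothetical 10-bit block outputs C_0..C_3
print(direct_sum(parts, k=2, d=1, m=10) == accumulate_ac(parts, k=2, d=1, m=10))  # True
```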
4. A two-dimensional multiplier, characterized in that the two-dimensional multiplier comprises k one-dimensional multipliers according to claim 2 or 3, 2k-2 CS modules, k-1 summation circuits AC1 and a summation circuit AC2; the k one-dimensional multipliers are connected in parallel; the output of the first one-dimensional multiplier is connected to the shift unit of the first summation circuit AC1; the k-1 summation circuits AC1 are connected in series, and the (k-1)-th summation circuit AC1 is connected in series with the summation circuit AC2; the output ends of the second through (k-1)-th one-dimensional multipliers are respectively connected to the k-1 summation circuits; the B input end and the X input end of each of the second through (k-1)-th one-dimensional multipliers are each connected to one CS module; the inputs of the first one-dimensional multiplier are fed directly; and its operation formula is:
5. The two-dimensional multiplier according to claim 4, characterized in that the summation circuit AC1 comprises a shift unit and an adder unit, the output end of the shift unit being connected to the input end of the adder unit; the summation circuit AC1 shifts its input, adds it to the output of the connected one-dimensional multiplier, and outputs the sum; and the shift unit performs a cyclic right shift by kd bits.
6. The two-dimensional multiplier according to claim 5, characterized in that the summation circuit AC2 comprises a shift unit, an adder unit and a temporary storage unit; the output end of the shift unit is connected to the input end of the adder unit, the output end of the adder unit is connected to the temporary storage unit, and the output end of the temporary storage unit is connected to the input end of the adder unit; and the shift unit cyclically shifts its input value right by $k^2 d$ bits.
7. The two-dimensional multiplier according to claim 6, characterized in that the CS module cyclically shifts its input value right by kd bits.
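For sizing, the component inventory implied by claims 2 and 4 to 7 can be tallied as a function of k. This is a bookkeeping sketch with hypothetical names. It takes the summation circuit of claim 6 to be AC2 (its $k^2 d$ shift distinguishes it from AC1) and follows claim 4's count of 2k-2 CS modules, whereas the description of Fig. 3 mentions (k-1) cyclic shift circuits.

```python
def component_counts(k: int) -> dict:
    """Inventory of the 2-D multiplier of claim 4, built from k 1-D multipliers of claim 2."""
    return {
        "1-D multipliers": k,
        "PE": k * k,      # k PEs in each 1-D multiplier (claim 2)
        "AC": k,          # one summation circuit AC in each 1-D multiplier (claim 2)
        "CS": 2 * k - 2,  # claim 4
        "AC1": k - 1,     # claim 4
        "AC2": 1,         # claim 4
    }


def rotation_amounts(k: int, d: int) -> dict:
    """Cyclic right-shift distances (in bits) recited in claims 5 to 7."""
    return {"CS": k * d, "AC1 shift unit": k * d, "AC2 shift unit": k * k * d}


# Hypothetical example: k = 4 segments, digit width d = 2.
print(component_counts(4))
print(rotation_amounts(4, 2))
```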

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410414896.8A CN104252332B (en) 2014-08-20 2014-08-20 Multiplier processing unit and multiplier for an elliptic curve cryptosystem device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410414896.8A CN104252332B (en) 2014-08-20 2014-08-20 Multiplier processing unit and multiplier for an elliptic curve cryptosystem device

Publications (2)

Publication Number Publication Date
CN104252332A (en) 2014-12-31
CN104252332B (en) 2018-09-18

Family

ID=52187288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410414896.8A Expired - Fee Related CN104252332B (en) 2014-08-20 2014-08-20 Multiplier processing unit and multiplier for an elliptic curve cryptosystem device

Country Status (1)

Country Link
CN (1) CN104252332B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050267926A1 (en) * 2004-05-27 2005-12-01 King Fahd University Of Petroleum And Minerals Finite field serial-serial multiplication/reduction structure and method
CN101968732A (en) * 2010-10-09 2011-02-09 PLA Information Engineering University Bit parallel systolic array shifted polynomial basis multiplier with function of error detection
CN102929574A (en) * 2012-10-18 2013-02-13 Fudan University Design method for a systolic multiplier over GF(2^163)
CN103186360A (en) * 2013-04-03 2013-07-03 Shenzhen Graduate School, Harbin Institute of Technology Fast multi-bit serial systolic dual-basis binary finite field multiplier
TW201404108A (en) * 2012-07-09 2014-01-16 Univ Ching Yun Semi-systolic Gaussian normal basis multiplier

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
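JENG-SHYANG PAN et al.: "Low-Latency Digit-Serial Systolic Double Basis Multiplier over GF(2^m) Using Subquadratic Toeplitz Matrix-Vector Product Approach", IEEE Transactions on Computers *
REZA AZARDERAKHSH et al.: Arithmetic of Finite Fields, 30 July 2010 *
杨玲 (Yang Ling) et al.: "Design of the core arithmetic module of the ECC algorithm based on an array structure", Microelectronics (微电子学) *
罗鹏 (Luo Peng) et al.: "Structure and implementation of an ECC multiplier based on a divide-and-conquer algorithm", Computer Engineering (计算机工程) *
陈传鹏 (Chen Chuanpeng) et al.: "Improved prime-field elliptic curve cryptographic processor", Engineering Journal of Wuhan University (武汉大学学报(工学版)) *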

Also Published As

Publication number Publication date
CN104252332B (en) 2018-09-18

Similar Documents

Publication Publication Date Title
Erdem et al. A general digit-serial architecture for montgomery modular multiplication
CN103793199B (en) Fast RSA cryptographic coprocessor supporting dual fields
Kumar Implementation and analysis of power, area and delay of array, Urdhva, Nikhilam Vedic multipliers
Rashidi et al. Efficient and low‐complexity hardware architecture of Gaussian normal basis multiplication over GF(2^m) for elliptic curve cryptosystems
Namin et al. A word-level finite field multiplier using normal basis
Azarderakhsh et al. Systolic Gaussian normal basis multiplier architectures suitable for high-performance applications
CN107992283A (en) Method and apparatus for implementing a finite field multiplier based on dimensionality reduction
Hossain et al. Efficient FPGA implementation of modular arithmetic for elliptic curve cryptography
Huang et al. Non-XOR approach for low-cost bit-parallel polynomial basis multiplier over GF(2^m)
Surendran et al. Implementation of fast multiplier using modified Radix-4 booth algorithm with redundant binary adder for low energy applications
Timarchi et al. A novel high-speed low-power binary signed-digit adder
Piestrak Design of multi-residue generators using shared logic
Kadu et al. Hardware implementation of efficient elliptic curve scalar multiplication using vedic multiplier
CN104252332A (en) Multiplier and multiplier processing element for ellipse cipher apparatus
Xie et al. Low latency systolic multipliers for finite field GF(2^m) based on irreducible polynomials
Son et al. Design and implementation of scalable low-power Montgomery multiplier
Lee et al. Low complexity digit-serial multiplier over GF(2^m) using Karatsuba technology
Modugu et al. A fast low-power modulo 2^n+1 multiplier design
Bobade et al. VLSI architecture for an area efficient Elliptic Curve Cryptographic processor for embedded systems
Moayedi et al. Design and evaluation of novel effective Montgomery modular multiplication architecture
Rezai et al. A new CMM-NAF modular exponentiation algorithm by using a new modular multiplication algorithm
Fariddin et al. Design of High Speed and Area efficient modified Kogge Stone Multiplier Using ZFL
Renita et al. Implementation and performance analysis of elliptic curve cryptography using an efficient multiplier
Realpe-Muñoz et al. High-Performance Architectures for Finite Field Inversion Over GF(2^163)
Trujillo-Olaya et al. Half-matrix normal basis multiplier over GF(p^m)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180918

Termination date: 20190820