CN104252332B

CN104252332B - Multiplier processing unit and multiplier for an elliptic curve cryptosystem

Info

Publication number: CN104252332B
Application number: CN201410414896.8A
Authority: CN
Inventors: 潘正祥; 杨春生; 李秋莹; 闫立军; 蔡正富
Original assignee: Airmate Electrical Shenzhen Co Ltd; Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Airmate Electrical Shenzhen Co Ltd; Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2014-08-20
Filing date: 2014-08-20
Publication date: 2018-09-18
Anticipated expiration: 2034-08-20
Also published as: CN104252332A

Abstract

The present invention relates to a multiplier processing unit (PE) for an elliptic curve cryptosystem, comprising a computing unit, input terminals B_in, C_in and X_in, and output terminals B_out, C_out and X_out. The inputs B_in, C_in and X_in are each fed into the computing unit, and after processing the results are output at B_out, C_out and X_out. Inside the computing unit, B_in and X_in are cyclically shifted left by d bits: B_out = B_in << d, X_out = X_in << d. The value computed from B_in and X_in is added to C_in cyclically shifted right by d bits: C_out = (C_in >> d) + L(B_in, X_in). Here C_in is the result of the preceding processing unit PE and is initially zero for the first PE; C_out is the product result computed by this PE and serves as the input of the next PE; d is the digit width, k is the number of segments, and L denotes the operation applied to B_in and X_in. Because each PE performs only shifts and the evaluation of the J function, the processing unit is fast and has low computational complexity, which improves the performance of the cryptosystem.

Description

Multiplier processing unit and multiplier for an elliptic curve cryptosystem

Technical field

The invention belongs to the field of digital coding, and in particular relates to a finite-field multiplier processing unit and multiplier suitable for elliptic curve cryptosystems.

Background art

In recent years, the efficient, high-performance, low-complexity design of finite-field arithmetic and its applications has received considerable attention. For example, cryptographic algorithms and systems must meet the security requirements set by the National Institute of Standards and Technology (NIST) and the Institute of Electrical and Electronics Engineers (IEEE), in order to reduce potential attacks and ensure hardware security. An important aspect of a cryptosystem is resisting side-channel attacks while keeping cost low. In industry, fault detection has received much attention, for example in [3-6], as can be seen from attacks on cryptosystems based on fault analysis and side-channel analysis. In practice, such protection usually adds overhead to the original design, so the original design must be efficient enough to tolerate this overhead. Recently, elliptic curve cryptography, as an effective technique that meets the requirements of public-key cryptography, has been deployed in many high-performance and security-constrained applications. For example, it can be used in mobile ad hoc networks (MANETs) to provide trust and integrity checking efficiently, and these checks do not depend on whether the physical layer is secure. Elliptic curve cryptography is based on the algebraic structure of elliptic curves over finite fields, and its arithmetic operations determine the efficiency of any cryptosystem built on it. Consequently, much research has focused on efficient, low-complexity, high-performance designs of the arithmetic units used in elliptic curve and RSA public-key cryptosystems. Recently, Gaussian normal basis (GNB) multipliers have been widely used to compute point multiplication (also called scalar multiplication) in elliptic curve cryptosystems. Notably, this operation must not only be efficient; in timing-constrained applications its implementation must also be high-performance.

For large binary fields, field multiplication can be designed with systolic arrays to obtain high-speed, regular very-large-scale integration (VLSI) implementations. Systolic arrays avoid irregular circuit design. In other words, for different choices of binary field, their hardware structures are modular and closely similar. Their concurrency, balanced input and output, and simple, regular design make them suitable for high-performance applications. Although systolic architectures are widely used in applications that need high-speed structures, this usually presumes that their area complexity is acceptable. For example, [16] proposes a systolic multiplier over an optimal normal basis; the multiplier is highly regular and can be implemented in a data-serial manner. A high-performance implementation of such a systolic multiplier on reconfigurable hardware is reported in [17].

[3] A. Yazdani, H. Sepahvand, M. Crow, and M. Ferdowsi, "Fault Detection and Mitigation in Multilevel Converter STATCOMs," IEEE Trans. Ind. Electron., vol. 58, no. 4, pp. 1307-1315, 2011.

[4] M. A. A. Claudio-Sanchez, D. Theilliol, L. Vela-Valdes, P. Sibaja-Teran, L. Hernandez-Gonzalez, and J. Aguayo-Alquicira, "A Failure-Detection Strategy for IGBT Based on Gate-Voltage Behavior Applied to a Motor Drive System," IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1625-1633, 2011.

[5] T. A. Najafabadi, F. R. Salmasi, and P. Jabehdar-Maralani, "Detection and Isolation of Speed-, DC-Link Voltage-, and Current-Sensor Faults Based on an Adaptive Observer in Induction-Motor Drives," IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1662-1672, 2011.

[6] S. Cruz, M. Ferreira, A. Mendes, and A. J. M. Cardoso, "Analysis and Diagnosis of Open-Circuit Faults in Matrix Converters," IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 1648-1661, 2011.

[16] S. Kwon, "A Low Complexity and a Low Latency Bit Parallel Systolic Multiplier over GF(2^m) Using an Optimal Normal Basis of Type II," in Proc. IEEE Symp. Computer Arithmetic (ARITH-16), pp. 196-202, 2003.

[17] J. Fan, D. Bailey, L. Batina, T. Guneysu, C. Paar, and I. Verbauwhede, "Breaking Elliptic Curves Cryptosystems using Reconfigurable Hardware," in Proc. 20th Int'l Conf. on Field Programmable Logic and Applications (FPL 2010), 2010, pp. 133-138.

[18] A. Reyhani-Masoleh, "Efficient Algorithms and Architectures for Field Multiplication Using Gaussian Normal Bases," IEEE Trans. Computers, vol. 55, no. 1, pp. 34-47, Jan. 2006.

[20] R. Azarderakhsh and A. Reyhani-Masoleh, "A Modified Low Complexity Digit-Level Gaussian Normal Basis Multiplier," in Proc. Int'l Workshop Arithmetic of Finite Fields (WAIFI), vol. 6087, pp. 25-40, 2010.

Summary of the invention

The present invention provides a multiplier processing unit for an elliptic curve cryptosystem, aiming to solve the problem that existing processing units compute slowly and take a long time.

The invention is realized as follows: a processing unit for an elliptic curve cryptosystem multiplier, in which the multiplier processing unit PE comprises a computing unit, input terminals B_in, C_in and X_in, and output terminals B_out, C_out and X_out. The inputs B_in, C_in and X_in are each fed into the computing unit, and after processing the results are output from the output terminals B_out, C_out and X_out of the computing unit. In the computing unit, B_in and X_in are cyclically shifted left by d bits: B_out = B_in << d, X_out = X_in << d. In the computing unit, the value computed from B_in and X_in is added to C_in cyclically shifted right by d bits: C_out = (C_in >> d) + L(B_in, X_in). Here C_in is the result of the preceding processing unit PE and is initially zero for the first PE; C_out is the product result computed by this PE and serves as the input of the next PE; d is the digit width, k is the number of segments, and L denotes the operation applied to B_in and X_in.

Another object of the present invention is to provide a one-dimensional multiplier comprising k multiplier processing units PE according to claim 1 and one accumulation circuit AC. The k processing units PE are connected in series and followed by the accumulation circuit AC. The input of each PE is the computed output of the preceding PE, and the three inputs of the first PE are B₀, B₁, ..., B_n-1; 0, 0, ..., 0; and X₀, X₁, ..., X_n-1, where X is obtained by reversing A and then cyclically shifting it right by one position. The output is computed as: where A is the multiplicand.

In a further technical solution of the present invention, the accumulation circuit AC comprises an addition unit, a temporary storage unit and a shift unit. The output of the shift unit is connected to the input of the addition unit, the output of the addition unit is connected to the input of the temporary storage unit, and the output of the temporary storage unit is connected to the input of the shift unit. The accumulation circuit shifts the result of the previous pass through the k PEs and adds it to the output of the next pass through the k PEs.

Another object of the present invention is to provide a two-dimensional multiplier comprising k one-dimensional multipliers according to claim 2 or 3, 2k-2 CS modules, k-1 accumulation circuits AC1 and one accumulation circuit AC2. The k one-dimensional multipliers are connected in parallel. The output of the first one-dimensional multiplier is connected to the shift unit of the first accumulation circuit AC1; the k-1 accumulation circuits AC1 are connected in series, and the (k-1)-th accumulation circuit AC1 is connected in series with the accumulation circuit AC2. The outputs of the second through (k-1)-th one-dimensional multipliers are each connected to one accumulation circuit AC1. The B and X inputs of the second through (k-1)-th one-dimensional multipliers are each connected to one CS module, while the inputs of the first one-dimensional multiplier are fed in directly. The operation is given by:

In a further technical solution of the present invention, the accumulation circuit AC1 comprises a shift unit and an addition unit. The output of the shift unit is connected to the input of the addition unit. The accumulation circuit AC1 shifts its input and adds it to the output of the connected one-dimensional multiplier; the shift unit performs a cyclic right shift by kd bits.

In a further technical solution of the present invention, the accumulation circuit AC2 comprises a shift unit, an addition unit and a temporary storage unit. The output of the shift unit is connected to the input of the addition unit, the output of the addition unit is connected to the temporary storage unit, and the output of the temporary storage unit is connected to the input of the addition unit. The shift unit cyclically shifts its input right by k²d bits. The accumulation circuit AC1 comprises a shift unit and an addition unit, with the output of the shift unit connected to the input of the addition unit.

In a further technical solution of the present invention, the CS modules cyclically shift their input right by kd bits.

The beneficial effects of the invention are as follows: because each PE performs only shifts and the evaluation of the J function, the processing unit is fast and has low computational complexity, which improves the performance of the cryptosystem. The invention is a multiplier based on a systolic array architecture, so it is easy to implement in VLSI systems and offers low latency and high performance.

Description of the drawings

Fig. 1 shows the DL-PIPO GNB multiplier circuit on which the present invention is based;

Fig. 2 is a structural diagram of the processing unit PE provided by an embodiment of the present invention;

Fig. 3 shows the one-dimensional multiplier circuit provided by an embodiment of the present invention;

Fig. 4 shows the two-dimensional multiplier circuit provided by an embodiment of the present invention.

Detailed description of the embodiments

Fig. 2 shows the processing unit for an elliptic curve cryptosystem multiplier provided by the present invention. The multiplier processing unit PE comprises a computing unit, input terminals B_in, C_in and X_in, and output terminals B_out, C_out and X_out. The inputs B_in, C_in and X_in are each fed into the computing unit, and after processing the results are output from the output terminals B_out, C_out and X_out of the computing unit. In the computing unit, B_in and X_in are cyclically shifted left by d bits: B_out = B_in << d, X_out = X_in << d. In the computing unit, the value computed from B_in and X_in is added to C_in cyclically shifted right by d bits: C_out = (C_in >> d) + L(B_in, X_in). Here C_in is the result of the preceding processing unit PE and is initially zero for the first PE; C_out is the product result computed by this PE and serves as the input of the next PE; d is the digit width, k is the number of segments, and L denotes the operation applied to B_in and X_in. Because each PE performs only shifts and the evaluation of the J function, the processing unit is fast and has low computational complexity, which improves the performance of the cryptosystem.
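The PE data flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented circuit: field elements are modelled as tuples of bits, "cyclic left shift by d" is assumed to move coordinate d to index 0, addition in GF(2^m) is modelled as bitwise XOR, and the names `rotl`, `rotr`, `pe_step` and `and_op` are hypothetical. In particular, `and_op` is only a stand-in for the operation L, which the patent defines through the J function.

```python
from typing import Callable, Tuple

Bits = Tuple[int, ...]  # coordinates (x_0, ..., x_{m-1}) of a field element


def rotl(x: Bits, d: int) -> Bits:
    """Cyclic left shift by d positions (assumed convention: x_d moves to index 0)."""
    d %= len(x)
    return x[d:] + x[:d]


def rotr(x: Bits, d: int) -> Bits:
    """Cyclic right shift by d positions (inverse of rotl)."""
    return rotl(x, -d)


def xor(x: Bits, y: Bits) -> Bits:
    """Coordinate-wise XOR, used here as addition in GF(2^m)."""
    return tuple(a ^ b for a, b in zip(x, y))


def pe_step(b_in: Bits, c_in: Bits, x_in: Bits, d: int,
            op: Callable[[Bits, Bits], Bits]) -> Tuple[Bits, Bits, Bits]:
    """One PE: B_out = B_in << d, X_out = X_in << d, C_out = (C_in >> d) + L(B_in, X_in)."""
    b_out = rotl(b_in, d)
    x_out = rotl(x_in, d)
    c_out = xor(rotr(c_in, d), op(b_in, x_in))
    return b_out, c_out, x_out


def and_op(b: Bits, x: Bits) -> Bits:
    """Hypothetical stand-in for L: coordinate-wise AND (not the J-based operation)."""
    return tuple(p & q for p, q in zip(b, x))


b = (1, 0, 1, 1, 0, 0)
x = (0, 1, 1, 0, 1, 0)
c0 = (0,) * 6  # C_in of the first PE is zero
b1, c1, x1 = pe_step(b, c0, x, d=2, op=and_op)
```

Chaining k such calls, each taking the previous call's three outputs, models the series connection of PEs; any real design would replace `and_op` with the J-function logic.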

Fig. 3 shows the one-dimensional multiplier that is another object of the present invention. The one-dimensional multiplier comprises k multiplier processing units PE according to claim 1 and one accumulation circuit AC. The k processing units PE are connected in series and followed by the input of the accumulation circuit AC. The input of each PE is the computed output of the preceding PE, and the three inputs of the first PE are B₀, B₁, ..., B_n-1; 0, 0, ..., 0; and X₀, X₁, ..., X_n-1, where X is obtained from A by shifting. The output is computed as:

The accumulation circuit AC comprises an addition unit, a temporary storage unit and a shift unit. The output of the shift unit is connected to the input of the addition unit, the output of the addition unit is connected to the input of the temporary storage unit, and the output of the temporary storage unit is connected to the input of the shift unit. The accumulation circuit shifts the result of the previous pass through the k PEs and adds it to the output of the next pass through the k PEs.
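The feedback loop of the accumulation circuit (temporary storage unit, then shift unit, then addition unit, then back to the temporary storage unit) can be sketched as below. The class name `Accumulator` is hypothetical, and so is the `shift` parameter: the text does not state the rotation amount or direction for this AC, so a cyclic right shift is assumed here by analogy with AC1 and AC2.

```python
from typing import Tuple

Bits = Tuple[int, ...]


def rotr(x: Bits, s: int) -> Bits:
    """Cyclic right shift by s positions."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else x


class Accumulator:
    """Register -> shift unit -> adder -> register loop of the AC circuit."""

    def __init__(self, m: int, shift: int) -> None:
        self.reg: Bits = (0,) * m  # temporary storage unit, cleared at start
        self.shift = shift         # assumed rotation amount per accumulation

    def step(self, pe_chain_out: Bits) -> Bits:
        shifted = rotr(self.reg, self.shift)  # shift unit acts on the stored result
        # addition unit: XOR models addition in GF(2^m)
        self.reg = tuple(a ^ b for a, b in zip(shifted, pe_chain_out))
        return self.reg


ac = Accumulator(m=4, shift=1)
ac.step((1, 0, 0, 0))
result = ac.step((0, 0, 1, 0))
```

Each call to `step` corresponds to one output arriving from the chain of k PEs.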

Fig. 4 shows the two-dimensional multiplier that is another object of the present invention. The two-dimensional multiplier comprises k one-dimensional multipliers according to claim 2 or 3, 2k-2 CS modules, k-1 accumulation circuits AC1 and one accumulation circuit AC2. The k one-dimensional multipliers are connected in parallel. The output of the first one-dimensional multiplier is connected to the shift unit of the first accumulation circuit AC1; the k-1 accumulation circuits AC1 are connected in series, and the (k-1)-th accumulation circuit AC1 is connected in series with the accumulation circuit AC2. The outputs of the second through (k-1)-th one-dimensional multipliers are each connected to one accumulation circuit. The B and X inputs of the second through (k-1)-th one-dimensional multipliers are each connected to one CS module, while the inputs of the first one-dimensional multiplier are fed in directly. The operation is given by:

The accumulation circuit AC1 comprises a shift unit and an addition unit. The output of the shift unit is connected to the input of the addition unit. The accumulation circuit AC1 shifts its input and adds it to the output of the connected one-dimensional multiplier; the shift unit performs a cyclic right shift by kd bits.

The accumulation circuit AC2 comprises a shift unit, an addition unit and a temporary storage unit. The output of the shift unit is connected to the input of the addition unit, the output of the addition unit is connected to the temporary storage unit, and the output of the temporary storage unit is connected to the input of the addition unit. The shift unit cyclically shifts its input right by k²d bits.

The CS modules cyclically shift their input right by kd bits.

In the following, a decomposition method is used to obtain two new digit-level GNB multipliers.

Let N = {β, β^2, β^(2^2), ..., β^(2^(m-1))} be a normal basis (NB) of GF(2^m), where β ∈ GF(2^m) is a normal element of GF(2^m). Let m and T be positive integers such that p = mT+1 is prime and gcd(mT/k, m) = 1, where k is the multiplicative order of 2 modulo p. Let α be a primitive (mT+1)-th root of unity in GF(2^(mT)). Then, for any primitive T-th root of unity τ in Z_p, β = α + α^τ + α^(τ^2) + ... + α^(τ^(T-1)) generates a normal basis N of the binary field GF(2^m) over GF(2). This basis is called a type-T Gaussian normal basis (GNB). The complexity (in both time and space) of a GNB multiplier depends on its type T > 1. NIST recommends five binary fields, with m = 163, 233, 283, 409 and 571. The type T of each of these five fields is even: 4, 2, 6, 4 and 10, respectively.
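The existence condition stated above (p = mT+1 prime and gcd(mT/k, m) = 1, with k the multiplicative order of 2 modulo p) is easy to check numerically. The following sketch uses hypothetical helper names and plain trial division, which is adequate for the field sizes mentioned here; it excludes the degenerate case p = 2.

```python
from math import gcd


def is_prime(p: int) -> bool:
    """Trial-division primality test (sufficient for small p = mT + 1)."""
    if p < 2:
        return False
    i = 2
    while i * i <= p:
        if p % i == 0:
            return False
        i += 1
    return True


def order_of_two(p: int) -> int:
    """Multiplicative order of 2 modulo an odd prime p."""
    k, v = 1, 2 % p
    while v != 1:
        v = (v * 2) % p
        k += 1
    return k


def gnb_exists(m: int, t: int) -> bool:
    """True if a type-t GNB exists for GF(2^m) under the condition in the text."""
    p = m * t + 1
    if p < 3 or not is_prime(p):
        return False
    k = order_of_two(p)           # k divides p - 1 = m * t
    return gcd(m * t // k, m) == 1


nist = [(163, 4), (233, 2), (283, 6), (409, 4), (571, 10)]
nist_ok = [gnb_exists(m, t) for m, t in nist]
```

For instance, `gnb_exists(8, 2)` is false because the order of 2 modulo 17 is 8, so mT/k = 2 shares a factor with m = 8.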

GNB multiplication is computed from the multiplication matrix R_(m-1)×T of [18]. Let A = (a₀, a₁, ..., a_m-1) and B = (b₀, b₁, ..., b_m-1) be two elements of GF(2^m) represented in a type-T GNB. Their product in GF(2^m) can be expressed as:

where

Here (X << i) denotes a cyclic left shift of X ∈ GF(2^m) by i positions, and X ⊙ Y = (x₀y₀, ..., x_m-1y_m-1) and X ⊕ Y = (x₀ ⊕ y₀, ..., x_m-1 ⊕ y_m-1) denote the bitwise AND and bitwise XOR of the coefficients of X and Y. Finite-field multipliers can be designed as bit-level (space complexity O(m), time complexity O(m)), digit-level (space complexity O(md), time complexity O(m/d)) or bit-parallel (space complexity O(m²), time complexity O(1)) architectures.

Recently, low-complexity digit-level parallel-in parallel-out (DL-PIPO) GNB multipliers were proposed in [18] and [20], of which the one in [20] is optimal. The DL-PIPO architecture is shown in Fig. 1. In this multiplier, the two operands A and B (preloaded into registers <X> and <Y>) must both be retained throughout the computation, and all bits of the result become available simultaneously after q = ⌈m/d⌉ clock cycles, 1 ≤ d ≤ m. Note that for a given field size, the digit width d should be chosen carefully to reduce time and space complexity. The time complexity of the digit-level GNB multiplier is […].

The area complexity is dm AND gates and […] gates. With the common-subexpression elimination algorithm proposed in [20], the area complexity decreases further to only […] gates, where […].

A. 1-D digit-level systolic structure

From the symmetric structure of the matrix R_(m-1)×T in (1), S(i, B) can be written as follows:

Therefore, in place of the matrix R_(m-1)×T, we can define the matrix […] as:

where u_k, […], is the k-th row of the matrix u. Fig. 1 illustrates the DL-PIPO GNB multiplier architecture. Suppose the input element A (preloaded into register <X>) is re-expressed as […], where […]. Then, using the matrix […], the product of A and B can be obtained from:

where J(X, Y) = X ⊙ P(Y), P(Y) = (y₁, s'(1, Y), s'(2, Y), ..., s'(2, Y), s'(1, Y)), […]. For each coordinate, the J(X, Y) function produces its result from appropriately shifted input parameters. These functions are weighted sums of the bits of the input B (preloaded into register <Y>) and are determined by the matrix […] and the bit positions of B. The matrix u is realized as a P block, which computes the linear combinations of B and is implemented with an XOR tree.

Let q = ⌈m/d⌉, 1 ≤ d ≤ m. Then the product in (5) can be written as:

where

Suppose n and k are two integers satisfying q = kn. Note that if q is not divisible by k, X and B must be zero-padded at their least significant positions so that q = kn holds. The partial product C_i is defined as:

From the above, for the integer k, the index i and the digit width d, we have X_i = X >> kid and B_i = B >> kid. The product C in (6) can then be grouped into n partial products:

where the product C is expressed most significant digit first (MSD-first). To compute the partial product C_i in (8),

suppose […] has been determined beforehand. Each partial product C_i can then be re-expressed as:

Algorithm 1 describes the proposed 1-D systolic GNB multiplication using (9) and (10). Following Algorithm 1, Figs. 2 and 3 depict the proposed 1-D digit-level systolic GNB multiplier; Fig. 3 shows the multiplier as a whole. The proposed structure consists of k processing elements (PEs), namely PE₀, PE₁, ..., PE_k-1, and one accumulation circuit (AC). The core circuit of Fig. 1 computes the multiplication shown in (7), so the PE circuit of Fig. 2 can be built from the circuit of Fig. 1. Each PE implements steps 8 and 9 of Algorithm 1, and the AC circuit implements step 11.

We now explain the multiplication steps shown in Fig. 3. Given the PE operation in Fig. 2 and Algorithm 1, the output B of PE_j used to compute the partial product C_i is denoted B_{i, j}. In the initial step, register <C> is initialized to 0, and X_i and B_i, 0 ≤ i < n-1, are obtained by cyclic shift operations. In the first clock cycle, the two elements X_n-1 and B_n-1 are fed into the proposed 1-D systolic multiplier to compute the partial product C_n-1. In the next clock cycle, the two elements X_n-2 and B_n-2 are fed into the proposed systolic multiplier to compute the partial product C_n-2, and so on. Each partial result C_i is computed by the k PEs and stored in register <C>, which takes k+1 clock cycles in total. Therefore, with the proposed 1-D systolic multiplier, the GNB multiplication C = AB completes after k+n clock cycles. The following proposition gives the number of clock cycles needed by the proposed 1-D digit-level systolic multiplier.
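The operand preparation described above, with X_i = X >> kid and B_i = B >> kid fed in the order i = n-1, n-2, ..., 0, can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, q is taken to be the number of d-bit digits ⌈m/d⌉, n is rounded up so that kn covers q, and the rotation acts on the m-coordinate vector. The example uses integer labels instead of bits so that the rotation is visible.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return -(-a // b)


def rotr(x: tuple, s: int) -> tuple:
    """Cyclic right shift by s positions."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else x


def feed_schedule(x: tuple, b: tuple, k: int, d: int):
    """Yield (i, X_i, B_i) in the order the 1-D array consumes them (i = n-1 first)."""
    m = len(x)
    q = ceil_div(m, d)   # number of d-bit digits (assumed definition of q)
    n = ceil_div(q, k)   # number of partial products once q is padded up to k * n
    for i in range(n - 1, -1, -1):
        s = k * i * d    # rotation amount for partial product C_i
        yield i, rotr(x, s), rotr(b, s)


labels = tuple(range(27))  # stand-in for a 27-coordinate element
sched = list(feed_schedule(labels, labels, k=3, d=3))
```

With m = 27, d = 3 and k = 3 this yields three operand pairs, rotated by 18, 9 and 0 positions in that order.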

Proposition 1. For a type-T GNB over any field GF(2^m), the proposed 1-D digit-level parallel-output systolic multiplier requires at most 2⌈√q⌉ clock cycles, where q = ⌈m/d⌉ and d is the chosen digit width. In other words, the latency of the proposed 1-D digit-level systolic multiplier is […].

Proof: The GNB multiplication is divided into q segments. The proposed digit-level systolic structure is shown in Figs. 2 and 3. Suppose we have k PEs and one AC; the GNB multiplication can then also be divided into n partial results, where q = kn: […], where the partial product C_i, with inputs X_i and B_i, is fed into the systolic array of PEs. The whole GNB multiplication takes k+n clock cycles. For a given q = kn, if k (the number of PEs) is small, n (the number of elements fed to the PEs) becomes large, and therefore the latency (k+n) becomes large. To obtain the minimum latency, and good finite-field multiplication performance for large m, we need to minimize k + n = k + q/k. The first derivative must therefore equal 0, that is, 1 - q/k² = 0, which requires k = √q. We therefore choose k = n = ⌈√q⌉, and the latency of the proposed multiplier becomes 2⌈√q⌉ clock cycles. If q is not a perfect square, the computation takes even fewer clock cycles. This completes the proof.
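The minimization step of the proof can be written out as a short worked derivation. It treats k as a real variable and uses only the relation q = kn and the latency k + n stated in the proof; the rounding to integers in the last line is a restatement of the proposition's bound, not an independent result.

```latex
\begin{aligned}
f(k) &= k + n = k + \frac{q}{k}, \qquad q = kn,\\[2pt]
f'(k) &= 1 - \frac{q}{k^{2}} = 0 \;\Longrightarrow\; k = \sqrt{q},
\qquad f''(k) = \frac{2q}{k^{3}} > 0,\\[2pt]
f\!\left(\sqrt{q}\right) &= 2\sqrt{q}
\;\Longrightarrow\; \text{with } k = n = \left\lceil \sqrt{q}\,\right\rceil
\text{ the latency is at most } 2\left\lceil \sqrt{q}\,\right\rceil \text{ clock cycles.}
\end{aligned}
```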

Conclusion 1. According to Proposition 1, if d = 1 and the number of PEs is ⌈√m⌉, then the latency of the proposed 1-D systolic GNB multiplier is at most 2⌈√m⌉ clock cycles.

To make the above discussion of the proposed 1-D systolic GNB multiplier concrete, the following example illustrates the operation of the PEs in different clock cycles.

Example 1. Let […] be two type-6 GNB elements of GF(2^27), […]. Assume the chosen digit width is d = 3. Then we have […]. According to (9), the product C can be expressed as […], where

for i = 0, 1, 2. Table 1 lists the operation of each PE in each clock cycle. Note that the proposed 1-D digit-level systolic GNB multiplier needs 6 clock cycles.
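The cycle count in Example 1 can be reproduced with a small pipeline schedule. This is a sketch, not the patent's Table 1: it assumes k = n = 3 (from m = 27 and d = 3, giving nine digits), one clock cycle per PE stage, one final cycle in the AC, and most-significant-first feeding so that C_2 enters first. The function name `schedule` is hypothetical.

```python
def schedule(k: int, n: int):
    """Return, per clock cycle, which partial product each PE and the AC handle."""
    rows = []
    for t in range(1, k + n + 1):
        row = {}
        for stage in range(k + 1):   # stages 0..k-1 are PEs, stage k is the AC
            step = t - 1 - stage     # index of the product, counted in entry order
            if 0 <= step < n:
                name = f"PE{stage}" if stage < k else "AC"
                row[name] = f"C{n - 1 - step}"  # MSD-first: C_{n-1} enters first
        rows.append(row)
    return rows


table = schedule(k=3, n=3)
for t, row in enumerate(table, start=1):
    print(t, row)
```

The last partial product C_0 reaches the AC in cycle k + n = 6, which matches the six clock cycles stated in the example.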

Figs. 2 and 3 are realized with a 1-D systolic array; the presented GNB multiplier comprises k PEs and one AC circuit. Each PE circuit is built from the structure of Fig. 1 and contains dm AND gates, […] and three m-bit registers. The critical-path delay of each PE unit is […]. The AC circuit contains one m-bit GF(2^m) adder and one m-bit register. For the structures given in Figs. 2 and 3, the latency of the proposed GNB multiplier is […] clock cycles.

B. 2-D digit-level systolic structure

In this section, to obtain a high-performance implementation, we present a 2-D systolic GNB multiplier. For small digit widths (or large field sizes), it achieves higher performance than the 1-D systolic structure of the previous section. Let k and n be two integers satisfying q = k²n. Note that if q is not divisible by k², X and B can be zero-padded so that q = k²n holds. To derive the 2-D digit-level systolic multiplier, we group (6) into n partial results:

Wherein,

X_ij = X >> (k²id + kjd), B_ij = B >> (k²id + kjd)

Each partial result C_ij is the sum of k partial products in (7), and the partial result C_i in (12) is the sum of k partial results C_ij. To compute the partial product C_i in a fully pipelined manner, each partial result C_ij is realized by the 1-D systolic array structure presented in Fig. 3. The proposed 2-D systolic array multiplier that computes C_i is shown in Fig. 4. In Fig. 4, the proposed 2-D systolic multiplier consists of k 1-D systolic arrays, (k-1) cyclic shift circuits, (k-1) AC1 structures and one AC2 structure. Each CS module performs a cyclic right shift by kd bits, which requires only rewiring in a hardware implementation. In the figure, 1-D systolic array [i] realizes the sum of k partial products C_ij.
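The rotation amounts X_ij = X >> (k²id + kjd) can be tabulated to see how the CS modules and the outer loop divide the work. In this sketch the function name `offsets_2d` is hypothetical, and reducing the amount modulo m assumes that the cyclic shift acts on an m-coordinate vector.

```python
def offsets_2d(k: int, n: int, d: int, m: int):
    """offsets[i][j] = (k*k*i*d + k*j*d) mod m, the right rotation used for C_ij."""
    return [[(k * k * i * d + k * j * d) % m for j in range(k)]
            for i in range(n)]


small = offsets_2d(k=2, n=2, d=1, m=7)
large = offsets_2d(k=8, n=7, d=1, m=409)
```

Within one row, neighbouring 1-D arrays differ by kd positions, which is the shift each CS module applies; moving to the next row adds k²d, the shift associated with AC2.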

Proposition 2. Let the field be constructed with a GNB of even type T. The maximum latency required by the proposed 2-D systolic GNB multiplier is 3⌈∛q⌉ clock cycles, where q = ⌈m/d⌉, and the number of PEs is k² with k = ⌈∛q⌉.

Proof: Let k and n be two positive integers satisfying q = k²n, where q = ⌈m/d⌉ and d is the chosen digit width. The GNB multiplication uses k² PEs to build the 2-D systolic array structure. When this circuit computes C_i in (12), it takes 2k clock cycles. Therefore, the GNB multiplication shown in (11) takes 2k+n clock cycles. As in Proposition 1, the first derivative of 2k + n = 2k + q/k² must be zero, that is, 2 - 2q/k³ = 0, which requires k = ∛q. We therefore choose k = n = ⌈∛q⌉, and the latency of the 2-D systolic multiplier is 3⌈∛q⌉ clock cycles. When q is not a perfect cube, the computation needs fewer clock cycles, which proves the proposition.
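The corresponding minimization for the 2-D case can be worked out in the same way as for the 1-D case. It uses only the relation q = k²n and the latency 2k + n stated in the proof, with k treated as a real variable.

```latex
\begin{aligned}
g(k) &= 2k + n = 2k + \frac{q}{k^{2}}, \qquad q = k^{2} n,\\[2pt]
g'(k) &= 2 - \frac{2q}{k^{3}} = 0 \;\Longrightarrow\; k = \sqrt[3]{q},
\qquad n = \frac{q}{k^{2}} = \sqrt[3]{q},\\[2pt]
g\!\left(\sqrt[3]{q}\right) &= 3\sqrt[3]{q}
\;\Longrightarrow\; \text{with } k = n = \left\lceil \sqrt[3]{q}\,\right\rceil
\text{ the latency is at most } 3\left\lceil \sqrt[3]{q}\,\right\rceil \text{ clock cycles.}
\end{aligned}
```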

Conclusion 2. According to Proposition 2, the latency of the proposed 2-D systolic GNB multiplier is at most 3⌈∛q⌉ clock cycles, where q = ⌈m/d⌉.

Fig. 4 is realized with a 2-D systolic array; the proposed 2-D systolic multiplier consists of […] AC1 circuits and one AC2 circuit. With this structure, the minimum latency of the GNB multiplier reaches […] clock cycles. Note that, compared with the 1-D systolic multiplier, the latency is lower when a small digit width (or a large field size) is chosen. For example, with digit width 1, the latency of the 2-D systolic multiplier over GF(2⁴⁰⁹) is 24 clock cycles, the same as the latency of the 1-D systolic multiplier with digit width 3. This means the 2-D systolic multiplier performs better under these conditions.
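The GF(2⁴⁰⁹) comparison can be checked numerically. The sketch below evaluates the latency bounds 2⌈√q⌉ for the 1-D array and 3⌈∛q⌉ for the 2-D array, with q = ⌈m/d⌉; these closed forms are taken from the minimization arguments in the two proofs and are an assumption about the exact expressions. The function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return -(-a // b)


def ceil_root(q: int, r: int) -> int:
    """Smallest integer k with k**r >= q (exact integer arithmetic)."""
    k = 1
    while k ** r < q:
        k += 1
    return k


def latency_bound_1d(m: int, d: int) -> int:
    """Upper bound 2 * ceil(sqrt(q)) on 1-D latency, q = ceil(m / d)."""
    return 2 * ceil_root(ceil_div(m, d), 2)


def latency_bound_2d(m: int, d: int) -> int:
    """Upper bound 3 * ceil(cbrt(q)) on 2-D latency, q = ceil(m / d)."""
    return 3 * ceil_root(ceil_div(m, d), 3)


one_d_409_d3 = latency_bound_1d(409, 3)
two_d_409_d1 = latency_bound_2d(409, 1)
one_d_409_d1 = latency_bound_1d(409, 1)
```

Both `one_d_409_d3` and `two_d_409_d1` evaluate to 24, while the 1-D bound at digit width 1 is 42, which illustrates why the 2-D arrangement pays off for small digit widths.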

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A multiplier processing unit PE for an elliptic curve cryptosystem, characterized in that the multiplier processing unit PE comprises a computing unit, input terminals B_in, C_in and X_in, and output terminals B_out, C_out and X_out; the inputs B_in, C_in and X_in are each fed into the computing unit, and after processing the results are output from the output terminals B_out, C_out and X_out of the computing unit; in the computing unit, B_in and X_in are cyclically shifted left by d bits, namely B_out = B_in << d and X_out = X_in << d; in the computing unit, the value computed from B_in and X_in is added to C_in cyclically shifted right by d bits, namely C_out = (C_in >> d) + L(B_in, X_in), where C_in is the result of the preceding processing unit PE and is initially zero for the first processing unit PE, C_out is the product result computed by this processing unit PE and serves as the input of the next processing unit PE, d denotes the digit width, and L denotes the operation […], where J(X, Y) = X ⊙ P(Y), the input B is loaded into register Y, and P(Y) is used to compute the linear combinations of B.

2. A one-dimensional multiplier, characterized in that the one-dimensional multiplier comprises k multiplier processing units PE according to claim 1 and one accumulation circuit AC; the k processing units PE are connected in series and followed by the accumulation circuit AC; the input of each PE is the computed output of the preceding PE; the three inputs of the first PE are B₀, B₁, ..., B_n-1; 0, 0, ..., 0; and X₀, X₁, ..., X_n-1, where X is obtained by reversing A and then cyclically shifting it right by one position; and the output is computed as:

where A is the multiplicand.

3. The one-dimensional multiplier according to claim 2, characterized in that the accumulation circuit AC comprises an addition unit, a temporary storage unit and a shift unit; the output of the shift unit is connected to the input of the addition unit, the output of the addition unit is connected to the input of the temporary storage unit, and the output of the temporary storage unit is connected to the input of the shift unit; the accumulation circuit shifts the result of the previous pass through the k PEs and adds it to the output of the next pass through the k PEs.

4. A two-dimensional multiplier, characterized in that the two-dimensional multiplier comprises k one-dimensional multipliers according to claim 2 or 3, 2k-2 CS modules, k-1 accumulation circuits AC1 and one accumulation circuit AC2; the k one-dimensional multipliers are connected in parallel; the output of the first one-dimensional multiplier is connected to the shift unit of the first accumulation circuit AC1; the k-1 accumulation circuits AC1 are connected in series, and the (k-1)-th accumulation circuit AC1 is connected in series with the accumulation circuit AC2; the outputs of the second through (k-1)-th one-dimensional multipliers are each connected to one accumulation circuit AC1; the B and X inputs of the second through (k-1)-th one-dimensional multipliers are each connected to one CS module, while the inputs of the first one-dimensional multiplier are fed in directly; and the operation is given by:

5. The two-dimensional multiplier according to claim 4, characterized in that the accumulation circuit AC1 comprises a shift unit and an addition unit; the output of the shift unit is connected to the input of the addition unit; the accumulation circuit AC1 shifts its input and adds it to the output of the connected one-dimensional multiplier; and the shift unit performs a cyclic right shift by kd bits.

6. The two-dimensional multiplier according to claim 5, characterized in that the accumulation circuit AC2 comprises a shift unit, an addition unit and a temporary storage unit; the output of the shift unit is connected to the input of the addition unit, the output of the addition unit is connected to the temporary storage unit, and the output of the temporary storage unit is connected to the input of the addition unit; the shift unit cyclically shifts its input right by k²d bits; and the accumulation circuit AC1 comprises a shift unit and an addition unit, with the output of the shift unit connected to the input of the addition unit.