CN106020949B - Fast parallel computation method for optimal extension field element multiplication

Info

Publication number: CN106020949B
Application number: CN201610305021.3A
Authority: CN
Inventors: 咸鹤群; 程相国; 张曙光; 张曼
Original assignee: Qingdao University
Current assignee: Qingdao University
Priority date: 2016-05-09
Filing date: 2016-05-09
Publication date: 2019-08-06
Anticipated expiration: 2036-05-09
Also published as: CN106020949A

Abstract

The present invention provides a fast parallel computation method for optimal extension field element multiplication, comprising: a first step of designing a dedicated optimal extension field element arithmetic unit Java class for the multiplication of optimal extension field elements; and a second step of designing, in RenderScript, a multiplication kernel function and a degree-reduction kernel function. These two functions are compute kernels that the Java class object invokes concurrently through the RenderScript execution engine; when defined, each function targets a single memory cell or the first address of a batch of memory cells of the same type. The method uses the RenderScript programming interface and parallel processing mechanism on the Android platform to achieve fast parallel polynomial modular multiplication.

Description

Fast parallel computation method for optimal extension field element multiplication

Technical field

The present invention relates to the field of computer technology, and in particular to an optimal extension field multiplication method based on the RenderScript programming framework.

Background art

A finite field F_{p^m} is called an optimal extension field (OEF) if p = 2^n - c, where p is a prime below 2^64 and the integers n and c satisfy log2|c| ≤ n/2, and an irreducible polynomial f(z) = z^m - ω exists. The elements of the field are polynomials of degree at most m-1 whose coefficients are elements of F_p. The basic operations on optimal extension field elements (addition, subtraction, multiplication, squaring and inversion) are polynomial operations modulo f(z), and the coefficient operations are defined in F_p, that is, they are arithmetic modulo p.

When c is 1 or -1, the optimal extension field is said to be of Type I; if ω = 2, it is said to be of Type II. Table 1 lists some common optimal extension field parameters.

Table 1. Examples of OEF parameters

p	f	Parameters	Type
2⁷+3	z¹³-5	n=7, c=-3, m=13, ω=5	-
2¹³-1	z¹³-2	n=13, c=1, m=13, ω=2	I, II
2⁵⁷-13	z³-2	n=57, c=13, m=3, ω=2	II
2³¹-19	z⁶-2	n=31, c=19, m=6, ω=2	II
2⁶¹-1	z³-37	n=61, c=1, m=3, ω=37	I
2³¹-1	z⁶-7	n=31, c=1, m=6, ω=7	I
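The defining conditions can be checked mechanically. The following is a minimal sketch rather than part of the described method: the class `OefParams` and its method names are hypothetical, the check is limited to n ≤ 62 so that everything fits in a signed Java `long`, and primality is tested with `BigInteger.isProbablePrime`, which is probabilistic.

```java
/** Hypothetical helper that checks the OEF conditions on p = 2^n - c. */
final class OefParams {
    /** True if p = 2^n - c is (probably) prime and log2|c| <= n/2, i.e. c^2 <= 2^n. */
    static boolean isValidPrime(int n, long c) {
        // Assumption of this sketch: keep 2^n and c^2 inside a signed 64-bit long.
        if (n < 1 || n > 62) return false;
        if (c == 0 || c < -(1L << 31) || c > (1L << 31)) return false;
        long twoN = 1L << n;
        if (c * c > twoN) return false;          // violates log2|c| <= n/2
        long p = twoN - c;
        return java.math.BigInteger.valueOf(p).isProbablePrime(40);
    }

    /** Type I: c is 1 or -1. */
    static boolean isTypeI(long c) { return c == 1 || c == -1; }

    /** Type II: omega equals 2. */
    static boolean isTypeII(long omega) { return omega == 2; }
}
```

For the second row of Table 1 (n = 13, c = 1, ω = 2), `isValidPrime(13, 1)`, `isTypeI(1)` and `isTypeII(2)` all hold, which matches the "I, II" entry.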

Optimal extension fields can be used to construct elliptic curves. Their advantage is that the prime p can be chosen to fit within a single machine word, while the security of the cryptosystem comes from m and p together. Computing the polynomial coefficients can therefore simplify or avoid problems such as carry handling in big-integer arithmetic, which improves the efficiency of software implementations.

Among the basic operations on optimal extension field elements, polynomial multiplication has relatively high complexity, and multiplication is also the basis of operations such as inversion.

Darrel Hankerson et al. describe the principle of optimal extension field element multiplication in the book "Guide to Elliptic Curve Cryptography". Two optimal extension field elements a and b can be multiplied as ordinary polynomials with coefficients taken modulo p, followed by reduction modulo the irreducible polynomial f(z). The formula is:

$$c(z) = a(z)\,b(z) \equiv \sum_{k=0}^{m-1} c_k z^k \pmod{f(z)}$$

where

$$c_k \equiv \sum_{i=0}^{k} a_i b_{k-i} + \omega \sum_{i=k+1}^{m-1} a_i b_{k+m-i} \pmod{p}$$

This description gives only the principle and does not cover implementation details; it does not consider problems such as overflow and carry during coefficient multiplication and modular reduction.

The Karatsuba-Ofman method uses the divide-and-conquer idea to reduce the number of coefficient multiplications. Each of the polynomials a and b is split into a sum of two polynomials, and the distributive law of multiplication is used to reduce the multiplication count. For example:

$$a(z)\,b(z) = (A_1 z^l + A_0)(B_1 z^l + B_0)$$

$$= A_1 B_1 z^{2l} + [(A_1 + A_0)(B_1 + B_0) - A_1 B_1 - A_0 B_0]\, z^l + A_0 B_0$$

where A₀, A₁, B₀ and B₁ are all polynomials of degree no more than l. The process can be applied recursively, that is, A₀, A₁, B₀ and B₁ can be split again. In the Karatsuba-Ofman method the polynomials are split repeatedly, level by level, and each computation depends on the results of the subsequent splits, so the data of the individual computations are not independent. The method therefore cannot be parallelized and cannot exploit multi-core parallel computing. In addition, the splitting procedure is complex to implement, the splitting strategy directly affects execution efficiency, and the overhead is large for small values of m.
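One level of this split can be sketched as follows and checked against the schoolbook product. This is an illustration only: the class and method names are hypothetical, coefficients are plain `long` values with no reduction modulo p, and each input is assumed to hold exactly 2·l coefficients stored low order first.

```java
/** Hypothetical one-level Karatsuba-Ofman sketch over plain long coefficients. */
final class KaratsubaSketch {
    /** Schoolbook product; coefficients are stored low order first. */
    static long[] schoolbook(long[] a, long[] b) {
        long[] c = new long[a.length + b.length - 1];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b.length; j++)
                c[i + j] += a[i] * b[j];
        return c;
    }

    /** Coefficient-wise sum of two arrays of equal length. */
    static long[] add(long[] x, long[] y) {
        long[] s = new long[x.length];
        for (int i = 0; i < x.length; i++) s[i] = x[i] + y[i];
        return s;
    }

    /** a = A1*z^l + A0 and b = B1*z^l + B0, each with 2*l coefficients. */
    static long[] oneLevel(long[] a, long[] b, int l) {
        long[] a0 = java.util.Arrays.copyOfRange(a, 0, l);
        long[] a1 = java.util.Arrays.copyOfRange(a, l, 2 * l);
        long[] b0 = java.util.Arrays.copyOfRange(b, 0, l);
        long[] b1 = java.util.Arrays.copyOfRange(b, l, 2 * l);
        long[] lo = schoolbook(a0, b0);                     // A0*B0
        long[] hi = schoolbook(a1, b1);                     // A1*B1
        long[] mid = schoolbook(add(a0, a1), add(b0, b1));  // (A1+A0)(B1+B0)
        long[] c = new long[4 * l - 1];
        for (int k = 0; k < lo.length; k++) {
            c[k] += lo[k];                         // A0*B0
            c[k + l] += mid[k] - lo[k] - hi[k];    // middle term, times z^l
            c[k + 2 * l] += hi[k];                 // A1*B1, times z^(2l)
        }
        return c;
    }
}
```

The three sub-products replace the four that a direct expansion would need; the final combination step is where the data dependency on the sub-results appears.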

Optimal extension field element multiplication can be understood as first multiplying the two polynomials as ordinary polynomials and then reducing the result modulo f(z). Because f(z) has the special form f(z) = z^m - ω, every z^m can be replaced by ω when reducing modulo f(z). This avoids polynomial division and allows the degree to be reduced term by term.

For example, consider multiplying two elements a and b in the optimal extension field with irreducible polynomial f(z) = z⁵ - 2 and p = 31, where:

a = z⁴ + 5z² + 3z + 7

b = 9z² + z

The ordinary polynomial product is c = 9z⁶ + z⁵ + 14z⁴ + z³ + 4z² + 7z (coefficients modulo p). Term-by-term degree reduction replaces every z⁵ with 2, that is, c = 9z × z⁵ + z⁵ + 14z⁴ + z³ + 4z² + 7z = 18z + 2 + 14z⁴ + z³ + 4z² + 7z = 14z⁴ + z³ + 4z² + 25z + 2.
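The example can be reproduced with a short serial sketch. The class name `OefWorkedExample` is hypothetical, coefficients are stored low order first, and p is assumed to be below 2^31 so that each coefficient product fits in a Java `long`.

```java
/** Hypothetical serial reference: multiply, then replace z^m by omega term by term. */
final class OefWorkedExample {
    /** a and b hold m coefficients each, low order first; returns a*b mod (z^m - omega), mod p. */
    static long[] multiply(long[] a, long[] b, int m, long omega, long p) {
        long[] c = new long[2 * m - 1];
        // Ordinary polynomial product with coefficients reduced modulo p.
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++)
                c[i + j] = (c[i + j] + a[i] * b[j]) % p;
        // Degree reduction: z^k = omega * z^(k-m) for every k >= m.
        for (int k = 2 * m - 2; k >= m; k--)
            c[k - m] = (c[k - m] + omega * c[k]) % p;
        return java.util.Arrays.copyOf(c, m);
    }
}
```

With a = {7, 3, 5, 0, 1} and b = {0, 1, 9, 0, 0} (low order first), m = 5, ω = 2 and p = 31, the call returns {2, 25, 4, 1, 14}, that is, 14z⁴ + z³ + 4z² + 25z + 2.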

On an elliptic curve constructed over an optimal extension field, a single point doubling or point addition requires up to several dozen optimal extension field element operations. Improving the speed of optimal extension field operations, and of multiplication in particular, is therefore very important for the efficiency of elliptic curve cryptosystems, and a fast and effective method for computing with optimal extension field elements is needed.

Most current optimal extension field computation methods are based on serial algorithms. When these methods are used on the Android platform, they use only the computing resources of the CPU. The CUDA parallel computing framework used on PC platforms cannot be used on Android.

RenderScript is a programming framework on the Android platform for running 3D rendering and compute-intensive tasks, aimed mainly at tasks with data-parallel characteristics. Its operating mechanism parallelizes a computing task and distributes it across all available processing units on the mobile device, such as multi-core CPUs, GPUs or DSPs. Developers using RenderScript can ignore architectural differences between target devices, because RenderScript code is compiled at run time and cached, and it automatically discovers and uses the processor resources on the target device. If the target device has no GPU or DSP, the RenderScript engine hands the computation entirely to the CPU, so the framework offers high device independence and portability. RenderScript can significantly speed up image processing, computer vision and high-performance computing applications, extending the execution and computing capability of Android's native language. RenderScript follows the C99 standard and is a programming framework with C-like syntax. Since Android 4.3, RenderScript has been the only parallel computing programming framework in the Android system.

The usual way of using the RenderScript programming framework on the Android platform can be summarized in the following steps:

First, create a compute kernel file with the suffix .rs in the Android project and store it under the src directory of the project. This file contains the pragma declarations, the corresponding Java class declaration and the definitions of the kernel functions. Kernel functions are the main means of parallelizing a computing task: in an Android application, a kernel function is invoked by an upper-layer Java class object as multiple concurrent instances. Each concurrent function instance performs its computation independently and usually accesses memory cells that are isolated from those of the other instances.

Second, create in the same directory the upper-layer Java class that invokes the kernel functions, so that it is linked to the .rs file from the first step.

Finally, create a RenderScript class object in the Android application, use it to create the upper-layer Java class object described in the second step, allocate resources for that object and initialize it. Data is exchanged and copied between the memory space of the Java program and the memory space of the RenderScript engine by creating and using Allocation class objects. RenderScript and Allocation are both classes predefined in the Android development platform and can be used once the relevant packages are imported with the import statement in the Java program.

The present invention improves on the existing serial algorithms for optimal extension fields by designing a new storage structure and computation method, and uses the RenderScript programming interface and parallel processing mechanism on the Android platform to achieve fast parallel polynomial modular multiplication.

Summary of the invention

The technical problem to be solved by the present invention is to improve on the existing serial algorithms for optimal extension fields by designing a new storage structure and computation method, and to use the RenderScript programming interface and parallel processing mechanism on the Android platform to achieve fast parallel polynomial modular multiplication.

To solve the above technical problem, the present invention provides a fast parallel computation method for optimal extension field element multiplication, comprising:

First step: design a dedicated optimal extension field element arithmetic unit Java class for the multiplication of optimal extension field elements;

Second step: design, in RenderScript, a multiplication kernel function and a degree-reduction kernel function. These two functions are compute kernels that the Java class object invokes concurrently through the RenderScript execution engine; when defined, each function targets a single memory cell or the first address of a batch of memory cells of the same type.

The first step further comprises:

Step a: design a dedicated Java class for optimal extension field element operations and define three one-dimensional array member variables, two of which store the two polynomials participating in the multiplication and the third of which stores the multiplication result;

Step b: define a constructor for the optimal extension field element arithmetic unit class and initialize each member variable in the constructor;

Step c: define a polynomial multiplication method for the optimal extension field element arithmetic unit class; using the memory management interface bind provided by the RenderScript programming framework, pass the Allocation class objects corresponding to the two multipliers to the RenderScript execution engine, where they are stored in two array-type variables in the memory space of the execution engine.

The second step specifically further comprises:

Step a': design the multiplication kernel function in RenderScript with two parameters: one is an element of the array stored in an Allocation object; the other is the offset of the parallel invocation, assigned automatically by the RenderScript execution engine. The offset value differs for every invoked kernel function instance; the kernel function uses it to determine the position of its cell in the result array, and hence how the value of that cell is computed;

Step b': design the degree-reduction kernel function in RenderScript with two parameters: one is an element of the array stored in an Allocation object; the other is the offset of the parallel invocation, assigned automatically by the RenderScript execution engine;

Step c': design, within the multiplication kernel function, a modulo-p multiplication mechanism that avoids overflow handling.

Step a further specifically comprises: defining three Allocation class objects as class member variables for transferring data to and from the RenderScript engine, and defining three long-integer class member variables to store the optimal extension field parameters m, ω and p.

Step b further specifically comprises: according to the value of the parameter m, creating two Allocation class objects of equal size, and creating a third Allocation class object to store the calculation result.

Step c further specifically comprises: invoking the multiplication kernel function in RenderScript for parallel computation, then invoking the degree-reduction kernel function in RenderScript to complete the reduction modulo the irreducible polynomial. Finally, the memory management interface copyTo provided by the RenderScript programming framework copies the result stored in the third Allocation class object back to the member variable of the Java class.

In step a', the multiplication kernel function uses the offset to determine which term coefficients to read from the two multiplier polynomials, multiplies them modulo p and adds the products modulo p to obtain the value of the cell.

In step b', during execution of the degree-reduction kernel function, the position of the cell in the result array is determined from the offset value, which in turn determines whether the degree of the corresponding term exceeds m. For a term whose degree does not exceed m, the function finds the array element whose term has the same degree as this one after degree reduction, adds it to this cell and stores the sum in this cell.

In step c', when computing the product of two numbers s and r modulo p, the multiplier s is written in binary form. An accumulator variable t is initialized to 0. The binary string of s is traversed bit by bit from right to left; at each bit, if the bit is 1, r is added to t and t is reduced modulo p, and then r is set to r + r modulo p. After the traversal completes, t holds the product of s and r modulo p. If the machine word length is w, then as long as p is no greater than 2^(w-1) this process cannot overflow, that is, no intermediate result exceeds the range of one machine word.

Beneficial effects of the present invention:

The parallel optimal extension field element multiplication method based on the RenderScript programming framework provided by the invention (1) performs optimal extension field element multiplication quickly, and (2) can be implemented on any Android device.

Brief description of the drawings

Fig. 1 is a schematic diagram of the data cells in the parallel multiplication;

Fig. 2 is a schematic diagram of the data cells of the degree-reduction kernel function.

Specific embodiments

The present invention provides a fast parallel computation method for optimal extension field element multiplication, comprising:

The first step further comprises:

Step c: define a polynomial multiplication method for the optimal extension field element arithmetic unit class; using the memory management interface bind provided by the RenderScript programming framework, pass the Allocation class objects corresponding to the two multipliers to the RenderScript execution engine, where they are stored in two array-type variables in the memory space of the execution engine;

For example, for the multiplication result polynomial c = 9z⁶ + z⁵ + 14z⁴ + z³ + 4z² + 7z above, the coefficient array stored in the Allocation object holds, from low order to high order, the values {0, 7, 4, 1, 14, 1, 9}. Each element is handled separately by one degree-reduction kernel function instance. Because the parameters of the optimal extension field are f(z) = z⁵ - 2 and p = 31, only the elements 0, 7, 4, 1 and 14 (the terms of degree below 5) are updated by the function. If a degree-reduction kernel function instance is invoked with offset parameter 1, it handles the term 7z. Following the rule above, it finds the coefficient 9 of the term 9z⁶ in the coefficient array stored in the Allocation object, multiplies it by ω = 2 modulo p, adds the product to its own coefficient modulo p and writes the result to the current element, giving the coefficient 25 of the term 25z.

In step c', the multiplication kernel function frequently needs to multiply polynomial coefficients modulo p. Although p can be chosen to fit within one machine word (for example 64 bits), the product of two elements of F_p may well exceed the range of one word, so a plain multiplication cannot be used for the coefficient products. The present invention solves this problem as follows: when computing the product of two numbers s and r modulo p, the multiplier s is written in binary form. An accumulator variable t is initialized to 0. The binary string of s is traversed bit by bit from right to left; at each bit, if the bit is 1, r is added to t and t is reduced modulo p, and then r is set to r + r modulo p. After the traversal completes, t holds the product of s and r modulo p. If the machine word length is w, then as long as p is no greater than 2^(w-1) this process cannot overflow, that is, no intermediate result exceeds the range of one machine word.
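The no-overflow condition can be verified directly. The short derivation below is a sketch that assumes unsigned w-bit words and that t and r are always kept reduced modulo p:

$$
\begin{aligned}
0 \le t,\, r \le p - 1,\quad p \le 2^{w-1}
\;&\Longrightarrow\; t + r \le 2(p - 1) < 2^{w},\\
\;&\Longrightarrow\; r + r \le 2(p - 1) < 2^{w}.
\end{aligned}
$$

Every intermediate sum therefore fits in one w-bit word before it is reduced, and because each sum is below 2p, a single conditional subtraction of p is enough to reduce it.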

The present invention provides a new computation method for optimal extension field element multiplication on the Android platform and significantly improves computing performance through parallelism.

Carry and overflow of intermediate results are avoided when multiplying coefficients modulo p, which simplifies the calculation and saves computing time.

Compared with optimal extension field multiplication implemented with serial methods on the Android platform, tests show that the parallel method used in the present invention has a clear advantage.

The present invention is described in detail below with reference to the drawings and preferred embodiments, so that how the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and implemented.

In the fast parallel computation method for optimal extension field element multiplication provided by the invention, Step 1 is to design a dedicated optimal extension field element arithmetic unit Java class for the multiplication of optimal extension field elements. The class definition file uses import statements to import android.renderscript.Allocation, android.renderscript.Element and android.renderscript.RenderScript so that the built-in RenderScript objects can be used. The arithmetic unit Java class is defined as follows:

1.1) Define member array variables a and b in the optimal extension field element arithmetic unit Java class to store the two optimal extension field elements participating in the multiplication, and a member array variable c to store the multiplication result. Each element of the three arrays is a 64-bit integer. In addition, define three integer member variables that record how much of each array is occupied, that is, the largest index in use. Define three long-integer class member variables to store the optimal extension field parameters m, ω and p.

For an optimal extension field element, the polynomial coefficients are stored in the array in order from low degree to high degree. For example:

The polynomial 9x⁶ + 5x⁴ + 6x + 3 is stored in the array as a[] = {3, 6, 0, 0, 5, 0, 9}

Three member variables of the Allocation object type are defined in the optimal extension field element arithmetic unit Java class; they are used to transfer the multipliers and the result data to and from the RenderScript computing engine. The optimal extension field parameters m, ω and p must be specified when an arithmetic unit class object is created.

1.2) Define the optimal extension field element multiplication method: using the memory management interface bind provided by the RenderScript programming framework, assign the contents of the Allocation class objects corresponding to the two multipliers to two array variables in the memory space of the RenderScript kernel functions.

The multiplication kernel function is invoked concurrently through the forEach_functionName interface of the ScriptC_mono class object, where functionName is the name of the multiplication kernel function. The third Allocation class object is passed as the argument of this invocation, so that each of its elements becomes the independent unit of computation of one concurrent function instance. When the computation finishes, the third Allocation class object holds the result of the ordinary polynomial multiplication.

The degree-reduction kernel function is invoked concurrently through the forEach_functionName interface of the ScriptC_mono class object, where functionName is the name of the degree-reduction kernel function. The third Allocation class object is again passed as the argument of this invocation, so that each of its elements becomes the independent unit of computation of one concurrent function instance. When the computation finishes, the first m elements of the array of the third Allocation class object hold the result of the multiplication modulo f(z). The result is then stored in the member array variable c and returned.

Invoking the kernel functions through forEach_functionName allows multiple kernel function instances to be invoked concurrently, which improves the execution efficiency of the algorithm.
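The two-launch structure can be illustrated off-device with a plain-Java stand-in. In this sketch a parallel `IntStream` takes the place of each RenderScript `forEach_` launch and an ordinary array takes the place of the third Allocation; it does not use the RenderScript API. The class name is hypothetical, and p is assumed to be below 2^31 so that coefficient products fit in a `long`.

```java
/** Plain-Java stand-in for the arithmetic unit; parallel streams replace RenderScript launches. */
final class OefArithmeticUnit {
    private final int m;        // extension degree
    private final long omega;   // f(z) = z^m - omega
    private final long p;       // coefficient modulus, assumed below 2^31 in this sketch

    OefArithmeticUnit(int m, long omega, long p) {
        this.m = m;
        this.omega = omega;
        this.p = p;
    }

    /** Returns a*b mod f(z) with coefficients mod p; inputs hold m coefficients, low order first. */
    long[] multiply(long[] a, long[] b) {
        long[] c = new long[2 * m - 1];   // plays the role of the third Allocation
        // Launch 1: one independent task per result cell (multiplication kernel stand-in).
        java.util.stream.IntStream.range(0, c.length).parallel().forEach(x -> {
            long sum = 0;
            for (int i = Math.max(0, x - m + 1); i <= Math.min(x, m - 1); i++) {
                sum = (sum + a[i] * b[x - i]) % p;
            }
            c[x] = sum;
        });
        // Launch 2: one independent task per low-order cell (degree-reduction kernel stand-in).
        // Cells x + m >= m are only read here, never written, so the tasks do not conflict.
        java.util.stream.IntStream.range(0, m - 1).parallel().forEach(x ->
            c[x] = (c[x] + omega * c[x + m]) % p);
        return java.util.Arrays.copyOf(c, m);   // first m cells hold the reduced product
    }
}
```

The second launch starts only after the first has finished, mirroring the order in which the two kernel functions are invoked.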

Step 2: design the multiplication kernel function and the degree-reduction kernel function in RenderScript. These two functions are compute kernels that the Java class object invokes concurrently through the RenderScript execution engine; when defined, each function targets a single memory cell or the first address of a batch of memory cells of the same type.

2.1) Multiplication kernel function

The multiplication kernel function in RenderScript has two parameters. One is an element of the array held by an Allocation object; the other is the offset of the parallel invocation, that is, the offset within the whole array of the array element indicated by the first parameter. Because the kernel function is invoked concurrently by the RenderScript execution engine, the first parameter is obtained from the memory passed through the Allocation object, and the second parameter is assigned automatically by the RenderScript execution engine at the time of the concurrent invocation.

The multiplication kernel function executes as follows: for offset x, find in the two multiplier arrays all element pairs <A[i], B[j]> whose indices sum to x, multiply A[i] and B[j] of each pair modulo p, add the products modulo p, and store the result in the current element, that is, C[x]. Fig. 1 is a schematic diagram of the data cells in the parallel multiplication of the invention.
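The index pairs visited by one kernel instance can be made explicit. In the sketch below the class `MultiplyKernelIndex` and its methods are hypothetical; both multipliers are assumed to hold m coefficients stored low order first, and p is assumed to be below 2^31 so that a plain product fits in a `long`.

```java
/** Hypothetical illustration of the work done by one multiplication kernel instance. */
final class MultiplyKernelIndex {
    /** All pairs (i, j) with i + j == x and 0 <= i, j < m, for 0 <= x <= 2m - 2. */
    static int[][] pairs(int x, int m) {
        int lo = Math.max(0, x - (m - 1));
        int hi = Math.min(x, m - 1);
        int[][] out = new int[hi - lo + 1][];
        for (int i = lo; i <= hi; i++) out[i - lo] = new int[]{i, x - i};
        return out;
    }

    /** Value of result cell C[x]: sum over the pairs of A[i]*B[j], everything modulo p. */
    static long cell(long[] a, long[] b, int x, long p) {
        long sum = 0;
        for (int[] ij : pairs(x, a.length)) {
            sum = (sum + a[ij[0]] * b[ij[1]]) % p;
        }
        return sum;
    }
}
```

For m = 5 and x = 6 the pairs are (2, 4), (3, 3) and (4, 2). Each instance only reads A and B and writes its own C[x], which is why the instances can run concurrently.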

The modulo-p multiplication is carried out as follows:

To compute A[i] * B[j] mod p, B[j] is written in binary form (b_{w-1}, b_{w-2}, ..., b_2, b_1, b_0), where w is the word length. To prevent overflow during the multiplication, p is chosen to be no greater than 2^(w-1). An accumulator variable t is initialized to 0. If the lowest bit of B[j] is 1, A[i] is added to t modulo p and the result is saved in t. A[i] is then added to itself modulo p and stored back in A[i], B[j] is shifted right by one bit, and the process repeats until B[j] equals zero.
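This shift-and-add procedure can be sketched in Java as follows. The class and method names are hypothetical. Because a Java `long` is signed, the usable word length in this sketch is 63 bits, so the bound becomes p ≤ 2^62; both operands are assumed to be already reduced modulo p.

```java
/** Hypothetical sketch of overflow-free modular multiplication by shift-and-add. */
final class ModMul {
    /** Returns (s * r) mod p for 0 <= s, r < p and p <= 2^62. */
    static long mulMod(long s, long r, long p) {
        long t = 0;                       // accumulator
        while (s != 0) {
            if ((s & 1L) == 1L) {         // lowest bit of s is 1
                t += r;                   // t + r < 2p <= 2^63, so no overflow
                if (t >= p) t -= p;       // t = (t + r) mod p
            }
            r += r;                       // r + r < 2p <= 2^63, so no overflow
            if (r >= p) r -= p;           // r = (r + r) mod p
            s >>>= 1;                     // move to the next bit of s
        }
        return t;
    }
}
```

For example, `mulMod(9, 2, 31)` gives 18, the product ω × 9 that appears in the degree-reduction example. The loop runs once per bit of s, so the cost is at most w additions and w doublings instead of one wide multiplication.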

2.2) Degree-reduction kernel function

This kernel function in RenderScript has two parameters. One is an element of the result array C[] of the multiplication kernel function, stored in an Allocation object; the other is the offset of the parallel invocation, assigned automatically by the RenderScript execution engine. If the offset value x is less than the optimal extension field parameter m, the value of element C[x+m] is multiplied by the optimal extension field parameter ω modulo p, the product is added modulo p to the value of the current element C[x], and the result is stored in C[x].

Fig. 2 is a schematic diagram of the data cells on which the degree-reduction kernel function operates.

Both the degree-reduction kernel function and the multiplication kernel function operate on a single data cell, and the computation for that cell is completely independent of the computation for any other data cell. Multiple function instances are scheduled in parallel. When the concurrently scheduled function instances access shared data regions, they do so in read-only mode, so no conflicts or data inconsistencies arise.
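A single degree-reduction instance can be sketched as below. The class and method names are hypothetical; the bounds check on x + m is an added assumption for a product array of length 2m - 1, in which the cell at index m - 1 has no higher partner; p is assumed to be below 2^31 so that ω × C[x+m] fits in a `long`.

```java
/** Hypothetical illustration of the work done by one degree-reduction kernel instance. */
final class ReductionKernelSketch {
    /** In place: C[x] = (C[x] + omega * C[x + m]) mod p, when x < m and C[x + m] exists. */
    static void reduceCell(long[] c, int x, int m, long omega, long p) {
        if (x < m && x + m < c.length) {
            c[x] = (c[x] + (omega * c[x + m]) % p) % p;
        }
    }
}
```

Applied to the product array {0, 7, 4, 1, 14, 1, 9} (low order first) with m = 5, ω = 2 and p = 31, the instance with offset 1 updates C[1] from 7 to 25, and the instance with offset 0 updates C[0] from 0 to 2. Instances write only cells below m and read only cells at or above m, so they do not interfere with one another.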

Compared with optimal extension field multiplication implemented with serial methods on the Android platform, tests show that the parallel method used in the present invention has a clear advantage. Some measured data for mobile phones of different brands are shown in Table 2 and Table 3:

Table 2. Test data for different mobile phones

Table 3. Mobile phone configurations

The above is primarily one implementation of this intellectual property and does not limit the implementation of this new product and/or new method in other forms. Those skilled in the art may use this information to modify the above content and achieve similar results. However, all modifications or variations based on the new product of the present invention fall within the reserved rights.

The above is only a preferred embodiment of the present invention and does not limit the invention to other forms. Any person skilled in the art may use the technical content disclosed above to change or modify it into an equivalent embodiment. However, any simple modification, equivalent change or adaptation made to the above embodiments according to the technical essence of the present invention, without departing from its technical solution, still falls within the protection scope of the technical solution of the present invention.

Claims

1. A fast parallel computation method for optimal extension field element multiplication, characterized by comprising:

Second step: designing, in RenderScript, a multiplication kernel function and a degree-reduction kernel function, the two functions being compute kernels invoked concurrently by a Java class object through the RenderScript execution engine, each function being defined to target a single memory cell or the first address of a batch of memory cells of the same type;

The first step further comprises:

Step a: designing a dedicated Java class for optimal extension field element operations and defining three one-dimensional array member variables, two of which store the two polynomials participating in the multiplication and the third of which stores the multiplication result;

Step b: defining a constructor for the optimal extension field element arithmetic unit class and initializing each member variable in the constructor;

Step c: defining a polynomial multiplication method for the optimal extension field element arithmetic unit class and, using the memory management interface bind provided by the RenderScript programming framework, passing the Allocation class objects corresponding to the two multipliers to the RenderScript execution engine, where they are stored in two array-type variables in the memory space of the execution engine;

The second step further comprises:

2. The fast parallel computation method for optimal extension field element multiplication according to claim 1, characterized in that step a further specifically comprises: defining three Allocation class objects as class member variables for transferring data to and from the RenderScript engine, and defining three long-integer class member variables to store the optimal extension field parameters m, ω and p.

3. The fast parallel computation method for optimal extension field element multiplication according to claim 2, characterized in that step b further specifically comprises: according to the value of the parameter m, creating two Allocation class objects of equal size, and creating a third Allocation class object to store the calculation result.

4. The fast parallel computation method for optimal extension field element multiplication according to claim 1, characterized in that step c further specifically comprises: invoking the multiplication kernel function in RenderScript for parallel computation, then invoking the degree-reduction kernel function in RenderScript to complete the reduction modulo the irreducible polynomial, and finally using the memory management interface copyTo provided by the RenderScript programming framework to copy the result stored in the third Allocation class object back to the member variable of the Java class.

5. The fast parallel computation method for optimal extension field element multiplication according to claim 1, characterized in that, in step a', the multiplication kernel function uses the offset to determine which term coefficients to read from the two multiplier polynomials, multiplies them modulo p and adds the products modulo p to obtain the value of the cell.

6. The fast parallel computation method for optimal extension field element multiplication according to claim 2, characterized in that, in step b', during execution of the degree-reduction kernel function, the position of the cell in the result array is determined from the offset value, which in turn determines whether the degree of the corresponding term exceeds m; for a term whose degree does not exceed m, the function finds the array element whose term has the same degree as this one after degree reduction, adds it to this cell and stores the sum in this cell.

7. The fast parallel computation method for optimal extension field element multiplication according to claim 1, characterized in that, in step c', when computing the product of two numbers s and r modulo p, the multiplier s is written in binary form; an accumulator variable t is initialized to 0; the binary string of s is traversed bit by bit from right to left, and at each bit, if the bit is 1, r is added to t and t is reduced modulo p, and then r is set to r + r modulo p; after the traversal completes, t holds the product of s and r modulo p; if the machine word length is w, then as long as p is no greater than 2^(w-1) this process cannot overflow, that is, no intermediate result exceeds the range of one machine word.