CN106020949A - Fast parallel calculation method of optimal extension field element multiplication
- Publication number: CN106020949A
- Application number: CN201610305021.3A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/542: Event management; Broadcasting; Multicasting; Notifications
Abstract
The invention provides a fast parallel method for multiplying elements of an optimal extension field. The method comprises two steps: first, designing a dedicated optimal extension field element arithmetic-unit Java class for the multiplication of optimal extension field elements; second, designing a multiplication kernel function and a degree-reduction kernel function in RenderScript. These two functions are the compute kernels that the Java class object invokes concurrently through the RenderScript execution engine, and each is defined to operate on the first address of a single memory unit or of a batch of memory units with homogeneous characteristics. The method uses the RenderScript programming interface and the parallel processing mechanism of the Android platform to implement fast parallel polynomial modular multiplication.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to an optimal extension field multiplication method based on the RenderScript programming framework.
Background art
A finite field $F_{p^m}$ is called an optimal extension field (OEF) if $p = 2^n - c$ (where $p$ is a prime smaller than $2^{64}$ and the integers $n$ and $c$ satisfy $\log_2 |c| \le n/2$) and there exists an irreducible polynomial $f(z) = z^m - \omega$. The elements of this field are polynomials of degree at most $m-1$ whose coefficients are elements of $F_p$. The basic operations on OEF elements (addition, subtraction, multiplication, squaring and inversion) are polynomial operations modulo $f(z)$, and the coefficient operations are defined in $F_p$, that is, they are arithmetic modulo $p$.
An optimal extension field is said to be of Type I when $c$ is 1 or -1, and of Type II when $\omega = 2$. Table 1 lists some commonly used OEF parameters.
Table 1. Examples of OEF parameters.
p | f | parameters | Type |
$2^{7}+3$ | $z^{13}-5$ | n = 7, c = -3, m = 13, ω = 5 | - |
$2^{13}-1$ | $z^{13}-2$ | n = 13, c = 1, m = 13, ω = 2 | I, II |
$2^{57}-13$ | $z^{3}-2$ | n = 57, c = 13, m = 3, ω = 2 | II |
$2^{31}-19$ | $z^{6}-2$ | n = 31, c = 19, m = 6, ω = 2 | II |
$2^{61}-1$ | $z^{3}-37$ | n = 61, c = 1, m = 3, ω = 37 | I |
$2^{31}-1$ | $z^{6}-7$ | n = 31, c = 1, m = 6, ω = 7 | I |
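The conditions on p, n and c and the Type I / Type II labels can be checked mechanically. The following is a minimal sketch, not part of the patent: the class and method names are hypothetical, primality is tested probabilistically, and the irreducibility of $f(z) = z^m - \omega$ is not checked at all.

```java
import java.math.BigInteger;

/** Hypothetical helper: checks the OEF conditions on p = 2^n - c and labels the type. */
final class OefParams {
    /** True if c != 0, log2|c| <= n/2 and p = 2^n - c is (probably) prime. */
    static boolean isValid(int n, long c) {
        if (c == 0) {
            return false;
        }
        BigInteger twoToN = BigInteger.ONE.shiftLeft(n);
        BigInteger p = twoToN.subtract(BigInteger.valueOf(c));
        // log2|c| <= n/2 is equivalent to c^2 <= 2^n, which avoids floating point.
        boolean smallC = BigInteger.valueOf(c).pow(2).compareTo(twoToN) <= 0;
        return smallC && p.isProbablePrime(40);
    }

    /** Returns "I", "II", "I, II" or "-" following the definitions in the text. */
    static String type(long c, long omega) {
        boolean typeI = (c == 1 || c == -1);
        boolean typeII = (omega == 2);
        if (typeI && typeII) return "I, II";
        if (typeI) return "I";
        if (typeII) return "II";
        return "-";
    }
}
```

For instance, `OefParams.isValid(7, -3)` accepts the first row of Table 1 (p = 131), and `OefParams.type(1, 2)` returns the label of the second row.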
Optimal extension fields can be used to construct elliptic curves. Their advantage is that the prime $p$ can be chosen to fit within one machine word, while the security of the cryptosystem is obtained from the combined effect of $m$ and $p$. Computations on the polynomial coefficients can then simplify or avoid problems such as carry handling in big-integer arithmetic, which improves the efficiency of software implementations.
Among the basic operations on OEF elements, polynomial multiplication has relatively high complexity, and it is also the basis for other operations such as inversion.
In the book "Guide to Elliptic Curve Cryptography", Darrel Hankerson et al. describe the principle of OEF element multiplication. The product of two OEF elements $a$ and $b$ can be computed by ordinary polynomial multiplication with coefficients reduced modulo $p$, followed by reduction modulo the irreducible polynomial $f(z)$. The formula is:
$$c(z) = a(z)\,b(z) = \left(\sum_{i=0}^{m-1} a_i z^i\right)\left(\sum_{j=0}^{m-1} b_j z^j\right) \equiv \sum_{k=0}^{m-1} c_k z^k \pmod{f(z)}$$
where
$$c_k \equiv \sum_{i=0}^{k} a_i b_{k-i} + \omega \sum_{i=k+1}^{m-1} a_i b_{k+m-i} \pmod{p}.$$
That description gives only the principle and no implementation details, and it does not address problems such as overflow and carry during coefficient multiplication and modular reduction.
The Karatsuba-Ofman method uses a divide-and-conquer approach to reduce the number of coefficient multiplications. Each of the polynomials $a$ and $b$ is split into the sum of two polynomials, and the distributive law is used to reduce the number of multiplications. For example:
$$a(z)\,b(z) = (A_1 z^l + A_0)(B_1 z^l + B_0) = A_1 B_1 z^{2l} + [(A_1 + A_0)(B_1 + B_0) - A_1 B_1 - A_0 B_0]\, z^l + A_0 B_0$$
where $l = \lceil m/2 \rceil$ and $A_0$, $A_1$, $B_0$ and $B_1$ are all polynomials of degree less than $l$. This process can be applied recursively, decomposing $A_0$, $A_1$, $B_0$ and $B_1$ again. In the Karatsuba-Ofman method the polynomials are decomposed repeatedly, level by level, and each computation depends on the results of the subsequent decompositions. The data of the individual computation steps are therefore not independent, so the method cannot be parallelized and cannot exploit multi-core parallel computing. In addition, the polynomial splitting is complicated to implement, the way of splitting directly affects execution efficiency, and for small values of $m$ the overhead is large.
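To make the identity concrete, one level of the Karatsuba-Ofman split can be written out and compared against the schoolbook product. This is only an illustrative sketch under stated assumptions: the class and method names are hypothetical, arrays hold coefficients with index = degree, both inputs have the same even length 2l, and p is assumed to be below $2^{31}$ so that plain `long` products cannot overflow.

```java
import java.util.Arrays;

/** Hypothetical sketch of a single Karatsuba-Ofman level over F_p (p < 2^31 assumed). */
final class KaratsubaSketch {
    /** Schoolbook product of two coefficient arrays, coefficients reduced mod p. */
    static long[] schoolbook(long[] x, long[] y, long p) {
        long[] out = new long[x.length + y.length - 1];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < y.length; j++) {
                out[i + j] = (out[i + j] + x[i] * y[j]) % p;
            }
        }
        return out;
    }

    /** One split a = A1*z^l + A0, b = B1*z^l + B0, using three half-size products. */
    static long[] karatsubaOnce(long[] a, long[] b, long p) {
        int l = a.length / 2;
        long[] a0 = Arrays.copyOfRange(a, 0, l), a1 = Arrays.copyOfRange(a, l, 2 * l);
        long[] b0 = Arrays.copyOfRange(b, 0, l), b1 = Arrays.copyOfRange(b, l, 2 * l);
        long[] sumA = new long[l], sumB = new long[l];
        for (int i = 0; i < l; i++) {
            sumA[i] = (a0[i] + a1[i]) % p;
            sumB[i] = (b0[i] + b1[i]) % p;
        }
        long[] low = schoolbook(a0, b0, p);      // A0 * B0
        long[] high = schoolbook(a1, b1, p);     // A1 * B1
        long[] mid = schoolbook(sumA, sumB, p);  // (A1 + A0) * (B1 + B0)
        long[] out = new long[4 * l - 1];
        for (int i = 0; i < 2 * l - 1; i++) {
            // Coefficient of the z^l block: mid - low - high, brought back into [0, p).
            long cross = ((mid[i] - low[i] - high[i]) % p + p) % p;
            out[i] = (out[i] + low[i]) % p;
            out[i + l] = (out[i + l] + cross) % p;
            out[i + 2 * l] = (out[i + 2 * l] + high[i]) % p;
        }
        return out;
    }
}
```

The three half-size products depend on the split halves, which is the data dependency the text refers to when the split is applied recursively.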
The multiplication of OEF elements can be understood as first multiplying the two polynomials in the ordinary way and then reducing the result modulo $f(z)$. Because $f(z)$ has the special form $f(z) = z^m - \omega$, every $z^m$ can be replaced by $\omega$ when reducing modulo $f(z)$. This avoids polynomial division and allows the degree to be reduced term by term.
For example, in the optimal extension field with irreducible polynomial $f(z) = z^5 - 2$ and $p = 31$, consider the product of two elements $a$ and $b$, where:
$a = z^4 + 5z^2 + 3z + 7$
$b = 9z^2 + z$
The ordinary polynomial product is $c = 9z^6 + z^5 + 14z^4 + z^3 + 4z^2 + 7z$ (coefficients modulo $p$). Term-by-term degree reduction replaces every $z^5$ by 2, that is, $c = 9z \cdot z^5 + z^5 + 14z^4 + z^3 + 4z^2 + 7z = 18z + 2 + 14z^4 + z^3 + 4z^2 + 7z = 14z^4 + z^3 + 4z^2 + 25z + 2$.
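The two phases of this worked example (ordinary product, then replacing $z^m$ by $\omega$) can be reproduced with a short serial reference. The sketch below is an assumption-laden illustration rather than the patent's implementation: the class name is hypothetical, coefficient arrays use index = degree, and p is assumed to be below $2^{31}$ so that plain `long` arithmetic is safe.

```java
/** Hypothetical serial reference for multiplication modulo f(z) = z^m - omega over F_p. */
final class OefSerialExample {
    /** a and b each hold m coefficients (index = degree); returns the m result coefficients. */
    static long[] multiply(long[] a, long[] b, int m, long omega, long p) {
        // Phase 1: ordinary polynomial product, coefficients reduced mod p.
        long[] c = new long[2 * m - 1];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++) {
                c[i + j] = (c[i + j] + a[i] * b[j]) % p;
            }
        }
        // Phase 2: term-by-term degree reduction, z^(k+m) -> omega * z^k.
        long[] r = new long[m];
        for (int k = 0; k < m; k++) {
            long high = (k + m < c.length) ? c[k + m] : 0;
            r[k] = (c[k] + omega * high) % p;
        }
        return r;
    }
}
```

Called with `a = {7, 3, 5, 0, 1}`, `b = {0, 1, 9, 0, 0}`, `m = 5`, `omega = 2` and `p = 31`, the sketch returns `{2, 25, 4, 1, 14}`, that is, $14z^4 + z^3 + 4z^2 + 25z + 2$.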
On an elliptic curve constructed over an optimal extension field, a single point doubling or point addition requires up to several tens of OEF element operations. Improving the speed of OEF operations, and of multiplication in particular, is therefore very important for the efficiency of elliptic curve cryptosystems, and a fast and effective method for OEF element computation is needed.
Most current OEF computation methods are based on serial algorithms. When these methods are used on the Android platform, they use only the computing resources of the CPU. The CUDA parallel computing architecture used on the PC platform cannot be used on Android systems.
RenderScript is a programming framework on the Android platform for 3D rendering and compute-intensive tasks, aimed mainly at computing tasks with data-parallel characteristics. Its operating mechanism parallelizes a computing task and distributes it across all available processor units of the mobile device, such as multi-core CPUs, GPUs or DSPs. A developer using the RenderScript framework can also ignore architectural differences between target devices, because RenderScript code is compiled at run time and cached, and automatically discovers and uses the various processor resources on the target device. If the target device has no GPU or DSP, the RenderScript engine hands the computing task entirely to the CPU. The RenderScript framework therefore offers a high degree of device independence and portability.
RenderScript can significantly speed up image processing, computer vision and high-performance computing applications, and it extends the execution and computing capabilities of Android's native language. RenderScript follows the C99 standard and is a programming framework with C-like syntax. Starting with Android 4.3, RenderScript became the only parallel computing programming framework in the Android system.
The usual way of using the RenderScript programming framework on the Android platform can be summarized in the following steps:
First, in the Android project, create a compute kernel file with the suffix .rs and store it under the project's src directory. This file contains the pragma statements, the declaration of the corresponding Java class and the definition of the kernel function. The kernel function is the main means of parallelizing the computing task: in an Android application it is called by an upper-layer Java class object as many concurrent instances. Each concurrent function instance performs its computation independently and usually accesses memory units that are isolated from one another.
Second, create in the same directory the upper-layer Java class that calls the kernel function, so as to link it with the .rs file from the first step.
Finally, create a RenderScript class object in the Android application, then use this object to create the upper-layer Java class object described in the second step, allocate resources for it and initialize it. Data is exchanged and copied between the memory space of the Java program and the memory space of the RenderScript engine by creating and using Allocation class objects. The RenderScript class and the Allocation class are both predefined classes of the Android development platform and can be used in a Java program once the relevant packages are brought in with the import statement.
The present invention improves on the existing serial algorithms for optimal extension fields by designing a new storage structure and computation method that use the RenderScript programming interface and the parallel processing mechanism of the Android platform to implement fast parallel polynomial modular multiplication.
Summary of the invention
The technical problem to be solved is to improve on the existing serial algorithms for optimal extension fields by designing a new storage structure and computation method that use the RenderScript programming interface and the parallel processing mechanism of the Android platform to implement fast parallel polynomial modular multiplication.
To solve this technical problem, the invention provides a fast parallel method for OEF element multiplication, comprising:
Step 1: design a dedicated OEF element arithmetic-unit Java class for the multiplication of OEF elements;
Step 2: design a multiplication kernel function and a degree-reduction kernel function in RenderScript. These two functions are the compute kernels that the Java class object invokes concurrently through the RenderScript execution engine, and each is defined to operate on the first address of a single memory unit or of a batch of memory units with homogeneous characteristics.
Step 1 further comprises:
Step a: design a dedicated Java class for OEF element arithmetic and define three one-dimensional array member variables, two of which store the two polynomials taking part in the operation (the multiplicands) and the third of which stores the (multiplication) result;
Step b: define a constructor for the OEF element arithmetic-unit class and initialize each member variable in it;
Step c: define a polynomial multiplication method for the OEF element arithmetic-unit class which, using the memory management interface bind provided by the RenderScript framework, passes the Allocation class objects of the two multiplicands to the RenderScript execution engine, where they are held in two array variables in the engine's memory space.
Step 2 further comprises:
Step a': design the multiplication kernel function in RenderScript with two parameters. One is an element of the array stored in an Allocation object; the other is the offset of the parallel invocation, assigned automatically by the RenderScript execution engine. Each invoked kernel instance receives a different offset value, and the kernel uses the offset to determine the position of its unit in the result array and hence how the value of that unit is computed;
Step b': design the degree-reduction kernel function in RenderScript with two parameters. One is an element of the array stored in an Allocation object; the other is the offset of the parallel invocation, assigned automatically by the RenderScript execution engine;
Step c': design, for use in the multiplication kernel function, a multiplication modulo p that avoids overflow handling.
More specifically, step a defines three Allocation class objects as member variables for passing data to the RenderScript engine, and three long-integer member variables that store the OEF parameters m, ω and p.
More specifically, step b creates two Allocation class objects of equal size according to the value of the parameter m, and a third Allocation class object for storing the result.
More specifically, step c first calls the multiplication kernel function in RenderScript, which runs in parallel, and then calls the degree-reduction kernel function in RenderScript to complete the reduction modulo the irreducible polynomial. Finally, the memory management interface copyTo provided by the RenderScript framework is used to copy the result stored in the third Allocation class object back into the member variable of the Java class.
In step a', the multiplication kernel function uses the offset to determine which coefficients to read from the two multiplicand polynomials, multiplies them modulo p and adds the products to obtain the value of the unit.
In step b', the degree-reduction kernel function, while executing, uses the offset to determine the position of its unit in the result array and then checks whether the degree of the corresponding term is less than m. For a term of degree less than m, it finds the array element of the term whose degree after reduction equals its own, adds that element (multiplied by ω, modulo p) to its own value and stores the sum in this unit.
In step c', to multiply two numbers s and r modulo p, the multiplier s is written in binary and an accumulator variable t is initialized to 0. The bits of s are scanned from right to left; for each bit, if it is 1, r is added to t and t is reduced modulo p, and then r is set to r + r modulo p. When the scan is complete, t holds the product of s and r modulo p. If the machine word length is w bits, then as long as p is not greater than $2^{w-1}$ this procedure cannot overflow, because no intermediate value exceeds the range of one word.
Beneficial effects of the invention:
The parallel computation method for OEF element multiplication based on the RenderScript programming framework provided by the invention (1) implements OEF element multiplication quickly and (2) can be implemented on any Android device.
Brief description of the drawings
Fig. 1 is a schematic diagram of the data units in the parallel multiplication;
Fig. 2 is a schematic diagram of the data units of the degree-reduction kernel function.
Detailed description of the invention
The invention provides a fast parallel method for OEF element multiplication, comprising:
Step 1: design a dedicated OEF element arithmetic-unit Java class for the multiplication of OEF elements;
Step 2: design a multiplication kernel function and a degree-reduction kernel function in RenderScript. These two functions are the compute kernels that the Java class object invokes concurrently through the RenderScript execution engine, and each is defined to operate on the first address of a single memory unit or of a batch of memory units with homogeneous characteristics.
Step 1 further comprises:
Step a: design a dedicated Java class for OEF element arithmetic and define three one-dimensional array member variables, two of which store the two polynomials taking part in the operation (the multiplicands) and the third of which stores the (multiplication) result;
Step b: define a constructor for the OEF element arithmetic-unit class and initialize each member variable in it;
Step c: define a polynomial multiplication method for the OEF element arithmetic-unit class which, using the memory management interface bind provided by the RenderScript framework, passes the Allocation class objects of the two multiplicands to the RenderScript execution engine, where they are held in two array variables in the engine's memory space.
Step 2 further comprises:
Step a': design the multiplication kernel function in RenderScript with two parameters. One is an element of the array stored in an Allocation object; the other is the offset of the parallel invocation, assigned automatically by the RenderScript execution engine. Each invoked kernel instance receives a different offset value, and the kernel uses the offset to determine the position of its unit in the result array and hence how the value of that unit is computed;
Step b': design the degree-reduction kernel function in RenderScript with two parameters. One is an element of the array stored in an Allocation object; the other is the offset of the parallel invocation, assigned automatically by the RenderScript execution engine;
Step c': design, for use in the multiplication kernel function, a multiplication modulo p that avoids overflow handling.
More specifically, step a defines three Allocation class objects as member variables for passing data to the RenderScript engine, and three long-integer member variables that store the OEF parameters m, ω and p.
More specifically, step b creates two Allocation class objects of equal size according to the value of the parameter m, and a third Allocation class object for storing the result.
More specifically, step c first calls the multiplication kernel function in RenderScript, which runs in parallel, and then calls the degree-reduction kernel function in RenderScript to complete the reduction modulo the irreducible polynomial. Finally, the memory management interface copyTo provided by the RenderScript framework is used to copy the result stored in the third Allocation class object back into the member variable of the Java class.
In step a', the multiplication kernel function uses the offset to determine which coefficients to read from the two multiplicand polynomials, multiplies them modulo p and adds the products to obtain the value of the unit.
In step b', the degree-reduction kernel function, while executing, uses the offset to determine the position of its unit in the result array and then checks whether the degree of the corresponding term is less than m. For a term of degree less than m, it finds the array element of the term whose degree after reduction equals its own, adds that element (multiplied by ω, modulo p) to its own value and stores the sum in this unit.
For example, for the multiplication result polynomial $c = 9z^6 + z^5 + 14z^4 + z^3 + 4z^2 + 7z$ above, the coefficients held in the array stored in the Allocation object are, listed from degree 6 down to degree 0, {9, 1, 14, 1, 4, 7, 0}, and each element is processed separately by one degree-reduction kernel instance. Because the parameters of the optimal extension field are $f(z) = z^5 - 2$ and $p = 31$, only the elements 14, 1, 4, 7 and 0 (degrees 4 down to 0) are updated by the function. If a degree-reduction kernel instance is invoked with offset 1, it processes the term $7z$. Following the rule above, it finds in the coefficient array the coefficient 9 of the term $9z^6$, multiplies it by ω = 2, adds the product to its own coefficient modulo p and writes the result into the current element, giving the coefficient 25 of the term $25z$.
In step c', the multiplication kernel function makes frequent use of multiplication of polynomial coefficients modulo p. Although p can be chosen to fit within one machine word (for example 64 bits), the product of two elements of $F_p$ may still exceed the range of one word, so a plain multiplication cannot be used for the coefficients. The invention solves this problem as follows: to multiply two numbers s and r modulo p, the multiplier s is written in binary and an accumulator variable t is initialized to 0. The bits of s are scanned from right to left; for each bit, if it is 1, r is added to t and t is reduced modulo p, and then r is set to r + r modulo p. When the scan is complete, t holds the product of s and r modulo p. If the machine word length is w bits, then as long as p is not greater than $2^{w-1}$ this procedure cannot overflow, because no intermediate value exceeds the range of one word.
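The right-to-left scan described above is the familiar shift-and-add (double-and-add) multiplication. A minimal Java sketch follows; the class and method names are hypothetical, and because Java's `long` is a signed 64-bit type the sketch assumes the tighter bound p ≤ $2^{62}$ rather than the $2^{w-1}$ bound that applies to unsigned w-bit words.

```java
/** Hypothetical sketch of overflow-free multiplication modulo p by shift-and-add. */
final class ModMul {
    /** Returns (s * r) mod p; assumes 0 <= s, r < p and p <= 2^62 (signed long). */
    static long mulMod(long s, long r, long p) {
        long t = 0;                 // accumulator, always kept in [0, p)
        while (s != 0) {
            if ((s & 1L) == 1L) {
                t = (t + r) % p;    // t + r < 2p <= 2^63, so the sum fits in a long
            }
            r = (r + r) % p;        // doubling also stays below 2^63
            s >>>= 1;               // move on to the next bit of s
        }
        return t;
    }
}
```

With p = $2^{61} - 1$ from Table 1, `ModMul.mulMod(p - 1, p - 1, p)` returns 1, whereas the plain product `(p - 1) * (p - 1)` would overflow a 64-bit word.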
The invention provides a new method for OEF element multiplication on the Android platform and significantly improves computing performance through parallelism.
The handling of coefficient multiplication modulo p avoids carry and overflow in intermediate results, which simplifies the computation and saves computing time.
Compared in tests with OEF multiplication implemented serially on the Android platform, the parallel method of the invention shows a clear advantage.
Embodiments of the invention are described in detail below by way of example, so that how the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and put into practice.
In the fast parallel method for OEF element multiplication provided by the invention, step 1 is to design a dedicated OEF element arithmetic-unit Java class for the multiplication of OEF elements. The class definition file uses import statements to bring in the three classes android.renderscript.Allocation, android.renderscript.Element and android.renderscript.RenderScript so that the built-in RenderScript objects can be used. The arithmetic-unit Java class is defined as follows:
1.1) In the OEF element arithmetic-unit Java class, define the member arrays a and b to store the two OEF elements taking part in the multiplication, and the member array c to store the product. Every element of the three arrays is a 64-bit integer. In addition, define three integer member variables that record how much of each array is in use, that is, the largest index currently used. Define three long-integer member variables to store the OEF parameters m, ω and p.
The polynomial coefficients of an OEF element are stored in the array in order from the lowest degree to the highest. For example, the polynomial $9x^6 + 5x^4 + 6x + 3$ is stored in the array as a[] = {3, 6, 0, 0, 5, 0, 9}.
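The low-to-high storage order means that index i holds the coefficient of the term of degree i. A small sketch can make the convention checkable by evaluating the stored polynomial with Horner's rule; the class name is hypothetical, and the modulus is assumed to be below $2^{31}$ so that plain `long` arithmetic suffices.

```java
/** Hypothetical sketch: coefficient arrays are stored low degree first (index = degree). */
final class CoeffLayout {
    /** Evaluates sum(coeffs[i] * z^i) modulo p using Horner's rule, highest index first. */
    static long evaluate(long[] coeffs, long z, long p) {
        long acc = 0;
        for (int i = coeffs.length - 1; i >= 0; i--) {
            acc = (acc * z + coeffs[i]) % p;
        }
        return acc;
    }
}
```

For $9x^6 + 5x^4 + 6x + 3$ stored low degree first as `{3, 6, 0, 0, 5, 0, 9}`, evaluating at x = 2 modulo 1000003 gives 576 + 80 + 12 + 3 = 671.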
The OEF element arithmetic-unit Java class also defines three member variables of type Allocation, which are used to pass the multiplicands and the result to and from the RenderScript engine. The OEF parameters m, ω and p must be specified when an arithmetic-unit object is created.
1.2) Define the OEF element multiplication method: using the memory management interface bind provided by the RenderScript framework, assign the contents of the Allocation class objects of the two multiplicands to two array variables in the memory space of the RenderScript kernel functions.
Call the multiplication kernel function concurrently through the forEach_functionName interface of the ScriptC_mono class object, where functionName is the name of the multiplication kernel function. The third Allocation class object is passed to this call as a parameter, so that each of its elements independently serves as the unit of computation for one concurrent function instance. When the operation completes, the third Allocation class object holds the result of the ordinary polynomial multiplication.
Call the degree-reduction kernel function concurrently through the forEach_functionName interface of the ScriptC_mono class object, where functionName is the name of the degree-reduction kernel function. The third Allocation class object is again passed to this call as a parameter, so that each of its elements independently serves as the unit of computation for one concurrent function instance. When the operation completes, the first m elements of the array of the third Allocation class object hold the result of the multiplication modulo f(z).
After the operation completes, the result is stored in the member array c and returned.
Calling a kernel function through forEach_functionName allows multiple kernel instances to be invoked concurrently, which improves the execution efficiency of the algorithm.
Step 2 is to design the multiplication kernel function and the degree-reduction kernel function in RenderScript. These two functions are the compute kernels that the Java class object invokes concurrently through the RenderScript execution engine, and each is defined to operate on the first address of a single memory unit or of a batch of memory units with homogeneous characteristics.
2.1) Multiplication kernel function
The multiplication kernel function in RenderScript has two parameters. One is an element of the array held by the Allocation object; the other is the offset of the parallel invocation, that is, the offset within the whole array of the element indicated by the first parameter. Because this kernel is invoked concurrently by the RenderScript execution engine, the first parameter is obtained from the memory passed in through the Allocation object, and the second is assigned automatically by the execution engine at each concurrent invocation.
The multiplication kernel function executes as follows: for offset x, find in the two multiplicand arrays all element pairs <A[i], B[j]> whose indices sum to x, multiply A[i] and B[j] of each pair modulo p, add the products modulo p and store the result in the current element, C[x]. Fig. 1 is a schematic diagram of the data units in the parallel multiplication of the invention.
The multiplication modulo p is carried out as follows:
To compute A[i] * B[j] mod p, write B[j] in binary as $(b_{w-1}, b_{w-2}, \ldots, b_2, b_1, b_0)$, where w is the word length. To prevent overflow during the multiplication, p is chosen to be not greater than $2^{w-1}$. Initialize an accumulator variable t to 0. If the lowest bit of B[j] is 1, add A[i] to t modulo p and store the result in t. Then add A[i] to itself modulo p and store the result in A[i], shift B[j] right by one bit, and repeat the process until B[j] equals zero.
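The per-element kernel and its modular multiplication can be imitated in plain Java, with a parallel `IntStream` standing in for the RenderScript engine's one-instance-per-element dispatch. This is a sketch under assumptions, not the patent's RenderScript code: the class and method names are hypothetical, arrays use index = degree, both inputs have length m, and the bound is tightened to p ≤ $2^{62}$ because Java's `long` is signed. The modular multiplication works on local copies of A[i] and B[j], so the shared input arrays are only read.

```java
import java.util.stream.IntStream;

/** Hypothetical plain-Java analogue of the parallel multiplication kernel. */
final class ParallelMulSketch {
    /** (x * y) mod p by shift-and-add; assumes 0 <= x, y < p <= 2^62. */
    static long mulMod(long x, long y, long p) {
        long t = 0;
        while (y != 0) {
            if ((y & 1L) == 1L) {
                t = (t + x) % p;   // lowest bit of y set: add the current x into t
            }
            x = (x + x) % p;       // double x modulo p
            y >>>= 1;              // shift y right by one bit
        }
        return t;
    }

    /** Body of one kernel instance: computes C[x] from the read-only arrays a and b. */
    static long kernel(long[] a, long[] b, int x, long p) {
        int m = a.length;
        long sum = 0;
        // All index pairs (i, j) with i + j == x and 0 <= i, j <= m - 1.
        for (int i = Math.max(0, x - (m - 1)); i <= Math.min(x, m - 1); i++) {
            sum = (sum + mulMod(a[i], b[x - i], p)) % p;
        }
        return sum;
    }

    /** Runs one kernel instance per output element; each instance writes only c[x]. */
    static long[] multiply(long[] a, long[] b, long p) {
        int m = a.length;
        long[] c = new long[2 * m - 1];
        IntStream.range(0, c.length).parallel().forEach(x -> c[x] = kernel(a, b, x, p));
        return c;
    }
}
```

Each instance reads the shared inputs and writes a distinct element of the output, so the instances need no synchronization among themselves.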
2.2) Degree-reduction kernel function
This kernel function in RenderScript takes two parameters: one is an element of the array C produced by the multiplication kernel function and stored in an Allocation object; the other is the offset of the parallel invocation, assigned automatically by the RenderScript execution engine. If the offset value x is less than the optimal extension field parameter m, the element C[x+m] is multiplied modulo p by the optimal extension field parameter ω, the product is added modulo p to the current element C[x], and the result is stored in C[x].
Figure 2 is a schematic diagram of the data units processed by the degree-reduction kernel function.
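A minimal Java sketch of this reduction step follows. It is an illustration only: the names are hypothetical, p is assumed to be below 2^31 so that a plain `long` product is safe, and the extra bounds check `x + m < c.length` is an assumption added here to cover a result array that holds only 2m - 1 coefficients.

```java
final class ReduceKernelSketch {
    /**
     * Folds the high coefficient c[x + m] back onto c[x]:
     * c[x] <- c[x] + omega * c[x + m] (mod p), for x < m.
     * Assumption (not from the source): p < 2^31 and all entries are in [0, p).
     */
    static void reduceKernel(long[] c, int x, int m, long omega, long p) {
        if (x < m && x + m < c.length) {
            long folded = ((omega % p) * c[x + m]) % p;  // omega * C[x+m] mod p
            c[x] = (c[x] + folded) % p;                  // add to C[x] mod p
        }
    }
}
```

As an example with illustrative values m = 3, ω = 2, p = 7 and C = [4, 6, 0, 6, 4], applying the function for x = 0, 1, 2 leaves the reduced coefficients [2, 0, 0] in C[0..2].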
Both the degree-reduction kernel function and the multiplication kernel function operate on a single data unit only, and the computation for one data unit is completely independent of the computation for any other data unit. They are scheduled as multiple function instances running in parallel. When these concurrently scheduled instances access shared data regions, they do so in read-only mode, so no conflicts or data inconsistencies arise.
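The independence property described above can be illustrated on a desktop JVM, using Java parallel streams as a stand-in for the RenderScript execution engine (this substitution, the class name, and the restriction to p < 2^31 are assumptions made for the sketch, not part of the source). Each index x plays the role of one kernel instance: in the first pass it reads only the inputs and writes only c[x]; in the second pass it writes c[x] for x < m and reads only c[x + m], which no instance writes.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class ParallelOefSketch {
    /**
     * Multiplies two field elements given as coefficient arrays of length m,
     * reducing modulo x^m - omega, with all coefficient arithmetic modulo p.
     * Assumption: p < 2^31 and all inputs are in [0, p).
     */
    static long[] multiply(long[] a, long[] b, long omega, long p) {
        int m = a.length;
        long[] c = new long[2 * m];  // one spare zero slot so c[x + m] exists for every x < m

        // Pass 1: one independent instance per output coefficient.
        IntStream.range(0, 2 * m - 1).parallel().forEach(x -> {
            long acc = 0;
            for (int i = Math.max(0, x - m + 1); i <= Math.min(x, m - 1); i++) {
                acc = (acc + a[i] * b[x - i]) % p;
            }
            c[x] = acc;
        });

        // Pass 2: instance x folds c[x + m] onto c[x]; reads and writes never overlap.
        IntStream.range(0, m).parallel().forEach(x ->
            c[x] = (c[x] + omega * c[x + m]) % p);

        return Arrays.copyOf(c, m);  // the m reduced coefficients
    }
}
```

With the illustrative inputs A = [1, 2, 3], B = [4, 5, 6], ω = 2 and p = 7, `multiply` returns [2, 0, 0]. The two passes must run one after the other, because the second reads values the first produces.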
The present invention was compared with a serial implementation of optimal extension field multiplication on the Android platform. The tests show that the parallel method used in the present invention has a clear advantage. Partial measured data for mobile phones of different brands are shown in Table 2 and Table 3:
Table 2: Test data for different mobile phones
Table 3: Mobile phone configurations
All of the above is the primary implementation of this intellectual property and does not limit other forms of implementing this new product and/or new method. Those skilled in the art may use this information to modify the above content in order to achieve similar implementations. However, all modifications or adaptations based on the new product of the present invention fall within the reserved rights.
The above is only a preferred embodiment of the present invention and does not limit the invention in any other form. Any person skilled in the art may use the technical content disclosed above to alter or modify it into an equivalent embodiment. However, any simple modification, equivalent change or adaptation made to the above embodiments in accordance with the technical essence of the present invention, without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the present invention.
Claims (9)
1. A fast parallel calculation method for optimal extension field element multiplication, characterized by comprising:
Step one: designing a dedicated optimal extension field element operator Java class for the multiplication of optimal extension field elements;
Step two: designing a multiplication kernel function and a degree-reduction kernel function in RenderScript, both of which are invoked concurrently through the RenderScript execution engine by the Java class object of the computing core; when defined, each function is directed at a single memory unit or at the first address of a batch of memory units with homogeneous characteristics.
2. The fast parallel calculation method for optimal extension field element multiplication as claimed in claim 1, characterized in that the first step further comprises:
Step a: designing a dedicated Java class for optimal extension field element operations and defining three one-dimensional array variables as class members, two of which store the two polynomials taking part in the operation (the multiplication) and the third of which stores the (multiplication) result;
Step b: defining a constructor for the optimal extension field element operator class and initializing each member variable in the constructor;
Step c: defining a polynomial multiplication method for the optimal extension field element operator class, in which the memory management interface bind provided by the RenderScript programming framework passes the Allocation class objects corresponding to the two multipliers to the RenderScript execution engine, where they are saved in two array-type variables in the memory space of the RenderScript execution engine.
3. The fast parallel calculation method for optimal extension field element multiplication as claimed in claim 1 or 2, characterized in that the second step further comprises:
Step a': designing the multiplication kernel function in RenderScript with two parameters: one is an element of the array stored in an Allocation object; the other is the offset of the parallel invocation, assigned automatically by the RenderScript execution engine. Each invoked kernel function instance receives a different offset value, and the kernel function uses the offset to determine the position of its unit in the result array and hence how to compute the value of that unit;
Step b': designing the degree-reduction kernel function in RenderScript with two parameters: one is an element of the array stored in an Allocation object; the other is the offset of the parallel invocation, assigned automatically by the RenderScript execution engine;
Step c': designing, within the multiplication kernel function, a modulo-p multiplication mechanism that avoids overflow.
4. The fast parallel calculation method for optimal extension field element multiplication as claimed in claims 1 to 3, characterized in that step a further comprises defining three Allocation class objects as class member variables for passing data to and from the RenderScript engine, and defining three long-integer class member variables that store the optimal extension field parameters m, ω and p.
5. The fast parallel calculation method for optimal extension field element multiplication as claimed in claims 1 to 4, characterized in that step b further comprises creating, according to the value of parameter m, two Allocation class objects of equal size, and creating a third Allocation class object for storing the calculation result.
6. The fast parallel calculation method for optimal extension field element multiplication as claimed in claims 1 to 5, characterized in that step c further comprises invoking the multiplication kernel function in RenderScript to perform the parallel operation, then invoking the degree-reduction kernel function in RenderScript to complete the reduction modulo the irreducible polynomial; finally, the memory management interface copyTo provided by the RenderScript programming framework is used to pass the calculation result stored in the third Allocation class object back to the member variable of the Java class.
7. The fast parallel calculation method for optimal extension field element multiplication as claimed in claims 1 to 6, characterized in that in step a', the multiplication kernel function determines from the offset which term coefficients to read from the two multiplier polynomials, and performs modulo-p multiplications and additions on them to obtain the value of the unit.
8. The fast parallel calculation method for optimal extension field element multiplication as claimed in claims 1 to 7, characterized in that in step b', during execution of the degree-reduction kernel function, the position of the unit in the result array is determined from the offset value, and it is then judged whether the degree of the corresponding term is greater than m; for each term whose degree is less than m, the array element corresponding to the term whose degree after reduction equals its own is located, added to it, and the sum is stored in this unit.
9. The fast parallel calculation method for optimal extension field element multiplication as claimed in claims 1 to 8, characterized in that in step c', when computing the product of two numbers s and r modulo p, the multiplier s is written in binary form; an accumulator variable t is used and initialized to 0; the binary string of s is traversed bit by bit from right to left, and for each bit it is checked whether the bit is 1; if it is 1, the value of r is added to t and t is saved modulo p; r is then set to r + r mod p; when the traversal is complete, t holds the product of s and r modulo p. If the machine word length is w bits, then as long as the value of p is no greater than 2^(w-1), this process cannot overflow, that is, the result of every operation stays within the range representable in one word.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610305021.3A CN106020949B (en) | 2016-05-09 | 2016-05-09 | A kind of fast parallel calculation method of optimal extension field element multiplication |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610305021.3A CN106020949B (en) | 2016-05-09 | 2016-05-09 | A kind of fast parallel calculation method of optimal extension field element multiplication |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106020949A true CN106020949A (en) | 2016-10-12 |
CN106020949B CN106020949B (en) | 2019-08-06 |
Family
ID=57098983
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610305021.3A Expired - Fee Related CN106020949B (en) | 2016-05-09 | 2016-05-09 | A kind of fast parallel calculation method of optimal extension field element multiplication |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106020949B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101271570A (en) * | 2008-05-07 | 2008-09-24 | 威盛电子股份有限公司 | Apparatus and method for large integer multiplication operation |
CN103731254A (en) * | 2012-10-14 | 2014-04-16 | 张仁平 | Correcting and applying system of fast algorithm library for number theory (NTL) |
US20140244703A1 (en) * | 2013-02-26 | 2014-08-28 | Nvidia Corporation | System, method, and computer program product for implementing large integer operations on a graphics processing unit |
Non-Patent Citations (2)
Title |
---|
ANTON KARGL,等: "Fast Arithmetic on ATmega128 for Elliptic Curve Cryptography", 《CRYPTOLOGY EPRINT ARCHIVE: REPORT 2008/442》 * |
EUN-HEE GOO, 等: "A Study on Android-Based Real Number Field Elliptic Curve Key Table Generation", 《COMPUTER APPLICATION FOR SECURITY, CONTROL AND SYSTEM ENGINEERING》 * |
Also Published As
Publication number | Publication date |
---|---|
CN106020949B (en) | 2019-08-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190806; Termination date: 20210509 |