CN108510429A - GPU-based parallelization acceleration method for multivariate cryptographic algorithms - Google Patents
- Publication number
- CN108510429A CN108510429A CN201810228547.5A CN201810228547A CN108510429A CN 108510429 A CN108510429 A CN 108510429A CN 201810228547 A CN201810228547 A CN 201810228547A CN 108510429 A CN108510429 A CN 108510429A
- Authority
- CN
- China
- Prior art keywords
- multivariable
- gpu
- value
- variable
- kernel function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3818—Decoding for concurrent execution
- G06F9/3822—Parallel decoding, e.g. parallel decode units
Abstract
The invention discloses a GPU-based parallelization acceleration method for multivariate cryptographic algorithms, comprising the following steps: S1, raise every term of the multivariate equations to the same degree; S2, generate the multiplication table over the finite field; S3, map the term table and the multiplication table into the GPU's texture memory; S4, call the main multivariate kernel function on each block of data to compute and perform the Reduce operation; S5, write the host main function that schedules the main multivariate kernel; S6, run the program, output the encryption/decryption result, and release the resources. The method optimizes cryptographic algorithms of the multivariate cryptosystem mainly by raising all terms of the multivariate equations to the same degree and applying the idea of Map-Reduce; taking the SpongeMPH hash function as an example, an implementation on the CUDA platform and a performance comparison are given. Experiments show that the scheme improves the running efficiency of the algorithm and can be used to accelerate cryptographic algorithms based on multivariate cryptosystems.
Description
Technical field
The present invention relates to the technical field of cryptographic algorithms, and more specifically to a GPU-based parallelization acceleration method for multivariate cryptographic algorithms.
Background technology
The graphics processing unit (GPU) was originally designed for image processing. In recent years, driven by the power-consumption limits of CPUs and the rapid growth of computing demand, the computing capability of GPUs has developed at a pace far exceeding Moore's law, and GPUs have come to be widely applied in the field of scientific computing.
A multivariate cryptographic algorithm is a cryptographic scheme built from multivariate polynomials over a finite field. Solving a system of multivariate polynomial equations over a finite field is an NP-hard problem, which makes multivariate cryptography one of the current design approaches for resisting quantum attacks. However, the computational load of multivariate cryptographic algorithms is large, and the resulting low efficiency is a main factor limiting their practicality. How to raise the execution efficiency of such algorithms is therefore one of the directions studied by those skilled in the art.
Invention content
The primary object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a GPU-based parallelization acceleration method for multivariate cryptographic algorithms, which combines the GPU with the idea of Map-Reduce to parallelize a multivariate cryptographic algorithm and thereby improve its execution efficiency.
In order to achieve the above object, the present invention adopts the following technical scheme.
The GPU-based parallelization acceleration method for multivariate cryptographic algorithms of the present invention includes the following steps:
S1. Raise every term of the multivariate equations to the same degree;
S2. Generate the generator table and the logarithm table over the finite field, and realize finite-field multiplication by looking up these two tables, which improves the uniformity of the GPU threads' computation. The generator table refers to the table formed by the powers of the generator g of the finite field F of order q for the first q-1 natural numbers 0, 1, 2, …, q-2 together with 0, i.e. table[i] = g^i, with table[q-1] = 1 and table[q] = g. The logarithm table satisfies arc_table[a] = i for any element a of the field, where table[i] = a; the value of arc_table[0] is set to a large negative number, so that in 0*a = table[arc_table[0] + arc_table[a]] the index arc_table[0] + arc_table[a] is always negative, and table[negative] is defined to be 0;
S3. Map the term table, the coefficient table, the generator table and the logarithm table into the GPU's texture memory. The term table records the indices of the variables that make up each term of the multivariate equations; for example, if a term is a1*x1*x3*x4, where x1, x3, x4 are variables, the term table stores 1, 3, 4 at the corresponding position. The coefficient table records the coefficient of each term of the multivariate equations and corresponds one-to-one with the term table;
S4. Call the main multivariate kernel function on each block of data to compute and perform the Reduce summation. The parameters of the main multivariate kernel function are the address of the data to be processed, the address of the current values of the polynomial variables, and the address of the intermediate temporary-data storage. In the kernel, each basic GPU thread obtains the value of its variables, computes the full value of its term, and then performs the Reduce summation, so that the result of each polynomial is obtained and saved into the current polynomial variable array;
S5. Write the host main function that schedules the main multivariate kernel. The main function sets the block size, allocates GPU memory, binds the texture memory, repeatedly passes blocks of data to the main kernel, finally copies the computation result back to host memory, and releases the resources;
S6. Run the program, output the encryption/decryption result, and release the resources.
As a preferred technical solution, in step S1 the same-degree operation is specifically as follows:
A redundant variable whose value is 1 is introduced and multiplied into the lower-degree terms so that every term reaches the degree of the polynomial; each polynomial's terms and their sum can then be computed with identical operations in a single kernel call, which avoids the GPU performance loss caused by branching.
At the same time, redundant terms whose value is 0 are introduced so that the number of terms of each equation is a multiple of the Block size (Block is the CUDA notion; on OpenCL it is the work-group).
As a preferred technical solution, in step S2 the multiplication table over the finite field is generated as follows:
In the field of integers mod n, if g is a generator then the greatest common divisor of n and g is 1, so the value of the generator g can be found with the extended Euclidean algorithm, and g^0, g^1, …, g^(p-2) are enumerated to obtain the multiplication table and the inverse table.
As a preferred technical solution, step S3 further includes the following:
The data are first preprocessed on the CPU side, i.e. the term table and the coefficient table are padded with the redundant term 0*xt*xt*xt so that the number of terms of each equation is exactly the number of blocks of the GPU times the number of threads per block, which facilitates the Reduce summation in step S4; here the multivariate system is assumed to contain the variables x0, x1, …, x(t-1), and an additional custom variable xt with value 1 is appended. These arrays are then copied to GPU memory with asynchronous operations and bound to texture memory.
As a preferred technical solution, in step S4 the address of the data to be processed and the address storing the current value of each polynomial variable are unsigned character pointers (uint8_t*), and the address of the intermediate temporary-data storage is an unsigned 32-bit integer pointer (uint32_t*).
As a preferred technical solution, in step S4 computing the main multivariate kernel and performing the Reduce summation includes the following:
S41. Each GPU block copies the values of the variable array;
S42. The input data are processed; in SpongeMPH this is the absorb operation of the sponge construction;
S43. Each thread looks up its term and coefficient in texture memory according to its current global thread id and computes the product using the multiplication table;
S44. The value of each equation is computed by Reduce summation: first the sum of the 32 threads of each warp is computed quickly by halving summation, then the value of each polynomial is obtained simultaneously with atomic summation;
S45. The summed results of the equations are copied back into the variable array by a memory-management function.
As a preferred technical solution, in step S5 writing the main function that schedules the main multivariate kernel specifically includes the following:
S51. Set the number of blocks of the main kernel and the number of threads per block;
S52. Allocate the corresponding memory on the GPU side, copy the data of step S51 into GPU video memory with asynchronous stream operations, and bind them to texture memory;
S53. On each call pass the corresponding block of data and the variable-array values obtained by the previous computation to the main kernel, and keep calling the kernel to continually update the values of the polynomial variables;
S54. Copy the final hash value from GPU video memory back to host memory, and release the resources with the cudaFree and free commands.
As a preferred technical solution, step S6 specifically includes the following:
Once the main function is written, it is invoked directly; according to the settings and scheduling strategy of the main function, the target device cyclically performs the three operations of copying data to the target device, letting each thread run the kernel program, and copying the results back from the target device to the host. After all the data have been processed, the result is output to the specified location and the resources are released.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
The invention adopts a GPU-based parallelization scheme. By raising the multivariate equations to the same degree and padding them, combining the multiplication table with texture memory to make full use of the GPU's parallel structure, and quickly computing the value of each equation after each iteration through halving summation and atomic operations, it overcomes the problem that the low computation speed of multivariate cryptographic algorithms on CPU platforms restricts their application scenarios, achieves an acceleration of about 15 times the CPU computation speed, and improves the practicality of multivariate cryptographic algorithms. The invention considers the parallel granularity and the memory allocation, and makes full use of the GPU for fast polynomial summation, ensuring the performance of the multivariate cipher when parallelized on the GPU.
Description of the drawings
Fig. 1 is SpongeMPH hash function flow charts;
Fig. 2 is the flow chart of the present invention.
Specific implementation mode
Present invention will now be described in further detail with reference to the embodiments and the accompanying drawings, but embodiments of the present invention are unlimited
In this.
Embodiment
This embodiment takes the multivariate hash function SpongeMPH as an example; the flow of the SpongeMPH hash function is shown in Figure 1:
a) Padding is applied first so that the length of the input data is an integral multiple of the block length;
b) A block-length piece of data is read in each cycle and XORed with the first r*k bits of the current state, after which the multivariate function MPE is called to update the current state; this repeats until all data have been read, giving the state SL;
c) MPE is called once more to update SL, giving the final value S0, from which the final result is obtained.
Based on this scheme, a concrete implementation of SpongeMPH on the CUDA platform and a performance comparison are given. Following the steps of this embodiment with minor modifications, the method can also be used for fast implementations of other multivariate cryptographic algorithms.
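The flow a) to c) above can be modeled on the CPU as follows. This is a sketch of the sponge structure only: the placeholder MPE function, the simple zero-padding rule, and the 5-byte rate standing in for the r*k bits of the description are all assumptions, since the patent does not fully specify them.

```python
def sponge_mph(message, mpe, rate=5, width=40):
    """CPU model of the SpongeMPH flow: pad the input to a multiple of the
    block length, absorb each block by XOR into the first `rate` bytes of
    the state and apply MPE, then apply MPE once more to obtain S0."""
    # a) padding so the input length is an integral multiple of the block length
    message = bytes(message) + bytes((-len(message)) % rate)
    state = bytearray(width)  # 40 bytes = 320-bit state, initialized to zero
    # b) absorb: XOR each block into the state, then update with MPE
    for off in range(0, len(message), rate):
        for i, byte in enumerate(message[off:off + rate]):
            state[i] ^= byte
        state = bytearray(mpe(bytes(state)))
    # c) one final MPE call turns the last state SL into the final value S0
    return bytes(mpe(bytes(state)))

# Placeholder MPE: any deterministic 40-byte -> 40-byte map (an assumption;
# the real MPE is the multivariate polynomial map of the scheme).
def toy_mpe(s):
    return bytes(((b * 31) + i + 1) & 0xFF for i, b in enumerate(s))
```

For example, `sponge_mph(b"abc", toy_mpe)` returns a 40-byte digest, and equal inputs give equal digests.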
As shown in Figure 2, the GPU-based parallelization acceleration method for multivariate cryptographic algorithms of this embodiment includes the following steps.
S1. Raise every term of the multivariate equations to the same degree. To make full use of the GPU's many-core characteristics and its computation features to accelerate the multivariate cryptographic algorithm, the invention introduces the same-degree step. A redundant variable with value 1 is introduced and multiplied into the lower-degree terms so that every term reaches the degree of the polynomial; each polynomial's terms and their sum can then be computed with the same operations in a single kernel call, avoiding the GPU performance loss caused by branching. At the same time, redundant terms with value 0 are introduced so that the number of terms of each equation is a multiple of the Block size (Block is the CUDA notion; on OpenCL it is the work-group).
In this embodiment each block of data is 320 bits and is stored in memory as 40 8-bit variables. The state size of SpongeMPH is 40*8 bits, i.e. 40 equations in 40 variables, where each equation has 40 degree-1 terms, 842 degree-2 terms and 400 degree-3 terms. The 40 variables used by SpongeMPH are labelled x_0, x_1, …, x_39, each x_i being 8 bits. Let x_40 = 1, and write the t-th equation as:
F_t(x_0, x_1, …, x_39) = Σ_{t≤i≤j≤k≤n} α_{tijk} x_i x_j x_k + Σ_{t≤i≤j≤n} β_{tij} x_i x_j + Σ_{t≤i≤n} γ_{ti} x_i + δ_t
The same-degree operation can then be expressed as:
F_t(x_0, x_1, …, x_39) = Σ_{t≤i≤j≤k≤n} α_{tijk} x_i x_j x_k + Σ_{t≤i≤j≤n} β_{tij} x_i x_j x_40 + Σ_{t≤i≤n} γ_{ti} x_i x_40 x_40 + δ_t x_40 x_40 x_40
The coefficients of all terms of every equation are stored in an array var, and for each term x_i x_j x_k of an equation the indices i, j, k are stored in the three arrays index, indey and indez respectively, so that each equation can finally be expressed through these arrays. At the same time var and index, indey, indez are padded with 0 and 1 respectively so that the number of terms of each equation is a multiple of the number of threads of each GPU block; the thread count per block is set to 128 here, so after padding every equation's term count is a multiple of 128.
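The same-degree lifting and the zero-term padding described above can be sketched as follows. This is a CPU model; the representation of a term as a (coefficient, index-triple) pair and the helper names are illustrative, not the patent's actual data layout, and the evaluation is over the integers rather than GF(2^8) purely to check that the lifting preserves the polynomial's value.

```python
DUMMY = 40  # index of the appended variable x_40, whose value is fixed to 1

def homogenize(cubic, quadratic, linear, const, dummy=DUMMY):
    """Lift every term of a degree-3 polynomial to degree exactly 3 by
    multiplying lower-degree terms with the dummy variable x_dummy = 1."""
    terms = [(c, list(ijk)) for c, ijk in cubic]
    terms += [(c, list(ij) + [dummy]) for c, ij in quadratic]
    terms += [(c, [i, dummy, dummy]) for c, i in linear]
    terms.append((const, [dummy, dummy, dummy]))
    return terms

def pad_terms(terms, threads_per_block=128, dummy=DUMMY):
    """Append zero-coefficient terms 0 * x_d * x_d * x_d so that the term
    count is a multiple of the per-block thread count."""
    while len(terms) % threads_per_block:
        terms.append((0, [dummy, dummy, dummy]))
    return terms

def evaluate(terms, x):
    """Evaluate over the integers for illustration only."""
    return sum(c * x[i] * x[j] * x[k] for c, (i, j, k) in terms)
```

After lifting and padding, every term is handled with exactly the same three-index multiply, and the value of the polynomial is unchanged because x_40 = 1 and the padded terms have coefficient 0.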
S2. Generate the multiplication table over the finite field. The multiplications performed by the multivariate cryptographic algorithm are operations over a finite field, so generating a multiplication table helps the uniformity of the GPU threads' computation and reduces unnecessary branching and repeated computation, thereby improving the computation speed.
The SpongeMPH version used in this embodiment operates over the field GF(2^8); the generator 3 is used as the basis for constructing the multiplication table, and the arrays table and arc_table store the corresponding data, where table[i] = 3^i and table[arc_table[x]] = x. The product of two elements a and b of GF(2^8) can then be expressed as:
1) if a, b != 0, then a*b = table[(arc_table[a] + arc_table[b]) % 0xFF];
2) otherwise a*b = table[negative] = 0;
where arc_table[0] is set to a sufficiently small negative number. Letting tmp = arc_table[a] + arc_table[b], the result can be obtained through table[(tmp > 0) * (tmp % 0xFF)], which unifies cases 1) and 2).
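The construction of table and arc_table can be sketched as follows. This CPU sketch assumes the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), since the patent states only GF(2^8) with generator 3, and it uses an explicit sign test in place of the patent's branchless (tmp > 0) * (tmp % 0xFF) indexing trick.

```python
def xtime(a):
    """Multiply by x in GF(2^8) with reduction polynomial 0x11B (assumed)."""
    return ((a << 1) ^ (0x1B if a & 0x80 else 0)) & 0xFF

def build_tables():
    """table[i] = 3^i and arc_table[3^i] = i for i in 0..254; arc_table[0]
    is a large negative sentinel so a zero operand keeps the index negative."""
    table, arc_table = [0] * 256, [0] * 256
    p = 1
    for i in range(255):
        table[i] = p
        arc_table[p] = i
        p = xtime(p) ^ p  # multiply by the generator 3 = x + 1
    arc_table[0] = -512
    return table, arc_table

def gf_mul(a, b, table, arc_table):
    """Field multiplication via the exponent and logarithm tables."""
    tmp = arc_table[a] + arc_table[b]
    return table[tmp % 255] if tmp >= 0 else 0

def gf_mul_ref(a, b):
    """Reference shift-and-add multiply, used only to check the tables."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = xtime(a), b >> 1
    return r
```

An exhaustive check of `gf_mul` against `gf_mul_ref` over all 65536 operand pairs passes under this reduction polynomial, confirming that 3 generates the multiplicative group of the field.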
S3. Map the term table, the coefficient table and the multiplication table into the GPU's texture memory. The multivariate cryptographic algorithm has to read the index of every term in every round of computation, and the term table can be very large, so texture memory is used for storage to improve the reading speed. Table lookups are also needed for the multiplications during computation, and since the multiplications may be unevenly distributed, storing the tables in texture memory improves the lookup efficiency.
Further, the data are first preprocessed on the CPU side, i.e. the term table and the coefficient table are padded with the redundant term 0*x40*x40*x40 so that the number of terms of each equation is exactly the number of blocks of the GPU times the number of threads per block, which facilitates the Reduce summation of step S4. These arrays are then copied to GPU memory with asynchronous operations and bound to texture memory.
S4. Call the main multivariate kernel function on each block of data to compute and perform the Reduce summation. The parameters of the main multivariate kernel are the address of the data to be processed, the address of the current values of the polynomial variables and the address of the intermediate temporary-data storage; the first two are unsigned character pointers (uint8_t*) and the last is an unsigned 32-bit integer pointer (uint32_t*). In the kernel, each basic GPU thread obtains the value of its variables, computes the full value of its term, and then performs the Reduce summation, so that the result of each polynomial is obtained and saved into the current polynomial variable array.
In this embodiment the kernel has three parameters: the address of the array holding the data to be processed (the input data), the address of the array of current polynomial variable values (the output is also written back into this array), and the address of the array of intermediate temporary data. Each thread in the GPU corresponds to one term of one equation, and the kernel mainly consists of the following operations:
(a) Each GPU block copies the values of the variable array; in this embodiment the first 40 threads of each block each copy the value of one variable into shared memory;
(b) The input data are processed; in SpongeMPH this is the absorb operation of the sponge construction;
(c) Each thread looks up its term and coefficient in texture memory according to its current global thread id and computes the product using the multiplication table;
(d) The value of each equation is computed by Reduce summation: first the sum of the 32 threads of each warp is computed quickly by halving summation, then the value of each polynomial is obtained simultaneously with atomic summation;
(e) The summed results of the equations are copied back into the variable array by the first 40 threads of another GPU kernel program, Map;
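Step (d) can be modeled on the CPU as follows. The halving summation inside a 32-lane warp and the subsequent accumulation of warp totals are shown with plain integer addition, standing in for the kernel's atomic summation on uint32_t values; the function names are illustrative.

```python
WARP = 32

def warp_reduce(vals):
    """Halving summation across one 32-lane warp: at each step the first
    `offset` lanes add in the value held by the lane `offset` above them."""
    assert len(vals) == WARP
    vals = list(vals)
    offset = WARP // 2
    while offset:
        for lane in range(offset):
            vals[lane] += vals[lane + offset]
        offset //= 2
    return vals[0]  # lane 0 holds the warp's total

def block_sum(products):
    """Sum of all per-term products in one block: one warp_reduce per warp,
    then the warp totals accumulated (modeling the atomic summation)."""
    total = 0
    for w in range(0, len(products), WARP):
        total += warp_reduce(products[w:w + WARP])
    return total
```

With 128 threads per block this performs four warp reductions of five halving steps each followed by four accumulations, instead of 127 sequential additions.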
In the present embodiment, the pseudocode of the main kernel is shown in Table 1, and the pseudocode of the Reduce function is shown in Table 2.
Table 1
Table 2
S5. Write the host main function that schedules the main multivariate kernel. The main function sets the block size, allocates GPU memory, binds the texture memory, repeatedly passes blocks of data to the main kernel, finally copies the computation result back to host memory, and releases the resources.
First the number of blocks of the kernel and the number of threads per block are set. In this embodiment each equation has 1284 effective terms; in view of the GPU's structure, the number of blocks is set to 440 and the thread count of a single block to 128, and dummy terms with value 0 are padded into each equation so that each equation uses exactly 11 blocks.
Then the corresponding memory is allocated on the GPU side, the data of the above steps are copied into GPU video memory with asynchronous stream operations and bound to texture memory. On each call the corresponding block of data and the values of the 40 variables obtained by the previous computation (initialized to 0) are passed to the main kernel, which is called repeatedly to continually update the values of the polynomial variables.
Finally, the final hash value is copied from GPU video memory back to host memory, and the resources are released with commands such as cudaFree and free.
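The launch geometry of this step can be checked with a few lines, using the embodiment's numbers; ceiling division determines the 11 blocks per equation.

```python
import math

threads_per_block = 128
effective_terms = 1284    # terms per equation after the same-degree lifting
num_equations = 40

# blocks needed so that one thread handles one term of one equation
blocks_per_equation = math.ceil(effective_terms / threads_per_block)
padded_terms = blocks_per_equation * threads_per_block  # zero terms fill the gap
grid_blocks = num_equations * blocks_per_equation

print(blocks_per_equation, padded_terms, grid_blocks)  # prints: 11 1408 440
```

The 440 blocks of the embodiment are thus exactly 40 equations times 11 blocks each, with each equation padded up to 1408 terms.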
S6. Run the program, output the encryption/decryption result, and release the resources. Once the main program is written it can be run directly; according to its settings and scheduling strategy, the target device cyclically performs the three operations of copying data to the target device, letting each thread run the kernel program, and copying the results back from the target device to the host. After all the data have been processed, the result is output to the specified location and the resources are released.
The CUDA main program is opened in the IDE, or compiled and run directly from the command line. According to the CUDA main program, the hash value can be output to the screen or to a specified file.
In the present embodiment, the pseudocode that schedules the kernel is shown in Table 3.
Table 3
Experimental results
The running environment of this example is: CPU Core i7 6700K, 16 GB of memory, operating system Arch Linux (64-bit), GPU Nvidia GTX 1070 with 11 GB of video memory; the SDK version used is CUDA Toolkit 9.0 and the integrated development environment is Nsight.
Table 4 compares the performance of GPU-SpongeMPH and CPU-SpongeMPH of this example when the input data size is 40 MB:
Table 4
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it; any change, modification, substitution, combination or simplification made without departing from the spirit and principles of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.
Claims (8)
1. A GPU-based parallelization acceleration method for multivariate cryptographic algorithms, characterized by comprising the following steps:
S1. raising every term of the multivariate equations to the same degree;
S2. generating the generator table and the logarithm table over the finite field, and realizing finite-field multiplication by looking up these two tables, which improves the uniformity of the GPU threads' computation, wherein the generator table refers to the table formed by the powers of the generator g of the finite field F of order q for the first q-1 natural numbers 0, 1, 2, …, q-2 together with 0, i.e. table[i] = g^i, with table[q-1] = 1 and table[q] = g; the logarithm table satisfies arc_table[a] = i for any element a of the field, where table[i] = a; and the value of arc_table[0] is set to a large negative number, so that in 0*a = table[arc_table[0] + arc_table[a]] the index arc_table[0] + arc_table[a] is always negative, and table[negative] is defined to be 0;
S3. mapping the term table, the coefficient table, the generator table and the logarithm table into the GPU's texture memory, wherein the term table records the indices of the variables that make up each term of the multivariate equations (if a term is a1*x1*x3*x4, where x1, x3, x4 are variables, the term table stores 1, 3, 4 at the corresponding position), and the coefficient table records the coefficient of each term of the multivariate equations and corresponds one-to-one with the term table;
S4. calling the main multivariate kernel function on each block of data to compute and perform the Reduce summation, wherein the parameters of the main multivariate kernel function are the address of the data to be processed, the address of the current values of the polynomial variables, and the address of the intermediate temporary-data storage; in the kernel, each basic GPU thread obtains the value of its variables, computes the full value of its term, and then performs the Reduce summation, so that the result of each polynomial is obtained and saved into the current polynomial variable array;
S5. writing the host main function that schedules the main multivariate kernel, the main function setting the block size, allocating GPU memory, binding the texture memory, repeatedly passing blocks of data to the main kernel, finally copying the computation result back to host memory, and releasing the resources;
S6. running the program, outputting the encryption/decryption result, and releasing the resources.
2. The GPU-based parallelization acceleration method for multivariate cryptographic algorithms according to claim 1, characterized in that in step S1 the same-degree operation is specifically: a redundant variable whose value is 1 is introduced and multiplied into the lower-degree terms so that every term reaches the degree of the polynomial, whereby each polynomial's terms and their sum are computed with identical operations in a single kernel call, avoiding the GPU performance loss caused by branching; at the same time, redundant terms whose value is 0 are introduced so that the number of terms of each equation is a multiple of the Block size, Block being the CUDA notion, called work-group on OpenCL.
3. The GPU-based parallelization acceleration method for multivariate cryptographic algorithms according to claim 1, characterized in that in step S2 the multiplication table over the finite field is generated as follows: in the field of integers mod n, if g is a generator then the greatest common divisor of n and g is 1, so the value of the generator g can be found with the extended Euclidean algorithm, and g^0, g^1, …, g^(p-2) are enumerated to obtain the multiplication table and the inverse table.
4. The GPU-based parallelization acceleration method for multivariate cryptographic algorithms according to claim 1, characterized in that step S3 further includes the following: the data are first preprocessed on the CPU side, i.e. the term table and the coefficient table are padded with the redundant term 0*xt*xt*xt so that the number of terms of each equation is exactly the number of blocks of the GPU times the number of threads per block, which facilitates the Reduce summation in step S4, wherein the multivariate system is assumed to contain the variables x0, x1, …, x(t-1), and an additional custom variable xt with value 1 is appended; these arrays are then copied to GPU memory with asynchronous operations and bound to texture memory.
5. The GPU-based parallelization acceleration method for multivariate cryptographic algorithms according to claim 1, characterized in that in step S4 the address of the data to be processed and the address storing the current value of each polynomial variable are unsigned character pointers (uint8_t*), and the address of the intermediate temporary-data storage is an unsigned 32-bit integer pointer (uint32_t*).
6. The GPU-based parallelization acceleration method for multivariate cryptographic algorithms according to claim 1 or 5, characterized in that in step S4 computing the main multivariate kernel and performing the Reduce summation includes the following:
S41. each GPU block copies the values of the variable array;
S42. the input data are processed, which in SpongeMPH is the absorb operation of the sponge construction;
S43. each thread looks up its term and coefficient in texture memory according to its current global thread id and computes the product using the multiplication table;
S44. the value of each equation is computed by Reduce summation: first the sum of the 32 threads of each warp is computed quickly by halving summation, then the value of each polynomial is obtained simultaneously with atomic summation;
S45. the summed results of the equations are copied back into the variable array by a memory-management function.
7. The GPU-based parallelization acceleration method for multivariate cryptographic algorithms according to claim 1, characterized in that in step S5 writing the main function that schedules the main multivariate kernel specifically includes:
S51. setting the number of blocks of the main kernel and the number of threads per block;
S52. allocating the corresponding memory on the GPU side, copying the data of step S51 into GPU video memory with asynchronous stream operations, and binding them to texture memory;
S53. on each call passing the corresponding block of data and the variable-array values obtained by the previous computation to the main kernel, and continually updating the values of the polynomial variables by repeatedly calling the kernel;
S54. copying the final hash value from GPU video memory back to host memory, and releasing the resources with the cudaFree and free commands.
8. The GPU-based parallelization acceleration method for multivariate cryptographic algorithms according to claim 1, characterized in that step S6 specifically includes: once the main function is written, it is invoked directly; according to the settings and scheduling strategy of the main function, the target device cyclically performs the three operations of copying data to the target device, letting each thread run the kernel program, and copying the results back from the target device to the host; after all the data have been processed, the result is output to the specified location and the resources are released.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810228547.5A CN108510429B (en) | 2018-03-20 | 2018-03-20 | Multivariable cryptographic algorithm parallelization acceleration method based on GPU |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108510429A true CN108510429A (en) | 2018-09-07 |
CN108510429B CN108510429B (en) | 2021-11-02 |
Family
ID=63375986
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810228547.5A Active CN108510429B (en) | 2018-03-20 | 2018-03-20 | Multivariable cryptographic algorithm parallelization acceleration method based on GPU |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108510429B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1870499A (en) * | 2005-01-11 | 2006-11-29 | 丁津泰 | Method for generating multiple variable commom key password system |
CN101008937A (en) * | 2007-02-06 | 2007-08-01 | 中国科学院研究生院 | Computer implementation method of multiplier over finite field and computer implementation method of large matrix element elimination |
CN101916185A (en) * | 2010-08-27 | 2010-12-15 | 上海交通大学 | Automatic parallelization acceleration method of serial programs running under multi-core platform |
CN101977109A (en) * | 2010-10-21 | 2011-02-16 | 李晨 | Linear mixed high ordered equation public key algorithm |
CN102006170A (en) * | 2010-11-11 | 2011-04-06 | 西安理工大学 | Ring signature method for anonymizing information based on MQ problem in finite field |
CN102214086A (en) * | 2011-06-20 | 2011-10-12 | 复旦大学 | General-purpose parallel acceleration algorithm based on multi-core processor |
CN102811125A (en) * | 2012-08-16 | 2012-12-05 | 西北工业大学 | Certificateless multi-receiver signcryption method with multivariate-based cryptosystem |
CN103490877A (en) * | 2013-09-05 | 2014-01-01 | 北京航空航天大学 | Parallelization method for ARIA symmetric block cipher algorithm based on CUDA |
CN103745447A (en) * | 2014-02-17 | 2014-04-23 | 东南大学 | Fast parallel achieving method for non-local average filtering |
US20140173608A1 (en) * | 2012-12-14 | 2014-06-19 | Electronics And Telecommunications Research Institute | Apparatus and method for predicting performance attributable to parallelization of hardware acceleration devices |
CN103973431A (en) * | 2014-04-16 | 2014-08-06 | 华南师范大学 | AES parallel implementation method based on OpenCL |
CN104020983A (en) * | 2014-06-16 | 2014-09-03 | 上海大学 | KNN-GPU acceleration method based on OpenCL |
US20150324707A1 (en) * | 2014-05-12 | 2015-11-12 | Palo Alto Research Center Incorporated | System and method for selecting useful smart kernels for general-purpose gpu computing |
CN105743644A (en) * | 2016-01-26 | 2016-07-06 | 广东技术师范学院 | Mask encryption device of multivariable quadratic equation |
CN105933111A (en) * | 2016-05-27 | 2016-09-07 | 华南师范大学 | Bitslicing-KLEIN rapid implementation method based on OpenCL |
CN107392429A (en) * | 2017-06-22 | 2017-11-24 | Southeast University | GPU-accelerated forward-substitution method for triangular equation systems under energy-efficiency guidance |
Non-Patent Citations (2)
Title |
---|
ARMIN AHMADZADEH ET AL.: "A high-performance and energy-efficient exhaustive key search approach via GPU on DES-like cryptosystems", Journal of Supercomputing *
WANG Houzhen et al.: "Multivariate algebra theory and its applications in cryptography", Journal of Beijing University of Technology *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109918125A (en) * | 2019-03-20 | 2019-06-21 | 浪潮商用机器有限公司 | GPU configuration method and device based on OpenPOWER framework |
CN112131583A (en) * | 2020-09-02 | 2020-12-25 | 上海科技大学 | GPU-based model counting and constraint solving method |
CN112131583B (en) * | 2020-09-02 | 2023-12-15 | 上海科技大学 | Model counting and constraint solving method based on GPU |
Also Published As
Publication number | Publication date |
---|---|
CN108510429B (en) | 2021-11-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Agrawal et al. | FAB: An FPGA-based accelerator for bootstrappable fully homomorphic encryption | |
CN108364065A (en) | Microprocessor employing Booth multiplication | |
Su et al. | FPGA-based hardware accelerator for leveled ring-lwe fully homomorphic encryption | |
CN100504758C (en) | Multiple-word multiplication-accumulation circuit and montgomery modular multiplication-accumulation circuit | |
Cao et al. | Accelerating fully homomorphic encryption over the integers with super-size hardware multiplier and modular reduction | |
Geelen et al. | BASALISC: programmable asynchronous hardware accelerator for BGV fully homomorphic encryption | |
CN101479698A (en) | Mulptiplying two numbers | |
CN108510429A (en) | A kind of multivariable cryptographic algorithm parallelization accelerated method based on GPU | |
US20170102942A1 (en) | Variable Length Execution Pipeline | |
Su et al. | A highly unified reconfigurable multicore architecture to speed up NTT/INTT for homomorphic polynomial multiplication | |
CN102122241A (en) | Analog multiplier/divider applicable to prime field and polynomial field | |
Alam et al. | Novel parallel algorithms for fast multi-GPU-based generation of massive scale-free networks | |
US7849125B2 (en) | Efficient computation of the modulo operation based on divisor (2n-1) | |
Bisson et al. | A GPU implementation of the sparse deep neural network graph challenge | |
Ni et al. | A high-performance SIKE hardware accelerator | |
Wang et al. | Saber on ESP32 | |
KR20130128695A (en) | Modular arithmetic unit and secure system having the same | |
Bos et al. | ECC2K-130 on cell CPUs | |
CN106371803B (en) | Calculation method and computing device for Montgomery domain | |
Henry et al. | Solving discrete logarithms in smooth-order groups with CUDA | |
CN108288091A (en) | Microprocessor employing Booth multiplication | |
Seo | SIKE on GPU: Accelerating supersingular isogeny-based key encapsulation mechanism on graphic processing units | |
Zheng | Encrypted cloud using GPUs | |
WO2023141936A1 (en) | Techniques and devices for efficient montgomery multiplication with reduced dependencies | |
WO2019178735A1 (en) | Gpu-based parallel acceleration method for multi-variable password algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||