CN108510429A - Multivariable cryptographic algorithm parallelization acceleration method based on GPU - Google Patents

Multivariable cryptographic algorithm parallelization acceleration method based on GPU

Info

Publication number
CN108510429A
CN108510429A (Application CN201810228547.5A)
Authority
CN
China
Prior art keywords
multivariable
gpu
value
variable
kernel function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810228547.5A
Other languages
Chinese (zh)
Other versions
CN108510429B (en)
Inventor
龚征
廖国鸿
黎伟杰
马昌社
刘志杰
罗裴然
黄家敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN201810228547.5A priority Critical patent/CN108510429B/en
Publication of CN108510429A publication Critical patent/CN108510429A/en
Application granted granted Critical
Publication of CN108510429B publication Critical patent/CN108510429B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3818Decoding for concurrent execution
    • G06F9/3822Parallel decoding, e.g. parallel decode units

Abstract

The invention discloses a GPU-based parallelization acceleration method for multivariable cryptographic algorithms, comprising the following steps: S1, homogenize all terms of the multivariable equations; S2, generate the multiplication table over the field GF(2^8); S3, map the term-index table and the multiplication table to the texture memory of the GPU; S4, for each block of data, call the main multivariable kernel function to perform the computation and execute the Reduce operation; S5, write a host (main) function to schedule the main multivariable kernel function; S6, run the program, output the encryption/decryption result, and release the resources. The invention optimizes cryptographic algorithms of the multivariable cryptosystem mainly by homogenizing all terms of the multivariable polynomials and combining this with the Map-Reduce idea; taking the SpongeMPH hash function as an example, an implementation on the CUDA platform and a performance comparison are given. Experiments show that the scheme improves the running efficiency of the algorithm and can be used to accelerate cryptographic algorithms based on the multivariable cryptosystem.

Description

Multivariable cryptographic algorithm parallelization acceleration method based on GPU
Technical field
The present invention relates to the technical field of cryptographic algorithms, and more specifically to a GPU-based parallelization acceleration method for multivariable cryptographic algorithms.
Background technology
The graphics processing unit (GPU) was originally designed for image processing. In recent years, owing to the power-consumption limits of CPUs and the rapid growth of computing demand, and because GPU computing capability has been developing far faster than Moore's law, GPUs have been widely applied in the field of scientific computing.
A multivariable cryptographic algorithm is a cryptographic scheme built from multivariate polynomials over a finite field. Solving a system of multivariate polynomial equations over a finite field is an NP-hard problem, which makes this approach one of the current design directions for resisting quantum attacks. However, the large amount of computation required by multivariable cryptographic algorithms leads to low efficiency, which is a main factor limiting their practicality. How to improve execution efficiency on the GPU is therefore one of the research directions of those skilled in the art.
Summary of the invention
The primary object of the present invention is to overcome the shortcomings and deficiencies of the prior art and to provide a GPU-based parallelization acceleration method for multivariable cryptographic algorithms, which uses the GPU together with the Map-Reduce idea to parallelize multivariable cryptographic algorithms and thereby improve their execution efficiency.
In order to achieve the above object, the present invention adopts the following technical scheme:
The GPU-based multivariable cryptographic algorithm parallelization acceleration method of the present invention includes the following steps:
S1. Apply the homogenization (same-order) operation to all terms of the multivariable equations;
S2. Generate the generator (power) table and the logarithm table over the finite field; finite-field multiplication is then realized by table lookups in these two tables, which improves the uniformity of the computation across GPU threads. The generator table is the table formed by the powers g^i of a generator g of the finite field F with q elements for the exponents i = 0, 1, 2, …, q-2, together with 0, i.e. table[i] = g^i; additionally, table[p-1] = 1 and table[p] = g are set. The logarithm table satisfies arc_table[a] = i for any element a of the finite field, where table[i] = a; the value of arc_table[0] is set to a large negative number so that in 0*a = table[arc_table[0] + arc_table[a]] the index arc_table[0] + arc_table[a] is always negative, and table[negative index] = 0;
S3. Map the item table, the coefficient table, the generator table and the logarithm table to the texture memory of the GPU. The item table records the subscripts of the variables that make up each term of the multivariable equations; for example, if a term is a_1·x_1·x_3·x_4, where x_1, x_3, x_4 are variables, the item table stores 1, 3, 4 at the corresponding positions. The coefficient table records the coefficient of each term of the multivariable equations and corresponds one-to-one with the item table;
S4. For each block of data, call the main multivariable kernel function to perform the computation and execute the Reduce summation operation. The parameters of the main multivariable kernel function include the address of the data to be processed, the address of the current values of the polynomial variables, and the address of the intermediate temporary storage. In each basic thread of the GPU, the main multivariable kernel function obtains the values of the variables, computes the complete value of each term, then performs the Reduce summation, and obtains the result of each polynomial and saves it into the current polynomial variable array;
S5. Write a host (main) function to schedule the main multivariable kernel function. The host function sets the block size, allocates GPU memory and binds the texture memory, repeatedly passes blocks of data to the main kernel function, and finally copies the computation result back to host memory and releases the resources;
S6. Run the program, output the encryption/decryption result, and release the resources.
As a preferred technical solution, in step S1 the homogenization operation is specifically:
introducing a redundant variable with value 1 and multiplying it into the lower-order terms so that every term has the same order as the polynomial; each polynomial's terms and their sum can then be computed with identical operations in a single kernel call, avoiding the performance degradation of the GPU caused by branching;
meanwhile, introducing redundant terms with coefficient 0 so that the number of terms of each equation is a multiple of the Block size; Block is the term defined by CUDA and corresponds to a work-group in OpenCL.
As a preferred technical solution, in step S2 the multiplication table over the finite field is generated as follows:
in the residue field mod n, if g is a generator and the greatest common divisor of n and g is 1, the value of the generator g can be found by the extended Euclidean algorithm; enumerating g^0, g^1, …, g^(p-2) then yields the multiplication table and the inverse table.
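As an illustration only, the following host-side sketch builds such a power table and logarithm table over a prime field GF(p) for a known generator g (the names exp_table/log_table and the use of std::vector are assumptions, not taken from the patent; the embodiment below works over GF(2^8) instead):

```cpp
#include <cstdint>
#include <vector>

// Build the power ("generator") table and the logarithm table over GF(p):
// exp_table[i] = g^i mod p for i = 0..p-2, and log_table[exp_table[i]] = i.
// Multiplication can then be done as exp_table[(log_table[a] + log_table[b]) % (p - 1)].
void build_prime_field_tables(uint32_t p, uint32_t g,
                              std::vector<uint32_t>& exp_table,
                              std::vector<uint32_t>& log_table) {
    exp_table.assign(p, 0);
    log_table.assign(p, 0);
    uint64_t acc = 1;
    for (uint32_t i = 0; i + 1 < p; ++i) {   // exponents 0 .. p-2
        exp_table[i] = static_cast<uint32_t>(acc);
        log_table[acc] = i;
        acc = (acc * g) % p;
    }
}
```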
As a preferred technical solution, step S3 further includes the following:
the data are first preprocessed on the CPU side, i.e. the item table and the coefficient table are padded with the redundant term 0·x_t·x_t·x_t so that the number of terms of each equation is exactly the GPU block count times the number of threads each block possesses, which facilitates the Reduce summation in step S4; here it is assumed that the multivariable system contains the variables x_0, x_1, …, x_{t-1}, and an additional custom variable x_t with value 1 is added. These arrays are then copied to GPU memory with asynchronous operations and bound to texture memory.
As a preferred technical solution, in step S4 the address of the data to be processed and the address where the current values of the polynomial variables are stored are unsigned character pointers (uint8_t*);
the address of the intermediate temporary storage is an unsigned 32-bit integer pointer (uint32_t*).
As a preferred technical solution, in step S4 the computation of the main multivariable kernel function and the Reduce summation include the following:
S41. each block of the GPU copies the values of the variable array;
S42. the input data are processed; in SpongeMPH this is the absorb operation of the sponge construction;
S43. the corresponding term and coefficient are looked up in texture memory according to the current global thread id, and the product value is computed using the multiplication table;
S44. the value of each equation is computed by Reduce summation: the sum of the 32 threads of each warp is first computed quickly by halving summation, and the values of the polynomials are then obtained simultaneously with atomic add operations;
S45. after the equations are summed, the results are copied back into the variable array by a memory-management function.
As a preferred technical solution, in step S5 writing the host function to schedule the main multivariable kernel function specifically includes the following:
S51. setting the number of blocks of the main kernel function and the number of threads in each block;
S52. allocating the corresponding memory on the GPU side, copying the data of step S51 into GPU memory with asynchronous stream operations, and binding them to texture memory;
S53. in each call, passing the corresponding block of text data and the variable-array values obtained by the previous computation to the main kernel function, and repeatedly calling the kernel function to continually update the values of the polynomial variables;
S54. copying the final hash value from GPU memory back to host memory, and releasing the resources with the cudaFree and free commands.
As a preferred technical solution, step S6 specifically includes the following:
after the host function is written, it is invoked and run directly; according to the settings and scheduling strategy of the host function, the target device reasonably cycles through three operations: copying data to the target device, letting each thread run the kernel program, and copying the computation result back from the target device to the host; after all data-processing results are complete, the result is output to the specified location and the resources are released.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
The present invention adopts a GPU-based parallelization scheme. By homogenizing and padding the multivariable equations, and by combining multiplication tables held in texture memory, it makes full use of the parallel structure of the GPU; with halving summation and atomic operations, the value of each equation is computed quickly after each iteration. This overcomes the problem that the low computation speed of multivariable cryptographic algorithms on CPU platforms restricts their application scenarios, achieving an acceleration of 15 times the CPU computation speed and thereby improving the practicality of multivariable cryptographic algorithms. The invention also takes parallel granularity and memory allocation into account and makes full use of the GPU for fast polynomial summation, ensuring the performance of the multivariable cipher when parallelized on the GPU as in the present invention.
Description of the drawings
Fig. 1 is the flow chart of the SpongeMPH hash function;
Fig. 2 is the flow chart of the present invention.
Detailed description of the embodiments
The present invention will now be described in further detail with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment
This embodiment takes the multivariable hash function SpongeMPH as an example; the flow of the SpongeMPH hash function is shown in Fig. 1:
a) Padding is applied first so that the length of the input data is an integer multiple of the block length;
b) Data of one block length are read in a loop and XORed with the first r*k bits of the current state, and the multivariable function MPE is then used to compute the updated current state; this continues until all the data have been read, yielding the state SL;
c) MPE is called once more to update SL, giving the final value S0, and the final result is obtained from S0 (a minimal code sketch of this flow follows).
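A minimal host-side sketch of this sponge flow is given below for orientation; the padding rule, the rate constant RATE_BYTES and the stub mpe() standing in for the multivariable map are illustrative assumptions, not taken from the patent:

```cpp
#include <cstdint>
#include <vector>

constexpr size_t STATE_BYTES = 40;   // 40 x 8-bit state, as in the embodiment
constexpr size_t RATE_BYTES  = 32;   // illustrative rate (r*k bits / 8)

// Stand-in for the multivariable map MPE; in the patent this evaluation runs on the GPU.
static void mpe(uint8_t state[STATE_BYTES]) { /* placeholder */ }

std::vector<uint8_t> sponge_mph(std::vector<uint8_t> msg) {
    // a) pad the input so its length is a multiple of the block (rate) length
    msg.push_back(0x01);                                   // illustrative padding rule
    while (msg.size() % RATE_BYTES != 0) msg.push_back(0x00);

    uint8_t state[STATE_BYTES] = {0};

    // b) absorb: XOR each block into the first RATE_BYTES of the state, then apply MPE
    for (size_t off = 0; off < msg.size(); off += RATE_BYTES) {
        for (size_t i = 0; i < RATE_BYTES; ++i) state[i] ^= msg[off + i];
        mpe(state);                                        // state SL after the last block
    }

    // c) one more MPE call to obtain S0, then derive the final value from it
    mpe(state);
    return std::vector<uint8_t>(state, state + RATE_BYTES);
}
```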
Based on this scheme, a concrete implementation of SpongeMPH on the CUDA platform is provided together with a performance comparison. With minor modifications, the steps of this embodiment can also be used to implement other multivariable cryptographic algorithms quickly.
As shown in Fig. 2, the GPU-based multivariable cryptographic algorithm parallelization acceleration method of this embodiment includes the following steps:
S1. Apply the homogenization operation to all terms of the multivariable equations. In order to make full use of the multi-core nature and the computational characteristics of the GPU to accelerate the multivariable cryptographic algorithm, the invention introduces a homogenization step. A redundant variable with value 1 is introduced and multiplied into the lower-order terms so that every term has the same order as the polynomial; each polynomial's terms and their sum can then be computed with the same operations in a single kernel call, avoiding the GPU performance degradation caused by branching. Meanwhile, redundant terms with coefficient 0 are introduced so that the number of terms of each equation is a multiple of the Block size (Block is the term defined by CUDA, corresponding to a work-group in OpenCL).
In this embodiment, each block of data is 320 bits and is stored in memory as 40 8-bit variables. The state size of SpongeMPH is 40*8 bits, i.e. 40 equations in 40 variables, where each equation has 40 first-order terms, 842 second-order terms and 400 third-order terms. The 40 variables used by SpongeMPH are labelled x_0, x_1, …, x_39, each x_i being 8 bits; setting x_40 = 1, the t-th equation is written as:
F_t(x_0, x_1, …, x_39) = Σ_{i≤j≤k} α_tijk·x_i·x_j·x_k + Σ_{i≤j} β_tij·x_i·x_j + Σ_i γ_ti·x_i + η_t.
The homogenization operation can then be expressed as:
F_t(x_0, x_1, …, x_39) = Σ_{i≤j≤k} α_tijk·x_i·x_j·x_k + Σ_{i≤j} β_tij·x_i·x_j·x_40 + Σ_i γ_ti·x_i·x_40·x_40 + η_t·x_40·x_40·x_40.
The coefficients of every term of all equations are stored in an array var, and for each term x_i·x_j·x_k its three variable subscripts are stored in the arrays index, indey and indez respectively; each equation can then be expressed as the sum over its terms m of var[m]·x_index[m]·x_indey[m]·x_indez[m].
The arrays var and index, indey, indez are simultaneously padded (with zero coefficients and the redundant variable x_40 of value 1, i.e. with terms 0·x_40·x_40·x_40) so that the number of terms of each equation is a multiple of the per-block thread count of the GPU; the thread count per block is set to 128 here.
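A minimal host-side sketch of this padding is shown below (the helper name pad_equation_terms and the use of std::vector are assumptions; var, index, indey and indez are the arrays named above):

```cpp
#include <cstdint>
#include <vector>

// Append redundant terms 0 * x40 * x40 * x40 to one equation's term arrays
// until the term count is a multiple of the per-block thread count (128 here).
void pad_equation_terms(std::vector<uint8_t>& var,
                        std::vector<uint8_t>& index,
                        std::vector<uint8_t>& indey,
                        std::vector<uint8_t>& indez,
                        size_t threads_per_block = 128) {
    while (var.size() % threads_per_block != 0) {
        var.push_back(0);       // zero coefficient: the padded term contributes nothing
        index.push_back(40);    // the redundant variable x40, whose value is 1
        indey.push_back(40);
        indez.push_back(40);
    }
}
```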
S2. Generate the multiplication table over the finite field. The multiplications performed by the multivariable cryptographic algorithm are operations over a finite field, so generating a multiplication table helps keep the computation uniform across GPU threads and reduces unnecessary branching and repeated computation, thereby increasing the computing speed.
The SpongeMPH version used in this embodiment operates over the field GF(2^8); the generator 3 is used as the basis for constructing the multiplication table, and the corresponding data are stored in the arrays table and arc_table, where table[i] = 3^i and table[arc_table[x]] = x. For elements a and b, multiplication over GF(2^8) can then be expressed as:
1) if a, b != 0, then a*b = table[(arc_table[a] + arc_table[b]) % 0xFF];
2) otherwise a*b = table[negative index] = 0;
where arc_table[0] is set to a sufficiently small (negative) value. Letting tmp = arc_table[a] + arc_table[b], the result can be obtained through the index (tmp > 0) * (tmp % 0xFF), which unifies cases 1) and 2).
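For concreteness, a minimal host-side sketch of the GF(2^8) tables and the table-lookup multiplication follows. It assumes the AES reduction polynomial 0x11B (the text does not state which polynomial is used) and replaces the negative-index trick above with an explicit zero check; gf_exp and gf_log play the roles of table and arc_table:

```cpp
#include <cstdint>

static uint8_t gf_exp[256];   // gf_exp[i] = 3^i in GF(2^8)  ("table")
static int     gf_log[256];   // gf_log[a] = i with gf_exp[i] = a  ("arc_table")

// Build the power and logarithm tables for generator 3,
// assuming the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
void build_gf256_tables() {
    uint16_t x = 1;
    for (int i = 0; i < 255; ++i) {
        gf_exp[i] = static_cast<uint8_t>(x);
        gf_log[gf_exp[i]] = i;
        // multiply x by the generator 3 = x*2 + x, reducing modulo 0x11B
        uint16_t x2 = x << 1;
        if (x2 & 0x100) x2 ^= 0x11B;
        x = x2 ^ x;
    }
    gf_exp[255] = 1;          // convenience entry, like the patent's table[p-1] = 1
    gf_log[0]  = -10000;      // "sufficiently negative" marker for the element 0
}

// Table-lookup multiplication: cases 1) and 2) of the text, with an explicit check.
uint8_t gf_mul(uint8_t a, uint8_t b) {
    if (a == 0 || b == 0) return 0;                       // case 2)
    return gf_exp[(gf_log[a] + gf_log[b]) % 255];         // case 1)
}
```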
S3. Map the item table, the coefficient table and the multiplication table to the texture memory of the GPU. The multivariable cryptographic algorithm needs to read the variable indices of every term during each computation, and the item table may be very large, so texture memory is used for storage to improve the reading speed. In addition, the multiplications computed by table lookup may access the tables non-uniformly, so storing the tables in texture memory also helps improve lookup efficiency.
Further, the data are first preprocessed on the CPU side: the item table and the coefficient table are padded with the redundant term 0·x_40·x_40·x_40 so that the number of terms of each equation is exactly the GPU block count times the number of threads per block, which facilitates the Reduce summation in step S4. These arrays are then copied to GPU memory with asynchronous operations and bound to texture memory.
S4. For each block of data, call the main multivariable kernel function to perform the computation and execute the Reduce summation. The parameters of the main multivariable kernel function are the address of the data to be processed, the address of the current values of the polynomial variables, and the address of the intermediate temporary storage. The first two are unsigned character pointers (uint8_t*), and the address of the intermediate temporary storage is an unsigned 32-bit integer pointer (uint32_t*). In each basic thread of the GPU, the kernel function obtains the values of the variables, computes the complete value of each term, then performs the Reduce summation, and obtains the result of each polynomial and saves it into the current polynomial variable array.
In this embodiment, the kernel function has three parameters: the address of the array holding the data to be processed (the input data), the address of the array of current polynomial variable values (the output is also written back into this array), and the address of the array of intermediate temporary data. Each thread in the GPU corresponds to one term of an equation, and the kernel mainly consists of the following operations:
(a) each block of the GPU copies the values of the variable array; in this embodiment, the first 40 threads of each block each copy the value of one variable into shared memory;
(b) the input data are processed; in SpongeMPH this is the absorb operation of the sponge construction;
(c) the corresponding term and coefficient are looked up in texture memory according to the current global thread id, and the product value is computed using the multiplication table;
(d) the value of each equation is computed by Reduce summation: the sum of the 32 threads of each warp is first computed quickly by halving summation, and the values of the polynomials are then obtained simultaneously with atomic add operations;
(e) after the equations are summed, the first 40 threads of another GPU kernel program (Map) copy the results back into the variable array.
In this embodiment, the pseudocode of the main kernel function is shown in Table 1, and the pseudocode of the Reduce function is shown in Table 2.
Table 1
Table 2
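The following is a minimal CUDA sketch of steps (c) and (d) above, under stated assumptions: the summation uses XOR (the addition of GF(2^8)), each equation's term list has been padded to a multiple of the 128-thread block size, the lookup tables are held in constant memory rather than texture memory for brevity, and the reduction is performed per block followed by one atomic per block rather than per warp. All identifiers (mpe_kernel, eq_acc, terms_per_eq, …) are illustrative, not taken from the patent.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Device copies of the lookup tables (held in texture memory in the patent;
// plain __constant__ arrays here for brevity).
__constant__ uint8_t d_gf_exp[256];
__constant__ int     d_gf_log[256];

__device__ uint8_t gf_mul_dev(uint8_t a, uint8_t b) {
    if (a == 0 || b == 0) return 0;
    return d_gf_exp[(d_gf_log[a] + d_gf_log[b]) % 255];
}

// One thread per term; blockDim.x must be 128.  Each equation occupies
// terms_per_eq / 128 consecutive blocks (11 blocks in the embodiment,
// terms_per_eq = 1408).  eq_acc must be zeroed before the launch.
__global__ void mpe_kernel(const uint8_t* __restrict__ coeff,   // var
                           const uint8_t* __restrict__ ix,      // index
                           const uint8_t* __restrict__ iy,      // indey
                           const uint8_t* __restrict__ iz,      // indez
                           const uint8_t* __restrict__ x,       // 41 variable values, x[40] = 1
                           uint32_t*      eq_acc,               // one accumulator per equation
                           int            terms_per_eq) {
    int term = blockIdx.x * blockDim.x + threadIdx.x;            // global term id
    int eq   = term / terms_per_eq;                              // equation this term belongs to

    // (c) look up the term's coefficient and variables, multiply via the tables
    uint8_t v = gf_mul_dev(coeff[term], x[ix[term]]);
    v = gf_mul_dev(v, x[iy[term]]);
    v = gf_mul_dev(v, x[iz[term]]);

    // (d) Reduce by halving inside the block (GF(2^8) addition is XOR) ...
    __shared__ uint32_t s[128];
    s[threadIdx.x] = v;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) s[threadIdx.x] ^= s[threadIdx.x + stride];
        __syncthreads();
    }
    // ... then fold each block's partial sum into its equation's value
    if (threadIdx.x == 0) atomicXor(&eq_acc[eq], s[0]);
}
```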
S5. Write the host (main) function to schedule the main multivariable kernel function. The host function sets the block size, allocates GPU memory, binds the texture memory, repeatedly passes blocks of data to the main kernel function, and finally copies the computation result back to host memory and releases the resources.
First, the number of blocks of the kernel function and the number of threads in each block are set: in this embodiment, since each equation has 1284 effective terms, and considering the structure of the GPU, the number of blocks is set to 440 and the number of threads in a single block to 128; by padding each equation with dummy terms of value 0, each equation uses exactly 11 blocks.
Next, the corresponding memory is allocated on the GPU side, and the data from the above steps are copied into GPU memory with asynchronous stream operations and bound to texture memory. Then, in each call, the corresponding block of text data and the values of the 40 variables obtained by the previous computation (initialized to 0) are passed to the main kernel function, and the kernel is called repeatedly to keep updating the values of the polynomial variables.
In the last part, the final hash value is copied from GPU memory back to host memory, and the resources are released with commands such as cudaFree and free.
S6. Run the program, output the encryption/decryption result, and release the resources. Once the main program is written, it can be run directly. According to the settings and scheduling strategy of the main program, the target device reasonably cycles through three operations: copying data to the target device, letting each thread run the kernel program, and copying the computation result back from the target device to the host. After all data-processing results are complete, the result is output to the specified location and the resources are released.
The CUDA main program is opened in the IDE, or compiled and run directly from the command line. Depending on the CUDA main program, the hash value can be output to the screen or to a specified file.
In this embodiment, the pseudocode for dispatching the kernel function is shown in Table 3.
Table 3
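A host-side sketch of this dispatch pattern is given below; it reuses the mpe_kernel sketched in step S4 (declared here, defined above), omits texture-object binding and the finalization call for brevity, performs the absorb step on the host, treats each data block as the full 40-byte (320-bit) state width, and uses illustrative names and sizes throughout:

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>
#include <vector>

// Kernel sketched in step S4 (one thread per term).
__global__ void mpe_kernel(const uint8_t*, const uint8_t*, const uint8_t*,
                           const uint8_t*, const uint8_t*, uint32_t*, int);

void run_sponge_mph(const std::vector<uint8_t>& h_coeff,             // var
                    const std::vector<uint8_t>& h_ix,                // index
                    const std::vector<uint8_t>& h_iy,                // indey
                    const std::vector<uint8_t>& h_iz,                // indez
                    const std::vector<std::vector<uint8_t>>& blocks) // 40-byte data blocks
{
    const int terms_per_eq = 1408, n_eq = 40, threads = 128;
    const int n_terms = terms_per_eq * n_eq;                         // 440 blocks of 128 threads

    uint8_t *d_coeff, *d_ix, *d_iy, *d_iz, *d_x;
    uint32_t *d_acc;
    cudaMalloc(&d_coeff, n_terms);  cudaMalloc(&d_ix, n_terms);
    cudaMalloc(&d_iy, n_terms);     cudaMalloc(&d_iz, n_terms);
    cudaMalloc(&d_x, n_eq + 1);     cudaMalloc(&d_acc, n_eq * sizeof(uint32_t));

    // Asynchronous copies of the constant term arrays (bound to texture memory in the patent).
    cudaMemcpyAsync(d_coeff, h_coeff.data(), n_terms, cudaMemcpyHostToDevice);
    cudaMemcpyAsync(d_ix, h_ix.data(), n_terms, cudaMemcpyHostToDevice);
    cudaMemcpyAsync(d_iy, h_iy.data(), n_terms, cudaMemcpyHostToDevice);
    cudaMemcpyAsync(d_iz, h_iz.data(), n_terms, cudaMemcpyHostToDevice);

    std::vector<uint8_t> state(n_eq + 1, 0);
    state[n_eq] = 1;                                                 // the redundant variable x40 = 1

    for (const auto& blk : blocks) {
        for (int i = 0; i < n_eq; ++i) state[i] ^= blk[i];           // absorb one data block
        cudaMemcpy(d_x, state.data(), n_eq + 1, cudaMemcpyHostToDevice);
        cudaMemset(d_acc, 0, n_eq * sizeof(uint32_t));
        mpe_kernel<<<n_terms / threads, threads>>>(d_coeff, d_ix, d_iy, d_iz,
                                                   d_x, d_acc, terms_per_eq);
        std::vector<uint32_t> acc(n_eq);
        cudaMemcpy(acc.data(), d_acc, n_eq * sizeof(uint32_t), cudaMemcpyDeviceToHost);
        for (int i = 0; i < n_eq; ++i) state[i] = static_cast<uint8_t>(acc[i]);
    }

    for (int i = 0; i < n_eq; ++i) printf("%02x", state[i]);         // output the hash value
    printf("\n");

    cudaFree(d_coeff); cudaFree(d_ix); cudaFree(d_iy);
    cudaFree(d_iz);    cudaFree(d_x);  cudaFree(d_acc);
}
```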
Experimental results
The running environment of this example is: CPU Core i7 6700K, 16 GB of memory, ArchLinux (64-bit) operating system, Nvidia GTX 1070 GPU with 11 GB of video memory; the SDK version used is CUDA Toolkit 9.0, and the integrated development environment is Nsight.
The performance comparison of GPU-SpongeMPH and CPU-SpongeMPH in this example with an input data size of 40 MB is shown in Table 4:
Table 4
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by the above embodiment. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principles of the present invention shall be an equivalent replacement and is included within the scope of protection of the present invention.

Claims (8)

1. A GPU-based multivariable cryptographic algorithm parallelization acceleration method, characterized in that it comprises the following steps:
S1. applying the homogenization (same-order) operation to all terms of the multivariable equations;
S2. generating the generator (power) table and the logarithm table over the finite field, finite-field multiplication being realized by table lookups in these two tables, which improves the uniformity of the computation across GPU threads; the generator table is the table formed by the powers g^i of a generator g of the finite field F with q elements for the exponents i = 0, 1, 2, …, q-2, together with 0, i.e. table[i] = g^i, with table[p-1] = 1 and table[p] = g additionally set; the logarithm table satisfies arc_table[a] = i for any element a of the finite field, where table[i] = a, and the value of arc_table[0] is set to a large negative number so that in 0*a = table[arc_table[0] + arc_table[a]] the index arc_table[0] + arc_table[a] is always negative, and table[negative index] = 0;
S3. mapping the item table, the coefficient table, the generator table and the logarithm table to the texture memory of the GPU, the item table recording the subscripts of the variables that make up each term of the multivariable equations, e.g. if a term is a_1·x_1·x_3·x_4, where x_1, x_3, x_4 are variables, the item table stores 1, 3, 4 at the corresponding positions, and the coefficient table recording the coefficient of each term of the multivariable equations in one-to-one correspondence with the item table;
S4. for each block of data, calling the main multivariable kernel function to perform the computation and execute the Reduce summation operation, the parameters of the main multivariable kernel function including the address of the data to be processed, the address of the current values of the polynomial variables, and the address of the intermediate temporary storage; in each basic thread of the GPU, the main multivariable kernel function obtains the values of the variables, computes the complete value of each term, then performs the Reduce summation, and obtains the result of each polynomial and saves it into the current polynomial variable array;
S5. writing a host (main) function to schedule the main multivariable kernel function, the host function setting the block size, allocating GPU memory and binding the texture memory, repeatedly passing blocks of data to the main kernel function, and finally copying the computation result back to host memory and releasing the resources;
S6. running the program, outputting the encryption/decryption result, and releasing the resources.
2. The GPU-based multivariable cryptographic algorithm parallelization acceleration method according to claim 1, characterized in that in step S1 the homogenization operation is specifically:
introducing a redundant variable with value 1 and multiplying it into the lower-order terms so that every term has the same order as the polynomial, whereby each polynomial's terms and their sum can be computed with identical operations in a single kernel call, avoiding the performance degradation of the GPU caused by branching;
meanwhile, introducing redundant terms with coefficient 0 so that the number of terms of each equation is a multiple of the Block size, Block being the term defined by CUDA and corresponding to a work-group in OpenCL.
3. The GPU-based multivariable cryptographic algorithm parallelization acceleration method according to claim 1, characterized in that in step S2 the multiplication table over the finite field is generated as follows:
in the residue field mod n, if g is a generator and the greatest common divisor of n and g is 1, the value of the generator g can be found by the extended Euclidean algorithm, and enumerating g^0, g^1, …, g^(p-2) then yields the multiplication table and the inverse table.
4. The GPU-based multivariable cryptographic algorithm parallelization acceleration method according to claim 1, characterized in that step S3 further includes the following:
first preprocessing the data on the CPU side, i.e. padding the item table and the coefficient table with the redundant term 0·x_t·x_t·x_t so that the number of terms of each equation is exactly the GPU block count times the number of threads each block possesses, which facilitates the Reduce summation in step S4, wherein it is assumed that the multivariable system contains the variables x_0, x_1, …, x_{t-1} and an additional custom variable x_t with value 1 is added; these arrays are then copied to GPU memory with asynchronous operations and bound to texture memory.
5. The GPU-based multivariable cryptographic algorithm parallelization acceleration method according to claim 1, characterized in that in step S4 the address of the data to be processed and the address where the current values of the polynomial variables are stored are unsigned character pointers (uint8_t*);
the address of the intermediate temporary storage is an unsigned 32-bit integer pointer (uint32_t*).
6. The GPU-based multivariable cryptographic algorithm parallelization acceleration method according to claim 1 or 5, characterized in that in step S4 the computation of the main multivariable kernel function and the Reduce summation include the following:
S41. each block of the GPU copies the values of the variable array;
S42. the input data are processed; in SpongeMPH this is the absorb operation of the sponge construction;
S43. the corresponding term and coefficient are looked up in texture memory according to the current global thread id, and the product value is computed using the multiplication table;
S44. the value of each equation is computed by Reduce summation: the sum of the 32 threads of each warp is first computed quickly by halving summation, and the values of the polynomials are then obtained simultaneously with atomic add operations;
S45. after the equations are summed, the results are copied back into the variable array by a memory-management function.
7. The GPU-based multivariable cryptographic algorithm parallelization acceleration method according to claim 1, characterized in that in step S5 writing the host function to schedule the main multivariable kernel function specifically includes the following:
S51. setting the number of blocks of the main kernel function and the number of threads in each block;
S52. allocating the corresponding memory on the GPU side, copying the data of step S51 into GPU memory with asynchronous stream operations, and binding them to texture memory;
S53. in each call, passing the corresponding block of text data and the variable-array values obtained by the previous computation to the main kernel function, and repeatedly calling the kernel function to continually update the values of the polynomial variables;
S54. copying the final hash value from GPU memory back to host memory, and releasing the resources with the cudaFree and free commands.
8. The GPU-based multivariable cryptographic algorithm parallelization acceleration method according to claim 1, characterized in that step S6 specifically includes the following:
after the host function is written, it is invoked and run directly; according to the settings and scheduling strategy of the host function, the target device reasonably cycles through three operations: copying data to the target device, letting each thread run the kernel program, and copying the computation result back from the target device to the host; after all data-processing results are complete, the result is output to the specified location and the resources are released.
CN201810228547.5A 2018-03-20 2018-03-20 Multivariable cryptographic algorithm parallelization acceleration method based on GPU Active CN108510429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810228547.5A CN108510429B (en) 2018-03-20 2018-03-20 Multivariable cryptographic algorithm parallelization acceleration method based on GPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810228547.5A CN108510429B (en) 2018-03-20 2018-03-20 Multivariable cryptographic algorithm parallelization acceleration method based on GPU

Publications (2)

Publication Number Publication Date
CN108510429A true CN108510429A (en) 2018-09-07
CN108510429B CN108510429B (en) 2021-11-02

Family

ID=63375986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810228547.5A Active CN108510429B (en) 2018-03-20 2018-03-20 Multivariable cryptographic algorithm parallelization acceleration method based on GPU

Country Status (1)

Country Link
CN (1) CN108510429B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918125A (en) * 2019-03-20 2019-06-21 浪潮商用机器有限公司 GPU configuration method and device based on OpenPOWER framework
CN112131583A (en) * 2020-09-02 2020-12-25 上海科技大学 GPU-based model counting and constraint solving method

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1870499A (en) * 2005-01-11 2006-11-29 丁津泰 Method for generating multiple variable common key password system
CN101008937A (en) * 2007-02-06 2007-08-01 中国科学院研究生院 Computer implementation method of multiplier over finite field and computer implementation method of large matrix element elimination
CN101916185A (en) * 2010-08-27 2010-12-15 上海交通大学 Automatic parallelization acceleration method of serial programs running under multi-core platform
CN101977109A (en) * 2010-10-21 2011-02-16 李晨 Linear mixed high ordered equation public key algorithm
CN102006170A (en) * 2010-11-11 2011-04-06 西安理工大学 Ring signature method for anonymizing information based on MQ problem in finite field
CN102214086A (en) * 2011-06-20 2011-10-12 复旦大学 General-purpose parallel acceleration algorithm based on multi-core processor
CN102811125A (en) * 2012-08-16 2012-12-05 西北工业大学 Certificateless multi-receiver signcryption method with multivariate-based cryptosystem
US20140173608A1 (en) * 2012-12-14 2014-06-19 Electronics And Telecommunications Research Institute Apparatus and method for predicting performance attributable to parallelization of hardware acceleration devices
CN103490877A (en) * 2013-09-05 2014-01-01 北京航空航天大学 Parallelization method for ARIA symmetric block cipher algorithm based on CUDA
CN103745447A (en) * 2014-02-17 2014-04-23 东南大学 Fast parallel implementation method for non-local average filtering
CN103973431A (en) * 2014-04-16 2014-08-06 华南师范大学 AES parallel implementation method based on OpenCL
US20150324707A1 (en) * 2014-05-12 2015-11-12 Palo Alto Research Center Incorporated System and method for selecting useful smart kernels for general-purpose gpu computing
CN104020983A (en) * 2014-06-16 2014-09-03 上海大学 KNN-GPU acceleration method based on OpenCL
CN105743644A (en) * 2016-01-26 2016-07-06 广东技术师范学院 Mask encryption device of multivariable quadratic equation
CN105933111A (en) * 2016-05-27 2016-09-07 华南师范大学 Bitslicing-KLEIN rapid implementation method based on OpenCL
CN107392429A (en) * 2017-06-22 2017-11-24 东南大学 GPU-accelerated forward-substitution method for triangular equation systems under energy management

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Armin Ahmadzadeh et al.: "A high-performance and energy-efficient exhaustive key search approach via GPU on DES-like cryptosystems", Journal of Supercomputing *
王后珍 et al.: "Multivariate algebra theory and its application in cryptography", Journal of Beijing University of Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918125A (en) * 2019-03-20 2019-06-21 浪潮商用机器有限公司 GPU configuration method and device based on OpenPOWER framework
CN112131583A (en) * 2020-09-02 2020-12-25 上海科技大学 GPU-based model counting and constraint solving method
CN112131583B (en) * 2020-09-02 2023-12-15 上海科技大学 Model counting and constraint solving method based on GPU

Also Published As

Publication number Publication date
CN108510429B (en) 2021-11-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant