CN113221193A - SM2 digital signature and signature verification quick implementation method and system based on GPU - Google Patents


Info

Publication number
CN113221193A
Authority
CN
China
Prior art keywords
signature
optimization
elliptic curve
gpu
modular
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110613751.0A
Other languages
Chinese (zh)
Other versions
CN113221193B (en)
Inventor
邱卫东
张崴城
王杨德
田昊
郭捷
唐鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202110613751.0A priority Critical patent/CN113221193B/en
Publication of CN113221193A publication Critical patent/CN113221193A/en
Application granted granted Critical
Publication of CN113221193B publication Critical patent/CN113221193B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures

Abstract

A method and a system for fast GPU-based SM2 digital signature and signature verification. The information to be signed or verified is preprocessed on the CPU side to obtain a preprocessing result comprising the public and private keys, a random number, the precomputation of the SM3 compression function, GPU initialization, and a lookup table. The GPU side then maps the preprocessing result into a Jacobian weighted projective coordinate system and performs signature processing or signature verification processing with optimized modular operations and an optimized compression function. The invention is simple to implement and stable in performance, its throughput can reach 9.1 × 10^5 ops, and it greatly improves the computational efficiency of the SM2 signature and signature verification algorithms.

Description

SM2 digital signature and signature verification quick implementation method and system based on GPU
Technical Field
The invention relates to a technology in the field of information security, in particular to a method and a system for quickly realizing SM2 digital signature and signature verification based on a GPU.
Background
The key to implementing an Elliptic Curve Cryptography (ECC) algorithm on general-purpose computing hardware is implementing large-integer modular multiplication and modular division in a finite field, together with point addition and point multiplication in the elliptic curve group. One common practice is to use the SSE instruction set provided by Intel, but the performance is limited by the CPU hardware architecture and is not ideal. In recent years, as the general-purpose computing performance of GPUs has kept improving, techniques for optimizing and accelerating asymmetric cryptographic algorithms have developed steadily. Fangyu Zheng et al. proposed a method that fully exploits the floating-point computing capability of the GPU and discussed the implementation and algorithmic optimization of finite-field large-integer modular multiplication, modular division, modular exponentiation, and modular inversion on a GPU platform. Wuqiong Pan, Fangyu Zheng et al. built a high-speed GPU-based ECC signature verification server and implemented and optimized the point addition and point multiplication of the elliptic curve group. However, most of this prior art targets high-speed GPU implementations of only two public key algorithms, RSA and ECDSA; no high-speed GPU implementation method or system for the SM2 elliptic curve public key algorithm has been found so far.
Disclosure of Invention
To address the above defects of the prior art, the invention provides a method and a system for fast GPU-based SM2 digital signature and signature verification. The invention is simple to implement and stable in performance, its throughput can reach 9.1 × 10^5 ops, and it greatly improves the computational efficiency of the SM2 signature and signature verification algorithms.
The invention is realized by the following technical scheme:
The invention relates to a fast GPU-based implementation method for SM2 digital signature and signature verification. The information to be signed or verified is preprocessed on the CPU side to obtain a preprocessing result comprising the public and private keys, a random number, the precomputation of the SM3 compression function, GPU initialization, and a lookup table. The GPU side then maps the preprocessing result into a Jacobian weighted projective coordinate system and performs signature processing or signature verification processing with optimized modular operations and an optimized compression function.
The preprocessing is as follows: on the CPU side, GPU initialization and the precomputation of the public key, the private key, the random number, and the SM3 compression function are performed for the SM2 digital signature task or the signature verification task, so as to prepare for the subsequent GPU-side operations; for the signature task, a [k]-multiple point lookup table is additionally precomputed to improve the efficiency of the subsequent GPU signature processing.
In the signature processing, the GPU side uses the lookup table pregenerated on the CPU side to perform the comb method and obtain the signature result; in the signature verification processing, the GPU side performs the binary expansion method to obtain the verification result.
The modular operation optimization is as follows: among the large number of large-integer modular operations performed by a signature/signature verification task, modular multiplication and modular division consume the most computing resources, so these two operations are optimized to reduce computational complexity. Modular multiplication is replaced by the Montgomery reduction algorithm together with Montgomery multiplication; modular division is replaced by a combination of Fermat's Little Theorem and the Extended Euclidean Algorithm.
The signature processing or signature verification processing with an optimized compression function is as follows: the SM3 hash algorithm used within the SM2 algorithm is optimized with GPU architecture optimization techniques, specifically instruction optimization and register multiplexing. Instruction optimization uses the bitselect and rotate functions built into OpenCL to optimize the logic operations and cyclic shift operations of the SM3 hash algorithm; register multiplexing reuses a 16-word register space for the 64 words of the 64-step message expansion stage of the SM3 hash algorithm.
The invention further relates to a system for implementing the method, comprising: an elliptic curve expression equation coordinate system mapping module for reducing the overall computational complexity, a modular operation computational complexity optimization module for obtaining optimal efficiency in modular multiplication and modular division, a compression function optimization module for optimizing the performance of the message hash algorithm, and an elliptic curve multiple-point operation optimization module for improving the efficiency of signature and signature verification, wherein: the coordinate system mapping module receives the data preprocessed on the CPU side, performs coordinate mapping on the elliptic curve, and outputs the mapped elliptic curve for the subsequent signature/signature verification computation; the modular operation computational complexity optimization module optimizes the large number of large-integer modular operations contained in the elliptic curve multiple-point operations and outputs their results at high speed to the elliptic curve multiple-point operation optimization module; the compression function optimization module optimizes the SM3 compression function operations in the SM2 signature/signature verification task and outputs the SM3 results to the elliptic curve multiple-point operation optimization module; the elliptic curve multiple-point operation optimization module receives the mapped elliptic curve information, optimizes the multiple-point operations in the SM2 signature/signature verification task, and combines the large-integer modular operation results with the SM3 hash results to output the final SM2 signature/signature verification result.
When mapping the coordinates of the elliptic curve, the elliptic curve expression equation coordinate system mapping module performs the multiple-point operations of the group in a Jacobian weighted projective coordinate system, thereby avoiding the large number of modular inversions that arise in the affine coordinate system.
The Jacobian weighted projective coordinate system is defined as follows. Over $F_p$, the elliptic curve equation in this coordinate system takes the form $y^2 = x^3 + a x z^4 + b z^6$, where $a, b \in F_p$ and $4a^3 + 27b^2 \neq 0 \pmod p$. The set of points on the elliptic curve is $E(F_p) = \{(x, y, z) \mid x, y, z \in F_p \text{ and } y^2 = x^3 + a x z^4 + b z^6\}$. For $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$, if there exists some $u \in F_p$ with $u \neq 0$ such that $x_1 = u^2 x_2$, $y_1 = u^3 y_2$, $z_1 = u z_2$, the two triples are called equivalent and represent the same point. An elliptic curve is usually represented in the affine coordinate system, under which the set of points on the curve is $E(F_p) = \{(X, Y) \mid X, Y \in F_p \text{ and } Y^2 = X^3 + aX + b\}$, where $4a^3 + 27b^2 \neq 0 \pmod p$. Setting $x = X \cdot z^2$, $y = Y \cdot z^3$ converts the elliptic curve from the affine representation to the Jacobian weighted projective representation. Conversely, when $z \neq 0$, setting $X = x/z^2$, $Y = y/z^3$ converts the Jacobian weighted projective representation back to the affine representation $Y^2 = X^3 + aX + b$. When $z = 0$, the triple $(1, 1, 0)$ corresponds to the point at infinity $O$ of the affine coordinate system.
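The conversions and the equivalence relation above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with the point $(5, 1)$ is a textbook example chosen here for readability, not the SM2 curve, and all function names are hypothetical.

```python
# Toy parameters (assumption, NOT the SM2 curve): y^2 = x^3 + 2x + 2 over F_17.
P_MOD, A, B = 17, 2, 2

def on_curve_jacobian(x, y, z):
    """Check y^2 = x^3 + a*x*z^4 + b*z^6 (mod p)."""
    return (y * y - (x**3 + A * x * z**4 + B * z**6)) % P_MOD == 0

def to_jacobian(X, Y, z=1):
    """Affine (X, Y) -> Jacobian (X*z^2, Y*z^3, z) for any nonzero z."""
    return (X * z * z % P_MOD, Y * z**3 % P_MOD, z % P_MOD)

def to_affine(x, y, z):
    """Jacobian (x, y, z) with z != 0 -> affine (x/z^2, y/z^3); one inversion."""
    z_inv = pow(z, -1, P_MOD)  # modular inverse, Python 3.8+
    return (x * z_inv**2 % P_MOD, y * z_inv**3 % P_MOD)

def equivalent(p1, p2):
    """True if x1 = u^2*x2, y1 = u^3*y2, z1 = u*z2 for some nonzero u.

    Tested by cross-multiplication so that no inversion is needed.
    """
    (x1, y1, z1), (x2, y2, z2) = p1, p2
    if z1 == 0 or z2 == 0:
        return z1 == z2  # both must be the point at infinity
    return ((x1 * z2**2 - x2 * z1**2) % P_MOD == 0
            and (y1 * z2**3 - y2 * z1**3) % P_MOD == 0)

if __name__ == "__main__":
    j3 = to_jacobian(5, 1, 3)   # two different triples ...
    j7 = to_jacobian(5, 1, 7)
    print(j3, j7, equivalent(j3, j7), to_affine(*j3))  # ... for the same point
```

The inversion appears only in `to_affine`, which is why intermediate results are kept in Jacobian form and converted back once at the end.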
The modular operation computational complexity optimization module comprises an optimized modular multiplication unit and an optimized modular division unit, wherein: the optimized modular multiplication unit uses the Montgomery reduction algorithm and Montgomery multiplication to perform modular multiplication and obtain high-speed large-integer modular multiplication results; the optimized modular division unit combines Fermat's Little Theorem with the Extended Euclidean Algorithm: when the number of parallel signatures/signature verifications is less than 2048, modular division is performed on the CPU side using Fermat's Little Theorem; when the number of parallel signatures/signature verifications is greater than or equal to 2048, modular division is performed on the GPU side using the Extended Euclidean Algorithm, yielding high-speed large-integer modular division results.
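As a rough illustration of the Montgomery reduction and Montgomery multiplication named above, the following sketch works on Python integers. It assumes the SM2 recommended prime as the modulus and $R = 2^{256}$; the function names are hypothetical, and a GPU kernel would operate on 32-bit limbs rather than on arbitrary-precision integers.

```python
# Modulus: the SM2 recommended prime, written in its closed form.
P = 2**256 - 2**224 - 2**96 + 2**64 - 1
R_BITS = 256                       # assumption: R = 2^256
R_MASK = (1 << R_BITS) - 1
P_NEG_INV = (-pow(P, -1, 1 << R_BITS)) & R_MASK   # p' with p*p' = -1 (mod R)
R2 = pow(1 << R_BITS, 2, P)        # R^2 mod p, used to enter Montgomery form

def mont_reduce(t):
    """Return t * R^-1 mod P for 0 <= t < P*R, using only shifts and masks."""
    m = ((t & R_MASK) * P_NEG_INV) & R_MASK
    u = (t + m * P) >> R_BITS      # exact division: t + m*P is a multiple of R
    return u - P if u >= P else u

def mont_mul(a_bar, b_bar):
    """Multiply two values that are already in Montgomery form."""
    return mont_reduce(a_bar * b_bar)

def to_mont(a):
    """a -> a*R mod P."""
    return mont_mul(a % P, R2)

def from_mont(a_bar):
    """a*R mod P -> a."""
    return mont_reduce(a_bar)

def mod_mul(a, b):
    """Plain modular multiplication routed through Montgomery form."""
    return from_mont(mont_mul(to_mont(a), to_mont(b)))

if __name__ == "__main__":
    a, b = 0x1234567890ABCDEF, P - 5
    print(mod_mul(a, b) == (a * b) % P)
```

Converting in and out of Montgomery form costs extra multiplications, so the approach pays off when many multiplications are chained, as in a scalar multiplication, and the conversion happens once at each end.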
The compression function optimization module uses GPU architecture optimization techniques to optimize the SM3 hash algorithm within the SM2 algorithm, and comprises an instruction optimization unit and a register multiplexing unit, wherein: the instruction optimization unit uses the bitselect and rotate functions built into OpenCL to optimize the logic operations and cyclic shift operations of the SM3 compression function; the register multiplexing unit reuses a 16-word register space for the 64 words of the 64-step message expansion stage of the SM3 compression function, reducing the space overhead of the SM3 computation.
The elliptic curve multiple-point operation optimization module comprises an elliptic curve fixed-point multiplication unit for digital signatures and an elliptic curve unknown-point multiplication unit for signature verification, wherein: the fixed-point multiplication unit uses the comb method to compute the multiple [k]G of the base point G in the elliptic curve group, giving the fixed-point multiplication result of the signature process; the unknown-point multiplication unit uses the binary expansion method to compute the multiple [k]G of the base point G on the elliptic curves of different public keys, giving the unknown-point multiplication result of the signature verification process.
The comb method is specifically as follows: first, 256 fixed-point multiples $0 \cdot P, 1 \cdot P, \dots, 255 \cdot P$ are calculated. Let the scalar $k$ be $L$ bits long and divide it into byte strings, i.e. $k = k_7 \| k_6 \| k_5 \| k_4 \| k_3 \| k_2 \| k_1 \| k_0$, so that each segment has word length $L/8$. During the calculation, it is only necessary to look up, for each $k_i$, the value $[k_i]G$ prestored in the $[k]G$ lookup table, and to accumulate these values until the byte string $k$ has been traversed.
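One possible reading of this table lookup is a byte-window fixed-base multiplication: for every byte position of the scalar, a row of 256 precomputed multiples is stored, and the scalar multiple is obtained by additions only. The sketch below follows that reading and is not claimed to match the table layout of FIG. 3. It assumes a toy curve ($y^2 = x^3 + 2x + 2$ over $F_{17}$, base point $(5, 1)$) and 16-bit scalars so that it runs instantly; all names are hypothetical.

```python
# Toy curve (assumption, NOT SM2): y^2 = x^3 + 2x + 2 over F_17, base point G.
P_MOD, A = 17, 2
G = (5, 1)

def ec_add(p1, p2):
    """Affine point addition; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                  # P + (-P) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def build_table(base, windows):
    """table[i][j] = [j * 256^i] * base, for j = 0..255 and each byte position i."""
    table = []
    for _ in range(windows):
        row = [None]                                 # 0 * base
        for _ in range(255):
            row.append(ec_add(row[-1], base))
        table.append(row)
        base = ec_add(row[255], base)                # 256 * base for next window
    return table

def fixed_base_mul(k, table):
    """[k]G by one lookup and one addition per byte of k (no doublings)."""
    acc = None
    for i, row in enumerate(table):
        acc = ec_add(acc, row[(k >> (8 * i)) & 0xFF])
    return acc

def naive_mul(k, point):
    """Reference implementation: k repeated additions."""
    acc = None
    for _ in range(k):
        acc = ec_add(acc, point)
    return acc

TABLE = build_table(G, windows=2)   # precomputed once, on the CPU side

if __name__ == "__main__":
    print(fixed_base_mul(300, TABLE) == naive_mul(300, G))
```

The table depends only on the fixed base point, so it can be computed once during preprocessing and shared read-only by all signing threads; a 256-bit scalar would need 32 such rows under this reading.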
The binary expansion method is specifically as follows: the scalar $k$ in the multiple-point problem $[k]G$ of the base point $G$ is converted into a binary string $k = (k_{t-1}, \dots, k_1, k_0)_2$. Let $P = G$ and $Q = O$ (the point at infinity), then traverse the binary string from $k_0$ to $k_{t-1}$. In each iteration the current $k_i$ is examined: if $k_i = 1$, then $Q \leftarrow Q + P$; if $k_i = 0$, $Q$ is left unchanged. Then $P \leftarrow 2P$, and the traversal moves on to the next value $k_{i+1}$. After the string has been traversed, $Q = [k]G$.
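The loop just described is the right-to-left double-and-add method. A minimal sketch follows, again on an assumed toy curve ($y^2 = x^3 + 2x + 2$ over $F_{17}$) rather than the SM2 curve, with hypothetical function names and plain affine arithmetic for brevity.

```python
# Toy curve (assumption, NOT SM2): y^2 = x^3 + 2x + 2 over F_17.
P_MOD, A = 17, 2
G = (5, 1)

def ec_add(p1, p2):
    """Affine point addition; None stands for the point at infinity O."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def binary_mul(k, point):
    """[k]point, scanning the bits of k from k_0 upward."""
    q, p = None, point          # Q = O, P = G
    while k:
        if k & 1:               # k_i = 1: Q <- Q + P
            q = ec_add(q, p)
        p = ec_add(p, p)        # P <- 2P
        k >>= 1                 # move on to k_{i+1}
    return q

if __name__ == "__main__":
    print([binary_mul(k, G) for k in range(1, 5)])
```

No table is needed, at the cost of one doubling per scalar bit, which suits the verification case where the point is not known in advance.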
Technical effects
Compared with the prior art, the invention analyzes and optimizes the SM2 elliptic curve public key algorithm in detail on a GPU platform and verifies the benefit of the optimizations experimentally. The resulting signature/signature verification throughput on the GPU platform is approximately 9.1 × 10^5 ops, a large improvement over the 3 × 10^3 ops throughput of currently common FPGA platforms, which means that the method processes more SM2 digital signature/signature verification requests per unit time.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a flow chart of the system module signature/signature verification of the present invention;
FIG. 3 is a schematic diagram of the lookup table structure.
Detailed Description
As shown in fig. 1, the present embodiment relates to a fast implementation system for implementing an elliptic curve digital signature and signature verification algorithm for SM2 using a GPU platform, which includes: the device comprises an elliptic curve multi-point operation optimization module, a module operation calculation complexity optimization module, an elliptic curve expression equation coordinate system mapping module and a compression function optimization module.
As shown in fig. 2, the system performs fast implementation of the SM2 elliptic curve digital signature and signature verification optimization algorithm by the following means:
Step 1) First, initialize the OpenCL platform: select an OpenCL platform and device through the OpenCL Application Programming Interface (API), create a device context, create a kernel, and initialize the memory space.
Step 2) Precompute the signature/signature verification task on the CPU side, including the precomputation of the public key and private key, the random number, and the SM3 compression function. For the signature task, the [k]-multiple point lookup table must additionally be calculated, specifically: 256 fixed-point multiples 0·P, 1·P, ..., 255·P are calculated, and the data is arranged into the data structure shown in FIG. 3.
Step 3) Transfer data between host memory and GPU video memory using the OpenCL interface function clEnqueueReadBuffer(), and synchronize operations between the CPU and the GPU using the clFlush() and clFinish() functions.
Step 4) Using the elliptic curve expression equation coordinate system mapping module of the system, select the Jacobian weighted projective coordinate system for the group point operations when mapping the coordinates of the elliptic curve, thereby avoiding the modular inversions that occur in large numbers in the affine coordinate system. The mathematical representation of the Jacobian weighted projective coordinate system is as follows. Over $F_p$, the elliptic curve equation in this coordinate system takes the form $y^2 = x^3 + a x z^4 + b z^6$, where $a, b \in F_p$ and $4a^3 + 27b^2 \neq 0 \pmod p$. The set of points on the elliptic curve is $E(F_p) = \{(x, y, z) \mid x, y, z \in F_p \text{ and } y^2 = x^3 + a x z^4 + b z^6\}$. For $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$, if there exists some $u \in F_p$ with $u \neq 0$ such that $x_1 = u^2 x_2$, $y_1 = u^3 y_2$, $z_1 = u z_2$, the two triples are called equivalent and represent the same point. An elliptic curve is usually represented in the affine coordinate system, under which the set of points on the curve is $E(F_p) = \{(X, Y) \mid X, Y \in F_p \text{ and } Y^2 = X^3 + aX + b\}$, where $4a^3 + 27b^2 \neq 0 \pmod p$. Setting $x = X \cdot z^2$, $y = Y \cdot z^3$ converts the elliptic curve from the affine representation to the Jacobian weighted projective representation. Conversely, when $z \neq 0$, setting $X = x/z^2$, $Y = y/z^3$ converts the Jacobian weighted projective representation back to the affine representation $Y^2 = X^3 + aX + b$. When $z = 0$, the triple $(1, 1, 0)$ corresponds to the point at infinity $O$ of the affine coordinate system.
Step 5) The signature task mainly faces the fixed-point multiplication problem, i.e. the multiple $[k]G$ of the base point $G$ in the elliptic curve group, which is computed with the comb method. The specific algorithm is as follows: let the scalar $k$ be $L$ bits long and divide it into byte strings, i.e. $k = k_7 \| k_6 \| k_5 \| k_4 \| k_3 \| k_2 \| k_1 \| k_0$, so that each segment has word length $L/8$. During the calculation, it is only necessary to look up, for each $k_i$, the value $[k_i]G$ prestored in the $[k]G$ lookup table, and to accumulate these values until the byte string $k$ has been traversed.
Step 6) The signature verification task mainly faces the unknown-point multiplication problem: because the base points $G$ of the elliptic curves on which different public keys lie are different, no precomputation can be performed for the parallel computation. The binary expansion method is therefore used, trading time for space: the scalar $k$ in the multiple-point problem $[k]G$ is converted into a binary string $k = (k_{t-1}, \dots, k_1, k_0)_2$. Let $P = G$ and $Q = O$ (the point at infinity), then traverse the binary string from $k_0$ to $k_{t-1}$. In each iteration the current $k_i$ is examined: if $k_i = 1$, then $Q \leftarrow Q + P$; if $k_i = 0$, $Q$ is left unchanged. Then $P \leftarrow 2P$, and the traversal moves on to the next value $k_{i+1}$. After the string has been traversed, $Q = [k]G$.
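To make concrete why Step 4 removes inversions from the point operations used in Steps 5 and 6, the sketch below doubles a point directly in Jacobian coordinates using only multiplications, additions, and subtractions. It uses the standard Jacobian doubling formulas for a general coefficient $a$ on an assumed toy curve ($y^2 = x^3 + 2x + 2$ over $F_{17}$), not the SM2 curve, and the function names are hypothetical.

```python
# Toy curve (assumption, NOT SM2): y^2 = x^3 + 2x + 2 over F_17.
P_MOD, A = 17, 2

def jacobian_double(pt):
    """Return 2*pt for pt = (x, y, z) in Jacobian coordinates, inversion-free."""
    x, y, z = pt
    if z == 0 or y == 0:
        return (1, 1, 0)                      # point at infinity
    s = 4 * x * y * y % P_MOD                 # S = 4*x*y^2
    m = (3 * x * x + A * pow(z, 4, P_MOD)) % P_MOD   # M = 3*x^2 + a*z^4
    x3 = (m * m - 2 * s) % P_MOD              # x' = M^2 - 2S
    y3 = (m * (s - x3) - 8 * pow(y, 4, P_MOD)) % P_MOD   # y' = M(S - x') - 8*y^4
    z3 = 2 * y * z % P_MOD                    # z' = 2*y*z
    return (x3, y3, z3)

def to_affine(pt):
    """Convert back to affine; the single inversion, done once at the end."""
    x, y, z = pt
    if z == 0:
        return None
    z_inv = pow(z, -1, P_MOD)                 # Python 3.8+
    return (x * z_inv**2 % P_MOD, y * z_inv**3 % P_MOD)

if __name__ == "__main__":
    g = (5, 1, 1)                             # affine (5, 1) with z = 1
    print(jacobian_double(g), to_affine(jacobian_double(g)))
```

An affine doubling needs one modular inversion per step, whereas a chain of Jacobian doublings defers the inversion to a single final conversion.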
Both known-point multiplication and unknown-point multiplication require a large number of large-integer modular operations, among which modular multiplication and modular division consume the most computing resources. The modular operation computational complexity optimization module of the system mainly optimizes these two operations, specifically:
1) For large-integer modular multiplication, the Montgomery reduction algorithm and Montgomery multiplication are used in place of direct modular multiplication, avoiding the modular reduction step, whose computational complexity is extremely high.
2) For large-integer modular division, Fermat's Little Theorem is combined with the Extended Euclidean Algorithm: when the number of parallel signatures/signature verifications is less than 2048, modular division is performed on the CPU side using Fermat's Little Theorem; when the number of parallel signatures/signature verifications is greater than or equal to 2048, modular division is performed on the GPU side using the Extended Euclidean Algorithm.
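The two inversion strategies in item 2) can be compared side by side. The sketch below is illustrative only: it assumes the SM2 recommended prime as the modulus, runs both variants on the host, and models the 2048 threshold as a simple dispatch on a hypothetical `batch_size` argument rather than as an actual CPU/GPU split.

```python
# Modulus: the SM2 recommended prime, written in its closed form.
P = 2**256 - 2**224 - 2**96 + 2**64 - 1

def inv_fermat(a):
    """a^(p-2) mod p: by Fermat's Little Theorem this is a^-1 for prime p."""
    return pow(a, P - 2, P)

def inv_ext_euclid(a):
    """Extended Euclidean Algorithm, keeping the invariant t_i * a = r_i (mod P)."""
    r0, r1 = P, a % P
    t0, t1 = 0, 1
    while r1:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if r0 != 1:
        raise ValueError("element is not invertible modulo P")
    return t0 % P

def mod_div(x, y, batch_size):
    """x / y mod P; the threshold mirrors the 2048 cut-over described above."""
    inverse = inv_fermat if batch_size < 2048 else inv_ext_euclid
    return x * inverse(y) % P

if __name__ == "__main__":
    y = 0xDEADBEEF
    print(inv_fermat(y) == inv_ext_euclid(y), mod_div(6, 3, 4096))
```

The Fermat variant is a fixed sequence of multiplications and squarings, while the Euclidean variant has a data-dependent number of iterations built from divisions and subtractions.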
In the OpenCL code written for the GPU, the compression function optimization module is used to optimize the SM3 hash function contained in the SM2 algorithm. The specific methods are:
1) Instruction optimization: the bitselect and rotate functions built into OpenCL are used to optimize the logic operations and cyclic shift operations of the SM3 algorithm used in SM2;
2) Register multiplexing: the compression function for each 16 steps of the SM3 algorithm depends only on the 16 register values of that round, so in the 64-step message expansion stage a 16-word register space is reused for the 64 words.
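Both ideas can be illustrated on the host. The sketch below emulates OpenCL's `rotate` and `bitselect` on 32-bit words, and compares SM3 message expansion using a full 68-word array against a version that keeps only a 16-word ring buffer. It is an illustrative sketch with hypothetical function names, not kernel code from the invention, and it covers only message expansion and the two Boolean functions, not the whole SM3 compression function.

```python
MASK = 0xFFFFFFFF

def rotate(x, n):
    """32-bit left rotation, emulating OpenCL rotate() on a uint."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK

def bitselect(a, b, c):
    """Per bit: take b where c has a 1, otherwise a (OpenCL bitselect semantics)."""
    return ((a & ~c) | (b & c)) & MASK

def gg(x, y, z):
    """SM3 GG for rounds 16..63, (x & y) | (~x & z), as a single bitselect."""
    return bitselect(z, y, x)

def ff(x, y, z):
    """SM3 FF for rounds 16..63, the majority function, as a single bitselect."""
    return bitselect(x, y, x ^ z)

def p1(x):
    """SM3 permutation P1 used in message expansion."""
    return x ^ rotate(x, 15) ^ rotate(x, 23)

def expand_full(block):
    """Reference: materialise all 68 words W, then derive W'_j = W_j ^ W_{j+4}."""
    w = list(block)
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotate(w[j - 3], 15))
                 ^ rotate(w[j - 13], 7) ^ w[j - 6])
    return [(w[j], w[j] ^ w[j + 4]) for j in range(64)]

def expand_ring(block):
    """Same (W_j, W'_j) stream, but only 16 words are ever live."""
    w = list(block)                       # 16 slots, reused modulo 16
    out = []
    for j in range(64):
        if j >= 12:                       # W_{j+4} is not among the first 16 words
            i = j + 4
            w[i % 16] = (p1(w[(i - 16) % 16] ^ w[(i - 9) % 16]
                            ^ rotate(w[(i - 3) % 16], 15))
                         ^ rotate(w[(i - 13) % 16], 7) ^ w[(i - 6) % 16])
        out.append((w[j % 16], w[j % 16] ^ w[(j + 4) % 16]))
    return out

if __name__ == "__main__":
    block = [(i * 0x9E3779B9 + 1) & MASK for i in range(16)]  # arbitrary test words
    print(expand_full(block) == expand_ring(block))
```

The ring buffer works because computing $W_{j+4}$ only reads words from the preceding 16 positions, and the slot it overwrites holds $W_{j-12}$, which is read for the last time in that same computation.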
Through the above steps, the fast signature/signature verification operation of the SM2 algorithm is completed. The result is read back to host memory using the OpenCL interface functions clEnqueueWriteBuffer()/clEnqueueReadBuffer(), after which it can be displayed.
In summary, in this implementation the elliptic curve multiple-point operation optimization module improves the efficiency of signature and signature verification; the modular operation computational complexity optimization module obtains optimal efficiency for modular multiplication and modular division; the elliptic curve expression equation coordinate system mapping module reduces the overall computational complexity; and the compression function optimization module optimizes the performance of the message hash algorithm.
This embodiment is implemented on an AMD R9 290 series graphics card. The platform configuration is shown in the following table:
Parameter                     | GPU
GPU model                     | AMD Radeon R9 290
Architecture                  | GCN 1.1
Video memory capacity         | 4 GB
Default core clock            | 947 MHz
Number of compute units (CU)  | 40
Number of stream processors   | 2560
Video memory bandwidth        | 320 GB/s
In this environment, compared with a CPU-platform implementation on an Intel Xeon E5620 series processor, the SM2 signature speed-up ratio reaches 74.2 and the SM2 signature verification speed-up ratio reaches 9.4 with 2048 concurrent threads; with 4096 concurrent threads, the SM2 signature speed-up ratio reaches 144.3 and the SM2 signature verification speed-up ratio reaches 18.2; with 8192 concurrent threads, the SM2 signature speed-up ratio reaches 288.1 and the SM2 signature verification speed-up ratio reaches 37.2. This demonstrates the effectiveness of the optimized GPU-based SM2 signature/signature verification algorithm.
In this environment, compared with an FPGA-platform implementation on a Xilinx Virtex-6 series chip, with 8192 concurrent threads the throughput speed-up ratio of the SM2 signature algorithm reaches 45.3 and that of signature verification reaches 5.2, both superior to the FPGA-platform implementation.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (12)

1. A rapid realization method of SM2 digital signature and signature verification based on a GPU is characterized in that preprocessing is carried out on information to be signed or information to be verified at a CPU end to obtain a preprocessing result containing a public key private key, a random number and a compression function SM3 precomputation, GPU initialization and a lookup table, then after the preprocessing result is mapped by the GPU end to a Jacobian weighted projection coordinate system, further carrying out signature processing or signature verification processing of modular operation optimization and compression function optimization.
2. The method for rapidly implementing SM2 digital signature and signature verification based on GPU of claim 1, wherein the preprocessing comprises: on the CPU side, performing GPU initialization and the precomputation of the public key, the private key, the random number, and the SM3 compression function for the SM2 digital signature task or the signature verification task, so as to prepare for the subsequent GPU-side operations; and, for the signature task, precomputing a [k]-multiple point lookup table to improve the efficiency of the subsequent GPU signature processing.
3. The method for rapidly implementing SM2 digital signature and signature verification based on GPU of claim 1, wherein the signature process is performed by matching look-up table pre-generated by CPU end with GPU end to perform comb signature operation to obtain signature result, and the signature verification process is performed by binary expansion operation at GPU end to obtain signature verification result.
4. The method for rapidly implementing SM2 digital signature and signature verification based on GPU of claim 1, wherein the modular operation optimization processing is as follows: among the large number of large-integer modular operations performed by a signature/signature verification task, the modular multiplication and modular division operations, which consume a large amount of computing resources, are optimized to reduce the computational complexity, wherein the modular multiplication optimization adopts the Montgomery reduction algorithm and Montgomery multiplication as the substitute algorithm for modular multiplication; and the modular division optimization adopts a combination of Fermat's Little Theorem and the Extended Euclidean Algorithm as the substitute algorithm for modular division.
5. The method for rapidly implementing SM2 digital signature and signature verification based on GPU of claim 1, wherein the signature processing or signature verification processing optimized by the compression function is that: optimizing a hash algorithm SM3 in an SM2 algorithm by adopting a GPU structure optimization technology, and specifically comprising the following steps: the method comprises the steps of instruction optimization and register multiplexing, wherein the instruction optimization optimizes the logic operation and the circular shift operation of SM3 hash algorithm operation adopted in an SM2 algorithm by using a bitselect function and a rotate function which are built in OpenCL; register multiplexing is carried out in a 64-step message expansion stage by SM3 hash algorithm operation, and 64 words are multiplexed by using a register space with 16 words.
6. A system for implementing the GPU-based SM2 digital signature and signature verification quick implementation method of claim 1, comprising: an elliptic curve expression equation coordinate system mapping module for reducing the overall computational complexity, a modular operation computational complexity optimization module for obtaining optimal efficiency in modular multiplication and modular division, a compression function optimization module for optimizing the performance of the message hash algorithm, and an elliptic curve multiple-point operation optimization module for improving the efficiency of signature and signature verification, wherein: the coordinate system mapping module receives the data preprocessed on the CPU side, performs coordinate mapping on the elliptic curve, and outputs the mapped elliptic curve for the subsequent signature/signature verification computation; the modular operation computational complexity optimization module optimizes the large number of large-integer modular operations contained in the elliptic curve multiple-point operations and outputs their results at high speed to the elliptic curve multiple-point operation optimization module; the compression function optimization module optimizes the SM3 compression function operations in the SM2 signature/signature verification task and outputs the SM3 results to the elliptic curve multiple-point operation optimization module; the elliptic curve multiple-point operation optimization module receives the mapped elliptic curve information, optimizes the multiple-point operations in the SM2 signature/signature verification task, and combines the large-integer modular operation results with the SM3 hash results to output the final SM2 signature/signature verification result.
7. The system as claimed in claim 5, wherein the elliptic curve expression equation coordinate system mapping module, when mapping the coordinates of the elliptic curve, performs the multiple-point operations of the group in a Jacobian weighted projective coordinate system, so as to avoid the large number of modular inversions in the affine coordinate system, wherein the Jacobian weighted projective coordinate system is defined as follows: over $F_p$, the elliptic curve equation in this coordinate system takes the form $y^2 = x^3 + a x z^4 + b z^6$, where $a, b \in F_p$ and $4a^3 + 27b^2 \neq 0 \pmod p$; the set of points on the elliptic curve is $E(F_p) = \{(x, y, z) \mid x, y, z \in F_p \text{ and } y^2 = x^3 + a x z^4 + b z^6\}$; for $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$, if there exists some $u \in F_p$ with $u \neq 0$ such that $x_1 = u^2 x_2$, $y_1 = u^3 y_2$, $z_1 = u z_2$, the two triples are called equivalent and represent the same point; an elliptic curve is usually represented in the affine coordinate system, under which the set of points on the curve is $E(F_p) = \{(X, Y) \mid X, Y \in F_p \text{ and } Y^2 = X^3 + aX + b\}$, where $4a^3 + 27b^2 \neq 0 \pmod p$; setting $x = X \cdot z^2$, $y = Y \cdot z^3$ converts the elliptic curve from the affine representation to the Jacobian weighted projective representation; conversely, when $z \neq 0$, setting $X = x/z^2$, $Y = y/z^3$ converts the Jacobian weighted projective representation back to the affine representation $Y^2 = X^3 + aX + b$; and when $z = 0$, the triple $(1, 1, 0)$ corresponds to the point at infinity $O$ of the affine coordinate system.
8. The system of claim 5, wherein the modular operation computational complexity optimization module comprises an optimized modular multiplication unit and an optimized modular division unit, wherein: the optimized modular multiplication unit uses the Montgomery reduction algorithm and Montgomery multiplication to optimize modular multiplication and obtain high-speed large-integer modular multiplication results; the optimized modular division unit combines Fermat's little theorem with the extended Euclidean algorithm: when the number of parallel signatures/signature verifications is less than 2048, modular division is performed at the CPU end using Fermat's little theorem; and when the number of parallel signatures/signature verifications is greater than or equal to 2048, modular division is performed at the GPU end using the extended Euclidean algorithm, to obtain high-speed large-integer modular division results.
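The modular operations named in this claim can be sketched in Python as follows. The toy modulus 17, the 8-bit Montgomery radix, and the helper names (`redc`, `mont_mul`, `pick_inverse`) are assumptions for illustration; the sketch only models the choice of algorithm at the 2048 threshold, not the CPU/GPU placement.

```python
P_MOD = 17                 # toy odd prime modulus (assumption); SM2 uses a 256-bit prime
R_BITS = 8                 # Montgomery radix R = 2^8 > P_MOD
R = 1 << R_BITS
P_NEG_INV = (-pow(P_MOD, -1, R)) % R   # -p^{-1} mod R

def redc(t):
    """Montgomery reduction: return t * R^{-1} mod p for 0 <= t < p * R."""
    m = (t * P_NEG_INV) & (R - 1)
    u = (t + m * P_MOD) >> R_BITS      # exact division by R
    return u - P_MOD if u >= P_MOD else u

def mont_mul(a, b):
    """Compute a * b mod p through the Montgomery domain."""
    a_m, b_m = a * R % P_MOD, b * R % P_MOD   # enter the Montgomery domain
    return redc(redc(a_m * b_m))              # multiply in domain, then leave it

def inv_fermat(a):
    """Inverse via Fermat's little theorem: a^(p-2) mod p."""
    return pow(a, P_MOD - 2, P_MOD)

def inv_ext_euclid(a):
    """Inverse via the extended Euclidean algorithm (a must be nonzero mod p)."""
    r0, r1, s0, s1 = P_MOD, a % P_MOD, 0, 1
    while r1:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
    return s0 % P_MOD                  # invariant: s0 * a == r0 == 1 (mod p)

BATCH_THRESHOLD = 2048                 # threshold stated in the claim

def pick_inverse(batch_size):
    """Hypothetical dispatcher: Fermat below the threshold, extended Euclid otherwise."""
    return inv_fermat if batch_size < BATCH_THRESHOLD else inv_ext_euclid

def mod_div(a, b, batch_size):
    """Modular division a / b mod p using the inverse selected for this batch size."""
    return mont_mul(a, pick_inverse(batch_size)(b))
```

Both inversion routines return the same value; the claim distinguishes them only by where they run and at what batch size.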
9. The system of claim 5, wherein the compression function optimization module optimizes the hash algorithm SM3 used in the SM2 algorithm by means of GPU configuration optimization techniques, and comprises an instruction optimization unit and a register multiplexing unit, wherein: the instruction optimization unit optimizes the logic operations and cyclic shift operations of the SM3 compression function using the bitselect function and the rotate function built into OpenCL; and the register multiplexing unit reuses a 16-word register space in place of the 64 words needed by the 64-step message expansion stage of the SM3 compression function, reducing the space overhead of the SM3 computation.
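The two optimizations in this claim can be mimicked in Python as a rough sketch. Here `bitselect` and `rotl` are plain-Python stand-ins for the OpenCL built-ins `bitselect()` and `rotate()`, and `expand_ring` is a hypothetical name for a message expansion that keeps only 16 words live; the layout of the actual GPU kernel is not given in the source.

```python
MASK = 0xFFFFFFFF

def rotl(x, n):
    """32-bit left rotation, the role played by OpenCL's rotate()."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK

def bitselect(a, b, c):
    """Per bit: take b where c is 1, else a (mirrors OpenCL's bitselect())."""
    return ((a & ~c) | (b & c)) & MASK

def ff(x, y, z):
    """SM3 FF for rounds 16..63 (majority) as a single bitselect."""
    return bitselect(x, y, x ^ z)

def gg(x, y, z):
    """SM3 GG for rounds 16..63 (choose) as a single bitselect."""
    return bitselect(z, y, x)

def p1(x):
    """SM3 permutation P1 used in message expansion."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)

def expand_full(block):
    """Reference expansion: materialize W[0..67], return (W_j, W'_j) for 64 rounds."""
    w = list(block)
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                 ^ rotl(w[j - 13], 7) ^ w[j - 6])
    return [(w[j], w[j] ^ w[j + 4]) for j in range(64)]

def expand_ring(block):
    """Same (W_j, W'_j) stream using a 16-word buffer that is overwritten in place."""
    w = list(block)                       # slot j % 16 holds W_j at round j
    out = []
    for j in range(64):
        wj = w[j % 16]
        out.append((wj, wj ^ w[(j + 4) % 16]))
        if j < 52:                        # W_67 is the last word ever needed
            w[j % 16] = (p1(wj ^ w[(j + 7) % 16] ^ rotl(w[(j + 13) % 16], 15))
                         ^ rotl(w[(j + 3) % 16], 7) ^ w[(j + 10) % 16])
    return out
```

The ring version produces each expanded word just before the round that consumes it, so the buffer never grows beyond the 16 input words.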
10. The system of claim 5, wherein the elliptic curve multiple point operation optimization module comprises an elliptic curve fixed point multiplication unit for digital signatures and an elliptic curve unknown point multiplication unit for signature verification, wherein: the elliptic curve fixed point multiplication unit uses the comb method to compute the [k]-fold multiple of the base point G in the elliptic curve group operation, giving the fixed point multiplication result in the signature process; and the elliptic curve unknown point multiplication unit uses the binary expansion method to compute the [k]-fold multiple of the base point G on the elliptic curve for different public keys, giving the unknown point multiplication result in the signature verification process.
11. The system of claim 5, wherein the comb method is specifically: first, 256 fixed point multiples $0 \cdot P, 1 \cdot P, \ldots, 255 \cdot P$ are precomputed; the length of the scalar $k$ is set to $L$ bits, and $k$ is divided into byte strings, namely $k = k_7 \| k_6 \| k_5 \| k_4 \| k_3 \| k_2 \| k_1 \| k_0$, each segment having a word length of $L/8$; during computation, only the pre-stored value $[k_i]G$ corresponding to $k_i$ needs to be looked up in the $[k]G$ lookup table and then accumulated, until the byte string $k$ has been traversed.
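One way to read this claim is as the textbook fixed-base comb method with 8 rows, where the 256 table entries are all combinations of the eight points $[2^{i \cdot L/8}]G$. The Python sketch below follows that reading, which is an interpretation rather than the confirmed patented procedure. The toy curve over $F_{17}$, the 16-bit scalar length, and the function names are assumptions.

```python
P_MOD, A = 17, 2           # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumption)
G = (5, 1)                 # a point on the toy curve, used as the fixed base
ROWS, L_BITS = 8, 16       # 8 segments -> 2^8 = 256 table entries
COLS = L_BITS // ROWS      # segment length L/8

def add(p, q):
    """Affine point addition; None is the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def build_table():
    """table[u] = sum of [2^(i*COLS)]G over the set bits i of u (256 entries)."""
    bases, b = [], G
    for _ in range(ROWS):
        bases.append(b)
        for _ in range(COLS):
            b = add(b, b)                  # advance b to [2^COLS] b
    table = [None] * (1 << ROWS)
    for u in range(1, 1 << ROWS):
        low = u & -u                       # lowest set bit of u
        table[u] = add(table[u ^ low], bases[low.bit_length() - 1])
    return table

def comb_mul(k, table):
    """Compute [k]G for 0 <= k < 2^L_BITS with one doubling and one lookup per column."""
    q = None
    for col in range(COLS - 1, -1, -1):
        q = add(q, q)
        u = 0
        for row in range(ROWS):            # gather bit `col` of every segment
            u |= ((k >> (row * COLS + col)) & 1) << row
        q = add(q, table[u])
    return q
```

The table depends only on the base point, so it can be built once and shared by every signature in a batch.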
12. The system of claim 5, wherein the binary expansion method is specifically: $k$ in the multiple point operation $[k]G$ of the base point $G$ is converted into a binary string $k = (k_{t-1}, \ldots, k_1, k_0)_2$; let $P = G$ and $Q = \infty$; the binary string is then traversed from $k_0$ to $k_{t-1}$, and the current value $k_i$ is examined at each step: if $k_i = 1$, then $Q \leftarrow Q + P$; if $k_i = 0$, then $Q \leftarrow Q$; then $P \leftarrow 2P$ and the traversal moves to the next value $k_{i+1}$; after the string has been traversed, $Q = [k]G$.
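The traversal in this claim is the right-to-left double-and-add loop. A minimal Python sketch follows, again on an assumed toy curve over $F_{17}$ with hypothetical function names and affine arithmetic for brevity.

```python
P_MOD, A = 17, 2           # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumption)
G = (5, 1)                 # a point on the toy curve

def add(p, q):
    """Affine point addition; None is the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def binary_mul(k, g):
    """Right-to-left binary expansion: scan k from k_0 upward."""
    p, q = g, None          # P = G, Q = point at infinity
    while k:
        if k & 1:           # k_i = 1: Q <- Q + P
            q = add(q, p)
        p = add(p, p)       # P <- 2P before moving to k_{i+1}
        k >>= 1
    return q

print(binary_mul(2, G), binary_mul(3, G))   # (6, 3) (10, 6)
```

Unlike the comb method, this loop needs no precomputed table, which suits points that differ from one verification to the next.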
CN202110613751.0A 2021-06-02 2021-06-02 SM2 digital signature and signature verification quick implementation method and system based on GPU Active CN113221193B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110613751.0A CN113221193B (en) 2021-06-02 2021-06-02 SM2 digital signature and signature verification quick implementation method and system based on GPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110613751.0A CN113221193B (en) 2021-06-02 2021-06-02 SM2 digital signature and signature verification quick implementation method and system based on GPU

Publications (2)

Publication Number Publication Date
CN113221193A (en) 2021-08-06
CN113221193B (en) 2022-07-29

Family

ID=77082340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110613751.0A Active CN113221193B (en) 2021-06-02 2021-06-02 SM2 digital signature and signature verification quick implementation method and system based on GPU

Country Status (1)

Country Link
CN (1) CN113221193B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113676335A (en) * 2021-10-21 2021-11-19 飞天诚信科技股份有限公司 Method and device for realizing signature in security chip
CN113783702A (en) * 2021-09-28 2021-12-10 南京宁麒智能计算芯片研究院有限公司 Hardware implementation method and system for elliptic curve digital signature and signature verification
CN114205085A (en) * 2021-12-03 2022-03-18 东北大学 Optimization processing method of SM2 and transformation method of Hyperledger Fabric platform

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103475469A (en) * 2013-09-10 2013-12-25 中国科学院数据与通信保护研究教育中心 Method and device for achieving SM2 algorithm with combination of CPU and GPU
CN103532710A (en) * 2013-09-26 2014-01-22 中国科学院数据与通信保护研究教育中心 Implementation method and device for GPU-based SM2 algorithm
CN107147488A (en) * 2017-03-24 2017-09-08 广东工业大学 Signature and signature verification system and method based on SM2 encryption and decryption algorithm
US20180121388A1 (en) * 2016-11-01 2018-05-03 Nvidia Corporation Symmetric block sparse matrix-vector multiplication
CN108063758A (en) * 2017-11-27 2018-05-22 众安信息技术服务有限公司 Signature verification method for blockchain network and node in blockchain network
CN109600233A (en) * 2019-01-15 2019-04-09 西安电子科技大学 Group signature identifier issuing method based on SM2 digital signature algorithm
CN110086602A (en) * 2019-04-16 2019-08-02 上海交通大学 The Fast implementation of SM3 cryptographic Hash algorithms based on GPU
WO2019174402A1 (en) * 2018-03-14 2019-09-19 西安西电捷通无线网络通信股份有限公司 Group membership issuing method and device for digital group signature
CN110365481A (en) * 2019-07-04 2019-10-22 上海交通大学 System and method for optimized accelerated implementation of the national cryptographic SM2 algorithm
CN111275605A (en) * 2018-12-04 2020-06-12 畅想科技有限公司 Buffer checker
CN112187469A (en) * 2020-09-21 2021-01-05 浙江省数字安全证书管理有限公司 SM2 multi-party collaborative digital signature method and system based on key factor
CN112887081A (en) * 2020-09-04 2021-06-01 深圳奥联信息安全技术有限公司 SM 2-based signature verification method, device and system

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103475469A (en) * 2013-09-10 2013-12-25 中国科学院数据与通信保护研究教育中心 Method and device for achieving SM2 algorithm with combination of CPU and GPU
CN103532710A (en) * 2013-09-26 2014-01-22 中国科学院数据与通信保护研究教育中心 Implementation method and device for GPU-based SM2 algorithm
US20180121388A1 (en) * 2016-11-01 2018-05-03 Nvidia Corporation Symmetric block sparse matrix-vector multiplication
CN107147488A (en) * 2017-03-24 2017-09-08 广东工业大学 Signature and signature verification system and method based on SM2 encryption and decryption algorithm
CN108063758A (en) * 2017-11-27 2018-05-22 众安信息技术服务有限公司 Signature verification method for blockchain network and node in blockchain network
WO2019174402A1 (en) * 2018-03-14 2019-09-19 西安西电捷通无线网络通信股份有限公司 Group membership issuing method and device for digital group signature
CN111275605A (en) * 2018-12-04 2020-06-12 畅想科技有限公司 Buffer checker
CN109600233A (en) * 2019-01-15 2019-04-09 西安电子科技大学 Group signature identifier issuing method based on SM2 digital signature algorithm
CN110086602A (en) * 2019-04-16 2019-08-02 上海交通大学 The Fast implementation of SM3 cryptographic Hash algorithms based on GPU
CN110365481A (en) * 2019-07-04 2019-10-22 上海交通大学 System and method for optimized accelerated implementation of the national cryptographic SM2 algorithm
CN112887081A (en) * 2020-09-04 2021-06-01 深圳奥联信息安全技术有限公司 SM 2-based signature verification method, device and system
CN112187469A (en) * 2020-09-21 2021-01-05 浙江省数字安全证书管理有限公司 SM2 multi-party collaborative digital signature method and system based on key factor

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113783702A (en) * 2021-09-28 2021-12-10 南京宁麒智能计算芯片研究院有限公司 Hardware implementation method and system for elliptic curve digital signature and signature verification
CN113676335A (en) * 2021-10-21 2021-11-19 飞天诚信科技股份有限公司 Method and device for realizing signature in security chip
CN113676335B (en) * 2021-10-21 2021-12-28 飞天诚信科技股份有限公司 Method and device for realizing signature in security chip
CN114205085A (en) * 2021-12-03 2022-03-18 东北大学 Optimization processing method of SM2 and transformation method of Hyperledger Fabric platform

Also Published As

Publication number Publication date
CN113221193B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN113221193B (en) SM2 digital signature and signature verification quick implementation method and system based on GPU
US7904498B2 (en) Modular multiplication processing apparatus
CN113628094B (en) High-throughput SM2 digital signature computing system and method based on GPU
CN109145616B (en) SM2 encryption, signature and key exchange implementation method and system based on efficient modular multiplication
Dai et al. NTRU modular lattice signature scheme on CUDA GPUs
Chen et al. Faster multiplication for long binary polynomials
CN112134704B (en) Sm2 performance optimization implementing method
JP4423900B2 (en) Scalar multiplication calculation method, apparatus and program for elliptic curve cryptography
US20230246806A1 (en) Efficient masking of secure data in ladder-type cryptographic computations
Lin et al. Efficient parallel RSA decryption algorithm for many-core GPUs with CUDA
CN110224829B (en) Matrix-based post-quantum encryption method and device
CN111917548B (en) Elliptic curve digital signature method based on GPU and CPU heterogeneous structure
Kamal et al. Enhanced implementation of the NTRUEncrypt algorithm using graphics cards
JP4692022B2 (en) Scalar multiplication apparatus and program for elliptic curve cryptography
JP2007526513A (en) Method of element power or scalar multiplication
US11954487B2 (en) Techniques, devices, and instruction set architecture for efficient modular division and inversion
JP2011081594A (en) Data processor and data processing program
JP3796867B2 (en) Prime number determination method and apparatus
CN115276960B (en) Device and method for realizing fast modular inverse chip on SM2 Montgomery domain
CN113971015B (en) UIA2 computing circuit, data processing method, chip, electronic device and storage medium
JP2005316038A (en) Scalar multiple computing method, device, and program in elliptic curve cryptosystem
Wu et al. Modular arithmetic analyses for RSA cryptosystem
JP6614979B2 (en) Encryption apparatus, encryption method, and encryption program
JP2006309201A (en) Multiplex scalar multiplying operation device in elliptic curve cryptosystem, signature verification device, and programs
KR101775597B1 (en) High speed modulo calculation apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant