CN113221193A - SM2 digital signature and signature verification quick implementation method and system based on GPU - Google Patents
- Publication number
- CN113221193A (application number CN202110613751.0A)
- Authority
- CN
- China
- Prior art keywords
- signature
- optimization
- elliptic curve
- gpu
- modular
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
Abstract
A GPU-based method and system for fast SM2 digital signature and signature verification. Information to be signed or verified is preprocessed at the CPU end to obtain a preprocessing result comprising the public and private keys, a random number, the precomputation of the SM3 compression function, GPU initialization, and a lookup table. The GPU end then maps the preprocessing result to a Jacobian weighted projective coordinate system and performs signature processing or signature verification processing with modular operation optimization and compression function optimization. The invention is simple to implement and stable in performance, its operational throughput can reach 9.1 × 10^5 ops, and the computational efficiency of the SM2 signature and signature verification algorithms is greatly improved.
Description
Technical Field
The invention relates to a technology in the field of information security, in particular to a method and a system for quickly realizing SM2 digital signature and signature verification based on a GPU.
Background
The key to implementing an Elliptic Curve Cryptography (ECC) algorithm on general-purpose computing hardware is implementing large-integer modular multiplication and modular division in a finite field, together with point addition and point multiplication in the elliptic curve group. One common practice is to use the SSE instruction set provided by Intel; however, performance is limited by the CPU hardware architecture and is not satisfactory. In recent years, as the general-purpose computing performance of GPUs has improved continuously, optimization and fast implementation techniques for asymmetric cryptographic algorithms have kept developing. Fangyu Zheng et al. proposed a method that makes full use of the floating-point computing capability of the GPU and discussed the implementation and algorithmic optimization of finite-field large-integer modular multiplication, modular division, modular exponentiation, and modular inversion on a GPU platform. Wuqiong Pan, Fangyu Zheng et al. built a high-speed GPU-based ECC signature verification server and implemented and optimized point addition and point multiplication in the elliptic curve group. However, most of this prior art targets high-speed GPU implementations of only two public key algorithms, RSA and ECDSA; no high-speed implementation method or system for the SM2 elliptic curve public key algorithm on a GPU has been found so far.
Disclosure of Invention
To address the above defects in the prior art, the invention provides a GPU-based method and system for fast SM2 digital signature and signature verification. It is simple to implement and stable in performance, its operational throughput can reach 9.1 × 10^5 ops, and it greatly improves the computational efficiency of the SM2 signature and signature verification algorithms.
The invention is realized by the following technical scheme:
The invention relates to a GPU-based method for fast SM2 digital signature and signature verification. Information to be signed or verified is preprocessed at the CPU end to obtain a preprocessing result comprising the public and private keys, a random number, the precomputation of the SM3 compression function, GPU initialization, and a lookup table; the GPU end then maps the preprocessing result to a Jacobian weighted projective coordinate system and performs signature processing or signature verification processing with modular operation optimization and compression function optimization.
The preprocessing is as follows: at the CPU end, GPU initialization and precomputation of the public key, the private key, the random number, and the SM3 compression function are performed for the first task type (SM2 digital signature) or the second task type (signature verification), to facilitate subsequent operations at the GPU end; for the signature task, a [k]-multiple point lookup table is also precomputed to improve the efficiency of subsequent GPU signature processing.
Signature processing runs the comb method at the GPU end, using the lookup table pre-generated at the CPU end, to obtain the signature result; signature verification processing runs the binary expansion method at the GPU end to obtain the verification result.
The modular operation optimization processing is as follows: among the many large-integer modular operations performed by a signature/signature verification task, the modular multiplication and modular division operations, which consume a large amount of computing resources, are optimized to reduce computational complexity. Modular multiplication optimization uses the Montgomery reduction algorithm and the Montgomery multiplication algorithm as the replacement algorithm for modular multiplication; modular division optimization uses a combination of Fermat's Little Theorem and the Extended Euclidean Algorithm as the replacement algorithm for modular division.
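The Montgomery reduction and multiplication named above can be illustrated with a minimal Python sketch. It is not the patented GPU kernel: Python integers stand in for multi-limb GPU arithmetic, the function names and the choice R = 2^256 are assumptions made for illustration, and the SM2 field prime is used only as a convenient odd modulus.

```python
# Minimal Montgomery multiplication sketch (illustrative names, not from the patent).
# Python big integers stand in for the multi-limb arithmetic a GPU kernel would use.

SM2_P = 2**256 - 2**224 - 2**96 + 2**64 - 1  # SM2 field prime, used here as the modulus


def montgomery_setup(p, bits):
    """Precompute constants for an odd modulus p with R = 2**bits > p."""
    R = 1 << bits
    p_neg_inv = (-pow(p, -1, R)) % R  # -p^{-1} mod R
    R2 = (R * R) % p                  # R^2 mod p, used to enter Montgomery form
    return R, p_neg_inv, R2


def redc(t, p, bits, p_neg_inv):
    """Montgomery reduction: return t * R^{-1} mod p for 0 <= t < p * R."""
    mask = (1 << bits) - 1
    m = ((t & mask) * p_neg_inv) & mask  # m = t * (-p^{-1}) mod R
    u = (t + m * p) >> bits              # t + m*p is divisible by R, so the shift is exact
    return u - p if u >= p else u


def mont_mul(a_bar, b_bar, p, bits, p_neg_inv):
    """Multiply two values in Montgomery form; the result stays in Montgomery form."""
    return redc(a_bar * b_bar, p, bits, p_neg_inv)


def mod_mul(a, b, p=SM2_P, bits=256):
    """Compute a * b mod p by a round trip through Montgomery form."""
    R, p_neg_inv, R2 = montgomery_setup(p, bits)
    a_bar = mont_mul(a, R2, p, bits, p_neg_inv)  # a * R mod p
    b_bar = mont_mul(b, R2, p, bits, p_neg_inv)  # b * R mod p
    c_bar = mont_mul(a_bar, b_bar, p, bits, p_neg_inv)
    return redc(c_bar, p, bits, p_neg_inv)       # leave Montgomery form
```

In a long chain of multiplications, such as a point multiplication, operands would normally stay in Montgomery form throughout, so the conversion cost in `mod_mul` is paid only once at each end.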
Signature processing or signature verification processing with compression function optimization means that the SM3 hash algorithm used in the SM2 algorithm is optimized with GPU architecture optimization techniques, specifically instruction optimization and register multiplexing. Instruction optimization uses the bitselect and rotate functions built into OpenCL to optimize the logic operations and cyclic shift operations of the SM3 hash computation; register multiplexing reuses a 16-word register space for the 64 words of the 64-step message expansion stage of the SM3 hash computation.
The invention also relates to a system for implementing the method, comprising: an elliptic curve expression equation coordinate system mapping module for reducing the overall computational complexity, a modular operation computational complexity optimization module for obtaining the best efficiency for modular multiplication and modular division, a compression function optimization module for optimizing the performance of the message hash algorithm, and an elliptic curve multiple point operation optimization module for improving the efficiency of signature and signature verification. The coordinate system mapping module receives the data preprocessed at the CPU end, performs coordinate mapping on the elliptic curve, and outputs the mapped elliptic curve to the subsequent signature/signature verification computation. The modular operation computational complexity optimization module optimizes the many large-integer modular operations contained in the elliptic curve multiple point operation and outputs their results at high speed to the elliptic curve multiple point operation optimization module. The compression function optimization module optimizes the SM3 compression function operation in the SM2 signature/signature verification task and outputs the SM3 results to the elliptic curve multiple point operation optimization module. The elliptic curve multiple point operation optimization module receives the mapped elliptic curve information, optimizes the multiple point operations in the SM2 signature/signature verification task, and combines the large-integer modular operation results and the SM3 hash results to output the final SM2 signature/signature verification result.
When mapping the coordinates of the elliptic curve, the elliptic curve expression equation coordinate system mapping module performs point multiplication in the elliptic curve group in a Jacobian weighted projective coordinate system, which avoids the large number of modular inversions required under the affine coordinate system.
The Jacobian weighted projective coordinate system is defined as follows. Over $F_p$, the elliptic curve equation in this coordinate system is written as $y^2 = x^3 + axz^4 + bz^6$, where $a, b \in F_p$ and $4a^3 + 27b^2 \neq 0 \pmod p$. The set of points on the elliptic curve is $E(F_p) = \{(x, y, z) \mid x, y, z \in F_p \text{ and } y^2 = x^3 + axz^4 + bz^6\}$. For $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$, if there exists some $u \in F_p$ with $u \neq 0$ such that $x_1 = u^2 x_2$, $y_1 = u^3 y_2$, $z_1 = u z_2$, the two triples are called equivalent and represent the same point. An elliptic curve is usually represented in the affine coordinate system, under which the set of points on the curve is $E(F_p) = \{(X, Y) \mid X, Y \in F_p \text{ and } Y^2 = X^3 + aX + b\}$, where $4a^3 + 27b^2 \neq 0 \pmod p$. Setting $x = X \cdot z^2$, $y = Y \cdot z^3$ converts the elliptic curve from the affine representation to the Jacobian weighted projective representation. Conversely, when $z \neq 0$, setting $X = x/z^2$, $Y = y/z^3$ converts the Jacobian weighted projective representation back to the affine representation $Y^2 = X^3 + aX + b$. When $z = 0$, the point $(1, 1, 0)$ corresponds to the point at infinity $O$ of the affine coordinate system.
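The coordinate mapping above can be checked numerically with a short Python sketch. It assumes a toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ rather than the SM2 curve, and the doubling formula is the standard textbook Jacobian doubling, not a formula quoted from the patent; all names are illustrative. The point it shows is that doubling needs no modular inversion, and a single inversion is spent only when converting back to affine coordinates.

```python
# Toy curve y^2 = x^3 + 2x + 2 over F_17 (an assumption for illustration, not SM2).
P_MOD, A, B = 17, 2, 2


def to_jacobian(X, Y, z=1):
    """Affine (X, Y) -> Jacobian (x, y, z) with x = X*z^2 and y = Y*z^3."""
    return (X * z * z % P_MOD, Y * z * z * z % P_MOD, z % P_MOD)


def to_affine(pt):
    """Jacobian (x, y, z) -> affine (x/z^2, y/z^3); None is the point at infinity."""
    x, y, z = pt
    if z == 0:
        return None
    z_inv = pow(z, -1, P_MOD)  # the only modular inversion in this sketch
    return (x * z_inv ** 2 % P_MOD, y * z_inv ** 3 % P_MOD)


def on_curve(pt):
    """Check y^2 = x^3 + a*x*z^4 + b*z^6 over F_p."""
    x, y, z = pt
    return (y * y - (x ** 3 + A * x * z ** 4 + B * z ** 6)) % P_MOD == 0


def equivalent(p1, p2):
    """True if p1 = (u^2*x2, u^3*y2, u*z2) for some nonzero u in F_p."""
    x2, y2, z2 = p2
    return any(
        p1 == (u * u * x2 % P_MOD, u ** 3 * y2 % P_MOD, u * z2 % P_MOD)
        for u in range(1, P_MOD)
    )


def jacobian_double(pt):
    """Standard Jacobian point doubling; uses no modular inversion."""
    x, y, z = pt
    if z == 0 or y == 0:
        return (1, 1, 0)  # point at infinity
    s = 4 * x * y * y % P_MOD
    m = (3 * x * x + A * pow(z, 4, P_MOD)) % P_MOD
    x3 = (m * m - 2 * s) % P_MOD
    y3 = (m * (s - x3) - 8 * pow(y, 4, P_MOD)) % P_MOD
    z3 = 2 * y * z % P_MOD
    return (x3, y3, z3)
```

For example, `to_jacobian(5, 1, 3)` and `(5, 1, 1)` are equivalent triples for the same affine point (5, 1), and doubling either of them yields a triple that maps back to the affine point (6, 3).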
The modular operation computational complexity optimization module comprises an optimized modular multiplication unit and an optimized modular division unit. The optimized modular multiplication unit uses the Montgomery reduction algorithm and the Montgomery multiplication algorithm to optimize modular multiplication and obtain high-speed large-integer modular multiplication results. The optimized modular division unit combines Fermat's Little Theorem with the Extended Euclidean Algorithm: when the number of parallel signatures/signature verifications is less than 2048, modular division is performed at the CPU end using Fermat's Little Theorem; when the number is greater than or equal to 2048, modular division is performed at the GPU end using the Extended Euclidean Algorithm, yielding high-speed large-integer modular division results.
The compression function optimization module uses GPU architecture optimization techniques to optimize the implementation of the SM3 hash algorithm used in the SM2 algorithm, and comprises an instruction optimization unit and a register multiplexing unit. The instruction optimization unit uses the bitselect and rotate functions built into OpenCL to optimize the logic operations and cyclic shift operations of the SM3 compression function; the register multiplexing unit reuses a 16-word register space for the 64 words of the 64-step message expansion stage of the SM3 compression function, reducing the space overhead of the SM3 computation.
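As a hedged illustration of the instruction optimization, the Python sketch below emulates the semantics of the OpenCL built-ins `bitselect` and `rotate` on 32-bit words and shows one way the SM3 Boolean functions for rounds 16 to 63 can be expressed through `bitselect`. The specific mapping (`ff_sel`, `gg_sel`) is an assumption for illustration; the patent states only that these built-ins are used for the logic and cyclic shift operations.

```python
MASK32 = 0xFFFFFFFF


def rotate(x, n):
    """Emulates OpenCL rotate() on a 32-bit word: left rotation by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def bitselect(a, b, c):
    """Emulates OpenCL bitselect(): take the bit of b where c has a 1, else the bit of a."""
    return ((a & ~c) | (b & c)) & MASK32


# Reference SM3 Boolean functions for rounds 16..63.
def ff_ref(x, y, z):
    return (x & y) | (x & z) | (y & z)  # bitwise majority


def gg_ref(x, y, z):
    return (x & y) | (~x & z & MASK32)  # bitwise choose


# The same functions written as a single bitselect each (illustrative mapping).
def ff_sel(x, y, z):
    return bitselect(x, y, x ^ z)  # where x == z the majority is x, otherwise it is y


def gg_sel(x, y, z):
    return bitselect(z, y, x)      # where x is 1 take y, otherwise take z


# SM3 permutation functions written with rotate().
def p0(x):
    return x ^ rotate(x, 9) ^ rotate(x, 17)


def p1(x):
    return x ^ rotate(x, 15) ^ rotate(x, 23)
```

On a GPU, each `bitselect` or `rotate` call can typically map to a single hardware instruction, which is the motivation for replacing the multi-operation reference forms.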
The elliptic curve multiple point operation optimization module comprises an elliptic curve fixed-point multiplication unit for digital signatures and an elliptic curve unknown-point multiplication unit for signature verification. The fixed-point multiplication unit uses the comb method to compute the [k]-multiple of the base point G in the elliptic curve group, giving the fixed-point multiplication result for the signing process; the unknown-point multiplication unit uses the binary expansion method to compute the [k]-multiple of the base point G on the elliptic curves of different public keys, giving the unknown-point multiplication result for the signature verification process.
The comb method is specifically as follows. First, the 256 fixed-point multiples $0 \cdot P, 1 \cdot P, \ldots, 255 \cdot P$ are precomputed. Let the length of the scalar $k$ be $L$ bits; $k$ is then divided into a byte string $k = k_7 \| k_6 \| k_5 \| k_4 \| k_3 \| k_2 \| k_1 \| k_0$, so that each segment has length $L/8$. During the computation, it is only necessary to look up the pre-stored value $[k_i]G$ for each $k_i$ in the $[k]G$ lookup table and accumulate these values until the byte string $k$ has been traversed.
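One reading of this description is the standard fixed-base comb method with eight rows, sketched below in Python. This is an interpretation, not the patented implementation: the sketch assumes a 256-entry table whose entry j holds the sum of $[2^{i \cdot L/8}]G$ over the set bits i of j, a toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with generator (5, 1) of order 19 in place of SM2, and affine arithmetic for brevity. All names are illustrative.

```python
# Fixed-base comb sketch on a toy curve (assumed parameters, not SM2).
P_MOD, A = 17, 2   # y^2 = x^3 + 2x + 2 over F_17
G = (5, 1)         # generator of order 19
O = None           # point at infinity
ROWS, L = 8, 256   # eight segments of a 256-bit scalar
D = L // ROWS      # 32 bits per segment


def add(p, q):
    """Affine point addition, including doubling and the point at infinity."""
    if p is O:
        return q
    if q is O:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def build_table(g):
    """Precompute 256 entries: table[j] = sum of [2^(i*D)]g over the set bits i of j."""
    base = [g]
    for _ in range(ROWS - 1):
        pt = base[-1]
        for _ in range(D):           # pt <- [2^D]pt
            pt = add(pt, pt)
        base.append(pt)
    table = [O] * (1 << ROWS)
    for j in range(1, 1 << ROWS):
        low = j & -j                 # lowest set bit of j
        table[j] = add(table[j ^ low], base[low.bit_length() - 1])
    return table


def comb_mul(k, table):
    """Compute [k]g with D doublings and D table lookups."""
    q = O
    for col in range(D - 1, -1, -1):
        q = add(q, q)
        idx = 0
        for i in range(ROWS):        # gather bit `col` of each of the 8 segments
            idx |= ((k >> (i * D + col)) & 1) << i
        q = add(q, table[idx])
    return q
```

Because the table depends only on the fixed base point, it can be built once on the host and reused by every signing thread; each signature then costs 32 doublings and 32 lookups instead of up to 256 doublings and additions.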
The binary expansion method is specifically as follows. The scalar $k$ in the $[k]$-multiple problem for the base point $G$ is converted into a binary string $k = (k_{t-1}, \ldots, k_1, k_0)_2$. Let $P = G$ and $Q = \infty$, then traverse the binary string from $k_0$ to $k_{t-1}$. In each iteration the current value $k_i$ is examined: if $k_i = 1$ then $Q \leftarrow Q + P$, and if $k_i = 0$ then $Q$ is left unchanged; then $P \leftarrow 2P$ and the traversal moves on to the next value $k_{i+1}$. After the whole string has been traversed, $Q = [k]G$ is obtained.
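The traversal above is the right-to-left double-and-add algorithm, which can be sketched in Python as follows. The sketch assumes a toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with generator (5, 1) of order 19 instead of SM2, and uses affine arithmetic for brevity; the names are illustrative.

```python
# Right-to-left binary expansion (double-and-add) on a toy curve (assumed, not SM2).
P_MOD, A = 17, 2   # y^2 = x^3 + 2x + 2 over F_17
G = (5, 1)         # generator of order 19
O = None           # point at infinity


def add(p, q):
    """Affine point addition, including doubling and the point at infinity."""
    if p is O:
        return q
    if q is O:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def binary_mul(k, g):
    """Compute [k]g by scanning the bits of k from k_0 upward."""
    p, q = g, O
    while k:
        if k & 1:        # k_i = 1: Q <- Q + P
            q = add(q, p)
        p = add(p, p)    # P <- 2P
        k >>= 1          # move on to k_{i+1}
    return q
```

No table is required, so the method works for any point, at the cost of one doubling per scalar bit plus one addition per set bit.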
Technical effects
Compared with the prior art, the invention analyzes and optimizes the SM2 elliptic curve public key algorithm in detail on a GPU platform and verifies the positive effect of the optimization experimentally. The resulting signature/signature verification throughput on the GPU platform is approximately 9.1 × 10^5 ops, a large improvement over the 3 × 10^3 ops throughput of currently common FPGA platforms, which means that the method processes more SM2 digital signature/signature verification requests per unit time.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a flow chart of the system module signature/signature verification of the present invention;
fig. 3 is a schematic diagram of a lookup table structure.
Detailed Description
As shown in fig. 1, the present embodiment relates to a fast implementation system for implementing an elliptic curve digital signature and signature verification algorithm for SM2 using a GPU platform, which includes: the device comprises an elliptic curve multi-point operation optimization module, a module operation calculation complexity optimization module, an elliptic curve expression equation coordinate system mapping module and a compression function optimization module.
As shown in fig. 2, the system performs fast implementation of the SM2 elliptic curve digital signature and signature verification optimization algorithm by the following means:
Step 1) First, initialize the OpenCL platform: select an OpenCL platform and device through the OpenCL application programming interface (API), create a device context, create a kernel, and initialize the memory space.
Step 2) Precompute the signature/signature verification task at the CPU end, including precomputation of the public and private keys, the random number, and the SM3 compression function. For the signature task, the [k]-multiple point lookup table must additionally be computed; specifically, the 256 fixed-point multiples 0P, 1P, ..., 255P are computed and the data is arranged into the data structure shown in FIG. 3.
Step 3) Transfer data between host memory and GPU video memory using the OpenCL interface function clEnqueueReadBuffer(), and synchronize operations between the CPU and the GPU using the clFlush() and clFinish() functions.
Step 4) Using the elliptic curve expression equation coordinate system mapping module of the system, the Jacobian weighted projective coordinate system is selected for group point operations when mapping the coordinates of the elliptic curve, which avoids the modular inversions that occur in large numbers under the affine coordinate system. The mathematical representation of the Jacobian weighted projective coordinate system is as follows. Over $F_p$, the elliptic curve equation in this coordinate system is written as $y^2 = x^3 + axz^4 + bz^6$, where $a, b \in F_p$ and $4a^3 + 27b^2 \neq 0 \pmod p$. The set of points on the elliptic curve is $E(F_p) = \{(x, y, z) \mid x, y, z \in F_p \text{ and } y^2 = x^3 + axz^4 + bz^6\}$. For $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$, if there exists some $u \in F_p$ with $u \neq 0$ such that $x_1 = u^2 x_2$, $y_1 = u^3 y_2$, $z_1 = u z_2$, the two triples are called equivalent and represent the same point. An elliptic curve is usually represented in the affine coordinate system, under which the set of points on the curve is $E(F_p) = \{(X, Y) \mid X, Y \in F_p \text{ and } Y^2 = X^3 + aX + b\}$, where $4a^3 + 27b^2 \neq 0 \pmod p$. Setting $x = X \cdot z^2$, $y = Y \cdot z^3$ converts the elliptic curve from the affine representation to the Jacobian weighted projective representation. Conversely, when $z \neq 0$, setting $X = x/z^2$, $Y = y/z^3$ converts the Jacobian weighted projective representation back to the affine representation $Y^2 = X^3 + aX + b$. When $z = 0$, the point $(1, 1, 0)$ corresponds to the point at infinity $O$ of the affine coordinate system.
Step 5) The signature task mainly faces the fixed-point multiplication problem, that is, the $[k]$-multiple of the base point $G$ in the elliptic curve group, which is computed with the comb method. The specific algorithm is as follows: let the scalar $k$ be $L$ bits long; $k$ is then divided into a byte string $k = k_7 \| k_6 \| k_5 \| k_4 \| k_3 \| k_2 \| k_1 \| k_0$, so that each segment has length $L/8$. During the computation, it is only necessary to look up the pre-stored value $[k_i]G$ for each $k_i$ in the $[k]G$ lookup table and accumulate these values until the byte string $k$ has been traversed.
Step 6) The signature verification task mainly faces the unknown-point multiplication problem: because the base points $G$ of the elliptic curves on which different public keys lie differ, precomputation cannot be used in the parallel computation. The binary expansion method is therefore adopted, trading time for space. The scalar $k$ in the $[k]$-multiple problem for the base point $G$ is converted into a binary string $k = (k_{t-1}, \ldots, k_1, k_0)_2$. Let $P = G$ and $Q = \infty$, then traverse the binary string from $k_0$ to $k_{t-1}$. In each iteration the current value $k_i$ is examined: if $k_i = 1$ then $Q \leftarrow Q + P$, and if $k_i = 0$ then $Q$ is left unchanged; then $P \leftarrow 2P$ and the traversal moves on to the next value $k_{i+1}$. After the whole string has been traversed, $Q = [k]G$ is obtained.
Known-point and unknown-point multiplication require a large number of large-integer modular operations, among which modular multiplication and modular division both consume a large amount of computing resources. The modular operation computational complexity optimization module of the system mainly optimizes these two operations, specifically:
1) For large-integer modular multiplication, the Montgomery reduction algorithm and the Montgomery multiplication algorithm are used as the replacement algorithm for modular multiplication, avoiding a modular exponentiation step of extremely high computational complexity.
2) For large-integer modular division, Fermat's Little Theorem is combined with the Extended Euclidean Algorithm: when the number of parallel signatures/signature verifications is less than 2048, modular division is performed at the CPU end using Fermat's Little Theorem; when the number is greater than or equal to 2048, modular division is performed at the GPU end using the Extended Euclidean Algorithm.
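The two modular division strategies in item 2) can be sketched in Python as below. This is a minimal sketch under stated assumptions: both routines run on the host here, so the CPU/GPU placement is indicated only in comments; the helper names and the `batch_size` parameter are hypothetical; and the SM2 field prime is used as the modulus.

```python
# Modular inversion two ways, plus the batch-size dispatch described above.
# Names are illustrative; in the described system the two paths run on CPU and GPU.

SM2_P = 2**256 - 2**224 - 2**96 + 2**64 - 1  # SM2 field prime
THRESHOLD = 2048                             # switch-over point stated in the text


def inv_fermat(a, p=SM2_P):
    """Fermat's Little Theorem: a^(p-2) = a^(-1) mod p for prime p and a != 0 mod p."""
    return pow(a, p - 2, p)


def inv_euclid(a, p=SM2_P):
    """Extended Euclidean Algorithm: find t with t * a = 1 mod p."""
    r0, r1 = p, a % p
    t0, t1 = 0, 1
    while r1:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if r0 != 1:
        raise ZeroDivisionError("value is not invertible modulo p")
    return t0 % p


def pick_inverse(batch_size):
    """Below the threshold use Fermat (CPU path); otherwise extended Euclid (GPU path)."""
    return inv_fermat if batch_size < THRESHOLD else inv_euclid


def mod_div(a, b, batch_size, p=SM2_P):
    """Compute a / b mod p as a * b^(-1) mod p using the selected inversion."""
    return a * pick_inverse(batch_size)(b, p) % p
```

For example, `mod_div(6, 3, batch_size=1)` and `mod_div(6, 3, batch_size=4096)` take different inversion paths but both return 2.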
In the OpenCL implementation on the GPU, the compression function optimization module is used to optimize the SM3 hash function included in the SM2 algorithm. The specific methods are:
1) instruction optimization: optimizing logic operation and cyclic shift operation of SM3 adopted in SM2 algorithm by using bitselect function and rotate function built in OpenCL;
2) register multiplexing: each group of 16 steps of the SM3 compression function depends only on the 16 register values of that round, so a 16-word register space is reused for the 64 words of the 64-step message expansion stage.
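The register multiplexing idea in item 2) can be illustrated with the Python sketch below, which computes the SM3 message expansion on the fly from a 16-word circular buffer and compares it against a reference that stores every expanded word. The buffer layout and function names are assumptions for illustration; the recurrence is the standard SM3 message expansion, in which the 64 rounds consume words $W_0 \ldots W_{67}$ and $W'_j = W_j \oplus W_{j+4}$.

```python
MASK32 = 0xFFFFFFFF


def rotl(x, n):
    """32-bit left rotation."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x):
    """SM3 permutation P1 used in message expansion."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def expand_full(block_words):
    """Reference: materialise all 68 W words, then return (W_j, W'_j) for j = 0..63."""
    w = list(block_words)
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                 ^ rotl(w[j - 13], 7) ^ w[j - 6])
    return [(w[j], w[j] ^ w[j + 4]) for j in range(64)]


def expand_rolling(block_words):
    """Yield (W_j, W'_j) for j = 0..63 while keeping only 16 words alive."""
    buf = list(block_words)  # at step j the buffer holds W_j .. W_{j+15}
    for j in range(64):
        wj = buf[j % 16]
        yield wj, wj ^ buf[(j + 4) % 16]
        if j < 52:           # W_{j+16} is needed only up to W_67
            # Reuse the slot of W_j, which is no longer needed, for W_{j+16}.
            buf[j % 16] = (p1(wj ^ buf[(j + 7) % 16] ^ rotl(buf[(j + 13) % 16], 15))
                           ^ rotl(buf[(j + 3) % 16], 7) ^ buf[(j + 10) % 16])
```

In a GPU kernel the 16 buffer slots would live in registers, and the round function would consume each `(W_j, W'_j)` pair as soon as it is produced, so the full expanded schedule never has to be stored.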
Through the above steps, the fast signature/signature verification operation of the SM2 algorithm can be completed; the operation result is read back to host memory and displayed using the OpenCL interface functions clEnqueueWriteBuffer()/clEnqueueReadBuffer().
In summary, in the implementation process, the elliptic curve multiple point operation optimization module is used to improve the efficiency of signature and signature verification; the modular operation computational complexity optimization module is used to obtain the best efficiency for modular multiplication and modular division; the elliptic curve expression equation coordinate system mapping module is used to reduce the overall computational complexity; and the compression function optimization module is used to optimize the performance of the message hash algorithm.
This embodiment is implemented on an AMD R9 290 series graphics card; the specific platform settings are shown in the following table:
Parameter name | GPU |
GPU model | AMD Radeon R9 290 |
Architecture | GCN 1.1 |
Video memory capacity | 4 GB |
Default core clock | 947 MHz |
Number of compute units (CU) | 40 |
Number of stream processors | 2560 |
Video memory bandwidth | 320 GB/s |
In this environment, compared with a CPU platform implementation on an Intel Xeon E5620 series processor, the speed-up ratio of SM2 signature reaches 74.2 and that of SM2 signature verification reaches 9.4 with 2048 concurrent threads; with 4096 concurrent threads the ratios reach 144.3 and 18.2; and with 8192 concurrent threads they reach 288.1 and 37.2. This fully demonstrates the effectiveness of the optimized SM2 signature/signature verification algorithm on the GPU platform.
In this environment, compared with an FPGA platform implementation on a Xilinx Virtex-6 series chip, with 8192 concurrent threads the throughput speed-up ratio of the SM2 signature algorithm reaches 45.3 and that of signature verification reaches 5.2, both superior to the FPGA implementation.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims (12)
1. A rapid realization method of SM2 digital signature and signature verification based on a GPU is characterized in that preprocessing is carried out on information to be signed or information to be verified at a CPU end to obtain a preprocessing result containing a public key private key, a random number and a compression function SM3 precomputation, GPU initialization and a lookup table, then after the preprocessing result is mapped by the GPU end to a Jacobian weighted projection coordinate system, further carrying out signature processing or signature verification processing of modular operation optimization and compression function optimization.
2. The method for rapidly implementing SM2 digital signature and signature verification based on GPU of claim 1, wherein the preprocessing comprises: at the CPU end, performing GPU initialization and pre-calculation of a public key, a private key, a random number and a compression function SM3 on a first SM2 digital signature task or a second signature verification task so as to facilitate the operation of the subsequent GPU end; and for the signature task, pre-calculating a [ k ] time point lookup table to improve the efficiency of subsequent GPU signature processing.
3. The method for rapidly implementing SM2 digital signature and signature verification based on GPU of claim 1, wherein the signature process is performed by matching look-up table pre-generated by CPU end with GPU end to perform comb signature operation to obtain signature result, and the signature verification process is performed by binary expansion operation at GPU end to obtain signature verification result.
4. The method for rapidly implementing SM2 digital signature and signature verification based on GPU of claim 1, wherein the modular operation optimization processing is as follows: the method comprises the steps of optimizing modular multiplication and modular division operation which consume a large amount of computing resources in a large amount of large integer modular operation performed by a signature/signature verification task, and reducing the computing complexity, wherein the modular multiplication operation optimization adopts Montgomery reduction algorithm and Montgomery multiplication as a substitute optimization algorithm of the modular multiplication operation; the mode of combining Fermat's theorem and expanding Euclidean algorithm is adopted as the substitute optimization algorithm of the modular division operation in the modular division operation optimization.
5. The method for rapidly implementing SM2 digital signature and signature verification based on GPU of claim 1, wherein the signature processing or signature verification processing optimized by the compression function is that: optimizing a hash algorithm SM3 in an SM2 algorithm by adopting a GPU structure optimization technology, and specifically comprising the following steps: the method comprises the steps of instruction optimization and register multiplexing, wherein the instruction optimization optimizes the logic operation and the circular shift operation of SM3 hash algorithm operation adopted in an SM2 algorithm by using a bitselect function and a rotate function which are built in OpenCL; register multiplexing is carried out in a 64-step message expansion stage by SM3 hash algorithm operation, and 64 words are multiplexed by using a register space with 16 words.
6. A system for implementing the GPU-based SM2 digital signature and signature verification quick implementation method of claim 1, comprising: the system comprises an elliptic curve expression equation coordinate system mapping module for reducing the overall operation complexity, a modular operation calculation complexity optimization module for obtaining the optimal operation efficiency of modular multiplication operation and modular division operation, a compression function optimization module for realizing the performance optimization of a message hash algorithm and an elliptic curve multi-point operation optimization module for improving the signature and signature verification operation efficiency, wherein: the elliptic curve expression equation coordinate system mapping module receives data information preprocessed by the CPU end to perform coordinate axis mapping processing on an elliptic curve and output the mapped elliptic curve to the calculation of a subsequent signature/signature verification task, the module operation computation complexity optimization module is responsible for optimizing a large number of large integer module operations contained in the elliptic curve multiple point operation and outputting results of the large integer module operations to the elliptic curve multiple point operation optimization module at a high speed, the compression function optimization module is responsible for optimizing SM3 compression function operations in an SM2 signing/signature task and outputting SM3 compression operation results to the elliptic curve multiple point operation optimization module, the elliptic curve multiple point operation optimization module receives the mapped elliptic curve information to perform optimization processing of the multiple point operations in the SM2 signing/signature task, and the large integer module operation results and the SM3 hash operation results are combined to output final SM2 signing/signature results.
7. The system as claimed in claim 5, wherein the elliptic curve expression equation coordinate system mapping module performs point multiplication in the elliptic curve group in a Jacobian weighted projective coordinate system when mapping the coordinates of the elliptic curve, so as to avoid a large number of modular inversions under the affine coordinate system, wherein the Jacobian weighted projective coordinate system is defined as follows: over $F_p$, the elliptic curve equation in this coordinate system is written as $y^2 = x^3 + axz^4 + bz^6$, where $a, b \in F_p$ and $4a^3 + 27b^2 \neq 0 \pmod p$; the set of points on the elliptic curve is $E(F_p) = \{(x, y, z) \mid x, y, z \in F_p \text{ and } y^2 = x^3 + axz^4 + bz^6\}$; for $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$, if there exists some $u \in F_p$ with $u \neq 0$ such that $x_1 = u^2 x_2$, $y_1 = u^3 y_2$, $z_1 = u z_2$, the two triples are called equivalent and represent the same point; an elliptic curve is usually represented in the affine coordinate system, under which the set of points on the curve is $E(F_p) = \{(X, Y) \mid X, Y \in F_p \text{ and } Y^2 = X^3 + aX + b\}$, where $4a^3 + 27b^2 \neq 0 \pmod p$; setting $x = X \cdot z^2$, $y = Y \cdot z^3$ converts the elliptic curve from the affine representation to the Jacobian weighted projective representation; conversely, when $z \neq 0$, setting $X = x/z^2$, $Y = y/z^3$ converts the Jacobian weighted projective representation back to the affine representation $Y^2 = X^3 + aX + b$; when $z = 0$, the point $(1, 1, 0)$ corresponds to the point at infinity $O$ of the affine coordinate system.
8. The system of claim 5, wherein the modular operation computational complexity optimization module comprises an optimized modular multiplication unit and an optimized modular division unit, wherein: the optimized modular multiplication unit uses Montgomery reduction and Montgomery multiplication to optimize modular multiplication and obtain high-speed large-integer modular multiplication results; the optimized modular division unit combines Fermat's little theorem with the extended Euclidean algorithm: when the number of parallel signatures/signature verifications is less than 2048, modular division is performed at the CPU end using Fermat's little theorem; and when the number of parallel signatures/signature verifications is greater than or equal to 2048, modular division is performed at the GPU end using the extended Euclidean algorithm, to obtain high-speed large-integer modular division results.
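The arithmetic named in this claim can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: Python integers stand in for the multi-limb big integers a GPU kernel would use, the Montgomery radix is assumed to be $R = 2^{256}$, the modulus is the SM2 recommended prime, and every function name is hypothetical. The `mod_div` helper only mirrors the claim's threshold of 2048; it does not model where (CPU or GPU) the work runs.

```python
# SM2 recommended prime p = 2^256 - 2^224 - 2^96 + 2^64 - 1.
P = 2**256 - 2**224 - 2**96 + 2**64 - 1
R_BITS = 256                      # assumed Montgomery radix R = 2^256
R_MASK = (1 << R_BITS) - 1
N_PRIME = (-pow(P, -1, 1 << R_BITS)) % (1 << R_BITS)  # -P^{-1} mod R


def redc(t):
    """Montgomery reduction: return t * R^{-1} mod P for 0 <= t < R * P."""
    m = ((t & R_MASK) * N_PRIME) & R_MASK
    u = (t + m * P) >> R_BITS     # exact division: t + m*P is a multiple of R
    return u - P if u >= P else u


def to_mont(a):
    """Map a into the Montgomery domain: a * R mod P."""
    return (a << R_BITS) % P


def from_mont(a_m):
    """Map back from the Montgomery domain."""
    return redc(a_m)


def mont_mul(a_m, b_m):
    """Multiply two Montgomery-domain values; result stays in the domain."""
    return redc(a_m * b_m)


def inv_fermat(a):
    """Inverse via Fermat's little theorem: a^(p-2) mod p (p prime)."""
    return pow(a, P - 2, P)


def inv_euclid(a):
    """Inverse via the extended Euclidean algorithm."""
    old_r, r = a % P, P
    old_s, s = 1, 0
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("value is not invertible modulo P")
    return old_s % P


def mod_div(a, b, batch_size):
    """a / b mod P, choosing the inversion method by the claim's threshold."""
    inv = inv_fermat if batch_size < 2048 else inv_euclid
    return a * inv(b) % P
```

For example, `from_mont(mont_mul(to_mont(a), to_mont(b)))` equals `a * b % P`, and both inversion routines return the same value for any `a` not divisible by `P`.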
9. The system of claim 5, wherein the compression function optimization module optimizes the SM3 hash algorithm used in the SM2 algorithm by means of GPU configuration optimization techniques, and comprises an instruction optimization unit and a register multiplexing unit, wherein: the instruction optimization unit optimizes the logical operations and cyclic shift operations of the SM3 compression function using the OpenCL built-in functions bitselect and rotate; and the register multiplexing unit reuses a 16-word register space in place of the 64 words otherwise needed in the 64-step message expansion stage of the SM3 compression function, reducing the space overhead of the SM3 computation.
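The register reuse described here can be illustrated with a short Python model. It is a sketch under assumptions: `rotate` and `bitselect` below are plain-Python stand-ins that imitate the semantics of the OpenCL built-ins of the same names on 32-bit words, the rolling 16-word buffer is one plausible way to realize the reuse the claim describes, and the helper names (`expand_rolling`, `expand_full`, `ff`, `gg`) are hypothetical. The expansion formula is the standard SM3 one.

```python
MASK32 = 0xFFFFFFFF


def rotate(x, n):
    """32-bit left rotation, imitating OpenCL rotate()."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def bitselect(a, b, c):
    """Per bit: take a where c is 0 and b where c is 1, as OpenCL bitselect() does."""
    return ((a & ~c) | (b & c)) & MASK32


def p1(x):
    """SM3 permutation P1 used in message expansion."""
    return x ^ rotate(x, 15) ^ rotate(x, 23)


def ff(x, y, z):
    """SM3 FF_j for j >= 16 (majority) expressed with one bitselect."""
    return bitselect(x, y, x ^ z)


def gg(x, y, z):
    """SM3 GG_j for j >= 16, (x & y) | (~x & z), expressed with one bitselect."""
    return bitselect(z, y, x)


def expand_rolling(block_words):
    """Yield (W_j, W'_j) for j = 0..63 using only a 16-word circular buffer."""
    w = list(block_words)                       # holds W_j .. W_{j+15}
    for j in range(64):
        wj = w[j % 16]
        yield wj, wj ^ w[(j + 4) % 16]          # W'_j = W_j xor W_{j+4}
        if j < 52:                              # W_67 is the last word needed
            w[j % 16] = (p1(wj ^ w[(j + 7) % 16] ^ rotate(w[(j + 13) % 16], 15))
                         ^ rotate(w[(j + 3) % 16], 7) ^ w[(j + 10) % 16])


def expand_full(block_words):
    """Reference expansion that stores all 68 words of W."""
    w = list(block_words)
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotate(w[j - 3], 15))
                 ^ rotate(w[j - 13], 7) ^ w[j - 6])
    return [(w[j], w[j] ^ w[j + 4]) for j in range(64)]
```

Both expansions produce the same sequence of word pairs for any 16-word input block, so the compression rounds can consume the rolling version without a larger array.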
10. The system of claim 5, wherein the elliptic curve multiple-point operation optimization module comprises an elliptic curve fixed-point multiplication unit for digital signature and an elliptic curve unknown-point multiplication unit for signature verification, wherein: the elliptic curve fixed-point multiplication unit uses the comb method to compute the [k]-multiple of the base point G in the elliptic curve group operation, yielding the fixed-point multiplication result for the signing process; and the elliptic curve unknown-point multiplication unit uses the binary expansion method to compute the [k]-multiple of the base point G on the elliptic curves of different public keys, yielding the unknown-point multiplication result for the signature verification process.
11. The system of claim 5, wherein the comb method is specifically: first, 256 fixed-point multiples $0 \times P, 1 \times P, \ldots, 255 \times P$ are precomputed; with the length of the scalar $k$ set to $L$ bits, $k$ is divided into a byte string $k = k_7 \| k_6 \| k_5 \| k_4 \| k_3 \| k_2 \| k_1 \| k_0$, each segment having a word length of $L/8$; during the computation of $[k]G$, only the value $[k_i]G$ pre-stored in the lookup table needs to be fetched for each $k_i$ and then accumulated, until the byte string $k$ has been traversed.
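The claim wording is terse, so the following Python sketch uses the textbook fixed-base comb method with eight rows as one reading of it: the scalar is split into 8 segments of $d = L/8$ bits, a 256-entry table holds every combination of the eight row points, and each column of bits selects one table entry. The toy curve $y^2 = x^3 + 2x + 3$ over $F_{97}$, the point (3, 6), the choice $d = 2$ (16-bit scalars), and all names are assumptions made for illustration; SM2 would use $L = 256$ and $d = 32$.

```python
# Toy curve y^2 = x^3 + 2x + 3 over F_97 (assumed; not the SM2 curve).
P_MOD, A = 97, 2
G = (3, 6)
W = 8                                   # number of comb rows (segments of k)


def add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                     # P + (-P), or doubling a point with y = 0
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def comb_table(base, d):
    """Precompute table[v] = sum of 2^(i*d) * base over the set bits i of v."""
    row_pts = [base]
    for _ in range(W - 1):
        q = row_pts[-1]
        for _ in range(d):
            q = add(q, q)               # next row point is 2^d times the previous
        row_pts.append(q)
    table = [None] * (1 << W)
    for v in range(1, 1 << W):
        low = v & -v                    # lowest set bit of v
        table[v] = add(table[v ^ low], row_pts[low.bit_length() - 1])
    return table


def comb_mul(k, table, d):
    """Compute [k]base for 0 <= k < 2^(W*d) with one doubling and one lookup per column."""
    q = None
    for col in range(d - 1, -1, -1):
        q = add(q, q)
        idx = 0
        for i in range(W):              # gather bit `col` of each of the W segments
            idx |= ((k >> (i * d + col)) & 1) << i
        q = add(q, table[idx])
    return q


if __name__ == "__main__":
    d = 2
    table = comb_table(G, d)
    print(comb_mul(0xBEEF, table, d))
```

Because the table depends only on the fixed base point, it can be built once during preprocessing and shared by every signing operation; each scalar multiplication then costs $d$ doublings and $d$ table additions.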
12. The system of claim 5, wherein the binary expansion method is specifically: the scalar $k$ in the multiple-point problem $[k]G$ for the base point $G$ is converted into a binary string $k = (k_{t-1}, \ldots, k_1, k_0)_2$; let $P = G$ and $Q = \infty$; the binary string is then traversed from $k_0$ to $k_{t-1}$, and in each step the current value $k_i$ is examined: if $k_i = 1$, then $Q \leftarrow Q + P$; if $k_i = 0$, then $Q \leftarrow Q$; then $P \leftarrow 2P$ and the traversal moves on to the next value $k_{i+1}$; after the string has been traversed, $Q = [k]G$.
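The traversal in this claim is the right-to-left double-and-add method, which can be sketched in Python as below. The toy curve $y^2 = x^3 + 2x + 3$ over $F_{97}$, the point (3, 6), and the function names are assumptions for illustration only; affine arithmetic is used to keep the sketch short, whereas the claims above work in Jacobian coordinates.

```python
# Toy curve y^2 = x^3 + 2x + 3 over F_97 (assumed; not the SM2 curve).
P_MOD, A = 97, 2
G = (3, 6)


def add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                     # P + (-P), or doubling a point with y = 0
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def binary_mul(k, point):
    """Compute [k]point by scanning the bits of k from k_0 up to k_{t-1}."""
    p, q = point, None                  # P = G, Q = infinity
    while k:
        if k & 1:                       # k_i = 1: Q <- Q + P
            q = add(q, p)
        p = add(p, p)                   # P <- 2P
        k >>= 1                         # move on to k_{i+1}
    return q


if __name__ == "__main__":
    print(binary_mul(2, G))
    print(binary_mul(0xBEEF, G))
```

Unlike the comb method, this needs no table tied to a particular point, which suits signature verification where the point varies with the public key.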
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110613751.0A CN113221193B (en) | 2021-06-02 | 2021-06-02 | SM2 digital signature and signature verification quick implementation method and system based on GPU |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113221193A (en) | 2021-08-06 |
CN113221193B (en) | 2022-07-29 |
Family
ID=77082340
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202110613751.0A CN113221193B (en) | Active | 2021-06-02 | 2021-06-02 | SM2 digital signature and signature verification quick implementation method and system based on GPU |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113221193B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113676335A (en) * | 2021-10-21 | 2021-11-19 | 飞天诚信科技股份有限公司 | Method and device for realizing signature in security chip |
CN113783702A (en) * | 2021-09-28 | 2021-12-10 | 南京宁麒智能计算芯片研究院有限公司 | Hardware implementation method and system for elliptic curve digital signature and signature verification |
CN114205085A (en) * | 2021-12-03 | 2022-03-18 | 东北大学 | Optimization processing method of SM2 and transformation method of Hyperledger Fabric platform |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103475469A (en) * | 2013-09-10 | 2013-12-25 | 中国科学院数据与通信保护研究教育中心 | Method and device for achieving SM2 algorithm with combination of CPU and GPU |
CN103532710A (en) * | 2013-09-26 | 2014-01-22 | 中国科学院数据与通信保护研究教育中心 | Implementation method and device for GPU-based SM2 algorithm |
CN107147488A (en) * | 2017-03-24 | 2017-09-08 | 广东工业大学 | Signature and signature verification system and method based on SM2 encryption and decryption algorithms |
US20180121388A1 (en) * | 2016-11-01 | 2018-05-03 | Nvidia Corporation | Symmetric block sparse matrix-vector multiplication |
CN108063758A (en) * | 2017-11-27 | 2018-05-22 | 众安信息技术服务有限公司 | Signature verification method for a blockchain network and node in the blockchain network |
CN109600233A (en) * | 2019-01-15 | 2019-04-09 | 西安电子科技大学 | Group signature identity issuing method based on SM2 digital signature algorithm |
CN110086602A (en) * | 2019-04-16 | 2019-08-02 | 上海交通大学 | Fast implementation method of SM3 cryptographic hash algorithm based on GPU |
WO2019174402A1 (en) * | 2018-03-14 | 2019-09-19 | 西安西电捷通无线网络通信股份有限公司 | Group membership issuing method and device for digital group signature |
CN110365481A (en) * | 2019-07-04 | 2019-10-22 | 上海交通大学 | Optimized accelerated implementation system and method for the national cryptographic SM2 algorithm |
CN111275605A (en) * | 2018-12-04 | 2020-06-12 | 畅想科技有限公司 | Buffer checker |
CN112187469A (en) * | 2020-09-21 | 2021-01-05 | 浙江省数字安全证书管理有限公司 | SM2 multi-party collaborative digital signature method and system based on key factor |
CN112887081A (en) * | 2020-09-04 | 2021-06-01 | 深圳奥联信息安全技术有限公司 | SM 2-based signature verification method, device and system |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113783702A (en) * | 2021-09-28 | 2021-12-10 | 南京宁麒智能计算芯片研究院有限公司 | Hardware implementation method and system for elliptic curve digital signature and signature verification |
CN113676335A (en) * | 2021-10-21 | 2021-11-19 | 飞天诚信科技股份有限公司 | Method and device for realizing signature in security chip |
CN113676335B (en) * | 2021-10-21 | 2021-12-28 | 飞天诚信科技股份有限公司 | Method and device for realizing signature in security chip |
CN114205085A (en) * | 2021-12-03 | 2022-03-18 | 东北大学 | Optimization processing method of SM2 and transformation method of Hyperledger Fabric platform |
Also Published As
Publication number | Publication date |
---|---|
CN113221193B (en) | 2022-07-29 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |