CN113608718B

CN113608718B - Method for realizing prime number domain large integer modular multiplication calculation acceleration

Info

Publication number: CN113608718B
Application number: CN202110783676.2A
Authority: CN
Inventors: 郑昉昱; 高莉莉; 魏荣; 马原; 王跃武; 范广; 万立鹏
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Filing date: 2021-07-12
Publication date: 2024-06-25
Anticipated expiration: 2041-07-12

Abstract

The invention discloses a method for accelerating large integer modular multiplication in a prime number domain. A multiplicand and a multiplier, each k bits long in the prime number domain, are divided into N segments, where each of the first (N-1) segments is w bits long, the N-th segment is r bits long, and w ≥ r. Each segment of the multiplicand and the multiplier is converted into a double-precision floating-point number, and the converted segments are multiplied and added using fused multiply-add operations. 2N fixed-point numbers are initialized, the binary values of the multiply-add results are accumulated into them, and the fixed-point numbers are then reduced in bit length to obtain the final modular multiplication result. The invention makes full use of the format of double-precision floating-point numbers and improves the efficiency of modular multiplication in a prime number domain.

Description

Method for realizing prime number domain large integer modular multiplication calculation acceleration

Technical Field

The invention belongs to the technical field of calculation, and relates to a method for realizing the acceleration of the large integer modular multiplication calculation of a prime number domain.

Background

With the continuous progress of technology, computer technology has developed rapidly, users demand stronger privacy protection, and cryptography is widely applied in network communication. For example, Internet-based industries that serve large numbers of users, such as e-commerce and software distribution, achieve privacy protection and secure communication over the Internet through key agreement and digital signatures. Large integer modular multiplication is the core computational load of many asymmetric cryptographic algorithms. The main computational load of ECC (Elliptic Curve Cryptography), a mainstream asymmetric cryptographic algorithm worldwide, is large integer modular multiplication in a prime number domain. Therefore, the speed of large integer modular multiplication in a prime number domain directly affects the speed of key agreement and digital signatures, and research on its high-performance implementation is very important.

GPUs (graphics processing units) are very efficient at computer graphics and image processing, and are therefore well suited to floating-point arithmetic. The floating-point computing power of GPUs has increased more than tenfold over the past decade. In addition, the CUDA parallel computing framework developed by NVIDIA allows GPU computing resources, which were originally suited only to graphics processing, to also be used to accelerate scientific computing. Many researchers have accelerated mainstream cryptographic primitives using GPU computing resources. For example, Pan et al. accelerated ECDSA with the fixed-point computing power of GPUs, and Niall et al. accelerated RSA with the double-precision floating-point computing power of GPUs, with throughput reaching a new peak. To take advantage of the rapid growth of GPU floating-point computing power, the invention combines fused multiply-add instructions on double-precision floating-point numbers with integer arithmetic instructions to accelerate large integer modular multiplication in a prime number domain.

The basic data types of current computers have fixed word lengths, so a large integer cannot be represented directly by a single basic data type. Researchers therefore generally split a large integer, represent it by several values of a basic data type, and compute large integer modular multiplication by multi-precision arithmetic.

The double-precision floating-point format used by the invention conforms to the floating-point standard specified by IEEE 754. A floating-point number in the IEEE 754 standard consists of a sign bit, an exponent and a mantissa, where the mantissa comprises a 1-bit implicit bit and a multi-bit fraction. A double-precision floating-point number comprises a 1-bit sign, an 11-bit exponent, a 1-bit implicit bit and a 52-bit fraction; the implicit bit is not stored in the computer.
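The field layout can be inspected directly. The following Python sketch (the helper name `double_fields` is illustrative, not from the source) unpacks a double into its sign, 11-bit biased exponent and 52-bit fraction, and shows that an integer smaller than 2⁵² added to 2⁵² appears verbatim in the fraction field with biased exponent 1023 + 52 = 0x433.

```python
import struct

def double_fields(x: float):
    """Return (sign, biased_exponent, fraction) of an IEEE 754 double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]  # raw 64-bit pattern
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF        # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)      # 52 stored fraction bits
    return sign, exponent, fraction

# 2**52 + 12345 is below 2**53, so the conversion to float is exact.
sign, exponent, fraction = double_fields(float(2**52 + 12345))
print(sign, hex(exponent), fraction)       # prints: 0 0x433 12345
```

This is the property the later steps rely on: once a value is aligned to a known exponent, the integer payload can be read from the raw bits.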

Disclosure of Invention

The invention provides a method for accelerating large integer modular multiplication in a prime number domain, which makes full use of the double-precision floating-point computing capability of the computing resources and improves the speed of large integer modular multiplication.

A method for realizing the acceleration of the prime number domain large integer modular multiplication calculation comprises the following steps:

1) Dividing large integers A and B, each k bits long and defined on a prime number domain F_p, into N segments, wherein each of the first (N-1) segments is w bits long, the N-th segment is r bits long, and w ≥ r; wherein p = 2^k - σ, and σ is a prime number less than 2^w;

2) Converting each segment of the multiplicand A and the multiplier B into a double-precision floating-point number; multiplying and adding the converted segments of the multiplicand A and the multiplier B using fused multiply-add operations, and converting the operation results into a fixed-point number R;

3) Dividing the fixed-point number R into 2N segments, and setting the segment length of the first (2N-1) segments of R to w bits while keeping the value of R unchanged; reducing R to an N-segment fixed-point number using multiplication and addition operations; then, using multiplication, addition and shift operations, reducing the part of the N-segment fixed-point number that exceeds k bits, so that it becomes a k-bit fixed-point number;

4) Judging whether the k-bit fixed-point number is an integer in the selected prime number domain; if it is, it is the modular multiplication result of the large integer A and the large integer B; if it is not, the value obtained by subtracting p from it is the modular multiplication result of the large integer A and the large integer B.

Where "large integer" refers to an integer that cannot be represented by only one double-precision floating point number.

Further, the number of segments of the multiplicand A and the multiplier B is N = ⌈k/52⌉, where 52 is the mantissa (fraction) length of a double-precision floating-point number; the bit length w of each of the first (N-1) segments and the bit length r of the N-th segment of the multiplicand A and the multiplier B satisfy (N-1) × w + r = k, and w - r is made as small as possible subject to 52 ≥ w ≥ r.
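The selection rule can be sketched in a few lines of Python. The reading N = ⌈k/52⌉ is an assumption (the formula is not legible in this text), but it agrees with every row of Table 1. The function name `choose_segments` is hypothetical. Since w - r = N × w - k, minimizing w - r with w ≥ r amounts to taking the smallest w with N × w ≥ k.

```python
def choose_segments(k: int, mantissa_bits: int = 52):
    """Pick (N, w, r) with (N-1)*w + r == k, 52 >= w >= r, and w - r minimal."""
    n = -(-k // mantissa_bits)     # N = ceil(k / 52), assumed reading of the rule
    w = -(-k // n)                 # smallest w such that N*w >= k
    r = k - (n - 1) * w            # length of the last segment
    assert mantissa_bits >= w >= r > 0
    return n, w, r

for k in (221, 222, 251, 255, 382, 383, 414):
    print(k, choose_segments(k))
```

For k = 221 this yields N = 5, w = 45, r = 41, the parameters used in the detailed description.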

Further, after the multiplicand and the multiplier are segmented, A[0:N-1] denotes the N segments of the multiplicand A, numbered 0 to (N-1), and A'[0:N-1] is the floating-point form of A[0:N-1]; B[0:N-1] denotes the N segments of the multiplier B, numbered 0 to (N-1), and B'[0:N-1] is the floating-point form of B[0:N-1].
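As an illustration of the segmentation and conversion, the sketch below splits a 221-bit integer into five segments and converts each to a double. Placing segment 0 at the least significant end is an assumption (it matches the R[i+j] indexing used later), and the helper names are hypothetical. Because every segment is at most 52 bits long, the conversion to double precision loses nothing.

```python
def split_segments(x: int, n: int, w: int):
    """Split x into n segments of w bits each, least significant segment first."""
    mask = (1 << w) - 1
    return [(x >> (t * w)) & mask for t in range(n)]

def join_segments(segs, w: int) -> int:
    """Inverse of split_segments."""
    return sum(s << (t * w) for t, s in enumerate(segs))

k, n, w, r = 221, 5, 45, 41
a = (1 << k) - 12345                 # an arbitrary 221-bit example value
A = split_segments(a, n, w)          # A[0:N-1]
A_float = [float(s) for s in A]      # A'[0:N-1]

print(join_segments(A, w) == a)                        # segmentation is lossless
print(all(int(f) == s for f, s in zip(A_float, A)))    # conversion is exact
```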

Further, multiplying and adding the converted multiplicand A and multiplier B using the fused multiply-add operation comprises: first, initializing 2N fixed-point numbers, denoted R[0:2N-1]; second, following the segment-scanning order of the large integer multiplication Σ_{i,j} A'[i]·B'[j], computing the fused multiply-add result M_ij[0] of segment A'[i] of the multiplicand A', segment B'[j] of the multiplier B' and the addend C0, and then computing the fused multiply-add result M_ij[1] of segment A'[i], segment B'[j] and the addend C1, where 0 ≤ i, j ≤ N-1; letting conv_2_bin(x) denote the binary form of x, accumulating conv_2_bin(M_ij[0]) into the fixed-point number R[i+j+1] and conv_2_bin(M_ij[1]) into R[i+j].

Further, the 2N fixed-point numbers R[0:2N-1] are initialized as follows: when t ∈ [0, N-1], R[t] = -[(t × (0x433 + w) + (t + 1) × 0x433) & 0xFFF] << 52; when t ∈ [N, 2N-1], R[t] = -[((t + 1) × (0x433 + w) + t × 0x433) & 0xFFF] << 52. Here 0x433 is the hexadecimal form of the double-precision exponent bias 1023 plus 52, and 0xFFF is the hexadecimal form of 2¹² - 1.
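The purpose of this initialization appears to be cancelling the sign-and-exponent fields that pile up when raw 64-bit patterns of doubles are added as integers. The Python sketch below checks this for the lower half t ∈ [0, N-1] only, under the assumption that R[t] receives t high-part doubles (biased exponent 0x433 + w) and t + 1 low-part doubles (biased exponent 0x433). The payload values and helper names are hypothetical.

```python
import math
import struct

MASK64 = (1 << 64) - 1

def to_bits(x: float) -> int:
    """Raw 64-bit pattern of a double (the role played by conv_2_bin)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def init_low(t: int, w: int) -> int:
    """Initial value of R[t] for t in [0, N-1], as a 64-bit wrap-around word."""
    e = (t * (0x433 + w) + (t + 1) * 0x433) & 0xFFF
    return (-(e << 52)) & MASK64

w, t = 45, 3
highs = [1000 + i for i in range(t)]       # hypothetical high-part payloads
lows = [7 + i for i in range(t + 1)]       # hypothetical low-part payloads

acc = init_low(t, w)
for q in highs:                            # doubles of the form 2^(52+w) + q*2^w
    acc = (acc + to_bits(math.ldexp(1.0, 52 + w) + math.ldexp(float(q), w))) & MASK64
for l in lows:                             # doubles of the form 2^52 + l
    acc = (acc + to_bits(math.ldexp(1.0, 52) + float(l))) & MASK64

print(acc == sum(highs) + sum(lows))       # only the payloads remain
```

With 64-bit wrap-around arithmetic, the accumulated exponent fields are removed by the initial negative offset, so no masking is needed inside the accumulation loop.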

Further, the value of the addend C0 is 2^(52+w), and the value of the addend C1 is 2^(52+w) + 2⁵² - M_ij[0].
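The effect of the two addends can be sketched as follows. Adding C0 aligns the product so that its high part lands in the fraction field, and C1 then removes that high part so that the low w bits land in the fraction field. The sketch assumes the fused multiply-add rounds toward zero (the source does not state the rounding mode; on NVIDIA GPUs this would correspond to the `__fma_rz` intrinsic). Since Python has no such instruction, `fma_rz` below is a hypothetical emulation using exact integers.

```python
import math
import struct

FRAC = (1 << 52) - 1

def fma_rz(a: float, b: float, c: float) -> float:
    """Emulate a*b + c with one rounding toward zero (integer-valued inputs only)."""
    exact = int(a) * int(b) + int(c)
    sign, mag = (-1, -exact) if exact < 0 else (1, exact)
    extra = max(mag.bit_length() - 53, 0)          # bits beyond the 53-bit significand
    return float(sign * ((mag >> extra) << extra)) # truncate, then convert exactly

def to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

w = 45
a, b = (1 << 45) - 1, (1 << 45) - 7               # two w-bit example segments
c0 = math.ldexp(1.0, 52 + w)                      # C0 = 2^(52+w)
hi = fma_rz(float(a), float(b), c0)               # M_ij[0]
c1 = (c0 + math.ldexp(1.0, 52)) - hi              # C1 = 2^(52+w) + 2^52 - M_ij[0]
lo = fma_rz(float(a), float(b), c1)               # M_ij[1]

print((to_bits(hi) & FRAC) == (a * b) >> w)               # high part of the product
print((to_bits(lo) & FRAC) == (a * b) & ((1 << w) - 1))   # low w bits of the product
print(hex(to_bits(hi) >> 52), hex(to_bits(lo) >> 52))     # exponent fields 0x433+w and 0x433
```

Both C1 and M_ij[1] are exactly representable here because w ≤ 52, which is why the split loses no bits of the 2w-bit product.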

Further, the segment length of the first (2N-1) segments of R is set to w bits as follows: R_{t+1} = R_{t+1} + (R_t >> w), t ∈ [0, 2N-2], where R_t denotes the (t+1)-th segment of R and R_{t+1} denotes the (t+2)-th segment of R.
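A minimal sketch of this carry propagation is shown below. The source gives only the update of R_{t+1}; clearing the upper bits of R_t afterwards is an assumption, implied by the requirement that the segment length becomes w bits while the value of R stays unchanged. The helper names are hypothetical.

```python
def normalize(R, w: int):
    """Propagate carries so every segment except the last holds at most w bits."""
    R = list(R)
    mask = (1 << w) - 1
    for t in range(len(R) - 1):        # t in [0, 2N-2]
        R[t + 1] += R[t] >> w          # R_{t+1} = R_{t+1} + (R_t >> w)
        R[t] &= mask                   # assumption: keep only the low w bits of R_t
    return R

def value(R, w: int) -> int:
    """Integer represented by segments of weight 2^(t*w)."""
    return sum(x << (t * w) for t, x in enumerate(R))

w = 45
R = [(1 << 50) + 3, (1 << 47) + 5, 9, 0]   # example segments wider than w bits
S = normalize(R, w)
print(value(S, w) == value(R, w))          # the represented value is unchanged
print(all(x < (1 << w) for x in S[:-1]))   # leading segments now fit in w bits
```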

Further, reducing R to the N-segment fixed-point number by multiplication and addition operations comprises: [formula not reproduced]. After the reduction, the value range of the N-segment fixed-point number is [0, 2^k + σ·2^(digit-r)), where digit is the bit length of a double-precision floating-point number; because A and B are large integers, the bit length k is far greater than the bit length of a double-precision floating-point number, so [formula not reproduced].

Further, the N segments of the reduced N-segment fixed-point number are numbered 0 to (N-1), and the bits of its last segment above the low r bits are denoted carry; from the value range of the reduced number, carry is 0 or 1. Using multiplication, addition and shift operations to reduce the part exceeding k bits, so that the result is a k-bit fixed-point number, comprises: first, let [formula not reproduced], where mask_r = 2^r - 1; then, for t ∈ [0, N-2], let [formula not reproduced]. After the carry reduction, the value range is as follows: when carry is 0, [formula not reproduced]; when carry is 1, [formula not reproduced]. Since σ is a small prime number and digit is much smaller than k, the value range after the carry reduction can be unified as [0, 2^k - 1].

Further, if the k-bit fixed-point number is less than the prime number p, it is the result of multiplying the large integer A by the large integer B modulo p; if it is greater than the prime number p, the value obtained by subtracting p from it is the result of multiplying the large integer A by the large integer B modulo p.
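The reduction steps rest on the identity 2^k ≡ σ (mod p) for p = 2^k - σ. The per-segment formulas are not legible in this text, so the following Python sketch illustrates the same idea on whole integers rather than on segments: bits above position k are multiplied by σ and folded back into the low part, and a final conditional subtraction brings the result into [0, p). The function name is hypothetical, and the comparison uses "greater than or equal to" so that a value equal to p maps to 0.

```python
def reduce_mod_p(x: int, k: int, sigma: int) -> int:
    """Reduce x modulo p = 2^k - sigma by folding, using 2^k = sigma (mod p)."""
    p = (1 << k) - sigma
    mask = (1 << k) - 1
    while x >> k:                              # some bits lie above position k
        x = (x & mask) + sigma * (x >> k)      # fold the high part back in
    return x - p if x >= p else x              # final conditional subtraction

k, sigma = 221, 3
p = (1 << k) - sigma
a, b = p - 1, p - 2
print(reduce_mod_p(a * b, k, sigma) == (a * b) % p)
```

Each fold strictly decreases x by a multiple of p, so the loop terminates with a value below 2^k, after which at most one subtraction of p is needed.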

Compared with the prior art, the invention has the following positive effects:

When computing large integer modular multiplication in a prime number domain, the invention first splits the multiplicand and the multiplier and converts them into several double-precision floating-point values, making full use of the fraction part of the double-precision mantissa during the conversion. The method implements large integer modular multiplication in a prime number domain with floating-point instructions; it is novel in conception and computationally efficient, makes maximal use of the double-precision floating-point storage format of the computer, and improves the speed of large integer modular multiplication.

Drawings

FIG. 1 is a flow chart of a method for accelerating the large integer modular multiplication calculation in the prime number domain by using floating point number calculation instructions.

Detailed Description

The technical scheme of the present invention will be described in detail, but the scope of the present invention is not limited to the embodiments.

For a given prime number domain F_p with p = 2²²¹ - 3, let A and B be large integers on F_p. To compute A multiplied by B modulo p, the method for accelerating prime number domain large integer modular multiplication using floating-point instructions mainly comprises the following steps:

1) Dividing the multiplicand A and the multiplier B, each 221 bits long, into N segments, where N = 5; each of the first 4 segments is 45 bits long, and the 5th segment is 41 bits long;

2) After the multiplicand and the multiplier are segmented, A[0:4] denotes the 5 segments of the multiplicand A, numbered 0 to 4, and B[0:4] denotes the 5 segments of the multiplier B, numbered 0 to 4. Each segment of A[0:4] is converted to double-precision floating-point form, denoted A'[0:4], and each segment of B[0:4] is converted to double-precision floating-point form, denoted B'[0:4].

3) Following the segment-scanning order of the large integer multiplication Σ_{i,j} A'[i]·B'[j], i, j ∈ [0, 4], first computing the fused multiply-add result M_ij[0] of segment A'[i] of the multiplicand A', segment B'[j] of the multiplier B' and the addend C0, where C0 = 2⁹⁷; then computing the fused multiply-add result M_ij[1] of segment A'[i] of the multiplicand A', segment B'[j] of the multiplier B' and the addend C1, where C1 = 2⁹⁷ + 2⁵² - M_ij[0].

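4) Initializing a fixed-point number R, dividing it into 2N segments, denoted R[0:2N-1]; R[0:2N-1] is initialized according to the general initialization formula with N = 5 and w = 45: when t ∈ [0, 4], R[t] = -[(t × (0x433 + 45) + (t + 1) × 0x433) & 0xFFF] << 52; when t ∈ [5, 9], R[t] = -[((t + 1) × (0x433 + 45) + t × 0x433) & 0xFFF] << 52.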

5) Letting conv_2_bin(x) denote the binary form of x, accumulating conv_2_bin(M_ij[0]) into the fixed-point number R[i+j+1] and conv_2_bin(M_ij[1]) into R[i+j].

6) Setting the segment length of the first 9 segments of R[0:9] to 45 bits as follows:

R_{t+1} = R_{t+1} + (R_t >> 45), t ∈ [0, 8]

7) Reducing the 10-segment fixed-point number R to a 5-segment fixed-point number by multiplication and addition operations. The calculation is as follows: [formula not reproduced]

After the reduction, the value range of the 5-segment fixed-point number is [0, 2²²¹ + 3·2²³).

8) The N segments of the reduced number are numbered 0 to (N-1), and the upper 23 bits of its last segment are denoted carry; from the value range in step 7), carry is 0 or 1. The N-segment fixed-point number is reduced to 221 bits using multiplication, addition and shift operations: first, let [formula not reproduced]; then, for t ∈ [0, 3], let [formula not reproduced]. After the carry reduction, [formula not reproduced].

9) Judging whether the 221-bit fixed-point number is less than the prime number p; if it is, it is the result of multiplying the large integer A by the large integer B modulo p; if it is greater than the prime number p, the value obtained by subtracting p from it is the result of multiplying the large integer A by the large integer B modulo p.
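The steps above can be put together in a short Python sketch for p = 2²²¹ - 3. It is an illustration under several assumptions, not the patented implementation: `fma_rz` emulates a round-toward-zero fused multiply-add with exact integers; the fraction field is masked out directly instead of using the bias-cancelling initialization of step 4); and the per-segment forms of steps 7) and 8) are reconstructed from 2²²¹ ≡ 3 (mod p), so the intermediate carry may exceed 1 here.

```python
import math
import struct

K, SIGMA, N, W, R_LAST = 221, 3, 5, 45, 41     # first row of Table 1
P = (1 << K) - SIGMA
MASK_W = (1 << W) - 1
FRAC = (1 << 52) - 1

def fma_rz(a: float, b: float, c: float) -> float:
    """Emulated a*b + c with one rounding toward zero (integer-valued inputs only)."""
    exact = int(a) * int(b) + int(c)
    sign, mag = (-1, -exact) if exact < 0 else (1, exact)
    extra = max(mag.bit_length() - 53, 0)
    return float(sign * ((mag >> extra) << extra))

def to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def propagate(S):
    """Push bits above W from each segment into the next (last segment excepted)."""
    for t in range(len(S) - 1):
        S[t + 1] += S[t] >> W
        S[t] &= MASK_W

def modmul(a: int, b: int) -> int:
    A = [float((a >> (t * W)) & MASK_W) for t in range(N)]    # steps 1) and 2)
    B = [float((b >> (t * W)) & MASK_W) for t in range(N)]
    R = [0] * (2 * N)
    c0 = math.ldexp(1.0, 52 + W)                              # C0 = 2^97
    for i in range(N):                                        # steps 3) to 5)
        for j in range(N):
            hi = fma_rz(A[i], B[j], c0)
            lo = fma_rz(A[i], B[j], (c0 + math.ldexp(1.0, 52)) - hi)
            R[i + j + 1] += to_bits(hi) & FRAC                # high part of A[i]*B[j]
            R[i + j] += to_bits(lo) & FRAC                    # low W bits of A[i]*B[j]
    propagate(R)                                              # step 6)
    fold = SIGMA << (W - R_LAST)         # 2^(N*W) = 2^(K + W - R_LAST) = fold (mod P)
    S = [R[t] + fold * R[t + N] for t in range(N)]            # step 7), assumed form
    propagate(S)
    carry = S[N - 1] >> R_LAST                                # step 8), assumed form
    S[N - 1] &= (1 << R_LAST) - 1
    S[0] += carry * SIGMA
    propagate(S)
    x = sum(s << (t * W) for t, s in enumerate(S))
    return x - P if x >= P else x                             # step 9)

a, b = P - 1, P - 2
print(modmul(a, b) == (a * b) % P)
```

The sketch keeps intermediate values in Python integers, so it does not model the 64-bit word limits that the segment lengths in Table 1 are chosen to respect.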

Finally, the relevant parameters of the method provided by the invention were calculated for 7 prime number domains commonly used in cryptography, giving Table 1 below.

Table 1. Segment count and segment length selection for commonly used prime number domains

p	k	σ	N	w	r
2²²¹-3	221	3	5	45	41
2²²²-117	222	117	5	45	42
2²⁵¹-9	251	9	5	51	47
2²⁵⁵-19	255	19	5	51	51
2³⁸²-105	382	105	8	48	46
2³⁸³-187	383	187	8	48	47
2⁴¹⁴-17	414	17	8	52	50

Based on the same inventive concept, another embodiment of the present invention provides an asymmetric cryptographic method, which comprises prime number domain large integer modular multiplication, wherein the prime number domain large integer modular multiplication is computed by the method of the present invention.

Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smart phone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the inventive method.

Based on the same inventive concept, another embodiment of the present invention provides a computer readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program which, when executed by a computer, implements the steps of the inventive method.

The above-disclosed embodiments of the present invention are intended to aid in understanding the contents of the present invention and to enable the same to be carried into practice, and it will be understood by those of ordinary skill in the art that various alternatives, variations and modifications are possible without departing from the spirit and scope of the invention. The invention should not be limited to what has been disclosed in the examples of the specification, but rather by the scope of the invention as defined in the claims.

Claims

1. An asymmetric cryptographic method comprising prime number domain large integer modular multiplication, said prime number domain large integer modular multiplication being computed by:

1) A and B are large integers defined on a prime number domain F_p, p = 2^k - σ, and σ is a prime number less than 2^w; dividing the multiplicand A and the multiplier B, each k bits long, into N segments, wherein each of the first (N-1) segments is w bits long, the N-th segment is r bits long, and w ≥ r;

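2) Converting each segment of the multiplicand A and the multiplier B into a double-precision floating-point number; multiplying and adding the converted segments of the multiplicand A and the multiplier B using fused multiply-add operations, and converting the operation results into a fixed-point number R;

3) Dividing the fixed-point number R into 2N segments, and setting the segment length of the first (2N-1) segments of R to w bits while keeping the value of R unchanged; reducing R to an N-segment fixed-point number using multiplication and addition operations; then, using multiplication, addition and shift operations, reducing the part of the N-segment fixed-point number that exceeds k bits, so that it becomes a k-bit fixed-point number;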

4) Judging whether the k-bit fixed-point number is an integer in the selected prime number domain; if it is, it is the modular multiplication result of the large integer A and the large integer B; if it is not, the value obtained by subtracting p from it is the modular multiplication result of the large integer A and the large integer B;

wherein the number of segments of the multiplicand A and the multiplier B is N = ⌈k/52⌉, where 52 is the mantissa (fraction) length of a double-precision floating-point number; the bit length w of each of the first (N-1) segments and the bit length r of the N-th segment of the multiplicand A and the multiplier B satisfy (N-1) × w + r = k, and w - r is made as small as possible subject to 52 ≥ w ≥ r.

2. The method of claim 1, wherein, after the multiplicand and the multiplier are segmented, A[0:N-1] denotes the N segments of the multiplicand A, numbered 0 to (N-1), and A'[0:N-1] is the floating-point form of A[0:N-1]; B[0:N-1] denotes the N segments of the multiplier B, numbered 0 to (N-1), and B'[0:N-1] is the floating-point form of B[0:N-1].

3. The method of claim 2, wherein multiplying and adding the converted multiplicand A and multiplier B using a fused multiply-add operation comprises: first, initializing the fixed-point number R, dividing it into 2N segments, denoted R[0:2N-1]; second, following the segment-scanning order of the large integer multiplication Σ_{i,j} A'[i]·B'[j], computing the fused multiply-add result M_ij[0] of segment A'[i] of the multiplicand A', segment B'[j] of the multiplier B' and the addend C0, and then computing the fused multiply-add result M_ij[1] of segment A'[i], segment B'[j] and the addend C1, where 0 ≤ i, j ≤ N-1; letting conv_2_bin(x) denote the binary form of x, accumulating conv_2_bin(M_ij[0]) into the fixed-point number R[i+j+1] and conv_2_bin(M_ij[1]) into R[i+j].

4. The method of claim 3, wherein initializing the fixed-point number R comprises: when t ∈ [0, N-1], R[t] = -[(t × (0x433 + w) + (t + 1) × 0x433) & 0xFFF] << 52; when t ∈ [N, 2N-1], R[t] = -[((t + 1) × (0x433 + w) + t × 0x433) & 0xFFF] << 52.

5. The method of claim 3, wherein the addend C0 has a value of 2^(52+w) and the addend C1 has a value of 2^(52+w) + 2⁵² - M_ij[0].

6. The method according to claim 1 or 5, wherein the segment length of the first (2N-1) segments of R is set to w bits as follows: R_{t+1} = R_{t+1} + (R_t >> w), t ∈ [0, 2N-2], wherein R_t denotes the (t+1)-th segment of R, and R_{t+1} denotes the (t+2)-th segment of R.

7. The method of claim 6, wherein reducing R to the N-segment fixed-point number using a multiplication operation and an addition operation comprises: [formula not reproduced]. After the reduction, the value range of the N-segment fixed-point number is [0, 2^k + σ·2^(digit-r)), wherein digit is the bit length of a double-precision floating-point number; because A and B are large integers, the bit length k is far greater than the bit length of a double-precision floating-point number, so [formula not reproduced].

8. The method of claim 7, wherein the N segments of the reduced N-segment fixed-point number are numbered 0 to (N-1), and the bits of its last segment above the low r bits are denoted carry; from the value range of the reduced number, carry is 0 or 1; using multiplication, addition and shift operations to reduce the part exceeding k bits, so that the result is a k-bit fixed-point number, comprises: first, let [formula not reproduced], wherein mask_r = 2^r - 1; then, for t ∈ [0, N-2], let [formula not reproduced]. After the carry reduction, the value range is as follows: when carry is 0, [formula not reproduced]; when carry is 1, [formula not reproduced]. Since σ is a small prime number and digit is much smaller than k, the value range after the carry reduction can be unified as [0, 2^k - 1].

9. The method of claim 8, wherein if the k-bit fixed-point number is less than the prime number p, it is the result of multiplying the large integer A by the large integer B modulo p; if it is greater than the prime number p, the value obtained by subtracting p from it is the result of multiplying the large integer A by the large integer B modulo p.