CN114594925A - Efficient modular multiplication circuit suitable for SM2 encryption operation and operation method thereof - Google Patents


Info

Publication number
CN114594925A
Authority
CN
China
Legal status
Pending
Application number
CN202210265484.7A
Other languages
Chinese (zh)
Inventor
沈展
陈付龙
谢冬
Current Assignee
Anhui Normal University
Original Assignee
Anhui Normal University


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/60 Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
    • G06F7/72 Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
    • G06F7/722 Modular multiplication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/71 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information
    • G06F21/72 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information in cryptographic circuits
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention discloses an efficient modular multiplication circuit suitable for SM2 encryption operations. Using the divide-and-conquer idea of the Karatsuba algorithm, it extends the algorithm to a second level of iteration, performs the large-number multiplication partially in parallel, and carries out large-number modular multiplication over the prime field $P_{256}$ recommended in the Chinese national cryptographic standards. The algorithm first obtains the multiplication result in 3 clock cycles and then exploits the special form of $P_{256}$ to perform the reduction. In the computation, the divide-and-conquer method is applied once, after which three 64-bit Karatsuba multipliers operate in parallel to obtain three partial products (each partial product is computed with an improved Karatsuba algorithm); the three parts are accumulated and then reduced, saving both time and resources. Comparative experiments show that on a 100 MHz Artix-7 development board, one modular multiplication consumes only 13.45k LUTs and completes within 0.04 µs, so both resource consumption and execution time are optimized.

Description

Efficient modular multiplication circuit suitable for SM2 encryption operation and operation method thereof
Technical Field
The invention belongs to the technical field of circuit operation, and particularly relates to an efficient modular multiplication circuit suitable for SM2 encryption operation and an operation method thereof.
Background
Elliptic Curve Cryptography (ECC) and RSA are two widely used and strong public-key cryptographic algorithms. At the same security level, however, ECC uses shorter keys than RSA: 256-bit ECC over a prime field offers the same security level as 3072-bit RSA. In addition, elliptic curve cryptosystems consume fewer hardware resources. Modular multiplication is the most time-consuming operation in elliptic curve encryption, so its speed is the bottleneck of the whole encryption process, and accelerating modular multiplication is the key to speeding up elliptic curve encryption.
The SM2 encryption algorithm is a cryptographic algorithm with independent intellectual property rights and is of great significance for improving China's information security. Fig. 1 shows the four levels of elliptic curve cryptography. Since SM2 is based on elliptic curve cryptography, the lowest level, modular arithmetic, is the foundation of the entire encryption operation. Of the four modular operations, modular multiplication and modular inversion take far more time than the other two, and modular multiplication is invoked far more often than modular inversion. Completing modular multiplication more efficiently is therefore the core of improving the speed of SM2.
In the encryption process, large-number modular multiplication ($A \times B \bmod P$) is the lowest-level operation with the heaviest time and resource cost. Many algorithms therefore first compute the product ($A \times B = C$) and then reduce it ($C \bmod P$). A common approach processes the binary operand one bit per clock cycle, with one comparison per bit, so a 256-bit operation takes 256 cycles; this scheme is the slowest. Radix-8 interleaved modular multiplication has been proposed to reduce the cycle count to 32, but even then the speed-up is unsatisfactory, because a single point addition in SM2 requires at least 9 large-number modular multiplications.
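To make the bit-serial baseline concrete, the following Python sketch models radix-2 interleaved modular multiplication, one common way to process one operand bit per clock cycle. It is an assumption that this is the scheme the text has in mind, and the name `interleaved_modmul` is hypothetical; each loop iteration stands in for one clock cycle.

```python
# Bit-serial (radix-2) interleaved modular multiplication: a behavioral sketch.
# Each loop iteration stands in for one clock cycle of a bit-serial design.

P256 = 2**256 - 2**224 - 2**96 + 2**64 - 1  # SM2 prime, as given in the text


def interleaved_modmul(a: int, b: int, p: int = P256, bits: int = 256):
    """Return (a * b mod p, iterations). Assumes 0 <= a, b < p < 2**bits."""
    r = 0
    iterations = 0
    for i in reversed(range(bits)):  # scan multiplier bits, MSB first
        r <<= 1                      # r = 2r
        if (a >> i) & 1:
            r += b                   # add the multiplicand when the bit is set
        # r < 3p at this point, so at most two conditional subtractions
        if r >= p:
            r -= p
        if r >= p:
            r -= p
        iterations += 1
    return r, iterations


if __name__ == "__main__":
    x, y = 12345678901234567890, 98765432109876543210
    r, n = interleaved_modmul(x, y)
    print(r == x * y % P256, n)  # True 256
```

The iteration count equals the operand width, which is why such designs need 256 cycles for 256-bit operands.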
To address this, it has been proposed to exploit the special form of the prime $P_{256}$ used in SM2 to speed up the reduction. Reduction by division is replaced with modular additions and subtractions, which lowers the algorithmic complexity; since additions and subtractions are much faster than multiplications, the reduction can be completed entirely within one clock cycle. How to obtain $C_{15}, \dots, C_0$ quickly (the product $C$ is divided into sixteen 32-bit words $C_{15}$ to $C_0$) is another key problem of modular multiplication design. Because multiplication costs far more resources and time than addition and subtraction, the large-number multiplication itself must also be designed carefully. The existing Karatsuba-Ofman approach based on a single expansion requires a 129-bit multiplier, and since the resource cost of a multiplier grows rapidly with its operand width, that scheme consumes too many resources.
Disclosure of Invention
The invention provides an efficient modular multiplication circuit suitable for SM2 encryption operation, and aims to balance resource consumption and time consumption.
The invention is implemented as follows. The efficient modular multiplication circuit suitable for SM2 encryption operations comprises:
eight 3-to-1 multiplexers, MUX1 to MUX8; two 128-bit subtractors, Sub1 and Sub2; two XOR gates, XOR gate 1 and XOR gate 2; three 64-bit multipliers, MULT1 to MULT3; two 64-bit subtractors, SUB1 and SUB2; three expanders, EXT1 to EXT3; one 128-bit adder, ADD1; three 512-bit adders, ADD2 to ADD4; one 256-bit adder, ADD5; one 2-to-1 multiplexer, MUX; one 512-bit register, R512; a 128-bit adder-subtractor (adder-subtractor 1); a 512-bit adder-subtractor (adder-subtractor 2); a shifter; and a modular reduction unit.
Input terminals 1, 2 and 3 of MUX1 receive $A_3A_2$, $A_7A_6$ and $a_3a_2$ respectively; those of MUX2 receive $B_3B_2$, $B_7B_6$ and $b_3b_2$; those of MUX3 receive $A_1A_0$, $A_5A_4$ and $a_1a_0$; those of MUX4 receive $B_1B_0$, $B_5B_4$ and $b_1b_0$; those of MUX5 receive $A_1A_0$, $A_5A_4$ and $a_1a_0$; those of MUX6 receive $A_3A_2$, $A_7A_6$ and $a_3a_2$; those of MUX7 receive $B_3B_2$, $B_7B_6$ and $b_3b_2$; and those of MUX8 receive $B_1B_0$, $B_5B_4$ and $b_1b_0$.
Input terminal 1 of subtractor Sub1 receives $A_3A_2A_1A_0$ and input terminal 2 receives $A_7A_6A_5A_4$; output 1 is connected to input terminal 3 of MUX5 and MUX3, output 2 to input terminal 3 of MUX6 and MUX1, and output 3 to XOR gate 1. Input terminal 1 of subtractor Sub2 receives $B_3B_2B_1B_0$ and input terminal 2 receives $B_7B_6B_5B_4$; output 1 is connected to input terminal 3 of MUX7 and MUX2, output 2 to input terminal 3 of MUX8 and MUX4, and output 3 to XOR gate 1. The output of XOR gate 1 is connected to adder-subtractor 2;
the outputs of MUX1 and MUX2 are connected to MULT1, those of MUX3 and MUX4 to MULT2, those of MUX5 and MUX6 to SUB1, and those of MUX7 and MUX8 to SUB2; output 1 of SUB1 and of SUB2 is connected to MULT3, and output 2 of SUB1 and of SUB2 is connected to XOR gate 2;
output 1 of MULT1 is connected to EXT1 and output 2 to adder ADD1; output 1 of MULT2 is connected to EXT2 and output 2 to ADD1; output 1 of EXT1 and of EXT2 is connected to adder ADD2; the outputs of ADD1, MULT3 and XOR gate 2 are connected to adder-subtractor 1, whose output is connected to EXT3; the outputs of EXT3 and ADD2 are connected to ADD3; the outputs of ADD3 and MUX are connected to adder-subtractor 2, whose output 1 is connected to register R512 and output 2 to ADD4; output 1 of R512 is connected to the modular reduction unit, output 2 to ADD4, output 3 to MUX, and output 4 to ADD4 through adder ADD5 and the shifter; and the output of ADD4 is connected to MUX;
where A and B are the 256-bit multiplier and multiplicand, $A = A_7A_6A_5A_4A_3A_2A_1A_0$ and $B = B_7B_6B_5B_4B_3B_2B_1B_0$, each $A_i$ and $B_i$ ($0 \le i \le 7$) is a 32-bit word, $a_3a_2a_1a_0 = A_3A_2A_1A_0 - A_7A_6A_5A_4$, and $b_3b_2b_1b_0 = B_7B_6B_5B_4 - B_3B_2B_1B_0$.
The operation method based on this efficient modular multiplication circuit suitable for SM2 encryption operations comprises the following steps:
S1: in the initialization stage, the registers are reset;
S2: the control signals of the eight 3-to-1 multiplexers MUX1 to MUX8 are all 0, so each outputs the data on its input terminal 1; the control signals of the three expanders EXT1 to EXT3 are all 0, so they extend their inputs to 512 bits shifted left by 128, 0 and 64 bits respectively; the control signal of the 2-to-1 multiplexer MUX is 1, so it outputs the data from register R512, and the result is accumulated into R512;
S3: the control signals of MUX1 to MUX8 are all 1, so each outputs the data on its input terminal 2; the control signals of EXT1 to EXT3 are all 1, so they shift their inputs left by 384, 256 and 320 bits respectively; the control signal of MUX is 1, so it outputs the data from register R512, and the result is accumulated into R512;
S4: the control signals of MUX1 to MUX8 are all 2, so each outputs the data on its input terminal 3; the control signals of EXT1 to EXT3 are all 2, so they shift their inputs left by 256, 128 and 192 bits respectively; the control signal of MUX is 0, so it outputs the data from adder ADD4;
S5: in the MOD state, the modular reduction unit reduces the multiplication result in one clock cycle.
The invention uses the divide-and-conquer idea of the Karatsuba algorithm, extends the algorithm to a second level of iteration, performs the large-number multiplication partially in parallel, and carries out large-number modular multiplication over the prime field $P_{256}$ recommended in the Chinese national cryptographic standards. The algorithm obtains the multiplication result in 3 clock cycles and then exploits the special form of $P_{256}$ to perform the reduction. In the computation, the divide-and-conquer method is applied once, after which three 64-bit Karatsuba multipliers operate in parallel to obtain three partial products (each computed with an improved Karatsuba algorithm); the three parts are accumulated and then reduced, saving both time and resources.
Drawings
Fig. 1 is a schematic diagram of the architecture levels of elliptic curve cryptography according to an embodiment of the present invention;
Fig. 2 is a circuit diagram of a 64-bit Karatsuba multiplier according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the calculation process of A × B according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an efficient modular multiplication circuit suitable for SM2 encryption operations according to an embodiment of the present invention;
Fig. 5 is a state transition diagram of a controller according to an embodiment of the present invention;
Fig. 6 is a diagram of the internal structure of the modular reduction module according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below to give those skilled in the art a more complete, accurate and thorough understanding of the inventive concept and technical solutions of the present invention.
(I) Multiplication of large numbers
The Karatsuba algorithm is an efficient algorithm for large-integer multiplication. Based on the divide-and-conquer idea, it splits the multiplication of the multiplier and multiplicand into several smaller partial products, reducing the number of multiplications from 4 to 3.
For example, two $2W$-bit large integers A and B are represented as follows:
$$A = A_1 \cdot 2^{W} + A_0$$
$$B = B_1 \cdot 2^{W} + B_0$$
The schoolbook multiplication of A and B is:
$$A \times B = A_1B_1 \cdot 2^{2W} + (A_1B_0 + A_0B_1) \cdot 2^{W} + A_0B_0 \tag{1}$$
As equation (1) shows, obtaining $A \times B$ requires four multiplications, $A_1B_1$, $A_1B_0$, $A_0B_1$ and $A_0B_0$, and $A_1B_0 + A_0B_1$ can be rewritten as equation (2):
$$A_1B_0 + A_0B_1 = (A_0 + A_1)(B_0 + B_1) - A_1B_1 - A_0B_0 \tag{2}$$
This is the basic idea of the Karatsuba algorithm: by using additions and subtractions, the four multiplications in equation (1) are reduced to three. More generally, the n-split Karatsuba algorithm saves $n(n-1)/2$ multiplications, reducing their number from $n^2$ to $(n^2 + n)/2$.
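A minimal recursive Python sketch of equations (1) and (2) that also counts base-size multiplications. The function names and the 32-bit leaf size are illustrative assumptions. Applied recursively to a 256-bit product with 32-bit leaves, the three-multiplication step uses $3^3 = 27$ leaf multiplications, whereas the n-split formula above gives 36 for $n = 8$.

```python
# Recursive Karatsuba following equations (1) and (2), counting how many
# base-size (up to 32-bit, plus carry bits) multiplications are used.


def karatsuba(a: int, b: int, bits: int, base: int = 32):
    """Return (a * b, number_of_base_multiplications) for non-negative a, b."""
    if bits <= base:
        return a * b, 1                            # leaf: one hardware multiply
    w = bits // 2
    mask = (1 << w) - 1
    a1, a0 = a >> w, a & mask                      # A = A1 * 2^w + A0
    b1, b0 = b >> w, b & mask                      # B = B1 * 2^w + B0
    hi, n_hi = karatsuba(a1, b1, w, base)          # A1 * B1
    lo, n_lo = karatsuba(a0, b0, w, base)          # A0 * B0
    s, n_s = karatsuba(a0 + a1, b0 + b1, w, base)  # (A0 + A1)(B0 + B1)
    mid = s - hi - lo                              # equation (2)
    return (hi << (2 * w)) + (mid << w) + lo, n_hi + n_lo + n_s


def nsplit_count(n: int) -> int:
    """Multiplications of the n-split Karatsuba scheme: n^2 - n(n-1)/2."""
    return n * n - n * (n - 1) // 2


x = 0x1F2E3D4C5B6A79880123456789ABCDEF0FEDCBA987654321AABBCCDDEEFF0011
y = 0x0011223344556677FFEEDDCCBBAA99887766554433221100DEADBEEFCAFEBABE
prod, count = karatsuba(x, y, 256)
print(prod == x * y, count, nsplit_count(8))  # True 27 36
```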
In software, $A_0 + A_1$ in equation (2) may overflow, and likewise $B_0 + B_1$, so equation (2) is modified as follows:
$$A_1B_0 + A_0B_1 = (A_0 - A_1)(B_1 - B_0) + A_1B_1 + A_0B_0 \tag{3}$$
Implementing equation (3) requires handling the sign of $(A_0 - A_1)$ and the sign of $(B_1 - B_0)$, and therefore the sign of the product $(A_0 - A_1)(B_1 - B_0)$. If this scheme were also used at this level of the circuit, additional devices such as subtractors, comparators and multiplexers would be needed, increasing resource consumption. Therefore, for the multiplication with $W = 32$, equation (2) can be implemented with two 32-bit multipliers and one 33-bit multiplier to obtain the product of two 64-bit numbers; the advantage is that with equation (2) no sign handling is needed. The logical structure of the 64-bit Karatsuba multiplier is shown in Fig. 2.
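The following Python sketch is a behavioral model, at the level of operand widths, of a 64-bit multiplier built as described: two 32 x 32-bit products and one 33 x 33-bit product combined with equation (2). The name `karatsuba64` is hypothetical and this is not a transcription of Fig. 2; the assertions document the operand widths.

```python
# Behavioral model of a 64 x 64-bit Karatsuba multiplier built from two
# 32 x 32-bit multipliers and one 33 x 33-bit multiplier, using equation (2).

MASK32 = (1 << 32) - 1


def karatsuba64(a: int, b: int) -> int:
    """Return the 128-bit product of two unsigned 64-bit operands."""
    assert 0 <= a < 1 << 64 and 0 <= b < 1 << 64
    a1, a0 = a >> 32, a & MASK32
    b1, b0 = b >> 32, b & MASK32
    p_hi = a1 * b1                     # 32 x 32-bit multiplier
    p_lo = a0 * b0                     # 32 x 32-bit multiplier
    s_a, s_b = a0 + a1, b0 + b1        # sums need 33 bits and are never negative
    assert s_a < 1 << 33 and s_b < 1 << 33
    p_mid = s_a * s_b                  # 33 x 33-bit multiplier
    cross = p_mid - p_hi - p_lo        # = a1*b0 + a0*b1 >= 0, so no sign logic
    return (p_hi << 64) + (cross << 32) + p_lo


print(karatsuba64(2**64 - 1, 2**64 - 1) == (2**64 - 1) ** 2)  # True
```

Because every intermediate value is non-negative, the only cost of avoiding sign logic is the one extra bit on the middle multiplier.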
The Karatsuba algorithm is a divide-and-conquer method for large-number multiplication, and exploiting the parallelism of hardware circuits makes divide and conquer even more efficient. Let two 256-bit large integers A and B be represented as follows:
$$A[255:0] = A_7A_6A_5A_4A_3A_2A_1A_0$$
$$B[255:0] = B_7B_6B_5B_4B_3B_2B_1B_0$$
where each $A_i$ and $B_i$ ($0 \le i \le 7$) is a 32-bit word. The calculation of $C = A \times B$ is shown in Fig. 3.
For Part1 in Fig. 3, the two 128-bit numbers $A_3A_2A_1A_0$ and $B_3B_2B_1B_0$ are multiplied. One Karatsuba expansion can be applied, giving:
$$\mathrm{Part1} = A_3A_2 \cdot B_3B_2 \cdot 2^{128} + \left[(A_1A_0 - A_3A_2)(B_3B_2 - B_1B_0) + A_1A_0 \cdot B_1B_0 + A_3A_2 \cdot B_3B_2\right] \cdot 2^{64} + A_1A_0 \cdot B_1B_0$$
This expansion uses three 64-bit Karatsuba multipliers of the kind shown in Fig. 2 to multiply the two 128-bit numbers. So that the same basic 64-bit Karatsuba multiplier can be reused, the expansion of Part1 uses the modified form of equation (3) (neither $A_1A_0 - A_3A_2$ nor $B_3B_2 - B_1B_0$ exceeds 64 bits in magnitude).
Similarly, for Part3 in Fig. 3, the two 128-bit numbers $A_7A_6A_5A_4$ and $B_7B_6B_5B_4$ are multiplied using the same expansion:
$$\mathrm{Part3} = A_7A_6 \cdot B_7B_6 \cdot 2^{128} + \left[(A_5A_4 - A_7A_6)(B_7B_6 - B_5B_4) + A_7A_6 \cdot B_7B_6 + A_5A_4 \cdot B_5B_4\right] \cdot 2^{64} + A_5A_4 \cdot B_5B_4$$
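To check the Part1 and Part3 expansions, the Python sketch below performs one 128-bit Karatsuba step in the form of equation (3), treating each difference as a magnitude plus a sign and combining the signs with an XOR, in the spirit of the circuit's subtractors and XOR gates. The name `mul128_eq3` is hypothetical; every multiplication is unsigned and at most 64 x 64 bits.

```python
# One Karatsuba level on 128-bit operands in the form of equation (3):
#   x1*y0 + x0*y1 = (x0 - x1)(y1 - y0) + x1*y1 + x0*y0
# Differences are kept as magnitude + sign, so all three multiplications
# are unsigned 64 x 64-bit products.

MASK64 = (1 << 64) - 1


def mul128_eq3(x: int, y: int) -> int:
    x1, x0 = x >> 64, x & MASK64        # e.g. A3A2 and A1A0
    y1, y0 = y >> 64, y & MASK64        # e.g. B3B2 and B1B0
    hi = x1 * y1                        # first 64-bit multiplication
    lo = x0 * y0                        # second 64-bit multiplication
    dx, dy = x0 - x1, y1 - y0           # subtractor outputs (signed)
    negative = (dx < 0) != (dy < 0)     # XOR of the two sign bits
    m = abs(dx) * abs(dy)               # third 64-bit multiplication
    mid = hi + lo - m if negative else hi + lo + m
    return (hi << 128) + (mid << 64) + lo


x_val = 0x00000001FFFFFFFF_0000000000000005   # high half > low half
y_val = 0xFFFFFFFF00000000_0123456789ABCDEF
print(mul128_eq3(x_val, y_val) == x_val * y_val)  # True
```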
For Part2 in Fig. 3, one Karatsuba step at the 256-bit level first gives:
$$\mathrm{Part2} = (A_3A_2A_1A_0 - A_7A_6A_5A_4)(B_7B_6B_5B_4 - B_3B_2B_1B_0) + \mathrm{Part1} + \mathrm{Part3}$$
Let $a = a_3a_2a_1a_0 = A_3A_2A_1A_0 - A_7A_6A_5A_4$ and $b = b_3b_2b_1b_0 = B_7B_6B_5B_4 - B_3B_2B_1B_0$.
The signs of a and b are determined here to obtain the sign of the product $a \times b$. Substituting a and b into Part2 gives the expression Part2':
$$\mathrm{Part2'} = a_3a_2a_1a_0 \cdot b_3b_2b_1b_0 + \mathrm{Part1} + \mathrm{Part3}$$
The $a_3a_2a_1a_0 \cdot b_3b_2b_1b_0$ term in Part2' is expanded with a second-level Karatsuba step:
$$\mathrm{Part2'} = a_3a_2 \cdot b_3b_2 \cdot 2^{128} + \left[(a_1a_0 - a_3a_2)(b_3b_2 - b_1b_0) + a_1a_0 \cdot b_1b_0 + a_3a_2 \cdot b_3b_2\right] \cdot 2^{64} + a_1a_0 \cdot b_1b_0 + \mathrm{Part1} + \mathrm{Part3}$$
Merging Part1, Part2' and Part3 gives the final result C:
$$C = A \times B = \mathrm{Part3} \cdot 2^{256} + \mathrm{Part2'} \cdot 2^{128} + \mathrm{Part1}$$
The 64-bit multiplications use equation (2), i.e. the structure shown in Fig. 2, which is the simplest implementation of the basic arithmetic unit; the three 128-bit operations Part1, Part2' and Part3 use the modified form of equation (3), which allows the 64-bit Karatsuba multiplier to be reused. Combining the two equations in this way optimizes resource consumption.
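The top-level split into Part1, Part2' and Part3 can be checked with the Python sketch below. For brevity, the 128-bit products use Python's built-in multiplication as a stand-in for the 64-bit Karatsuba steps, and the sign of $a \times b$ is taken as the XOR of the signs of a and b. The function name `mul256_parts` is an assumption.

```python
# Top-level split of a 256 x 256-bit product into Part1, Part3 and Part2',
# with the sign of a*b taken as the XOR of the signs of a and b.

MASK128 = (1 << 128) - 1


def mul256_parts(A: int, B: int) -> int:
    A_hi, A_lo = A >> 128, A & MASK128   # A7A6A5A4, A3A2A1A0
    B_hi, B_lo = B >> 128, B & MASK128   # B7B6B5B4, B3B2B1B0
    part1 = A_lo * B_lo                  # stand-in for the 128-bit expansion
    part3 = A_hi * B_hi
    a = A_lo - A_hi                      # a = A3A2A1A0 - A7A6A5A4
    b = B_hi - B_lo                      # b = B7B6B5B4 - B3B2B1B0
    negative = (a < 0) != (b < 0)
    ab = abs(a) * abs(b)                 # unsigned |a| * |b|
    part2 = part1 + part3 - ab if negative else part1 + part3 + ab  # Part2'
    return (part3 << 256) + (part2 << 128) + part1   # C = A * B


A = (0x1234 << 200) | 0xABCDEF
B = (1 << 255) | 0x42
print(mul256_parts(A, B) == A * B)  # True
```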
(II) Modular reduction
The special form of the prime $P_{256}$ used in SM2 encryption is exploited to speed up the reduction. $P_{256}$ can be written as a sum and difference of powers of 2, $P_{256} = 2^{256} - 2^{224} - 2^{96} + 2^{64} - 1$, which is a fast-reduction form. From it, the residues of the higher powers of 2 are derived as follows:
$$2^{256} \equiv 2^{224} + 2^{96} - 2^{64} + 1 \pmod{P_{256}}$$
$$2^{288} \equiv 2^{256} + 2^{128} - 2^{96} + 2^{32} \equiv 2^{224} + 2^{128} - 2^{64} + 2^{32} + 1 \pmod{P_{256}}$$
$$2^{320} \equiv 2^{256} + 2^{160} - 2^{96} + 2^{64} + 2^{32} \equiv 2^{224} + 2^{160} + 2^{32} + 1 \pmod{P_{256}}$$
$$2^{352} \equiv 2^{256} + 2^{192} + 2^{64} + 2^{32} \equiv 2^{224} + 2^{192} + 2^{96} + 2^{32} + 1 \pmod{P_{256}}$$
$$2^{384} \equiv 2^{256} + 2^{224} + 2^{128} + 2^{64} + 2^{32} \equiv 2 \cdot 2^{224} + 2^{128} + 2^{96} + 2^{32} + 1 \pmod{P_{256}}$$
$$2^{416} \equiv 2 \cdot 2^{256} + 2^{160} + 2^{128} + 2^{64} + 2^{32} \equiv 2 \cdot 2^{224} + 2^{160} + 2^{128} + 2 \cdot 2^{96} - 2^{64} + 2^{32} + 2 \pmod{P_{256}}$$
$$2^{448} \equiv 2 \cdot 2^{256} + 2^{192} + 2^{160} + 2 \cdot 2^{128} - 2^{96} + 2^{64} + 2 \cdot 2^{32} \equiv 2 \cdot 2^{224} + 2^{192} + 2^{160} + 2 \cdot 2^{128} + 2^{96} - 2^{64} + 2 \cdot 2^{32} + 2 \pmod{P_{256}}$$
$$2^{480} \equiv 2 \cdot 2^{256} + 2^{224} + 2^{192} + 2 \cdot 2^{160} + 2^{128} - 2^{96} + 2 \cdot 2^{64} + 2 \cdot 2^{32} \equiv 3 \cdot 2^{224} + 2^{192} + 2 \cdot 2^{160} + 2^{128} + 2^{96} + 2 \cdot 2^{32} + 2 \pmod{P_{256}}$$
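These congruences can be verified mechanically. In the Python sketch below, each right-hand side is encoded as an {exponent: coefficient} table transcribed from the equations above (the table layout and function name are illustrative choices), and the check is that $2^e$ minus the right-hand side is divisible by $P_{256}$.

```python
# Check the congruences 2^e (mod P256) listed above.

P256 = 2**256 - 2**224 - 2**96 + 2**64 - 1

# Right-hand sides transcribed from the equations: {exponent: coefficient}
REDUCED = {
    256: {224: 1, 96: 1, 64: -1, 0: 1},
    288: {224: 1, 128: 1, 64: -1, 32: 1, 0: 1},
    320: {224: 1, 160: 1, 32: 1, 0: 1},
    352: {224: 1, 192: 1, 96: 1, 32: 1, 0: 1},
    384: {224: 2, 128: 1, 96: 1, 32: 1, 0: 1},
    416: {224: 2, 160: 1, 128: 1, 96: 2, 64: -1, 32: 1, 0: 2},
    448: {224: 2, 192: 1, 160: 1, 128: 2, 96: 1, 64: -1, 32: 2, 0: 2},
    480: {224: 3, 192: 1, 160: 2, 128: 1, 96: 1, 32: 2, 0: 2},
}


def congruence_holds(e: int, terms: dict) -> bool:
    """True if 2**e is congruent to sum(c * 2**k) modulo P256."""
    rhs = sum(c * 2**k for k, c in terms.items())
    return (2**e - rhs) % P256 == 0


print(all(congruence_holds(e, t) for e, t in REDUCED.items()))  # True
```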
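The product $C = A \times B$ can be written as:
$$C = C_{15} \cdot 2^{480} + C_{14} \cdot 2^{448} + C_{13} \cdot 2^{416} + C_{12} \cdot 2^{384} + C_{11} \cdot 2^{352} + C_{10} \cdot 2^{320} + C_9 \cdot 2^{288} + C_8 \cdot 2^{256} + C_7 \cdot 2^{224} + C_6 \cdot 2^{192} + C_5 \cdot 2^{160} + C_4 \cdot 2^{128} + C_3 \cdot 2^{96} + C_2 \cdot 2^{64} + C_1 \cdot 2^{32} + C_0$$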
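Substituting the congruences above and grouping terms, $C \bmod P_{256}$ can be computed as follows:
$$\begin{aligned} C \equiv\ & 3C_{15} \cdot 2^{224} + C_{15} \cdot 2^{192} + 2C_{15} \cdot 2^{160} + C_{15} \cdot 2^{128} + C_{15} \cdot 2^{96} + 2C_{15} \cdot 2^{32} + 2C_{15} \\ &+ 2C_{14} \cdot 2^{224} + C_{14} \cdot 2^{192} + C_{14} \cdot 2^{160} + 2C_{14} \cdot 2^{128} + C_{14} \cdot 2^{96} - C_{14} \cdot 2^{64} + 2C_{14} \cdot 2^{32} + 2C_{14} \\ &+ 2C_{13} \cdot 2^{224} + C_{13} \cdot 2^{160} + C_{13} \cdot 2^{128} + 2C_{13} \cdot 2^{96} - C_{13} \cdot 2^{64} + C_{13} \cdot 2^{32} + 2C_{13} \\ &+ 2C_{12} \cdot 2^{224} + C_{12} \cdot 2^{128} + C_{12} \cdot 2^{96} + C_{12} \cdot 2^{32} + C_{12} \\ &+ C_{11} \cdot 2^{224} + C_{11} \cdot 2^{192} + C_{11} \cdot 2^{96} + C_{11} \cdot 2^{32} + C_{11} \\ &+ C_{10} \cdot 2^{224} + C_{10} \cdot 2^{160} + C_{10} \cdot 2^{32} + C_{10} \\ &+ C_9 \cdot 2^{224} + C_9 \cdot 2^{128} - C_9 \cdot 2^{64} + C_9 \cdot 2^{32} + C_9 \\ &+ C_8 \cdot 2^{224} + C_8 \cdot 2^{96} - C_8 \cdot 2^{64} + C_8 \\ &+ C_7 \cdot 2^{224} + C_6 \cdot 2^{192} + C_5 \cdot 2^{160} + C_4 \cdot 2^{128} + C_3 \cdot 2^{96} + C_2 \cdot 2^{64} + C_1 \cdot 2^{32} + C_0 \pmod{P_{256}} \end{aligned}$$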
The 512-bit value C is written as $C_{15}C_{14} \dots C_1C_0$, where each $C_i$ ($0 \le i \le 15$) is a 32-bit word.
The 256-bit integers S1 to S14 are defined as follows:
S1=(C7,C6,C5,C4,C3,C2,C1,C0);S2=(C8,C11,C10,C9,C8,0,C13,C12);
S3=(C9,0,0,0,C15,0,C9,C8);S4=(C10,0,0,C15,C14,0,C10,C9);
S5=(C15,C14,C13,C12,C11,0,C12,C11);S6=(C11,C15,C14,C13,C12,0,C11,C10);
S7=(C12,0,0,0,0,0,0,0);S8=(C13,0,0,0,0,0,0,C13);
S9=(C14,0,0,0,0,0,C14,C14);S10=(C15,0,C15,C14,C13,0,C15,C15);
S11=(0,0,0,0,0,C8,0,0);S12=(0,0,0,0,0,C9,0,0);
S13=(0,0,0,0,0,C13,0,0);S14=(0,0,0,0,0,C14,0,0)
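Each of S1 to S14 lists eight 32-bit words from the most significant (weight $2^{224}$) to the least significant (weight $2^0$). Taking S1 as an example:
$$S_1 = C_7 \cdot 2^{224} + C_6 \cdot 2^{192} + C_5 \cdot 2^{160} + C_4 \cdot 2^{128} + C_3 \cdot 2^{96} + C_2 \cdot 2^{64} + C_1 \cdot 2^{32} + C_0$$
The returned value is:
$$\mathrm{Result} = (S_1 + S_2 + S_3 + S_4 + S_5 + S_6 + 2S_7 + 2S_8 + 2S_9 + 2S_{10} - S_{11} - S_{12} - S_{13} - S_{14}) \bmod P_{256}$$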
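A Python sketch of this word-level reduction: it splits a 512-bit C into $C_{15}, \dots, C_0$, builds S1 to S14 exactly as tabulated above (most significant word first), and combines them as in Result. The helper names `pack` and `sm2_reduce` are hypothetical, and the final correction loops are an assumption for illustration; the text only states that the hardware finishes the reduction in one cycle using modular adders (Fig. 6).

```python
# Fast reduction modulo P256 with the word-level terms S1..S14 defined above.

P256 = 2**256 - 2**224 - 2**96 + 2**64 - 1


def pack(*words: int) -> int:
    """Pack eight 32-bit words, most significant first, into one integer."""
    value = 0
    for w in words:
        value = (value << 32) | w
    return value


def sm2_reduce(C: int) -> int:
    """Return C mod P256 for 0 <= C < 2**512 using S1..S14."""
    c = [(C >> (32 * i)) & 0xFFFFFFFF for i in range(16)]  # c[i] = Ci
    S = {
        1: pack(c[7], c[6], c[5], c[4], c[3], c[2], c[1], c[0]),
        2: pack(c[8], c[11], c[10], c[9], c[8], 0, c[13], c[12]),
        3: pack(c[9], 0, 0, 0, c[15], 0, c[9], c[8]),
        4: pack(c[10], 0, 0, c[15], c[14], 0, c[10], c[9]),
        5: pack(c[15], c[14], c[13], c[12], c[11], 0, c[12], c[11]),
        6: pack(c[11], c[15], c[14], c[13], c[12], 0, c[11], c[10]),
        7: pack(c[12], 0, 0, 0, 0, 0, 0, 0),
        8: pack(c[13], 0, 0, 0, 0, 0, 0, c[13]),
        9: pack(c[14], 0, 0, 0, 0, 0, c[14], c[14]),
        10: pack(c[15], 0, c[15], c[14], c[13], 0, c[15], c[15]),
        11: pack(0, 0, 0, 0, 0, c[8], 0, 0),
        12: pack(0, 0, 0, 0, 0, c[9], 0, 0),
        13: pack(0, 0, 0, 0, 0, c[13], 0, 0),
        14: pack(0, 0, 0, 0, 0, c[14], 0, 0),
    }
    t = (S[1] + S[2] + S[3] + S[4] + S[5] + S[6]
         + 2 * (S[7] + S[8] + S[9] + S[10])
         - S[11] - S[12] - S[13] - S[14])
    # t lies in (-2**98, 14 * 2**256), so a bounded number of
    # conditional corrections brings it into [0, P256).
    while t < 0:
        t += P256
    while t >= P256:
        t -= P256
    return t


C = (P256 - 1) * (P256 - 2)
print(sm2_reduce(C) == C % P256)  # True
```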
(III) with P256Modulo large digital-to-analog multiplier data path design
Equations (1), (2) and (3) show that each of Part1, Part2' and Part3 needs three 64-bit multiplications, so the final result C of the large-number multiplication needs nine 64-bit multiplications; with the 64-bit Karatsuba multiplier designed in Fig. 2, this is equivalent to 27 32-bit multiplications. By contrast, the conventional 8-split Karatsuba algorithm needs $(8^2 + 8)/2 = 36$ 32-bit multiplications. The proposed circuit design therefore substantially reduces the number of multiplications; since multiplication is the most time- and resource-consuming part of large-number multiplication, optimizing it gives the largest reduction in computation time and resource consumption.
Further analysis shows that the multiplication of A and B can be scheduled in several ways. It can be completed in a single clock cycle, which takes the least time but consumes the most resources, since nine basic 64-bit Karatsuba multipliers must work simultaneously. It can also be spread over several cycles, and two multi-cycle schemes are considered: the 9-cycle scheme needs only one basic 64-bit Karatsuba multiplier and has the lowest resource consumption but the longest time, while the 3-cycle scheme needs three basic 64-bit Karatsuba multipliers and balances resource consumption against time. Table 1 compares the schemes.
Table 1: Comparison of resource consumption and time consumption
Compared with the most resource-saving 9-cycle scheme, the 3-cycle scheme reduces the time from 9 cycles to 3 (6 fewer cycles) while using 6 more 32-bit multipliers, a ratio of 6/6 = 1; the 1-cycle scheme reduces the time from 9 cycles to 1 (8 fewer cycles) but uses 24 more 32-bit multipliers, a ratio of 8/24 ≈ 0.33, so its cost efficiency is poor. The 3-cycle scheme is therefore balanced and takes both resource consumption and time consumption into account.
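The cycle and multiplier counts compared above can be reproduced with a short Python calculation. It assumes, as the text does, that each 64-bit Karatsuba multiplier of Fig. 2 counts as three 32-bit multipliers (the 33-bit one included); the scheme list is transcribed from the prose, not from Table 1, and the names are illustrative.

```python
# Cycle count versus 32-bit multiplier count for the three schedules.

MULT32_PER_UNIT = 3  # one Fig. 2 unit: two 32-bit + one 33-bit multiplier

# (name, clock cycles, 64-bit Karatsuba units), transcribed from the text
SCHEMES = [("9-cycle", 9, 1), ("3-cycle", 3, 3), ("1-cycle", 1, 9)]


def tradeoffs(schemes=SCHEMES):
    """Compare each scheme with the first (most resource-saving) one."""
    _, base_cycles, base_units = schemes[0]
    rows = []
    for name, cycles, units in schemes[1:]:
        cycles_saved = base_cycles - cycles
        mults_added = (units - base_units) * MULT32_PER_UNIT
        rows.append((name, cycles_saved, mults_added, cycles_saved / mults_added))
    return rows


for row in tradeoffs():
    print(row)
# ('3-cycle', 6, 6, 1.0)
# ('1-cycle', 8, 24, 0.3333333333333333)
```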
Fig. 4 is a schematic structural diagram of an efficient modular multiplication circuit suitable for SM2 encryption operations according to an embodiment of the present invention; for convenience of description, only the parts relevant to this embodiment are shown. The circuit includes:
eight 3-to-1 multiplexers, MUX1 to MUX8:
input terminals 1, 2 and 3 of MUX1 receive $A_3A_2$, $A_7A_6$ and $a_3a_2$ respectively;
input terminals 1, 2 and 3 of MUX2 receive $B_3B_2$, $B_7B_6$ and $b_3b_2$ respectively;
input terminals 1, 2 and 3 of MUX3 receive $A_1A_0$, $A_5A_4$ and $a_1a_0$ respectively;
input terminals 1, 2 and 3 of MUX4 receive $B_1B_0$, $B_5B_4$ and $b_1b_0$ respectively;
input terminals 1, 2 and 3 of MUX5 receive $A_1A_0$, $A_5A_4$ and $a_1a_0$ respectively;
input terminals 1, 2 and 3 of MUX6 receive $A_3A_2$, $A_7A_6$ and $a_3a_2$ respectively;
input terminals 1, 2 and 3 of MUX7 receive $B_3B_2$, $B_7B_6$ and $b_3b_2$ respectively;
input terminals 1, 2 and 3 of MUX8 receive $B_1B_0$, $B_5B_4$ and $b_1b_0$ respectively;
two 128-bit subtractors, Sub1 and Sub2, and two XOR gates, XOR gate 1 and XOR gate 2:
input terminal 1 of subtractor Sub1 receives $A_3A_2A_1A_0$ and input terminal 2 receives $A_7A_6A_5A_4$; output 1 is connected to input terminal 3 of MUX5 and MUX3, output 2 to input terminal 3 of MUX6 and MUX1, and output 3 to XOR gate 1;
input terminal 1 of subtractor Sub2 receives $B_3B_2B_1B_0$ and input terminal 2 receives $B_7B_6B_5B_4$; output 1 is connected to input terminal 3 of MUX7 and MUX2, output 2 to input terminal 3 of MUX8 and MUX4, and output 3 to XOR gate 1; the output of XOR gate 1 is connected to adder-subtractor 2;
three 64-bit multipliers, MULT1 to MULT3, and two 64-bit subtractors, SUB1 and SUB2:
the outputs of MUX1 and MUX2 are connected to MULT1, those of MUX3 and MUX4 to MULT2, those of MUX5 and MUX6 to SUB1, and those of MUX7 and MUX8 to SUB2; output 1 of SUB1 and of SUB2 is connected to MULT3, and output 2 of SUB1 and of SUB2 is connected to XOR gate 2;
three expanders, EXT1 to EXT3; one 128-bit adder, ADD1; three 512-bit adders, ADD2 to ADD4; and one 2-to-1 multiplexer, MUX:
output 1 of MULT1 is connected to EXT1 and output 2 to adder ADD1; output 1 of MULT2 is connected to EXT2 and output 2 to ADD1; output 1 of EXT1 and of EXT2 is connected to adder ADD2; the outputs of ADD1, MULT3 and XOR gate 2 are connected to adder-subtractor 1, whose output is connected to EXT3; the outputs of EXT3 and ADD2 are connected to ADD3; the outputs of ADD3 and MUX are connected to adder-subtractor 2, whose output 1 is connected to register R512 and output 2 to ADD4;
one 512-bit register R512, a 128-bit adder-subtractor (adder-subtractor 1), a 512-bit adder-subtractor (adder-subtractor 2), one 256-bit adder ADD5, a shifter and a modular reduction unit:
output 1 of register R512 is connected to the modular reduction unit, output 2 to ADD4, output 3 to MUX, and output 4 to ADD4 through adder ADD5 and the shifter; the output of ADD4 is connected to MUX.
Each subtractor subtracts its two inputs, each adder adds its two inputs, and each multiplier multiplies its two inputs; the 3-to-1 and 2-to-1 multiplexers each pass one selected input to their output, and each expander extends the bit width of its input data. An adder-subtractor adds when its control input is 0 and subtracts when it is 1. An XOR gate outputs 0 when its two inputs have the same sign (both positive or both negative) and 1 when their signs differ. The register stores its input data. The modular reduction unit is composed of several modular adders, with the internal structure shown in Fig. 6, where s1 to s14 are the terms S1 to S14 of the return value Result in the modular reduction. The shifter shifts its input left by 128 bits.
In this embodiment, the multi-cycle controller that generates the control signals is shown in Fig. 5. The efficient modular multiplication circuit for SM2 encryption operations operates as follows:
S1: in the initialization stage, the registers are cleared;
S2, computing Part1: the control signals of the eight 3-to-1 multiplexers MUX1 to MUX8 are all 0, so each outputs the data on its input terminal 1; the control signals of the three expanders EXT1 to EXT3 are all 0, so they extend their inputs to 512 bits shifted left by 128, 0 and 64 bits respectively; the control signal of the 2-to-1 multiplexer MUX is 1, so it outputs the data from register R512, and the result is accumulated into R512;
s3, Part of calculating Part 3: the control signals of the 8 one-out-of-three selectors MUX 1-MUX 8 are all 1, namely, the input data of the output input end 3 is selected; the control signals of the three expanders EXT 1-EXT 3 are all 1, namely the expanders EXT 1-EXT 3 expand input 384 bits, 256 bits and 320 bits, an alternative selector MUX control signal bit 1 selects and outputs data input by the selection register R512, and the operation result is accumulated to the R512 register;
s4, calculating Part 2: the control signals of the 8 one-out-of-three selectors MUX 1-MUX 8 are all 2, namely, the input data of the input end 2 is selected and output; the control signals of the three expanders EXT1 to EXT3 are all 2, that is, the expanders EXT1 to EXT3 expand the input with 256 bits, 128 bits and 192 bits, and the alternative selector MUX control signal bit 0 selects and outputs the data input by the adder ADD4, and the result is an accumulated value (Part1+ Part3) which is shifted to the left by 128 bits and then partially accumulated in the register R512 (the upper 256 bits in R512 are Part3, and the lower 256 bits are Part 1).
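The accumulation in steps S1 to S4 can be sketched at the word level as follows. This is a hedged Python model, not the patent's implementation: each 128×128-bit partial product is written as a plain multiplication for brevity, AL = A3A2A1A0, AH = A7A6A5A4 (likewise BL, BH), a = AL − AH and b = BH − BL, and the assignment Part1 = AL·BL, Part3 = AH·BH, Part2 = |a|·|b| is inferred from the expander shift amounts and from the statement that R512 holds Part3 in its upper half and Part1 in its lower half. The names `multiply_256` and `sign_mag` are hypothetical.

```python
from typing import Tuple

MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1


def sign_mag(d: int) -> Tuple[int, int]:
    """Split a signed difference into (magnitude, sign bit), sign = 1 if negative."""
    return abs(d), int(d < 0)


def multiply_256(a: int, b: int) -> int:
    """Sketch of steps S1-S4: 256x256-bit product accumulated in a 512-bit R512."""
    ah, al = a >> 128, a & MASK128        # AH = A7..A4, AL = A3..A0
    bh, bl = b >> 128, b & MASK128        # BH = B7..B4, BL = B3..B0

    r512 = 0                              # S1: clear R512
    r512 += al * bl                       # S2: Part1 at offset 0
    r512 += (ah * bh) << 256              # S3: Part3 at offset 256

    # S4: ADD5 adds the upper half (Part3) and lower half (Part1) of R512,
    # the shifter moves the sum up 128 bits, and ADD4 adds it to R512.
    add4 = r512 + (((r512 >> 256) + (r512 & MASK256)) << 128)

    da, sa = sign_mag(al - ah)            # a = AL - AH
    db, sb = sign_mag(bh - bl)            # b = BH - BL
    part2 = (da * db) << 128              # Part2 = |a|*|b| at offset 128
    # adder-subtractor 2: the sign from exclusive-OR gate 1 picks + or -
    return add4 - part2 if sa ^ sb else add4 + part2
```

The identity behind step S4 is AL·BH + AH·BL = (AL·BL + AH·BH) + a·b, which is why ADD5 and the shifter feed Part1 + Part3 back into R512 before the signed Part2 is added.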
In state S5 (the MOD state), the modulo reduction unit reduces the multiplication result in one cycle. Computing Part1, Part3 and Part2 and performing the modular reduction each take one cycle, so a complete modular multiplication takes four cycles.
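The reduction unit evaluates the word-level terms s1 to s14 of Fig. 6 in a single cycle; those terms are not spelled out in this text. As a hedged stand-in, the Python sketch below reduces a 512-bit product modulo the SM2 recommended prime p = 2^256 − 2^224 − 2^96 + 2^64 − 1 by repeatedly folding with 2^256 ≡ 2^224 + 2^96 − 2^64 + 1 (mod p). It exploits the same special form of p but is a software, multi-iteration formulation rather than the circuit's s1 to s14 formula; `reduce_sm2` is a hypothetical name.

```python
# SM2 recommended prime: p = 2^256 - 2^224 - 2^96 + 2^64 - 1
P_SM2 = (1 << 256) - (1 << 224) - (1 << 96) + (1 << 64) - 1
# 2^256 mod p, i.e. 2^224 + 2^96 - 2^64 + 1 (a positive constant)
C = (1 << 256) - P_SM2
MASK256 = (1 << 256) - 1


def reduce_sm2(x: int) -> int:
    """Reduce a non-negative product (e.g. x < 2^512) modulo the SM2 prime."""
    # Fold the part above bit 256 back in, using 2^256 = C (mod p);
    # each pass shrinks the value by roughly 31 bits until it fits in 256 bits.
    while x >> 256:
        x = (x >> 256) * C + (x & MASK256)
    # Now x < 2^256 < 2p, so at most one final subtraction is needed.
    while x >= P_SM2:
        x -= P_SM2
    return x
```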
The invention completes a modular multiplication in 4 cycles, taking 0.04 μs on an Artix-7 hardware platform and consuming 13.45k LUTs. Because reducing both resource consumption and latency are basic goals of circuit design, the schemes are compared by the product of resource consumption and time. To normalize the clock frequency to 100 MHz, each scheme's result is multiplied by the clock frequency (in MHz) of its hardware platform and divided by 100; the smaller the value, the better the performance. On this metric, this scheme is significantly better than the other schemes (Table 2).
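The normalized metric amounts to LUTs × time × f / 100, where f is the platform clock in MHz. The Python sketch below applies it to this scheme's own figures (4 cycles at 100 MHz, 13.45k LUTs); the function names are hypothetical, and it does not attempt to reproduce the Table 2 values for the other schemes.

```python
def normalized_time_us(time_us: float, freq_mhz: float) -> float:
    """Latency rescaled to a 100 MHz clock, assuming the cycle count stays the same."""
    return time_us * freq_mhz / 100


def area_time(luts_k: float, time_us: float, freq_mhz: float) -> float:
    """Resource-time product in kLUT*us after 100 MHz normalization (smaller is better)."""
    return luts_k * normalized_time_us(time_us, freq_mhz)


# This scheme: 4 cycles at 100 MHz, 13.45k LUTs on Artix-7
cycles, freq_mhz = 4, 100
latency_us = cycles / freq_mhz            # f MHz = f cycles per microsecond
print(f"{area_time(13.45, latency_us, freq_mhz):.3f} kLUT*us")
```

For this scheme the result is about 0.538 kLUT·μs, since 13.45 × 0.04 × 100 / 100 = 0.538; a scheme reporting 0.02 μs at 200 MHz would be normalized to the same 0.04 μs.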
Table 2 Comparison with other schemes
Note:
Scheme 1: Liu Yang. Design and implementation of an SM2 cryptographic logic accelerator for the national cryptographic algorithm [D]. Anhui University, 2021.
Scheme 2: Marzouqi H, Al-Qutayri M, Salah K. A High-Speed FPGA Implementation of an RSD-Based ECC Processor [J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2016, 24(1): 151-.
Scheme 3: Md S H, Yinan K. FPGA-based efficient modular multiplication for Elliptic Curve Cryptography [C]. 2015 International Telecommunication Networks and Applications Conference (ITNAC), Sydney, NSW, 2015: 191-195.
Scheme 4: Khalid J, Wang Xiaojun, Mike S. High performance hardware support for elliptic curve cryptography over general prime field [J]. Microprocessors and Microsystems, 2017, 51: 331-.
Scheme 5: Ali S Y, Khalid J, Shoaib A, et al. Reduced Signal based High Speed Elliptic Curve Cryptographic Processor [J]. Journal of Circuits, Systems and Computers, 2018: S0218126619500816.
Scheme 6: Ali S Y, Khalid J, Shoaib A, et al. A high-speed RSD-based flexible ECC processor for arbitrary curves over general prime field [J]. 1858-1878.
Scheme 7: Islam M M, Hossain M S, Shahjalal M, et al. Area-Time Efficient Hardware Implementation of Modular Multiplication for Elliptic Curve Cryptography [J]. IEEE Access, 2020, 8: 73898-73906.
Scheme 8: Kudithi T, Potdar M, Sakthivel R. Radix-4 Interleaved Modular Multiplication for Cryptographic Applications [C]. 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN), Vellore, India, 2019: 1-5.
Scheme 9: Zhang T, Zhu J, Liu Y, Chen F. The Novel Efficient Dual-field FIPS Modular Multiplication [J]. KSII Transactions on Internet and Information Systems, 2020, 14(2): 738-756.
The invention uses the divide-and-conquer idea of the Karatsuba algorithm, applies two levels of Karatsuba decomposition, performs the large-number multiplication with local parallelism, and uses the prime field P256 recommended for the national cryptographic algorithm (SM2) to perform large-number modular multiplication. The algorithm first obtains the multiplication result in 3 cycles and then exploits the special form of P256 to perform the reduction. The operands are first split once by divide and conquer; three 64-bit multipliers then operate in parallel to obtain each of the three partial products (computed with an improved Karatsuba algorithm), and the modular reduction is performed after the three parts are accumulated, saving both time and resources. Comparison experiments show that one modular multiplication consumes only 13.45k LUTs and completes within 0.04 μs on an Artix-7 development board at 100 MHz, optimizing both resource consumption and execution time.
The invention has been described above by way of example; its specific implementation is not limited to the details of construction and arrangement shown, and variations of them remain within the scope of the invention.

Claims (2)

1. An efficient modular multiplication circuit suitable for SM2 encryption operations, comprising:
8 one-out-of-three selectors MUX1-MUX8; 2 128-bit subtractors Sub1 and Sub2; 2 exclusive-OR gates, exclusive-OR gate 1 and exclusive-OR gate 2; 3 64-bit multipliers MULT1-MULT3; 2 64-bit subtractors SUB1 and SUB2; 3 expanders EXT1-EXT3; 1 128-bit adder ADD1; 3 512-bit adders ADD2-ADD4; 1 256-bit adder ADD5; 1 one-out-of-two selector MUX; 1 512-bit register R512; a 128-bit adder-subtractor 1; a 512-bit adder-subtractor 2; a shifter; and a modulo reduction unit;
MUX1 receives A3A2, A7A6 and a3a2 on its input ends 1, 2 and 3, respectively; MUX2 receives B3B2, B7B6 and b3b2; MUX3 receives A1A0, A5A4 and a1a0; MUX4 receives B1B0, B5B4 and b1b0; MUX5 receives A1A0, A5A4 and a1a0; MUX6 receives A3A2, A7A6 and a3a2; MUX7 receives B3B2, B7B6 and b3b2; MUX8 receives B1B0, B5B4 and b1b0 (in each case on input ends 1, 2 and 3, respectively);
input end 1 of subtractor Sub1 receives A3A2A1A0 and input end 2 receives A7A6A5A4; its output end 1 is connected to input ends 3 of MUX5 and MUX3, output end 2 to input ends 3 of MUX6 and MUX1, and output end 3 to exclusive-OR gate 1; input end 1 of subtractor Sub2 receives B3B2B1B0 and input end 2 receives B7B6B5B4; its output end 1 is connected to input ends 3 of MUX7 and MUX2, output end 2 to input ends 3 of MUX8 and MUX4, and output end 3 to exclusive-OR gate 1; the output of exclusive-OR gate 1 is connected to adder-subtractor 2;
the outputs of MUX1 and MUX2 are connected to MULT1, those of MUX3 and MUX4 to MULT2, those of MUX5 and MUX6 to SUB1, and those of MUX7 and MUX8 to SUB2; output ends 1 of SUB1 and SUB2 are connected to MULT3, and output ends 2 of SUB1 and SUB2 are connected to exclusive-OR gate 2;
output end 1 of MULT1 is connected to EXT1 and its output end 2 to adder ADD1; output end 1 of MULT2 is connected to EXT2 and its output end 2 to adder ADD1; output ends 1 of EXT1 and EXT2 are connected to adder ADD2; the outputs of adder ADD1, MULT3 and exclusive-OR gate 2 are connected to adder-subtractor 1, whose output is connected to EXT3; the outputs of EXT3 and ADD2 are connected to ADD3; the outputs of ADD3 and MUX are connected to adder-subtractor 2, whose output end 1 is connected to register R512 and output end 2 to ADD4; output end 1 of register R512 is connected to the modulo reduction unit, output end 2 to ADD4, and output end 4 to ADD4 through adder ADD5 and the shifter; and the output of ADD4 is connected to MUX;
wherein A and B are the 256-bit multiplier and multiplicand, respectively, with A = A7A6A5A4A3A2A1A0 and B = B7B6B5B4B3B2B1B0; Ai and Bi (0 ≤ i ≤ 7) are 32-bit words; a3a2a1a0 = A3A2A1A0 - A7A6A5A4 and b3b2b1b0 = B7B6B5B4 - B3B2B1B0.
2. The operation method of the efficient modular multiplication circuit suitable for SM2 encryption operations according to claim 1, specifically comprising the following steps:
S1: resetting the registers in the initialization stage;
S2: the control signals of the eight one-out-of-three selectors MUX1-MUX8 are all 0, so each selects and outputs the data on its input end 1; the control signals of the three expanders EXT1-EXT3 are all 0, so EXT1, EXT2 and EXT3 shift their inputs left by 128, 0 and 64 bits, respectively; the control signal of the one-out-of-two selector MUX is 1, so it selects and outputs the data from register R512, and the result is accumulated into R512;
S3: the control signals of MUX1-MUX8 are all 1, so each selects and outputs the data on its input end 3; the control signals of EXT1-EXT3 are all 1, so they shift their inputs left by 384, 256 and 320 bits, respectively; the control signal of MUX is 1, so it selects and outputs the data from register R512, and the result is accumulated into R512;
S4: the control signals of MUX1-MUX8 are all 2, so each selects and outputs the data on its input end 2; the control signals of EXT1-EXT3 are all 2, so they shift their inputs left by 256, 128 and 192 bits, respectively; the control signal of MUX is 0, so it selects and outputs the data from adder ADD4;
S5 (MOD state): the modulo reduction unit completes the modular reduction of the multiplication result in one cycle.
CN202210265484.7A 2022-03-17 2022-03-17 Efficient modular multiplication circuit suitable for SM2 encryption operation and operation method thereof Pending CN114594925A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210265484.7A CN114594925A (en) 2022-03-17 2022-03-17 Efficient modular multiplication circuit suitable for SM2 encryption operation and operation method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210265484.7A CN114594925A (en) 2022-03-17 2022-03-17 Efficient modular multiplication circuit suitable for SM2 encryption operation and operation method thereof

Publications (1)

Publication Number Publication Date
CN114594925A (en) 2022-06-07

Family

ID=81810088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210265484.7A Pending CN114594925A (en) 2022-03-17 2022-03-17 Efficient modular multiplication circuit suitable for SM2 encryption operation and operation method thereof

Country Status (1)

Country Link
CN (1) CN114594925A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117896067A (en) * 2024-03-13 2024-04-16 杭州金智塔科技有限公司 Parallel modular reduction method and device suitable for SM2 cryptographic algorithm

Similar Documents

Publication Publication Date Title
Öztürk et al. Low-power elliptic curve cryptography using scaled modular arithmetic
Erdem et al. A general digit-serial architecture for montgomery modular multiplication
Li et al. High-Performance Pipelined Architecture of Elliptic Curve Scalar Multiplication Over GF(2^m)
Chung et al. A high-performance elliptic curve cryptographic processor over GF (p) with SPA resistance
CN115344237B (en) Data processing method combining Karatsuba and Montgomery modular multiplication
Tian et al. Ultra-fast modular multiplication implementation for isogeny-based post-quantum cryptography
US7046800B1 (en) Scalable methods and apparatus for Montgomery multiplication
Hossain et al. Efficient fpga implementation of modular arithmetic for elliptic curve cryptography
CN114594925A (en) Efficient modular multiplication circuit suitable for SM2 encryption operation and operation method thereof
Li et al. Research in fast modular exponentiation algorithm based on FPGA
Kodali et al. Fpga implementation of 160-bit vedic multiplier
Hu et al. Low-power reconfigurable architecture of elliptic curve cryptography for IoT
Mahapatra et al. RSA cryptosystem with modified Montgomery modular multiplier
Ratnaparkhi et al. Lead: Logarithmic exponent approximate divider for image quantization application
Namin et al. Power efficiency of digit level polynomial basis finite field multipliers in GF(2^283)
Wen et al. A Length-Scalable Modular Multiplier Implemented with Multi-bit Scanning
CN115276960B (en) Device and method for realizing fast modular inverse chip on SM2 Montgomery domain
Yan et al. Modified modular inversion algorithm for vlsi implementation
Shiyang et al. A Time-Area-Efficient and Compact ECSM Processor over GF (p)
Abdul-Hadi et al. Performance evaluation of scalar multiplication in elliptic curve cryptography implementation using different multipliers over binary field GF(2^233)
Varma et al. Design a low-latency novel fpga based signed multiplier for communication applications
Wang et al. High radix montgomery modular multiplier on modern fpga
Yang Implementation of RSA Based on Modified Montgomery Modular Multiplication Algorithm
Kodali et al. Implementations of Sunar-Koc multiplier using FPGA platform and wsn node
Abd-Elkader et al. A compact FPGA-based montgomery modular multiplier

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination