CN111722833B

CN111722833B - SM2 algorithm parallel modular multiplier

Info

Publication number: CN111722833B
Application number: CN202010557989.1A
Authority: CN
Inventors: 陈付龙; 刘扬; 李宗平; 张亭亭; 谢冬; 沈展; 齐学梅; 程桂花; 徐晟
Original assignee: Anhui Normal University
Current assignee: Anhui Normal University
Priority date: 2020-06-18
Filing date: 2020-06-18
Publication date: 2023-06-02
Anticipated expiration: 2040-06-18
Also published as: CN111722833A

Abstract

The embodiment of the invention provides an SM2 algorithm parallel modular multiplier in the technical field of data encryption computation. The modular multiplier comprises at least one modified multiplier and a finite state machine controller. Parallel multiplication is performed by modified multipliers, each consisting of a preprocessing circuit, multiplexers, a multiplier and a post-processing circuit. Because both the Karatsuba algorithm for large-number multiplication and the reduction algorithm for the prime P256 recommended by the State Cryptography Administration are based on divide and conquer, the multiplication and the reduction steps of the modular multiplication are split into segments and executed in parallel, which increases speed and reduces resource consumption.

Description

SM2 algorithm parallel modular multiplier

Technical Field

The invention relates to the technical field of data encryption computation, in particular to an SM2 algorithm parallel modular multiplier.

Background

With the spread of computers and the development of Internet technology, society has entered the information age and information exchange plays an increasingly important role, so protecting information from theft is a primary concern for users. To keep information secure while it is transmitted over a public channel, it must be encrypted. Public-key cryptographic algorithms are increasingly popular because of their short keys, and among them elliptic curve cryptography offers shorter keys, less computation and higher speed than other public-key algorithms at the same security level. SM2 is an elliptic curve cryptographic algorithm independently developed in China. Its hardware implementation is comparatively simple, which makes it suitable for speed-critical scenarios, and as a domestic cryptographic standard it is significant for upgrading and popularizing current public-key cryptography. SM2 therefore has broad market prospects.

Large-number modular multiplication is one of the core operations of elliptic curve algorithms and largely determines how efficiently the algorithm executes. Increasing the speed of modular multiplication, or reducing its resource consumption while maintaining speed, is therefore of great significance for efficient execution of the algorithm, and better implementations of modular multiplication are an active research topic at many research institutions.

Disclosure of Invention

The invention aims to provide an SM2 algorithm parallel modular multiplier that performs large-number modular multiplication while reducing resource consumption.

To achieve the above object, an embodiment of the present invention provides an SM2 algorithm parallel modular multiplier, the modular multiplier including:

at least one modified multiplier, the modified multiplier comprising:

a preprocessing circuit, a first terminal of which is used for receiving an input value A_i, a second terminal for receiving an input value A_j, a third terminal for receiving an input value B_i, and a fourth terminal for receiving an input value B_j;

a first multiplexer, a first terminal of which is connected to the first terminal of the preprocessing circuit, a second terminal to the fifth terminal of the preprocessing circuit, a third terminal to the sixth terminal of the preprocessing circuit, and a fourth terminal to the fourth terminal of the preprocessing circuit;

a multiplier, a first terminal of which is connected to the fifth terminal of the first multiplexer, and a second terminal to the sixth terminal of the first multiplexer;

a post-processing circuit, a first terminal of which is connected to the third terminal of the multiplier, a second terminal of which is used for outputting the output value A_iB_j + A_jB_i, a third terminal of which is connected to the seventh terminal of the preprocessing circuit, and a fourth terminal of which is connected to the eighth terminal of the preprocessing circuit;

a second multiplexer, a first terminal of which is connected to a node between the multiplier and the post-processing circuit, a second terminal to the fifth terminal of the post-processing circuit, a third terminal to the seventh terminal of the first multiplexer, and a fourth terminal of which is used for outputting a calculation result; and

a finite state machine controller, a first terminal of which is used for selecting the input values A_i, A_j, B_i and B_j, a second terminal of which is connected to the seventh terminal of each first multiplexer and the third terminal of each second multiplexer, and a third terminal of which is used for controlling the bit-position arrangement of each result so as to obtain the final result.

Optionally, the number of the modified multipliers is 4.

Optionally, the multiplier is used to perform ordinary multiplication or Karatsuba multiplication.

Optionally, the preprocessing circuit comprises a comparator and a subtractor.

Optionally, the post-processing circuit comprises an adder-subtractor and an exclusive-or gate.

Optionally, the finite state machine controller includes 11 states executed in sequence, wherein:

in state 0, all inputs and control signals of the finite state machine controller are 0;

in state 1, the finite state machine controller sends 4 EN=0 signals to the modified multipliers to complete the first round of parallel multiplication and obtain partial product C_0;

in state 2, the finite state machine controller performs the first reduction operation on partial product C_0 and sends 3 EN=0 signals and 1 EN=1 signal to the modified multipliers, thereby completing the second round of 4 parallel multiplications and obtaining partial product C_1;

in state 3, the finite state machine controller performs the second reduction operation on partial product C_1 and sends 1 EN=0 signal and 3 EN=1 signals to the modified multipliers to complete the third round of 4 parallel multiplications and obtain partial products C_2 and C_3;

in state 4, the finite state machine controller performs the third reduction operation on partial products C_2 and C_3 and sends 4 EN=1 signals to the modified multipliers to complete the fourth round of 4 parallel multiplications and obtain partial product C_4;

in state 5, the finite state machine controller performs the fourth reduction operation on partial product C_4 and sends 4 EN=1 signals to the modified multipliers to complete the fifth round of 4 parallel multiplications and obtain partial products C_5 and C_6;

in state 6, the finite state machine controller performs the fifth reduction operation on partial products C_5 and C_6 and sends 4 EN=1 signals to the modified multipliers to obtain partial product C_7 through the sixth round of 4 parallel multiplications;

in state 7, the finite state machine controller performs the sixth reduction operation on partial product C_7 and sends 4 EN=1 signals to the modified multipliers to perform the seventh round of 4 parallel multiplications and obtain partial product C_8;

in state 8, the finite state machine controller performs the seventh reduction operation on partial product C_8 and sends 4 EN=1 signals to the modified multipliers to perform the eighth round of 4 parallel multiplications and obtain partial products C_9 and C_10;

in state 9, the finite state machine controller performs the eighth reduction operation on partial products C_9 and C_10 and sends 4 EN=1 signals to the modified multipliers to obtain partial products C_11 to C_15 through the ninth round of 4 parallel multiplications;

in state 10, the finite state machine controller performs the ninth reduction operation on partial products C_11 to C_15.

Optionally, the modified multiplier is configured to:

perform Karatsuba multiplication when an EN=1 signal is received; and

perform ordinary multiplication when an EN=0 signal is received.

Optionally, the finite state machine controller is configured to:

obtain, when executing any state, the sequence numbers of the multiplications to be performed from a preset multiplication allocation table;

acquire the indices of the input values A_i, A_j, B_i and B_j from a preset 8-part Karatsuba expression table according to the sequence numbers; and

select the input values A_i, A_j, B_i and B_j from the numbers to be modular-multiplied according to the indices, and feed them to the corresponding modified multiplier.

Through the above technical solution, the SM2 algorithm parallel modular multiplier provided by the invention performs parallel multiplication with modified multipliers, each consisting of a preprocessing circuit, multiplexers, a multiplier and a post-processing circuit. Because both the Karatsuba algorithm for large-number multiplication and the reduction algorithm for the prime P256 recommended by the State Cryptography Administration are based on divide and conquer, the multiplication and the reduction of the modular multiplication are split into segments and executed in parallel, which increases speed and reduces resource consumption.

Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.

Drawings

The accompanying drawings are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain, without limitation, the embodiments of the invention. In the drawings:

FIG. 1 is a block diagram of an SM2 algorithm parallel modular multiplier according to one embodiment of the invention;

FIG. 2 is a block diagram of a modified multiplier according to one embodiment of the present invention;

FIG. 3 is a block diagram of the preprocessing circuit and post-processing circuit of a modified multiplier according to one embodiment of the present invention;

FIG. 4 is a flow chart of the operation of a modified multiplier according to one embodiment of the invention;

FIG. 5 is a flow chart of the manner in which the finite state machine controller FSM selects input values according to one embodiment of the present invention; and

FIG. 6 is a partial-product stacking diagram of the Karatsuba algorithm according to one embodiment of the present invention.

Detailed Description

The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.

In the embodiments of the present invention, unless otherwise indicated, orientation terms such as "upper, lower, top, bottom" generally refer to the orientation shown in the drawings, or to the positional relationship of the components relative to one another in the vertical or gravitational direction.

In addition, descriptions such as "first" and "second" in the embodiments of the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. The technical solutions of the various embodiments may be combined with each other, but only where a person skilled in the art can realize the combination; when a combination of technical solutions is contradictory or cannot be realized, that combination should be considered not to exist and does not fall within the scope of protection claimed by the present invention.

FIG. 1 shows a block diagram of an SM2 algorithm parallel modular multiplier according to an embodiment of the invention. In FIG. 1, the modular multiplier may include at least one modified multiplier mul_muti and a finite state machine controller FSM. The specific structure of the modified multiplier mul_muti may be, for example, as shown in FIG. 2. In FIG. 2, the modified multiplier mul_muti may include a preprocessing circuit pre, a first multiplexer mux1, a multiplier muti, a post-processing circuit post and a second multiplexer mux2.

In FIG. 2, a first terminal of the preprocessing circuit pre may be used to receive the input value A_i, a second terminal to receive the input value A_j, a third terminal to receive the input value B_i, and a fourth terminal to receive the input value B_j. The first multiplexer mux1 may have a first terminal connected to the first terminal of the preprocessing circuit pre, a second terminal connected to the fifth terminal of pre, a third terminal connected to the sixth terminal of pre, and a fourth terminal connected to the fourth terminal of pre. The first terminal of the multiplier muti may be connected to the fifth terminal of the first multiplexer mux1, and its second terminal to the sixth terminal of mux1. A first terminal of the post-processing circuit post may be connected to the third terminal of the multiplier muti, and a second terminal may be used for outputting the output value A_iB_j + A_jB_i; its third terminal is connected to the seventh terminal of the preprocessing circuit pre, and its fourth terminal to the eighth terminal of pre. The first terminal of the second multiplexer mux2 may be connected to a node between the multiplier muti and the post-processing circuit post, its second terminal to the fifth terminal of post, its third terminal to the seventh terminal of the first multiplexer mux1, and its fourth terminal may be used to output the calculation result.
A first terminal of the finite state machine controller FSM may be used to select the input values A_i, A_j, B_i and B_j; a second terminal may be connected to the seventh terminal of each first multiplexer mux1 and the third terminal of each second multiplexer mux2; and a third terminal may be used to control the bit arrangement of each result so as to obtain the final result.

In one embodiment of the invention, the number of modified multipliers mul_muti may vary with the bit width of the operands. In this embodiment, when the numbers to be modular-multiplied are 256 bits wide, 4 modified multipliers mul_muti may be used.

In one embodiment of the invention, the multiplier muti may be any multiplier known to those skilled in the art that performs ordinary multiplication or Karatsuba multiplication.

In this embodiment, the specific structures of the preprocessing circuit pre and the post-processing circuit post may be, for example, as shown in FIG. 3 (the multiplexers are omitted from FIG. 3 because they mainly serve as connections). In FIG. 3, the preprocessing circuit pre may include comparators and subtractors, and the post-processing circuit post may include an adder-subtractor add/sub and an exclusive-OR (XOR) gate. The comparators may include a first comparator cmp0 and a second comparator cmp1, and the subtractors may include a first subtractor sub0 and a second subtractor sub1. A first terminal of the first comparator cmp0 may be used to receive the input value A_i and a second terminal to receive the input value A_j; its third terminal may be connected to the first terminal of the XOR gate. A first terminal of the second comparator cmp1 may be used to receive the input value B_i and a second terminal to receive the input value B_j; its third terminal may be connected to the second terminal of the XOR gate. A first terminal of the first subtractor sub0 may be used to receive the input value A_i and a second terminal to receive the input value A_j; its third terminal may be connected to the first terminal of the XOR gate and its fourth terminal to the first terminal of the multiplier muti. A first terminal of the second subtractor sub1 may be used to receive the input value B_i and a second terminal to receive the input value B_j; its third terminal may be connected to the second terminal of the XOR gate and its fourth terminal to the second terminal of the multiplier muti.
The third terminal of the multiplier muti may be connected to a first terminal of the adder-subtractor add/sub; the second terminal of add/sub may be connected to the third terminal of the XOR gate; the fourth terminal of add/sub may be used to output the output value A_iB_j + A_jB_i, and its fifth terminal to output the value A_iB_i + A_jB_j. In FIG. 3, cmpA and cmpB are the comparison signals, OP is the exclusive-OR of cmpA and cmpB, and tempA and tempB are the intermediate values for A and B, respectively.

In this embodiment, the specific operation of the modified multiplier mul_muti may be, for example, as shown in fig. 4. In fig. 4:

When the input values A_i, A_j, B_i and B_j are received and EN=1, the first comparator cmp0 compares A_i with A_j, and the second comparator cmp1 compares B_i with B_j. Taking A_i and A_j as an example: if A_i > A_j, then cmpA = 1 and the intermediate value output by the first subtractor is tempA = A_i - A_j; otherwise cmpA = 0 and tempA = A_j - A_i. B_i and B_j are handled in the same way, producing cmpB and tempB, and are not described again.

Next, the multiplier muti multiplies tempA by tempB, i.e., temp = tempA × tempB. The XOR gate computes the exclusive-OR of cmpA and cmpB to obtain op, i.e., op = cmpA ⊕ cmpB. The adder-subtractor then combines A_iB_i + A_jB_j with temp according to op to obtain the output value out. Specifically, when op = 1, out is the sum of A_iB_i + A_jB_j and temp; when op = 0, out is the difference between A_iB_i + A_jB_j and temp. In both cases out equals A_iB_j + A_jB_i, because A_iB_j + A_jB_i = A_iB_i + A_jB_j - (A_i - A_j)(B_i - B_j), and (A_i - A_j)(B_i - B_j) equals -temp when op = 1 and +temp when op = 0.

When EN=0, tempA = A_i, tempB = B_i, temp = tempA × tempB, and out = temp.
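The data path of FIG. 4 can be modelled in software to check the sign handling. The following is a minimal behavioural sketch in Python, not the patented circuit: the function name mul_muti mirrors the block name, 32-bit words are assumed, and the parameter diag_sum is a hypothetical stand-in for the value A_iB_i + A_jB_j, which is assumed to be available from products computed in earlier EN=0 rounds.

```python
def mul_muti(a_i: int, a_j: int, b_i: int, b_j: int, en: int, diag_sum: int = 0) -> int:
    """Behavioural model of one modified multiplier (mul_muti).

    en == 0: ordinary multiplication, returns A_i * B_i.
    en == 1: Karatsuba cross term, returns A_i * B_j + A_j * B_i.
    diag_sum is a hypothetical input holding A_i * B_i + A_j * B_j,
    assumed to come from products computed in earlier EN=0 rounds.
    """
    if en == 0:
        # tempA = A_i, tempB = B_i, out = temp = tempA * tempB
        return a_i * b_i

    # Pre-processing: comparator cmp0 and subtractor sub0 give |A_i - A_j|.
    cmp_a = 1 if a_i > a_j else 0
    temp_a = a_i - a_j if cmp_a else a_j - a_i
    # Comparator cmp1 and subtractor sub1 give |B_i - B_j|.
    cmp_b = 1 if b_i > b_j else 0
    temp_b = b_i - b_j if cmp_b else b_j - b_i

    # The multiplier only ever sees non-negative operands.
    temp = temp_a * temp_b

    # XOR gate: op == 1 means (A_i - A_j)(B_i - B_j) == -temp,
    # op == 0 means it equals +temp.
    op = cmp_a ^ cmp_b

    # Adder-subtractor: A_i*B_j + A_j*B_i = diag_sum - (A_i - A_j)(B_i - B_j).
    return diag_sum + temp if op else diag_sum - temp
```

The point the model makes explicit is that the multiplier never handles signed operands: the comparators make both factors non-negative, and the sign is recovered by a single XOR and folded into the add/subtract decision.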

In this embodiment, the finite state machine controller FSM may include 11 states that execute in sequence, wherein:

in state 0, all inputs and control signals of the finite state machine controller FSM are 0;

in state 1, the finite state machine controller FSM sends 4 EN=0 signals to the modified multipliers to complete the first round of parallel multiplication and obtain partial product C_0;

in state 2, the finite state machine controller FSM performs the first reduction operation on partial product C_0 and sends 3 EN=0 signals and 1 EN=1 signal to the modified multipliers to complete the second round of 4 parallel multiplications and obtain partial product C_1;

in state 3, the finite state machine controller FSM performs the second reduction operation on partial product C_1 and sends 1 EN=0 signal and 3 EN=1 signals to the modified multipliers, completing the third round of 4 parallel multiplications and obtaining partial products C_2 and C_3;

in state 4, the finite state machine controller FSM performs the third reduction operation on partial products C_2 and C_3 and sends 4 EN=1 signals to the modified multipliers to complete the fourth round of 4 parallel multiplications and obtain partial product C_4;

in state 5, the finite state machine controller FSM performs the fourth reduction operation on partial product C_4 and sends 4 EN=1 signals to the modified multipliers to complete the fifth round of 4 parallel multiplications and obtain partial products C_5 and C_6;

in state 6, the finite state machine controller FSM performs the fifth reduction operation on partial products C_5 and C_6 and sends 4 EN=1 signals to the modified multipliers to perform the sixth round of 4 parallel multiplications and obtain partial product C_7;

in state 7, the finite state machine controller FSM performs the sixth reduction operation on partial product C_7 and sends 4 EN=1 signals to the modified multipliers for the seventh round of 4 parallel multiplications, obtaining partial product C_8;

in state 8, the finite state machine controller FSM performs the seventh reduction operation on partial product C_8 and sends 4 EN=1 signals to the modified multipliers for the eighth round of 4 parallel multiplications, obtaining partial products C_9 and C_10;

in state 9, the finite state machine controller FSM performs the eighth reduction operation on partial products C_9 and C_10 and sends 4 EN=1 signals to the modified multipliers to perform the ninth round of 4 parallel multiplications and obtain partial products C_11 to C_15;

in state 10, the finite state machine controller FSM performs the ninth reduction operation on partial products C_11 to C_15.
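The state sequence above can be tabulated as a consistency check. The sketch below assumes each 256-bit operand is split into eight 32-bit words; SCHEDULE, NUM_MULTIPLIERS and schedule_totals are illustrative names, not part of the patent. It confirms that every round occupies all four modified multipliers and that the EN=0 and EN=1 counts add up to the 8 diagonal products A_iB_i and the 28 cross sums A_iB_j + A_jB_i of an 8-part Karatsuba expansion.

```python
# (state, EN=0 count, EN=1 count, partial products produced), transcribed
# from states 1-9 above; state 0 is reset and state 10 only reduces.
SCHEDULE = [
    (1, 4, 0, "C0"),
    (2, 3, 1, "C1"),
    (3, 1, 3, "C2, C3"),
    (4, 0, 4, "C4"),
    (5, 0, 4, "C5, C6"),
    (6, 0, 4, "C7"),
    (7, 0, 4, "C8"),
    (8, 0, 4, "C9, C10"),
    (9, 0, 4, "C11-C15"),
]

NUM_MULTIPLIERS = 4
WORDS = 8  # a 256-bit operand split into eight 32-bit words (assumption)


def schedule_totals(schedule):
    """Return (ordinary, karatsuba) multiplication counts over all rounds."""
    ordinary = sum(en0 for _, en0, _, _ in schedule)
    karatsuba = sum(en1 for _, _, en1, _ in schedule)
    return ordinary, karatsuba


ordinary, karatsuba = schedule_totals(SCHEDULE)
# Every round keeps all four modified multipliers busy.
assert all(en0 + en1 == NUM_MULTIPLIERS for _, en0, en1, _ in SCHEDULE)
# EN=0 multiplications produce the diagonal products A_i * B_i ...
assert ordinary == WORDS
# ... and EN=1 multiplications produce the cross sums for all pairs i < j.
assert karatsuba == WORDS * (WORDS - 1) // 2
```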

In this embodiment, a modified multiplier performs Karatsuba multiplication when it receives an EN=1 signal and ordinary multiplication when it receives an EN=0 signal.

In addition, since each round of parallel multiplication and subtraction uses different input values, the finite state machine controller FSM may select the input values in a variety of ways known to those skilled in the art. In a preferred example of the present invention, taking 256-bit large numbers A and B as an example, the selection may follow, for example, the steps shown in FIG. 5. Specifically, in FIG. 5, the selection may include:

In step S10, when any state is executed, the sequence numbers of the multiplications to be performed are obtained from a preset multiplication allocation table. The preset multiplication allocation table may be, for example, as shown in Table 1.

Table 1 Multiplication allocation table

In step S11, the indices of the input values A_i, A_j, B_i and B_j are obtained from a preset 8-part Karatsuba expression table according to the selected sequence numbers. The 8-part Karatsuba expression table may be, for example, as shown in Table 2.

Table 2 8-part Karatsuba expression table

In step S12, the input values A_i, A_j, B_i and B_j are selected from the numbers to be modular-multiplied according to the indices and fed to the corresponding modified multiplier.
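Tables 1 and 2 are not reproduced in this text, so the following sketch of steps S10 to S12 uses hypothetical stand-ins. karatsuba_table is an invented numbering that lists the 8 diagonal pairs first and then the 28 cross pairs, and the allocation dictionary passed to select_inputs is supplied by the caller. The sketch only illustrates the lookup chain from sequence number to word indices to operand words, assuming 32-bit words with word 0 least significant.

```python
MASK32 = (1 << 32) - 1


def word(x: int, k: int) -> int:
    """Return the k-th 32-bit word of x (word 0 is least significant)."""
    return (x >> (32 * k)) & MASK32


def karatsuba_table() -> dict:
    """Hypothetical stand-in for Table 2: sequence number -> (i, j).

    i == j marks a diagonal product A_i*B_i (EN=0);
    i < j marks a cross sum A_i*B_j + A_j*B_i (EN=1).
    """
    pairs = [(i, i) for i in range(8)]
    pairs += [(i, j) for i in range(8) for j in range(i + 1, 8)]
    return dict(enumerate(pairs))


def select_inputs(state: int, allocation: dict, table: dict, a: int, b: int) -> list:
    """Steps S10-S12: return (A_i, A_j, B_i, B_j, EN) for each multiplier."""
    operands = []
    for seq in allocation[state]:   # S10: sequence numbers for this state
        i, j = table[seq]           # S11: word indices from the expression table
        en = 0 if i == j else 1
        # S12: pick the operand words that feed one modified multiplier
        operands.append((word(a, i), word(a, j), word(b, i), word(b, j), en))
    return operands
```

For example, a hypothetical allocation {1: [0, 1, 2, 3]} would feed A_0 to A_3 and B_0 to B_3 to the four multipliers with EN=0 in state 1; the real assignment is given by Table 1.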

In this embodiment, when the large integers A and B are 256 bits wide, the partial-product stacking diagram of the above Karatsuba algorithm is shown in FIG. 6. The words C_15 to C_0 of the product are calculated by formula (1):

$$
C_0 = S_{0,l}, \qquad C_n = S_{n,l} + S_{n-1,h} + \mathrm{Cin}_{n-1}, \quad n \in [1, 15] \tag{1}
$$
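Formula (1) can be read as column-wise accumulation with carry propagation. The sketch below assumes that S_n is the sum of all partial products A_iB_j with i + j = n, that S_{n,l} and S_{n,h} are its low 32 bits and remaining high bits, and that Cin_{n-1} is the carry out of the computation of C_{n-1}; these readings are inferred from FIG. 6 rather than stated explicitly. product_words is a hypothetical helper name.

```python
MASK32 = (1 << 32) - 1


def product_words(a: int, b: int) -> list:
    """Return C_0..C_15 of the 512-bit product a*b following formula (1)."""
    aw = [(a >> (32 * k)) & MASK32 for k in range(8)]
    bw = [(b >> (32 * k)) & MASK32 for k in range(8)]

    # S_n: sum of all partial products A_i*B_j in column n = i + j (assumed meaning).
    s = [0] * 16
    for i in range(8):
        for j in range(8):
            s[i + j] += aw[i] * bw[j]

    c = []
    carry = 0  # Cin_{n-1}
    for n in range(16):
        high_prev = (s[n - 1] >> 32) if n > 0 else 0  # S_{n-1,h}
        t = (s[n] & MASK32) + high_prev + carry        # S_{n,l} + S_{n-1,h} + Cin_{n-1}
        c.append(t & MASK32)                           # C_n keeps the low 32 bits
        carry = t >> 32                                # carry into column n + 1
    assert carry == 0  # a 256 x 256-bit product always fits in 16 words
    return c
```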

As can be seen from Table 2, 36 multiplications are required to form the product of the large numbers A and B. With 4 multipliers, 9 rounds of multiplication are needed in total; the specific allocation of the multiplications is shown in Tables 1 and 2.

p_256 is the prime recommended by the State Cryptography Administration for use with the SM2 algorithm. Its value is given below, and it can also be written in the form of formula (2).

p_256 = fffffffe ffffffff ffffffff ffffffff ffffffff 00000000 ffffffff ffffffff

$$
p_{256} = 2^{256} - 2^{224} - 2^{96} + 2^{64} - 1 \tag{2}
$$
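The hexadecimal value and formula (2) describe the same number, which can be checked directly with Python's arbitrary-precision integers. This is only a sanity check of the constant, not part of the hardware.

```python
# SM2 prime in hexadecimal: eight 32-bit words, most significant first.
P256_HEX = "fffffffe ffffffff ffffffff ffffffff ffffffff 00000000 ffffffff ffffffff"
p256_from_hex = int(P256_HEX.replace(" ", ""), 16)

# The same prime in the sparse form of formula (2).
p256_from_formula = 2**256 - 2**224 - 2**96 + 2**64 - 1

assert p256_from_hex == p256_from_formula
```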

Taking two 256-bit large integers A and B as an example, the modular multiplication may be performed as follows:

1. Since $2^{256} \equiv 2^{224} + 2^{96} - 2^{64} + 1 \pmod{p_{256}}$, it follows that:

$$
\begin{aligned}
2^{288} &\equiv 2^{256} \cdot 2^{32} \equiv 2^{224} + 2^{128} - 2^{64} + 2^{32} + 1 \pmod{p_{256}} \\
2^{320} &\equiv 2^{288} \cdot 2^{32} \equiv 2^{224} + 2^{160} + 2^{32} + 1 \pmod{p_{256}} \\
2^{352} &\equiv 2^{320} \cdot 2^{32} \equiv 2^{224} + 2^{192} + 2^{96} + 2^{32} + 1 \pmod{p_{256}} \\
2^{384} &\equiv 2^{352} \cdot 2^{32} \equiv 2 \cdot 2^{224} + 2^{128} + 2^{96} + 2^{32} + 1 \pmod{p_{256}} \\
2^{416} &\equiv 2^{384} \cdot 2^{32} \equiv 2 \cdot 2^{224} + 2^{160} + 2^{128} + 2 \cdot 2^{96} - 2^{64} + 2^{32} + 2 \pmod{p_{256}} \\
2^{448} &\equiv 2^{416} \cdot 2^{32} \equiv 2 \cdot 2^{224} + 2^{192} + 2^{160} + 2 \cdot 2^{128} + 2^{96} - 2^{64} + 2 \cdot 2^{32} + 2 \pmod{p_{256}} \\
2^{480} &\equiv 2^{448} \cdot 2^{32} \equiv 3 \cdot 2^{224} + 2^{192} + 2 \cdot 2^{160} + 2^{128} + 2^{96} + 2 \cdot 2^{32} + 2 \pmod{p_{256}}
\end{aligned}
$$
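Each residue above follows from the previous one by multiplying by 2^32 and substituting the residue of 2^256 once more. The sketch below checks all eight with Python's built-in three-argument pow; the RESIDUES dictionary simply transcribes the right-hand sides.

```python
P256 = 2**256 - 2**224 - 2**96 + 2**64 - 1

# Residues of 2^(256 + 32k) modulo p256, keyed by the exponent.
RESIDUES = {
    256: 2**224 + 2**96 - 2**64 + 1,
    288: 2**224 + 2**128 - 2**64 + 2**32 + 1,
    320: 2**224 + 2**160 + 2**32 + 1,
    352: 2**224 + 2**192 + 2**96 + 2**32 + 1,
    384: 2 * 2**224 + 2**128 + 2**96 + 2**32 + 1,
    416: 2 * 2**224 + 2**160 + 2**128 + 2 * 2**96 - 2**64 + 2**32 + 2,
    448: 2 * 2**224 + 2**192 + 2**160 + 2 * 2**128 + 2**96 - 2**64 + 2 * 2**32 + 2,
    480: 3 * 2**224 + 2**192 + 2 * 2**160 + 2**128 + 2**96 + 2 * 2**32 + 2,
}

for exponent, residue in RESIDUES.items():
    # pow(2, e, m) computes 2^e mod m without building the full power.
    assert pow(2, exponent, P256) == residue, exponent
```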

Substituting these residues of the powers of 2 modulo p_256 gives:

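$$
\begin{aligned}
C_8 \cdot 2^{256} &\equiv (C_8 \cdot 2^{224} \bmod p_{256}) + C_8\,(2^{96} - 2^{64} + 1) \\
C_9 \cdot 2^{288} &\equiv (C_9 \cdot 2^{224} \bmod p_{256}) + C_9\,(2^{128} - 2^{64} + 2^{32} + 1) \\
C_{10} \cdot 2^{320} &\equiv (C_{10} \cdot 2^{224} \bmod p_{256}) + C_{10}\,(2^{160} + 2^{32} + 1) \\
C_{11} \cdot 2^{352} &\equiv (C_{11} \cdot 2^{224} \bmod p_{256}) + C_{11}\,(2^{192} + 2^{96} + 2^{32} + 1) \\
C_{12} \cdot 2^{384} &\equiv (2 C_{12} \cdot 2^{224} \bmod p_{256}) + C_{12}\,(2^{128} + 2^{96} + 2^{32} + 1) \\
C_{13} \cdot 2^{416} &\equiv (2 C_{13} \cdot 2^{224} \bmod p_{256}) + C_{13}\,(2^{160} + 2^{128} + 2 \cdot 2^{96} - 2^{64} + 2^{32} + 2) \\
C_{14} \cdot 2^{448} &\equiv (2 C_{14} \cdot 2^{224} \bmod p_{256}) + C_{14}\,(2^{192} + 2^{160} + 2 \cdot 2^{128} + 2^{96} - 2^{64} + 2 \cdot 2^{32} + 2) \\
C_{15} \cdot 2^{480} &\equiv (3 C_{15} \cdot 2^{224} \bmod p_{256}) + C_{15}\,(2^{192} + 2 \cdot 2^{160} + 2^{128} + 2^{96} + 2 \cdot 2^{32} + 2)
\end{aligned}
\qquad \pmod{p_{256}}
$$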

Thus, the product C of the large integers A and B, reduced modulo p_256, can be expressed as formula (3) or formula (4):

$$
\begin{aligned}
C \bmod p_{256} \equiv\ & 2^{224}\,(3C_{15} + 2C_{14} + 2C_{13} + 2C_{12} + C_{11} + C_{10} + C_9 + C_8 + C_7) \\
&+ 2^{192}\,(C_6 + C_{11} + C_{14} + C_{15}) + 2^{160}\,(C_5 + C_{10} + C_{13} + C_{14} + 2C_{15}) \\
&+ 2^{128}\,(C_4 + C_9 + C_{12} + C_{13} + 2C_{14} + C_{15}) + 2^{96}\,(C_3 + C_8 + C_{11} + C_{12} + 2C_{13} + C_{14} + C_{15}) \\
&- 2^{64}\,(C_8 + C_9 + C_{13} + C_{14}) + 2^{64} C_2 + 2^{32}\,(C_1 + C_9 + C_{10} + C_{11} + C_{12} + C_{13} + 2C_{14} + 2C_{15}) \\
&+ C_0 + C_8 + C_9 + C_{10} + C_{11} + C_{12} + 2\,(C_{13} + C_{14} + C_{15}) \pmod{p_{256}}
\end{aligned}
\tag{3}
$$
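Formula (3) can be checked end to end by splitting a 512-bit product into the words C_0 to C_15, forming the weighted sum, and comparing it with a direct reduction. In this sketch reduce_formula3 is a hypothetical name, and the final Python % operator stands in for the remaining reduction of the 2^224 terms and the final correction into [0, p_256), which the hardware handles through its own reduction steps; the sketch makes no claim about how the rows of Table 3 are arranged.

```python
P256 = 2**256 - 2**224 - 2**96 + 2**64 - 1
MASK32 = (1 << 32) - 1


def reduce_formula3(c: int) -> int:
    """Reduce a product c < 2^512 modulo p256 via the word combination of formula (3)."""
    C = [(c >> (32 * n)) & MASK32 for n in range(16)]  # C_0 .. C_15
    t = (
        2**224 * (3*C[15] + 2*C[14] + 2*C[13] + 2*C[12] + C[11] + C[10] + C[9] + C[8] + C[7])
        + 2**192 * (C[6] + C[11] + C[14] + C[15])
        + 2**160 * (C[5] + C[10] + C[13] + C[14] + 2*C[15])
        + 2**128 * (C[4] + C[9] + C[12] + C[13] + 2*C[14] + C[15])
        + 2**96 * (C[3] + C[8] + C[11] + C[12] + 2*C[13] + C[14] + C[15])
        - 2**64 * (C[8] + C[9] + C[13] + C[14]) + 2**64 * C[2]
        + 2**32 * (C[1] + C[9] + C[10] + C[11] + C[12] + C[13] + 2*C[14] + 2*C[15])
        + C[0] + C[8] + C[9] + C[10] + C[11] + C[12] + 2 * (C[13] + C[14] + C[15])
    )
    # t is only congruent to c and may exceed p256 (or, in principle, be negative);
    # Python's % stands in for the final reduction into [0, p256).
    return t % P256
```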

In addition, the reduction operation may be allocated as shown in Table 3.

Table 3

The operations in rows S8, S10, S19 and S20 are multiplied by 2, and those in rows S13 to S16 are subtracted. The reduction and the parallel multiplication may be performed concurrently; the specific scheduling may be as shown in Table 4.

Table 4

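That is:

$$
\mathrm{Result} \equiv \mathrm{S1} + \mathrm{S2} + \mathrm{S3} + \mathrm{S4} + \mathrm{S5} + \mathrm{S6} + \mathrm{S7} + \mathrm{S9} + \mathrm{S11} + \mathrm{S12} + \mathrm{S17} + \mathrm{S18} + 2\,(\mathrm{S13} + \mathrm{S14} + \mathrm{S15} + \mathrm{S16}) - \mathrm{S8} - \mathrm{S10} - \mathrm{S19} - \mathrm{S20} \pmod{p_{256}} \tag{4}
$$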

Through the above technical solution, the SM2 algorithm parallel modular multiplier provided by the invention performs parallel multiplication with modified multipliers, each consisting of a preprocessing circuit, multiplexers, a multiplier and a post-processing circuit. Because both the Karatsuba algorithm for large-number multiplication and the reduction algorithm for the prime p_256 recommended by the State Cryptography Administration are based on divide and conquer, the multiplication and the reduction of the modular multiplication are executed in parallel, which increases speed and reduces resource consumption.

The optional embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the embodiments of the present invention are not limited to the specific details of the foregoing embodiments, and various simple modifications may be made to the technical solutions of the embodiments of the present invention within the scope of the technical concept of the embodiments of the present invention, and all the simple modifications belong to the protection scope of the embodiments of the present invention.

In addition, the specific features described in the above embodiments may be combined in any suitable manner without contradiction. In order to avoid unnecessary repetition, the various possible combinations of embodiments of the invention are not described in detail.

Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a program stored in a storage medium, the program including instructions that cause a single-chip microcomputer, a chip or a processor to perform all or part of the steps of the methods described in the embodiments herein. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

In addition, any combination of the various embodiments of the present invention may be made between the various embodiments, and should also be regarded as disclosed in the embodiments of the present invention as long as it does not deviate from the idea of the embodiments of the present invention.

Claims

1. An SM2 algorithm parallel modulo multiplier, comprising:

at least one modified multiplier, the modified multiplier comprising:

2. The modular multiplier of claim 1, wherein the number of modified multipliers is 4.

3. The modular multiplier of claim 2, wherein the multiplier is configured to perform ordinary multiplication or Karatsuba multiplication.

4. The modular multiplier of claim 1, wherein the preprocessing circuit comprises a comparator and a subtractor.

5. The modular multiplier of claim 1, wherein the post-processing circuit comprises an adder-subtractor and an exclusive-OR gate.

6. The modular multiplier of claim 3, wherein the finite state machine controller comprises 11 states executed in sequence, wherein:

in state 7, the finite state machine controller performs a sixth reduction operation on partial product C_7 and sends 4 EN=1 signals to the modified multiplier to perform a seventh round of 4 parallel multiplications, obtaining partial product C_8;

7. The modular multiplier of claim 6, wherein the modified multiplier is configured to:

8. The modular multiplier of claim 6, wherein the finite state machine controller is configured to: